C# - The Tesseract OCR engine can't read the text from an auto-generated image, but can from one cut in MS Paint
I'm using a .NET wrapper for the Tesseract OCR engine. I have a large document as a PNG. When I cut out a section of the image in MS Paint and feed it to the engine, it works. When I crop the same section in code, the engine can't recognize the text in the image. The images look the same, and their properties don't appear to be off. I'm a little confused.
Here are the 2 images. MS Paint: [image]
From code: [image]
This is the MS Paint image: [image]
And through code: [image]
They look similar, so I'm not sure why it can't recognize the text in the second one. The following is how I'm generating the image.
    public Bitmap CropImage(Bitmap source, Rectangle section)
    {
        Bitmap bmp = new Bitmap(section.Width, section.Height);
        Graphics g = Graphics.FromImage(bmp);
        g.DrawImage(source, 0, 0, section, GraphicsUnit.Pixel);

        return bmp;
    }

    private void Form1_Load(object sender, EventArgs e)
    {
        Bitmap source = new Bitmap(test);
        Rectangle section = new Rectangle(new Point(78, 65), new Size(800, 50));
        Bitmap croppedImage = CropImage(source, section);
        croppedImage.Save(@"c:\users\user\desktop\test34.png", System.Drawing.Imaging.ImageFormat.Png);

        this.pictureBox1.Image = croppedImage;
    }
The default resolution of a new Bitmap is 96 DPI, which is not adequate for OCR purposes. Try increasing it to 300 DPI, such as:
    bmp.SetResolution(300, 300);
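To see what that call does and does not change, here is a minimal sketch. It assumes the System.Drawing types used in the question are available; the class name `ResolutionCheck` and the helper `Describe` are made up for illustration. It prints the pixel size and the DPI of a blank bitmap before and after `SetResolution`.

```csharp
using System;
using System.Drawing;

public static class ResolutionCheck
{
    // Hypothetical helper: formats pixel size and horizontal/vertical DPI.
    public static string Describe(Bitmap bmp)
    {
        return $"{bmp.Width}x{bmp.Height} px at " +
               $"{bmp.HorizontalResolution}x{bmp.VerticalResolution} DPI";
    }

    public static void Main()
    {
        // Same size as the cropped section in the question (800x50).
        using (Bitmap bmp = new Bitmap(800, 50))
        {
            Console.WriteLine("Before: " + Describe(bmp));

            // Only the stored resolution changes; no pixels are resampled.
            bmp.SetResolution(300, 300);

            Console.WriteLine("After:  " + Describe(bmp));
        }
    }
}
```

The width and height stay at 800x50 after the call, because `SetResolution` only updates the resolution recorded for the bitmap.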
Update 1: When you scale the image, its dimensions should change as well. Here's an example rescale function:
    // Requires: using System.Drawing; using System.Drawing.Drawing2D;
    public static Image Rescale(Image image, int dpiX, int dpiY)
    {
        // Grow the pixel dimensions in proportion to the DPI increase.
        Bitmap bm = new Bitmap((int)(image.Width * dpiX / image.HorizontalResolution), (int)(image.Height * dpiY / image.VerticalResolution));
        bm.SetResolution(dpiX, dpiY);
        Graphics g = Graphics.FromImage(bm);
        g.InterpolationMode = InterpolationMode.Bicubic;
        g.PixelOffsetMode = PixelOffsetMode.HighQuality;
        g.DrawImage(image, 0, 0);
        g.Dispose();

        return bm;
    }
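As a rough check of the size arithmetic in the rescale function, the sketch below repeats just the dimension calculation without any drawing. It assumes the cropped image is 800x50 pixels at 96 DPI, as in the question and the note about the default resolution; `DpiMath` and `ScaledPixels` are hypothetical names used only here.

```csharp
using System;

public static class DpiMath
{
    // Pixel count needed to keep the same physical size at a new DPI.
    // Uses the same int * int / float expression and (int) truncation
    // as the Bitmap constructor arguments in the rescale function.
    public static int ScaledPixels(int pixels, float sourceDpi, int targetDpi)
    {
        return (int)(pixels * targetDpi / sourceDpi);
    }

    public static void Main()
    {
        // Assumed input: the 800x50 crop at the default 96 DPI.
        int width = ScaledPixels(800, 96f, 300);
        int height = ScaledPixels(50, 96f, 300);

        Console.WriteLine($"{width}x{height}"); // prints "2500x156"
    }
}
```

So under these assumptions the 800x50 crop would become roughly 2500x156 pixels at 300 DPI, which gives the OCR engine about three times as many pixels per character in each direction.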