C# - The Tesseract OCR engine isn't able to read the text from an auto-generated image, but can from a cut made in MS Paint

June 15, 2015

I'm using a .NET wrapper for the Tesseract OCR engine. I have a large document as a PNG. When I cut out a section of the image in MS Paint and feed it to the engine, it works. When I do the same crop in code, the engine can't recognize the text in the image. The images look the same and their properties don't appear to be off, so I'm a little confused.

Here are the two images. From MS Paint:

[image]

From code:

[image]

This is the MS Paint image:

[image]

And through code:

[image]

They're so similar that I'm not sure why the engine can't recognize the text in the second one. The following is how I'm generating the image.

    // Needs: using System; using System.Drawing; (methods of a Windows Forms class)

    // Copies the given section of the source bitmap into a new bitmap.
    public Bitmap CropImage(Bitmap source, Rectangle section)
    {
        Bitmap bmp = new Bitmap(section.Width, section.Height);
        using (Graphics g = Graphics.FromImage(bmp))
        {
            g.DrawImage(source, 0, 0, section, GraphicsUnit.Pixel);
        }

        return bmp;
    }

    private void Form1_Load(object sender, EventArgs e)
    {
        // 'test' is defined elsewhere (not shown in the post)
        Bitmap source = new Bitmap(test);
        Rectangle section = new Rectangle(new Point(78, 65), new Size(800, 50));
        Bitmap croppedImage = CropImage(source, section);
        croppedImage.Save(@"c:\users\user\desktop\test34.png", System.Drawing.Imaging.ImageFormat.Png);

        this.pictureBox1.Image = croppedImage;
    }

The default resolution of a new Bitmap is 96 DPI, which is not adequate for OCR purposes. Try increasing it to 300 DPI, for example:

    bmp.SetResolution(300, 300);
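As a rough sketch of where that call could go in the question's crop method: the class name `CropSketch` and the explicit destination rectangle are assumptions added for illustration, not part of the original code. Note that `SetResolution` only changes the DPI value stored with the bitmap. It does not add pixels.

```csharp
using System.Drawing;

public static class CropSketch
{
    // Crops 'section' out of 'source' and tags the result as 300 DPI.
    public static Bitmap CropImage(Bitmap source, Rectangle section)
    {
        Bitmap bmp = new Bitmap(section.Width, section.Height);

        // Metadata only: the pixel dimensions of bmp stay section.Width x section.Height.
        bmp.SetResolution(300, 300);

        using (Graphics g = Graphics.FromImage(bmp))
        {
            // Assumption: an explicit destination rectangle of the same pixel size
            // is used so the section is copied at a 1:1 pixel scale.
            Rectangle destination = new Rectangle(0, 0, bmp.Width, bmp.Height);
            g.DrawImage(source, destination, section, GraphicsUnit.Pixel);
        }

        return bmp;
    }
}
```

Whether tagging the crop as 300 DPI is enough for the engine probably depends on how large the text is in pixels, so treat this as a first thing to try, not a guaranteed fix.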

Update 1: When you scale the image, its dimensions should change as well. Here's an example rescale function:

    // Needs: using System.Drawing; using System.Drawing.Drawing2D;
    public static Image Rescale(Image image, int dpiX, int dpiY)
    {
        // Scale the pixel dimensions by the ratio of target DPI to source DPI.
        Bitmap bm = new Bitmap((int)(image.Width * dpiX / image.HorizontalResolution), (int)(image.Height * dpiY / image.VerticalResolution));
        bm.SetResolution(dpiX, dpiY);
        Graphics g = Graphics.FromImage(bm);
        g.InterpolationMode = InterpolationMode.Bicubic;
        g.PixelOffsetMode = PixelOffsetMode.HighQuality;
        g.DrawImage(image, 0, 0);
        g.Dispose();

        return bm;
    }
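To make the size arithmetic concrete, here is a minimal sketch of the dimension calculation used in the function above, isolated from the drawing code. The names `RescaleMath` and `ScaledLength` are made up for illustration, and the 96 DPI source resolution in the usage note is the assumed default mentioned earlier.

```csharp
public static class RescaleMath
{
    // Pixel length after resampling from sourceDpi to targetDpi.
    // The fractional part is dropped, matching the (int) cast in Rescale.
    public static int ScaledLength(int pixels, float sourceDpi, int targetDpi)
    {
        return (int)(pixels * targetDpi / sourceDpi);
    }
}
```

With the 800 x 50 crop from the question, assuming it is at 96 DPI, `ScaledLength(800, 96f, 300)` returns 2500 and `ScaledLength(50, 96f, 300)` returns 156 (156.25 truncated), so the rescaled bitmap would be 2500 x 156 pixels.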
