C# - The Tesseract OCR engine can't read the text from an auto-generated image, but can from one cut in MS Paint
I'm using the .NET wrapper for the Tesseract OCR engine. I have a large document as a PNG. When I cut out a section of the image in MS Paint and feed it to the engine, it works. When I crop the same section in code, the engine can't recognize the text in the image. The images look the same and their properties don't appear to be off, so I'm a little confused.
Here are the two images. From MS Paint:
From code:
This is the MS Paint image:
And through code:
They're similar, so I'm not sure why it can't recognize the text in the second one. The following is how I'm generating the image.
public Bitmap CropImage(Bitmap source, Rectangle section)
{
    Bitmap bmp = new Bitmap(section.Width, section.Height);
    Graphics g = Graphics.FromImage(bmp);
    g.DrawImage(source, 0, 0, section, GraphicsUnit.Pixel);
    return bmp;
}

private void Form1_Load(object sender, EventArgs e)
{
    Bitmap source = new Bitmap(test);
    Rectangle section = new Rectangle(new Point(78, 65), new Size(800, 50));
    Bitmap croppedImage = CropImage(source, section);
    croppedImage.Save(@"c:\users\user\desktop\test34.png", System.Drawing.Imaging.ImageFormat.Png);
    this.pictureBox1.Image = croppedImage;
}
The default resolution of a new Bitmap is 96 DPI, which is not adequate for OCR purposes. Try increasing it to 300 DPI, such as:
bmp.SetResolution(300, 300);
Update 1: When you scale the image, its dimensions should change as well. Here's an example rescale function:
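// Requires: using System.Drawing; using System.Drawing.Drawing2D;
public static Image Rescale(Image image, int dpiX, int dpiY)
{
    // Scale the pixel dimensions by the ratio of target DPI to source DPI.
    Bitmap bm = new Bitmap((int)(image.Width * dpiX / image.HorizontalResolution), (int)(image.Height * dpiY / image.VerticalResolution));
    bm.SetResolution(dpiX, dpiY);
    Graphics g = Graphics.FromImage(bm);
    g.InterpolationMode = InterpolationMode.Bicubic;
    g.PixelOffsetMode = PixelOffsetMode.HighQuality;
    // Drawing at (0, 0) uses the image's physical size, so it fills the larger bitmap.
    g.DrawImage(image, 0, 0);
    g.Dispose();
    return bm;
}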
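To make the size change concrete, here is the arithmetic the rescale function above would perform for the 800 by 50 pixel crop from the question. This is only a worked sketch: it assumes the cropped bitmap reports the default 96 DPI in both directions and that the target is 300 DPI.

```latex
\begin{aligned}
w' &= \left\lfloor \frac{800 \times 300}{96} \right\rfloor = \lfloor 2500 \rfloor = 2500 \\
h' &= \left\lfloor \frac{50 \times 300}{96} \right\rfloor = \lfloor 156.25 \rfloor = 156
\end{aligned}
```

Under those assumptions, the crop grows to about 2500 by 156 pixels, roughly 3.1 times larger in each dimension. The floor reflects the (int) cast, which drops the fractional part.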