Including OCR?

Scan documents to PDF and other file types, as simply as possible.

Brought to you by: ben-cyanfish

Forum: General Discussion

Created: 2013-11-10

Updated: 2013-11-10

Ben - 2013-11-10

Hi, awesome program. The one thing it seems to be lacking is OCR. I also noticed that there's an open-source OCR engine called Tesseract, which has a .NET wrapper project; see http://code.google.com/p/tesseractdotnet/

Incidentally, what compression scheme do you use? I'm still learning about PDFs, but based on this post (http://blogs.adobe.com/acrolaw/2009/08/reducing-the-file-size-of-scanned-pdfs/), it seems like offering a choice of compression schemes might be good, and JPBG2 seems like a good default.

Ben Olden-Cooligan - 2013-11-10

Hi Ben,

Thanks for pointing out that project. OCR is something that I'm looking to add in the future (almost certainly using Tesseract), though it might be a while before I get around to implementing it.

Images in NAPS2's PDF files are encoded in either JPEG or PNG format. If the "Maximum quality" option is checked, then it's always PNG; otherwise, it's PNG for black/white, and JPEG for grayscale and color.
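To make that rule concrete, here is a rough sketch of the selection logic in Python. It is only an illustration of the behaviour described above, not NAPS2's actual code; the names `BitDepth` and `choose_image_format` are made up for the example.

```python
from enum import Enum


class BitDepth(Enum):
    # Hypothetical names for the three scan modes mentioned above.
    BLACK_WHITE = "black_white"
    GRAYSCALE = "grayscale"
    COLOR = "color"


def choose_image_format(bit_depth: BitDepth, maximum_quality: bool) -> str:
    """Return "PNG" or "JPEG" following the rule described above."""
    if maximum_quality:
        # "Maximum quality" checked: always the lossless format.
        return "PNG"
    if bit_depth is BitDepth.BLACK_WHITE:
        # Black/white pages stay PNG even without maximum quality.
        return "PNG"
    # Grayscale and color pages fall back to JPEG.
    return "JPEG"
```

For example, `choose_image_format(BitDepth.COLOR, maximum_quality=False)` gives JPEG, while the same call with `maximum_quality=True` gives PNG.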

I might look at JBIG2 in the future (I think JPBG2 was a typo on that site), though I think PNG is good enough for most people, since black/white images have inherently small file sizes.
