
Including OCR?

Ben
2013-11-10
  • Ben

    Ben - 2013-11-10

    Hi, awesome program. The one thing it seems to be lacking is OCR. I also noticed that there's an open-source OCR engine called Tesseract, which has a .NET project; see http://code.google.com/p/tesseractdotnet/

    Incidentally, what compression scheme do you use? I'm still learning about PDFs, but based on this post (http://blogs.adobe.com/acrolaw/2009/08/reducing-the-file-size-of-scanned-pdfs/), it seems like offering compression scheme options would be good, and JPBG2 seems like a good default.

     
  • Ben Olden-Cooligan

    Hi Ben,

    Thanks for pointing out that project. OCR is something that I'm looking to add in the future (almost certainly using Tesseract), though it might be a while before I get around to implementing it.

    Images in NAPS2's PDF files are encoded in either JPEG or PNG format. If the "Maximum quality" option is checked, then it's always PNG; otherwise, it's PNG for black/white, and JPEG for grayscale and color.
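The format-selection rule above can be summarized in a few lines. This is a minimal Python sketch of the rule as described, not NAPS2's actual code. The names `ColorMode` and `choose_image_encoding` and the `maximum_quality` flag are hypothetical stand-ins for the "Maximum quality" option and the scan's color mode.

```python
from enum import Enum


class ColorMode(Enum):
    """Hypothetical stand-in for the scan's color setting."""
    BLACK_WHITE = "bw"
    GRAYSCALE = "gray"
    COLOR = "color"


def choose_image_encoding(mode: ColorMode, maximum_quality: bool) -> str:
    """Return the image format used when embedding a scanned page in a PDF.

    "Maximum quality" always forces lossless PNG; otherwise black/white pages
    use PNG and grayscale/color pages use lossy JPEG.
    """
    if maximum_quality:
        return "PNG"
    if mode is ColorMode.BLACK_WHITE:
        return "PNG"
    return "JPEG"


if __name__ == "__main__":
    for mode in ColorMode:
        print(mode.name,
              choose_image_encoding(mode, maximum_quality=False),
              choose_image_encoding(mode, maximum_quality=True))
```

Running it prints one row per color mode: black/white gives PNG either way, while grayscale and color switch from JPEG to PNG only when maximum quality is on.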

    I might look at JBIG2 in the future (I think JPBG2 was a typo on that site), though I think PNG is good enough for most people, since black/white images have inherently small file sizes.
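To see why black/white scans are small even without JBIG2, compare uncompressed sizes at different bit depths. The sketch below assumes a hypothetical US Letter page scanned at 300 dpi (2550 x 3300 pixels). It then runs a synthetic striped black/white page through `zlib` (the DEFLATE algorithm PNG uses, here without PNG's row filters). The striped page is far more regular than real text, so treat the compressed figure as illustrative only, not as a measurement of NAPS2's output.

```python
import zlib

# Hypothetical page: 8.5 in x 11 in scanned at 300 dpi.
WIDTH, HEIGHT = 2550, 3300


def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size, padding each row up to a whole byte (as PNG does)."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


bw = raw_size_bytes(WIDTH, HEIGHT, 1)      # 1 bit per pixel
gray = raw_size_bytes(WIDTH, HEIGHT, 8)    # 8 bits per pixel
color = raw_size_bytes(WIDTH, HEIGHT, 24)  # 24 bits per pixel (RGB)


def synthetic_bw_page(width: int, height: int) -> bytes:
    """Build a crude 1-bit page: white rows with bands of striped 'text' rows.

    In 1-bit grayscale, a set bit is white and a clear bit is black.
    """
    row_bytes = (width + 7) // 8
    white = b"\xff" * row_bytes
    text = (b"\xff" * 20 + b"\x0f\xf0" * 140)[:row_bytes].ljust(row_bytes, b"\xff")
    # 20 "text" rows out of every 50, the rest blank paper.
    return b"".join(text if y % 50 < 20 else white for y in range(height))


page = synthetic_bw_page(WIDTH, HEIGHT)
compressed = zlib.compress(page, 9)

if __name__ == "__main__":
    print(f"black/white raw: {bw:,} bytes")
    print(f"grayscale raw:   {gray:,} bytes")
    print(f"color raw:       {color:,} bytes")
    print(f"black/white after DEFLATE (synthetic page): {len(compressed):,} bytes")
```

The raw arithmetic alone shows the point: at 1 bit per pixel the page is 1,052,700 bytes before compression, versus 8,415,000 bytes for grayscale and 25,245,000 bytes for color. DEFLATE then shrinks the long white runs further.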

     

