Tesseract-4.0.x

OSS fan
2018-09-05
2018-09-18
  • OSS fan

    OSS fan - 2018-09-05

    Hello!

    You recently added /components/tesseract-4.0.0b4 to your NAPS2 download page.
    I tried to use it without success. NAPS2 auto-downloads tesseract-3.0.4 when I try to use OCR. I unpacked the downloaded tesseract-4.0.0b4 files into the components folder and tried to trick NAPS2 by copying them into the tesseract-3.0.4 folder. NAPS2 correctly detects the language file (traineddata), but the scanned PDF has no OCR text and errorlog.txt shows:

    2018-09-05 14:00:51.8076 Error running OCR System.IO.FileNotFoundException: Could not find file 'C:\Program Files (x86)\util\scan\Data\temp\4m2fsebo.3cb.hocr'.
    File name: 'C:\Program Files (x86)\util\scan\Data\temp\4m2fsebo.3cb.hocr'
       at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
       at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
       at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize)
       at System.Xml.XmlDownloadManager.GetStream(Uri uri, ICredentials credentials, IWebProxy proxy, RequestCachePolicy cachePolicy)
       at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri, String role, Type ofObjectToReturn)
       at System.Xml.XmlTextReaderImpl.FinishInitUriString()
       at System.Xml.XmlTextReaderImpl..ctor(String uriStr, XmlReaderSettings settings, XmlParserContext context, XmlResolver uriResolver)
       at System.Xml.XmlReaderSettings.CreateReader(String inputUri, XmlParserContext inputContext)
       at System.Xml.XmlReader.Create(String inputUri, XmlReaderSettings settings, XmlParserContext inputContext)
       at System.Xml.Linq.XDocument.Load(String uri, LoadOptions options)
       at NAPS2.Ocr.TesseractOcrEngine.ProcessImage(String imagePath, String langCode, Func`1 cancelCallback)
    

    I need assistance using Tesseract 4.

    Regards,

    Zdenko

     
  • Ben Olden-Cooligan

    Hi,

    Tesseract 4 requires a new version of NAPS2, which is coming soon (1-2 weeks hopefully).

    Ben

     
  • OSS fan

    OSS fan - 2018-09-06

    Thank you.

    Thought so.
    I can hardly wait. The best will be even better! Is that possible?

    Best regards,

    Zdenko

     
  • Timo

    Timo - 2018-09-07

    Please enlighten me:

    1. What is Tesseract 4?
    2. Does it come built in with the next NAPS2, and most importantly,
    3. How does it make end-use better/faster/more accurate?

    Sorry if this was trivial.

    Appreciated :)

    //Timo

     
  • Ben Olden-Cooligan

    Tesseract is the underlying program NAPS2 uses for OCR. Version 4 uses a new engine based on a type of neural network.

    In the new NAPS2 version it will be integrated the same way as OCR is now, where you just press the OCR button and download the language you want.

    I've been very impressed in my testing, with as many as 80% fewer recognition errors (though overall the improvement is probably more modest, 30% fewer) and no noticeable regressions.

    It will take more CPU (but for small numbers of pages may be faster due to improved multi-core use). To compensate, I'm adding the ability to run OCR preemptively (before you click the Save PDF button), so from a user perspective it may be much faster (or even instant).
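
    The idea of running OCR preemptively can be sketched roughly as follows. This is only an illustration in Python, not NAPS2's actual code: the class `PreemptiveOcr`, its methods, and the stand-in `fake_recognize` function are all hypothetical names. Each page is queued for recognition in the background as soon as it is scanned, so saving only has to wait for whatever has not finished yet.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class PreemptiveOcr:
    """Hypothetical helper: starts OCR when a page is scanned, not when it is saved."""

    def __init__(self, recognize: Callable[[str], str], workers: int = 2) -> None:
        self._recognize = recognize  # assumed OCR function: image path -> text
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: Dict[str, Future] = {}

    def page_scanned(self, image_path: str) -> None:
        # Queue the page in the background right away; scanning is not blocked.
        if image_path not in self._pending:
            self._pending[image_path] = self._pool.submit(self._recognize, image_path)

    def save(self, image_paths: List[str]) -> List[str]:
        # Pages that were never queued are queued now; the rest may already be done.
        for path in image_paths:
            self.page_scanned(path)
        # Wait only for results that are still outstanding, keeping page order.
        return [self._pending[path].result() for path in image_paths]

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def fake_recognize(image_path: str) -> str:
    # Stand-in for a real OCR call, used only to make the sketch runnable.
    return "text of " + image_path
```

    If recognition has already finished for every page by the time `save` is called, the save step returns without waiting, which is where the "instant" impression would come from.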

     
  • Timo

    Timo - 2018-09-10

    Appreciated, thx!

     
  • Ben Olden-Cooligan

    This should now work with the latest version (6.0b1).

     
