You recently added /components/tesseract-4.0.0b4 to your NAPS2 download page.
I tried to use it without success. NAPS2 auto-downloads tesseract-3.0.4 when I try to use OCR. I unpacked the downloaded tesseract-4.0.0b4 files into the components folder and tried to trick NAPS2 by copying them into the tesseract-3.0.4 folder. NAPS2 correctly detects the language file (traineddata), but the scanned PDF has no OCR and errorlog.txt shows:
2018-09-05 14:00:51.8076 Error running OCR System.IO.FileNotFoundException: Could not find file 'C:\Program Files (x86)\util\scan\Data\temp\4m2fsebo.3cb.hocr'.
File name: 'C:\Program Files (x86)\util\scan\Data\temp\4m2fsebo.3cb.hocr'
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize)
at System.Xml.XmlDownloadManager.GetStream(Uri uri, ICredentials credentials, IWebProxy proxy, RequestCachePolicy cachePolicy)
at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri, String role, Type ofObjectToReturn)
at System.Xml.XmlTextReaderImpl.FinishInitUriString()
at System.Xml.XmlTextReaderImpl..ctor(String uriStr, XmlReaderSettings settings, XmlParserContext context, XmlResolver uriResolver)
at System.Xml.XmlReaderSettings.CreateReader(String inputUri, XmlParserContext inputContext)
at System.Xml.XmlReader.Create(String inputUri, XmlReaderSettings settings, XmlParserContext inputContext)
at System.Xml.Linq.XDocument.Load(String uri, LoadOptions options)
at NAPS2.Ocr.TesseractOcrEngine.ProcessImage(String imagePath, String langCode, Func`1 cancelCallback)
I need assistance using tesseract-4.
Regards,
Zdenko
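For anyone reading the trace above: the exception is raised while loading the .hocr file, which suggests the OCR step finished without writing its output. Below is a minimal Python sketch of checking for that case before parsing. It is not NAPS2's code (NAPS2 is a .NET application); the function name `load_hocr` and the sample file contents are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def load_hocr(path):
    """Return the parsed hOCR root element, or None if no output file exists."""
    if not os.path.isfile(path):
        # The OCR process left no output; report that to the caller
        # instead of failing later inside the XML loader.
        return None
    return ET.parse(path).getroot()


# A path that does not exist yields None rather than an exception.
missing = load_hocr(os.path.join(tempfile.gettempdir(), "no-such-file.hocr"))

# An existing file is parsed normally (sample content is illustrative only).
with tempfile.NamedTemporaryFile("w", suffix=".hocr", delete=False) as f:
    f.write("<html><body><span class='ocrx_word'>hello</span></body></html>")
root = load_hocr(f.name)
os.remove(f.name)
```

A check like this would only make the failure easier to read; it would not make Tesseract 4 work with a NAPS2 version built for 3.0.4.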
Hi,
Tesseract 4 requires a new version of NAPS2, which is coming soon (1-2 weeks hopefully).
Ben
Thank you.
I thought so.
I can hardly wait. The best will be even better! Is that possible?
Best regards,
Zdenko
Please enlighten me:
Sorry if this was trivial.
Appreciated :)
//Timo
Tesseract is the underlying program NAPS2 uses for OCR. Version 4 uses a new engine based on a type of neural network.
In the new NAPS2 version it will be integrated the same way as OCR is now, where you just press the OCR button and download the language you want.
I've been very impressed in my testing, with as many as 80% fewer recognition errors (though the overall improvement is probably more modest, around 30% fewer) and no noticeable regressions.
It will take more CPU (but for small numbers of pages may be faster due to improved multi-core use). To compensate, I'm adding the ability to run OCR preemptively (before you click the Save PDF button), so from a user perspective it may be much faster (or even instant).
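To illustrate the preemptive idea, here is a rough Python sketch: OCR starts in the background as soon as each page is scanned, and saving only waits for whatever has not finished yet. This is not how NAPS2 implements it; `run_ocr`, `PreemptiveOcr`, and the page names are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def run_ocr(page):
    """Hypothetical stand-in for a slow OCR call on one page."""
    return f"text of {page}"


class PreemptiveOcr:
    """Starts OCR when a page is scanned and collects results at save time."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}

    def page_scanned(self, page):
        # Kick off OCR in the background; scanning can continue meanwhile.
        self._pending[page] = self._pool.submit(run_ocr, page)

    def save(self):
        # Blocks only for pages whose OCR has not completed yet.
        results = {page: fut.result() for page, fut in self._pending.items()}
        self._pool.shutdown()
        return results


ocr = PreemptiveOcr()
for name in ["page1", "page2"]:
    ocr.page_scanned(name)
saved = ocr.save()
```

If the user takes a few seconds between scanning and saving, most of the OCR work is already done by the time `save()` is called, which is why it can feel instant.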
Appreciated, thx!
This should now work with the latest version (6.0b1).