I am new to Tess4J. I am trying to read text from document formats such as PDF, JPEG, BMP, and TIFF.
I came across this API and want to check a couple of things with you:
1) I am working with a 64-bit JDK and NetBeans 7.3.1. According to the Tess4J documentation, only 32-bit is supported. If that is the case, how can I go ahead and implement this?
2) How do I integrate it with a Java application?
Please help me so that I can get started.
The Tesseract team has only made 32-bit DLLs available. You may want to submit a request for 64-bit support on their board. Until that becomes available, you'd have to stick with 32-bit Java, or compile Tesseract for 64-bit yourself (several people have claimed to be able to do that).
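To confirm which kind of JVM NetBeans is actually launching, a small check like the one below may help. It is only a sketch: the class name JvmBitness and the list of architecture names are my own choices, and the sun.arch.data.model property is HotSpot-specific, so it may be missing on other JVMs.

```java
import java.util.Locale;

final class JvmBitness {
    /** Maps an os.arch value to 32 or 64; returns 0 when the value is not recognized. */
    static int bitsFromArch(String osArch) {
        if (osArch == null) {
            return 0;
        }
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64") || arch.equals("aarch64")) {
            return 64;
        }
        if (arch.equals("x86") || arch.equals("i386") || arch.equals("i486")
                || arch.equals("i586") || arch.equals("i686")) {
            return 32;
        }
        return 0;
    }

    public static void main(String[] args) {
        // os.arch describes the running JVM, not the operating system,
        // so a 32-bit JVM on 64-bit Windows reports "x86".
        String arch = System.getProperty("os.arch");
        // HotSpot-specific property; may be null on other JVMs.
        String model = System.getProperty("sun.arch.data.model");
        System.out.println("os.arch = " + arch + ", sun.arch.data.model = " + model);
        System.out.println("JVM looks " + bitsFromArch(arch) + "-bit (0 means unknown)");
    }
}
```

Run it with the same JDK that the NetBeans project is configured to use, since the IDE and the project can be on different Java platforms.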
You can check out VietOCR for a desktop example of Tess4J integration.
Thanks, Quan, for your quick reply.
I tried to work with the VietOCR project, but it is an .exe file without any source code. How do I integrate it with Java? Please help me.
VietOCR supports both .exe and .dll integration. Under Settings/Options, check the "Use libtesseract Library" box to work with the .dll. VietOCR is itself a NetBeans project. Be sure to use a 32-bit JDK.
Suitable 64-bit DLLs are available from the Tesseract .NET wrapper project.
I have downloaded the 64-bit DLLs and run the project, but I still get the error below when running the test cases.
Unable to load library 'libtesseract302': Native library (win32-x86-64/libtesseract302.dll) not found in resource path
java.lang.UnsatisfiedLinkError: Unable to load library 'libtesseract302': Native library (win32-x86-64/libtesseract302.dll) not found in resource path
You got that error because you were loading 32-bit DLLs into a 64-bit JVM.
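If it is unclear whether a given DLL is the 32-bit or the 64-bit build, the Machine field in its PE header settles it. The sketch below reads that field directly. The class name DllBitness is made up for illustration, and it only distinguishes the two machine types relevant here (0x014C for x86 and 0x8664 for x64).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

final class DllBitness {
    /** Reads the COFF Machine field from the bytes of a PE file (DLL or EXE). */
    static String machineOf(byte[] image) {
        ByteBuffer buf = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
        // A PE file starts with the DOS header, whose first two bytes are "MZ".
        if (image.length < 0x40 || buf.getShort(0) != 0x5A4D) {
            throw new IllegalArgumentException("not an MZ executable");
        }
        // The 4-byte value at 0x3C is the file offset of the "PE\0\0" signature.
        int peOffset = buf.getInt(0x3C);
        if (peOffset < 0 || peOffset > image.length - 6 || buf.getInt(peOffset) != 0x00004550) {
            throw new IllegalArgumentException("PE signature not found");
        }
        // The Machine field is the first 2 bytes after the signature.
        int machine = buf.getShort(peOffset + 4) & 0xFFFF;
        switch (machine) {
            case 0x014C: return "32-bit (x86)";
            case 0x8664: return "64-bit (x64)";
            default: return String.format("unknown (0x%04X)", machine);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java DllBitness <path-to-dll>");
            return;
        }
        System.out.println(args[0] + ": " + machineOf(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Pass it the path of each DLL that the application loads; the result has to match the bitness of the JVM.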
I understand the problem now. Even though we have the 64-bit DLLs, we are still facing an issue with the 64-bit JVM.
This is because, according to this link: https://code.google.com/p/tesseract-ocr/wiki/ReadMe
"An installer is available for Windows from our download page. This includes the English training data.
If you want to use another language, download the appropriate training data, unpack it using 7-zip, and copy the .traineddata file into the 'tessdata' directory, probably C:\Program Files\Tesseract OCR\tessdata." So the training data resides with the 32-bit installation. That is why we are facing the above problem.
I'm still not sure what your problem is regarding the 64-bit version. But you should not have different versions of Tesseract installed. That Windows installer has set the TESSDATA_PREFIX environment variable to a location that may not be what you expect for your app. You may want to consider uninstalling that version and cleaning up any Registry settings that it may have set.
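A quick way to see what TESSDATA_PREFIX currently points to from inside Java is sketched below. It assumes that the variable names the parent folder of "tessdata" (which is how I understand Tesseract 3.0x to use it) and that the English data file is called eng.traineddata. The class name TessdataCheck is hypothetical.

```java
import java.io.File;

final class TessdataCheck {
    /** Describes what, if anything, is found under the given TESSDATA_PREFIX value. */
    static String describe(String prefix) {
        if (prefix == null || prefix.isEmpty()) {
            return "TESSDATA_PREFIX is not set";
        }
        // Assumption: the prefix is the parent of the tessdata folder.
        File tessdata = new File(prefix, "tessdata");
        if (!tessdata.isDirectory()) {
            return "no tessdata directory under " + prefix;
        }
        File eng = new File(tessdata, "eng.traineddata");
        return eng.isFile()
                ? "found " + eng.getPath()
                : "tessdata exists but eng.traineddata is missing";
    }

    public static void main(String[] args) {
        // Reports the value seen by this JVM process, which is what the native library will see too.
        System.out.println(describe(System.getenv("TESSDATA_PREFIX")));
    }
}
```

If the output points at a leftover installer directory rather than your application's data, that matches the situation described above.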
Initially, when I tried the Tess4J jar, I got a similar error:
Exception in thread "main" java.lang.UnsatisfiedLinkError: %1 is not a valid Win32 application.
Then I got the information from this blog about updating to the 64-bit DLLs. After moving to the 64-bit DLL files, I am now facing a new issue for which I could not find any fix. Can you help me fix it?
Exception in thread "main" java.lang.UnsatisfiedLinkError: The specified module could not be found.
I used the three files from "Tesseract .NET wrapper project."
Put the two DLLs in the search path.
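To verify that the DLLs really are on a path the JVM will search, something like the following sketch can be used. It looks through jna.library.path, java.library.path and PATH. The class name DllLocator is made up, and the second file name (liblept168.dll) is my assumption about which Leptonica DLL accompanies libtesseract302.dll; adjust it to the files you actually have.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class DllLocator {
    /** Returns every copy of fileName found in the directories of a PATH-style string. */
    static List<File> find(String fileName, String searchPath) {
        List<File> hits = new ArrayList<>();
        if (searchPath == null) {
            return hits;
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The second name is an assumption; replace it with your Leptonica DLL.
        String[] dlls = {"libtesseract302.dll", "liblept168.dll"};
        String[] searchPaths = {
            System.getProperty("jna.library.path"),
            System.getProperty("java.library.path"),
            System.getenv("PATH")
        };
        for (String dll : dlls) {
            List<File> all = new ArrayList<>();
            for (String path : searchPaths) {
                all.addAll(find(dll, path));
            }
            System.out.println(dll + " -> " + (all.isEmpty() ? "not found" : all.toString()));
        }
    }
}
```

More than one hit for the same DLL is also worth noticing, because a stale 32-bit copy earlier on the path can be picked up before the 64-bit one.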
Another thing. Those DLLs were built with VS2012/VS2013 and therefore depend on the Visual C++ Redistributable for VS2012 or Visual C++ Redistributable for VS2013.
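To summarize the three UnsatisfiedLinkError messages that came up in this thread, here is a small lookup sketch. The class name LinkErrorHints and the wording of the hints are mine, and the causes are only the likely ones discussed above, not an exhaustive diagnosis.

```java
final class LinkErrorHints {
    /** Maps a known UnsatisfiedLinkError message to the likely cause discussed in this thread. */
    static String hint(String message) {
        if (message == null) {
            return "no message available";
        }
        if (message.contains("is not a valid Win32 application")) {
            return "bitness mismatch between the JVM and the DLL";
        }
        if (message.contains("The specified module could not be found")) {
            return "the DLL or one of its dependencies (for example the Visual C++ runtime) is missing";
        }
        if (message.contains("not found in resource path")) {
            return "no DLL was found for this JVM's platform; the message names the expected folder";
        }
        return "unrecognized; inspect the full stack trace";
    }

    public static void main(String[] args) {
        // The three messages quoted in this thread.
        String[] messages = {
            "%1 is not a valid Win32 application.",
            "The specified module could not be found.",
            "Unable to load library 'libtesseract302': Native library "
                + "(win32-x86-64/libtesseract302.dll) not found in resource path"
        };
        for (String message : messages) {
            System.out.println(message + "\n  -> " + hint(message));
        }
    }
}
```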