Reg: Tess4j and Java integration

shan
2013-12-23
2014-02-14
  • shan
    2013-12-23

    Hi,

    I am new to Tess4J. I am trying to read text from document formats such as PDF, JPEG, BMP, and TIFF.

    I came across this API and want to check a couple of things with you:

    1) I am working with the 64-bit JDK and NetBeans 7.3.1. According to the Tess4J documentation, only 32-bit is supported. If that is the case, how can I proceed?

    2) How do I integrate it with a Java application?

    Please help me so that I can get started.

    Thanks,
    Santhi

     
  • Quan Nguyen
    2013-12-24

    Hi Santhi,

    The Tesseract team has only made 32-bit DLLs available. You may want to submit a request for 64-bit support on their board. Until that becomes available, you'd have to stick with 32-bit Java, or compile Tesseract for 64-bit yourself (several people have claimed to be able to do that).

    You can check out VietOCR for a desktop example of Tess4J integration.

    Quan

     
  • shan
    2013-12-26

    Thanks Quan for your quick reply.

    I tried to work with the VietOCR project, but it is an .exe file without any source code. How do I integrate it with Java? Please help me.

    Thanks,
    Santhi

     
  • Quan Nguyen
    2013-12-27

    VietOCR supports both .exe and .dll integration. Under Settings/Options, check the "Use libtesseract Library" box to work with the .dll. VietOCR is itself a NetBeans project. Be sure to use a 32-bit JDK.

    https://sourceforge.net/projects/vietocr/files/vietocr/

     
  • Quan Nguyen
    2013-12-30

    Santhi,

    The relevant 64-bit DLLs are available from the Tesseract .NET wrapper project.

    Quan

     
  • shan
    2013-12-30

    Hi Quan,

    I have downloaded the 64-bit DLLs and run the project, but I am still getting the error below while running the test cases.

    "
    Unable to load library 'libtesseract302': Native library (win32-x86-64/libtesseract302.dll) not found in resource path
    java.lang.UnsatisfiedLinkError: Unable to load library 'libtesseract302': Native library (win32-x86-64/libtesseract302.dll) not found in resource path "

    Santhi

     
  • Quan Nguyen
    2013-12-30

    You got that error because you were loading 32-bit DLLs in a 64-bit JVM.
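
A quick way to confirm which side of the mismatch you are on is to print the architecture of the running JVM. The sketch below is only an illustration and is not part of Tess4J: JvmArchCheck and windowsResourcePrefix are made-up names, and the folder mapping is an assumption inferred from the win32-x86-64 path in the error message above.

```java
import java.util.Locale;

final class JvmArchCheck {
    /**
     * Hypothetical helper: maps an os.arch value to the Windows folder name
     * that appears in the error message (assumed mapping, not taken from JNA).
     */
    static String windowsResourcePrefix(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            return "win32-x86-64";
        }
        if (arch.equals("x86") || arch.equals("i386")) {
            return "win32-x86";
        }
        // Other architectures are passed through unchanged in this sketch.
        return "win32-" + arch;
    }

    public static void main(String[] args) {
        String osArch = System.getProperty("os.arch");
        // sun.arch.data.model is vendor-specific and may be null on some JVMs.
        String dataModel = System.getProperty("sun.arch.data.model");
        System.out.println("os.arch = " + osArch + ", data model = " + dataModel);
        System.out.println("expected DLL folder on Windows: " + windowsResourcePrefix(osArch));
        // JNA also searches the directories listed in this property, if it is set.
        System.out.println("jna.library.path = " + System.getProperty("jna.library.path"));
    }
}
```

If the JVM reports 64-bit, the DLLs it loads need to be 64-bit builds as well, and they need to sit somewhere JNA searches, for example a directory named in the jna.library.path system property.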

     
  • shan
    2014-01-02

    Hi Quan,

    I understand the problem now. Even though we have the 64-bit DLLs, we are still facing an issue with the 64-bit JVM.

    Referring to this link: https://code.google.com/p/tesseract-ocr/wiki/ReadMe
    "Windows
    An installer is available for Windows from our download page. This includes the English training data.

    If you want to use another language, download the appropriate training data, unpack it using 7-zip, and copy the .traineddata file into the 'tessdata' directory, probably C:\Program Files\Tesseract OCR\tessdata."

    So the training data resides in the 32-bit installation. That is why we are facing the above problem.

     
  • Quan Nguyen
    2014-01-02

    I'm still not sure what your problem is regarding the 64-bit version, but you should not have different versions of Tesseract installed. That Windows installer has set the TESSDATA_PREFIX environment variable to a location that may not be what you expect for your app. You may want to consider uninstalling that version and cleaning up any Registry settings that it may have set.
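
To see where the environment variable currently points, a small check like the following may help. It is a sketch under the assumption that Tesseract 3.02 expects TESSDATA_PREFIX to be the parent folder of tessdata; TessdataCheck and trainedDataPath are hypothetical names used only for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TessdataCheck {
    /**
     * Builds the path where a language file would be expected under a prefix.
     * Assumes the prefix is the parent folder of "tessdata".
     */
    static Path trainedDataPath(String prefix, String language) {
        return Paths.get(prefix, "tessdata", language + ".traineddata");
    }

    public static void main(String[] args) {
        String prefix = System.getenv("TESSDATA_PREFIX");
        if (prefix == null) {
            System.out.println("TESSDATA_PREFIX is not set");
            return;
        }
        Path eng = trainedDataPath(prefix, "eng");
        String state = Files.isRegularFile(eng) ? "exists" : "is missing";
        System.out.println(eng + " " + state);
    }
}
```

If the reported path belongs to the old installer rather than to your application, that would be consistent with the advice above to uninstall the other version.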

     
  • avinash
    2014-02-13

    Hi Quan,

    Initially, when I tried the Tess4J jar, I got a similar error:
    "Exception in thread "main" java.lang.UnsatisfiedLinkError: %1 is not a valid Win32 application."
    Then I got the info from this blog to switch to the 64-bit DLLs. After moving to the 64-bit DLL files, I am now facing a new issue that I could not find any fix for. Can you help me fix it?

    "Exception in thread "main" java.lang.UnsatisfiedLinkError: The specified module could not be found."

    I used the three files from the Tesseract .NET wrapper project:
    1. liblept168.dll
    2. libtesseract302.dll
    3. tesseract.exe

    avinash
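
Both messages come from the Windows loader rather than from Tess4J itself. The first usually points to a 32-bit/64-bit mismatch, while the second typically means that the DLL, or another DLL it depends on, could not be located. As a first step, it may be worth confirming that every DLL in the set has the same architecture as the JVM. The sketch below reads the machine field from the PE header of each file; DllArchCheck and machineOf are hypothetical names, and the file names in the comment are simply the ones listed above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

final class DllArchCheck {
    /** Reads the COFF machine field from the bytes of a PE file (DLL or EXE). */
    static String machineOf(byte[] pe) {
        if (pe.length < 0x40) {
            throw new IllegalArgumentException("too short for a PE file");
        }
        ByteBuffer buf = ByteBuffer.wrap(pe).order(ByteOrder.LITTLE_ENDIAN);
        // Offset 0x3C of the DOS header stores the position of the "PE\0\0" signature.
        int peOffset = buf.getInt(0x3C);
        if (peOffset < 0 || peOffset > pe.length - 6 || buf.getInt(peOffset) != 0x00004550) {
            throw new IllegalArgumentException("PE signature not found");
        }
        // The 2-byte machine field follows the 4-byte signature.
        int machine = buf.getShort(peOffset + 4) & 0xFFFF;
        switch (machine) {
            case 0x014C: return "32-bit (x86)";
            case 0x8664: return "64-bit (x64)";
            default: return "other (0x" + Integer.toHexString(machine) + ")";
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the DLL paths as arguments, e.g. libtesseract302.dll liblept168.dll
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + machineOf(bytes));
        }
    }
}
```

If all files report the same architecture as the JVM, the remaining suspect is a dependency that Windows cannot find, so keeping liblept168.dll in the same folder as libtesseract302.dll is a reasonable thing to verify next.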