
Ubuntu - cannot find Tesseract or Tessdata. Please enter its name in Settings

Help
JJ Méric
2018-07-05
2018-07-06
  • JJ Méric

    JJ Méric - 2018-07-05

    Hi there! Please help me get started!
    I'm on Ubuntu 16.04 LTS and have just installed Java, Tesseract (as per the instructions in the readme file), and VietOCR3.
    I keep getting this message: "Either Tesseract executable or Tessdata not found...". When I go into Settings / Options:
    1) The Tesseract path /usr/bin is correct. I checked that there is a Tesseract file in there. I specified it, saved, and restarted. Same problem.
    2) There is no option to specify Tessdata. But it does exist in /usr/share/tesseract-ocr/tessdata. I was even able to download the file fra.traineddata and "sudo mv" it in there.

    I have also tried shutting down and restarting the computer; same problem again.

    I have little experience with Linux/Ubuntu. My previous experience with VietOCR is on Windows, where I created a bam.traineddata for Bambara, a West African language, which works just fine.

    Thanks for your help!

     
  • JJ Méric

    JJ Méric - 2018-07-05

    Fixed, maybe in a bad way but fixed, see below.

    The only way to get rid of the message is to point Settings to /../home/user1/VietOCR3/tesseract-ocr
    I have two files there and tried each, with and without the option "use libtesseract library".
    I got rid of the message and can now select English or Vietnamese as the OCR Language (but not the newly installed fra and bam traineddata files?).

    But when I test VietOCR on a sample, it crashes with this console output:

    Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/vie.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
    Failed loading language 'vie'
    Tesseract couldn't load any languages!

    A fatal error has been detected by the Java Runtime Environment: (...)

    I tried changing TESSDATA_PREFIX using the sample command given:
    export TESSDATA_PREFIX=/usr/local/share/
    and later changed it to /usr/share/tesseract-ocr,
    since this is where the tessdata directory is after install, but this does not seem to work on my computer/Ubuntu: VietOCR still keeps trying to open the data file
    /usr/share/tesseract-ocr/4.00/tessdata/vie.traineddata
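
One possible reason an `export` appears to have no effect: a variable exported in a terminal is inherited only by programs started from that same terminal session, not by programs launched from the desktop menu. The following is a minimal sketch for confirming what a child process would see. The path is the one tried in the post above; whether VietOCR itself honours the variable is an assumption this sketch does not verify.

```shell
# Export the variable in the current shell session.
export TESSDATA_PREFIX=/usr/share/tesseract-ocr

# A child process started from this shell inherits the exported value.
child_value=$(sh -c 'printf "%s" "$TESSDATA_PREFIX"')
echo "Child process sees: $child_value"
```

If the value shows up here but the application still looks elsewhere, the application is probably either started from a different session or given its data path explicitly.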

    I finally decided to change the installed /usr/share/tesseract-ocr structure and create a 4.00 directory there, into which I moved tessdata (by the way, this is more in line with what is found under ~/VietOCR/tesseract-ocr).

    This way VietOCR no longer complains, I can now see the OCR Languages I have just installed, and it runs just fine on the Vietnamese sample given.
    It also works OK with my "bam" language on my Bambara texts. Great!
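
Before moving directories around, it can help to check which candidate tessdata directory actually holds the language file named in the error message. A rough sketch, assuming the two locations mentioned in this thread; `find_traineddata` is a hypothetical helper written for illustration, not part of Tesseract or VietOCR.

```shell
# Print every candidate directory that contains <lang>.traineddata.
# Usage: find_traineddata LANG DIR...
find_traineddata() {
    lang=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$lang.traineddata" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Candidate locations taken from this thread; adjust for your own system.
find_traineddata vie \
    /usr/share/tesseract-ocr/tessdata \
    /usr/share/tesseract-ocr/4.00/tessdata
```

If the file is listed only under the directory without `4.00` while the error message names the `4.00` path, the mismatch described in this thread is the likely cause.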

     

    Last edit: JJ Méric 2018-07-05
  • Quan Nguyen

    Quan Nguyen - 2018-07-06

    Sorry for the trouble you went through. On Linux, Tesseract 4.00 suddenly changed its default tessdata location to a 4.00 subdirectory. I believe that if you install a 4.00-compatible language pack, it will be placed there. Be sure to use language data compatible with your installed version of Tesseract.
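
To see which Tesseract version is installed and which languages it can find, something like the following may help. It is a sketch that assumes the `tesseract` command is on the PATH and that the first line of its `--version` output has the form `tesseract X.Y...`; `tesseract_major` is a hypothetical helper name.

```shell
# Extract the major version number from the first line of
# `tesseract --version` output (assumed form: "tesseract X.Y...").
tesseract_major() {
    printf '%s\n' "$1" | sed -n '1s/^tesseract \([0-9][0-9]*\).*/\1/p'
}

if command -v tesseract >/dev/null 2>&1; then
    # Some versions print version information on stderr, so capture both streams.
    version_output=$(tesseract --version 2>&1)
    echo "Tesseract major version: $(tesseract_major "$version_output")"
    # List the languages Tesseract finds in its current tessdata directory.
    tesseract --list-langs 2>&1 || echo "could not list languages"
else
    echo "tesseract not found on PATH"
fi
```

If a newly installed language is missing from the list, its traineddata file is probably in a directory that this Tesseract installation does not read.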

     
  • JJ Méric

    JJ Méric - 2018-07-06

    Thanks, Quan.
    As I want to play with all the options, I'm currently stuck with DangAmbigs. My initial attempt fails and I don't quite see why... Is there an error in the attached file?

     
  • JJ Méric

    JJ Méric - 2018-07-06

    my settings (screenshot)

     
  • JJ Méric

    JJ Méric - 2018-07-06

    Sorry, an idiot's question: I had not pressed the Post-process icon. Apologies.

     
