VietOCR / Discussion / Help: Ubuntu - cannot find Tesseract or Tessdata. Please enter its name in Settings

JJ Méric - 2018-07-05

Hi there! Please help me get started!
I'm on Ubuntu 16.04 LTS and have just installed Java, Tesseract (following the instructions in the readme file) and VietOCR3.
I keep getting the message "Either Tesseract executable or Tessdata not found...". In Settings / Options:
1) The Tesseract path /usr/bin is correct: I checked that there is a tesseract file in there. I specified it, saved and restarted. Same problem.
2) There is no option to specify Tessdata, but it does exist in /usr/share/tesseract-ocr/tessdata. I was even able to download fra.traineddata and "sudo mv" it in there.

I also tried shutting down and restarting the computer. Same problem again.

I have little experience with Linux/Ubuntu. My previous experience with VietOCR is on Windows, where I created a bam.traineddata for Bambara, a West African language, which works just fine.

Thanks for help!


JJ Méric - 2018-07-05

Fixed, maybe not in the best way, but fixed; see below.

The only way to get rid of the message is to point Settings to /../home/user1/VietOCR3/tesseract-ocr
I have two files there; I tried each one, both with and without the "use libtesseract library" option.
The message went away and I can now select English or Vietnamese as the OCR language (but not the newly installed fra and bam traineddata files?).
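Why fra and bam did not appear can be illustrated with a small sketch. The assumption here (not confirmed by VietOCR's documentation) is that the language list is built from the *.traineddata files found in the one tessdata directory actually in use, so files placed in a different directory simply do not show up. The function name `list_languages` is hypothetical.

```python
from pathlib import Path
from typing import List


def list_languages(tessdata_dir: str) -> List[str]:
    """Return the language codes of all *.traineddata files in tessdata_dir.

    Hypothetical helper: only files in the directory being searched are
    listed, so a traineddata file copied elsewhere stays invisible.
    """
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    # "fra.traineddata" -> "fra" (Path.stem drops the last suffix only)
    return sorted(p.stem for p in directory.glob("*.traineddata") if p.is_file())


if __name__ == "__main__":
    # Compare the two Ubuntu locations mentioned in this thread
    for d in ("/usr/share/tesseract-ocr/tessdata",
              "/usr/share/tesseract-ocr/4.00/tessdata"):
        print(d, list_languages(d))
```

Running it against both directories makes it obvious which one holds fra and bam and which one is actually being read.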

But when I test VietOCR on a sample, it crashes with this console output:

Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/vie.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'vie'
Tesseract couldn't load any languages!

A fatal error has been detected by the Java Runtime Environment: (...)

I tried changing TESSDATA_PREFIX using the sample command given:
export TESSDATA_PREFIX=/usr/local/share/
and later changed it to /usr/share/tesseract-ocr, since that is where the tessdata directory is after installation. But this does not seem to work on my computer: VietOCR still keeps trying to open the data file
/usr/share/tesseract-ocr/4.00/tessdata/vie.traineddata
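A quick way to see where a given language file can actually be found is to check the plausible locations one by one. The sketch below is a hypothetical diagnostic, not part of VietOCR or Tesseract: it checks the directory derived from TESSDATA_PREFIX (both as the parent of tessdata, as the error message asks, and as the tessdata directory itself, since the expected meaning has varied between Tesseract versions), then the two Ubuntu defaults seen in this thread. Also note that `export` only affects programs started from that same shell session, so VietOCR launched some other way will not see the variable.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Ubuntu locations seen in this thread; adjust for other setups.
DEFAULT_DIRS = (
    "/usr/share/tesseract-ocr/4.00/tessdata",  # path in the error message
    "/usr/share/tesseract-ocr/tessdata",       # where the package put tessdata
)


def candidate_dirs(env: Mapping[str, str]) -> List[Path]:
    """Directories that might hold *.traineddata, checked in this order."""
    dirs: List[Path] = []
    prefix = env.get("TESSDATA_PREFIX")
    if prefix:
        # The error text wants the *parent* of tessdata, but some Tesseract
        # versions expect the tessdata directory itself: try both readings.
        dirs.append(Path(prefix) / "tessdata")
        dirs.append(Path(prefix))
    dirs.extend(Path(d) for d in DEFAULT_DIRS)
    return dirs


def locate_traineddata(lang: str,
                       env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the first existing <lang>.traineddata, or None if none exists."""
    for d in candidate_dirs(env):
        path = d / f"{lang}.traineddata"
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    for lang in ("vie", "fra", "bam"):
        print(lang, "->", locate_traineddata(lang) or "NOT FOUND")
```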

I finally decided to change the installed /usr/share/tesseract-ocr structure: I created a 4.00 directory there and moved tessdata into it (by the way, this is more in line with what is found under ~/VietOCR/tesseract-ocr).

This way VietOCR no longer complains, I can now see the OCR languages I have just installed, and it runs just fine on the Vietnamese sample provided.
It also works OK with my "bam" language on my Bambara texts. Great!
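The manual workaround (create a 4.00 directory and move tessdata into it) can be sketched as follows. This is a hypothetical helper that only reproduces the steps described in the post; the defaults assume the Ubuntu layout from this thread, writing under /usr/share needs root (sudo), and it refuses to overwrite an existing 4.00/tessdata.

```python
import shutil
from pathlib import Path


def move_into_version_dir(base: str = "/usr/share/tesseract-ocr",
                          version_dir: str = "4.00") -> Path:
    """Move <base>/tessdata to <base>/<version_dir>/tessdata.

    Hypothetical helper mirroring the manual fix above. Pass a scratch
    directory as `base` to try it without touching system files.
    """
    root = Path(base)
    src = root / "tessdata"
    dst = root / version_dir / "tessdata"
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; not overwriting")
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    dst.parent.mkdir(parents=True, exist_ok=True)  # creates <base>/4.00
    shutil.move(str(src), str(dst))                # renames tessdata into it
    return dst
```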

Last edit: JJ Méric 2018-07-05


Quan Nguyen - 2018-07-06

Sorry for the trouble you went through. On Linux, Tesseract 4.00 suddenly changed its default tessdata location to a 4.00 subdirectory. I believe that if you install a 4.00-compatible language pack, it will be placed there. Be sure to use the version compatible with your installed Tesseract.
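To check which layout applies, one can map the installed Tesseract version to the expected directory. The sketch below is an assumption drawn only from this thread (3.x packages use /usr/share/tesseract-ocr/tessdata, 4.x packages use /usr/share/tesseract-ocr/4.00/tessdata); other versions or distributions may differ, so it raises rather than guessing. Both function names are hypothetical; `installed_version` calls `tesseract --version`, which some builds print on stderr rather than stdout.

```python
import re
import subprocess
from pathlib import Path


def tessdata_dir_for(version: str, base: str = "/usr/share/tesseract-ocr") -> Path:
    """Map a Tesseract version string to its expected Ubuntu tessdata dir.

    Assumed layout (from this thread only): 3.x -> <base>/tessdata,
    4.x -> <base>/4.00/tessdata. Anything else raises ValueError.
    """
    match = re.search(r"(\d+)\.(\d+)", version)
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    major = int(match.group(1))
    if major == 3:
        return Path(base) / "tessdata"
    if major == 4:
        return Path(base) / "4.00" / "tessdata"
    raise ValueError(f"no known layout for Tesseract {major}.x")


def installed_version() -> str:
    """First line of `tesseract --version`; raises if tesseract is missing."""
    result = subprocess.run(["tesseract", "--version"],
                            capture_output=True, text=True)
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""
```

Usage: `tessdata_dir_for(installed_version())` gives the directory where a compatible language pack is expected to go.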


JJ Méric - 2018-07-06

Thanks, Quan.
As I want to play with all the options, I'm currently stuck with DangAmbigs. My first attempt fails and I don't quite see why... Is there an error in the attached file?

bam.DangAmbigs.txt


JJ Méric - 2018-07-06

my settings (screenshot)

bamDangAmbigs.png


JJ Méric - 2018-07-06

Sorry, silly question: I had not pressed the Post-process icon. Apologies.
