jTessBoxEditor v2.6.0 & jTessBoxEditorFX 2.6.0 Releases
VietOCR v6.14.0 & VietOCR.NET v6.13.0 Releases
Thanks a lot for the guidance. I will try and update
This happens because the program does not know where Tesseract's tessdata directory is. You can define the TESSDATA_PREFIX environment variable so that it contains the path to the directory. https://stackoverflow.com/questions/65597552/how-exactly-to-set-up-and-use-environment-variables-on-a-mac
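A minimal sketch of how to check what TESSDATA_PREFIX currently resolves to before launching the program. It is not VietOCR's own lookup logic; the helper name find_tessdata is hypothetical. Because the error message quoted in this thread asks for the parent directory of tessdata, while some Tesseract setups point the variable at tessdata itself, the sketch accepts either layout.

```python
import os
from pathlib import Path


def find_tessdata(env=None):
    """Return the tessdata directory implied by TESSDATA_PREFIX, or None.

    Hypothetical helper for illustration only; it does not reproduce
    how VietOCR or Tesseract resolve the variable internally.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return None
    base = Path(prefix).expanduser()
    # Accept both conventions: the variable may name the parent of
    # tessdata or the tessdata directory itself.
    for candidate in (base / "tessdata", base):
        if candidate.is_dir() and any(candidate.glob("*.traineddata")):
            return candidate
    return None


if __name__ == "__main__":
    found = find_tessdata()
    print(found if found else "TESSDATA_PREFIX is unset or has no language packs")
```

If the script reports nothing found, the variable is either not visible to the process or points at a directory without any .traineddata files.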
VietOCR.NET v6.13.0 Release
Dear Nguyenq, I need advice on installing VietOCR on macOS Sonoma. Whenever I open VietOCR, I get the message: "tessdata folder is not found. Please install lanugage packs and/or set TESSDATA_PREFIX environment variable to parent directory of tessdata." Please help.
The LSTM or WORDSTR box files that the program generates are meant for Tesseract 4.x training, not for use in this program itself. Support for generating these box files was added at the request of a Tesseract developer. You can still edit the box files before using them in the training process. https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
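To see which kind of box file you have, a rough sketch like the following may help. It assumes the conventions described in the Tesseract training documentation: WORDSTR files contain lines beginning with "WordStr", and LSTM files mark the end of each text line with an entry whose symbol is a tab character. The function name guess_box_format and the heuristic itself are illustrative assumptions, not the check jTessBoxEditor actually performs.

```python
def guess_box_format(text):
    """Guess the box file flavour from its contents.

    Heuristic sketch (an assumption, not jTessBoxEditor's logic):
    - any line starting with "WordStr " -> "wordstr"
    - any line starting with a tab      -> "lstm"
    - otherwise                         -> "char" (Tesseract 3.x style)
    """
    lines = text.splitlines()
    if any(line.startswith("WordStr ") for line in lines):
        return "wordstr"
    if any(line.startswith("\t") for line in lines):
        return "lstm"
    return "char"


if __name__ == "__main__":
    sample = "T 10 20 30 40 0\nh 31 20 45 40 0\n"
    print(guess_box_format(sample))
```

Under this heuristic, deleting the tab end-of-line entries from an LSTM box file would make it look like a plain character box file.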
Hello everyone! I created a box file with the latest jTessBoxEditor and deleted all empty spaces. When I try to train with the "Trainer", I get the following error message: "cannot train with LSTM or WORDSTR box files. Training for Tesseract 4.0x is not supported". What is the issue? Does jTessBoxEditor create only WORDSTR or LSTM box files? ADDENDUM: If I do not edit the box file (delete blank characters), the box file is not recognized as WORDSTR or LSTM. Best regards!
VietOCR v6.13.1 & VietOCR.NET v6.11.1 Releases
VietOCR v6.13.0 & VietOCR.NET v6.11.0 Releases
jTessBoxEditor v2.5.0 & jTessBoxEditorFX 2.5.0 Releases
VietOCR v6.12.0 & VietOCR.NET v6.10.0 Releases
Hello all, I am using VietOCR 6.10.0 with Tesseract 5.3.2/3 support to extract text from an English Sanskrit IAST image file (attached). I downloaded a proper trained data set (also attached) and put it in tessdata. The page has two columns and is extracted quite well except for a few issues. The main issue is that Tesseract misses a line (the line underlined in blue in the image). I have attached the output file too. Someone suggested that I use different PSMs (as opposed to the default PSM 3).....
results
traineddata
VietOCR v6.10.0 & VietOCR.NET v6.9.0 Releases
I have installed jTessBoxEditor, but I cannot see a Devanagari font there. Where do you get a Devanagari font? Please tell me; I am new to jTessBoxEditor. When I open a box file in jTessBoxEditor, it does not recognize even a single word of Hindi.
We tested it with Oracle Java 20.0.2 on Windows. Can you try it with Oracle JRE? Thanks.
I have jTessBoxEditor running. I clicked File, Open and opened an image file. Now what? Is there anyone out there willing to give very simple instructions to walk me through the software? Much appreciated.
Hi, I'm not able to choose the output formats for bulk processing. The dropdown activates and I see the choices, but nothing happens when I click any of them. Running the batch process in this state, I get a .box and a .unlv file per image (saved in the output directory). Also, the image magnification buttons are greyed out and not accessible, and choosing the segmented regions options doesn't draw any boxes on the left pane. Running VietOCR 6.9 on Ubuntu 22. java -version: openjdk version "11.0.20" 2023-07-18...
VietOCR v6.9.0 & VietOCR.NET v6.8.0 Releases
VietOCR v6.8.0 & VietOCR.NET v6.7.0 Releases
It appears that the program cannot read the specified file because the file name or path contains Unicode characters. The script sets export LC_ALL=C at the beginning, which might have interfered with that. https://www.ibm.com/support/pages/what-lcall-variable You can either remove that line or run the program from the command line with options that give Java more heap memory at startup: java -Xms128m -Xmx2048m -jar VietOCR.jar https://docs.oracle.com/en/java/javase/13/docs/specs/man/java.htm...
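As a rough analogy for why LC_ALL=C can matter here: with the C locale, file names are typically treated as plain ASCII, and a name containing other characters cannot be encoded. The sketch below only illustrates that encoding failure in Python; it does not reproduce the Java code path, and the file name and the helper encodable are made up for the example.

```python
def encodable(path, encoding):
    """Return True if every character of path can be encoded with encoding."""
    try:
        path.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    name = "tài liệu.html"  # hypothetical non-ASCII file name
    for enc in ("ascii", "utf-8"):
        print(enc, encodable(name, enc))
```

A name that fails the ASCII check but passes the UTF-8 check is the kind of path that may trigger the "unmappable characters" error when the locale is forced to C.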
Hi, I'm having some strange errors with VietOCR on Arch Linux. I start the program using ./ocr and try to open a new file, but VietOCR won't read my Documents folder. I get the following: Exception in thread "Basic L&F File Loading Thread" java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /.../Downloads/Docs of Sort/????????????.html at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:121) at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:68) at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:278)...
It appears that Capture2Text uses the Tesseract engine as well (not sure which version), so I would expect it to produce the same output. PS: Someone has put in a request for them to upgrade the OCR engine to the latest Tesseract version. You may want to upvote the ticket so they pay more attention to it. https://sourceforge.net/p/capture2text/tickets/218/
Is your Windows 64-bit, and do you have .NET 4.8 installed on the machine? Can you try the Java version? It has the same functionality as the .NET version. Thanks.
Still nothing. This new version crashes right after running. With the previous version, the software would sometimes seem to be processing each file, but mostly it reported being complete in less than 2 seconds. There is never an output file. This newest version crashes after processing.
Hi, thanks for your feedback. I think you could add some options in the settings so that people can opt in to or out of certain features, such as "always delete line break", "always convert to lower case" or "always copy text to clipboard". That would make the workflow faster. Currently, I have to use another program to record mouse clicks and keystrokes so that it can play back the actions and I don't have to do those repetitive tasks. And again, thanks for your software, it really saves me a lot of time...
Thanks for the input. It's certainly doable but does not seem like typical usage. I'm not familiar with Capture2Text, which probably is designed for that niche, specific workflow. Best regards.
Please try uninstalling the current version and installing the latest update. Thanks.
We verified that the Bulk OCR function is working (make sure you select an OCR Language option first); however, the PDF output files seem to be corrupted. We're investigating the issue.
I'm very happy with VietOCR.NET. Congratulations on creating a highly accurate product. I've tried several others that were very bad at OCR. I have a large number of image files to scan and am hoping to use the Bulk OCR function. When I run the tool, it produces no output in the program or in the designated directory. I have tried this with the output set to text, PDF, and PDF text only. Windows 10, 64 gigabytes of RAM.
jTessBoxEditor v2.4.0 & jTessBoxEditorFX 2.4.0 Releases