Tesseract upgrade missing text when extracting
I remember the 8.3 filename limitation from the old DOS and Windows 95 era, but all modern OSes should be able to handle long filenames. Which Windows version are you seeing the issue on?
VietOCR v6.8.0 & VietOCR.NET v6.7.0 Releases
@Praveen Anand Please use the Lept4J version compatible with your Leptonica installation.
It appears that the program cannot read the specified file because the file name or path contains Unicode characters. The script sets export LC_ALL=C at the beginning, which might have interfered with that. https://www.ibm.com/support/pages/what-lcall-variable You can either remove that line or run the program from the command line with options that give the Java program more heap memory at startup: java -Xms128m -Xmx2048m -jar VietOCR.jar https://docs.oracle.com/en/java/javase/13/docs/specs/man/java.htm...
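To check which of the two suspects applies on a given machine, a small diagnostic along these lines may help. It is only a sketch and not part of VietOCR: the class name EnvCheck and the sample file name are made up for illustration, and sun.jnu.encoding is an internal JVM property (present on common HotSpot builds, but not guaranteed), so the code falls back to the default charset when it is missing.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of VietOCR: reports the charset the JVM uses
// for file names and the maximum heap size it was started with.
class EnvCheck {

    // sun.jnu.encoding is an internal property; fall back to the default
    // charset if it is absent or names a charset this JVM does not support.
    static Charset fileNameCharset() {
        String name = System.getProperty("sun.jnu.encoding");
        if (name != null && Charset.isSupported(name)) {
            return Charset.forName(name);
        }
        return Charset.defaultCharset();
    }

    // True if every character of the path can be encoded in the given charset.
    static boolean canRepresent(String path, Charset cs) {
        return cs.newEncoder().canEncode(path);
    }

    // Maximum heap the JVM will attempt to use, in MiB (influenced by -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Sample name with Vietnamese characters, written with Unicode escapes.
        String path = args.length > 0 ? args[0] : "ti\u1EBFng_vi\u1EC7t.png";
        Charset cs = fileNameCharset();
        System.out.println("File-name charset: " + cs);
        System.out.println("Path representable: " + canRepresent(path, cs));
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it once from the same script (with export LC_ALL=C still in place) and once from a plain terminal should show whether the file-name charset changes; adding -Xmx2048m to the java command should change the reported heap size.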
It appears that Capture2Text uses the Tesseract engine as well (not sure which version), so I would expect it to produce the same output. PS: Someone has put in a request for them to upgrade the OCR engine to the latest Tesseract version. You may want to upvote the ticket so that they give it more attention. https://sourceforge.net/p/capture2text/tickets/218/
Is your Windows 64-bit, and do you have .NET 4.8 installed on the machine? Can you try the Java version? It has the same functionality as the .NET version. Thanks.