Is there any documentation for Tesseract? I haven't found anything aside from the usage message:
./tesseract imagename outputbase [configfile [[+|-]varfile]...]
What are 'configfile' and 'varfile' here?
What are the options that can be tuned?
More specifically, I'd like to know if there is a way to restrict the set of characters that Tesseract is looking for (like the -C option of gocr). I need to convert a set of floating point numbers, so only the characters ".0123456789" are needed.
Here's my first stab at it. It's not a "user's manual"; it's aimed more at the hackers.
Untar tess_docs01.tar.gz (a bit over 63MB decompressed!) somewhere handy and point your favorite browser to html-102/index.html - you should get two frames with a lot of nifty relationships. Enjoy!
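For anyone who would rather script the unpacking step, here is a minimal sketch in Python. It assumes the archive is an ordinary gzip-compressed tarball and that html-102/index.html is where the entry page ends up, as described above. The function name `extract_docs` and the destination folder are made up for illustration. If the expected path is missing, it falls back to searching for any index.html, which may help tell a different layout apart from an incomplete download.

```python
import tarfile
from pathlib import Path


def extract_docs(archive, dest):
    """Unpack a .tar.gz archive into dest and return the path to index.html.

    Returns None if no index.html can be found after extraction.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    # "r:gz" fails loudly if the file is not a valid gzip-compressed tarball,
    # which is a quick way to spot a truncated or corrupted download.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)

    # Location given in the post (an assumption about the archive layout).
    expected = dest / "html-102" / "index.html"
    if expected.is_file():
        return expected

    # Fallback: the layout may differ, so look for any index.html.
    matches = sorted(dest.rglob("index.html"))
    return matches[0] if matches else None
```

Calling `extract_docs("tess_docs01.tar.gz", "tess_docs")` should return the path to open in a browser, for example with `webbrowser.open(path.resolve().as_uri())`. A return value of `None` suggests the archive did not contain the HTML tree at all.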
To download (since sf won't let me upload it):
2) Near the bottom, click on the button that says "list of all files available"
3) Scroll down until you see "Doxyfied Tesseract-1.02 documentation" and click on DOWNLOAD
4) Enter the magic number
5) Be sure to download it from the RFO mirror!
(Sorry, this IS a pain but keeps our BW under
when I follow the suggestions supplied by filipg in
The requested URL /bigfiles/tess_docs03.tar.gz was not found on this server.
When I decompress tess_docs01.tar.gz, there are some problems.
Then, when I open the html-102 folder, I can't find index.html.
Could Filip Gieszczykiewicz please check?
V 0.03 will be out by the end of this week. It will have a theory of operation, some stack traces for common procedures, MUCH better data structure definitions, and a glossary. Where possible, I reference other GNU OCR programs. I will also try to put the sources up on my server so they will be available on-line (or at least parts of them; I don't think we need a full copy of the sources, that's way too big).
Why not read here?