Java OCR / Support Requests / #1 Using JavaOCR in Java code

Ronald B. Cemer - 2010-09-07

Hi Christina,

Currently there is no documentation other than the original article on my website at http://www.roncemer.com , the JavaDoc API documentation (which should be built automatically whenever you build the project), and whatever documentation may have been contributed by the other two developers who have been working on the project since I released it. But we'd welcome any contributions you'd like to make in that area. If you're interested, we could add you as a contributor to the project. Just let me know.

Thanks!
Ron

Christina Kaskoura - 2010-09-07

Hi Ron.

Thanks for your reply. It seems I'll have to make do with the API documentation for the time being. However, I would be more than grateful if you or one of the other developers could answer my question whether it is possible to perform OCR on images without having to train the OCRScanner every time I create a new OCRScanner instance.
If, during my experiments with JavaOCR, I find the time to write something that could be used as documentation, I will let you know.

Thanks again,
Christina

Ronald B. Cemer - 2010-09-09

Hi Christina,

You're not bothering me at all! I'm happy to see people put the code to use.

The algorithm is a very simple image-matching algorithm using a least-mean-square-error formula to score each training image's resemblance to the character being decoded. So without having the training images in memory, it won't be able to recognize any characters.
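
As a rough illustration of the matching described above, here is a minimal sketch of mean-square-error scoring against reference images held in memory. It is not JavaOCR's actual code: the class name, the method names, and the flat grayscale int[] image layout are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: hypothetical names, not JavaOCR's actual classes. */
final class MseMatcherSketch {

    /** Mean of squared per-pixel differences between two equally sized grayscale images (0..255). */
    static double meanSquareError(int[] a, int[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("images must be non-empty and the same size");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum / a.length;
    }

    /** Returns the character whose reference image has the lowest error against the glyph. */
    static char bestMatch(int[] glyph, Map<Character, int[]> references) {
        if (references.isEmpty()) {
            // Without reference images in memory there is nothing to compare against.
            throw new IllegalStateException("no reference images loaded");
        }
        char best = 0;
        double bestError = Double.POSITIVE_INFINITY;
        for (Map.Entry<Character, int[]> entry : references.entrySet()) {
            double error = meanSquareError(glyph, entry.getValue());
            if (error < bestError) {
                bestError = error;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tiny 3x3 "images", flattened row by row; 0 is black, 255 is white.
        Map<Character, int[]> references = new LinkedHashMap<>();
        references.put('I', new int[] {255, 0, 255, 255, 0, 255, 255, 0, 255});
        references.put('L', new int[] {0, 255, 255, 0, 255, 255, 0, 0, 0});
        int[] noisyI = {255, 10, 255, 240, 0, 255, 255, 0, 250};
        System.out.println(bestMatch(noisyI, references));
    }
}
```

The empty-map check mirrors the point made above: with no reference images loaded, no character can be recognized.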

It's not really so much a "training" process as just the process of loading up the reference (training) images into memory so it has something to compare against.

This OCR engine is font-specific, BTW. So for each font you want it to recognize, you need to have training images for all of the characters you want it to recognize in that font.

Hope that helps!
Ron

Christina Kaskoura - 2010-09-09

Hi Ron.

Thanks for your input. If I have any more questions or comments while working with JavaOCR I'll let you know.

Christina

Christina Kaskoura - 2010-09-13

Hello Ron.

I've started using JavaOCR with images I am creating myself, and it seems to have a hard time recognising how many characters are in each training image. I keep getting error messages like this when loading the training images:

Expected to decode 26 characters but actually decoded 29 characters in training

The method I am using to create the training images is nothing fancy: I type the characters I want in MS Word and take a screenshot of the area containing them, which I then save as an image.

Is there something in particular I should be doing when creating my images (e.g. using a specific font size, a specific number of spaces between the characters, or a specific image format)? How do you think I could get past this problem?

Thank you,
Christina

Christina Kaskoura - 2010-09-13

And one more question: Are there any plans to make JavaOCR available through Maven?

Ronald B. Cemer - 2010-09-13

Hi Christina,

I recommend looking at the minCharBreakWidthAsFractionOfRowHeight attribute of the DocumentScanner class. There are accessor methods to get and set this value, which defaults to 0.05. If you increase it a little, it may help with your font. The algorithm that determines where one character ends and another begins is very simplistic, and it sounds as if the scanner is finding more character breaks than are actually there.
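
To show why a larger minimum break width can reduce the number of characters found, here is a minimal sketch of gap-based character splitting. It is an assumption about the general idea, not DocumentScanner's real implementation: the class name, the per-column ink flags, and the rounding of the minimum width are all hypothetical.

```java
/** Illustrative only: a hypothetical stand-in for the character-break logic, not JavaOCR code. */
final class CharBreakSketch {

    /**
     * Counts characters in one text row. A run of blank columns only counts as a break
     * when it is at least rowHeight * minBreakFraction columns wide (minimum 1).
     */
    static int countCharacters(boolean[] columnHasInk, int rowHeight, float minBreakFraction) {
        int minBreakWidth = Math.max(1, Math.round(rowHeight * minBreakFraction));
        int count = 0;
        boolean inCharacter = false;
        int gapWidth = 0;
        for (boolean ink : columnHasInk) {
            if (ink) {
                if (!inCharacter) {
                    count++;            // first inked column of a new character
                    inCharacter = true;
                }
                gapWidth = 0;
            } else if (inCharacter) {
                gapWidth++;
                if (gapWidth >= minBreakWidth) {
                    inCharacter = false; // gap is wide enough to end the character
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two inked columns, a 1-column gap, two inked columns, a 3-column gap, one inked column.
        boolean[] columns = {true, true, false, true, true, false, false, false, true};
        System.out.println(countCharacters(columns, 20, 0.05f)); // every gap splits
        System.out.println(countCharacters(columns, 20, 0.10f)); // the 1-column gap no longer splits
    }
}
```

In this sketch, a glyph with a narrow internal gap is counted as two characters at the smaller fraction and as one at the larger, which is the kind of over-splitting that would produce more decoded characters than expected.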

Also, be careful not to include any "dust" or non-white pixels in your image that aren't part of the characters you're trying to get it to recognize.

The DocumentScanner class is used both to load training images and to scan documents, so the same algorithms break apart the characters in the training images and in the documents themselves. That way, it is consistent in how it handles any specific font.

Another thing that may help is to adjust whiteThreshold, which ranges from 0 to 255 inclusive and defaults to 127. There are also accessor methods for this attribute.
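
The effect of a white threshold on faint "dust" pixels can be sketched as follows. This is only an illustration under an assumption: that a pixel counts as ink when its gray value is below the threshold. The class and method names are hypothetical, and JavaOCR's exact comparison may differ.

```java
/** Illustrative only: hypothetical names; assumes gray values below the threshold are ink. */
final class WhiteThresholdSketch {

    /** Assumption: a pixel counts as ink when its gray value (0..255) is below the threshold. */
    static boolean isInk(int gray, int whiteThreshold) {
        return gray < whiteThreshold;
    }

    /** Marks each column of a rectangular grayscale image that contains at least one ink pixel. */
    static boolean[] columnsWithInk(int[][] grayRows, int whiteThreshold) {
        if (grayRows.length == 0) {
            return new boolean[0];
        }
        boolean[] hasInk = new boolean[grayRows[0].length];
        for (int[] row : grayRows) {
            for (int x = 0; x < hasInk.length; x++) {
                if (isInk(row[x], whiteThreshold)) {
                    hasInk[x] = true;
                }
            }
        }
        return hasInk;
    }

    public static void main(String[] args) {
        // Column 1 is a black stroke; column 2 holds one light-gray "dust" pixel (value 200).
        int[][] image = {
            {255, 0, 200, 255},
            {255, 0, 255, 255},
        };
        System.out.println(java.util.Arrays.toString(columnsWithInk(image, 127)));
        System.out.println(java.util.Arrays.toString(columnsWithInk(image, 230)));
    }
}
```

Under that assumption, the light-gray pixel is ignored at a threshold of 127 but treated as ink at 230, so a threshold set too high lets dust show up as extra marks, while one set too low can make thin or faint strokes disappear.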

Ronald B. Cemer - 2010-09-14

SourceForge user ko5tik is working on adding Maven build capability. Sorry, I think I forgot to answer this before.

I'm not sure whether he's planning on uploading it to any Maven repository, but keeping the current version up to date on that repository seems like an ongoing chore for someone, unless SourceForge or the target repo can do that automatically somehow.

I know nothing about Maven, except that it's an Apache project that provides build capabilities similar to Ant, so please forgive me if I'm a little uninformed on this subject. It would probably be a good idea to send a private message to ko5tik and see if he has any plans for Maven support beyond just providing Maven build capability.

Konstantin Pribluda - 2010-09-15

Hi all,

The Maven build is in the repository; core is separated from the app and already builds. The app needs some work, though. It's not clear where to upload the Maven artifacts, but I can provide my private repository.

Konstantin Pribluda - 2010-09-15

Hi Christina,

I already have some artifacts on Maven Central, and I'm aware that it takes a long time to deploy something there. Unfortunately, javaocr is not yet in a state where it can be deployed to the central repos (there is still work to do), so the best option is to build it yourself into your local repo (core should build fine) or wait until this evening, when I can deploy it to my private repo under:

http://www.pribluda.de/m2

BTW, please check your SF mail alias, as I cannot reply to it via email.

Konstantin Pribluda - 2010-09-15

I deployed the core and parent snapshots to my private Maven repository:

http://www.pribluda.de/m2

Coordinates:
<groupId>net.sourceforge.javaocr</groupId>
<artifactId>javaocr-parent</artifactId>
<packaging>pom</packaging>
<name>Java OCR Parent project</name>
<version>1.102-SNAPSHOT</version>

and:
<groupId>net.sourceforge.javaocr</groupId>
<artifactId>javaocr-core</artifactId>
<packaging>pom</packaging>
<name>Java OCR Parent project</name>
<version>1.102-SNAPSHOT</version>

You need to include core in your POM as a dependency.
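
As a hedged sketch of what that might look like in a pom.xml, using the repository URL and coordinates given above: the repository id is a made-up placeholder, and enabling snapshots is an assumption based on the -SNAPSHOT version.

```xml
<repositories>
  <repository>
    <!-- "pribluda-private" is a hypothetical id chosen for this example -->
    <id>pribluda-private</id>
    <url>http://www.pribluda.de/m2</url>
    <snapshots>
      <!-- assumed necessary because the version is a SNAPSHOT -->
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>net.sourceforge.javaocr</groupId>
    <artifactId>javaocr-core</artifactId>
    <version>1.102-SNAPSHOT</version>
  </dependency>
</dependencies>
```

Both elements go inside the top-level project element of the POM.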

Christina Kaskoura - 2010-09-16

Hi Konstantin.
I included java ocr core in my POM in order to get it from your repository. However, you might be interested to know that I get some warnings about failed checksums when downloading java ocr from the repository.
Thank you,
Christina

craig - 2013-08-29

Is this project still being developed? Any plans to publish newer revs to public mvn repos (e.g. 1.1+)?

I have played with training and OCRScannerDemo. Can someone please help with any of the following?

There seems to be some deprecated code (e.g. DocumentScanner). Is there a newer OCR demo that uses non-deprecated code?

I have failed to train the letter 'H' in the Broadway font, even after tweaking some of the values mentioned above. The scanner finds two characters, 'H' and 'I'. I think the problem is that the horizontal bar is a single pixel thick; if I paint it two pixels thick, it works. I am curious which setting or value would allow a single-pixel bar, so I can use the font without custom changes. A blown-up view of the pixels is as follows:

*******     **
*******     **
*******     **
*******     **
*******     **
**************
*******     **
*******     **
*******     **
*******     **
*******     **
*******     **

craig - 2013-08-29

A question about CharacterRange and the training API. Rather than the API requiring a min and max char in a strict range I'd prefer to have training images with any random set of characters not strictly in a range and have the API accept an array (or ordered list etc) of the characters in the image. Has anyone considered/implemented this in javaocr? Does it make sense to do this?

e.g.

char[] randomChars = { 'H', 'I', '9', ... };
loader.load("random.jpg", randomChars, trainingMap);

Konstantin Pribluda - 2013-08-30

Hi Craig, at the moment there is no active development. The project has reached a state suitable for all developers, so we do not have active plans. As for character ranges, there is no strictly defined character range; you may use whatever set you like.

While training, you just say that this glyph is for a certain character, and then the data in the matcher is updated.
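
The idea of associating each glyph with an arbitrary character can be sketched roughly as follows. This is not the JavaOCR matcher API: the class, its methods, and the int[] glyph representation are hypothetical and only illustrate training on a character set that is not a contiguous range.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: these names are hypothetical, not JavaOCR's API. */
final class GlyphTrainingSketch {
    private final Map<Character, List<int[]>> samples = new HashMap<>();

    /** Records one glyph image as a sample for the given character. */
    void train(char character, int[] glyphPixels) {
        samples.computeIfAbsent(character, k -> new ArrayList<>()).add(glyphPixels.clone());
    }

    /** Pairs the i-th glyph with the i-th character; the characters need not form a range. */
    void trainAll(char[] characters, List<int[]> glyphs) {
        if (characters.length != glyphs.size()) {
            throw new IllegalArgumentException(
                    "Expected " + characters.length + " glyphs but got " + glyphs.size());
        }
        for (int i = 0; i < characters.length; i++) {
            train(characters[i], glyphs.get(i));
        }
    }

    /** Number of samples recorded so far for the given character. */
    int sampleCount(char character) {
        List<int[]> list = samples.get(character);
        return list == null ? 0 : list.size();
    }

    public static void main(String[] args) {
        GlyphTrainingSketch matcher = new GlyphTrainingSketch();
        List<int[]> glyphs = new ArrayList<>();
        glyphs.add(new int[] {0, 255, 0});
        glyphs.add(new int[] {255, 0, 255});
        glyphs.add(new int[] {0, 0, 255});
        matcher.trainAll(new char[] {'H', 'I', '9'}, glyphs);
        System.out.println(matcher.sampleCount('H'));
    }
}
```

Keying the samples by character rather than by position in a range is what lets a training image contain any set of characters, provided the caller supplies them in the order the glyphs appear.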

The latest version of javaocr is published on Maven Central:
http://mvnrepository.com/artifact/net.sourceforge.javaocr/javaocr-core

(The package marked as latest on the SF download page is misleading.)
