TextExtraction on food packaging

Brought to you by: davedupplaw, jonhare, sinjax

Forum: General Discussion

Creator: Matthieu

Created: 2013-05-16

Updated: 2016-09-13

Matthieu - 2013-05-16

Hi,

For a new development project, I need to implement a program that detects and extracts text from food packaging.
I found the paper "MULTISCALE EDGE-BASED TEXT EXTRACTION FROM COMPLEX IMAGES", which seems to match what I want to do.
Do you think the "LiuSamarabanduTextExtractorMultiscale" implementation is appropriate for this?
I tried to test it like this:

final FImage testImage = ImageUtilities.readF( this.getClass().getResource( "image.jpg" ) )
        .normalise()
        .process( new ResizeProcessor( 620 ) );
// Process the image
final LiuSamarabanduTextExtractorBasic te = new LiuSamarabanduTextExtractorBasic();
te.processImage( testImage );
// Retrieve the candidate text regions from the extractor
final Map<Rectangle, FImage> imageRegions = te.getTextRegions();

...but the processImage function opens several windows and does not seem to let the rest of the script run.

Do you have an example, and any advice for this kind of processing?

Thanks,
Matthieu

Anonymous - 2013-05-16

Hi,

Yes, sorry about all those windows popping up. I guess it was from the time I was debugging it - to see what it was doing. As you say, LiuSamarabanduTextExtractorBasic is an implementation of their single-scale version of the algorithm and LiuSamarabanduTextExtractorMultiscale is an implementation of their multiscale algorithm (described in the paper you cite).

These detect text in the image and provide bounding boxes for regions that may be text. I copied and slightly tweaked your code above and it works just fine - it shouldn't stop the execution of the rest of the code; it will just pop up lots of windows.

I've attached the Java for a working version that shows the bounding boxes of the regions of text.
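The attached file is not reproduced in this thread. As a rough stand-in, here is a minimal sketch of the "show the bounding boxes" step using only the standard Java library rather than OpenIMAJ. The class name BoundingBoxOverlay and the method drawBoxes are hypothetical, and the boxes are assumed to have already been produced by some text detector.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenIMAJ: outlines candidate text regions.
class BoundingBoxOverlay {

    /** Returns an RGB copy of src with a 1-pixel outline drawn for each box. */
    static BufferedImage drawBoxes(BufferedImage src, List<Rectangle> boxes, int rgb) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        // Copy the pixels so the source image is left untouched.
        out.setRGB(0, 0, w, h, src.getRGB(0, 0, w, h, null, 0, w), 0, w);

        Rectangle bounds = new Rectangle(0, 0, w, h);
        for (Rectangle box : boxes) {
            // Clip each box to the image so partly visible regions are still drawn.
            Rectangle r = box.intersection(bounds);
            if (r.isEmpty()) continue;
            int right = r.x + r.width - 1, bottom = r.y + r.height - 1;
            for (int x = r.x; x <= right; x++) {
                out.setRGB(x, r.y, rgb);     // top edge
                out.setRGB(x, bottom, rgb);  // bottom edge
            }
            for (int y = r.y; y <= bottom; y++) {
                out.setRGB(r.x, y, rgb);     // left edge
                out.setRGB(right, y, rgb);   // right edge
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Blank stand-in for a photo; the boxes are made-up example values.
        BufferedImage photo = new BufferedImage(100, 80, BufferedImage.TYPE_INT_RGB);
        List<Rectangle> boxes = Arrays.asList(
                new Rectangle(10, 10, 20, 20),
                new Rectangle(90, 70, 50, 50));
        BufferedImage marked = drawBoxes(photo, boxes, 0xFF0000);
        System.out.printf("corner pixel: %06X%n", marked.getRGB(10, 10) & 0xFFFFFF);
    }
}
```

With OpenIMAJ itself, the rectangles would presumably come from the key set of the map returned by getTextRegions(), as in the snippet earlier in the thread.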

I've also gone in and turned off the debug windows that were being produced by the SkewCorrector. If you use the latest version from the svn (rev 1997), you won't get those pop-ups.

Hope that helps,
Dave

TextExtractorTest.java

Matthieu - 2013-05-16

Thank you very much for your response !

Your script does indeed work for me, but I made a mistake in my example: I actually want to use the multiscale implementation (LiuSamarabanduTextExtractorMultiscale).
I ran your test with that class instead. It is a bit slower, and the text areas are not detected. Any idea why?
From what I understood of the paper, the multiscale algorithm should suit my use case better. Can you confirm? For information, the photos of the food packaging will be taken with a smartphone.

Examples:
Sample 1

Thanks again,
Matthieu

David Dupplaw - 2013-05-16

Hey Matthieu,

I just looked and noticed the multiscale code had debug stuff in it too - and one of those debug methods did a wait which is why the script appeared to stop working. I've removed the wait and set the default so that windows are not displayed (rev. 2000).

I think the multiscale algorithm attempts to improve on the basic version by allowing detection of larger text. It will, of course, be slower as it's doing the extraction processing on a pyramid of images. It doesn't seem to work well on the small text though.

One way to help with this is to double the size of the top image in the pyramid. This functionality wasn't there, so I've just added it (rev. 2001). There's now a method, setDoubleSizePyramid(boolean), to control this. The default is TRUE. I found the detection on the olives much better with this option set, but still worse than the basic version on the baby food.
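To make the pyramid idea and its cost concrete, here is a minimal sketch in plain Java. It is not the OpenIMAJ implementation: the class PyramidSketch, the number of levels and the halving factor between levels are all assumptions for illustration. Only the notion of optionally doubling the top level comes from the description above.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an image pyramid with an optional doubled top level.
class PyramidSketch {

    /** Resizes src to w x h as an 8-bit greyscale image (bilinear interpolation). */
    static BufferedImage resize(BufferedImage src, int w, int h) {
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    /**
     * Builds a pyramid of `levels` images, each half the size of the previous one.
     * If doubleSize is true, the top level is the input scaled up by two, which
     * gives small text more pixels to be detected in.
     */
    static List<BufferedImage> buildPyramid(BufferedImage image, int levels, boolean doubleSize) {
        List<BufferedImage> pyramid = new ArrayList<>();
        BufferedImage current = doubleSize
                ? resize(image, image.getWidth() * 2, image.getHeight() * 2)
                : image;
        for (int i = 0; i < levels; i++) {
            pyramid.add(current);
            if (i < levels - 1) {
                current = resize(current,
                        Math.max(1, current.getWidth() / 2),
                        Math.max(1, current.getHeight() / 2));
            }
        }
        return pyramid;
    }

    /** Total number of pixels across all levels: a rough proxy for processing cost. */
    static long totalPixels(List<BufferedImage> pyramid) {
        long total = 0;
        for (BufferedImage level : pyramid) {
            total += (long) level.getWidth() * level.getHeight();
        }
        return total;
    }

    public static void main(String[] args) {
        BufferedImage photo = new BufferedImage(620, 400, BufferedImage.TYPE_BYTE_GRAY);
        System.out.println("plain:   " + totalPixels(buildPyramid(photo, 3, false)) + " pixels");
        System.out.println("doubled: " + totalPixels(buildPyramid(photo, 3, true)) + " pixels");
    }
}
```

Because the doubled top level alone holds four times as many pixels as the original image, enabling the option in a scheme like this trades extra processing time for a better chance of picking up small text.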

To be honest, the basic version is going to be best for finding small text. The multiscale version is designed to find larger text that isn't detected so well with the basic version - although it looks like it does worse on the small text which isn't ideal!

Dave

Matthieu - 2013-05-16

Hi David,

Thanks, I will try your revision.
I have attached a real example file; the others are samples from the internet.
If I understand your advice correctly, the Basic implementation is the better choice for this kind of text?

Is there an OCR implementation in OpenIMAJ? I tried Tess4J on the results of the Basic extractor, but the text does not have smooth edges. I hope it will work eventually, but for now I get no text output.
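For anyone hitting the same problem, one commonly tried approach is to clean up each extracted region before handing it to the OCR engine. The sketch below is a generic pre-processing step in plain Java (bicubic upscaling followed by Otsu thresholding). It is an assumption that this helps on these particular images; the thread does not confirm it. The class OcrPreprocessSketch and its methods are hypothetical, and the OCR call itself is not shown.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical pre-processing for OCR input; not part of OpenIMAJ or Tess4J.
class OcrPreprocessSketch {

    /** Upscales src by an integer factor into an 8-bit greyscale image (bicubic). */
    static BufferedImage upscaleGray(BufferedImage src, int factor) {
        int w = src.getWidth() * factor, h = src.getHeight() * factor;
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    /** Otsu's method: the grey level that maximises the between-class variance. */
    static int otsuThreshold(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        int[] hist = new int[256];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                hist[gray.getRaster().getSample(x, y, 0)]++;

        long total = (long) w * h, sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];

        long weightBg = 0, sumBg = 0;
        double bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            weightBg += hist[t];
            if (weightBg == 0) continue;          // no dark pixels yet
            long weightFg = total - weightBg;
            if (weightFg == 0) break;             // no bright pixels left
            sumBg += (long) t * hist[t];
            double meanBg = (double) sumBg / weightBg;
            double meanFg = (double) (sumAll - sumBg) / weightFg;
            double diff = meanBg - meanFg;
            double var = (double) weightBg * weightFg * diff * diff;
            if (var > bestVar) { bestVar = var; best = t; }
        }
        return best;
    }

    /** Maps pixels above the threshold to white (255) and the rest to black (0). */
    static BufferedImage binarise(BufferedImage gray, int threshold) {
        int w = gray.getWidth(), h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                int v = gray.getRaster().getSample(x, y, 0);
                out.getRaster().setSample(x, y, 0, v > threshold ? 255 : 0);
            }
        return out;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for an extracted text region: dark left, bright right.
        BufferedImage region = new BufferedImage(10, 6, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < 6; y++)
            for (int x = 0; x < 10; x++)
                region.getRaster().setSample(x, y, 0, x < 5 ? 50 : 200);

        BufferedImage big = upscaleGray(region, 2);
        BufferedImage clean = binarise(big, otsuThreshold(big));
        System.out.println(clean.getWidth() + "x" + clean.getHeight());
    }
}
```

Both otsuThreshold and binarise read band 0 of the raster, so they expect a TYPE_BYTE_GRAY image such as the one upscaleGray returns.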

Matthieu

porc_caramel copy.jpg
