
#4 Add the ability to return confidence level

None
closed
None
5
2021-05-05
2013-06-06
No

Would be nice if Tess4J can also return:

Discussion

  • Quan Nguyen

    Quan Nguyen - 2013-06-07
     
  • Dmitry Katsubo

    Dmitry Katsubo - 2013-06-07

    Indeed they are available in TessAPI, but the handle is deleted in doOCR(). What should the flow be then? I expected something like:

    Tesseract tess = Tesseract.getInstance();
    String result = tess.doOCR(...);
    int confidence = tess.getMeanTextConf(); // ?
    
     
  • Quan Nguyen

    Quan Nguyen - 2013-06-07

    The more general Tesseract class is not final, so you can certainly extend it to expose more functionality provided by the lower level TessAPI interface.

     
  • Dmitry Katsubo

    Dmitry Katsubo - 2013-06-10

    Extending Tesseract does not help much, as the whole method Tesseract.doOCR(int xsize, int ysize, ByteBuffer buf, Rectangle rect, int bpp) would still have to be copy-pasted. It would be nice if, for example, the initialization block were extracted into a separate function:

    protected TessAPI.TessBaseAPI prepareTessAPI(int xsize, int ysize, ByteBuffer buf, Rectangle rect, int bpp) {
        TessAPI api = TessAPI.INSTANCE;
        TessAPI.TessBaseAPI handle = api.TessBaseAPICreate();
        ...
        api.TessBaseAPISetRectangle(handle, rect.x, rect.y, rect.width, rect.height);
        return handle;
    }
    

    plus maybe another helper:

    protected String getOCRText(TessAPI.TessBaseAPI handle) {
        TessAPI api = TessAPI.INSTANCE;
        Pointer utf8Text = hocr ? api.TessBaseAPIGetHOCRText(handle, pageNum - 1) : api.TessBaseAPIGetUTF8Text(handle);
        String text = utf8Text.getString(0);
        api.TessDeleteText(utf8Text);
        return text;
    }
    

    Basically, the above-mentioned doOCR() is now decomposed:

    public String doOCR(int xsize, int ysize, ByteBuffer buf, Rectangle rect, int bpp) throws TesseractException {
        TessAPI.TessBaseAPI handle = prepareTessAPI(xsize, ysize, buf, rect, bpp);
        String text = getOCRText(handle);
        TessAPI.INSTANCE.TessBaseAPIDelete(handle);
        return text;
    }
    

    Now extending Tesseract makes sense. If you have another scenario in mind, please share a complete example.
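
    To illustrate why this decomposition helps a subclass, here is a self-contained sketch of the same shape. It does not use Tess4J: FakeHandle, BaseOcr, and ConfidenceOcr are hypothetical stand-ins, and the fixed text and confidence values are placeholders where a real implementation would call TessAPI functions such as TessBaseAPIGetUTF8Text and TessBaseAPIMeanTextConf on the handle.

```java
// Hypothetical stand-in for TessAPI.TessBaseAPI; not part of Tess4J.
class FakeHandle {
    boolean deleted = false;
}

// Mirrors the proposed decomposition: prepare, read the text, delete the handle.
class BaseOcr {
    protected FakeHandle prepare() {
        return new FakeHandle();
    }

    protected String getOcrText(FakeHandle handle) {
        if (handle.deleted) throw new IllegalStateException("handle already deleted");
        return "hello world"; // placeholder for the recognized text
    }

    protected void dispose(FakeHandle handle) {
        handle.deleted = true;
    }

    public String doOcr() {
        FakeHandle handle = prepare();
        try {
            return getOcrText(handle);
        } finally {
            dispose(handle); // always release the handle, even on failure
        }
    }
}

// A subclass reuses the helpers and reads the confidence while the handle is still alive.
class ConfidenceOcr extends BaseOcr {
    private int lastConfidence = -1;

    protected int meanConfidence(FakeHandle handle) {
        if (handle.deleted) throw new IllegalStateException("handle already deleted");
        return 87; // placeholder for a value read from the engine
    }

    @Override
    public String doOcr() {
        FakeHandle handle = prepare();
        try {
            String text = getOcrText(handle);
            lastConfidence = meanConfidence(handle);
            return text;
        } finally {
            dispose(handle);
        }
    }

    public int getLastConfidence() {
        return lastConfidence;
    }
}
```

    The subclass only has to rewrite the short orchestration method; the initialization and text extraction are inherited rather than copy-pasted.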

    Another note: I think that the expression

    if (rect != null && !rect.equals(EMPTY_RECTANGLE)) {
    

    would be better written as:

    if (rect != null && !rect.isEmpty()) {
    

    and then one doesn't need the static EMPTY_RECTANGLE.
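
    The difference between the two checks can be seen with plain java.awt.Rectangle, independent of Tess4J. This is a minimal sketch, assuming EMPTY_RECTANGLE is defined as new Rectangle() (the thread does not show its definition); the class and method names are made up for illustration.

```java
import java.awt.Rectangle;

class RectCheckDemo {
    // Assumption: stands in for the library's static EMPTY_RECTANGLE constant.
    static final Rectangle EMPTY_RECTANGLE = new Rectangle();

    // The check as currently written: only the exact (0, 0, 0, 0) rectangle is skipped.
    static boolean usableByEquals(Rectangle rect) {
        return rect != null && !rect.equals(EMPTY_RECTANGLE);
    }

    // The proposed check: any rectangle with non-positive width or height is skipped.
    static boolean usableByIsEmpty(Rectangle rect) {
        return rect != null && !rect.isEmpty();
    }

    public static void main(String[] args) {
        Rectangle zeroArea = new Rectangle(10, 10, 0, 0); // zero-sized, but not at the origin
        System.out.println(usableByEquals(zeroArea));  // prints "true"
        System.out.println(usableByIsEmpty(zeroArea)); // prints "false"
    }
}
```

    So isEmpty() is not just shorter: it also rejects zero-area rectangles that equals(EMPTY_RECTANGLE) would let through.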

     
  • Quan Nguyen

    Quan Nguyen - 2013-06-15

    doOCR is a simple method that encapsulates Tesseract engine initialization, processing of a single image, and then shutdown. It is not efficient if you process multiple images. Sure, you can override it with a more efficient algorithm in which the engine is initialized once, processes or manipulates all the images, and finally shuts down to release used resources.
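
    The initialize-once lifecycle described here can be sketched as follows. The OcrEngine interface, CountingEngine, and BatchOcr are hypothetical names invented for this example, and the fake engine only counts calls so that the sketch runs without native libraries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical engine abstraction. With Tess4J the three steps would map onto
// TessAPI calls such as TessBaseAPICreate/TessBaseAPIInit3, TessBaseAPISetImage plus
// TessBaseAPIGetUTF8Text, and TessBaseAPIEnd/TessBaseAPIDelete.
interface OcrEngine {
    void init();
    String recognize(String imageName);
    void dispose();
}

// Fake engine that only counts lifecycle calls.
class CountingEngine implements OcrEngine {
    int initCalls = 0;
    int disposeCalls = 0;

    public void init() { initCalls++; }
    public String recognize(String imageName) { return "text of " + imageName; }
    public void dispose() { disposeCalls++; }
}

class BatchOcr {
    // Initializes the engine once, processes every image, and always shuts down.
    static List<String> recognizeAll(OcrEngine engine, List<String> imageNames) {
        List<String> results = new ArrayList<>();
        engine.init();
        try {
            for (String name : imageNames) {
                results.add(engine.recognize(name));
            }
        } finally {
            engine.dispose();
        }
        return results;
    }
}
```

    Because the engine stays open for the whole loop, per-image values such as confidences could also be read inside the loop, before dispose() runs.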

    Due to my personal work, it could be some time before I can get back on this. You're welcome to submit a patch. Thanks.

     
  • Dmitry Katsubo

    Dmitry Katsubo - 2013-06-20

    I am attaching my first attempt for your review. From my perspective it is a step for the better because:

    • Batch image processing is now faster, as init() / dispose() are called only once.
    • A class that extends Tesseract1 can easily implement another OCR strategy, as all the needed functions are now separate blocks.

    Notes:

    • I think that all occurrences of IIOImage can be replaced by RenderedImage with no impact, as IIOImage is used only as a wrapper for RenderedImage. ImageIO is forced to read thumbnails, which are not used.
    • Using System.err in the library is bad form: if the library is used in an AS application, you don't know where the output is logged (if it is logged at all). So a logger is the way out. Or rethrow the exception.
    • Having two approaches, Tesseract1 and Tesseract, makes no sense to me. Keeping only one would simplify development and reduce code duplication. Extension (Tesseract1) or aggregation (Tesseract): you need to choose one. I personally think that extension (Tesseract1) is more natural with respect to the handle.
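
    The logging suggestion from the notes above can be sketched with java.util.logging from the JDK. The class name, the parsing task, and the message text are illustrative assumptions, not Tess4J code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietOcrHelper {
    // Library code logs through a named logger, so the host application decides where output goes.
    private static final Logger LOGGER = Logger.getLogger(QuietOcrHelper.class.getName());

    // Returns the page count parsed from the text, or -1 if it cannot be parsed.
    static int parsePageCount(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Instead of System.err.println(e.getMessage()):
            LOGGER.log(Level.WARNING, "Could not parse page count: " + raw, e);
            return -1;
        }
    }
}
```

    An application server can then route or silence these records through its own logging configuration, which is not possible with direct writes to System.err.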
     
  • Quan Nguyen

    Quan Nguyen - 2013-09-07

    Dmitry,

    It turns out that the Tesseract class cannot be extended due to its private constructor. Tesseract1 is the extensible one here, as the necessary elements are exposed to inheriting classes.

    I incorporated many of your suggestions, including logging, into the code baseline. Tesseract is maintained because the alternative direct mapping method that Tesseract1 is based on was until recently still an experimental feature for JNA.

    Please help test the changes. Version 1.2 will be released soon. Thanks.

    Quan

     
  • Anonymous

    Anonymous - 2013-09-09

    Could you upload .jar and .source.jar somewhere (e.g. Maven snapshots)? I will test against binaries that you will create when you make a release.

     
  • Quan Nguyen

    Quan Nguyen - 2013-09-09

    1.2-Beta attached.

     
  • Quan Nguyen

    Quan Nguyen - 2013-09-22

    Fixed with release of v1.2. Special thanks to Dmitry Katsubo for the software patch, testing, and valuable suggestions.

     
  • Quan Nguyen

    Quan Nguyen - 2013-09-22
    • status: open --> closed
     
  • Anonymous

    Anonymous - 2021-05-03

    I know this issue is years old, but I'm wondering what the current 'best' way to get the confidences is. Like others, I am also confused by the difference between Tesseract vs Tesseract1 and TessAPI vs TessAPI1.

    I see what you said about doOCR() being intended for a single image because it shuts down after processing. What is the best way to process multiple images? Is there any documentation on the best way to do this (as well as getting the confidences)?

    Thank you

     
  • Peter Kronenberg

    I just entered that last post, but I wasn't logged in.

     
  • Peter Kronenberg

    I see TessBaseAPIAllWordConfidences, which says that it returns the same number of values as that returned by GetUTF8. But TessBaseAPIGetUTF8Text returns a single string, not an array. Can you provide an example? I've read the Javadoc, but it's not always clear without an example.

    Is there an efficient way to process multiple images, but one at a time, without sending them all in as an array?

    TessBaseAPIAllWordConfidences() doesn't seem to work with doOCR(), because doOCR() closes everything down instead of leaving it open for the TessBaseAPIAllWordConfidences() call.
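
    On how an array of confidences relates to a single string: the Tesseract API documents AllWordConfidences as returning an int array terminated by -1, with one entry per space-delimited word of the UTF-8 text. Below is a minimal sketch of pairing the two in plain Java, assuming the text and the raw array have already been read from the engine while the handle was still open. WordConfidences and its pair method are hypothetical names, and splitting on whitespace is an assumption about how the words line up.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class WordConfidences {
    // Pairs each whitespace-delimited word with its confidence (0-100).
    // rawConfidences is expected to end with a -1 terminator.
    static List<Map.Entry<String, Integer>> pair(String text, int[] rawConfidences) {
        String trimmed = text.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (int i = 0; i < words.length && i < rawConfidences.length; i++) {
            if (rawConfidences[i] == -1) {
                break; // terminator reached before the words ran out
            }
            pairs.add(new SimpleEntry<String, Integer>(words[i], rawConfidences[i]));
        }
        return pairs;
    }
}
```

    For example, WordConfidences.pair("Hello world\n", new int[] {93, 88, -1}) yields the list [Hello=93, world=88].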

     

    Last edit: Peter Kronenberg 2021-05-04
  • Quan Nguyen

    Quan Nguyen - 2021-05-05

    Please continue the discussion either in the Discussion section or on the GitHub site rather than on this old, closed ticket.

    Thanks.

     
