#5 ImageIOHelper#convertImageData() assumes that DataBufferByte is always returned

None
closed
None
5
2014-08-15
2013-06-11
No

Here is one scenario in which a ClassCastException is thrown:

java.lang.ClassCastException: java.awt.image.DataBufferUShort cannot be cast to java.awt.image.DataBufferByte
    at net.sourceforge.vietocr.ImageIOHelper.convertImageData(Unknown Source)
    at net.sourceforge.vietocr.ImageIOHelper.getImageByteBuffer(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)

ImageIOHelper#convertImageData() assumes that DataBufferByte is always returned. I think it would be more correct to use one of the following:

  • int[] java.awt.image.Raster#getPixels(int x, int y, int w, int h, int iArray[])
  • byte[] com.sun.media.imageio.common.ImageUtil#getUnpackedBinaryData(Raster, Rectangle) (works only for binary data, but the implementation can be used as a reference)
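
A minimal sketch of the first option, assuming only the standard java.awt.image API. The class and method names (RasterPixelReader, readSamples) are hypothetical and not part of Tess4J. The sketch only shows that Raster#getPixels works regardless of the backing DataBuffer type; the returned values are raw samples (palette indices for indexed images, 16-bit values for DataBufferUShort), so further scaling to 8-bit data would still be needed before OCR.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;

class RasterPixelReader {
    /**
     * Reads all samples of the image through Raster#getPixels, so no cast
     * to a specific DataBuffer subclass (Byte, UShort, Int) is required.
     * The result holds width * height * numBands raw sample values.
     */
    static int[] readSamples(BufferedImage image) {
        Raster raster = image.getRaster();
        // The (int[]) cast selects the int[] overload of getPixels;
        // passing null lets the raster allocate the output array.
        return raster.getPixels(raster.getMinX(), raster.getMinY(),
                raster.getWidth(), raster.getHeight(), (int[]) null);
    }
}
```

For an image backed by DataBufferUShort (for example BufferedImage.TYPE_USHORT_GRAY), this call returns the samples without a ClassCastException.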

Discussion

  • Quan Nguyen - 2013-06-15

    I haven't investigated this yet, but I think DataBufferUShort results from images with a bit depth of 24 or more, in other words, color images. One would need to convert them to gray or binary images first for Tess4J to process them.

     
    Last edit: Quan Nguyen 2013-07-02
  • Anonymous - 2013-06-16

    I think so too; I will double-check this. However, I think Tess4J could also convert color images to greyscale automatically, like this:

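    // Source image in any color model (elided in the original post).
    BufferedImage colorFrame = ...;
    // Destination image: 8-bit grayscale, backed by a DataBufferByte.
    BufferedImage grayFrame =
        new BufferedImage(colorFrame.getWidth(), colorFrame.getHeight(),
                          BufferedImage.TYPE_BYTE_GRAY);
    
    // Convert from the source color space to the grayscale color space.
    BufferedImageOp grayscaleConv = 
        new ColorConvertOp(colorFrame.getColorModel().getColorSpace(), 
                         grayFrame.getColorModel().getColorSpace(), null);
    grayscaleConv.filter(colorFrame, grayFrame);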
    
     
  • Quan Nguyen - 2013-06-16

    It should not automatically convert, like Tesseract. Users will have to determine whether they want or need to do that for their images. The ImageHelper class already includes a few useful conversion methods.

     
    Last edit: Quan Nguyen 2013-06-16
  • Dmitry Katsubo - 2013-06-20

    Thanks!
    Well, the JavaDoc of doOCR(int xsize, int ysize, ... says:

    @param bpp bits per pixel, represents the bit depth of the image, with 1 for binary bitmap, 8 for gray, and 24 for color RGB.

    so it somewhat hints that 24 bpp is supported. Also, I think that if there are 3 bytes per color, Tess4J can convert it automatically. Admittedly, somebody may complain that this conversion is not very good, but that can be noted in the JavaDoc! Namely:

    Tesseract supports only binary and greyscale images. So if you pass a 24-bit color image, it will be automatically converted to greyscale. If this conversion does not suit your needs, do the conversion before passing the RenderedImage to the Tess4J API.

    What do you think? In general, it's just a matter of a few lines:

    if (image.getColorModel().getPixelSize() > 8) {
        image = ImageHelper.convertImageToGrayscale(image);
    }
    
     
    • Quan Nguyen - 2013-07-02

      Sorry, I misspoke about 24-bit images; they are actually supported.

      The ClassCastException could be handled in a more robust or informative way that tells users what to do, such as converting to a supported image type (gray, binary).
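
      One way such an informative failure might look, as a sketch only: check the DataBuffer type before casting and throw an exception whose message names the actual type and the suggested remedy. The class and method names (ByteBufferGuard, requireByteData) and the message wording are hypothetical and do not reflect the actual Tess4J code.

      ```java
      import java.awt.image.BufferedImage;
      import java.awt.image.DataBuffer;
      import java.awt.image.DataBufferByte;

      class ByteBufferGuard {
          /**
           * Returns the raw byte data of the image, or fails with a message
           * that tells the caller how to fix the input instead of a bare
           * ClassCastException.
           */
          static byte[] requireByteData(BufferedImage image) {
              DataBuffer buffer = image.getRaster().getDataBuffer();
              if (!(buffer instanceof DataBufferByte)) {
                  throw new IllegalArgumentException(
                          "Unsupported raster data type "
                          + buffer.getClass().getSimpleName()
                          + "; convert the image to grayscale or binary first.");
              }
              return ((DataBufferByte) buffer).getData();
          }
      }
      ```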

       
  • Anonymous - 2013-06-21

    Correction: conversion to greyscale does not work; one needs conversion to binary. I got the same exception for a 4-bpp image.

     
  • Quan Nguyen - 2013-06-21

    Can you attach a sample 4-bpp image for testing purposes?

     
  • Anonymous - 2013-06-27

    Here is an example of a 2-bpp image that fails.

     
  • Anonymous - 2013-06-27

    And here is an example of a 4-bpp image that fails. In both cases the result is java.lang.ClassCastException: java.awt.image.DataBufferUShort cannot be cast to java.awt.image.DataBufferByte

     
  • Quan Nguyen - 2013-07-01

    I was able to get satisfactory results with the test images after converting them to grayscale, as follows:

    File imageFile = new File("test.motorola.tif");
    BufferedImage bi = ImageIO.read(imageFile);
    bi = ImageHelper.convertImageToGrayscale(bi);
    String result = instance.doOCR(bi);
    
     
  • Quan Nguyen - 2013-09-07

    A call to the ImageHelper.convertImageToGrayscale method has been added for images whose raster data is not of type DataBufferByte. It successfully reads all the problematic images. Please verify. Thanks.
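
    The kind of fallback described above could be sketched roughly as follows. This is not the actual Tess4J change: the class and method names (GrayscaleFallback, toByteBacked) are hypothetical, and a plain-JDK redraw onto a TYPE_BYTE_GRAY image stands in for ImageHelper.convertImageToGrayscale, whose implementation is not shown in this thread.

    ```java
    import java.awt.Graphics2D;
    import java.awt.image.BufferedImage;
    import java.awt.image.DataBufferByte;

    class GrayscaleFallback {
        /**
         * Returns the image unchanged if its raster is already backed by a
         * DataBufferByte; otherwise redraws it onto an 8-bit grayscale image,
         * which is always byte-backed.
         */
        static BufferedImage toByteBacked(BufferedImage image) {
            if (image.getRaster().getDataBuffer() instanceof DataBufferByte) {
                return image;
            }
            BufferedImage gray = new BufferedImage(
                    image.getWidth(), image.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
            Graphics2D g = gray.createGraphics();
            try {
                // Drawing lets Java 2D handle the color conversion.
                g.drawImage(image, 0, 0, null);
            } finally {
                g.dispose();
            }
            return gray;
        }
    }
    ```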

     
  • Quan Nguyen - 2013-09-22

    Fixed in v1.2.

     
  • Quan Nguyen - 2013-09-22
    • status: open --> closed
     

