#5 ImageIOHelper#convertImageData() assumes that DataBufferByte is always returned

None
closed
Quan Nguyen
None
5
2014-08-15
2013-06-11
Dmitry Katsubo
No

Here is one scenario in which a ClassCastException is thrown:

java.lang.ClassCastException: java.awt.image.DataBufferUShort cannot be cast to java.awt.image.DataBufferByte
    at net.sourceforge.vietocr.ImageIOHelper.convertImageData(Unknown Source)
    at net.sourceforge.vietocr.ImageIOHelper.getImageByteBuffer(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)

ImageIOHelper#convertImageData() assumes that a DataBufferByte is always returned. I think it would be more correct to use one of the following:

  • int[] java.awt.image.Raster#getPixels(int x, int y, int w, int h, int iArray[])
  • byte[] com.sun.media.imageio.common.ImageUtil#getUnpackedBinaryData(Raster, Rectangle) (works only for binary data, but implementation can be used as a reference)
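
As a rough illustration of the first option, the sketch below reads samples through Raster#getPixels, so it does not depend on which DataBuffer subclass backs the image. The class and method names (RasterPixelReader, readSamples) are hypothetical and are not part of Tess4J or VietOCR; this is only a sketch under those assumptions, not the library's implementation.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;

// Hypothetical helper, for illustration only.
final class RasterPixelReader {

    // Returns every sample of the image as an int, in row-major order,
    // with the bands of each pixel interleaved (width * height * numBands values).
    static int[] readSamples(BufferedImage image) {
        Raster raster = image.getRaster();
        // The cast on null selects the int[] overload of getPixels;
        // passing null lets the raster allocate the result array.
        return raster.getPixels(0, 0, raster.getWidth(), raster.getHeight(), (int[]) null);
    }
}
```

The same call works for images backed by DataBufferByte, DataBufferUShort, or DataBufferInt, at the cost of widening every sample to an int.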

Discussion

  • Quan Nguyen
    2013-06-15

    I haven't spent time investigating this, but I think DataBufferUShort results from images with a bit depth of 24 or more, in other words, color images. One would need to convert them to gray or binary images first for Tess4J to process them.

     
    Last edit: Quan Nguyen 2013-07-02

  • Anonymous
    2013-06-16

    I think so too; I will double-check this. However, I think Tess4J could also automatically convert color images to greyscale, like this:

    private BufferedImage colorFrame = ...;
    private BufferedImage grayFrame =
        new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
    
    BufferedImageOp grayscaleConv = 
        new ColorConvertOp(colorFrame.getColorModel().getColorSpace(), 
                         grayFrame.getColorModel().getColorSpace(), null);
    grayscaleConv.filter(colorFrame, grayFrame);
    
     
  • Quan Nguyen
    2013-06-16

    It should not convert automatically, just as Tesseract does not. Users will have to determine whether they want or need to do that to their images. The ImageHelper class already includes a few useful conversion methods.

     
    Last edit: Quan Nguyen 2013-06-16
  • Dmitry Katsubo
    2013-06-20

    Thanks!
    Well, the Javadoc of doOCR(int xsize, int ysize, ... says:

    @param bpp bits per pixel, represents the bit depth of the image, with 1 for binary bitmap, 8 for gray, and 24 for color RGB.

    so it somehow hints that 24 bpp is supported. Also, I think that if there are 3 bytes per pixel, Tess4J can convert the image automatically. Admittedly, somebody may complain that this conversion is not very good, but that can be stated in the Javadoc! Namely:

    Tesseract supports only binary and greyscale images. So if you pass a 24-bit color image, it will be automatically converted to greyscale. If this conversion does not suit your needs, do the conversion before passing the RenderedImage to the Tess4J API.

    What do you think? In general it's just a matter of a few lines:

    if (image.getColorModel().getPixelSize() > 8) {
        image = ImageHelper.convertImageToGrayscale(image);
    }
    
     
    • Quan Nguyen
      2013-07-02

      Sorry, I misspoke about 24-bit images; they are actually supported.

      The ClassCastException could be handled in a more robust or informative way that tells users what to do, such as converting to a supported image type (gray, binary).
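
A minimal sketch of what such an informative failure might look like: check the DataBuffer type before casting and raise an exception that names the actual type and suggests a remedy. The class and method names (BufferTypeCheck, requireByteData) and the message wording are assumptions for illustration, not Tess4J code.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;

// Hypothetical helper, for illustration only.
final class BufferTypeCheck {

    // Returns the backing byte array, or fails with a message that explains
    // which buffer type was found and what the caller can do about it.
    static byte[] requireByteData(BufferedImage image) {
        DataBuffer buffer = image.getRaster().getDataBuffer();
        if (!(buffer instanceof DataBufferByte)) {
            throw new IllegalArgumentException(
                "Unsupported raster data type " + buffer.getClass().getSimpleName()
                + "; convert the image to grayscale or binary first");
        }
        return ((DataBufferByte) buffer).getData();
    }
}
```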

       

  • Anonymous
    2013-06-27

    And here is an example of a 4-bpp image that fails. In both cases the error is java.lang.ClassCastException: java.awt.image.DataBufferUShort cannot be cast to java.awt.image.DataBufferByte

     
    Attachments
  • Quan Nguyen
    2013-07-01

    I was able to get satisfactory results with the test images after converting them to grayscale, as follows:

    File imageFile = new File("test.motorola.tif");
    BufferedImage bi = ImageIO.read(imageFile);
    bi = ImageHelper.convertImageToGrayscale(bi);
    String result = instance.doOCR(bi);
    
     
  • Quan Nguyen
    2013-09-07

    A call to the ImageHelper.convertImageToGrayscale method has been added for images whose raster data is not of type DataBufferByte. It successfully reads all the problematic images. Please verify. Thanks.
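
The fix described above might look roughly like the sketch below. To keep it self-contained, it uses a plain AWT stand-in (the hypothetical toGrayscale) in place of ImageHelper.convertImageToGrayscale, and the class and method names are assumptions; the actual Tess4J change may differ.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;

// Hypothetical helper, for illustration only.
final class ByteBackedImages {

    // Stand-in for ImageHelper.convertImageToGrayscale: draws the source
    // onto a new 8-bit gray image, which is always backed by DataBufferByte.
    static BufferedImage toGrayscale(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
            src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }

    // Returns the image unchanged if its raster is byte-backed,
    // otherwise returns a grayscale copy that is.
    static BufferedImage ensureByteBacked(BufferedImage image) {
        DataBuffer buffer = image.getRaster().getDataBuffer();
        return (buffer instanceof DataBufferByte) ? image : toGrayscale(image);
    }
}
```

Images that are already byte-backed pass through untouched, so only the problematic types pay for the extra copy.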

     

