#1 Error in text extraction

Status: closed-fixed
Owner: Elfi Heck
Labels: Rendering (1)
Priority: 9
Updated: 2009-10-28
Created: 2009-09-14
Creator: Michael Haufler
Private: No

We found a funny error when using JPOD for text extraction.

The attached PDF generates the following error:

Caused by: de.intarsys.pdf.content.CSError: unexpected exception
at de.intarsys.pdf.content.CSInterpreter.process(Unknown Source)
at de.intarsys.pdf.content.CSDeviceBasedInterpreter.process(Unknown Source)
at scireum.common.pdf.content.ExtractText.extractText(ExtractText.java:34)
at scireum.common.pdf.content.ExtractText.extractText(ExtractText.java:52)
at scireum.common.pdf.content.ExtractText.extractText(ExtractText.java:65)
... 7 more
Caused by: java.lang.ClassCastException: de.intarsys.pdf.cos.COSStream cannot be cast to de.intarsys.pdf.cos.COSDictionary
at de.intarsys.pdf.pd.PDShading$MetaClass.doDetermineClass(Unknown Source)
at de.intarsys.pdf.cos.COSBasedObject$MetaClass.createFromCos(Unknown Source)
at de.intarsys.pdf.pd.PDResources.getResource(Unknown Source)
at de.intarsys.pdf.pd.PDResources.getShadingResource(Unknown Source)
at de.intarsys.pdf.content.CSDeviceBasedInterpreter.lookupShading(Unknown Source)
at de.intarsys.pdf.content.CSDeviceBasedInterpreter.render_sh(Unknown Source)
at de.intarsys.pdf.content.CSInterpreter.process(Unknown Source)
... 12 more

We resolved this issue by skipping the render_sh operator, which is not needed for text extraction.
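
The workaround can be illustrated with a minimal, library-independent sketch. The names below (TextOnlyInterpreter, Op, extractText) are hypothetical stand-ins and not jPod classes; the only detail taken from this report is that the shading operator is never processed, so the shading resource that triggers the ClassCastException is never looked up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical text-only interpreter; NOT part of the jPod API.
final class TextOnlyInterpreter {

    // Simplified content stream operation: an operator name plus one operand.
    static final class Op {
        final String operator; // e.g. "Tj" (show text) or "sh" (paint shading)
        final String operand;

        Op(String operator, String operand) {
            this.operator = operator;
            this.operand = operand;
        }
    }

    // Operators that only paint graphics and never contribute text.
    private static final Set<String> SKIPPED = Collections.singleton("sh");

    static String extractText(List<Op> ops) {
        StringBuilder text = new StringBuilder();
        for (Op op : ops) {
            if (SKIPPED.contains(op.operator)) {
                continue; // skip before any shading resource is resolved
            }
            if ("Tj".equals(op.operator)) {
                text.append(op.operand);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<Op> ops = Arrays.asList(
                new Op("Tj", "Hello "),
                new Op("sh", "Sh0"), // would trigger the failing lookup if processed
                new Op("Tj", "world"));
        System.out.println(extractText(ops)); // prints "Hello world"
    }
}
```

In a real jPod setup the equivalent would presumably be an interpreter or device subclass that treats the shading operator as a no-op; the exact override point is not shown in this report.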

Discussion

  • File to reproduce error

  • mtraut
    2009-09-15

    we missed accepting streams as legal shading objects.

    thanks for this test document.

     
  • mtraut
    2009-09-15

    • priority: 5 --> 9
    • assigned_to: nobody --> eheck
    • status: open --> open-accepted
     
  • Elfi Heck
    2009-10-28

    Fixed.
    A note to users of JPodRenderer: please be aware that while shadings with types 4 to 7 no longer cause an exception, we still don't have code to render them correctly.

     
  • Elfi Heck
    2009-10-28

    • status: open-accepted --> open-fixed
     
  • Elfi Heck
    2009-10-28

    • status: open-fixed --> closed-fixed
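
The root cause named in the discussion (streams not being accepted as legal shading objects) can be sketched as follows. In PDF, shading types 1 to 3 are plain dictionaries, while types 4 to 7 are streams whose stream dictionary carries the shading entries. The object model below (PdfObject, PdfDict, PdfStream) is a hypothetical simplification and not the jPod COS classes, and this is not the actual jPod fix, only an illustration of handling both representations instead of casting unconditionally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shading lookup; NOT part of the jPod API.
final class ShadingLookup {

    interface PdfObject {}

    static final class PdfDict implements PdfObject {
        final Map<String, Object> entries = new HashMap<>();
    }

    static final class PdfStream implements PdfObject {
        final PdfDict dict = new PdfDict(); // a stream carries its own dictionary
        byte[] data = new byte[0];          // mesh data for types 4 to 7
    }

    // Returns the dictionary describing the shading for either representation.
    static PdfDict shadingDict(PdfObject obj) {
        if (obj instanceof PdfDict) {
            return (PdfDict) obj;
        }
        if (obj instanceof PdfStream) {
            return ((PdfStream) obj).dict;
        }
        throw new IllegalArgumentException("not a shading object: " + obj);
    }

    static int shadingType(PdfObject obj) {
        Object type = shadingDict(obj).entries.get("ShadingType");
        if (!(type instanceof Integer)) {
            throw new IllegalArgumentException("missing or invalid ShadingType");
        }
        return (Integer) type;
    }

    // Types 4 to 7 are the stream-based mesh shadings mentioned in the fix note.
    static boolean isMeshShading(PdfObject obj) {
        int type = shadingType(obj);
        return type >= 4 && type <= 7;
    }

    public static void main(String[] args) {
        PdfDict axial = new PdfDict();
        axial.entries.put("ShadingType", 2);

        PdfStream mesh = new PdfStream();
        mesh.dict.entries.put("ShadingType", 4);

        System.out.println(isMeshShading(axial)); // prints "false"
        System.out.println(isMeshShading(mesh));  // prints "true"
    }
}
```

A renderer that cannot draw mesh shadings yet could use a check like isMeshShading to skip them deliberately rather than fail.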