#1 Error in text extraction

Rendering (1)

We found a strange error while using JPOD for text extraction.

The attached PDF generates the following error:

Caused by: de.intarsys.pdf.content.CSError: unexpected exception
    at de.intarsys.pdf.content.CSInterpreter.process(Unknown Source)
    at de.intarsys.pdf.content.CSDeviceBasedInterpreter.process(Unknown Source)
    at scireum.common.pdf.content.ExtractText.extractText(ExtractText.java:34)
    at scireum.common.pdf.content.ExtractText.extractText(ExtractText.java:52)
    at scireum.common.pdf.content.ExtractText.extractText(ExtractText.java:65)
    ... 7 more
Caused by: java.lang.ClassCastException: de.intarsys.pdf.cos.COSStream cannot be cast to de.intarsys.pdf.cos.COSDictionary
    at de.intarsys.pdf.pd.PDShading$MetaClass.doDetermineClass(Unknown Source)
    at de.intarsys.pdf.cos.COSBasedObject$MetaClass.createFromCos(Unknown Source)
    at de.intarsys.pdf.pd.PDResources.getResource(Unknown Source)
    at de.intarsys.pdf.pd.PDResources.getShadingResource(Unknown Source)
    at de.intarsys.pdf.content.CSDeviceBasedInterpreter.lookupShading(Unknown Source)
    at de.intarsys.pdf.content.CSDeviceBasedInterpreter.render_sh(Unknown Source)
    at de.intarsys.pdf.content.CSInterpreter.process(Unknown Source)
    ... 12 more

We resolved this issue by skipping the render_sh operator, which is not needed for text extraction.
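
The idea behind this workaround can be sketched roughly as follows. This is only an illustration: `SketchInterpreter`, `TextOnlyInterpreter`, `SkipShadingDemo` and the string-array operations are hypothetical stand-ins, not jPod's actual interpreter API. The sketch shows a handler for the `sh` operator that fails, and a text-only subclass that overrides it with a no-op so that processing continues.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a content stream interpreter; NOT jPod's API.
class SketchInterpreter {
    final List<String> text = new ArrayList<>();

    // Dispatches each operation to its handler, as an interpreter loop would.
    // Each operation is {operator, operand}; the operand is optional.
    void process(List<String[]> operations) {
        for (String[] op : operations) {
            String operator = op[0];
            String operand = op.length > 1 ? op[1] : "";
            switch (operator) {
                case "Tj":
                    renderText(operand);
                    break;
                case "sh":
                    renderShading(operand);
                    break;
                default:
                    break; // other operators are ignored in this sketch
            }
        }
    }

    void renderText(String s) {
        text.add(s);
    }

    // Simulates the failing shading lookup from the stack trace.
    void renderShading(String name) {
        throw new ClassCastException("shading " + name + " is a stream, not a dictionary");
    }
}

// Text extraction does not need shadings, so the "sh" handler becomes a no-op.
class TextOnlyInterpreter extends SketchInterpreter {
    @Override
    void renderShading(String name) {
        // intentionally empty: skip the shading operator
    }
}

class SkipShadingDemo {
    // Runs a tiny made-up operation list through the given interpreter.
    static List<String> extract(SketchInterpreter interpreter) {
        List<String[]> ops = new ArrayList<>();
        ops.add(new String[] {"Tj", "Hello"});
        ops.add(new String[] {"sh", "Sh0"});
        ops.add(new String[] {"Tj", "World"});
        interpreter.process(ops);
        return interpreter.text;
    }

    public static void main(String[] args) {
        System.out.println(extract(new TextOnlyInterpreter()));
    }
}
```

With the base class the `sh` operation aborts processing; with the text-only subclass both text fragments are collected. In a real setup the override would go on whatever interpreter subclass is used for extraction, assuming its shading handler can be overridden.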


  • Anonymous - 2009-09-14

    File to reproduce error

  • mtraut - 2009-09-15

    We missed accepting streams as legal shading objects.

    Thanks for this test document.

  • mtraut - 2009-09-15
    • priority: 5 --> 9
    • assigned_to: nobody --> eheck
    • status: open --> open-accepted
  • Elfi Heck - 2009-10-28

    A note to users of JPodRenderer: please be aware that while shadings of types 4 to 7 no longer cause an exception, we still don't have code to render them correctly.

  • Elfi Heck - 2009-10-28
    • status: open-accepted --> open-fixed
  • Elfi Heck - 2009-10-28
    • status: open-fixed --> closed-fixed
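
The fix described in the thread, accepting streams as well as dictionaries as shading objects, might look roughly like the sketch below. In PDF, shadings of types 4 to 7 are represented as streams, which is why a dictionary-only cast fails on them. All class and method names here (`SketchObject`, `SketchDictionary`, `SketchStream`, `ShadingLookup`) are hypothetical stand-ins for a COS object model and are not jPod's classes or its actual fix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for a COS object model; NOT jPod's classes.
abstract class SketchObject {
}

class SketchDictionary extends SketchObject {
    final Map<String, Object> entries = new HashMap<>();
}

// A stream carries its own dictionary plus data. In this sketch it is not a
// subtype of the dictionary class, so casting it to a dictionary would fail.
class SketchStream extends SketchObject {
    final SketchDictionary dict = new SketchDictionary();
    byte[] data = new byte[0];
}

class ShadingLookup {
    // Returns the dictionary describing the shading, whether the underlying
    // object is a plain dictionary or a stream.
    static SketchDictionary shadingDictionary(SketchObject object) {
        if (object instanceof SketchStream) {
            return ((SketchStream) object).dict;
        }
        if (object instanceof SketchDictionary) {
            return (SketchDictionary) object;
        }
        throw new IllegalArgumentException("not a shading object: " + object);
    }

    // Reads the ShadingType entry from either kind of shading object.
    static int shadingType(SketchObject object) {
        Object type = shadingDictionary(object).entries.get("ShadingType");
        if (!(type instanceof Integer)) {
            throw new IllegalArgumentException("missing or invalid ShadingType");
        }
        return (Integer) type;
    }
}
```

The point of the sketch is only the type check before the cast: the stream's own dictionary is used when the shading is a stream, so the class-determination step never casts a stream to a dictionary.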
