#3 Error reading a PDF with Image in COSStream

closed-fixed
Elfi Heck
None
5
2009-10-19
2009-10-16
Andreas Haufler
No

When reading the attached PDF, the exception below is thrown.

The cause is that the character directly before the EI operator is neither an LF nor a blank but a CR. The released version of jPod can be fixed by adding || (next == 13) to the if condition:

CSContentParser (303-310):

/*
* spec is not clear but some internet articles claim that before
* "EI" a line break is required but spaces have been seen in real
* world documents; accept LF, CR and space as possible end and check if
* valid operation follows.
*/
if ((next == '\n') || (next == ' ') || (next == 13)) {
// remember position
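
The check above can be illustrated with a small standalone sketch. This is not jPod's parser code: the class InlineImageEndFinder and its methods are hypothetical names made up for illustration, and the "valid operation follows" check from the comment is simplified to "white space or end of data follows". It only shows the idea of accepting LF, space, or CR (byte value 13) directly before EI.

```java
import java.nio.charset.StandardCharsets;

public class InlineImageEndFinder {

    // Hypothetical helper, not part of jPod: true for the bytes this sketch
    // accepts directly before "EI" (LF, space, and the CR from this report).
    static boolean isDelimiterBeforeEI(int b) {
        return b == '\n' || b == ' ' || b == '\r'; // '\r' has the value 13
    }

    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // Returns the index of the 'E' of the terminating "EI", or -1 if none is found.
    // Simplification: instead of checking that a valid operation follows,
    // this only requires white space or end of data after "EI".
    static int findEI(byte[] data, int from) {
        for (int i = Math.max(from, 1); i + 1 < data.length; i++) {
            if (data[i] == 'E' && data[i + 1] == 'I'
                    && isDelimiterBeforeEI(data[i - 1])
                    && (i + 2 == data.length || isWhitespace(data[i + 2]))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up inline image whose binary data is terminated by CR + "EI".
        byte[] stream = "BI /W 1 /H 1 ID \u0001\u0002\rEI Q"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("EI found at index " + findEI(stream, 0));
    }
}
```

If isDelimiterBeforeEI only accepted LF and space, findEI would return -1 for this input, which corresponds to the "EI expected" error reported here.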

Exception:
de.intarsys.pdf.cos.COSRuntimeException: de.intarsys.pdf.parser.COSLoadError: EI expected at character index 18420
at de.intarsys.pdf.content.CSContent.createFromBytes(CSContent.java:111)
at de.intarsys.pdf.content.CSContent.createFromCos(CSContent.java:125)
at de.intarsys.pdf.pd.PDPage.getContentStream(PDPage.java:389)
at scireum.common.pdf.content.ExtractText.extractText(ExtractText.java:34)
at scireum.common.pdf.content.ExtractText.extractText(ExtractText.java:57)
at scireum.common.pdf.ExtractTextTest.expectTermsNotInFile(ExtractTextTest.java:73)
at scireum.common.pdf.ExtractTextTest.testExtract(ExtractTextTest.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:73)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:46)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
Caused by: de.intarsys.pdf.parser.COSLoadError: EI expected at character index 18420
at de.intarsys.pdf.parser.CSContentParser.parseOperationEI(CSContentParser.java:404)
at de.intarsys.pdf.parser.CSContentParser.parseStream(CSContentParser.java:465)
at de.intarsys.pdf.parser.CSContentParser.parseStream(CSContentParser.java:433)
at de.intarsys.pdf.content.CSContent.createFromBytes(CSContent.java:107)
... 30 more

best regards
Andy

Discussion

  • Example file

     
    Attachments
  • Elfi Heck
    2009-10-19

    • assigned_to: nobody --> eheck
     
  • Elfi Heck
    2009-10-19

    I've now changed our code as you proposed to also accept a single CR as a delimiter (hoping this won't break other documents).

     
  • Elfi Heck
    2009-10-19

    • status: open --> closed-fixed