
#471 Error getting pdf version

Status: closed-out-of-date
Labels: parsing (91)
Priority: 5
Updated: 2010-04-07
Created: 2007-10-30
Creator: Anonymous
Private: No

java.io.IOException: Error getting pdf version:java.lang.NumberFormatException: For input string: "-"
at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:166)
at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:707)
at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:691)
at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:633)
at test.pdfbox.pdfparser.TestPDFParser.test_exception_version1(TestPDFParser.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at junit.framework.TestCase.runTest(TestCase.java:154)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)

Discussion

  • Nobody/Anonymous

     
  • Nobody/Anonymous

    Logged In: NO

    Tested on 0.7.2, 0.7.3, latest 0.7.4-2007-10-22

     
  • Nobody/Anonymous

    Logged In: NO

    Debugged it with a hex dump on the submitted file
    ---
    It appears that the version header starts at offset 0x80 instead of on the first line.
    Adobe Reader 7.x appears to have skipped ahead to the version header and displayed the rest properly.

    So I think something needs to be done with PDFParser::parse() version checking.

    00000000: 001f 3339 3339 202d 2057 4648 202d 2050 ..3939 - WFH - P
    00000010: 7265 7020 666f 2331 3533 3245 332e 7064 rep fo#1532E3.pd
    00000020: 6600 0000 0000 0000 0000 0000 0000 0000 f...............
    00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000050: 0000 0000 0300 2100 0000 00c2 550d 05c2 ......!.....U...
    00000060: 550d 0500 0000 0000 0000 0000 0000 0000 U...............
    00000070: 0000 0000 0000 0000 0000 8181 af49 0000 .............I..
    00000080: 2550 4446 2d31 2e33 0a25 c4e5 f2e5 eba7 %PDF-1.3.%......
    00000090: f3a0 d0c4 c60a 3220 3020 6f62 6a0a 3c3c ......2 0 obj.<<
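
The offset seen in the dump above can also be found programmatically rather than by reading hex. The following is only a minimal sketch in plain Java; the class name `HeaderScan`, the method `findHeaderOffset`, and the scan limit are hypothetical and are not part of PDFBox.

```java
import java.nio.charset.StandardCharsets;

public class HeaderScan {
    // Marker that starts a PDF header line, e.g. "%PDF-1.3".
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the offset of the first "%PDF-" marker that lies entirely
     * within the first 'limit' bytes of 'data', or -1 if there is none.
     * (Hypothetical helper, not a PDFBox API.)
     */
    public static int findHeaderOffset(byte[] data, int limit) {
        int end = Math.min(data.length, limit) - MARKER.length;
        for (int i = 0; i <= end; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 0x80 filler bytes followed by a header, mimicking the layout in the dump.
        byte[] header = "%PDF-1.3\n".getBytes(StandardCharsets.US_ASCII);
        byte[] data = new byte[0x80 + header.length];
        System.arraycopy(header, 0, data, 0x80, header.length);
        System.out.println(findHeaderOffset(data, 1024)); // prints 128 (0x80)
    }
}
```

Scanning raw bytes avoids depending on where line breaks happen to fall inside the leading garbage, at the cost of having to pick a scan limit.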

     
  • Nobody/Anonymous

    Logged In: NO

    Someone can put in a better, more thoughtful fix.
    Here is what I did to fix it.

    PDFParser.java:

    public void parse() throws IOException
    {
        try
        {
            if ( raf == null )
            {
                checktmpDir();
                document = new COSDocument( tempDirectory );
            }
            else
            {
                document = new COSDocument( raf );
            }
            setDocument( document );
            findVersion(); // New method, see below.
            // Code to find the version moved to method findVersion().
            skipHeaderFillBytes();
            Object nextObject;
            [...]

    ----

    /**
     * Attempt to find the version in the following form: %PDF-<number><0a|0d>%
     * @throws IOException if no parsable version header is found
     */
    private void findVersion() throws IOException
    {
        String header = null;
        // Try up to 5 lines to get the PDF version.
        for ( int i = 0; i < 5; i++ )
        {
            header = readLine();

            // Sometimes there are garbage bytes before the header
            // actually starts, so try to find the header first.
            int headerStart = header.indexOf( PDF_HEADER );

            // Greater than zero because if it is zero
            // there is nothing to trim.
            if( headerStart > 0 )
            {
                // Trim off any leading characters.
                header = header.substring( headerStart, header.length() );
            }
            else if( headerStart < 0 )
            {
                continue; // Did not find the header; look at the next line.
            }

            document.setHeaderString( header );
            try
            {
                float pdfVersion = Float.parseFloat(
                    header.substring( PDF_HEADER.length(), Math.min( header.length(), PDF_HEADER.length()+3) ) );
                document.setVersion( pdfVersion );
                return; // Express return.
            }
            catch( NumberFormatException e )
            {
                throw new IOException( "Error getting pdf version: " + header + "\n" + e );
            }
        }
        throw new IOException( "Unable to find version" );
    }
    ----
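
To try the trimming and parsing step from the patch above without a full parser, it can be isolated into a small standalone class. This is a sketch under assumptions: `VersionParse` and `parseVersion` are hypothetical names, `PDF_HEADER` is assumed to be the string "%PDF-", and returning -1 for a line without the marker is a choice made for this example only.

```java
public class VersionParse {
    // Assumed value of the PDF_HEADER constant used in the patch.
    private static final String PDF_HEADER = "%PDF-";

    /**
     * Extracts the version from a single header line, ignoring any
     * characters before the "%PDF-" marker. Returns -1 if the line
     * does not contain the marker. (Hypothetical helper, not a PDFBox API.)
     *
     * @throws NumberFormatException if the text after the marker is not a number
     */
    public static float parseVersion(String line) {
        int start = line.indexOf(PDF_HEADER);
        if (start < 0) {
            return -1f;
        }
        int from = start + PDF_HEADER.length();
        // Take at most three characters, e.g. "1.3".
        int to = Math.min(line.length(), from + 3);
        return Float.parseFloat(line.substring(from, to));
    }

    public static void main(String[] args) {
        // A line with leading garbage, similar to the submitted file.
        System.out.println(parseVersion("\u0081\u0081%PDF-1.3"));
        // A line with no marker at all.
        System.out.println(parseVersion("3939 - WFH - P"));
    }
}
```

Note that a line that contains the marker but no number after it still raises a `NumberFormatException` here, just as the patch converts that case into an `IOException`.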

     
  • Ben Litchfield

    Ben Litchfield - 2010-04-07
    • status: open --> closed-out-of-date
     
  • Ben Litchfield

    Ben Litchfield - 2010-04-07

    PDFBox has moved to Apache. Bugs have been moved over to the Apache bug tracking system. If you don't see the bug and it's still not fixed in the current release then please create a new bug on the Apache site.

    http://pdfbox.apache.org

     
