PDFBox / Bugs / #471 Error getting pdf version

Nobody/Anonymous - 2007-10-30

exception_version1.pdf


Nobody/Anonymous - 2007-10-30

Logged In: NO

Tested on 0.7.2, 0.7.3, latest 0.7.4-2007-10-22


Nobody/Anonymous - 2007-10-30

Logged In: NO

Debugged it with a hex dump on the submitted file
---
It appears that the version header starts at offset 0x80 instead of on the first line.
Adobe Reader 7.x appears to have skipped ahead to the version header and displayed the rest properly.

So I think something needs to be done with PDFParser::parse() version checking.

00000000: 001f 3339 3339 202d 2057 4648 202d 2050 ..3939 - WFH - P
00000010: 7265 7020 666f 2331 3533 3245 332e 7064 rep fo#1532E3.pd
00000020: 6600 0000 0000 0000 0000 0000 0000 0000 f...............
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000050: 0000 0000 0300 2100 0000 00c2 550d 05c2 ......!.....U...
00000060: 550d 0500 0000 0000 0000 0000 0000 0000 U...............
00000070: 0000 0000 0000 0000 0000 8181 af49 0000 .............I..
00000080: 2550 4446 2d31 2e33 0a25 c4e5 f2e5 eba7 %PDF-1.3.%......
00000090: f3a0 d0c4 c60a 3220 3020 6f62 6a0a 3c3c ......2 0 obj.<<
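
To illustrate the dump above, here is a minimal sketch of a byte-level scan for the "%PDF-" marker. It is not PDFBox code: the class HeaderOffsetFinder, the method findHeaderOffset and the limit parameter are hypothetical names made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class HeaderOffsetFinder {
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the offset of the first "%PDF-" marker that lies entirely
     * within the first 'limit' bytes of 'data', or -1 if there is none.
     */
    static int findHeaderOffset(byte[] data, int limit) {
        int end = Math.min(data.length, limit) - MARKER.length;
        for (int i = 0; i <= end; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }
}
```

For a buffer laid out like the dump (0x80 bytes of leading data, then "%PDF-1.3"), findHeaderOffset(data, 1024) should return 0x80.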


Nobody/Anonymous - 2008-01-24

Logged In: NO

Someone can put a better, more thoughtful fix in.
Here is what I did to fix it.

PDFParser.java:

public void parse() throws IOException
{
    try
    {
        if ( raf == null )
        {
            checktmpDir();
            document = new COSDocument( tempDirectory );
        }
        else
        {
            document = new COSDocument( raf );
        }
        setDocument( document );
        findVersion(); // New method, see below.
        // Code to find version moved to method findVersion();
        skipHeaderFillBytes();
        Object nextObject;
        [...]

----

/**
 * Attempt to find version in the following form %PDF-<number><0a|0d>%
 * @throws IOException
 */
private void findVersion() throws IOException
{
    String header = null;
    // Try up to 5 lines to get the PDF version.
    for ( int i = 0; i < 5; i++ )
    {
        header = readLine();

        // Sometimes there are some garbage bytes before the header
        // actually starts, so let's try to find the header first.
        int headerStart = header.indexOf( PDF_HEADER );

        // Greater than zero because if it is zero then
        // there is no point in trimming.
        if( headerStart > 0 )
        {
            // Trim off any leading characters.
            header = header.substring( headerStart, header.length() );
        }
        else if( headerStart < 0 )
        {
            continue; // Did not find the header, go look at the next line.
        }

        document.setHeaderString( header );
        try
        {
            float pdfVersion = Float.parseFloat(
                header.substring( PDF_HEADER.length(), Math.min( header.length(), PDF_HEADER.length()+3 ) ) );
            document.setVersion( pdfVersion );
            return; // Express return.
        }
        catch( NumberFormatException e )
        {
            throw new IOException( "Error getting pdf version: " + header + "\n" + e );
        }
    }
    throw new IOException( "Unable to find version" );
}
----
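
For anyone who wants to try the trimming and parsing step without PDFBox on the classpath, here is a standalone sketch of the per-line logic. The class VersionLineParser and the method parseVersion are hypothetical names, and the value "%PDF-" for PDF_HEADER is an assumption about the constant used in PDFParser. Unlike the patch above, it returns -1 instead of throwing.

```java
// Hypothetical standalone helper, not part of PDFBox.
final class VersionLineParser {
    // Assumed to match the PDF_HEADER constant referenced in the patch.
    private static final String PDF_HEADER = "%PDF-";

    /**
     * Returns the version found after "%PDF-" in the given line,
     * or -1 if the marker is missing or the digits do not parse.
     */
    static float parseVersion(String line) {
        int start = line.indexOf(PDF_HEADER);
        if (start < 0) {
            return -1f; // no header on this line
        }
        // Skip any leading garbage and the marker itself, then take
        // at most three characters, e.g. "1.3".
        int from = start + PDF_HEADER.length();
        int to = Math.min(line.length(), from + 3);
        try {
            return Float.parseFloat(line.substring(from, to));
        } catch (NumberFormatException e) {
            return -1f;
        }
    }
}
```

A line with leading garbage followed by "%PDF-1.3" parses to 1.3f, while a line without the marker gives -1.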


Ben Litchfield - 2010-04-07

status: open --> closed-out-of-date

Ben Litchfield - 2010-04-07

PDFBox has moved to Apache. Bugs have been moved over to the Apache bug tracking system. If you don't see the bug and it's still not fixed in the current release then please create a new bug on the Apache site.

http://pdfbox.apache.org
