NullPointerException in PDPageNode.getAllKids
The parser cannot seem to find the Pages object in files created with Acrobat Pro 9. A sample file is attached.
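import java.util.ArrayList;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDPage;
import org.pdfbox.pdmodel.PDPageNode;

public static void main(String[] argv) throws Exception {
    String name = "./test.pdf";
    PDDocument doc = PDDocument.load(name);
    PDPageNode root = doc.getDocumentCatalog().getPages();
    ArrayList<PDPage> pages = new ArrayList<PDPage>();
    // The NullPointerException is thrown here for the attached file
    root.getAllKids(pages);
    System.out.println("pages.size() == " + pages.size());
    // Close the document only after the page tree has been read
    doc.close();
}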
Exception in thread "main" java.lang.NullPointerException
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
created with Acrobat 9 Pro, default settings
Logged In: YES
user_id=1693709
Originator: YES
This happens with the latest code from CVS and also in older versions.
Logged In: YES
user_id=853566
Originator: NO
We are experiencing the same problem. The offending PDF is available if any of you need it (jwilson@nmcourt.fed.us). It looks like PDFBox does not support some new feature introduced in Acrobat 9.
Logged In: YES
user_id=1693709
Originator: YES
In Acrobat 8, the default was to generate PDFs following version 1.4 of the PDF specification. In Acrobat 9, the default is to generate PDFs following version 1.5 of the PDF specification. PDF 1.5 has objects known as cross-reference streams, and it turns out that PDFBox does not parse them correctly.
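For anyone who wants to check whether a given file is affected, here is a minimal sketch (not PDFBox code). It reads the offset that follows the last startxref keyword and reports whether a classic xref table or an indirect object starts there, the latter being what a cross-reference stream looks like. The names XrefSniffer, sniff and Kind are made up for illustration. The check is only a heuristic: it does not verify /Type /XRef, and it reads the whole file into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch: guess whether a PDF's last cross-reference section is a table or a stream. */
final class XrefSniffer {

    enum Kind { TABLE, STREAM, UNKNOWN }

    static Kind sniff(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);

        // The last "startxref" keyword is followed by the byte offset of the xref section.
        int key = text.lastIndexOf("startxref");
        if (key < 0) {
            return Kind.UNKNOWN;
        }
        int i = key + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        if (i == start) {
            return Kind.UNKNOWN; // no offset after the keyword
        }
        long offset;
        try {
            offset = Long.parseLong(text.substring(start, i));
        } catch (NumberFormatException e) {
            return Kind.UNKNOWN; // absurdly long digit run
        }
        if (offset >= text.length()) {
            return Kind.UNKNOWN; // offset points past the end of the file
        }

        int from = (int) offset;
        String head = text.substring(from, Math.min(text.length(), from + 32));
        if (head.startsWith("xref")) {
            return Kind.TABLE; // classic cross-reference table
        }
        // A cross-reference stream is an indirect object: "<num> <gen> obj ..."
        if (head.matches("(?s)\\d+\\s+\\d+\\s+obj.*")) {
            return Kind.STREAM;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] argv) throws IOException {
        if (argv.length != 1) {
            System.err.println("usage: java XrefSniffer file.pdf");
            return;
        }
        System.out.println(sniff(Files.readAllBytes(Paths.get(argv[0]))));
    }
}
```

Running it as `java XrefSniffer test.pdf` prints TABLE, STREAM or UNKNOWN. A file saved by Acrobat 9 with default settings would be expected to report STREAM if the analysis above is right.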
I can confirm foundart's comments: Acrobat 9 is indeed using XRef streams. This is going to become a pretty big problem as Acrobat 9 is adopted.
Information on cross-reference streams is here:
Section 7.5.8 of http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf
Long and short: the xref parser is going to need to be enhanced to read cross-reference streams.
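To give an idea of what that stream read involves, here is a rough sketch of decoding the entries, based on my reading of section 7.5.8 and not on any PDFBox code. Each entry is three big-endian fields whose byte widths come from the /W array. Type 0 is a free object, type 1 is an object at a byte offset, and type 2 is an object stored inside an object stream. The names XrefStreamEntries, Entry and decode are hypothetical. The sketch assumes the stream data has already been decompressed, with any predictor undone, and it ignores /Index.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split decoded cross-reference stream data into entries using the /W widths. */
final class XrefStreamEntries {

    static final class Entry {
        final int type;     // 0 = free, 1 = in use at a byte offset, 2 = inside an object stream
        final long field2;  // type 1: byte offset; type 2: object stream number
        final long field3;  // type 1: generation; type 2: index within the object stream

        Entry(int type, long field2, long field3) {
            this.type = type;
            this.field2 = field2;
            this.field3 = field3;
        }
    }

    static List<Entry> decode(byte[] data, int[] w) {
        if (w.length != 3) {
            throw new IllegalArgumentException("W must hold three widths");
        }
        int rowLength = w[0] + w[1] + w[2];
        if (rowLength <= 0 || data.length % rowLength != 0) {
            throw new IllegalArgumentException("data length does not match W");
        }
        List<Entry> entries = new ArrayList<Entry>();
        for (int pos = 0; pos < data.length; pos += rowLength) {
            // A zero-width type field means the type defaults to 1.
            long type = (w[0] == 0) ? 1 : readBigEndian(data, pos, w[0]);
            long field2 = readBigEndian(data, pos + w[0], w[1]);
            long field3 = readBigEndian(data, pos + w[0] + w[1], w[2]);
            entries.add(new Entry((int) type, field2, field3));
        }
        return entries;
    }

    /** Reads width bytes starting at pos as an unsigned big-endian number (0 if width is 0). */
    static long readBigEndian(byte[] data, int pos, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (data[pos + i] & 0xFF);
        }
        return value;
    }
}
```

With /W [1 2 1], for example, every entry is four bytes: one for the type, two for the offset or object stream number, and one for the generation or index. A real parser would also have to handle the stream dictionary (/Size, /Index, /Prev) and the stream filters before it gets this far.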
I'd also like to confirm this issue. We have a custom web server application that uses PDFBox to merge FDFs with template PDF contracts, and after upgrading to Acrobat 9, the Java server began throwing NullPointerExceptions with the following messages:
[#|2008-11-03T07:13:45.375-0600|INFO|sun-appserver-ee8.2|javax.enterprise.system.stream.out|_ThreadID=15;| java.lang.NullPointerException
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
at org.pdfbox.pdmodel.PDPageNode.getKids(PDPageNode.java:171)
at org.pdfbox.pdmodel.PDPageNode.updateCount(PDPageNode.java:90)
at org.pdfbox.pdmodel.PDDocument.save(PDDocument.java:606)
at org.pdfbox.pdmodel.PDDocument.save(PDDocument.java:592)
I'm hoping that this problem is being reviewed, but I see that the priority setting is only at 5. We are going to have to revert to Acrobat 6, our previous version, in order to get our contracts working again. I can provide several PDF documents created with Acrobat 9 that failed if examples are necessary.
Thanks!
PDFBox has moved to Apache. Bugs have been moved over to the Apache bug tracking system. If you don't see the bug there and it's still not fixed in the current release, please create a new bug on the Apache site.
http://pdfbox.apache.org