#29 System.Xml.XmlException when attempting to parse v1.2 PDF

Milestone: 0.1.2.1

Status: wont-fix

Owner: Stefano Chizzolini

Labels: XMP (1)

Priority: 5

Updated: 2015-04-28

Created: 2012-02-21

Creator: JazzyFizzles

Private: No

PDF Clown v0.1.1.0.
Document:
PDF Version: v1.2 (Acrobat 3.x).
PDF Producer: Acrobat Distiller 5.0.2 for Macintosh.
When attempting to access this property:
pdfDoc.Metadata.Content
It reports the following:
'pdfDoc.Metadata.Content' threw an exception of type 'System.Xml.XmlException'
Message: "'dc' is an undeclared namespace. Line 4, position 2."

I can't upload the PDF because it is 3 MB and the SourceForge limit is 256 KB; I can email it on request or upload it to a file-sharing site.
Another PDF document with the same PDF version, but created by a different producer, parses without error.

Discussion

Stefano Chizzolini - 2015-04-28

Apparently your XMP serialization is invalid: it uses the "dc" namespace prefix (typically associated with Dublin Core metadata) without binding it to a namespace. A valid packet declares the prefix, like this:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.4.0">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    . . .

If that's the case, the problem lies with the file producer. To work around the parsing failure, get the metadata stream programmatically and read its contents with a more relaxed parser:

import org.pdfclown.bytes.IBuffer;
import org.pdfclown.objects.PdfName;
import org.pdfclown.objects.PdfStream;

// Fetch the raw metadata stream from the document catalog, bypassing the XMP parser.
PdfStream metadataStream = (PdfStream)document.getBaseDataObject().resolve(PdfName.Metadata);
IBuffer contentBody = metadataStream.getBody();
. . .
// Read the buffer using your parser.
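As one possible "relaxed parser", here is a minimal sketch that uses the JDK's built-in DOM parser with namespace processing switched off, so an unbound prefix such as "dc" is treated as part of the element name rather than rejected. It assumes the stream body has already been copied into a byte array; the class name RelaxedXmpReader, its two helper methods, and the sample packet are hypothetical and not part of PDF Clown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads XMP bytes without enforcing namespace bindings.
public class RelaxedXmpReader {

    // Parses the XMP packet with namespace awareness disabled, so a prefix
    // that was never declared (e.g. "dc") does not abort parsing.
    public static Document parseRelaxed(byte[] xmpBytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xmpBytes));
    }

    // Returns the text of the first element with the given prefixed name,
    // or null if no such element exists.
    public static String firstText(Document doc, String qualifiedName) {
        NodeList nodes = doc.getElementsByTagName(qualifiedName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample packet that omits the xmlns:dc declaration.
        String xmp =
            "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
          + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
          + "<rdf:Description rdf:about=\"\">"
          + "<dc:format>application/pdf</dc:format>"
          + "</rdf:Description></rdf:RDF></x:xmpmeta>";
        Document doc = parseRelaxed(xmp.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstText(doc, "dc:format"));
    }
}
```

Because namespaces are ignored, elements have to be looked up by their prefixed names ("dc:format") instead of by namespace URI, which only works as long as the producer uses the conventional prefixes.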

Stefano Chizzolini - 2015-04-28

labels: --> XMP

status: open --> wont-fix

assigned_to: Stefano Chizzolini

Group: --> 0.1.2.1