#29 System.Xml.XmlException when attempting to parse v1.2 PDF

Group: 0.1.2.1
Status: wont-fix
Labels: XMP
Priority: 5
Updated: 2015-04-28
Created: 2012-02-21
Private: No

PDF Clown v0.1.1.0.
Document:
PDF Version: v1.2 (Acrobat 3.x).
PDF Producer: Acrobat Distiller 5.0.2 for Macintosh.
When attempting to access this property:
pdfDoc.Metadata.Content
It reports the following:
'pdfDoc.Metadata.Content' threw an exception of type 'System.Xml.XmlException'
Message: "'dc' is an undeclared namespace. Line 4, position 2."

I am unable to upload the PDF document, as it is 3 MB and the SourceForge limit is 256 KB; I can email it directly on request or upload it to a share site.
Another PDF document with the same PDF version, but created with a different 'producer', is OK.

Discussion

  • Stefano Chizzolini

    Apparently your XMP serialization is invalid: it fails to bind the "dc" namespace prefix (which is typically associated with Dublin Core metadata) to its namespace. A valid packet declares the prefix like this:

    <x:xmpmeta
      xmlns:x="adobe:ns:meta/"
      x:xmptk="XMP Core 5.4.0"
      >
      <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        >
    . . .
    

    If that's the case, the problem lies with the file producer. To work around this parsing issue, you have to programmatically get the metadata stream and read its contents with a more relaxed parser:

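    import org.pdfclown.bytes.IBuffer;
    import org.pdfclown.objects.PdfName;
    import org.pdfclown.objects.PdfStream;
    
    // Resolve the Metadata entry of the document catalog to its stream object.
    PdfStream metadataStream =
      (PdfStream)document.getBaseDataObject().resolve(PdfName.Metadata);
    // Get the stream body (the serialized XMP packet) as a byte buffer.
    IBuffer contentBody = metadataStream.getBody();
    . . . // Read the buffer using your parser.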
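
As a rough sketch of what a "more relaxed parser" could look like, the JDK's own DOM parser can be run with namespace processing switched off, in which case an unbound prefix such as "dc" is treated as part of the element name rather than reported as an error. The class `RelaxedXmpReader`, its methods, and the sample packet are hypothetical and not part of PDF Clown; the sketch also assumes you have already copied the metadata buffer into a `byte[]` (for instance via `contentBody.toByteArray()`, if your PDF Clown version exposes it).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXParseException;

// Hypothetical helper (not part of PDF Clown): parses an XMP packet with
// namespace processing switched on or off.
public class RelaxedXmpReader {

    public static Document parse(byte[] xmpBytes, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // With namespace processing off, "dc:format" is just an element name,
        // so an unbound "dc" prefix is not reported as an error.
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmpBytes));
    }

    // Returns the text of the first element with this qualified name, or null.
    public static String firstText(Document doc, String qualifiedName) {
        NodeList nodes = doc.getElementsByTagName(qualifiedName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample packet: "rdf" is declared, "dc" is not.
        String xmp =
            "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
            + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
            + "<rdf:Description rdf:about=\"\">\n"
            + "<dc:format>application/pdf</dc:format>\n"
            + "</rdf:Description>\n"
            + "</rdf:RDF>\n"
            + "</x:xmpmeta>";
        byte[] bytes = xmp.getBytes(StandardCharsets.UTF_8);

        try {
            parse(bytes, true); // strict: namespace-aware
            System.out.println("Strict parse succeeded");
        } catch (SAXParseException e) {
            System.out.println("Strict parse failed: " + e.getMessage());
        }

        Document doc = parse(bytes, false); // relaxed: namespace-unaware
        System.out.println(firstText(doc, "dc:format"));
    }
}
```

The trade-off is that, in relaxed mode, elements have to be looked up by their literal prefixed names (such as "dc:format"), since no namespace URIs are available to match against.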
    
     
  • Stefano Chizzolini

    • labels: --> XMP
    • status: open --> wont-fix
    • assigned_to: Stefano Chizzolini
    • Group: --> 0.1.2.1
     
