Re: [Podofo-users] PoDoFo cannot load some PDFs with XMP metadata
From: Leonard R. <lro...@ad...> - 2014-04-18 01:56:27
I haven’t looked at the code in Load() in a while, but I can’t think of a reason that it is loading the XMP. Are you SURE that this is where the exception is being thrown - XMP parsing?? I would think it’s somewhere else in the PDF.

Can you post a sample?

Leonard

On 4/17/14, 9:47 PM, "Uli Zappe" <ul...@ri...> wrote:

>Hi,
>
>I found that PoDoFo 0.9.2 cannot load some PDFs with XMP metadata.
>
>In these cases, when PoDoFo::PdfMemDocument::Load() is called (I didn't
>try reading from a file), an exception is thrown with the error message
>"A number was expected but not found."
>
>Upon closer inspection of the affected PDFs, I found that in all these
>cases, the XMP packet header
>
>  <?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
>
>contains no character between the quotes of
>
>  begin=''
>
>This is allowed according to Adobe's XMP specification. In
>http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/cc-201306/XMPSpecificationPart1.pdf
>on page 11, it says:
>
>> the character [the character between the quotes, U.Z.] represents
>>the Unicode character U+FEFF used as a byte-order marker. The U+FEFF may
>>be omitted from the begin="".
>
>The current version of PoDoFo, however, stumbles as soon as there is no
>character between the quotes.
>
>NOTE: Also contrary to Adobe's XMP specification (and the behavior of
>Adobe's XMP Toolkit SDK), PoDoFo seems to allow *any* character between
>the quotes of begin='', not only U+FEFF. However, I do feel that this is
>a *good thing* because it makes PoDoFo more fault tolerant. There seem to
>be several applications out there that write XMP data to PDFs and don't
>get the packet header's U+FEFF character right. As a result, Adobe's XMP
>Toolkit SDK cannot read the XMP data of PDFs edited with these
>applications, while PoDoFo can. I consider this an advantage that should
>not be removed.
>(Specifically, there is a Java application from 2008
>(still linked from creativecommons.org) that writes Creative Commons
>licenses in such an incorrect XMP format to PDF files. PoDoFo recognizes
>these licenses, Adobe's XMP Toolkit does not.)
>
>Bye
>
>Uli
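
For illustration only, the two behaviours discussed above (accepting an empty begin='' as the specification allows, and optionally tolerating any character between the quotes) can be sketched as below. This is not PoDoFo's code and not the XMP Toolkit SDK's; parse_xpacket_header and its strict flag are hypothetical names, and the sketch assumes the packet header has already been decoded to text, so the byte-order marker appears as the single character U+FEFF.

```python
import re

# Hypothetical helper, not part of PoDoFo or Adobe's XMP Toolkit SDK.
# Matches <?xpacket begin=... id=... ?> with either quote style and captures
# whatever sits between the quotes of begin, which may be nothing at all.
_XPACKET_HEADER = re.compile(
    r"""<\?xpacket\s+begin=(?P<q>['"])(?P<begin>.*?)(?P=q)\s+id=(?P<q2>['"])(?P<id>.*?)(?P=q2)[^?]*\?>""",
    re.DOTALL,
)

BOM = "\ufeff"  # the character the specification expects inside begin


def parse_xpacket_header(text, strict=False):
    """Return {'begin': ..., 'id': ...} for the first packet header, or None.

    strict=False: accept anything between the quotes (the fault-tolerant
                  behaviour described in the message above).
    strict=True:  accept only an empty value or U+FEFF.
    """
    match = _XPACKET_HEADER.search(text)
    if match is None:
        return None
    begin = match.group("begin")
    if strict and begin not in ("", BOM):
        return None
    return {"begin": begin, "id": match.group("id")}


# The header from the affected PDFs: nothing between the quotes of begin.
header = "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
result = parse_xpacket_header(header)
```

With this sketch the empty begin='' parses in both modes, while a header such as begin='x' parses only in the lenient mode.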