#6 Reading and Editing of Metadata

closed
None
5
2011-04-19
2010-03-21
Torian Ironfist
No

It would be great to be able to use this library to read metadata from PDF files. I have a large collection of PDFs and am trying to build a library system to keep them better organised. Most of them have keywords in the PDF metadata, which I was hoping to search.

This would be a great feature.

Thanks

Discussion

    • assigned_to: nobody --> stechio
     
  • Metadata is already accessible in the current version (PDF Clown 0.0.7):

    - The Information class [PDF:1.7:10.2.1] provides access to the document keywords, like this:
    document.getInformation().getKeywords();

    - Metadata streams [PDF:1.7:10.2.2] aren't explicitly implemented yet, but PDF Clown's flexible architecture lets you retrieve them with a fairly simple operation:
    PdfStream metadataStream = (PdfStream)File.resolve(document.getBaseDataObject().get(new PdfName("Metadata")));
    Buffer metadataBuffer = metadataStream.getBody();
    String metadata = metadataBuffer.readString((int)metadataBuffer.getLength());
    Once you have the metadata string (which is XMP/XML), you can feed it to an XML parser to extract what you need.
    Metadata streams will be explicitly supported in a future release (possibly 0.0.9).
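
    For the "feed it to an XML parser" step, here is a minimal sketch that uses only the JDK's built-in DOM parser. It does not call PDF Clown at all: it assumes you already hold the XMP packet as a String, obtained as shown above. The class name XmpKeywords, the method extractKeywords and the SAMPLE packet are made up for illustration. It also assumes the keywords are stored in the usual XMP places, namely the pdf:Keywords property (as an element or as an attribute) and the dc:subject bag; your files may store them elsewhere.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of PDF Clown.
class XmpKeywords {
    static final String PDF_NS = "http://ns.adobe.com/pdf/1.3/";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Hand-written sample packet, for demonstration only.
    static final String SAMPLE =
        "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
        + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\" xmlns:pdf=\"" + PDF_NS + "\""
        + " xmlns:dc=\"" + DC_NS + "\">"
        + "<pdf:Keywords>alpha, beta</pdf:Keywords>"
        + "<dc:subject><rdf:Bag><rdf:li>gamma</rdf:li></rdf:Bag></dc:subject>"
        + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    // Returns the keywords found in pdf:Keywords and dc:subject, in document order.
    static List<String> extractKeywords(String xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the *NS lookups below
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xmp)));

        List<String> keywords = new ArrayList<String>();

        // pdf:Keywords written as an element: one delimited string.
        NodeList pdfKeywords = doc.getElementsByTagNameNS(PDF_NS, "Keywords");
        for (int i = 0; i < pdfKeywords.getLength(); i++) {
            addSplit(keywords, pdfKeywords.item(i).getTextContent());
        }

        // pdf:Keywords written as an attribute of rdf:Description.
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element description = (Element) descriptions.item(i);
            addSplit(keywords, description.getAttributeNS(PDF_NS, "Keywords"));
        }

        // dc:subject: a bag of rdf:li items, one keyword each.
        NodeList subjects = doc.getElementsByTagNameNS(DC_NS, "subject");
        for (int i = 0; i < subjects.getLength(); i++) {
            NodeList items = ((Element) subjects.item(i)).getElementsByTagNameNS(RDF_NS, "li");
            for (int j = 0; j < items.getLength(); j++) {
                addSplit(keywords, items.item(j).getTextContent());
            }
        }
        return keywords;
    }

    // Splits on commas or semicolons and keeps the non-empty, trimmed parts.
    private static void addSplit(List<String> target, String raw) {
        for (String part : raw.split("[,;]")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                target.add(trimmed);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractKeywords(SAMPLE));
    }
}
```

    In your case you would pass the metadata string from the snippet above instead of SAMPLE. Splitting on commas and semicolons is only a guess at how the keywords are delimited in your collection, so adjust it to match your files.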

     
  • I'm happy to announce that support for metadata streams (XMP) has been implemented and committed to the HEAD revision of the project's SVN repository; it will be released in the upcoming version 0.1.1.

     
    • status: open --> closed