#6 Reading and Editing of Metadata


It would be great to be able to use this library to read metadata from PDF files. I have a large collection of PDFs, and I am trying to build a library system to keep them better organised. Most of them have keywords in the PDF metadata, which I was hoping to search.

This would be a great feature.



  • Stefano Chizzolini

    • assigned_to: nobody --> stechio
  • Stefano Chizzolini

    Metadata are already available in the current version (PDF Clown 0.0.7):

    - Information class [PDF:1.7:10.2.1] provides access to the document keywords, this way:

    - Metadata streams [PDF:1.7:10.2.2] aren't explicitly implemented yet, but PDF Clown's flexible architecture lets you retrieve them with a fairly simple operation:

        PdfStream metadataStream = (PdfStream)File.resolve(document.getBaseDataObject().get(new PdfName("Metadata")));
        Buffer metadataBuffer = metadataStream.getBody();
        String metadata = metadataBuffer.readString((int)metadataBuffer.getLength());

    Once you have the metadata string (which is XMP/XML), you can feed it to an XML parser to extract what you need.
    Metadata streams will be explicitly supported in a future release (maybe 0.0.9).
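
For anyone wanting to try the "feed it to an XML parser" step, here is a minimal sketch that uses only the JDK's built-in DOM parser, not anything from PDF Clown. The class name XmpKeywords and the sample packet are made up for illustration. It assumes the keywords sit in a dc:subject bag or in a pdf:Keywords element, which is common in XMP packets but not guaranteed; the attribute form of pdf:Keywords is not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of PDF Clown.
final class XmpKeywords {
    // Namespace URIs used by XMP packets.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String PDF_NS = "http://ns.adobe.com/pdf/1.3/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Made-up sample packet standing in for the string read from the metadata stream.
    static final String SAMPLE =
        "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
        + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\" xmlns:dc=\"" + DC_NS + "\" xmlns:pdf=\"" + PDF_NS + "\">"
        + "<dc:subject><rdf:Bag><rdf:li>invoices</rdf:li><rdf:li>2009</rdf:li></rdf:Bag></dc:subject>"
        + "<pdf:Keywords>invoices; 2009</pdf:Keywords>"
        + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    // Collects dc:subject items first, then any pdf:Keywords element text.
    static List<String> extract(String xmp) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            // Refuse DOCTYPE declarations, since the input comes from arbitrary files.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmp)));

            List<String> keywords = new ArrayList<>();
            NodeList subjects = doc.getElementsByTagNameNS(DC_NS, "subject");
            for (int i = 0; i < subjects.getLength(); i++) {
                NodeList items = ((Element) subjects.item(i)).getElementsByTagNameNS(RDF_NS, "li");
                for (int j = 0; j < items.getLength(); j++) {
                    keywords.add(items.item(j).getTextContent().trim());
                }
            }
            NodeList pdfKeywords = doc.getElementsByTagNameNS(PDF_NS, "Keywords");
            for (int i = 0; i < pdfKeywords.getLength(); i++) {
                keywords.add(pdfKeywords.item(i).getTextContent().trim());
            }
            return keywords;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMP packet", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE));
    }
}
```

In practice you would pass the metadata string obtained from the stream instead of SAMPLE. Note that pdf:Keywords is a single free-form string, so splitting it on separators is left to the caller.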

  • Stefano Chizzolini

    I'm happy to announce that support for metadata streams (XMP) has been implemented and committed to the HEAD revision of the project's SVN repo; it will be released in the next version, 0.1.1.

  • Stefano Chizzolini

    • status: open --> closed
