
#869 XMP Export Privacy Settings Ignored

Milestone: next release
Status: closed
Owner: nobody
Labels: Export (66)
Priority: 5
Updated: 2015-05-20
Created: 2010-04-12
Creator: jwq
Private: No

JabRef 2.5; Windows XP sp3

I tried to exclude certain fields from XMP export into PDF files by checking the "Do not write the following fields to XMP metadata" option, but examining the file with a hex editor shows that the fields from the "Fields to filter" list are still written to the PDF.

Also, new XMP data appears to be appended to old data, which means that once privacy is lost by writing XMP data to a PDF, it can't be regained other than by deleting the PDF. Perhaps this is a separate bug?

Discussion

  • Nobody/Anonymous

    This bug also seems to exist in 2.6 and 2.4.2.

     
  • Ambrogio Oliva

    Ambrogio Oliva - 2012-01-07

    I've looked into the problem a little, which is still present in versions 2.7.2 and 2.8b. The preferences are honored for the metadata written in the “http://jabref.sourceforge.net/bibteXMP/” namespace. However, the metadata is also written as Dublin Core elements in the “http://purl.org/dc/elements/1.1/” namespace and as custom properties, and in both of those places the filtered fields are written as well.

    To reproduce the behavior:
    1) Create a blank PDF (e.g. print a blank page on a PDF printer, or export
    2) In JabRef create a bib file with an entry linked to the blank PDF
    3) Fill the entry fields with some text, in particular those that should be filtered.
    4) Save the database
    5) In JabRef Options>Preferences check the “Do not write the following fields to XMP Metadata” tick box in the “XMP Metadata” tab.
    6) Write the metadata to the PDF file: Tools>Write XMP-Metadata to PDFs
    7) Open the PDF with a text editor to inspect the metadata sections.

    To check the XMP metadata in Windows, the freeware “PDF-XChange Viewer” can be used. The custom properties can be inspected in the properties dialog of Acrobat Reader.
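
The manual inspection in step 7 can also be scripted. The following is only a rough sketch, not part of JabRef: the function name `find_leaked_fields` and the example field names are made up for illustration. It searches the raw bytes of a PDF for the values of fields that should have been filtered. It assumes the metadata is stored uncompressed, which is usual for the XMP packet but not guaranteed for other parts of the file, so an empty result does not prove that nothing leaked.

```python
def find_leaked_fields(pdf_bytes: bytes, entry: dict, filtered: set) -> list:
    """Return the filtered field names whose values still occur in the raw PDF bytes.

    `entry` maps BibTeX field names to values; `filtered` is the list of
    field names from the privacy settings. Both are hypothetical inputs.
    """
    leaked = []
    for name in sorted(filtered):
        value = entry.get(name, "")
        if not value:
            continue  # nothing to search for
        # XMP packets are typically UTF-8; PDF text strings may be UTF-16-BE.
        needles = (value.encode("utf-8"), value.encode("utf-16-be"))
        if any(needle in pdf_bytes for needle in needles):
            leaked.append(name)
    return leaked


if __name__ == "__main__":
    sample = b"%PDF-1.4 ... <dc:description>Secret review</dc:description> ..."
    fields = {"title": "A title", "review": "Secret review", "owner": "jwq"}
    print(find_leaked_fields(sample, fields, {"review", "owner"}))
```

Short field values can match unrelated bytes by chance, so a hit is a reason to look closer rather than a definite finding.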

    My suggestions are:
    - Let the user choose which fields are to be written in metadata (with a sensible default) and in which namespace.
    - Give the user a mechanism to clear the metadata, optionally with a preview.

     
  • C Bhushan

    C Bhushan - 2014-04-20

    Unfortunately, this bug still exists in version 2.10 (released March 11th, 2014). I could reproduce the bug on Linux (Ubuntu 12.04). The XMP metadata was inspected using the command line tools pdfinfo and exiftool.

    It would be great to have this fixed. Thanks.

     
    • Adrian Daerr

      Adrian Daerr - 2014-04-22

      Could you please be more specific as to what is still buggy? Are
      fields written to the XMP-metadata section of the PDF file which should
      not be? Are previously written changes not cleared upon a change of
      preferences and re-writing of the XMP data? What exactly goes wrong?

      I tried to reproduce the bug: I created a new article entry in
      2.10dev, linked it to a random PDF file on my hard drive and
      XMP-tagged the file. None of the fields listed in the XMP Export
      Privacy Settings were written. Then I added "bibtexkey" to the list of
      fields not to be written, in Preferences->XMP Export Privacy Settings.
      Re-tagging the PDF file cleared the bibtexkey from its XMP metadata
      (both from the dc and bibtex namespaces). The bug described above by
      Ambrogio Oliva (and at the end of the OP) therefore appears to be
      solved (since commit 5fe12834, merged 2013-03-12).

      If you do see a different behaviour, please provide step-by-step
      instructions on how to reproduce it.

      This being said, there is a bug in the current tagging code, albeit
      not with the XMP metadata: all bibtex keys appear to be written to
      the PDF metadata object (the one containing the PDF
      /Author, /Title, ... information)! I am quite sure this bug is somewhat
      new, but I won't have time to investigate for some time. If this is
      the private information leakage that you found, may I ask you to open
      a new bug report? It is a serious bug, but I think it is unrelated to the
      XMP-metadata code.

       
      • Adrian Daerr

        Adrian Daerr - 2014-04-23

        Never mind the last two sentences of my previous comment. The bug was triggered by a menu entry called "Write XMP metadata to PDFs", and by that alone was related to the XMP code (even if, as I explained, the buggy filtering did not pertain to the XMP part, but to the PDF-specific metadata).

        Anyhow, I did finally look into the matter and found it quick to fix. The corrected JabRef version can be cloned from
        https://github.com/adaerr/jabref.git
        or you can wait for the maintainers to pull the commits into the original project. It would be helpful to have your confirmation that the bug is indeed corrected, so that we can close this bug.

        The following is a copy of my pull-request, for those interested in more details about the changes:

        PDF-file metadata: Privacy Filtering all metadata

        This pull-request pertains to the addition of metadata to PDF files associated with entries, as triggered by the menu entry "Write XMP metadata to PDFs" in the "Tools" menu. XMP is an extremely interesting feature that allows tagging PDF files (amongst others) with automatically retrievable metadata in much the same way mp3-tags allow adding title/author/... information to mp3 music files. Actually JabRef exports the metadata not only to two XMP namespaces (Dublin Core and a custom JabRef namespace), but also to the PDF DocumentInformation Object.

        Practically from the beginning of JabRef's XMP-writing capabilities, Christopher Oezbek had added privacy filtering for the XMP-tagging of PDF files with data from the bibtex record, meaning that the user could define a list of fields (in Preferences->XMP metadata) which should not be exported to the PDF file. Unfortunately, the filtering was incomplete: JabRef exports the metadata in three different forms, only one of which was originally filtered. In 2013 filtering was extended to both XMP namespaces, but JabRef still exported all fields into the PDF DocumentInfo object. The two present commits correct this problem. The first (b45316f) prevents private fields from being exported to the PDF DocumentInfo. The second one more aggressively erases these fields even if they already exist in the PDF document.
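
To illustrate what filtering at write time amounts to for the DocumentInfo object, here is a minimal sketch in Python. It is not the JabRef code: the function name, the `jabref/` key prefix and the mapping from bibtex fields to the four generic PDF keys are assumptions made for this example.

```python
# Assumed mapping from bibtex field names to generic DocumentInfo keys.
GENERIC_KEYS = {
    "author": "Author",
    "title": "Title",
    "abstract": "Subject",
    "keywords": "Keywords",
}
PREFIX = "jabref/"  # assumed prefix for JabRef-specific DocumentInfo keys


def document_info_entries(entry: dict, filtered: set) -> dict:
    """Build the DocumentInfo key/value pairs for one entry, skipping filtered fields."""
    info = {}
    for field, value in entry.items():
        if field in filtered:
            continue  # privacy-filtered: write neither the prefixed nor the generic key
        info[PREFIX + field] = value
        if field in GENERIC_KEYS:
            info[GENERIC_KEYS[field]] = value
    return info


if __name__ == "__main__":
    record = {"author": "A. Author", "title": "Some title", "owner": "jwq"}
    print(document_info_entries(record, {"owner"}))
```

The point of the sketch is that the same filter list has to be applied to every output form; skipping it for any one of them reproduces the leak described in this report.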

        The deletion of existing fields might be debatable. It seems the right thing to do for fields clearly generated by JabRef (viz. those prefixed by "jabref/"), but there are four fields which might be of other origin (Author, Title, Subject and Keywords). Making a systematic exception for these four fields, i.e. not erasing them even if they are privacy filtered, is a bad idea and violates the principle of least surprise. This is why the second commit makes no exception. Deactivating the erasure for the four generic fields could, however, easily be added as an option in the XMP export preferences if it is judged important. The current behaviour has the advantage of reliably correcting PDF files previously tagged with buggy privacy filtering.
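
The erasure behaviour, including the optional exception for the four generic fields, could look roughly like the following Python sketch. This is an illustration under assumptions, not the committed code: the function name, the `keep_generic` flag, the `jabref/` prefix and the field-to-key mapping are all hypothetical.

```python
# Assumed mapping from bibtex field names to generic DocumentInfo keys.
GENERIC_KEYS = {
    "author": "Author",
    "title": "Title",
    "abstract": "Subject",
    "keywords": "Keywords",
}
PREFIX = "jabref/"  # assumed prefix for JabRef-specific DocumentInfo keys


def retag_document_info(existing: dict, entry: dict, filtered: set,
                        keep_generic: bool = False) -> dict:
    """Return the DocumentInfo contents after re-tagging with a privacy filter.

    Filtered fields are removed even if an earlier run already wrote them.
    With keep_generic=True the four generic keys are left untouched, which
    models the optional exception discussed above.
    """
    result = dict(existing)  # keys unrelated to JabRef (e.g. Producer) survive
    for field in filtered:
        result.pop(PREFIX + field, None)
        generic = GENERIC_KEYS.get(field)
        if generic is not None and not keep_generic:
            result.pop(generic, None)
    for field, value in entry.items():
        if field in filtered:
            continue
        result[PREFIX + field] = value
        if field in GENERIC_KEYS:
            result[GENERIC_KEYS[field]] = value
    return result


if __name__ == "__main__":
    old = {"Author": "Old", "jabref/author": "Old",
           "jabref/owner": "jwq", "Producer": "pdfTeX"}
    record = {"author": "New", "title": "T", "owner": "jwq"}
    print(retag_document_info(old, record, {"author", "owner"}))
```

With the default `keep_generic=False`, a PDF tagged by an earlier, buggy version is cleaned simply by re-tagging it, which is the advantage named above.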

        If these commits are pulled into the master branch and confirmed to work, the bug #869 on the sourceforge tracker:
        https://sourceforge.net/p/jabref/bugs/869/
        can be closed.

         
  • Oliver Kopp

    Oliver Kopp - 2015-05-20
    • status: open --> closed
    • Group: --> next release
     
  • Oliver Kopp

    Oliver Kopp - 2015-05-20

    Thanks for the PR. Already integrated in JabRef 2.11 beta.

     
