JabRef / Bugs / #869 XMP Export Privacy Settings Ignored

Ken - 2010-04-13

The issue of fields being written twice is related to:
BUG: Avoid rewriting XMP metadata when it's not necessary - ID: 2940625
https://sourceforge.net/tracker/index.php?func=detail&aid=2940625&group_id=92314&atid=600306

Nobody/Anonymous - 2010-05-17

This bug also seems to exist in 2.6 and 2.4.2.

Ambrogio Oliva - 2012-01-07

I've investigated the problem a little; it is still there in versions 2.7.2 and 2.8b. I found that the preferences are honored for the metadata written in the “http://jabref.sourceforge.net/bibteXMP/” namespace. However, the metadata are also written as Dublin Core elements in the “http://purl.org/dc/elements/1.1/” namespace and as custom properties, and there the filtered fields are written as well.

To reproduce the behavior:
1) Create a blank PDF (e.g. print a blank page on a PDF printer, or export
2) In JabRef create a bib file with an entry linked to the blank PDF
3) Fill the entry fields with some text, in particular those that should be filtered.
4) Save the database
5) In JabRef Options>Preferences check the “Do not write the following fields to XMP Metadata” tick box in the “XMP Metadata” tab.
6) Write the metadata to the PDF file: Tools>Write XMP-Metadata to PDFs
7) Open the PDF with a text editor to inspect the metadata sections.

To check the XMP metadata in Windows, the freeware “PDF-XChange Viewer” can be used. The custom properties can be inspected in the properties dialog of Acrobat Reader.
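
For step 7, the XMP packets can also be pulled out of the file programmatically instead of scrolling through it in a text editor. What follows is a minimal Java sketch, not part of JabRef: the class `XmpPacketDump` and its method are made up for illustration, and it assumes the XMP stream is stored uncompressed in the PDF (common, but not guaranteed). It only looks for the `<?xpacket begin ... ?>` / `<?xpacket end ... ?>` wrapper defined by the XMP specification; it does not show the PDF document information entries or custom properties.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper only; not JabRef code.
class XmpPacketDump {

    // Returns every substring that starts at an "<?xpacket begin" marker
    // and ends with the closing "?>" of the next "<?xpacket end" marker.
    static List<String> extractPackets(String raw) {
        List<String> packets = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = raw.indexOf("<?xpacket begin", from);
            if (start < 0) {
                break;
            }
            int endMarker = raw.indexOf("<?xpacket end", start);
            if (endMarker < 0) {
                break; // unterminated packet: stop rather than guess
            }
            int close = raw.indexOf("?>", endMarker);
            int end = (close < 0) ? raw.length() : close + 2;
            packets.add(raw.substring(start, end));
            from = end;
        }
        return packets;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XmpPacketDump <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // ISO-8859-1 maps each byte to one char, so binary PDF content cannot
        // break decoding; non-ASCII field values may print garbled.
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        for (String packet : extractPackets(raw)) {
            System.out.println(packet);
            System.out.println("-----");
        }
    }
}
```

Any filtered field name or value that shows up in the printed packets was written to the XMP section despite the preference. An empty result only means that no uncompressed packet was found.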

My suggestions are:
- Let the user choose which fields are to be written in metadata (with a sensible default) and in which namespace.
- Give the user a mechanism to clear the metadata, optionally with a preview.

C Bhushan - 2014-04-20

Unfortunately, this bug still exists in version 2.10 (released March 11th, 2014). I could reproduce the bug on Linux (Ubuntu 12.04). The XMP metadata was inspected using the command line tools pdfinfo and exiftool.

It would be great to have this fixed. Thanks.

- Adrian Daerr - 2014-04-22
  
  Could you please be more specific as to what is still buggy? Are
  fields written to the XMP-metadata section of the PDF file which should
  not be? Are previously written fields not cleared upon a change of
  preferences and re-writing of the XMP data? What exactly goes wrong?
  
  I tried to reproduce the bug: I created a new article entry in
  2.10dev, linked it to a random PDF file on my hard drive and
  XMP-tagged the file. None of the fields listed in the XMP Export
  Privacy Settings were written. Then I added "bibtexkey" to the list of
  fields not to be written, in Preferences->XMP Export Privacy Settings.
  Re-tagging the PDF file cleared the bibtexkey from its XMP metadata
  (both from the dc and bibtex namespaces). The bug described above by
  Ambrogio Oliva (and at the end of the OP) therefore appears to be
  solved (since commit 5fe12834, merged 2013-03-12).
  
  If you do see a different behaviour, please provide step-by-step
  instructions on how to reproduce it.
  
  This being said, there is a bug in the current tagging code, albeit
  not with the XMP metadata: all bibtex fields appear to be written to
  the PDF metadata object (the one containing the PDF
  /Author, /Title, ... information)! I am quite sure this bug is somewhat
  new, but I won't have time to investigate for some time. If this is
  the private information leakage that you found, may I ask you to open
  a new bug report? It is a serious bug, but I think it is unrelated to the
  XMP-metadata code.
  
  - Adrian Daerr - 2014-04-23
    
    Never mind the last two sentences of my previous comment. The bug was triggered by a menu entry called "Write XMP metadata to PDFs", and for that reason alone it is related to the XMP code (even if, as I explained, the buggy filtering did not pertain to the XMP part, but to the PDF-specific metadata).
    
    Anyhow, I did finally look into the matter and found it quick to fix. The corrected JabRef version can be cloned from
    https://github.com/adaerr/jabref.git
    or you can wait for the maintainers to pull the commits into the original project. It would be helpful to have your confirmation that the bug is indeed corrected, so that we can close this bug.
    
    The following is a copy of my pull-request, for those interested in more details about the changes:
    
    PDF-file metadata: Privacy Filtering all metadata
    
    This pull-request pertains to the addition of metadata to PDF files associated with entries, as triggered by the menu entry "Write XMP metadata to PDFs" in the "Tools" menu. XMP is an extremely interesting feature that allows tagging PDF files (amongst others) with automatically retrievable metadata in much the same way mp3-tags allow adding title/author/... information to mp3 music files. Actually JabRef exports the metadata not only to two XMP namespaces (Dublin Core and a custom JabRef namespace), but also to the PDF DocumentInformation Object.
    
    Practically from the beginning of the XMP-writing capabilities of JabRef, Christopher Oezbek had added privacy filtering for the XMP-tagging of PDF files with data from the bibtex record, meaning that the user could define a list of fields (in Preferences->XMP metadata) which should not be exported to the PDF file. Unfortunately, the filtering was incomplete: JabRef exports the metadata in three different forms, only one of which was originally filtered. In 2013 filtering was extended to both XMP namespaces, but JabRef still exported all fields into the PDF DocumentInfo object. The two present commits correct this problem. The first (b45316f) prevents private fields from being exported to the PDF DocumentInfo. The second one more aggressively erases these fields even if they already exist in the PDF document.
    
    The deletion of existing fields might be debatable. It seems the right thing to do for fields clearly generated by JabRef (viz. those prefixed by "jabref/"), but there are four fields which might be of other origin (Author, Title, Subject and Keywords). Making a systematic exception for these four fields, i.e. not erasing them even if they are privacy filtered, is a bad idea and violates the principle of least surprise. This is why the second commit makes no exception. Deactivating the erasure for the four generic fields could however easily be added as an option in the XMP export preferences if it is judged important. The current behaviour has the advantage of reliably correcting PDF files previously tagged with a buggy privacy filtering.
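
    To make the intended behaviour of the two commits concrete, here is a rough, self-contained Java sketch. It is not the code from the commits and does not use JabRef's or any PDF library's API: the DocumentInfo object is modelled as a plain map, and the class `DocInfoPrivacySketch`, its methods, the `eraseExisting` flag and the field-to-key mapping (in particular abstract to Subject) are assumptions made for illustration. The "jabref/" prefix is the one mentioned above.

    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;

    // Illustrative sketch only; names and mapping are assumptions, not JabRef's code.
    class DocInfoPrivacySketch {

        // Prefix for JabRef-generated DocumentInfo keys, as named in the text above.
        static final String PREFIX = "jabref/";

        // Assumed mapping of bibtex fields onto the four generic DocumentInfo keys.
        static String docInfoKey(String field) {
            switch (field) {
                case "author":   return "Author";
                case "title":    return "Title";
                case "abstract": return "Subject";  // assumption for this sketch
                case "keywords": return "Keywords";
                default:         return PREFIX + field;
            }
        }

        // Returns the DocumentInfo content after tagging.
        // Private fields are never written (first commit); with eraseExisting,
        // their keys are also removed if already present (second commit).
        static Map<String, String> tag(Map<String, String> existing,
                                       Map<String, String> entry,
                                       Set<String> privateFields,
                                       boolean eraseExisting) {
            Map<String, String> info = new LinkedHashMap<>(existing);
            if (eraseExisting) {
                for (String field : privateFields) {
                    info.remove(docInfoKey(field));
                }
            }
            for (Map.Entry<String, String> e : entry.entrySet()) {
                if (!privateFields.contains(e.getKey())) {
                    info.put(docInfoKey(e.getKey()), e.getValue());
                }
            }
            return info;
        }

        public static void main(String[] args) {
            Map<String, String> existing = new LinkedHashMap<>();
            existing.put("Producer", "some PDF printer");
            existing.put("jabref/review", "leaked by an earlier tagging run");

            Map<String, String> entry = new LinkedHashMap<>();
            entry.put("title", "An Example Title");
            entry.put("review", "private notes");

            Set<String> privateFields = new TreeSet<>();
            privateFields.add("review");

            System.out.println(tag(existing, entry, privateFields, true));
        }
    }
    ```

    With `eraseExisting` set to false the sketch corresponds to the first commit alone: nothing private is added, but a previously leaked "jabref/review" entry stays in the file. The option discussed above (sparing Author, Title, Subject and Keywords) would amount to skipping the removal for those four keys.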
    
    If these commits are pulled into the master branch and confirmed to work, the bug #869 on the sourceforge tracker:
    https://sourceforge.net/p/jabref/bugs/869/
    can be closed.
    

Oliver Kopp - 2015-05-20

status: open --> closed

Group: --> next release

Oliver Kopp - 2015-05-20

Thanks for the PR. Already integrated in JabRef 2.11 beta.
