
PMT - close but no cigar?

2002-05-12
2002-05-22
  • Nobody/Anonymous

    We have been evaluating the Picture Metadata Toolkit in the hope it could be used for accessing and managing meta data from images in a photographic application.

    Clearly a lot of work has been done on the toolkit, and I can see that it has the potential to be the primary interface of choice for meta data.

    However, as it stands, PMT seems to suffer from a number of critical flaws that limit its usability.

    We would like to use PMT if these problems can be overcome (or if I am simply missing something obvious and solutions to the problems we ran into already exist).

    So any comments on the following would be appreciated:

    Critical: exception when reading unknown meta data
    ---------
    PMT seems to be built around a "fail completely" approach instead of a "fail safe" approach. Specifically, if meta data is not exactly to its liking, an exception is thrown and *none* of the meta data is returned.

    For example, the IPTC/NAA (0x83BB) tag returns long data (Nikon images et al.). But PMT expects 8-bit values for ITPC_NAA, so it throws an exception for this tag.  It would be OK if just that tag was not read, but the result is that any image file with the IPTC/NAA tag can't be read by PMT at all.

    Granted I could modify the source code to fix this, but perusal of the code suggests that this problem is right through the PMT code.

    The problem is that images come from a wide range of sources.  It is very likely that some images will have data that is not understood by PMT, or invalid by PMT's definition.  The current approach in PMT is that all the data must be perfect, otherwise none is returned.

    I feel a better approach is to take the philosophy of "if we can return any data, we will" rather than "if there is anything at all we don't understand, we will return no data".

    PMT failed (threw an exception) on about 30% of the different TIFF / JPEG files we tried it on (from different cameras/scanners/etc).  So this problem looks to be quite serious.
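
    To make the "fail safe" idea concrete, here is a minimal sketch. It is not PMT code: RawTag, ReadResult, decodeTag and readAllTags are hypothetical names invented for illustration, and the decoder is deliberately simplistic. The point is only that a tag which cannot be decoded is logged and skipped while everything else is still returned.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical raw tag as read from a TIFF/Exif directory (not a PMT type).
struct RawTag {
    std::uint16_t id;                 // e.g. 0x83BB for IPTC/NAA
    std::uint16_t type;               // TIFF field type: 1 = BYTE, 2 = ASCII, 4 = LONG
    std::vector<std::uint8_t> bytes;  // undecoded value bytes
};

struct ReadResult {
    std::map<std::uint16_t, std::string> values;  // tags that decoded cleanly
    std::vector<std::string> warnings;            // one entry per skipped tag
};

// Hypothetical decoder: throws when a tag is not in the form it expects.
inline std::string decodeTag(const RawTag& tag) {
    if (tag.id == 0x83BB && tag.type != 1) {
        throw std::runtime_error("IPTC/NAA: expected BYTE data");
    }
    return std::string(tag.bytes.begin(), tag.bytes.end());
}

// Fail-safe read: a bad tag is recorded as a warning and skipped,
// and every other tag is still returned to the caller.
inline ReadResult readAllTags(const std::vector<RawTag>& tags) {
    ReadResult result;
    for (const RawTag& tag : tags) {
        try {
            result.values[tag.id] = decodeTag(tag);
        } catch (const std::exception& e) {
            result.warnings.push_back(
                "tag " + std::to_string(tag.id) + ": " + e.what());
        }
    }
    return result;
}
```

    With this shape, a file whose IPTC/NAA tag is stored as LONG would still yield its other tags, plus a warning naming the one that was skipped.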

    Critical: Most meta data is not read by PMT
    ---------
    In tests, I found that PMT returned 40% to 60% of the meta data present in files.  Specifically, it makes no attempt to read manufacturer-specific data.  This is a major problem.  Nikon, Canon, Sony, Adobe and others all put a lot of the critical meta data inside their own tags.  These tags are well known, yet PMT simply returns the data as an undecoded block of manufacturer-specific data.

    Whilst it is true that much of this data is not in the open specs, the fact remains that Nikon and Canon alone account for a major portion of professional digital camera imagery, and it is essential to be able to extract their meta data.

    The limited approach taken by PMT in my view defeats the major purpose for the toolkit.

    It is worth comparing output from ExifReader et al to PMT - the difference in valuable information returned is significant.

    Question: Am I right in thinking that PMT can be extended to add new tags? What about manufacturer-specific tags, which typically have the manufacturer name as the first characters (e.g. Nikon) and then the rest of the data as a sub-TIFF tag?  Can these be added just by extending the schema, or are software code changes required?
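
    For reference, the layout described in the question can be sketched as below. This is an illustration only, not PMT code: MakerNoteSplit and splitMakerNote are hypothetical names, and the sketch assumes the name is printable ASCII terminated by a NUL byte. What follows the name varies by manufacturer and camera model, so the sketch only splits off the name and reports where the remaining data starts.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type (not part of PMT).
struct MakerNoteSplit {
    std::string vendor;         // leading ASCII name, empty if none was found
    std::size_t payloadOffset;  // index of the first byte after the name
};

// Split a maker note into "vendor name" and "everything else".
// Assumption: the name is printable ASCII followed by a NUL terminator.
inline MakerNoteSplit splitMakerNote(const std::vector<std::uint8_t>& note) {
    std::string vendor;
    std::size_t i = 0;
    while (i < note.size() && note[i] >= 0x20 && note[i] < 0x7F) {
        vendor.push_back(static_cast<char>(note[i]));
        ++i;
    }
    // No name, or no NUL terminator: treat the block as having no prefix.
    if (i == 0 || i >= note.size() || note[i] != 0) {
        return MakerNoteSplit{"", 0};
    }
    return MakerNoteSplit{vendor, i + 1};
}
```

    A block starting with the bytes "Nikon" and a NUL would give vendor "Nikon" and a payload offset of 6; a block that starts with binary data would give an empty vendor and offset 0.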

    I ran into a number of other problems, all of which can be worked around. The above two are show stoppers for us, and probably would also be for other software vendors.

    Any feedback or suggestions would be greatly appreciated, as if we can overcome the two critical issues, PMT would be our toolkit of choice for meta data.

    Thanks!

    Stuart

     
    • George Sotak

      George Sotak - 2002-05-13

      Stuart,

      Thank you for taking the time to evaluate PMT and, more importantly, provide the feedback.

      I'll address the two critical issues you have identified. First, regarding the throwing of errors while accessing metadata: PMT Accessors have two modes of operation. One aborts the read and throws an error; the other logs the error and continues the read. The mode is set through the "throwErrors()" method on the PmtAccessor class, e.g., acc->throwErrors() = false; will prevent errors from being thrown. Also, not throwing errors is the default mode, so perhaps you have exposed a bug.
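
      A minimal sketch of how such a switch could behave is shown below. MockAccessor is a stand-in written for this illustration and is not the real PmtAccessor; reportError and errorLog are hypothetical names. The only detail taken from the description above is that throwErrors() can be assigned to, which implies it returns a reference, and that not throwing is the default.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for illustration only; this is NOT the real PmtAccessor class.
class MockAccessor {
public:
    // Returning a reference is what makes "acc->throwErrors() = false;" legal.
    bool& throwErrors() { return throwErrors_; }

    const std::vector<std::string>& errorLog() const { return log_; }

    // Hypothetical single funnel for all errors: it either throws or logs,
    // depending on the current mode.
    void reportError(const std::string& message) {
        if (throwErrors_) {
            throw std::runtime_error(message);
        }
        log_.push_back(message);
    }

private:
    bool throwErrors_ = false;      // not throwing is the default mode
    std::vector<std::string> log_;  // errors recorded while not throwing
};
```

      If every error path went through one function like reportError, the mode would be honoured everywhere; a throw placed directly in reading code would bypass the flag.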

      Regarding the second critical error, the default set of definitions purposely covers the standard set of metadata definitions across Exif and Tiff (plus a few important extras). More importantly, PMT is completely extensible -- what is needed is better documentation regarding this fact. In fact, adding new metadata definitions can generally be accomplished without programming changes, one need only supply a schema (the metadata definitions) and a translation table (which states how to access the metadata in Exif and Tiff). Part of the translation table is a reference to a type translator -- Pmt provides built-in translators for all the primitive data types (uint8, int8, etc., and vectors of all the primitive types). If the metadata stored in the file is structured, then you are faced with the only situation in which you will need to write code, i.e., you will need to write a translator for the structured metadata.
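
      The division of labour between a translation table and type translators can be sketched roughly as follows. This is not PMT's actual schema or API: TranslationEntry, TranslationTable, asciiTranslator, uint16LeTranslator and applyTable are all hypothetical names, and the real table format is not shown in this thread. The sketch only illustrates the idea that new definitions are data (a table row pointing at an existing translator), while structured data needs a new translator function.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Translator = std::function<std::string(const Bytes&)>;

// Hypothetical table row: which metadata name a tag maps to, and how to decode it.
struct TranslationEntry {
    std::string metadataName;  // name as it would appear in the schema
    Translator translate;      // type translator for the stored bytes
};

using TranslationTable = std::map<std::uint16_t, TranslationEntry>;

// "Built-in" translator: ASCII bytes, trailing NULs dropped.
inline std::string asciiTranslator(const Bytes& b) {
    std::string s(b.begin(), b.end());
    while (!s.empty() && s.back() == '\0') s.pop_back();
    return s;
}

// "Built-in" translator: one little-endian uint16, rendered as text.
inline std::string uint16LeTranslator(const Bytes& b) {
    if (b.size() < 2) return "";
    return std::to_string(b[0] | (b[1] << 8));
}

// Decode every raw tag that has a table entry; tags without one are ignored.
inline std::map<std::string, std::string> applyTable(
    const TranslationTable& table,
    const std::map<std::uint16_t, Bytes>& rawTags) {
    std::map<std::string, std::string> out;
    for (const auto& kv : rawTags) {
        auto it = table.find(kv.first);
        if (it != table.end()) {
            out[it->second.metadataName] = it->second.translate(kv.second);
        }
    }
    return out;
}
```

      Under this arrangement, supporting a new primitive-typed tag means adding one row to the table, and only a structured value calls for writing another function of the Translator type.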

      Note that Pmt currently does not support two schema constructs that would make adding new definitions relatively straightforward: "include" and "redefine". These are at the top of our list for implementation.

      Hope this is helpful and please continue the dialog.

      - George

       
    • Nobody/Anonymous

      George,

      Thanks for your detailed response.  And once again, let
      me emphasize that the work on PMT and OpenTIFF that you
      and others have authored is excellent. So please accept
      my comments in this light.

      First, regarding the exceptions:
      Although throwErrors() was false, I found a number of
      places where errors are being thrown without testing
      against throwErrors() first.  From memory, line 251
      of PmtTiffAccessor.cpp bit me, and a check of source
      showed quite a few others without tests.  If the library
      took a more defensive approach to errors (the "I really
      *won't* fail unless the hard disk crashes and the CPU
      melts down" kind of positive approach :-) it would help
      when parsing all those weird and wonderful image files
      out there.

      Regarding the default tags being parsed:
      IMHO, it would be ideal if the library tried to read
      every possible tag from every possible sub-format
      (particularly from "maker notes"), as much of the
      critical information is buried in those elements.
      Much of this information is available (exifdump.py
      does a good job on many of them), and I feel it would
      help increase the appeal and widespread use of PMT,
      so hopefully we might end up with some standards in
      this crazy area of EXIF/TIFF/EP/IPTC/makernotes.

      I take your point that many of these can be fairly
      easily added to PMT, however people are more likely
      to consider alternatives if the standard PMT code
      defines only the "pure" subset of EXIF/TIFF tags.

      After looking closely at building on PMT, and contributing
      additional PmtAccessor classes to the code, I ended
      up writing my own EXIF code (hence the delay
      in responding to your feedback).

      My requirements were:

      -    Very fast reading & access of tag info, so 1,000+
          images can be read quickly. This pretty
          much meant a direct coded approach (more on
          solving this problem later).

      -    Gather/scatter approach, where information
          stuck in all sorts of obscure places is
          gathered into a simple, clean, hierarchy.
          Writes in turn scatter it back out to wherever
          it has to hide in makernotes or whatnot.
          The idea was to present users & programmers
          the data in a very clean structure; never mind
          what magic had to happen to get it that way.

      -    Tracking down the thumbnail, regardless of where
          it might be hidden (for example buried in
          a sub element of a makernote), and presenting
          the thumbnail (if it exists) as a simple
          unpacked image regardless of source.

      -    Fast XML style reading/writing of meta data.

      -    A simple way to implement the thing without
          going mad.

      I ended up creating a simple spreadsheet which defined
      my structure, including where data comes from.
      A parser breaks the spreadsheet down into C++ classes,
      which are then compiled with a few hand coded helper classes,
      sorting the whole thing out with a minimum of pain.
      More importantly (for my application), reading
      and accessing image meta data is very fast.
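
      The "gather" half of that idea can be sketched as a lookup table. This is an illustration under stated assumptions, not the generated code described above: FieldRow, RawMetadata and gather are hypothetical names, the source labels used with them are made up, and the lookup happens at run time here, whereas the approach described above compiles the spreadsheet into direct C++ classes for speed.

```cpp
#include <map>
#include <string>
#include <vector>

// One hypothetical spreadsheet row: a clean field name and the places
// to look for its value, in priority order.
struct FieldRow {
    std::string cleanName;
    std::vector<std::string> sources;
};

// Raw metadata keyed by a made-up source label, e.g. "MakerNote.Nikon.ISO".
using RawMetadata = std::map<std::string, std::string>;

// Gather: for each clean field, take the first source that is present.
inline std::map<std::string, std::string> gather(
    const std::vector<FieldRow>& rows, const RawMetadata& raw) {
    std::map<std::string, std::string> clean;
    for (const FieldRow& row : rows) {
        for (const std::string& source : row.sources) {
            auto it = raw.find(source);
            if (it != raw.end()) {
                clean[row.cleanName] = it->second;
                break;  // higher-priority source wins
            }
        }
    }
    return clean;
}
```

      The "scatter" direction on write would walk the same rows in reverse, sending each clean value back to the source it was read from.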

      Anyway, I learnt more about meta tags than I ever
      really wanted to know, and gained even more respect
      for your work.

      Hope the PMT code continues to mature.

      Kind regards,

      Stuart

       
