We have been evaluating the Picture Metadata Toolkit in the hope it could be used for accessing and managing meta data from images in a photographic application.
Clearly a lot of work has been done on the toolkit, and I can see that it has the potential to be the primary interface of choice for meta data.
However, as it stands, PMT seems to suffer from a number of critical flaws that limit its usability.
We would like to use the PMT toolkit if these problems can be overcome (or if I am simply missing something obvious and not seeing solutions to the problems we ran into).
So any comments on the following would be appreciated:
Critical: exception when reading unknown meta data
---------
PMT seems to be built to a "fail completely" approach instead of a "fail safe" approach. Specifically, if meta data is not exactly to its liking, an exception is thrown and *none* of the meta data is returned.
For example, the IPTC/NAA (0x83BB) tag is stored as long data in some files (Nikon images et al). But PMT expects 8-bit values for ITPC_NAA, so it throws an exception for this tag. It would be OK if the tag was simply not read, but the result is that any image file with the IPTC/NAA tag can't be read by PMT.
Granted I could modify the source code to fix this, but perusal of the code suggests that this problem runs right through the PMT code.
The problem is that images come from a wide range of sources. It is very likely that some images will have data that is not understood by PMT, or invalid by PMT's definition. The current approach in PMT is that all the data must be perfect, otherwise none is returned.
I feel a better approach is to take the philosophy of "if we can return any data, we will" rather than "if there is anything at all we don't understand, we will return no data".
PMT failed (threw an exception) on about 30% of the different TIFF / JPEG files we tried it on (from different cameras/scanners/etc). So this problem looks to be quite serious.
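To make the "if we can return any data, we will" idea concrete, here is a minimal sketch of a tolerant read loop. It is not PMT code: RawTag, ReadResult, expectedTypes() and readTolerant() are hypothetical names, and the type table holds illustrative entries only. A tag whose field type does not match the expectation is skipped and logged, and every other tag is still returned.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of one raw tag as found in a file.
struct RawTag {
    std::uint16_t id;                 // tag number, e.g. 0x83BB
    std::uint16_t type;               // TIFF field type: 1 = BYTE, 2 = ASCII, 4 = LONG
    std::vector<std::uint8_t> bytes;  // undecoded payload
};

struct ReadResult {
    std::map<std::uint16_t, std::vector<std::uint8_t>> values;  // tags that were accepted
    std::vector<std::string> warnings;                          // tags that were skipped
};

// Field type the reader expects for each tag it knows (illustrative entries only).
inline const std::map<std::uint16_t, std::uint16_t>& expectedTypes() {
    static const std::map<std::uint16_t, std::uint16_t> table = {
        {0x010F, 2},  // Make: ASCII
        {0x83BB, 1},  // IPTC/NAA: this reader expects BYTE
    };
    return table;
}

// Returns everything it can. A known tag with an unexpected type is skipped
// and logged; unknown tags are passed through undecoded. Nothing aborts the read.
inline ReadResult readTolerant(const std::vector<RawTag>& tags) {
    ReadResult result;
    for (const RawTag& tag : tags) {
        auto it = expectedTypes().find(tag.id);
        if (it != expectedTypes().end() && it->second != tag.type) {
            result.warnings.push_back("tag " + std::to_string(tag.id) +
                                      ": unexpected type " + std::to_string(tag.type) +
                                      ", skipped");
            continue;
        }
        result.values[tag.id] = tag.bytes;
    }
    return result;
}
```

With this shape, a file carrying a LONG-typed 0x83BB tag loses only that one tag, and the caller can inspect the warnings to see what was dropped.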
Critical: Most meta data is not read by PMT
---------
In tests, I found that PMT returned only 40% to 60% of the meta data present in files. Specifically, it makes no attempt to read manufacturer specific data. This is a major problem. Nikon, Canon, Sony, Adobe and others all put a lot of the critical meta data inside their own tags. These tags are well known, yet PMT simply returns the data as an undecoded block of manufacturer specific data.
Whilst it is true that much of this data is not in the open specs, the fact remains that Nikon and Canon alone account for a major portion of professional digital camera imagery, and it is essential to be able to extract their meta data.
The limited approach taken by PMT in my view defeats the major purpose for the toolkit.
It is worth comparing output from ExifReader et al to PMT - the difference in valuable information returned is significant.
Question: Am I right in thinking that PMT can be extended to add new tags? What about manufacturer specific tags, which typically have the manufacturer name as the first characters (e.g. Nikon) and then the rest of the data as a sub-TIFF structure? Can these be added just by extending the schema, or are software code changes required?
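As a rough illustration of the manufacturer prefix layout described in the question, the sketch below checks whether a block starts with a known vendor name followed by a NUL byte and reports where the remaining data begins. The names MakerNoteInfo and identifyMakerNote() are hypothetical, and the layout is an assumption: what follows the prefix differs between vendors and models, so this sketch does not try to parse it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Result of sniffing a manufacturer specific block (hypothetical type).
struct MakerNoteInfo {
    bool recognised;            // true if a known vendor prefix was found
    std::string vendor;         // the prefix that matched, e.g. "Nikon"
    std::size_t payloadOffset;  // first byte after the prefix and its NUL terminator
};

// Assumes the block starts with an ASCII vendor name followed by a NUL byte.
// Whatever follows (version bytes, an embedded TIFF structure) is left to the caller.
inline MakerNoteInfo identifyMakerNote(const std::vector<std::uint8_t>& note,
                                       const std::vector<std::string>& knownVendors) {
    MakerNoteInfo info{false, "", 0};
    for (const std::string& vendor : knownVendors) {
        const std::size_t needed = vendor.size() + 1;  // prefix plus NUL
        if (note.size() < needed) continue;
        bool match = true;
        for (std::size_t i = 0; i < vendor.size(); ++i) {
            if (note[i] != static_cast<std::uint8_t>(vendor[i])) {
                match = false;
                break;
            }
        }
        if (!match || note[vendor.size()] != 0) continue;
        info.recognised = true;
        info.vendor = vendor;
        info.payloadOffset = needed;
        return info;
    }
    return info;  // unknown block: caller keeps it as an undecoded blob
}
```

An unrecognised block is reported as such instead of raising an error, so the rest of the file can still be read.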
I ran into a number of other problems, all of which can be worked around. The above two are show stoppers for us, and probably would also be for other software vendors.
Any feedback or suggestions would be greatly appreciated, as if we can overcome the two critical issues, PMT would be our toolkit of choice for meta data.
Thanks!
Stuart
Thank you for taking the time to evaluate PMT and, more importantly, provide the feedback.
I'll address the two critical issues you have identified. First, regarding the throwing of errors while accessing metadata: PMT Accessors have two modes of operation. One aborts the read and throws an error; the other logs the error and continues the read. The mode is set through the "throwErrors()" method on the PmtAccessor class, e.g., acc->throwErrors() = false; will prevent errors from being thrown. Also, not throwing errors is the default mode, so perhaps you have exposed a bug.
Regarding the second critical issue, the default set of definitions purposely covers the standard set of metadata definitions across Exif and Tiff (plus a few important extras). More importantly, PMT is completely extensible -- what is needed is better documentation regarding this fact. In fact, adding new metadata definitions can generally be accomplished without programming changes; one need only supply a schema (the metadata definitions) and a translation table (which states how to access the metadata in Exif and Tiff). Part of the translation table is a reference to a type translator -- PMT provides built-in translators for all the primitive data types (uint8, int8, etc., and vectors of all the primitive types). If the metadata stored in the file is structured, then you are faced with the only situation in which you will need to write code, i.e., you will need to write a translator for the structured metadata.
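The schema plus translation table arrangement can be pictured with a small sketch. This is not PMT's actual data model or file format: TranslationEntry, TranslationTable, translateTag() and the two translators are hypothetical names, chosen only to show how a table entry can pair a schema name with a reference to a type translator, so that new primitive-typed tags need a new table row and no new code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
// A type translator turns the raw bytes of a tag into a printable value.
using Translator = std::function<std::string(const Bytes&)>;

// Stand-in for a built-in translator: vector of uint8 values.
inline std::string translateUint8Vector(const Bytes& raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (i != 0) out += ' ';
        out += std::to_string(raw[i]);
    }
    return out;
}

// Stand-in for a built-in translator: NUL-terminated ASCII text.
inline std::string translateAscii(const Bytes& raw) {
    std::string out;
    for (std::uint8_t b : raw) {
        if (b == 0) break;
        out += static_cast<char>(b);
    }
    return out;
}

// One row of the (hypothetical) translation table.
struct TranslationEntry {
    std::string name;      // name the schema gives this piece of metadata
    Translator translate;  // reference to the type translator to use
};

using TranslationTable = std::map<std::uint16_t, TranslationEntry>;

// Looks the tag up and translates it. Returns false for tags not in the table.
inline bool translateTag(const TranslationTable& table, std::uint16_t tagId,
                         const Bytes& raw, std::string& name, std::string& value) {
    auto it = table.find(tagId);
    if (it == table.end()) return false;
    assert(it->second.translate);  // every table row must name a translator
    name = it->second.name;
    value = it->second.translate(raw);
    return true;
}
```

Structured metadata would be handled by registering a hand-written Translator for that tag, which matches the one case described above where code has to be written.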
Note that PMT currently does not support two schema constructs that would make adding new definitions relatively straightforward: "include" and "redefine". These are at the top of our list for implementation.
Hope this is helpful and please continue the dialog.
- George
Thanks for your detailed response. And once again, let
me emphasize that the work on PMT and OpenTIFF that you
and others have authored is excellent. So please accept
my comments in this light.
First, regarding the exceptions:
Although throwErrors() was false, I found a number of
places where errors are being thrown without testing
against throwErrors() first. From memory, line 251
of PmtTiffAccessor.cpp bit me, and a check of source
showed quite a few others without tests. If the library
took a more defensive approach to errors (the "I really
*won't* fail unless the hard disk crashes and the CPU
melts down" kind of positive approach :-) it would help
when parsing all those weird and wonderful image files
out there.
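One way to avoid error sites that forget to test the flag is to route every error through a single function. The sketch below is a hypothetical stand-in, not PMT's PmtAccessor: ErrorPolicy, report() and checkEntryCount() are invented names, and only the throwErrors() accessor style is borrowed from the thread.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an accessor's error handling.
class ErrorPolicy {
public:
    // Same usage style as in the thread: policy.throwErrors() = false;
    bool& throwErrors() { return throwErrors_; }

    // Every error site calls report() instead of throwing directly,
    // so the flag is honoured in one place only.
    void report(const std::string& message) {
        if (throwErrors_) throw std::runtime_error(message);
        log_.push_back(message);
    }

    const std::vector<std::string>& log() const { return log_; }

private:
    bool throwErrors_ = false;  // not throwing is the default mode
    std::vector<std::string> log_;
};

// Illustrative parse step: an implausible count is reported and the caller
// can skip this directory and carry on with the rest of the file.
inline bool checkEntryCount(ErrorPolicy& policy, long count) {
    if (count < 0 || count > 65535) {
        policy.report("implausible directory entry count");
        return false;
    }
    return true;
}
```

With the check centralised, a search of the source for bare throw statements is enough to find any site that bypasses the policy.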
Regarding the default tags being parsed:
IMHO, it would be ideal if the library tried to read
every possible tag from every possible sub-format
(particularly from "maker notes"), as much of the
critical information is buried in those elements.
Much of this information is available (exifdump.py
does a good job on many of them), and I feel it would
help increase the appeal and widespread use of PMT,
so hopefully we might end up with some standards in
this crazy area of EXIF/TIFF/EP/IPTC/makernotes.
I take your point that many of these can be fairly
easily added to PMT; however, people are more likely
to consider alternatives if the standard PMT code
defines only the "pure" subset of EXIF/TIFF tags.
After looking closely at building on PMT, and contributing
additional PmtAccessor classes to the code, I ended
up writing my own EXIF code (hence the delay
in responding to your feedback).
My requirements were:
- Very fast reading & access of tag info, so 1,000+
  images can be read quickly. This pretty
  much meant a direct coded approach (more on
  solving this problem later).
- Gather/scatter approach, where information
  stuck in all sorts of obscure places is
  gathered into a simple, clean hierarchy.
  Writes in turn scatter it back out to wherever
  it has to hide in makernotes or whatnot.
  The idea was to present users & programmers
  the data in a very clean structure; never mind
  what magic had to happen to get it that way.
- Tracking down the thumbnail, regardless of where
  it might be hidden (for example buried in
  a sub element of a makernote), and presenting
  the thumbnail (if it exists) as a simple
  unpacked image regardless of source.
- Fast XML style reading/writing of meta data.
- A simple way to implement the thing without
  going mad.
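The gather/scatter requirement can be sketched roughly as follows. This is only a guess at the shape of the idea, not the code described in this post: Location, RawStore, gather() and scatter() are hypothetical names, and the container names and tag numbers used with them are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A location is a (container, tag) pair; container names are hypothetical.
using Location = std::pair<std::string, std::uint16_t>;
// Raw values as found in the file, keyed by where they live.
using RawStore = std::map<Location, std::string>;

// Gather: return the first candidate location that actually holds a value.
inline bool gather(const RawStore& store, const std::vector<Location>& candidates,
                   std::string& value) {
    for (const Location& where : candidates) {
        auto it = store.find(where);
        if (it != store.end()) {
            value = it->second;
            return true;
        }
    }
    return false;
}

// Scatter: write the value back to wherever it was found, or to the
// first candidate location if the file did not hold it before.
inline void scatter(RawStore& store, const std::vector<Location>& candidates,
                    const std::string& value) {
    if (candidates.empty()) return;
    for (const Location& where : candidates) {
        auto it = store.find(where);
        if (it != store.end()) {
            it->second = value;
            return;
        }
    }
    store[candidates.front()] = value;
}
```

Callers then work with one clean field per concept and a list of candidate locations, without needing to know which container a given camera used.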
I ended up creating a simple spreadsheet which defined
my structure, including where data comes from.
A parser breaks the spreadsheet down into C++ classes,
which are then compiled with a few hand coded helper classes,
sorting the whole thing out with a minimum of pain.
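For readers curious what a spreadsheet-to-C++ step might look like, here is a minimal sketch. The column layout (member name, C++ type, source) and the names FieldRow and generateStruct() are assumptions made for illustration; the actual spreadsheet and parser are not shown in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One spreadsheet row; the three columns are assumed, not taken from the thread.
struct FieldRow {
    std::string name;     // member name in the generated struct
    std::string cppType;  // C++ type of the member
    std::string source;   // where the value comes from, emitted as a comment
};

// Emits the source text of a plain struct with one member per row.
inline std::string generateStruct(const std::string& structName,
                                  const std::vector<FieldRow>& rows) {
    assert(!structName.empty());
    std::string out = "struct " + structName + " {\n";
    for (const FieldRow& row : rows) {
        out += "    " + row.cppType + " " + row.name + ";  // from " + row.source + "\n";
    }
    out += "};\n";
    return out;
}
```

Because the output is ordinary compiled C++ with direct member access, a generated layout like this fits the fast reading requirement better than a table interpreted at run time.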
More importantly (for my application), reading
and accessing image meta data is very fast.
Anyway, I learnt more about meta tags than I ever
really wanted to know, and gained even more respect
for your work.
Hope the PMT code continues to mature.
Kind regards,
Stuart