Thread: [Pmt-exif] Re: Exif and PNG
From: Adam M. C. <am...@cs...> - 2000-08-07 02:48:58

Glenn Randers-Pehrson <gl...@ho...> wrote:

> I still don't much like the "\n=" having a dual purpose of
> keyword-terminator and escaped-newline.

We're still not on the same wavelength. Let me once more clarify how I
view my syntax, and then I'll propose a restriction on your syntax that
makes it more palatable to me (perhaps even more attractive than my
own).

I don't think of my syntax as escaping anything; in fact the whole point
of it is to avoid the need for anything to be escaped or quoted.

Every newline character means the same thing: the end of the line. The
parser is always in the same state at the start of every line.

Here's the entire syntax spec: Every line has the form
[keyword][=valueline], where either half may be missing. If the
valueline is present, it gets appended to the value of the most recent
keyword you've seen.

So a key/value pair is not terminated by a newline. It's terminated by
the appearance of the next keyword, or the end of the chunk. Therefore
it is not necessary to escape newlines.

Your counter-proposal has a quirk that I'm having a hard time making
sense of:

firstkey=one
secondkey=two
=
=three
thirdkey=
=four
=five

If the value of secondkey is three lines with the second line blank,
then the value of thirdkey must be three lines with the first blank.
It's so obvious that it would be cruel to specify otherwise.

One way out is to forbid the form used by secondkey. You could require
that key/value pairs either use the one-line syntax, or the value *must*
be on its own lines. Then it becomes completely arbitrary whether a
line with a key and no value has an equal sign or not, and I'd have no
objection to requiring the equal sign. The syntax could then be
described:

A key/value pair has either the form keyword=value (on one line), or
keyword= on one line and zero or more instances of =valueline on
subsequent lines. The latter form allows for multi-line values.

Come to think of it, thinking of the syntax as a choice between two
separate syntaxes provides another opportunity: You could say that for
the keyword=value syntax the newline is not part of the value, but for
the =valueline syntax all the newlines in the =valueline lines are part
of the value (including the one in the last line). For EXIF data it
probably doesn't matter, but in other applications you sometimes want to
distinguish between

foo=some text               (no newline)

versus

foo=
=one line of text           (ending with a newline)

And hey, here's something neat: If you wanted the value of foo to be
the empty string, you would say:

foo=

Is that a degenerate case of the one-line syntax or the multi-line
syntax? Both! And it's still the empty string (not even a newline)
either way you look at it. If you wanted a single blank line (a single
newline character), you would say:

foo=
=

Do you still see a need to ignore up to one whitespace character after
an equal sign? With the secondkey syntax forbidden, I think it would be
an unnecessary complication.

AMC
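To make the restricted two-form syntax above concrete, here is a minimal Python sketch of a parser for it. The function name `parse_two_form`, the use of `ValueError` for rejected input, the handling of repeated keywords (the later one wins), and the decision to skip completely empty lines are all assumptions made for illustration; none of them come from the proposal.

```python
def parse_two_form(text):
    """Parse 'keyword=value' lines and 'keyword=' + '=valueline' blocks."""
    values = {}
    current = None          # keyword that may still receive =valueline lines
    for line in text.split("\n"):
        if line == "":
            continue        # nothing on this line (e.g. after the final newline)
        keyword, sep, rest = line.partition("=")
        if not sep:
            raise ValueError("every line needs an equal sign: %r" % line)
        if keyword:
            # One-line form: the newline is not part of the value.
            values[keyword] = rest
            # Only a bare 'keyword=' line may be followed by =valueline lines.
            current = keyword if rest == "" else None
        elif current is None:
            raise ValueError("=valueline without a preceding 'keyword=' line")
        else:
            # Multi-line form: each valueline keeps its newline.
            values[current] += rest + "\n"
    return values


print(parse_two_form("foo=some text\n"))           # {'foo': 'some text'}
print(parse_two_form("foo=\n=one line of text\n")) # {'foo': 'one line of text\n'}
print(parse_two_form("foo=\n"))                    # {'foo': ''}
print(parse_two_form("foo=\n=\n"))                 # {'foo': '\n'}
```

Under these assumptions the forbidden secondkey form (a value on the keyword line followed by =valueline lines) is rejected rather than silently accepted.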
From: Adam M. C. <am...@cs...> - 2000-08-09 19:57:47

Glenn Randers-Pehrson <gl...@ho...> wrote:

> But the only case that is interesting to applications is when the
> string contains multibyte values in little-endian format, so I think
> it's best to go with "L " in that case, and nothing in the others.

I don't mind combining unknown with not-applicable, but I think it's
good to keep the distinction between those and big-endian.

> BTW I wasn't familiar with the I/M flag. Does it mean Intel/Mac?

The "Description of Exif File Format" document mentions that every TIFF
file begins with either II (for Intel byte order) or MM (for Motorola
byte order).

AMC
From: Glenn Randers-P. <gl...@ho...> - 2000-08-09 21:36:29

At 07:57 PM 8/9/00 +0000, "Adam M. Costello" <am...@cs...> wrote:
>Glenn Randers-Pehrson <gl...@ho...> wrote:
>I don't mind combining unknown with not-applicable, but I think it's
>good to keep the distinction between those and big-endian.

OK. I'll go with "G" for "biG-endian", "L" for "Little-endian", and
omitted for "Unknown/NotApplicable".

>> BTW I wasn't familiar with the I/M flag. Does it mean Intel/Mac?
>
>The "Description of Exif File Format" document mentions that every TIFF
>file begins with either II (for Intel byte order) or MM (for Motorola
>byte order).

Duh. Thanks. It's in the TIFF/EP spec as well. I'll make a reference
to that in the discussion of HEX.

Glenn
From: Glenn Randers-P. <gl...@ho...> - 2000-08-07 10:40:54

At 02:48 AM 8/7/00 +0000, Adam M. Costello wrote:
>I don't think of my syntax as escaping anything; in fact the whole point
>of it is to avoid the need for anything to be escaped or quoted.
>
>Every newline character means the same thing: the end of the line. The
>parser is always in the same state at the start of every line.
>
>Here's the entire syntax spec: Every line has the form
>[keyword][=valueline], where either half may be missing. If the
>valueline is present, it gets appended to the value of the most recent
>keyword you've seen.
>
>So a key/value pair is not terminated by a newline. It's terminated by
>the appearance of the next keyword, or the end of the chunk. Therefore
>it is not necessary to escape newlines.

I like this. More completely, "...may be missing. The valueline can be
empty and it can contain any characters allowed in PNG text chunks
except for the newline."

BTW: Do you mind if I copy some of our correspondence from png-list, to
place this in context?

Glenn
From: Adam M. C. <am...@cs...> - 2000-08-07 21:20:33

Glenn Randers-Pehrson <gl...@ho...> wrote:

> Do you mind if I copy some of our correspondence from png-list, to
> place this in context?

Not at all.

> The valueline can be empty and it can contain any characters allowed
> in PNG text chunks except for the newline.

Right.

This reminds me that we need to be a little careful if we start talking
about newlines in values. If we think of a value as a sequence of
lines, then there's nothing to be confused about:

firstkey=one
secondkey=two
=three
thirdkey
=four
=five

Clearly, the value of firstkey is one line, the values of secondkey and
thirdkey are each two lines.

But if we think of a value as a sequence of characters, some of which
may be newline characters, then the question arises of whether the value
of firstkey is "one" or "one\n", and whether the value of secondkey is
"two\nthree" or "two\nthree\n", and so on. In other words, we need to
decide whether the lines of the value are separated by newline
characters or terminated by newline characters.

If we opt for terminated, then every value ends with a newline, so you
can't assign an empty string.

If we opt for separated, then you can always force an extra newline
using a blank line; for example, you could assign "two\nthree\n" using:

foo
=two
=three
=

AMC
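The difference between "separated" and "terminated" can be shown in a few lines of Python. This is only an illustrative sketch: the helper names `value_separated` and `value_terminated` are invented here, and each takes the list of valuelines already collected for one keyword.

```python
def value_separated(valuelines):
    """Newlines sit between valuelines; nothing follows the last one."""
    return "\n".join(valuelines)


def value_terminated(valuelines):
    """Every valueline, including the last, is followed by a newline."""
    return "".join(line + "\n" for line in valuelines)


print(repr(value_separated(["two", "three"])))      # 'two\nthree'
print(repr(value_terminated(["two", "three"])))     # 'two\nthree\n'

# With "separated", a trailing empty valueline forces the extra newline,
# and a single empty valueline yields the empty string.
print(repr(value_separated(["two", "three", ""])))  # 'two\nthree\n'
print(repr(value_separated([""])))                  # ''

# With "terminated", even a single empty valueline produces a newline.
print(repr(value_terminated([""])))                 # '\n'
```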
From: Glenn Randers-P. <gl...@ho...> - 2000-08-07 23:11:13

At 09:20 PM 8/7/00 +0000, Adam M. Costello wrote:
>This reminds me that we need to be a little careful if we start
>talking about newlines in values.

Yes. We must be able to encode all of these STRINGs and SUBSTRINGs
using the definitions in the proposal:

STRING or SUBSTRING with no newlines
STRING or SUBSTRING with a newline at the end
STRING or SUBSTRING with a newline at the beginning
STRING or SUBSTRING with an embedded newline
empty lines in STRING or SUBSTRING
empty STRING or SUBSTRING

Glenn
From: Glenn Randers-P. <gl...@ho...> - 2000-08-08 09:58:15

I think we need to say something about the "endian" condition of data
stored in a HEX string. The HEX data is simply copied from a TIFF/EP
tag as is, and therefore uses the same endian style as the camera uses.

One possibility is to require applications to write a PNG_ByteOrder
subkeyword (value 0: unknown, 1: BigEndian, 2: LittleEndian) if they
plan to write HEX that contains multibyte entities.

Glenn
From: Glenn Randers-P. <gl...@ho...> - 2000-08-08 10:21:11

I am considering eliminating the possibility of having multiple textExif
chunks in a PNG datastream. It seemed useful at first, but it made the
proposed eXIF chunk more complex, and I no longer think it's that useful
to be able to spread the Exif information over multiple chunks.

Glenn
From: Glenn Randers-P. <gl...@ho...> - 2000-08-08 10:33:57

The work we are doing on defining the syntax for the textExif chunk may
be useful outside the specific task of recording Exif information.
There are several other applications that have been mentioned in the
past on png-list or directly to me, e.g., chemical structure information
and geographical coordinate system data.

Therefore it might be a good idea to split the PNG/Exif proposal into
two documents, one relating specifically to Exif and its subkeyword
list, and another dealing with the subkeyword=value syntax. The latter
could eventually become a part of the PNG Extensions document.

(Draft 0.13 talks about "keywords" instead of "subkeywords" but I'm
going to change that, so that "keyword" only applies to the text chunk
keyword ("Exif", in the case of the textExif chunk) and all the rest are
"subkeywords").

Glenn
From: Glenn Randers-P. <gl...@ho...> - 2000-08-08 10:45:14

At 05:57 AM 8/8/00 -0400, Glenn Randers-Pehrson wrote:
>I think we need to say something about the "endian" condition of data
>stored in a HEX string.
[...]
>One possibility is to require applications to write a
>PNG_ByteOrder subkeyword (value 0: unknown, 1: BigEndian, 2: LittleEndian)

I meant to write that the other obvious possibility is to require that
the application writing a HEX string convert it to network byte order
first. This is the choice that I prefer.

Glenn
From: Adam M. C. <am...@cs...> - 2000-08-08 20:12:47

Glenn Randers-Pehrson <gl...@ho...> wrote:

> I think we need to say something about the "endian" condition of data
> stored in a HEX string.
>
> ...require that the application writing a HEX string convert it to
> network byte order first.

HEX is used only for the MakerNote subkeyword. If the encoder does not
understand the layout of the data, then it cannot convert the byte order
of the subfields. But if it does understand the layout, why stop at
fixing the byte order? Why not convert it to a human-readable form like
all the other data in the textExif chunk?

> One possibility is to require applications to write a PNG_ByteOrder
> subkeyword (value 0: unknown, 1: BigEndian, 2: LittleEndian)

If it's unknown, why not just omit this subkeyword? Also, this would be
more human-friendly as:

PNG_HexByteOrder: big-endian
PNG_HexByteOrder: little-endian

or maybe:

PNG_HexEndian: big
PNG_HexEndian: little

Another possibility is to put the information in the HEX value itself:

MakerNote=M 04B129C6
MakerNote=I C629B104

(I took the M/I convention from TIFF.)

AMC
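The two MakerNote lines above carry the same 32-bit quantity in the two byte orders, which a short Python check can confirm. This is a sketch only: the helper name `first_u32` and the `TIFF_FLAGS` table are invented for illustration, and treating the payload as a single unsigned 32-bit integer is an assumption drawn from the example, not from any MakerNote layout.

```python
# Hypothetical mapping from the TIFF-style flag to Python's byte-order names.
TIFF_FLAGS = {"M": "big", "I": "little"}


def first_u32(value):
    """Decode a 'FLAG HEXDIGITS' value as one unsigned integer."""
    flag, hex_digits = value.split()
    return int.from_bytes(bytes.fromhex(hex_digits), TIFF_FLAGS[flag])


# Both spellings decode to the same number once the flag is honoured.
print(hex(first_u32("M 04B129C6")))   # 0x4b129c6
print(hex(first_u32("I C629B104")))   # 0x4b129c6
```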
From: Glenn Randers-P. <gl...@ho...> - 2000-08-08 20:29:10

At 08:12 PM 8/8/00 +0000, Adam M. Costello wrote:
>HEX is used only for the MakerNote subkeyword. If the encoder does not
>understand the layout of the data, then it cannot convert the byte order
>of the subfields. But if it does understand the layout, why stop at
>fixing the byte order? Why not convert it to a human-readable form like
>all the other data in the textExif chunk?

It's not the encoder that has a problem; it's the decoder that would not
know how to convert that human-readable stuff back to a proper
Makernote. In fact I am proposing that some of the interesting fields
in the Makernote also be converted to human-readable form, for example
Digital Zoom which I already show in draft 0.14, and panorama data,
which I think is also a good candidate. But those will also have to be
retained in the HEX-coded Makernote for potential reconversion to camera
format.

>Another possibility is to put the information in the HEX value itself:
>
>MakerNote=M 04B129C6
>MakerNote=I C629B104
>
>(I took the M/I convention from TIFF.)

Yes, something like that had occurred to me as well, but I was thinking
of B/L/U/N as the flag (Big/Little/Unknown/NotApplicable). Apps that
convert from camera format to PNG would be required to retain the
camera's byte order and write the proper flag, if known. This way, apps
that convert back to camera format wouldn't have to do any byte
swapping.

Converting to camera format for loading into a *different* camera would
be interesting; the app would either need to be able to reconstruct the
second camera's Makernote or just discard the stored Makernote. I don't
know enough about digital cameras to know how well that would work.
What happens now if you simply upload an Exif file from one camera into
a different brand?

Glenn
From: Glenn Randers-P. <gl...@ho...> - 2000-08-09 10:54:51

At 04:28 PM 8/8/00 -0400, you wrote:
>At 08:12 PM 8/8/00 +0000, Adam M. Costello wrote:
>>Another possibility is to put the information in the HEX value itself:
>>
>>MakerNote=M 04B129C6
>>MakerNote=I C629B104
>>
>>(I took the M/I convention from TIFF.)
>
>Yes, something like that had occurred to me as well, but I was thinking
>of B/L/U/N as the flag (Big/Little/Unknown/NotApplicable).

"G" for "biG" instead of "B" for "Big" would probably be better, to put
it outside the range of hex digits. But the only case that is
interesting to applications is when the string contains multibyte values
in little-endian format, so I think it's best to go with "L " in that
case, and nothing in the others. "I "/"M " would be OK, too, with the
"M " being optional. Whitespace between the flag and the hex codes can
be optional.

For example, a Makernote containing two 4-byte values and two 2-byte
values:

MakerNote=M 04B129C6 01020304 0506 0708
MakerNote=I C629B104 04030201 0605 0807
MakerNote=04B129C60102030405060708
MakerNote=LC629B1040403020106050807

BTW I wasn't familiar with the I/M flag. Does it mean Intel/Mac?

Glenn
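A decoder for a HEX value with an optional leading byte-order flag and optional whitespace might look like the following Python sketch. The names `BYTE_ORDER_FLAGS`, `parse_hex_value` and `unpack_example` are hypothetical, and accepting all four letters (G, L, I, M) at once is an assumption made here so that every variant discussed in the thread can be tried; the thread had not settled on one. The `IIHH` layout (two 4-byte and two 2-byte unsigned values) is taken from the example only, since a real decoder generally would not know the MakerNote layout.

```python
import struct

# Hypothetical flag table; none of these letters is a hex digit.
BYTE_ORDER_FLAGS = {"G": "big", "M": "big", "L": "little", "I": "little"}


def parse_hex_value(value):
    """Return (byte_order, raw_bytes); byte_order is None when no flag is given."""
    order = None                       # unknown or not applicable
    if value[:1] in BYTE_ORDER_FLAGS:
        order = BYTE_ORDER_FLAGS[value[0]]
        value = value[1:]
    digits = "".join(value.split())    # whitespace between groups is optional
    return order, bytes.fromhex(digits)


def unpack_example(value):
    """Unpack two 4-byte and two 2-byte unsigned values (example layout only)."""
    order, raw = parse_hex_value(value)
    prefix = "<" if order == "little" else ">"   # assume big-endian if unflagged
    return struct.unpack(prefix + "IIHH", raw)


for text in ("M 04B129C6 01020304 0506 0708",
             "I C629B104 04030201 0605 0807",
             "04B129C60102030405060708",
             "LC629B1040403020106050807"):
    print([hex(n) for n in unpack_example(text)])
```

With the flag honoured, all four spellings unpack to the same four numbers, which is the point of recording the byte order rather than swapping bytes on write.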
From: Glenn Randers-P. <gl...@ho...> - 2000-08-08 12:42:32

I posted draft 0.14 (http://pmt.sourceforge.net/exif/drafts/d014.html),
which revises the description of the subkeyword syntax along the lines
Adam recommended.

Glenn
From: Adam M. C. <am...@cs...> - 2000-08-08 19:55:21

Glenn Randers-Pehrson <gl...@ho...> wrote:

> We must be able to encode all of these STRINGs and SUBSTRINGs using
> the definitions in the proposal:
>
> STRING or SUBSTRING with no newlines
> STRING or SUBSTRING with a newline at the end
> STRING or SUBSTRING with a newline at the beginning
> STRING or SUBSTRING with an embedded newline
> empty lines in STRING or SUBSTRING
> empty STRING or SUBSTRING

If we consider a string to be a sequence of characters, some of which
may be newlines, then the following syntax allows any string to be
assigned to a keyword:

Every line has the form [keyword][=valueline]\n where valueline contains
zero or more non-newline characters. When valueline is present, it is
appended to the value of the most recent keyword seen, separated by a
newline if it is not the first valueline to be appended.

Note that if the string to be assigned ends with a newline, the last
valueline must be empty.

Presumably, if any string can be represented, and your syntax for
breaking up strings into substrings is general enough, then any
substring can be represented.

> I posted draft 0.14 (http://pmt.sourceforge.net/exif/drafts/d014.html)

It still says:

A single whitespace character (the "\n=" sequence is considered to be a
single whitespace character for this purpose) may precede the STRING and
is not included in it. If multiple whitespace characters appear, all
but the first form the beginning of the string.

So in this example:

firstkey=foo
secondkey= foo
thirdkey=  foo

firstkey and secondkey have the same value "foo", and thirdkey has the
value " foo". Is this complexity justified? Why not just say
everything between the equal sign and the newline is the valueline?

I think this rule was originally added so that

key=foo

and

key=
=foo

would be equivalent. But now that we have

key
=foo

there should be no need for it.

AMC
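The [keyword][=valueline] rule with newline-separated valuelines is small enough to sketch in Python, together with an encoder, to check that each of the six string cases listed above survives a round trip. The names `parse_lines` and `encode` are invented for this sketch. Two behaviours are assumptions rather than anything stated in the thread: a keyword that appears again simply resumes its earlier value, and a valueline that arrives before any keyword raises `ValueError`.

```python
def parse_lines(text):
    """Parse lines of the form [keyword][=valueline] into {keyword: value}."""
    valuelines = {}                 # keyword -> list of valuelines seen so far
    current = None                  # most recent keyword seen
    for line in text.split("\n"):
        keyword, sep, valueline = line.partition("=")
        if keyword:
            current = keyword
            valuelines.setdefault(current, [])
        if sep:
            if current is None:
                raise ValueError("valueline before any keyword")
            valuelines[current].append(valueline)
    # Valuelines are separated, not terminated, by newlines.
    return {k: "\n".join(v) for k, v in valuelines.items()}


def encode(keyword, value):
    """Emit one line per newline-separated piece; the first carries the keyword."""
    pieces = value.split("\n")
    return "".join(
        (keyword if i == 0 else "") + "=" + piece + "\n"
        for i, piece in enumerate(pieces)
    )


cases = [
    "no newlines",
    "newline at the end\n",
    "\nnewline at the beginning",
    "embedded\nnewline",
    "empty\n\nline inside",
    "",
]
for s in cases:
    assert parse_lines(encode("key", s)) == {"key": s}

print(repr(encode("key", "two\nthree\n")))   # 'key=two\n=three\n=\n'
```

Note how a value ending in a newline is encoded with an empty final valueline, as the message above points out.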