From: Peter M. <pkm...@po...> - 2004-05-30 03:33:09
All,

I've been interested in YAML for a couple of years, and am currently working on (yes, yet another :-) Python parser for the language.

My first thoughts first: the 29 Jan Spec is a vast improvement over its predecessor, with the new "Information Models" sections. My respects to Clark, Owen and Brian for a good job well done.

However, I do have a few questions.

1. Would an empty document:

    ---

    ...

be equivalent to a document with one null scalar node? I.e.:

    --- >
    ~
    ...

2. In YAML, equality of scalars (I gather) is equivalent to sharing the same tag value and canonical representation. Does this mean that the following map keys would be illegal:

    ---
    !float 12.0: "a"
    !float 12.000: "b"
    !float 11.9999999999: "c"
    ...

(The related question - does the canonical representation of a floating point depend on the floating point library on the user's computer?)

Best Regards,
Peter
From: Oren Ben-K. <or...@be...> - 2004-05-30 07:53:42
On Sunday 30 May 2004 06:32, Peter Murphy wrote:
> I've been interested in YAML for a couple of years, and am currently
> working on (yes, yet another :-) Python parser for the language.

The more the merrier...

> My first thoughts first: the 29 Jan Spec is a vast improvement over its
> predecessor, with the new "Information Models" sections. My respects to
> Clark, Owen and Brian for a good job well done.

s/Owen/Oren/ :-)

Thanks!

> However, I do have a few questions.
> 1. Would an empty document:
>
> ---
>
> ...
>
> be equivalent to a document with one null scalar node? I.e.:
>
> --- >
> ~
> ...

The short answer: "no". But probably not for the reasons you'd think :-)

The long answer:

- The first document contains an empty plain scalar. So, the style is "plain" and the value is "". It is subject to tag resolution (implicit typing), so it can be anything at all, depending on the schema. If "!null" is used for implicit typing, it will not be a null, because we changed the rules for "!null" to require an explicit "~".

- The second document contains a folded scalar. The style is not "plain", and the value is "~\n". It is subject to tag resolution (implicit typing), so it can be anything at all, depending on the schema. The "standard" way to implicitly type non-"plain" scalars is "!str". Even if you matched non-plain scalars against the type regexps, the value contains a trailing "\n", so it fails to match "!null"; the regexp for "!null" only matches a single "~" without a "\n".

> 2. In YAML, equality of scalars (I gather) is equivalent to sharing the
> same tag value and canonical representation. Does this mean that the
> following map keys would be illegal:
>
> ---
> !float 12.0: "a"
> !float 12.000: "b"
> !float 11.9999999999: "c"
> ...

Yes. However, this error can only be detected well into the loading process. Specifically, a YAML pretty-printer that has no knowledge of "!float" may process this without a warning.
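To make the difference between the two kinds of processor concrete, here is a minimal Python sketch of duplicate-key detection under the rule Peter describes (same tag plus same canonical value). The `duplicate_keys` helper and the `FLOAT_AWARE` table are hypothetical names, and the native `float` value stands in for the canonical representation; tags the processor does not understand are compared by their raw text, which is effectively what a tag-unaware pretty-printer does.

```python
from typing import Callable, Dict, List, Tuple

# A scalar key as the parser sees it: (tag, raw text).
Key = Tuple[str, str]

# Hypothetical table of tags this processor understands. Each entry maps the
# raw scalar text to a native value that can be compared with "==".
FLOAT_AWARE: Dict[str, Callable[[str], object]] = {"!float": float}


def duplicate_keys(keys: List[Key],
                   canonicalizers: Dict[str, Callable[[str], object]]) -> List[Key]:
    """Return every key that equals an earlier key (same tag, same value)."""
    seen = set()
    duplicates = []
    for tag, text in keys:
        # Unknown tags fall back to comparing the raw text unchanged.
        to_value = canonicalizers.get(tag, str)
        identity = (tag, to_value(text))
        if identity in seen:
            duplicates.append((tag, text))
        else:
            seen.add(identity)
    return duplicates


keys = [("!float", "12.0"), ("!float", "12.000"), ("!float", "11.9999999999")]
print(duplicate_keys(keys, {}))           # tag-unaware: []
print(duplicate_keys(keys, FLOAT_AWARE))  # [('!float', '12.000')]
```

With float awareness, "12.000" collides with "12.0", while "11.9999999999" stays a distinct key because it parses to a different double.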
But if you load it into some native data structure, with awareness of "!float" semantics, the error will be detected. The spec allows a YAML processor to recover by ignoring the subsequent keys having an equal value, after emitting an appropriate warning. Or the processor can just bail out, after emitting an appropriate error.

> (The related question - does the canonical representation of a floating
> point depend on the floating point library on the user's computer?)

No and yes :-(

A floating point number has a unique representation as a combination of a sign, a mantissa, and an exponent (all integers). So, in theory, this establishes a "==" operation that is independent of library and representation issues.

The above would be a complete answer if we were to write our floating point numbers in base 2. In practice, people use base 10. This means that not every Unicode string describing a float has an exact equivalent native representation as an IEEE float (or whatever else is used). When a float that does not have an exact representation is loaded (e.g., "0.2"), it is converted to the nearest one that does (0.200000000000000011102230246251565404236316680908203125).

Taking the reasonable view that only floats that _do_ have an exact representation are "canonical" and all the rest are "aliases" for them, we inevitably end up with each different native representation defining a different set of canonical float representations. We even get different canonical representations for single-precision vs. double-precision representations. Ugh!

You are right, this is a wart. It reflects the problems of using floating point values as mapping keys. Any code that handles a mapping with floating point keys must be very careful in order to make it work as expected...

The place where this really bites you is when writing a dumper. Should one emit the "friendly" value of "0.2", or the "true" value of "0.200000000000000011102230246251565404236316680908203125"?
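Both candidate values can be inspected with Python's standard library: a `decimal.Decimal` built from a float shows the exact binary value, and a round trip through `struct`'s 4-byte "f" format approximates loading the same text on a single-precision platform. This is only a sketch; it assumes Python floats are IEEE 754 doubles, as they are on common platforms, and the helper names `exact` and `as_single` are illustrative.

```python
import struct
from decimal import Decimal


def exact(x: float) -> str:
    """Exact decimal expansion of the binary value stored in x."""
    return str(Decimal(x))


def as_single(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


double_02 = float("0.2")
single_02 = as_single(double_02)

print(exact(double_02))        # 0.200000000000000011102230246251565404236316680908203125
print(exact(single_02))        # 0.20000000298023223876953125
print(double_02 == single_02)  # False: the two "canonical" 0.2 values differ
```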
What I usually do is to emit the shortest possible representation that is loaded to the exact same native representation. My "ftoa" function does something like:

    use P such that
        atof(sprintf("%.Pf", x)) == x  or  atof(sprintf("%.Pe", x)) == x
    and strlen(the sprintf used) is minimal

(BTW, people expect a simple sprintf("%g", x) to do the above. It doesn't :-)

This way I get the benefit of a "friendly" representation ("0.2"), and the safety of knowing the value will be loaded to the *exact* same native value, in my application/programming platform.

Come to think of it, that's also what I'd use for the "canonical representation" of float numbers, instead of the much longer "true" value. It satisfies all the requirements...

Of course, if the value is loaded into a different platform (using a different floating point representation), I no longer have the safety that the value will be *exactly* the same. But since the representation is different, I can't have that safety anyway; emitting more digits is not going to solve the problem.

Have fun,

Oren Ben-Kiki
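Here is a minimal Python sketch of the "ftoa" search described above. The function name and the 0 to 17 precision range are assumptions: 17 significant digits are always enough to round-trip an IEEE double, so the "%.16e" candidate guarantees a result for finite input. For comparison, Python 3.1 and later already use a shortest round-trip algorithm for `repr(float)`, so in practice `repr` would usually be the simpler choice.

```python
def ftoa(x: float) -> str:
    """Shortest "%.Pf" or "%.Pe" rendering of x that parses back to exactly x.

    Sketch of the search described above; finite values only (nan never
    compares equal, so it would need special handling).
    """
    best = None
    for p in range(18):
        for template in ("%.{}f", "%.{}e"):
            candidate = template.format(p) % x
            # Keep the candidate only if loading it yields the same double.
            if float(candidate) == x and (best is None or len(candidate) < len(best)):
                best = candidate
    return best


print(ftoa(0.2))            # 0.2
print(ftoa(0.1 + 0.2))      # 0.30000000000000004
print("%g" % 0.1234567891)  # 0.123457, which does not load back to the same value
```

One caveat for a YAML dumper: `ftoa(12.0)` returns "12", which implicit typing would typically resolve as an integer, so the dumper would still need to mark such values as floats (for example with an explicit "!float" tag).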