On Fri, Nov 27, 2009 at 10:21 PM, Osamu TAKEUCHI <osamu@big.jp> wrote:
Hi all,

To discuss the equality, we have to distinguish 1 == 1.0 from a_map[1] == a_map[1.0] .

Very true. However, YAML's equality always refers to the latter, so there's no ambiguity.
 
In many languages, the former is true because 1 is implicitly converted to 1.0 before the comparison and then compared to 1.0. In contrast, the latter is almost always false because 1 and 1.0 are unequal objects by themselves. JavaScript is an exceptional language.

Not that exceptional; it is also true for Python, for example.
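
To make the two cases concrete, here is a small Python sketch. The name a_map simply follows the notation used above; nothing here depends on any YAML library.

```python
# Scalar comparison: the int and the float are compared numerically.
scalar_equal = (1 == 1.0)

# Mapping lookup: a dict locates keys via hash() and ==.
# Since hash(1) == hash(1.0) and 1 == 1.0, both keys address the same entry.
a_map = {1: "stored under int 1"}
a_map[1.0] = "stored under float 1.0"  # overwrites the existing entry

print(scalar_equal)  # True
print(len(a_map))    # 1
print(a_map[1])      # stored under float 1.0
```

So in Python, a mapping cannot hold 1 and 1.0 as two distinct keys, which is the behaviour attributed to JavaScript above.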
 
But, since the exception is not negligible, I agree with giving
up specifying YAML equality rules.

On the other hand, I see no benefit, and some drawbacks, in defining the "must-be-rejected" mapping as in option 4. As I wrote in previous posts, the default way of evaluating equality of reference-based objects in many applications is based on the identity of objects...

USA:
 Presidents: !President[]
...
 - &PR41    name: George Bush
...
 - &PR43    name: George Bush
...
 Parties: !Party[]
...
 *PR41: ...
...
 *PR43: ...

In this example, we need some extra property like UniqueHash in the
President class just to make the YAML document valid. Note that *PR41 and *PR43 are equal to each other by YAML's definition.

I'm not convinced by this example. It seems to me this is the same president, and should justifiably belong to just one party. I suspect what you call "President" is actually a "Term in Office" where one of the properties is the actual president name (other properties would be date, or whatever). But that's neither here nor there; granted some languages would allow the above (at the native data structure level).
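
As a rough illustration of the difference between identity-based and content-based keys at the native data structure level, consider the following Python sketch. The class names PresidentById and PresidentByValue are hypothetical, and the mapping values are placeholders, since the example above elides them.

```python
from dataclasses import dataclass


class PresidentById:
    """Plain class: inherits identity-based __eq__ and __hash__ from object."""

    def __init__(self, name):
        self.name = name


@dataclass(frozen=True)
class PresidentByValue:
    """Frozen dataclass: __eq__ and __hash__ are generated from the fields."""

    name: str


# Two distinct objects with identical content, like &PR41 and &PR43.
pr41, pr43 = PresidentById("George Bush"), PresidentById("George Bush")
by_identity = {pr41: "entry for PR41", pr43: "entry for PR43"}

v41, v43 = PresidentByValue("George Bush"), PresidentByValue("George Bush")
by_value = {v41: "entry for PR41", v43: "entry for PR43"}

print(len(by_identity))  # 2: the keys are different objects
print(len(by_value))     # 1: the keys compare equal, so the second replaces the first
```

With identity-based keys both entries survive; with content-based keys they collapse into one, which is the case that YAML's definition of equality treats as a duplicate.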

IMO, we need to clarify what are our expectations/goals from YAML here. Our tension is between:
- Be as perfect as possible match to "every" language model.
And
- Be as portable as possible between "all" languages.
Neither goal is 100% achievable, given we also want YAML to be "useful".

A "perfect" match to every language model is impossible given some languages have built-in types that are foreign to YAML (e.g., multi-dimensional matrix types in APL, ordered mappings in PHP, etc.). YAML will always be forced to "simulate" certain types, hopefully with a minimal (non-zero) overhead.

Being completely portable means supporting a "least common denominator" approach. In general YAML was designed more towards portability, and by choosing the "right" abstractions we could cover a feature set that spans across "all" (well, many :-) languages.
 
I see no benefit to reject such an input.
So, I disagree with the latter part of option 4. The next simple statement seems to work well.

I do see a benefit. I think our goal was (and should still be) more towards portability and we should only make compromises where we have no other option. The above case, IMO, is an example where we made the right call. We picked a highly portable, highly useful subset to support. Specifically, identity-based mappings were intentionally considered to be outside the "least common denominator". The benefit is that, if every valid YAML file has "different" keys (different in tags and/or values), then these files would work in all languages (whether or not they use identity-based mappings). If "a few" applications in "a few" languages are forced to add "some" overhead (such as unique hashes in the example above), so be it.

As xitology demonstrated, we haven't got as close to "100% portability" as we thought we had. Ideally, what I would like to do is keep "a valid YAML file is acceptable to every implementation". That is, instead of making YAML more relaxed, I would make it more strict - e.g., disallow { 1: a, "1": b } in _all_ implementations.

However, there simply does not seem to be any practical way to define such a safe "strict" rule, given all the different languages and tags out there. So, we are forced to relax the rules instead ("being dragged, kicking and screaming" :-). I would like to minimize the amount of damage (to portability); that is, since I see relaxing the rules as a necessary evil, not as a desirable goal by itself, I would like to keep things as strict (that is, as portable) as possible.

Hence, as the bottom line, while I am forced to accept:
 
 Implementations "may" reject mappings that have "equal" keys, according to their own *implementation-specific* definition of equality.

I don't see that I am forced to give up:

Constrain this to say that nodes with equal tags and equal content are always equal and hence "must" be rejected as duplicates.
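
A minimal sketch of what this "must" rule could look like in Python follows. It assumes each key node has already been reduced to a (tag, content) pair with hashable content; the function name find_duplicate_keys and that representation are illustrative assumptions, not part of any existing YAML library.

```python
def find_duplicate_keys(keys):
    """Return the (tag, content) pairs that repeat an earlier key.

    `keys` is an iterable of (tag, content) pairs. Content must be hashable,
    e.g. a string for scalars or a tuple for collections (an assumption of
    this sketch).
    """
    seen = set()
    duplicates = []
    for tag, content in keys:
        node = (tag, content)
        if node in seen:
            duplicates.append(node)
        else:
            seen.add(node)
    return duplicates


INT = "tag:yaml.org,2002:int"
STR = "tag:yaml.org,2002:str"

# { 1: a, "1": b }: different tags, so the strict rule does not fire.
mixed = find_duplicate_keys([(INT, "1"), (STR, "1")])

# { 1: a, 1: b }: equal tag and equal content, so this must be rejected.
repeated = find_duplicate_keys([(INT, "1"), (INT, "1")])

print(mixed)     # []
print(repeated)  # [('tag:yaml.org,2002:int', '1')]
```

Under this sketch, a case like { 1: a, "1": b } is left to the "may" clause: an implementation is free to reject it by its own definition of equality, but is not required to.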

We could have gone another way. We could say that "YAML semantics is whatever the language makes it to be", define no constraints about equality, and while at it, allow YAML mappings to consider key order (which would make the PHP people happy), and so on. This would make each language-specific YAML implementation simpler, and "generic" implementations harder; it would also reduce YAML's usefulness in cross-platform use cases. This way is not what we set out to do when we started the YAML project, and I still believe that we made the right call, even if we need to adapt our lofty ideals to the cruel reality :-)

Have fun,

    Oren Ben-Kiki