The following issue has been raised by xitology. YAML has always defined equality as follows:

1. If the tags are equal, and the content is equal, the nodes are equal. For example: !!int 1 == !!int 1 ; this is the easy part.

2. If the tags are equal, and the content is not equal, the nodes may or may not be equal, depending on the tag's definition of a canonical form. For example !!float 1 == !!float 1.0 but !!float 1 != !!float 0.1 ; this complicates matters (requires the introduction of the concept of a canonical form).

3. If the tags are not equal, the nodes are not equal. For example !!int 1 != !!float 1.0 ; this is where things start to break down.
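
To make the three rules concrete, here is a minimal Python sketch. The Node class, the CANONICAL table and yaml_equal are hypothetical names made up for illustration; they are not taken from the spec or from any YAML library, and the canonicalizers are deliberately simplistic.

```python
from dataclasses import dataclass

# Hypothetical per-tag canonicalizers (an assumption for illustration);
# a real schema would define the canonical form of each tag.
CANONICAL = {
    "!!int": lambda text: str(int(text)),
    "!!float": lambda text: repr(float(text)),
    "!!str": lambda text: text,
}


@dataclass(frozen=True)
class Node:
    tag: str
    content: str


def yaml_equal(a: Node, b: Node) -> bool:
    # Rule 3: nodes with different tags are never equal.
    if a.tag != b.tag:
        return False
    # Rules 1 and 2: same tag, so compare the canonical forms of the content.
    canon = CANONICAL[a.tag]
    return canon(a.content) == canon(b.content)


print(yaml_equal(Node("!!int", "1"), Node("!!int", "1")))        # True
print(yaml_equal(Node("!!float", "1"), Node("!!float", "1.0")))  # True
print(yaml_equal(Node("!!float", "1"), Node("!!float", "0.1")))  # False
print(yaml_equal(Node("!!int", "1"), Node("!!float", "1.0")))    # False
```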

Xitology's points are:

1. The above definition causes an impedance mismatch between YAML's definition of equality and the native data structure's definition of equality. For example, in most languages, !!int 1 == !!float 1.0

2. All existing YAML implementations follow the native data structure's definition of equality and not the YAML definition of equality.

And I would like to add:

3. The native data structure's definition of equality is typically justified. E.g. most people would expect !!int 1 == !!float 1.0 and would be surprised if this were not the case. However, some platforms have a strange idea of equality which does surprise people (e.g., in JavaScript !!int 1 == !!str "1").
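
For reference, this is how the native comparison looks in Python, assuming the three nodes load as a plain int, float and str (the variable names are only for illustration):

```python
# Native Python values that the tagged nodes are assumed to load as.
int_one = 1       # !!int 1
float_one = 1.0   # !!float 1.0
str_one = "1"     # !!str "1"

print(int_one == float_one)              # True: the "reasonable" case
print(hash(int_one) == hash(float_one))  # True: so they also collide as mapping keys
print(int_one == str_one)                # False: Python does not coerce here
```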

We have several options:

0. Keep the current YAML equality rules, let implementations do something different, and ignore the contradiction. This is obviously a problem, as the spec would become irrelevant for implementers.

1. Keep the current YAML equality rules, and change the implementations to follow the spec as it is. This is a twofold problem - first, it is unintuitive (people would find it surprising that 1 != 1.0); second, it is impractical to implement (e.g., how do you force a Python mapping to accept { 1: "int", 1.0: "float" } ?).

2. Keep the current YAML equality rules, and redefine the tags to minimize the mismatch. E.g., give up on !!int and !!float and switch to !!num. This is still a problem in platforms where some nodes are equal with wildly different tags (e.g., JavaScript's case where !!num 1 == !!str "1").

3. Change the YAML equality rules to allow for the possibility that nodes with different tags are equal. This would complicate the definition of canonical form, but would allow us to better model "reasonable" definitions of equality (!!int 1 == !!float 1.0). This would still be a problem in platforms whose definition is "unreasonable" (e.g., this would still not solve the JavaScript problem).

4. Do not specify YAML equality rules. Eliminate most of the discussion of equality, canonical formats etc. and replace it with a statement that implementations "may" reject mappings that have "equal" keys, according to their own *implementation-specific* definition of equality. Constrain this to say that nodes with equal tags and equal content are always equal and hence "must" be rejected as duplicates. The problem here is that { 1: "int", "1" : "string" } would work in Python and not in JavaScript. Arguably, anyone defining a cross-platform schema would be able to "easily" avoid such issues (e.g., by requiring all keys of the mapping to have the same tag, which is pretty trivial). But there's no longer a universal cross-platform validity guarantee.
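
A rough Python sketch of two of the points above: why option 1 is hard to implement on a native mapping, and what option 4's implementation-specific duplicate check could look like. The find_duplicate_key helper and the two equality functions are hypothetical, and js_like is only a crude stand-in for the way JavaScript object keys are coerced to strings.

```python
# Option 1: a native dict silently merges keys that it considers equal.
collapsed = {1: "int", 1.0: "float"}
print(collapsed)  # {1: 'float'}


def find_duplicate_key(pairs, native_equal):
    """Return the first key equal to an earlier key, or None if all differ."""
    seen = []
    for key, _value in pairs:
        if any(native_equal(earlier, key) for earlier in seen):
            return key
        seen.append(key)
    return None


def python_like(a, b):
    # Python's own equality: 1 == 1.0, but 1 != "1".
    return a == b


def js_like(a, b):
    # Assumption: approximates JavaScript object keys, which become strings.
    return str(a) == str(b)


pairs = [(1, "int"), ("1", "string")]
print(find_duplicate_key(pairs, python_like))  # None: accepted
print(find_duplicate_key(pairs, js_like))      # 1 (the str key): rejected
```

The same document is accepted under one definition of equality and rejected under the other, which is exactly the loss of the cross-platform guarantee.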

IMO we are being driven - kicking and screaming - towards option 4. We have a precious requirement that "all valid YAML files can be read by all conforming YAML implementations". As xitology points out, this is an illusion, since option 4 is what all implementations do in practice. Barring some creative idea not covered above (come up with one! please! :-), it seems that we have to give this illusion up, painful though it might be :-(

Thoughts, comments, ideas, etc. are welcome. Don't be shy!

    Oren Ben-Kiki