On Sat, Nov 28, 2009 at 8:55 PM, Osamu TAKEUCHI <osamu@big.jp> wrote:
I do not see much importance in keeping an arbitrary YAML file acceptable to every implementation. Each YAML file must have its own purpose. So, I expect no case where one would want to feed the YAML file in my example to any YAML implementation that does not have a reference-based object model.

Well, that's where we differ.

YAML's goal 2 (portability) has higher priority than goal 3 (matching native data structures). That is, we do see the point in being able to create generic "schema-blind" YAML tools, having a well-defined consistent YAML data model, and so on. I can see why someone only interested in a particular application (or implementation) may disagree; someone interested in generic tools and portability would agree. It is a matter of priorities; had we flipped the order of the goals, we would have had a different set of rules.
 
IMO, the main purpose of defining the semantics in YAML spec is to make the YAML file readable by human eyes. If we can believe the order of mapping keys is always neglected by the YAML processor, it becomes easier to understand the meaning of the data model and to hand-write a YAML file.

I think you are biased here; PHP developers would disagree :-)

So, I would not like to define the must-be-rejected mapping as you proposed, because it does not seem to make YAML files much more readable. I think it is more beneficial to remove the constraint and to increase the adaptability of the YAML language to more applications.

Readability is YAML's first and foremost goal. However, I don't see how saying that { a: 1, a: 2 } may be legal in some applications and illegal in some other applications increases readability. I find it to be confusing and that it decreases readability; that is, I can no longer tell just by looking at a YAML file whether it is valid or not, or what it means if it is valid.

At any rate:

I think I have an alternative everyone could live with, though, based on my original option 3. The following suggestion is to codify the current behavior of existing implementations, but on a per-tag rather than per-platform basis. This preserves portability (which, like it or not, YAML views as an important goal, second only to readability).

1. Expand the definition of scalar tags to specify a list of other "potentially equal" tags. Values of the defined tag are considered equal to values of the listed tags if and only if their canonical forms are identical. This would cover the !!int 1 and !!float 1.0 case.

It would even cover the case of !!int 1 == !!str 1 if we list !!str as potentially equal to !!int, to accommodate JavaScript and any other feebly-typed language out there. I think this is something we can live with (we'll definitely blame JavaScript for it in the spec :-).

2. For collection tags - a mapping tag could specify that its values are "potentially equal" to values of another mapping tag, if the values associated with some set of keys are equal. Thus, for example, an !!omap would be equal to a !!map if all the values for all the keys are equal between the two. A value of an Employee tag could be equal to a value of a Supervisor tag if their FirstName and LastName keys are equal, and so on.
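
To make the two rules above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the registries POTENTIALLY_EQUAL and MAPPING_RULES, the toy canonical() function (which prints integral numbers as integers so that !!int 1 and !!float 1.0 share a canonical form), and the !Employee / !Supervisor tag names. It is not how any existing implementation or the spec does it.

```python
# Hypothetical registry: tag -> other tags whose values may compare equal to it.
POTENTIALLY_EQUAL = {"!!int": {"!!float"}, "!!float": {"!!int"}}

# Hypothetical registry: (tag, other tag) -> keys to compare (None = every key).
MAPPING_RULES = {
    ("!!omap", "!!map"): None,
    ("!Employee", "!Supervisor"): ("FirstName", "LastName"),
}


def canonical(tag, text):
    """Toy canonical form (an assumption): integral numbers print as integers."""
    if tag == "!!int":
        return str(int(text))
    if tag == "!!float":
        number = float(text)
        return str(int(number)) if number.is_integer() else repr(number)
    return text  # other scalars: the text is taken as already canonical


def scalars_equal(tag_a, text_a, tag_b, text_b, table=POTENTIALLY_EQUAL):
    """Rule 1: tags must match or be listed, and canonical forms must be identical."""
    listed = tag_b in table.get(tag_a, ()) or tag_a in table.get(tag_b, ())
    if tag_a != tag_b and not listed:
        return False
    return canonical(tag_a, text_a) == canonical(tag_b, text_b)


def mappings_equal(tag_a, map_a, tag_b, map_b, rules=MAPPING_RULES):
    """Rule 2: compare only the keys that the tag definition names."""
    if tag_a == tag_b:
        keys = None
    elif (tag_a, tag_b) in rules:
        keys = rules[(tag_a, tag_b)]
    elif (tag_b, tag_a) in rules:
        keys = rules[(tag_b, tag_a)]
    else:
        return False  # the tags are not declared potentially equal
    if keys is None:
        return map_a == map_b  # every key and value must match; order is ignored
    return all(k in map_a and k in map_b and map_a[k] == map_b[k] for k in keys)
```

With these definitions, scalars_equal("!!int", "1", "!!float", "1.0") holds, while !!int 1 and !!str 1 are equal only when a table that lists !!str for !!int is passed in. The point of the sketch is that the answer depends only on the tag definitions, not on the platform running the comparison.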

There's little or no effect on current implementations. "Obviously equal" keys (such as { a: 1, a: 2 }) "should" still be rejected. Otherwise the implementation is able to rely on the native data type to do the dirty work of testing for equality, under the assumption that the native type is a faithful implementation of the appropriate tag (and therefore provides the appropriate cross-tag comparisons). Sure, some implementations are lax in allowing equal keys that are not "obviously equal", but this will always be the case when arbitrary tags are used. After all, not every implementation can be aware of every tag.
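
As a rough illustration of "letting the native type do the dirty work", here is a small Python sketch. The build_mapping helper and the list-of-pairs input are hypothetical, standing in for whatever a loader produces for a mapping node; key equality is simply whatever Python's dict decides.

```python
def build_mapping(pairs):
    """Build a dict from (key, value) pairs, rejecting keys the native type deems equal."""
    result = {}
    for key, value in pairs:
        if key in result:  # native hash/equality decides what counts as "equal"
            raise ValueError(f"duplicate mapping key: {key!r}")
        result[key] = value
    return result


# { a: 1, a: 2 } is "obviously equal" and is rejected.
try:
    build_mapping([("a", 1), ("a", 2)])
except ValueError as error:
    print("rejected:", error)

# Python's int 1 and float 1.0 compare equal, so the native type already
# supplies the !!int / !!float cross-tag comparison.
try:
    build_mapping([(1, "x"), (1.0, "y")])
except ValueError as error:
    print("rejected:", error)

# The string "1" and the int 1 are distinct keys in Python, so this is accepted.
print(build_mapping([("1", "x"), (1, "y")]))
```

A platform whose native types also treat "1" and 1 as equal would reject the last mapping as well, which is exactly the per-platform difference the per-tag definition is meant to pin down.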

So, this should make xitology happy :-)

Portability-wise, since this solution depends only on the tag definition instead of the specific language/platform used, a file that is valid under the above definition should be valid everywhere.

So, this would make me happy (except that I'd need to tweak the spec again :-)

For now, this is my preferred way to resolve the issue raised by xitology. I won't start to codify it without cce's approval, of course.

This does not address Osamu's identity vs. value based equality issue for collections. I view this as a completely orthogonal issue. It is one we discussed at length at the time (years back); in a nutshell, since YAML is a data serialization language, and since value-based semantics are a subset of identity-based semantics (that is, value-based data works in identity-based systems, but not the other way around), I feel that we made the right call. Changing this would be a much deeper modification than the tweak to the equality rules suggested above. That is, IMO changing this would be a YAML 1.3 or even a YAML 2.0 issue.

Have fun,

    Oren Ben-Kiki