[Yaml-core] Canonical YAML?

Hi

I'm thinking of writing a YAML parser (or two), and I'm interested in
defining a restricted, canonical subset of YAML in which to write the
expected output of test cases, so that the validation aspect of spec
compliance (as opposed to the "correct native representation" aspect) can
be tested with a bank of language-independent test cases.

Obviously, such a Canonical YAML spec could not be expected to
canonicalise datatypes it knows nothing about. The introduction to the
Canonical XML spec[1], however, draws a misleading conclusion from the
same problem in XML-land, IMO - it says (sec. 1.3):

 "Although two XML documents are equivalent (aside from limitations given
in this section) if their canonical forms are identical, it is not a goal
of this work to establish a method such that two XML documents are
equivalent if and only if their canonical forms are identical."

But the analogous goal for YAML should be achievable, I think, if we use a
weak notion of equivalence such that nodes with tags not defined in the
spec are compared as if they were strings.

So, logically, my criteria for Canonical YAML would be:

1. Every legal Canonical YAML stream is a legal YAML stream.

2. The spec should in effect define an idempotent function f from YAML to
Canonical YAML, i.e. f(f(x))=f(x) for every legal stream x.

3. f(x)=f(y) iff x and y are weakly equivalent streams, as defined above.
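
To make criteria 2 and 3 concrete, here is a rough Python sketch over a
toy node model rather than real YAML text. Everything in it is an
assumption made for illustration: the (kind, tag, value) tuples, the
"!!int" and "!!str" tag names, the single integer normalisation rule, and
the choice to treat mapping key order as insignificant. Scalars with tags
the table does not know are left untouched, so they compare as strings.

```python
# Toy node model (hypothetical, for illustration only):
#   ("scalar", tag, "text")
#   ("seq",    tag, [node, ...])
#   ("map",    tag, [(key_node, value_node), ...])

# Tags this sketch pretends the spec defines, each with a normaliser.
KNOWN_SCALAR_TAGS = {
    "!!int": lambda s: str(int(s, 0)),  # "0x10" and "16" both become "16"
    "!!str": lambda s: s,
}

def canon(node):
    """Return the canonical form of a toy node (the f of criteria 2 and 3)."""
    kind, tag, value = node
    if kind == "scalar":
        # Unknown tags fall through to the identity: compared as strings.
        normalise = KNOWN_SCALAR_TAGS.get(tag, lambda s: s)
        return (kind, tag, normalise(value))
    if kind == "seq":
        return (kind, tag, [canon(item) for item in value])
    # kind == "map": canonicalise keys and values, then fix the pair order.
    pairs = [(canon(k), canon(v)) for k, v in value]
    return (kind, tag, sorted(pairs, key=repr))

# Two weakly equivalent documents: different key order and integer spelling.
x = ("map", "!!map", [
    (("scalar", "!!str", "b"), ("scalar", "!!int", "0x10")),
    (("scalar", "!!str", "a"), ("scalar", "!foo", "0x10")),
])
y = ("map", "!!map", [
    (("scalar", "!!str", "a"), ("scalar", "!foo", "0x10")),
    (("scalar", "!!str", "b"), ("scalar", "!!int", "16")),
])
# Not equivalent to x: the unknown "!foo" scalar differs as a string.
z = ("map", "!!map", [
    (("scalar", "!!str", "a"), ("scalar", "!foo", "16")),
    (("scalar", "!!str", "b"), ("scalar", "!!int", "16")),
])
```

With these definitions canon(x) and canon(y) agree, canon(z) differs from
both, and applying canon twice gives the same result as applying it once.
A real spec would also have to pin down the serialised text (indentation,
quoting, anchors and so on), which this sketch ignores.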

As a bonus, Canonical YAML might also be useful to create
ultra-lightweight parsers, for example for embedded systems.

Any objections to this idea? If not, how should we proceed? Should I just
whip up a draft spec?

-- 
Robin

[1] http://www.w3.org/TR/2001/REC-xml-c14n-20010315

P.S. What's up with the yaml.org nameserver?