Ya. This is a supreme idea, Robin. It can especially be used to drive
home the rationale for the information model. A few more comments.
On Fri, Oct 17, 2003 at 05:18:39PM +0000, Clark C. Evans wrote:
| On Fri, Oct 17, 2003 at 12:50:40PM -0400, Robin G. wrote:
| | I'm thinking of writing a YAML parser (or two) and I'm interested in
| | defining a restricted, canonical subset of YAML with which to write the
| | output side of test cases with - so that the validation aspect of spec
| | compliance (as opposed to the "correct native representation" aspect) can
| | be tested with a bank of language-independent testcases.
| Well, may I suggest a few items:
| 1. You use double quoted for all strings, as this is the only
| scalar style that can encode all content.
| 2. All !tag thingies are always provided, even if it is the
| empty tag, ! , which specifies implicit typing.
| 3. Since double quoted is a 'flow', I also suggest using only
| flow mappings and sequences.
| 4. All mapping keys are sorted according to Unicode code point.
5. All nodes are given an anchor, starting with 0 and
increasing numerically without a leading zero.
6. No comments, other styles, unnecessary spaces, or line
breaks are used (all line breaks are escaped in the double
quoted strings).
7. It is assumed that all nodes with a !tag are in canonical
form, that is, !int 10 and not !int 0xA.
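
To make rules 1-7 concrete, here is a rough sketch of an emitter in
Python. It is an illustration only, not a reference implementation.
The function name emit_canonical, the tag names !str, !map and !seq,
the pre-order anchor numbering, and the exact separator spacing are
all assumptions of mine; only !int and the rules themselves come from
the list above. It handles strings, integers, lists and string-keyed
dicts, and does not emit aliases for repeated nodes.

```python
def emit_canonical(value):
    """Emit a Python value as a single line of canonical-style YAML."""
    next_anchor = [0]  # rule 5: anchors count up from 0, no leading zero

    def quote(text):
        # Rules 1 and 6: double quoted, with line breaks escaped.
        escaped = (text.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
        return '"' + escaped + '"'

    def node(v):
        # Assumption: anchors are handed out in pre-order (parent first).
        prefix = "&%d " % next_anchor[0]
        next_anchor[0] += 1
        if isinstance(v, str):
            return prefix + "!str " + quote(v)
        if isinstance(v, int) and not isinstance(v, bool):
            # Rule 7: str() always gives decimal, so 0xA comes out as 10.
            return prefix + "!int " + str(v)
        if isinstance(v, list):
            # Rule 3: flow sequence.
            return prefix + "!seq [" + ",".join(node(i) for i in v) + "]"
        if isinstance(v, dict):
            # Rule 4: Python compares str by code point, so sorted() fits.
            pairs = (node(k) + ": " + node(v[k]) for k in sorted(v))
            return prefix + "!map {" + ",".join(pairs) + "}"
        raise TypeError("unsupported type: %r" % type(v))

    return node(value)


print(emit_canonical({"b": [1, "x\ny"], "a": 0xA}))
# &0 !map {&1 !str "a": &2 !int 10,&3 !str "b": &4 !seq [&5 !int 1,&6 !str "x\ny"]}
```

Note that the key "a" is emitted before "b" regardless of insertion
order, and that the embedded line break stays escaped, so the whole
document fits on one line.
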
| | So, logically, my criteria for Canonical YAML would be:
| | 1. Every legal Canonical YAML stream is a legal YAML stream.
| | 2. The spec should in effect define an idempotent function f from YAML to
| | Canonical YAML
| | 3. f(x)=f(y) iff x and y are weakly equivalent streams, as defined above.
Yes, this makes the 'weak vs strong' graph model very concrete.
This is *such* a good idea; it should drive home the point.
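
Robin's three criteria are easy to turn into checks. The sketch below
uses Python's json module purely as a stand-in for a YAML
canonicalizer, since a real one does not exist yet; the name f
follows Robin's notation, and the sample streams x, y and z are made
up for illustration. "Weakly equivalent" is approximated here as
"parses to the same data".

```python
import json


def f(stream):
    """Stand-in canonicalizer: parse, then re-emit in one fixed form."""
    data = json.loads(stream)
    # Sorted keys and no optional whitespace give a single output form.
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


x = '{"b": [1, 2], "a": "hi"}'
y = '{ "a" : "hi",\n  "b" : [1,2] }'   # same content, different layout
z = '{"a": "hi", "b": [2, 1]}'         # different content

json.loads(f(x))          # criterion 1: the output is still a legal stream
assert f(f(x)) == f(x)    # criterion 2: f is idempotent
assert f(x) == f(y)       # criterion 3: equivalent streams, same output
assert f(x) != f(z)       # criterion 3: different streams, different output
```

A bank of language-independent test cases could then be pairs of
(input, expected canonical output), compared as plain strings.
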
| | As a bonus, Canonical YAML might also be useful to create
| | ultra-lightweight parsers, for example for embedded systems.
| By using only double quoted and flow collections, a minimal parser
| would be freed of having to keep track of whitespace, indenting,
| styles, and most of the complexity of YAML.
Heck, this would make it very easy to write language loaders,
as the resulting BNF would be much simpler. Each entity is
on its own line, thus working well with readline(). Each
port could call a "C" library to make a string canonical,
or it could run a standalone yamlcanon program using a pipe.
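
The pipe option might look roughly like this from a Python port. No
yamlcanon program exists yet, so its command name and its behaviour
(read a stream on stdin, write canonical lines on stdout) are
assumptions; the function name canonicalize_via_pipe and the
STAND_IN filter, which just copies stdin to stdout, are placeholders
of mine so that the sketch can be run.

```python
import subprocess
import sys

# Placeholder for the hypothetical yamlcanon: copies stdin to stdout.
STAND_IN = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]


def canonicalize_via_pipe(text, command):
    """Pipe text through an external filter and return its output lines."""
    result = subprocess.run(command, input=text, capture_output=True,
                            text=True, check=True)
    # One entity per line, so the loader can consume this line by line.
    return result.stdout.splitlines()


for line in canonicalize_via_pipe('&0 !str "a"\n&0 !int 1\n', STAND_IN):
    print(line)
```

With a real tool installed, the call would presumably become
canonicalize_via_pipe(text, ["yamlcanon"]), and a port would only
need a parser for the canonical subset.
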