[Yaml-core] revised model: is sequence/pairing a kind or a style?

Ok.  With a revised model that allows key/value pairs to
be ordered and have duplicate keys, we have the following:

   A node is either a sequence, pairing, scalar, or alias;
   we call the variety of a node its "kind".

   A pairing is a series of zero or more pairs, where 
   each pair consists of two nodes, a key and a value.

   A sequence is a series of zero or more nodes.

   A scalar is a series of zero or more characters.

   Scalar nodes may have one of the following "styles":
   block/literal, block/folded, quoted/double, 
   quoted/single, unquoted

   Both pairing and sequence nodes also have a "style"
   property, which is "flow" or "block"

   etc.
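
To make this first model concrete, here is a rough Python
sketch of it.  The class and field names (Scalar, Sequence,
Pairing, Alias, kind, "anchor") are only illustrative
assumptions, not a proposed API; the model above does not say
what an alias carries, so the "anchor" field is a guess.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Style names taken from the model text above.
SCALAR_STYLES = ("block/literal", "block/folded",
                 "quoted/double", "quoted/single", "unquoted")
COLLECTION_STYLES = ("flow", "block")


@dataclass
class Scalar:
    text: str = ""                 # zero or more characters
    style: str = "unquoted"        # one of SCALAR_STYLES


@dataclass
class Sequence:
    items: List["Node"] = field(default_factory=list)
    style: str = "block"           # one of COLLECTION_STYLES


@dataclass
class Pairing:
    # A list of (key, value) tuples rather than a dict, so that
    # order is kept and duplicate keys are allowed.
    pairs: List[Tuple["Node", "Node"]] = field(default_factory=list)
    style: str = "block"           # one of COLLECTION_STYLES


@dataclass
class Alias:
    anchor: str                    # assumption: name of the node referred to


Node = Union[Scalar, Sequence, Pairing, Alias]


def kind(node: Node) -> str:
    """Return the node's kind: sequence, pairing, scalar, or alias."""
    return type(node).__name__.lower()


# A pairing with a duplicate key; both pairs survive, in order.
dup = Pairing([(Scalar("a"), Scalar("1")), (Scalar("a"), Scalar("2"))])
```

Note that in this sketch "kind" is carried by the node's type,
while "style" is just an attribute on it.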

There is another option, that is, sequences (-) could
be treated in the information model as a pairing where
the key is "null".  And then the difference between
a pairing (mapping key:value syntax) and the sequence
(bulleted - syntax) could be treated as a style; much
like there are quoted and block scalar styles.  In this
case, the model would read:

   A node is either a collection, scalar, or alias.

   A collection is a series of zero or more pairs,
   where each pair consists of two nodes, a key and
   a value; the key may be null.

   A scalar is a series of zero or more characters.

   Scalar nodes may have one of the following "styles":
   block/literal, block/folded, quoted/double, 
   quoted/single, unquoted

   Collection nodes also have a style, which can
   be one of "flow/pairing", "flow/sequence", 
   "block/pairing", "block/sequence".

   etc.
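
A rough Python sketch of this alternative, again with purely
illustrative names (Collection, keys, "anchor" are assumptions,
not a proposed API).  Here a null key is modelled as Python's
None, and the sequence/pairing distinction lives only in the
style string.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

# Style names taken from the model text above.
COLLECTION_STYLES = ("flow/pairing", "flow/sequence",
                     "block/pairing", "block/sequence")


@dataclass
class Scalar:
    text: str = ""
    style: str = "unquoted"


@dataclass
class Alias:
    anchor: str                    # assumption: name of the node referred to


@dataclass
class Collection:
    # Every entry is a (key, value) pair; the key is None for
    # entries written with the bulleted "-" syntax.
    pairs: List[Tuple[Optional["Node"], "Node"]] = field(default_factory=list)
    style: str = "block/pairing"

    def __post_init__(self) -> None:
        if self.style not in COLLECTION_STYLES:
            raise ValueError(f"unknown collection style: {self.style!r}")

    def keys(self) -> list:
        """Keys in document order; None for sequence-style entries."""
        return [key for key, _ in self.pairs]


Node = Union[Scalar, Alias, Collection]

# The "- one / - two" document under this model.
seq = Collection([(None, Scalar("one")), (None, Scalar("two"))],
                 style="block/sequence")
```

There is only one collection class to handle, at the cost of
every consumer having to cope with null keys.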

In this case, the difference between the sequence
and the pairing syntax would be the "style" attribute
attached to each collection (just like the difference
between the block and quoted syntax for scalars).
For sequence style, the "key" would be null.

Thus, the only difference between

   ---
   !null: one
   !null: two

and

   ---
   - one
   - two

is that the style of the former would be "pairing", while
the style of the latter would be "sequence".   Just like
the only difference between:

    --- "x x"

and 

    --- \
      x x

is that the former style is "double quoted" and the 
latter style is "block folded". 
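
The two comparisons above can be sketched in Python as
"equal once style is ignored".  This is only an illustration
under the alternative model: the names (Scalar, Collection,
content) are made up, a null key is modelled as None, and the
folded scalar is assumed to carry the same characters as the
quoted one, as claimed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Scalar:
    text: str
    style: str = "unquoted"


@dataclass
class Collection:
    pairs: List[Tuple[Optional[object], object]] = field(default_factory=list)
    style: str = "block/pairing"


def content(node):
    """Recursively drop the style, keeping only what the node holds."""
    if node is None:                       # a null key
        return None
    if isinstance(node, Scalar):
        return ("scalar", node.text)
    return ("collection",
            [(content(key), content(value)) for key, value in node.pairs])


# "!null: one / !null: two" versus "- one / - two"
as_pairing = Collection([(None, Scalar("one")), (None, Scalar("two"))],
                        style="block/pairing")
as_sequence = Collection([(None, Scalar("one")), (None, Scalar("two"))],
                         style="block/sequence")

# --- "x x" versus the folded block form
double_quoted = Scalar("x x", style="quoted/double")
block_folded = Scalar("x x", style="block/folded")

same_collection = content(as_pairing) == content(as_sequence)
same_scalar = content(double_quoted) == content(block_folded)
```

Both comparisons come out equal on content, while the nodes
themselves still differ in their style attribute.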

The advantage of this alternative model is that it is 
simpler.  The "sequence" concept is merged into the 
"pairing" concept with the difference being delegated 
to a "style".   This could make the parser APIs simpler 
or more consistent.

...

Anyway, which way we choose to model this is important
since every implementation's parser API should probably
reflect it.   If we choose the former, there would be
two sorts of node kinds; one for key/value pairs and
one for just values.   If we choose the latter, then
only one node kind exists; and if you asked for
the "key" of a list entry... it'd just return null.

So.  Some thought would be nice here.  For greater
XML compatibility, we should choose the latter.  Although
I think that the former works better for a serialization
use case with hashtable/array as primary data types.
But since we aren't going to assume an application,
perhaps normalizing it into a single "collection"
is a better bet.

This, hopefully, is a clear example of why a Model
is important.  It sets our vocabulary for how
we talk about parts of a YAML document
(is it a pairing kind or a pairing style of
a collection?) and how our APIs should look.  If
we didn't have a model, people would use different
words, and the APIs of different implementations
might each do it differently...

Clark