Thread: [Yaml-core] Proposed Changes to YAML by example

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Hi all,

Oren Clark and I have been making several changes to YAML, and I want to
run them by the list by giving examples. Most of these changes are
optimizations that loosen the rules to allow some nice use case tricks.
A few of these actually tighten the rules in a non backwards compatible
way.

As far as we can tell, we don't think that we are breaking very many, if
any, valid YAML documents in the wild. If this is not the case, please
let us know. The jist of all this is that YAML is being polished to make
things better, and that the typical YAML of yesterday remains exactly
the same.

1) The first change is to allow a clever syntax for unordered sets. Since
mapping keys work nicely for this, we used to be able to say:

    ---
    banana:
    peach:
    orange:

which would load as a hash with null values. This makes a decent set
object since you, by definition, can't have duplicate keys, and key
order is not important or preserved.

Clark wanted to do this:

    ---
    ? banana
    ? peach
    ? orange

which was a relaxation or the previously valid:

    ---
    ? banana
    :
    ? peach
    :
    ? orange
    :

We decided this was a nice use case. The '?' is more declarative. The
':' looks more accidental. Of course you can mix these:

    ---
    ? banana
    peach:
    orange: yum

But that would be silly.

2) The next change is for flow collections. You know, { ... } and [ ... ].
We decided to allow lists inside of curly brackets, and pairs inside
square brackets; with the following semantics:

    --- # These
    - { banana, peach, orange }
    - [ banana: yellow, peach: pink, orange: orange ]
    --- # are equivalent to these
    - ? banana
      ? peach
      ? orange
    - - banana: yellow
      - peach: pink
      - orange: orange

So we get flow collections for "sets" and "ordered maps". These forms
were previously invalid YAML, so this is just a loosening of the rules
and therefore doesn't break backwards compatibility. 

If it isn't immediately obvious, the above are different from the old
flow forms of:

    --- # These
    - [ banana, peach, orange ]
    - { banana: yellow, peach: pink, orange: orange }
    --- # are equivalent to these
    - - banana
      - peach
      - orange
    - banana: yellow
      peach: pink
      orange: orange

And of course, mixing pairs with values is allowed:

    ---
    - { foo, bar: baz }
    - [ foo, bar: baz ]
    ---
    - foo:
      bar: baz
    - - foo
      - bar: baz

3) We decided that default typing should go away completely. Previously,
a plain scalar (no quotes, no '|' or '>') was required to be reported by
the parser as having an empty tag. Having an empty tag is a signal for
implicit typing. All other scalars (quoted, etc) were assigned a tag of
'!str', thereby defeating implicit typing. And this made sense, because
if you quote something it should be a string, right?

    ---
    implicit number: 123
    quoted string: '123'
    null:
    empty string: ''
    explicitly implicit number: ! '123'

Also collections without an explicit tag, were assigned '!map' or
'!seq', and thus avoided implicit collection typing.

Well we decided that default typing was just too weird. But we wanted to
keep the same overall effect. So now we require that a parser report
whether a scalar is plain or not. Then the receiver can use that
information itself to determine the appropriate type.

Since this is the case, we no longer allow an explicit empty tag (like
above) since it adds no value.

For the most part this is a transparent change to YAML users. We
consider the new way the lesser of evils, but the use case of forcing a
string is too important to ignore. So this is the cleanest way we can
explain it.

4) We eliminated some nasty unbounded lookaheads, to make it easier to
write parsers. The basic rule is this:

    '?' starts a mapping key, and ':' starts a mapping value. They are
    always required unless the mapping key is a single line (and less
    than 1024 characters long), or the mapping value is null.

So before this was valid:

    ---
    [ 1,
      2,
      100000
     ]
    : value1
    "multi line
     string key"
    : value2
    simple key: value3

and now it must be:

    ---
    ? [ 1,
      2,
      100000
     ]
    : value1
    ? "multi line
     string key"
    : value2
    simple key: value3

This makes programming a parser immensely simpler, because a complex key
can be discerned from a complex scalar without having to parse the whole
thing. Remember, YAML parsing requires the autodetection of a new node,
and what looks like a gigantic mapping might really be a key of another
mapping.

Luckily, this change affects mostly oddball mapping keys, since
programmers tend to just use simple keys for mappings. The extra '?'
indicators actually seem to *add* clarity for human readers of YAML.
Even humans have to look ahead; wetware is just generally better at
parsing than most software. :)

Cheers, Brian

Thread: [Yaml-core] Proposed Changes to YAML by example

yaml-core