[Yaml-core] New Draft, and some thoughts.

First, there is a new draft at http://yaml.org/16jun2001.html - it isn't a big
change over the previous one. I cleaned up the indentation issue, enhanced
the character escapes, cleared up the anchor string semantics, and opened
the door to 32-bit characters (this needs more work). As usual, please check
me.

I didn't touch the binary issue, or the class/comment/color issue. Here are
some thoughts about these:

Fact: We must support scripting languages' native data structures in a
transparent manner. This is a crucial factor to YAML's usefulness and hence
its chances of "making it".

Fact: We must support the color idiom as it solves so many problems which
otherwise would mess up our simple language: comments, classes, schema
evolution, push/pull annotation (trust me on this), legacy syntax
interoperability (that was the original use case for it), and yes, even
binary vs. unicode (see below).

Fact: the lowest level API is insensitive to the color idiom. It provides a
stream of events, period; a 'value' method is outside its scope.

Fact: In completely static languages (C, C++), we must use our own YamlNode
data type, and hence the color idiom is supported immediately without any
special syntax.

Fact: Clark is right, it isn't possible to support the color idiom directly
in scripting languages. Certainly JavaScript isn't up to it; Python and Perl
are similarly limited. Even the weaker scheme I proposed (using '_' for
color members) fails when one considers writing into the data structure.

Result: We must support the color idiom in scripting languages via external
methods, as Clark proposed. This requires two operations (read and write -
see below).

Fact: Java allows both types of APIs. It also has potential for defining an
intermediate API. Such an API could use something like YamlHash and
YamlVector, each deriving from its counterpart system class. It could also
interact intelligently with the Java (de)serialization mechanism.

Result: The Java API will be interesting :-)

That seems to settle things... Build the model around the color idiom. In
each language, support it to the best of its ability. The rest is "mere
details" :-)

Using color in interesting ways:

Classes/comments: A node with a class/comment is a map with '!'/'#' being a
key for the class/comment string. If the original node wasn't a map to begin
with, it is wrapped in a map node under the '=' key.
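
As a rough sketch of this representation (in Python, one of the scripting
languages mentioned above), the wrapping rule could look like the following.
The helper names 'wrap' and 'with_color' are hypothetical; only the '!', '#'
and '=' keys come from the description above.

```python
def wrap(node):
    """Return the map form of a node.

    A node that is already a map is returned unchanged; anything else
    is wrapped in a new map under the '=' key.
    """
    if isinstance(node, dict):
        return node
    return {"=": node}


def with_color(node, color, text):
    """Return a copy of the node's map form with one color key added."""
    colored = dict(wrap(node))  # shallow copy, so the input is not modified
    colored[color] = text
    return colored


# A scalar with a comment becomes a map holding the value under '='.
price = with_color(12.5, "#", "Cheap!")

# A node that is already a map just gains the class key.
invoice = with_color({"total": 12.5}, "!", "Invoice")
```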

Round trip: A parser could attach formatting information to nodes which
would allow reproducing the output in a byte-to-byte identical format. For
example, the key 'org.yaml.format.indent' could contain the exact white
space used to indent the content of a node. The key
'org.yaml.format.scalar.style' could contain the scalar variant used, etc. A
printer could use this color data as a guide to formatting the output YAML
file.

Binary vs. Unicode: suppose we have a way to distinguish between the two,
which works most of the time (along the lines of what I proposed). A
programmer who fears that some binary scalar instance would be incorrectly
taken to be text (or vice-versa) would color the scalar with a directive for
the printer, overriding the automatic decision (e.g., using the
aforementioned 'org.yaml.format.scalar.style' color).
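
A minimal sketch of such an override, in Python. The UTF-8 check is only a
stand-in for the automatic text/binary decision (the real heuristic is not
specified here), and the style values "text" and "binary" are assumptions
made for the example.

```python
STYLE_KEY = "org.yaml.format.scalar.style"


def looks_binary(data: bytes) -> bool:
    """Placeholder heuristic: bytes that are not valid UTF-8 count as binary."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def scalar_style(node) -> str:
    """Pick the style a printer would use for a scalar node.

    The node is either raw bytes, or a colored map that holds the bytes
    under '=' and optionally a style directive under STYLE_KEY.
    """
    if isinstance(node, dict):
        override = node.get(STYLE_KEY)
        if override is not None:
            return override  # the programmer's directive wins
        node = node["="]
    return "binary" if looks_binary(node) else "text"


auto_text = scalar_style(b"hello")
auto_binary = scalar_style(b"\xff\xfe")
forced = scalar_style({"=": b"hello", STYLE_KEY: "binary"})
```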

Color makes many problems just disappear (given color-sensitive code). This
means we have to make using it as painless as possible. The two operations I
had in mind are:

v(<map/list>, <key/index>, [ <color> ])

Read the value of an entry in a container. If the container is a map, the
entry is identified by a key. If the container is a list, the entry is
identified by its index. An optional color argument allows accessing any
color associated with the entry. For example, in Perl:

invoice: %
    price: #"Cheap!" 12.5$

v($doc->{invoice}, 'price') eq '12.5';
v($doc->{invoice}, 'price', '#') eq 'Cheap!';
!defined v($doc->{invoice}, 'price', '!');
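
For illustration, here is one way 'v' might be sketched in Python. It assumes
the parser stores a colored entry as a map with the value under '=', as
described earlier; returning None for a missing entry or color is a choice
made for this example, mirroring the undef result above.

```python
def v(container, key, color=None):
    """Read an entry's value, or one of its colors, from a map or list."""
    try:
        entry = container[key]
    except (KeyError, IndexError):
        return None  # no such entry

    if color is None:
        # A colored entry keeps its actual value under the '=' key.
        if isinstance(entry, dict) and "=" in entry:
            return entry["="]
        return entry

    # Only an entry stored as a map can carry color.
    if isinstance(entry, dict):
        return entry.get(color)
    return None


doc = {"invoice": {"price": {"=": "12.5", "#": "Cheap!"}}}

value = v(doc["invoice"], "price")
comment = v(doc["invoice"], "price", "#")
klass = v(doc["invoice"], "price", "!")
```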

w(<map/list>, <key/index>, [ <color> ], <value>)

Write the value of an entry in the container. Entry identification is as
above. This 'does the right thing' - if the entry is actually a map, writing
the value will actually set the value of its '=' entry. The optional color
argument allows attaching color to an entry. Again, this 'does the right
thing' - attaching a color to an entry may convert it from a scalar to a
map, moving its original value to the '=' key, and adding the color as an
additional key/value pair. For example, in Perl:

Start with:

invoice: %
    price: 12.5

Apply w($doc->{invoice}, 'price', 13.5), you get:

invoice: %
    price: 13.5

Apply w($doc->{invoice}, 'price', '#', 'Expensive!'), you get:

invoice: %
    price: #"Expensive!" 13.5

Apply w($doc->{invoice}, 'price', 'currency', '$'), you get:

invoice: %
    price: %
        =: 13.5
        #: "Expensive!"
        currency: $

Etc. I'm toying with the idea that if 'color' is an integer, then the entry
would convert to a list instead of a map, and the value would be written to
the appropriate list entry. This assumes the language used has a strong
enough distinction between 1 and "1" ...
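
The basic behavior of 'w' (leaving out the integer-color idea) might be
sketched in Python as follows. The optional middle argument is handled by
counting arguments, and the sketch assumes the entry already exists; both
are choices made for the example.

```python
def w(container, key, *args):
    """w(container, key, value) or w(container, key, color, value)."""
    if len(args) == 1:
        color, value = None, args[0]
    elif len(args) == 2:
        color, value = args
    else:
        raise TypeError("expected w(container, key, [color], value)")

    entry = container[key]  # assumption: the entry already exists

    if color is None:
        if isinstance(entry, dict):
            entry["="] = value      # entry is a map: set its '=' entry
        else:
            container[key] = value  # plain scalar: overwrite in place
        return

    if not isinstance(entry, dict):
        # Convert the scalar to a map, moving its value to the '=' key.
        entry = {"=": entry}
        container[key] = entry
    entry[color] = value


doc = {"invoice": {"price": 12.5}}
w(doc["invoice"], "price", 13.5)
after_value = doc["invoice"]["price"]
w(doc["invoice"], "price", "#", "Expensive!")
w(doc["invoice"], "price", "currency", "$")
w(doc["invoice"], "price", 14.0)  # now updates the '=' entry
```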

Note that in the general case you can't replace 'v' and 'w' by the built-in
scripting language access mechanism, since these won't "do the right thing".
For example, writing:

$doc->{invoice}->{price}->{'#'} = 'Expensive!';

in the sequence above would not have worked, because 'price' was still a
plain scalar at that point. But you can do it in specific cases: for
example, writing:

$doc->{invoice}->{price}->{currency} = '$';

would have worked, because by then 'price' had already been converted to a
map. A careful programmer may thus minimize the number of calls to 'v' and
'w'.
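
The same distinction shows up in other scripting languages. As an
illustration only, here is the Python analogue; the failure mode (a
TypeError) is specific to Python, and the helper name 'try_direct_write' is
hypothetical.

```python
def try_direct_write(container, key, color, value):
    """Attempt a built-in nested assignment; report whether it worked."""
    try:
        container[key][color] = value
    except TypeError:
        # The entry is a plain scalar, which cannot hold keys.
        return False
    return True


doc = {"invoice": {"price": 13.5}}

# 'price' is still a scalar, so the direct write fails.
scalar_ok = try_direct_write(doc["invoice"], "price", "#", "Expensive!")

# Once 'price' is stored as a map, direct access is fine.
doc["invoice"]["price"] = {"=": 13.5, "#": "Expensive!"}
map_ok = try_direct_write(doc["invoice"], "price", "currency", "$")
```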

Have fun,

    Oren Ben-Kiki