Thread: RE: [Yaml-core] Re: New Proposals from user feedback.

Hi Guys,

First, the listmail server is going berserk. And the web server doesn't let
me post for some reason. And this is my weekend, so I'm working from home,
and my dialup office server has its own strange notions about this. So I'm
going to be out of it for the next few days. (Brian, it seems our sharing
views has caused us to share connectivity problems. A conspiracy by
gremlins, no doubt :-)

At any rate, I like your indicator-less syntax. I'll have to consider the
details, though. Using '=' as an indicator for simple scalar values is nice.
We'll have to require it for multi-line quoted strings as well, to allow a
map whose first key is quoted.

As to the '---' separator issue. I think that Clark got it right in
suggesting the top level production would be either a simple map (just one)
or a simple list (just one). Log files etc. will just use the list
production:

: %
    first map...
: %
    second map...

For that matter, we could also allow a scalar as the top level value, now
that we have the indicator to disambiguate it. In fact, we could use the
exact same auto-detection algorithm for the whole file as we'll use for a
value within the file:

File/value starts with '@' or ':' -> list.
File/value starts with '=' or '|' -> scalar.
File/value starts with '%' or anything else -> map.

(Note that a file/value starting with '"' would be taken to mean a map whose
first key is quoted).
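
A minimal Python sketch of this auto-detection rule. The function name
`detect_kind` is hypothetical, and skipping leading whitespace before
looking at the first character is an assumption, not something the rule
above specifies.

```python
def detect_kind(text: str) -> str:
    """Classify a whole file or a single value by its first character."""
    # Assumption: leading whitespace is ignored before inspecting the text.
    first = text.lstrip()[:1]
    if first in ("@", ":"):
        return "list"
    if first in ("=", "|"):
        return "scalar"
    # '%', a quote character, or anything else (including empty input)
    # is treated as a map.
    return "map"
```

With this, a log file beginning with ': %' is detected as a list, while
text beginning with '"' falls through to the map case, matching the note
about a quoted first key.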

Thus, Brian's and Clark's proposals are really just two sides of the same
coin. Neat!

As to the data model issue. Thanks Joe for writing it up. We'll have to
modify the "info model" section of the spec on similar lines. As for the
details...

I'm going to play the "purist" here. What I want is for the *core* info
model to be a *tree* of map, list, null and text scalar. And that's all. No
integers, reals, types, classes, binaries, id-based-references,
key-based-references, YPath-based-references, URL-based-references, or
anything else, no matter how vital it seems at first glance.

This info model is immensely useful (e.g., conf files, log files, database
record dumps, simple messaging). Yes, it isn't enough for some applications
(especially native data serialization). The answer is not to add a selection
of the above to the core. Not even one. Especially not "types", because that
would kill the 1-1 load/save API into native data structures.
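
To make the restriction concrete, here is a rough Python check that a
native structure stays inside the core model of map, list, null and text
scalar. The name `is_core_node` is hypothetical, and requiring map keys
to be text is an assumption. The sketch also assumes the input really is
a tree; it does not detect cycles or shared nodes.

```python
def is_core_node(node) -> bool:
    """True if node is built only from map, list, null and text scalar."""
    if node is None or isinstance(node, str):
        return True                      # null or text scalar
    if isinstance(node, list):
        return all(is_core_node(item) for item in node)
    if isinstance(node, dict):
        # Assumption: map keys are text scalars.
        return all(isinstance(key, str) and is_core_node(value)
                   for key, value in node.items())
    return False                         # integers, reals, objects, ...
```

Under this check {"a": ["x", None]} is core data, while {"a": 1} is not:
the integer would have to be carried as the text "1" and interpreted by
a higher layer.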

Instead, we'll ensure the YAML APIs will allow for a *graph* whose nodes
include "native data object" as a valid node. Note that this node may be a
non-leaf node; references and cycles are allowed, etc.

How is it possible to reconcile these two opposing statements? The answer is
(star wars music) "Trust the Color, Luke!". The color idiom is built on two
concepts. One, the ability to put on "colored glasses" and see only keys of
a certain "color". Two, the ability to easily and efficiently construct an
application by layering different modules. These two reinforce each other.

(Think of each "color" as a regexp pattern. If the key matches the pattern,
it has this color. In XML, for example, a color would typically be
'some-namespace-prefix:.*' - let's not delve any deeper there :-).
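
As an illustration of the "colored glasses" idea, a small Python sketch
that shows only the keys of a map matching a color pattern. The function
name `colored_view` and the sample 'xsl:' prefix are made up for the
example, and matching the whole key (rather than a prefix) is an
assumption.

```python
import re

def colored_view(mapping: dict, color: str) -> dict:
    """Return only the entries whose key has the given color (a regexp)."""
    pattern = re.compile(color)
    # Assumption: a key has the color only if the whole key matches.
    return {key: value for key, value in mapping.items()
            if pattern.fullmatch(key)}

record = {"name": "invoice", "xsl:output": "text", "xsl:indent": "yes"}
xsl_only = colored_view(record, r"xsl:.*")
```

Each module of a layered application can apply its own pattern and
simply never see the keys that belong to other layers.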

The core YAML layer (parser/printer) is as trivial as you can get - just
handling a tree of map/list/string.

A different layer would handle mapping from this core data model to native
objects. This layer could use specially colored keys to control this
mapping: which native type to use, how to parse the value, how to identify
references, etc. It could also use patterns on the values ("this looks like
an integer"). It could use schema information ('delivery' is expected to be
of type 'date'). It could use some combination of the above.
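
A hedged Python sketch of such a mapping layer, combining the three
sources of information mentioned above. Everything here is hypothetical:
the `to_native` name, the `CONVERTERS` table, and in particular the
priority order (explicit type tag first, then schema, then value
pattern), which is just one plausible choice.

```python
import re
from datetime import date

INT_PATTERN = re.compile(r"[+-]?\d+")

# Hypothetical table from type name to a converter for the text value.
CONVERTERS = {"int": int, "date": date.fromisoformat, "str": str}

def to_native(key, text, type_tag=None, schema=None):
    """Convert a core text scalar to a native value."""
    # 1. An explicit type tag (e.g. taken from a colored key) wins;
    # 2. otherwise the schema entry for this key, if any, is used.
    type_name = type_tag or (schema or {}).get(key)
    if type_name is not None:
        return CONVERTERS[type_name](text)
    # 3. Otherwise fall back to a pattern on the value itself.
    if INT_PATTERN.fullmatch(text):
        return int(text)
    return text  # leave it as a text scalar
```

For instance, with schema={"delivery": "date"} the value of 'delivery'
is parsed as a date, while an untagged "42" is picked up by the integer
pattern. None of this touches the core parser, which still sees only
text.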

Since this layer is not part of the core, I don't care much. So if Clark's
application needs the concept of reference-by-key, Joe's needs
reference-by-path, and Brian's needs reference-by-anchor, there's no problem
involved. We can all do it without messing up our simple, clean YAML core.
In fact, it may be possible to mix and match their implementations in one
application, as long as they are well behaved.

How would Brian write YAML.pm, then? He needs references and classes. Would
his implementation be "Brian's YAML variant", then?

Well, no. First, he'll have to separate it to a different layer somehow (at
minimum make it optional). Second, to make it interoperable with other
implementations, I suggest we define a dictionary of a language neutral set
of colored keys and values, one which would allow a simple way of
interchanging most useful data types between multiple languages. This would
include a basic way to handle references (what we have today), and a set of
common basic types.

Make this an accompanying spec to the core spec itself (YAML-DATA-1.0?). Or
bundle it with the core spec (as an appendix?), as long as it is
"sufficiently separate" so we can create a new version for it without
touching the core itself. We will need these future versions! Surely we'll
want to have interoperable reference-by-url one day, right?

Where does all this affect the YAML core itself? In three ways:

- The YAML APIs should be defined so that they will work for a graph
containing native data structures. This means storing all the state of a
visitor/iterator in the visiting/iteration context as opposed to within the
nodes themselves. It should be possible to apply a YAML visitor/iterator to
any native data structure. This is achievable in Perl, Python, JavaScript
and even in Java (with more effort). In C++ etc. we'll have to require the
native data structure to cooperate somewhat - implement some interface etc.
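
A minimal Python sketch of the first point, assuming plain dicts and
lists as the native structures. The `walk` function and its `_seen`
parameter are hypothetical. The point it illustrates is that the record
of visited nodes lives in the traversal context, so the nodes themselves
need no extra fields and cyclic graphs still terminate.

```python
def walk(node, visit, _seen=None):
    """Depth-first walk; traversal state is kept in _seen, not in nodes."""
    seen = set() if _seen is None else _seen
    if isinstance(node, (dict, list)):
        if id(node) in seen:
            return              # shared reference or cycle: already visited
        seen.add(id(node))
    visit(node)
    if isinstance(node, dict):
        for value in node.values():
            walk(value, visit, seen)
    elif isinstance(node, list):
        for item in node:
            walk(item, visit, seen)

# A list that contains itself is walked once and then skipped.
cyclic = ["x"]
cyclic.append(cyclic)
visited = []
walk(cyclic, lambda n: visited.append(type(n).__name__))
```

Because nothing is written into the nodes, the same walker can be run
over any existing native structure, and two walks can proceed over the
same graph without interfering with each other.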

- We need to define the color pattern for the mapping keys. We can define
just one such pattern (today: single character indicator keys), or we can
delve into the issue of how to solve the general problem of managing
globally unique application specific colors (or ids in general).

- We need to have a way to make YAML files readable even when colored keys
are used. At minimum, we have to ensure they are readable when the mapping
layer's colored keys are used. This is too basic a use case to leave it to
the full map syntax. We may also choose to try to provide a general
mechanism for more readable application specific colored keys.

The first issue (API) is technically difficult. However, it is certainly
possible to solve. Also, it has no bearing whatsoever on the core YAML
format spec (it would affect the core API spec).

My proposal: Let's table it for a while. This doesn't affect Brian or anyone
working on the high-level load/save API.

The second issue (Color patterns scheme) is politically difficult - or at
least the general problem is.

Historically, XML has chosen a horribly complex way to do it
(document-defined mapping from prefixes to URIs). Java has chosen a simple
but verbose way to do it (reverse DNS strings); a shorthand mechanism
('import ...') battles verbosity. IANA has suggested a simple, terse but
centralized way to do it (central registry of universal color patterns).

My proposal: Let's keep the "single indicator character" pattern reserved
for this. Define a set of such keys in DATA-1.0: '!' for type, '#' for
comment, and '&' for anchor; reserve some for future use; and reserve some
for application-specific use.
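
A small Python sketch of how a mapping layer might separate such
indicator keys from ordinary data keys. The names `INDICATORS` and
`split_special` are hypothetical, and treating every one-character,
non-alphanumeric key as "special" is an assumption about what the
single-indicator-character pattern would cover.

```python
# The three meanings proposed above for DATA-1.0.
INDICATORS = {"!": "type", "#": "comment", "&": "anchor"}

def split_special(mapping: dict):
    """Split a map into (special indicator keys, plain data keys)."""
    special, plain = {}, {}
    for key, value in mapping.items():
        # Assumption: a special key is exactly one non-alphanumeric,
        # non-space character; everything else is ordinary data.
        is_special = (len(key) == 1 and not key.isalnum()
                      and not key.isspace())
        (special if is_special else plain)[key] = value
    return special, plain

special, plain = split_special(
    {"!": "date", "#": "due date", "value": "2001-05-20", "x": "1"})
```

Here '!' and '#' end up in the special part, while 'value' and the
one-letter key 'x' remain plain data.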

As for the general problem, let's (gulp) just ignore it. This issue has been
thrashed to death in XML-DEV and SML-DEV. Clark and I have seen many ideas to
solve this problem, which nobody was happy with. I think simply ignoring it
is the 80/20 benefit/cost point. And I really don't want to reconstruct all
the arguments. Just trust us on this :-)

The third issue (readability) is a matter of taste. That may be the most
difficult issue of all :-) The problem here is the conflict between a
special, terse syntax for more readable files and a verbose but simple
syntax for more intuitive files.

My proposal: use the %(<key><value> ...) shorthand for the color keys. The
'%' makes it clear there's a map involved (or, for a complete newbie, that
something fishy is going on). It is short, extendible, but not *too*
extendible.

As for the general problem: I don't think we should do anything other than
reserve some special keys for application specific use. As Joe pointed out,
the use of the shorthand weakens the simplicity of the files, so it had
better be reserved for sufficiently painful cases. Restricting the set of
special keys ensures it will be so.

So, I propose:

- We review the information model - make it more formal and support only
map/list/string/null.

- We unify Clark and Brian's proposals for auto-detection of a value,
including the top-level value.

- We put the %(...) shorthand format in the core spec, but without
describing the semantics of any special keys.

- We get an extremely stable YAML-CORE-1.0 spec.

- We create a YAML-DATA-1.0 spec describing # ! & (comment, type and
reference).

- We start working on YAML-API-1.0 with reference implementations.

Have fun,

    Oren Ben-Kiki
