[Yaml-core] YAML spec, complicated

I tried to read the spec again yesterday, but again it made my head
spin.  I don't have a background in tokenizer/parser theory so I 
can't say which parts are necessary and which are not.  I just noticed
a lot of words like "syntax", "serial", "native", "generic", 
"parser", "loader", "dumper", "emitter", "type family", "kind",
"graph", "transfer", "taguri" that are too much to assimilate all
at once.  It leaves me with the feeling -- and this is what I told Steve
-- "Does YAML really need a spec that's 73 pages long?"  If it does,
maybe it's not a "simple" replacement for XML, and maybe it's not the
tool for small jobs.

Now, Python's grammar and parser module (AST objects) also make my head
spin for the same reason.  But fortunately they take up only one small
corner of the documentation and don't prevent me from *using* Python.
They do prevent me from using the parser module to write a program that
lists all a program's dependencies (imports), a project I've tried to do
on and off for years but have always given up on.  (Yes, I could do a
text search for "import" but I want to use Python's real parser.)  But
that's just one insignificant application, not something that prevents
me from using Python.
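
For what it's worth, here is a minimal sketch of that import lister.
It assumes the standard-library `ast` module rather than the `parser`
module mentioned above, and `list_imports` is just a name picked for
this example.  It only sees static import statements, not modules
loaded dynamically at runtime.

```python
import ast


def list_imports(source):
    """Return the sorted, de-duplicated module names imported by `source`."""
    tree = ast.parse(source)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b, c" yields the names "a.b" and "c"
            for alias in node.names:
                modules.add(alias.name)
        elif isinstance(node, ast.ImportFrom):
            # "from pkg import x" yields "pkg"; relative imports keep
            # their leading dots (node.module is None for "from . import x")
            modules.add("." * node.level + (node.module or ""))
    return sorted(modules)


sample = '''
import os, sys
from collections import OrderedDict

def f():
    import json
    # import fake  (a comment, which a plain text search would match)
    text = "import nothing"
'''

print(list_imports(sample))
```

Because the real parser is doing the work, the commented-out import
and the string containing "import" are ignored, which is the
advantage over a text search.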

XML, on the other hand, requires you to learn the gory details of DTDs,
namespaces, XSLT, etc., in order to do anything more than the simplest
file.  You can't even use most public DTDs without a pretty intricate
knowledge, much less add a couple private tags to them, and the
whole thing gives you a sense of "Why does this have to be so
complicated?  Why do I have to specify so many details just because
some huge corporation needs them for their documents?"

YAML has a large gap between the quick reference on the site
(and the other tutorials) and the spec.  The tutorials get you started
but then what?  For instance, the date format.  Four formats, but
no explanation about what it's supposed to be used for, what it's
not supposed to be used for, and why it doesn't handle certain
less-precise formats users would commonly want.  I realize YAML
is a new and rapidly-changing project and that this will come.
Still, these are the kinds of problems users face.

The gap can eventually be filled by a big users' guide so that most
people don't have to look at the spec, -or- by simplifying the spec so
it can double as a users' guide.  For instance, do we need five 
quoting styles for strings (bare, '', "", |, > )?  Or 3+ styles for
single- and multi-line mappings?  I wasn't here when each style was
adopted, and maybe they are all necessary for significant problem
domains, but it's worth considering whether any can be dropped.
That would shorten the spec and the quickref, and perhaps make YAML
more accessible.  At the least, it would make the Perl/Python/Ruby
implementors' job easier, not having to code so many special cases.
Note that I'm not saying we need to do something about strings and
mappings in particular; I'm just using them as examples.

I like Oren's direction in his big message today.  Supporting 
schema-blind and schema-specific usages equally seems wise, and I
like the idea of banishing type conversions to the smallest portion
of code possible.

> | Nobody's goldfish is gonna die if I provide this *optional* method. 
> 
> This is where we differ.  No body is going to die if _why doesn't
> throw an error when he finds duplicate keys in his loader.  

It's possible for the loader to have a default recommended mode that
does whatever, -and- provide options like "extract mappings as pairs",
"all values are strings", or anything we find a significant
constituency for in the future.  The options make certain uses of YAML
possible *without* detracting from the core/recommended usage.  
(Now, maybe Oren's schema proposal will handle all the situations these
options would; we'll see.)
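
To make the idea concrete, here is a rough sketch of a default mode
plus options.  Everything in it is hypothetical: `build_mapping` and
its keyword arguments are names invented for this example, not the API
of any YAML implementation, and making "error on duplicate keys" the
default is only an assumption borrowed from the quoted remark above.
The input is assumed to be key/value pairs a parser has already
produced.

```python
def build_mapping(pairs, as_pairs=False, all_strings=False,
                  on_duplicate="error"):
    """Turn already-parsed (key, value) pairs into a Python object.

    Hypothetical options, mirroring the ones suggested above:
      as_pairs     -- return the pairs as a list instead of a dict
      all_strings  -- convert every key and value to a string
      on_duplicate -- "error" (assumed default) or "last" (last one wins)
    """
    if on_duplicate not in ("error", "last"):
        raise ValueError("on_duplicate must be 'error' or 'last'")

    if all_strings:
        pairs = [(str(key), str(value)) for key, value in pairs]
    else:
        pairs = list(pairs)

    if as_pairs:
        # Order and duplicate keys are both preserved in this mode.
        return pairs

    result = {}
    for key, value in pairs:
        if key in result and on_duplicate == "error":
            raise ValueError("duplicate key: %r" % (key,))
        result[key] = value
    return result


parsed = [("name", "goldfish"), ("age", 3), ("name", "shark")]

print(build_mapping(parsed, as_pairs=True))
print(build_mapping(parsed, all_strings=True, on_duplicate="last"))
```

The point is that the default call, `build_mapping(parsed)`, still
raises on the duplicate key; the options only change behavior for
callers who ask for them.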

Perhaps some comparisons between YAML and Cheetah would be helpful.
I'm a developer for Cheetah, a string template system for Python
(www.cheetahtemplate.org).  I don't work on the parser code (can you
guess why?), but I do other coding and also wrote the users' guide
and developers' guide.  

1) We want Cheetah to be the template system of choice for the widest
variety of tasks.  Its primary purpose is for Webware servlets,
and we won't add anything that detracts from that.  But we've added
features needed by CGI scripts, Python source generators, Java source
generators, and shell scripts running templates as standalone commands.
We don't add features one person needs.  But we add features that
make Cheetah suitable for an entire class of applications it wasn't
before.  Anyway, I see YAML as being able to fulfill a similar role
in the data-storage/data-exchange realm.  Certain core uses, but 
also flexible for other uses on the side.

2) Rather than working from a written grammar and spec, Cheetah's
design was based on a spec inside somebody's head.  Then the users'
guide was written, and I'm now writing an EBNF grammar based on the
users' guide.  Not a suitable approach for a project like YAML that
depends on interoperability, but it's interesting that the two grew in
almost opposite directions.

3) YAML is in rapid change now, anticipating a 1.0 spec freeze.  Cheetah
was in rapid change last summer and then entered a long beta while we
slowly decided whether we're all comfortable with the current spec.
Since there haven't been any requests for backward-incompatible changes
for several months (except one), we're pretty confident about entering a
final beta as soon as a few small tasks are completed.  There are plenty
of requests after that, but they are all behind-the-scenes work or
adding optional features, not things that would change the spec.
It's like popping popcorn, or at least that's been our experience.
Eventually the popping slows down and you can freeze the spec, 
confident most people are comfortable with it.  And if you end up in
a stalemate argument, the BDFL says this is the way it's gonna be,
and the dissenters either take it or leave it.  But usually the best
design emerges on its own.

Thanks to Brian, Clark and Oren for putting all this work into what
might turn out to be a lifesaver for XML refugees.

-- 
-Mike (Iron) Orr, ir...@ms...  (if mail problems: ms...@oz...)
   http://iron.cx/     English * Esperanto * Russkiy * Deutsch * Espan~ol