Thread: RE: [Yaml-core] High level feedback on YAML

Joe Lapp [mailto:jo...@bu...] wrote:
> Hi YAMLers!
> 
> I've had some time to play with YAML and to digest the 
> details of the spec.  I thought I'd share the thoughts I'm 
> having, some of which you'll like, some of which you won't.  
> First, here's the good:
> ...
> Okay, that's the good.  There's plenty of it.  Now for the 
> bad.  I'll be more detailed with the bad since I'm expecting 
> less agreement:
> 
> (B1) Proliferation of cryptic identifiers.  It's hard or even 
> impossible to guess the meaning of a YAML file without first 
> memorizing the identifiers, and there are more identifiers 
> than I'm comfortable memorizing.  I also have to memorize the 
> placement syntax of each identifier.  I recognize that many 
> other data formats also have this problem, but I still don't like it.

Do you mean the keys (what would be, in XML, the tag names)? I
agree it is an issue, but I don't think humanity has ever found
a solution to that. Each "domain" tends to invent its own mini
language. It is an information-theory thing; when both sides
share a common data base, information may be more efficiently
passed around and processed by referring to it instead of spelling
it out. Which means an outsider, lacking the common base, has a
hard time entering the domain.

Consider HTML - how would it look if you had to write:

<paragraph>....</paragraph>
....<hard-line-break/>
<keep-in-one-line>.....</keep-in-one-line>

Ugh. Sure,

<p>...</p>
....<br/>
<nobr>....</nobr>

is cryptic. That seems to be life...

> (B2) Lack of extensibility.  Many syntactic features seem ad 
> hoc, such as the top-most map separator, shorthand, block 
> format indicators, anchors, and references.  I'm not saying 
> that the functionality seems ad hoc, just that the syntax 
> seems ad hoc.  How long can we grow the language in this way 
> before it becomes impossibly complex?  In order to enrich the 
> language during its initial development, and in order to provide 
> some of the inevitable features to come, YAML needs some 
> inherent mechanism for extensibility.  

I don't know whether I want YAML to be extensible. Supposing we
settle the serialization issue somehow, I think that anything
else should be layered on top of YAML-CORE, not "extended" in
the core itself.

The current shorthand mechanism is an escape hatch allowing us
to extend YAML to some extent without quite having to touch the
core; it might be wise to leave it (or something like it) in
for that reason.

> (B3) Inflexible formatting.  Being a file intended for user 
> production and consumption, formatting is critical.  The user 
> should be able to choose the format that is easiest to enter, 
> should data-entry have the highest priority, or the format 
> that is easiest to read, should readability have the highest 
> priority.  It should be possible to format data in the strict 
> structured way YAML describes, and it should be possible to 
> run a file through a pretty-printer to force this format, but 
> it should not be forced.  While enforcement keeps users in 
> line and guarantees that the data is readable, I think user 
> convenience should take priority.  (Note that flexible 
> formatting wouldn't nullify (G4); it would just make (G4) optional.)

Python allows a more flexible whitespace-based indentation scheme,
and so did YAML at first. We gave it up because strict indentation
allowed a drastic simplification of the syntax for all types of
scalars, while increasing their expressive power.

It was thought that any modern editor should be able to allow you
to easily enter valid YAML, even given the strict indentation.
However, you say:

> (B4) Limits editors.  I'm finding that creating and editing 
> YAML is a pain-in-the-butt unless you're using a suitable 
> editor.  Early last week I reviewed a number of simple 
> Windows editors and finally settled on one I liked.  This 
> week I'm finding that it's too hard to edit YAML with this 
> editor.  Because it's very hard to get users to change 
> editors, those using a non-YAML-friendly editor will probably 
> limit their use of YAML.

Can you be more specific about the difficulties you faced? I
(naively) expected that any modern editor would have a user-
tunable automatic indentation, and that this should be enough
for convenient YAML entry. Wrong?

> (B5) Serialization semantics in the core.  I think it's 
> reasonable to build serialization semantics into a syntax 
> that is intended exclusively for serialization, but I don't 
> think its baggage should weigh those who won't use 
> serialization.  While a YAML processor need not implement 
> serialization, it still needs to recognize the associated 
> syntax, and users reading the spec are still forced to wade 
> through serialization.  Besides, if serialization is just one 
> of many applications of YAML, shouldn't serialization be 
> layered on top of YAML, like the other applications?  If the 
> problem is that there is otherwise no place to put 
> serialization information, maybe we need to ignore 
> serialization and focus on extensibility.  (BTW, I think I 
> can argue pretty strongly that anchors and references are 
> neither needed nor desirable in the YAML core.)

Oh boy. One at a time:

- Serialization is not in the core. What is in the core is
an alternative syntax form which is useful for serialization
(as well as other things). "By convention" certain keys of
that form are used for serialization, others for comments, etc.

- Extensibility: the current shorthand syntax does allow some
extensibility. Clark is making a case we should completely drop
extensibility (I don't know if he's looking at it this way).
We need to think about what we mean by "extensibility", anyway.
Is it adding different syntax forms (e.g., more scalar formats)?
Adding something like namespaces? XLink?

- References and anchors are vital for representing general
graphs. We consider these important enough to support. Now,
it is an interesting notion to treat anchor and reference
syntax as a form of shorthand with a layer converting them
to native references...

one: [&12] data
two: [*12]

I don't know. References are as basic a data type in most
languages as are map, list and scalar. There's something
to be said for directly supporting them. On the other
hand, there's something to being able to manipulate the
anchors and references using normal map access operations.
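To make the "layer on top" idea concrete, here is a minimal sketch in Python. It assumes the document has already been loaded into a flat map of strings, and that anchors and references appear as the `[&12]` and `[*12]` markers from the example above. The name `resolve_references` and the two regular expressions are hypothetical, made up for illustration; nothing here is part of any YAML spec or library.

```python
import re

# Hypothetical marker formats, taken from the example above:
#   "[&12] data"  anchors the value "data" under the name "12"
#   "[*12]"       refers back to whatever was anchored as "12"
ANCHOR = re.compile(r"^\[&(\w+)\]\s*(.*)$")
REFERENCE = re.compile(r"^\[\*(\w+)\]$")


def resolve_references(mapping):
    """Return a copy of a flat map with anchor/reference markers resolved."""
    anchors = {}
    resolved = {}

    # First pass: record every anchored value and strip its marker.
    for key, value in mapping.items():
        match = ANCHOR.match(value) if isinstance(value, str) else None
        if match:
            name, data = match.groups()
            anchors[name] = data
            resolved[key] = data
        else:
            resolved[key] = value

    # Second pass: replace each reference marker with the anchored object.
    # An unknown anchor name raises KeyError.
    for key, value in resolved.items():
        match = REFERENCE.match(value) if isinstance(value, str) else None
        if match:
            resolved[key] = anchors[match.group(1)]

    return resolved


doc = {"one": "[&12] data", "two": "[*12]"}
result = resolve_references(doc)
```

After the call, `result["one"]` and `result["two"]` are the same Python object, which is the "native reference" the layer is supposed to produce. The trade-off is visible too: before resolution the markers are ordinary map values that any map operation can touch, and after resolution they are gone.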

> (B6) Lack of comments.  I think any file format intended for 
> user viewing and/or editing requires a clear and flexible 
> means for comments.  YAML allows me to add comments, but only 
> in certain places, but even so the fact that it is a comment 
> would not be universally recognized.  I realize that adding 
> comments could jeopardize (G5), though I'm not convinced it must.

OK, you got your wish. YAML allows comments just in certain places:
attached to maps. Simply use '#' as a key and type your comment in.
Problem solved? :-)
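As a rough illustration of what that convention means for an application, the sketch below assumes the YAML has already been loaded into nested Python dicts and lists, and that a comment is simply the value stored under the key '#'. The function name `strip_comments` is hypothetical; it only shows how a consumer that does not care about comments could drop them.

```python
def strip_comments(node):
    """Return a copy of a loaded structure with every '#' key removed."""
    if isinstance(node, dict):
        # Drop the comment key, recurse into everything else.
        return {key: strip_comments(value)
                for key, value in node.items() if key != "#"}
    if isinstance(node, list):
        return [strip_comments(item) for item in node]
    return node  # scalars pass through unchanged


config = {
    "#": "connection settings",
    "host": "example.org",
    "ports": [{"#": "primary", "number": 80}],
}
clean = strip_comments(config)
```

Because the comment is ordinary map data, it survives a load/dump round trip unless something like this removes it, and it can only be attached where a map key is allowed.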

> So there you go.  I realize that opinions on these matters 
> will vary.  But still, because I find the good very good and 
> the bad very bad, I find myself in a love/hate relationship with YAML.

Life is full of compromises...

Have Fun,

	Oren Ben-Kiki
