Over the next 20 days (until the end of October) I'm going to be doing
an "incremental rewrite" of the information model to reflect recent
clarifications of understanding, and to make the section more
palatable to your average implementer; i.e., to reflect a better
understanding of the target reader. I need your help reviewing/commenting
along the way...
I spent a bit of time reviewing the spec this morning, and I must say,
Oren has done a fantastic job keeping all of the productions clean. One
thing which struck me hard is that we make *extensive* use of examples;
and without the examples, the productions would be very hard (impossible?)
to follow. The information model section, by stark contrast, shies away
from examples, and I feel that this is its biggest weakness. As an aside, if we
were to 'touch up' the production section, what would be helpful is
more 'motivation' discussion as to why the productions are broken up
as they are. I think we have deviated significantly from prior art
in our productions, in that we have broken them down less along the
actual syntax lines and more along logical, functional grounds. This
has made the production section easy to follow; a bit more talking about
why the productions are broken up as they are would probably help the
reader... as my brain starts to hurt about 1/2 way down the productions.
Anyway, I didn't mean to talk about that. This email is about the
information model section.
Ok. Reading over the information model section there are a few things
which stand out:
1. It seems to be a random set of formalisms; that is, it fails
to motivate the implementer to understand *why* the model is
there, and more importantly, how the model can help them.
We have a very nice "Design" section, and it should build
from there. In other words, our audience is the YAML
implementer and the model section should talk clearly
to the implementer about their practical concerns, backing
up the recommendations with the necessary formalism.
2. It lacks specific examples. It talks about what each model
should be, but does not describe specific examples of how
one would comply, or, not comply with the model.
3. It lacks a listing of the three core challenges which confront
a YAML implementation: (a) human presentation,
(b) incremental delivery, and (c) native typing.
Specifically, it needs a paragraph or so discussing the
various aspects of each challenge, and the recommended
(read: interoperable) methods for dealing with them.
4. The break-down of the graph vs native aspect is not quite
clear; but, on a second reading isn't anywhere near as bad
as I had imagined. In particular, what it requires is
a better diagram; one like:
        ->(parser)->          ->(loader)->        NATIVE NODES /
       /            \        /            \       TYPE-AWARE
      /              \      /              \           .
 SYNTAX               SERIAL                GRAPH      or
      \              /      \              /           .
       \            /        \            /       GENERIC NODES /
        <-(emitter)<-         <-(dumper)<-        TYPE-IGNORANT

 HUMAN                INCREMENTAL           NATIVE
 PRESENTATION         DELIVERY              TYPING
The nice thing about this diagram is that it shows the
loader as addressing two challenges: (a) converting
incremental delivery into a complete target, and (b)
applying native typing where the typing is known.
An example of the "native graph" for a "C" implementation
could be a system where strings are used for scalars, but
where the type-aware nodes are converted to a canonical
form so that strong equality holds. The example "C"
implementation can also note that it may not convert
some tags to canonical form, so for those nodes
only weak equality is available.
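To make the "strings plus canonical form" idea concrete, here is a rough sketch of what such a loader step might look like (in Python rather than C, just to keep it short). The tag names, the CANONICALIZERS table, and load_scalar are all hypothetical, made up for illustration; nothing here is from the spec.

```python
# Hypothetical table of tags this loader knows how to canonicalize.
# The tag names and rules are illustrative assumptions only.
CANONICALIZERS = {
    "!int": lambda text: str(int(text, 10)),
}


def load_scalar(tag, text):
    """Return (tag, value, type_aware) for one scalar.

    Known tags: value is the canonical string, type_aware is True.
    Unknown tags: value is the original string, type_aware is False,
    so only weak equality is available for that node.
    """
    canon = CANONICALIZERS.get(tag)
    if canon is None:
        return (tag, text, False)
    return (tag, canon(text), True)


a = load_scalar("!int", "+12")
b = load_scalar("!int", "012")
c = load_scalar("!date", "2001-10-11")

print(a == b)   # both canonicalize to "12", so plain comparison works
print(c)        # left as-is; this loader knows nothing about !date
```

The point is only that once known tags are canonicalized, strong equality reduces to an ordinary string comparison, while nodes with unknown tags are carried through untouched.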
Anyway. I will probably only have an hour or so per day to
work on this over the next month; this is quite a bit of
work, and I need feedback at all points of the way. Every
few days I'll post an update to the list, and collect
comments... your feedback would be useful.
Oren, could you "update" the rest of the spec to handle
the change from typefamily/format to tag? This would be
very cool; and this gives us a clear dividing line.
My "outline" for the model section is now:
- motivating why we do models (as a shared vocabulary)
- gives a pretty diagram (Brian's will work)
- a very quick outline of the models (syntax -> serial -> graph)
- discusses how there are actually two graph models: strong and weak
- perhaps gives a "stacked" diagram showing how each model
builds onto the previous model, making it more "syntaxish"
- human presentation
- incremental delivery
- native typing
- getting information to flow nicely between stages
4.3 Strong (Native) Graph
- an idealized model where all types are known by the loader
- quick overview of what this models: native objects or
a generic model using canonical forms, or cave drawings
- defines a scalar node as an opaque value with a mandatory tag
- defines each scalar tag as having a canonical form
- defines strong equality of scalars based on canonical form
- defines strong equality of collections recursively
- explain that this model represents how YAML views native
systems or canonical representations of YAML information.
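As a strawman for the strong graph, the definitions above might be sketched like this. The class names, the tag strings, and the choice to treat a tag mismatch as "not equal" are my assumptions for illustration; the sketch also assumes keys are unique within a mapping and that the graph has no cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    tag: str          # mandatory in the strong model
    canonical: str    # canonical form of the value for this tag


@dataclass(frozen=True)
class Sequence:
    tag: str
    items: tuple      # ordered child nodes


@dataclass(frozen=True)
class Mapping:
    tag: str
    pairs: tuple      # (key, value) node pairs; order is not significant


def strong_equal(a, b):
    """Strong equality: same kind, same tag, recursively equal content."""
    if type(a) is not type(b) or a.tag != b.tag:
        return False
    if isinstance(a, Scalar):
        return a.canonical == b.canonical
    if isinstance(a, Sequence):
        return len(a.items) == len(b.items) and all(
            strong_equal(x, y) for x, y in zip(a.items, b.items)
        )
    # Mapping: same size, and every pair in a has a matching pair in b.
    if len(a.pairs) != len(b.pairs):
        return False
    return all(
        any(strong_equal(ka, kb) and strong_equal(va, vb) for kb, vb in b.pairs)
        for ka, va in a.pairs
    )


one = Scalar("!int", "1")
m1 = Mapping("!map", ((Scalar("!str", "a"), one),
                      (Scalar("!str", "b"), Sequence("!seq", (one,)))))
m2 = Mapping("!map", ((Scalar("!str", "b"), Sequence("!seq", (one,))),
                      (Scalar("!str", "a"), one)))
print(strong_equal(m1, m2))   # key order does not matter in the graph
```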
4.3 Weak (Generic) Graph
- a more realistic model where one or more types are not
known by the loader; so we have weak equality for some
subset of the nodes.
- quick overview discussing how not every native binding
will know about every scalar tag, and the problems caused
- defines a scalar node as a string value with an optional tag
- defines weak equality as a yes, no, unknown ternary operator
- specifies how YAML texts which comply with the weak graph may
fail to load into the strong graph: (a) keys in a mapping whose
weak equality is unknown, but which turn out to be equivalent
under strong equality, (b) scalar values which have a specified
tag, but where the string provided does not match the tag's
expectations, for example --- !int X
- explain how this model represents what would be exposed
via a random access (DOM-like) viewer API.
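For the weak graph, the ternary equality might look roughly like the sketch below, using None for "unknown". WeakScalar, KNOWN_TAGS, and the rule that differing tags compare as "no" are assumptions of mine for discussion, not something the spec currently says.

```python
from typing import Callable, Dict, NamedTuple, Optional


class WeakScalar(NamedTuple):
    text: str                   # the string value as presented
    tag: Optional[str] = None   # optional in the weak model


# Hypothetical set of tags this particular binding understands.
KNOWN_TAGS: Dict[str, Callable[[str], str]] = {
    "!int": lambda text: str(int(text, 10)),
}


def weak_equal(a: WeakScalar, b: WeakScalar) -> Optional[bool]:
    """Return True (yes), False (no), or None (unknown)."""
    if a.tag != b.tag:
        return False            # assumption: different tags never match
    canon = KNOWN_TAGS.get(a.tag)
    if canon is not None:
        # Known tag: compare canonical forms. Text that does not fit the
        # tag (as in "--- !int X") raises ValueError at this point.
        return canon(a.text) == canon(b.text)
    if a.text == b.text:
        return True             # identical text under the same tag
    return None                 # unknown tag, different text: cannot tell


print(weak_equal(WeakScalar("012", "!int"), WeakScalar("12", "!int")))
print(weak_equal(WeakScalar("1e3", "!x"), WeakScalar("1000", "!x")))
```

The second call is the interesting one: a mapping with both of those as keys is fine in the weak graph, but could turn out to contain duplicate keys once a loader that understands the tag gets hold of it.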
4.4 Serial (Tree) Model
- quick overview how this model adds only those elements
necessary to break a graph (either weak or strong) into
a sequential pipe
- introduces alias/anchor
- introduces key order
- introduces scalar "chunking"
- mentions that this model specifies the "minimum"
requirements for a push/pull parser or emitter API.
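To show what "breaking a graph into a sequential pipe" means in practice, here is a small sketch that walks a graph built from plain Python dicts, lists, and strings and produces a flat event list. The event names and the anchor naming scheme are invented for the example, only collections are anchored, and scalar chunking is left out.

```python
def serialize(root):
    """Flatten a graph of dicts/lists/scalars into a list of events."""
    seen, shared = set(), set()

    def scan(node):
        # First pass: find collections reachable more than once.
        if isinstance(node, (dict, list)):
            if id(node) in seen:
                shared.add(id(node))
                return
            seen.add(id(node))
            if isinstance(node, list):
                children = node
            else:
                children = [x for pair in node.items() for x in pair]
            for child in children:
                scan(child)

    scan(root)
    anchors, events = {}, []

    def emit(node):
        # Second pass: emit events, anchoring shared nodes on first visit
        # and emitting an alias on every later visit.
        if not isinstance(node, (dict, list)):
            events.append(("scalar", str(node)))
            return
        if id(node) in anchors:
            events.append(("alias", anchors[id(node)]))
            return
        anchor = None
        if id(node) in shared:
            anchor = anchors[id(node)] = f"a{len(anchors) + 1}"
        if isinstance(node, list):
            events.append(("seq-start", anchor))
            for child in node:
                emit(child)
            events.append(("seq-end",))
        else:
            events.append(("map-start", anchor))
            for key, value in node.items():   # key order is fixed here
                emit(key)
                emit(value)
            events.append(("map-end",))

    emit(root)
    return events


both = ["x"]
for event in serialize({"first": both, "second": both}):
    print(event)
```

Note that the anchor/alias pair and the key order exist only in the event stream; the graph itself just has one list referenced twice from an unordered mapping.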
4.5 Text (Syntax) Model
- quick overview how this model adds all of the YAML goodies
that make human presentation nice
- add styles
- add comments
- add directive
- add details (new line endings, indentation, etc.)
- mentions that this model specifies the requirements for
a lexer API that preserves all aspects of a YAML text.
Comments, of course, are very welcome.