Information Model
An information model is an abstraction of how
each processing system must treat YAML data.
In particular the model describes which chunks
of data are stored and the relationship between
those chunks. Implicitly, these relationships
are invariants which must be maintained for
YAML data to be consistent (round-trip).
An information model is useful because it codifies
expected behavior of processing systems. Those
which do not preserve all of the datums mentioned
by the information model or fail to preserve the
relationships between those datums are non-compliant.
For example, we have said that our "map" structure
is not ordered. Thus, the YAML documents below are
the same in the information model.
---
a: one
b: two
---
b: two
a: one
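To make the equivalence concrete, here is a minimal Python sketch.
Python is my choice for illustration, and the two dictionaries are
hand-built stand-ins for whatever a loader might produce from the
documents above. A dict compares by content rather than by insertion
order, which mirrors the unordered "map":

```python
# Hand-built stand-ins for the two documents above; a real loader is
# only assumed to produce something similar.
doc_one = {"a": "one", "b": "two"}
doc_two = {"b": "two", "a": "one"}

# Dict equality ignores insertion order, matching the unordered "map".
same_in_model = doc_one == doc_two

# The key order is still observable in memory, but the information
# model gives no guarantee that it survives a round-trip.
order_differs = list(doc_one) != list(doc_two)
```

Here same_in_model is true even though order_differs is also true.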
What this means is that if a process is using YAML, it
cannot expect to encode semantic information which
will be preserved by YAML in this ordering. For another
example, in YAML the *style* of the scalar is not in
the information model. Thus, the items in this list
are all equivalent:
---
- This is a scalar
- |-
  This is a scalar
- \
  This is a scalar
What this means is that if a process is using YAML, it
should not expect to encode valuable information by
using different styles. After all, this document may
go into a visual editor (that is YAML aware) and be
written out as...
---
- "This is a scalar"
- "This is a scalar"
- "This is a scalar"
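One way to picture this is a sketch in Python, assuming a hypothetical
Scalar node whose style field is kept out of comparisons. The class
name and the style labels ("plain", "block", "escaped", "quoted") are
mine, chosen only for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Scalar:
    """Hypothetical scalar node: `value` is in the model, `style` is not."""
    value: str
    # compare=False keeps the style out of ==, so it is presentation only.
    style: str = field(default="plain", compare=False)


items = [
    Scalar("This is a scalar", style="plain"),
    Scalar("This is a scalar", style="block"),
    Scalar("This is a scalar", style="escaped"),
]

# All three entries are the same scalar as far as the model is concerned.
all_equivalent = all(item == items[0] for item in items)


def rewrite_quoted(nodes):
    """Mimic an editor that writes every scalar back out double-quoted."""
    return [Scalar(node.value, style="quoted") for node in nodes]


# Nothing in the model changes when the editor rewrites the styles.
survives_rewrite = rewrite_quoted(items) == items
```

Any meaning a process tried to attach to the style would be lost after
rewrite_quoted, yet the lists still compare equal.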
For another example, in the Alpha (our current)
information model, there is a distinction drawn
between a map and a sequence, even if the map and
sequence constructs both have an explicit class
(say an array). Thus, the two documents below
are different.
--- !array
0: one
1: two
--- !array
- one
- two
What this means is that a valid YAML process *must*
preserve this difference. If a YAML implementation
reads both of these documents into memory, it must
somehow record that the first is a "map" and the
second is a "sequence" in such a way that the user
can know which is which. And finally, it must be
able to preserve the distinction when it serializes
the structures above...
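As a rough sketch of what that requirement implies, assume a
hypothetical in-memory Node that records its kind next to the explicit
class. The Node type and the serialize helper are illustrative names,
not part of any YAML implementation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical in-memory node for the Alpha model."""
    tag: str        # explicit class, e.g. "!array"
    kind: str       # "map" or "sequence"; part of the Alpha model
    entries: tuple  # (key, value) pairs for a map, values for a sequence


as_map = Node("!array", "map", ((0, "one"), (1, "two")))
as_seq = Node("!array", "sequence", ("one", "two"))


def serialize(node):
    """Write the node back out in the same kind it was read in."""
    lines = ["--- " + node.tag]
    if node.kind == "map":
        lines += [f"{key}: {value}" for key, value in node.entries]
    else:
        lines += [f"- {value}" for value in node.entries]
    return "\n".join(lines)
```

Because kind takes part in equality, as_map and as_seq are different
nodes, and serialize writes each back in its original form.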
In the Beta information model, the sequence/map
distinction is not in the information model; instead
it is replaced by a "branch" construct. In this way,
the sequence construct is a short-hand used when
the map entries are 0,1,2,... Thus, if we use the
Beta information model, a processor doesn't have
to remember which syntax style was used, and could
output the two documents above as... (map version
discarded).
--- !array
- one
- two
--- !array
- one
- two
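A sketch of the Beta idea in Python, assuming a "branch" can be
modelled as a tag plus a plain dict. The helper names to_branch and
serialize_branch are hypothetical:

```python
def to_branch(tag, content):
    """Normalize a sequence or a map into a hypothetical "branch".

    A sequence is treated as short-hand for a map keyed 0, 1, 2, ...
    """
    if isinstance(content, (list, tuple)):
        content = dict(enumerate(content))
    return (tag, dict(content))


def serialize_branch(branch):
    """Write a branch, using the short-hand whenever it applies."""
    tag, entries = branch
    # Assumes the keys are mutually comparable (all ints or all strings).
    keys = sorted(entries)
    lines = ["--- " + tag]
    if keys == list(range(len(keys))):
        # Keys are exactly 0, 1, 2, ... so the sequence short-hand fits.
        lines += [f"- {entries[k]}" for k in keys]
    else:
        lines += [f"{k}: {entries[k]}" for k in keys]
    return "\n".join(lines)


from_map = to_branch("!array", {0: "one", 1: "two"})
from_seq = to_branch("!array", ["one", "two"])
```

Both inputs normalize to the same branch, so the processor no longer
needs to remember which syntax was used on the way in.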
Ok. By now you're saying: "you're splitting hairs".
Well... kind of. But when you don't split hairs,
you have to pull them instead. *smile* Seriously.
The information model matters. It describes what
a YAML processor must preserve and what is just "fluff".
This is important because I may want to use YAML in
a system that doesn't obviously have "maps" and "lists".
An example is a PostgreSQL database. In the database,
one has "relations". And, the Beta YAML information
model works very well in this situation.
The information model is related to the canonical form.
The canonical form is a standard serialization of YAML
information, allowing streams to be compared character
by character to determine equivalence. A strong
information model guides a canonical form. However,
the information model must be abstract enough to allow
various native implementations and processing techniques,
while remaining clear enough so that confusion among
various implementations doesn't arise.
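To illustrate the character-by-character comparison, here is a small
Python sketch. It uses JSON with sorted keys as a stand-in for a
canonical form; that is an assumption made for the example, not the
YAML canonical form itself:

```python
import json


def canonical(data):
    """Stand-in canonical form: sorted keys, fixed separators."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


first = {"a": "one", "b": "two"}
second = {"b": "two", "a": "one"}

# Key order is outside the model, so both serialize to the same string
# and a plain string comparison is enough to establish equivalence.
equivalent = canonical(first) == canonical(second)
```

The point is only that once the information model says what counts,
a fixed serialization reduces equivalence to string equality.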
Clark
P.S. XML was written without an information model, and
this has caused big time problems as vendors try to get
their tools to work together. Just because it's in
the syntax doesn't mean that it's important and will
be preserved.