(Note that I changed the subject line, and to ensure continuity, included
the full text of Oren's email in my response.)
At 11:59 AM 8/2/2001 +0200, Oren Ben-Kiki wrote:
Joe Lapp [mailto:firstname.lastname@example.org] wrote:
> I'm still trying to get a grip on why YAML-Core
> should provide special places for application-level
> type information. I can only think of one argument
> that works for me: The purpose of a YAML file is to
> express data, even if it happens to be data serialized
> from programming language objects; information the
> serializer needs should be segregated from the data
> to make it easy to tell data from non-data.
That doesn't work for me. To me, the (de)serializer isn't
special. There are many "layers" of functionality one
could apply to a document (on parsing or on processing).
What does work for me is: keep YAML human-readable. The
only purpose of the shorthand mechanism is to be a
human-readable shorthand for what we foresee to be a
common case of having a certain type of maps. Nothing more.
For this reasoning to make sense, two decisions are needed:
1. Should we use a map, or should we use an "attributed
graph" model, as Clark proposed?
2. If we are using a map, is it OK to make it implicit?
(again, Clark feels uneasy about this).
I feel very strongly about (1). Yes, we should use just
a simple map/list/scalar model, instead of an attributed
graph model. My reasons are:
- It avoids the need for a DOM. The ability to slurp a
[...] etc., use only native APIs, and then spit it out
unchanged is simply priceless. An "attributed graph"
destroys this.
- It provides for a unified, single way to handle every
YAML document using a trivial (map/list/scalar) API,
regardless of the functionality which is optionally
layered on top of that. This allows simple utilities
to ignore the complexities of the layered processing.
The YAML parser itself would be simpler since it would
know nothing of types, references, etc. These would be
provided by a separate layer. Don't use it - don't even
*write* it - if you don't want to use it. This may be
relevant to embedded devices etc. (e.g., expanding
references may require excessive memory consumption on
such a device).
A YAML diff program should be completely oblivious to
types, references etc. It can work at the most basic
map/list/scalar level. It can be written in straight [...].
A YAML verification program would easily be able to
handle broken references, unknown types etc. since
it would NOT expand them on reading. It would just
treat them as normal maps, and do whatever verification
is required without having to somehow hack the YAML [...].
A YAML-T processor would be able to refer to the type,
reference etc. information using the normal YAML-Path
operations, on both input and output.
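To make the diff point concrete, here is a minimal Python sketch (illustrative only, not part of any YAML tool). It assumes a document has already been parsed into plain dicts, lists and strings, with type information such as `!` kept as an ordinary map key, so the diff never needs to know what a type is. The function name `diff` and the `/key/index` path notation are made up for this example.

```python
from typing import Any, Iterator, Tuple

def diff(a: Any, b: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, change) pairs describing how tree a differs from tree b.

    Trees are plain dicts (maps), lists and strings (scalars). Keys such as
    '!' are compared like any other key; no type knowledge is involved.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            sub = f"{path}/{key}"
            if key not in a:
                yield sub, "added"
            elif key not in b:
                yield sub, "removed"
            else:
                yield from diff(a[key], b[key], sub)
    elif isinstance(a, list) and isinstance(b, list):
        for i in range(max(len(a), len(b))):
            sub = f"{path}/{i}"
            if i >= len(a):
                yield sub, "added"
            elif i >= len(b):
                yield sub, "removed"
            else:
                yield from diff(a[i], b[i], sub)
    elif a != b:
        # Scalars that differ, or a node whose kind changed (e.g. map -> list).
        yield path or "/", f"{a!r} -> {b!r}"

old = {"delivery": {"!": "date", "=": "2001-07-31"}}
new = {"delivery": {"!": "date", "=": "2001-08-02"}}
print(list(diff(old, new)))
# [('/delivery/=', "'2001-07-31' -> '2001-08-02'")]
```

The typed value shows up as a change under the `=` key, exactly as any other map entry would.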
So, to summarize, I *strongly* believe that we should
base our information model on simple, unattributed,
map, list and scalar data types.
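A sketch of what an optional, separate type layer on top of that model might look like. The registry `TYPE_HANDLERS`, the function `resolve_types`, and the rule that a map holding string `!` and `=` keys is a typed value are assumptions for illustration, not part of the proposal. Unknown types simply stay plain maps, so a program that never loads this layer loses nothing.

```python
import datetime
from typing import Any, Callable, Dict

# Hypothetical registry for an optional type layer. The base parser never
# consults it; it only ever produces dicts, lists and strings.
TYPE_HANDLERS: Dict[str, Callable[[str], Any]] = {
    "date": datetime.date.fromisoformat,  # Python 3.7+
}

def resolve_types(node: Any) -> Any:
    """Return a copy of node with maps of known type converted to native values.

    Assumption for this sketch: a map carrying both '!' (type) and '=' (value)
    is a typed value. Other keys in it (such as an '&' anchor) are dropped on
    conversion. Unknown types are left as plain maps.
    """
    if isinstance(node, dict):
        type_name, value = node.get("!"), node.get("=")
        if isinstance(type_name, str) and isinstance(value, str):
            handler = TYPE_HANDLERS.get(type_name)
            if handler is not None:
                return handler(value)
        return {k: resolve_types(v) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_types(item) for item in node]
    return node

doc = {"delivery": {"!": "date", "=": "2001-07-31"},
       "note": {"!": "unknown", "=": "kept as a map"}}
print(resolve_types(doc)["delivery"])  # 2001-07-31
```

The input tree is never modified, so a tool can choose per call whether to look at the raw maps or the resolved values.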
Once (1) is accepted, (2) becomes a matter of taste. One of
YAML's goals is to be human-readable. This means that:
! : date
= : 2001-07-31
is unacceptable (at least, Brian and I seem to share
this view). The question becomes: what should we use
instead? The current proposal is:
delivery: [!date] 2001-07-31
This looks "like a scalar"; that's intentional, since the
scalar syntax is very readable. Granted, it may be too much
like a scalar. But many other variants are possible.
How about the following as a compromise?
delivery: %(!date &17) 2001-07-31
It costs just one extra character compared to the current
syntax: a good balance between making the map explicit
and maintaining readability.
I think this answers Clark's misgivings:
> [The current proposal]
> has two problems:
> 1. It involves an implicit map, which
> I don't think is obvious. Looking
> at the first item I'd assume it is
> a scalar... not a map.
Would the %(...) syntax solve that?
> 2. It's a bit pathological, but what about
> a user-defined mapping which has
> two keys... ! and = ?
What about it? It is a perfectly legal alternative
way to write the exact same map, just like:
[...]
is a perfectly legal way to write the exact same pair
Of course, in both cases this is a rather ugly way to
write the same information...
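One way to read the proposed shorthand is as a purely syntactic rewrite into an ordinary map, which is why a user map with `!` and `=` keys is just another spelling of it. The sketch below assumes a hypothetical grammar: `%(`, then whitespace-separated tokens whose first character is the key and whose remainder is the value, then `)`, then the scalar, which lands under `=`. The names `SHORTHAND` and `expand` are illustrative.

```python
import re
from typing import Dict, Union

# Hypothetical grammar: "%(" tokens ")" scalar. Each token is a one-character
# shorthand key followed by its value; values contain no whitespace or parens.
SHORTHAND = re.compile(r"%\(([^()]*)\)\s*(.*)", re.DOTALL)

def expand(text: str) -> Union[str, Dict[str, str]]:
    """Rewrite a shorthand value into the explicit map it stands for.

    Text without the %(...) prefix is returned unchanged as a scalar.
    """
    m = SHORTHAND.fullmatch(text)
    if m is None:
        return text
    result = {tok[0]: tok[1:] for tok in m.group(1).split()}
    result["="] = m.group(2)  # the scalar itself
    return result

explicit = {"!": "date", "=": "2001-07-31"}   # "! : date" / "= : 2001-07-31"
print(expand("%(!date) 2001-07-31") == explicit)  # True
print(expand("%(!date &17) 2001-07-31"))
# {'!': 'date', '&': '17', '=': '2001-07-31'}
```

Because both spellings produce the same map, anything downstream (diff, verification, type layer) cannot tell them apart.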
If the problem is that we have reserved the use of some
single-character keys for our purposes (so that an
application trying to use them "normally" would have
problems), yes, that's a concern. I'm not certain
how serious it is in practice. Every language makes
some things reserved...
We can limit the set of shorthand keys in some other
way. For example,
delivery: %(!date) 2001-07-31
Could be a shorthand to:
__!__ : date
__=__ : 2001-07-31
(This is similar to Python's __name__ convention for
reserved names.)
Or we could find some other creative way to minimize the
set of reserved keys...
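A sketch of the double-underscore variant, assuming reserved keys are exactly `__X__` with X a single punctuation character. That rule, and the names `expand_reserved` and `is_reserved`, are hypothetical. With it, plain `!` and `=` become ordinary user keys again.

```python
import re
from typing import Dict

# Hypothetical rule: a reserved key is "__X__" where X is one character that
# is neither a word character nor whitespace (e.g. "!" or "=").
RESERVED = re.compile(r"__[^\w\s]__")

def expand_reserved(type_name: str, scalar: str) -> Dict[str, str]:
    """Long form of %(!type) scalar under the double-underscore scheme."""
    return {"__!__": type_name, "__=__": scalar}

def is_reserved(key: str) -> bool:
    return RESERVED.fullmatch(key) is not None

print(expand_reserved("date", "2001-07-31"))
# {'__!__': 'date', '__=__': '2001-07-31'}
print(is_reserved("__!__"), is_reserved("!"), is_reserved("__init__"))
# True False False
```

Restricting X to punctuation keeps ordinary dunder-style names such as `__init__` available to applications.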
However, as long as we accept (1) above, we *must* have some
set of reserved keys. Hmmm... Is this a good time to raise
the namespace issue for keys? :-)
- I love Clark's idea to make the top-level production be
either a map or a list. Best of both worlds indeed. Way to go!
- As for using ^ instead of % for 'transfer encoding', I like
Clark's notion of using | for a process chain instead:
picture: %(!image/bmp|gzip|base64 &17) ...
This removes the need for ^ or %. It also means that the
value of a shorthand key may contain any character except
for white space and ( ). Seems reasonable...
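A sketch of how a processor might honor such a chain, assuming the steps after the type are listed in the order they were applied (gzip first, then base64), so decoding walks them in reverse. `parse_type`, `decode` and the `DECODERS` table are illustrative names; `gzip.decompress` and `base64.b64decode` are Python standard library functions.

```python
import base64
import gzip
from typing import Callable, Dict, List, Tuple

# Illustrative decoder table; step names follow the example in the text.
DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "gzip": gzip.decompress,
    "base64": base64.b64decode,
}

def parse_type(value: str) -> Tuple[str, List[str]]:
    """Split '!image/bmp|gzip|base64' into ('image/bmp', ['gzip', 'base64'])."""
    if value.startswith("!"):
        value = value[1:]
    type_name, *chain = value.split("|")
    return type_name, chain

def decode(value: str, payload: str) -> Tuple[str, bytes]:
    """Undo the process chain, last step first (assumed application order)."""
    type_name, chain = parse_type(value)
    data = payload.encode("ascii")
    for step in reversed(chain):
        data = DECODERS[step](data)
    return type_name, data

raw = b"BM...pixels..."
encoded = base64.b64encode(gzip.compress(raw)).decode("ascii")
print(decode("!image/bmp|gzip|base64", encoded) == ("image/bmp", raw))  # True
```

A value with no `|` simply yields an empty chain, so `!date` parses the same way as before.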