[Yaml-core] Let's keep it simple

Clark C. Evans wrote:

> I was thinking about an explicit indicator for the 
> "flow scalar" (;) and the expansion of scope for the 
> implicit representation to cover multiple lines.  

... from which flows a ^ proposal. I'm not clear on exactly why it is
necessary. The syntax seems redundant; for example, this:

> node: ^flow
> 	- this is a flow node and not a list

Seems to mean exactly the same as this:

node::
	- this is a flow node and not a list.

(Of course the spec wasn't updated to reflect this yet - that's what I'm
going to do this weekend.)

As far as separating the "kind" of a node from its type, I don't see the
problem in the first place:

Every node has a serialized, text based representation which conforms to one
of three "kinds". Independently, every node has an in-memory,
native-language representation which belongs to some "type". There are three
"types" which happen to have a "direct" mapping to the three "kinds", so
this mapping is used *by default*. However in general the mapping between
the node "type" and its "kind" is up to the application, and can be
specified using implicit/explicit types.
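
To make the kind/type separation concrete, here is a minimal Python sketch.
The names (DEFAULT_TYPE_FOR_KIND, construct) and the choice of dict, list and
str as the "direct" types are my own assumptions for illustration; nothing
here comes from the spec.

```python
# Hypothetical default mapping from the three serialized "kinds" to
# native "types"; an application may override it per node.
DEFAULT_TYPE_FOR_KIND = {"map": dict, "list": list, "scalar": str}


def construct(kind, value, explicit_type=None):
    """Build the native object for a parsed node.

    `kind` describes the text representation; `explicit_type` (if given)
    is the application's choice of in-memory type for that node.
    """
    target = explicit_type if explicit_type is not None else DEFAULT_TYPE_FOR_KIND[kind]
    return target(value)


# Default mapping: the kind alone decides the type.
plain = construct("scalar", "42")
pairs = construct("map", [("a", 1)])

# Explicit type: same kind on the wire, different type in memory.
number = construct("scalar", "42", explicit_type=int)
frozen = construct("list", ["a", "b"], explicit_type=tuple)
```

The kind never changes here; only the type the application asks for does.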

Where's the need for an additional syntax/concept/construct?

About the MIME issue: I read RFC2046 and I came to the sad/glad conclusion
we should just drop the whole thing. That's because RFC2046 says:

``A body part is an entity and hence is NOT to be interpreted as
actually being an RFC 822 message.  To begin with, NO header fields
are actually required in body parts.  A body part that starts with a
blank line, therefore, is allowed and is a body part for which all
default values are to be assumed.  In such a case, the absence of a
Content-Type header usually indicates that the corresponding body has a
content-type of "text/plain; charset=US-ASCII".''

You'd think this allows the '---' eol eol trick, but...

Problem #1: It has to be '---' CR LF CR LF.

Problem #2: It makes the character encoding issue a problem. One can't
safely place UTF-8 or UTF-16 characters in a multi-part YAML document unless
there's a content encoding as well:

``As stated in the definition of the Content-Transfer-Encoding field
[RFC 2045], no encoding other than "7bit", "8bit", or "binary" is
permitted for entities of type "multipart".  The "multipart" boundary
delimiters and header fields are always represented as 7bit US-ASCII
in any case (though the header fields may encode non-US-ASCII header
text as per RFC 2047) and data within the body parts can be encoded
on a part-by-part basis, with Content-Transfer-Encoding fields for
each appropriate body part.''

So, the *minimal* MIME-compatible separator we could use is:

'---' CR LF 'Content-transfer-encoding: binary' CR LF CR LF

That's too rich for my blood. To make things worse:

``The only header fields that have defined meaning for body parts are
those the names of which begin with "Content-".  All other header
fields may be ignored in body parts.  Although they should generally
be retained if at all possible, they may be discarded by gateways if
necessary. Such other fields are permitted to appear in body parts
but must not be depended on. "X-" fields may be created for
experimental or private purposes, with the recognition that the
information they contain may be lost at some gateways.''

Problem #3: Great, any '[X-]YAML-*' field may be discarded by any MIME
implementation. We'll have to stuff it all into the standard 'Content-*'
fields. Ugh.
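
For what it's worth, the minimal separator can be tried against a real MIME
parser. The sketch below uses Python's standard email package. The outer
envelope (a multipart/mixed header with boundary "-") and the closing
'-----' line are my own assumptions about what a filter would have to add;
they are not part of any proposal.

```python
from email import message_from_bytes

CRLF = b"\r\n"

# With boundary "-", the MIME delimiter line is "--" + "-", that is "---".
SEPARATOR = b"---" + CRLF + b"Content-transfer-encoding: binary" + CRLF + CRLF


def wrap_as_multipart(documents):
    """Wrap YAML documents (bytes) in a hypothetical multipart envelope."""
    header = (
        b"MIME-Version: 1.0" + CRLF
        + b'Content-Type: multipart/mixed; boundary="-"' + CRLF
        + CRLF
    )
    body = b"".join(SEPARATOR + doc + CRLF for doc in documents)
    closing = b"-----" + CRLF  # "--" + boundary + "--" ends the multipart
    return header + body + closing


message = message_from_bytes(wrap_as_multipart([b"name: first", b"name: second"]))
parts = message.get_payload()  # one Message object per body part
```

Each part comes back with the binary transfer encoding and no Content-Type
header of its own, so it is treated as text/plain, which is exactly the
default the RFC describes.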

In short, the illusion that there's a simple, clean separator we can use so
that by just attaching a header to a multi-document YAML file we'll be able
to feed it into a MIME system is just that - an illusion. Let's face it,
anyone feeding a YAML file into MIME or vice-versa will be using some sort
of (not terribly complex) filter. Let's just leave it at that.

Assuming we forget about MIME, let's revisit the issues.

>  A. The starting production must be easy 
>     to grok, understand and author. 

Make the wording somewhat stronger here - it has to be simple, clean, etc. I
share Brian's concern that this goal could be lost in pursuit of the rest.

>  B. We want to be able to have sequence 
>     as the starting production to support 
>     log files. 

Yes.

>  C. It would be nice if YAML and the email 
>     specification overlapped so that a 
>     particular class of YAML was valid 
>     e-mail (or http response, for that matter). 

Up to a point. I've no problem with, say, YAML being used to parse just the
http header:

header = Yaml.Load(new ReaderUpToEmptyLine(theHttpReader))

There's no need to twist YAML out of shape for this. If the content type
happens to be YAML, fine, follow up with a second call on the body:

if(contentType is "text/yaml") {
    body = Yaml.Load(theHttpReader);
} else {
    whatever...
}
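
The same idea as a runnable Python sketch. The helper names are
hypothetical, and load_flat_map is only a stand-in for a real YAML loader:
it understands nothing but flat "key: value" lines, which is all this
example needs.

```python
import io


def read_up_to_empty_line(reader):
    """Return the text before the first empty line, leaving the rest unread."""
    lines = []
    for line in iter(reader.readline, ""):
        if line.strip() == "":
            break  # the blank line ends the header and is consumed
        lines.append(line)
    return "".join(lines)


def load_flat_map(text):
    """Stand-in for a YAML loader: parses only flat 'key: value' lines."""
    result = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        result[key.strip()] = value.strip()
    return result


response = io.StringIO(
    "Content-Type: text/yaml\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
    "name: value\n"
)

header = load_flat_map(read_up_to_empty_line(response))
if header["Content-Type"] == "text/yaml":
    body = load_flat_map(response.read())  # the rest of the stream is the body
else:
    body = None
```

The reader wrapper does all the work; the loader itself never has to know
it is looking at an http response.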

>  D. Ideally, the solution would be MIME 
>     compatible.  Note this requirement 
>     is that a *subset* of YAML documents 
>     will be MIME compatible, not *all* 
>     YAML documents will be MIME compatible. 

As I've shown above, this is trickier than it looks at first. Again, I've no
problem with YAML used to parse MIME headers, along the same lines as the
code above. But that's beyond the scope of the YAML spec itself.

>  E. Ideally, it is not just a sequence of 
>     maps, but a sequence of any "kind", 
>     including lists and scalars. 

Yes, we agreed on that, and it doesn't seem to be a problem (see below).

> I'm not sure all of these work.  Here are some 
> initial thoughts (based on memory of the 
> e-mail conversation and not analysis): 

Reading specs is very enlightening here... I should have done it earlier,
considering it was me who raised the MIME compatibility issue in the first
place. I'm now sorry I ever mentioned it.

> 1. Separate each item in the top level 
>      sequence as follows (A,B). 
>
>           eol '--' eol eol 

For *minimum* MIME compatibility this would have to be:

'---' CR LF 'Content-transfer-encoding: binary' CR LF CR LF

If you two think this is a good idea, fine :-) Otherwise, let's go back to
the good old:

'----' eol

(I put 4 '-' there - is there a good reason to use any other number?).

>  2. To allow for easy concatenation, ignore 
>     adjacent, leading and trailing separators (A,B).  
>     This allows files which expect to be appended 
>     to always have a trailing separator. 

This harms even minimal MIME compatibility, because MIME will see adjacent
separators as empty documents in the stream.

MIME aside, I don't see the big deal in deciding that the separator must
come before each document. Or after - as long as we are consistent.

Brian feels that requiring the very first separator in a document stream is
too restrictive (it makes it hard to convert a single document file into a
multi-document one).

If MIME is an issue, then note that making the first separator optional
makes it harder to attach the MIME header, because whatever attaches it will
have to add the separator if the document doesn't start with one.

Personally I'd make the separator mandatory (I don't see that converting a
single-document file to a multi-document one is such a common operation),
but I can live with making it optional, especially since MIME is no longer
an issue in my mind.

This does *not* mean that "concatenation" becomes a problem - you still
*always* start with the separator when concatenating a document to a
multi-document stream. There are no ifs or buts here.

So, what is the need for adjacent separators?
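
A minimal Python sketch of the rule I have in mind: the separator comes
before each document, the very first one is optional, and concatenation
always emits it. The function names are hypothetical, and '----' on a line
of its own is just the separator I happen to prefer.

```python
SEPARATOR = "----"


def split_documents(stream_text):
    """Split a stream into documents on lines consisting only of '----'."""
    documents = []
    current = None  # None until the first document has started
    for line in stream_text.splitlines():
        if line == SEPARATOR:
            if current is not None:
                documents.append("\n".join(current))
            current = []  # a separator always starts a new document
        else:
            if current is None:
                current = []  # the optional first separator was omitted
            current.append(line)
    if current is not None:
        documents.append("\n".join(current))
    return documents


def append_document(stream_text, document):
    """Concatenate: always write the separator before the new document."""
    return stream_text + SEPARATOR + "\n" + document + "\n"


single = "a: 1\n"                       # a plain single-document file
multi = append_document(single, "b: 2")  # now a two-document stream
```

Note that under this rule two adjacent separators simply delimit an empty
document; nothing needs to be ignored.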

>  3. Note that a "restricted" top level map 
>     without a sequence is already e-mail 
>     compliant.  Thus nothing need be done 
>     for (C), other than we don't mess it up. 

Exactly. This is a non-issue.

>  4. To allow for a subset of YAML to be 
>     MIME compatible(D), extend the separator 
>     to allow for: 

This talks about enhanced MIME compatibility - allowing a YAML system to
read (some) MIME multi-part messages as a YAML document stream (the other
direction was already covered above).

After reading RFC2046 again, I'm getting more and more convinced that this
is way too much to aim for. Further complications start to enter the system
(assuming one lives with the horrible separator above):

>     a. An additional sequence of characters 
>        from a limited set (divider), where 
>        the divider is constant throughout 
>        a given YAML document. 
>
>            eol '--' divider eol eol 

Complication #1...

>     b. Allowing for an additional e-mail 
>        compliant map (header) immediately 
>        after the divider. 

This is where we get into serious trouble. Now we have to define what such a
map can contain, and how to handle unknown fields and keys, how to handle
invalid values for keys, etc. etc.

I really don't think this is in scope for YAML 1.0.

>   5.  Enabling more than just maps for each item 
>       in the sequence requires more discussion (E). 
>       There are two approaches, one approach is to 
>       do this implicitly, another is to be explicit. 

This we need to do regardless of MIME. We must do this implicitly because
it also has to work for a single-document YAML file, which doesn't have a
header.

>     a. To do this explicitly, we can use the 
>        "header" mechanism above. 

Insufficient...

>     b. Or, this can be done implicitly, via 
>        discovery.  This has the slight problem 
>        of requiring some look-ahead and also 
>        may have break the normal YAML rules... 

Like this:

----
This: is a map
----
"so
is" : this
----
- This is a list entry
----
This is an
unquoted scalar.
----
:
- so is this one (the ':' make it
a next line scalar, it can start with
: | etc.)
----
:
"this is quoted"
----
|
This is a block
----
: &012
And this is how you do anchors etc.
----

(The last one above is why Brian's point of placing the descriptors *after*
the indicator makes so much sense.)

At any rate, this seems like a non-problem.
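
To show how little look-ahead the discovery in the example above needs,
here is a rough Python sketch. The rules in guess_kind are my own reading of
that example, not spec text, and it deliberately ignores details such as
escaped quotes inside a quoted key.

```python
def guess_kind(document):
    """Guess the kind ('map', 'list' or 'scalar') of a top-level document."""
    lines = document.splitlines()
    first = lines[0] if lines else ""
    if first == "|":
        return "scalar"  # block scalar
    if first == ":" or first.startswith(": "):
        return "scalar"  # next-line scalar, possibly with descriptors
    if first.startswith("- "):
        return "list"
    if first.startswith('"'):
        # Look ahead to the closing quote: a following ':' means a map key.
        end = document.find('"', 1)
        rest = document[end + 1:].lstrip() if end != -1 else ""
        return "map" if rest.startswith(":") else "scalar"
    if ": " in first or first.endswith(":"):
        return "map"
    return "scalar"  # unquoted scalar


examples = [
    "This: is a map",
    '"so\nis" : this',
    "- This is a list entry",
    "This is an\nunquoted scalar.",
    ':\n"this is quoted"',
    "|\nThis is a block",
    ": &012\nAnd this is how you do anchors etc.",
]
kinds = [guess_kind(doc) for doc in examples]
```

Only the quoted-key case has to look past the first line, and even there
the scan stops at the closing quote.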

>     c.  Alternatively, we could do 
>         a mix of the above... 

No need. Let's stick with one mechanism, the one we must have anyway.

> Therefore, I propose: 
> ...
> It looks complicated,

It *is* complicated. Too complicated, given the MIME rules. Let's just drop
it.

I'm going to take a risk and do the draft this weekend using '----' eol as a
separator, just so we'll have some summary of the last two weeks'
discussions. We'll take it from there...

Have fun,

    Oren Ben-Kiki