RE: [Yaml-core] Plan of battle


Clark C. Evans [mailto:cc...@cl...] wrote:
> I'd rather have this all be one-spec to give the sense
> of unity.  That being said, there is nothing preventing
> this spec from having multiple sections, and adopting
> each section independently.

So, instead of having multiple, cooperating specs, we'll have incremental
spec versions (version 1 would be just the core; version 2, the core + the
native API; version 3, core + native + incremental API, etc.)? I'm not aware
of any other system using this method (not to say that this is necessarily
bad). It has the benefit of allowing us to choose which section to add each
time (e.g., version N+1 might just add language bindings for a certain new
language, nothing else). Hmmm.

> So... let's finalise on the binary/unicode issue.
> There does not seem to be a good solution at the syntax
> level, so let's punt the issue for now.  Perhaps the API
> may have a binary/unicode distinction, but let us leave
> the YAML 1.0 file format unicode only.

But keep the binary block format, right?
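If the file format is Unicode-only, binary data still needs a plain-text representation inside such a block. Here is a minimal Python sketch of the round trip. It assumes the binary block carries base64 text, which is my assumption and not something settled above. The helper names and the 60-column wrap width are hypothetical.

```python
import base64

def to_binary_block(data: bytes, width: int = 60) -> str:
    """Encode raw bytes as base64 text wrapped to `width` columns.

    Assumption: the binary block is base64; the wrap width is arbitrary.
    """
    text = base64.b64encode(data).decode("ascii")
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))

def from_binary_block(block: str) -> bytes:
    """Decode a wrapped base64 block back into the original bytes."""
    # Remove the line breaks inserted by to_binary_block before decoding.
    return base64.b64decode("".join(block.split()))

payload = bytes(range(256))                # every possible byte value
block = to_binary_block(payload)
assert from_binary_block(block) == payload  # lossless round trip
assert all(ord(c) < 128 for c in block)     # pure ASCII, hence valid Unicode text
```

The block content stays plain ASCII, so it never conflicts with a Unicode-only file format.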

> As for dropping the class notation.  I spoke with 
> Brian on the phone yesterday and he seemed resistant
> to this idea.  So, a bit more thought may have to 
> go into this.

Point: Brian needs it for Perl (de)serialization, and in Perl (or Python),
whenever you need to annotate something with a class, the "something" is a
map. So in practical terms, the only difference is allowing one to write:

map: !class %
    key: value

Instead of:

map: %
    !: class
    key: value

I'd much rather we always use the second notation for consistency. Would
Brian be happy with:

map: % !:class
    key: value

That is, we'll allow the first key/value pair to be on the same line as the
'%' map indicator, and "we will just happen" to use it for the '!' key.
Problem solved at the cost of a reorder and one additional character :-)
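To show why the two notations make no difference to a Perl- or Python-style loader, here is a minimal Python sketch of the loading side. It assumes the parser has already turned either notation into a plain dict with a '!' key. The `Point` class, `REGISTRY`, and `construct` are hypothetical names used only for illustration. They are not part of any spec or library.

```python
class Point:
    """Hypothetical user class that a document might annotate a map with."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Hypothetical registry mapping class names (the '!' value) to Python classes.
REGISTRY = {"Point": Point}

def construct(node: dict):
    """Turn a parsed map into an object if it carries a '!' class key.

    Assumption: the parser already produced a plain dict, and the '!' key
    holds the class name regardless of how it was written in the file.
    """
    if "!" not in node:
        return node                      # ordinary map, no class annotation
    fields = dict(node)                  # copy so the parsed node is untouched
    cls = REGISTRY[fields.pop("!")]
    obj = cls.__new__(cls)               # skip __init__, as loaders commonly do
    obj.__dict__.update(fields)          # remaining keys become attributes
    return obj

# Either "map: !Point %" or "map: % !:Point" would presumably parse to this dict.
p = construct({"!": "Point", "x": 1, "y": 2})
print(type(p).__name__, p.x, p.y)        # Point 1 2
```

The class annotation is just one more key in the map, so only the surface syntax differs between the two notations.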

> ... this would allow for another section (very small) 
> called "information model".  

I thought that was already covered in the current core spec. Do you want to
take it out? Why?

> I respectfully withdraw the ASCII suggestion.

So the Unicode/Binary issue is settled. Good.

> I think we can discuss the canonical form when looking
> at the writer code in the impl I'm (slowly) working on.

So, let's just put it in a separate section.

> ... We are using Unicode, so one may use UTF8, UTF16, or UTF32
> as an internal character model; they are all encodings of the
> same character set, UCS4.  Recently UCS4 has been restricted to
> those code values expressible by Unicode, U+0000 to U+10FFFF (21 bits).
> Thus the ISO universal character set is now the same character
> set used by Unicode.  So much of this "confusion" is now gone.
> UTF8, UTF16, and UTF32 all encode exactly the Char production
> in the YAML spec.  Nothing more, nothing less.  

OK.
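The claim can be checked quickly in Python using only the standard library. The variable names are my own. The code space ends at U+10FFFF, so 21 bits suffice, although it holds about 1.1 million code points rather than a full 2^21. Every scalar value round-trips through all three encoding forms.

```python
import sys

MAX_CODE_POINT = 0x10FFFF               # last Unicode code point
assert sys.maxunicode == MAX_CODE_POINT
print(MAX_CODE_POINT.bit_length())      # 21

# All Unicode scalar values: every code point except the surrogate range,
# which UTF-16 reserves for encoding characters above U+FFFF.
scalars = "".join(chr(c) for c in range(MAX_CODE_POINT + 1)
                  if not 0xD800 <= c <= 0xDFFF)

for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    # Each encoding form round-trips the whole character set losslessly.
    assert scalars.encode(codec).decode(codec) == scalars

sample = "A\u00e9\u4e2d\U0001F600"      # ASCII, Latin-1, CJK, outside the BMP
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    print(codec, len(sample.encode(codec)), "bytes")
```

The sample string takes 10, 10, and 16 bytes respectively. Only the byte layout differs between the forms. The set of characters is the same.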

> Of course we cannot dictate what Unicode encoding a language
> binding uses; it could be UTF8 (Perl).  In C, wchar_t is often
> either UTF16 or UTF32, depending upon the Unix platform.

OK. I'll just add wording to that effect in the description of characters.
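To see which form a given C platform uses, one can check the size of wchar_t. Here is a minimal Python sketch using ctypes. The mapping from size to encoding form is the usual convention and not a guarantee. In practice a 2-byte wchar_t holds UTF-16 code units, and a 4-byte one holds whole code points.

```python
import ctypes

# Size in bytes of the platform's C wchar_t, as exposed by ctypes.
wchar_bytes = ctypes.sizeof(ctypes.c_wchar)

# Conventional interpretation: 2 bytes -> UTF-16 code units (characters above
# U+FFFF need a surrogate pair); 4 bytes -> UTF-32, one unit per code point.
implied_form = {2: "UTF-16", 4: "UTF-32"}.get(wchar_bytes, "unknown")
print(f"wchar_t is {wchar_bytes} bytes, conventionally {implied_form}")
```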

It seems as though we have everything settled, except for the class notation
issue. If my compromise proposal above is acceptable, we could roll version
1.0 out the door sometime next week...

Have fun,

    Oren Ben-Kiki