RE: [Yaml-core] Re: YAML Implementations a Plenty

Clark C. Evans [mailto:cc...@cl...] wrote:
> | So what you seem to be saying is that it isn't the
> | information model which says whether a value is binary
> | or text, it is the application which decides. But in
> | this case, there's no need to provide a special
> | syntax/model/API for binary values. A trivial wrapper
> | around the incremental read/write methods, applied at
> | the discretion of the application, would do the trick
> | just as well.
> 
> Not quite. I'm saying that you can consider all scalars 
> binary by fixing on an encoding of UTF8 for unicode scalars.
> Thus, given a scalar you could have two methods:
> 
>    asBytes()  -- The scalar's binary value (UTF8 if string)
>    asString() -- The scalar's native unicode value

So why not also:

    asInteger() -- The scalar's integer value
    asReal() -- The scalar's real value
    asDate() -- The scalar's date value

Etc.? Why is "byte[]" such a special data type that we have to provide
special syntax for it?
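To make the comparison concrete, here is a minimal sketch of a scalar whose only stored value is a text string, with every other view derived on demand. The class name `Scalar` and the snake_case method names are hypothetical stand-ins for the asBytes()/asString()/asInteger() accessors discussed above; nothing here is taken from an actual YAML API.

```python
from datetime import date


class Scalar:
    """Hypothetical scalar node: the text string is the only stored value."""

    def __init__(self, text: str):
        self.text = text

    def as_string(self) -> str:
        # The native Unicode value.
        return self.text

    def as_bytes(self) -> bytes:
        # The binary view, fixed to UTF-8 as suggested in the quoted text.
        return self.text.encode("utf-8")

    def as_integer(self) -> int:
        # Raises ValueError if the text is not an integer literal.
        return int(self.text)

    def as_real(self) -> float:
        # Raises ValueError if the text is not a float literal.
        return float(self.text)

    def as_date(self) -> date:
        # Assumes ISO 8601 (YYYY-MM-DD); other formats would need other parsers.
        return date.fromisoformat(self.text)
```

In this sketch as_bytes() is one conversion among several, which is the point of the question: none of the accessors needs syntax of its own.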

> | Again, this is the same problem as being forced to 
> | pass strings instead of integer values. Which is probably 
> | a much more common problem than binary values.
> 
> Good point, but this is what class is for; perhaps
> the "C" API needs to have one more growth spurt to
> natively understand class.  (A registered class 
> can have a reader/writer pair from the byte stream
> to provide one more function:
> 
>    asNative()
> 
> This begs the question if Unicode string is a 
> class... from the information model perspective,
> and UTF8, UTF16LE, UTF16BE, etc., are just
> transfer encodings like BASE64.

You could argue the same thing about dates - DMY, YMD, ISO, etc. all being
"transfer encodings". Does that mean YAML has to have explicit syntax, in
the core spec, for date formats? Ugh.

> | Seems good enough for me. Of course,
> | 
> |   value: [!gif %base64] <value>
> | 
> | Seems even better :-)
> 
> Hmm.  So the % color is the transfer encoding.
> I think you've shown with this example why
> "base64" isn't a class...

It isn't, but "byte[]" is. A "raw" binary value would be written as:

	value: [!bytes %base64] <value>

> as it is a transfer
> encoding.  Why not just build-in one binary
> transfer encoding -- base64 -- and be done
> with it?

- Because other transfer encodings make sense (gzip etc.).

- Because 'bytes' is a rather useless data type by itself - definitely less
useful than 'int'. Why should 'bytes' be a first-class data type and not
'int'?

- Because mapping 'bytes' to the native data structures in scripting
languages is a mess (e.g., for JavaScript it is flatly impossible; possibly
also for TCL). In general, the only data type we can always rely on is the
text string. So at minimum output of 'bytes' will be shaky.

- Because I think that YAML should have an "everything is a string"
approach. Sure, we should allow for converting these strings/maps/lists to
whatever class you want. But I strongly feel that the information model
should be based on a single scalar type - text string.

Just an example: this makes the mapping between the serialized format and
the information model 1-1. So when one is working on a YAML file as text one
stays within the YAML information model (of course, there's the application
information model - the "schema" - to consider as well).

- Because applications *will* need to handle multiple date formats etc. It
may seem a trivial issue, but multiple formats for the same data is a
common, problematic issue. I'd rather we had a way to tackle it.

- Because I think we should be pushing for 'color' as a way to solve such
issues and adding a special 'bytes' data type is a cop-out to avoid it.
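The separation argued for in the list above (text-string scalars, a transfer encoding as "color", and a class applied last) can be sketched roughly as follows. The registries and the `load_scalar` function are hypothetical names used only for illustration, and the "gzip" entry is just one example of an additional transfer encoding; none of this is a proposed API.

```python
import base64
import gzip

# Hypothetical registry of transfer encodings (the "%" color).
TRANSFER_DECODERS = {
    "base64": base64.b64decode,
    "gzip": gzip.decompress,
}

# Hypothetical registry of classes (the "!" part), each built from raw bytes.
CLASS_CONSTRUCTORS = {
    "bytes": bytes,
    "str": lambda raw: raw.decode("utf-8"),
    "int": lambda raw: int(raw.decode("utf-8")),
}


def load_scalar(text, cls="str", encodings=()):
    """Turn a text-string scalar into a native value.

    The information model only ever sees `text`. Transfer encodings are
    undone in the order given, then the class constructor is applied.
    """
    raw = text.encode("utf-8")
    for name in encodings:
        raw = TRANSFER_DECODERS[name](raw)
    return CLASS_CONSTRUCTORS[cls](raw)
```

Under these assumptions, `[!bytes %base64] aGVsbG8=` would correspond to `load_scalar("aGVsbG8=", "bytes", ["base64"])`, and adding another transfer encoding or class means adding a registry entry rather than new core syntax.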

In short, I think that having a 'bytes' scalar data type in the core data
model is a bad idea (and yes, I realize this is different from what I said a
month ago :-)

As for the shorthand notation I proposed, Jason Diamond
[mailto:ja...@in...] wrote:
> I think that I'm in the minority here but I believe that YAML 
> is currently already flexible enough to express just about
> any concept while still being simple enough for most people
> to grasp and work with.

I tend to agree, with a qualification - I don't think YAML can
do it in a *readable* way at the moment. Hence the shorthand
notation (which is just that).

> Every time I read a message about
> color or typing or formatting I get worried that YAML will get
> bogged down in complexity. Don't get me wrong--I think that 
> the concept of color is great and should be used but I don't
> believe that it needs to be mandated by any spec.

I completely agree. Nothing in the spec should mandate that color
be used. What I suggest is that we provide a shorthand notation
for a certain set of map keys, so that people who do want to use
color could do it in a readable fashion. This shorthand notation
won't affect the information model (simple map/list/scalar) at all.
It would just be an alternative, more readable way to write certain
types of maps. Nothing more.
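As a rough illustration of "just an alternative way to write certain types of maps", the sketch below expands a shorthand such as `[!gif %base64] <value>` into an ordinary map. The reserved key names "!", "%" and "=" are assumptions made for this example only (no key names have been fixed in this thread), and `expand_shorthand` is a hypothetical helper.

```python
import re

# Matches "[tags...] value"; anything else is treated as a plain scalar.
SHORTHAND = re.compile(r"^\[(?P<tags>[^\]]*)\]\s*(?P<value>.*)$", re.DOTALL)


def expand_shorthand(text):
    """Expand '[!class %encoding] value' into a plain map.

    The keys '!', '%' and '=' are hypothetical reserved keys. The result
    is an ordinary map of strings, so the map/list/scalar information
    model is unchanged.
    """
    match = SHORTHAND.match(text)
    if match is None:
        return text  # not shorthand: stays a plain text scalar
    node = {"=": match.group("value")}
    for tag in match.group("tags").split():
        if tag.startswith("!"):
            node["!"] = tag[1:]   # class, e.g. "gif"
        elif tag.startswith("%"):
            node["%"] = tag[1:]   # transfer encoding, e.g. "base64"
    return node
```

A tool that does not know the shorthand could be handed the expanded map directly, which is why it would not need to appear in the information model.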

> And I also
> don't believe that any more semantics need to be layered onto
> YAML other than the simple scalars, lists, and maps that
> it currently has.

I agree as long as you qualify it by "be layered ... in the core
spec". Such layering has its place - in separate spec(s).

> And while I'm at it, I no longer believe 
> that the API should be part of the spec, either. Let's be
> realistic--one API will not fit everybody's needs.

Again I agree. I think we should have a format/info model
spec, roll it out the door, and make the API spec(s) separately.
We've done one step towards it by breaking the spec into almost-
independent sections. I think that the format/info model section
is close to being complete (barring the new block syntax proposals
and the issue of a shorthand notation).

I have some free time this weekend. I could modify the block
syntax to whatever we agree on, add the shorthand notation for
"color" maps, and create a standalone "YAML 1.0 Core Spec" release
candidate. Clark?

Have fun,

	Oren Ben-Kiki