RE: [Yaml-core] Re: New Proposals from user feedback.

On Fri, Aug 03, 2001 at 03:45:13PM -0400, Clark C . Evans wrote:
| I think we are awfully close.  Now that you mention it,
| I *do* like the idea of two information models:
| 
|   YAML-BASE:  Untyped tree information model consisting
|               of Map, List, Scalar, and Null.
| 
|   YAML-CORE:  Typed graph information model building on
|               the base model.

Base and Core? Hmmm. I'd rather stick with "core" and "data" or "interop";
"core" and "base" make it confusing about which one is more basic.

At any rate, I agree that your regexp-based solution to layering one on top
of another doesn't work:

- I gather that:

    3:  2.34e0

Should be a real and:

    4:  "2.34e0"

Should be a string. Correct?

This doesn't work, because these two are completely identical in the more
basic data model (both are "string"). So it isn't possible to use a layered
approach to implement this distinction. The same applies to:

    6:  &0001 Referenced

Vs.:

    Just what it looks: "&0001 Not Referenced"    

- Regexps aren't enough to distinguish between types. They are OK for
providing defaults, but not enough by themselves. Your examples have already
started to suffer from a little ugliness. I'd expect to be able to write:

    4: 2.34

For a real number. Also, is:

    delivery: 1/3/2001

Jan 3rd or March 1st? How about:

    image: ....base64.....

Is it a GIF or a Gzipped BMP?

- Working at the base level becomes terrible. Consider:

    distance: &0001 2.34e0

An application working at the lower level has no easy way to access the
distance value. It *must* apply a whole host of regexps to it to remove any
"color" which may have been applied to it. Which may be almost bearable for
reading the value - but imagine what it does to *writing* it. Ugh.

- Taking a scalar value and smashing it into parts, using regexps, in order
to create a map, just isn't simple or intuitive. Or, as I've shown above,
robust or general enough.
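
The first two points above can be made concrete with a minimal Python
sketch. It is an illustration only: base_tree, REAL and guess_type are
hypothetical names, and the dictionary stands in for whatever a base-layer
parser would hand back once quoting has been dropped.

```python
import re
from datetime import datetime

# Hypothetical base-layer view of the two example lines. Quoting is gone,
# so 2.34e0 and "2.34e0" arrive as the very same string.
base_tree = {"3": "2.34e0", "4": "2.34e0"}

# An assumed pattern for reals; any similar regexp has the same problem.
REAL = re.compile(r"^[-+]?\d+(\.\d+)?([eE][-+]?\d+)?$")

def guess_type(scalar):
    """Regexp-only typing: the scalar text is all it can look at."""
    return float(scalar) if REAL.match(scalar) else scalar

typed = {key: guess_type(value) for key, value in base_tree.items()}
# Both entries are now floats; the string intended at key "4" is lost.

# The date example matches two equally plausible patterns.
as_month_first = datetime.strptime("1/3/2001", "%m/%d/%Y").date()
as_day_first = datetime.strptime("1/3/2001", "%d/%m/%Y").date()
# as_month_first is Jan 3rd, as_day_first is March 1st.
```

Nothing in the scalar text itself says which reading was meant, so the
layer on top can only guess.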

> Conclusion:  Layering is not the holy grail.  

At least, not when done using the regexp method :-)

> I respectfully  withdraw YAML-BASE.

Well, I agree if you are referring to this regexp method. Back to my
shorthand form, I don't see that it suffers from any of the issues I listed
above.

> Frankly, I don't see 
> how you can layer and have consistency.

Well, let's see. Since the shorthand form is already a map in the base
layer, then:

- There's no problem for "2.34" being a real, text, BCD or what have you.

- It is possible to use regexps so that the default type would be 'real':

    real: 2.34
    BCD: %(!bcd) 2.34
    text: %(!text) 2.34

- Generic access to the value is possible using the v/w operations. This
would work at both layers.

- As for "stripping" the color, anchors etc. I've seen this mentioned
several times as a problem, and I'm baffled by it. What exactly is the
problem?

- As for simplicity. I see the lowest level as being the simplest possible
data model which has a 1-1 mapping to a YAML file. On the other hand, the
higher data model is hardly a YAML data model, since it allows any native
data structure to be used. I don't think we even have a good definition of
it. "Anything at all", perhaps... At any rate, it isn't simple.

I see a lot of sense in doing a split and using a layered approach.

The lowest "core" level is extremely stable. The design decisions made in it
are independent of the application, programming language, character set etc.
The format is very simple. This is a rock-solid foundation one can safely
build upon.

The deserialization layer is volatile. To be fully specified, it needs
dictionaries of recognized types, reference mechanisms etc. all of which are
application and language dependent. There are environment issues. There are
round trip issues. There are cross-language and cross-application
interoperability issues. It is a grand mansion raised on top of the core
foundation. Everyone will want one which is slightly different.

By keeping them separate, I seek to prevent the situation where we'll be
changing the YAML spec every time someone comes up with a new must-have data
type or reference mechanism for an important new class of applications (ISO
dates? PNG image format? URL based references? MIME based? key based?).
We'll be able to have all this fun in the deserialization spec without
changing one bit in the core YAML spec.
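
As a rough Python sketch of that separation: the core layer yields an
untyped tree, and a schema-supplied registry decides what the types mean.
Everything named here is an assumption made for illustration. In
particular, the "!" and "=" keys are a hypothetical way of spelling a typed
node as a plain map; they are not taken from any spec or proposal.

```python
from datetime import date
from decimal import Decimal

# Hypothetical convention: a typed node is an ordinary core-layer map with
# the type name under "!" and the raw scalar under "=".
TYPE_KEY, VALUE_KEY = "!", "="

def deserialize(node, registry):
    """Walk an untyped core tree, applying schema-supplied constructors."""
    if isinstance(node, list):
        return [deserialize(item, registry) for item in node]
    if isinstance(node, dict):
        if set(node) == {TYPE_KEY, VALUE_KEY}:
            constructor = registry.get(node[TYPE_KEY])
            if constructor is None:
                return node  # unknown type: keep the core map as it is
            return constructor(node[VALUE_KEY])
        return {key: deserialize(value, registry)
                for key, value in node.items()}
    return node  # plain scalar

core_tree = {
    "price": {"!": "bcd", "=": "2.34"},
    "label": {"!": "text", "=": "2.34"},
    "due": {"!": "isodate", "=": "2001-08-03"},
}

# One schema's idea of the available types.
registry = {"bcd": Decimal, "text": str}
first = deserialize(core_tree, registry)

# A new must-have type arrives: only the registry grows.
registry["isodate"] = date.fromisoformat
second = deserialize(core_tree, registry)
```

The core tree and the walker are identical in both runs; only the registry
changed, which is the kind of change that would live in the schema rather
than in the core spec.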

In fact, I'd rather make it clear from day 1 that all the above issues
rightfully belong to the specific YAML document schema. Our higher-level
YAML spec won't be "here is the way you must do references"; it would be
"here are some ways to do references, you can use them in your schemas".
Likewise for types: "here are some types you can use in your schemas".

That's why I wanted to call it YAML-INTEROP. It would define a small set of
types and reference mechanisms so that by sticking to this set you would
ensure interoperability with every YAML implementation following this spec.
Specifically, we should make this set the minimal set which would allow
interoperation between Perl, Python, Java, JavaScript and C++ programs.
That's not a large set.

If someone wants to go beyond this set, that's OK. It is still YAML. If
someone wants to use less than this set, that's also OK. It is still YAML.
In contrast, one can't change one bit in the core spec and still be YAML.

Let's keep them separate.

Have fun,

    Oren Ben-Kiki