From: Oren Ben-K. <or...@ri...> - 2001-08-04 07:52:21
On Fri, Aug 03, 2001 at 03:45:13PM -0400, Clark C . Evans wrote:
| I think we are awfully close. Now that you mention it,
| I *do* like the idea of two information models:
|
|   YAML-BASE: Untyped tree information model consisting
|              of Map, List, Scalar, and Null.
|
|   YAML-CORE: Typed graph information model building on
|              the base model.

Base and Core? Hmmm. I'd rather stick with "core" and "data" or
"interop"; "core" and "base" make it confusing about which one is
more basic.

At any rate, I agree that your regexp-based solution to layering one
on top of another doesn't work:

- I gather that:

      3: 2.34e0

  Should be a real and:

      4: "2.34e0"

  Should be a string. Correct? This doesn't work, because these two
  are completely identical in the more basic data model (both are
  "string"). So it isn't possible to use a layered approach to
  implement this distinction. The same applies to:

      6: &0001 Referenced

  Vs.:

      Just what it looks: "&0001 Not Referenced"

- Regexps aren't enough to distinguish between types. They are OK
  for providing defaults, but not enough by themselves. Your examples
  already started to suffer from a little ugliness. I'd expect to be
  able to write:

      4: 2.34

  For a real number. Also, is:

      delivery: 1/3/2001

  Jan 3rd or March 1st? How about:

      image: ....base64.....

  Is it a GIF or a Gzipped BMP?

- Working at the base level becomes terrible. Consider:

      distance: &0001 2.34e0

  An application working at the lower level has no easy way to access
  the distance value. It *must* apply a whole host of regexps to it
  to remove any "color" which may have been applied to it. That may
  be almost bearable for reading the value, but imagine what it does
  to *writing* it. Ugh.

- Taking a scalar value and smashing it into parts, using regexps, in
  order to create a map, just isn't simple or intuitive. Or, as I've
  shown above, robust or general enough.

> Conclusion: Layering is not the holy grail.

At least, not when done using the regexp method :-)

> I respectfully withdraw YAML-BASE.

Well, I agree if you are referring to this regexp method. Back to my
shorthand form, I don't see that it suffers from any of the issues I
listed above.

> Frankly, I don't see
> how you can layer and have consistency.

Well, let's see. Since the shorthand form is already a map in the
base layer, then:

- There's no problem for "2.34" being a real, text, BCD or what have
  you.

- It is possible to use regexps so that the default type would be
  'real':

      real: 2.34
      BCD: %(!bcd) 2.34
      text: %(!text) 2.34

- Generic access to the value is possible using the v/w operations.
  This would work at both layers.

- As for "stripping" the color, anchors etc. I've seen this mentioned
  several times as a problem, and I'm baffled by it. What exactly is
  the problem?

- As for simplicity. I see the lowest level as being the simplest
  possible data model which has 1-1 mapping to a YAML file. On the
  other hand, the higher data model is hardly a YAML data model,
  since it allows any native data structure to be used. I don't think
  we even have a good definition of it. "Anything at all", perhaps...
  At any rate, it isn't simple.

I see a lot of sense in doing a split and using a layered approach.

The lowest "core" level is extremely stable. The design decisions
made in it are independent of the application, programming language,
character set etc. The format is very simple. This is a rock-solid
foundation one can safely build upon.

The deserialization layer is volatile. To be fully specified, it
needs dictionaries of recognized types, reference mechanisms etc.,
all of which are application and language dependent. There are
environment issues. There are round trip issues. There are
cross-language and cross-application interoperability issues. It is a
grand mansion raised on top of the core foundation. Everyone will
want one which is slightly different.
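
As a rough illustration of the split described above (not code from
this thread): the Python sketch below treats the core layer as an
untyped tree of strings and maps, and keeps all type knowledge in a
separate deserialization layer. The names REAL_RE, CONVERTERS,
default_type and deserialize are made up for the example, and so are
the "type"/"value" keys standing in for the shorthand map; this
message does not spell out the shorthand's actual keys.

```python
import re
from decimal import Decimal

# Hypothetical regexp used only to pick a *default* type for a bare scalar.
REAL_RE = re.compile(r"[-+]?\d+(\.\d+)?([eE][-+]?\d+)?")

# Hypothetical type registry owned by the deserialization layer.
# An application could add or drop entries without touching the core tree.
CONVERTERS = {
    "real": float,
    "bcd": Decimal,
    "text": str,
}


def default_type(text):
    """Guess a type from the scalar's text alone (a default, nothing more)."""
    return "real" if REAL_RE.fullmatch(text) else "text"


def deserialize(node):
    """Convert one core-layer node into a native value.

    A node is either a plain string, or a map with assumed keys
    "type" and "value" (a stand-in for the shorthand form).
    """
    if isinstance(node, dict):
        # Explicit type: no guessing from the text is needed.
        return CONVERTERS[node["type"]](node["value"])
    # Bare scalar: fall back to the regexp-based default.
    return CONVERTERS[default_type(node)](node)


# Core-layer view of the three entries from the example above.
core_tree = {
    "real": "2.34",
    "BCD": {"type": "bcd", "value": "2.34"},
    "text": {"type": "text", "value": "2.34"},
}

native = {key: deserialize(node) for key, node in core_tree.items()}
```

In this sketch, supporting a new type means adding an entry to
CONVERTERS; the core tree, and whatever parser produced it, stays the
same, which is the property argued for above.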

By keeping them separate, I seek to prevent the situation where we'll
be changing the YAML spec every time someone comes up with a new
must-have data type or reference mechanism for an important new class
of applications (ISO dates? PNG image format? URL based references?
MIME based? key based?). We'll be able to have all this fun in the
deserialization spec without changing one bit in the core YAML spec.

In fact, I'd rather make it clear from day 1 that all the above
issues rightfully belong to the specific YAML document schema. Our
higher-level YAML spec won't be "here is the way you must do
references"; it would be "here are some ways to do references, you
can use them in your schemas". Likewise for types: "here are some
types you can use in your schemas".

That's why I wanted to call it YAML-INTEROP. It would define a small
set of types and reference mechanisms so that by sticking to this set
you would ensure interoperability with every YAML implementation
following this spec. Specifically, we should make this set the
minimal set which would allow interoperation between Perl, Python,
Java, JavaScript and C++ programs. That's not a large set.

If someone wants to go beyond this set, that's OK. It is still YAML.
If someone wants to use less than this set, that's also OK. It is
still YAML. In contrast, one can't change one bit in the core spec
and still be YAML. Let's keep them separate.

Have fun,

    Oren Ben-Kiki