From: Oren Ben-K. <or...@ri...> - 2002-09-05 08:26:56
We've agreed to get rid of #TAB. That was easy.

As for timestamps, we got dragged into the whole problem of implicit types, whether they can be extended, and so on. This is heavy stuff and the debate was very lively :-) I raised a proposal that was tentatively accepted as "worth pursuing". Its core is accepting that the type family is optional (i.e., a node may have *no* type family, only a *kind*). It is the loader's duty to convert a generic node, which may or may not have an associated type family and/or a format, into some native data structure (and the dumper's job to do the reverse). This is almost, but not quite, completely unlike our current "implicit typing".

I'll need to write this down "properly": the exact effects on the various data models, processing, the consequences for generic tools, the schema language, and so on (most of these issues were discussed a bit in the IRC session, but not all the way through). This will take me some time, so the earliest I'll be able to post it is Sunday.

I've already started thinking about formalizing this, and here are some very preliminary notions:

- Maybe the native model shouldn't be defined in terms of type family at all; instead it might have the concept of a "native data type", a "native value", and a "kind". "Type family" and "Format" would exist only in the generic model, with the "Viewer" responsible for the mapping. (The Viewer is used by the Loader and Dumper; take a look at the data model diagrams.)
- In this view, type family and format are merely instructions to the Viewer (OK, the Loader) on how to map the generic node to a native one (i.e., a "transfer method"; what a coincidence :-).
- Would containers benefit from format? It seems to me they very well might (admittedly rarely), and given the above view, forbidding format for containers would be an arbitrary and needless exception. It would be simpler to allow it.
- I'm impressed by the fact that this is almost identical to Perl's type system, and that we arrived at it independently. Either Larry Wall was extremely lucky, or he had a working crystal ball that told him this type system would be good for YAML, or this approach is "right" in some deep way (for scripting languages) and he "merely" arrived at it as an "inevitable" result. Of course, this made our life easier because we had it in front of us, while he more or less invented it from scratch (AFAIK). I don't want to start a language war here or anything, and Parrot is supposed to run Python and Ruby programs as well anyway... I'm just rather surprised by this result. If you had asked me a year ago, my bet would have been that we'd end up more "traditional" and Perl would be the "odd man out". In fact, when I first encountered "bless" I thought it was a horrible hack; now I want to bless Larry for getting it right. Either way, YAML is a *perfect* fit for Parrot now. Way to go!
- As for timestamps... I think we had better leave them out of the core spec. The use cases we have today (logging, etc.) don't require timestamp as a type family. They are all happy using strcmp on two values (for ==, >=, etc.). They aren't different in any way from the use cases for URLs, IP addresses, or e-mail addresses in YAML. In all these cases, simply treating the values as strings and letting the application worry about its internal format is the right way to go. And in all these cases, there are standards external to YAML that specify how these strings should be formatted (for dates, there's ISO 8601 as well as other de facto standards). A time data type is actually _needed_ only when the above isn't enough (e.g., when you need generic YAML tools to provide operators on these values). But then a simple timestamp type isn't enough either (e.g., due to time zone issues).
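The strcmp argument above holds as long as every writer uses the same ISO 8601 layout and the same UTC offset. Here is a minimal Python sketch of where it stops holding, assuming timestamps carry explicit UTC offsets; the helper names lexically_before and chronologically_before are hypothetical and not part of any YAML tool:

```python
from datetime import datetime


def lexically_before(x: str, y: str) -> bool:
    """Compare timestamps as plain strings (the strcmp approach)."""
    return x < y


def chronologically_before(x: str, y: str) -> bool:
    """Parse ISO 8601 timestamps with UTC offsets and compare the actual instants."""
    return datetime.fromisoformat(x) < datetime.fromisoformat(y)


# Same layout, same UTC offset: string order matches time order.
a = "2002-09-05T08:26:56+00:00"
b = "2002-09-05T09:00:00+00:00"
print(lexically_before(a, b), chronologically_before(a, b))  # True True

# Different UTC offsets: c is 07:00 UTC, i.e. *earlier* than a,
# yet it sorts after a as a string.
c = "2002-09-05T10:00:00+03:00"
print(lexically_before(a, c), chronologically_before(a, c))  # True False
```

With matching offsets the two comparisons agree; once offsets differ, plain string order can contradict time order. That is the point at which generic tools would need a real time type, and with it the time zone questions raised above.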
We should start work on a separate spec that would cover both time/date and currency (a "Recommended YAML type families for business applications" spec). Clark could take the lead there. We'll cover fun stuff such as time zones, time periods, currency conversion rates, and so on. There may also be a similar separate spec for URLs, e-mail addresses, IP addresses, and domain names (a "Recommended YAML type families for network applications" spec); the Ruby people may want to drive this one, as it matches some of their built-in types. Maybe another spec for units (a "Recommended YAML type families for engineering" spec), and so on. The core spec should contain only core "_language_ data types" (as opposed to core "_application_ data types"), which means all the types we have today minus the timestamp.

Thoughts?

Have fun,

Oren Ben-Kiki