From: Clark C. E. <cc...@cl...> - 2002-09-08 17:02:23
On Sun, Sep 08, 2002 at 06:42:46PM +0300, Oren Ben-Kiki wrote:
| > If someone wants to write a custom loader and do custom
| > type detection via paths/regex then we are not obligated to
| > make such type detection work with generic YAML tools.
|
| Exactly! That's what makes them generic! For example, a YAML-pretty-print
| has no need whatsoever for such detection. The generic tool works in layer
| (1) above.

No, this is what makes transfer|methods useless. Right now they are
used to communicate type information between the parser and the
loader. If the loader does all of the type detection, then we just
don't need transfer|methods or typing any more.

| OK, YPATH will need to be provided with appropriate comparison operators to
| be used _at that level_. But even it will _not_ load a !point {x: 1, y: 2}
| map to a Point structure, even if has loaded an "A near B" operator. And
| besides, the paths given to YPATH are anything but schema-blind - it is the
| YPATH _tool_ that is schema-blind.

No. The point is it will _know_ that it is a point. So I can write a
YAML Transform searching for points and reporting on them, or doing
useful things with them such as converting them into circles.

| > However, if we provide a pluggable implicit type detection mechanism,
| > then we must think hard about how this will impact the
| > generic model.
|
| Right - "Make the type family optional". I was going to go over the spec
| today doing this in detail, but instead I had to read >100 messages and
| write this looong reply :-) I hope it addresses most of the issues, but I'll
| still need to write a proper proposal.

I spent some serious time musing on this, but "type family optional"
is exactly equivalent to not using !transfer|method.

| > It is completely unacceptable to me that implementations can
| > pick/choose how they want to do implicit typing. Either we have
| > it or we don't.
|
| I fail to see why. It is like saying that you find it unacceptable that some
| applications load some values to BigInt or Long instead of to Integer (or
| whatever). Who cares?

I agree with Mike Orr: BigInt or Integer are particular bindings
of !int.

| The whole question only arises because of the artificial separation of
| implicit typing from conversion to native typing, and is a great example of
| how far out of shape you'd have to twist YAML and still fail to gain the
| impossible goal of having generic tools be semantics-aware.

Having nodes _typed_ has nothing to do with semantics.

| - Native model is renamed to "native representation"; wording changes to
| explain this refers to the way the YAML generic model is realized in native
| data structures and that some code is required to "extract" the Generic
| model information from this native representation ("Viewer" or "Dumper").

Something like this.

| - The Loader makes a decision on the native data type used to store the node
| based on (1) the serial model path to it; (2) its content; (3) the transfer
| method, if any.

Well, at this point (1) and (2) aren't available for generic
processing, so why even bother with (3)?

| - For round tripping the Dumper/Viewer should reconstruct the same transfer
| method (up to format, which may be different); the value (again, up to
| format). If format changes the value needs stay the same under the document
| semantics.

If it's all implicit, why?

| - Semantics-blind YAML tools work at the generic model level. They can't do
| anything with values other than strcmp-ing them in the hope that the format
| is the same (even for a simple thing like integers this fails more often
| than not).

You are painting this as a black-and-white issue and saying that I
can't support !int in our generic tools? We must have spent a good
half year working out a type system as part of YAML. Are we going to
just toss it out the window with one "simple" change?
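
To make the strcmp point concrete, here is a minimal sketch of what a
generic tool could do if it had access to a shared list of implicit
types. Everything in it is an assumption for illustration: the names
REGISTRY, resolve and same_value are hypothetical, and the two regexes
are made up rather than taken from any spec draft.

```python
import re

# Hypothetical shared registry: (type name, regex, converter).
# The patterns are illustrative only, not taken from the YAML spec.
REGISTRY = [
    ("int", re.compile(r"^[-+]?0x[0-9a-fA-F]+$"), lambda s: int(s, 16)),
    ("int", re.compile(r"^[-+]?[0-9]+$"), lambda s: int(s, 10)),
]


def resolve(scalar):
    """Return (type_name, canonical_value) for a plain scalar.

    Scalars that match no registered pattern stay strings.
    """
    for name, pattern, convert in REGISTRY:
        if pattern.match(scalar):
            return name, convert(scalar)
    return "str", scalar


def same_value(a, b):
    """Compare two scalars by resolved type and value, not by spelling."""
    return resolve(a) == resolve(b)


# A semantics-blind tool can only compare the spellings:
print("0x10" == "16")            # False
# A tool that shares the registry knows both are the same int:
print(same_value("0x10", "16"))  # True
```

The design choice being illustrated is only that the regex-to-type
mapping lives in one shared place instead of in each application's
loader.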
| - For example, YPATH should allow plug-in operators - for either implicit or
| explicit types. Hence a YPATH/YQUERY could allow a plug-in called
| "Point-near-Point" as well as "integer-greater-than"; both are useful and
| neither have any special status. Call such plug-ins the "comparison-schema".

Great. However, this will do little good if the information isn't
typed.

| - All the above schemas imply running code. Some are easier than others to
| express in a platform-neutral way (requiring an interpreter written for all
| relevant platforms), but this is possible in principle for all of them.

There is no reason why a registry can't hold a regex and/or YPATH
listing for the various implicit types. I see no reason to support
implicit typing of custom data; it should be registered, explicit, or
completely out of scope for YAML (though perhaps in application
scope).

| - Since each application can load any node (with either implicit or explicit
| semantics) to whatever native type it feels like (as long as it preserves
| semantics), there's no need to distinguish implicit types with !, () or any
| other special notation (note this is true in the current spec as well).

See above. Certainly an application can do whatever it damn pleases,
but this stuff is out of scope for the YAML specification, IMHO.

| - In a word: this proposal makes YAML DWIM. If you want it to be. If you
| want to be strict about it, add an explicit transfer method to each node,
| and/or provide validation/typing/comparison schemas, and so on. If you don't
| want to be strict, just treat everything as a string (but always preserve
| the transfer method).

I want YAML to help facilitate interoperability. I'd like it to do so
with a set of standard types and a registry mechanism to add types
over time.

| - There are no interoperability problems that I can see (if you see one,
| please describe a scenario). This is compatible with existing YAML
| documents. There are no new directives. There's no ugly syntax for
| implicits. New implicits can be added later on without breaking
| compatibility or harming existing applications.

My scenario is that each binding invents its own way of matching
regexes to constructors, so each application ends up doing this
itself. Since there isn't a type identifier (transfer|method), most
YAML data ends up being typed in an application-specific manner. This
greatly hinders the portability of scalar values, leaving us no better
off than XML. Further, since type information for scalars isn't
generally available to generic tool sets, type conversions and
operations involving standard types are unavailable.

| - I'm dead set against (), #CONVERT, #IMPLICIT and so on - these are DTDs in
| disguise and I'll have none of that.

I was thinking of an on-line centralized type registry (perhaps with
mirrors). An #IMPLICIT could give a list of transfer or type
identifiers which should be used for implicit type detection by the
parser. We can set a policy that once a type is registered, it doesn't
change (you need a new type), thus allowing permanent caching at the
cost of non-upgradability.

| - I want date/time in a separate spec, which would cover it properly; it is
| likely one of the types listed there would be a UTC-based timestamp and
| would prove useful for more applications than, say,
| time-period-in-fortnights, just like some ISO date formats are more useful
| than others. Likewise for currency (with currency code).

Yes, "timestamp", "date", "time", etc., could be among the registered
transfer items, just like "int", etc. I'd expect most implementations
to "ship" with these built in so that they don't have to be
downloaded. The registry could also contain (signed) code to implement
these functions, but this is less important. The primary thing is that
the type is registered with a regex. This was our original direction
anyway, wasn't it?

| I hope that clarifies things a bit. I'm firing this off and will re-check
| E-mail later this evening (otherwise I know I'll have 100 E-mails to read
| tomorrow :-).

It does. Please review my logic as well. I'd rather have this process
a bit more formalized and less ad hoc, to (a) make it easy for people
to do common stuff, and (b) increase the chance of data being
supported by more than one language binding.

Best,

Clark