From: Sean O'D. <se...@ce...> - 2004-02-04 21:03:32
|
I gave _why a patch that implements an implicit typing mechanism that lets yaml4r/syck developers create their own implicit types, but I started thinking that this sort of thing might be useful in the specification. The patch I gave him is basically just a library "feature."

Do you think we could add something where you can create mappings between a regex and a type? I'm thinking: wherever values match regexes, they're automatically given the mapped type.

Example YAML document:

    --- # - implicit: [/^yeah$/, yaml.org,2002/bool#yes]
    thisisyes: yeah

So when the YAML document is fully loaded, the value "yeah" is checked against the implicit mapping and automatically typed as bool#yes, which is then handled.

This sort of thing has two main advantages:

1) It could let developers override the values used to match against built-in types like bool#yes and so on. This would be great for non-English developers, who might want to map implicit types to words in their native language.

2) It lets developers add their own implicit types.

Regarding #2: I am implementing a schema which uses a specific syntax to express ranges. Until my patch, I had no way of transforming the range expressions into my custom native range type in Ruby. Actually, I did, but I had to use explicit typing. The thing I hate about that is this: since the expression is so common in the schema syntax, and since it's so easily recognizable (it's easy to apply a regex to it), I really think it should just be an implicit type. I would hate to force all schema authors to specify the type wherever they use ranges. Their schemas would be very messy and terribly redundant. Implicit typing would be so much more elegant.

Opinions?

Sean O'Dell |
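A minimal sketch of the regex-to-type mapping proposed above, written in Python rather than the Ruby of the actual yaml4r/syck patch (whose API is not shown in this thread). The names `IMPLICIT_RULES`, `implicit_tag` and `tag_document`, the `1..10` range syntax, and the `x-private:range` tag are all hypothetical; the bool tag string simply copies the notation used in the example document.

```python
import re

# Hypothetical rule table: (compiled regex, tag). The first matching rule wins.
IMPLICIT_RULES = [
    (re.compile(r"^yeah$"), "yaml.org,2002/bool#yes"),   # from the example document
    (re.compile(r"^\d+\.\.\d+$"), "x-private:range"),    # assumed range syntax, e.g. 1..10
]


def implicit_tag(value, rules=IMPLICIT_RULES):
    """Return the tag of the first rule whose regex matches, or None."""
    for pattern, tag in rules:
        if pattern.search(value):
            return tag
    return None


def tag_document(mapping, rules=IMPLICIT_RULES):
    """Pair each scalar value of a flat mapping with its implicit tag (or None)."""
    return {key: (value, implicit_tag(value, rules)) for key, value in mapping.items()}


doc = {"thisisyes": "yeah", "span": "1..10", "name": "plain text"}
tagged = tag_document(doc)
```

Here `tagged["thisisyes"]` carries the bool tag, `tagged["span"]` carries the hypothetical range tag, and `tagged["name"]` stays untagged. Turning a tag into a native object is deliberately left out, since that step belongs to the loader.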
From: Oren Ben-K. <or...@be...> - 2004-02-04 21:18:30
|
Section 3.3.2 in the (work in progress) spec:

"The plain scalar style exception allows unquoted values to signify numbers, dates, or other typed data, while quoted values are treated as generic strings. With this exception, a processor may match plain scalars against a set of regular expressions, to provide automatic resolution of such types without an explicit tag."

Is this what you had in mind?

Have fun,

Oren Ben-Kiki |
From: Sean O'D. <se...@ce...> - 2004-02-05 16:53:59
|
On Wednesday 04 February 2004 01:17 pm, Oren Ben-Kiki wrote:
> Section 3.3.2 in the (work in progress) spec:
>
> "The plain scalar style exception allows unquoted values to signify numbers, dates, or other typed data, while quoted values are treated as generic strings. With this exception, a processor may match plain scalars against a set of regular expressions, to provide automatic resolution of such types without an explicit tag."
>
> Is this what you had in mind?

Sort of yes, sort of no.

We have the yaml.org types, and developers can add their own domain/private types, and Syck deals with that very well. With the patch I gave _why, now developers (the Ruby ones at least) can transparently resolve scalars to their own native data types from inside Ruby. That all works very well, from a programmer's perspective.

But from just a YAML author's perspective, it would be nice to have the ability to specify that certain scalars load as certain known types (either yaml.org types or user-defined domain/private types), like the "ohyeah" = "bool#yes" mapping example I gave.

Another example is someone producing a YAML document in German who, where they intend a "bool#yes" value to appear, writes "ja" or something similar. Perhaps just for clarity, or because they're working with decidedly non-technical types who get confused if they don't see the word "ja" or whatever. For whatever reason, they just want to use "ja" instead of the English "true" or "yes."

If that German fellow decided to use that YAML with some other program that loaded the YAML values into native data types, it wouldn't see "ja" as "yes." If he could somehow mark up the YAML document to show that "ja=bool#yes", then "ja" would load as the native type used to show a true value in whatever language/library the program used.

Sean O'Dell |
From: Oren Ben-K. <or...@be...> - 2004-02-05 22:11:49
|
Sean O'Dell wrote:
> ... from just a YAML author's perspective, it would be nice to have the ability to specify that certain scalars load as certain known types (either yaml.org types or user-defined domain/private types), like the "ohyeah" = "bool#yes" mapping example I gave.

Certainly. The ability to define your types, and place this definition in a human and machine readable format, is crucial. Our code name for such a format is the "schema language" and we haven't really started work on it yet - though we did discuss it a bit and have some notion of how we'd like to proceed with it.

First things first, though. We really need to finalize the spec...

Have fun,

Oren Ben-Kiki |
From: Sean O'D. <se...@ce...> - 2004-02-05 22:46:41
|
On Thursday 05 February 2004 01:14 pm, Oren Ben-Kiki wrote:
> Sean O'Dell wrote:
> > ... from just a YAML author's perspective, it would be nice to have the ability to specify that certain scalars load as certain known types (either yaml.org types or user-defined domain/private types), like the "ohyeah" = "bool#yes" mapping example I gave.
>
> Certainly. The ability to define your types, and place this definition in a human and machine readable format, is crucial. Our code name for such a format is the "schema language" and we haven't really started work on it yet - though we did discuss it a bit and have some notion of how we'd like to proceed with it.

That's good to hear, what have you got? I need a solution right now, and I'm using my own Syck patch right now to get the job done. When can the real implementation start happening?

> First things first, though. We really need to finalize the spec...

What's left on the table? What needs work?

Sean O'Dell |
From: Oren Ben-K. <or...@be...> - 2004-02-05 23:06:21
|
Sean O'Dell wrote:
> > Certainly. The ability to define your types, and place this definition in a human and machine readable format, is crucial. Our code name for such a format is the "schema language" and we haven't really started work on it yet - though we did discuss it a bit and have some notion of how we'd like to proceed with it.
>
> That's good to hear, what have you got? I need a solution right now, and I'm using my own Syck patch right now to get the job done. When can the real implementation start happening?

What we have got is a direction, not much more. We think it would make sense to adapt the Relax-NG approach to YAML. This means discarding tons of XML-isms from it, and adding a few things... In general the idea is that a schema is something like a BNF syntax whose "tokens" are YAML primitives. The schema "BNF productions" will each describe a single tag. In this approach, explicit tags would be very rare - except for a tag for the root of the document, indicating the "start production". If not using private tags, it is trivial to merge several schemas together and mix-and-match their elements... And so on.

It is all very abstract at this point. We don't have anything like a beginning of a concrete schema language.

> > First things first, though. We really need to finalize the spec...
>
> What's left on the table? What needs work?

Well... I'd like to see the terms "tagging category" and "tagging context" make it into the spec. I'm also in the process of going through the syntax section and updating it to match the latest changes and make it consistent with the model section. Once I do that, we need to make a pass through the examples to ensure they are clear and complete. We also need to review the initial core tags repository to ensure it is consistent with the new spec.

And then we'll be done ;-)

Have fun,

Oren Ben-Kiki |
From: Sean O'D. <se...@ce...> - 2004-02-05 23:37:57
|
On Thursday 05 February 2004 03:05 pm, Oren Ben-Kiki wrote:
> Sean O'Dell wrote:
> > > Certainly. The ability to define your types, and place this definition in a human and machine readable format, is crucial. Our code name for such a format is the "schema language" and we haven't really started work on it yet - though we did discuss it a bit and have some notion of how we'd like to proceed with it.
> >
> > That's good to hear, what have you got? I need a solution right now, and I'm using my own Syck patch right now to get the job done. When can the real implementation start happening?
>
> What we have got is a direction, not much more. We think it would make sense to adapt the Relax-NG approach to YAML. This means discarding tons of XML-isms from it, and adding a few things... In general the idea is that a schema is something like a BNF syntax whose "tokens" are YAML primitives. The schema "BNF productions" will each describe a single tag. In this approach, explicit tags would be very rare - except for a tag for the root of the document, indicating the "start production". If not using private tags, it is trivial to merge several schemas together and mix-and-match their elements... And so on.

Blah, I didn't like Relax-NG at all. I'm not even sure that a document format schema is the right place to approach implicit typing schemas, although the value pattern matching situation is similar.

If you go with something like Relax-NG, do me a HUGE favor. Don't make it the "one and only" schema. Somehow, create syntax for document headers that lets you specify not only which schema to apply to the document, but also what style of schema it is, so new and better schema systems can come along later and plug in at the programmer's will.

I'm working on a document format schema that I like 10,000x better than anything like Relax-NG and it would be great if I could plug it right into a parser, instead of running it as a third-party tool.

> > What's left on the table? What needs work?
>
> Well... I'd like to see the terms "tagging category" and "tagging context" make it into the spec. I'm also in the process of going through the syntax section and updating it to match the latest changes and make it consistent with the model section. Once I do that, we need to make a pass through the examples to ensure they are clear and complete. We also need to review the initial core tags repository to ensure it is consistent with the new spec.

You know what? If you put a statement in the specification that basically said "implicit typing schemas may change the type tag of a scalar, but it is up to the individual parser implementations to determine which native data object is used in place of the scalar" I would be really happy. No matter what happens with schemas, I think that's a safe statement to make. I mean, the alternative is "typing schemas strictly govern which native language data objects are loaded in place of scalars" which, I think, would be sort of crazy, eh?

Sean O'Dell |
From: Oren Ben-K. <or...@be...> - 2004-02-05 23:57:21
|
Sean O'Dell wrote:
> If you go with something like Relax-NG, do me a HUGE favor. Don't make it the "one and only" schema.

We couldn't define a "one and only" schema if we wanted to (remember all the flexibility we allow the implementation to use when resolving tags). Regardless, a lot of Relax-NG's "blah" relates to its trying to describe XML. We believe it would be much better in YAML.

> You know what? If you put a statement in the specification that basically said "implicit typing schemas may change the type tag of a scalar, but it is up to the individual parser implementations to determine which native data object is used in place of the scalar" I would be really happy.

First, the YAML processor may not _change_ the tag of a node. Second, the spec _already_ implies what you want - but it doesn't say it in such strong language. It focuses on what the processor _can't_ do (use syntactical details) and doesn't mention the extreme freedom it has (it can do anything at all, period, as long as it ignores the syntactical details). I guess adding some wording to that effect would be useful.

Have fun,

Oren Ben-Kiki |
From: Clark C. E. <cc...@cl...> - 2004-02-06 00:02:11
|
On Thu, Feb 05, 2004 at 03:37:55PM -0800, Sean O'Dell wrote:
| > What we have got is a direction, not much more. We think it would make sense to adapt the Relax-NG approach to YAML. This means discarding tons of XML-isms from it, and adding a few things... In general the idea is that a schema is something like a BNF syntax whose "tokens" are YAML primitives. The schema "BNF productions" will each describe a single tag. In this approach, explicit tags would be very rare - except for a tag for the root of the document, indicating the "start production". If not using private tags, it is trivial to merge several schemas together and mix-and-match their elements... And so on.
|
| Blah, I didn't like Relax-NG at all. I'm not even sure that a document format schema is the right place to approach implicit typing schemas; although the value pattern matching situation is similar.

Well, we are thinking in this mode as a good starter. Murata San is a very capable thinker and I'd like to reuse his work if possible. That said, YAML is very different from XML, so at some level (a pretty low level) we will be quite distinct.

| You know what? If you put a statement in the specification that basically said "implicit typing schemas may change the type tag of a scalar, but it is up to the individual parser implementations to determine which native data object is used in place of the scalar" I would be really happy. No matter what happens with schemas, I think that's a safe statement to make. I mean, the alternative is "typing schemas strictly govern which native language data objects are loaded in place of scalars" which, I think, would be sort of crazy, eh?

As I see it, you can think of data typing as having two stages:

  * The first stage is looking at YAML nodes _lacking_ a tag, and filling in the tag. Let us call this 'tagging'.

  * The second stage is finding the appropriate native data type for each node based solely upon its tag. This is called 'resolution'.

The restrictions upon 'tagging' are as follows:

  - you can use what 'context' the node is in, that is, look at its parents to make up your mind

  - you can use the value of a node, that is, look at its children or its text value

  - you can use the node's kind, that is, whether it is a scalar, sequence, or mapping

  - in the very special case of a scalar, you can use the distinction between it being a plain scalar or not.

You may not use serialization or presentation attributes (plain scalar hack excepted) during tagging. These include, but are not restricted to:

  - distinctions between styles other than plain, for instance single vs double quoted

  - the order of mapping keys

  - comments, specific spacing, etc.

The reason for these restrictions is, of course, to make YAML information more consistent across implementations. In this way, someone is free to convert a single quoted scalar into a double quoted scalar or reorder keys without worrying about changing the information being presented.

Hope this helps.

Clark |
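The two stages described above can be sketched roughly as follows, for scalars only. This is an illustration in Python under stated assumptions, not code from Syck or any other YAML processor: the function names, the rule patterns, and the short tag names `bool`, `int` and `str` are all made up for the example. Node context and node kind, which tagging is also allowed to use, are left out for brevity.

```python
import re

# Illustrative implicit rules; patterns and short tag names are assumptions.
RULES = [
    (re.compile(r"^(yes|no|true|false)$"), "bool"),
    (re.compile(r"^-?\d+$"), "int"),
]


def tag_scalar(text, plain, explicit_tag=None):
    """Stage 1 ('tagging'): fill in a tag for a scalar that lacks one.

    Only the value and the plain/non-plain distinction are consulted.
    Which quoting style a non-plain scalar used is not even passed in.
    """
    if explicit_tag is not None:
        return explicit_tag          # an existing tag is kept, never changed
    if plain:
        for pattern, tag in RULES:
            if pattern.match(text):
                return tag
    return "str"                     # quoted or unmatched scalars are strings


# Stage 2 ('resolution'): the native type is chosen from the tag alone.
# Another implementation could map the same tags to different native objects.
CONSTRUCTORS = {
    "bool": lambda text: text in ("yes", "true"),
    "int": int,
    "str": str,
}


def resolve(text, tag):
    """Build the native value for a tagged scalar."""
    return CONSTRUCTORS[tag](text)


def load_scalar(text, plain, explicit_tag=None):
    """Run both stages in order."""
    return resolve(text, tag_scalar(text, plain, explicit_tag))
```

With these rules, `load_scalar("42", plain=True)` yields an integer while `load_scalar("42", plain=False)` yields a string, regardless of whether the quoted form used single or double quotes. An explicit tag with no entry in `CONSTRUCTORS` would raise a `KeyError` in this sketch.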