From: Clark C . E. <cc...@cl...> - 2001-06-12 09:33:48
The site is updated.

On Tue, Jun 12, 2001 at 11:12:37AM +0200, Oren Ben-Kiki wrote:
| Here's the new draft. I hope you like it.
From: Oren Ben-K. <or...@ri...> - 2001-06-14 14:15:17
Clark C . Evans [mailto:cc...@cl...] wrote:
> | address: @=
> |     First address
> |     Additional address
>
> This as opposed to?
>
> address: @
>     First address
>     Additional address
>
> I am curious what the difference between
> these would be?

At the parser API level, the 'asScalar' method will return null for '@' and the first element for '@='.

At the native-language data structures level, '@' is de-serialized into a native list. This is an important use case, both Brian and I agree that it is vital to be able to do that.

'@=' is de-serialized to some data structure wrapping/extending a native list, which simulates a scalar value. Thus for the '@=' case it would be possible, in Perl, to write both:

    print $YAML_DOC->{address};      # Prints 'First address'
    print $YAML_DOC->{address}->[0]; # Likewise.

The point is that doing this has a non-trivial cost (performance, complexity, etc.). There's no point in *requiring* that cost for *each and every list*. For maps we are OK - the extra cost only occurs if one explicitly asks for it (by providing a '=' key or by coloring a scalar value). For lists we need some other way of "explicitly asking for it"... hence the '@=' syntax.

> Hmm. This is interesting. I think I'll need
> some "implementation experience" before we
> should move in this direction (and even
> implementing asScalar... ) Thoughts?

At the parser API level, this is a trivial change. At the native data structures level, Brian thought the basic concept was acceptable for maps, I don't see why it can't be applied to lists as well. Brian?

I'd rather push on and finalize the spec; given the level of implementer interest we've gathered, if we don't do that, we'll end up with "legacy" YAML subsets we'll have to deal with...

Speaking of implementations, I'll have more free time available as of July 8th, and I promise to spend some of it on the Java implementation.

Another implementation question: we've added the distinction between a 'binary blob' and 'Unicode text' to the data model - can this be supported in some way when mapping YAML to native data structures of Perl/Python? (it does work for Java).

Have fun,

Oren Ben-Kiki
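The '@' versus '@=' distinction described in this message can be sketched in Python. This is only an illustration under stated assumptions: `DefaultList` and `parser_as_scalar` are hypothetical names, not part of any YAML draft or library, and the wrapper shows just one way a list could "simulate a scalar value".

```python
class DefaultList(list):
    """Hypothetical native type for an '@=' list: the first entry is its default value."""

    def __str__(self):
        # In a string (scalar) context the list stands in for its first entry.
        return str(self[0]) if self else ""


def parser_as_scalar(node):
    """Sketch of the parser-level rule: null for '@', first element for '@='."""
    if isinstance(node, DefaultList):
        return node[0] if node else None
    if isinstance(node, list):
        return None  # a plain '@' list has no scalar value
    return node  # already a scalar


addresses = DefaultList(["First address", "Additional address"])  # address: @=
applications = ["Microsoft Word", "WinZip", "Perl"]               # installed applications: @
```

Only the '@=' list pays for the wrapper; the plain '@' list stays an ordinary native list, which is the cost argument made above.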
From: Clark C . E. <cc...@cl...> - 2001-06-14 15:02:53
On Thu, Jun 14, 2001 at 04:15:55PM +0200, Oren Ben-Kiki wrote:
| At the parser API level, the 'asScalar' method will return null
| for '@' and the first element for '@='.

I must say... I don't think I like the proposal all that much. At the parser level, there is nothing wrong with "asScalar" to be called on a list or a map and have it return the first or default value, as appropriate.

| At the native-language data structures level, '@' is
| de-serialized into a native list. This is an important
| use case, both Brian and I agree that it is vital to
| be able to do that.

Agreed. And for this case, "asScalar" could be implemented as an external, friend function. Thus, those applications which would like to enable the forward-compatibility would have to call asScalar on every object that they *assume* is a scalar. Thus, for the native-language data structure level, asScalar() is an optional level of complexity (which may not be possible in some bindings).

| '@=' is de-serialized to some data structure wrapping/extending
| a native list, which simulates a scalar value. Thus for
| the '@=' case it would be possible, in Perl, to write both:
|
|     print $YAML_DOC->{address};      # Prints 'First address'
|     print $YAML_DOC->{address}->[0]; # Likewise.

Ok. As I see this, with the asScalar() proposal, the first line would be...

    print asScalar($YAML_DOC->{address}); # Prints 'First address'

This requires "extra" effort for those applications which want to enable the forward-compatible feature, but does not burden other applications.

| The point is that doing this has a non-trivial cost (performance,
| complexity, etc.). There's no point in *requiring* that cost for
| *each and every list*.

I'd rather put the complexity in the applications which need to be "forward compatible" rather than in the data...

| For maps we are OK - the extra cost only
| occurs if one explicitly asks for it (by providing
| a '=' key or by coloring a scalar value).

I don't get this... how are maps any different than lists in this respect?

| I'd rather push on and finalize the spec; given the level of
| implementer interest we've gathered, if we don't do that,
| we'll end up with "legacy" YAML subsets we'll have to deal with...

Right. I'd like to hear your opinion on the above.

| Speaking of implementations, I'll have more free time available
| as of July 8th, and I promise to spend some of
| it on the Java implementation.

Great.

| Another implementation question: we've added the distinction
| between a 'binary blob' and 'Unicode text' to the data model - can
| this be supported in some way when mapping YAML to native data
| structures of Perl/Python? (it does work for Java).

In Python there is not a "byte" data type. There is, however, a distinction between Unicode and non-Unicode strings. I believe that Perl may be in the same boat.

Best,

Clark
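The "external, friend function" form of asScalar proposed in this message might look like the following in Python. It is a minimal sketch, assuming that a map's default lives under the '=' key and a list's default is its first entry, as discussed in the thread; the name `as_scalar` is an illustrative stand-in.

```python
def as_scalar(node):
    """Cast-style helper: scalars pass through, containers yield their default entry."""
    if isinstance(node, dict):
        # Assumption: the map's default value is stored under the '=' key.
        return node.get("=")
    if isinstance(node, (list, tuple)):
        # Assumption: the list's default value is its first entry.
        return node[0] if node else None
    return node


doc = {"address": ["First address", "Additional address"]}
```

Code that wants forward compatibility calls `as_scalar(doc["address"])` wherever it assumes a scalar; code that does not care keeps using the native list directly and pays nothing.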
From: Oren Ben-K. <or...@ri...> - 2001-06-14 16:49:08
Clark C . Evans wrote:
> | At the parser API level, the 'asScalar' method will return null
> | for '@' and the first element for '@='.
>
> I must say... I don't think I like the proposal
> all that much. At the parser level, there
> is nothing wrong with "asScalar" to be called
> on a list or a map and have it return the
> first or default value, as appropriate.

Well, agreed. We could easily do that. I think it would be inconsistent with the rest of the spec, though. There is a difference between "just a list" and "a list which has a default value", from a modeling point of view. This difference is apparent at least in one API (the "native data structures"). For consistency, I think this difference should be apparent in all APIs...

> | '@=' is de-serialized to some data structure wrapping/extending
> | a native list, which simulates a scalar value. Thus for
> | the '@=' case it would be possible, in Perl, to write both:
> |
> |     print $YAML_DOC->{address};      # Prints 'First address'
> |     print $YAML_DOC->{address}->[0]; # Likewise.
>
> Ok. As I see this, with the asScalar() proposal, the
> first line would be...
>
>     print asScalar($YAML_DOC->{address}); # Prints 'First address'
>
> This requires "extra" effort for those applications
> which want to enable the forward-compatible feature,
> but does not burden other applications.

Ah, but the whole point was not to have to do anything special to get this forward compatibility! If you could predict in advance which pieces of your schema will evolve in what direction, you could have simply allowed for it in the first place, or written your own logic to handle the multiple forms.

> I'd rather put the complexity in the applications
> which need to be "forward compatible" rather than
> in the data...

This, IMVHO, would bury the forward compatibility concept. People won't litter their code with 'asScalar' all over the place, which means that the moment you do evolve your schema, the code will break. Remember that this may be something as trivial as adding a comment or a class to a scalar value...

And anyway, the complexity belongs exactly in the data. There is a difference between, say:

address: @=
    First (default) address
    Alternate1 address
    Alternate2 address

And:

installed applications: @
    Microsoft Word
    WinZip
    Perl

In the first case, the first entry is special. In the second case, it isn't. This difference should, I think, be reflected in the data model. Using '@=' allows us to model this difference, API issues aside.

> | For maps we are OK - the extra cost only
> | occurs if one explicitly asks for it (by providing
> | a '=' key or by coloring a scalar value).
>
> I don't get this... how are maps any different
> than lists in this respect?

Simple. For a map we already have a syntax to distinguish between a map such as:

installed applications: %
    Microsoft Word: 2000
    WinZip: 8.0
    Perl: 5

And:

price: %
    =: 15
    valid-until: 01-JAN-2002

In the first case, all entries are equal. In the second case, the '=' entry is special. It provides "the value of the map", just like in the addresses list the first one provided "the value of the list". The difference is that for the map case, all the special syntax we need is the special '=' key. For the list case, we can't do that.

> | I'd rather push on and finalize the spec; given the level of
> | implementer interest we've gathered, if we don't do that,
> | we'll end up with "legacy" YAML subsets we'll have to deal with...
>
> Right. I'd like to hear your opinion on the above.

I hope the above answers it. It is more than an API issue, it is a modeling issue (inspired by the color idiom).

> | Another implementation question: we've added the distinction
> | between a 'binary blob' and 'Unicode text' to the data model - can
> | this be supported in some way when mapping YAML to native data
> | structures of Perl/Python? (it does work for Java).
>
> In python there is not a "byte" data type. There is, however,
> a distinction between unicode and non-unicode strings. I
> believe that perl may be in the same boat.

A distinction between unicode and non-unicode strings would be enough. Brian could clear up whether Perl supports that (at least in the later versions).

I'll try to advance the draft this weekend. Please go over the list of issues I gave at the end of the current one and send me your thoughts, so I will be able to settle as many of them as possible...

Have fun,

Oren Ben-Kiki
From: Clark C . E. <cc...@cl...> - 2001-06-14 17:21:19
On Thu, Jun 14, 2001 at 06:49:47PM +0200, Oren Ben-Kiki wrote:
| For consistency, I think this difference should be apparent
| in all APIs...

Well, for the "pull" based API, if you ask for a node using the "next" method, it will return the next node which contains, among other things, the node type. At this point, the "read" function can be used to read the "scalar value". This read function should be useable regardless of the node type. If it is a scalar, then read works as expected. If the node is a map or list, then read returns the default or first scalar's value as appropriate.

For the "push" interface, if the visitor has a null begin/end function pair, then a scalar value should be pushed to the visitor.

Thus, both push/pull sequential interfaces have this capability built-in without any additional functions.

| > Ok. As I see this, with the asScalar() proposal, the
| > first line would be...
| >
| >     print asScalar($YAML_DOC->{address}); # Prints 'First address'
| >
| > This requires "extra" effort for those applications
| > which want to enable the forward-compatible feature,
| > but does not burden other applications.
|
| Ah, but the whole point was not to have to do anything special to get this
| forward compatibility! If you could predict in advance which pieces of your
| schema will evolve in what direction, you could have simply allowed for it
| in the first place, or write your own logic to handle the multiple forms.

Hmm... I'd write code like this, think of asScalar as a cast operator. If it already is a scalar, then it returns itself, otherwise it fetches the default value (for map) or the first value (for list).

If you want the "forward compatibility" and you want to use the native data structures... then you must use the cast. This isn't so bad, is it?

| > I'd rather put the complexity in the applications
| > which need to be "forward compatible" rather than
| > in the data...
|
| This, IMVHO, would bury the forward compatibility concept. People won't
| litter their code with 'asScalar' all over the place, which means that the
| moment you do evolve your schema, the code will break. Remember that this
| may be something as trivial as adding a comment or a class to a scalar
| value...

Ok. So this "hybrid" class would "inherit" from both scalar and map/list? I'm not sure this is workable:

For Python, a string and a list are both sequences and share many of the same functions. Thus, you really can't have a hybrid class that is both a list and a string at the same time because the shared sequence interface will cause a problem.

For Java, in particular, do the String class and the Map or List class have disjoint member functions? If not, then I don't see how it will work.

I don't know enough about Perl, but if the map/list methods aren't completely disjoint from the scalar methods then this hybrid approach won't work.

Or... am I missing something?

| price: %
|     =: 15
|     valid-until: 01-JAN-2002

Ok. So whenever a map has an = sign, it will use the MapScalar class (a special YAML class) rather than the native Map? Not that I don't like the idea -- it just is that the sequential interface doesn't need this distinction and the implementation with a native interface isn't all that clear, and may not even be implementable.

| > | Another implementation question: we've added the distinction
| > | between a 'binary blob' and 'Unicode text' to the data model - can
| > | this be supported in some way when mapping YAML to native data
| > | structures of Perl/Python? (it does work for Java).
| >
| > In python there is not a "byte" data type. There is, however,
| > a distinction between unicode and non-unicode strings. I
| > believe that perl may be in the same boat.
|
| A distinction between unicode and non-unicode strings would be enough. Brian
| could clear up whether Perl supports that (at least in the later versions).

Not really (unfortunately), since regular Python strings will then be encoded using Base64 if we key off this distinction. Yikes. The entire issue YAR had transfers upwards into the language itself since it does not have a byte/character distinction.

Not to be an irritant, but perhaps we should re-think this a bit.... Perhaps we need to go back to the unicode/non-unicode distinction? Using " " for all unicode values, where simple and block scalars are for 7-bit ASCII?

| I'll try to advance the draft this weekend. Please go over the list of
| issues I gave at the end of the current one and send me your thoughts, so I
| will be able to settle as many of them as possible...

Will do.

Clark
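The Python objection raised in this message, that strings and lists share one sequence interface, can be seen in a few lines. The helper `sequence_view` is a hypothetical name used only for this illustration.

```python
def sequence_view(value):
    """Report what the shared sequence operations say about a value."""
    return {"first": value[0], "length": len(value)}


as_string = sequence_view("First address")
as_list = sequence_view(["First address", "Additional address"])
```

Indexing and `len()` are defined for both types but mean different things (a character versus an entry), so a single hybrid object would have to pick one answer for each operation.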
From: Clark C . E. <cc...@cl...> - 2001-06-14 17:44:23
On Thu, Jun 14, 2001 at 06:49:47PM +0200, Oren Ben-Kiki wrote:
| Please go over the list of issues I gave at the end of the
| current one and send me your thoughts...

- Relationship with MIME

  Other than using Base64, there is no longer any relationship with MIME.

- Character Escapes (within a quoted scalar)

  Ok. Let us take the more complicated route. Once a lookup table is created, it is a trivial amount of code to add the remaining entries.

    \space      Single space ( )
    \\          Backslash (\)
    \"          Double quote (")
    \a          ASCII Bell (BEL)
    \b          ASCII Backspace (BS)
    \f          ASCII Formfeed (FF)
    \n          ASCII Linefeed (LF)
    \r          ASCII Carriage Return (CR)
    \t          ASCII Horizontal Tab (TAB)
    \v          ASCII Vertical Tab (VT)
    \uxxxx      Character with 16-bit hex value xxxx (Unicode only)
    \Uxxxxxxxx  Character with 32-bit hex value xxxxxxxx (Unicode only)
    \xhh        ASCII character with hex value hh

- List Scalar Prefixes

  You seem to really want this one... it's ok if it is optional.

- Anchor Semantics

  Hmm. I have not thought of leading zeros being automagically trimmed before comparison. This sounds fair enough.

  ...

  On a similar line of thought, I was thinking about "external anchors" which can refer to other YAML texts via http, https, and ftp protocols. This need not be implemented right away, but this is more or less what I was thinking...

    *(http://www.clarkevans.com/text.yaml)

  I'd rather not have this be an "arbitrary URI", but have it be a URL that actually can be resolved to a given YAML text. i.e., it should be concrete and expected that the parser can go out and fetch the entity if it so chooses.

  For now... let's leave this out and add it only if it is needed.

- API details

  | How the API handles the sequence of maps; push/pull issues;
  | mapping to scripting languages of interest (Python, Perl); etc.

  Let's work on our first implementations before this gets pinned down further.

- Color Idiom

  | It is possible to allow for schema evolution, attachment
  | of comments handling unknown classes and many other use
  | cases by supporting the color idiom.

  Right. This is the current thread. I think a full section describing this (regardless of how it is implemented) would be very useful.

- Comment Attribute

  | A textual comment may be attached to each node, similar to the class
  | attribute. It is probably best to do this as part of the color idiom,
  | but it could conceivably be added on its own.

  I like using the color idiom. Perhaps "reserving" particular keys for use... hmm.

Best,

Clark
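The lookup-table approach to character escapes mentioned in this message could be sketched as follows. This is an illustration only: the names `SIMPLE_ESCAPES` and `unescape` are hypothetical, `\space` is assumed to mean a backslash followed by a literal space, and rejecting unknown escapes with an error is a choice made here, not something the thread specifies.

```python
import re

# One-character escapes from the table above (assumption: '\space' is backslash + space).
SIMPLE_ESCAPES = {
    " ": " ", "\\": "\\", '"': '"',
    "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v",
}

# \xhh, \uxxxx and \Uxxxxxxxx carry 2, 4 and 8 hex digits respectively.
_ESCAPE = re.compile(
    r"\\(x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)", re.DOTALL
)


def unescape(text):
    """Expand the escapes of a quoted scalar; unknown escapes raise ValueError."""
    def expand(match):
        body = match.group(1)
        if len(body) > 1:  # one of the hex forms matched
            return chr(int(body[1:], 16))
        if body in SIMPLE_ESCAPES:
            return SIMPLE_ESCAPES[body]
        raise ValueError("unknown escape: \\" + body)

    return _ESCAPE.sub(expand, text)
```

Adding a new single-character escape is then one more dictionary entry, which is the "trivial amount of code" point. A lone trailing backslash is left untouched by this sketch.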
From: Jason D. <ja...@in...> - 2001-06-14 18:21:08
Hi.

> And anyway, the complexity belongs exactly in the data. There is a
> difference between, say:
>
> address: @=
>     First (default) address
>     Alternate1 address
>     Alternate2 address
>
> And:
>
> installed applications: @
>     Microsoft Word
>     WinZip
>     Perl
>
> In the first case, the first entry is special. In the second
> case, it isn't.
> This difference should, I think, be reflected in the data model.
> Using '@='
> allows us to model this difference, API issues aside.

FWIW, I think this is a much more consistent approach and would like to extend it to maps as I'll describe below.

> Simple. For a map we already have a syntax to distinguish between
> a map such
> as:
>
> installed applications: %
>     Microsoft Word: 2000
>     WinZip: 8.0
>     Perl: 5
>
> And:
>
> price: %
>     =: 15
>     valid-until: 01-JAN-2002
>
> In the first case, all entries are equal. In the second case, the
> '=' entry
> is special. It provides "the value of the map", just like in the addresses
> list the first one provided "the value of the list". The
> difference is that
> for the map case, all the special syntax we need is the special
> '=' key. For
> the list case, we can't do that.

I would suggest that if the '=' key is special that it not be treated like the non-special keys. I like Oren's @= proposal but would prefer to see maps follow suit with a %= construct.

My reasoning for this is that I don't believe that the default value should be considered just another pair in the map (as Oren said above, it's special). If we're iterating over the pairs in a deserialized YAML map, I would be surprised to see a pair with a key called "=" if my application didn't put it there.

So my first thought was to follow the list's pattern and indicate the default value using a non-pair as the first child like this:

price: %=
    15
    valid-until: 01-JAN-2002

Here the first child of the map-with-default-value isn't a pair but rather just a node (of any type) much like the list-with-default-value.

I don't like this because it requires that we repeat the default value if it was already a member of the map. The same problem also occurs with the current default value mechanism, though. Consider a person map:

person: %
    name: Jason Diamond
    email: ja...@in...

If I wanted to make the value of the "name" pair be the default value, I'd have to repeat it like this:

person: %
    =: Jason Diamond
    name: Jason Diamond
    email: ja...@in...

or like this using my proposal from above:

person: %=
    Jason Diamond
    name: Jason Diamond
    email: ja...@in...

But I'd rather just do this:

person: %=
    name: Jason Diamond
    email: ja...@in...

Now the first child is the default value (just like lists).

I know that maps are unordered (conceptually) but serialization requires that we order them and we can actually take advantage of that. When deserializing, all we need to do is keep track of the key of the first child pair and then get the value for that key whenever the default value is requested. When serializing, if there's a default value, we output that first and then the rest (in alphabetical order minus the default value pair if we need to).

The harmony between this and Oren's new list-with-default-value construct really appeals to me. Not repeating the data and allowing the default value to use an application-specific key is also a huge advantage, in my opinion.

Jason.
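The bookkeeping described in this message (remember the first key when deserializing, emit it first when serializing) might be sketched like this in Python. `DefaultMap`, `default_key` and `serialization_order` are hypothetical names, and the keys in the example ("amount", "currency") are invented for illustration.

```python
class DefaultMap(dict):
    """Hypothetical '%=' map that remembers which key was read first."""

    def __init__(self, pairs):
        pairs = list(pairs)
        super().__init__(pairs)
        # Deserializing: keep track of the key of the first child pair.
        self.default_key = pairs[0][0] if pairs else None

    @property
    def default(self):
        # The default value is simply the value stored under the first key.
        return self[self.default_key]


def serialization_order(mapping):
    """Default key first (if any), then the remaining keys alphabetically."""
    default_key = getattr(mapping, "default_key", None)
    rest = sorted(key for key in mapping if key != default_key)
    return ([default_key] if default_key in mapping else []) + rest


price = DefaultMap([("amount", "15"), ("valid-until", "01-JAN-2002"), ("currency", "USD")])
```

The default value is stored once, under an application-chosen key, so iterating over the pairs never turns up an unexpected "=" entry.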
From: Clark C . E. <cc...@cl...> - 2001-06-14 19:00:41
On Thu, Jun 14, 2001 at 11:21:03AM -0700, Jason Diamond wrote:
| But I'd rather just do this:
|
| person: %=
|     name: Jason Diamond
|     email: ja...@in...
|
| Now the first child is the default value (just like lists).

Ok. I actually like having the first entry in the map be the default value just like a list. Nice. This is much better since it doesn't require the "special" map key, equal(=), which someone may actually use one day as part of real content... Having the first entry be the default, just like lists, is very appealing.

| The harmony between this and Oren's new list-with-default-value
| construct really appeals to me. Not repeating the data and
| allowing the default value to use an application-specific key
| is also a huge advantage, in my opinion.

Ok. I do like the "first entry" for both maps and lists. Sold.

...

However, I still have the problem with the %= vs % and @= vs @ distinction. How can this distinction be supported in the API? I don't think it can very easily without losing transparency (via asScalar) and given that fact, having the two variants isn't all that helpful, is it?

Maybe it is a useful distinction. Maybe it says that it is "ok" to use the first value of the sequence (map or list) if a scalar is expected.

Hmmm.

Best,

Clark
From: Oren Ben-K. <or...@ri...> - 2001-06-15 05:49:56
Clark C . Evans wrote:
> | For consistency, I think this difference should be apparent
> | in all APIs...
>
> Well, for the "pull" based API, ...
> For the "push" interface, ...

I'm not arguing you *can't* hide the difference, I'm questioning whether you *should* hide it, from a purely *modeling* point of view. That is, do we or don't we want to make explicit the fact that for some lists/maps there really is a default value, and for others there simply is no such thing?

BTW, I like Jason's idea of "first key" and '%=' to indicate that difference.

> | Ah, but the whole point was not to have to do
> | anything special to get this
> | forward compatibility! If you could predict in
> | advance which pieces of your
> | schema will evolve in what direction, you could
> | have simply allowed for it
> | in the first place, or write your own logic to
> | handle the multiple forms.
>
> Hmm... I'd write code like this, think of asScalar
> as a cast operator. If it already is a scalar, then
> it returns itself, otherwise it fetches the default
> value (for map) or the first value (for list).

You might :-) Most people won't.

> If you want the "forward compatibility" and you
> want to use the native data structures... then
> you must use the cast. This isn't so bad, is it?

As a last ditch solution, yes. But I'd much rather have it be completely transparent.

> Ok. So this "hybrid" class would "inherit" from
> both scalar and map/list?

Sort of. In Perl at least, I expected that it would simply have some way of detecting whether it is used in a scalar or list context, and act accordingly.

> I'm not sure this is workable:

I agree it is shaky.

> For python, a string and a list are both sequences
> and share many of the same functions. Thus, you really
> can't have a hybrid class that is both a list and a
> string at the same time because the shared sequence
> interface will cause a problem.

And there's no "scalar context" to help you out. Hmmm. Good point.

> For Java, in particular, do the String class and the
> Map or List class have disjoint member functions?

In Java it would be easier. 'asScalar' would be the standard way of getting a value out of a YamlNode, period. In Perl/Python, we really want to be able to access the value directly, however...

> Or... am I missing something?

Probably I am. This needs more thought. I'll brood on it some more...

Have fun,

Oren Ben-Kiki
From: Oren Ben-K. <or...@ri...> - 2001-06-15 08:51:25
I wrote:
> > Ok. So this "hybrid" class would "inherit" from
> > both scalar and map/list?
> ...
> > I'm not sure this is workable:
>
> I agree it is shaky.
> ... I'll brood on it some more...

OK, here's an idea. Instead of having a class which is both a string and a map/list, which is impractical, we'll have a class which is a string with one additional method/member - let's call it '_' (reason below). So:

Original YAML file: %
    address: Default Address
    size: 12.5
Perl code: @
    print $doc->{address}; # Print 'Default address'.
    print $doc->{size}; # Print '12.5'.

Evolved YAML file: %
    address: @=
        Default address
        Additional address
    size: %=
        =: 12.5
        accuracy: 0.5
Perl code: @
    print $doc->{address}; # Print 'Default address'.
    print $doc->{address}->{_}->[0]; # Likewise
    print $doc->{address}->{_}->[1]; # Print 'Additional address'
    print $doc->{size}; # Print '12.5'.
    print $doc->{size}->{_}->{'='}; # Likewise
    print $doc->{size}->{_}->{accuracy}; # Print '0.5'

Incompatible YAML file: %
    address: @
        Default address
        Additional address
    size: %
        =: 12.5
        accuracy: 0.5
Perl code: @
    print $doc->{address}->[0]; # Print 'Default address'
    print $doc->{address}->[1]; # Print 'Additional address'
    print $doc->{size}->{'='}; # Print '12.5'
    print $doc->{size}->{accuracy}; # Print '0.5'

(Isn't it lovely that the above is valid YAML? :-)

At any rate, as a schema owner, when you evolve the schema, you have two choices.

- If you care about existing code and data, you do it via '@=' and '%='. Existing code goes on working, new code is slightly more cumbersome (an additional '->{_}' for Perl, or '._' for Python). Old data need not be modified.

- If you don't care about existing code and data, you do it via '@' and '%'. Existing code is modified to access the proper key/index for the value (but there's no '->{_}' or '._'). Data is modified to convert the scalar into a list or a map.

In either case, if someone edits a file and adds some color you don't know and don't want to know about, your code will go on working. For example, suppose you have the configuration file:

my-machine: 192.168.1.17

And someone goes into the file to change it to:

my-machine: %=
    =: 192.168.1.17
    #: DON'T CHANGE THIS! several clients
       access this address directly - talk
       with the Network Administrator first.

Then you don't have to worry about the code reading this file. It will just go on working.

So, the rule is - for maps/lists with '%='/'@=', there is a "default value" (first key/entry); the native data structure is the same as the default value's data structure (typically a string), with the addition of a '->{_}'/'._' syntax for accessing the "color" - the full map/list.

Note the above rule works in case the default value is a non-scalar (a list or a map):

Old file:
    text: string
    map: %
        key: value
    list: entry

New file:
    text: #"With color comment" string
    map: %=
        main: %
            key: value
        color: some: thing
    list: @=
        @
            entry
        color

    $old_doc->{text} == 'string';
    $old_doc->{map}->{key} == 'value';
    $old_doc->{list}->[0] == 'entry';

    $new_doc->{text} == 'string';
    $new_doc->{text}->{_}->{'='} == 'string';
    $new_doc->{text}->{_}->{'#'} == 'With color comment';
    $new_doc->{map}->{key} == 'value';
    $new_doc->{map}->{_}->{main}->{key} == 'value';
    $new_doc->{map}->{_}->{color}->{some} == 'thing';
    $new_doc->{list}->[0] == 'entry';
    $new_doc->{list}->{_}->[0]->[0] == 'entry';
    $new_doc->{list}->{_}->[1] == 'color';

That's why I chose '_' - we can say it is reserved so it won't collide with any normal map keys, and it can still be used as an identifier, so in Python/JavaScript one could write the above code as:

    old_doc.text == 'string';
    old_doc.map.key == 'value';
    old_doc.list[0] == 'entry';

    new_doc.text == 'string';
    new_doc.text._."=" == 'string';
    new_doc.text._."#" == 'With color comment';
    new_doc.map.key == 'value';
    new_doc.map._.main.key == 'value';
    new_doc.map._.color.some == 'thing';
    new_doc.list[0] == 'entry';
    new_doc.list._[0][0] == 'entry';
    new_doc.list._[1] == 'color';

That's about as clean and powerful as you can get, I think. How about it?

Have fun,

Oren Ben-Kiki
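The "string with one additional member" idea from this message can be approximated in Python with a `str` subclass. This is a sketch under assumptions: `ColoredStr` is a hypothetical name, it handles only the case where the default value is a scalar, and it takes the first entry (list) or first key (map) as the default, relying on Python dictionaries preserving insertion order.

```python
class ColoredStr(str):
    """A string that also carries the full list/map it was taken from, as '_'."""

    def __new__(cls, full):
        if isinstance(full, dict):
            default = next(iter(full.values()))  # first key's value
        else:
            default = full[0]                    # first list entry
        obj = super().__new__(cls, default)
        obj._ = full  # the "color": the complete map or list
        return obj


# Rough equivalent of the "Evolved YAML file" example above.
doc = {
    "address": ColoredStr(["Default address", "Additional address"]),
    "size": ColoredStr({"=": "12.5", "accuracy": "0.5"}),
}
```

Old code that compares or prints `doc["address"]` keeps working, while new code reaches the extra entries through `doc["address"]._[1]` or `doc["size"]._["accuracy"]`.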
From: Clark C . E. <cc...@cl...> - 2001-06-15 16:34:54
Oren's brain trust at work... *smile*

| Original YAML file: %
|     address: Default Address
|     size: 12.5
| Perl code: @
|     print $doc->{address}; # Print 'Default address'.
|     print $doc->{size}; # Print '12.5'.

Ok. Here is the same code using asScalar(): @
    print asScalar($doc->{address});
    print asScalar($doc->{size});

| Evolved YAML file: %
|     address: @=
|         Default address
|         Additional address
|     size: %=
|         =: 12.5
|         accuracy: 0.5
| Perl code: @
|     print $doc->{address}; # Print 'Default address'.
|     print $doc->{address}->{_}->[0]; # Likewise
|     print $doc->{address}->{_}->[1]; # Print 'Additional address'
|     print $doc->{size}; # Print '12.5'.
|     print $doc->{size}->{_}->{'='}; # Likewise
|     print $doc->{size}->{_}->{accuracy}; # Print '0.5'

Ok. Although I must say... this could get ugly especially as the depth increases. It seems that this proposal moves the notion of "forward-compatible" into a more elaborate "backward-compatible". Thus the burden is placed on newer code... hmm.

| Incompatible YAML file: %
|     address: @
|         Default address
|         Additional address
|     size: %
|         =: 12.5
|         accuracy: 0.5
| Perl code: @
|     print $doc->{address}->[0]; # Print 'Default address'
|     print $doc->{address}->[1]; # Print 'Additional address'
|     print $doc->{size}->{'='}; # Print '12.5'
|     print $doc->{size}->{accuracy}; # Print '0.5'

Here are both of the above using asScalar(): @
    print asScalar($doc->{address});
    print asScalar($doc->{size});
    print asScalar($doc->{address}->[0]); # Print 'Default address'
    print asScalar($doc->{address}->[1]); # Print 'Additional address'
    print asScalar($doc->{size}->{'='}); # Print '12.5'
    print asScalar($doc->{size}->{accuracy}); # Print '0.5'

| (Isn't it lovely that the above is valid YAML? :-)

Oh yea... it is. Fancy that. I didn't even notice!

...

| At any rate, as a schema owner, when you evolve the
| schema, you have two choices.
|
| - If you care about existing code and data, you do it via '@=' and '%='.
| Existing code goes on working, new code is slightly more cumbersome (an
| additional '->{_}' for Perl, or '._' for Python). Old data need not be
| modified.
|
| - If you don't care about existing code and data, you do it via '@' and '%'.
| Existing code is modified to access the proper key/index for the value (but
| there's no '->{_}' or '._'). Data is modified to convert the scalar into a
| list or a map.

Ok. Two approaches:

A. asScalar()

   + External function, don't get complexity
     unless ask for it.
   + No changes to YAML needed.
   + Technique works with non-yaml maps/scalars,
     (ones that have not been saved yet).
   + Explicit cast...

   - Requires foresight for code to have the
     substitutability property.
   - Extra asScalar() function littered
     throughout code.
   - Requires a specific "key" value, =, to
     be reserved.

B. %= @= approach

   + Does not require foresight, thus substitutability
     is guaranteed.
   + Does not require a specific key to be reserved.

   - Changes to YAML needed
   - Special object needed to support construct

I like the former, but I guess you all like the latter?

| In either case, if someone edits a file and adds some color you don't know
| and don't want to know about, your code will go on working. For example,
| suppose you have the configuration file:
|
| my-machine: 192.168.1.17
|
| And someone goes into the file to change it to:
|
| my-machine: %=
|     =: 192.168.1.17
|     #: DON'T CHANGE THIS! several clients
|        access this address directly - talk
|        with the Network Administrator first.
|
| Then you don't have to worry about the code reading this file. It will just
| go on working.
|
| So, the rule is - for maps/lists with '%='/'@=', there is a "default value"
| (first key/entry); the native data structure is the same as the default
| value's data structure (typically a string), with the addition of a
| '->{_}'/'._' syntax for accessing the "color" - the full map/list.

Yep. Hmm.

Best,

Clark
From: Clark C . E. <cc...@cl...> - 2001-06-15 16:49:15
|
Ok. I still don't like option B because it introduces two
new types of nodes and special support classes required for
each binding. A isn't all that great because it requires
"asScalar()" calls everywhere, just in case.

Let us add another option, C, to the list. This introduces a
YamlNode which is similar to a DOM node. The user would
then have the choice of binding: native (where
substitutability doesn't hold) or our object model.

In option C, a YamlNode is introduced... this has
characteristics of a map, list, and scalar. For this
case, the %= and @= forms are not needed. At parse time, the
user would request whether they want YamlNodes or native
constructs.

...

In any case, I'm going to start ploughing ahead with the C
implementation... otherwise, I feel we won't go anywhere.

P.S. I was wondering why the "stab" production got
added and what it purports to solve.

Best,

Clark

On Fri, Jun 15, 2001 at 12:35:36PM -0400, Clark C . Evans wrote:
| Ok. Two approaches:
|
| A. asScalar()
|
|    + External function; you don't get the complexity
|      unless you ask for it.
|    + No changes to YAML needed.
|    + Technique works with non-YAML maps/scalars
|      (ones that have not been saved yet).
|    + Explicit cast...
|
|    - Requires foresight for code to have the
|      substitutability property.
|    - Extra asScalar() function calls littered
|      throughout the code.
|    - Requires a specific "key" value, =, to
|      be reserved.
|
| B. %= / @= approach
|
|    + Does not require foresight, thus substitutability
|      is guaranteed.
|    + Does not require a specific key to be reserved.
|
|    - Changes to YAML needed.
|    - Special object needed to support the construct.
|
|
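A rough sketch of what option C could look like, in Python. The class layout, the bind() helper and its native flag are all hypothetical; the message only says that a YamlNode has map, list and scalar characteristics and that the caller picks the binding at parse time.

```python
class YamlNode:
    """Hypothetical wrapper with map, list and scalar behaviour at once."""

    def __init__(self, value):
        self.value = value

    def __getitem__(self, key):
        # Map keys and list indexes both work; children stay wrapped.
        return YamlNode(self.value[key])

    def as_scalar(self):
        value = self.value
        if isinstance(value, dict):
            value = value.get("=")               # assumed default key
        elif isinstance(value, list):
            value = value[0] if value else None  # assumed default entry
        return value

    def __str__(self):
        return str(self.as_scalar())


def bind(data, native=False):
    """Pick the binding at load time: plain structures or YamlNodes."""
    return data if native else YamlNode(data)


doc = bind({
    "address": ["Default address", "Additional address"],
    "size": {"=": "12.5", "accuracy": "0.5"},
})
print(doc["address"])           # scalar view of the list
print(doc["address"][1])        # list view
print(doc["size"]["accuracy"])  # map view
```

With native=True the caller gets the plain structures back and substitutability no longer holds, which is the trade-off described above.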
From: Oren Ben-K. <or...@ri...> - 2001-11-05 07:44:30
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> Thank you. This is nice. Updated both SF and YAML.ORG

Thanks. BTW, the 'latest.html' file needs to be replaced by a copy of
the new draft... It now contains the 12 October version (ancient
history :-)

> A few nitpicks...

I marked them all on my hard-copy, together with some other nit-picks I
found (such as a minor production bug). I'll put them in the next draft.

> 3. Throwaway comments... remove "sole purpose is to
> communicate among human maintainers". As I'm sure
> it will be (ab)used for many purposes, including
> the unix #!/my/func trick.

How about I replace "sole purpose" with "usual purpose"? I want to
really stress the fact that YAML-wise these comments don't carry any
data to the application.

> General comments...
>
> A. I like that the "indicators" follow the "properties",
> it gives consistency between the quoted scalar
> and the block scalar -- the indicator ("|) follow the
> descriptor (&!)

So do I, but Brian still feels uneasy about it. Brian, can you live
with it?

> B. I'd still like to de-couple the "implicit syntax"
> with the type system. In particular, I'd like to
> be able to specify...
>
>   picture: !gif [base64 encoded image]
>   empty-map: !map ~
>   float: !float 0
>
> To enable this, we must de-couple the type system
> from the implicit syntax mechanism. Changes:

No. These are perfectly valid today. Well, except for the map case, and
that's because I made the decision that:

    empty map: !map

looks better than:

    !map ~

The latter is confusing; it looks as if you are trying to create a
map-typed null, which is something that doesn't exist in any computer
language I'm aware of. An empty map, however, is a normal construct. At
any rate it is a matter of taste to some degree.

Back to the real issue, let me re-cast the current state of affairs in
a semi-formal way:

- There are three scalar styles.

- The YAML parser knows how to extract Unicode text from each,
  regardless of type.
- The type property (implicit or explicit) specifies which
  "de-serializer" to apply to the extracted text.

- The "de-serializer" is a function taking a text string and returning
  an in-memory data object of an arbitrary in-memory data type. For all
  I care, a single de-serializer could return objects of different
  classes, if that makes sense.

- The "de-serializer" is free to interpret the text string as it
  pleases. In particular, it can auto-detect several formats (dates are
  a good example). It can perform base64 expansion (the "gif" type is a
  good example). And so on.

- Implicit typing means choosing a de-serializer according to the value
  matching the de-serializer's regexp pattern. Explicit typing means
  choosing a de-serializer according to the de-serializer's name.
  Otherwise there's absolutely no difference between them.

Consequences:

- A de-serializer can't change the way text is extracted from the file.
  That is, one can't create an implicit type behaving like a block.
- Therefore it is possible to safely line-wrap, pretty-print etc. any
  unquoted scalar, without worrying about types.
- Also, if a certain type looks best using a block style (e.g., a code
  type), then it must be an explicit type:

    code: !!Perl-script |
        #!/bin/perl
        print "Hello, world!\n"

- One can't do it with an implicit type with the pattern
  '#!/bin/perl.*':

    code: \
        #!/bin/perl
        print "Hello, world!\n"

So, one down side (no block-like implicit types) and one up-side (one
trivially knows when it is safe to line-wrap values, regardless of
types). Not a bad compromise overall, I think.

Other up-sides: we can define the unquoted type to be slightly magical
so it is safe in keys (disallowing ':' there) while it is still general
(allowing ':' in values), etc.

Now, your alternative (if I understand it correctly):

- The parser does not know how to extract text from scalars. A syntax
  module does it for it.

- Every syntax module has an independent line-folding, escaping etc.
  policy.
- Every syntax module has a default de-serializer (e.g. a base64 syntax
  module has a default byte-array de-serializer).

- A de-serializer works much as above (turns a text string into an
  in-memory data type).

- A de-serializer still needs to handle multiple formats (e.g., dates).

- A syntax module is chosen only according to the regexp pattern the
  scalar matches.

- Implicit typing means using the syntax module's default de-serializer.

- Explicit typing means using the named de-serializer.

Correct?

Consequences: The opposite of the above. Since one can create syntax
modules which behave any way they please, one must be very careful
applying line-wrapping etc. to scalar values. Generic YAML editors,
pretty-printers etc. will have to be syntax-module-aware to work safely.

On the other hand, this would allow us to define stronger implicit
types:

    html: <html><body><pre>
    Note that YAML will preserve
        line breaks and spaces here
    </pre></body></html>
    but: Not
        here!

A different tradeoff, certainly.

Other down-side: Defining unquoted scalars in a way that makes them safe
in keys becomes a challenge. It is not enough to specify for each syntax
module whether it is safe. Take the following one:

    regexp: <alpha> .*
    policy: line-folded, no escaping.
    safe-in-keys: ???

It seems each syntax module may be invoked in two "modes", one for map
keys and one for values. It is the module's responsibility to detect the
end of the key in the first case, the parser's job to inform it of the
end of the value in the second.

Other consequence: This would make it possible to create whole new
syntax forms and embed them inside a YAML file. Take tables for example:

    table: =table    ('=table.*' is the regexp pattern for tables)
        A1 | A2      (line-break ends row)
        B1 | B2
        ...

I'm not certain we want to encourage that.
Such complex types are best represented as YAML structures (to allow
accessing sub-parts using YAML-path etc.):

    table:
      -
        - A1
        - A2
      -
        - B1
        - B2

Otherwise you'd quickly get thin YAML wrappers for very complex
application-specific types, and lose most of the advantages YAML gives
you in terms of interoperability, generic addressing of data items, etc.

To summarize, I see some merit in the concept, but I think the tradeoff
is against it. I consider type-independent line folding a big advantage
of YAML and I wouldn't want to give it up unless there's a significant
gain, and I don't see that gain yet.

Can you give some examples of how the extra power provided by this
approach may be put to good use?

Additional issues:

- I'd like to introduce for implicit types the concept of a
  "recognition pattern" which is separate from the full pattern. For
  example, for binary, the recognition pattern is '^\[=' while the full
  pattern is '\[=(base64 ending with =)\]'. This way the parser won't
  need unbounded lookahead to handle "streaming" implicit types like
  "binary" (or "vector", or "matrix", or whatever).

Have fun,

    Oren Ben-Kiki
|
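The "current state of affairs" in this message can be sketched as a small registry, in Python. The registry contents, the patterns and the function name are assumptions for illustration only; the point taken from the message is that implicit and explicit typing differ only in how the de-serializer is chosen, and that the text has already been extracted before either is consulted.

```python
import re
from datetime import date

# Hypothetical registry: type name -> (regexp pattern, de-serializer).
# Order matters for implicit typing; the catch-all "str" comes last.
DESERIALIZERS = {
    "int": (re.compile(r"[-+]?[0-9]+"), int),
    "float": (re.compile(r"[-+]?[0-9]+\.[0-9]+"), float),
    "date": (re.compile(r"\d{4}-\d{2}-\d{2}"), date.fromisoformat),
    "str": (re.compile(r".*", re.S), str),
}


def deserialize(text, explicit_type=None):
    """Turn already-extracted scalar text into an in-memory object."""
    if explicit_type is not None:
        # Explicit typing: choose the de-serializer by name.
        return DESERIALIZERS[explicit_type][1](text)
    # Implicit typing: choose the first de-serializer whose pattern matches.
    for pattern, func in DESERIALIZERS.values():
        if pattern.fullmatch(text):
            return func(text)
    raise ValueError("no de-serializer matches %r" % text)


print(deserialize("12"))          # implicit int
print(deserialize("2001-11-05"))  # implicit date
print(deserialize("0", "float"))  # explicit, as in 'float: !float 0'
```

Because deserialize() only ever sees the extracted text, line-wrapping or pretty-printing the scalar in the file cannot change which branch is taken.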
From: Clark C . E. <cc...@cl...> - 2001-11-05 16:41:44
|
On Mon, Nov 05, 2001 at 09:45:23AM +0200, Oren Ben-Kiki wrote:
| Thanks. BTW, the 'latest.html' file needs to be replaced
| by a copy of the new draft...

Done.

| How about I replace "sole purpose" with "usual purpose"?

Great.

| > A. I like that the "indicators" follow the "properties",
| > it gives consistency between the quoted scalar
| > and the block scalar -- the indicator ("|) follow the
| > descriptor (&!)
|
| So do I, but Brian still feels uneasy about it.
| Brian, can you live with it?

Brian?

| > B. I'd still like to de-couple the "implicit syntax"
| > with the type system. In particular, I'd like to
| > be able to specify...
| >
| >   picture: !gif [base64 encoded image]
|
| These are perfectly valid today.

Really? This only works if the !gif class
was somehow associated with the [base64
decoder mechanism. Let's focus on this
use case -- how do we represent a base64
encoded object with an explicit class?

...

Nice summary below, Oren. Let me chew on this one for
a few days and get my thoughts together. I have the
idea of splitting the "style" or "encoding" of a scalar
from its "type". Note that we can 'fix' the number
of "styles" available within the specification. I think
the bulk of my concern is more or less over the
productions and normalization.

| Back to the real issue, let me re-cast the current state
| of affairs in a semi-formal way:
|
| - There are three scalar styles.
|
| - The YAML parser knows how to extract Unicode text from
|   each, regardless of type.
|
| - The type property (implicit or explicit) specifies which
|   "de-serializer" to apply to the extracted text.
|
| - The "de-serializer" is a function taking a text string
|   and returning an in-memory data object of an arbitrary
|   in-memory data type. For all I care, a single de-serializer
|   could return objects of different classes, if that makes sense.
|
| - The "de-serializer" is free to interpret the text
|   string as it pleases. In particular, it can auto-detect
|   several formats (dates are a good example). It can perform
|   base64 expansion (the "gif" type is a good example). And so on.
|
| - Implicit typing means choosing a de-serializer according
|   to the value matching the de-serializer's regexp pattern.
|   Explicit typing means choosing a de-serializer according
|   to the de-serializer's name. Otherwise there's
|   absolutely no difference between them.

I think the last point is the key.

| Consequences:
|
| - A de-serializer can't change the way text is extracted
|   from the file. That is, one can't create an implicit
|   type behaving like a block.
| - Therefore it is possible to safely line-wrap, pretty-print
|   etc. any unquoted scalar, without worrying about types.
| - Also, if a certain type looks best using a block style
|   (e.g., a code type), then it must be an explicit type:
|
|     code: !!Perl-script |
|         #!/bin/perl
|         print "Hello, world!\n"
|
| - One can't do it with an implicit type with the pattern '#!/bin/perl.*':
|
|     code: \
|         #!/bin/perl
|         print "Hello, world!\n"
|
| So, one down side (no block-like implicit types) and one up-side (one
| trivially knows when it is safe to line-wrap values, regardless of types).
| Not a bad compromise overall, I think.
|
| Other up-sides: we can define the unquoted type to be slightly magical so it
| is safe in keys (disallowing ':' there) while it is still general (allowing
| ':' in values), etc.

Nice summary.

| Now, your alternative (if I understand it correctly):
|
| - The parser does not know how to extract text from
|   scalars. A syntax module does it for it.
|
| - Every syntax module has an independent line-folding,
|   escaping etc. policy.
|
| - Every syntax module has a default de-serializer (e.g. a
|   base64 syntax module has a default byte-array de-serializer).
|
| - A de-serializer works much as above (turns a text string
|   into an in-memory data type).
|
| - A de-serializer still needs to handle multiple formats (
|   e.g., dates).
|
| - A syntax module is chosen only according to the regexp
|   pattern the scalar matches.
|
| - Implicit typing means using the syntax module's default
|   de-serializer.
|
| - Explicit typing means using the named de-serializer.

This is close (my ideas aren't solid though). I'm trying
to separate the syntax module ("transfer encoding") from
the type ("serializer/de-serializer pair").

| Correct?
|
| Consequences: The opposite of the above. Since one can create
| syntax modules which behave any way they please, one must
| be very careful applying line-wrapping etc. to scalar values.
| Generic YAML editors, pretty-printers etc. will have to be
| syntax-module-aware to work safely.
|
| On the other hand, this would allow us to define stronger
| implicit types:
|
|     html: <html><body><pre>
|     Note that YAML will preserve
|         line breaks and spaces here
|     </pre></body></html>
|     but: Not
|         here!

Right.

| A different tradeoff, certainly.
|
| Other down-side: Defining unquoted scalars in a way that makes them safe in
| keys becomes a challenge. It is not enough to specify for each syntax module
| whether it is safe. Take the following one:
|
|     regexp: <alpha> .*
|     policy: line-folded, no escaping.
|     safe-in-keys: ???
|
| It seems each syntax module may be invoked in two "modes", one for map keys
| and one for values. It is the module's responsibility to detect the end of
| the key in the first case, the parser's job to inform it of the end of the
| value in the second.
|
| Other consequence: This would make it possible to create whole new syntax
| forms and embed them inside a YAML file. Take tables for example:
|
|     table: =table    ('=table.*' is the regexp pattern for tables)
|         A1 | A2      (line-break ends row)
|         B1 | B2
|         ...
|
| I'm not certain we want to encourage that.
| Such complex types are best
| represented as YAML structures (to allow accessing sub-parts using YAML-path
| etc.):
|
|     table:
|       -
|         - A1
|         - A2
|       -
|         - B1
|         - B2
|
| Otherwise you'd quickly get thin YAML wrappers for very complex
| application-specific types, and lose most of the advantages YAML gives you
| in terms of interoperability, generic addressing of data items, etc.
|
| To summarize, I see some merit in the concept, but I think the tradeoff is
| against it. I consider type-independent line folding a big advantage of YAML
| and I wouldn't want to give it up unless there's a significant gain, and I
| don't see that gain yet.
|
| Can you give some examples of how the extra power provided by this approach
| may be put to good use?

Hmm.

| Additional issues:
|
| - I'd like to introduce for implicit types the concept of a "recognition
|   pattern" which is separate from the full pattern. For example, for binary,
|   the recognition pattern is '^\[=' while the full pattern is '\[=(base64
|   ending with =)\]'. This way the parser won't need unbounded lookahead to
|   handle "streaming" implicit types like "binary" (or "vector", or "matrix",
|   or whatever).

I think this was my ^ proposal. Only I was doing
it as another indicator (which I think is a clean way
to do it).

Clark
|
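The recognition-pattern idea quoted above can be illustrated with a short Python sketch. The recognition regexp is the one given in the thread; the full pattern is only a guess at what "base64 ending with =" might mean, and read_binary() with its chunked input is hypothetical. It also assumes the first chunk holds at least the two characters the recognition pattern needs.

```python
import re

RECOGNITION = re.compile(r"\[=")                   # from the thread: '^\[='
FULL = re.compile(r"\[=[A-Za-z0-9+/\s]*={0,2}\]")  # assumed full pattern


def recognise(prefix):
    """Decide the implicit type from the start of the value only."""
    return RECOGNITION.match(prefix) is not None


def validate(text):
    """Check the complete value once all of it has been read."""
    return FULL.fullmatch(text) is not None


def read_binary(chunks):
    """Recognise on the first chunk, validate when the value is complete."""
    chunks = iter(chunks)
    first = next(chunks)
    if not recognise(first):
        return None  # not a binary value; some other type applies
    text = first + "".join(chunks)
    if not validate(text):
        raise ValueError("recognised as binary but malformed")
    return text


print(read_binary(["[=R0lG", "ODlh", "=]"]))
```

Only the first two characters are needed to commit to the type, so the parser never has to look ahead to the closing bracket before deciding.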
From: Oren Ben-K. <or...@ri...> - 2001-11-06 09:03:04
|
Clark C . Evans [mailto:cc...@cl...] wrote: yam...@li...
> | >
> | > picture: !gif [base64 encoded image]
> |
> | These are perfectly valid today.
>
> Really? This only works if the !gif class
> was somehow associated with the [base64
> decoder mechanism. Let's focus on this
> use case -- how do we represent a base64
> encoded object with an explicit class?

It isn't the gif class, it is the gif de-serializer, which is welcome
to invoke the base64 routines internally.

!<thing> doesn't specify the *value class*, it specifies the
*de-serializer* class; of course there's usually a 1-1 mapping between
these two... It is worth adding a paragraph somewhere which clarifies
this distinction - the gif base64 image is a good example case.

> ...
>
> Nice summary below, Oren. Let me chew on this one for
> a few days and get my thoughts together. I have the
> idea of splitting the "style" or "encoding" of a scalar
> from its "type". Note that we can 'fix' the number
> of "styles" available within the specification. I think
> the bulk of my concern is more or less over the
> productions and normalization.

Some more thoughts, and to get the terminology straight:

- There's the "class" of the final in-memory value;
- There's the "type" string following the '!';
- There's the "de-serializer" mapping between them;
- There's the node "kind" (map/list/scalar);
- There's the scalar "style" (unquoted, quoted, block);
- There's the value "format" (the way it is represented as Unicode
  text).

The current minimal process is:

    (data) scalar (kind)
      -> (process) parser (according to style)
      -> (data) Unicode text
      -> (process) de-serializer (chosen implicitly or by explicit
         type; auto-detects the format)
      -> (data) object of the right class.

The '!type1!type2' proposal allows one to chain de-serializers.

The '^<thing>' proposal means explicitly specifying the format so the
de-serializer doesn't have to auto-detect it.
Both can achieve the same thing:

    !gif!base64 (base64 is a de-serializer taking Unicode text and
    giving byte array; gif is a de-serializer taking byte array giving
    gif image).

    !gif ^base64 (gif is a de-serializer taking Unicode text giving gif
    image; base64 is a possible format this de-serializer can handle).

The difference is subtle for the above case; the !<type1>!<type2>
proposal allows for arbitrary multi-stage processing (e.g.,
!ocr!gif!base64, as a really wasteful way to transfer text), while the
'^<format>' is restricted to only two levels. So *if* we decide this is
important, I'd rather go for the pipeline proposal.

> | Back to the real issue, let me re-cast the current state
> | of affairs in a semi-formal way:
> ... I'm trying
> to separate syntax module ("transfer encoding") from
> the type ("serializer/de-serializer pair")

I think that the best way to do that is using pipelined
'!<type1>!<type2>' types. The way it works is as follows:

    !gif!base64 [base64 encoded image, gif de-serializer is given the
    binary data from the base64 one]

    !gif! (implicitly typed value, resulting type must be acceptable to
    the gif de-serializer - e.g., an implicit base64 value will result
    in the same behavior as above)

That gives you everything you need. However, I don't think we should
add it in YAML 1.0 (no proven need for this). Instead let's just allow
for it in the future. The current wording reserves all names not
starting with a DNS entry; let's make it stronger and reserve
everything not looking like a DNS entry
"Regexp: ( (alnum or '-')+ '.' )+ ( alnum or '-' )+". This means all
names containing '!' or other magical characters would be reserved.

This way we could add pipelined types later on if they are useful. Or
any other scheme we feel is necessary. Note this would be backward
compatible. A YAML 1.0 application would just have to be given the
enumeration of all possible combinations instead of creating a pipeline
on the fly like a YAML 1.1 application would.
> | Can you give some examples of how the extra power provided
> | by this approach may be put to good use?
>
> Hmm.

I think this is the core of the issue. That's why I want to leave the
door open for it, but not add it into the current spec. Reserving all
type names with "magical characters" seems to do this neatly.

> | Additional issues:
> |
> | - I'd like to introduce for implicit types the concept of a
> | "recognition
> | pattern"...
>
> I think this was my ^ proposal.

Yes; ^format removes the need for implicit detection. But it is useful
to be able to write

    int: 12

So a "recognition pattern" is still needed. The question is whether
recognition == validation; I suggest it doesn't have to be.

Have fun,

    Oren Ben-Kiki
|
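A small Python sketch of the pipelined '!<type1>!<type2>' idea from this message. The stage table, the function names and the stand-in "image" dictionary are assumptions; the one thing taken from the thread is the order, where the rightmost stage sees the Unicode text first and each stage feeds the one to its left. The '!gif!' form with an implicit last stage is not covered.

```python
import base64


def from_base64(text):
    """Unicode text -> byte array."""
    return base64.b64decode(text)


def from_gif(data):
    """Byte array -> stand-in for a gif image object (illustration only)."""
    if not data.startswith((b"GIF87a", b"GIF89a")):
        raise ValueError("not gif data")
    return {"format": "gif", "size": len(data)}


# Hypothetical table of de-serializer stages, keyed by type name.
STAGES = {"base64": from_base64, "gif": from_gif}


def deserialize(type_string, text):
    """Run the stages of '!a!b!c' right to left over the extracted text."""
    names = type_string.lstrip("!").split("!")
    value = text
    for name in reversed(names):
        value = STAGES[name](value)
    return value


# A tiny byte string with a gif signature, encoded as it would sit in a file.
SAMPLE = base64.b64encode(b"GIF89a\x01\x00\x01\x00").decode("ascii")
print(deserialize("!gif!base64", SAMPLE))
```

An application without pipelines could get the same result by registering one combined de-serializer under the name 'gif!base64', which is the enumeration approach mentioned for YAML 1.0.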
From: Clark C . E. <cc...@cl...> - 2001-11-06 14:33:07
|
Oren,

Thanks for working through this with me. I like the
pipeline proposal, and agree we can put it off to
YAML 1.1 if we find use cases which support the
added complexity.

Clark

...

| > | > picture: !gif [base64 encoded image]
| > |
| > | These are perfectly valid today.
| >
| > Really? This only works if the !gif class
| > was somehow associated with the [base64
| > decoder mechanism. Let's focus on this
| > use case -- how do we represent a base64
| > encoded object with an explicit class?
|
| It isn't the gif class, it is the gif de-serializer,
| which is welcome to invoke the base64 routines internally.
|
| !<thing> doesn't specify the *value class*, it specifies
| the *de-serializer* class; of course there's usually
| a 1-1 mapping between these two... It is worth adding a
| paragraph somewhere which clarifies this distinction -
| the gif base64 image is a good example case.

My proposal was trying to draw exactly this distinction,
using ^ for the "serializer" class and ! for the
value class. Perhaps this distinction isn't good at all...
since a serializer will create an object of a given
value class.

| !gif!base64 (base64 is a de-serializer taking Unicode
| text and giving byte array; gif is a de-serializer taking
| byte array giving gif image).
|
| !gif ^base64 (gif is a de-serializer taking Unicode text
| giving gif image; base64 is a possible format this
| de-serializer can handle).
|
| The difference is subtle for the above case; the
| !<type1>!<type2> proposal allows for arbitrary multi-stage
| processing (e.g., !ocr!gif!base64, as a really wasteful
| way to transfer text), while the '^<format>' is restricted
| to only two levels. So *if* we decide this is important,
| I'd rather go for the pipeline proposal.

Right.

| !gif! (implicitly typed value, resulting type must be
| acceptable to the gif de-serializer - e.g., an implicit
| base64 value will result in the same behavior as above)

Nice.

| That gives you everything you need.
| However, I don't think
| we should add it in YAML 1.0 (no proven need for this). Instead
| let's just allow for it in the future. The current wording
| reserves all names not starting with a DNS entry; let's make
| it stronger and reserve everything not looking like a DNS
| entry "Regexp: ( (alnum or '-')+ '.' )+ ( alnum or '-' )+".
| This means all names containing '!' or other magical characters
| would be reserved.

Ok. Names in the private area (!!private) should also
be barred from containing a '!', to allow for chaining.

| This way we could add pipelined types later on if they are
| useful. Or any other scheme we feel is necessary. Note this
| would be backward compatible. A YAML 1.0 application would
| just have to be given the enumeration of all possible
| combinations instead of creating a pipeline on the fly like
| a YAML 1.1 application would.

Sold.

Clark
|
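The reservation rule agreed above can be checked mechanically; a Python sketch follows. It assumes that "alnum" means ASCII letters and digits and that the whole type name must match the DNS-like regexp, neither of which is spelled out in the thread. The function names and sample names are hypothetical.

```python
import re

# ( (alnum or '-')+ '.' )+ ( alnum or '-' )+  written as a Python regexp.
DNS_LIKE = re.compile(r"(?:[A-Za-z0-9-]+\.)+[A-Za-z0-9-]+")


def is_reserved(type_name):
    """A name is reserved unless it looks like a DNS entry."""
    return DNS_LIKE.fullmatch(type_name) is None


def is_valid_private(private_name):
    """Private names must not contain '!', leaving room for chaining."""
    return "!" not in private_name


for name in ("example.com", "gif", "gif!base64"):
    print(name, "reserved" if is_reserved(name) else "free for users")
```

Any name containing '!' fails the DNS-like match, so pipelined names stay available for a later version of the spec.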