From: Oren Ben-K. <or...@ri...> - 2001-06-17 17:22:27
First, there is a new draft at http://yaml.org/16jun2001.html - it isn't a big change over the previous one. I cleaned up the indentation issue, enhanced the character escapes, cleared up the anchor string semantics, and opened the door to 32-bit characters (this needs more work). As usual, please check me.

I didn't touch the binary issue, or the class/comment/color issue. Here are some thoughts about these:

Fact: We must support scripting languages' native data structures in a transparent manner. This is a crucial factor in YAML's usefulness and hence its chances of "making it".

Fact: We must support the color idiom, as it solves so many problems which would otherwise mess up our simple language: comments, classes, schema evolution, push/pull annotation (trust me on this), legacy syntax interoperability (that was the original use case for it), and yes, even binary vs. unicode (see below).

Fact: The lowest level API is insensitive to the color idiom. It provides a stream of events, period; a 'value' method is outside its scope.

Fact: In completely static languages (C, C++), we must use our own YamlNode data type, and hence the color idiom is supported immediately without any special syntax.

Fact: Clark is right, it isn't possible to support the color idiom directly in scripting languages. Certainly JavaScript isn't up to it; Python and Perl are similarly limited. Even the weaker schema I proposed (using '_' for color members) fails when one considers writing into the data structure.

Result: We must support the color idiom in scripting languages via external methods, as Clark proposed. This requires two operations (read and write - see below).

Fact: Java allows both types of APIs. It also has potential for defining an intermediate API. Such an API could use something like YamlHash and YamlVector, each deriving from its counterpart system class. It could also interact intelligently with the Java (de)serialization mechanism.
Result: The Java API will be interesting :-)

That seems to settle things... Build the model around the color idiom. In each language, support it to the best of its ability. The rest is "mere details" :-)

Using color in interesting ways:

Classes/comments: A node with a class/comment is a map with '!'/'#' being a key for the class/comment string. If the original node wasn't a map to begin with, it is wrapped in a map node under the '=' key.

Round trip: A parser could attach formatting information to nodes which would allow reproducing the output in a byte-to-byte identical format. For example, the key 'org.yaml.format.indent' could contain the exact white space used to indent the content of a node. The key 'org.yaml.format.scalar.style' could contain the scalar variant used, etc. A printer could use this color data as a guide to formatting the output YAML file.

Binary vs. Unicode: Suppose we have a way to distinguish between the two, which works most of the time (along the lines of what I proposed). A programmer who fears that some binary scalar instance would be incorrectly taken to be text (or vice-versa) would color the scalar with a directive for the printer, overriding the automatic decision (e.g., using the aforementioned 'org.yaml.format.scalar.style' color).

Color makes many problems just disappear (given color-sensitive code). This means we have to make using it as painless as possible. The two operations I had in mind are:

v(<map/list>, <key/index>, [ <color> ])

    Read the value of an entry in a container. If the container is a map, the entry is identified by a key. If the container is a list, the entry is identified by its index. An optional color argument allows accessing any color associated with the entry. For example, in Perl:

        invoice: %
            price: #"Cheap!" 12.5$

        v($doc->{invoice}, 'price') == '12.5';
        v($doc->{invoice}, 'price', '#') == 'Cheap!';
        v($doc->{invoice}, 'price', '!') == undef;

w(<map/list>, <key/index>, [ <color> ], <value>)

    Write the value of an entry in the container. Entry identification is as above. This 'does the right thing' - if the entry is actually a map, writing the value will actually set the value of its '=' entry. The optional color argument allows attaching color to an entry. Again, this 'does the right thing' - attaching a color to an entry may convert it from a scalar to a map, moving its original value to the '=' key, and adding the color as an additional key/value pair. For example, in Perl:

    Start with:

        invoice: %
            price: 12.5

    Apply w($doc->{invoice}, 'price', 13.5), you get:

        invoice: %
            price: 13.5

    Apply w($doc->{invoice}, 'price', '#', 'Expensive!'), you get:

        invoice: %
            price: #"Expensive!" 13.5

    Apply w($doc->{invoice}, 'price', 'currency', '$'), you get:

        invoice: %
            price: %
                =: 13.5
                #: "Expensive!"
                currency: $

    Etc.

I'm toying with the idea that if 'color' is an integer, then the entry would convert to a list instead of a map, and the value would be written to the appropriate list entry. This assumes the language used has a strong enough distinction between 1 and "1" ...

Note that in the general case you can't replace 'v' and 'w' by the built-in scripting language access mechanism, since these won't "do the right thing". For example, writing:

    $doc->{invoice}->{price}->{'#'} = 'Expensive!';

in the sequence above would not have worked. But you can do it in specific cases: for example, writing:

    $doc->{invoice}->{price}->{currency} = '$';

would have worked. A careful programmer may thus minimize the number of calls to 'v' and 'w'.

Have fun,

    Oren Ben-Kiki
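As an illustration of the 'v' and 'w' operations described in the message above, here is a minimal sketch in Python rather than Perl, operating on plain dicts and lists. It is only one possible reading of the proposal, not an agreed API: the helper `_lookup`, the constant `VALUE_KEY`, the choice to return None for a missing entry, and the behavior of returning a map as-is when it has no '=' key are all assumptions made for the sketch.

```python
VALUE_KEY = "="       # key holding the wrapped value, as in the message above
_MISSING = object()   # hypothetical sentinel for "no such entry"


def _lookup(container, key):
    """Return container[key], or _MISSING when the key/index is absent."""
    try:
        return container[key]
    except (KeyError, IndexError):
        return _MISSING


def v(container, key, color=None):
    """Read an entry's value, or one of its colors when `color` is given."""
    entry = _lookup(container, key)
    if entry is _MISSING:
        return None
    if color is None:
        # A wrapped entry keeps its real value under '='.
        if isinstance(entry, dict) and VALUE_KEY in entry:
            return entry[VALUE_KEY]
        return entry  # assumption: a map without '=' is its own value
    # Only map entries can carry color.
    return entry.get(color) if isinstance(entry, dict) else None


def w(container, key, *args):
    """w(c, k, value) writes the value; w(c, k, color, value) attaches a color."""
    if len(args) == 1:
        color, value = None, args[0]
    elif len(args) == 2:
        color, value = args
    else:
        raise TypeError("expected w(container, key, [color], value)")
    entry = _lookup(container, key)
    if color is None:
        if isinstance(entry, dict):
            entry[VALUE_KEY] = value   # entry is a map: set its '=' entry
        else:
            container[key] = value     # plain scalar, or a new map key
        return
    if not isinstance(entry, dict):
        # Convert the scalar to a map, moving the original value under '='.
        entry = {} if entry is _MISSING else {VALUE_KEY: entry}
        container[key] = entry
    entry[color] = value


# Replays the three writes from the message on the same starting document.
doc = {"invoice": {"price": 12.5}}
w(doc["invoice"], "price", 13.5)
w(doc["invoice"], "price", "#", "Expensive!")
w(doc["invoice"], "price", "currency", "$")
```

After the three writes, `doc["invoice"]["price"]` is a map holding the value under '=' plus the '#' and 'currency' colors, and `v(doc["invoice"], "price")` still reads back 13.5. The integer-color idea (converting an entry to a list) is left out of the sketch.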
From: Clark C . E. <cc...@cl...> - 2001-06-18 13:34:53
In general, I'm very much in agreement with Oren's post, although I wonder a bit whether the coloring mechanism may be hard to grok. Jason? It is certainly very powerful, and offers a clear distinction between YAML and XML besides the map/list vs named list difference. I'm also wondering if it will be possible to use YAML without being color aware... I think not.

...
| the lowest level API is insensitive to the color idiom.

The push/pull API can be designed such that the color idiom is central (see the proposed API).

| We must support the color idiom in scripting languages
| via external methods ... This requires two operations
| (read and write)

Ok. Of course, since the visitor (push) and iterator (pull) APIs can support the color idiom easily, this is an alternative way to support colors...

| Round trip: A parser could attach formatting information to
| nodes which would allow reproducing the output in a
| byte-to-byte identical format. For example, the key
| 'org.yaml.format.indent' could contain the exact white
| space used to indent the content of a node. The key
| 'org.yaml.format.scalar.style' could contain the scalar
| variant used, etc. A printer could use this color data
| as a guide to formatting the output YAML file.

Ok, I got "format.scalar.style"... interesting, but I didn't grok the first example.

Question. Do we want to re-cast the "class" section as specific keys? Using, perhaps ~ or _ or some other prefix character for 'globally unique keys'? Just pondering.

| Binary vs. Unicode: suppose we have a way to distinguish
| between the two, which works most of the time (along
| the lines of what I proposed).

I was taking a walk with my g.f. and discussing this item; it seems that there is a tripartite situation:

          | Java / C#  | Python  | Perl                       |
----------+------------+---------+----------------------------+
ASCII     | String     | String  | Scalar (UTF8 or Otherwise) |
UNICODE   | String     | Unicode | UTF8 Marked Scalar         |
BINARY    | Byte []    | String  | Not UTF8 Marked Scalar     |

In other words, we need a three part distinction to determine the appropriate "mapping". Perhaps we require another special indicator for this one (to represent encoding?) or would one re-use the class marker?

    value: ^ascii This is only ascii
    value: This is unicode
    value: ^binary BASE64==
    value: #"Not sure I like this anymore!" [BASE64==]
    value: ^null #"Perhaps this can be taken a bit further, as I'm not that much of a fan of the ~ indicator to introduce a null node type..."

| A programmer who fears that some binary scalar instance
| would be incorrectly taken to be text (or vice-versa)
| would color the scalar with a directive for the printer,
| overriding the automatic decision (e.g., using the
| aforementioned 'org.yaml.format.scalar.style' color).

Ok. Although given that this may be one of those "common" colors like comment and class... perhaps it deserves an indicator...

| Color makes many problems just disappear (given color-sensitive
| code). This means we have to make using it as painless as possible.
| The two operations I had in mind are:
|
| v(<map/list>, <key/index>, [ <color> ])
| w(<map/list>, <key/index>, [ <color> ], <value>)

(snip perfectly clear examples)

| Etc. I'm toying with the idea that if 'color' is an integer, then
| the entry would convert to a list instead of a map, and the value
| be written to the appropriate list entry. This assumes the
| language used has a strong enough distinction between 1 and "1" ...

Hmm. Interesting.

| Note that in the general case you can't replace 'v' and 'w' by the built-in
| scripting language access mechanism, since these won't "do the right thing".
| For example, writing:
|
| $doc->{invoice}->{price}->{'#'} = 'Expensive!';
|
| Above would not have worked. But you can do it in specific cases: for
| example, writing:
|
| $doc->{invoice}->{price}->{currency} = '$';
|
| Would have worked. A careful programmer may thus minimize the number of
| calls to 'v' and 'w'.

Ok.

Best,

Clark
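The three-part ASCII / Unicode / binary distinction in Clark's table could be sketched roughly as follows. This is a hedged illustration only: it uses Python 3's `str` and `bytes` types, which do not match the Python column of the table (written for the String and Unicode types of the time), and the function names `classify_scalar` and `emit` are hypothetical. The `^ascii` and `^binary` markers are just the tentative syntax floated in the message, not settled YAML.

```python
import base64


def classify_scalar(value):
    """Return (kind, text), where kind is 'ascii', 'unicode' or 'binary'."""
    if isinstance(value, bytes):
        # Binary data is carried as base64 text, as in the ^binary example.
        return "binary", base64.b64encode(value).decode("ascii")
    if isinstance(value, str):
        return ("ascii" if value.isascii() else "unicode"), value
    raise TypeError("expected str or bytes")


def emit(key, value):
    """Format one 'key: value' line using the tentative ^marker syntax."""
    kind, text = classify_scalar(value)
    if kind == "unicode":
        # Unicode is the unmarked default in Clark's examples.
        return f"{key}: {text}"
    return f"{key}: ^{kind} {text}"


print(emit("value", "This is only ascii"))
print(emit("value", "caf\u00e9"))
print(emit("value", b"\x00\x01"))
```

A printer following Oren's suggestion could let a color such as 'org.yaml.format.scalar.style' override the result of `classify_scalar` when the automatic guess is wrong; that override is not shown here.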