From: Oren Ben-K. <or...@ri...> - 2001-06-17 17:22:27
First, there is a new draft at http://yaml.org/16jun2001.html - it isn't a big change over the previous one. I cleaned up the indentation issue, enhanced the character escapes, cleared up the anchor string semantics, and opened the door to 32-bit characters (this needs more work). As usual, please check me.

I didn't touch the binary issue, or the class/comment/color issue. Here are some thoughts about these:

Fact: We must support scripting languages' native data structures in a transparent manner. This is a crucial factor in YAML's usefulness and hence its chances of "making it".

Fact: We must support the color idiom, as it solves so many problems which otherwise would mess up our simple language: comments, classes, schema evolution, push/pull annotation (trust me on this), legacy syntax interoperability (that was the original use case for it), and yes, even binary vs. unicode (see below).

Fact: The lowest level API is insensitive to the color idiom. It provides a stream of events, period; a 'value' method is outside its scope.

Fact: In completely static languages (C, C++), we must use our own YamlNode data type, and hence the color idiom is supported immediately without any special syntax.

Fact: Clark is right, it isn't possible to support the color idiom directly in scripting languages. Certainly JavaScript isn't up to it; Python and Perl are similarly limited. Even the weaker scheme I proposed (using '_' for color members) fails when one considers writing into the data structure.

Result: We must support the color idiom in scripting languages via external methods, as Clark proposed. This requires two operations (read and write - see below).

Fact: Java allows both types of APIs. It also has potential for defining an intermediate API. Such an API could use something like YamlHash and YamlVector, each deriving from its counterpart system class. It could also interact intelligently with the Java (de)serialization mechanism.

Result: The Java API will be interesting :-)

That seems to settle things... Build the model around the color idiom. In each language, support it to the best of its ability. The rest is "mere details" :-)

Using color in interesting ways:

Classes/comments: A node with a class/comment is a map with '!'/'#' being a key for the class/comment string. If the original node wasn't a map to begin with, it is wrapped in a map node under the '=' key.

Round trip: A parser could attach formatting information to nodes which would allow reproducing the output in a byte-for-byte identical format. For example, the key 'org.yaml.format.indent' could contain the exact white space used to indent the content of a node. The key 'org.yaml.format.scalar.style' could contain the scalar variant used, etc. A printer could use this color data as a guide to formatting the output YAML file.

Binary vs. Unicode: Suppose we have a way to distinguish between the two, which works most of the time (along the lines of what I proposed). A programmer who fears that some binary scalar instance would be incorrectly taken to be text (or vice-versa) would color the scalar with a directive for the printer, overriding the automatic decision (e.g., using the aforementioned 'org.yaml.format.scalar.style' color).

Color makes many problems just disappear (given color-sensitive code). This means we have to make using it as painless as possible. The two operations I had in mind are:

v(<map/list>, <key/index>, [ <color> ])

    Read the value of an entry in a container. If the container is a map, the entry is identified by a key. If the container is a list, the entry is identified by its index. An optional color argument allows accessing any color associated with the entry. For example, in Perl:

        invoice: %
            price: #"Cheap!" 12.5

        v($doc->{invoice}, 'price') == '12.5';
        v($doc->{invoice}, 'price', '#') == 'Cheap!';
        v($doc->{invoice}, 'price', '!') == undef;

w(<map/list>, <key/index>, [ <color> ], <value>)

    Write the value of an entry in the container. Entry identification is as above. This 'does the right thing' - if the entry is actually a map, writing the value will actually set the value of its '=' entry. The optional color argument allows attaching color to an entry. Again, this 'does the right thing' - attaching a color to an entry may convert it from a scalar to a map, moving its original value to the '=' key, and adding the color as an additional key/value pair. For example, in Perl:

    Start with:

        invoice: %
            price: 12.5

    Apply w($doc->{invoice}, 'price', 13.5), you get:

        invoice: %
            price: 13.5

    Apply w($doc->{invoice}, 'price', '#', 'Expensive!'), you get:

        invoice: %
            price: #"Expensive!" 13.5

    Apply w($doc->{invoice}, 'price', 'currency', '$'), you get:

        invoice: %
            price: %
                =: 13.5
                #: "Expensive!"
                currency: $

    Etc.

I'm toying with the idea that if 'color' is an integer, then the entry would convert to a list instead of a map, and the value would be written to the appropriate list entry. This assumes the language used has a strong enough distinction between 1 and "1" ...

Note that in the general case you can't replace 'v' and 'w' by the built-in scripting language access mechanism, since these won't "do the right thing". For example, in the sequence above, writing:

    $doc->{invoice}->{price}->{'#'} = 'Expensive!';

would not have worked, because at that point the price was still a plain scalar. But you can do it in specific cases; for example, once the price had become a map, writing:

    $doc->{invoice}->{price}->{currency} = '$';

would have worked. A careful programmer may thus minimize the number of calls to 'v' and 'w'.

Have fun,

    Oren Ben-Kiki
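
As a rough illustration of the 'v' and 'w' operations described above, here is a minimal sketch in Python (one of the scripting languages in question). It is only a sketch under stated assumptions: maps are plain dicts, lists are plain lists, and only a map that contains an '=' key is treated as a wrapped value. The names WRAPPED and _is_wrapped, and the variadic form of w, are hypothetical and not part of any YAML API. The integer-color idea is not covered.

```python
WRAPPED = "="  # assumed key holding the original value once a node gains color


def _is_wrapped(node):
    """True if node is a map wrapping a value under the '=' key (assumption)."""
    return isinstance(node, dict) and WRAPPED in node


def v(container, key, color=None):
    """Read an entry's value, or one of its colors when `color` is given."""
    node = container[key]  # a key for a dict, an index for a list
    if color is None:
        return node[WRAPPED] if _is_wrapped(node) else node
    if isinstance(node, dict):
        return node.get(color)  # None plays the role of Perl's undef
    return None  # a bare scalar or list carries no color


def w(container, key, *args):
    """Write an entry: w(c, k, value) or w(c, k, color, value)."""
    if len(args) == 1:
        (value,) = args
        try:
            node = container[key]
        except (KeyError, IndexError):
            node = None  # no existing entry to look inside
        if _is_wrapped(node):
            node[WRAPPED] = value  # keep the colors, replace only the value
        else:
            container[key] = value
    elif len(args) == 2:
        color, value = args
        node = container[key]  # this sketch assumes the entry already exists
        if not isinstance(node, dict):
            node = {WRAPPED: node}  # wrap the scalar (or list) under '='
            container[key] = node
        node[color] = value
    else:
        raise TypeError("w() takes (container, key, [color], value)")


# Replaying the invoice example from the text:
doc = {"invoice": {"price": 12.5}}
w(doc["invoice"], "price", 13.5)               # price is now 13.5
w(doc["invoice"], "price", "#", "Expensive!")  # price becomes {'=': 13.5, '#': ...}
w(doc["invoice"], "price", "currency", "$")    # adds one more color key
```

Because 'w' wraps a scalar only when the first color is attached, plain data stays plain, and ordinary dict access keeps working for entries that never receive color.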