Re: [Yaml-core] pairs, omaps and sets! Oh My!


Oren Ben-Kiki wrote:

Thanks Oren. That's very helpful. I think I'm beginning to see the light.

-tim

>On Friday 07 May 2004 20:07, Tim Hochberg wrote:
>
>>I've been trying to get a handle on the YAML spec by writing a dumper in
>>Python. I'm trying to base this entirely on the spec, but in practice I
>>end up getting some hints from how PyYaml does things. I recently bumped
>>into yaml.org/tags.html and I'm not sure I like some of the
>>wishiwashiness that the spec allows for the bonus collection types...
>>
>
>They are rather wishy-washy, aren't they? :-)
>
>OK, here's the deal. If you explicitly tag something, that's it. So, if you 
>want your stuff to be 100% portable, go ahead and explicitly mark all 
>your !omap, !pairs and !set elements.
>
>If you don't explicitly tag your nodes, they are "open to interpretation". 
>They still have only one "correct" interpretation, but that interpretation is 
>implicit rather than explicit, and some kinds of code will have to live 
>without knowing the "correct" one.
>
>In general, when any YAML application reads a stream, it does so under some 
>non-empty set of assumptions about its semantics. These assumptions specify, 
>amongst other things, exactly how to interpret tag-less nodes.
>
>At this point we distinguish two groups of applications. The first group 
>of applications are generic tools that work regardless of the specific 
>format. Such applications are perfectly happy with leaving the tag as 
>"unknown". For them, it is enough to know the kind of node (mapping, sequence 
>or scalar). The canonical example of this group of applications is a YAML 
>pretty printer.
>
>The other group of applications expect specific stream formats - e.g., an 
>application reading invoices and producing delivery orders. In this case, the 
>interpretation of all nodes is set by the expected format (in this example, 
>whatever an invoice is composed of). This allows the actual stream format to 
>be very compact and human readable, by omitting a lot of explicit tags that a 
>human doesn't care about anyway.
>
>For "specific" applications, there is an inherent trade-off between how much 
>information is coded in the "assumptions" or "rules" given to the library and 
>how much is coded in the YAML stream itself. Wisely, the YAML spec doesn't 
>enforce a particular trade-off point here. There's just no way that the YAML 
>spec can make a sweeping requirement here that will match all applications, 
>considering the fact that the set of tags is extensible in the future.
>
>However, the YAML spec does _encourage_ "specific" applications to employ a 
>set of "common" rules. So, while there's no _requirement_ that all-digit 
>plain scalars be interpreted as "integers", using this as part of your 
>"specific" application rules will increase interoperability.
>
>Generic libraries have the problem that they need to cater to both application 
>types. To do so, a generic library needs to allow a "specific" application to 
>explicitly state the rules for tag interpretation, and at the same time allow 
>a "generic" application to leave tags as "unknown".
>
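
To make the split between "specific" and "generic" applications concrete, here is a minimal Python sketch of implicit scalar resolution. The names resolve_scalar and COMMON_RULES are hypothetical and do not come from PyYaml or the spec; the two patterns are only an illustration of "common" rules, not the official ones.

```python
import re

# Hypothetical rule table that a "specific" application hands to the library.
# Each entry is (tag, pattern); the patterns are illustrative assumptions.
COMMON_RULES = [
    ("int", re.compile(r"[-+]?[0-9]+")),
    ("bool", re.compile(r"true|false")),
]

def resolve_scalar(text, explicit_tag=None, rules=()):
    """Return the tag for a plain scalar, or None if it stays unknown."""
    if explicit_tag is not None:
        return explicit_tag      # an explicit tag always wins
    for tag, pattern in rules:
        if pattern.fullmatch(text):
            return tag           # first matching implicit rule
    return None                  # a "generic" tool is happy to stop here
```

A pretty printer would call resolve_scalar(text) with no rules and carry the node around untagged, while an invoice reader would pass its own rule table.
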
>With all this in mind, the "bonus" collection types are seen as no different 
>than "integers" or "timestamps". *You* decide whether you want to:
>
>- Not use them at all.
>
>- Require them to be explicitly tagged.
>
>- Decide in advance which elements have
>  these types, based on the path leading to
>  them (the most common approach).
>
>- Auto-detect them based on node content.
>
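
The options in the list above can be sketched as one resolution function. This is only an assumption about how a loader might combine them: resolve_collection, detect_collection, and the tuple-based path format are hypothetical names, and the auto-detection heuristic (a sequence of single-key mappings) is one plausible reading, not a rule from the spec.

```python
def detect_collection(node):
    """Auto-detect: a non-empty list of single-key dicts is 'omap' if the
    keys are unique and 'pairs' if any key repeats; otherwise unknown."""
    if not (isinstance(node, list) and node
            and all(isinstance(item, dict) and len(item) == 1 for item in node)):
        return None
    keys = [next(iter(item)) for item in node]
    return "omap" if len(set(keys)) == len(keys) else "pairs"

def resolve_collection(node, path, explicit_tag=None,
                       path_rules=None, autodetect=False):
    """Pick a tag for a collection node, or None to leave it unknown.

    path is a tuple of keys leading to the node (a hypothetical format).
    """
    if explicit_tag is not None:
        return explicit_tag            # explicitly tagged: that's it
    if path_rules and path in path_rules:
        return path_rules[path]        # decided in advance, by path
    if autodetect:
        return detect_collection(node) # guessed from node content
    return None                        # bonus types not used at all
```

Passing neither path_rules nor autodetect corresponds to the first two bullets: the bonus types appear only when the stream tags them explicitly.
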
>Presumably, a time will come when we'll have a recommended way to codify all 
>the tag resolution semantics (as a YAML document, what else :-). When that 
>time arrives, your Python YAML dumper will be able to consult this document 
>to decide whether or not to explicitly tag each node it emits (and possibly 
>make other decisions as well). This is part of the YAML Schema initiative, 
>which is a lot of work.
>
>Until such a time, you'll need to match your dumper's customization abilities 
>to the concrete needs of the actual applications that will use it; say, 
>providing a boolean option controlling whether all nodes of data type X 
>should or should not be explicitly tagged using tag Y, or whatever. You could 
>go further and allow making these decisions based on the path leading to the 
>node, or go for broke and provide a complete programming API that controls 
>this.
>
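
As a rough sketch of the dumper-side knobs described above, the following Python shows a boolean "always tag type X with tag Y" option plus a per-path override. TagOptions and dump_omap are hypothetical names, the path is assumed to be a tuple of keys, and the emitter handles only a flat OrderedDict written in flow style with keys and values that need no quoting.

```python
from collections import OrderedDict

class TagOptions:
    """Hypothetical dumper settings for explicit tagging."""

    def __init__(self, tag_for_type=None, always_tag=False, path_overrides=None):
        self.tag_for_type = tag_for_type or {}      # e.g. {OrderedDict: "!omap"}
        self.always_tag = always_tag                # the boolean option
        self.path_overrides = path_overrides or {}  # path tuple -> True/False

    def tag_prefix(self, value, path):
        tag = self.tag_for_type.get(type(value))
        if tag is None:
            return ""
        # A per-path setting, if present, beats the global boolean.
        explicit = self.path_overrides.get(path, self.always_tag)
        return tag + " " if explicit else ""

def dump_omap(value, path, options):
    """Emit an OrderedDict as a flow sequence of single-pair mappings."""
    body = ", ".join("{%s: %s}" % (k, v) for k, v in value.items())
    return "%s[%s]" % (options.tag_prefix(value, path), body)
```

For example, dump_omap(OrderedDict([("a", 1)]), ("root",), TagOptions({OrderedDict: "!omap"}, always_tag=True)) emits the tag, while the same call with always_tag left at False omits it and relies on the reader's assumptions.
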
>Note that this is just one of many issues a dumper needs to deal with. Other 
>issues include: the choice of style for writing strings; the choice of 
>indentation level; the choice of block vs. flow style for collections; 
>comments; key order in mappings; etc.
>
>In a way, a high-quality dumper is harder than a high-quality parser because, 
>when all is said and done, parsing mostly _extracts_ information that already 
>exists in a stream. Dumping, on the other hand, injects information into the 
>stream - a *LOT* of information.
>
>You can compare this to writing a Java code beautifier. Sure, "everyone knows" 
>how to parse Java code. But writing a good Java code beautifier is much 
>harder. It isn't because the Java spec is incomplete; it is inherent in the 
>task. YAML dumping is the same.
>
>I hope this helps... and explains why we are going to keep the !omap, !set 
>and !pairs parsing rules intentionally "wishy-washy".
>
>Have fun,
>
>	Oren Ben-Kiki
>
>