From: Tim H. <tim...@co...> - 2004-05-07 20:06:11
Oren Ben-Kiki wrote:

Thanks Oren. That's very helpful. I think I'm beginning to see the light.

-tim

>On Friday 07 May 2004 20:07, Tim Hochberg wrote:
>
>>I've been trying to get a handle on the YAML spec by writing a dumper in
>>Python. I'm trying to base this entirely on the spec, but in practice I
>>end up getting some hints from how PyYaml does things. I recently bumped
>>into yaml.org/tags.html and I'm not sure I like some of the
>>wishy-washiness that the spec allows for the bonus collection types...
>
>They are rather wishy-washy, aren't they? :-)
>
>OK, here's the deal. If you explicitly tag something, that's it. So, if you
>want your stuff to be 100% portable, go ahead and explicitly mark all
>your !omap, !pairs and !set elements.
>
>If you don't explicitly tag your nodes, they are "open to interpretation".
>They still have only one "correct" interpretation, but that interpretation is
>implicit rather than explicit, and some types of code will have to live
>without knowing the "correct" one.
>
>In general, when any YAML application reads a stream, it does so under some
>non-empty set of assumptions about its semantics. These assumptions specify,
>amongst other things, exactly how to interpret tag-less nodes.
>
>At this point we distinguish between two groups of applications. The first
>group consists of generic tools that work regardless of the specific
>format. Such applications are perfectly happy with leaving the tag as
>"unknown". For them, it is enough to know the kind of node (mapping, sequence
>or scalar). The canonical example of this group of applications is a YAML
>pretty printer.
>
>The other group of applications expects specific stream formats - e.g., an
>application reading invoices and producing delivery orders. In this case, the
>interpretation of all nodes is set by the expected format (in this example,
>whatever an invoice is composed of).
>This allows the actual stream format to
>be very compact and human readable, by omitting a lot of explicit tags that a
>human doesn't care about anyway.
>
>For "specific" applications, there is an inherent trade-off between how much
>information is coded in the "assumptions" or "rules" given to the library and
>how much is coded in the YAML stream itself. Wisely, the YAML spec doesn't
>enforce a particular trade-off point here. There's just no way that the YAML
>spec can make a sweeping requirement here that will match all applications,
>considering the fact that the set of tags is extensible in the future.
>
>However, the YAML spec does _encourage_ "specific" applications to employ a
>set of "common" rules. So, while there's no _requirement_ that all-digit
>plain scalars be interpreted as "integers", using this as part of your
>"specific" application rules will increase interoperability.
>
>Generic libraries have the problem that they need to cater to both application
>types. To do so, a generic library needs to allow a "specific" application to
>explicitly state the rules for tag interpretation, and at the same time allow
>a "generic" application to leave tags as "unknown".
>
>With all this in mind, the "bonus" collection types are seen as no different
>than "integers" or "timestamps". *You* decide whether you want to:
>
>- Not use them at all.
>
>- Require them to be explicitly tagged.
>
>- Decide in advance which elements have
>  these types, based on the path leading to
>  them (the most common approach).
>
>- Auto-detect them based on node content.
>
>Presumably, a time will come when we'll have a recommended way to codify all
>the tag resolution semantics (as a YAML document, what else :-). When that
>time arrives, your Python YAML dumper will be able to consult this document
>to decide whether or not to explicitly tag each node it emits (and possibly
>make other decisions as well). This is part of the YAML Schema initiative,
>which is a lot of work.
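The options listed above (explicit tags only, path-based rules, content-based auto-detection) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `resolve_tag`, `PATH_RULES`, the `mode` strings and the detection heuristics are all hypothetical names and choices, not part of PyYaml or the YAML spec. "Not using them at all" corresponds to never calling such a function for these types.

```python
# Hypothetical sketch; none of these names come from PyYaml or the YAML spec.

# Path-based rules: the application decides in advance which nodes have
# which "bonus" type, keyed by the path leading to the node.
PATH_RULES = {
    ("invoice", "items"): "!omap",
    ("invoice", "categories"): "!set",
}


def looks_like_set(node):
    """Heuristic (an assumption): a non-empty mapping whose values are all null."""
    return (
        isinstance(node, dict)
        and len(node) > 0
        and all(value is None for value in node.values())
    )


def looks_like_pairs(node):
    """Heuristic (an assumption): a non-empty sequence of single-key mappings."""
    return (
        isinstance(node, list)
        and len(node) > 0
        and all(isinstance(item, dict) and len(item) == 1 for item in node)
    )


def resolve_tag(node, path=(), explicit_tag=None, mode="path"):
    """Return a tag for the node, or None to leave it as "unknown"."""
    # An explicit tag always wins, whatever the mode.
    if explicit_tag is not None:
        return explicit_tag
    if mode == "explicit":
        # Untagged nodes are never treated as bonus collection types.
        return None
    if mode == "path":
        return PATH_RULES.get(tuple(path))
    if mode == "auto":
        if looks_like_set(node):
            return "!set"
        if looks_like_pairs(node):
            # Unique keys suggest an ordered map; repeated keys need !pairs.
            keys = [next(iter(item)) for item in node]
            return "!omap" if len(set(keys)) == len(keys) else "!pairs"
        return None
    raise ValueError(f"unknown mode: {mode!r}")
```

A generic tool such as a pretty printer would stay in the "explicit" mode and leave everything else as "unknown", while an invoice-reading application would more likely use the path rules.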
>
>Until such a time, you'll need to match your dumper's customization abilities
>to the concrete needs of the actual applications that will use it; say,
>providing a boolean option controlling whether all nodes of data type X
>should or should not be explicitly tagged using tag Y, or whatever. You could
>go further and allow making these decisions based on the path leading to the
>node, or go for broke and provide a complete programming API that controls
>this.
>
>Note that this is just one of many issues a dumper needs to deal with. Other
>issues include: the choice of style for writing strings; the choice of
>indentation level; the choice of block vs. flow style for collections;
>comments; key order in mappings; etc.
>
>In a way, a high-quality dumper is harder than a high-quality parser because,
>when all is said and done, parsing mostly _extracts_ information that already
>exists in a stream. Dumping, on the other hand, injects information into the
>stream - a *LOT* of information.
>
>You can compare this to writing a Java code beautifier. Sure, "everyone knows"
>how to parse Java code. But writing a good Java code beautifier is much
>harder. It isn't because the Java spec is incomplete; it is inherent in the
>task. YAML dumping is the same.
>
>I hope this helps... and explains why we are going to keep the !omap, !set
>and !pairs parsing rules intentionally "wishy-washy".
>
>Have fun,
>
>    Oren Ben-Kiki
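As a rough illustration of the "boolean option" idea from the message, here is a minimal Python sketch of a dumper for one data type. The function name `dump_set` and its parameters are hypothetical, the `!set` shorthand follows the notation used in the message, and the sketch assumes every member is a string that is safe to write as a plain scalar. A real dumper would also have to choose quoting styles, indentation and so on.

```python
def dump_set(values, explicit_tag=True, sort_keys=True):
    """Emit a set of simple strings as a YAML document.

    explicit_tag: whether to write the tag on the root node (hypothetical
                  option; an untagged set relies on the reader's rules).
    sort_keys:    whether to emit members in sorted order, so the output is
                  stable from run to run.
    """
    header = "--- !set" if explicit_tag else "---"
    items = sorted(values) if sort_keys else list(values)
    # Each member becomes a mapping key with an implicit null value.
    lines = [header] + [f"? {item}" for item in items]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(dump_set({"apple", "banana"}), end="")
    print(dump_set({"apple", "banana"}, explicit_tag=False), end="")
```

With `explicit_tag=False` the output is more compact, but a reader only sees a mapping with null values and has to decide by its own rules whether that is a set.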