From: Clark C. E. <cc...@cl...> - 2003-10-24 18:47:41
On Fri, Oct 24, 2003 at 10:20:51AM -0700, Jason Diamond wrote:
| Are there any plans to develop a vocabulary that can describe the YAML
| data model in XML so that it can be validated with a DTD/XML Schema/RELAX
| NG Schema? Much like the Docutils Generic DTD
| <http://docutils.sourceforge.net/spec/docutils.dtd>. Having a canonical
| and validatable target vocabulary might help ease the development of tools
| that convert between XML and YAML and also ease the transition from XML to
| YAML for developers like me who can read DTDs with ease but am still murky
| about YAML's data model.

This is a good idea, and xml2yaml.xsl should convert any XML objects
using this schema to YAML, as well as trying to "guess" what the user
meant (which is what it currently does).

A YAML schema for XML would have three kinds of elements:

  for the implicit root:
    <yaml:_root>
  for mappings:
    <yaml:_key>
    <yaml:_value>
  for sequences:
    <yaml:_>

Thus, you know you are in a sequence if you encounter a sequence
item (_), and you know you are in a mapping if you encounter a
series of key/value pairs.

Another way to express things would be to have explicit "mapping",
"sequence" and "scalar" elements, but I think this would be more
verbose and less clear, and it could not be mixed with other
non-yaml prefixed data.

There would also be several attributes:

  yaml:anchor="id"     # marks a node with an anchor
  yaml:alias="id"      # the _, key or value is an alias node
  yaml:tag="seq"       # this is the type tag for a given node,
                       # less the 1st !, ie "!private"
  yaml:style="double"  # specifies the style to use

Then, a "strict" XML binding of YAML would only use the above
elements and attributes. A "loose" binding would follow the rules
outlined at http://yaml.org/xml.html ...

1. If an element contains a sequence of elements with the same name,
   then the name is discarded and they are considered to be a sequence.
   Further, a new attribute on the YAML root node,
   yaml:seq="items|people", could be used to indicate elements which
   are sequences; in this case, all element names of children are
   ignored.

   NOTE: the implementation doesn't do this... yet. The current
   approach it takes is similar, but brain dead (I was keying off
   the elements which are the sequence items... dumb move).

2. If an element contains different elements, then it is treated as a
   mapping, with the names of the children treated as keys, and each
   child's value treated as the key's value. This requires the
   elements to be unique, of course.

3. If an element contains attributes, then it is treated as a mapping.
   There are two sub-cases:

   a) If the element is a sequence (see #1), then the name of the
      sequence items is used as a key, i.e.,

        <a k="v"><b>val</b><b>two</b></a>

      becomes

        { k: v, b: [val, two] }

   b) If the element is a mapping (see #2), then the attributes and
      elements are merged.

        <a k="v"><b>val</b><c>two</c></a>

      becomes

        { k: v, b: val, c: two }

4. If an element contains a text node, but no elements, then it is a
   scalar value (either a key, value, or sequence item).

Note: if there is only one child element, then it will be treated as
a mapping (use yaml:seq to specify otherwise). If there are zero
children (not even a text node), then the element will also be
treated as a mapping. If an element contains only a single text node
of just whitespace, then it will be converted as a scalar.

Clearly, "implicit" conversion like this is dangerous; however, it is
the most useful. ;) There are lots of issues in this process, but if
properly documented, this can be useful for XML people.

I'm not sure if this schema makes sense if you are trying to
"understand" YAML, however. To do this, it is probably best to scan
the spec and/or tutorial. I think that trying to learn via XML could
be dangerous.

Best,

Clark

P.S. In the new information model section, I'm using lisp style
     SExpr to annotate the examples.
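
For readers who want to see the "strict" binding in action, here is a minimal Python sketch of a reader for it. It is an illustration under assumptions, not the xml2yaml.xsl implementation: the namespace URI bound to the yaml: prefix is hypothetical (the message does not give one), the function names are made up for this example, and yaml:tag and yaml:style are ignored. It handles only the four elements and the anchor/alias attributes described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the message never says what "yaml:" is bound to.
YAML_NS = "urn:example:yaml-xml"


def q(local):
    """Build the ElementTree qualified name for a yaml:-prefixed name."""
    return "{%s}%s" % (YAML_NS, local)


def load_node(elem, anchors):
    """Convert one strict-binding element into a Python value."""
    alias = elem.get(q("alias"))
    if alias is not None:
        return anchors[alias]            # alias node: reuse the anchored value

    children = list(elem)
    if not children:
        value = elem.text or ""          # no child elements: a scalar
    elif children[0].tag == q("_"):
        # Sequence: every child must be a yaml:_ item.
        if any(c.tag != q("_") for c in children):
            raise ValueError("mixed content in sequence")
        value = [load_node(c, anchors) for c in children]
    else:
        # Mapping: children alternate yaml:_key, yaml:_value.
        if len(children) % 2:
            raise ValueError("key without a value")
        value = {}
        for k, v in zip(children[0::2], children[1::2]):
            if k.tag != q("_key") or v.tag != q("_value"):
                raise ValueError("expected _key/_value pair")
            # Keys must be hashable here, so this sketch supports scalar keys only.
            key = load_node(k, anchors)
            value[key] = load_node(v, anchors)

    anchor = elem.get(q("anchor"))
    if anchor is not None:
        anchors[anchor] = value          # remember the node for later aliases
    return value


def load(text):
    """Parse a strict-binding document rooted at yaml:_root."""
    root = ET.fromstring(text)
    if root.tag != q("_root"):
        raise ValueError("expected yaml:_root")
    return load_node(root, {})


DOC = """
<yaml:_root xmlns:yaml="urn:example:yaml-xml">
  <yaml:_key>b</yaml:_key>
  <yaml:_value yaml:anchor="x">val</yaml:_value>
  <yaml:_key>c</yaml:_key>
  <yaml:_value>
    <yaml:_>two</yaml:_>
    <yaml:_ yaml:alias="x"/>
  </yaml:_value>
</yaml:_root>
"""

print(load(DOC))
```

Because the node kind is decided only by the first child (yaml:_ means sequence, anything else means key/value pairs), no explicit "mapping" or "sequence" wrapper elements are needed, which is the trade-off argued for above.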