Re: [Yaml-core] xml2yaml.xsl

On Fri, Oct 24, 2003 at 10:20:51AM -0700, Jason Diamond wrote:
| Are there any plans to develop a vocabulary that can describe the YAML
| data model in XML so that it can be validated with a DTD/XML Schema/RELAX
| NG Schema? Much like the Docutils Generic DTD
| <http://docutils.sourceforge.net/spec/docutils.dtd>. Having a canonical
| and validatable target vocabulary might help ease the development of tools
| that convert between XML and YAML and also ease the transition from XML to
| YAML for developers like me who can read DTDs with ease but am still murky
| about YAML's data model.

This is a good idea, and xml2yaml.xsl should convert any XML objects
using this schema to YAML, as well as trying to "guess" what the
user meant (which is what it currently does).   A YAML schema for XML
would have four elements, in three groups:

for the implicit root:
   <yaml:_root> 

for mappings:
   <yaml:_key>
   <yaml:_value>

for sequences:
   <yaml:_>

Thus, you know you are in a sequence if you encounter a sequence
item (_), and you know you are in a mapping if you encounter a
series of key/value pairs.   Another way to express things would be to
have explicit "mapping", "sequence" and "scalar" elements, but I
think this would be more verbose and less clear, and it could not
be mixed with other non-yaml prefixed data.

There would also be several attributes:

  yaml:anchor="id"    # marks a node with an anchor
  yaml:alias="id"     # the _, key or value is an alias node
  yaml:tag="seq"      # this is the type tag for a given node, 
                      # less the 1st !, ie "!private" 
  yaml:style="double" # specifies the style to use
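
As a rough illustration of how little it takes to read these elements
back, here is a minimal Python sketch.  It is not what xml2yaml.xsl
does; the namespace URI "urn:example:yaml" and the function name
strict_to_python are made up for the example, since no namespace is
defined above.  It ignores the yaml:anchor, yaml:alias, yaml:tag and
yaml:style attributes and only handles scalar keys.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the message above does not define one.
YAML_NS = "urn:example:yaml"
SEQ_ITEM = "{%s}_" % YAML_NS
KEY = "{%s}_key" % YAML_NS
VALUE = "{%s}_value" % YAML_NS


def strict_to_python(elem):
    """Turn a strict-binding element into a dict, list, or string."""
    children = list(elem)
    if not children:
        # No child elements: the node is a scalar.
        return elem.text or ""
    tags = [child.tag for child in children]
    if all(tag == SEQ_ITEM for tag in tags):
        # Every child is a sequence item (_): the node is a sequence.
        return [strict_to_python(child) for child in children]
    is_pairs = len(tags) % 2 == 0 and all(
        tag == (KEY if i % 2 == 0 else VALUE) for i, tag in enumerate(tags)
    )
    if is_pairs:
        # Alternating _key/_value children: the node is a mapping.
        return {
            strict_to_python(k): strict_to_python(v)
            for k, v in zip(children[0::2], children[1::2])
        }
    raise ValueError("not a strict YAML binding: %r" % tags)


doc = """<yaml:_root xmlns:yaml="urn:example:yaml">
  <yaml:_key>name</yaml:_key><yaml:_value>Clark</yaml:_value>
  <yaml:_key>langs</yaml:_key>
  <yaml:_value><yaml:_>xml</yaml:_><yaml:_>yaml</yaml:_></yaml:_value>
</yaml:_root>"""
data = strict_to_python(ET.fromstring(doc))
```

Here the root is recognized as a mapping because its children are
key/value pairs, and the value of "langs" is recognized as a sequence
because its children are all sequence items (_).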

Then, a "strict" XML binding of YAML would only use the
above elements and attributes.   A "loose" binding would
follow the rules outlined at http://yaml.org/xml.html ...

  1. If an element contains a sequence of elements with the
     same name, then the name is discarded and they are 
     considered to be a sequence.  Further, a new attribute
     on the YAML root node, yaml:seq="items|people" could be used
     to indicate elements which are sequences; in this case, 
     all element names of children are ignored.

     NOTE: the implementation doesn't do this... yet.  The current
           approach it takes is similar, but brain dead (I was keying
           off elements which are the sequence-items... dumb move)

  2. If an element contains differently named elements, then it is
     treated as a mapping, with the names of the children treated as
     keys, and each child's value treated as the key's value.  This
     requires the element names to be unique, of course.

  3. If an element contains attributes, then it is treated as a 
     mapping.  And there are two sub-cases:

     a) If the element is a sequence (see #1), then the name of
        the sequence items is used as a key, e.g.,
                  <a k="v"><b>val</b><b>two</b></a>    
        becomes
                  { k: v, b: [val, two] }

     b) If the element is a mapping (see #2), then the attributes
        and elements are merged.  
                  <a k="v"><b>val</b><c>two</c></a>    
        becomes
                  { k: v, b: val, c: two }

  4. If an element contains a text node, but no elements, then
     it is a scalar value (either a key, value, or sequence item).

  Note: If there is only one child element, then it will be treated
        as a mapping (use yaml:seq to specify otherwise).
        If there are zero children (not even a text node),
        then the element will also be treated as a mapping.
        If an element contains only a single text node of just
        whitespace, then it will be converted as a scalar.

        Clearly "implicit" conversion like this is dangerous,
        however, it is the most useful.  ;)
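
To make the guessing rules concrete, here is a minimal Python sketch of
rules 1-4 and the note.  It is my illustration only, not a port of
xml2yaml.xsl.  The function name loose_to_python is made up, the
yaml:seq override is not implemented, attributes are keyed by their own
names, and an element that has both attributes and text simply drops
the text (the rules above do not cover that case, so that choice is an
assumption).

```python
import xml.etree.ElementTree as ET


def loose_to_python(elem):
    """Guess a dict, list, or string for the content of elem."""
    children = list(elem)
    attrs = dict(elem.attrib)
    if not children:
        if attrs:
            # Rule 3 with no child elements: a mapping of the attributes.
            # Any text is dropped here (assumption, see lead-in).
            return attrs
        if elem.text is None:
            # Note: zero children and no text node means a mapping.
            return {}
        # Rule 4: text only (even just whitespace) is a scalar.
        return elem.text
    names = [child.tag for child in children]
    if len(children) > 1 and len(set(names)) == 1:
        # Rule 1: repeated elements with the same name form a sequence.
        items = [loose_to_python(child) for child in children]
        if attrs:
            # Rule 3a: the item name becomes a key next to the attributes.
            attrs[names[0]] = items
            return attrs
        return items
    if len(set(names)) != len(names):
        raise ValueError("mapping keys must be unique: %r" % names)
    # Rule 2 (and the single-child case from the note): a mapping.
    # Rule 3b: attributes and child elements are merged.
    for child in children:
        attrs[child.tag] = loose_to_python(child)
    return attrs


seq = loose_to_python(ET.fromstring('<a k="v"><b>val</b><b>two</b></a>'))
mapping = loose_to_python(ET.fromstring('<a k="v"><b>val</b><c>two</c></a>'))
```

Note that the function returns the content of the element it is given,
so the name of the outermost element is discarded.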

There are lots of issues in this process, but if properly documented,
this can be useful for XML people.  I'm not sure if this schema makes
sense if you are trying to "understand" YAML, however.   To do this,
it is probably best to scan the spec and/or tutorial.  I think that
trying to learn via XML could be dangerous.

Best,

Clark

P.S.  In the new information model section, I'm using Lisp-style
      S-expressions to annotate the examples.