Thread: [Yaml-core] Tabular Data Sets (a brain fart)

Hello all.  In the last few months I've been working heavily with
SQL results, CSV files, and other tabular data sets.  I'd like YAML
1.2 (in another year or so) to address the succinct human
presentation of these structures.  This is just a brainstorming
post to get any interested parties thinking about the problem, to
see if we can find early consensus (or identify issues).

Previously (about a year ago?) Brian proposed a pipe-delimited
format that reuses the literal scalar indicator (|), like...

    ---
    | name         | hr | avg
    |--------------+----+------
    | Mark McGwire | 65 | 0.278
    | Sammy Sosa   | 63 | 0.288 
    ...

This particular structure would then be equivalent to 
example 2.4 in the specification,

    ---
    -
      name: Mark McGwire
      hr:   65
      avg:  0.278
    -
      name: Sammy Sosa
      hr:   63
      avg:  0.288
    ...
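
As a rough illustration of that equivalence, here is a minimal Python
sketch that reads such a table into a list of dicts. The function name
parse_pipe_table is made up for this example and nothing here is part
of any YAML library. It assumes cells contain no quoted | characters,
and it leaves every value as a string rather than applying YAML's
implicit typing. The header is spelled avg so the keys match the
mapping form.

```python
def parse_pipe_table(text):
    """Read a pipe-delimited table into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '---', '...', and blank lines
        if set(line) <= set("|-+"):
            continue  # skip the divider line, e.g. |------+----+------
        # Drop the leading pipe, then split the rest into trimmed cells.
        rows.append([cell.strip() for cell in line[1:].split("|")])
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


table = """\
---
| name         | hr | avg
|--------------+----+------
| Mark McGwire | 65 | 0.278
| Sammy Sosa   | 63 | 0.288
...
"""

players = parse_pipe_table(table)
```

A real loader would still have to resolve 65 and 0.278 to numbers, as
the block form does; this sketch only shows the row-to-mapping step.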

A few details about this I see as important:

  -  This proposes an alternative, more succinct style for
     compactly presenting a sequence of mappings.  It does not
     introduce a new fundamental "kind" or any such modification
     to the information model.  It's syntactic sugar.

  -  Excel can use the pipe delimiter | as a field separator when
     loading; the leading | then creates a first column that holds
     the indentation (which is discarded).  Excel compatibility is
     (perhaps unfortunately) very important.

  -  Within a given "cell", normal flow structure can occur.
     Thus quotes can be used to escape characters (especially
     the | indicator), and aliases can occur as needed.  I'm not
     sure how to anchor a given row (anchoring a cell is probably
     not needed); perhaps:

        ---
              | name         | hr | avg   | admires
              |--------------+----+-------+--------
        &mark | Mark McGwire | 65 | 0.278 |
              | Sammy Sosa   | 63 | 0.288 | *mark
        ...

      Eww! That's ugly, but I need to be able to anchor 
      a given row... I just don't know how it should look.

  -  An optional "divider" line separates the mapping keys
     from the data values.  The header above it could later be
     extended to provide nested groups (I call them "facets"),
     perhaps something like:

        ---
        | player       | statistic
        |              | hr | avg
        |--------------+----+------
        | Mark McGwire | 65 | 0.278
        | Sammy Sosa   | 63 | 0.288
        ...

     being mapped to...

        ---
        -
          player: Mark McGwire
          statistic:
            hr:   65
            avg:  0.278
        -
          player: Sammy Sosa
          statistic:
            hr:   63
            avg:  0.288
        ...

     This is extremely common in my data sets, and the presentation
     is straightforward.
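
One way the "facet" headers could be resolved is by column position:
a leaf column belongs to the group whose cell starts at or to the left
of it in the row above. The Python sketch below is only my guess at
such a rule, not a worked-out proposal. The names split_cells and
parse_faceted_table are hypothetical, it requires the divider line to
be present, it handles at most two header rows, and it ignores quoting,
anchors, and aliases.

```python
def split_cells(line):
    """Return (pipe offset, trimmed text) for each cell of a '|'-led line."""
    offsets = [i for i, ch in enumerate(line) if ch == "|"]
    bounds = offsets + [len(line)]
    return [(bounds[k], line[bounds[k] + 1:bounds[k + 1]].strip())
            for k in range(len(offsets))]


def parse_faceted_table(text):
    """Read a table with one or two header rows into nested dicts."""
    lines = [ln.rstrip() for ln in text.splitlines()
             if ln.lstrip().startswith("|")]
    # Assumption: a divider made only of |, - and + ends the header.
    divider = next(i for i, ln in enumerate(lines)
                   if set(ln.strip()) <= set("|-+"))
    header_rows = [split_cells(ln) for ln in lines[:divider]]
    leaves = header_rows[-1]
    groups = header_rows[-2] if len(header_rows) > 1 else []

    paths = []
    for offset, name in leaves:
        # Parent is the right-most group cell starting at or before this column.
        parents = [g for off, g in groups if off <= offset]
        parent = parents[-1] if parents else ""
        # A blank leaf name means the group name itself is the key.
        paths.append([part for part in (parent, name) if part])

    records = []
    for ln in lines[divider + 1:]:
        record = {}
        for path, (_, value) in zip(paths, split_cells(ln)):
            if not path:
                continue  # unnamed column: nothing to store it under
            target = record
            for key in path[:-1]:
                target = target.setdefault(key, {})
            target[path[-1]] = value
        records.append(record)
    return records


faceted = """\
| player       | statistic
|              | hr | avg
|--------------+----+------
| Mark McGwire | 65 | 0.278
| Sammy Sosa   | 63 | 0.288
"""

records = parse_faceted_table(faceted)
```

Relying on column offsets means the header and data rows must be
aligned exactly, which is one of the issues a real proposal would need
to settle.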

Anyway... any thoughts?  I suppose this would violate the YAML 1.1
productions, since it uses the | character in a very odd way.  I've
tried other characters, though, and they don't work as nicely.
Do we have a clean migration path?

Kind Regards,

Clark
