Oren,
Thanks for humoring the discussion. I guess there are two
particular needs that I'm mixing/addressing:
1. A very nice "presentation" tabular form (that would not
be great for editing since columns wouldn't line up).
2. The ability to directly include delimited tables, indented
when put into YAML. This is helpful since the bulk of the
data files used in a biomedical lab (where I'm working) are
CSV files. I think this applies to all sorts of other
fields... tables are just so common.
Your take on these items is that we need special import/export
functions, such as:

  yaml2csv <yamlfile> <ypath> [ -o <csvfile> ]

    Takes a YAML file, plus a ypath expression and extracts
    the "leaf" (which is a sequence of mappings) and formats
    it as a CSV file to stdout (or optionally an output file).

  csv2yaml <yamlfile> <ypath> [ -i <csvfile> ]

    Takes a CSV file (or stdin) and injects it into a YAML file
    as a sequence of mappings at the location specified by the
    ypath expression.

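
To make the two directions concrete, here is a minimal Python sketch. It assumes the YAML file has already been loaded and the ypath already resolved, so it only handles the "leaf" as a plain list of dicts. The names rows_to_csv and csv_to_rows are made up for illustration and are not part of any existing tool.

```python
import csv
import io

def rows_to_csv(rows):
    """Format a sequence of mappings as CSV text (the yaml2csv direction).

    Assumption: every mapping has the same keys as the first one.
    """
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

def csv_to_rows(text):
    """Parse CSV text into a list of mappings (the csv2yaml direction).

    All values come back as strings; CSV carries no type information.
    """
    return [dict(r) for r in csv.DictReader(io.StringIO(text))]

rows = [
    {"name": "Mark McGwire", "hr": 65, "ave": 0.278},
    {"name": "Sammy Sosa", "hr": 63, "ave": 0.288},
]
csv_text = rows_to_csv(rows)
round_trip = csv_to_rows(csv_text)
```

Note that the round trip is lossy: the numbers come back as strings, so a real csv2yaml would need some typing rule on the way in.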
Not a bad idea, but let me play a bit more...
On Sun, Dec 12, 2004 at 10:01:51AM +0200, Oren Ben-Kiki wrote:
| > ---
| > | name | hr | ave
| > |--------------+----+------
| > | Mark McGwire | 65 | 0.278
| > | Sammy Sosa | 63 | 0.288
| > ...
|
| ---
| - { name: Mark McGwire, hr: 65, ave: 0.278 }
| - { name: Sammy Sosa, hr: 63, ave: 0.288 }
| ...
|
| Which is not as compact but is arguably more readable.
Not really. Here is a more real-life example (but in reality
it has lots and lots of columns).
--- @|
| Dysmorphology | Physical Examination
| Acondroplasia | Marfan Syndrome | ... | Height | Weight | ...
| | | | | |
| Not Evaluated | Positive | ... | 63 | 130 | ...
... thousands of rows ...
| Positive | Not Evaluated | ... | 42 | 80 | ...
...
This just _isn't_ nice to read when you put it in current YAML styles.
| It isn't Compatible with Excel
Well, 99% of the time, importing this table is easy... you
simply specify that | is the delimiter. In the domain I'm working
in, | never appears in the real data. Indentation isn't an issue
since Excel loads it into the first column, which can be easily
ignored. That said... it requires an extra step through the
import Wizard -- so you are correct.
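
The same import can be sketched outside Excel. The Python below reads an indented, |-delimited table and drops the first column, which holds only the indentation. It assumes, as above, that | never appears in the real data; read_pipe_table is a hypothetical helper name.

```python
import csv

def read_pipe_table(lines):
    """Read an indented, |-delimited table into a list of rows.

    The text before the first '|' is just indentation, so the first
    column is dropped; cell padding is stripped.
    """
    table = []
    for cells in csv.reader(lines, delimiter="|"):
        table.append([c.strip() for c in cells[1:]])
    return table

lines = [
    "  | name         | hr | ave",
    "  | Mark McGwire | 65 | 0.278",
    "  | Sammy Sosa   | 63 | 0.288",
]
table = read_pipe_table(lines)
```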
| > - Within a given "cell", normal flow structure can occur.
| > Thus quotes can be used to escape characters, (especially
| > the | indicator), and aliases can occur as needed.
|
| All this seems incompatible with Excel. It would choke on nested {} or
| [], insist that each cell be contained in a single line, will
| misinterpret \t in either '\t' or "\t" (or both), will consider
| '&anchor' before the row as content (or an error), and so on.
And that's just OK; these are rare cases, and it's easy to explain
to a lab technician that *xxx in a cell represents a reference to
another object (such as mother or father).
| If for some reason we decided CSVs are needed, the right way to support
| them would be using a CSV style (I'm using @ because its a reserved
| indicator; this allows specifying the separator character)
|
| ---
| csv: @,
| name,hr,ave
| Mark McGwire,65,0.278
| Sammy Sosa,63,0.288
Right. This perhaps is the better approach. Use @ to signify
"tabular data" followed by the delimiter (or perhaps no character
to signify a 'here' document).
--- @,
, Dysmorphology , Physical Examination
, Acondroplasia , Marfan Syndrome , ... , Height , Weight , ...
, , , ... , , , ...
, Not Evaluated , Positive , ... , 63 , 130 , ...
... thousands of rows ...
, Positive , Not Evaluated , ... , 42 , 80 , ...
...
I like it.
| The details of the CSV format would be _exactly_ the way CSVs behave in
| practice in spreadsheets (this requires a bit of investigation of
| course).
Yes, fortunately, we are somewhat close already. If we allowed for
double "" to be the same as a single " it would be even closer. Is
the following currently valid YAML?
key: "one "" two"
One interpretation of this is two strings, that are automatically
concatenated. If this is the current interpretation, then I guess
this potential compatibility is dead-on-arrival.
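
For comparison, this is how the doubled-quote convention behaves in one CSV implementation, Python's csv module. It is used here only as a stand-in for spreadsheet behavior, which may differ in details.

```python
import csv
import io

# Reading: inside a quoted field, "" stands for one literal ".
parsed = next(csv.reader(['key,"one "" two"']))

# Writing: a field containing " is quoted and the " is doubled.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(["key", 'one " two'])
written = out.getvalue()
```

Here parsed is ['key', 'one " two'] and written is the line key,"one "" two" again, so the doubling round-trips.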
| Of course, you won't get any anchors, or tags, and escaping
| would be very limited. It seems like this won't be good enough for your
| use case (having row anchors). Well, that's CSV for you :-)
It's not that CSV (or Excel) has to understand these items, so
much as that it just has to preserve them intact. By using the very
first column for row-level indicators and leading whitespace
this could potentially work well. It's easy in Excel to ignore
the first column.
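
A rough Python sketch of how a loader might treat that first column. Everything here is an assumption for illustration: the &name / *name cell syntax follows the idea above but is not settled, load_table is a hypothetical helper, and the sample rows are invented.

```python
import csv

def load_table(lines, delimiter=","):
    """Load a delimited table whose first column holds row anchors.

    Assumed syntax: '&id' in the first column names the row, and
    '*id' in any other cell refers to the row with that anchor.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = [h.strip() for h in next(reader)][1:]
    anchors = {}
    rows = []
    for cells in reader:
        cells = [c.strip() for c in cells]
        row = dict(zip(header, cells[1:]))
        if cells[0].startswith("&"):
            anchors[cells[0][1:]] = row
        rows.append(row)
    # Second pass, so a reference may point at a later row too.
    for row in rows:
        for key, value in row.items():
            if isinstance(value, str) and value.startswith("*"):
                row[key] = anchors[value[1:]]
    return rows

lines = [
    "     , name , mother , father",
    "&p1  , Ann  ,        ,",
    "&p2  , Bob  ,        ,",
    "     , Cal  , *p1    , *p2",
]
people = load_table(lines)
```

A spreadsheet would see &p1 and *p1 as ordinary cell text and pass them through unchanged, which is all that is needed from it.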
| I think all this is uncalled for. It is rather un-yaml-ish.
Now _this_ is the best argument against it! ;)
| It brings to
| mind the "here document" flow style (I'm using ` here, again just
| because it is reserved):
Right.
| ---
| { foo: `EOF
| how much
| indentation
| here?
| EOF
| , bar: baz
| }
| ---
| foo:
| bar:
| baz: `EOF
| what about
| indentation
| here?
| EOF
| ...
The latter is similar to our block scalar, and that's why we don't
need a HERE document. This issue is different: unlike the HERE
document case, there isn't a good readable alternative in current YAML
for writing large tabular data sets (modeled as a sequence of
mappings).
| In both cases you have the same issues: either you indent the content,
| so it isn't really a CSV/"here document", or you don't indent it -
| breaking the yaml block structure, having problems with content
| starting with '%', '---' or '...', and so on.
Right.
| In both cases we prefered simply not to support the problematic style.
| FWIW, I think a "here document" style is much more useful than a CSV
| style: it could serve as the equivalent of the literal style for people
| who prefer flow collection styles. Also, allowing `"EOF" vs. `EOF would
| enable using escape sequences in a "here document", making it truly
| powerful... Still, it just doesn't mesh well with YAML indentation.
Right -- but once again, I'm not proposing to break indentation. ;)
Cheers,
Clark