On 3/28/06, Clark C. Evans <cce@clarkevans.com> wrote:
This is the official "complaint" thread for the YAML plain scalar.  I'm
going to start this post with Brad's recent contribution:

    On Mon, Mar 27, 2006 at 04:24:20PM -0500, Brad Baxter wrote:
    | ---
    | color: #c0c0c0
    | title: Hanna Reitsch: Hitler's Female Test Pilot
    | chapter: 1[tab]Aviation
    | ...

The Color Problem

I think including color specifications, unquoted, is a valid use-case
for YAML:  ``#c0c0c0`` should be a valid plain scalar.  Unfortunately, it
competes with the first-line pound-sign comment #!/some/program as well
as #------- dividers.  Hence the current compromise is to forbid " #" in
a plain scalar.  What do we think about these rules instead:

  1. Forbid " # " in plain scalars, allowing end-of-line comments,
     but permitting the color example above.

  2. Allow "#xxx" content only if it is preceded by whitespace, hence
     sparing content dividers and first-line #! shell markers.
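To make rules 1 and 2 a bit more concrete, here is a rough sketch of
where a comment would start on a single line under the current rule
versus one possible reading of the proposal.  The function name and the
interpretation are my own assumptions: a "#" at the very start of a line
is always a comment (rule 2), and elsewhere a "#" starts a comment only
when it has whitespace on both sides or ends the line (rule 1).  Quoted
scalars are ignored entirely.

```python
def comment_start(line: str, proposed: bool = False) -> int:
    """Return the index where a comment begins on `line`, or -1 if none.

    Hypothetical helper; it ignores quoted scalars and only looks at '#'.
    proposed=False models the current rule: '#' at line start or after
    whitespace begins a comment.
    proposed=True models one reading of rules 1 and 2: '#' at line start
    is a comment; otherwise only a whitespace-surrounded '#' (or a '#'
    at end of line) begins one.
    """
    for i, ch in enumerate(line):
        if ch != "#":
            continue
        if i == 0:
            return i  # shebang lines and #----- dividers stay comments
        if not line[i - 1].isspace():
            continue  # e.g. "a#b" is content under both rules
        if not proposed:
            return i  # current rule: any " #" starts a comment
        nxt = line[i + 1] if i + 1 < len(line) else " "
        if nxt.isspace():
            return i  # proposed rule 1: " # " starts a comment
    return -1
```

With this reading, "color: #c0c0c0" has no comment under the proposal
but a comment at index 7 under the current rule.  It also means an
indented divider such as "  #-----" would become content, which is one
of the implications that would need to be decided.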

I don't know that I can fully grok the implications of these rules.
But my first impression is that while it may help in this case,
it wouldn't necessarily reduce the learning curve for when # is
allowed.  I don't want to sound overly objectionable about all
this; I understand that there is always going to be a learning
curve, period.  I'm just continuing my observations ...

Wacko "URL Set" Problem

To permit URLs in a plain scalar, we allow a colon to be included as
long as it is followed by a non-space character.  Further, to permit
"sets", we allow the colon to be omitted entirely; the YAML parser
simply assumes ``None`` for the values.  The combination of these two
rules in PySyck is quite deadly:

  { quite:unexpected }   =>  { "quite:unexpected": None }

PyYaml seems to handle this case in the expected manner, but it's not
clear to me that the specification is unambiguous or correct here.  The
opposite problem is seen with Brad's title example above.  These are
the proposed rules:

  3. Forbid ":" within any plain-scalar found within a flow collection,
     hence making PyYaml's behavior the new rule.

  4. Remove the ": " restriction from plain-scalars in regular
     indented styles, hence permitting the unambiguous title example.
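Here is a sketch of how a key/value split might look under rules 3 and
4.  The helper is hypothetical, and the block-context behavior is my
reading of rule 4: only the first ": " on the line is the value
indicator, and any later ": " is ordinary content of the value.  In
flow context (rule 3) the first ":" always separates key from value.

```python
def split_mapping_entry(text: str, in_flow: bool):
    """Split a simple "key: value" entry under proposed rules 3 and 4.

    Hypothetical helper; ignores quoting, nesting, and multi-line scalars.
    """
    text = text.strip()
    if in_flow:
        # Rule 3: ':' can never be part of a plain scalar inside a flow
        # collection, so the first ':' is the value indicator.
        idx = text.find(":")
    else:
        # Rule 4: in block context, the first ": " (or a trailing ':')
        # is the value indicator; later ": " sequences are content.
        idx = text.find(": ")
        if idx == -1 and text.endswith(":"):
            idx = len(text) - 1
    if idx == -1:
        return text, None
    return text[:idx].strip(), text[idx + 1:].strip() or None
```

Under this reading the title line splits into "title" and "Hanna
Reitsch: Hitler's Female Test Pilot", and "quite:unexpected" in a flow
mapping becomes {"quite": "unexpected"}.  The cost is that a bare URL
such as "http://x" can no longer be a plain scalar inside a flow
collection.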

Besides colon, the examples below illustrate some of the
other indicators that (according to the 1.1 specs) must be
avoided in plain scalars:

Creator: Baldowski, Clifford H., 1917-
Title: You crazy communist! You trying to get us all killed?, [1962].
Online Publisher: [Athens, Ga.] : Digital Library of Georgia, 2002
Original Material: Savannah, Ga. : Commercial Lithograph & Printing Co., 1917

In Creator, the commas and hyphen aren't allowed.
In Title, the exclamation and question marks and brackets
are also not allowed.
In Publisher, besides the brackets, I'm not sure whether the
"floating" colon " : " would be allowed under the proposed
rule above.
In Material, the ampersand is also not allowed.
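For what it's worth, the characters at issue in these four values can
be listed mechanically.  The sketch below checks each value against the
19 indicator characters of the YAML 1.1 spec and double-quotes the
value if any appear.  This is a coarse check of my own: the spec only
restricts most indicators in certain positions (at the start of a
scalar, or inside flow collections), so it flags more than is strictly
necessary, and the quoting helper escapes only backslashes and double
quotes, not control characters.

```python
# The indicator characters listed in the YAML 1.1 spec (c-indicator).
YAML_11_INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")


def indicators_in(value: str) -> set:
    """Return the YAML 1.1 indicator characters that occur in `value`."""
    return YAML_11_INDICATORS & set(value)


def quote_if_needed(value: str) -> str:
    """Double-quote `value` if it contains any indicator (hypothetical helper).

    Only backslash and double quote are escaped; control characters are
    not handled in this sketch.
    """
    if not indicators_in(value):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

Run over the four records, it reports "," and "-" for Creator, "!",
"?", ",", "[" and "]" for Title, "[", "]", "," and ":" for Publisher,
and ",", ":" and "&" for Material, which matches the list above.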

Working as I do in a library environment, being able to
deal with plain scalars like these is an attractive idea,
but I'm pretty sure it won't be possible to allow ANY
(unquoted) character after the key of a mapping or the
hyphen introducing an element of a sequence and still
support the flow styles (and comments) that are so nice.

There are, of course, many standards for representing
bibliographic data, MARC (shudder), various XML formats
(shudder), and various "flat" formats, e.g.,
- http://www.ietf.org/rfc/rfc1807.txt
- http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/

This last format, ERC, made me think that YAML ought
to be considered (and I still think so), but I think the
restrictions on plain scalars could be a stumbling block.
(That is, if the YAML data is to be edited "raw" by folks who
might not be well versed in the syntax.)

The "Tab" Issue

As it turns out, JSON permits tabs to be used in separation spaces, and I
don't see a good reason for forbidding tabs in plain scalars.  If someone
uses a carriage-return to split a plain scalar into parts and accidentally
gets a TAB in their structural indentation, it won't parse.  However, our
modern parsers (PyYaml) report the line and column where the error
occurs, along with a good error message.  So this shouldn't be a hard
problem for a user to find and fix.  The confusion in Brad's last example,
plus JSON compatibility, might make it worth a change:

  5. Allow tabs in separation spaces

  6. Allow tabs in plain scalars (but not structural indentation)
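Rules 5 and 6 leave only one place where a tab is an error: the
leading indentation of a line.  A minimal sketch of that check, with a
hypothetical helper name and no claim that PyYaml does it this way,
could report the position like so:

```python
def find_indentation_tab(text: str):
    """Return the 1-based (line, column) of the first tab found in a line's
    leading indentation, or None if tabs appear only after the indentation.

    Hypothetical helper illustrating rules 5 and 6: tabs inside content or
    in separation space are fine, tabs in structural indentation are not.
    """
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no structural indentation
        after_spaces = line.lstrip(" ")
        if after_spaces.startswith("\t"):
            return lineno, len(line) - len(after_spaces) + 1
    return None
```

Brad's "chapter: 1[tab]Aviation" line passes, as does "key:[tab]value",
while a tab used to indent a nested key is reported with its line and
column.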

Allowing tabs in these places sounds okay to me, particularly
if it furthers compatibility with JSON.