On Fri, 2009-06-12 at 20:32 -0700, Joshua Choi wrote:
> According to the document markers section in the current spec, 'It is
> valid to have a “%” character at the start of a line (e.g. as the
> first character of a top-level mapping key).' This does not seem to be
> true under either the current spec or PyYAML.
That phrase is wrong as the spec currently stands.
Still, there are many other ways % can be the first character in a line.
Here are two off the top of my head:
> The spec seems to give the following example as valid YAML containing
> a node starting with "%":
> %YAML 1.2
> %PS-Adobe-2.0: 777
You missed the fact that this is a block scalar; the example is:
--- | # <-- note the "|"
Then again, we have example 9.4 (a mapping), which is wrong as the spec currently stands.
There was a bit of historical vagueness about the _intent_ wrt. '%'
characters; in 1.0 (may it rest in peace) directives were very different,
and we never followed through completely when changing them in 1.1. I
guess we didn't completely follow through in 1.2 either (yet).
Technically, there's no reason to consider '%' an indicator, because
its use is restricted to directive lines, and there's no possibility of
confusing % as the first char of a plain scalar with % as the first
char of a directive.
I'll fix the 1.2 spec to be (finally) completely consistent in this
regard. Thanks for pointing this out.
> - In YAML 1.1, were "%" characters allowed to start plain scalar nodes
> at the beginning of a line?
No, they weren't, but there were other issues with it. E.g.,
what does this do?
> - Is InstantYAML's behavior correct? Is it correct for YAML 1.2 too?
_Possibly_ correct for 1.1 (which is not clearly defined for examples
such as the above), incorrect for 1.2.
> - The spec's rules seem to contradict this feature. Is this actually so?
What 1.2 should do to become 100% consistent is allow '%' as the 1st
char of a plain scalar.
> - Would it be possible to *change* this, so that non-directive lines
> cannot begin with "%"?
No. The examples of flow scalars and literal scalars and so on make it
> Disallowing characters that are well known to be indicators, such as
> "%", "!", "?", etc. is reasonable and sane behavior—not just for the
> person writing the YAML parser, but also the person using the YAML
> documents. Making a strange exception just for the sake of letting a
> scalar beginning with "%" be a plain scalar places a burden on both
The problem is that % is not an indicator in the classic sense. Nobody
cares if you write:
Is there a problem
That's fine even though ! is an indicator, because indicators are only
significant at the start of a _node_. The %, however, is used at the start of a _line_. This
line-oriented structure just works differently.
> This is, in my opinion, a higher cost than just requiring the scalars
> to be quoted, just like a scalar beginning with any other indicator. I
> ask this pleadingly, because the parser library I'm writing currently
> depends on this—allowing "%" to start a non-directive line would make
> parsing much more complicated.
Look at section 9.1.2, which explicitly addresses this issue. The
document markers are your friends when deciding whether a % in the
first character position starts a directive. Hopefully this will keep
your code simple.
Thanks for catching this!