Ok. I've done a bit more thinking on the productions. So far, our
YAML BNF has two somewhat unusual features: (a) indentation level
and (b) context (flow, block). Another source of complication in
the productions is a sequence entry immediately followed by a
comment, as in "- #".
The issue is that "- " and " #" are both tokens, yet "- #" is also
those same two tokens, sharing a single space. So productions that
use "- " need a way to express that one or more spaces must
immediately follow, and " #" needs a way to express that one or more
spaces must immediately precede it. Let's introduce a new production
symbol, $, which means one or more spaces, but with the unusual
property that $ $ in the productions (as denormalized) _still_ means
one or more spaces.
item-indicator :== '-' $
comment-indicator :== $ '#'
The implementation of $ is straightforward. The parser keeps a
single flag: when the current production is $, one or more
subsequent spaces are matched and the flag is set. Any subsequent $
also matches while the flag is set (no matter how many spaces
occurred). Then, as soon as a production needs a non-space
character, the flag is reset, and the whole process continues.
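A minimal Python sketch of that single-flag rule, assuming a production is just a list of literal characters plus a marker for $. The names DOLLAR, SpaceRunMatcher, ITEM_INDICATOR and COMMENT_INDICATOR are hypothetical and not from any YAML parser; the sketch also restores its position on a failed match, which is an assumption beyond the description above.

```python
# Hypothetical marker object standing in for the $ production symbol.
DOLLAR = object()

# item-indicator :== '-' $      comment-indicator :== $ '#'
ITEM_INDICATOR = ["-", DOLLAR]
COMMENT_INDICATOR = [DOLLAR, "#"]


class SpaceRunMatcher:
    """Sketch of the single-flag rule for $ (one or more spaces)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0
        self.flag = False  # set once a $ has matched a run of spaces

    def match_dollar(self) -> bool:
        # While the flag is set, another $ succeeds without consuming
        # input, so "$ $" still means one or more spaces.
        if self.flag:
            return True
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] == " ":
            self.pos += 1
        if self.pos == start:
            return False  # $ requires at least one space
        self.flag = True
        return True

    def match_char(self, ch: str) -> bool:
        # Matching a non-space character required by a production resets the flag.
        if self.pos < len(self.text) and self.text[self.pos] == ch:
            self.pos += 1
            self.flag = False
            return True
        return False

    def match(self, production: list) -> bool:
        # On failure, restore position and flag so the caller can try
        # another production from the same point (assumption, not in the text).
        saved = (self.pos, self.flag)
        for item in production:
            ok = self.match_dollar() if item is DOLLAR else self.match_char(item)
            if not ok:
                self.pos, self.flag = saved
                return False
        return True


m = SpaceRunMatcher("- #")
print(m.match(ITEM_INDICATOR), m.match(COMMENT_INDICATOR), m.pos)  # True True 3
```

Here the single space in "- #" is consumed by item-indicator's $, and comment-indicator's $ succeeds on the flag alone; "-#" fails item-indicator because $ needs at least one space.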
The more general case is that $ could be used in front of any
'character-class' (i.e., an alternation of characters). The flag
behavior above then stores the most recent character matching a
class in a 'slot', and that slot then matches any corresponding use
of $ whose class contains that character:
break :== '\n'
comment-separator :== break | ' '
item-separator :== break | ' ' | '\t'
comment-indicator :== $comment-separator '#'
item-indicator :== '-' $item-separator
In this way, "- #" would match item-indicator followed by
comment-indicator, as would "-\n#", "- \n#", "-\n #", etc.
However, since '\t' is not in comment-separator (for
illustration), "-\t#" would match only item-indicator and then
fail to match comment-indicator.
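The same idea with a per-class slot, as a hypothetical Python sketch: a Slot object stands for a $-prefixed class such as $item-separator, and the matcher remembers the last character any $class consumed. The names Slot, SlotMatcher and the constants are illustrative only. One assumption beyond the description above: if the slot is set but its character is not in the new class, the match simply fails instead of trying to consume fresh input, which is what makes "-\t#" fail.

```python
from typing import FrozenSet, List, Optional, Union


class Slot:
    """Hypothetical encoding of a $-prefixed character class, e.g. $item-separator."""

    def __init__(self, chars: str) -> None:
        self.chars: FrozenSet[str] = frozenset(chars)


Production = List[Union[str, Slot]]

BREAK = "\n"
COMMENT_SEPARATOR = Slot(BREAK + " ")   # break | ' '
ITEM_SEPARATOR = Slot(BREAK + " \t")    # break | ' ' | '\t'
ITEM_INDICATOR: Production = ["-", ITEM_SEPARATOR]
COMMENT_INDICATOR: Production = [COMMENT_SEPARATOR, "#"]


class SlotMatcher:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0
        self.slot: Optional[str] = None  # last character consumed by a $class

    def match_slot(self, cls: Slot) -> bool:
        if self.slot is not None:
            # Reuse the slot instead of consuming input; succeed only if
            # the stored character also belongs to this class.
            return self.slot in cls.chars
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] in cls.chars:
            self.pos += 1
        if self.pos == start:
            return False  # a $class requires at least one character
        self.slot = self.text[self.pos - 1]  # most recent matching character
        return True

    def match_char(self, ch: str) -> bool:
        # A literal (non-class) character clears the slot.
        if self.pos < len(self.text) and self.text[self.pos] == ch:
            self.pos += 1
            self.slot = None
            return True
        return False

    def match(self, production: Production) -> bool:
        # Restore state on failure so "-\t#" leaves the '#' unconsumed.
        saved = (self.pos, self.slot)
        for item in production:
            ok = self.match_slot(item) if isinstance(item, Slot) else self.match_char(item)
            if not ok:
                self.pos, self.slot = saved
                return False
        return True


for text in ["- #", "-\n#", "- \n#", "-\n #", "-\t#"]:
    m = SlotMatcher(text)
    print(repr(text), m.match(ITEM_INDICATOR), m.match(COMMENT_INDICATOR))
```

With these definitions, the first four inputs match item-indicator followed by comment-indicator, while "-\t#" matches only item-indicator, because the slot holds '\t' and comment-separator does not contain it.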
I did a provisional analysis of the BNF productions, and I think
several of them can collapse if we do something non-standard like
this. Given the simple implementation, it seems like it would
add clarity. For example, s-separate-span-spaces gets smaller
(you don't need the alternation; the second part just becomes optional),
c-nb-throwaway-text can be more "informative" by explicitly showing
that it needs a "\n" or " " immediately before it, the other *-comment
variants could collapse, and so could some of the gymnastics in
later productions used to handle this very common "overlap" case.
P.S. This still doesn't address the ambiguity problem I posted a
bit earlier, but I'm not sure there is a good solution to it
besides adding another context parameter. Or... just cope. ;(
On Thu, Sep 23, 2004 at 11:45:19PM -0400, Clark C. Evans wrote:
| I've found a case that is a bit hard to code up in a parser
| because it is ambiguous:
| For the top-level case of l-explicit-document, there is an
| s-separate-spaces followed by ns-l-block-node, which leads to
| ns-l-flow-in-block. This production has both ns-flow-node and
| _another_ s-b-separated-comment. But, the ns-flow-node leads
| to an optional ns-plain-multi.
| So, we have an s-separate-spaces (where the existence of an
| s-b-separated-comment implies s-indentation), followed by an
| optional set of productions, followed by another
| s-b-separated-comment that does not imply the existence of the
| There is quite a bit of logical separation between these two
| productions, and it seems to require some 'backtracking' of sorts;
| or some nasty context flag. In any case, removing the ambiguity
| causes deviations from the spec, and I was trying to follow it quite
| religiously. Is there a way to remove this ambiguity?
| When trying to answer this question, I noticed that the pair
| s-b-separated-comment followed by l-comment-* is very common:
| - s-separate-span-spaces
| - ns-l-flow-in-block
| - s-l-block-seq-empty
| - s-l-block-implicit-value
| - l-document-suffix
| In fact, the only place l-comment-* doesn't follow after a
| s-b-separated-comment is in c-b-block-header; so I was curious
| if this merits a distinct production? And if so, perhaps the
| s-b-separate-spaces could be 'denormalized' in its various
| places in a way to remove the ambiguity?
| Yaml-core mailing list
Clark C. Evans Prometheus Research, LLC.
o office: +1.203.777.2550
~/ , mobile: +1.203.444.0557
(( Prometheus Research: Transforming Data Into Knowledge
\/ - Research Exchange Database
/\ - Survey & Assessment Technologies
` \ - Software Tools for Researchers