I'm pretty sure the railroad track diagrams in the JSON spec were produced by a tool. I downloaded it once to take a look, but it didn't support parameterized productions. For about my first year with YAML, the parameterized grammar was the hardest thing for me to understand; I don't think parameterized grammars had been invented when I went to school. Nonetheless, I can't believe it's harder for me to read the productions than it was for you to write them in the first place. I could never have done that: my hat's off to you.
I agree the number of productions involved is the biggest factor, and I see lots of opportunities for merging them. I hesitate to suggest them with the document in last call, but since you asked... The first clear candidate for elimination through merging that I saw was 158. Another idea would be pulling all the productions that start with an indicator character up a level or two, for example eliminating [109, 120, 138, 141] by pulling them into 157, and also combining [139, 140], [142, 143], etc. This would lead to more complicated productions, but structuring them with multiple lines and indentation would make them more readable (and more like JSON's railroad diagrams). It might mess with the bottom-up approach to the document's organization, but I keep reading it from back to front when thinking about implementation. Another example: merging the tag productions, 89-95 and 97-100, into one big production. The C-style escapes in 42-63 could be combined the same way. It's a matter of where you draw the line between syntax and semantics: merging these productions would let you express the semantics more compactly by doing it in a less formal way. We don't need symbolic names for individual characters.
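To make the escape idea a bit more concrete, here is a rough Python sketch of how a single merged escape production might look to an implementer: one lookup table instead of one named production per character. The names `SIMPLE_ESCAPES`, `HEX_ESCAPES`, and `decode_escape` are made up for illustration, and the escape set is the one I recall from YAML 1.2, so treat it as an assumption and not as a copy of the working draft.

```python
import string

# Hypothetical table: escape letter -> character it stands for.
# The set below is an assumption based on YAML 1.2; the draft may differ.
SIMPLE_ESCAPES = {
    "0": "\x00", "a": "\x07", "b": "\x08", "t": "\x09", "n": "\x0a",
    "v": "\x0b", "f": "\x0c", "r": "\x0d", "e": "\x1b", " ": " ",
    '"': '"', "/": "/", "\\": "\\",
    "N": "\x85", "_": "\xa0", "L": "\u2028", "P": "\u2029",
}

# Hypothetical table: escape letter -> number of hex digits that follow.
HEX_ESCAPES = {"x": 2, "u": 4, "U": 8}


def decode_escape(text, pos):
    """Decode the escape that starts at text[pos] (a backslash).

    Returns (decoded_character, position_after_escape).
    """
    if pos + 1 >= len(text) or text[pos] != "\\":
        raise ValueError("no escape sequence at position %d" % pos)
    code = text[pos + 1]
    if code in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[code], pos + 2
    if code in HEX_ESCAPES:
        width = HEX_ESCAPES[code]
        digits = text[pos + 2 : pos + 2 + width]
        if len(digits) != width or any(d not in string.hexdigits for d in digits):
            raise ValueError("bad hex escape at position %d" % pos)
        return chr(int(digits, 16)), pos + 2 + width
    raise ValueError("unknown escape \\%s at position %d" % (code, pos))
```

The point is only that the table carries the per-character semantics, so the grammar itself needs one production and no symbolic name per character.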
I find the multi-prefix Hungarian notation hard to keep in my head. The reason I suggested the i- names is that references to them become natural switch statements in a hand-coded scanner, but the merging above addresses that in a different way. Perhaps this is related to my preference for avoiding productions with whitespace/comments at the beginning too, but concentrating on that makes my head hurt. ;-) I'm a little over my head in saying this, but I _think_ the more you can make the spec read like an LL(k) grammar for a predictive recursive descent parser, the easier it will be for people to adopt YAML without needing parser generators that support both our choice of language and parameterized grammars. Does that make sense?
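Here is a rough Python sketch of what I mean by indicator-led productions turning into a switch in a hand-coded scanner. It is not taken from any real implementation: `INDICATOR_KINDS`, `SPACE_SENSITIVE`, `predict`, and the returned labels are all hypothetical names, and the rule for "-", "?" and ":" is a simplification of the block-context behavior.

```python
# Hypothetical mapping: indicator character -> construct it introduces.
INDICATOR_KINDS = {
    "[": "flow-sequence",
    "{": "flow-mapping",
    "&": "anchor",
    "*": "alias",
    "!": "tag",
    "|": "literal-scalar",
    ">": "folded-scalar",
    "'": "single-quoted",
    '"': "double-quoted",
    "#": "comment",
}

# These only act as indicators when followed by whitespace or end of input
# (a simplifying assumption; otherwise they can begin a plain scalar).
SPACE_SENSITIVE = {
    "-": "block-sequence-entry",
    "?": "explicit-key",
    ":": "mapping-value",
}


def predict(text, pos=0):
    """Pick the construct to parse using at most two characters of lookahead."""
    if pos >= len(text):
        return "end-of-input"
    ch = text[pos]
    if ch in INDICATOR_KINDS:
        return INDICATOR_KINDS[ch]
    if ch in SPACE_SENSITIVE:
        following = text[pos + 1 : pos + 2]
        if following == "" or following in " \t\n":
            return SPACE_SENSITIVE[ch]
    return "plain-scalar"
```

For example, `predict("- item")` selects the block sequence entry while `predict("-1")` falls through to a plain scalar. Each decision is made from a fixed, small amount of lookahead, which is the property a predictive recursive descent parser needs.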
From: Oren Ben-Kiki [mailto:oren@...]
Sent: Wednesday, May 21, 2008 5:41 PM
To: Burt Harris
Subject: Re: [Yaml-core] ns-l+block-node vs. s-l+block node in working draft
On Wed, May 21, 2008 at 2:42 PM, Burt Harris <Burt.Harris@...> wrote:
> In Working Draft 2008-05-11, production 208 references ns-l+block-node,
> which isn't defined. Instead production 197 defines s-l+block-node which I
> think might be what's intended. I'm not sure what the correct name is, but
> it looks like a typo in one place or the other.
Yes, it is a typo. Nice catch! I have a Perl script that verifies some
of the productions against typos, but it doesn't catch the case where
there are explicit non-default parameters. I'll fix this problem and
look into enhancing the script.
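For illustration only, a check of this general kind can be sketched in a few lines of Python. This is not the Perl script mentioned above, whose details are not shown here. It assumes a made-up input format (a dict from production name to body text), and `undefined_references` is a hypothetical helper name. Parenthesized argument lists are dropped before scanning, so a reference with explicit parameters is still checked.

```python
import re

# Candidate tokens: start with a letter, then letters, digits, "+" or "-".
NAME = re.compile(r"[a-z][a-z0-9+-]*")
# Argument lists such as "(n,c)" or "(-1,block-in)"; nesting is not handled.
ARGS = re.compile(r"\([^()]*\)")


def undefined_references(productions):
    """Return sorted (user, missing_name) pairs for names that are never defined.

    `productions` maps a production name to its body as plain text
    (an assumed format, chosen only for this sketch).
    """
    defined = set(productions)
    problems = set()
    for name, body in productions.items():
        without_args = ARGS.sub(" ", body)
        for ref in NAME.findall(without_args):
            # Only hyphenated tokens are treated as production names.
            if "-" in ref and ref not in defined:
                problems.add((name, ref))
    return sorted(problems)


# Toy grammar: only the two block-node names come from this thread;
# "example-user" and both bodies are invented.
toy = {
    "s-l+block-node": '"x"',
    "example-user": "ns-l+block-node(n,c)",
}
print(undefined_references(toy))
```

On the toy input, the misspelled reference is reported even though it carries explicit parameters.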
> Also regarding the production naming, I'd suggest it might make sense to
> distinguish between simple character-class productions (with a c- prefix),
> from the more complex indicator-based productions (with perhaps a i-
Hmmm... In principle you have the same issue with distinguishing
between single character and complex productions in general - e.g.,
b-char and b-break.
One way to do it would be to say x-x-name for complex productions,
instead of collapsing their name to x-name - so it would be b-char but
I'm not certain this is more readable though...
> I'm not sure what motivated the change from V1.1's c-l-block-sequence to
> V1.2's l-block-sequence; whatever the reason, I applaud it!
It just sort of worked out better that way. I tried hard to simplify
the productions... not that the result is "simple" by any means ;-)
> I think
> reducing the number of productions that start with a comment or non-indent
> whitespace will make it much easier to map the spec. to practical
> implementations. Perhaps some simpler Hungarian prefix for block
> constructs like s-l+block-indented would help. I'm not sure how to put this
> in parser terminology, but after having struggled with 1.1's productions for
> quite a while, it seems like a good idea.
I'm open to suggestions for a better notation!
BTW, I established the current one when I was struggling to keep track
of things myself - and I was _writing_ the productions at the time. I
appreciate how much harder it is to read them without having them "in
your head" first.
> It would be incredibly useful to
> be able to express YAML's syntax with the same sort of railroad track
> diagrams that JSON's is documented in.
I considered that, but there are several problems with it:
- It would make the spec even longer. A one-line production can easily
become a multi-line diagram.
- There's no program I know of that automatically generates these
diagrams. I'd have to write one myself or something. I suspect the
JSON ones were done by hand.
- I'm not certain this would improve readability by that much. I feel
the complexity doesn't stem from single complicated productions, but
from the sheer number of them. If there are specific productions that you feel
are too complex, I could split them up... I don't _think_ there are
productions I can easily merge, but if you can think of any, let me