From: Clark C. E. <cc...@cl...> - 2004-09-23 03:02:42
|
I've just spent way too much time fiddling with the comment production in the CVS snapshot specification [1], in particular s-separate-spaces(n,c). While I don't mind the s-b-separated-comment (which allows # comments to follow a particular item), it goes too far by including an l-comment. I can't think of a good rationale for allowing this, an example document:

    --- |2
    # l-comment
      first line of block
    ...

If you remove the 2 it becomes quite ugly to figure out what's being intended. Overall, this "extra" flexibility causes more pain than it is worth. In particular, this extra l-comment actually causes a bug in the case of l-text-comment(n+1) which seems to be allowed. Anyway, I don't think the fix to this is to parameterize l-comment with (<n).

So, I recommend removing l-comment* from s-separate-span-spaces. That said, I do agree with s-b-separated-comment followed immediately with the mandatory s-indentation and s-discarded-spaces, to allow indicators to be spaced out, each on its own line.

Alternatively, make l-comment always start in column 0. In fact, I think I like that option better; in this case l-comment is not so bad in the s-separate-spaces production.

Thoughts?

Clark

[1] http://yaml.org/spec/spec.html#s-separate-spaces(n,c)
|
From: Oren Ben-K. <or...@be...> - 2004-09-23 06:10:00
|
On Thursday 23 September 2004 05:02, Clark C. Evans wrote:
> ... I don't mind the s-b-separated-comment
> (which allows # comments to follow a particular item), it goes too far
> by including an l-comment.

Ur, s-b-separated-comment does NOT include a following l-comment. In every case where it makes sense for it to be followed by comment lines, this is noted explicitly using an l-comment production.

You can know this by looking at its name: "s-b-" means "spaces, stuff, line break". "*-b-" productions are always limited to one line. Only "l-" productions span several lines. Well, with one exception - s-separated-span-spaces pretends to be a single space but may in fact span lines. I really should rename it to s-l-s-separated-spaces.

> I can't think of a good rationale for
> allowing this, an example document:
>
> --- |2
> # l-comment
>   first line of block
> ...

This is an error. Check the productions :-)

Have fun,

    Oren Ben-Kiki
|
From: Clark C. E. <cc...@cl...> - 2004-09-23 13:19:38
|
On Thu, Sep 23, 2004 at 08:09:53AM +0200, Oren Ben-Kiki wrote:
| On Thursday 23 September 2004 05:02, Clark C. Evans wrote:
| > ... I don't mind the s-b-separated-comment
| > (which allows # comments to follow a particular item), it goes too far
| > by including an l-comment.
|
| Ur, s-b-separated-comment does NOT include a following l-comment. In
| every case it makes sense for it to be followed by comment lines, it is
| noted explicitly using an l-comment production.

I'm talking about the s-separate-spaces production, which does include an l-comment.

| > I can't think of a good rationale for
| > allowing this, an example document:
| >
| > --- |2
| > # l-comment
| >   first line of block
| > ...
|
| This is an error. Check the productions :-)

Ahh. Ok, but it would then allow,

    --- !tag
    # l-comment
    | 2 # s-b-separated-comment?
      content

Hmm. Ok. Thanks Oren. Did I say how wonderful and perfect the productions are? *grins*

Clark

    ns-l-block-node
    n-s-l-block-in-block
    c-ns-properties
    c-separate-spaces
    s-b-separated-comment
    l-comment # right here
    s-indentation(2)

ahh....
|
From: Clark C. E. <cc...@cl...> - 2004-09-24 03:45:25
|
Oren,

I've found a case that is a bit hard to code-up in a parser because it is ambiguous:

For the top-level case of l-explicit-document, there is a s-separate-spaces followed by ns-l-block-node, which leads to ns-l-flow-in-block. This production has both ns-flow-node and _another_ s-b-separated-comment. But, the ns-flow-node leads to an optional ns-plain-multi.

So, we have a s-separate-spaces (where the existence of a s-b-separated-comment implies s-indentation), followed by an optional set of productions, followed by another s-b-separated-comment that does not imply the existence of the s-indentation.

There is quite a bit of logical separation between these two productions, and it seems to require some 'backtracking' of sorts; or some nasty context flag. In any case, removing the ambiguity causes deviations from the spec, and I was trying to follow it quite religiously. Is there a way to remove this ambiguity?

...

When trying to answer this question, I noticed that the pair s-b-separated-comment followed by l-comment-* is very common:

  - s-separate-span-spaces
  - ns-l-flow-in-block
  - s-l-block-seq-empty
  - s-l-block-implicit-value
  - l-document-suffix

In fact, the only place l-comment-* doesn't follow after a s-b-separated-comment is in c-b-block-header; so I was curious if this merits a distinct production? And if so, perhaps the s-b-separate-spaces could be 'denormalized' in its various places in a way to remove the ambiguity?

Clark
|
From: Oren Ben-K. <or...@be...> - 2004-09-25 18:55:41
|
On Friday 24 September 2004 05:45, Clark C. Evans wrote:
> I've found a case that is a bit hard to code-up in a parser
> because it is ambiguous:
>
> For the top-level case of l-explicit-document, there is a
> s-separate-spaces followed by ns-l-block-node, which leads to
> ns-l-flow-in-block. This production has both ns-flow-node and
> _another_ s-b-separated-comment.

When the node is an empty plain scalar, right.

> But, the ns-flow-node leads to an optional ns-plain-multi.
> So, we have a s-separate-spaces (where the existence of a
> s-b-separated-comment implies s-indentation), followed by an
> optional set of productions, followed by another
> s-b-separated-comment that does not imply the existence of the
> s-indentation.

That's a bug; the empty plain scalar is a PITA to get right. It's one of the things I never finalized to my satisfaction. There are two strategies to handle it: make plain-scalar match "nothing", and handle the ambiguities (as you point out, not very nice); or make plain-scalar always match _something_, and allow for all the productions invoking it to be optional all the way up to the top production (in each case, state that a missing <whatever> means an empty plain scalar node).

I was in the process of migrating to the second option but never finalized it to my satisfaction. When I finish doing that the ambiguities will be gone (but we might end up with a few more complex productions).

> When trying to answer this question, I noticed that the pair
> s-b-separated-comment followed by l-comment-* is very common:
>
> - s-separate-span-spaces
> - ns-l-flow-in-block
> - s-l-block-seq-empty
> - s-l-block-implicit-value
> - l-document-suffix
>
> In fact, the only place l-comment-* doesn't follow after a
> s-b-separated-comment is in c-b-block-header; so I was curious
> if this merits a distinct production?

I suppose it might.

Have fun,

    Oren Ben-Kiki
|
From: Clark C. E. <cc...@cl...> - 2004-09-24 06:06:15
|
Ok. I've done a bit more thinking on the productions. So far, our YAML BNF has two somewhat-unusual features: (a) indentation level, and (b) context (flow,block). Another source of complication to the productions is something like:

    ---
    - #comment
    ...

The issue is that "- " and " #" are both tokens, yet, "- #" is also those same two tokens. So, productions which use "- " need a way to express that one or more spaces immediately following are required, and " #" needs a way to express that one or more spaces immediately prior is needed. Let's introduce a new production symbol, $ , which means one or more spaces, but with the unique property that $ $ in the productions (as denormalized) _still_ means one or more spaces.

    item-indicator :== '-' $
    comment-indicator :== $ '#'

The implementation of $ is straightforward: the parser keeps a single flag. When the current production is $, one or more subsequent spaces are matched; a subsequent $ also matches if the flag is set (no matter how many spaces had occurred). If a non-space character is then needed by a production, the flag is reset, and the whole process continues.

The more general case is that $ could be used in front of any 'character-class' (ie, an alternation of characters). The flag behavior above then stores the most recent character matching a class in the 'slot', and this then matches any corresponding use of characters...

    break :== '\n'
    comment-separator :== break | ' '
    item-separator :== break | ' ' | '\t'
    comment-indicator :== $comment-separator '#'
    item-indicator :== '-' $item-separator

In this way, "- #" would then match item-indicator followed by a comment-indicator; as would "-\n#" or "- \n#" or "-\n #" etc. However, since '\t' is not in the comment-separator (for illustration), "-\t#" would only match item-indicator and then fail to match comment-indicator.

I did a provisional analysis of the BNF productions, and I think several of them can collapse if we do something non-standard like this.
Given the simple implementation, it seems like it would add clarity. For example, s-separate-span-spaces gets smaller (you don't need the alternation, the second part just becomes optional), c-nb-throwaway-text can be more "informative" by explicitly showing that it needs a "\n" or " " immediately before it, and the other -comment variants could collapse, as well as some of the gymnastics in the later productions used to handle this very common "overlap" case.

Best,

Clark

P.S. This still doesn't address the ambiguity problem I posted a bit earlier, but I'm not sure there is a good solution to this besides adding another context-parameter? Or... just cope. ;(

--
Clark C. Evans
Prometheus Research, LLC.
http://www.prometheusresearch.com/
office: +1.203.777.2550
mobile: +1.203.444.0557
Prometheus Research: Transforming Data Into Knowledge
  - Research Exchange Database
  - Survey & Assessment Technologies
  - Software Tools for Researchers
|
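The `$` mechanism described in this message can be sketched in a few lines of Python. This is only an illustration of the "slot" idea under stated assumptions: the class `StickyMatcher` and the helper names are hypothetical, the matcher does no backtracking, and the two character classes are the illustrative ones from the message, not anything taken from the YAML specification.

```python
# Character classes from the message (illustrative only).
COMMENT_SEPARATOR = "\n "
ITEM_SEPARATOR = "\n \t"


class StickyMatcher:
    """Hypothetical matcher with a one-character 'slot' for the $ symbol."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.slot = None  # most recent separator character consumed, if any

    def literal(self, s):
        """Match a non-separator literal; this resets the slot."""
        if self.text.startswith(s, self.pos):
            self.pos += len(s)
            self.slot = None
            return True
        return False

    def sticky(self, char_class):
        """The $class symbol: one or more characters of the class, where a
        separator already consumed by an earlier $ still counts."""
        matched = self.slot is not None and self.slot in char_class
        while self.pos < len(self.text) and self.text[self.pos] in char_class:
            self.slot = self.text[self.pos]
            self.pos += 1
            matched = True
        return matched


def item_indicator(m):
    # item-indicator :== '-' $item-separator
    return m.literal("-") and m.sticky(ITEM_SEPARATOR)


def comment_indicator(m):
    # comment-indicator :== $comment-separator '#'
    return m.sticky(COMMENT_SEPARATOR) and m.literal("#")


def matches_item_then_comment(text):
    m = StickyMatcher(text)
    return item_indicator(m) and comment_indicator(m)
```

With these definitions, `matches_item_then_comment("- #")` succeeds even though the single space is "used" by both productions, while `matches_item_then_comment("-\t#")` fails at the comment indicator because the tab left in the slot is not in the comment-separator class.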
From: David H. <dav...@bl...> - 2004-09-24 07:11:35
|
Clark C. Evans wrote:
> Ok. I've done a bit more thinking on the productions. So far, our
> YAML BNF has two somewhat-unusual features: (a) indentation level,
> and (b) context (flow,block). Another source of complication to
> the productions is something like:
>
> ---
> - #comment
> ...
>
> The issue is that "- " and " #" are both tokens, yet, "- #" is also
> those same two tokens. So, productions which use "- " need a way
> to express that one or more spaces immediately following are
> required, and " #" needs a way to express that one or more spaces
> immediately prior is needed. Let's introduce a new production
> symbol, $ , which means one or more spaces, but with the unique
> property that $ $ in the productions (as denormalized) _still_ mean
> one or more spaces.
>
> item-indicator :== '-' $
> comment-indicator :== $ '#'

Parsing expression grammars have a simple way to express this:

    item-indicator ::= '-' &spaces
    comment-indicator ::= spaces '#'
    spaces ::= ' '*

The notation &spaces means "check that 'spaces' would match here, but do not consume it" (http://en.wikipedia.org/wiki/Parsing_expression_grammar).

I haven't looked at the grammar in detail, though; it may be that there is another way of doing this in plain BNF.

--
David Hopwood <dav...@bl...>
|
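The and-predicate can be sketched with a handful of parser combinators. This is a minimal illustration, not a full PEG engine: the helpers `lit`, `seq`, `star`, `plus` and `and_pred` are hypothetical names, and each parser is simply a function from `(text, pos)` to a new position or `None`. The sketch assumes `spaces` means one or more spaces, matching the "one or more spaces" requirement in the quoted text; the last definition shows that a zero-or-more `spaces` would make the predicate succeed vacuously.

```python
def lit(s):
    """Match the literal string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*parsers):
    """Match each parser in order; fail if any of them fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def star(p):
    """Greedy zero-or-more repetition."""
    def parse(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse


def plus(p):
    """Greedy one-or-more repetition."""
    return seq(p, star(p))


def and_pred(p):
    """The & operator: succeed if p would match here, consuming nothing."""
    def parse(text, pos):
        return pos if p(text, pos) is not None else None
    return parse


spaces = plus(lit(" "))                          # assumed: one or more
item_indicator = seq(lit("-"), and_pred(spaces))
comment_indicator = seq(spaces, lit("#"))
item_then_comment = seq(item_indicator, comment_indicator)

# With zero-or-more spaces the lookahead can never fail.
vacuous_item_indicator = seq(lit("-"), and_pred(star(lit(" "))))
```

Here `item_indicator("- #", 0)` stops right after the dash, leaving the space for `comment_indicator` to consume, which is the overlap the `$` proposal was trying to express.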
From: Clark C. E. <cc...@cl...> - 2004-09-24 14:05:01
|
On Fri, Sep 24, 2004 at 08:11:25AM +0100, David Hopwood wrote:
| >The issue is that "- " and " #" are both tokens, yet, "- #" is also
| >those same two tokens. So, productions which use "- " need a way
| >to express that one or more spaces immediately following are
| >required, and " #" needs a way to express that one or more spaces
| >immediately prior is needed.
|
| Parsing expression grammars have a simple way to express this:
|
| item-indicator ::= '-' &spaces
| comment-indicator ::= spaces '#'
| spaces ::= ' '*

This is nice. This could potentially clean things up, or at least make the production's intent more clear.

...

The other problem I'm having (implementing) is that the specification has a production of the form:

    A (B C)? D? B?

So, I get A, no problem. But when I move onto B, I don't know which one it is. However, in reality, if one has C then D isn't optional. So, perhaps this is just a bug in the production...

    A ((B C D) | D?) B?

That's icky, but I suppose it's what's going on. The production D is ns-plain. Thoughts?

Clark
|
From: Clark C. E. <cc...@cl...> - 2004-09-24 15:56:35
|
Just musing. Consider ` used to mark a space,

    ---
    `-
    ``
    ...

In this case, the production immediately after the '-' is n-l-block-seq-node, which has a s-separate-spaces, followed by (an ultimately optional) ns-flow-node, followed by a s-b-separated-comment.

So. s-separate-spaces matches since s-b-separated-comment in this case is LF, and then the s-indentation(n) matches. In this case the ns-flow-node is optional (blank plain-scalar reported), followed by a s-b-separated-comment matching the next LF. So, this passes. Let's try...

    ---
    `-
    ...

In this case, s-b-separate-spaces doesn't match: there are not any discarded spaces and the indent(1) isn't there. So, it would appear to me that n-l-block-seq-node does not match.

Anyway, I suppose this works at a higher level, but it is clearly ambiguous.

Clark
|
From: David H. <dav...@bl...> - 2004-09-24 20:33:28
|
Clark C. Evans wrote:
> On Fri, Sep 24, 2004 at 08:11:25AM +0100, David Hopwood wrote:
> | >The issue is that "- " and " #" are both tokens, yet, "- #" is also
> | >those same two tokens. So, productions which use "- " need a way
> | >to express that one or more spaces immediately following are
> | >required, and " #" needs a way to express that one or more spaces
> | >immediately prior is needed.
> |
> | Parsing expression grammars have a simple way to express this:
> |
> | item-indicator ::= '-' &spaces
> | comment-indicator ::= spaces '#'
> | spaces ::= ' '*

Should have been:

    spaces ::= ' '+

> This is nice. This could potentially clean things up, or at
> least make the production's intent more clear.
>
> The other problem I'm having (implementing) is that the
> specification has a production of the form:
>
> A (B C)? D? B?
>
> So, I get A, no problem. But when I move onto B, I don't
> know which one it is. However, in reality, if one has C
> then D isn't optional.

Do you mean:

    A (B C D)? B?

In BNF, this is ambiguous. In a PEG it is not, because the ? operator is greedy; it is equivalent to:

    A ((B C D) / ()) (B / ())

where the left-hand side of / is always matched before the right if possible.

In general PEGs are really nice for resolving ambiguities -- they can always be parsed deterministically.

--
David Hopwood <dav...@bl...>
|
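How the greedy `?` resolves `A (B C D)? B?` can be seen in a small sketch. The combinators `lit`, `seq` and `opt` are hypothetical helpers, and the single letters `a`, `b`, `c`, `d` stand in for the productions A, B, C, D purely for illustration. The point is that a failed `(B C D)` attempt leaves the position untouched, so the trailing `B?` gets its chance on the same input.

```python
def lit(s):
    """Match the literal string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*parsers):
    """Match each parser in order; on failure nothing is consumed."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def opt(p):
    """PEG '?': equivalent to (p / ()); try p first, else match nothing."""
    def parse(text, pos):
        nxt = p(text, pos)
        return pos if nxt is None else nxt
    return parse


A, B, C, D = lit("a"), lit("b"), lit("c"), lit("d")

# A (B C D)? B?
grammar = seq(A, opt(seq(B, C, D)), opt(B))


def parse_prefix(text):
    """Return how many characters the grammar consumes, or None."""
    return grammar(text, 0)


def matches(text):
    """True only if the grammar consumes the entire input."""
    return parse_prefix(text) == len(text)
```

For the input `"ab"` the first optional group tries `b`, fails on the missing `c`, and yields nothing; the second `B?` then consumes the `b`. There is never more than one parse, which is the determinism referred to above.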
From: Oren Ben-K. <or...@be...> - 2004-09-25 19:10:01
|
On Friday 24 September 2004 09:11, David Hopwood wrote:
> Parsing expression grammars have a simple way to express this:
>
> item-indicator ::= '-' &spaces
> comment-indicator ::= spaces '#'
> spaces ::= ' '*
>
> The notation &spaces means "check that 'spaces' would match here, but
> do not consume it"
> (http://en.wikipedia.org/wiki/Parsing_expression_grammar).

Yes, that's a known trick, and it matches to some extent what a parser would do in practice. In general it is always possible to add such &-expressions to a BNF grammar and remove all ambiguities (provided the grammar isn't inherently ambiguous). Then again, it doesn't remove lookahead, it just makes it explicit.

Whether this would actually simplify the productions... Hmmm. I admit I haven't given it much thought. I kind of doubt that it will, but I'll keep it in mind.

> I haven't looked at the grammar in detail, though; it may be that
> there is another way of doing this in plain BNF.

Not that I know of. There's an inherent tension between the grammar's "semantics" - 1-1 matching between (some) productions and the entities encoded in the syntax - and the "operations" - productions that make a decision based on the minimal number of additional unparsed characters. In the extreme you end up with a BNF consisting only of rules of the form:

    production ::= ('a'|'b'|...) if-letter-production
                 | ('0'|'1'|...) if-digit-production
                 | &lookahead if-lookahead-production
                 | ...

Such a syntax is trivial to implement, but it is a pain to understand (it's like the "machine code" of a C program). As it is possible to automatically convert any BNF to such an extreme "operational" form, I put my emphasis on higher-level productions in the spec itself for the benefit of the readers.

Have fun,

    Oren Ben-Kiki
|
From: Clark C. E. <cc...@cl...> - 2004-09-24 21:41:31
|
On Fri, Sep 24, 2004 at 09:33:20PM +0100, David Hopwood wrote:
| >The other problem I'm having (implementing) is that the
| >specification has a production of the form:
| >
| > A (B C)? D? B?
| >
| >So, I get A, no problem. But when I move onto B, I don't
| >know which one it is. However, in reality, if one has C
| >then D isn't optional.
|
| Do you mean:
|
| A (B C D)? B?

    PN (ns-plain)
    DS (discarded-spaces)
    LC (l-comment)
    In (s-indentation(n) n-discarded-spaces?)
    SC (separated-comment) ::= DS? ...
    SS (separate-spaces) ::= DS | ( SC LC* In )

    /* particular reduction paths */
    FN (flow-node) ::> PN?
    BN (block-node) ::> FN SC LC*

    /* demonstration of reduction when PN is missing... */
    START = SS BN
          = SS FN SC LC*
          = (DS | SC LC* In) PN? SC LC*
          = (DS PN? SC LC*) | (SC LC* In PN? SC LC* )
          = (DS SC LC*) | (SC LC* In SC LC*)

Anyway... since SC, LC* and DS are all slight-variants of each other, these two cases are just, well, hard to grok. It seems that the SC, SS LC* and other productions just seem to be doing 'ignorable' processing.

I'm curious if one could just wrap these into a single, larger and denormalized mega-production. This would, at any rate, map better to an implementation (I doubt one would want to implement each of these small productions with a distinct function).

I was musing about a production mechanism that was recursive, that is,

    ignorable(n, subprod) ::= ( (DS subprod(n)) | (SC LC* In subprod(n)) )? SC LC*

then this example is

    ignorable(n, ns-plain)

Thoughts?

Clark
|
From: David H. <dav...@bl...> - 2004-09-25 07:18:09
|
Clark C. Evans wrote:
> PN (ns-plain)
> DS (discarded-spaces)
> LC (l-comment)
> In (s-indentation(n) n-discarded-spaces?)
> SC (separated-comment) ::= DS? ...
> SS (separate-spaces) ::= DS | ( SC LC* In )
>
> /* particular reduction paths */
> FN (flow-node) ::> PN?
> BN (block-node) ::> FN SC LC*
>
> /* demonstration of reduction when PN is missing... */
> START = SS BN
>       = SS FN SC LC*
>       = (DS | SC LC* In) PN? SC LC*
>       = (DS PN? SC LC*) | (SC LC* In PN? SC LC* )
>       = (DS SC LC*) | (SC LC* In SC LC*)

Now you've gone and given me a splitting headache...

> Anyway... since SC, LC* and DS are all slight-variants of each
> other, these two cases are just, well, hard to grok. It seems that
> the SC, SS LC* and other productions just seem to be doing
> 'ignorable' processing.

Agree.

Taking a step back, though, I think the reason why this grammar is so hard to grok is that it's trying to handle indentation, lexical structure, and syntactic structure all at the same time.

What I would do would be to split it into an indentation/lexing stage, and a syntactic stage.

The indentation/lexing stage splits the file into lines, strips out comments, and determines whether each line ends in a block indicator; it also replaces the initial whitespace with artificial tokens IN and OUT when the indentation level changes. This makes the syntactic stage much easier, it can even be context-free, I think. I'll work on this a bit more and then post a more detailed description.

--
David Hopwood <dav...@bl...>
|
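The indentation half of such a first stage might look roughly like the sketch below. It is an assumption-laden illustration, not the design promised above: the function name `indent_tokens` and the `("LINE", text)` token shape are hypothetical, only comment-only and blank lines are dropped, and quoting, block scalars and trailing block indicators are ignored entirely. It shows only how leading spaces can be replaced by IN and OUT tokens using a stack of indentation levels.

```python
def indent_tokens(source):
    """Turn leading spaces into IN/OUT tokens (hypothetical token names).

    Returns a list mixing the strings "IN" and "OUT" with ("LINE", text)
    tuples. Quoting and block scalars are deliberately not handled.
    """
    tokens = []
    levels = [0]  # stack of currently open indentation levels
    for raw in source.splitlines():
        content = raw.strip()
        if not content or content.startswith("#"):
            continue  # blank line or comment-only line
        indent = len(raw) - len(raw.lstrip(" "))
        if indent > levels[-1]:
            levels.append(indent)
            tokens.append("IN")
        while indent < levels[-1]:
            levels.pop()
            tokens.append("OUT")
        if indent != levels[-1]:
            raise ValueError("inconsistent indentation: %r" % raw)
        tokens.append(("LINE", content))
    while len(levels) > 1:  # close whatever is still open at end of input
        levels.pop()
        tokens.append("OUT")
    return tokens
```

For example, the lines `a:`, `  b: 1`, `e: 3` would come out as `("LINE", "a:")`, `"IN"`, `("LINE", "b: 1")`, `"OUT"`, `("LINE", "e: 3")`, so a later syntactic stage never has to count spaces itself.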
From: Oren Ben-K. <or...@be...> - 2004-09-25 19:18:29
|
On Saturday 25 September 2004 09:18, David Hopwood wrote:
> > /* demonstration of reduction when PN is missing... */
> > START = SS BN
> >       = SS FN SC LC*
> >       = (DS | SC LC* In) PN? SC LC*
> >       = (DS PN? SC LC*) | (SC LC* In PN? SC LC* )
> >       = (DS SC LC*) | (SC LC* In SC LC*)
>
> Now you've gone and given me a splitting headache...

It's all the fault of the damn empty plain scalar. It never fails to cause a mess of things.

> > Anyway... since SC, LC* and DS are all slight-variants of each
> > other, these two cases are just, well, hard to grok. It seems that
> > the SC, SS LC* and other productions just seem to be doing
> > 'ignorable' processing.
>
> Agree.

Yes. I'll make an effort to unify it to a "mega-production" like Clark suggested. s-separate-spaces was a step in this direction.

> Taking a step back, though, I think the reason why this grammar is so
> hard to grok is that it's trying to handle indentation, lexical
> structure, and syntactic structure all at the same time.
>
> What I would do would be to split it into an indentation/lexing
> stage, and a syntactic stage.
>
> The indentation/lexing stage splits the file into lines, strips out
> comments, and determines whether each line ends in a block indicator;
> it also replaces the initial whitespace with artificial tokens IN and
> OUT when the indentation level changes. This makes the syntactic
> stage much easier, it can even be context-free, I think. I'll work on
> this a bit more and then post a more detailed description.

It sounds nice in theory. In practice, to tell whether a "|" at the end of the line denotes a block or not, you have to do the full parsing of the line (is this | inside an unterminated quoted string?). YAML just isn't geared towards the two-phase parsing...

    ---
    foo : { !ab"cd bar: " | # Can't simply count "
    Not a block >
    Not folded
    !not-a-tag
    Tricky..." }
    ...

Have fun,

    Oren Ben-Kiki
|
From: David H. <dav...@bl...> - 2004-09-26 00:37:50
|
Oren Ben-Kiki wrote:
> On Saturday 25 September 2004 09:18, David Hopwood wrote:
>>Taking a step back, though, I think the reason why this grammar is so
>>hard to grok is that it's trying to handle indentation, lexical
>>structure, and syntactic structure all at the same time.
>>
>>What I would do would be to split it into an indentation/lexing
>>stage, and a syntactic stage.
>>
>>The indentation/lexing stage splits the file into lines, strips out
>>comments, and determines whether each line ends in a block indicator;
>>it also replaces the initial whitespace with artificial tokens IN and
>>OUT when the indentation level changes. This makes the syntactic
>>stage much easier, it can even be context-free, I think. I'll work on
>>this a bit more and then post a more detailed description.
>
> It sounds nice in theory. In practice, to tell whether a "|" at the end
> of the line denotes a block or not, you have to do the full parsing of
> the line (is this | inside an unterminated quoted string?).

Nope -- strings can be handled in the lexing stage without parsing anything else.

> YAML just isn't geared towards the two-phase parsing...
>
> ---
> foo : { !ab"cd bar: " | # Can't simply count "
> Not a block >
> Not folded
> !not-a-tag
> Tricky..." }
> ...

Thanks for the example, it confirms I'm on the right track.

The output of lexing for this document would be:

    "---" lf
    "foo : { !ab" DSTR "cd bar: " ENDD " |" TEXT "Not a block >" lf
    "Not folded" lf
    "!not a tag" lf
    "Tricky..." #x21 " }" ENDT
    "..."

Then the syntactic stage would reject the document because the tag specifier "!ab" cannot be continued by DSTR. (This is correct because double quotes are not URI characters.)

--
David Hopwood <dav...@bl...>
|
From: David H. <dav...@bl...> - 2004-09-26 01:01:02
|
I wrote:
> Oren Ben-Kiki wrote:
>> YAML just isn't geared towards the two-phase parsing...
>>
>> ---
>> foo : { !ab"cd bar: " | # Can't simply count "
>> Not a block >
>> Not folded
>> !not-a-tag
>> Tricky..." }
>> ...
>
> Thanks for the example, it confirms I'm on the right track.
>
> The output of lexing for this document would be:
>
> "---" lf
> "foo : { !ab" DSTR "cd bar: " ENDD " |" TEXT "Not a block >" lf
> "Not folded" lf
> "!not a tag" lf
> "Tricky..." #x21 " }" ENDT
              ^^^^ should be #x22, i.e. double quote
> "..."

--
David Hopwood <dav...@bl...>
|
From: Oren Ben-K. <or...@be...> - 2004-09-25 18:45:23
|
I have dropped the ball on the productions, but the next two weekends are going to be "long" (Holiday season) so I'll have some time to do a thorough job on the syntax section. In the meantime:

On Thursday 23 September 2004 21:53, Clark C. Evans wrote:
> The ambiguous cases for these productions need quite a bit of help,
> since they cause huge difficulty while providing an implementation
> (ie, they cause backtracking). I'd rather have more productions,
> or productions /w more context arguments.

This doesn't always help.

> A particular example
> is s-separate-span-spaces production. This item has a mandatory
> s-indentation, but before it gets to this indentation it has
> several line comments. It's no way to know if the scalar has
> begun at this point... so it is ambiguous with the comments
> that could occur after a scalar. Any ideas how to fix?

Do you mean something like (where ` stands for a space):

    ---
    -`!tag
    ```````# comment
    `````
    ```scalar`starts`here
    ...

Sure, when you parse the lines following the !tag, you can't tell if a scalar has started until you see the first non-space character. But that's inherent; reformulating the productions isn't going to be of any help. Also, I don't see the problem; you know in advance these spaces can be converted to a single space.

A similar problem arises when parsing inside a plain scalar:

    ---
    -`plain`scalar
    ``````
    ````
    `````# comment
    ...

When parsing the spaces following the 'plain scalar' line, you can't tell if the scalar is done until you get to a less indented line, or you see a comment. Again, no re-formulating of the productions will help you out here. But again you know *in advance* that all these spaces can be converted to a single space.

Does that help?

Have fun,

    Oren Ben-Kiki
|