Re[1]: [CEDET-devel] Incremental parser behavior
From: Eric M. L. <er...@si...> - 2002-08-17 01:04:01
Hi David,

I certainly see how the problem you are describing could be a
problem. Unfortunately I fear the fixes you propose in 1 and 2 are
problematic.

The reparse-symbol, as used in semantic 1.4, is for tags found inside
other tokens. I expanded on the original use by adding smarts for
splicing new tags in and out of the master cache, within the
child-list of some parent token.

To remove the incremental parser for child tokens would make the
incremental parser nearly useless for Java, where 90% of the file is
taken up by one class.

More below.

>>> David Ponce <da...@dp...> seems to think that:
>Hi Eric,
>
>While I was hacking WY grammars, I got some problems with the
>incremental parser when adding new tokens between existing ones.
>
>That is when `semantic-edits-change-between-tokens' returns
>something. Here is a summary:
>
>1. The `reparse-symbol' property can't be retrieved. After
>   `semantic-edits-change-between-tokens' returned a value the
>   variable `tokens' is set to nil. So the following statement that
>   retrieves the reparse-symbol always fails:
>
>     (setq reparse-symbol (semantic-token-get
>                           (car tokens) 'reparse-symbol))
>
>   As, in that case, "The CAR of cache-list is the token just before
>   our change, but wasn't modified.", a solution could be to first try
>   to get reparse-symbol from tokens, then from cache-list, like this:
>
>     (setq reparse-symbol
>           (semantic-token-get (car (or tokens cache-list))
>                               'reparse-symbol))
>
>   I tried the above change and it seems to work better.

I think having TOKENS be null at this point is a curious problem
caused when editing white-space. At the same time, if that
white-space is inside some parent, we need to know at what symbol to
start parsing again.

Fortunately, I think your insight in using the cache list is a good
idea. Those tokens belong to the same lineage as the white space
edited.

If cache-list is nil, we should probably then go and just mark the
parent as the dirty item, or force a full reparse.
(Not too expensive if there are no tokens in the file. ;)

>2. The inserted text is parsed using the grammar rule pointed to by
>   the reparse-symbol found in the token just before the new text.
>   Sometimes, that rule does not preserve the right semantics of the
>   inserted text! Here is an example with WY grammar:
>
>   1. Initial state. The following text results in one 'nonterminal
>      token: any_value, that contains five nonterminal children:
>      any_symbol, STRING, NUMBER, PREFIX-EXP, PAREN_BLOCK, as 'rule
>      tokens.
>
>      any_value:
>          any_symbol
>        | STRING
>        | NUMBER
>        | PREFIX-EXP
>        | PAREN_BLOCK
>        ;
>
>   2. Now I insert a new TEST rule between STRING and NUMBER, like
>      this (the change is enclosed in [...]):
>
>      any_value:
>          any_symbol
>        | STRING[
>        | TEST]
>        | NUMBER
>        | PREFIX-EXP
>        | PAREN_BLOCK
>        ;
>
>      In that case `semantic-edits-change-between-tokens' returns the
>      'rule tokens from STRING to PAREN_BLOCK. The reparse-symbol
>      `rule' is correctly retrieved from the STRING token (car
>      cache-list). The parser successfully re-parses the inserted
>      text "\n | TEST" using the `rule' semantics. But it returns a
>      false result, that is an 'empty rule token for the first `|',
>      followed by a 'rule token for TEST :-(
>
>   In fact without other context the parser can't determine if a rule
>   like the above is actually an empty rule plus a normal rule, or
>   just the latter. A short example:
>
>     any_value:
>       | TEST
>
>     any_value:
>         STRING
>       | TEST
>
>   In the first case "| TEST" means "empty or TEST", in the second one
>   it means "or TEST". In other words, the meaning of the new text
>   depends on its position inside the nonterminal definition (inside
>   the parent 'nonterminal token)!
>
>   That demonstrates a conflict between the semantics of
>   reparse-symbol and the way it is currently used by the incremental
>   parser to parse change between tokens.
>
>   IMO, the reparse-symbol rule is safe to use only when re-parsing a
>   whole token, or a new token out of context, that is inserted
>   between existing tokens at top level.
>
>   When new text is inserted between existing tokens which are part of
>   a parent token, the only safe way to re-parse things is to re-parse
>   the whole parent token. It will ensure that the semantics of
>   inserted text will be correct.

[ ... ]

I think an important difference between your analysis and mine is
that I think it is ok to reparse tokens that have a parent, but only
if those tokens were generated using
`semantic-repeat-parse-whole-stream', as opposed to a recursive rule
in a wisent grammar.

I think you even cover in your wisent manual the benefits of using
wisent style repetitive rules for .wy rules as opposed to the semantic
version. A side effect seems to be that it breaks the incremental
parser.

If you can identify this specific scenario in your patch, I think it
would be ok.

Have fun
Eric

--
Eric Ludlam: za...@gn..., er...@si...
Home: http://www.ludlam.net
Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net
GNU: www.gnu.org
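
The fallback order discussed under point 1 (look in TOKENS, then in
CACHE-LIST, otherwise mark the parent dirty or force a full reparse)
could be sketched roughly as follows. This is only an illustration
under stated assumptions, not code from semantic: the function name
`my-pick-reparse-action', the PARENT argument, the GET-PROP argument
and the returned (ACTION . VALUE) convention are all hypothetical.
With semantic 1.4, GET-PROP would presumably be `semantic-token-get'
as quoted above.

```elisp
;; Hypothetical sketch; none of these names are part of semantic.
(defun my-pick-reparse-action (tokens cache-list parent get-prop)
  "Decide how to reparse a change made between tokens.
TOKENS and CACHE-LIST are the lists discussed in the message.
PARENT is the enclosing token, or nil at top level.
GET-PROP is called as (GET-PROP TOKEN \\='reparse-symbol).
Return (reparse . SYMBOL), (dirty-parent . PARENT) or (full-reparse)."
  (let* ((neighbor (car (or tokens cache-list)))
         (symbol (and neighbor
                      (funcall get-prop neighbor 'reparse-symbol))))
    (cond
     ;; A neighboring token tells us which rule to restart from.
     (symbol (cons 'reparse symbol))
     ;; No usable neighbor: fall back to the enclosing token.
     (parent (cons 'dirty-parent parent))
     ;; Nothing to go on at all: reparse the whole buffer.
     (t (list 'full-reparse)))))
```

For a quick check without semantic loaded, plain property lists can
stand in for tokens and `plist-get' for the accessor, for example
(my-pick-reparse-action nil '((reparse-symbol rule)) nil #'plist-get).
Note that this sketch also falls back when a neighbor exists but
carries no reparse-symbol, which goes slightly beyond the case
described in the message.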