>>> "David PONCE" <David.Ponce@...> seems to think that:
>Hi Eric,
>
[ ... ]
>> * Why does semantic's grammar need such a complicated number regexp?
>
>No more used now :-)
Good.
>> * Why does it need its own symbol matcher? Why not use the
>> symbol-keyword matcher?
>
>First, it lets me customize the regexp to suit grammar lexer needs (I
>just added the '.' in the regexp to fix your problem below). Second,
>I prefer to use uppercase for nonterminal symbols ;-)
[ ... ]
>* semantic-grammar.el:
>
>(semantic-grammar-lex-symbol): Include '.' in symbol.
Why not make . a symbol constituent in the syntax table, or in the
syntax table modifiers for the .wy grammar?
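Something along these lines is what I have in mind. It is only a sketch against a scratch syntax table, not the real grammar mode table, and the regexp is my stand-in for a symbol matcher rather than the one actually used by the lexer:

```elisp
;; Assumption: a throwaway table derived from the standard syntax table.
(let ((table (make-syntax-table)))
  ;; Give `.' symbol-constituent syntax (class "_").
  (modify-syntax-entry ?. "_" table)
  (with-temp-buffer
    (set-syntax-table table)
    (insert ".hush_warning:")
    (goto-char (point-min))
    ;; Match a run of word or symbol constituents at point.
    (when (looking-at "\\(\\sw\\|\\s_\\)+")
      (match-string 0))))
;; => ".hush_warning"
```

With `.' handled by the syntax table, a plain word-or-symbol regexp picks up the leading dot, and the `:' is still left over as punctuation.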
Also, your regex is the same as `semantic-lex-symbol-or-keyword'. If
you have preferences with case sensitivity, we should
re-introduce/familiarize ourselves with the case-fold-search modifier
used during parsing.
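As a reminder of what that variable does to regexp matching, here is a small standalone sketch. It uses only a temp buffer and `looking-at', not the semantic lexer itself:

```elisp
(with-temp-buffer
  (insert "NONTERMINAL")
  (goto-char (point-min))
  (list
   ;; Case is ignored: the lowercase pattern matches the uppercase text.
   (let ((case-fold-search t))
     (looking-at "nonterminal"))
   ;; Case is significant: the same pattern no longer matches.
   (let ((case-fold-search nil))
     (looking-at "nonterminal"))))
;; => (t nil)
```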
>> * Why is the comment-start-skip so complicated? This is used in the
>> lexer, and that complexity probably slows things down. Perhaps
>> semantic-lex-comment-regex could be specified with something
>> simpler?
>
>I just used the regexp from `emacs-lisp-mode'.
I thought I recognized it. ;)
I recommend rewriting it, perhaps like this:
(setq semantic-lex-comment-regex ";;")
since grammars now require two semicolons to make a comment.
We can probably ignore the ?;; case if you modify the syntax table to
make ? a quote character.
>> After wondering these things, I discovered the real problem.
>> The lexer actually had a progress hang as soon as it hit the valid
>> bison/yacc rule:
>>
>> .hush_warning:
>> ;; action
>> ;
>>
>> or any reference to .hush_warning. This led me to wonder about
>> `wisent-lex-punctuation' which just sort of confused me when I tried
>> to understand its purpose.
>
>This problem is solved now :-)
Yay!
>`wisent-lex-punctuation' automatically returns lexical tokens with the
>terminal symbols associated with punctuations defined like this:
>
>%token <punctuation> SYMBOL "value"
>
>For example in semantic-grammar.wy the following punctuations are
>defined:
>
>%token <punctuation> COLON ":"
>%token <punctuation> SEMI ";"
>%token <punctuation> OR "|"
>%token <punctuation> LT "<"
>%token <punctuation> GT ">"
>%token <punctuation> PERCENT "%"
>
>Using `wisent-lex-punctuation', the lexer automatically returns a
>COLON, SEMI, OR, LT, GT, PERCENT lexical token when it encounters a
>punctuation that respectively matches ":", ";", "|", "<", ">", "%".
>This is very convenient for LALR grammars, which often need to
>distinguish the different kinds of punctuation.
>
>> In general, however, would it make sense to move this and others into
>> semantic-grammar.el or semantic-lex.el instead of wisent-bovine.el?
>
>As the `semantic-lex-token' stuff is already in semantic-lex, it makes
>sense to move `wisent-lex-punctuation' (as `semantic-lex-separator'?)
>into semantic-lex too. If you are OK with that, I am willing to do it
>;-)
Sounds good. Why the name `separator'? I need to look more closely
at how lexical tokens are handled I think. Following
`semantic-lex-symbol-or-keyword', perhaps the name
`semantic-lex-punctuation-or-... what?' would match.
>Anyway, I checked some changes in to improve the grammar lexer (the
>change log is below).
Yay!
>I converted the g++-parse.y file from the gcc sources, using the
>`bison->wisent' command. Then it took me about 1.5 sec. (with my slow
>Celeron 366 MHz) to successfully parse the resulting WY grammar :-)
Huzzah! That's more like it. Your parser totally rocks.
The g++ parse.y file has rules in it like this:
namespace_alias:
NAMESPACE identifier '='
;; Action
any_id ';'
;; Action
;
where '=' and ';' are token literals. Does wisent support this? I
was pretty sure you had said you had removed this capability. If it
is not supported, two options are to add support, or to make
bison->wisent perform a translation to names for you.
[ ... ]
>* wisent/wisent-bovine.el:
>
>(wisent-lex-punctuation): Continue lexical analysis if a punctuation
>doesn't match.
[ ... ]
Does this mean that '.' did not show up with a matched token name, so
the character was not skipped?
I think it would be good to have this create a generic punctuation
token when there isn't a punctuation keyword. (See lame name above.)
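Roughly the fallback I am picturing, as a sketch only. The function name and the alist representation of the punctuation keywords are made up for illustration; they are not what `wisent-lex-punctuation' actually uses:

```elisp
;; Hypothetical helper: KEYWORDS is an alist of (STRING . TERMINAL).
(defun my-punctuation-token-class (text keywords)
  "Return the terminal for punctuation TEXT, or `punctuation' if none."
  (or (cdr (assoc text keywords))
      ;; No keyword matched: fall back to a generic class.
      'punctuation))

(let ((keywords '((":" . COLON) (";" . SEMI) ("|" . OR))))
  (list (my-punctuation-token-class ":" keywords)
        (my-punctuation-token-class "." keywords)))
;; => (COLON punctuation)
```

That way an unmatched character like `.' still produces a token and the lexer always advances past it.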
Thanks!
Eric
--
Eric Ludlam: zappo@..., eric@...
Home: http://www.ludlam.net Siege: http://www.siege-engine.com
Emacs: http://cedet.sourceforge.net GNU: http://www.gnu.org