Re: [CEDET-devel] bison->wisent
From: Eric M. L. <er...@si...> - 2002-09-11 13:30:51
>>> "David PONCE" <Dav...@wa...> seems to think that:
>Hi Eric,
>
[ ... ]
>> * Why does semantic's grammar need such a complicated number regexp?
>
>No more used now :-)

Good.

>> * Why does it need its own symbol matcher?  Why not use the
>>   symbol-keyword matcher?
>
>First, it lets me customize the regexp to suit grammar lexer needs (I
>just added the '.' in the regexp to fix your problem below).  Second,
>I prefer to use uppercase for nonterminal symbols ;-)

[.. < ]

>* semantic-grammar.el:
>
>(semantic-grammar-lex-symbol): Include '.' in symbol.

Why not make . a symbol constituent in the syntax table, or in the
syntax table modifiers for the .wy grammar?

Also, your regex is the same as `semantic-lex-symbol-or-keyword'.  If
you have preferences about case sensitivity, we should re-familiarize
ourselves with the case-fold-search modifier used during parsing.

>> * Why is the comment-start-skip so complicated?  This is used in the
>>   lexer, and that complexity probably slows things down.  Perhaps
>>   semantic-lex-comment-regex could be specified with something
>>   simpler?
>
>I just used the regexp from `emacs-lisp-mode'.

I thought I recognized it. ;)  I recommend rewriting it, perhaps like
this:

(setq semantic-lex-comment-regex ";;")

since grammars now require two semicolons to make a comment.  We can
probably ignore the ?;; case if you modify the syntax table to make ?
a quote character.

>> After wondering these things, I discovered the real problem.
>> The lexer actually had a progress hang as soon as it hit the valid
>> bison/yacc rule:
>>
>> .hush_warning:
>>     ;; action
>>     ;
>>
>> or any reference to .hush_warning.  This led me to wonder about
>> `wisent-lex-punctuation' which just sort of confused me when I tried
>> to understand its purpose.
>
>This problem is solved now :-)

Yay!
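
A minimal sketch of the syntax-table suggestion above, assuming a
dedicated syntax table for grammar buffers.  The names
`my-grammar-syntax-table' and `my-grammar-first-symbol' are hypothetical
and not part of semantic; only standard Emacs syntax-table calls are
used.  It shows that once `.' has symbol-constituent syntax, a rule name
such as .hush_warning is matched as one symbol by an ordinary
word-or-symbol regexp:

```elisp
;; Sketch only; both names below are hypothetical, not part of semantic.
(defvar my-grammar-syntax-table
  (let ((table (make-syntax-table (standard-syntax-table))))
    ;; Give `.' symbol-constituent syntax instead of punctuation.
    (modify-syntax-entry ?. "_" table)
    table)
  "Syntax table in which `.' is a symbol constituent.")

(defun my-grammar-first-symbol (text)
  "Return the leading run of word/symbol characters in TEXT, or nil."
  (with-temp-buffer
    (set-syntax-table my-grammar-syntax-table)
    (insert text)
    (goto-char (point-min))
    (when (looking-at "\\(?:\\sw\\|\\s_\\)+")
      (match-string 0))))

(my-grammar-first-symbol ".hush_warning:")  ; => ".hush_warning"
```

With this approach the symbol regexp itself would not need a special
case for `.'; the trailing `:' still has punctuation syntax and is left
for the punctuation matcher.
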
>`wisent-lex-punctuation' automatically returns lexical tokens with the
>terminal symbols associated with punctuations defined like this:
>
>%token <punctuation> SYMBOL "value"
>
>For example in semantic-grammar.wy the following punctuations are
>defined:
>
>%token <punctuation> COLON   ":"
>%token <punctuation> SEMI    ";"
>%token <punctuation> OR      "|"
>%token <punctuation> LT      "<"
>%token <punctuation> GT      ">"
>%token <punctuation> PERCENT "%"
>
>Using `wisent-lex-punctuation', the lexer automatically returns a
>COLON, SEMI, OR, LT, GT, PERCENT lexical token when it encounters a
>punctuation that respectively matches ":", ";", "|", "<", ">", "%".
>This is very convenient for LALR grammars, which often need to
>distinguish the different kinds of punctuation.
>
>> In general, however, would it make sense to move this and others into
>> semantic-grammar.el or semantic-lex.el instead of wisent-bovine.el?
>
>As the `semantic-lex-token' stuff is already in semantic-lex, it makes
>sense to move `wisent-lex-punctuation' (as `semantic-lex-separator'?)
>into semantic-lex too.  If you are OK with that, I am willing to do it
>;-)

Sounds good.  Why the name `separator'?  I need to look more closely
at how lexical tokens are handled, I think.  Following
`semantic-lex-symbol-or-keyword', perhaps the name
`semantic-lex-punctuation-or-... what?' would match.

>Anyway, I checked some changes in to improve the grammar lexer (the
>change log is below).

Yay!

>I converted the g++-parse.y file from the gcc sources, using the
>`bison->wisent' command.  Then it took me about 1.5 sec. (with my slow
>Celeron 366 MHz) to successfully parse the resulting WY grammar :-)

Huzzah!  That's more like it.  Your parser totally rocks.

The g++ parse.y file has rules in it like this:

namespace_alias:
    NAMESPACE identifier '='
    ;; Action
    any_id ';'
    ;; Action
    ;

where '=' and ';' are token literals.  Does wisent support this?  I was
pretty sure you had said you had removed this capability.
If it is not supported, two options are to add support, or to make
bison->wisent translate the literals to token names for you.

[ ... ]
>* wisent/wisent-bovine.el:
>
>(wisent-lex-punctuation): Continue lexical analysis if a punctuation
>doesn't match.
[ ... ]

Does this mean that '.' did not show up with a matched token name, so
the character was not skipped?  I think it would be good to have this
create a generic punctuation token when there isn't a punctuation
keyword.  (See lame name above.)

Thanks!
Eric

-- 
          Eric Ludlam:  za...@gn..., er...@si...
   Home: http://www.ludlam.net  Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net  GNU: www.gnu.org