Re: [CEDET-devel] bison->wisent

>>> "David PONCE" <Dav...@wa...> seems to think that:
>Hi Eric,
>
  [ ... ]
>> * Why does semantic's grammar need such a complicated number regexp?
>
>No more used now :-)

Good.

>> * Why does it need its own symbol matcher?  Why not use the
>>   symbol-keyword matcher?
>
>First, it lets me customize the regexp to suit grammar lexer needs (I
>just added the '.' in the regexp to fix your problem below).  Second,
>I prefer to use uppercase for nonterminal symbols ;-)
  [ ... ]
>* semantic-grammar.el:
>
>(semantic-grammar-lex-symbol): Include '.' in symbol.

Why not make . a symbol constituent in the syntax table, or in the
syntax table modifiers for the .wy grammar?
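
Something along these lines is what I have in mind.  It is only a
sketch: the names `my-wy-syntax-table' and `my-wy-symbol-at-start' are
made up for illustration and are not what semantic-grammar.el uses.
Only core Emacs functions are involved.

```elisp
;; Hypothetical names throughout; not taken from semantic-grammar.el.
(defvar my-wy-syntax-table
  (let ((table (make-syntax-table)))
    ;; "_" is the symbol-constituent class, so `.' joins symbols.
    (modify-syntax-entry ?. "_" table)
    table)
  "Sketch of a syntax table where `.' is a symbol constituent.")

(defun my-wy-symbol-at-start (text)
  "Return the leading run of word/symbol characters in TEXT, or nil."
  (with-temp-buffer
    (set-syntax-table my-wy-syntax-table)
    (insert text)
    (goto-char (point-min))
    ;; \sw is word syntax, \s_ is symbol syntax.
    (when (looking-at "\\(?:\\sw\\|\\s_\\)+")
      (match-string 0))))
```

With that table, a syntax-based symbol regexp should pick up
`.hush_warning' as one symbol without any special-case regexp.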

Also, your regex is the same as `semantic-lex-symbol-or-keyword'.  If
you have preferences about case sensitivity, we should
re-introduce/familiarize ourselves with the case-fold-search modifier
used during parsing.
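
As a reminder of what that buys us, here is a small sketch using plain
`case-fold-search'.  The helper name `my-match-keyword-p' is
hypothetical, and this is not how the parser itself is wired up.

```elisp
(defun my-match-keyword-p (keyword text fold)
  "Return t if TEXT starts with KEYWORD, else nil.
FOLD is bound to `case-fold-search' for the duration of the match."
  (let ((case-fold-search fold))
    (and (string-match (concat "\\`" (regexp-quote keyword)) text)
         t)))

;; Case-insensitive: "NAMESPACE" also matches lowercase text.
(my-match-keyword-p "NAMESPACE" "namespace_alias" t)
;; Case-sensitive: only the uppercase spelling matches.
(my-match-keyword-p "NAMESPACE" "namespace_alias" nil)
```

Binding the variable around the match keeps the same regexp usable for
both conventions.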

>> * Why is the comment-start-skip so complicated?  This is used in the
>>   lexer, and that complexity probably slows things down.  Perhaps
>>   semantic-lex-comment-regex could be specified with something
>>   simpler?
>
>I just used the regexp from `emacs-lisp-mode'.

I thought I recognized it.  ;)

I recommend rewriting it, perhaps like this:

(setq semantic-lex-comment-regex ";;")

since grammars now require two semicolons to make a comment.
We can probably ignore the ?;; case if you modify the syntax table to
treat ? as a quote character.

>>   After wondering these things, I discovered the real problem.
>> The lexer actually had a progress hang as soon as it hit the valid
>> bison/yacc rule:
>> 
>> .hush_warning: 
>> ;; action
>> ;
>> 
>> or any reference to .hush_warning.  This led me to wonder about
>> `wisent-lex-punctuation' which just sort of confused me when I tried
>> to understand its purpose.
>
>This problem is solved now :-)

Yay!

>`wisent-lex-punctuation' automatically returns lexical tokens with the
>terminal symbols associated with punctuation defined like this:
>
>%token <punctuation> SYMBOL "value"
>
>For example in semantic-grammar.wy the following punctuations are
>defined:
>
>%token <punctuation>   COLON       ":"
>%token <punctuation>   SEMI        ";"
>%token <punctuation>   OR          "|"
>%token <punctuation>   LT          "<"
>%token <punctuation>   GT          ">"
>%token <punctuation>   PERCENT     "%"
>
>Using `wisent-lex-punctuation', the lexer automatically returns a
>COLON, SEMI, OR, LT, GT, or PERCENT lexical token when it encounters
>punctuation that matches ":", ";", "|", "<", ">", or "%" respectively.
>This is very convenient for LALR grammars, which often need to
>distinguish the different kinds of punctuation.
>
>> In general, however, would it make sense to move this and others into
>> semantic-grammar.el or semantic-lex.el instead of wisent-bovine.el?
>
>As the `semantic-lex-token' stuff is already in semantic-lex, it makes
>sense to move `wisent-lex-punctuation' (as `semantic-lex-separator'?)
>in semantic-lex too.  If you are OK with that, I am willing to do it
>;-)

Sounds good.   Why the name `separator'?  I need to look more closely
at how lexical tokens are handled I think.  Following
`semantic-lex-symbol-or-keyword', perhaps the name
`semantic-lex-punctuation-or-... what?' would match.

>Anyway, I checked some changes in to improve the grammar lexer (the
>change log is below).

Yay!

>I converted the g++-parse.y file from the gcc sources, using the
>`bison->wisent' command.  Then it took me about 1.5 sec. (with my slow
>Celeron 366 MHz) to successfully parse the resulting WY grammar :-)

Huzzah!  That's more like it.  Your parser totally rocks.

The g++ parse.y file has rules in it like this:

namespace_alias:
          NAMESPACE identifier '='
;; Action
          any_id ';'
;; Action
	;

where '=' and ';' are token literals.  Does wisent support this?  I
was pretty sure you had said you had removed this capability.  If it
is not supported, two options are to add support, or to make
bison->wisent perform a translation to names for you.
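
For the translation option, a rough sketch of the idea follows.  The
table `my-literal-names', the function `my-translate-literals', and the
token names EQ and SEMI are all invented here; this is not code from
the bison->wisent converter.

```elisp
(defvar my-literal-names
  '(("'='" . "EQ")
    ("';'" . "SEMI"))
  "Hypothetical map from bison character literals to token names.")

(defun my-translate-literals (rule)
  "Return RULE with known character literals replaced by token names.
Literals with no entry in `my-literal-names' are left untouched."
  (replace-regexp-in-string
   "'[^']'"
   (lambda (literal)
     (or (cdr (assoc literal my-literal-names)) literal))
   rule t t))

(my-translate-literals "NAMESPACE identifier '=' any_id ';'")
;; => "NAMESPACE identifier EQ any_id SEMI"
```

The converter would also need to emit matching %token declarations for
the generated names, which this sketch leaves out.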

  [ ... ]
>* wisent/wisent-bovine.el:
>
>(wisent-lex-punctuation): Continue lexical analysis if a punctuation
>doesn't match.
  [ ... ]

Does this mean that '.' did not show up with a matched token name, so
the character was not skipped?

I think it would be good to have this create a generic punctuation
token when there isn't a punctuation keyword.  (See lame name above.)
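
Roughly what I mean, as a sketch only.  `my-punctuation-tokens' and
`my-lex-punctuation' are made-up names, the table just mirrors the
%token <punctuation> lines quoted above, and the (TOKEN START . END)
shape is my assumption about how a token would be represented.

```elisp
(defvar my-punctuation-tokens
  '((":" . COLON) (";" . SEMI) ("|" . OR)
    ("<" . LT) (">" . GT) ("%" . PERCENT))
  "Hypothetical table of punctuation strings and their token names.")

(defun my-lex-punctuation (pos)
  "Return (TOKEN START . END) for the character at POS.
TOKEN is the name found in `my-punctuation-tokens', or the generic
symbol `punctuation' when the character has no entry."
  (let* ((text (buffer-substring-no-properties pos (1+ pos)))
         (name (cdr (assoc text my-punctuation-tokens))))
    ;; Always produce a token so the lexer moves past the character.
    (cons (or name 'punctuation)
          (cons pos (1+ pos)))))
```

With the fallback, an unlisted character such as `.' still yields a
token and gets skipped, instead of leaving the lexer stuck at the same
position.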

Thanks!
Eric

-- 
          Eric Ludlam:                 za...@gn..., er...@si...
   Home: http://www.ludlam.net            Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net               GNU: www.gnu.org