Re[2]: [CEDET-devel] problem getting wisent-python.wy started (continued)
From: Eric M. L. <er...@si...> - 2002-06-06 17:46:17
>>> pon...@ne... (David Ponce) seems to think that:
>Hi Eric,
>
>[...]
>> I like the idea of handlers.  It is similar to the flex extensions.
>> It would be nice if we could get all the different types of extensions
>> down into a common set of constructs.  This makes a fourth style of
>> specialized lexical adaptation.  (The first three being the syntax
>> table, flags, then the extensions.)
>>
>> Perhaps when we combine the lexical notions of the LL and LALR
>> parsers, we could have one batch of handlers, and enabling whitespace,
>> newlines, comments, etc. would all be different handlers that you
>> explicitly add instead of flags.  That would be more efficient.
>> Unfortunately, it seems like your handlers run at a different level
>> from the handlers I'm thinking of.
>
>I must confess I don't clearly understand what this means: "your
>handlers run at a different level from the handlers I'm thinking
>of".  Could you please elaborate?

The current semantic-flex extensions are regexps that look at text.
If I understood your handlers properly, they filter already generated
tokens.

>I already thought about such an architecture for lexical analysis based
>on handlers instead of options.  It would make it easy to extend the
>lexer's capabilities by creating new handlers.  It could also be a good
>mechanism to unify `semantic-flex' and `wisent-flex'.  Each parser
>could provide a set of "standard" handlers to get the lexical tokens
>they need, in an appropriate format.  Then each language could
>extend the standard set depending on its requirements.

This would certainly be the ideal.

>Compared to the current implementation of `semantic-flex' and
>`wisent-flex', an implementation based on handlers will result in a lot
>of function calls.  So I am not sure it will be more efficient.

If the handlers were macros, we could build specialized lexers by
constructing our own code out of them, eliminating the function calls.
Similar to the way we build Elisp code from the grammar.
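A rough sketch of the macro idea (everything here is hypothetical, just
to illustrate; `define-lex-handler', `define-lexer', and the handler
bodies are made-up names, not existing Semantic APIs): each handler
stores a code fragment, and the lexer-building macro splices those
fragments inline, so the generated lexer is one `or' form with no
per-handler function calls at runtime.

```elisp
(defmacro define-lex-handler (name &rest body)
  "Define lexical handler NAME whose expansion is BODY.
The handler code is stored on NAME's plist for later inlining."
  `(put ',name 'lex-handler '(progn ,@body)))

;; Hypothetical handlers; each returns a (CLASS START END) token
;; and advances point, or returns nil if it does not match.
(define-lex-handler my-whitespace
  (when (looking-at "\\s-+")
    (prog1 (list 'whitespace (match-beginning 0) (match-end 0))
      (goto-char (match-end 0)))))

(define-lex-handler my-symbol
  (when (looking-at "\\(\\sw\\|\\s_\\)+")
    (prog1 (list 'symbol (match-beginning 0) (match-end 0))
      (goto-char (match-end 0)))))

(defmacro define-lexer (name &rest handlers)
  "Build lexer NAME by splicing the code of HANDLERS inline."
  `(defun ,name ()
     (or ,@(mapcar (lambda (h) (get h 'lex-handler)) handlers))))

;; Expands into a single function body; no handler calls at runtime.
(define-lexer my-lexer my-whitespace my-symbol)
```

Enabling whitespace or comments would then just be a matter of which
handler names you pass to `define-lexer', instead of setting flags.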
>I also thought about a different approach based on filtering
>syntactic tokens, which is close to the current way `semantic-flex'
>and `wisent-flex' work.
>
>Instead of using options to include some categories of tokens,
>`semantic-flex' could simply convert all input data into a stream of
>all the kinds of syntactic tokens it finds (as if all the
>`semantic-flex-enable-...' options were enabled).

This could miss the specialized regexp-based tokens created by
semantic-make.el, which converts otherwise normal-looking text into
bigger blocks.

>This intermediate generic stream would be very easy to digest by
>specific lexers (like `wisent-flex').  When called, such lexers would
>return the next lexical token available, or EOI, eliminating unneeded
>tokens, and maybe doing more sophisticated processing too.

That is a nifty idea.  I have read some code in Emacs that goes out of
its way to avoid consing, as if it were a bad thing.  This approach
would add more consing, but it would certainly simplify use.

>A set of default lexers could be provided for each parser.  The
>simplest one would probably look like this:
>
>(defun semantic-default-lexer ()
>  (prog1 (car semantic-token-stream)
>    (setq semantic-token-stream (cdr semantic-token-stream))))
>
>A more sophisticated one is `wisent-flex' ;-)
>
>The issue here would be to adapt the current LL parser so it would be
>able to get lexical tokens one at a time.

That would be quite the trick, since it uses the call stack to unwind
errors. ;)  The LL parser could just run the filter on the whole input
stream.

>Globally I think this approach should be faster than the handler one!
>
>> Next, I'm a little wary of putting lexical information in the grammar.
>> (Yes, I know we already put some of the lexical flags in the grammar
>> file too.)  If we think it is good to put lexical information in the
>> grammar, we should probably design .wy grammar features for the other
>> parts too, the way lex does, perhaps.  Eeks.
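A more sophisticated filtering lexer along the lines quoted above might
look like this sketch.  It builds on the `semantic-token-stream'
variable from the quoted example; the `(CLASS START END)' token shape,
the variable `my-lexer-ignored-classes', and `my-filtering-lexer' are
assumptions for illustration, not existing code.

```elisp
(defvar semantic-token-stream nil
  "Remaining stream of generic lexical tokens.
Each token is assumed to look like (CLASS START END).")

(defvar my-lexer-ignored-classes '(whitespace newline comment)
  "Token classes this lexer silently discards.")

(defun my-filtering-lexer ()
  "Return the next interesting token, or nil at end of input.
Pops tokens off `semantic-token-stream', dropping any whose
class is in `my-lexer-ignored-classes'."
  (let (token)
    (while (and semantic-token-stream (null token))
      (let ((next (car semantic-token-stream)))
        (setq semantic-token-stream (cdr semantic-token-stream))
        (unless (memq (car next) my-lexer-ignored-classes)
          (setq token next))))
    token))
```

Each call consumes the generic stream until it reaches a token the
parser cares about, which is exactly the "eliminating unneeded tokens"
step; more sophisticated processing (token merging, keyword lookup)
would go in the same loop.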
>
>IMO, having both lexical and syntactic information in one file is a
>good thing.  A lot of new parsers have adopted such a design.  In fact
>I think it would be nice to also include all the related Elisp code in
>the grammar file, so the corresponding .el file could be entirely
>generated.  Even better, only the byte-compiled .elc file could be
>produced directly ;-)

[ ... ]

I tend to agree, though I am concerned about bucking the "standard" of
flex & bison.

I always wanted to have the .bnf (or whatever extension we use) include
a full definition of the syntax table as well.  Such a definition could
have more details in it for our lexical purposes (like numbers), and
then be reduced to the more typical syntax table for the language mode.

Such a definition could be used to construct a specialized lexer (as
per the discussion above) using various handlers.  Nifty.

Eric

--
Eric Ludlam: za...@gn..., er...@si...
Home: www.ultranet.com/~zappo   Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net   GNU: www.gnu.org