Re: [CEDET-devel] wisent compiler patch

Hi Eric,

Sorry for the late reply.  I was in Paris for work for the last two
days, and had no Internet access.

[...]
>>That makes sense.  Could you check it in, please?
> 
> 
> Done.

Thanks!

[...]
> Naturally, I didn't feel like reading up on things first. ;)  What I
> found was that the tokens produced by the lexer have to be mentioned
> in the grammar file or they can't be used.  Here are some things I
> bumped into.  I do not know if good error/warning messages can be
> generated for them or not.

Yes, tokens must be declared in the grammar, just as they must be in
Bison.

They are declared using %token, %left, %right or %nonassoc statements.

> 1) `%token FOO', is not the same as `%token FOO "foo"'

That's right.  The former is a simple token with neither type nor
value.  The latter is a keyword; that form was introduced by Semantic
in BNF grammars.  Maybe we could introduce a new %keyword statement to
declare language keywords?  Notice also that the parser doesn't really
distinguish keywords from tokens, it just eats lexical tokens!  Only
the lexer uses that notion of keywords.

>  - I started with AWK because it was small, and was tricked by that
>  one.
> 
> 2) `%token <symbol> IDENTIFIER' requires a change in the lexer to
>    actually produce IDENTIFIER tokens.  To just use the lexer symbol
>    token, I needed to add `%token <symbol> symbol' which seemed
>    strange.

Right.  The default implementation `semantic-lex-symbol-or-keyword'
is "bovine"-centric, and produces `symbol' tokens.  To get
`IDENTIFIER' tokens, it is necessary to "clone" that analyzer, and
replace `symbol' with `IDENTIFIER'.  Not really a problem ;-)

I admit that "%token <symbol> symbol" can seem strange, but it is
valid syntax that defines the lexical token `symbol' for tokens of
type <symbol>.  The confusion here is due to
`semantic-lex-symbol-or-keyword', which returns `symbol' tokens.
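
To make the "clone" idea concrete, here is a rough sketch of such an
analyzer.  It assumes the `semantic/lex' library as bundled with
current Emacs (the feature name may differ in a standalone CEDET
checkout), and the names `my-lex-identifier-or-keyword' and
`my-simple-lexer' are hypothetical, chosen only for illustration.

```elisp
;; Assumption: semantic/lex is available under this feature name.
(require 'semantic/lex)

;; Hypothetical clone of `semantic-lex-symbol-or-keyword'.  The only
;; intended difference is the fallback token class: IDENTIFIER
;; instead of symbol.
(define-lex-regex-analyzer my-lex-identifier-or-keyword
  "Detect and create IDENTIFIER and keyword tokens."
  "\\(\\sw\\|\\s_\\)+"
  (semantic-lex-push-token
   (semantic-lex-token
    ;; Use the keyword's token class if the text is a declared
    ;; keyword, else fall back to IDENTIFIER.
    (or (semantic-lex-keyword-p (match-string 0)) 'IDENTIFIER)
    (match-beginning 0) (match-end 0))))

;; Hypothetical lexer that reuses the stock analyzers for everything
;; except identifiers.
(define-lex my-simple-lexer
  "Sketch of a lexer that produces IDENTIFIER tokens."
  semantic-lex-ignore-whitespace
  semantic-lex-ignore-newline
  my-lex-identifier-or-keyword
  semantic-lex-default-action)
```

With something like this in place, the grammar would declare
"%token <symbol> IDENTIFIER" rather than "%token <symbol> symbol".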

>    It seems as if you need to write a majority of your own lexer or
>    you can't do anything.  It would be nice if there were some
>    defaults I could use so I could make a lexer very close to some
>    default without having to create lots of special things for
>    identifiers, strings, various semantic-list types, etc.
> 
>    I guess I wasn't expecting full duplicity between what the lexer
>    produces, and what you need to tell wisent about so it can create
>    a parser.  A side effect of having worked on the bovine parser for
>    so long.

I think it is better to explicitly declare all the lexical tokens
used.  Bison requires that, and so does Wisent.

> 3) graphviz-dot-mode's syntax table things -, >, and other items were
>    of syntax type symbol, not punctuation.  Many of my %token entries
>    where then ignored.  I've since patched graphviz-dot-mode to use
>    better syntax entries.

That is a constraint introduced by the default lexical analyzers,
which use Emacs syntax classes (things like "\\s<code>") in regexps.  I
also encountered that sort of problem when I wrote the semantic-grammar
lexer, and I had to concoct a `semantic-grammar-syntax-table' ;-)
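
As a small illustration of why the syntax class matters, the sketch
below builds a table in which `-' and `>' are punctuation, so that a
regexp such as "\\s.+" can match them.  The names
`my-dot-syntax-table' and `my-dot-punctuation-at-start' are
hypothetical; this is neither the actual graphviz-dot-mode patch nor
the real `semantic-grammar-syntax-table'.

```elisp
;; Hypothetical table: starts from the standard syntax table, where
;; `-' and `>' are symbol constituents, and makes them punctuation.
(defvar my-dot-syntax-table
  (let ((table (make-syntax-table)))
    ;; "." is the punctuation class; "_" would be the symbol class.
    (modify-syntax-entry ?- "." table)
    (modify-syntax-entry ?> "." table)
    table)
  "Sketch of a syntax table where `-' and `>' are punctuation.")

(defun my-dot-punctuation-at-start (text)
  "Return the run of punctuation at the start of TEXT, or nil."
  (with-temp-buffer
    (set-syntax-table my-dot-syntax-table)
    (insert text)
    (goto-char (point-min))
    ;; "\\s." matches any character of the punctuation class.
    (when (looking-at "\\s.+")
      (match-string 0))))
```

With this table, (my-dot-punctuation-at-start "->b") should return
"->"; under the standard table both characters are symbol
constituents, so the same regexp would not match them.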

David