Re: [CEDET-devel] wisent compiler patch
From: David P. <dav...@wa...> - 2003-03-26 09:07:13
Hi Eric,

Sorry for the late reply. I was in Paris for work for the last two
days and had no Internet access.

[...]
>>That makes sense. Could you check it in, please?
>
> Done.

Thanks!

[...]
> Naturally, I didn't feel like reading up on things first. ;) What I
> found was that the tokens produced by the lexer have to be mentioned
> in the grammar file or they can't be used. Here are some things I
> bumped into. I do not know if good error/warning messages can be
> generated for them or not.

Yes, tokens must be declared in the grammar, just as they must be in
Bison. They are declared using %token, %left, %right or %nonassoc
statements.

> 1) `%token FOO', is not the same as `%token FOO "foo"'

That's right. The former is a simple token with neither type nor
value. The latter is a keyword; that form was introduced by Semantic
in BNF grammars. Maybe we could introduce a new %keyword statement to
declare language keywords?

Notice also that the parser doesn't really distinguish keywords from
tokens, it just eats lexical tokens! Only the lexer uses the notion
of keywords.

> - I started with AWK because it was small, and was tricked by that
> one.
>
> 2) `%token <symbol> IDENTIFIER' requires a change in the lexer to
> actually produce IDENTIFIER tokens. To just use the lexer symbol
> token, I needed to add `%token <symbol> symbol' which seemed
> strange.

Right. The default implementation, `semantic-lex-symbol-or-keyword',
is "bovine" centric and produces `symbol' tokens. To get `IDENTIFIER'
tokens, it is necessary to "clone" that analyzer and replace `symbol'
with `IDENTIFIER'. Not really a problem ;-)

I admit that "%token <symbol> symbol" can seem strange, but it is
valid syntax: it defines the lexical token `symbol' for tokens of
type <symbol>. The confusion here comes from
`semantic-lex-symbol-or-keyword', which returns `symbol' tokens.

> It seems as if you need to write a majority of your own lexer or
> you can't do anything. It would be nice if there were some
> defaults I could use so I could make a lexer very close to some
> default without having to create lots of special things for
> identifiers, strings, various semantic-list types, etc.
>
> I guess I wasn't expecting full duplication between what the lexer
> produces, and what you need to tell wisent about so it can create
> a parser. A side effect of having worked on the bovine parser for
> so long.

I think it is better to explicitly declare all the lexical tokens
used. Bison requires that, and so does Wisent.

> 3) graphviz-dot-mode's syntax table thinks -, >, and other items were
> of syntax type symbol, not punctuation. Many of my %token entries
> were then ignored. I've since patched graphviz-dot-mode to use
> better syntax entries.

That is a constraint introduced by the default lexical analyzers,
which use Emacs syntax classes (things like "\\s<code>") in regexps.
I also encountered that sort of problem when I wrote the
semantic-grammar lexer, and I had to concoct a
`semantic-grammar-syntax-table' ;-)

David