>>> Marcus Harnisch <marcus.harnisch@...> seems to think that:
>Hi Eric, David,
>
>David PONCE writes:
> > Eric M. Ludlam writes:
> > >>%token <symbol> RADIX "radix"
> > >
> > > When you use the above form, "radix" is matched as a lexical
> > > symbol.
>
>But that only works if the lexer returns a `symbol' token. I have
>set up a lexer that returns keyword tokens (if in the keyword table)
>or NAME. So could I use
>
>%token <NAME> RADIX "radix"
>
>instead? Doesn't seem to work...
The wisent parser has a special step, which the bovine parser lacks,
that will convert the above statement into a RADIX lexical-ish token.
I'm not completely sure of the behavior. David can provide a good
answer.
You may need to leave RADIX alone (undeclared) and, in your optional
lambda expression, throw a parser error if a NAME is not string= to
"radix". I'm not totally sure of the syntax for that.
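
The idea of checking the text of a NAME token inside the rule, rather
than declaring a keyword, can be sketched outside of any grammar
syntax. The following is a minimal, language-neutral illustration in
Python, not wisent or bovine code. The token representation (pairs of
token class and text) and the names `ParseError` and `expect_word` are
assumptions made up for this sketch.

```python
class ParseError(Exception):
    """Raised when a token does not match what the rule expects."""


def expect_word(tokens, pos, word):
    """Accept the NAME token at `pos` only if its text equals `word`.

    `tokens` is a list of (token_class, text) pairs, a hypothetical
    stand-in for what the lexer returns. Returns the next position.
    """
    if pos >= len(tokens):
        raise ParseError(f"expected {word!r}, found end of input")
    token_class, text = tokens[pos]
    # "radix" stays an ordinary NAME for the lexer; only this rule
    # insists on its spelling, so it remains usable as an identifier
    # everywhere else.
    if token_class != "NAME" or text != word:
        raise ParseError(f"expected {word!r}, found {text!r}")
    return pos + 1
```

With this approach the lexer never needs to know about "radix"; the
cost is that the spelling check runs in the (slower) parser action.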
> > > When you use the form:
> > >
> > > %token RADIX "radix"
> > >
> > > then "radix" is now identified as a keyword.
>
>This is not what I want. `radix' is not a reserved word and could be
>used as an identifier (or name) in a different context.
I handled a case like this in C where the "float" data type would
cause grief with the "float.h" include file. It looked like this for
a time:
  filename: symbol punctuation "." symbol "h"
          | FLOAT punctuation "." symbol "h"
          ;
though that type of syntax (for the . and h) is now deprecated.
[ ... ]
> > Maybe it could be worth reducing the language syntax to only parse a
> > subset of the language, useful for semantic tags (like I did in the Java
> > grammar defined in wisent-java-tags.wy).
>
>I will look into that. I think I haven't quite understood how to
>utilize the "sloppiness" of Wisent and still get satisfactory
>results. At that stage I am indeed mostly interested in generating
>tags.
If your language does not put large amounts of tag-uninteresting code
inside Emacs lists such as { curly braces } or [ brackets ], you may
need to design your own lexical block analyzer to handle it. For
example, the lexer might match BEGIN/END statements.
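
As a rough sketch of what such a block analyzer does, the Python below
finds the extent of each top-level BEGIN/END block so that its
contents can be skipped as a single unit. It is only an illustration
of the matching logic, not Semantic lexer code; the function name
`top_level_blocks` is made up, and the sketch assumes BEGIN and END
never appear inside strings or comments.

```python
import re

# Match BEGIN or END as whole words only.
BLOCK_WORD = re.compile(r"\b(BEGIN|END)\b")


def top_level_blocks(text):
    """Return (start, end) offsets of each outermost BEGIN ... END block."""
    spans = []
    depth = 0
    start = None
    for match in BLOCK_WORD.finditer(text):
        if match.group(1) == "BEGIN":
            if depth == 0:
                start = match.start()  # an outermost block opens here
            depth += 1
        elif depth > 0:  # an END with no open BEGIN is ignored
            depth -= 1
            if depth == 0:
                spans.append((start, match.end()))
    return spans
```

A lexer built this way would emit one block token per returned span
and let the parser descend into it only when the contents matter for
tags.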
> > > David can answer this best. As far as lexer hacks are concerned,
> > > remember that Emacs is very good at matching regexps in its C code,
> > > and parsing is slow, as it is implemented in Emacs Lisp. Take
> > > advantage of this when it makes the most sense.
> >
> > Agreed!
>
>I see your point of regexps being superior as far as speed is
>concerned. Unfortunately, they are not ideal to parse bigger
>contexts. Isn't this why you came up with semantic in the first place?
>You could put each grammar in a sufficiently long regular expression
>;-)
>
> > IMO, a good new parser for Semantic would be a GLR parser that would
> > make it possible to parse ambiguous languages like C/C++. However, such
> > a parser would probably be too slow if implemented in Elisp.
>
>Well, slow is relative of course.
>
> > Maybe an interesting new approach for Semantic would be to be able to
> > directly integrate some existing parsers implemented in C (like new
> > Bison LALR/GLR parser or the GLR D-parser).
>
>Say, I would be able to write a Bison/ANTLR/hand-coded parser for a
>language that would print Lisp code (the AST for instance). Semantic
>would now call that parser on a region of a buffer, providing an
>appropriate start non-terminal and `read' the parser's output back in.
>Something like that?
[ ... ]
Yes, this would be handled as a separate parser. If you look at
semantic-texi.el you will find a parser based on regular
expressions. It shows how to hook into the parser structure. You
would replace the regexp based parsing with a system call and read
statement.
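
The "system call and read" round trip can be sketched as follows. This
is a Python illustration of the data flow only: in Emacs the built-in
Lisp reader would do the reading, so the small `read_sexp` function
here is just a stand-in for it. The names `read_sexp`, `parse_region`,
and `STAND_IN_PARSER`, and the `(tag "..." variable)` output shape, are
all assumptions invented for the sketch, not Semantic's tag format.

```python
import re
import subprocess
import sys

# One token: a parenthesis, a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\s*(\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+)')


def read_sexp(text):
    """Read one s-expression (nested lists, "strings", bare atoms)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1  # skip the closing parenthesis
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # drop the quotes; escapes are kept as written
        return tok  # symbols and numbers stay plain strings in this sketch

    return parse()


def parse_region(command, region_text):
    """Send `region_text` to an external parser and read back its output."""
    result = subprocess.run(command, input=region_text,
                            capture_output=True, text=True, check=True)
    return read_sexp(result.stdout)


# Hypothetical stand-in for a real Bison/ANTLR parser: it wraps whatever
# it receives on stdin in a made-up tag form.
STAND_IN_PARSER = [
    sys.executable, "-c",
    "import sys; print('(tag \"%s\" variable)' % sys.stdin.read().strip())",
]
```

Calling `parse_region(STAND_IN_PARSER, "radix\n")` runs the stand-in
program on the region text and returns the nested list read from its
output. Malformed output (for example unbalanced parentheses) is not
handled here and would need checking in real use.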
In XEmacs, you could have a C code parser loaded dynamically that you
could call.
Good Luck
Eric
--
Eric Ludlam: zappo@..., eric@...
Home: http://www.ludlam.net Siege: http://www.siege-engine.com
Emacs: http://cedet.sourceforge.net GNU: http://www.gnu.org