Hi,
I too have been busy. As my children get older, their bedtime gets
later, which means less time for Emacs :( I have checked in a new
file, `semantic-sort', that pulls more entries out of semantic-util
and renames `token' to `tag'.
There is not much left in semantic-util, which means we are very
close to done with the token->tag conversion. That could provide an
opportunity for a beta.
>>> David PONCE <david.ponce@...> seems to think that:
>Hi All,
>
>It seems that our hacking of cedet code is very slight these days.
>I suppose we are all particularly busy at other tasks ;-)
>
>However, I checked changes in, to fix some indentation problems in
>semantic-grammar-mode, and to auto load the semanticdb top level
>search routines (that fixes errors in senator completion when
>semanticdb mode is enabled).
Thanks!
[ ... ]
>
>Also I started to work on a LALR grammar for C, and I must admit that
>the task is not easy. I read a lot of threads about that subject and
>discovered that the C grammar contains some nasty ambiguities that
>make it difficult to be LALR :-(
That's great! (That a wisent c parser has been started.) See way
down below for more:
>The main problem I encountered is that C identifiers can be
>interpreted as typedef names or ordinary identifiers depending on
>context.
>
>A simple example will show what I mean:
>
>typedef struct {int x; int y;} point;
>point point;
>
>The first occurrence of `point' is a typedef name, and the second
>occurrence an ordinary identifier!
>
>In nearly all implementations I studied, such ambiguities are solved
>by the lexer, which returns an IDENTIFIER as a TYPEDEF_NAME terminal
>when that IDENTIFIER has been previously declared as a typedef.
>That requires maintaining a table of declared C symbols that takes
>into account the scope of declarations.
>
>Unfortunately, that could work only if all preprocessor statements
>have been previously expanded by a first preprocessing pass. This is
>not a problem for normal compilation. But this is an issue for
>Semantic that just parses the source as it is to obtain declaration
>tags.
>
>And how it would be possible to do incremental re-parsing?
>
>Attached you will find a tarball of what I managed to do:
>
>- A C-tags.wy grammar hacked from the LALR grammar supplied to the
> community by James A. Roskind and available at
> <http://www.empathy.com/pccts/roskind.html>.
>
>- A quickly hacked wisent-c.el that permits to try the grammar.
>
>For now I got a very limited success in parsing the code in:
>
>- test.c, a very basic example.
>
>I also had to add a hook to `wisent-parse' to be able to do some
>context initialization before starting the parser. See:
>
>- wisent.el.patch, probably it would make more sense to have that hook
> called from `semantic-parse-stream'?
>
>The parser fails in many cases, particularly when it encounters a
>declaration like:
>
>EMACS_INT undo_limit;
>
>And there is no typedef for EMACS_INT, or the typedef is [probably]
>in a included header.
>
>After doing all that, I suspect that using a true C grammar, probably
>is not the right direction for Semantic, and that we will have to hack
>from scratch a specific LALR grammar :(
>
>Perhaps using wisent to parse C (things are even worse with C++!) is a
>wrong choice?
[ ... ]
You have many good observations. When it comes to creating a simple
tagging parser (as we did with c.by) many aspects of the language
that pose problems have little comments that say "There could be an
error, but assume it is right." meaning that something is
syntactically correct, but could be an error based on some wider
context.
The biggest problem is custom macros like:
  #define MYFUNC(name) int name (fancy arg, list here)
  MYFUNC(blah)
  {
    code;
  }
which we currently completely ignore.
Fortunately C++'s complexity results in a reduction of such goofy
macros.
Anyway, one thought for overcoming those macros is to have a
preprocessing step in semantic. It might call an actual
preprocessor, and the semantic parser would track the line number
directives in its output to map tags back to the original source.
That would be a pain to do though.
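To make the line-tracking idea concrete, here is a rough sketch in C.
It assumes the preprocessor output carries markers of the form
`# 40 "file.c"` (as GCC's cpp emits) or the standard `#line 40 "file.c"`.
The names `parse_line_marker', `track_line' and `struct source_pos' are
made up for illustration; nothing like them exists in semantic today.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: where the NEXT line of preprocessed text came from. */
struct source_pos {
    char file[256];
    int line;
};

/* Recognize `# 12 "foo.c"` or `#line 12 "foo.c"`.
   Returns 1 and fills *lineno and file on success, 0 otherwise. */
static int parse_line_marker(const char *text, int *lineno,
                             char *file, size_t filesz)
{
    const char *p = text;
    const char *end;
    size_t len;
    int n = 0, consumed = 0;

    if (*p != '#')
        return 0;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;
    if (strncmp(p, "line", 4) == 0)
        p += 4;
    /* %n is only stored if the opening quote was matched. */
    if (sscanf(p, " %d \"%n", &n, &consumed) != 1 || consumed == 0)
        return 0;
    p += consumed;
    end = strchr(p, '"');
    if (end == NULL)
        return 0;
    len = (size_t)(end - p);
    if (len >= filesz)
        return 0;               /* file name too long for the buffer */
    memcpy(file, p, len);
    file[len] = '\0';
    *lineno = n;
    return 1;
}

/* Feed one line of preprocessed text.  Returns 0 for a marker (which
   only updates the position), or 1 for real source text, in which case
   *orig_line receives its line number in the original file. */
static int track_line(struct source_pos *next, const char *text,
                      int *orig_line)
{
    char file[256];
    int n;

    if (parse_line_marker(text, &n, file, sizeof file)) {
        strcpy(next->file, file);
        next->line = n;         /* the marker names the line after it */
        return 0;
    }
    *orig_line = next->line++;
    return 1;
}
```

Feeding `# 40 "undo.c"` and then `EMACS_INT undo_limit;` through
`track_line' would report the declaration at line 40 of undo.c. Other
directives such as `#define' are simply counted as ordinary lines here.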
This leads me to the long term plans for semantic. A goal of mine is
for the user to leave the cursor sitting at some location, and have
Emacs offer all possible completions, or documentation, based on the
context. That means it would search only the tag tables specified by
include statements.
This capability would enable (in effect) pre-compiled headers or
header parsing state which could (as you suggest) feed into the
active lexical keyword table, or symbol->keyword translation.
Unfortunately this is probably a ways off, and is related somewhat to
the semanticdb-find API I had started.
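As a sketch of what that symbol->keyword translation might look like,
here is a minimal C version of the lexer-side lookup. It assumes a flat
table of typedef names that some earlier header parse has filled in. The
names `typedef_table', `typedef_table_add' and `classify_identifier' are
hypothetical, and the table ignores scope, so it would not get a case
like `point point;' right.

```c
#include <string.h>

enum token_kind { TOK_IDENTIFIER, TOK_TYPEDEF_NAME };

#define MAX_TYPEDEFS 128

/* Hypothetical table of names known to be typedefs, e.g. collected
   from the tag tables of included headers. */
struct typedef_table {
    const char *names[MAX_TYPEDEFS];
    size_t count;
};

/* Record NAME as a typedef.  Returns 0 if the table is full. */
static int typedef_table_add(struct typedef_table *t, const char *name)
{
    if (t->count >= MAX_TYPEDEFS)
        return 0;
    t->names[t->count++] = name;
    return 1;
}

/* What the lexer would return for an identifier: a TYPEDEF_NAME
   terminal if the name is in the table, a plain IDENTIFIER if not. */
static enum token_kind classify_identifier(const struct typedef_table *t,
                                           const char *name)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return TOK_TYPEDEF_NAME;
    return TOK_IDENTIFIER;
}
```

With an empty table every symbol lexes as a plain identifier, which is
the situation the parser is in today when the typedef lives in a header
it has not seen.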
For the short term, I suspect the best tactic is to simply let your
parser accept all code that "might" be correct. c.by has a rule
(paraphrased) like this:
  variable: typesimple name opt-defaultvalue
          ;
  typesimple: built-in-type
            | struct-or-union
            | symbol
            ;
which simply assumes that in the right place (where a variable might
appear) the programmer knows what they're doing and the symbol is
probably a valid type.
When real type analysis is available, said symbol can be examined and
semantic can identify the problem. It should still accept the code,
but underline it to indicate a problem.
Such actions, even in the presence of known invalid code, will make
the parser more robust, and provide better and more useful
diagnostics while editing.
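Here is a small C sketch of that "accept now, flag later" tactic. It is
not how c.by works internally. It handles only the `symbol name ;' case
(no default value, no struct-or-union), and the names `var_tag',
`read_ident', `is_known_type' and `parse_variable' are invented for the
example. It also assumes it is only called where a declaration may
appear, since it does not exclude keywords.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical tag for one variable declaration. */
struct var_tag {
    char type[64];
    char name[64];
    int type_known;     /* 0: accepted anyway, but worth underlining */
};

/* Skip whitespace and copy one C identifier into OUT.
   Returns a pointer just past it, or NULL if none is found. */
static const char *read_ident(const char *p, char *out, size_t outsz)
{
    size_t n = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (!(isalpha((unsigned char)*p) || *p == '_'))
        return NULL;
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (n + 1 >= outsz)
            return NULL;        /* identifier too long for the buffer */
        out[n++] = *p++;
    }
    out[n] = '\0';
    return p;
}

static int is_known_type(const char *name,
                         const char *const *known, size_t nknown)
{
    size_t i;

    for (i = 0; i < nknown; i++)
        if (strcmp(name, known[i]) == 0)
            return 1;
    return 0;
}

/* variable: symbol name ';'
   Any symbol is accepted in type position.  Whether it is a type we
   know about is recorded separately, so a later pass can underline it
   without the parse having failed. */
static int parse_variable(const char *text,
                          const char *const *known, size_t nknown,
                          struct var_tag *tag)
{
    const char *p = read_ident(text, tag->type, sizeof tag->type);

    if (p == NULL)
        return 0;
    p = read_ident(p, tag->name, sizeof tag->name);
    if (p == NULL)
        return 0;
    while (isspace((unsigned char)*p))
        p++;
    if (*p != ';')
        return 0;
    tag->type_known = is_known_type(tag->type, known, nknown);
    return 1;
}
```

Given `EMACS_INT undo_limit;' and a known-type list that lacks
EMACS_INT, this still yields a tag for undo_limit, with `type_known'
left at 0 so the editor can mark it.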
Thanks!
Eric
--
Eric Ludlam: zappo@..., eric@...
Home: http://www.ludlam.net Siege: http://www.siege-engine.com
Emacs: http://cedet.sourceforge.net GNU: http://www.gnu.org