Thread: [CEDET-devel] What I am doing ;-)
From: David P. <dav...@wa...> - 2003-06-20 14:42:41
Attachments:
wisent-c.tar.gz
Hi All,

It seems that our hacking of the cedet code is very slight these days. I suppose we are all particularly busy with other tasks ;-)

However, I checked in changes to fix some indentation problems in semantic-grammar-mode, and to autoload the semanticdb top-level search routines (that fixes errors in senator completion when semanticdb mode is enabled). Here is the change log:

2003-06-20  David Ponce  <da...@dp...>

	* semantic/semanticdb-find.el (semanticdb-find-tags-by-name)
	(semanticdb-find-tags-by-name-regexp)
	(semanticdb-find-tags-for-completion)
	(semanticdb-deep-find-tags-by-name)
	(semanticdb-deep-find-tags-by-name-regexp)
	(semanticdb-deep-find-tags-for-completion)
	(semanticdb-find-tags-external-children-of-type): Add autoload
	cookies.

	* semantic/semantic-grammar.el
	(semantic-grammar-goto-grammar-indent-anchor): In certain cases
	`forward-sexp' ignores important punctuation.  Use
	`skip-syntax-backward' instead to skip punctuation.

2003-06-19  David Ponce  <da...@dp...>

	* semantic/semantic-grammar.el (semantic-grammar-mode): Set the
	buffer-local value of `parse-sexp-ignore-comments' to non-nil, to
	fix indentation problems when there are unbalanced parentheses in
	comments.
	(semantic-grammar-goto-grammar-indent-anchor): Consider %prec like
	any other symbol in a rule.
	(semantic-grammar-grammar-compute-indentation): Don't consider
	%prec like the other percent keywords that are aligned at the
	beginning of a line.

Also, I started to work on an LALR grammar for C, and I must admit that the task is not easy. I read a lot of threads on that subject and discovered that the C grammar contains some nasty ambiguities that make it difficult to express as LALR :-(

The main problem I encountered is that C identifiers can be interpreted as typedef names or as ordinary identifiers depending on context. A simple example will show what I mean:

typedef struct {int x; int y;} point;
point point;

The first occurrence of `point' is a typedef name, and the second occurrence an ordinary identifier!
In nearly all the implementations I studied, such ambiguities are resolved by the lexer, which returns an IDENTIFIER as a TYPEDEF_NAME terminal when that IDENTIFIER has previously been declared as a typedef. That requires maintaining a table of declared C symbols that takes the scope of declarations into account.

Unfortunately, that can only work if all preprocessor statements have previously been expanded by a first preprocessing pass. This is not a problem for normal compilation, but it is an issue for Semantic, which just parses the source as it is to obtain declaration tags. And how would it be possible to do incremental re-parsing?

Attached you will find a tarball of what I managed to do:

- A C-tags.wy grammar hacked from the LALR grammar supplied to the
  community by James A. Roskind and available at
  <http://www.empathy.com/pccts/roskind.html>.

- A quickly hacked wisent-c.el that makes it possible to try the
  grammar.

For now I have had only very limited success, parsing the code in:

- test.c, a very basic example.

I also had to add a hook to `wisent-parse' to be able to do some context initialization before starting the parser. See:

- wisent.el.patch; perhaps it would make more sense to have that hook
  called from `semantic-parse-stream'?

The parser fails in many cases, particularly when it encounters a declaration like:

EMACS_INT undo_limit;

and there is no typedef for EMACS_INT, or the typedef is [probably] in an included header.

After doing all that, I suspect that using a true C grammar is probably not the right direction for Semantic, and that we will have to hack a Semantic-specific LALR grammar from scratch :(

Perhaps using wisent to parse C (things are even worse with C++!) is the wrong choice?

Any thoughts, help, or improvements to my work will be very welcome ;-)

Sincerely,
David
From: Eric M. L. <er...@si...> - 2003-06-20 15:46:49
Hi,

I too have been busy. As my children get older, their bedtime gets later, which means less time for Emacs :(

I checked in a new file `semantic-sort' that pulls more entries out of semantic-util and renames `token' to `tag'. There is not much left in semantic-util, which means we are very close to done with the token->tag conversion. That could provide an opportunity for a beta.

>>> David PONCE <dav...@wa...> seems to think that:
>Hi All,
>
>It seems that our hacking of cedet code is very slight these days.
>I suppose we are all particularly busy at other tasks ;-)
>
>However, I checked in changes to fix some indentation problems in
>semantic-grammar-mode, and to autoload the semanticdb top-level
>search routines (that fixes errors in senator completion when
>semanticdb mode is enabled).

Thanks!

[ ... ]

>Also I started to work on an LALR grammar for C, and I must admit that
>the task is not easy. I read a lot of threads about that subject and
>discovered that the C grammar contains some nasty ambiguities that
>make it difficult to express as LALR :-(

That's great! (That a wisent C parser has been started.) See way down below for more:

>The main problem I encountered is that C identifiers can be
>interpreted as typedef names or ordinary identifiers depending on
>context.
>
>A simple example will show what I mean:
>
>typedef struct {int x; int y;} point;
>point point;
>
>The first occurrence of `point' is a typedef name, and the second
>occurrence an ordinary identifier!
>
>In nearly all implementations I studied, such ambiguities are resolved
>by the lexer, which returns an IDENTIFIER as a TYPEDEF_NAME terminal
>when that IDENTIFIER has previously been declared as a typedef.
>That requires maintaining a table of declared C symbols that takes
>the scope of declarations into account.
>
>Unfortunately, that could work only if all preprocessor statements
>have previously been expanded by a first preprocessing pass. This is
>not a problem for normal compilation.
>But this is an issue for
>Semantic, which just parses the source as it is to obtain declaration
>tags.
>
>And how would it be possible to do incremental re-parsing?
>
>Attached you will find a tarball of what I managed to do:
>
>- A C-tags.wy grammar hacked from the LALR grammar supplied to the
>  community by James A. Roskind and available at
>  <http://www.empathy.com/pccts/roskind.html>.
>
>- A quickly hacked wisent-c.el that makes it possible to try the
>  grammar.
>
>For now I have had only very limited success, parsing the code in:
>
>- test.c, a very basic example.
>
>I also had to add a hook to `wisent-parse' to be able to do some
>context initialization before starting the parser. See:
>
>- wisent.el.patch; perhaps it would make more sense to have that hook
>  called from `semantic-parse-stream'?
>
>The parser fails in many cases, particularly when it encounters a
>declaration like:
>
>EMACS_INT undo_limit;
>
>and there is no typedef for EMACS_INT, or the typedef is [probably]
>in an included header.
>
>After doing all that, I suspect that using a true C grammar is
>probably not the right direction for Semantic, and that we will have
>to hack a specific LALR grammar from scratch :(
>
>Perhaps using wisent to parse C (things are even worse with C++!) is
>the wrong choice?

[ ... ]

You have many good observations. When it comes to creating a simple tagging parser (as we did with c.by), many aspects of the language that pose problems have little comments that say "There could be an error, but assume it is right," meaning that something is syntactically correct but could be an error based on some wider context.

The biggest problem is custom macros like:

#define MYFUNC(name) int name (fancy arg, list here)

MYFUNC(blah)
{
  code;
}

which we currently ignore completely. Fortunately, C++'s complexity results in a reduction of such goofy macros. Anyway, one thought for overcoming those macros is to have a preprocessing step in Semantic.
It might call an actual preprocessor, and the semantic parser would track line-number pragmas. That would be a pain to do, though.

This leads me to the long-term plans for semantic. A goal of mine is for the user to leave the cursor sitting at some location, and have Emacs offer all possible completions, or documentation, based on the context, meaning it would search only the tag tables specified by include statements. This capability would enable (in effect) pre-compiled headers, or header parsing state which could (as you suggest) feed into the active lexical keyword table, or symbol->keyword translation. Unfortunately this is probably a ways off, and is related somewhat to the semanticdb-find API I had started.

For the short term, I suspect the best tactic is to simply let your parser accept all code that "might" be correct. c.by has a rule (paraphrased) like this:

variable : typesimple name opt-defaultvalue ;

typesimple : built-in-type
           | struct-or-union
           | symbol
           ;

which simply assumes that, in the right place (where a variable might appear), the programmer knows what they're doing and the symbol is probably a valid type. When real type analysis is available, said symbol can be examined; if Semantic identifies a problem, it should still accept the code but underline the symbol to indicate the problem. Such behavior, even in the presence of known-invalid code, will make the parser more robust and provide better, more useful diagnostics while editing.

Thanks!
Eric

--
Eric Ludlam:  za...@gn...,  er...@si...
Home:  http://www.ludlam.net    Siege:  www.siege-engine.com
Emacs: http://cedet.sourceforge.net    GNU:  www.gnu.org