[CEDET-devel] What I am doing ;-)
From: David P. <dav...@wa...> - 2003-06-20 14:42:41
Hi All,

It seems that our hacking on the CEDET code has been very light these days. I suppose we are all particularly busy with other tasks ;-)

However, I checked in changes to fix some indentation problems in semantic-grammar-mode, and to autoload the semanticdb top-level search routines (which fixes errors in senator completion when semanticdb mode is enabled).

Here is the change log:

2003-06-20  David Ponce  <da...@dp...>

    * semantic/semanticdb-find.el (semanticdb-find-tags-by-name)
    (semanticdb-find-tags-by-name-regexp)
    (semanticdb-find-tags-for-completion)
    (semanticdb-deep-find-tags-by-name)
    (semanticdb-deep-find-tags-by-name-regexp)
    (semanticdb-deep-find-tags-for-completion)
    (semanticdb-find-tags-external-children-of-type): Add autoload
    cookies.

    * semantic/semantic-grammar.el
    (semantic-grammar-goto-grammar-indent-anchor): In certain cases
    `forward-sexp' ignore important punctuations.  Use
    `skip-syntax-backward' instead to skip punctuations.

2003-06-19  David Ponce  <da...@dp...>

    * semantic/semantic-grammar.el (semantic-grammar-mode): Set buffer
    local value of `parse-sexp-ignore-comments' to non-nil to fix
    indentation problems when there are unbalanced parenthesis in
    comments.
    (semantic-grammar-goto-grammar-indent-anchor): Consider %prec like
    another symbol in rule.
    (semantic-grammar-grammar-compute-indentation): Don't consider
    %prec like other percent keywords that are aligned at beginning of
    line.

I also started to work on a LALR grammar for C, and I must admit that the task is not easy. I read a lot of threads on the subject and discovered that the C grammar contains some nasty ambiguities that make it difficult to express as a LALR grammar :-(

The main problem I encountered is that a C identifier can be interpreted either as a typedef name or as an ordinary identifier, depending on context. A simple example will show what I mean:

    typedef struct {int x; int y;} point;
    point point;

In the second line, the first occurrence of `point' is a typedef name, and the second occurrence is an ordinary identifier!
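To make the ambiguity concrete, here is a minimal sketch of a token classifier that remembers typedef names and uses them to tag identifiers. It is written in Python purely for illustration and is not part of any CEDET code. The function name `classify`, the token kinds, and the tiny keyword set are all hypothetical, and the sketch deliberately ignores scopes and the preprocessor.

```python
import re

# Tiny token pattern: identifiers/keywords and the few punctuators used here.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|[{};]")
KEYWORDS = {"typedef", "struct", "int", "char"}  # illustrative subset only


def classify(source, typedefs=None):
    """Tag each token of `source`, a string of very simple C declarations.

    `typedefs` is the set of typedef names known so far; it is updated in
    place. Scopes and the preprocessor are deliberately ignored.
    """
    typedefs = set() if typedefs is None else typedefs
    tagged = []
    in_typedef = False  # the current declaration began with `typedef`
    seen_type = False   # a type specifier has already been read
    outer = []          # saved `in_typedef` flags while inside a struct body

    for tok in TOKEN_RE.findall(source):
        if tok == "typedef":
            in_typedef = True
            tagged.append(("KEYWORD", tok))
        elif tok in KEYWORDS:
            seen_type = True
            tagged.append(("KEYWORD", tok))
        elif tok == "{":
            # Entering a struct body: member declarations start afresh.
            outer.append(in_typedef)
            in_typedef, seen_type = False, False
            tagged.append(("PUNCT", tok))
        elif tok == "}":
            # The struct specifier is complete and counts as the type.
            in_typedef, seen_type = outer.pop(), True
            tagged.append(("PUNCT", tok))
        elif tok == ";":
            in_typedef, seen_type = False, False
            tagged.append(("PUNCT", tok))
        elif tok in typedefs and not seen_type:
            # A known typedef name in type position.
            seen_type = True
            tagged.append(("TYPEDEF_NAME", tok))
        else:
            # A declarator; record it when the declaration is a typedef.
            if in_typedef and seen_type:
                typedefs.add(tok)
            tagged.append(("IDENTIFIER", tok))
    return tagged


if __name__ == "__main__":
    code = "typedef struct {int x; int y;} point; point point;"
    for kind, text in classify(code):
        print(kind, text)
```

In this sketch the same spelling `point' gets a different tag depending on whether a type specifier has already been read in the current declaration. A name that was never recorded in the table is always tagged as an ordinary identifier.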
In almost all the implementations I studied, such ambiguities are resolved by the lexer, which returns an IDENTIFIER as a TYPEDEF_NAME terminal when that IDENTIFIER has previously been declared as a typedef. That requires maintaining a table of declared C symbols that takes the scope of declarations into account. Unfortunately, that can work only if all preprocessor directives have already been expanded by a preprocessing pass. This is not a problem for normal compilation. But it is an issue for Semantic, which just parses the source as it is to obtain declaration tags. And how would incremental re-parsing be possible?

Attached you will find a tarball of what I managed to do:

- A C-tags.wy grammar hacked from the LALR grammar supplied to the community by James A. Roskind and available at <http://www.empathy.com/pccts/roskind.html>.

- A quickly hacked wisent-c.el that makes it possible to try the grammar.

So far I have had only very limited success parsing the code in:

- test.c, a very basic example.

I also had to add a hook to `wisent-parse' to be able to do some context initialization before starting the parser. See:

- wisent.el.patch. Would it perhaps make more sense to have that hook called from `semantic-parse-stream'?

The parser fails in many cases, particularly when it encounters a declaration like:

    EMACS_INT undo_limit;

and there is no typedef for EMACS_INT, or the typedef is [probably] in an included header.

After doing all that, I suspect that using a true C grammar is probably not the right direction for Semantic, and that we will have to hack a specific LALR grammar from scratch :(

Perhaps using wisent to parse C (things are even worse with C++!) is the wrong choice?

Any thoughts, help, or improvements to my work will be very welcome ;-)

Sincerely,
David