Re[2]: [CEDET-devel] wisent-c news

Hi,

>>> David Ponce <dav...@wa...> seems to think that:
>Hi Eric,
>
> >   This looks like some great stuff!
>
>Thanks!
>
> > I will have to upgrade my parser knowledge in order to participate
> > properly.  Do you think your use of abstract syntax trees will be
> > necessary in many grammars?  Better yet, would they simplify some
> > existing grammars like the java and python grammars?
>
>I used a sort of AST in the C grammar to deal with the high level of
>generality of C declarations, which can appear both at "top level" and
>inside other declarations (as in function parameter lists).
>
>Using an AST makes it possible to keep useful notions in a convenient
>data structure, and simplifies access to this information at a higher
>level to produce semantic tags.
>
>Maybe using an AST could simplify other grammars too.  I don't know.
>I have to look more thoroughly at the grammar code to see if it is
>worth using an AST.
>
> > If so, is it possible to tie it up in a formalized way so less
> > wisent-ast programming is needed in the grammars?
>
>I am afraid I am not sure I understand what you would like to achieve
>here.  IMO wisent-ast programming is already very simple and mainly
>consists of these actions:
>
>- add/replace a node value.
>- merge two ASTs to obtain a new AST containing the union of nodes.
>- retrieve value of nodes.
>
>Perhaps wisent-ast could become a more general semantic-ast library,
>as there is no wisent-specific code there?  Even better, we could use
>grammar built-in macros to completely hide the implementation, as is
>already done with tags.  Something like this:
>
>- AST-ADD
>- AST-PUT
>- AST-GET
>- AST-MERGE
>- etc.

This is indeed what I was thinking, and I share your feeling that the
topic needs additional consideration.
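
To make the three actions quoted above concrete, here is a minimal
sketch in Emacs Lisp.  It assumes an AST is simply a property list, as
in the example ASTs shown elsewhere in this thread.  The `my-ast-*'
names are hypothetical: they are neither the wisent-ast API nor the
proposed AST-* grammar macros.

```elisp
;; Hypothetical helpers; not part of wisent-ast or semantic.
;; An AST is modeled as a property list: (:node VALUE :node VALUE ...).

(defun my-ast-get (ast node)
  "Return the value stored under NODE in AST, or nil if absent."
  (plist-get ast node))

(defun my-ast-put (ast node value)
  "Return a copy of AST in which NODE is set to VALUE."
  (plist-put (copy-sequence ast) node value))

(defun my-ast-add (ast node value)
  "Return a copy of AST with VALUE appended to the list under NODE."
  (my-ast-put ast node (append (my-ast-get ast node) (list value))))

(defun my-ast-merge (ast1 ast2)
  "Return a new AST holding the union of the nodes of AST1 and AST2.
When both define the same node, the value from AST2 wins."
  (let ((result (copy-sequence ast1)))
    (while ast2
      (setq result (plist-put result (car ast2) (cadr ast2))
            ast2 (cddr ast2)))
    result))

;; Example:
;; (my-ast-merge '(:id ("j")) '(:pointer ("*")))
;;   => (:id ("j") :pointer ("*"))
```

Returning copies instead of modifying the argument keeps the helpers
safe to call on quoted constants; whether the real library should work
that way is an open design choice.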

> > I do not fully understand the specifics of how it is being utilized
> > yet, but it seems similar to what the .by C parser does to compound
> > lots of data into a tag that is formalized later in the post parser
> > expand function.
>
>One of the difficulties with the C grammar is that a declaration is a
>very high-level thing that can correspond to a type declaration, a
>variable declaration, a function declaration (prototype) or
>definition.  Another difficulty is that some of the information needed
>to produce tags is obtained by deeply nested rules.
>
>Using an AST here makes it possible to store the useful information
>that will be used in a later step to decide what kind of declaration
>tag to produce.  That can be decided in a higher-level rule's semantic
>action for simple cases, or at tag expansion for more complex
>declarations like this one:
>
>int i = 0, *const j = 1, fprintf(HANDLE hnd, char * msg, ...);
>
>The above declaration is in fact equivalent to three declarations:
>
>int i = 0;
>int *const j = 1;
>int fprintf(HANDLE hnd, char * msg, ...);
>
>In such cases `wisent-c-expand-tag' receives a variable tag whose
>name is a list of three ASTs, one for each compound item, each
>containing the sub-declaration part of that item.  For the above
>example the list of ASTs will be something like:
>
>(
>  (:id ("i"))
>  (:id ("j") :specifiers ("const") :pointer ("*"))
>  (:id ("fprintf") :parms (the list of parameter tags))
>  )
>
>In the end, that will be expanded to three tags:
>
>("i" variable :type "int" ...)
>("j" variable :type "int" ((typemodifiers const)) ...)
>("fprintf" function :type "int"
>            :arguments (the list of parameter tags)
>            ...)

Yes, this is what the .by grammar must do as well, so it is being
used for the same purpose as what I was using the "name" slot for.
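
The expansion step described above could be sketched roughly as
follows.  This is only an illustration under stated assumptions: the
function names are hypothetical, the tags use the simplified flat
shape from the quoted example and not the real semantic tag layout, and
the rule "an AST with a :parms node is a function" is my guess at the
simplest possible test, not what `wisent-c-expand-tag' actually does.

```elisp
;; Hypothetical sketch; not the code of `wisent-c-expand-tag'.

(defun my-expand-one (type ast)
  "Return a simplified tag for AST, using TYPE as the base type.
An AST with a :parms node yields a function tag; any other AST
yields a variable tag."
  (let ((name (car (plist-get ast :id)))
        (specifiers (plist-get ast :specifiers))
        (pointer (plist-get ast :pointer)))
    (if (plist-member ast :parms)
        (list name 'function :type type
              :arguments (plist-get ast :parms))
      ;; Only include the optional attributes that are present.
      (append (list name 'variable :type type)
              (and specifiers (list :typemodifiers specifiers))
              (and pointer (list :pointer pointer))))))

(defun my-expand-declaration (type asts)
  "Return one simplified tag per AST in ASTS, all sharing TYPE."
  (mapcar (lambda (ast) (my-expand-one type ast)) asts))

;; Example, with strings standing in for the parameter tags:
;; (my-expand-declaration
;;  "int"
;;  '((:id ("i"))
;;    (:id ("j") :specifiers ("const") :pointer ("*"))
;;    (:id ("fprintf") :parms ("hnd" "msg"))))
;;   => (("i" variable :type "int")
;;       ("j" variable :type "int" :typemodifiers ("const") :pointer ("*"))
;;       ("fprintf" function :type "int" :arguments ("hnd" "msg")))
```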

  [ ... ]
> >   I would like to encourage you to check in your "experimental"
> > changes.  I'm sure your new parser will grow to completely
> > replace the old one in time.
> >
> >   It might also be nice if the long comment describing the
> > contributing grammar were in a separate text file.  It might help
> > keep it pristine as changes shake the working grammar.
>
>Maybe it would be worth using a texinfo file?

Whatever you think can be best leveraged to help new users.

Perhaps our doc directory should have a main index, like the Emacs
"dir" file, and it could contain a link to the C texinfo file.

> >   Do you think the wisent directory is a bit cluttered yet?
> > Perhaps it is time to update the directory tree a bit:
> >
> > cedet/semantic/bovine
> > 	      /doc
> > 	      /parsers/c
> > 		      /java
> > 		      /python
> > 		      /...
> > 	      /tests
> > 	      /wisent
> >
> >   That might make the build process and load path a bit tricky, but
> > having test files next to the parser would make things neater.  We
> > could put both the .by and .wy grammars in the same directory.
>
>That's a nice idea!  I would prefer to call the "parsers" directory:
>"languages", or something like that, to avoid confusion between true
>parsers and support stuff needed to parse a particular language.
>
>What do you think of this directory organization?
>
>cedet/semantic
>          /core (?)
>          /doc
>          /languages
>              /grammars (?)
>              /C
>              /java
>              /python
>              /...
>          /parsers
>              /bovine
>              /wisent
>          /util (?)
>
>The "main" directory would contain all the core stuff.
>
>Perhaps it would make sense to have a "core" subdirectory for core and
>db libraries, and a "util" subdirectory for applications (completion,
>intellisense, etc.).  The main directory would only contain project
>files like NEWS, INSTALL, README, ChangeLog, Makefile, project.ede,
>semantic-load.el, semantic-al.el, etc.

In the existing project files, I have a "semantic" and "tools" target
as examples of some groups.  I often see this for a package called
NAME:

NAME/src
NAME/othersubdir

so perhaps src would be more logical than core? 

>The "parsers" subdirectory would contain the parser engines and
>associated support libraries.  Common parser stuff would be directly
>in that directory.  Each specific parser would be in its own
>subdirectory.

This sounds ok to me.  As there are only 2 parsers, I don't mind them
at the top level either.

>The "languages" subdirectory would contain language support stuff.
>Common things would be directly in that directory, and each language's
>specific stuff (including tests) would be in its own subdirectory.
>The grammar framework could be in a "grammars" subdirectory or
>directly under the "languages" directory.

Languages is indeed a better word to use.  The Emacs parallel is
"modes", which doesn't really fit our model very well. ;)
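
On the load path concern raised earlier: with a deeper tree, a loader
file would need to add each subdirectory explicitly.  A minimal
sketch, assuming the directory names from the proposed layout; the
function name and the root path in the example are hypothetical, and
this is not what semantic-load.el currently does.

```elisp
;; Hypothetical helper; not part of semantic-load.el.

(defun my-semantic-add-subdirs (root subdirs)
  "Add each existing directory in SUBDIRS, relative to ROOT, to `load-path'.
Return the list of directories that were added, in order."
  (let (added)
    (dolist (sub subdirs (nreverse added))
      (let ((dir (expand-file-name sub root)))
        ;; Skip names that do not exist, so a partial checkout still loads.
        (when (file-directory-p dir)
          (add-to-list 'load-path dir)
          (push dir added))))))

;; Example (the root path is an assumption):
;; (my-semantic-add-subdirs
;;  "~/cedet/semantic"
;;  '("parsers/bovine" "parsers/wisent"
;;    "languages/C" "languages/java" "languages/python"))
```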

Thanks!
Eric

-- 
          Eric Ludlam:                 za...@gn..., er...@si...
   Home: http://www.ludlam.net            Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net               GNU: www.gnu.org