[Opencxx-users] refining the Synopsis APIs

hi there,

While reading through the literature for inspiration
on the actual design tasks (http://synopsis.fresco.org/issues/task),
I'm putting together a list of relevant articles and
other documents that may be useful in this context.
I've put the page tentatively at http://synopsis.fresco.org/reading.html.

One thing that is becoming more and more clear is the workflow from
low-level syntactic data to high-level semantic data. During
processing, a number of 'intermediate representations' of
the code may be generated.
I've put a temporary graphic of the module view as I currently see it
at http://synopsis.fresco.org/cxx-parser.png. Here are some remarks:

The Parse Tree (*PTree*) is what the C++ parser currently generates.
It is based on an external buffer of characters, and the API provides
the means to replace a subtree with a generated tree, or to insert or
append an entirely new one.
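
To illustrate the idea of a tree over an external buffer, here is a minimal
sketch in Python. It is not the actual PTree API; the names Node, unparse, and
replace_child are made up for this example. Leaves either point into the
buffer or carry generated text, so a subtree can be swapped without touching
the buffer itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical parse-tree node (not the real PTree class).

    A leaf either refers to buffer[start:end] or carries generated text;
    an inner node only holds children.
    """
    children: List["Node"] = field(default_factory=list)
    start: int = 0
    end: int = 0
    text: Optional[str] = None  # set only for generated leaves


def unparse(node: Node, buffer: str) -> str:
    """Turn a tree back into source text."""
    if not node.children:
        return node.text if node.text is not None else buffer[node.start:node.end]
    return "".join(unparse(child, buffer) for child in node.children)


def replace_child(parent: Node, old: Node, new: Node) -> None:
    """Replace one subtree of 'parent' by another (compared by identity)."""
    index = next(i for i, child in enumerate(parent.children) if child is old)
    parent.children[index] = new


buffer = "x = 1;"
lhs = Node(start=0, end=1)    # "x"
op = Node(start=1, end=4)     # " = "
rhs = Node(start=4, end=5)    # "1"
semi = Node(start=5, end=6)   # ";"
stmt = Node(children=[lhs, op, rhs, semi])

# Swap the original right-hand side for a generated leaf.
replace_child(stmt, rhs, Node(text="42"))
result = unparse(stmt, buffer)
```

The buffer stays unchanged; only the tree decides which text ends up in the
output.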

The Abstract Syntax Tree (*AST*) is an optional refinement that reduces
the API by giving direct access to the typed subnodes (an 'if' statement
contains a 'condition' expression and a statement, etc.) and hides the
purely syntactic tokens such as ';', '{', '}', etc.
At this point it is not clear whether the AST deserves a module (namespace)
of its own, or whether the API can simply be added to the existing PTree
classes.
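
As a rough sketch of what such a refinement could look like (Python, with
made-up names; the child layout of the 'if' node is an assumption, not what
the parser actually produces), a typed view can sit on top of a generic
parse-tree node and hide the purely syntactic tokens:

```python
class PTreeNode:
    """Hypothetical generic parse-tree node: a kind plus children or a token."""

    def __init__(self, kind, children=(), token=None):
        self.kind = kind
        self.children = list(children)
        self.token = token


class IfStatement:
    """Typed view over an 'if' node.

    Assumed child layout: ['if', '(', condition, ')', then-statement].
    """

    def __init__(self, ptree):
        if ptree.kind != "if_statement":
            raise ValueError("not an if statement: " + ptree.kind)
        self._ptree = ptree

    @property
    def condition(self):
        return self._ptree.children[2]

    @property
    def then(self):
        return self._ptree.children[4]


cond = PTreeNode("expression", token="x > 0")
body = PTreeNode("statement", token="return x;")
raw = PTreeNode("if_statement", [
    PTreeNode("keyword", token="if"),
    PTreeNode("punct", token="("),
    cond,
    PTreeNode("punct", token=")"),
    body,
])

if_stmt = IfStatement(raw)
```

Here if_stmt.condition and if_stmt.then give direct access to the two
interesting subnodes, while the wrapper still refers to the same underlying
tree. Whether this lives in its own namespace or as methods on the PTree
classes is exactly the open question.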

The *SymbolTable* is what we are currently working on. It stores type and
other information about declared identifiers. Some of this information
is required to disambiguate certain C++ expressions, and so the construction
of the data structure has to be done during the parsing.
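
A well-known case is the statement 'a * b;', which declares a pointer b if a
names a type and is a multiplication otherwise. The following minimal sketch
(Python, hypothetical names, not the SymbolTable we are actually designing)
shows how a scoped table lets the parser decide:

```python
class SymbolTable:
    """Hypothetical scoped symbol table mapping names to a kind string."""

    def __init__(self):
        self._scopes = [{}]  # innermost scope is last

    def enter_scope(self):
        self._scopes.append({})

    def leave_scope(self):
        self._scopes.pop()

    def declare(self, name, kind):
        self._scopes[-1][name] = kind

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return None


def classify(table, first_identifier):
    """Classify a statement of the form 'a * b;' by looking up 'a'."""
    if table.lookup(first_identifier) == "type":
        return "declaration"
    return "expression"


table = SymbolTable()
table.declare("T", "type")
table.declare("x", "variable")

outer_T = classify(table, "T")   # T names a type here
outer_x = classify(table, "x")

table.enter_scope()
table.declare("T", "variable")   # a local variable shadows the type
inner_T = classify(table, "T")
table.leave_scope()
```

Because the answer depends on declarations seen so far, including shadowing
in nested scopes, the table has to be filled while parsing and cannot be
built in a later pass.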

The Abstract Semantic Graph (*ASG*) is a structure that is built from
the AST (or PTree), but where identifiers are cross-referenced to their
respective declarations (thus it's a graph, not a tree). Also, at this
level of abstraction there is very little connection to the syntax, so
the type hierarchy can be simplified (compared to the ugly mix of type,
identifier, and other tokens that C++ declarations usually are made of).
Construction of the ASG is done by traversing the AST while looking up
symbols in the SymbolTable. It is done after the parsing has finished.
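
A minimal sketch of that second pass, in Python and with made-up names
(Declaration, Reference, build_references are not Synopsis classes, and the
symbol table is reduced to a plain dict): each identifier use is linked to
its declaration, so several references end up sharing one node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Declaration:
    name: str
    type_name: str
    references: List["Reference"] = field(default_factory=list)


@dataclass(eq=False)
class Reference:
    name: str
    declaration: Optional[Declaration] = None  # None if unresolved


def build_references(uses: List[str],
                     symbols: Dict[str, Declaration]) -> List[Reference]:
    """Walk the identifier uses and cross-reference them to declarations."""
    result = []
    for name in uses:
        ref = Reference(name, symbols.get(name))
        if ref.declaration is not None:
            ref.declaration.references.append(ref)
        result.append(ref)
    return result


# The symbol table was filled during parsing; here it is given directly.
symbols = {"count": Declaration("count", "int")}
refs = build_references(["count", "count", "missing"], symbols)
```

Both 'count' references point at the same Declaration object, which is what
makes the structure a graph rather than a tree.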

Note that Synopsis already contains an ASG, though confusingly calls it 'AST'.

Various inspection tools have different requirements for the APIs they operate
on. A code-manipulating or code-generating tool will likely operate at the
PTree / AST level, while document generation and code navigation are based on
the ASG.

I'm thus considering renaming the Synopsis.AST module to 'Synopsis.ASG'.
'Synopsis.AST' will be a Python wrapper around the C++ PTree module, or
around the AST, if one gets defined. The current Synopsis processing pipeline
will be an 'ASG processor pipeline', with an equivalent facility being defined
for ASTs (for code-to-code translation and other metaprogramming tasks).
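
To show what I mean by a processor pipeline in general terms, here is a small
Python sketch. It does not use the existing Synopsis processor classes; the
function names and the list-of-strings data model are assumptions made only
for illustration. The same composition would work for an AST pipeline with a
different data type flowing through it.

```python
from functools import reduce


def make_pipeline(*processors):
    """Chain processors so that each one receives the previous one's output."""
    def run(data):
        return reduce(lambda current, process: process(current), processors, data)
    return run


# Two toy 'ASG processors' operating on a list of declaration names.
def strip_private(declarations):
    return [d for d in declarations if not d.startswith("_")]


def sort_by_name(declarations):
    return sorted(declarations)


asg_pipeline = make_pipeline(strip_private, sort_by_name)
processed = asg_pipeline(["render", "_cache", "Parser"])
```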

Comments, suggestions, and criticism are, as always, highly appreciated!

Regards,
		Stefan