From: Stefan S. <se...@sy...> - 2004-10-26 04:19:35
Hi there,

as I'm reading through the literature for inspiration for the actual design tasks (http://synopsis.fresco.org/issues/task), I'm putting together a list of relevant articles and other documents that are of potential use in this context. I've put the page tentatively at http://synopsis.fresco.org/reading.html.

One thing that becomes more and more clear is the workflow from low-level syntactic data to high-level semantic data. During processing, a number of 'intermediate representations' of the code may be generated. I've put a temporary graphic of the module view as I currently see it at http://synopsis.fresco.org/cxx-parser.png. Here are some remarks:

The Parse Tree (*PTree*) is what the C++ parser currently generates. It is based on an external buffer of characters, and the API provides the means to replace a subgraph with a generated tree, or to insert or add an entirely new one.

The Abstract Syntax Tree (*AST*) is an optional refinement that reduces the API by giving direct access to the typed subnodes (an 'if' statement contains a 'condition' expression and a statement, etc.) and hides the purely syntactic tokens such as ';', '{', '}', etc. At this point it is not clear whether the AST deserves a module (namespace) of its own, or whether the API can simply be added to the existing PTree classes.

The *SymbolTable* is what we are currently working on. It stores type and other information about declared identifiers. Some of this information is required to disambiguate certain C++ expressions, so the data structure has to be constructed during parsing.

The Abstract Semantic Graph (*ASG*) is a structure that is built from the AST (or PTree), but where identifiers are cross-referenced to their respective declarations (thus it's a graph, not a tree).
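
To make the distinction between these representations concrete, here is a minimal Python sketch for the toy statement 'if (count) count;'. All class and function names in it are hypothetical and chosen only for illustration; they are not the PTree or Synopsis API, and the real symbol table would be filled during parsing rather than by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical names throughout; this is not the PTree or Synopsis API.


@dataclass
class PTreeNode:
    """Parse-tree node: keeps every token, including pure punctuation."""
    children: List[Union["PTreeNode", str]]


@dataclass
class Declaration:
    """What the symbol table records for a declared identifier."""
    name: str
    type: str


class SymbolTable:
    """Maps identifiers to their declarations."""

    def __init__(self) -> None:
        self._decls: Dict[str, Declaration] = {}

    def declare(self, decl: Declaration) -> None:
        self._decls[decl.name] = decl

    def lookup(self, name: str) -> Declaration:
        return self._decls[name]


@dataclass
class Name:
    """AST leaf: an identifier that is not yet resolved."""
    ident: str


@dataclass
class IfStatement:
    """AST node: typed sub-nodes only, no 'if', '(', ')' or ';' tokens."""
    condition: Name
    body: Name


@dataclass
class Reference:
    """ASG node: points at the declaration object itself."""
    declaration: Declaration


@dataclass
class Conditional:
    """ASG counterpart of the 'if' statement."""
    condition: Reference
    body: Reference


def to_ast(node: PTreeNode) -> IfStatement:
    # Toy layout of the parse tree: 'if' '(' condition ')' statement
    _if, _lparen, cond, _rparen, stmt = node.children
    return IfStatement(condition=Name(cond), body=Name(stmt.children[0]))


def to_asg(ast: IfStatement, symbols: SymbolTable) -> Conditional:
    # Traverse the AST and replace each name by a link to its declaration.
    return Conditional(
        condition=Reference(symbols.lookup(ast.condition.ident)),
        body=Reference(symbols.lookup(ast.body.ident)),
    )


# In this sketch the table is filled by hand (as if 'int count;' was seen).
symbols = SymbolTable()
symbols.declare(Declaration(name="count", type="int"))

ptree = PTreeNode(["if", "(", "count", ")", PTreeNode(["count", ";"])])
ast = to_ast(ptree)
asg = to_asg(ast, symbols)

# Both references share one declaration object, so the result is a graph.
print(asg.condition.declaration is asg.body.declaration)
```

The parse tree still carries '(' and ';', the AST exposes only 'condition' and 'body', and in the ASG two nodes point at the same declaration, which is the cross-referencing that makes it a graph.
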
Also, at this level of abstraction there is very little connection to the syntax, so the type hierarchy can be simplified (compared to the ugly mix of type, identifier, and other tokens that C++ declarations are usually made of). The ASG is constructed by traversing the AST while looking up symbols in the SymbolTable. This is done after parsing has finished. Note that Synopsis already contains an ASG, though it confusingly calls it 'AST'.

Various inspection tools have different requirements for the APIs they operate on. A code-manipulating or code-generating tool will likely operate on the PTree / AST level, while document generation and code navigation are based on the ASG.

I'm thus considering renaming the Synopsis.AST module to 'Synopsis.ASG'. 'Synopsis.AST' will then be a Python wrapper around the C++ PTree module, or around the AST if one gets defined. The current Synopsis processing pipeline will be an 'ASG processor pipeline', with an equivalent facility being defined for ASTs (for code-to-code translation and other metaprogramming tasks).

Comments, suggestions, and criticism are, as always, highly appreciated!

Regards,
Stefan