From: Stefan S. <se...@sy...> - 2004-10-26 04:19:35
hi there,

as I'm reading through the literature for inspiration for the actual design tasks (http://synopsis.fresco.org/issues/task), I'm putting together a list of relevant articles and other documents that are of potential use in this context. I've put the page tentatively at http://synopsis.fresco.org/reading.html.

One thing that becomes more and more clear is the workflow from low-level syntactic data to high-level semantic data. During processing, a number of 'intermediate representations' of the code may be generated. I've put a temporary graphic of the module view as I currently see it at http://synopsis.fresco.org/cxx-parser.png.

Here are some remarks:

The Parse Tree (*PTree*) is what the C++ parser currently generates. It is based on an external buffer of characters, and the API provides the means to replace one subgraph with a generated tree (or to insert/add an entirely new one).

The Abstract Syntax Tree (*AST*) is an optional refinement that reduces the API by giving direct access to the typed subnodes (an 'if' statement contains a 'condition' expression and a statement, etc.) and hides the purely syntactic tokens such as ';', '{', '}', etc. At this point it is not clear whether the AST deserves a module (namespace) of its own, or whether the API can simply be added to the existing PTree classes.

The *SymbolTable* is what we are currently working on. It stores type and other information about declared identifiers. Some of this information is required to disambiguate certain C++ expressions, so the data structure has to be constructed during parsing.

The Abstract Semantic Graph (*ASG*) is a structure that is built from the AST (or PTree), but where identifiers are cross-referenced to their respective declarations (thus it's a graph, not a tree). Also, at this level of abstraction there is very little connection to the syntax, so the type hierarchy can be simplified (compared to the ugly mix of type, identifier, and other tokens that C++ declarations are usually made of). Construction of the ASG is done by traversing the AST while looking up symbols in the SymbolTable. It is done after the parsing has finished. Note that Synopsis already contains an ASG, though it confusingly calls it 'AST'.

Various inspection tools have different requirements for the APIs they operate on. A code-manipulating / code-generating tool will likely operate on the PTree / AST level, while document generation and code navigation are based on the ASG.

I'm thus considering renaming the Synopsis.AST module to 'Synopsis.ASG'. 'Synopsis.AST' will then be a python wrapper around the C++ PTree module, or around the AST, if that gets defined. The current synopsis processing pipeline will be an 'ASG processor pipeline', with an equivalent facility being defined for ASTs (for code-to-code translation and other metaprogramming tasks).

Comments, suggestions, and criticism are, as always, highly appreciated!

Regards,
Stefan
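To make the AST-to-ASG step described above a little more concrete, here is a minimal Python sketch of one way it could look. It is not taken from Synopsis: the names `AstNode`, `AsgNode` and `build_asg`, the node kinds, and the flat name-to-declaration dictionary standing in for the SymbolTable are all assumptions made for illustration (real C++ lookup needs nested scopes, overloads, and so on).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class AstNode:
    """Syntax-level node (hypothetical): a declaration, a reference, or a container."""
    kind: str                      # e.g. "unit", "decl", "ref" (illustrative kinds)
    name: str
    children: list[AstNode] = field(default_factory=list)


@dataclass(eq=False)
class AsgNode:
    """Semantic-level node; `target` points from a reference to its declaration."""
    kind: str
    name: str
    target: AsgNode | None = field(default=None, repr=False)
    children: list[AsgNode] = field(default_factory=list)


def build_asg(root: AstNode, symbols: dict[str, AstNode]) -> AsgNode:
    """Traverse the AST once, resolving each reference through the symbol table."""
    converted: dict[int, AsgNode] = {}   # id(AstNode) -> AsgNode, so each node is built once

    def convert(node: AstNode) -> AsgNode:
        if id(node) in converted:
            return converted[id(node)]
        out = AsgNode(node.kind, node.name)
        converted[id(node)] = out        # register before recursing, so cycles terminate
        if node.kind == "ref":
            decl = symbols.get(node.name)
            if decl is not None:
                out.target = convert(decl)   # the cross-link that makes this a graph
        out.children = [convert(child) for child in node.children]
        return out

    return convert(root)


# A tiny hand-built AST: a declaration of x, and a function f whose body uses x.
x_decl = AstNode("decl", "x")
f_decl = AstNode("decl", "f", [AstNode("ref", "x")])
unit = AstNode("unit", "<file>", [x_decl, f_decl])
symbols = {"x": x_decl, "f": f_decl}     # assumed to have been filled in during parsing

asg = build_asg(unit, symbols)
```

In the resulting structure the reference inside `f` and the first child of the unit share one `AsgNode` for `x`, which is exactly what stops the result from being a tree.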
From: Vladimir P. <gh...@cs...> - 2004-10-28 11:16:23
On Tuesday 26 October 2004 08:16, Stefan Seefeld wrote:

> The Abstract Semantic Graph (*ASG*) is a structure that is built from
> the AST (or PTree), but where identifiers are cross-referenced to their
> respective declarations (thus it's a graph, not a tree).

I think that the words "Abstract" and "Semantic" are both very confusing. Basically, your description of ASG says it's just a set of interlinked C++ objects. Semantic could mean anything, including operational semantics (how i+1 is processed and so on). I'd suggest naming this module just "Objects" or "Entities", to avoid further confusion.

> Also, at this
> level of abstraction there is very little connection to the syntax, so
> the type hierarchy can be simplified (compared to the ugly mix of type,
> identifier, and other tokens that C++ declarations usually are made of).
> Construction of the ASG is done by traversing the AST while looking up
> symbols in the SymbolTable. It is done after the parsing has finished.

Traversing AST or PTree? The difference between the two is confusing as well. If PTree is just a grouping of input tokens into blocks (class definitions, functions), and "Entities" are high-level classes and functions (no trace of syntax), then there's very little room left for any other layer.

I'm still not sure that this representation can be constructed after parsing. We have discussed that you need to instantiate templates during parsing, and it's not clear whether the PTree representation is rich enough to handle such functionality.

In fact, now that you've sketched the diagram and explained the layers, I think we need to agree on what PTree does. For example:

PTree -- just a grouping of tokens, with as little semantic information as possible.

Entities -- rich semantic information, including logic for instantiating templates, template argument deduction and so on.

Given that, it's not at all clear that Entities are constructed after parsing. Given an appropriate design of "Processor", the steps can be interleaved.

- Volodya
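As a rough illustration of what interleaving could mean here, the following Python toy registers entities while statements are being parsed and immediately uses them to resolve the classic `a * b;` ambiguity (pointer declaration versus multiplication). Everything in it is hypothetical: the `parse` function, the whitespace-separated statement format, and the `entities` dictionary are stand-ins, not part of Synopsis or of any proposed "Processor" design.

```python
def parse(statements):
    """Classify toy statements, recording each entity as soon as it is parsed."""
    entities = {}   # name -> "type" or "variable" (hypothetical entity store)
    result = []
    for stmt in statements:
        tokens = stmt.rstrip(";").split()
        if len(tokens) == 2 and tokens[0] == "struct":
            # A type becomes known right away, not after the whole file is parsed.
            entities[tokens[1]] = "type"
            result.append(("type-declaration", tokens[1]))
        elif len(tokens) == 2 and tokens[0] == "int":
            entities[tokens[1]] = "variable"
            result.append(("variable-declaration", tokens[1]))
        elif len(tokens) == 3 and tokens[1] == "*":
            left, _, right = tokens
            if entities.get(left) == "type":
                # "T * p;" declares p only because T is already known to be a type.
                entities[right] = "variable"
                result.append(("pointer-declaration", right))
            else:
                result.append(("multiplication", left, right))
        else:
            result.append(("unknown", stmt))
    return result


parsed = parse(["struct T;", "T * p;", "int a;", "int b;", "a * b;"])
```

The two `x * y;` statements have identical token shapes; only the entity information gathered earlier in the same pass tells them apart, which is the kind of dependency that makes a strict "parse first, build entities afterwards" ordering hard to maintain.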
From: David A. <da...@bo...> - 2004-10-28 15:32:26
Vladimir Prus <gh...@cs...> writes:

> On Tuesday 26 October 2004 08:16, Stefan Seefeld wrote:
>
>> The Abstract Semantic Graph (*ASG*) is a structure that is built from
>> the AST (or PTree), but where identifiers are cross-referenced to their
>> respective declarations (thus it's a graph, not a tree).
>
> I think that "Abstract" and "Semantic" words are all very confusing.
> Basically, your description of ASG says it's just a set of interlinked C++
> objects. Semantic could mean anything, including operational semantic (how
> i+1 is processed and so on). I'd suggest to name this module just "Objects",
> or "Entities", to avoid further confusion.

Also, the term AST has a well-known meaning which, I'm pretty sure, does not constrain it to denote pure trees, nor pure syntax. Choosing a different meaning in the name of "clarity" might be a bad idea.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com