From: Stefan S. <se...@sy...> - 2004-06-01 04:14:06
|
Hi there,

while Grzegorz is still struggling with the merge of some refactoring we have been working on, I'd like to discuss some ideas about opencxx's evolution.

I'd very much appreciate comments from people using opencxx right now, or being tempted to use it, so we can understand better what opencxx is already good at, and what it would be useful to work on.

I've started to use opencxx myself as a C++ parser backend for my synopsis framework, where I initially simply collected all declarations from source code, together with the comments directly preceding them, to generate documentation.

Synopsis already had its own AST-like class hierarchy, so the task was 'simply' to traverse the opencxx ptree and map that to a synopsis AST.

Later we went some steps further to use the power of opencxx to generate 'cross referenced source code', i.e. html pages that display source files, but with variables and types being linked to their respective declarations.

For quite some time I have been pondering exposing a 'real' AST such as that from opencxx to python, so I could use my processor framework to manipulate the source code directly for code generation. However, I found the ptree stuff quite obscure, so this idea never really got off the ground.

I've recently started to integrate a C parser (from the 'ctool' project) into synopsis, and there the parse tree is much simpler to read, simply because it is more typed. Instead of just having specific ptree topologies for 'statements', 'declarations', etc., I have real classes 'Statement', 'Declaration', etc. That's much more pleasing to look at! :-)

On the other hand, ctool doesn't preserve the tokens in their original form in the same way opencxx does, and doesn't tokenize the comments (something we have been working hard to add to synopsis' opencxx port).

This leads me to a couple of items on my wishlist, which I'd like to discuss / propose here:

* I suggest the ptree hierarchy to be refactored into a more typed form. That could simply mean that a big number of new 'Statement', 'Expression', and other classes should be derived from 'Ptree', or it could be done in a different way, I don't know yet. However, this would mean that it would be much more straightforward to inspect an AST, as these types would be more or less self-explanatory (ever wondered what 'node->Cdr()->Cdr()->Car()' represents??)

* I suggest to open up opencxx in a way that exposes the basic API (parser / ptree generation, walkers / ptree transformation, metaclass and the other introspection stuff) as a C++ library as well as a python module. This means that the occ executable will be very much obsolete, or at least it would only be a convenience for the most popular features, but more fine-grained control would be accessible through the APIs, through which users can customize opencxx to their needs. It also means that all the platform-specific code to run subprocesses such as the preprocessor, as well as to load metaclass plugins, could be isolated such that the backend library would be more platform neutral and robust.

As I'm already maintaining an opencxx 'branch' as part of the synopsis project, I'm experimenting with things there. Synopsis uses subversion, so directory-layout related refactoring is much simpler than with cvs. I've also a number of advanced features that I don't want to lose, such as preprocessor data integrated with the ast (synopsis records macro definitions and calls, file inclusion information, etc.).

I'm thus tempted to work off of my own opencxx branch, though I'm happily sharing my changes with opencxx. In particular, I'm thinking of a simple bootstrapping process, within which I would rework the ptree hierarchy, and then use opencxx itself to *generate* the C++-to-python binding to expose this class hierarchy to python. Once I have that, people can introspect and manipulate the source code from within python, with a direct C++ API as a fallback, in case they find the python API unacceptable for various reasons (which I can't really imagine :-)

Finally, I'm wondering whether it wouldn't be simpler for me to modify the opencxx lexer and parser to be able to parse C code (all the various flavours that still exist, such as K&R), so I can drop the ctool backend. A C parser / processor with the features of opencxx would in particular be useful to all those GNOME / Mono developers, with language binding generation being just one example usage.

Now, please tell me what you think about these ideas, whether they make sense to you at all, whether you find them useful, or would even like to help.

Best regards,
Stefan
|
From: Brian K. <bk...@mo...> - 2004-06-01 14:44:53
|
Hi,

I've been using opencxx for a few months now to implement parallel extensions to C++ similar to C A R Hoare's CSP or the language Occam. I had to modify the parser a bit to handle extensions along the lines of:

    par {
      <statements>
    }

but that was quite easy to do. Use of this tool has definitely cut months off of my original development schedule.

I think you're right that a bit more structure in the AST might be nice: being able to select the different pieces of a for-loop block by calling individual methods would probably make things easier.

One feature that would be really nice (perhaps this is possible; please let me know if it is!) would be to get the type of an arbitrary expression. For example, if I wanted to implement a "let" block, e.g.

    let (i = <expr>, j = <expr>) {

    }

it would be really nice if I could directly query the ptree object storing <expr> for its return type so that I could transform this code into normal C++ declarations.

I think that releasing the code as a library would be very helpful, but keeping the occ executable would still be a good idea; it's a very convenient way to use the tool.

Regards,
Brian Kahne
Freescale Semiconductor
|
From: Stefan S. <se...@sy...> - 2004-06-01 15:03:18
|
Hi Brian,

Brian Kahne wrote:

> One feature that would be really nice (perhaps this is possible- please let me know if it is!) would be to get the type of an arbitrary expression. For example, if I wanted to implement a "let" block, e.g.
>
> let (i = <expr>, j = <expr>) {
>
> }
>
> it would be really nice if I could directly query the ptree object storing <expr> for its return type so that I could transform this code into normal C++ declarations.

Yes, that would be possible with typed ptree nodes, as then the 'Walker' classes would not only act as a traversal, but also as a visitor, i.e. one could use the double-dispatch mechanism to resolve the type of the ptree nodes being traversed. Right now the parent ptree node has to detect the sub-node's type by inspecting the node's topology, i.e. instead of

    if (node->Car()->IsLeaf()) do_something();

one would write

    if (if_statement->else_block)
      // access the *typed* 'else' statement, as 'this'
      // is a visitor with various 'visit_statement(Statement *)' methods
      if_statement->else_block->accept(this);

You get the idea...

> I think that releasing the code as a library would be very helpful, but keeping the occ executable would still be a good idea- it's a very convenient way to use the tool.

I agree. As a convenience tool it covers the majority of the use cases, so it surely has its use.

Regards,
Stefan
|
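The double-dispatch idea in the message above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the class names (`Statement`, `ExprStatement`, `IfStatement`, `Visitor`, `Tracer`) and the helper `trace_example` are hypothetical and are not part of the OpenC++ API; nodes hold plain strings instead of real sub-trees.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ExprStatement;
struct IfStatement;

// Hypothetical visitor interface: one overload per concrete node type.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(ExprStatement& s) = 0;
    virtual void visit(IfStatement& s) = 0;
};

// Hypothetical typed node base class (not an OpenC++ class).
struct Statement {
    virtual ~Statement() = default;
    // First dispatch: the virtual call selects the node's dynamic type.
    virtual void accept(Visitor& v) = 0;
};

struct ExprStatement : Statement {
    std::string text;
    explicit ExprStatement(std::string t) : text(std::move(t)) {}
    // Second dispatch: overload resolution picks visit(ExprStatement&).
    void accept(Visitor& v) override { v.visit(*this); }
};

struct IfStatement : Statement {
    std::string condition;
    std::unique_ptr<Statement> then_block;
    std::unique_ptr<Statement> else_block;  // null when there is no 'else'
    void accept(Visitor& v) override { v.visit(*this); }
};

// A visitor that records what it sees, in traversal order.
struct Tracer : Visitor {
    std::vector<std::string> log;
    void visit(ExprStatement& s) override { log.push_back("expr " + s.text); }
    void visit(IfStatement& s) override {
        log.push_back("if " + s.condition);
        if (s.then_block) s.then_block->accept(*this);
        if (s.else_block) s.else_block->accept(*this);  // typed access, no Car/Cdr
    }
};

// Builds a small 'if' statement and returns the visitor's log.
std::vector<std::string> trace_example() {
    IfStatement stmt;
    stmt.condition = "x > 0";
    stmt.then_block.reset(new ExprStatement("f();"));
    stmt.else_block.reset(new ExprStatement("g();"));
    Tracer tracer;
    Statement& as_base = stmt;  // the static type is lost here on purpose
    as_base.accept(tracer);
    return tracer.log;
}
```

The visitor never inspects the node's topology; the node tells the visitor what it is, which is the point made above about replacing `node->Car()->IsLeaf()` checks.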
From: Stefan S. <se...@sy...> - 2004-06-01 17:17:53
|
Brian Kahne wrote:

> Yes, having the Walker act as a visitor, as you describe below, would be very nice. In addition to having typed ptree-nodes, though, it would be nice to get the actual C++ type of the expression, e.g. if the expression is of the form "a + b * c", then be able to get a TypeInfo object back that says that the return type of the expression is class Foo. Is that possible today? It seems like the only way to get TypeInfo objects is by looking up a name in the Environment, whereas this requires figuring out that operator+() and operator*() are overloaded for this class, getting that operator's return type, etc.

I'm not sure about that. However, this shouldn't be much harder than constructing a call graph, where you have to use similar lookup rules to find the right (possibly overloaded) functions depending on the arguments' types and the C++ scoping rules. We do something similar already in synopsis when generating the cross-referenced source view I was talking about earlier, but I'd be very happy to see this functionality offered by opencxx directly.

Maybe Grzegorz has more insights about how much work would be involved to add that.

Regards,
Stefan
|
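To make the "type of a + b * c" question concrete, here is a deliberately tiny sketch of bottom-up type resolution through an operator table. Everything in it is a hypothetical toy model: `Expr`, `Scope`, `OpKey`, `type_of` and `example_scope` are invented for illustration and do not correspond to OpenC++'s `TypeInfo` or `Environment`. Real C++ overload resolution also involves implicit conversions, templates, and argument-dependent lookup, none of which is modelled here.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>

// Toy expression tree: a leaf names a variable, an inner node is a binary operator.
struct Expr {
    std::string op;    // empty for a leaf
    std::string name;  // variable name, used only for a leaf
    std::shared_ptr<Expr> lhs, rhs;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr var(const std::string& n) {
    auto e = std::make_shared<Expr>();
    e->name = n;
    return e;
}

ExprPtr binary(const std::string& op, ExprPtr l, ExprPtr r) {
    auto e = std::make_shared<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// (operator, left operand type, right operand type)
using OpKey = std::tuple<std::string, std::string, std::string>;

// Toy scope: declared variables and exact-match operator overloads.
struct Scope {
    std::map<std::string, std::string> variables;  // name -> type
    std::map<OpKey, std::string> operators;        // overload -> return type
};

// Resolves the type of an expression bottom-up; returns "" when unresolved.
std::string type_of(const Expr& e, const Scope& s) {
    if (e.op.empty()) {
        auto it = s.variables.find(e.name);
        return it == s.variables.end() ? std::string() : it->second;
    }
    const std::string l = type_of(*e.lhs, s);
    const std::string r = type_of(*e.rhs, s);
    auto it = s.operators.find(OpKey(e.op, l, r));
    return it == s.operators.end() ? std::string() : it->second;
}

// Assumed declarations: Foo a, b; int c; Foo operator*(Foo, int); Foo operator+(Foo, Foo);
Scope example_scope() {
    Scope s;
    s.variables = {{"a", "Foo"}, {"b", "Foo"}, {"c", "int"}};
    s.operators[OpKey("*", "Foo", "int")] = "Foo";
    s.operators[OpKey("+", "Foo", "Foo")] = "Foo";
    return s;
}
```

The recursion mirrors the call-graph comparison made above: each operator node needs the types of its operands before the matching overload, and hence its return type, can be looked up.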
From: Grzegorz J. <ja...@he...> - 2004-06-04 01:58:23
|
Hi Stefan and All,

On Tue, 1 Jun 2004, Stefan Seefeld wrote:

[snip]

> * I suggest the ptree hierarchy to be refactored into a more typed form. That could simply mean that a big number of new 'Statement', 'Expression', and other classes should be derived from 'Ptree', or it could be done in a different way, I don't know yet. However, this would mean that it would be much more straightforward to inspect an AST, as these types would be more or less self-explanatory (ever wondered what 'node->Cdr()->Cdr()->Car()' represents ??)

I agree that 'node->Cdr()->Cdr()->Car()' is unsafe and difficult to use. The same goes for creating new Ptree nodes. Moreover, the mapping of C++ syntax into combinations of Ptree nodes is not documented, which makes this area even more unclear.

Nevertheless, I think that *replacing* the Ptree hierarchy with a more typed form will be extremely difficult, because:

(1) Parser, type elaborator and translator all use Cdr/Car. Even if we ignore the translator for the moment, parser and elaborator themselves are 20KLOC of highly nontrivial code. Reworking this code is a huge job, especially if you do this part-time, and will be a wonderful source of bugs (and we don't have a decent regression testsuite). Moreover, I think that directly replacing the AST data structure is difficult, because it has to be done practically in one big step, mostly because a grammar is not a hierarchical system (there is a lot of recursion in the grammar, which means that you cannot start replacing things piece by piece going bottom-up, because the dependency graph between different AST classes is not acyclic).

(2) Contrary to popular belief, creating an object model for an AST is a lot of conceptual work. Better yet, for a language like C++ there is no unique and optimal AST object model. Example: how should "a+b+c" be represented? Some clients would like to see two binary "+" nodes, others would rather take advantage of "+" being associative and view it as one ternary node.
Some clients are interested in nodes representing parentheses, while others would rather treat them as textual decorations not belonging in the AST, etc.

Here are my suggestions on how to improve the usability of the AST gradually:

(1) Find out and write down mappings from the "less typed" AST to the "more typed" AST, e.g.:

    IfStatement:
        Cond  ->Cdr()->Car();
        Then  ->Cdr()->Cdr()->Cdr()->Car();
        Else  ->Cdr()->Cdr()->Cdr()->Cdr()->Cdr()->Car();

(2) Use it to write or generate a set of wrappers that would encapsulate the AST in a typesafe interface:

    template <>
    class Node<IfStatement>
    {
    public:
        Node(Node<Expr> c, Node<Statement> t, Node<Statement> e);
        Node<Expr> Cond() { return p_->Cdr()->Car(); }
        Node<Statement> Then() { ... }
        Node<Statement> Else() { ... }
    private:
        Node(Ptree* p) : p_(p) {}
        friend class AstFactory;
        Ptree* p_;
    };

    class AstFactory
    {
    public:
        template <class T>
        static Node<T> Create( /* ... */ );
    };

This can be done non-intrusively, without touching the existing codebase (= without introducing bugs).

(3) Write a parser wrapper that would wrap the Ptree*'s returned from the parser in Node<>'s.

(4) Write an abstract walker for Node<>'s and make Node<>'s visitable.

(5) At this point we will have a usable, type-safe interface. Moreover, clients not satisfied with this interface (e.g. those who prefer multiary "+") will have a chance to reuse it nonintrusively or just write another interface to the Ptree structure from scratch.

(6) Having a wrapper interface atop the Ptree structure allows for changing the Ptree structure without affecting clients of the wrapper interface. (Changes in Ptree, e.g. addition of new nodes, can be compensated in the wrapper layer.)

(7) A typesafe wrapper interface would enable automatic generation of parser regression tests (e.g. from the gcc testsuite), which should be used as a safety net if we ever decide to refactor the parser so that it uses the type-safe interface.
(8) At this point we can think about moving the typesafe interface into parser, elaborator and translator, and later about totally removing the Ptree classes and actually replacing Ptree's with a canonical implementation of the Composite pattern. However, I don't think this is the way to go, mainly because of (5). I believe it is better to have a low-level implementation plus high-level wrapper(s).

(Side note: In fact the implementation of the AST in OpenC++ is more tricky than just Leaf/NonLeaf, see e.g. PtreeIfStatement etc. Nevertheless this implementation still forces Cdr/Car on clients, and AFAIU this is something we want to escape from.)

> * I suggest to open up opencxx in a way that exposes the basic API (parser / ptree generation, walkers / ptree transformation, metaclass and the other introspection stuff) as a C++ library as well as a python module. This means that the occ executable will be very much obsolete, or at least it would only be a convenience for the most popular features, but more fine-grained control would be accessible through the APIs, through which users can customize opencxx to their needs. It also means that all the platform-specific code to run subprocesses such as the preprocessor as well as load metaclass plugins could be isolated such that the backend library would be more platform neutral and robust.

I second that. However, I think the interfaces should not be published as they are now, because IMO they are not encapsulating enough. I think the parser interface is OK, but the other interfaces should be seriously reviewed before we commit to them. For example, the program object model is very much coupled with the translator. I think we should untangle them first and transform the translator framework into an exemplary, nonprivileged client of the frontend library (libraries).

> As I'm already maintaining an opencxx 'branch' as part of the synopsis project, I'm experimenting with things there.
> Synopsis uses subversion, so directory-layout related refactoring is much simpler than with cvs.

As for Subversion in OpenC++: I have heard very positive opinions about Subversion; however, I don't see an easy way to move OpenC++ development to Subversion now. Currently we rely on SF.net, which provides CVS, CompileFarm, mailing lists, shell+cron accounts and web hosting (and other features which we don't use at the moment). I don't see any other organization that would provide this level of service and commitment to the OpenC++ project. AFAIU the biggest issue is moving files/dirs in a repo. I am using the standard CVS way (delete here, add there) and indeed it loses history and makes merges more difficult, but so far it was not very painful. The loss of history can be mitigated by mentioning the old location of a file in the initial comment of the moved (=added) file.

> I've also a number of advanced features that I don't want to lose, such as preprocessor data integrated with the ast (synopsis records macro definitions and calls, file inclusion information, etc.).

The ability of OpenC++ to understand the preprocessor, so that code can be transformed without expanding preprocessor macros, would be very desirable. I support any effort in this direction. This is in general difficult in C++, but it is doable. (See the CRefactory project.) Together with Python scripting it would create a very powerful refactoring tool.

> I'm thus tempted to work off of my own opencxx branch, though I'm happily sharing my changes with opencxx. In particular, I'm thinking of a simple bootstrapping process, within which I would rework the ptree hierarchy, and then use opencxx itself to *generate* the C++-to-python binding to expose this class hierarchy to python.

This sounds exciting, but would that have any advantages over using SWIG? (I don't know, I have not used SWIG myself.)
> Once I have that, people can introspect and manipulate the source code from within python,

That would be really great.

> with a direct C++ API as a fallback, in case they find the python API unacceptable for various reasons (which I can't really imagine :-)

Some people may be concerned about performance (but still, I would love a Python API).

> Finally, I'm wondering whether it wouldn't be simpler for me to modify the opencxx lexer and parser to be able to parse C code (all the various flavours that still exist, such as K&R), so I can drop the ctool backend. A C parser / processor with the features of opencxx would in particular be useful to all those GNOME / Mono developers, with language binding generation being just one example usage.

I very much support extending the lexer/parser to support full C syntax; however, I don't know how much work it takes to get there. What issues do you see?

Best regards
Grzegorz

##################################################################
# Grzegorz Jakacki                    Huada Electronic Design    #
# Senior Engineer, CAD Dept.          1 Gaojiayuan, Chaoyang     #
# tel. +86-10-64365577 x2074          Beijing 100015, China      #
# Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################
|
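The non-intrusive wrapper proposed in steps (1) and (2) of the message above can be tried out on a toy model. The sketch below is an assumption-laden illustration, not OpenC++ code: `Ptree` here is a minimal stand-in cons cell, `Arena` is a hypothetical helper that owns the cells, and the list layout `if ( cond ) then else else-stmt` (positions 2, 4 and 6) is assumed for the sketch; it may differ from the example paths in the message and from the real OpenC++ layout. The constructor is public here for brevity, whereas the message keeps it private behind a factory.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a Lisp-style parse tree cell (not the OpenC++ class).
struct Ptree {
    std::string text;  // token text for a leaf, empty for a cons cell
    Ptree* car;
    Ptree* cdr;
    Ptree(std::string t, Ptree* a, Ptree* d) : text(std::move(t)), car(a), cdr(d) {}
    Ptree* Car() const { return car; }
    Ptree* Cdr() const { return cdr; }
};

// Hypothetical helper that owns all cells; std::deque keeps addresses stable.
class Arena {
public:
    // Builds a proper list whose elements are leaves holding the given tokens.
    Ptree* list(const std::vector<std::string>& tokens) {
        Ptree* head = nullptr;
        for (auto it = tokens.rbegin(); it != tokens.rend(); ++it) {
            cells_.emplace_back(*it, nullptr, nullptr);  // leaf
            Ptree* leaf = &cells_.back();
            cells_.emplace_back("", leaf, head);         // cons cell
            head = &cells_.back();
        }
        return head;
    }
private:
    std::deque<Ptree> cells_;
};

// Tag types naming the syntactic categories.
struct Expr {};
struct Statement {};
struct IfStatement {};

// Generic wrapper: a typed handle around one Ptree*, used by value.
template <class Tag>
class Node {
public:
    explicit Node(Ptree* p) : p_(p) {}
    std::string text() const { return p_ ? p_->text : std::string(); }
private:
    Ptree* p_;
};

// Specialization that hides the Cdr/Car paths for 'if' statements.
template <>
class Node<IfStatement> {
public:
    explicit Node(Ptree* p) : p_(p) {}
    Node<Expr> Cond() const { return Node<Expr>(nth(2)); }
    Node<Statement> Then() const { return Node<Statement>(nth(4)); }
    Node<Statement> Else() const { return Node<Statement>(nth(6)); }
private:
    // Follows n Cdr links, then takes Car; returns nullptr if the list is too short.
    Ptree* nth(int n) const {
        Ptree* p = p_;
        while (p && n-- > 0) p = p->Cdr();
        return p ? p->Car() : nullptr;
    }
    Ptree* p_;
};
```

If the underlying layout changes, only the three indices inside `Node<IfStatement>` need to move; callers of `Cond()`, `Then()` and `Else()` stay as they are, which is the argument made in point (6).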
From: Stefan S. <se...@sy...> - 2004-06-04 05:10:18
|
Hi Grzegorz,

Grzegorz Jakacki wrote:

> Nevertheless, I think that *replacing* Ptree hierarchy with more typed form will be extremely difficult, because:

[...]

I fully agree with your arguments. I believe that before we start to undertake any serious effort to refactor this stuff, we need to

* document the existing API to be able to fully understand what the individual types / methods do and what ramifications any change would have on those types.

* provide a much better regression test coverage on different levels so we can measure to what degree changes break compatibility (some will be unavoidable)

Once we have a good grasp of the complete workflow involved in the implementation of the various use cases (parse tree / syntax tree construction, code generation, etc.) we can suggest migration paths that provide optimal control.

> (1) Find out and write down mappings from the "less typed" AST to "more typed" AST, e.g.:

I have two sources of documentation:

* documentation of the Parser API, such as:

    /*
      definition
      : null.declaration
      | typedef
      | template.decl
      | metaclass.decl
      | linkage.spec
      | namespace.spec
      | namespace.alias
      | using.declaration
      | extern.template.decl
      | declaration
    */
    bool Parser::rDefinition(Ptree*& p)
    ...

  which indicates that the method 'rDefinition' returns a ptree that is a definition with the given subtypes as shown in the comment above. Wouldn't it be possible to go over all these comments and model a class tree that models this grammar? Is that possible at all? It seems that if the parser is currently able to construct a 'Definition' object as a ptree, it should be possible to do the same but with more typed objects instead...

* I'm looking into the 'ctool' code I imported into synopsis. http://synopsis.fresco.org/viewsvn/synopsis-Synopsis/trunk/Synopsis/Parsers/C/ contains a class hierarchy for C, which shouldn't be that different from a C++ grammar, so it may serve as inspiration.
> template <>
> class Node<IfStatement>
> {
> public:
>     Node(Node<Expr> c, Node<Statement> t, Node<Statement> e);
>     Node<Expr> Cond() { return p_->Cdr()->Car(); }
>     Node<Statement> Then() { ... }
>     Node<Statement> Else() { ... }
> private:
>     Node(Ptree* p) : p_(p) {}
>     friend class AstFactory;
>     Ptree* p_;
> };
>
> class AstFactory
> {
> public:
>     template <class T>
>     static Node<T> Create( /* ... */ );
> };

Interesting idea! The price for this non-intrusiveness, however, would be that we essentially have to have two parsing stages, the first generating the parse tree, the second adding the AST as a superstructure on top. Right? I was really thinking of modifying the Parser itself so it only generates the typed objects (which could still derive from Ptree so we don't have to rewrite everything...)

> (5) At this point we will have usable, type-safe interface. Moreover, clients not satisfied with this interface (e.g. those who prefer multiary "+") will have a chance to reuse it nonintrusively or just write another interface to Ptree structure from scratch.

That is a good point! The generated tree could be visited in terms of ptree, *as well* as a typed syntax tree. A little like the other poster referring to the different DOM APIs...

> (6) Having a wrapper interface atop Ptree structure allows for changing Ptree structure without affecting clients of wrapper interface. (Changes in Ptree, e.g. addition of new nodes, can be compensated in wrapper layer.)

I'm not sure about that. Could you elaborate? In my mind an AST node would rely on the specific ptree structure it is wrapping, so if you can modify the parse tree underneath, wouldn't that invalidate the AST nodes, i.e. break some invariants?

> I second that. However, I think the interfaces should not be published as they are now, because IMO they are not encapsulating enough. I think parser interface is OK, but other interfaces should be seriously reviewed before we commit to them.
> For example program object model is very much coupled with translator. I think we should untangle them first and transform translator framework into exemplary, nonprivileged client of frontend library (libraries).

Agreed. On the other hand the whole process of changing the APIs will be incremental and iterative, so as long as we don't commit to a fixed and stable API I don't see why we cannot apply the changes 'in public'.

>> As I'm already maintaining an opencxx 'branch' as part of the synopsis project, I'm experimenting with things there. Synopsis uses subversion, so directory-layout related refactoring is much simpler than with cvs.
>
> As for Subversion in OpenC++: I have heard very positive opinions about Subversion, however I don't see an easy way to move OpenC++ development to Subversion now. Currently we rely on SF.net, which provides CVS, CompileFarm, mailing lists, shell+cron accounts and web hosting (and other features which we don't use at the moment). I don't see any other organization that would provide this level of service and commitment to OpenC++ project.

Well, as I said, synopsis presently contains a branch of opencxx. As synopsis is my central focus, and as for me opencxx could well be part of synopsis (with re-defined scope), I could imagine moving development efforts there. The synopsis project is hosted by SPI ('Software in the Public Interest') and it has its own set of infrastructure tools ('roundup' issue tracker, 'qmtest' unit testing framework, 'subversion' configuration management tool, 'mailman' mailing lists, etc.)

Of course, such a move is not a simple decision. You really have to evaluate what future directions you want to take, and whether that fits with the synopsis framework. As I said, my interest in opencxx is in the context of synopsis, so I'll probably do most of my work from there. Please take this as an offer and invitation, not an attempt to fork your project.
> I very much support extending the lexer/parser to support full C syntax, however I don't know how much work it takes to get there. What issues do you see?

I see different steps:

* optionally remove the tokens that are not keywords in ordinary C ('class', 'virtual', ...)

* find all expressions that are valid C but not C++, i.e. define additions to the parser that have to be enabled for C but not C++

* do the contrary, i.e. find expressions that are valid C++ but not C...

It shouldn't be hard, and it should be possible to do that incrementally. The important step is some basic restructuring of the Parser class into an interface and an implementation of the statements that are common between all flavours of C and C++, with the rest put into subclasses (K&R, ansi C, C89, etc.)

Regards,
Stefan
|
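The restructuring outlined above, with a common base and per-dialect subclasses, might look roughly like the following for the keyword step alone. This is a minimal sketch with hypothetical names (`ParserBase`, `CParser`, `CxxParser`, `is_keyword`, `dialect_keywords`); none of them exist in OpenC++, and the keyword sets are shortened for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical common base: knows the keywords shared by C and C++.
class ParserBase {
public:
    virtual ~ParserBase() = default;

    // A word is a keyword if it is in the shared set or in the dialect's own set.
    bool is_keyword(const std::string& word) const {
        return common_keywords().count(word) > 0 ||
               dialect_keywords().count(word) > 0;
    }

protected:
    // Each dialect supplies only what it adds on top of the common set.
    virtual const std::set<std::string>& dialect_keywords() const = 0;

private:
    static const std::set<std::string>& common_keywords() {
        // Abbreviated on purpose; a real list would be much longer.
        static const std::set<std::string> k = {
            "if", "else", "while", "for", "return", "struct", "int", "char"};
        return k;
    }
};

// C dialect: 'class', 'virtual', ... are ordinary identifiers here.
class CParser : public ParserBase {
protected:
    const std::set<std::string>& dialect_keywords() const override {
        static const std::set<std::string> k;  // nothing extra in this sketch
        return k;
    }
};

// C++ dialect: adds the C++-only keywords.
class CxxParser : public ParserBase {
protected:
    const std::set<std::string>& dialect_keywords() const override {
        static const std::set<std::string> k = {
            "class", "virtual", "namespace", "template"};
        return k;
    }
};
```

With this split, a C source file that uses `class` as a variable name would be tokenized as an identifier by `CParser`, while `CxxParser` keeps treating it as a keyword; the grammar differences listed in the second and third steps would be handled by further virtual hooks of the same kind.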
From: Grzegorz J. <ja...@he...> - 2004-06-04 09:56:51
|
Hi!

On Fri, 4 Jun 2004, Stefan Seefeld wrote:

> Grzegorz Jakacki wrote:
>
> > Nevertheless, I think that *replacing* Ptree hierarchy with more typed form will be extremely difficult, because:
>
> [...]
>
> I fully agree with your arguments. I believe that before we start to undertake any serious effort to refactor this stuff, we need to
>
> * document the existing API to be able to fully understand what the individual types / methods do and what ramifications any change would have on those types.

Agreed. I began documenting Ptree and PtreeUtils; this is going to be my major focus after the merge.

> * provide a much better regression test coverage on different levels so we can measure to what degree changes break compatibility (some will be unavoidable)

This seems to be a lot of work.

> Once we have a good grasp of the complete workflow involved in the implementation of the various use cases (parse tree / syntax tree construction, code generation, etc.) we can suggest migration paths that provide optimal control.
>
> > (1) Find out and write down mappings from the "less typed" AST to "more typed" AST, e.g.:
>
> I have two sources of documentation:
>
> * documentation of the Parser API, such as:
>
> /*
>   definition
>   : null.declaration
>   | typedef
>   | template.decl
>   | metaclass.decl
>   | linkage.spec
>   | namespace.spec
>   | namespace.alias
>   | using.declaration
>   | extern.template.decl
>   | declaration
> */
> bool Parser::rDefinition(Ptree*& p)
> ...
>
> which indicates that the method 'rDefinition' returns a ptree that is a definition with the given subtypes as shown in the comment above. Wouldn't it be possible to go over all these comments and model a class tree that models this grammar ?

Hm, don't know.

> Is that possible at all ?

Why not? Each nonterminal translates into an abstract class, each production translates into a concrete class. However, it does not necessarily yield a reasonable AST.
Some nonterminals in the grammar are just shortcuts, in particular those having just one production. You don't want them in the AST.

> It seems if the parser is currently able to construct a 'Definition' object as a ptree, it should be possible to do the same but with more typed objects instead...

I can see potential problems where existing code keeps e.g. definitions and expressions on the same list, which now is a list of Ptree*, but could no longer be one if we want to inject more type information.

> * I'm looking into the 'ctool' code I imported into synopsis. http://synopsis.fresco.org/viewsvn/synopsis-Synopsis/trunk/Synopsis/Parsers/C/ contains a class hierarchy for C, which shouldn't be that different from a C++ grammar, so it may serve as inspiration.

Observe that the existing Ptree classes already constitute a hierarchy (however, the majority of the code flattens it by upcasting to Leaf/NonLeaf).

> > template <>
> > class Node<IfStatement>
> > {
> > public:
> >     Node(Node<Expr> c, Node<Statement> t, Node<Statement> e);
> >     Node<Expr> Cond() { return p_->Cdr()->Car(); }
> >     Node<Statement> Then() { ... }
> >     Node<Statement> Else() { ... }
> > private:
> >     Node(Ptree* p) : p_(p) {}
> >     friend class AstFactory;
> >     Ptree* p_;
> > };
> >
> > class AstFactory
> > {
> > public:
> >     template <class T>
> >     static Node<T> Create( /* ... */ );
> > };
>
> Interesting idea ! The price for this non-intrusiveness, however, would be that we essentially have to have two parsing stages, the first generating the parse tree, the second to add the AST as a superstructure on top. Right ?

Not at all. Node<IfStatement> is meant to work as a smart pointer to Ptree. It is to be used by value and not stored anywhere, e.g.:

    Node<Definition> d = ParseDefinition("int main() {}");
    string id = d.GetIdentifier(); // instead of d->Cdr()->Car()-> etc.
or

    Node<Definition> d = AstFactory::Create<FunctionDefinition>(
        ParseType("void")
      , "main"
      , std::vector<Node<ArgDecl> >()
      , std::vector<Node<Statement> >()
    );

or

    void DoSomething(Node<Expr> n) { ... }

Node<>s by themselves do not have links to other Node<>s. Each contains just one wrapped Ptree* and that's it. The only purpose of a Node<> is to keep the downcasting and Cdr/Car magic hidden from the client. As an extra bonus, the wrapped Ptree* could be replaced with boost::shared_ptr or Loki::SmartPtr or whatever without clients even knowing about it.

> I was really thinking of modifying the Parser itself so it only
> generates the typed objects (which could still derive from Ptree so we
> don't have to rewrite everything...)

In fact Ptrees are typed even today (see PtreeIfStatement). The issue is that the elaborator and translator do not use this type information much.

> > (5) At this point we will have usable, type-safe interface.
> >     Moreover, clients not satisfied with this interface
> >     (e.g. those who prefer multiary "+") will have a chance
> >     to reuse it nonintrusively or just write another
> >     interface to Ptree structure from scratch.
>
> That is a good point ! The generated tree could be visited in terms
> of ptree, *as well* as a typed syntax tree. A little like the other
> poster referring to the different DOM APIs...
>
> > (6) Having a wrapper interface atop Ptree structure allows
> >     for changing Ptree structure without affecting clients
> >     of wrapper interface. (Changes in Ptree, e.g. addition
> >     of new nodes, can be compensated in wrapper layer.)
>
> I'm not sure about that. Could you elaborate ? In my mind an AST node
> would rely on a specific ptree structure it is wrapping, so if you can
> modify the parse tree underneath, wouldn't that invalidate the AST nodes,
> i.e. break some invariants ?

Example 1: You decide to change the structure of the Ptree nodes representing some C++ construct (e.g. to store more information in it).
This will break all clients depending directly on Ptree, because they need to update Car/Cdr paths to reflect new structure. However, if there is Node<> iface in the middle, you update Car/Cdr paths only in Node<>. Example 2: You add new kind of node, e.g. for "using" declaration. All existing clients of Node<> will break if you just add new type to Node visitor. However, you can branch Node<> iface into two versions: (a) new, incompatible with existing clients, but exposing "using" in visitor. (b) old (transitional), which hides "using" node from clients (or e.g. aborts at an attempt to visit Node<Declaration> that indeed represents "using"; or maybe Node<> iface should provide for Node<Unknown> to handle such situations. In some cases there is reasonable visitation for a new node, e.g. if it has "decorative" character (in terms of Decorator pattern), like e.g. parentheses node.) Old clients would then be able to compile against (b) and work without regressions. > > I second that. However, I think the interfaces should not be published as > > they are now, because IMO they are not encapsulating enough. I think parser > > iterface is OK, but others interfaces should be seriously reviewed before we > > commit to them. For example program object model is very much coupled with > > translator. I think we should untangle them first and transform translator > > framework into exemplary, nonprivledged client of frontend > > library (libraries). > > agreed. On the other hand the whole process of changing the APIs will be > incremental and iterative, so as long as we don't commit to a fixed and > stable API I don't see why we can not apply the changes 'in public'. Everything happens "in public", but some parts are not "published", in the sense that I don't feel we are bound to keep them stable. > >>As I'm already maintaining an opencxx 'branch' as part of the synopsis > >>project, I'm experimenting with things there. 
Synopsis uses subversion, so > >>directory-layout related refactoring is much simpler than with cvs. > > > > > > As for Subversion in OpenC++: I have had heard very positive opinions about > > Subversion, however I don't see an easy way to move OpenC++ development to > > Subversion now. Currently we rely on SF.net, which provides CVS, > > CompileFarm, mailing lists, shell+cron accounts and web hosting (and other > > features which we don't use at the moment). I don't see any other > > organization that would provide this level of service and comittment to > > OpenC++ project. > > Well, as I said, synopsis presently contains a branch of opencxx. As synopsis > is my central focus, and as for me opencxx could well be part of synopsis > (with re-defined scope), I could imagine to move development effords there. > The synopsis project is hosted by SPI ('software in the public interest') > and it has its own set of infrastructure tools ('roundup' issue tracker, > 'qmtest' unit testing framework, 'subversion' configuration management tool, > 'mailman' mailing lists, etc.) > Of course, such a move is not a simple decision. You really have to evaluate > what future directions you want to take, and whether that fits with the > synopsis framework. As I said, my interest into opencxx is in the context > of synopsis, so I'll probably do most of my work from there. Please take > this as an offer and invitation, not an attempt to fork your project. This requires some thought indeed. > > I very much support extending the lexer/parser to support full C syntax, > > however I don't know how much work it takes to get there. What issues do you > > see? > > I see different steps: > > * optionally remove the tokens that are not keywords in ordinary C > ('class', 'virtual', ...) > > * find all expressions that are valid C but not C++, i.e. define > additions to the parser that have to be enabled for C but not C++ > > * do the contrary, i.e. find expressions that are valid C++ but not C... 
> > It shouldn't be hard, and, it should be able to do that incrementally. > The important step is some basic restructuring of the Parser class into > an interface and implementation of the statements that are common between > all flavours of C and C++, and put the rest into subclasses (K&R, ansi C, C89, etc.) An important thing to consider here is if we want OpenC++ to be validating. If not, then we don't need to bother that "class" occures in C source. Personally I think that validating parser/elaborator is much harder to write and IMO this should not be our priority, since we have no chances to reach the quality of validation that e.g. g++, MSVC or EDG present today. However, we do have a chance to do something new and useful in providing refactoring tool and frontend libraries, and I think this should be our focus now. Best regards Grzegorz ################################################################## # Grzegorz Jakacki Huada Electronic Design # # Senior Engineer, CAD Dept. 1 Gaojiayuan, Chaoyang # # tel. +86-10-64365577 x2074 Beijing 100015, China # # Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. # ################################################################## |
From: Stefan S. <se...@sy...> - 2004-06-04 13:06:02
|
Grzegorz Jakacki wrote: >>* provide a much better regression test coverage on different levels so we can >> measure to what degree changes break compatibility (some will be unavoidable) > > > This seems to be a lot of work. but it's worth it, I believe. [...ptree -> AST abstraction...] > I can see potential problems where existing code keeps e.g. definitions and > expressions on the same list, which now is a list of Ptree*, but could be no > more if we want to inject more type information. agreed. On the other hand, I'm not suggesting that we change the ptree structure, just that we make the ptree class tree more rich (type and API wise) such that users *can* use the high level type information and API. Walking the ptree via the 'untyped' Ptree nodes will still be possible. >>* I'm looking into the 'ctool' code I imported into synopsis. >> http://synopsis.fresco.org/viewsvn/synopsis-Synopsis/trunk/Synopsis/Parsers/C/ >> contains a class hierarchy for C, which shouldn't be that different from a C++ grammar, >> so it may serve as inspiration. > > > Observe that existing Ptree classes already constitute a hierarchy > (however majority of the code flattens it by upcasting to Leaf/NonLeaf). exactly. I realize the attempt to get more type info into the ptree, but it doesn't look complete. I think completeness could/should be defined on the criteria that I could write a Visitor that is able to *completely* traverse the ptree recovering all the type info without ever touching methods 'Cdr()' and 'Car()'. > Not at all. Node<IfStatement> is meant to work as a smart pointer to > Ptree. Is is to be used by value and not stored anywhere, e.g.: > > Node<Definition> d = ParseDefinition("int main() {}"); > > string id = d.GetIdentifier(); // instead of d->Cdr()->Car()-> etc. hmm, but then the user already knows that he is parsing a definition, so the type info isn't adding much value. 
Read: I want an *abstract* syntax tree, such that I can run 'StatementList *ast = parse(my_file);' and then inspect the returned 'ast' by some custom visitors. In particular, if I want to expose this ast to a scripting frontend such as python, it is impractical to have these wrapper classes be temporary objects, as that would make the binding quite complex and slow. > Example 1: You decide to change the structure of a Ptree nodes > representing some C++ construct (e.g. to store more information in it). > This will break all clients depending directly on Ptree, because > they need to update Car/Cdr paths to reflect new structure. However, > if there is Node<> iface in the middle, you update Car/Cdr paths > only in Node<>. But what about the Node's type ? If my wrapped ptree is a declaration, but by simply modifying the ptree I change that to be a function call, the wrapper's type ('Node<Declaration>') would be wrong. Of course, if we expose the two APIs in parallel we'll always have this problem. May be the ptree API should not expose any modifiers, i.e. exclusively operate on const ptrees. > Example 2: You add new kind of node, e.g. for "using" declaration. > All existing clients of Node<> will break if you just add new type to > Node visitor. However, you can branch Node<> iface into two versions: That's a good point. The Visitor pattern really assumes the type hierarchy of the visited objects to be stable. [...] >>Of course, such a move is not a simple decision. You really have to evaluate >>what future directions you want to take, and whether that fits with the >>synopsis framework. As I said, my interest into opencxx is in the context >>of synopsis, so I'll probably do most of my work from there. Please take >>this as an offer and invitation, not an attempt to fork your project. > > > This requires some thought indeed. There is no need for a quick decision. 
I'm just observing that right now we are each working on a separate branch, so merging them would be practical. And that even more so if we are going to look into adapting qmtest as a unit testing framework (Right now I don't do unit testing on the opencxx backend, just the generated synopsis AST which I dump). The most important thing I believe is that we define what we each expect from opencxx (and synopsis) in the future, i.e. whether we are aiming at the same things, and whether the common goals suggest that we both work from a common code base, or whether the overlap is simply not large enough to be worth a merge. >>It shouldn't be hard, and, it should be able to do that incrementally. >>The important step is some basic restructuring of the Parser class into >>an interface and implementation of the statements that are common between >>all flavours of C and C++, and put the rest into subclasses (K&R, ansi C, C89, etc.) > > > An important thing to consider here is if we want OpenC++ to be validating. > If not, then we don't need to bother that "class" occures in C source. > Personally I think that validating parser/elaborator is much harder to write > and IMO this should not be our priority, since we have no chances to reach > the quality of validation that e.g. g++, MSVC or EDG present today. I don't understand what you mean with 'validate'. opencxx (and ctool) are well able to indicate 'parse errors', even though the error message is not very high level, as that would indeed require that more language specific semantics be available to the parser (or the object that is trying to issue a meaningful error message). What do you mean with 'bother that "class" occures in C' ? Right the lexer will return a 'CLASS' token if it runs into the 'class' string. That doesn't make sense in C, as it would have to be an ordinary identifier. Similarly for all the other keywords. 
So removing those C++-specific keywords when scanning in C-mode sounds like the (easy) first step towards C compatibility. > However, we do have a chance to do something new and useful in providing > refactoring tool and frontend libraries, and I think this should be our > focus now. agreed. I'm just wondering how much efford it would be to extend opencxx's scope to C, as I can see a lot of use in a tool like that for example to the GNOME folks. Regards, Stefan |
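The "first step" described above (treating C++-only keywords as plain identifiers when scanning C) could look roughly like the following. This is a minimal sketch, not OpenC++'s actual lexer: the names `Dialect`, `TokenKind` and `Classify`, and the abbreviated keyword lists, are all assumptions made up for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical names; OpenC++'s real lexer is organized differently.
enum class Dialect { C, Cxx };
enum class TokenKind { Keyword, Identifier };

// Classifies a scanned word. Words that are keywords only in C++ are
// reported as ordinary identifiers when the lexer runs in C mode.
TokenKind Classify(const std::string& word, Dialect dialect) {
    // Deliberately incomplete lists, just enough to show the idea.
    static const std::set<std::string> common = {
        "int", "if", "else", "return", "struct", "typedef", "while"};
    static const std::set<std::string> cxx_only = {
        "class", "virtual", "template", "namespace", "using"};

    if (common.count(word)) return TokenKind::Keyword;
    if (dialect == Dialect::Cxx && cxx_only.count(word)) return TokenKind::Keyword;
    return TokenKind::Identifier;
}
```

With such a switch, `Classify("class", Dialect::C)` yields an identifier, so C code such as `int class = 0;` would no longer trip over a CLASS token. It says nothing yet about constructs that are valid C but not C++, which the parser would still have to handle.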
From: Grzegorz J. <ja...@he...> - 2004-06-08 01:25:17
|
On Fri, 4 Jun 2004, Stefan Seefeld wrote: > Grzegorz Jakacki wrote: > > >>* provide a much better regression test coverage on different levels so we can > >> measure to what degree changes break compatibility (some will be unavoidable) > > > > > > This seems to be a lot of work. > > but it's worth it, I believe. My point is that it is too much for one leap. > [...ptree -> AST abstraction...] > > > I can see potential problems where existing code keeps e.g. definitions and > > expressions on the same list, which now is a list of Ptree*, but could be no > > more if we want to inject more type information. > > agreed. On the other hand, I'm not suggesting that we change the ptree structure, > just that we make the ptree class tree more rich (type and API wise) such that > users *can* use the high level type information and API. Walking the ptree via > the 'untyped' Ptree nodes will still be possible. I think I get your general idea, but most likely devil is in details. > >>* I'm looking into the 'ctool' code I imported into synopsis. > >> http://synopsis.fresco.org/viewsvn/synopsis-Synopsis/trunk/Synopsis/Parsers/C/ > >> contains a class hierarchy for C, which shouldn't be that different from a C++ grammar, > >> so it may serve as inspiration. > > > > > > Observe that existing Ptree classes already constitute a hierarchy > > (however majority of the code flattens it by upcasting to Leaf/NonLeaf). > > exactly. I realize the attempt to get more type info into the ptree, but it doesn't > look complete. I think completeness could/should be defined on the criteria that > I could write a Visitor that is able to *completely* traverse the ptree recovering > all the type info without ever touching methods 'Cdr()' and 'Car()'. > > > Not at all. Node<IfStatement> is meant to work as a smart pointer to > > Ptree. 
Is is to be used by value and not stored anywhere, e.g.: > > > > Node<Definition> d = ParseDefinition("int main() {}"); > > > > string id = d.GetIdentifier(); // instead of d->Cdr()->Car()-> etc. > > hmm, but then the user already knows that he is parsing a definition, > so the type info isn't adding much value. Read: I want an *abstract* > syntax tree, such that I can run 'StatementList *ast = parse(my_file);' > and then inspect the returned 'ast' by some custom visitors. I did not mean what you understood. In fact I want Node<Definition> to be "abstract". The second part of example was a miss, let me fix it: Node<Definition> d = ParseDefinition("int main() {}"); SomeVisitor v; v(d); //<--- calls v.Visit(Node<FunctionDefinition>) > In particular, if I want to expose this ast to a scripting frontend > such as python, it is impractical to have these wrapper classes be > temporary objects, as that would make the binding quite complex and > slow. (1) Why? (I have never seen how you create a binding, so I don't have an idea what happens.) (2) What if instead of coding these wrappers, we generate them based on the "Cdr/Car Ptree -> highlevel Ptree" mapping? We could generate "C++ wrappers" and "Python wrappers" My point is to avoid making intrusive changes to Ptree hierarchy, as it breaks essentially all OpenC++, so compensating for such changes is both expensive and error-prone. > > Example 1: You decide to change the structure of a Ptree nodes > > representing some C++ construct (e.g. to store more information in it). > > This will break all clients depending directly on Ptree, because > > they need to update Car/Cdr paths to reflect new structure. However, > > if there is Node<> iface in the middle, you update Car/Cdr paths > > only in Node<>. > > But what about the Node's type ? If my wrapped ptree is a declaration, > but by simply modifying the ptree I change that to be a function call, > the wrapper's type ('Node<Declaration>') would be wrong. 
I think we are not talking about the same thing. This is what I meant:

You decide to store more information, say a type annotation, at the declaration node. I mean this as a design decision, not a run-time decision. There are several kinds of declarations, each encoded with some Ptree shape. You decide that the type annotation will be added atop the declaration tree, so where up to now you had

        NonLeaf(Decl)
          /     \
     NonLeaf   NonLeaf
       ...       ...

you want to have

            NonLeaf(Decl)
             /        \
     [annotation]    NonLeaf   <--- the old tree
                     /     \
                NonLeaf   NonLeaf
                  ...       ...

This change breaks all clients using the Leaf/NonLeaf interface, since the Cdr/Car access paths for declarations change. However, it is quite easy to compensate for this modification in the Node<> iface, so that clients of the Node<> iface are kept unaware that the Ptree shapes changed. Moreover, it is easy to make the annotation available to new clients of the Node<> iface without breaking the old clients.

> Of course, if we expose the two APIs in parallel we'll always have
> this problem. May be the ptree API should not expose any modifiers,
> i.e. exclusively operate on const ptrees.

AFAIU this applies to run-time tree modification, not design change, but it is also a valid point, so let me address it. My understanding would be that the Ptree API is low-level and the Node<> API is high-level. Clients should be safe when they commit exclusively to the high-level API. This warranty is void once they start tampering with the tree using the Ptree API, as clearly the Ptree API lets you create a structure that does not map onto any type-correct tree in the sense of the Node<> API. I would be happy with the Node<> API coredumping or throwing as soon as it finds out that somebody put any kind of rubbish into the underlying Ptree tree. Alternatively the Node<> API could contain something like an "Invalid" node type that would be exposed in places where the underlying Ptree structure is broken in the sense of the Node<> API.

> > Example 2: You add new kind of node, e.g. for "using" declaration.
> > All existing clients of Node<> will break if you just add new type to > > Node visitor. However, you can branch Node<> iface into two versions: > > That's a good point. The Visitor pattern really assumes the type hierarchy > of the visited objects to be stable. > > [...] > > >>Of course, such a move is not a simple decision. You really have to evaluate > >>what future directions you want to take, and whether that fits with the > >>synopsis framework. As I said, my interest into opencxx is in the context > >>of synopsis, so I'll probably do most of my work from there. Please take > >>this as an offer and invitation, not an attempt to fork your project. > > > > > > This requires some thought indeed. > > There is no need for a quick decision. I'm just observing that right now > we are each working on a separate branch, so merging them would be practical. > And that even more so if we are going to look into adapting qmtest as a > unit testing framework (Right now I don't do unit testing on the opencxx > backend, just the generated synopsis AST which I dump). Agreed. Let's see how things work out. > The most important thing I believe is that we define what we each expect > from opencxx (and synopsis) in the future, i.e. whether we are aiming at > the same things, and whether the common goals suggest that we both work > from a common code base, or whether the overlap is simply not large enough > to be worth a merge. Yes. > >>It shouldn't be hard, and, it should be able to do that incrementally. > >>The important step is some basic restructuring of the Parser class into > >>an interface and implementation of the statements that are common between > >>all flavours of C and C++, and put the rest into subclasses (K&R, ansi C, C89, etc.) > > > > > > An important thing to consider here is if we want OpenC++ to be validating. > > If not, then we don't need to bother that "class" occures in C source. 
> > Personally I think that validating parser/elaborator is much harder to write
> > and IMO this should not be our priority, since we have no chances to reach
> > the quality of validation that e.g. g++, MSVC or EDG present today.
>
> I don't understand what you mean with 'validate'. opencxx (and ctool)
> are well able to indicate 'parse errors', even though the error
> message is not very high level, as that would indeed require that more
> language specific semantics be available to the parser (or the object
> that is trying to issue a meaningful error message).

I don't think that is true in general. OpenC++ is deliberately not strict about syntax. If I am not mistaken it will let you have something like a list ("{1,2,3}" maybe?) as an expression. It was meant to support language extensions.

> What do you mean
> with 'bother that "class" occurs in C' ? Right the lexer will return
> a 'CLASS' token if it runs into the 'class' string. That doesn't make
> sense in C, as it would have to be an ordinary identifier. Similarly
> for all the other keywords. So removing those C++-specific keywords
> when scanning in C-mode sounds like the (easy) first step towards C
> compatibility.

Oh, I see now. So to restate my point, I think we should not invest time in validating the OpenC++ input. I would say that we should assume that the input source code is valid C++.

> > However, we do have a chance to do something new and useful in providing
> > refactoring tool and frontend libraries, and I think this should be our
> > focus now.
>
> agreed. I'm just wondering how much effort it would be to extend opencxx's
> scope to C, as I can see a lot of use in a tool like that for example to
> the GNOME folks.

My concern is that we are trying to go in too many directions:

  * making type elaborator and program object model into library
  * typesafe API
  * Python bindings
  * C compatibility

(Not to mention areas where we need quality improvements, such as templates and overloading.)
BR Grzegorz ################################################################## # Grzegorz Jakacki Huada Electronic Design # # Senior Engineer, CAD Dept. 1 Gaojiayuan, Chaoyang # # tel. +86-10-64365577 x2074 Beijing 100015, China # # Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. # ################################################################## |
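The corrected example in the message above, where `v(d)` ends up calling `v.Visit(Node<FunctionDefinition>)`, can be sketched without any virtual functions on the wrappers. Everything below is an assumption for illustration: the `Kind` tag, the two-field `Ptree`, and the `Dispatch` and `NameCollector` names do not come from OpenC++.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a Ptree node that records what it encodes.
enum class Kind { FunctionDefinition, ClassDefinition };
struct Ptree {
    Kind kind;
    std::string name;
};

// Tag types; declarations are enough, they are never instantiated.
struct Definition;
struct FunctionDefinition;
struct ClassDefinition;

// A handle holding exactly one pointer, passed around by value.
template <class Tag>
class Node {
public:
    explicit Node(const Ptree* p) : p_(p) {}
    const Ptree* Raw() const { return p_; }
    const std::string& Name() const { return p_->name; }
private:
    const Ptree* p_;
};

// The "abstract" Node<Definition> is resolved here: inspect the wrapped
// tree and re-wrap it under the more specific tag before calling Visit.
template <class Visitor>
void Dispatch(Visitor& v, Node<Definition> d) {
    switch (d.Raw()->kind) {
    case Kind::FunctionDefinition:
        v.Visit(Node<FunctionDefinition>(d.Raw()));
        break;
    case Kind::ClassDefinition:
        v.Visit(Node<ClassDefinition>(d.Raw()));
        break;
    }
}

// Example visitor: one overload per concrete node type.
struct NameCollector {
    std::string log;
    void Visit(Node<FunctionDefinition> f) { log += "fn:" + f.Name() + ";"; }
    void Visit(Node<ClassDefinition> c) { log += "class:" + c.Name() + ";"; }
};
```

Note the trade-off discussed in the thread: adding a new `Kind` forces every visitor to grow a new `Visit` overload, which is why a transitional interface that hides new node kinds from old clients was suggested.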
From: Stefan S. <se...@sy...> - 2004-06-08 02:30:04
|
Grzegorz Jakacki wrote: >>>>* provide a much better regression test coverage on different levels so we can >>>> measure to what degree changes break compatibility (some will be unavoidable) >>> >>> >>>This seems to be a lot of work. >> >>but it's worth it, I believe. > > > My point is that it is too much for one leap. ah, yes, I agree. These changes should be done incrementally. Let's start with the frontend to get more flexible access to the occ lib(s) and the build unit tests with this to cover the different processing stages. That's a good occasion to document it, too ! ;-) > I did not mean what you understood. In fact I want Node<Definition> to be > "abstract". The second part of example was a miss, let me fix it: > > Node<Definition> d = ParseDefinition("int main() {}"); > SomeVisitor v; > v(d); //<--- calls v.Visit(Node<FunctionDefinition>) Ah, so 'ParseDefinition()' is an 'abstract factory' ? That means that all those 'Node<>' classes derive from an abstract 'NodeBase' class, at which point I'm wondering what the advantage of such a templated class hierarchy is as opposed to a simple traditional one (i.e. instead of 'Node<Foo>' just using 'Foo', as 'Foo' would need to be defined anyways) >>In particular, if I want to expose this ast to a scripting frontend >>such as python, it is impractical to have these wrapper classes be >>temporary objects, as that would make the binding quite complex and >>slow. > > > (1) Why? (I have never seen how you create a binding, so I don't have > an idea what happens.) In general, the idea for this particular binding would be to allow users to define 'Walker' classes in both, C++, as well as python. If I'm in python and I get hold of a 'Declaration' object, calling a method (or attribute or property etc.) will result in the invocation of the associated C/C++ method. 
But since python has its own idea about function invocation, parameter passing, etc., each C++ method needs to be wrapped by a C function that deals with parameter / return value conversion / wrapping. So if a method returns a reference to another C++ object, that has to be wrapped in its respective python object. If these objects are returned by value, you get into a lot of trouble because it's hard to track dependencies (i.e. reference counts) as nodes refer to and depend on each other in a parse tree. It would be far more easy to manage child / parent links internally, so the python binding wouldn't need to care as long as the referer is still alive. > > (2) What if instead of coding these wrappers, we generate them > based on the "Cdr/Car Ptree -> highlevel Ptree" mapping? > We could generate "C++ wrappers" and "Python wrappers" How / where would that mapping be defined ? > My point is to avoid making intrusive changes to Ptree hierarchy, as it > breaks essentially all OpenC++, so compensating for such changes is both > expensive and error-prone. I understand. I don't have a good understanding how widespread opencxx' use is these days, i.e. how disruptive a change like this would be to its users. > You decide to store more information, say type annotation, at declaration > node. I means as a design decision, not run-time decision. There are > several kinds of declarations, each is encoded with some Ptree shape. You > decide, that the type annotation will be added atop the declaration tree, > so where up to now you had > > NonLeaf(Decl) > / \ > NonLeaf NonLeaf > ... ... > > you want to have > > NonLeaf(Decl) > / \ > [annotation] NonLeaf <--- the old tree > / \ > NonLeaf NonLeaf > ... ... No. I didn't mean to suggest a topological change. Rather, I suggest that instead of using raw 'NonLeaf' (say) objects, we use a richer type system with types such as 'Declaration', 'Statement', etc. that all *derive from* NonLeaf. 
And, as these types know the topology of the sub-trees they are composed of, they could provide typed access to the subtrees:

    struct Declaration : NonLeaf
    {
      Type *type() { return static_cast<Type *>(Car()->Car());}
      ...
    };

which is technically nothing else but what you've been describing above with '[annotation]', but these metadata are not stuffed into the ptree by the user, but by the compiler, i.e. API compatibility is preserved.

> Clients should be
> safe when they commit exclusively to high-level API. This warranty is void
> once they start tampering with tree using Ptree API, as clearly Ptree API
> lets you create a structure, that does not map onto any type-correct tree
> in the sense of Node<> API. I would be happy with Node<> API coredumping
> or throwing as soon as it finds out that somebody put any kind of rubbish
> into underlying Ptree tree. Alternatively Node<> API could contain
> something like "Invalid" node type that would be exposed in places where
> underlying Ptree structure is broken in the sense of Node<> API.

I see your point and I agree to a certain degree. However, I believe that we could enforce validity constraints by making the ptree access const through the 'Node<>' API. In other words, if you want the freedom to manipulate the ptree disregarding the C++ syntax, you'd have to get hold of a (non-const) pointer to a ptree. Hmm, that could mean that we provide two separate parsers, one generating a ptree, the other generating a 'Node<>' tree. But then it may be simpler to have a single parser generating the ptree as before, and provide a Walker that maps that to a 'Node<>' tree (would that be an 'AST' ??)

[...]

> Oh, I see now. So to restate my point, I think we should not invest time
> in validating the OpenC++ input. I would say that we should assume that
> input source code is valid C++.

But the lexer and parser already look for the 'class' token (and so some walker may already recognize 'class Foo;' to be a forward declaration, say).
All I'm pondering about now is whether optionally removing the C++ keywords would get us closer to a C parser, and if so, what else needs to be done to complete the step so opencxx could be used for both languages. > My concern is that we are trying to go into too many directions: > > * making type elaborator and program object model into library > * typesafe API > * Python bindings > * C compatibility > > (Not to mention areas where we need quality improvements as templates and > overloading.) yeah, that's too much to be worked on at the same time. I started this whole thread to get feedback about possible use cases and to have a discussion about how to support them, at some point in the future. I'm not working on all these fronts in parallel. I think the first and third point (making opencxx a library and providing scripting access to it) is the easiest and most useful one in the short run. C compatibility is something quite appart, i.e. I don't expect this to have much (if any) impact on the rest. Providing a type-safe ptree / AST API is probably the hardest part of this all. Regards, Stefan |
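The alternative proposed in this message (richer classes that derive from NonLeaf and offer typed, read-only accessors, while the untyped Car()/Cdr() walk keeps working) might be sketched as follows. The class layout is invented for the example and is not OpenC++'s: here the type is assumed to be the first list element, and a checked `dynamic_cast` is used where the snippet in the message uses an unchecked cast, so a tree of the wrong shape yields a null pointer instead of undefined behaviour.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified, hypothetical hierarchy; only const access is exposed.
struct Ptree {
    virtual ~Ptree() {}
    virtual const Ptree* Car() const { return nullptr; }
    virtual const Ptree* Cdr() const { return nullptr; }
};

struct Leaf : Ptree {
    explicit Leaf(std::string t) : text(std::move(t)) {}
    std::string text;
};

struct NonLeaf : Ptree {
    NonLeaf(const Ptree* a, const Ptree* d) : car_(a), cdr_(d) {}
    const Ptree* Car() const override { return car_; }
    const Ptree* Cdr() const override { return cdr_; }
private:
    const Ptree* car_;
    const Ptree* cdr_;
};

struct Type : Leaf {
    explicit Type(std::string t) : Leaf(std::move(t)) {}
};

// A typed node: still a NonLeaf, but it knows where its parts live.
struct Declaration : NonLeaf {
    Declaration(const Type* t, const Ptree* rest) : NonLeaf(t, rest) {}
    // Assumed layout: the declared type is the first element.
    const Type* type() const { return dynamic_cast<const Type*>(Car()); }
};

// Untyped clients can still traverse the tree through Car()/Cdr().
int CountLeaves(const Ptree* p) {
    if (p == nullptr) return 0;
    if (dynamic_cast<const Leaf*>(p) != nullptr) return 1;
    return CountLeaves(p->Car()) + CountLeaves(p->Cdr());
}
```

Because every accessor returns a pointer to const, a client holding only such nodes cannot reshape the tree underneath a `Declaration`, which is the validity constraint argued for above.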
From: Grzegorz J. <ja...@he...> - 2004-06-08 06:36:13
|
On Mon, 7 Jun 2004, Stefan Seefeld wrote:

> Grzegorz Jakacki wrote:

[...]

> > I did not mean what you understood. In fact I want Node<Definition> to be
> > "abstract". The second part of example was a miss, let me fix it:
> >
> >     Node<Definition> d = ParseDefinition("int main() {}");
> >     SomeVisitor v;
> >     v(d); //<--- calls v.Visit(Node<FunctionDefinition>)
>
> Ah, so 'ParseDefinition()' is an 'abstract factory' ?

Exactly. (Observe that rParseDefition() in the current code is an Abstract Factory for the Ptree hierarchy.)

> That means
> that all those 'Node<>' classes derive from an abstract 'NodeBase' class,

Nope. Read "Node<>", think "smart pointer". Node<> is intended to be a smart pointer with shallow copy. Node<FunctionDefinition> will have a default conversion to Node<Definition>. A visitor for the Node<> hierarchy will dispatch Node<Definition> to Visit(Node<FunctionDefinition>), Visit(Node<ClassDefinition>) etc.

> at which point I'm wondering what the advantage of such a templated
> class hierarchy is as opposed to a simple traditional one
> (i.e. instead of 'Node<Foo>' just using 'Foo', as 'Foo' would need to
> be defined anyways)

Foo does not need to be defined; a declaration alone is sufficient. But we can also reuse the PtreeXxxx classes for Foo (that would make sense, in that Node<PtreeIfStatement> would in general be a wrapper for PtreeIfStatement*).

The advantage of using Node<Foo> over using Foo* is that Node<> wrappers do not expose Car/Cdr. Moreover, in some cases determining the nature of a Ptree (e.g. whether it is if-then or if-then-else) requires some code (e.g. go to Car()->Car()->Car()->Cdr() and check if it is NULL). The Ptree hierarchy has only PtreeIfStatement, which covers both if-then and if-then-else. Having Node<> wrappers we can have Node<IfThen> and Node<IfThenElse>, both wrapping PtreeIfStatement*, but carrying additional information in the wrapper type.

Yet another argument is that this kind of API is truly an *interface* to the tree data.
Node<> scheme allows to have many interfaces. In particular, recall my example of how people want "+" node to be exposed in API --- some want to see "+" as a binary operator, others as a multiary one. Assuming that Node<> shows binary plus, client can write/generate another API, that will expose "multiary" plus. Clients using different APIs can still exchange the underlying Ptree datastructure, they just see different views. > >>In particular, if I want to expose this ast to a scripting frontend > >>such as python, it is impractical to have these wrapper classes be > >>temporary objects, as that would make the binding quite complex and > >>slow. > > > > > > (1) Why? (I have never seen how you create a binding, so I don't have > > an idea what happens.) > > In general, the idea for this particular binding would be to allow > users to define 'Walker' classes in both, C++, as well as python. > If I'm in python and I get hold of a 'Declaration' object, calling > a method (or attribute or property etc.) will result in the invocation > of the associated C/C++ method. But since python has its own idea about > function invocation, parameter passing, etc., each C++ method needs to > be wrapped by a C function that deals with parameter / return value > conversion / wrapping. So if a method returns a reference to another > C++ object, that has to be wrapped in its respective python object. > If these objects are returned by value, you get into a lot of trouble > because it's hard to track dependencies (i.e. reference counts) as > nodes refer to and depend on each other in a parse tree. It would > be far more easy to manage child / parent links internally, so the > python binding wouldn't need to care as long as the referer is still > alive. I think I need an example. Also I believe that you are referring to situation when Node<> wrappers constitute a polymorhpic hierarchy, which is not what I had in mind. 
> > (2) What if instead of coding these wrappers, we generate them > > based on the "Cdr/Car Ptree -> highlevel Ptree" mapping? > > We could generate "C++ wrappers" and "Python wrappers" > > How / where would that mapping be defined ? As a text file, e.g.: IfPtree : IfThenElse Cond Cdr()->Car() Then Cdr()->Cdr()->Cdr()->Car() Else Cdr()->Cdr()->Cdr()->Cdr()->Cdr()->Car() or as Python data, e.g.: {("IfPtree", "IfThenElse": { "Cond" : "Cdr()->Car()", "Then" : "Cdr()->Cdr()->Cdr()->Car()", "Else" : "Cdr()->Cdr()->Cdr()->Cdr()->Cdr()->Car() } ... } > > My point is to avoid making intrusive changes to Ptree hierarchy, as it > > breaks essentially all OpenC++, so compensating for such changes is both > > expensive and error-prone. > > I understand. I don't have a good understanding how widespread opencxx' > use is these days, i.e. how disruptive a change like this would be to its > users. In fact I had in mind just the damages in OpenC++ backend (and of course the bill is higher when you keep external clients in mind). > > You decide to store more information, say type annotation, at declaration > > node. I means as a design decision, not run-time decision. There are > > several kinds of declarations, each is encoded with some Ptree shape. You > > decide, that the type annotation will be added atop the declaration tree, > > so where up to now you had > > > > NonLeaf(Decl) > > / \ > > NonLeaf NonLeaf > > ... ... > > > > you want to have > > > > NonLeaf(Decl) > > / \ > > [annotation] NonLeaf <--- the old tree > > / \ > > NonLeaf NonLeaf > > ... ... > > No. I didn't mean to suggest a topological change. Rather, I suggest > that instead of using raw 'NonLeaf' (say) objects, we use a richer type > system with types such as 'Declaration', 'Statement', etc. that all *derive from* > NonLeaf. That's how the code works today. 
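
The Car()/Cdr() path table from earlier in this message could also be interpreted at run time rather than fed to a wrapper generator. What follows is only a minimal sketch of that idea under stated assumptions: `Cell`, `Arena`, `Follow`, `IfThenElseFields` and `Field` are hypothetical names invented for illustration, `Cell` is a stand-in for the real Ptree, and the example list is laid out to match the three paths quoted in the table, not the actual OpenC++ encoding of an if statement.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Ptree: a Lisp-style cons cell, or a leaf with text.
struct Cell {
    Cell(const std::string& t, Cell* a, Cell* d) : text(t), car(a), cdr(d) {}
    std::string text;  // token text; empty for cons cells
    Cell* car;
    Cell* cdr;
};

// Owns all cells; std::deque keeps element addresses stable as it grows.
struct Arena {
    std::deque<Cell> cells;
    Cell* Leaf(const std::string& text) {
        cells.push_back(Cell(text, nullptr, nullptr));
        return &cells.back();
    }
    Cell* Cons(Cell* car, Cell* cdr) {
        cells.push_back(Cell("", car, cdr));
        return &cells.back();
    }
    // Builds the proper list (a b c ...) whose elements are leaves.
    Cell* List(const std::vector<std::string>& items) {
        Cell* tail = nullptr;
        for (auto it = items.rbegin(); it != items.rend(); ++it)
            tail = Cons(Leaf(*it), tail);
        return tail;
    }
};

enum class Step { Car, Cdr };
typedef std::vector<Step> Path;

// Walks `path` starting at `node`; returns nullptr if the tree is too shallow.
Cell* Follow(Cell* node, const Path& path) {
    for (Step s : path) {
        if (!node) return nullptr;
        node = (s == Step::Car) ? node->car : node->cdr;
    }
    return node;
}

// Field table for "IfThenElse", transcribed from the paths in the mapping.
std::map<std::string, Path> IfThenElseFields() {
    const Step A = Step::Car, D = Step::Cdr;
    std::map<std::string, Path> m;
    m["Cond"] = {D, A};
    m["Then"] = {D, D, D, A};
    m["Else"] = {D, D, D, D, D, A};
    return m;
}

// Returns the text of a named field, or "<invalid>" if the path does not resolve.
std::string Field(Cell* root, const std::map<std::string, Path>& fields,
                  const std::string& name) {
    auto it = fields.find(name);
    if (it == fields.end()) return "<invalid>";
    Cell* c = Follow(root, it->second);
    return c ? c->text : "<invalid>";
}
```

With a list built as `arena.List({"if", "c", ")", "t", "else", "e"})`, looking up "Cond", "Then" and "Else" yields the second, fourth and sixth elements. A list that is too short makes the lookup report "<invalid>" instead of dereferencing a null pointer, which is one way the "Invalid" node idea discussed in this thread could surface.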
> And, as these types know the topology of the sub-trees they are composed
> of, they could provide typed access to the subtrees:
>
> struct Declaration : NonLeaf
> {
>    Type *type() { return static_cast<Type *>(Car()->Car());}
>    ...
> };
>
> which is technically nothing else but what you've been describing above with
> '[annotation]',

Ok, I see. I think I had yet another use case where extending the topology
is useful, but I am unable to recall it now (I suppose I was thinking
along these lines when I was trying to find a way for clients to put their
typed data in the ptree, e.g. the OpenC++ backend needs to store type encodings
in some ptrees, but in general not all clients need to.)

> but these metadata are not stuffed into the ptree by the user,
> but by the compiler, i.e. API compatibility is preserved.

I think I don't understand.

> > Clients should be
> > safe when they commit exclusively to high-level API. This warranty is void
> > once they start tampering with tree using Ptree API, as clearly Ptree API
> > lets you create a structure, that does not map onto any type-correct tree
> > in the sense of Node<> API. I would be happy with Node<> API coredumping
> > or throwing as soon as it finds out that somebody put any kind of rubbish
> > into underlying Ptree tree. Alternatively Node<> API could contain
> > something like "Invalid" node type that would be exposed in places where
> > underlying Ptree structure is broken in the sense of Node<> API.
>
> I see your point and I agree to a certain degree. However, I believe that
> we could enforce validity constraints by making the ptree access const
> through the 'Node<>' API.
> In other words, if you want the freedom to manipulate the ptree disregarding
> the C++ syntax, you'd have to get hold of a (non-const) pointer to a ptree.
> Hmm, that could mean that we provide two separate parsers, one generating
> a ptree, the other generating a 'Node<>' tree.
> But then it may be simpler to have a single parser generating the ptree
> as before, and provide a Walker that maps that to a 'Node<>' tree (would
> that be an 'AST' ??)

I think we are converging, but:

  * Why do you think that Node<> API would need to be const?

  * Even with two API-s (const/non-const) why would we need
    two parsers? We have one parser now with non-const API. We can create
    a wrapper that wraps this parser in const-API, period.
    (Maybe this is what you have in mind writing about a Walker that maps
    ptree to Node<> ?)

> [...]
>
> > Oh, I see now. So to restate my point, I think we should not invest time
> > in validating the OpenC++ input. I would say that we should assume that
> > input source code is valid C++.
>
> But the lexer and parser already look for the 'class' token (and so
> some walker may already recognize 'class Foo;' to be a forward declaration, say).

Sorry, I was not clear enough. I understand that we cannot use the C++ parser
as is to parse C code.

> All I'm pondering about now is whether optionally removing the C++ keywords
> would get us closer to a C parser, and if so, what else needs to be done to
> complete the step so opencxx could be used for both languages.

This is an interesting question, i.e. can the "common factor" of the C and C++
parsers be factored out, and how. Switching between C/C++ keywords should be
easy with the existing code. Moreover, the lexer is encapsulated, so this is not
an issue either. The fun begins in the parser.

> > My concern is that we are trying to go into too many directions:
> >
> >   * making type elaborator and program object model into library
> >   * typesafe API
> >   * Python bindings
> >   * C compatibility
> >
> > (Not to mention areas where we need quality improvements as templates and
> > overloading.)
>
> yeah, that's too much to be worked on at the same time. I started
> this whole thread to get feedback about possible use cases and to
> have a discussion about how to support them, at some point in the future.
> I'm not working on all these fronts in parallel. I think the first and
> third point (making opencxx a library and providing scripting access to it)
> is the easiest and most useful one in the short run.

This is exactly what I think.

> C compatibility is
> something quite apart, i.e. I don't expect this to have much (if any)
> impact on the rest. Providing a type-safe ptree / AST API is probably
> the hardest part of this all.

I think it depends. There are many possible AST object models. In
particular, the existing Ptree hierarchy gives rise to one of them. Having
a read-only type-safe API along this model is quite easy, it is just a matter
of determining the Car/Cdr paths. Whether this API is useful and convenient is
another question.

>
> Regards,
> 		Stefan

##################################################################
# Grzegorz Jakacki                   Huada Electronic Design     #
# Senior Engineer, CAD Dept.         1 Gaojiayuan, Chaoyang      #
# tel. +86-10-64365577 x2074         Beijing 100015, China       #
# Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################
|
From: Stefan S. <se...@sy...> - 2004-06-08 13:20:05
|
Grzegorz Jakacki wrote:

>>>I did not mean what you understood. In fact I want Node<Definition> to be
>>>"abstract". The second part of example was a miss, let me fix it:
>>>
>>>   Node<Definition> d = ParseDefinition("int main() {}");
>>>   SomeVisitor v;
>>>   v(d);  //<--- calls v.Visit(Node<FunctionDefinition>)
>>
>>Ah, so 'ParseDefinition()' is an 'abstract factory' ?
>
>
> Exactly.
>
> (Observe, that rParseDefinition() in current code is an Abstract
> Factory for Ptree hierarchy.)

Indeed. But then you can't simply assign to a Node<Definition> without
a downcast. That was what I stumbled over :-)

>>That means
>>that all those 'Node<>' classes derive from an abstract 'NodeBase' class,
>
>
> Nope. Read "Node<>", think "smart pointer".

ok. But that's a different issue. Whether we use smart pointers or not,
somewhere we need a class hierarchy with a type system that covers *all*
the C++ grammar.

>
> Node<> is intended to be a smart pointer with shallow copy.
>
> Node<FunctionDefinition> will have default conversion to Node<Definition>.
>
> Visitor for Node<> hierarchy will dispatch Node<Definition> to
> Visit(Node<FunctionDefinition>), Visit(Node<ClassDefinition>) etc.

that's possible but sounds a bit complicated, since even though
'FunctionDefinition' IsA 'Definition', but 'Node<FunctionDefinition>'
is *not* a 'Node<Definition>'. I.e. in the former case the compiler
and the C++ type system would do the proper dispatching for us, but
when using the indirection over Node<> smart pointers we'd have to
do that manually.
That's why I'd like to keep the discussion to the AST type hierarchy
separate from the issue of whether or not to use Node<> smart pointers.

>>at which point I'm wondering what the advantage of such a templated
>>class hierarchy is as opposed to a simple traditional one
>>(i.e. instead of 'Node<Foo>' just using 'Foo', as 'Foo' would need to
>>be defined anyways)
>
>
> Foo does not need to be defined, declaration alone is sufficient. But we
> can also reuse PtreeXxxx classes for Foo (that would make sense that in
> general Node<PtreeIfStatement> is a wrapper for PtreeIfStatement*).

When is a declaration sufficient ?

> The advantage of using Node<Foo> over using Foo* is that
> Node<> wrappers do not expose Car/Cdr.

yeah, I see your point. Well, what about deriving privately from Ptree
and then raising the const methods into the public API via 'using' directives ?
That has the additional benefit of preserving the type system for us to play with.

> Moreover, in some cases determining the nature of a Ptree (e.g. if it is
> if-then, or if-then-else) requires some code (e.g. go to
> Car()->Car()->Car()->Cdr() and check if it is NULL). Ptree hierarchy has
> only PtreeIfStatement, which covers both if-then and if-then-else. Having
> Node<> wrapper we can have Node<IfThen> and Node<IfThenElse>, both
> wrapping PtreeIfStatement*, but carrying additional information in the
> wrapper type.

hmm, the 'ctool' backend I was talking about earlier has an 'IfStatement'
that looks about so:

struct IfStatement : Statement
{
   Expression *condition;
   Statement  *then_block;
   Statement  *else_block;
};

where the 'else_block' can be empty. I like the simplicity of this, even
though I'd encapsulate it a little more (especially if this is just a
ptree wrapper).

> Yet another argument is that this kind of API is truly an *interface* to
> the tree data. Node<> scheme allows to have many interfaces. In particular,
> recall my example of how people want "+" node to be exposed in API --- some
> want to see "+" as a binary operator, others as a multiary one. Assuming
> that Node<> shows binary plus, client can write/generate another API, that
> will expose "multiary" plus. Clients using different APIs can still exchange
> the underlying Ptree datastructure, they just see different views.

yeah, I agree in general that delegation is often better than derivation.
My real issue here is, as I said, the loss of the type system. Could you
demonstrate how a Node<> based visitor would be implemented (i.e. how it
would resolve the correct type) ? As that's the central point I believe,
I'll wait before I continue arguing about the other points until my
understanding of this Node<> visitor as you see it is more complete.

Regards,
		Stefan
|
From: Grzegorz J. <ja...@he...> - 2004-06-09 09:08:03
|
On Tue, 8 Jun 2004, Stefan Seefeld wrote:

> Grzegorz Jakacki wrote:
>
> >>>I did not mean what you understood. In fact I want Node<Definition> to be
> >>>"abstract". The second part of example was a miss, let me fix it:
> >>>
> >>>   Node<Definition> d = ParseDefinition("int main() {}");
> >>>   SomeVisitor v;
> >>>   v(d);  //<--- calls v.Visit(Node<FunctionDefinition>)
> >>
> >>Ah, so 'ParseDefinition()' is an 'abstract factory' ?
> >
> >
> > Exactly.
> >
> > (Observe, that rParseDefinition() in current code is an Abstract
> > Factory for Ptree hierarchy.)
>
> Indeed. But then you can't simply assign to a Node<Definition> without
> a downcast. That was what I stumbled over :-)
>
> >>That means
> >>that all those 'Node<>' classes derive from an abstract 'NodeBase' class,
> >
> >
> > Nope. Read "Node<>", think "smart pointer".
>
> ok. But that's a different issue. Whether we use smart pointers or not,
> somewhere we need a class hierarchy with a type system that covers *all*
> the C++ grammar.

Hm, looks like I did not convey what I meant to. Perhaps a piece of code
will speak more clearly than myself (attached at the end of e-mail).

> >
> > Node<> is intended to be a smart pointer with shallow copy.
> >
> > Node<FunctionDefinition> will have default conversion to Node<Definition>.
> >
> > Visitor for Node<> hierarchy will dispatch Node<Definition> to
> > Visit(Node<FunctionDefinition>), Visit(Node<ClassDefinition>) etc.
>
> that's possible but sounds a bit complicated, since even though
> 'FunctionDefinition' IsA 'Definition', but 'Node<FunctionDefinition>'
> is *not* a 'Node<Definition>'.

You restrict yourself to modelling IsA by derived-to-base conversion.

> I.e. in the former case the compiler
> and the C++ type system would do the proper dispatching for us, but
> when using the indirection over Node<> smart pointers we'd have to
> do that manually.

Exactly. See the code.

> That's why I'd like to keep the discussion to the AST type hierarchy
> separate from the issue of whether or not to use Node<> smart pointers.

But Node<> pointers are meant to "simulate" the AST hierarchy.

> >>at which point I'm wondering what the advantage of such a templated
> >>class hierarchy is as opposed to a simple traditional one
> >>(i.e. instead of 'Node<Foo>' just using 'Foo', as 'Foo' would need to
> >>be defined anyways)
> >
> >
> > Foo does not need to be defined, declaration alone is sufficient. But we
> > can also reuse PtreeXxxx classes for Foo (that would make sense that in
> > general Node<PtreeIfStatement> is a wrapper for PtreeIfStatement*).
>
> When is a declaration sufficient ?

Please let me know if the code makes it clear to you.

> > The advantage of using Node<Foo> over using Foo* is that
> > Node<> wrappers do not expose Car/Cdr.
>
> yeah, I see your point. Well, what about deriving privately from Ptree
> and then raising the const methods into the public API via 'using' directives ?

Hm, I don't think that is what I meant.

> That has the additional benefit of preserving the type system for us to play with.

Not sure if I get you.

>
> > Moreover, in some cases determining the nature of a Ptree (e.g. if it is
> > if-then, or if-then-else) requires some code (e.g. go to
> > Car()->Car()->Car()->Cdr() and check if it is NULL). Ptree hierarchy has
> > only PtreeIfStatement, which covers both if-then and if-then-else. Having
> > Node<> wrapper we can have Node<IfThen> and Node<IfThenElse>, both
> > wrapping PtreeIfStatement*, but carrying additional information in the
> > wrapper type.
>
> hmm, the 'ctool' backend I was talking about earlier has an 'IfStatement'
> that looks about so:
>
> struct IfStatement : Statement
> {
>    Expression *condition;
>    Statement  *then_block;
>    Statement  *else_block;
> };
>
> where the 'else_block' can be empty. I like the simplicity of this, even
> though I'd encapsulate it a little more (especially if this is just a
> ptree wrapper).

That's OK, it is just a matter of how you want the wrappers to behave.

> > Yet another argument is that this kind of API is truly an *interface* to
> > the tree data. Node<> scheme allows to have many interfaces. In particular,
> > recall my example of how people want "+" node to be exposed in API --- some
> > want to see "+" as a binary operator, others as a multiary one. Assuming
> > that Node<> shows binary plus, client can write/generate another API, that
> > will expose "multiary" plus. Clients using different APIs can still exchange
> > the underlying Ptree datastructure, they just see different views.
>
> yeah, I agree in general that delegation is often better than derivation.
> My real issue here is, as I said, the loss of the type system. Could you
> demonstrate how a Node<> based visitor would be implemented (i.e. how it
> would resolve the correct type) ? As that's the central point I believe,
> I'll wait before I continue arguing about the other points until my
> understanding of this Node<> visitor as you see it is more complete.
Voila:

==================================================================
#include <iostream>
#include <string>

using std::cout;
using std::endl;
using std::ostream;
using std::string;

// ----- low-level tree implementation (a la existing Ptree) -----

struct Ptree
{
    virtual ~Ptree() {}
};

struct Leaf : public Ptree
{
    Leaf(const string& txt) : txt_(txt) {}
    string txt_;
};

struct NonLeaf : public Ptree
{
    NonLeaf(Ptree* l, Ptree* r) : l_(l), r_(r) {}
    Ptree* l_;
    Ptree* r_;
};

struct PtreeBinaryOp : public NonLeaf
{
    PtreeBinaryOp(Ptree* l, Ptree* r) : NonLeaf(l, r) {}
};

struct PtreeVar : public Leaf
{
    PtreeVar(const string& id) : Leaf(id) {}
};

/*
   The above AST is capable of representing the sentences
   derived from the following grammar (The Grammar):

     (plus)    E -> E "+" E
     (uminus)  E -> "-" E
     (var)     E -> id

   using the following mappings:

     T( E1 "+" E2 ) =   PtreeBinaryOp
                          /     \
                    T( E1 )     NonLeaf
                                /     \
                             Leaf     T( E2 )
                             "+"

     T( "-" E ) =   NonLeaf
                    /     \
                 Leaf     T( E )
                 "-"

     T( id ) =   PtreeVar
                   "id"

   This mapping is quite irregular. My point is to demonstrate
   how to deal with similar irregularities in existing Ptree.
*/

// ------ high-level API ------

/*
   High level API defined below presents The Grammar in a regular,
   canonical way.
*/

// high-level API declares names of nonterminals...
struct Expr;

// ...and productions.
struct Plus;
struct UMinus;
struct Var;

// Moreover, it adds extra "error" production, to represent
// Ptree subtrees that do not encode any legal tree.
struct Invalid;

// First part of API is a set of wrapper classes
// (think about them as non-owning smart pointers
// or handles)
//
// It is not difficult to automate creation of this code
// from the grammar description.

template <class T> class Node;
class AbstractNodeVisitor;

template <>
class Node<Expr>
{
public:
    Node() : p_(0) {}
private:
    Node(Ptree* p) : p_(p) {}
    template <class U> friend class Node;
    friend void Dispatch(AbstractNodeVisitor&, Node<Expr>);
    Ptree* p_;
};

template <>
class Node<Plus>
{
public:
    static Node New(Node<Expr> l, Node<Expr> r)
    {
        return new PtreeBinaryOp(l.p_, new NonLeaf(new Leaf("+"), r.p_));
    }
    Node() : p_(0) {}
    Node<Expr> GetLeft() const { return Node<Expr>(p_->l_); }
    void SetLeft(Node<Expr> e) const { p_->l_ = e.p_; }
    Node<Expr> GetRight() const
    {
        return Node<Expr>(static_cast<NonLeaf*>(p_->r_)->r_);
    }
    void SetRight(Node<Expr> e) const
    {
        static_cast<NonLeaf*>(p_->r_)->r_ = e.p_;
    }
    operator Node<Expr>() const { return p_; }
private:
    Node(PtreeBinaryOp* p) : p_(p) {}
    template <class U> friend class Node;
    friend void Dispatch(AbstractNodeVisitor&, Node<Expr>);
    PtreeBinaryOp* p_;
};

template <>
class Node<UMinus>
{
public:
    static Node New(Node<Expr> c)
    {
        return new NonLeaf(new Leaf("-"), c.p_);
    }
    Node() : p_(0) {}
    Node<Expr> GetChild() const { return Node<Expr>(p_->r_); }
    void SetChild(Node<Expr> e) const { p_->r_ = e.p_; }
    operator Node<Expr>() const { return p_; }
private:
    Node(NonLeaf* p) : p_(p) {}
    template <class U> friend class Node;
    friend void Dispatch(AbstractNodeVisitor&, Node<Expr>);
    NonLeaf* p_;
};

template <>
class Node<Var>
{
public:
    static Node<Var> New(const string& id) { return new PtreeVar(id); }
    Node() : p_(0) {}
    string GetId() const { return p_->txt_; }
    void SetId(const string& id) const { p_->txt_ = id; }
    operator Node<Expr>() const { return p_; }
private:
    Node(PtreeVar* p) : p_(p) {}
    template <class U> friend class Node;
    friend void Dispatch(AbstractNodeVisitor&, Node<Expr>);
    PtreeVar* p_;
};

template <>
class Node<Invalid>
{
public:
    Node() : p_(0) {}
    operator Node<Expr>() const { return p_; }
private:
    Node(Ptree* p) : p_(p) {}
    template <class U> friend class Node;
    friend void Dispatch(AbstractNodeVisitor&, Node<Expr>);
    Ptree* p_;
};

// Second part of high-level API is an abstract visitor for Node<>'s

class AbstractNodeVisitor
{
public:
    virtual ~AbstractNodeVisitor() {}
    virtual void Visit(Node<Plus>) = 0;
    virtual void Visit(Node<Var>) = 0;
    virtual void Visit(Node<UMinus>) = 0;
    virtual void Visit(Node<Invalid>) = 0;
};

// Third part of high-level API is a dispatcher --
// a mechanism that lets you find out more detailed
// type information about node (e.g. you pass
// Node<Expr>, and dispatcher calls, say,
// v.Visit(Node<Plus>), if your Node<Expr> was indeed
// a plus expression).
//
// There are many ways to implement Dispatch(). Usually
// it will use some technique to find out
// actual (dynamic) type of Ptree* wrapped in Node<>'s,
// plus a little bit of peeking into the tree, if
// dynamic type information itself is not enough
// to find out what high-level node should be reported.
//
// Note1: This implementation uses dynamic_cast in Ptree*
//   hierarchy to learn about dynamic type; other implementations
//   are possible, e.g. type tags ('p_->What()'), type queries
//   ('p_->IsA(...)') or visitation in Ptree hierarchy.
//
// Note2: If the syntax presented by high-level API matches closely
//   the inheritance hierarchy in Ptree, then it is just enough
//   to find out the dynamic type of Ptree* to be able to build
//   Node<> wrapper; in particular, no further "peeking" is necessary
//   (in fact some peeking may be administered to check if the
//   Ptree tree has the correct topology, so that later, say,
//   Node<>::GetRight() does not hit a NULL pointer somewhere
//   along Cdr/Car path).
//
// Note3: I think that for many AST mappings a workable implementation
//   of Dispatch can be generated automatically (although there may
//   be a quagmire somewhere here; e.g. finding if an AST mapping is
//   ambiguous seems undecidable in general).
//
// Note4: Dispatch() is effectively a "tree parser".

void Dispatch(AbstractNodeVisitor& v, Node<Expr> e)
{
    if (PtreeBinaryOp* b = dynamic_cast<PtreeBinaryOp*>(e.p_))
    {
        if (NonLeaf* br = dynamic_cast<NonLeaf*>(b->r_))
        {
            if (Leaf* brl = dynamic_cast<Leaf*>(br->l_))
            {
                if (brl->txt_ == "+")
                {
                    if (b->l_ && br->r_)
                    {
                        return v.Visit(Node<Plus>(b));
                    }
                }
            }
        }
    }
    if (NonLeaf* b = dynamic_cast<NonLeaf*>(e.p_))
    {
        if (Leaf* bl = dynamic_cast<Leaf*>(b->l_))
        {
            if (bl->txt_ == "-")
            {
                if (b->r_)
                {
                    return v.Visit(Node<UMinus>(b));
                }
            }
        }
    }
    if (PtreeVar* b = dynamic_cast<PtreeVar*>(e.p_))
    {
        return v.Visit(Node<Var>(b));
    }
    v.Visit(Node<Invalid>(e.p_));
}

//
// General observations:
//
// * There is no inheritance relation whatsoever between
//   instantiations of Node<>.
//
// * Node<> behaves like polymorphic pointer.
//
// * Internally Node<> can use raw, smart or gc-managed pointers.
//
// * Node<T>::New is an analogue of "new T"
//
// * Node<T>() is an analogue of the default pointer constructor;
//   Node<T>() creates "null pointer"
//
// * Node<Expr> is "abstract" -- it does not have a New() member function,
//   you can either create a null Node<Expr> or assign/initialize it
//   from another, non-abstract Node<T>
//
// * There is a user-defined conversion from "concrete" Node<>s to
//   "abstract" Node<Expr> (this is an analogue of derived-to-base conversion)
//
// * Dispatch() is an analogue of virtual dispatch.
//
// * Dispatch() dispatches to a classic, polymorphic visitor, i.e. its
//   first argument must be derived from AbstractNodeVisitor; it is also
//   possible to arrange for generic visitation, i.e. templatize the first
//   argument of Dispatch().
//
// * AFAICS code pertaining to Node<> wrappers is inlineable on modern
//   compilers.
//
// * It is possible to model "multiple inheritance" on Node<>s,
//   which corresponds to grammar in which many nonterminals produce
//   the same RHS. Observe, that unlike standard implementation
//   in C++ typesystem, such "multiple inheritance" does not involve
//   memory overhead.

// -------- Example ---------

class MyVisitor : public AbstractNodeVisitor
{
public:
    MyVisitor(ostream& os) : os_(os) {}
    void Visit(Node<Plus> p)
    {
        os_ << "Plus(";
        Dispatch(*this, p.GetLeft());
        os_ << ",";
        Dispatch(*this, p.GetRight());
        os_ << ")";
    }
    virtual void Visit(Node<Var> v)
    {
        os_ << "Var(" << v.GetId() << ")";
    }
    virtual void Visit(Node<UMinus> u)
    {
        os_ << "UMinus(";
        Dispatch(*this, u.GetChild());
        os_ << ")";
    }
    virtual void Visit(Node<Invalid>)
    {
        os_ << "Invalid()";
    }
private:
    ostream& os_;
};

int main()
{
    Node<Plus> p = Node<Plus>::New(
                       Node<UMinus>::New(
                           Node<Var>::New("x")
                       )
                     , Node<Var>::New("y")
                   );
    Node<Expr> e = p; // "derived-to-base"
    MyVisitor v(cout);
    Dispatch(v, e);
    cout << endl;

    Node<Expr> f = Node<UMinus>::New(Node<Var>::New("z"));
    p.SetRight(f);
    Dispatch(v, e);
    cout << endl;
}
==================================================================

Best regards
Grzegorz

##################################################################
# Grzegorz Jakacki                   Huada Electronic Design     #
# Senior Engineer, CAD Dept.         1 Gaojiayuan, Chaoyang      #
# tel. +86-10-64365577 x2074         Beijing 100015, China       #
# Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################
|
From: Stefan S. <se...@sy...> - 2004-06-10 01:52:03
|
Hi Grzegorz,

Grzegorz Jakacki wrote:

> You restrict yourself to modelling IsA by derived-to-base conversion.

I try to avoid having to define and implement my own type system.
C++ already provides one for us, so why not delegate all the related
work to the C++ runtime (RTTI) and the compiler ? I'm not (yet) convinced
that the limitations of such an approach outweigh the advantages.

> struct PtreeBinaryOp : public NonLeaf
> {
>     PtreeBinaryOp(Ptree* l, Ptree* r) : NonLeaf(l, r) {}
> };

Why isn't the 'BinaryOp' part of the high level API ? If it was, it could
have three members to directly access the operator as well as left and right
operands:

struct BinaryOperator : public NonLeaf
{
   //. add appropriate constructor(s)
   const Ptree *get_operator() const; // instead of 'const Ptree' this could return a typed subclass
   Expression *left();
   Expression *right();
};

by the way: if this returns a Ptree, it has to be const, so users can't modify
it to become something other than a binary operator. As Expressions would hide
the un(type)safe Ptree API, they don't need to be const, i.e. users can modify
them as long as the result remains an Expression.

> // Fourth part of high-level API is a dispatcher --
> // a mechanism that lets you find out more detailed
> // type information about node (e.g. you pass
> // Node<Expr>, and dispatcher calls, say,
> // v.Visit(Node<Plus>), if your Node<Expr> was indeed
> // a plus expression).

yes, I think this is the central point we are disagreeing on: if we don't
use the C++ type system, we have to build our own. That's quite heavy and
I don't see any advantage in such an approach.

Regards,
		Stefan
|
From: Stefan S. <se...@sy...> - 2004-06-17 02:03:57
|
Grzegorz Jakacki wrote:

> Hi Stefan,
>
> Stefan Seefeld wrote:
>
>> you make some good points. However, your arguments all gravitate
>> around some requirements which I'm not sure I agree with. (That
>> was the reason for my 'future directions' thread, so I could better
>> appreciate user requirements to base my judgement on)

>> I'm not sure the API enhancement I've been suggesting would be
>> one-among-many. In my mind such a typed AST simply reflects the
>> C++ language, so there isn't that much freedom to choose.

> I don't think that the language itself determines the canonical AST
> architecture.

ok.

>> The API should allow the definition
>> of such extensions, but they remain extensions on a single (high-level)
>> API, instead of being just another API.
>
>
> That's my target too. In particular I would like to let clients
> *nonintrusively* extend OpenC++ frontend library so that they can
>
> * obtain API that presents e.g. ternary '+', or
> * API that hides parentheses in expressions, or

these two seem to me to be just a special *view* on the AST, but not the
AST itself. Read: For this I wouldn't provide a different AST, but rather
a tool to filter the unwanted tokens or generally to present the same data
in different ways.
Anyways, I'm sure we agree on this as the highlevel Node<> API you are
suggesting is nothing but a high-level view on the Ptree, too...

> * API that has an extra construct (e.g. 'metaclass ID' declaration) that
>   I can typesafely insert into AST obtained from the parser.

that one is different since this really requires the parser (ptree factory)
to be aware of the extension. It's thus more difficult to achieve this type
of extensibility (but the OpenC++ parser already has hooks for this).

>> Ok, but even if we go for wrapper instead of subclasses,
>> I don't see any reason not to use the C++ type system, i.e. I'd still
>> suggest 'Statement', 'Expression', etc. to form a type system,
>
>
> I assume you meant "type hierarchy".

right

>> instead
>> of having Node be a template type.
>
>
> I don't get it. Are you talking about (a) Statement and Expression being
> wrappers around Ptree* or about (b) putting more type info into Ptree
> hierarchy?

that's exactly the question. If we use wrappers we don't touch Ptree,
i.e. we don't add type info into the Ptree hierarchy, but instead
build another hierarchy on top of the Ptree.

> If (a), than I fail to understand how this should work --- Statement
> and Expression would be polymorphic, so how can they work as wrappers?
> What copy semantics do you want for them? Will Expression be a concrete
> base class???

This link describes the ctool API (the C parser backend synopsis provides now):

http://synopsis.fresco.org/docs/Manual/ctool/index.html

Have a look into the 'Inheritance Graph'. That's about the AST hierarchy
I have in mind (with the required additions to cover C++, of course).
Expression is indeed an abstract base class, and as such it can't simply
be copied, at least not on the stack. Thus, when I'm talking about 'wrappers',
I'm not talking about temporary objects, but a superstructure on top of the
parse tree.

> If (b), then I sustain my argument that this is a viable solution,
> but it requires lots of *revolutionary* changes in parser, elaborator
> and translator.

Hmm, I'd like to get a better understanding of what 'lots of revolutionary
changes' actually means. For one, the Ptree type hierarchy already seems
quite complete. It just doesn't provide a high-level API to access the
structure without using Cdr() and Car(). Thus, adding a high-level API to
the existing classes wouldn't be intrusive at all.
The thing that would really need to be changed is the Walker interface.
But isn't that a good thing ? Making Walker a true Visitor would be
beneficial to everybody (there, too, I don't yet fully understand the
ramifications, i.e. whether the required changes would have to be applied
in one shot or whether we could do it incrementally).
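
To make the alternative above concrete, here is a minimal sketch of typed accessors added directly to a tree class, combined with a classic double-dispatch Visitor. It is an illustration under assumptions, not OpenC++ code: the classes are simplified stand-ins, `WhileStatement`, `Condition()`, `Body()`, `Accept()` and `Printer` are hypothetical names, and the layout (condition in Car, body in Cdr) is invented for brevity rather than taken from the real encoding of a while statement.

```cpp
#include <cassert>
#include <string>

struct Visitor;

// Simplified stand-ins for the Ptree classes (hypothetical, for illustration).
struct Ptree {
    virtual ~Ptree() {}
    virtual void Accept(Visitor& v) = 0;  // entry point for double dispatch
};

struct Leaf : Ptree {
    explicit Leaf(const std::string& t) : text(t) {}
    void Accept(Visitor& v) override;
    std::string text;
};

struct NonLeaf : Ptree {
    NonLeaf(Ptree* car, Ptree* cdr) : car_(car), cdr_(cdr) {}
    Ptree* Car() const { return car_; }
    Ptree* Cdr() const { return cdr_; }
    void Accept(Visitor& v) override;
private:
    Ptree* car_;
    Ptree* cdr_;
};

// Typed node: knows its own topology and exposes named accessors,
// so clients need not spell out Car()/Cdr() paths themselves.
struct WhileStatement : NonLeaf {
    WhileStatement(Ptree* cond, Ptree* body) : NonLeaf(cond, body) {}
    Ptree* Condition() const { return Car(); }  // assumed layout
    Ptree* Body() const { return Cdr(); }       // assumed layout
    void Accept(Visitor& v) override;
};

// The compiler's virtual dispatch selects the overload for the dynamic type.
struct Visitor {
    virtual ~Visitor() {}
    virtual void Visit(Leaf&) = 0;
    virtual void Visit(NonLeaf&) = 0;
    virtual void Visit(WhileStatement&) = 0;
};

inline void Leaf::Accept(Visitor& v) { v.Visit(*this); }
inline void NonLeaf::Accept(Visitor& v) { v.Visit(*this); }
inline void WhileStatement::Accept(Visitor& v) { v.Visit(*this); }

// Example visitor: renders the tree into a string.
struct Printer : Visitor {
    std::string out;
    void Visit(Leaf& l) override { out += l.text; }
    void Visit(NonLeaf& n) override {
        out += "(";
        if (n.Car()) n.Car()->Accept(*this);
        out += " . ";
        if (n.Cdr()) n.Cdr()->Accept(*this);
        out += ")";
    }
    void Visit(WhileStatement& w) override {
        out += "While(";
        w.Condition()->Accept(*this);
        out += ",";
        w.Body()->Accept(*this);
        out += ")";
    }
};
```

Calling `Accept` through a plain `Ptree*` that points at a `WhileStatement` reaches `Visit(WhileStatement&)` without any hand-written type test. The trade-off debated in this thread still applies: in this sketch the nodes stay polymorphic objects in the tree itself, and Car()/Cdr() remain publicly reachable.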
> This is a recurring point and I suppose that this is the source of > misunderstanding. It seems to me that you believe that Node<> solution > requires constructing another in-memory tree atop Ptree tree. This is > not true. Yes, see above. For Node<> to be able to live on the stack, it needs to be non-polymorphic, i.e. it can't use the C++ type system directly. Thus it requires an IMO ugly mechanism to dispatch methods based on the dynamic type of one of its members. Again, what I find confusing is that the high-level type hierarchy is part of the Ptree, but not the high-level API. If you suggest to add the high-level API on top, why not the type hierarchy, too ? What good is the current type hierarchy anyways, as there's no associated API to begin with ? Would anything stop working if I took out, say, the 'PTreeWhileStatement' class ? > Perhaps you are looking at Node<> through the existing solution from > Synopsis. With Node<> API *no* AST is *built*. See the code example that > I posted --- the only tree in memory is Ptree tree. yes, I noticed the difference. I just don't understand the rationale for this, see above. ... I'm sorry if this thread sounds confusing and doesn't seem to lead anywhere. At this point it just reflects my lack of understanding of the current design (i.e. either the presence of the PTree class hierarchy or the absence of a corresponding high-level API). Regards, Stefan |
From: Grzegorz J. <ja...@ac...> - 2004-06-23 13:34:02
|
Hi, Stefan Seefeld wrote: > Grzegorz Jakacki wrote: > >> Hi Stefan, >> >> Stefan Seefeld wrote: >> >>> you make some good points. However, your arguments all gravitate >>> around some requirements which I'm not sure I agree with. (That >>> was the reason for my 'future directions' thread, so I could better >>> appreciate user requirements to base my judgement on) > > >>> I'm not sure the API enhancement I'v been suggesting would be >>> one-among-many. In my mind such a typed AST simply reflects the >>> C++ language, so there isn't that much freedom to choose. > > >> I don't think that the language itself detemines the canonical AST >> architecture. > > > ok. > >>> The API should allow the definition >>> of such extensions, but they remain extensions on a single (high-level) >>> API, instead of being just another API. >> >> >> >> That's my target too. In particular I would like to let clients >> *nonintrusively* extend OpenC++ frontend library so that they can >> >> * obtain API that presents e.g. ternary '+', or >> * API that hides parentheses in expressions, or > > > these two seem to me to be just a special *view* on the AST, but not the > AST itself. Read: For this I wouldn't provide a different AST, but rather > a tool to filter the unwanted tokens or generally to present the same data > in different ways. Ok. Node<> interface is precisely such a tool. > Anyways, I'm sure we agree on this as the highlevel Node<> API you are > suggesting is nothing but a high-level view on the Ptree, too... Exactly. > >> * API that has an extra construct (e.g. 'metaclass ID' declaration) that >> I can typesafely insert into AST obtained from the parser. > > > that one is different since this really requires the parser (ptree factory) > to be aware of the extension. > It's thus more difficult to achieve this type > of extensibility (but the OpenC++ parser already has hooks for this). Yes. 
>>> Ok, but even if we go for wrapper instead of subclasses, >>> I don't see any reason not to use the C++ type system, i.e. I'd still >>> suggest 'Statement', 'Expression', etc. to form a type system, >> >> >> >> I assume you meant "type hierarchy". > > > right > >>> instead >>> of having Node be a template type. >> >> >> >> I don't get it. Are you talking about (a) Statement and Expression being >> wrappers around Ptree* or about (b) putting more type info into Ptree >> hierarchy? > > > that's exactly the question. If we use wrappers we don't touch Ptree, i.e. > we don't add type info into the Ptree hierarchy, but instead build another > hierarchy on top of the Ptree. Yes. >> If (a), than I fail to understand how this should work --- Statement >> and Expression would be polymorphic, so how can they work as wrappers? >> What copy semantics do you want for them? Will Expression be a >> concrete base class??? > > > This link describes the ctool API (the C parser backend synopsis > provides now): > http://synopsis.fresco.org/docs/Manual/ctool/index.html > > Have a look into the 'Inheritance Graph'. That's about the AST hierarchy > I have in mind (with the required additions to cover C++, of course). OpenC++ has a similar hierarchy "logically", but it is not enforced by design. > Expression is indeed an abstract base class, And as such it can't simply be > copied, at least not on the stack. Thus, when I'm talking about 'wrappers', > I'm not talking about temporary objects, but a superstructure on top of > the parse tree. I did not have anything like that in mind. The wrappers I suggested were meant to be used as value-semantics objects, in many cases temporaries. Just very much like e.g. boost::shared_ptr. > >> If (b), then I sustain my argument that this is a viable solution, >> but it requires lots of *revolutionary* changes in parser, elaborator >> and translator. 
> > > Hmm, I'd like to get a better understanding of what 'lots of revolutionary > changes' actually means. Sure, sorry for not being clear enough. Take a particular construct, e.g. member function declaration. Legacy code uses some topology of Ptree nodes to encode it. You want to replace it with one object of class, say, MemberFunctionDecl. This requires fixing all sites that depend on legacy topology: * the code in parser that build the topology * the code in elaborator and translator that inspects and/or modifies the topology. Another issue is that you want to introduce stricter typing into the existing parser/elaborator/translator code. Most likely you have had an experience of retrofiting "const" --- in many cases it is a snowball effect and I would expect the same here. I think (but maybe I am just overly cautious) that it is easier and safer first to build the type scaffolding out of the code (i.e. to build Node<>s structure), then blend it into the code (=change Ptree*'s into more concrete types according to the model worked out in Node<>s "hierarchy"), if such a move seems useful. > For one, the Ptree type hierarchy already seems > quite complete. It just doesn't provide a high-level API to access the > structure without using Cdr() and Car(). Right. > Thus, adding a high-level API > to the existing classes wouldn't be intrusive at all. Right. However what do you do about Cdr/Car? They have to stay, because lots of code depends on them (not only external code, but elaborator and translator in the first place) Making them private, as once suggested, is not a solution, since elaborator/translator and some clients need to see the low-level view, at least still for some time. > The thing that would really need to be changed is the Walker interface. > But isn't that a good thing ? Making Walker a true Visitor would be > beneficial > to everybody (there, too, I don't yet fully understand the ramifications, > i.e. 
whether the required changes would have to be applied in one shot or > whether we could do it incrementally). AFAIU your point is to have a visitor with "Visit(PtreSOMETHING* )" member functions instead of "VisitSOMETHING(Ptree*)" as it is now, right? IMO this is the easiest piece. Just add another visitation scheme to Ptree and make another visitor. This will not interfere with the old one at all. >> This is a recurring point and I suppose that this is the source of >> misunderstanding. It seems to me that you believe that Node<> solution >> requires constructing another in-memory tree atop Ptree tree. This is >> not true. > > > Yes, see above. For Node<> to be able to live on the stack, it needs > to be non-polymorphic, i.e. it can't use the C++ type system directly. Correct. > Thus it requires an IMO ugly mechanism to dispatch methods based on > the dynamic type of one of its members. (1) Why ugly? (2) C++ virtual dispatch does precisely the same --- dispatches a call based on a dynamic type of an object. Looks like I am missing the point here, could you explain? > Again, what I find confusing is that the high-level type hierarchy is > part of the Ptree, but not the high-level API. If you suggest to > add the high-level API on top, why not the type hierarchy, too ? (1) Because I can keep Cdr/Car in the low-level interface and hide them in the high-level interface. (2) Because I can manipulate the high-level interface to get yet more typesafe tree view, but I don't need to compensate for these changes in the legacy code (including parser, elaborator and walker) > What good is the current type hierarchy anyways, as there's no associated > API to begin with ? Would anything stop working if I took out, say, > the 'PTreeWhileStatement' class ? PtreeWhileStatement is just a NonLeaf with a tag saying "I am special non-leaf: I am while-statement". Also PtreeWhileStatement redefines What(), so that type querrying is possible ( Ptree* p; ... if (p->What()==...) { ... } ). 
Also PtreeWhileStatement redefines Translate(), so that Walker::TranslateWhile() is called. If you remove PtreeWhileStatement: * parser will no longer compile, because it creates PtreeWhileStatement non-leaf after having parsed "while(){}"; but say you change PtreeWhileStatement into NonLeaf to make parser compile; then... * elaborator may compile, but will no longer work, because p->What() will give value indicating generic non-leaf, not while-statement. Perhaps even elaborator will not compile, if for some reason it downcasts NonLeaf to PtreeWhileStatement * code using walkers will no longer work, because p->Translate(w) will dispatch to w->Translate(Ptree*), not w->Translate(PtreeWhileStatement) >> Perhaps you are looking at Node<> through the existing solution from >> Synopsis. With Node<> API *no* AST is *built*. See the code example >> that I posted --- the only tree in memory is Ptree tree. > > > yes, I noticed the difference. I just don't understand the rationale > for this, see above. > > ... As I see it: (1) You can hide Cdr/Car from the users of high-level iface, yet still make them available in low-level iface. (2) The boundary of high-level and low-level API is an additional level of indirection that makes it easier to tinker with things on one side without affecting things on the other (e.g. making high-level iface more typesafe without touching Ptrees or changing topologies of Ptrees, without affecting clients of high-level iface). (3) It is non-intrusive wrt. the legacy code (parser, elaborator, translator). > I'm sorry if this thread sounds confusing and doesn't seem to lead > anywhere. At this point it just reflects my lack of understanding of > the current design (i.e. either the presence of the PTree class hierarchy > or the absence of a corresponding high-level API). Let me try to recap what I learnt about OpenC++ AST. 
Imagine that we have just this simple AST structure:

    struct Ptree
    {
        virtual int What() = 0;
    };

    struct Leaf : public Ptree
    {
        char* text;
        int length;
    };

    struct NonLeaf : public Ptree
    {
        NonLeaf(int what, Ptree* l, Ptree* r) : what_(what), l_(l), r_(r) {}
        virtual int What() { return what_; }
        int what_;
        Ptree* l_;
        Ptree* r_;
    };

You create the representation of "while (COND) BODY", say, like this:

    Ptree* parseStmt()
    {
        ...
        // it's while
        return new NonLeaf(WHILE_C, cond, new NonLeaf(0, body, NULL));
        ...
    }

where WHILE_C is an enum or constant.

Now imagine that you encode the 'what_' attribute in the dynamic type of
an object. For that purpose you derive classes from NonLeaf that have no
data; they just redefine What(), e.g.:

    struct PtreeWhile : public NonLeaf
    {
        PtreeWhile(Ptree* l, Ptree* r) : NonLeaf(l, r) {}
        virtual int What() { return WHILE_C; }
    };

[Observe that you no longer need the first argument in the NonLeaf ctor,
as its information is now encoded in the object's type.]

The code for creating the while representation now looks like this:

    Ptree* parseStmt()
    {
        ...
        // it's while
        return new PtreeWhile(cond, new NonLeaf(body, NULL));
                   ^^^^^^^^^^           ^^^^^^^
                   specific node        generic node
        ...
    }

Now imagine you add visitation by introducing:

    class PtreeWhile
    {
        ...
        void Translate(Walker* w) { w->TranslateWhile((Ptree*)this); }
        ...
    };

Also, you may introduce additional, node-specific data members into some
concrete nodes:

    class PtreeClassSpec
    {
        ...
        char* encoded_name;
    };

This is how Ptree works today. The concrete types of Ptree nodes serve
mainly as "tags"; they also allow storing type-specific data members, but
generally all interfaces use Ptree*. The "high-level" types are present in
the sense that many functions that take Ptree* in fact always take objects
of a certain concrete subclass of Ptree (so in fact the "working" type of
their argument is more concrete than Ptree), but this is not enforced by
design --- from the compiler's point of view those functions take an
arbitrary Ptree*.

Hope this helps.
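The scheme described above can be put together into one compilable sketch. This is only an illustration under simplifying assumptions, not the actual OpenC++ source: the `Kind` enum, the counting `Walker`, and the `MakeWhile` helper are hypothetical names introduced here, and nodes are never freed by the library code, mirroring the garbage-collected setup discussed in the thread.

```cpp
#include <cassert>

// Hypothetical tag values; the real OpenC++ constants differ.
enum Kind { GENERIC_C = 0, LEAF_C, WHILE_C };

struct Walker;  // defined below

struct Ptree {
    virtual ~Ptree() {}
    virtual int What() const = 0;
    virtual Ptree* Car() const { return nullptr; }
    virtual Ptree* Cdr() const { return nullptr; }
    // Default visitation: nodes without a specific type go to the generic hook.
    virtual void Translate(Walker& w);
};

struct Leaf : Ptree {
    Leaf(const char* t, int n) : text(t), length(n) {}
    int What() const override { return LEAF_C; }
    const char* text;
    int length;
};

// Generic "glue" node: a plain pair of pointers with no tag of its own.
struct NonLeaf : Ptree {
    NonLeaf(Ptree* l, Ptree* r) : l_(l), r_(r) {}
    int What() const override { return GENERIC_C; }
    Ptree* Car() const override { return l_; }
    Ptree* Cdr() const override { return r_; }
    Ptree* l_;
    Ptree* r_;
};

// Illustrative walker that only counts which hook was reached.
struct Walker {
    Walker() : generic(0), whiles(0) {}
    virtual ~Walker() {}
    virtual void TranslatePtree(Ptree*) { ++generic; }
    virtual void TranslateWhile(Ptree*) { ++whiles; }
    int generic;
    int whiles;
};

void Ptree::Translate(Walker& w) { w.TranslatePtree(this); }

// The tag lives in the dynamic type: no extra data, only overrides.
struct PtreeWhile : NonLeaf {
    PtreeWhile(Ptree* l, Ptree* r) : NonLeaf(l, r) {}
    int What() const override { return WHILE_C; }
    void Translate(Walker& w) override { w.TranslateWhile(this); }
};

// Builds the simplified topology [cond [body nil]] used in the explanation.
Ptree* MakeWhile(Ptree* cond, Ptree* body) {
    assert(cond && body);
    return new PtreeWhile(cond, new NonLeaf(body, nullptr));
}
```

Note that `TranslateWhile` still receives a plain `Ptree*`, which is exactly the point made above: the concrete type steers dispatch, but the interfaces remain untyped.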
It seems to me that we should try to move forward with implementation. I don't think we have an agreement about Node<>s usability. However we do agree that adding accessors to concrete Ptree nodes will help. (I still think that they belong into Node<> wrappers, but anyway, they will be helpful no matter where they are put). Why don't we begin with this part? As for the HEAD --- I did not have time to investigate the failure that I get in CF RedHat when building with gc. Perhaps I will be able to look into it over the weekend. Best regards Grzegorz |
From: Stefan S. <se...@sy...> - 2004-07-08 02:22:04
Attachments:
inline.patch
|
Grzegorz Jakacki wrote: > Currently each C++ constructs is represented by a bunch of Ptree nodes, > e.g.: > > if (C) T else E > > is something like > > PtreeIfStatement > / \ > C NonLeaf > / \ > T NonLeaf > / \ > E NULL > > [in fact it may be even more complicated] > > PtreeIfStatement is derived from NonLeaf; NonLeaf has obvious binary > ctor. All derived classes generally also have binary ctors which just > forward to NonLeaf's ctor. > > In particular currently there is no function which would build the above > topology given C, T and E. Similarily, there is no function that would > destroy it given a pointer to PtreeIfStatement. > > I was looking for a place to put this functionality in Node<>-less > scenario. What I was trying to convey was that this function does not > belong as a ctor in PtreeIfStatement, because: > > (1) PtreeIfStatement would need to have two ctors: > * legacy ctor forwarding to NonLeaf's ctor > * new ctor building auxiliary NonLeaf nodes indeed, but why is this a problem ? The idea would be to slowly phase out the legacy constructor (which is right now only used by the parser anyways, right ?) And I don't see a problem having two separate constructors, one operating on Ptree pointers, the other on the typed counterpart. > (2) PtreeIfStatement's dtor would need to know somehow > which ctor was used; if the legacy one, then it should > just destroy *this; if the new one, then it should > also destroy auxiliary NonLeaf nodes. I don't want > to go there. yeah, sounds messy. But why should the IfStatement care how it was constructed ? Doesn't this reveal a more fundamental problem ? You seem to assume that the legacy constructor constructs an IfStatement that doesn't own its children, while the other does. That sounds indeed like a bad idea. Either Ptrees own their children, or they don't. (Well, a third possibility would be to use ref counting, in which case nobody would own, but instead allocate a reference). 
>> Is there some other instance >> beside the parser that will create ptree nodes ? > > > My primary motivation for developing OpenC++ are applications to > refactorng, so the answer is YES, I would like to be able to manipulate > AST and create new nodes. Sure. So everybody would use the new constructor, and as soon as the existing code moves to that, we could remove the legacy constructor. > > Will a Node<> ever > >> own the ptree it wraps ? > > > Good question. No. At least I did not have it in mind. There are a couple of questions hiding here: Right now a Leaf node points to some memory associated with the input stream. That makes it easy to re-serialize the whole file just from the ptree. But what happens if you insert a new leaf ptree ? What memory would it refer to ? Who would destroy it ? How's this solved right now ? >> Now if I get synopsis to process the boost.org code on all platforms, >> I'm ready >> to ship :-) > > > This is impressive. Boost is a heavy workout. Congratulations!!! I would > like to put it up as a news on the website; will something like this do?: > > Synopsis 0.7, source documentation tool based on > OpenC++ library, now parses all Boost code! Sure, as soon as I'm done with the release :-) > I branched rel_2_8, I will see how RH works, possibly apply patches in > this branch and release, so HEAD is open for commits if you would like > to commit your recent changes. Also if you want you can get yourself a > sandbox branch. synopsis is my sandbox :-) --- Attached is a little patch that I just applied to opencxx trunk. With that opencxx passes all tests on fedora core 2 with gc disabled. I'd suggest to apply it to the 2.8 branch, too. There are some other improvements I may backport from synopsis, but these are merely features, not bug fixes, so they are less critical. For example today I fixed synopsis to accept the ternary condition operator in a template parameter expression such as foobar<is_const ? 
1 : 2>(); More on that stuff later... Regards, Stefan |
From: Grzegorz J. <ja...@ac...> - 2004-07-08 11:53:42
|
Hi,

Stefan Seefeld wrote:
> Grzegorz Jakacki wrote:
>
>> Currently each C++ construct is represented by a bunch of Ptree
>> nodes, e.g.:
>>
>>    if (C) T else E
>>
>> is something like
>>
>>    PtreeIfStatement
>>       /      \
>>      C      NonLeaf
>>             /     \
>>            T     NonLeaf
>>                  /     \
>>                 E      NULL
>>
>> [in fact it may be even more complicated]
>>
>> PtreeIfStatement is derived from NonLeaf; NonLeaf has obvious binary
>> ctor. All derived classes generally also have binary ctors which just
>> forward to NonLeaf's ctor.
>>
>> In particular currently there is no function which would build the
>> above topology given C, T and E. Similarly, there is no function that
>> would destroy it given a pointer to PtreeIfStatement.
>>
>> I was looking for a place to put this functionality in Node<>-less
>> scenario. What I was trying to convey was that this function does not
>> belong as a ctor in PtreeIfStatement, because:
>>
>> (1) PtreeIfStatement would need to have two ctors:
>>       * legacy ctor forwarding to NonLeaf's ctor
>>       * new ctor building auxiliary NonLeaf nodes
>
>
> indeed, but why is this a problem ? The idea would be to slowly phase out
> the legacy constructor (which is right now only used by the parser anyways,
> right ?)
> And I don't see a problem having two separate constructors, one operating
> on Ptree pointers, the other on the typed counterpart.

I am not sure if you get the exact picture. The "untyped" constructor
does not construct the NonLeaf auxiliary nodes. It is just:

    PtreeIfStatement(Ptree* p, Ptree* q) : NonLeaf(p, q) {}

If we have another, "typed" constructor like this one:

    PtreeIfStatement(
        PtreeExprStatement* cond
      , PtreeExprStatement* th
      , PtreeExprStatement* el
    )
      : NonLeaf(cond, new NonLeaf(th, new NonLeaf(el, NULL)))
    {}

Then, as you noticed, there is a question of who owns the auxiliary
NonLeaf nodes (and, consequently, if ~PtreeIfStatement should delete them).
> >> (2) PtreeIfStatement's dtor would need to know somehow >> which ctor was used; if the legacy one, then it should >> just destroy *this; if the new one, then it should >> also destroy auxiliary NonLeaf nodes. I don't want >> to go there. > > > yeah, sounds messy. But why should the IfStatement care how > it was constructed ? Because "untyped" ctor does not create auxiliary NonLeaf nodes, and "typed" does. > Doesn't this reveal a more fundamental > problem ? You seem to assume that the legacy constructor > constructs an IfStatement that doesn't own its children, while > the other does. The point is that legacy IfStatement constructor does not create the representation of if-statement. In particular its arguments are not "condition", "then-part" and "else-part". Representation of if-statement consists of three nodes (PtreeIfStatement node and two "glue" NonLeaf nodes) and legacy PTreeIfStatement ctor constructs only one of them. > That sounds indeed like a bad idea. Either Ptrees > own their children, or they don't. I agree. However I am not sure what you mean by "child". What is a "child" of PtreeIfStatement node? Do you consider "else-part" to be a "child" of PtreeIfStatement node? It is also not clear to me if you would like to keep the existing representation of if-statement as three physical nodes or if you eventually would like to modify PtreeIfStatement so that it becomes struct PtreeIfStatement { ... PtreeExprStatement *cond_, *then_, *else_; }; ? I proposed NodePtr<> iface to mask the fact that one logical AST node (e.g. if-statement) is in fact represented by several physical objects. The client of NodePtr<> would not be aware of this fact. NodePtr<> provides uniform interface for creating, destroying and manipulating such "logical objects". > (Well, a third possibility > would be to use ref counting, in which case nobody would own, > but instead allocate a reference). 
> >>> Is there some other instance >>> beside the parser that will create ptree nodes ? >> >> >> >> My primary motivation for developing OpenC++ are applications to >> refactorng, so the answer is YES, I would like to be able to >> manipulate AST and create new nodes. > > > Sure. So everybody would use the new constructor, and as soon as the > existing code moves to that, we could remove the legacy constructor. Removing legacy ctors will be difficult. You once asked why and I was not clear about it, now I think I am. See Parser::rIfStatement(). It builds representation of if-statement and it takes 12 LOC and 8 function calls just to build the topology, excluding logic necessary to call parser for subexpressions and figure out if ELSE is present. Removing legacy ctor means replacing this 12 LOC and 8 calls with one call to a new ctor. Similar transformation would need to be done in roughly 100 functions in parser. This seems like a lot of work and a lot of new bugs in code that is reasonably stable and works. Why do it? > >> > Will a Node<> ever >> >>> own the ptree it wraps ? >> >> >> >> Good question. No. At least I did not have it in mind. > > > There are a couple of questions hiding here: Right now a Leaf node > points to some memory associated with the input stream. That makes > it easy to re-serialize the whole file just from the ptree. But > what happens if you insert a new leaf ptree ? What memory would > it refer to ? > Who would destroy it ? Now: it does not matter, it can refer to any c_string that is left allocated, because everything is GC-ed. Future (pick one): * have special kinds of Leafs that store std::string internally and use this string for deserialization; * have "string manager" that owns inserted pieces * other?... > How's this solved right now ? GC. Best regards Grzegorz PS: I am applying your patch to rel_2_8. Thanks! 
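The first of the "future" options listed above, a leaf that stores its own std::string, might look roughly like the following. This is a sketch under assumptions, not existing OpenC++ code: `OwningLeaf` and `Text()` are hypothetical names, and the base `Leaf` here is reduced to a pointer-plus-length view into memory owned by someone else (the input buffer, in the parser's case).

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for a leaf that merely points into the input buffer.
struct Leaf {
    Leaf(const char* pos, int len) : position(pos), length(len) {
        assert(pos != nullptr || len == 0);
    }
    virtual ~Leaf() {}
    // Re-serialization: copy out the characters this leaf refers to.
    std::string Text() const { return std::string(position, length); }
protected:
    const char* position;
    int length;
};

// Hypothetical leaf for nodes inserted after parsing: it owns its text,
// so its lifetime no longer depends on the input buffer or on a collector.
struct OwningLeaf : Leaf {
    explicit OwningLeaf(const std::string& s) : Leaf(nullptr, 0), storage(s) {
        // Point the inherited view at our own storage.
        position = storage.data();
        length = static_cast<int>(storage.size());
    }
    // 'position' points into 'storage', so copying would leave it dangling.
    OwningLeaf(const OwningLeaf&) = delete;
    OwningLeaf& operator=(const OwningLeaf&) = delete;
private:
    std::string storage;
};
```

Code that re-serializes the tree can keep treating every leaf as a `Leaf*`; only the code that inserts new text needs to know about the owning variant.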
>>> Now if I get synopsis to process the boost.org code on all platforms,
>>> I'm ready
>>> to ship :-)
>>
>> This is impressive. Boost is a heavy workout. Congratulations!!! I
>> would like to put it up as a news on the website; will something like
>> this do?:
>>
>>    Synopsis 0.7, source documentation tool based on
>>    OpenC++ library, now parses all Boost code!
>
>
> Sure, as soon as I'm done with the release :-)
>
>> I branched rel_2_8, I will see how RH works, possibly apply patches in
>> this branch and release, so HEAD is open for commits if you would like
>> to commit your recent changes. Also if you want you can get yourself a
>> sandbox branch.
>
>
> synopsis is my sandbox :-)
>
> ---
>
> Attached is a little patch that I just applied to opencxx trunk. With
> that opencxx passes all tests on fedora core 2 with gc disabled. I'd
> suggest to apply it to the 2.8 branch, too.
>
> There are some other improvements I may backport from synopsis, but
> these are merely features, not bug fixes, so they are less critical.
> For example today I fixed synopsis to accept the ternary condition
> operator in a template parameter expression such as
>
> foobar<is_const ? 1 : 2>();
>
> More on that stuff later...
>
> Regards,
> Stefan
>
>
> ------------------------------------------------------------------------
>
> Index: opencxx/parser/Lex.cc
> ===================================================================
> RCS file: /cvsroot/opencxx/opencxx/opencxx/parser/Lex.cc,v
> retrieving revision 1.1.2.1
> diff -u -r1.1.2.1 Lex.cc
> --- opencxx/parser/Lex.cc 27 May 2004 03:26:07 -0000 1.1.2.1
> +++ opencxx/parser/Lex.cc 8 Jul 2004 02:16:53 -0000
> @@ -807,6 +807,7 @@
>      { "__attribute__", token(ATTRIBUTE) },
>      { "__const", token(CONST) },
>      { "__extension__", token(EXTENSION) },
> +    { "__inline", token(INLINE) },
>      { "__inline__", token(INLINE) },
>      { "__noreturn__", token(Ignore) },
>      { "__restrict", token(Ignore) },
|
From: Stefan S. <se...@sy...> - 2004-07-08 22:46:59
|
Grzegorz Jakacki wrote:

>> And I don't see a problem having two separate constructors, one operating
>> on Ptree pointers, the other on the typed counterpart.
>
>
> I am not sure if you get the exact picture. The "untyped" constructor
> does not construct the NonLeaf auxiliary nodes. It is just:
>
>
>   PtreeIfStatement(Ptree* p, Ptree* q) : NonLeaf(p, q) {}
>
> If we have another, "typed" constructor like this one:
>
>   PtreeIfStatement(
>       PtreeExprStatement* cond
>     , PtreeExprStatement *th
>     , PtreeExprStatement* el
>   )
>     : NonLeaf(cond, new NonLeaf(th, new NonLeaf(el, NULL)))
>   {}
>
> Then, as you noticed, there is a question of who owns auxiliary
> NonLeaf nodes (and, consequently, if ~PtreeIfStatement should delete them).

right, but I didn't mean the 'typed constructor' to construct any nodes
itself, but rather to take 'PtreeExprStatement' arguments instead of 'Ptree'.
The ownership semantics remain the same.

>> That sounds indeed like a bad idea. Either Ptrees
>> own their children, or they don't.
>
>
> I agree. However I am not sure what you mean by "child". What is a
> "child" of PtreeIfStatement node? Do you consider "else-part" to be a
> "child" of PtreeIfStatement node?

yes.

> It is also not clear to me if you would like to keep the existing
> representation of if-statement as three physical nodes or if you
> eventually would like to modify PtreeIfStatement so that it becomes
>
>   struct PtreeIfStatement
>   {
>     ...
>     PtreeExprStatement *cond_, *then_, *else_;
>   };

the latter.

Regards,
Stefan
|
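For comparison with the glue-node topology, "the latter" representation could be sketched as below. This is a hypothetical illustration, not code from either project: the accessor names and `HasElse()` are invented here, all three parts are typed as `PtreeExprStatement` only because the quoted struct does so, and the pointers are non-owning on the assumption that ownership stays with the garbage collector as discussed.

```cpp
#include <cassert>

struct Ptree {
    virtual ~Ptree() {}
};

// Stand-in for whatever typed node the parts of an if-statement end up being.
struct PtreeExprStatement : Ptree {};

// One logical construct == one physical object; no glue NonLeaf nodes.
class PtreeIfStatement : public Ptree {
public:
    PtreeIfStatement(PtreeExprStatement* cond,
                     PtreeExprStatement* th,
                     PtreeExprStatement* el = nullptr)
        : cond_(cond), then_(th), else_(el)
    {
        // An if-statement always has a condition and a then-part.
        assert(cond && th);
    }

    PtreeExprStatement* GetCond() const { return cond_; }
    PtreeExprStatement* GetThen() const { return then_; }
    PtreeExprStatement* GetElse() const { return else_; }
    bool HasElse() const { return else_ != nullptr; }

private:
    // Non-owning: lifetime is assumed to be managed elsewhere (e.g. by GC).
    PtreeExprStatement *cond_, *then_, *else_;
};
```

With this layout there is no `Car()`/`Cdr()` path to the parts at all, which is why adopting it would touch every piece of code that currently navigates the glue nodes.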
From: Grzegorz J. <ja...@ac...> - 2004-07-09 12:14:01
|
Hi Stefan, Stefan Seefeld wrote: > Grzegorz Jakacki wrote: > >>> And I don't see a problem having two separate constructors, one >>> operating >>> on Ptree pointers, the other on the typed counterpart. >> >> >> >> I am not sure if you get the exact picture. The "untyped" constructor >> does not construct the NonLeaf auxiliary nodes. It is just: >> >> >> PtreeIfStatement(Ptree* p, Ptree* q) : NonLeaf(p, q) {} >> >> If we have another, "typed" constructor like this one: >> >> PtreeIfStatement( >> PtreeExprStatement* cond >> , PtreeExprStatement *th >> , PtreeExprStatement* el >> ) >> : NonLeaf(cond, new NonLeaf(th, new NonLeaf(el, NULL))) >> {} >> >> Then, as you noticed, there is a question of who owns auxiliary >> NonLeaf nodes (and, consequently, if ~PtreeIfStatement should delete >> them). > > > right, but I didn't mean the 'typed constructor' to construct any nodes > himself, but rather to take 'PtreeExprStatement' arguments instead of > 'Ptree'. > The ownership semantics remains the same. Stefan, look again at the picture: immediate "physical" children of PtreeIfStatement are not PtreeExprStatement nodes. They do not have any reasonable type in the sense of C++ grammar. They are only glue nodes. Having said that, I don't understand how can PtreeIfStatement take PtreeExprStatement nodes and at the same time not construct any nodes itself. > >>> That sounds indeed like a bad idea. Either Ptrees >>> own their children, or they don't. >> >> >> >> I agree. However I am not sure what you mean by "child". What is a >> "child" of PtreeIfStatement node? Do you consider "else-part" to be a >> "child" of PtreeIfStatement node? > > > yes. > >> It is also not clear to me if you would like to keep the existing >> representation of if-statement as three physical nodes or if you >> eventually would like to modify PtreeIfStatement so that it becomes >> >> struct PtreeIfStatement >> { >> ... >> PtreeExprStatement *cond_, *then_, *else_; >> }; > > > the latter. 
Then, as I described in my last e-mail, there is a huge amount of work
involved in transforming all the legacy code that deals with "glue" nodes.
Every single piece of code that accesses, say, the "condition" of an
if-statement will need to be changed from 'p->Cdr()->Car()' to
'p->GetCond()'.

Your solution is very clean, but it requires a huge investment. My
suggestion is to leave the legacy code as it is and provide NodePtr<>
atop of it. NodePtr<old_PtreeIfStatement> would very closely mimic the
behaviour of new_PtreeIfStatement*, so where is the gain? Moreover, if you
ever change the parser or elaborator, you are free to mix the code that
uses NodePtr<> with the legacy code. Even more --- if one day you discover
that you have eradicated all occurrences of PtreeXXX, Leaf and NonLeaf from
the parser, you can implement your clean classes a la the PtreeIfStatement
above and just textually replace every NodePtr<Foo> with new_Foo* and you
are done. My solution arrives at exactly the same point that you want to
reach, but it seems to me that the landing is much softer.

Best regards
Grzegorz

--------------------------
WRAP-UP
=======

Instead of introducing

    class new_PtreeIfStatement
    {
        ...
        PtreeExprStatement* GetCond() { return cond_; }
        ...
        PtreeExprStatement* cond_;
    };

and fixing all the places in code that will get broken by replacing
PtreeIfStatement by new_PtreeIfStatement, let's introduce

    template <>
    class NodePtr<PtreeIfStatement>
    {
        ...
        NodePtr<PtreeExprStatement> GetCond() const
        {
            return NodePtr<PtreeExprStatement>(
                (PtreeExprStatement*)(p_->Cdr()->Car()));
        }
        ...
    private:
        PtreeIfStatement* p_;
    };

and leave the legacy code as it is.

NodePtr<PtreeIfStatement> works essentially as new_PtreeIfStatement*; it
can be further enhanced to mimic new_PtreeIfStatement* very closely,
including pointer syntax.

All uses of Ptree* derivatives can be gradually replaced with NodePtr<>.
Once all of them are eradicated, every NodePtr<Foo> can be replaced with
new_Foo* (and that will not require any further code tweaking), which
effectively provides a smooth, evolutionary way of arriving at the first
solution.

----- END OF WRAP-UP -----

>
> Regards,
> Stefan
|
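The NodePtr<> idea from the wrap-up can be fleshed out into a small compilable sketch. Everything here is an assumption made for illustration: the node classes are minimal stand-ins, the topology is the simplified three-node picture from the thread (so the condition sits at `Car()` rather than at `Cdr()->Car()` as in the real, richer layout), and `Create()`, `get()` and the `Nth()` helper are hypothetical names. Nodes are never freed, mirroring the garbage-collected setup.

```cpp
#include <cassert>

struct Ptree {
    virtual ~Ptree() {}
    virtual Ptree* Car() const { return nullptr; }
    virtual Ptree* Cdr() const { return nullptr; }
};

struct NonLeaf : Ptree {
    NonLeaf(Ptree* l, Ptree* r) : l_(l), r_(r) {}
    Ptree* Car() const override { return l_; }
    Ptree* Cdr() const override { return r_; }
private:
    Ptree* l_;
    Ptree* r_;
};

struct PtreeExprStatement : NonLeaf {
    using NonLeaf::NonLeaf;
};

// Legacy-style node: its binary ctor just forwards to NonLeaf.
struct PtreeIfStatement : NonLeaf {
    PtreeIfStatement(Ptree* p, Ptree* q) : NonLeaf(p, q) {}
};

// Generic wrapper: a non-owning, freely copyable handle (value semantics).
template <class T>
class NodePtr {
public:
    explicit NodePtr(T* p = nullptr) : p_(p) {}
    T* get() const { return p_; }
private:
    T* p_;
};

// Specialization that hides the glue nodes behind a typed interface.
template <>
class NodePtr<PtreeIfStatement> {
public:
    explicit NodePtr(PtreeIfStatement* p) : p_(p) { assert(p); }

    // Builds the whole topology, including the auxiliary NonLeaf nodes.
    static NodePtr Create(PtreeExprStatement* c, PtreeExprStatement* t,
                          PtreeExprStatement* e) {
        return NodePtr(new PtreeIfStatement(
            c, new NonLeaf(t, new NonLeaf(e, nullptr))));
    }

    NodePtr<PtreeExprStatement> GetCond() const { return Nth(0); }
    NodePtr<PtreeExprStatement> GetThen() const { return Nth(1); }
    NodePtr<PtreeExprStatement> GetElse() const { return Nth(2); }

    // The low-level view stays reachable for legacy code.
    PtreeIfStatement* get() const { return p_; }

private:
    // Walk n Cdr() links, then take Car(): the list-style path to part n.
    NodePtr<PtreeExprStatement> Nth(int n) const {
        Ptree* cell = p_;
        while (n-- > 0) cell = cell->Cdr();
        return NodePtr<PtreeExprStatement>(
            static_cast<PtreeExprStatement*>(cell->Car()));
    }

    PtreeIfStatement* p_;
};
```

Because the wrapper holds only a pointer, copying it is as cheap as copying a `Ptree*`, and no second tree is built in memory; the `Car()`/`Cdr()` paths are confined to one place.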
From: Stefan S. <se...@sy...> - 2004-06-08 13:30:29
|
Grzegorz Jakacki wrote:
> On Mon, 7 Jun 2004, Stefan Seefeld wrote:

>>All I'm pondering about now is whether optionally removing the C++ keywords
>>would get us closer to a C parser, and if so, what else needs to be done to
>>complete the step so opencxx could be used for both languages.
>
>
> This is an interesting question, i.e. can the "common factor" of C and C++
> parser be factored out and how.
>
> Switching between C/C++ keywords should be easy with existing code.
> Moreover, lexer is encapsulated, so this is not an issue too. The fun
> begins in parser.

indeed. For the lexer we might get away with a dynamically assigned
token table (one for C and one for C++), but for the parser we may
have a parser base that deals with the common statements that are valid
in C as well as C++, while a derived C parser recognizes valid C that
is not valid C++, such as the old-style definition

void foo(x, y) int x; int y; { }

and a derived C++ parser that deals with the C++ specific stuff.

> I think it depends. There are many possible AST object models. In
> particular, existing Ptree hierarchy gives rise to one of them. Having
> read-only type-safe API along this model is quite easy, it is just a
> matter of determining the Car/Cdr paths. If this API is useful and
> convenient is another question.

right.

regards,
Stefan
|
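The "dynamically assigned token table" idea for the lexer could be sketched as follows. This is a hypothetical illustration, not the OpenC++ lexer: the `Token` and `Language` enums, `MakeKeywordTable` and the `Lexer::Classify` method are invented for the example, and only a handful of keywords are listed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative token kinds; the real lexer has many more.
enum Token { IDENTIFIER, INT, WHILE, CLASS, TEMPLATE };
enum Language { LANG_C, LANG_CXX };

typedef std::map<std::string, Token> KeywordTable;

// Builds the keyword table for the requested language at run time.
KeywordTable MakeKeywordTable(Language lang) {
    KeywordTable t;
    // Keywords shared by C and C++.
    t["int"] = INT;
    t["while"] = WHILE;
    if (lang == LANG_CXX) {
        // C++-only keywords; in C these remain ordinary identifiers.
        t["class"] = CLASS;
        t["template"] = TEMPLATE;
    }
    return t;
}

class Lexer {
public:
    explicit Lexer(Language lang) : keywords_(MakeKeywordTable(lang)) {}

    // Classifies an already-scanned word as a keyword or an identifier.
    Token Classify(const std::string& word) const {
        assert(!word.empty());
        KeywordTable::const_iterator it = keywords_.find(word);
        return it == keywords_.end() ? IDENTIFIER : it->second;
    }

private:
    KeywordTable keywords_;
};
```

The scanning code stays identical for both languages; only the table handed to the lexer changes, which is why the harder part of the split lies in the parser.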
From: Stefan S. <se...@sy...> - 2004-06-08 13:02:29
|
Christophe Avoinne wrote:

> class NodeVisitor
> {
>     ...
>     template< typename Class >
>     void operator()( const Node< Class > &node )
>     { ... // doing your stuff according with Class }
>     ...
> };
>
> As you can see, no derivation at all.

but what problem does the above solve ? The idea was to use the visitor
pattern, which means to call some kind of polymorphic 'accept' method on
the 'Node' object, and its implementation will then call back with the
exact type:

    struct FooNode : NodeBase
    {
        virtual void accept(Visitor *v) { v->visit_Foo(this); }
    };

It's the polymorphic 'accept' that requires 'FooNode' to be part of a
class hierarchy of Nodes, not the 'visit' (or operator (), or however you
spell it).

Regards,
Stefan
|
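The double dispatch described above can be shown end to end in a few lines. This is a generic visitor-pattern sketch, not OpenC++ or Synopsis code: `BarNode` and the `NameCollector` visitor are hypothetical additions used only to make the callback visible.

```cpp
#include <cassert>
#include <string>

struct FooNode;
struct BarNode;

// One callback per concrete node type.
struct Visitor {
    virtual ~Visitor() {}
    virtual void visit_Foo(FooNode*) = 0;
    virtual void visit_Bar(BarNode*) = 0;
};

// The polymorphic accept() is what forces nodes into one class hierarchy.
struct NodeBase {
    virtual ~NodeBase() {}
    virtual void accept(Visitor* v) = 0;
};

struct FooNode : NodeBase {
    void accept(Visitor* v) override { assert(v); v->visit_Foo(this); }
};

struct BarNode : NodeBase {
    void accept(Visitor* v) override { assert(v); v->visit_Bar(this); }
};

// Example visitor: records the exact type of every node it is handed.
struct NameCollector : Visitor {
    std::string seen;
    void visit_Foo(FooNode*) override { seen += "Foo "; }
    void visit_Bar(BarNode*) override { seen += "Bar "; }
};
```

The caller holds only `NodeBase*` values, yet each callback receives the concrete type: the first dispatch happens on the node (via `accept`), the second on the visitor (via `visit_Foo` or `visit_Bar`).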