From: Grzegorz J. <ja...@he...> - 2004-06-04 01:58:23
Hi Stefan and All,

On Tue, 1 Jun 2004, Stefan Seefeld wrote:

[snip]

> * I suggest the ptree hierarchy to be refactored into a more typed
>   form. That could simply mean that a big number of new 'Statement',
>   'Expression', and other classes should be derived from 'Ptree', or
>   it could be done in a different way, I don't know yet.
>   However, this would mean that it would be much more straight forward
>   to inspect an AST, as these types would be more or less self-explanatory
>   (ever wondered what 'node->Cdr()->Cdr()->Car()' represents ??)

I agree that 'node->Cdr()->Cdr()->Car()' is unsafe and difficult to use.
The same goes for creating new Ptree nodes. Moreover, the mapping of C++
syntax onto combinations of Ptree nodes is not documented, which makes
this area even less clear.

Nevertheless, I think that *replacing* the Ptree hierarchy with a more
typed form will be extremely difficult, because:

(1) The parser, type elaborator and translator all use Cdr/Car. Even if
    we ignore the translator for the moment, the parser and elaborator
    alone are 20 KLOC of highly nontrivial code. Reworking this code is
    a huge job, especially if you do it part-time, and it will be a
    wonderful source of bugs (and we don't have a decent regression
    testsuite). Moreover, I think that directly replacing the AST data
    structure is difficult because it has to be done practically in one
    big step. The grammar is not a hierarchical system: there is a lot
    of recursion in it, so the dependency graph between the AST classes
    is not acyclic and you cannot replace things piece by piece going
    bottom-up.

(2) Contrary to popular belief, creating an object model for an AST is
    a lot of conceptual work. What is more, for a language like C++
    there is no unique, optimal AST object model. Example: how should
    "a+b+c" be represented? Some clients would like to see two binary
    "+" nodes; others would rather take advantage of "+" being
    associative and view it as one ternary node.
    Some clients are interested in nodes representing parentheses,
    while others would rather treat them as textual decorations that do
    not belong in the AST, etc.

Here are my suggestions on how to improve the usability of the AST
gradually:

(1) Find out and write down the mappings from the "less typed" AST to
    the "more typed" AST, e.g.:

        IfStatement:
            Cond    ->Cdr()->Car();
            Then    ->Cdr()->Cdr()->Cdr()->Car();
            Else    ->Cdr()->Cdr()->Cdr()->Cdr()->Cdr()->Car();

(2) Use these mappings to write or generate a set of wrappers that
    encapsulate the AST in a typesafe interface:

        // Primary template; each node kind gets its own specialization.
        template <class T> class Node;

        template <>
        class Node<IfStatement>
        {
        public:
            Node(Node<Expr> c, Node<Statement> t, Node<Statement> e);
            // Sketch only: wrapping the Ptree* needs access to the
            // private Node<Expr>(Ptree*) constructor.
            Node<Expr>      Cond() { return p_->Cdr()->Car(); }
            Node<Statement> Then() { ... }
            Node<Statement> Else() { ... }
        private:
            Node(Ptree* p) : p_(p) {}
            friend class AstFactory;
            Ptree* p_;
        };

        class AstFactory
        {
        public:
            template <class T>
            static Node<T> Create( /* ... */ );
        };

    This can be done non-intrusively, without touching the existing
    codebase (= without introducing bugs).

(3) Write a parser wrapper that wraps the Ptree*'s returned from the
    parser in Node<>'s.

(4) Write an abstract walker for Node<>'s and make Node<>'s visitable.

(5) At this point we will have a usable, type-safe interface. Moreover,
    clients not satisfied with this interface (e.g. those who prefer
    multiary "+") will have a chance to reuse it nonintrusively or just
    write another interface to the Ptree structure from scratch.

(6) Having a wrapper interface atop the Ptree structure allows for
    changing the Ptree structure without affecting clients of the
    wrapper interface. (Changes in Ptree, e.g. the addition of new
    nodes, can be compensated for in the wrapper layer.)

(7) A typesafe wrapper interface would enable automatic generation of
    parser regression tests (e.g. from the gcc testsuite), which should
    be used as a safety net if we ever decide to refactor the parser so
    that it uses the type-safe interface.
(8) At this point we can think about moving the typesafe interface into
    the parser, elaborator and translator, and later about totally
    removing the Ptree classes and actually replacing Ptree's with a
    canonical implementation of the Composite pattern. However, I don't
    think this is the way to go, mainly because of (5). I believe it is
    better to have a low-level implementation plus high-level
    wrapper(s).

(Side note: In fact the implementation of the AST in OpenC++ is more
tricky than just Leaf/NonLeaf, see e.g. PtreeIfStatement etc.
Nevertheless this implementation still forces Cdr/Car on clients, and
AFAIU this is something we want to escape from.)

> * I suggest to open up opencxx in a way that exposes the basic API (parser
>   / ptree generation, walkers / ptree transformation, metaclass and the
>   other introspection stuff) as a C++ library as well as a python module.
>   This means that the occ executable will be very much obsolete, or at
>   least it would only be a convenience for the most popular features, but
>   more fine-grained control would be accessible through the APIs, through
>   which users can customize opencxx to their needs. It also means that
>   all the platform-specific code to run subprocesses such as the
>   preprocessor as well as load metaclass plugins could be isolated such
>   that the backend library would be more platform neutral and robust.

I second that. However, I think the interfaces should not be published
as they are now, because IMO they do not encapsulate enough. I think
the parser interface is OK, but the other interfaces should be
seriously reviewed before we commit to them. For example, the program
object model is very much coupled with the translator. I think we
should untangle them first and transform the translator framework into
an exemplary, nonprivileged client of the frontend library (libraries).

> As I'm already maintaining an opencxx 'branch' as part of the synopsis
> project, I'm experimenting with things there.
> Synopsis uses subversion, so
> directory-layout related refactoring is much simpler than with cvs.

As for Subversion in OpenC++: I have heard very positive opinions about
Subversion, however I don't see an easy way to move OpenC++ development
to Subversion now. Currently we rely on SF.net, which provides CVS,
CompileFarm, mailing lists, shell+cron accounts and web hosting (and
other features which we don't use at the moment). I don't see any other
organization that would provide this level of service and commitment to
the OpenC++ project.

AFAIU the biggest issue is moving files/dirs in a repo. I am using the
standard CVS way (delete here, add there) and indeed it loses history
and makes merges more difficult, but so far it has not been very
painful. The loss of history can be mitigated by mentioning the old
location of a file in the initial comment of the moved (= added) file.

> I'v also a number of advanced features that I don't want to loose, such as
> preprocessor data integrated with the ast (synopsis records macro
> definitions and calls, file inclusion information, etc.).

The ability of OpenC++ to understand the preprocessor, so that code can
be transformed without expanding preprocessor macros, would be very
desirable. I support any effort in this direction. This is in general
difficult in C++, but it is doable. (See the CRefactory project.)
Together with Python scripting it would create a very powerful
refactoring tool.

> I'm thus tempted to work off of my own opencxx branch, though I'm happily
> sharing my changes with opencxx. In particular, I'm thinking of a simple
> bootstrapping process, whithin which I would rework the ptree hierarchy,
> and then use opencxx itself to *generate* the C++-to-python binding to
> expose this class hierarchy to python.

This sounds exciting, but would that have any advantages over using
SWIG? (I don't know, I have not used SWIG myself.)

> Once I have that, people can introspect and manipulate the source code
> from within python,

That would be really great.

> with a direct C++ API as a fallback, in case they find
> the python-API inacceptable for various reasons (which I can't really
> imagine :-)

Some people may be concerned about performance (but still, I would love
a Python API).

> Finally, I'm wondering whether it wouldn't be simpler for me to modify the
> opencxx lexer and parser to be able to parse C code (all the various
> flavours that still exist, such as K&R), so I can drop the ctool backend.
> A C parser / processor with the features of opencxx would in particular be
> useful to all those GNOME / Mono developers, with language binding
> generation being just one example usage.

I very much support extending the lexer/parser to support full C
syntax, however I don't know how much work it would take to get there.
What issues do you see?

Best regards
Grzegorz

##################################################################
# Grzegorz Jakacki                    Huada Electronic Design    #
# Senior Engineer, CAD Dept.          1 Gaojiayuan, Chaoyang     #
# tel. +86-10-64365577 x2074          Beijing 100015, China      #
# Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################
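
For readers who would like to see suggestions (1) and (2) above in
compilable form, here is a minimal sketch. It is not OpenC++ code and
makes several assumptions: Ptree below is a toy cons cell standing in
for the real class, Arena, Nth and Raw are hypothetical helpers, and
the toy list layout (if C then S1 else S2) was chosen only so that the
element positions match the illustrative mapping in step (1). The real
parser output may be laid out differently.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for OpenC++'s Ptree (assumption): a leaf token or a cons cell.
struct Ptree {
    std::string text;      // token text, meaningful for leaves only
    Ptree* car = nullptr;  // first element, meaningful for list cells only
    Ptree* cdr = nullptr;  // rest of the list, meaningful for list cells only
    Ptree* Car() const { return car; }
    Ptree* Cdr() const { return cdr; }
};

// Hypothetical owner of all nodes, so the sketch needs no manual delete.
class Arena {
public:
    Ptree* Leaf(std::string text) { return Make(std::move(text), nullptr, nullptr); }
    Ptree* Cons(Ptree* car, Ptree* cdr) { return Make(std::string(), car, cdr); }
    // Builds a proper list (item0 item1 ...) by consing from the back.
    Ptree* List(const std::vector<Ptree*>& items) {
        Ptree* tail = nullptr;
        for (auto it = items.rbegin(); it != items.rend(); ++it) tail = Cons(*it, tail);
        return tail;
    }
private:
    Ptree* Make(std::string text, Ptree* car, Ptree* cdr) {
        nodes_.emplace_back();  // std::deque keeps element addresses stable
        Ptree& n = nodes_.back();
        n.text = std::move(text);
        n.car = car;
        n.cdr = cdr;
        return &n;
    }
    std::deque<Ptree> nodes_;
};

// Returns element n (0-based) of a list: n times Cdr(), then Car().
inline Ptree* Nth(Ptree* list, std::size_t n) {
    for (; n > 0 && list != nullptr; --n) list = list->Cdr();
    return list != nullptr ? list->Car() : nullptr;
}

// Tag types naming syntactic categories.
struct Expr {};
struct Statement {};
struct IfStatement {};

// Generic typed handle: records the category in the type, adds no accessors.
template <class T>
class Node {
public:
    explicit Node(Ptree* p) : p_(p) { assert(p != nullptr); }
    Ptree* Raw() const { return p_; }
private:
    Ptree* p_;
};

// Typed view of an if-statement; positions 1, 3, 5 mirror step (1).
// This sketch asserts if a part is missing (e.g. an "if" without "else").
template <>
class Node<IfStatement> {
public:
    explicit Node(Ptree* p) : p_(p) { assert(p != nullptr); }
    Node<Expr>      Cond() const { return Node<Expr>(Nth(p_, 1)); }
    Node<Statement> Then() const { return Node<Statement>(Nth(p_, 3)); }
    Node<Statement> Else() const { return Node<Statement>(Nth(p_, 5)); }
    Ptree* Raw() const { return p_; }
private:
    Ptree* p_;
};
```

Given a list built with Arena::List from the six elements "if", C,
"then", S1, "else", S2, Node<IfStatement>(list).Then().Raw() returns
the same pointer as list->Cdr()->Cdr()->Cdr()->Car(), so a client
never has to spell out the Cdr/Car chain. Unlike the private
constructor plus AstFactory in step (2), the constructors here are
public to keep the sketch short.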