From: Stefan S. <se...@sy...> - 2004-06-01 04:14:06
hi there,

while Grzegorz is still struggling with the merge of some refactoring we have been working on, I'd like to discuss some ideas about opencxx's evolution. I'd very much appreciate comments from people who are using opencxx right now, or are tempted to use it, so we can better understand what opencxx is already good at and what would be useful to work on.

I started using opencxx myself as a C++ parser backend for my synopsis framework, where I initially simply collected all declarations from source code, together with the comments directly preceding them, to generate documentation. Synopsis already had its own AST-like class hierarchy, so the task was 'simply' to traverse the opencxx ptree and map it to a synopsis AST. Later we went some steps further and used the power of opencxx to generate 'cross-referenced source code', i.e. html pages that display source files, but with variables and types linked to their respective declarations.

For quite some time I have been pondering exposing a 'real' AST such as that from opencxx to python, so I could use my processor framework to manipulate the source code directly for code generation. However, I found the ptree stuff quite obscure, so this idea never really got off the ground.

I've recently started to integrate a C parser (from the 'ctool' project) into synopsis, and there the parse tree is much simpler to read, simply because it is more typed. Instead of just having specific ptree topologies for 'statements', 'declarations', etc., I have real classes 'Statement', 'Declaration', etc. That's much more pleasing to look at! :-) On the other hand, ctool doesn't preserve the tokens in their original form the way opencxx does, and doesn't tokenize the comments (something we have been working hard to add to synopsis' opencxx port).

This leads me to a couple of items on my wishlist, which I'd like to discuss / propose here:

* I suggest that the ptree hierarchy be refactored into a more typed form. That could simply mean that a large number of new 'Statement', 'Expression', and other classes are derived from 'Ptree', or it could be done in a different way, I don't know yet. Either way, it would make inspecting an AST much more straightforward, as these types would be more or less self-explanatory (ever wondered what 'node->Cdr()->Cdr()->Car()' represents ??)

* I suggest opening up opencxx in a way that exposes the basic API (parser / ptree generation, walkers / ptree transformation, metaclass and the other introspection stuff) as a C++ library as well as a python module. This means that the occ executable would become largely obsolete, or at least it would only be a convenience for the most popular features, while more fine-grained control would be available through the APIs, through which users can customize opencxx to their needs. It also means that all the platform-specific code to run subprocesses such as the preprocessor, as well as to load metaclass plugins, could be isolated so that the backend library would be more platform-neutral and robust.

As I'm already maintaining an opencxx 'branch' as part of the synopsis project, I'm experimenting with things there. Synopsis uses subversion, so directory-layout related refactoring is much simpler than with cvs. I also have a number of advanced features that I don't want to lose, such as preprocessor data integrated with the ast (synopsis records macro definitions and calls, file inclusion information, etc.). I'm thus tempted to work off of my own opencxx branch, though I'm happily sharing my changes with opencxx.

In particular, I'm thinking of a simple bootstrapping process, within which I would rework the ptree hierarchy, and then use opencxx itself to *generate* the C++-to-python binding to expose this class hierarchy to python. Once I have that, people can introspect and manipulate the source code from within python, with a direct C++ API as a fallback, in case they find the python API unacceptable for various reasons (which I can't really imagine :-)

Finally, I'm wondering whether it wouldn't be simpler for me to modify the opencxx lexer and parser to be able to parse C code (all the various flavours that still exist, such as K&R), so I can drop the ctool backend. A C parser / processor with the features of opencxx would in particular be useful to all those GNOME / Mono developers, with language binding generation being just one example usage.

Now, please tell me what you think about these ideas, whether they make sense to you at all, whether you find them useful, or would even like to help.

Best regards,
Stefan
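
To make the first wishlist item a bit more concrete, here is a minimal sketch of the difference between walking an untyped cons-cell tree and asking a typed node for what you want. Everything in it is made up for illustration: 'Cell', 'leaf', 'cons', 'make_untyped_assign', 'untyped_rhs', 'Expression', 'Identifier' and 'AssignExpression' are hypothetical names, not the actual opencxx classes, and the three-element list layout for an assignment is only an assumption.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Untyped style: a generic cons cell with Car()/Cdr() accessors.
// Toy stand-in only; this is NOT the real opencxx Ptree class.
struct Cell {
    std::string token;               // used by leaves only
    std::shared_ptr<Cell> car, cdr;  // used by list cells only
    const Cell* Car() const { return car.get(); }
    const Cell* Cdr() const { return cdr.get(); }
};
using CellPtr = std::shared_ptr<Cell>;

CellPtr leaf(std::string text) {
    auto cell = std::make_shared<Cell>();
    cell->token = std::move(text);
    return cell;
}

CellPtr cons(CellPtr head, CellPtr tail) {
    auto cell = std::make_shared<Cell>();
    cell->car = std::move(head);
    cell->cdr = std::move(tail);
    return cell;
}

// Assumed layout for an assignment: the list (lhs op rhs).
CellPtr make_untyped_assign(std::string lhs, std::string op, std::string rhs) {
    return cons(leaf(std::move(lhs)),
                cons(leaf(std::move(op)),
                     cons(leaf(std::move(rhs)), nullptr)));
}

// The reader has to know that the third list element is the right operand.
std::string untyped_rhs(const Cell* node) {
    assert(node && "expected a three-element list");
    return node->Cdr()->Cdr()->Car()->token;
}

// Typed style: the class and accessor names say what each part means.
struct Expression {
    virtual ~Expression() = default;
    virtual std::string text() const = 0;
};

struct Identifier : Expression {
    std::string name;
    explicit Identifier(std::string n) : name(std::move(n)) {}
    std::string text() const override { return name; }
};

struct AssignExpression : Expression {
    std::unique_ptr<Expression> left, right;
    AssignExpression(std::unique_ptr<Expression> l, std::unique_ptr<Expression> r)
        : left(std::move(l)), right(std::move(r)) {}
    const Expression& Left() const { return *left; }
    const Expression& Right() const { return *right; }
    std::string text() const override {
        return left->text() + " = " + right->text();
    }
};
```

With the untyped tree, 'untyped_rhs(make_untyped_assign("a", "=", "b").get())' fetches the right-hand side by position. With the typed node the same question reads 'assign.Right().text()', where 'assign' is an 'AssignExpression' built from two 'Identifier' objects, which is the kind of self-explanatory access the proposal is after.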