[Opencxx-users] future directions


hi there,

while Grzegorz is still struggling with the merge of some
refactoring we have been working on, I'd like to discuss
some ideas about opencxx's evolution.

I'd very much appreciate comments from people using opencxx
right now or being tempted to use it so we can understand
better what opencxx is already good at, and what it would
be useful to work on.

I've started to use opencxx myself as a C++ parser backend
for my synopsis framework, where I initially just collected
all declarations from the source code, together with the comments
directly preceding them, to generate documentation.
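To make the documentation use case a bit more concrete, here is a rough C++ sketch of attaching the comments that directly precede a declaration to it. It assumes a lexer that keeps comments as tokens instead of discarding them; the `Token` type and the `preceding_comments` function are hypothetical and not part of opencxx's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token representation: the lexer is assumed to keep
// comments as tokens rather than throwing them away.
enum class TokenKind { Comment, Other };

struct Token {
  TokenKind kind;
  std::string text;
};

// Collect the comments that directly precede the declaration starting at
// token index `decl_start`, i.e. the run of comment tokens with no other
// token in between. The comments are returned in source order.
std::vector<std::string> preceding_comments(const std::vector<Token>& tokens,
                                            std::size_t decl_start) {
  assert(decl_start <= tokens.size());
  std::size_t first = decl_start;
  while (first > 0 && tokens[first - 1].kind == TokenKind::Comment) {
    --first;  // extend the run backwards while we still see comments
  }
  std::vector<std::string> result;
  for (std::size_t i = first; i < decl_start; ++i) {
    result.push_back(tokens[i].text);
  }
  return result;
}
```

A declaration preceded by any non-comment token (a `;`, say) gets no comments at all, which matches the "directly preceding" rule.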

Synopsis already had its own AST-like class hierarchy, so
the task was 'simply' to traverse the opencxx ptree and
map that to a synopsis AST.

Later we went a few steps further and used the power of
opencxx to generate 'cross-referenced source code', i.e.
HTML pages that display source files, with variables
and types linked to their respective declarations.
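The core of the cross-referencing idea can be sketched as a lookup from an identifier to the location of its declaration, rendered as an HTML link. This is only a simplified sketch: `DeclLocation`, `XrefTable` and `link_identifier` are made-up names, and the flat name-to-location map ignores scopes and overloading, which a real cross-referencer has to resolve using the parser's semantic information.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical declaration location: the page generated for the declaring
// file, plus the line of the declaration.
struct DeclLocation {
  std::string file;
  int line;
};

// Flat name -> declaration table (no scopes, no overloading).
using XrefTable = std::map<std::string, DeclLocation>;

// Render one identifier occurrence: a link to its declaration if the name
// is known, otherwise the plain identifier.
std::string link_identifier(const XrefTable& table, const std::string& name) {
  assert(!name.empty());
  auto it = table.find(name);
  if (it == table.end()) return name;
  const DeclLocation& loc = it->second;
  return "<a href=\"" + loc.file + ".html#line" + std::to_string(loc.line) +
         "\">" + name + "</a>";
}
```

Identifiers consist only of letters, digits and underscores, so they can be emitted without HTML escaping.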

For quite some time I have been pondering exposing a 'real' AST,
such as the one from opencxx, to python, so I could use my
processor framework to manipulate the source code directly
for code generation. However, I found the ptree code quite
obscure, so this idea never really got off the ground.

I've recently started to integrate a C parser (from the 'ctool' project)
into synopsis, and there the parse tree is much easier to read,
simply because it is more strongly typed. Instead of just having specific
ptree topologies for 'statements', 'declarations', etc., I have
real classes 'Statement', 'Declaration', etc.
That's much more pleasing to look at! :-)

On the other hand, ctool doesn't preserve tokens in their
original form the way opencxx does, and doesn't tokenize
comments (something we have worked hard to add to synopsis'
opencxx port).

This leads me to a couple of items on my wishlist, which I'd like
to discuss / propose here:

* I suggest refactoring the ptree hierarchy into a more strongly typed
   form. That could simply mean deriving a large number of new 'Statement',
   'Expression', and other classes from 'Ptree', or it could be done
   differently; I don't know yet.
   Either way, inspecting an AST would become much more straightforward,
   as these types would be more or less self-explanatory
   (ever wondered what 'node->Cdr()->Cdr()->Car()' represents??)

* I suggest opening up opencxx so that it exposes the basic API
   (parser / ptree generation, walkers / ptree transformation, metaclass and
   the other introspection facilities) as a C++ library as well as a python module.
   This would make the occ executable largely obsolete, or at most a convenience
   for the most popular features, while finer-grained control would be available
   through the APIs, letting users customize opencxx to their needs. It also means
   that all the platform-specific code, such as running subprocesses like the
   preprocessor or loading metaclass plugins, could be isolated, so that the backend
   library would be more platform-neutral and robust.
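To illustrate the first point, here is a minimal sketch of how a typed node could wrap the existing Lisp-style list topology. The `Ptree`, `Leaf`, `NonLeaf` and `IfStatement` classes below are stand-ins written for this example, not opencxx's actual classes, and the `[if ( cond ) stmt]` list layout is an assumption; memory management is omitted for brevity.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal Lisp-style parse tree with a Car/Cdr interface (illustrative only).
class Ptree {
public:
  virtual ~Ptree() = default;
  virtual Ptree* Car() const { return nullptr; }
  virtual Ptree* Cdr() const { return nullptr; }
  virtual std::string text() const { return ""; }
};

class Leaf : public Ptree {
public:
  explicit Leaf(std::string t) : text_(std::move(t)) {}
  std::string text() const override { return text_; }
private:
  std::string text_;
};

class NonLeaf : public Ptree {
public:
  NonLeaf(Ptree* car, Ptree* cdr) : car_(car), cdr_(cdr) {}
  Ptree* Car() const override { return car_; }
  Ptree* Cdr() const override { return cdr_; }
private:
  Ptree* car_;
  Ptree* cdr_;
};

// Typed node: same list topology [if ( cond ) stmt], but the positional
// access is hidden behind self-explanatory accessors.
class IfStatement : public NonLeaf {
public:
  using NonLeaf::NonLeaf;
  Ptree* condition() const { return nth(2); }    // == Cdr()->Cdr()->Car()
  Ptree* then_branch() const { return nth(4); }  // the statement after ')'
private:
  // Element n of the list: n Cdr() steps, then Car().
  Ptree* nth(int n) const {
    const Ptree* p = this;
    for (int i = 0; i < n; ++i) {
      p = p->Cdr();
      assert(p != nullptr && "list shorter than expected");
    }
    return p->Car();
  }
};
```

Code walking an `IfStatement` can then call `condition()` instead of spelling out `Cdr()->Cdr()->Car()`, while the underlying list, and with it the preserved tokens, stays the same.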

As I'm already maintaining an opencxx 'branch' as part of the synopsis project,
I'm experimenting with things there. Synopsis uses subversion, so refactoring that
touches the directory layout is much simpler than with cvs. I also have a number of
advanced features that I don't want to lose, such as preprocessor data integrated with
the AST (synopsis records macro definitions and calls, file inclusion information, etc.).
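As an illustration of what "preprocessor data integrated with the AST" can look like, here is a sketch of per-file records for file inclusions and macro calls, with a lookup that tells whether a given source offset lies inside a macro call (useful when mapping AST nodes back to the unpreprocessed text). All type and member names are hypothetical; they do not describe synopsis' actual data structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one macro call in the unpreprocessed file.
struct MacroCall {
  std::string name;
  long start;  // offset of the call in the original source
  long end;    // one past the last character of the call
};

// Hypothetical record of one #include directive.
struct Include {
  std::string target;  // file named in the directive
  int line;            // line of the directive
};

struct SourceFileInfo {
  std::string name;
  std::vector<Include> includes;
  std::vector<MacroCall> macro_calls;

  // Return the macro call whose source range contains `offset`,
  // or nullptr if that position is not part of any macro call.
  const MacroCall* macro_call_at(long offset) const {
    assert(offset >= 0);
    for (const MacroCall& call : macro_calls)
      if (call.start <= offset && offset < call.end) return &call;
    return nullptr;
  }
};
```

A linear scan keeps the sketch short; a real implementation would more likely keep the calls sorted by offset and use a binary search.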

I'm thus tempted to work off my own opencxx branch, though I'm happy to share
my changes with opencxx. In particular, I'm thinking of a simple bootstrapping
process in which I would rework the ptree hierarchy, and then use opencxx
itself to *generate* the C++-to-python binding that exposes this class hierarchy
to python. Once I have that, people can introspect and manipulate the source
code from within python, with a direct C++ API as a fallback in case they find
the python API unacceptable for some reason (which I can't really imagine :-)

Finally, I'm wondering whether it would be simpler for me to modify the
opencxx lexer and parser to also parse C code (in all the various flavours
that still exist, such as K&R), so I can drop the ctool backend.
A C parser / processor with the features of opencxx would in particular be
useful to all those GNOME / Mono developers, with language binding generation
being just one example usage.

Now, please tell me what you think of these ideas: whether they make sense
to you at all, whether you find them useful, or whether you would even like to help.

Best regards,
		Stefan