Re: [Opencxx-users] future directions

Hi Stefan and All,

On Tue, 1 Jun 2004, Stefan Seefeld wrote:

[snip]
 > * I suggest the ptree hierarchy to be refactored into a more typed
 >    form. That could simply mean that a big number of new 'Statement',
 >    'Expression', and other classes should be derived from 'Ptree', or
 >    it could be done in a different way, I don't know yet.
 >    However, this would mean that it would be much more straight forward
 >    to inspect an AST, as these types would be more or less self-explanatory
 >    (ever wondered what 'node->Cdr()->Cdr()->Car()' represents ??)

I agree that 'node->Cdr()->Cdr()->Car()' is unsafe and difficult to use. The
same goes for creating new Ptree nodes. Moreover, the mapping of C++ syntax
into combinations of Ptree nodes is not documented, which makes this area
even more unclear.

Nevertheless, I think that *replacing* the Ptree hierarchy with a more typed
form will be extremely difficult, because:

 (1) The parser, type elaborator and translator all use Cdr/Car.  Even if we
     ignore the translator for the moment, the parser and elaborator alone
     are 20KLOC of highly nontrivial code. Reworking this code is a huge job,
     especially if you do it part-time, and will be a wonderful source of
     bugs (and we don't have a decent regression testsuite). Moreover, I
     think that directly replacing the AST data structure is difficult,
     because it has to be done practically in one big step: the grammar is
     not a hierarchical system (there is a lot of recursion in it, which
     means that you cannot replace things piece by piece going bottom-up,
     because the dependency graph between the AST classes is not acyclic).

 (2) Contrary to popular belief, creating an object model for an AST is a lot
     of conceptual work. What is more, for a language like C++ there is no
     unique, optimal AST object model. Example: how should "a+b+c" be
     represented? Some clients would like to see two binary "+" nodes, others
     would rather take advantage of "+" being associative and view it as one
     ternary node. Some clients are interested in nodes representing
     parentheses, while others would rather treat them as textual
     decorations that do not belong in the AST, etc.
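
To make the "a+b+c" example concrete, here is a minimal sketch of how one
stored representation (binary "+" nodes) can serve a client that prefers the
n-ary view. The Expr struct and the Leaf, Add and SumOperands helpers are
hypothetical names made up for this illustration; they are not part of
OpenC++.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal expression tree; NOT the OpenC++ Ptree.
struct Expr {
    std::string text;                // operator symbol or identifier
    std::shared_ptr<Expr> lhs, rhs;  // both null for a leaf
};
using ExprPtr = std::shared_ptr<Expr>;

// Builds a leaf node holding an identifier.
ExprPtr Leaf(const std::string& name) {
    return std::make_shared<Expr>(Expr{name, nullptr, nullptr});
}

// Builds a binary "+" node.
ExprPtr Add(ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{"+", std::move(l), std::move(r)});
}

// Appends the operands of a chain of binary "+" nodes, left to right.
void FlattenSum(const ExprPtr& e, std::vector<std::string>& out) {
    assert(e != nullptr);
    if (e->text == "+" && e->lhs && e->rhs) {
        FlattenSum(e->lhs, out);
        FlattenSum(e->rhs, out);
    } else {
        out.push_back(e->text);
    }
}

// The n-ary view: all operands of a sum as one flat list.
std::vector<std::string> SumOperands(const ExprPtr& e) {
    std::vector<std::string> out;
    FlattenSum(e, out);
    return out;
}
```

Calling SumOperands on Add(Add(Leaf("a"), Leaf("b")), Leaf("c")) yields the
three operands a, b, c in order. Note that this view deliberately forgets
the grouping, so (a+b)+c and a+(b+c) look the same, which is exactly what
some clients want and others cannot accept.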

Here are my suggestions on how to improve the usability of the AST gradually:

(1) Find out and write down mappings from the "less typed" AST
    to "more typed" AST, e.g.:

         IfStatement:

             Cond   ->Cdr()->Car();
             Then   ->Cdr()->Cdr()->Cdr()->Car();
             Else   ->Cdr()->Cdr()->Cdr()->Cdr()->Cdr()->Car();

(2) Use it to write or generate a set of wrappers, that would encapsulate
    AST in a typesafe interface:

         template <class T> class Node;  // primary template, specialized per node kind

         template <>
         class Node<IfStatement>
         {
         public:
             Node(Node<Expr> c, Node<Statement> t, Node<Statement> e);
             Node<Expr>      Cond() { return p_->Cdr()->Car(); }
             Node<Statement> Then() { ... }
             Node<Statement> Else() { ... }
         private:
             Node(Ptree* p) : p_(p) {}
             friend class AstFactory;
             Ptree* p_;
         };

         class AstFactory
         {
         public:
             template <class T>
             static Node<T> Create( /* ... */ );
         };

    This can be done non-intrusively, without touching the existing codebase
    (= without introducing bugs).

(3) Write a parser wrapper that wraps the Ptree*'s returned from the
    parser in Node<>'s.

(4) Write an abstract walker for Node<>'s and make Node<>'s visitable.

(5) At this point we will have a usable, type-safe interface.
    Moreover, clients not satisfied with this interface
    (e.g. those who prefer multiary "+") will have a chance
    to reuse it nonintrusively or just write another
    interface to Ptree structure from scratch.

(6) Having a wrapper interface atop Ptree structure allows
    for changing Ptree structure without affecting clients
    of wrapper interface. (Changes in Ptree, e.g. addition
    of new nodes, can be compensated in wrapper layer.)

(7) Typesafe wrapper interface would enable automatic generation
    of parser regression tests (e.g. from gcc testsuite), that
    should be used as a safety net if we ever decide to refactor
    parser so that it uses the type-safe interface.

(8) At this point we can think about moving typesafe interface into parser,
    elaborator and translator, and later about totally removing Ptree
    classes and actually replacing Ptree's with a canonical implementation
    of Composite pattern. However, I don't think this is the way to go,
    mainly because of (5). I believe it is better to have low-level
    implementation plus high-level wrapper(s).
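
A minimal sketch of the wrapper idea from steps (1) and (2), under loud
assumptions: Cell is a hypothetical stand-in for a Lisp-style node (it is
not the OpenC++ Ptree class), and the toy list layout built by MakeIf is
invented so that the offsets match the example IfStatement mapping above.
The real layout of Ptree nodes would have to be looked up in the parser.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a Lisp-style tree node; NOT the OpenC++ Ptree.
class Cell {
public:
    explicit Cell(std::string token) : token_(std::move(token)) {}  // leaf
    Cell(std::shared_ptr<Cell> car, std::shared_ptr<Cell> cdr)      // cons
        : car_(std::move(car)), cdr_(std::move(cdr)) {}
    Cell* Car() const { return car_.get(); }
    Cell* Cdr() const { return cdr_.get(); }
    const std::string& Token() const { return token_; }
private:
    std::string token_;
    std::shared_ptr<Cell> car_, cdr_;
};
using CellPtr = std::shared_ptr<Cell>;

CellPtr Leaf(const std::string& t) { return std::make_shared<Cell>(t); }
CellPtr Cons(CellPtr car, CellPtr cdr) {
    return std::make_shared<Cell>(std::move(car), std::move(cdr));
}

// Returns element n (0-based) of a cons list, or nullptr if it is too short.
const Cell* Nth(const Cell* list, int n) {
    assert(n >= 0);
    while (list && n-- > 0) list = list->Cdr();
    return list ? list->Car() : nullptr;
}

// Tag types naming the syntactic category a wrapper stands for.
struct Expr {};
struct Statement {};
struct IfStatement {};

// Generic typed handle; it only remembers which category the node has.
template <class Tag>
class Node {
public:
    explicit Node(const Cell* p) : p_(p) {}
    const Cell* Raw() const { return p_; }
private:
    const Cell* p_;
};

// The only place that knows where the parts of an "if" live in the list.
template <>
class Node<IfStatement> {
public:
    explicit Node(const Cell* p) : p_(p) {}
    Node<Expr>      Cond() const { return Node<Expr>(Nth(p_, 1)); }
    Node<Statement> Then() const { return Node<Statement>(Nth(p_, 3)); }
    Node<Statement> Else() const { return Node<Statement>(Nth(p_, 5)); }
private:
    const Cell* p_;
};

// Builds the toy list [if, cond, then, then-stmt, else, else-stmt].
CellPtr MakeIf(const std::string& c, const std::string& t,
               const std::string& e) {
    return Cons(Leaf("if"), Cons(Leaf(c), Cons(Leaf("then"),
           Cons(Leaf(t), Cons(Leaf("else"), Cons(Leaf(e), nullptr))))));
}
```

A client then writes Node<IfStatement>(tree.get()).Cond() instead of
counting Cdr()'s, and if the list layout ever changes, only the three
offsets in the specialization have to be touched.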

(Side note: In fact the implementation of the AST in OpenC++ is trickier
than just Leaf/NonLeaf, see e.g. PtreeIfStatement etc. Nevertheless
this implementation still forces Cdr/Car on clients, and AFAIU this
is something we want to escape from.)

> * I suggest to open up opencxx in a way that exposes the basic API (parser
>    / ptree generation, walkers / ptree transformation, metaclass and the
>    other introspection stuff) as a C++ library as well as a python module.
>    This means that the occ executable will be very much obsolete, or at
>    least it would only be a convenience for the most popular features, but
>    more fine-grained control would be accessible through the APIs, through
>    which users can customize opencxx to their needs. It also means that
>    all the platform-specific code to run subprocesses such as the
>    preprocessor as well as load metaclass plugins could be isolated such
>    that the backend library would be more platform neutral and robust.

I second that. However, I think the interfaces should not be published as
they are now, because IMO they are not encapsulating enough. I think the
parser interface is OK, but the other interfaces should be seriously reviewed
before we commit to them. For example, the program object model is very much
coupled with the translator. I think we should untangle them first and
transform the translator framework into an exemplary, nonprivileged client of
the frontend library (libraries).

> As I'm already maintaining an opencxx 'branch' as part of the synopsis
> project, I'm experimenting with things there. Synopsis uses subversion, so
> directory-layout related refactoring is much simpler than with cvs.

As for Subversion in OpenC++: I have heard very positive opinions about
Subversion, however I don't see an easy way to move OpenC++ development to
Subversion now. Currently we rely on SF.net, which provides CVS,
CompileFarm, mailing lists, shell+cron accounts and web hosting (and other
features which we don't use at the moment). I don't see any other
organization that would provide this level of service and commitment to
the OpenC++ project.

AFAIU the biggest issue is moving files/dirs in a repo. I am using the
standard CVS way (delete here, add there) and indeed it loses history and
makes merges more difficult, but so far it has not been very painful. The
loss of history can be mitigated by mentioning the old location of a file in
the initial comment of the moved (=added) file.

> I'v also a number of advanced features that I don't want to loose, such as
> preprocessor data integrated with the ast (synopsis records macro
> definitions and calls, file inclusion information, etc.).

The ability of OpenC++ to understand the preprocessor, so that code can be
transformed without expanding preprocessor macros, would be very desirable.
I support any effort in this direction. This is in general difficult in C++,
but it is doable. (See the CRefactory project.) Together with Python
scripting it would make a very powerful refactoring tool.

> I'm thus tempted to work off of my own opencxx branch, though I'm happily
> sharing my changes with opencxx. In particular, I'm thinking of a simple
> bootstrapping process, whithin which I would rework the ptree hierarchy,
> and then use opencxx itself to *generate* the C++-to-python binding to
> expose this class hierarchy to python.

This sounds exciting, but would that have any advantages over using SWIG? (I
don't know, I have not used SWIG myself.)

> Once I have that, people can introspect and manipulate the source code
> from within python,

That would be really great.

> with a direct C++ API as a fallback, in case they find
> the python-API inacceptable for various reasons (which I can't really
> imagine :-)

Some people may be concerned about performance (but still, I would love
a Python API).

> Finally, I'm wondering whether it wouldn't be simpler for me to modify the
> opencxx lexer and parser to be able to parse C code (all the various
> flavours that still exist, such as K&R), so I can drop the ctool backend.
> A C parser / processor with the features of opencxx would in particular be
> useful to all those GNOME / Mono developers, with language binding
> generation being just one example usage.

I very much support extending the lexer/parser to support full C syntax,
however I don't know how much work it would take to get there. What issues
do you see?

Best regards
Grzegorz

##################################################################
# Grzegorz Jakacki                       Huada Electronic Design #
# Senior Engineer, CAD Dept.              1 Gaojiayuan, Chaoyang #
# tel. +86-10-64365577 x2074               Beijing 100015, China #
# Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################