[Opencxx-users] Re: [opencxx] Let's break OpenC++ up

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Tue, 18 Mar 2003, Stefan Reuther wrote:

> Hello,
>
> On Mon, Mar 17, 2003 at 04:43:53PM +0800, Grzegorz Jakacki wrote:
> > On Fri, 14 Mar 2003, Stefan Reuther wrote:
> > > I think, there are two problems, at least with the parser. The
> > > interface consists of two parts, namely the Ptree descendants
> > > and their contents.
> > >
> > > 1) Every Ptree descendant has a method "Translate" which calls
> > >    a method of Walker. Adding a new Ptree class means adding a
> > >    method to Walker.
> >
> > This is the first sign of too little isolation between visitor (Walker)
> > and visited class hierarchy (Ptree and derived classes) --- you cannot
> > extend Ptree hierarchy without modifying Walker code.
>
> I don't think it can be done much different. Ptree and Walker
> have to "know each other".

I believe that it is possible to introduce a layer in between,
which would encapsualte the actual implementation of Ptree.

> What needs to be improved is that it
> can be detected at compile-time that a Walker descendant does no
> longer match our tree.

Yes, but "our tree" does not necessarily have to mean phisical
implementation.

> > >    User code must override that method, or
> > >    they'll (accidentially, maybe) run into the default behaviour
> > >    which just recurses into the subtrees.
> >
> > I am not sure if I understand you here. When the user adds the method of
> > Walke, he/she is free to define the default behavious as he/she wants,
> > right? Or maybe you mean, that *some* default behavious has to be
> > defined, since the member function cannot be just left abstract, because
> > otherwise user would make Walker class abstract?
>
> Right now, the default behaviour for every Walker::TranslateXYZ
> method is to recurse into the subtrees, and return the changed
> tree. I'm trying to say that this behaviour is not optimal,
> because it might break user code's expectations. Say, I write a
> Walker where each TranslateXYZ method returns a particular node,
> like this:
>    Ptree* MyWalker::TranslateIfStatement(Ptree* p) {
>        return new MyFunkyPtreeDescendant(p, 0);
>    }
> To be sure, I take walker.h and override all methods of Walker.
> So I can now rightfully assume that this
>   dynamic_cast<MyFunkyPtreeDescendant*>(walker.Translate(p)) != 0
> holds.
>
> If a new TranslateXYZ method is added to Walker, my code still
> compiles but does no longer fulfill the above assertion. The
> wired-in default behaviour doesn't match my use-case. I can't
> say what to do with nodes I don't know about.
>
> Solutions I propose:
>
> - add a AbstractWalker class, where all methods are pure
>   virtual. People concerned with the above problem (like me) can
>   use that as a base class. Code using that class breaks when we
>   add new productions, and forces people to update their code
>   (or stay with the old version). Walker would be derived from
>   that class.
> - add a DefaultWalker class, where all TranslateXYZ methods
>   call another function, like this:
>      Ptree* DefaultWalker::TranslateIfStatement(Ptree* p) {
>          return DefaultHandler(p);
>      }
>   This would let people choose their default behaviour.

Ok, I do not see any issues.

> Having a
>   Ptree::Clone would come in handy for this one.

This should be easy to add too.

>
> Another nice thing to have would be a Walker returning other
> things than Ptrees. I'm still unsure how that could work.

In fact Walker is implementation of pattern called Visitor.
I am not sure if I have already mentioner "Design Patterns"
by Gamma, Johnson, Ralph & Vlissides. If you have it around that may
be a good moment to have a look at chapters "Visitor" and "Composite"
(chapters are quite compact and not too long). Personally I found it very
enlightening.

Walker (or more generally visitor) can return arbitrary thing. I assume
that what you have in mind is that it returns nodes of another tree
(or rather handles/pointers to such nodes), thus implementing some
homomoprhic tree transformation. It that it?

> We
> have such a thing in the VFiasco project, but (Michael, please
> look away) it's not perfect.

Could I have a look?

> We also have that which I called
> "DefaultWalker" here.
>
> > >    This problem can be
> > >    solved with an abstract base class, so people who want to be
> > >    paranoid can derive from that.
> >
> > Right, but then they have to define traversals in all 'Translate()'
> > overloads by themselves.
>
> A good thing, in my opinion. When I want to generate code, I
> want to be sure not to miss any node.

But I am still thinking about ways to "derive" tree traversals from
simpler ones.

> For example, the recent
> addition of the "typeid" operator caused our code to silently
> translate
>    typeinfo ti = typeid(p);
> as
>    typeinfo ti = p;

Why was that (I could go figure out myself, but probably you still
remember)?

> > >    Their code will automatically
> > >    fail to compile when we add a new node type, which is a Good
> > >    Thing(tm), IMHO.
> >
> > It depends. When I am adding new node to store '__attribute__' GNU
> > extension, I do not want old code to fail. I would prefere a design,
> > in which I can do it so that new code can see '__attribute__' node,
> > while old code does not see it.
>
> That's what I meant with "node contents" below.
>
> > > 2) The bigger problem is with the content of the nodes. For
> > >    example, the new wide-char constants turn into Leafs. Old
> > >    programs might assume that all Leafs are either numeric, char
> > >    or string literals, and break now. Or, my "if(decl)" patch[1]
> > >    would add the new property that the third child of "if" can
> > >    be a declaration instead of an expression. I don't see a good
> > >    solution for that problem. Okay, one could go away from the
> > >    Lispy data structure towards a more "explicit" syntax, i.e.
> > >      class PtreeIfStatement {
> > >        Ptree *if_keyword, *left_paren, *right_paren;
> > >        PtreeExpressionOrDeclaration *condition;
> > >        PtreeStatement *controlled_statement;
> > >      };
> >
> > Precisely. In fact OpenC++ does not use the "Lispy data structure"
> > (I really like this term!). The abstract datatypes that you envision
> > are already present --- look at the names of Metaclass methods ---
> > they are candidates for the new syntax hierarchy. Such hierarchy
> > would be much safer to play with: instead of
> > "ifthenelse->Car()->Car()->Cdr()", which potentially can coredump
> > at each arrow, one would conveniently write "ifthenelse->GetThen()".
>
> Sir, where do I sign? :-)

You mean NDA ? No, I do not have the code :-)

Those abstract data types are still, well, abstract. However
they exist in the desing, they correspond to 'TranslateXYZ()' methods.
However, we can move towards them, at the end is my write-up on how I
think it can be done.

> Still, in the *current* code, TranslateIfStatement takes a
> Ptree* and not a PtreeIfStatement*, although the latter would be
> a safe assumption for most cases.

There is a little bit of confusion, since lispy data structure is mixed up
with 'PtreeIfStatement'-like classes, which on the other hand do not
provide methods like "GetThen()". Why cannot we change
TranslateIfStatement so that it takes "Ptree*" argument?

> > Also if sometime in the future the representation of "PtreeIfStatement"
> > node changes, then client using "ifthenelse->GetThen()" would feel good,
> > while the client using "ifthenelse->Car()->Car()->Cdr()" would have
> > to fix his/her Car()s and Cdr()s.
>
> Far future, because nowadays everyone uses Car() and Cdr().

How can anyone use any other access method, while this is the only one
available?

I think that we touched a general problem of interface changes in the
presence of legacy code.

In my opinion the evolution is inevitable, revolution is not.

If for several versions you provide clients with both old and new
interface, then at some moment you may start deprecating the old one
(i.e. hesitate no more to change implementation of the syntax tree
if the change would not be visible through the new interface).

The time, when two interfaces are available simultaneously enables clients
to *gradually* move from the old to the new interface, most likely running
the code with both new and old interface binding for quite some time.

> One problem we'd still have to solve is that people can still
> construct invalid trees. For example, if I have a Ptree
> representing "if (cond) statement;", someone could Snoc a "&" at
> the end, although that'd yield invalid syntax, and nothing could
> prevent it (although adding an "else" and another statement
> still must be valid). That's what I meant with "Lispy" data
> structure: in Lisp, you have structured lists, too, but no-one
> can prevent you from mangling them somehow.

This cannot be avoided if there is lispy interface to create the tree
nodes available to the client. I think we should just assert or throw once
such a situation (invalid tree) is identified.

> > >    but that would need an enormous amount of work and break
> > >    everything.
> >
> > If you just throw away Ptree's and introduce new hierarchy, then yes.
> >
> > But there is much better way: one can wrap raw tree implementation (like
> > Ptree) into better-structured one using wrapper types (which are
> > implementation of Proxy pattern). The technique I am trying to come up with
> > for it is mainly based on so-called traits (classes similar to e.g.
> > numeric_limits<> from Standard Library) and type-safe visitation (similar to
> > technique used in boost::visitor).
> >
> > Introduction of wrappers does not break any existing code, but makes the new
> > code much better isolated from the actual tree representation, so changes
> > in the tree representation (even adding new nodes) does not break the new
> > code.
>
> I think adding those accessor functions ("GetCondition",
> "GetThen", "GetIf") would be a good start, although it doesn't
> work nicely for everything: for example, PtreeDeleteExpr can
> have the following forms:
>          [:: delete expr]
>          [:: delete \[ \] expr]
>          [delete expr]
>          [delete \[ \] expr]
> I'm unsure whether we should "regularize" that, for example, as
>          [::  delete nil     expr]
>          [::  delete [\[ \]] expr]
>          [nil delete nil     expr]
>          [nil delete [\[ \]] expr]
> Still, this would break a lot of stuff.

I would say that unfortunately for the time being (= unless Ptree* is
deprecated) the burden of figuring out how to handle the tree lies on the
accessor function. This is what client code does now anyways.

Accessor functions can be refactored in the future, if the tree
implementation is changed/normalized.

(BTW: I do not think that accessors should be free functions; see the
write-up).

> There are quite a number
> of places like this in the parse tree. I could (try to) make up
> a complete list.

If you want to invest your time in it, than it would syrely help in
developing the accessors.

> In my original posting, I mentioned only
> "un-regular" places in Declarators.
>
> Another nice thing to have would be a "better" class hierarchy.
> Improvements I see so far are:
> - all PtreeFooExpr should share a common base class,
>   PtreeExpression

What would this class contain? (Not to discourage you, I am just looking
for arguments).

> - all PtreeFooStatement should share a common base class,
>   PtreeStatement. Hmmm, this seems to collide with PtreeBrace
>   and PtreeDeclaration, so maybe leave it out.

Could you elaborate on where is the problem exactly?

> - PtreeDeclaration and PtreeExpression should have a common base
>   class (needed for my if(condition) thing)

Where would this help? In function signatures, I guess, but at the moment
I cannot see where exactly.

> Anything else?

The problem I can see is that fixes to syntax tree are always inevitable.
There is no universally good class hierarchy for C++ syntax, and there
will never be. The class hierarchy depends on the application. I am trying
to address this issue by introducing a "view" --- something in between the
parse tree and Walker.

> This should not break anything.
>
> > > [1] Since the CVS code currently can't be built, I'll wait with
> > > checking it in.
> >
> > Ok.
>
> It's in now.

Great. However could I ask you to add the testcases to verify this
functionality? There are instructions somewhere under ./testsuite, perhaps
those tests should go into ./testsuite/comp.

> Change in the parse tree: the third child of if/while/switch may
> now be a PtreeDeclaration instead of an expression :-)
>
> By the way, I get a strange message I can't interpret:
> # Mailing ope...@li......
> # Generating notification message...
> # Generating notification message... done.
> # Mailing ope...@li......
> # Generating notification message...
> # Traceback (innermost last):
> #   File "/cvsroot/sitedocs/CVSROOT/cvstools/syncmail", line 322, in ?
> #     main()
> #   File "/cvsroot/sitedocs/CVSROOT/cvstools/syncmail", line 315, in main
> #     blast_mail(subject, people, specs[1:], contextlines, fromhost)
> #   File "/cvsroot/sitedocs/CVSROOT/cvstools/syncmail", line 240, in blast_mail
> #     print calculate_diff(file, contextlines)
> #   File "/cvsroot/sitedocs/CVSROOT/cvstools/syncmail", line 139, in calculate_diff
> #     file, oldrev, newrev = string.split(filespec, ',')
> # ValueError: unpack list of wrong size
>
> Doesn't look like any interpreter I know :-)

Python?

This is Michael's fault, I believe he is looking into it, but most likely
you are better informed in this regard :-)

> > > > But now comes the problem: when we just extend Frontend itself, without
> > > > proper support in Translator (e.g. your patches will make 'if ({...}) '
> > > > parse, but Translator will choke on it), we do not have way to add tests
> > > > that make sure the new Frontend functionality works.
> > >
> > > My 1) is a partial solution.
> >
> > But this requires adding *some* support of new nodes to the translator
> > anyway, right?
>
> Yes, but I can't seem to find a way around that, because there
> is no universal fallback. Some want just to recurse into the new
> nodes, some do not want to miss them.

I am thinking about adding the code into the "view" layer (between tree
and walker), what would express the new node in terms of old nodes
(or at least abort).

>
> > > > BTW: Over a year ago there was a resolution on this forum to factor out the
> > > > Driver subsystem and make it a shell script (for easier maintenance
> > > > and to make underlying compiler/preproc/linker settable with Autoconf).
> > >
> > > Personally, I don't care whether the driver is in C, Perl, Shell
> > > or Intercal.
> >
> > I do, because I am responding to bug reports :-)
> > (Seriously: shell is the only solution that does not have to duplicate
> > libtool functionality, e.g. to know how to install shared library on the
> > given platform).
>
> Okay, I overlooked that shared library thing.
>
> > > What about generating a ".h" file and including that in the
> > > driver? In another project of mine, I have this in "configure.in":
> [...]
> > > This creates a file ("arch/platform.h") which contains the
> > > specified code, with shell variables expanded (Disclaimer: I'm
> > > not an Autoconf expert, maybe what I did can be done much
> > > easier).
> >
> > AC_DEFINE is the solution you are looking for.
>
> Whoops? I wonder how I could have missed that :-P
>
> > What you suggest is reasonable for e.g. setting compiler or setting the
> > proper shared lib extension (.so, .dynlib etc.), but apart of those there is
> > *functionality* which is necessary in driver. How will you pass the
> > knowledge about the correct way of installing shared library? On some
> > platforms you just do 'cp'. On some you also have to set up dynamic linker
> > somehow. [snip]
>
> I was mainly thinking about configuring compiler names etc. into
> occ. I have "g++-2.95" and "g++-3.2", and to select one I
> install a symlink in $HOME/bin. I'd like to be able to choose
> that when I compile or - even better - run occ.

I think this kind of stuff should go into client's Makefile. You know the
best what compiler and how you want to call. Possibly you also want to do
some tricks with auto-generated dependencies etc. I think it would be
cleaner, when occ is just 'occ -E'. We could provide "out-of-the-box
Makefile" or script that would do the job for lazy users, who just want to
compile straight with the default compiler.

Best regards
Grzegorz

----------------------------
Hi,

As promised, this is a write-up of thoughts I had wrt. improvind the
design of OpenC++ in evolutionary way.

As I already wrote, there is too much coupling between Walker and Ptree
hierarchy (this is usual problem with Visitor pattern).

I was thinking about providing more isolation by introducing a level of
indirection. In particular I would like Ptree* to be replaced by some
proxy type (a.k.a. "smart pointer", "handler" or "wrapper").

The key point is that if 'p' is a pointer, the expression like 'p.sth' does
not compile. Thus we can migrate from Ptree* to some proxy type, that
behaves like Ptree*. For example:

template <class T>
class PtreePtr
{
public:
    T& operator*()  { return *ptr_; }
    T* operator->() { return ptr_; }
private:
    T* ptr_;
};

We can migrate to this solution in small steps, with very small changes in
API (instead of Ptree* you get something that behaves almost like Ptree*).

The next step would be to get rid of 'new Ptree...' operators in the code.
It should be gradually replaced by something like 'create<Ptree...>()',
which is supposed to give back "something that behaves like Ptree",
wihtout exposing details on what it really is to the client (not even
its type, which should be available through, say, create<Ptree...>::type
or separate designated trait class).

With the above code "injected" into OpenC++ we can start
specializing PtreePtr for particular Ptree kinds,
adding methods for accessing respective subtrees, e.g.:

template <>
class PtreePtr<PtreeIfStmt>
{
    ...
    PtreePtr<Ptree> GetThen()
    {
        return PtreePtr<Ptree>(ptr_->Cdr()->Car()->Cdr());
    }
};

Now if you write code using PtreePtr and use those new accessors (like
'GetThen()'), your code is isolated from the changes in actual Ptree
implementation --- the changes in implementation can be compensated by
changes in PtreePtr specializations, so that the client does not notice
them.

As for visitors (aka. walkers): currently the "view" of the tree that
visitor "sees" is fixed by the tree implementation. However, not every
visitor wants to see the same --- e.g. sometimes you want to see comments
or whitespace as nodes, sometimes not. It is especially true when you have
deveopled several visitors and you add some new functionality to the
parser (e.g. "explicit" keyword), so you need to add syntax node, but you
would like to keep your old visitors running (and e.g. ignore the new
node). Another example are parenthesis --- if you have visitor that
simplifies arithmetic expressions, you most likely are not interested in
it seeing the nodes representing parenthesis.

This can be taken care of by the use of tag classes. One can define a set
of tag classes (which represent a "view" for visitor) and trait class that
maps node implementation type to a tag (in a given view), e.g.:

    class BinaryNode {};
    ...
    class BasicViev;

    template <class View, class Node> class tree_traits;

    template<> struct tree_traits<BasicView,PtreePtr<PtreeAssign> >
    {
        typedef BinaryNode type;
    }
    ...

    tree_traits<BasicView,Node>::tag    <--- what 'Node' means in 'BasicView'
                ^^^^^^^^^ ^^^^
             view          implementation type

To make the mechanism complete, one needs also a "dispatcher" or
"destructor" --- function or functor, that for given view, visitor and node
determines the node "nature" --- e.g. given Ptree* determines if it is
"if statment", "assignment" etc., casts the node representation to
appropriate type (or wraps it in appropriate proxy) and calls the visitor
on it.

Visitor makes use of the traits class to perform compile-time dispatching,
instead of overload resolution suggested in "Design Patterns". This is
basically analogon of a dreaded 'switch' statement, but it is not dreaded,
because

  (1) unlike ordinary switch it fails to compile, when
      you add new tags, that are not covered

  (2) you never want to add new tags, because instead of
      it you leave the view as it was, and add new view,
      that exposes the new node tag.

Now you can have multiple visitors, each of which sees the tree differently.
In particular, when you add a new node, you can make all old visitors work
as before, and add a new view exposing the new node for the new visitors,
which are going to exploit it. (Looks like I am repeating myself, but this
is cruicial).

That's more or less it. Sorry for being chaotic: "This letter is longer than
usual because I lack time to make it short."

I would very much appreciate any feedback.

Best regards
Grzegorz

##################################################################
# Grzegorz Jakacki                       Huada Electronic Design #
# Senior Engineer, CAD Dept.              1 Gaojiayuan, Chaoyang #
# tel. +86-10-64365577 x2074               Beijing 100015, China #
# Copyright (C) 2002 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################