[Opencxx-users] Re: [opencxx] Let's break OpenC++ up

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Hello,

On Mon, Mar 17, 2003 at 04:43:53PM +0800, Grzegorz Jakacki wrote:
> On Fri, 14 Mar 2003, Stefan Reuther wrote:
> > I think, there are two problems, at least with the parser. The
> > interface consists of two parts, namely the Ptree descendants
> > and their contents.
> >
> > 1) Every Ptree descendant has a method "Translate" which calls
> >    a method of Walker. Adding a new Ptree class means adding a
> >    method to Walker.
> 
> This is the first sign of too little isolation between visitor (Walker)
> and visited class hierarchy (Ptree and derived classes) --- you cannot
> extend Ptree hierarchy without modifying Walker code.

I don't think it can be done much different. Ptree and Walker
have to "know each other". What needs to be improved is that it
can be detected at compile-time that a Walker descendant does no
longer match our tree.

> >    User code must override that method, or
> >    they'll (accidentially, maybe) run into the default behaviour
> >    which just recurses into the subtrees.
> 
> I am not sure if I understand you here. When the user adds the method of
> Walke, he/she is free to define the default behavious as he/she wants,
> right? Or maybe you mean, that *some* default behavious has to be
> defined, since the member function cannot be just left abstract, because
> otherwise user would make Walker class abstract?

Right now, the default behaviour for every Walker::TranslateXYZ
method is to recurse into the subtrees, and return the changed
tree. I'm trying to say that this behaviour is not optimal,
because it might break user code's expectations. Say, I write a
Walker where each TranslateXYZ method returns a particular node,
like this:
   Ptree* MyWalker::TranslateIfStatement(Ptree* p) {
       return new MyFunkyPtreeDescendant(p, 0);
   }
To be sure, I take walker.h and override all methods of Walker.
So I can now rightfully assume that this
  dynamic_cast<MyFunkyPtreeDescendant*>(walker.Translate(p)) != 0
holds.

If a new TranslateXYZ method is added to Walker, my code still
compiles but does no longer fulfill the above assertion. The
wired-in default behaviour doesn't match my use-case. I can't
say what to do with nodes I don't know about.

Solutions I propose:

- add a AbstractWalker class, where all methods are pure
  virtual. People concerned with the above problem (like me) can
  use that as a base class. Code using that class breaks when we
  add new productions, and forces people to update their code
  (or stay with the old version). Walker would be derived from
  that class.
- add a DefaultWalker class, where all TranslateXYZ methods
  call another function, like this:
     Ptree* DefaultWalker::TranslateIfStatement(Ptree* p) {
         return DefaultHandler(p);
     }
  This would let people choose their default behaviour. Having a
  Ptree::Clone would come in handy for this one.

Another nice thing to have would be a Walker returning other
things than Ptrees. I'm still unsure how that could work. We
have such a thing in the VFiasco project, but (Michael, please
look away) it's not perfect. We also have that which I called
"DefaultWalker" here.

> >    This problem can be
> >    solved with an abstract base class, so people who want to be
> >    paranoid can derive from that.
> 
> Right, but then they have to define traversals in all 'Translate()'
> overloads by themselves.

A good thing, in my opinion. When I want to generate code, I
want to be sure not to miss any node. For example, the recent
addition of the "typeid" operator caused our code to silently
translate
   typeinfo ti = typeid(p);
as
   typeinfo ti = p;

> >    Their code will automatically
> >    fail to compile when we add a new node type, which is a Good
> >    Thing(tm), IMHO.
> 
> It depends. When I am adding new node to store '__attribute__' GNU
> extension, I do not want old code to fail. I would prefere a design,
> in which I can do it so that new code can see '__attribute__' node,
> while old code does not see it.

That's what I meant with "node contents" below.

> > 2) The bigger problem is with the content of the nodes. For
> >    example, the new wide-char constants turn into Leafs. Old
> >    programs might assume that all Leafs are either numeric, char
> >    or string literals, and break now. Or, my "if(decl)" patch[1]
> >    would add the new property that the third child of "if" can
> >    be a declaration instead of an expression. I don't see a good
> >    solution for that problem. Okay, one could go away from the
> >    Lispy data structure towards a more "explicit" syntax, i.e.
> >      class PtreeIfStatement {
> >        Ptree *if_keyword, *left_paren, *right_paren;
> >        PtreeExpressionOrDeclaration *condition;
> >        PtreeStatement *controlled_statement;
> >      };
> 
> Precisely. In fact OpenC++ does not use the "Lispy data structure"
> (I really like this term!). The abstract datatypes that you envision
> are already present --- look at the names of Metaclass methods ---
> they are candidates for the new syntax hierarchy. Such hierarchy
> would be much safer to play with: instead of
> "ifthenelse->Car()->Car()->Cdr()", which potentially can coredump
> at each arrow, one would conveniently write "ifthenelse->GetThen()".

Sir, where do I sign? :-)

Still, in the *current* code, TranslateIfStatement takes a
Ptree* and not a PtreeIfStatement*, although the latter would be
a safe assumption for most cases.

> Also if sometime in the future the representation of "PtreeIfStatement"
> node changes, then client using "ifthenelse->GetThen()" would feel good,
> while the client using "ifthenelse->Car()->Car()->Cdr()" would have
> to fix his/her Car()s and Cdr()s.

Far future, because nowadays everyone uses Car() and Cdr().

One problem we'd still have to solve is that people can still
construct invalid trees. For example, if I have a Ptree
representing "if (cond) statement;", someone could Snoc a "&" at
the end, although that'd yield invalid syntax, and nothing could
prevent it (although adding an "else" and another statement
still must be valid). That's what I meant with "Lispy" data
structure: in Lisp, you have structured lists, too, but no-one
can prevent you from mangling them somehow.

> >    but that would need an enormous amount of work and break
> >    everything.
> 
> If you just throw away Ptree's and introduce new hierarchy, then yes.
> 
> But there is much better way: one can wrap raw tree implementation (like
> Ptree) into better-structured one using wrapper types (which are
> implementation of Proxy pattern). The technique I am trying to come up with
> for it is mainly based on so-called traits (classes similar to e.g.
> numeric_limits<> from Standard Library) and type-safe visitation (similar to
> technique used in boost::visitor).
> 
> Introduction of wrappers does not break any existing code, but makes the new
> code much better isolated from the actual tree representation, so changes
> in the tree representation (even adding new nodes) does not break the new
> code.

I think adding those accessor functions ("GetCondition",
"GetThen", "GetIf") would be a good start, although it doesn't
work nicely for everything: for example, PtreeDeleteExpr can
have the following forms:
         [:: delete expr]
         [:: delete \[ \] expr]
         [delete expr]
         [delete \[ \] expr]
I'm unsure whether we should "regularize" that, for example, as
         [::  delete nil     expr]
         [::  delete [\[ \]] expr]
         [nil delete nil     expr]
         [nil delete [\[ \]] expr]
Still, this would break a lot of stuff. There are quite a number
of places like this in the parse tree. I could (try to) make up
a complete list. In my original posting, I mentioned only
"un-regular" places in Declarators.

Another nice thing to have would be a "better" class hierarchy.
Improvements I see so far are:
- all PtreeFooExpr should share a common base class,
  PtreeExpression
- all PtreeFooStatement should share a common base class,
  PtreeStatement. Hmmm, this seems to collide with PtreeBrace
  and PtreeDeclaration, so maybe leave it out.
- PtreeDeclaration and PtreeExpression should have a common base
  class (needed for my if(condition) thing)
Anything else? This should not break anything.

> > [1] Since the CVS code currently can't be built, I'll wait with
> > checking it in.
> 
> Ok.

It's in now.

Change in the parse tree: the third child of if/while/switch may
now be a PtreeDeclaration instead of an expression :-)

By the way, I get a strange message I can't interpret:
# Mailing ope...@li......
# Generating notification message...
# Generating notification message... done.
# Mailing ope...@li......
# Generating notification message...
# Traceback (innermost last):
#   File "/cvsroot/sitedocs/CVSROOT/cvstools/syncmail", line 322, in ?
#     main()
#   File "/cvsroot/sitedocs/CVSROOT/cvstools/syncmail", line 315, in main
#     blast_mail(subject, people, specs[1:], contextlines, fromhost)
#   File "/cvsroot/sitedocs/CVSROOT/cvstools/syncmail", line 240, in blast_mail
#     print calculate_diff(file, contextlines)
#   File "/cvsroot/sitedocs/CVSROOT/cvstools/syncmail", line 139, in calculate_diff
#     file, oldrev, newrev = string.split(filespec, ',')
# ValueError: unpack list of wrong size

Doesn't look like any interpreter I know :-)

> > > But now comes the problem: when we just extend Frontend itself, without
> > > proper support in Translator (e.g. your patches will make 'if ({...}) '
> > > parse, but Translator will choke on it), we do not have way to add tests
> > > that make sure the new Frontend functionality works.
> >
> > My 1) is a partial solution.
> 
> But this requires adding *some* support of new nodes to the translator
> anyway, right?

Yes, but I can't seem to find a way around that, because there
is no universal fallback. Some want just to recurse into the new
nodes, some do not want to miss them.

> > > BTW: Over a year ago there was a resolution on this forum to factor out the
> > > Driver subsystem and make it a shell script (for easier maintenance
> > > and to make underlying compiler/preproc/linker settable with Autoconf).
> >
> > Personally, I don't care whether the driver is in C, Perl, Shell
> > or Intercal.
> 
> I do, because I am responding to bug reports :-)
> (Seriously: shell is the only solution that does not have to duplicate
> libtool functionality, e.g. to know how to install shared library on the
> given platform).

Okay, I overlooked that shared library thing.

> > What about generating a ".h" file and including that in the
> > driver? In another project of mine, I have this in "configure.in":
[...]
> > This creates a file ("arch/platform.h") which contains the
> > specified code, with shell variables expanded (Disclaimer: I'm
> > not an Autoconf expert, maybe what I did can be done much
> > easier).
> 
> AC_DEFINE is the solution you are looking for.

Whoops? I wonder how I could have missed that :-P

> What you suggest is reasonable for e.g. setting compiler or setting the
> proper shared lib extension (.so, .dynlib etc.), but apart of those there is
> *functionality* which is necessary in driver. How will you pass the
> knowledge about the correct way of installing shared library? On some
> platforms you just do 'cp'. On some you also have to set up dynamic linker
> somehow. [snip]

I was mainly thinking about configuring compiler names etc. into
occ. I have "g++-2.95" and "g++-3.2", and to select one I
install a symlink in $HOME/bin. I'd like to be able to choose
that when I compile or - even better - run occ. For shared
libraries, you're absolutely right.

  Stefan