From: Grzegorz J. <ja...@he...> - 2004-06-08 01:25:17
On Fri, 4 Jun 2004, Stefan Seefeld wrote:

> Grzegorz Jakacki wrote:
>
>>> * provide a much better regression test coverage on different levels so we can
>>>   measure to what degree changes break compatibility (some will be unavoidable)
>>
>> This seems to be a lot of work.
>
> but it's worth it, I believe.

My point is that it is too much for one leap.

> [...ptree -> AST abstraction...]
>
>> I can see potential problems where existing code keeps e.g. definitions and
>> expressions on the same list, which now is a list of Ptree*, but could be no
>> more if we want to inject more type information.
>
> agreed. On the other hand, I'm not suggesting that we change the ptree structure,
> just that we make the ptree class tree more rich (type- and API-wise) such that
> users *can* use the high-level type information and API. Walking the ptree via
> the 'untyped' Ptree nodes will still be possible.

I think I get your general idea, but most likely the devil is in the details.

>>> * I'm looking into the 'ctool' code I imported into synopsis.
>>>   http://synopsis.fresco.org/viewsvn/synopsis-Synopsis/trunk/Synopsis/Parsers/C/
>>>   contains a class hierarchy for C, which shouldn't be that different from a C++ grammar,
>>>   so it may serve as inspiration.
>>
>> Observe that the existing Ptree classes already constitute a hierarchy
>> (however, the majority of the code flattens it by upcasting to Leaf/NonLeaf).
>
> exactly. I realize the attempt to get more type info into the ptree, but it doesn't
> look complete. I think completeness could/should be defined by the criterion that
> I could write a Visitor that is able to *completely* traverse the ptree, recovering
> all the type info without ever touching the methods 'Cdr()' and 'Car()'.
>
>> Not at all. Node<IfStatement> is meant to work as a smart pointer to
>> Ptree.
>> It is to be used by value and not stored anywhere, e.g.:
>>
>>     Node<Definition> d = ParseDefinition("int main() {}");
>>
>>     string id = d.GetIdentifier(); // instead of d->Cdr()->Car()-> etc.
>
> hmm, but then the user already knows that he is parsing a definition,
> so the type info isn't adding much value. Read: I want an *abstract*
> syntax tree, such that I can run 'StatementList *ast = parse(my_file);'
> and then inspect the returned 'ast' with some custom visitors.

I did not mean what you understood. In fact I want Node<Definition> to be
"abstract". The second part of the example was a miss; let me fix it:

    Node<Definition> d = ParseDefinition("int main() {}");
    SomeVisitor v;
    v(d); // <--- calls v.Visit(Node<FunctionDefinition>)

> In particular, if I want to expose this ast to a scripting frontend
> such as python, it is impractical to have these wrapper classes be
> temporary objects, as that would make the binding quite complex and
> slow.

(1) Why? (I have never seen how you create a binding, so I have no idea
what happens.)

(2) What if, instead of coding these wrappers, we generate them based on
the "Cdr/Car Ptree -> high-level Ptree" mapping? We could generate "C++
wrappers" and "Python wrappers".

My point is to avoid making intrusive changes to the Ptree hierarchy, as
that breaks essentially all of OpenC++, so compensating for such changes
is both expensive and error-prone.

>> Example 1: You decide to change the structure of the Ptree nodes
>> representing some C++ construct (e.g. to store more information in them).
>> This will break all clients depending directly on Ptree, because
>> they need to update Car/Cdr paths to reflect the new structure. However,
>> if there is a Node<> iface in the middle, you update the Car/Cdr paths
>> only in Node<>.
>
> But what about the Node's type? If my wrapped ptree is a declaration,
> but by simply modifying the ptree I change that to be a function call,
> the wrapper's type ('Node<Declaration>') would be wrong.
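To make the Node<Definition> example above concrete, here is a minimal compilable sketch of how a by-value wrapper could dispatch a visitor on the dynamic shape of the underlying tree. All names here (this Ptree stand-in, Node<>, SomeVisitor, ClassifyDefinition) are invented for illustration; they are not the real OpenC++ classes:

```cpp
#include <cassert>
#include <string>

// Stand-in for the untyped parse tree; just enough to show the dispatch.
struct Ptree {
    enum Kind { FunctionDef, VariableDef } kind;
};

// Cheap, copyable wrapper meant to be used by value, never stored.
template <class Tag>
struct Node {
    Ptree* tree;
};

struct Definition {};          // "abstract" tag: any kind of definition
struct FunctionDefinition {};  // concrete tags
struct VariableDefinition {};

struct SomeVisitor {
    std::string seen;
    void Visit(Node<FunctionDefinition>) { seen = "function"; }
    void Visit(Node<VariableDefinition>) { seen = "variable"; }
    // Applying the visitor to the abstract node inspects the underlying
    // ptree shape and forwards to the matching concrete overload.
    void operator()(Node<Definition> d) {
        if (d.tree->kind == Ptree::FunctionDef)
            Visit(Node<FunctionDefinition>{d.tree});
        else
            Visit(Node<VariableDefinition>{d.tree});
    }
};

std::string ClassifyDefinition(Ptree& p) {
    Node<Definition> d{&p};  // by value, not stored anywhere
    SomeVisitor v;
    v(d);                    // dispatches on the dynamic shape
    return v.seen;
}
```

The point of the sketch is that the client writes `v(d)` against the abstract Node<Definition> and still ends up in the concrete Visit overload.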
I think we are not talking about the same thing. This is what I meant: you
decide to store more information, say a type annotation, at the declaration
node. I mean as a design decision, not a run-time decision. There are
several kinds of declarations, each encoded with some Ptree shape. You
decide that the type annotation will be added atop the declaration tree, so
where up to now you had

         NonLeaf(Decl)
         /           \
     NonLeaf       NonLeaf
       ...           ...

you want to have

         NonLeaf(Decl)
         /           \
   [annotation]    NonLeaf      <--- the old tree
                   /     \
               NonLeaf  NonLeaf
                 ...      ...

This change breaks all clients using the Leaf/NonLeaf interface, since the
Cdr/Car access path for declarations changes. However, it is quite easy to
compensate for this modification in the Node<> iface, so that clients of
the Node<> iface are kept unaware that the Ptree shapes changed. Moreover,
it is easy to make the annotation available to new clients of the Node<>
iface without breaking the old clients.

> Of course, if we expose the two APIs in parallel we'll always have
> this problem. Maybe the ptree API should not expose any modifiers,
> i.e. exclusively operate on const ptrees.

AFAIU this applies to run-time tree modification, not design changes, but
it is also a valid point, so let me address it. My understanding would be
that the Ptree API is low-level and the Node<> API is high-level. Clients
should be safe as long as they commit exclusively to the high-level API.
This warranty is void once they start tampering with the tree using the
Ptree API, as clearly the Ptree API lets you create a structure that does
not map onto any type-correct tree in the sense of the Node<> API. I would
be happy with the Node<> API coredumping or throwing as soon as it finds
out that somebody put some kind of rubbish into the underlying Ptree tree.
Alternatively, the Node<> API could contain something like an "Invalid"
node type that would be exposed in places where the underlying Ptree
structure is broken in the sense of the Node<> API.

>> Example 2: You add a new kind of node, e.g. for the "using" declaration.
>> All existing clients of Node<> will break if you just add a new type to
>> the Node visitor. However, you can branch the Node<> iface into two
>> versions.
>
> That's a good point. The Visitor pattern really assumes the type hierarchy
> of the visited objects to be stable.
>
> [...]
>
>>> Of course, such a move is not a simple decision. You really have to evaluate
>>> what future directions you want to take, and whether that fits with the
>>> synopsis framework. As I said, my interest in opencxx is in the context
>>> of synopsis, so I'll probably do most of my work from there. Please take
>>> this as an offer and invitation, not an attempt to fork your project.
>>
>> This requires some thought indeed.
>
> There is no need for a quick decision. I'm just observing that right now
> we are each working on a separate branch, so merging them would be practical.
> And that even more so if we are going to look into adapting qmtest as a
> unit testing framework. (Right now I don't do unit testing on the opencxx
> backend, just on the generated synopsis AST, which I dump.)

Agreed. Let's see how things work out.

> The most important thing, I believe, is that we define what we each expect
> from opencxx (and synopsis) in the future, i.e. whether we are aiming at
> the same things, and whether the common goals suggest that we both work
> from a common code base, or whether the overlap is simply not large enough
> to be worth a merge.

Yes.

>>> It shouldn't be hard, and it should be possible to do it incrementally.
>>> The important step is some basic restructuring of the Parser class into
>>> an interface and an implementation of the statements that are common to
>>> all flavours of C and C++, putting the rest into subclasses (K&R, ANSI C, C89, etc.)
>>
>> An important thing to consider here is whether we want OpenC++ to be validating.
>> If not, then we don't need to bother that "class" occurs in C source.
>> Personally I think that a validating parser/elaborator is much harder to write,
>> and IMO this should not be our priority, since we have no chance to reach
>> the quality of validation that e.g. g++, MSVC or EDG present today.
>
> I don't understand what you mean by 'validate'. opencxx (and ctool)
> are well able to indicate 'parse errors', even though the error
> message is not very high-level, as that would indeed require that more
> language-specific semantics be available to the parser (or to the object
> that is trying to issue a meaningful error message).

I don't think that is true in general. OpenC++ is deliberately not strict
about syntax. If I am not mistaken, it will let you have something like a
list ("{1,2,3}" maybe?) as an expression. This was meant to support
language extensions.

> What do you mean
> by 'bother that "class" occurs in C'? Right, the lexer will return
> a 'CLASS' token if it runs into the 'class' string. That doesn't make
> sense in C, as it would have to be an ordinary identifier. Similarly
> for all the other keywords. So removing those C++-specific keywords
> when scanning in C mode sounds like the (easy) first step towards C
> compatibility.

Oh, I see now. So to restate my point: I think we should not invest time in
validating the OpenC++ input. I would say that we should assume the input
source code is valid C++.

>> However, we do have a chance to do something new and useful in providing
>> a refactoring tool and frontend libraries, and I think this should be our
>> focus now.
>
> agreed. I'm just wondering how much effort it would be to extend opencxx's
> scope to C, as I can see a lot of use for a tool like that, for example for
> the GNOME folks.

My concern is that we are trying to go in too many directions:

* making the type elaborator and program object model into a library
* typesafe API
* Python bindings
* C compatibility

(Not to mention areas where we need quality improvements, such as templates
and overloading.)
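As a postscript, here is a minimal compilable sketch of the "compensation" point from the annotation example earlier in this mail. All names (this Ptree stand-in, DeclarationNode, GetAnnotation, GetBody) are invented for illustration and are not the actual OpenC++ API; the idea is only that when an [annotation] child is added atop the declaration tree, the changed Car/Cdr path is absorbed inside the wrapper, so its clients never see the new shape:

```cpp
#include <cassert>
#include <string>

// Stand-in for the untyped parse tree: a leaf payload plus car/cdr links.
struct Ptree {
    std::string value;      // leaf payload ("" for interior nodes)
    Ptree* car = nullptr;   // first child
    Ptree* cdr = nullptr;   // remaining children
};

// Stand-in for Node<Declaration>: a by-value wrapper over the raw tree.
struct DeclarationNode {
    Ptree* tree;
    // New shape: NonLeaf(Decl) -> ([annotation], <old tree>).
    // Only these two access paths had to change when the shape changed;
    // clients calling GetAnnotation()/GetBody() are unaffected.
    std::string GetAnnotation() const { return tree->car->value; }
    Ptree*      GetBody()       const { return tree->cdr; }
};

std::string AnnotationOf(Ptree& decl) {
    return DeclarationNode{&decl}.GetAnnotation();  // wrapper used by value
}
```

A client written against DeclarationNode compiles and behaves identically before and after the shape change; only the wrapper's two accessors are edited.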
BR
Grzegorz

##################################################################
# Grzegorz Jakacki              Huada Electronic Design          #
# Senior Engineer, CAD Dept.    1 Gaojiayuan, Chaoyang           #
# tel. +86-10-64365577 x2074    Beijing 100015, China            #
# Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################