Re: [Opencxx-users] future directions

On Mon, 7 Jun 2004, Stefan Seefeld wrote:

> Grzegorz Jakacki wrote:
[...]
> > I did not mean what you understood. In fact I want Node<Definition> to be
> > "abstract". The second part of example was a miss, let me fix it:
> >
> >     Node<Definition> d = ParseDefinition("int main() {}");
> >     SomeVisitor v;
> >     v(d);    //<--- calls v.Visit(Node<FunctionDefinition>)
>
> Ah, so 'ParseDefinition()' is an 'abstract factory' ?

Exactly.

(Observe that rParseDefition() in the current code is an Abstract
Factory for the Ptree hierarchy.)

> That means
> that all those 'Node<>' classes derive from an abstract 'NodeBase' class,

Nope. Read "Node<>", think "smart pointer".

Node<> is intended to be a smart pointer with shallow copy.

Node<FunctionDefinition> will have default conversion to Node<Definition>.

The visitor for the Node<> hierarchy will dispatch Node<Definition> to
Visit(Node<FunctionDefinition>), Visit(Node<ClassDefinition>) etc.
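
To make this concrete, here is a minimal sketch of what I have in mind.
It is not existing OpenC++ code: the Ptree stand-in with a kind tag, the
Converts trait, NameVisitor and Demo() are all hypothetical names made up
for illustration. It shows a shallow-copy Node<>, the implicit conversion
from Node<FunctionDefinition> to Node<Definition>, and a visitor that
dispatches back to the specific Visit() overload.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-in for the real Ptree: only a kind tag is modelled (assumption).
struct Ptree {
    enum Kind { Function, Class };
    Kind kind;
};

// Tag types. Declarations suffice; they are never defined.
struct Definition;
struct FunctionDefinition;
struct ClassDefinition;

// Hypothetical trait listing the allowed Node<From> -> Node<To> conversions.
template <class From, class To> struct Converts : std::false_type {};
template <> struct Converts<FunctionDefinition, Definition> : std::true_type {};
template <> struct Converts<ClassDefinition, Definition> : std::true_type {};

// Node<Tag> is a non-owning handle; copying it copies only the pointer.
template <class Tag>
class Node {
public:
    explicit Node(Ptree* p) : p_(p) {}

    // Implicit conversion to a more general view, enabled only for the
    // pairs listed in Converts.
    template <class Other,
              class = typename std::enable_if<Converts<Other, Tag>::value>::type>
    Node(const Node<Other>& other) : p_(other.ptree()) {}

    Ptree* ptree() const { return p_; }

private:
    Ptree* p_;
};

// The visitor inspects the underlying Ptree and re-wraps it in the
// specific Node<> type before calling the matching Visit() overload.
class Visitor {
public:
    virtual ~Visitor() {}
    virtual void Visit(Node<FunctionDefinition>) = 0;
    virtual void Visit(Node<ClassDefinition>) = 0;

    void operator()(Node<Definition> d) {
        switch (d.ptree()->kind) {
        case Ptree::Function: Visit(Node<FunctionDefinition>(d.ptree())); break;
        case Ptree::Class:    Visit(Node<ClassDefinition>(d.ptree()));    break;
        }
    }
};

// Example client visitor that records which overload was called.
struct NameVisitor : Visitor {
    std::string seen;
    void Visit(Node<FunctionDefinition>) override { seen = "function"; }
    void Visit(Node<ClassDefinition>) override { seen = "class"; }
};

std::string Demo() {
    Ptree raw = { Ptree::Function };
    Node<FunctionDefinition> f(&raw);
    Node<Definition> d = f;  // implicit, shallow conversion
    NameVisitor v;
    v(d);                    // dispatches to Visit(Node<FunctionDefinition>)
    return v.seen;
}
```

Note that the Node<> types do not derive from one another; the only
relation between them is the conversion, so there is no NodeBase.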

> at which point I'm wondering what the advantage of such a templated
> class hierarchy is as opposed to a simple traditional one
> (i.e. instead of 'Node<Foo>' just using 'Foo', as 'Foo' would need to
> be defined anyways)

Foo does not need to be defined; a declaration alone is sufficient. But we
can also reuse the PtreeXxxx classes for Foo (it would make sense for
Node<PtreeIfStatement> to be, in general, a wrapper for PtreeIfStatement*).

The advantage of using Node<Foo> over using Foo* is that
Node<> wrappers do not expose Car/Cdr.

Moreover, in some cases determining the nature of a Ptree (e.g. if it is
if-then, or if-then-else) requires some code (e.g. go to
Car()->Car()->Car()->Cdr() and check if it is NULL). Ptree hierarchy has
only PtreeIfStatement, which covers both if-then and if-then-else. With a
Node<> wrapper we can have Node<IfThen> and Node<IfThenElse>, both
wrapping PtreeIfStatement*, but carrying additional information in the
wrapper type.
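
A rough sketch of that idea follows. The cons-cell Ptree, the Arena helper,
HasElse(), DispatchIf() and Classify() are hypothetical stand-ins, and I
assume purely for illustration that an else branch is present exactly when
the statement list has more than five elements. The structural check is
done once, and the result is then carried in the wrapper type.

```cpp
#include <cassert>
#include <deque>
#include <initializer_list>
#include <string>

// Minimal cons-cell stand-in for Ptree (not the real class).
struct Ptree {
    std::string atom;  // token text for leaves; empty for list cells
    Ptree* car;
    Ptree* cdr;
    Ptree* Car() const { return car; }
    Ptree* Cdr() const { return cdr; }
};

// Owns the cells and builds a flat list of leaf tokens.
class Arena {
public:
    Ptree* List(std::initializer_list<const char*> tokens) {
        Ptree* tail = nullptr;
        for (auto it = tokens.end(); it != tokens.begin();) {
            --it;
            Ptree* leaf = Make(*it, nullptr, nullptr);
            tail = Make("", leaf, tail);
        }
        return tail;
    }

private:
    Ptree* Make(const std::string& atom, Ptree* car, Ptree* cdr) {
        cells_.push_back(Ptree{atom, car, cdr});
        return &cells_.back();  // deque keeps earlier elements in place
    }
    std::deque<Ptree> cells_;
};

// Tags are only declared; both views wrap the same kind of Ptree.
struct IfThen;
struct IfThenElse;

template <class Tag>
class Node {
public:
    explicit Node(Ptree* p) : p_(p) {}
    Ptree* ptree() const { return p_; }
private:
    Ptree* p_;
};

// Assumed layout: skipping five cells reaches the else part, if any.
bool HasElse(const Ptree* ifStatement) {
    const Ptree* p = ifStatement;
    for (int i = 0; i < 5 && p != nullptr; ++i) p = p->Cdr();
    return p != nullptr;
}

// Performs the check once and hands the handler a precisely typed node.
template <class Handler>
void DispatchIf(Ptree* ifStatement, Handler& handler) {
    if (HasElse(ifStatement)) handler(Node<IfThenElse>(ifStatement));
    else                      handler(Node<IfThen>(ifStatement));
}

struct Describe {
    std::string text;
    void operator()(Node<IfThen>)     { text = "if-then"; }
    void operator()(Node<IfThenElse>) { text = "if-then-else"; }
};

std::string Classify(Ptree* ifStatement) {
    Describe d;
    DispatchIf(ifStatement, d);
    return d.text;
}
```

Client code that receives Node<IfThenElse> never has to repeat the
Car/Cdr walk to find out whether there is an else branch.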

Yet another argument is that this kind of API is truly an *interface* to
the tree data. The Node<> scheme allows many interfaces to coexist. In
particular, recall my example of how people want the "+" node to be exposed
in the API: some want to see "+" as a binary operator, others as a multiary
one. Assuming that Node<> shows a binary plus, a client can write or
generate another API that exposes a "multiary" plus. Clients using different
APIs can still exchange the underlying Ptree data structure; they just see
different views.
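
As a sketch of the two views (the Expr stand-in, BinaryPlus, MultiaryPlus
and JoinOperands are hypothetical names, and the real Ptree shape for "+"
is more involved than this), both wrappers below hold the same pointer and
differ only in what they expose:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the shared tree: a leaf has a name, a "+" node has two
// children and an empty name (assumption made for this sketch).
struct Expr {
    std::string name;
    const Expr* lhs;
    const Expr* rhs;
    bool IsPlus() const { return name.empty(); }
};

// View 1: "+" as a binary operator.
class BinaryPlus {
public:
    explicit BinaryPlus(const Expr* e) : e_(e) {}
    const Expr* Left() const { return e_->lhs; }
    const Expr* Right() const { return e_->rhs; }
private:
    const Expr* e_;
};

// View 2: "+" as a multiary operator over the same, unmodified tree.
class MultiaryPlus {
public:
    explicit MultiaryPlus(const Expr* e) : e_(e) {}

    // Flattens nested "+" nodes into a left-to-right operand list.
    std::vector<const Expr*> Operands() const {
        std::vector<const Expr*> out;
        Collect(e_, out);
        return out;
    }

private:
    static void Collect(const Expr* e, std::vector<const Expr*>& out) {
        if (e->IsPlus()) {
            Collect(e->lhs, out);
            Collect(e->rhs, out);
        } else {
            out.push_back(e);
        }
    }
    const Expr* e_;
};

// Concatenates operand names, just to have something observable.
std::string JoinOperands(const MultiaryPlus& plus) {
    std::string s;
    for (const Expr* e : plus.Operands()) s += e->name;
    return s;
}
```

For (a + b) + c the binary view reports two children, the multiary view
three operands, and neither view changes the tree.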

> >>In particular, if I want to expose this ast to a scripting frontend
> >>such as python, it is impractical to have these wrapper classes be
> >>temporary objects, as that would make the binding quite complex and
> >>slow.
> >
> >
> > (1) Why? (I have never seen how you create a binding, so I don't have
> >     an idea what happens.)
>
> In general, the idea for this particular binding would be to allow
> users to define 'Walker' classes in both, C++, as well as python.
> If I'm in python and I get hold of a 'Declaration' object, calling
> a method (or attribute or property etc.) will result in the invocation
> of the associated C/C++ method. But since python has its own idea about
> function invocation, parameter passing, etc., each C++ method needs to
> be wrapped by a C function that deals with parameter / return value
> conversion / wrapping. So if a method returns a reference to another
> C++ object, that has to be wrapped in its respective python object.
> If these objects are returned by value, you get into a lot of trouble
> because it's hard to track dependencies (i.e. reference counts) as
> nodes refer to and depend on each other in a parse tree. It would
> be far more easy to manage child / parent links internally, so the
> python binding wouldn't need to care as long as the referer is still
> alive.

I think I need an example. Also, I believe that you are referring to a
situation where the Node<> wrappers constitute a polymorphic hierarchy,
which is not what I had in mind.

> > (2) What if instead of coding these wrappers, we generate them
> >     based on the "Cdr/Car Ptree -> highlevel Ptree" mapping?
> >     We could generate "C++ wrappers" and "Python wrappers"
>
> How / where would that mapping be defined ?

As a text file, e.g.:

    IfPtree : IfThenElse
        Cond	Cdr()->Car()
        Then	Cdr()->Cdr()->Cdr()->Car()
        Else	Cdr()->Cdr()->Cdr()->Cdr()->Cdr()->Car()

or as Python data, e.g.:

    {("IfPtree", "IfThenElse"):
      { "Cond" : "Cdr()->Car()",
        "Then" : "Cdr()->Cdr()->Cdr()->Car()",
        "Else" : "Cdr()->Cdr()->Cdr()->Cdr()->Cdr()->Car()"
      },
      ...
    }
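
To show that such a table carries enough information, here is a small
sketch that interprets the paths at run time instead of generating wrapper
source. Everything in it is hypothetical (the cons-cell Ptree stand-in,
Arena, Follow(), IfThenElsePaths(), Field()); only the three path strings
are copied from the example above, and the test list is a placeholder, not
the real PtreeIfStatement layout.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <initializer_list>
#include <map>
#include <string>

// Minimal cons-cell stand-in for Ptree (not the real class).
struct Ptree {
    std::string atom;  // token text for leaves; empty for list cells
    Ptree* car;
    Ptree* cdr;
    Ptree* Car() const { return car; }
    Ptree* Cdr() const { return cdr; }
};

// Owns the cells and builds a flat list of leaf tokens.
class Arena {
public:
    Ptree* List(std::initializer_list<const char*> tokens) {
        Ptree* tail = nullptr;
        for (auto it = tokens.end(); it != tokens.begin();) {
            --it;
            Ptree* leaf = Make(*it, nullptr, nullptr);
            tail = Make("", leaf, tail);
        }
        return tail;
    }

private:
    Ptree* Make(const std::string& atom, Ptree* car, Ptree* cdr) {
        cells_.push_back(Ptree{atom, car, cdr});
        return &cells_.back();
    }
    std::deque<Ptree> cells_;
};

// Follows a path such as "Cdr()->Cdr()->Car()". Returns nullptr if the
// path runs off the tree or contains an unknown step.
Ptree* Follow(Ptree* p, const std::string& path) {
    std::size_t pos = 0;
    while (p != nullptr && pos < path.size()) {
        std::size_t next = path.find("->", pos);
        std::size_t len =
            (next == std::string::npos) ? std::string::npos : next - pos;
        std::string step = path.substr(pos, len);
        if (step == "Car()")      p = p->Car();
        else if (step == "Cdr()") p = p->Cdr();
        else                      return nullptr;
        pos = (next == std::string::npos) ? path.size() : next + 2;
    }
    return p;
}

typedef std::map<std::string, std::string> FieldPaths;

// The IfThenElse entry of the mapping, as data.
FieldPaths IfThenElsePaths() {
    FieldPaths m;
    m["Cond"] = "Cdr()->Car()";
    m["Then"] = "Cdr()->Cdr()->Cdr()->Car()";
    m["Else"] = "Cdr()->Cdr()->Cdr()->Cdr()->Cdr()->Car()";
    return m;
}

// Returns the token found at the named field, or "" if it is absent.
std::string Field(Ptree* node, const FieldPaths& paths,
                  const std::string& name) {
    FieldPaths::const_iterator it = paths.find(name);
    if (it == paths.end()) return "";
    Ptree* p = Follow(node, it->second);
    return p ? p->atom : "";
}
```

A generator would emit one accessor per table row instead of calling
Follow(), but the table itself would stay the same for C++ and Python.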

> > My point is to avoid making intrusive changes to Ptree hierarchy, as it
> > breaks essentially all OpenC++, so compensating for such changes is both
> > expensive and error-prone.
>
> I understand. I don't have a good understanding how widespread opencxx'
> use is these days, i.e. how disruptive a change like this would be to its
> users.

In fact I had in mind just the damage to the OpenC++ backend itself (and of
course the bill is higher when you keep external clients in mind).

> > You decide to store more information, say type annotation, at declaration
> > node. I mean this as a design decision, not a run-time decision. There are
> > several kinds of declarations, each is encoded with some Ptree shape. You
> > decide, that the type annotation will be added atop the declaration tree,
> > so where up to now you had
> >
> >      NonLeaf(Decl)
> >      /     \
> >  NonLeaf   NonLeaf
> >    ...        ...
> >
> > you want to have
> >
> >        NonLeaf(Decl)
> >         /         \
> > [annotation]     NonLeaf  <--- the old tree
> >                  /     \
> >              NonLeaf   NonLeaf
> >                ...        ...
>
> No. I didn't mean to suggest a topological change. Rather, I suggest
> that instead of using raw 'NonLeaf' (say) objects, we use a richer type
> system with types such as 'Declaration', 'Statement', etc. that all *derive from*
> NonLeaf.

That's how the code works today.

> And, as these types know the topology of the sub-trees they are composed
> of, they could provide typed access to the subtrees:
>
> struct Declaration : NonLeaf
> {
>    Type *type() { return static_cast<Type *>(Car()->Car());}
>    ...
> };
>
> which is technically nothing else but what you've been describing above with
> '[annotation]',

Ok, I see. I think I had yet another use case where extending the topology
is useful, but I am unable to recall it now (I suppose I was thinking
along these lines where I was trying to find a way for clients to put
their typed data in the ptree, e.g. OpenC++ backend needs to store type
encodings in some ptrees, but in general not all clients need to.)

> but these metadata are not stuffed into the ptree by the user,
> but by the compiler, i.e. API compatibility is preserved.

I think I don't understand.

> > Clients should be
> > safe when they commit exclusively to high-level API. This warranty is void
> > once they start tampering with tree using Ptree API, as clearly Ptree API
> > lets you create a structure, that does not map onto any type-correct tree
> > in the sense of Node<> API. I would be happy with Node<> API coredumping
> > or throwing as soon as it finds out that somebody put any kind of rubbish
> > into underlying Ptree tree. Alternatively Node<> API could contain
> > something like "Invalid" node type that would be exposed in places where
> > underlying Ptree structure is broken in the sense of Node<> API.
>
> I see your point and I agree to a certain degree. However, I believe that
> we could enforce validity constraints by making the ptree access const
> through the 'Node<>' API.
> In other words, if you want the freedom to manipulate the ptree disregarding
> the C++ syntax, you'd have to get hold of a (non-const) pointer to a ptree.
> Hmm, that could mean that we provide two separate parsers, one generating
> a ptree, the other generating a 'Node<>' tree.
> But then it may be simpler to have a single parser generating the ptree
> as before, and provide a Walker that maps that to a 'Node<>' tree (would
> that be an 'AST' ??)

I think we are converging, but:

* Why do you think that Node<> API would need to be const?

* Even with two APIs (const/non-const), why would we need two
  parsers? We have one parser now with a non-const API. We
  can create a wrapper that wraps this parser in a const API,
  period. (Maybe this is what you have in mind when writing about
  a Walker that maps the ptree to Node<>?)

> [...]
>
> > Oh, I see now. So to restate my point, I think we should not invest time
> > in validating the OpenC++ input. I would say that we should assume that
> > input source code is valid C++.
>
> But the lexer and parser already look for the 'class' token (and so
> some walker may already recognize 'class Foo;' to be a forward declaration, say).

Sorry, I was not clear enough. I understand that we cannot use C++ parser
as is to parse C code.

> All I'm pondering about now is whether optionally removing the C++ keywords
> would get us closer to a C parser, and if so, what else needs to be done to
> complete the step so opencxx could be used for both languages.

This is an interesting question, i.e. can the "common factor" of C and C++
parser be factored out and how.

Switching between C and C++ keywords should be easy with the existing code.
Moreover, the lexer is encapsulated, so this is not an issue either. The fun
begins in the parser.
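
Just to illustrate the keyword part (this is not how the OpenC++ lexer is
written; Language, Keywords() and IsKeyword() are hypothetical names, and
the keyword lists are small illustrative subsets), the switch can be as
simple as choosing a table:

```cpp
#include <cassert>
#include <set>
#include <string>

enum Language { LangC, LangCxx };

// Returns the keyword table for the chosen language mode.
// Only a handful of keywords are listed; the real tables are longer.
std::set<std::string> Keywords(Language lang) {
    std::set<std::string> kw;
    // Keywords shared by C and C++.
    kw.insert("if");
    kw.insert("else");
    kw.insert("int");
    kw.insert("struct");
    kw.insert("return");
    kw.insert("while");
    if (lang == LangCxx) {
        // Keywords that exist only in C++.
        kw.insert("class");
        kw.insert("template");
        kw.insert("namespace");
        kw.insert("virtual");
    }
    return kw;
}

// In C mode a token such as "class" is reported as a plain identifier.
bool IsKeyword(const std::string& token, Language lang) {
    return Keywords(lang).count(token) != 0;
}
```

With this, "class" is a keyword in C++ mode and an ordinary identifier in
C mode, which is all the lexer has to know; the grammar differences remain
the parser's problem.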

> > My concern is that we are trying to go into too many directions:
> >
> >   * making type elaborator and program object model into library
> >   * typesafe API
> >   * Python bindings
> >   * C compatibility
> >
> > (Not to mention areas where we need quality improvements as templates and
> > overloading.)
>
> yeah, that's too much to be worked on at the same time. I started
> this whole thread to get feedback about possible use cases and to
> have a discussion about how to support them, at some point in the future.
> I'm not working on all these fronts in parallel. I think the first and
> third point (making opencxx a library and providing scripting access to it)
> is the easiest and most useful one in the short run.

This is exactly what I think.

> C compatibility is
> something quite apart, i.e. I don't expect this to have much (if any)
> impact on the rest. Providing a type-safe ptree / AST API is probably
> the hardest part of this all.

I think it depends. There are many possible AST object models. In
particular, the existing Ptree hierarchy gives rise to one of them. Having
a read-only type-safe API along this model is quite easy; it is just a
matter of determining the Car/Cdr paths. Whether this API is useful and
convenient is another question.

>
> Regards,
> 		Stefan

##################################################################
# Grzegorz Jakacki                       Huada Electronic Design #
# Senior Engineer, CAD Dept.              1 Gaojiayuan, Chaoyang #
# tel. +86-10-64365577 x2074               Beijing 100015, China #
# Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################