Re: [Opencxx-users] future directions

Grzegorz Jakacki wrote:

>>>>* provide a much better regression test coverage on different levels so we can
>>>>  measure to what degree changes break compatibility (some will be unavoidable)
>>>
>>>
>>>This seems to be a lot of work.
>>
>>but it's worth it, I believe.
> 
> 
> My point is that it is too much for one leap.

Ah, yes, I agree. These changes should be done incrementally. Let's start with
the frontend to get more flexible access to the occ lib(s), and then build unit
tests on top of it to cover the different processing stages. That's a good
occasion to document it, too! ;-)

> I did not mean what you understood. In fact I want Node<Definition> to be
> "abstract". The second part of example was a miss, let me fix it:
> 
>     Node<Definition> d = ParseDefinition("int main() {}");
>     SomeVisitor v;
>     v(d);    //<--- calls v.Visit(Node<FunctionDefinition>)

Ah, so 'ParseDefinition()' is an 'abstract factory'? That means
that all those 'Node<>' classes derive from an abstract 'NodeBase' class,
at which point I wonder what advantage such a templated class hierarchy
has over a simple traditional one (i.e. instead of 'Node<Foo>' just
using 'Foo', as 'Foo' would need to be defined anyway).

>>In particular, if I want to expose this ast to a scripting frontend
>>such as python, it is impractical to have these wrapper classes be
>>temporary objects, as that would make the binding quite complex and
>>slow.
> 
> 
> (1) Why? (I have never seen how you create a binding, so I don't have
>     an idea what happens.)

In general, the idea for this particular binding would be to allow
users to define 'Walker' classes in both C++ and python.
If I'm in python and I get hold of a 'Declaration' object, calling
a method (or attribute or property etc.) will result in the invocation
of the associated C/C++ method. But since python has its own idea about
function invocation, parameter passing, etc., each C++ method needs to
be wrapped by a C function that deals with parameter / return value
conversion / wrapping. So if a method returns a reference to another
C++ object, that has to be wrapped in its respective python object.
If these objects are returned by value, you get into a lot of trouble
because it's hard to track dependencies (i.e. reference counts), as
nodes refer to and depend on each other in a parse tree. It would
be far easier to manage child / parent links internally, so the
python binding wouldn't need to care as long as the referrer is still
alive.
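
As a rough illustration of "managing the links internally", the sketch below
lets each node own its children and keep a non-owning parent pointer. The
binding layer would then hand out a small 'Handle' that pins the root of the
tree. 'Node', 'Handle' and 'AddChild' are hypothetical names, and keeping the
root alive through a shared pointer is just one assumed policy, not how any
existing binding works.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node: owns its children, merely points at its parent.
struct Node {
    std::string name;
    Node* parent = nullptr;                       // non-owning back link
    std::vector<std::unique_ptr<Node>> children;  // owning links

    explicit Node(std::string n) : name(std::move(n)) {}

    // Appends a child and wires up its parent link.
    Node* AddChild(std::string n) {
        children.push_back(std::make_unique<Node>(std::move(n)));
        children.back()->parent = this;
        return children.back().get();
    }
};

// What a script-side wrapper could hold: a raw pointer into the tree plus
// a shared reference to the root, so no per-node reference counts exist.
struct Handle {
    std::shared_ptr<Node> root;  // keeps the whole tree alive
    Node* node;                  // the node this handle refers to

    Handle Child(std::size_t i) const {
        return Handle{root, node->children.at(i).get()};
    }
    Handle Parent() const { return Handle{root, node->parent}; }
};
```

With this layout a handle to any inner node stays valid for as long as some
handle exists, because every handle shares ownership of the root only.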

> 
> (2) What if instead of coding these wrappers, we generate them
>     based on the "Cdr/Car Ptree -> highlevel Ptree" mapping?
>     We could generate "C++ wrappers" and "Python wrappers"

How / where would that mapping be defined ?

> My point is to avoid making intrusive changes to Ptree hierarchy, as it
> breaks essentially all OpenC++, so compensating for such changes is both
> expensive and error-prone.

I understand. I don't have a good understanding of how widespread opencxx's
use is these days, i.e. how disruptive a change like this would be to its
users.

> You decide to store more information, say type annotation, at declaration
> node. I mean as a design decision, not run-time decision. There are
> several kinds of declarations, each is encoded with some Ptree shape. You
> decide, that the type annotation will be added atop the declaration tree,
> so where up to now you had
> 
>      NonLeaf(Decl)
>      /     \
>  NonLeaf   NonLeaf
>    ...        ...
> 
> you want to have
> 
>        NonLeaf(Decl)
>         /         \
> [annotation]     NonLeaf  <--- the old tree
>                  /     \
>              NonLeaf   NonLeaf
>                ...        ...

No. I didn't mean to suggest a topological change. Rather, I suggest
that instead of using raw 'NonLeaf' (say) objects, we use a richer type
system with types such as 'Declaration', 'Statement', etc. that all *derive from*
NonLeaf. And, as these types know the topology of the sub-trees they are composed
of, they could provide typed access to the subtrees:

struct Declaration : NonLeaf
{
   Type *type() { return static_cast<Type *>(Car()->Car());}
   ...
};

which is technically nothing other than what you've been describing above with
'[annotation]', except that the metadata is not stuffed into the ptree by the
user but supplied by the compiler, i.e. API compatibility is preserved.
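
A self-contained version of that idea might look as follows. The 'Ptree',
'Leaf' and 'NonLeaf' classes here are minimal stand-ins rather than the real
OpenC++ ones, and the layout (type specifier as the first element) is an
assumption. The sketch also swaps in 'dynamic_cast' so that a tree of the
wrong shape yields a null pointer instead of undefined behaviour.

```cpp
// Minimal stand-ins for the ptree classes (not the real OpenC++ ones).
struct Ptree {
    virtual ~Ptree() = default;
    virtual Ptree* Car() { return nullptr; }
    virtual Ptree* Cdr() { return nullptr; }
};

struct Leaf : Ptree {
    const char* text;
    explicit Leaf(const char* t) : text(t) {}
};

struct NonLeaf : Ptree {
    Ptree* car;
    Ptree* cdr;
    NonLeaf(Ptree* a, Ptree* d) : car(a), cdr(d) {}
    Ptree* Car() override { return car; }
    Ptree* Cdr() override { return cdr; }
};

// Hypothetical typed leaf for a type specifier.
struct Type : Leaf {
    using Leaf::Leaf;
};

// Same topology as a plain NonLeaf; only the static type is richer.
struct Declaration : NonLeaf {
    using NonLeaf::NonLeaf;

    // Assumed layout: the type specifier is the first list element.
    // Returns nullptr if the subtree is not a Type.
    Type* type() { return dynamic_cast<Type*>(Car()); }
};
```

Code that only uses 'Car()' / 'Cdr()' keeps working on a 'Declaration', while
new code can ask for 'type()' directly.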

> Clients should be
> safe when they commit exclusively to high-level API. This warranty is void
> once they start tampering with tree using Ptree API, as clearly Ptree API
> lets you create a structure, that does not map onto any type-correct tree
> in the sense of Node<> API. I would be happy with Node<> API coredumping
> or throwing as soon as it finds out that somebody put any kind of rubbish
> into underlying Ptree tree. Alternatively Node<> API could contain
> something like "Invalid" node type that would be exposed in places where
> underlying Ptree structure is broken in the sense of Node<> API.

I see your point and I agree to a certain degree. However, I believe that
we could enforce validity constraints by making the ptree access const
through the 'Node<>' API.
In other words, if you want the freedom to manipulate the ptree disregarding
the C++ syntax, you'd have to get hold of a (non-const) pointer to a ptree.
Hmm, that could mean that we provide two separate parsers, one generating
a ptree, the other generating a 'Node<>' tree.
But then it may be simpler to have a single parser generating the ptree
as before, and provide a Walker that maps that to a 'Node<>' tree (would
that be an 'AST' ??)

[...]

> Oh, I see now. So to restate my point, I think we should not invest time
> in validating the OpenC++ input. I would say that we should assume that
> input source code is valid C++.

But the lexer and parser already look for the 'class' token (and so
some walker may already recognize 'class Foo;' to be a forward declaration, say).
All I'm pondering now is whether optionally removing the C++ keywords
would get us closer to a C parser, and if so, what else needs to be done to
complete the step so opencxx could be used for both languages.

> My concern is that we are trying to go into too many directions:
> 
>   * making type elaborator and program object model into library
>   * typesafe API
>   * Python bindings
>   * C compatibility
> 
> (Not to mention areas where we need quality improvements as templates and
> overloading.)

Yeah, that's too much to work on at the same time. I started
this whole thread to get feedback about possible use cases and to
have a discussion about how to support them, at some point in the future.
I'm not working on all these fronts in parallel. I think the first and
third points (making opencxx a library and providing scripting access to it)
are the easiest and most useful ones in the short run. C compatibility is
something quite apart, i.e. I don't expect this to have much (if any)
impact on the rest. Providing a type-safe ptree / AST API is probably
the hardest part of all this.

Regards,
		Stefan