Re: [Opencxx-users] future directions

Hi,

Stefan Seefeld wrote:
> Grzegorz Jakacki wrote:
> 
>> Hi Stefan,
>>
>> Stefan Seefeld wrote:
>>
>>> you make some good points. However, your arguments all gravitate
>>> around some requirements which I'm not sure I agree with. (That
>>> was the reason for my 'future directions' thread, so I could better
>>> appreciate user requirements to base my judgement on)
> 
> 
>>> I'm not sure the API enhancement I've been suggesting would be
>>> one-among-many. In my mind such a typed AST simply reflects the
>>> C++ language, so there isn't that much freedom to choose. 
> 
> 
>> I don't think that the language itself determines the canonical AST 
>> architecture.
> 
> 
> ok.
> 
>>> The API should allow the definition
>>> of such extensions, but they remain extensions on a single (high-level)
>>> API, instead of being just another API.
>>
>>
>>
>> That's my target too. In particular I would like to let clients 
>> *nonintrusively* extend OpenC++ frontend library so that they can
>>
>> * obtain API that presents e.g. ternary '+', or
>> * API that hides parentheses in expressions, or
> 
> 
> these two seem to me to be just a special *view* on the AST, but not the
> AST itself. Read: For this I wouldn't provide a different AST, but rather
> a tool to filter the unwanted tokens or generally to present the same data
> in different ways.

Ok. Node<> interface is precisely such a tool.

> Anyways, I'm sure we agree on this as the highlevel Node<> API you are
> suggesting is nothing but a high-level view on the Ptree, too...

Exactly.

> 
>> * API that has an extra construct (e.g. 'metaclass ID' declaration) that
>>   I can typesafely insert into AST obtained from the parser.
> 
> 
> that one is different since this really requires the parser (ptree factory)
> to be aware of the extension. 
> It's thus more difficult to achieve this type
> of extensibility (but the OpenC++ parser already has hooks for this).

Yes.

>>> Ok, but even if we go for wrapper instead of subclasses,
>>> I don't see any reason not to use the C++ type system, i.e. I'd still
>>> suggest 'Statement', 'Expression', etc. to form a type system,
>>
>>
>>
>> I assume you meant "type hierarchy".
> 
> 
> right
> 
>>> instead
>>> of having Node be a template type. 
>>
>>
>>
>> I don't get it. Are you talking about (a) Statement and Expression being
>> wrappers around Ptree* or about (b) putting more type info into Ptree 
>> hierarchy?
> 
> 
> that's exactly the question. If we use wrappers we don't touch Ptree, i.e.
> we don't add type info into the Ptree hierarchy, but instead build another
> hierarchy on top of the Ptree.

Yes.

>> If (a), then I fail to understand how this should work --- Statement
>> and Expression would be polymorphic, so how can they work as wrappers? 
>> What copy semantics do you want for them? Will Expression be a 
>> concrete base class???
> 
> 
> This link describes the ctool API (the C parser backend synopsis 
> provides now):
> http://synopsis.fresco.org/docs/Manual/ctool/index.html
> 
> Have a look into the 'Inheritance Graph'. That's about the AST hierarchy
> I have in mind (with the required additions to cover C++, of course).

OpenC++ has a similar hierarchy "logically", but it is not enforced by 
design.

> Expression is indeed an abstract base class, and as such it can't simply be
> copied, at least not on the stack. Thus, when I'm talking about 'wrappers',
> I'm not talking about temporary objects, but a superstructure on top of
> the parse tree.

I did not have anything like that in mind. The wrappers I suggested were 
meant to be used as value-semantics objects, in many cases temporaries.
Just very much like e.g. boost::shared_ptr.
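
To make the "value-semantics wrapper" idea concrete, here is a minimal
sketch. It is not the Node<> code I posted earlier; the class WhileNode,
the tag constants and the simplified Ptree/Leaf/NonLeaf stand-ins are all
assumptions made up for this illustration. The point is only that the
wrapper holds a single pointer, is not polymorphic, and copies as cheaply
as a pointer, so no second tree is ever built.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for the real OpenC++ classes (hypothetical).
enum { LEAF_C, GENERIC_C, WHILE_C };

struct Ptree {
    virtual ~Ptree() {}
    virtual int What() const = 0;
};

struct Leaf : Ptree {
    explicit Leaf(const char* t) : text(t) {}
    virtual int What() const { return LEAF_C; }
    const char* text;
};

struct NonLeaf : Ptree {
    NonLeaf(Ptree* l, Ptree* r) : l_(l), r_(r) {}
    virtual int What() const { return GENERIC_C; }
    Ptree* l_;
    Ptree* r_;
};

struct PtreeWhile : NonLeaf {
    PtreeWhile(Ptree* l, Ptree* r) : NonLeaf(l, r) {}
    virtual int What() const { return WHILE_C; }
};

// Hypothetical wrapper: one pointer inside, no virtual functions,
// default copy semantics. It is a *view* on an existing Ptree node.
class WhileNode {
public:
    explicit WhileNode(Ptree* p) : p_(static_cast<NonLeaf*>(p)) {
        // The tag is checked once, when the view is created.
        assert(p != NULL && p->What() == WHILE_C);
    }
    // Assumed topology: (cond . (body . NULL))
    Ptree* Cond() const { return p_->l_; }
    Ptree* Body() const { return static_cast<NonLeaf*>(p_->r_)->l_; }
    Ptree* Raw() const { return p_; }  // escape hatch to the low-level view
private:
    NonLeaf* p_;
};
```

A WhileNode can be created as a temporary wherever a Ptree* that is known
to be a while-statement is at hand, passed by value, and thrown away; the
Ptree tree stays the only tree in memory.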

> 
>> If (b), then I sustain my argument that this is a viable solution,
>> but it requires lots of *revolutionary* changes in parser, elaborator 
>> and translator.
> 
> 
> Hmm, I'd like to get a better understanding of what 'lots of revolutionary
> changes' actually means. 

Sure, sorry for not being clear enough.

Take a particular construct, e.g. member function declaration. Legacy 
code uses some topology of Ptree nodes to encode it. You want to replace 
it with one object of class, say, MemberFunctionDecl. This requires 
fixing all sites that depend on legacy topology:

   * the code in parser that builds the topology
   * the code in elaborator and translator that inspects and/or
     modifies the topology.

Another issue is that you want to introduce stricter typing into the 
existing parser/elaborator/translator code. Most likely you have had an 
experience of retrofitting "const" --- in many cases it is a snowball 
effect and I would expect the same here. I think (but maybe I am just 
overly cautious) that it is easier and safer first to build the type 
scaffolding out of the code (i.e. to build Node<>s structure), then
blend it into the code (=change Ptree*'s into more concrete types
according to the model worked out in Node<>s "hierarchy"), if such a 
move seems useful.

> For one, the Ptree type hierarchy already seems
> quite complete. It just doesn't provide a high-level API to access the
> structure without using Cdr() and Car(). 

Right.

> Thus, adding a high-level API
> to the existing classes wouldn't be intrusive at all.

Right. However, what do you do about Cdr/Car? They have to stay, because 
lots of code depends on them (not only external code, but elaborator and 
translator in the first place). Making them private, as once suggested, 
is not a solution, since elaborator/translator and some clients need to 
see the low-level view, at least still for some time.

> The thing that would really need to be changed is the Walker interface.
> But isn't that a good thing ? Making Walker a true Visitor would be 
> beneficial
> to everybody (there, too, I don't yet fully understand the ramifications,
> i.e. whether the required changes would have to be applied in one shot or
> whether we could do it incrementally).

AFAIU your point is to have a visitor with "Visit(PtreSOMETHING* )"
member functions instead of "VisitSOMETHING(Ptree*)" as it is now, right?

IMO this is the easiest piece. Just add another visitation scheme to 
Ptree and make another visitor. This will not interfere with the old one 
at all.
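
A rough sketch of what I mean, with simplified stand-in classes. The
names TypedVisitor, Accept() and WhileCounter are hypothetical, and the
real Walker has many more members than shown here. The old scheme
(Translate/TranslateWhile) and the new one (Accept/Visit) live side by
side on the same node classes and do not know about each other.

```cpp
#include <cassert>

struct Ptree;
struct PtreeWhile;

// Legacy scheme: one named method per construct, every one takes Ptree*.
struct Walker {
    virtual ~Walker() {}
    virtual void TranslatePtree(Ptree*) {}
    virtual void TranslateWhile(Ptree*) {}
};

// Hypothetical additional scheme: overloads on the concrete node type.
struct TypedVisitor {
    virtual ~TypedVisitor() {}
    virtual void Visit(Ptree*) {}
    virtual void Visit(PtreeWhile*) {}
};

struct Ptree {
    virtual ~Ptree() {}
    // Old entry point, left untouched.
    virtual void Translate(Walker* w) { w->TranslatePtree(this); }
    // New entry point; 'this' is Ptree*, so Visit(Ptree*) is chosen.
    virtual void Accept(TypedVisitor* v) { v->Visit(this); }
};

struct PtreeWhile : Ptree {
    virtual void Translate(Walker* w) { w->TranslateWhile(this); }
    // 'this' is PtreeWhile*, so Visit(PtreeWhile*) is chosen.
    virtual void Accept(TypedVisitor* v) { v->Visit(this); }
};

// Example client of the new scheme: counts while-statements.
struct WhileCounter : TypedVisitor {
    WhileCounter() : count(0) {}
    using TypedVisitor::Visit;  // keep the Ptree* overload visible
    virtual void Visit(PtreeWhile*) { ++count; }
    int count;
};
```

Existing walkers keep calling p->Translate(w) and see no difference; only
new code calls p->Accept(v).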

>> This is a recurring point and I suppose that this is the source of
>> misunderstanding. It seems to me that you believe that Node<> solution
>> requires constructing another in-memory tree atop Ptree tree. This is 
>> not true.
> 
> 
> Yes, see above. For Node<> to be able to live on the stack, it needs
> to be non-polymorphic, i.e. it can't use the C++ type system directly.

Correct.

> Thus it requires an IMO ugly mechanism to dispatch methods based on
> the dynamic type of one of its members.

(1) Why ugly?
(2) C++ virtual dispatch does precisely the same --- dispatches
     a call based on a dynamic type of an object.

Looks like I am missing the point here, could you explain?

> Again, what I find confusing is that the high-level type hierarchy is
> part of the Ptree, but not the high-level API.  If you suggest to
> add the high-level API on top, why not the type hierarchy, too ?

(1) Because I can keep Cdr/Car in the low-level interface and hide them
     in the high-level interface.

(2) Because I can manipulate the high-level interface to get yet more
     typesafe tree view, but I don't need to compensate for these changes
     in the legacy code (including parser, elaborator and walker)

> What good is the current type hierarchy anyways, as there's no associated
> API to begin with ?  Would anything stop working if I took out, say,
> the 'PTreeWhileStatement' class ?

PtreeWhileStatement is just a NonLeaf with a tag saying "I am special 
non-leaf: I am while-statement". Also PtreeWhileStatement redefines
What(), so that type querying is possible
( Ptree* p; ... if (p->What()==...) { ... } ). Also PtreeWhileStatement 
redefines Translate(), so that Walker::TranslateWhile() is called.

If you remove PtreeWhileStatement:

   * parser will no longer compile, because it creates
     PtreeWhileStatement non-leaf after having parsed "while(){}";
     but say you change PtreeWhileStatement into NonLeaf to make
     parser compile; then...

   * elaborator may compile, but will no longer work, because
     p->What() will give value indicating generic non-leaf,
     not while-statement. Perhaps even elaborator will not compile,
     if for some reason it downcasts NonLeaf to PtreeWhileStatement

   * code using walkers will no longer work, because p->Translate(w)
     will dispatch to w->Translate(Ptree*), not
     w->Translate(PtreeWhileStatement)

>> Perhaps you are looking at Node<> through the existing solution from
>> Synopsis. With Node<> API *no* AST is *built*. See the code example 
>> that I posted --- the only tree in memory is Ptree tree.
> 
> 
> yes, I noticed the difference. I just don't understand the rationale
> for this, see above.
> 
> ...

As I see it:

(1) You can hide Cdr/Car from the users of high-level iface,
     yet still make them available in low-level iface.

(2) The boundary of high-level and low-level API is an additional
     level of indirection that makes it easier to tinker with things
     on one side without affecting things on the other (e.g. making
     high-level iface more typesafe without touching Ptrees or
     changing topologies of Ptrees, without affecting clients of
     high-level iface).

(3) It is non-intrusive wrt. the legacy code (parser, elaborator,
     translator).

> I'm sorry if this thread sounds confusing and doesn't seem to lead
> anywhere. At this point it just reflects my lack of understanding of
> the current design (i.e. either the presence of the PTree class hierarchy
> or the absence of a corresponding high-level API).

Let me try to recap what I learnt about OpenC++ AST.

Imagine that we have just this simple AST structure:

struct Ptree
{
     virtual int What() = 0;
};

struct Leaf : public Ptree
{
     char* text;
     int length;
};

struct NonLeaf : public Ptree
{
     NonLeaf(int what, Ptree* l, Ptree* r)
       : what_(what), l_(l), r_(r)
     {}
     virtual int What() { return what_; }
     int what_;
     Ptree* l_;
     Ptree* r_;
};

You create representation of "while (COND) BODY" say like this:

Ptree* parseStmt()
{
     ...
     // it's while
     return new NonLeaf(WHILE_C, cond, new NonLeaf(0, body, NULL));
     ...
}

where WHILE_C is enum or constant.

Now imagine that you encode 'what_' attribute in the dynamic type of an 
object. For that purpose you derive classes from NonLeaf that
have no data, they just redefine What(), e.g.:

struct PtreeWhile : public NonLeaf
{
     PtreeWhile(Ptree* l, Ptree* r) : NonLeaf(l, r) {}
     virtual int What() { return WHILE_C; }
};

[Observe that you no longer need the first argument in NonLeaf ctor, as
its information is now encoded in the object's type.]

The code for creating while representation now looks like this:

Ptree* parseStmt()
{
     ...
     // it's while
     return new PtreeWhile(cond, new NonLeaf(body, NULL));
     ...        ^^^^^^^^^^           ^^^^^^^
}              specific node        generic node

Now imagine you add visitation by introducing:

class PtreeWhile
{
     ...
     void Translate(Walker* w) { w->TranslateWhile((Ptree*)this); }
     ...
};

Also, you may introduce additional, node-specific data members into
some concrete nodes:

class PtreeClassSpec
{
     ...
     char* encoded_name;
};

This is how Ptree works today.

The concrete types of Ptree nodes serve mainly as "tags"; they also 
allow storing type-specific data members, but generally all interfaces
use Ptree*.

The "high-level" types are present in the sense that many functions that
take Ptree* in fact always take objects of certain concrete subclass of 
Ptree (so in fact the "working" type of their argument is more concrete 
than Ptree), but this is not enforced by design --- from the compiler's 
point of view those functions take arbitrary Ptree*.
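
A small illustration of "not enforced by design", again with simplified
stand-ins; the function names ElaborateWhileLoose and
ElaborateWhileStrict are invented for this example and do not exist in
OpenC++. The first function mirrors today's style: it accepts any Ptree*
and can only discover a wrong argument at run time. The second shows
what a more concrete parameter type would buy: passing a generic node
is a compile-time error.

```cpp
#include <cassert>
#include <cstddef>

enum { GENERIC_C, WHILE_C };

struct Ptree {
    virtual ~Ptree() {}
    virtual int What() const { return GENERIC_C; }
};

struct PtreeWhile : Ptree {
    virtual int What() const { return WHILE_C; }
};

// Today's style: the "working" type is PtreeWhile, but the declared
// type is Ptree*, so the tag has to be checked at run time.
bool ElaborateWhileLoose(Ptree* p) {
    if (p == NULL || p->What() != WHILE_C)
        return false;  // wrong kind of node, detected only when executed
    // ... work on the while-statement ...
    return true;
}

// Stricter style: the compiler rejects callers that pass a generic Ptree*.
bool ElaborateWhileStrict(PtreeWhile* p) {
    // ... work on the while-statement ...
    return p != NULL;
}
```

With these declarations, ElaborateWhileStrict(&generic) for a plain Ptree
object does not compile, while ElaborateWhileLoose(&generic) compiles and
fails only when the call is actually made.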

Hope this helps.

It seems to me that we should try to move forward with implementation. I 
don't think we have an agreement about Node<>s usability. However we do 
agree that adding accessors to concrete Ptree nodes will help. (I still 
think that they belong in Node<> wrappers, but anyway, they will be 
helpful no matter where they are put). Why don't we begin with this part?
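
For the sake of discussion, accessors placed directly on a concrete node
could look roughly like the sketch below. The accessor names Cond() and
Body(), the virtual Car()/Cdr() and the node topology are assumptions
for this example, not the current OpenC++ declarations. Note that
Car()/Cdr() remain public, so elaborator, translator and existing
clients are unaffected.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins (hypothetical).
struct Ptree {
    virtual ~Ptree() {}
    virtual Ptree* Car() { return NULL; }
    virtual Ptree* Cdr() { return NULL; }
};

struct Leaf : Ptree {};

struct NonLeaf : Ptree {
    NonLeaf(Ptree* l, Ptree* r) : l_(l), r_(r) {}
    virtual Ptree* Car() { return l_; }
    virtual Ptree* Cdr() { return r_; }
    Ptree* l_;
    Ptree* r_;
};

struct PtreeWhile : NonLeaf {
    PtreeWhile(Ptree* l, Ptree* r) : NonLeaf(l, r) {}
    // High-level accessors: the only place that knows the topology
    // (cond . (body . NULL)) assumed in this sketch.
    Ptree* Cond() { return Car(); }
    Ptree* Body() { return Cdr()->Car(); }
};
```

If the topology of the while node ever changes, only Cond() and Body()
need to follow; callers that use the accessors stay as they are.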

As for the HEAD --- I did not have time to investigate the failure that 
I get in CF RedHat when building with gc. Perhaps I will be able to
look into it over the weekend.

Best regards
Grzegorz