Re: [Opencxx-users] future directions

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Tue, 8 Jun 2004, Stefan Seefeld wrote:

> Grzegorz Jakacki wrote:
>
> >>>I did not mean what you understood. In fact I want Node<Definition> to be
> >>>"abstract". The second part of example was a miss, let me fix it:
> >>>
> >>>    Node<Definition> d = ParseDefinition("int main() {}");
> >>>    SomeVisitor v;
> >>>    v(d);    //<--- calls v.Visit(Node<FunctionDefinition>)
> >>
> >>Ah, so 'ParseDefinition()' is an 'abstract factory' ?
> >
> >
> > Exactly.
> >
> > (Observe, that rParseDefition() in current code is an Abstract
> > Factory for Ptree hierarchy.)
>
> Indeed. But Then you can't simply assign to a Node<Definition> without
> a downcast. That was what I stumbled over :-)
>
> >>That means
> >>that all those 'Node<>' classes derive from an abstract 'NodeBase' class,
> >
> >
> > Nope. Read "Node<>", think "smart pointer".
>
> ok. But that's a different issue. Whether we use smart pointers or not,
> somewhere we need a class hierarchy with a type system that covers *all*
> the C++ grammar.

Hm, looks like I did not convey what I ment to. Perhaps a piece of code
will speak more clearly than myself (attached at the end of e-mail).

> >
> > Node<> is intended to be a smart pointer with shallow copy.
> >
> > Node<FunctionDefinition> will have default conversion to Node<Definition>.
> >
> > Visitor for Node<> hierarchy will dispatch Node<Definition> to
> > Visit(Node<FunctionDefition>), Visit(Node<ClassDefinition>) etc.
>
> that's possible but sounds a bit complicated, since even though
> 'FunctionDefinition' IsA 'Definition', but 'Node<FunctionDefinition>'
> is *not* a 'Node<Definition>'.

You restrict yourself to modelling IsA by derived-to-base conversion.

> I.e. in the former case the compiler
> and the C++ type system would do the proper dispatching for us, but
> when using the indirection over Node<> smart pointers we'd have to
> do that manually.

Excatly. See the code.

> That's why I'd like to keep the discussion to the AST type hierarchy
> separate from the issue of whether or not to use Node<> smart pointers.

But Node<> pointers are meant to "simulate" AST hierarchy.

> >>at which point I'm wondering what the advantage of such a templated
> >>class hierarchy is as opposed to a simple traditional one
> >>(i.e. instead of 'Node<Foo>' just using 'Foo', as 'Foo' would need to
> >>be defined anyways)
> >
> >
> > Foo does not need to be defined, declaration alone is sufficient.  But we
> > can also reuse PtreeXxxx classes for Foo (that would make sense that in
> > general Node<PtreeIfStatement> is a wrapper for PtreeIfStatement*).
>
> When is a declaration sufficient ?

Pls. let me know if the code makes it clear to you.

> > The advantage of using Node<Foo> over using Foo* is that
> > Node<> wrappers do not expose Car/Cdr.
>
> yeah, I see your point. Well, what about deriving privately from Ptree
> and then raising the const methods into the public API via 'using' directives ?

Hm, I don't think I meant it.

> That has the additional benefit of preserving the type system for us to play with.

Not sure if i get you.

>
> > Moreover, in some cases determining the nature of a Ptree (e.g. if it is
> > if-then, or if-then-else) requires some code (e.g. go to
> > Car()->Car()->Car()->Cdr() and check if it is NULL). Ptree hierarchy has
> > only PtreeIfStatement, which covers both if-then and if-then-else. Having
> > Node<> wrapper we can have Node<IfThen> and Node<IfThenElse>, both
> > wrapping PtreeIfStatement*, but carrying additionial information in the
> > wrapper type.
>
> hmm, the 'ctool' backend I was talking about earlier has an 'IfStatement'
> that looks about so:
>
> struct IfStatement : Statement
> {
>    Expression *condition;
>    Statement  *then_block;
>    Statement  *else_block;
> };
>
> where the 'else_block' can be empty. I like the simplicity of this, even
> though I'd encapsulate it a little more (especially if this is just a
> ptree wrapper).

That's OK, it just a matter of how you want the wrappers to behave.

> > Yet another argument is that this kind of API is trurly and *interface* to
> > the tree data. Node<> scheme allows to have many interfaces. In particular,
> > recall my example of how people want "+" node to be exposed in API --- some
> > want to see "+" as a binary operator, others as a multiary one. Assuming
> > that Node<> shows binary plus, client can write/generate another API, that
> > will expose "multiary" plus. Clients using different APIs can still exchange
> > the underlying Ptree datastructure, they just see different views.
>
> yeah, I agree in general that delegation is often better than derivation.
> My real issue here is, as I said, the lost of the type system. Could you
> demonstrate how a Node<> based visitor would be implemented (i.e. how it
> would resolve the correct type) ? As that's the central point I believe,
> I'll wait before I continue arguing about the other points until my
> understanding of this Node<> visitor as you see it is more complete.

Voila:

==================================================================
#include <string>
using std::string;

// ----- low-level tree implementation (a la existing Ptree) -----

struct Ptree
{
    virtual ~Ptree() {}
};

struct Leaf : public Ptree
{
    Leaf(const string& txt) : txt_(txt) {}
    string txt_;
};

struct NonLeaf : public Ptree
{
    NonLeaf(Ptree* l, Ptree* r) : l_(l), r_(r) {}
    Ptree* l_;
    Ptree* r_;
};

struct PtreeBinaryOp : public NonLeaf
{
    PtreeBinaryOp(Ptree* l, Ptree* r) : NonLeaf(l, r) {}
};

struct PtreeVar : public Leaf
{
    PtreeVar(const string& id) : Leaf(id) {}
};

/*
    The above AST is capable of representing the sentences
    derived from the following grammar (The Grammar):

    (plus)     E -> E "+" E
    (uminus)   E -> "-" E
    (var)      E -> id

    using the following mappings:

    T( E1 "+" E2 ) =

            PtreeBinaryOp
               /     \
          T( E1 )   NonLeaf
                    /    \
                  Leaf  T( E2 )
                  "+"

    T( "-" E ) =

            NonLeaf
             /    \
           Leaf  T( E )
           "-"

    T( id ) =

            PtreeVar
              "id"

    This mapping is quite irregular. My point is to demonstrate how to deal
    with similar irregularities in existing Ptree.

 */

// ------ high-level API ------

/*
    High level API defined below presents The Grammar
    in a regular, canonical way.
*/

// high-level API declares names of nonterminals...

struct Expr;

// ...and productions.

struct Plus;
struct UMinus;
struct Var;

// Moreover, it adds extra "error" production, to represent
// Ptree subtrees that do not encode any legal tree.

struct Invalid;

// First part of API is a set of wrapper classes
// (think about them as non-owning smart pointers
// or handles)
//
// It is not difficult to automate creation of this code
// from the grammar description.

template <class T> class Node;

class AbstractNodeVisitor;

template <>
class Node<Expr>
{
public:
    Node() : p_(0) {}
private:
    Node(Ptree* p) : p_(p) {}
    template <class U> friend class Node;
    friend void Dispatch(AbstractNodeVisitor&, Node<Expr>);
    Ptree* p_;
};

template <>
class Node<Plus>
{
public:
    static Node New(Node<Expr> l, Node<Expr> r)
    {
        return new PtreeBinaryOp(l.p_, new NonLeaf(new Leaf("+"), r.p_));
    }

    Node() : p_(0) {}

    Node<Expr> GetLeft() const
    {
        return Node<Expr>(p_->l_);
    }

    void SetLeft(Node<Expr> e) const
    {
        p_->l_ = e.p_;
    }

    Node<Expr> GetRight() const
    {
        return Node<Expr>(static_cast<NonLeaf*>(p_->r_)->r_);
    }

    void SetRight(Node<Expr> e) const
    {
        static_cast<NonLeaf*>(p_->r_)->r_ = e.p_;
    }

    operator Node<Expr>() const
    {
        return p_;
    }
private:
    Node(PtreeBinaryOp* p) : p_(p) {}
    template <class U> friend class Node;
    friend void Dispatch(AbstractNodeVisitor&, Node<Expr>);
    PtreeBinaryOp* p_;
};

template <>
class Node<UMinus>
{
public:
    static Node New(Node<Expr> c)
    {
        return new NonLeaf(new Leaf("-"), c.p_);
    }
    Node() : p_(0) {}

    Node<Expr> GetChild() const
    {
        return Node<Expr>(p_->r_);
    }

    void SetChild(Node<Expr> e) const
    {
        p_->r_ = e.p_;
    }

    operator Node<Expr>() const
    {
        return p_;
    }
private:
    Node(NonLeaf* p) : p_(p) {}
    template <class U> friend class Node;
    friend void Dispatch(AbstractNodeVisitor&, Node<Expr>);
    NonLeaf* p_;
};

template <>
class Node<Var>
{
public:
    static Node<Var> New(const string& id)
    {
        return new PtreeVar(id);
    }

    Node() : p_(0) {}

    string GetId() const { return p_->txt_; }
    void SetId(const string& id) const {  p_->txt_ = id; }

    operator Node<Expr>() const
    {
        return p_;
    }
private:
    Node(PtreeVar* p) : p_(p) {}
    template <class U> friend class Node;
    friend void Dispatch(AbstractNodeVisitor&, Node<Expr>);
    PtreeVar* p_;
};

template <>
class Node<Invalid>
{
public:
    Node() : p_(0) {}

    operator Node<Expr>() const
    {
        return p_;
    }
private:
    Node(Ptree* p) : p_(p) {}
    template <class U> friend class Node;
    friend void Dispatch(AbstractNodeVisitor&, Node<Expr>);
    Ptree* p_;
};

// Third part of high-level API is an abstract visitor for Node<>'s

class AbstractNodeVisitor
{
public:
    virtual void Visit(Node<Plus>) = 0;
    virtual void Visit(Node<Var>) = 0;
    virtual void Visit(Node<UMinus>) = 0;
    virtual void Visit(Node<Invalid>) = 0;
};

// Fourth part of high-level API is a dispatcher --
// a mechanism that lets you find out more detailed
// type information about node (e.g. you pass
// Node<Expr>, and dispatcher calls, say,
// v.Visit(Node<Plus>), if your Node<Expr> was indeed
// a plus expression).
//
// There are many ways to implement Dispatch(). Usually
// it will use some technique to find out
// actual (dynamic) type of Ptree* wrapped in Node<>'s,
// plus a little bit of peeking into the tree, if
// dynamic type information itself is not enough
// to find out what high-level node should be reported.
//
// Note1: This implementation uses dynamic_cast in Ptree*
//   hierarchy to learn about dynamic type; other implementations
//   are possible, e.g. type tags ('p_->What()'), type querries
//   ('p_->IsA(...)') or visitation in Ptree hierarchy.
//
// Note2: If the syntax presented by high-level API matches closely
//   the ihneritance hierarchy in Ptree, then it is just enough
//   to find out the dynamic type of Ptree* to be able to build
//   Node<> wrapper; in particular, no further "peeking" is necessary
//   (in fact some peeking may be administered to check if the
//   Ptree tree has the correct topolofy, so that later, say,
//   Node<>::GetRight() does not hit a NULL pointer somewhere
//   along Cdr/Car path).
//
// Note3: I think that for many AST mappings a workable implementation
//   of Dispatch can be generated automatically (although there may
//   be a quagmire somewhere here; e.g. finding if an AST mapping is
//   ambiguous seems undecidable in general).
//
// Note4: Dispatch() is effectively a "tree parser".

void Dispatch(AbstractNodeVisitor& v, Node<Expr> e)
{
    if (PtreeBinaryOp* b = dynamic_cast<PtreeBinaryOp*>(e.p_))
    {
        if (NonLeaf* br = dynamic_cast<NonLeaf*>(b->r_))
        {
            if (Leaf* brl = dynamic_cast<Leaf*>(br->l_))
            {
                if (brl->txt_ == "+")
                {
                    if (b->l_  &&  br->r_)
                    {
                        return v.Visit(Node<Plus>(b));
                    }
                }
            }
        }
    }
    if (NonLeaf* b = dynamic_cast<NonLeaf*>(e.p_))
    {
        if (Leaf* bl = dynamic_cast<Leaf*>(b->l_))
        {
            if (bl->txt_ == "-")
            {
                if (b->r_)
                {
                    return v.Visit(Node<UMinus>(b));
                }
            }
        }
    }
    if (PtreeVar* b = dynamic_cast<PtreeVar*>(e.p_))
    {
        return v.Visit(Node<Var>(b));
    }
    v.Visit(Node<Invalid>(e.p_));
}

//
// General observations:
//
//  * There is no inheritance relation whatsoever between
//    instantiations of Node<>.
//
//  * Node<> behaves like polymorphic pointer.
//
//  * Internally Node<> can use raw, smart or gc-managed pointers.
//
//  * Node<T>::New is analogon of "new T"
//
//  * Node<T>() is analog of default pointer constructor;
//    Note<T>() creates "null pointer"
//
//  * Node<Expr> is "abstract" -- it does have New() member function,
//    you can either create null Node<Expr> or assign/initialize it
//    from another, non-abstract Node<T>
//
//  * There is a user-defined conversion from "concrete" Node<>s to
//    "abstract" Node<Expr> (this is analogon of derived-to-base conversion)
//
//  * Dispatch() is an analogon of virtual dispatch.
//
//  * Dispatch() dispatches to a classic, polymorphic visitor, i.e. its
//    first argument must be derived from AbstractNodeVisitor; it is also
//    possible to arrange for generic visitation, i.e. templatize the first
//    argument of Dispatch().
//
//  * AFAICS code pertaining to Node<> wrappers is inlineable on modern
//    compilers.
//
//  * It is possible to model "multiple inheritance" on Node<>s,
//    which corresponds to grammar in which many nonterminals produce
//    the same RHS. Observe, that unlike standard implementation
//    in C++ typesystem, such "multiple inheritance" does not involve
//    memory overhead.

// -------- Example ---------

class MyVisitor : public AbstractNodeVisitor
{
public:

    MyVisitor(ostream& os) : os_(os) {}

    void Visit(Node<Plus> p)
    {
        os_ << "Plus(";
        Dispatch(*this, p.GetLeft());
        os_ << ",";
        Dispatch(*this, p.GetRight());
        os_ << ")";
    }

    virtual void Visit(Node<Var> v)
    {
        os_ << "Var(" << v.GetId() << ")";
    }

    virtual void Visit(Node<UMinus> u)
    {
        os_ << "UMinus(";
        Dispatch(*this, u.GetChild());
        os_ << ")";
    }

    virtual void Visit(Node<Invalid>)
    {
        os_ << "Invalid()";
    }

private:
    ostream& os_;
};

int main()
{
    Node<Plus> p =
        Node<Plus>::New(
            Node<UMinus>::New(
                Node<Var>::New("x")
            )
          , Node<Var>::New("y")
        );

    Node<Expr> e = p;  // "derived-to-base"

    MyVisitor v(std::cout);

    Dispatch(v, e);
    cout << endl;

    Node<Expr> f = Node<UMinus>::New(Node<Var>::New("z"));
    p.SetRight(f);

    Dispatch(v, e);
    cout << endl;

}

==================================================================

Best regards
Grzegorz

##################################################################
# Grzegorz Jakacki                       Huada Electronic Design #
# Senior Engineer, CAD Dept.              1 Gaojiayuan, Chaoyang #
# tel. +86-10-64365577 x2074               Beijing 100015, China #
# Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################