Re: [Opencxx-users] future directions

Hi!

On Fri, 4 Jun 2004, Stefan Seefeld wrote:

> Grzegorz Jakacki wrote:
>
> > Nevertheless, I think that *replacing* Ptree hierarchy with more typed
> > form will be extremely difficult, because:
>
> [...]
>
> I fully agree with your arguments. I believe that before we start to
> undertake any serious effort to refactor this stuff, we need to
>
> * document the existing API to be able to fully understand what the individual
>    types / methods do and what ramifications any change would have on those types.

Agreed. I have begun documenting Ptree and PtreeUtils; this is going to be my
major focus after the merge.

> * provide a much better regression test coverage on different levels so we can
>    measure to what degree changes break compatibility (some will be unavoidable)

This seems to be a lot of work.

> Once we have a good grasp of the complete workflow involved in the implementation
> of the various use cases (parse tree / syntax tree construction,
> code generation, etc.) we can suggest migration paths that provide optimal control.
>
> > (1) Find out and write down mappings from the "less typed" AST
> >     to "more typed" AST, e.g.:
>
> I have two sources of documentation:
>
> * documentation of the Parser API, such as:
>
> /*
>    definition
>    : null.declaration
>    | typedef
>    | template.decl
>    | metaclass.decl
>    | linkage.spec
>    | namespace.spec
>    | namespace.alias
>    | using.declaration
>    | extern.template.decl
>    | declaration
> */
> bool Parser::rDefinition(Ptree*& p)
> ...
>
> which indicates that the method 'rDefinition' returns a ptree that is a definition
> with the given subtypes as shown in the comment above. Wouldn't it be possible to
> go over all these comments and model a class tree that models this grammar ?

Hm, don't know.

> Is that possible at all ?

Why not? Each nonterminal translates into an abstract class, and each
production translates into a concrete class.

However, this does not necessarily yield a reasonable AST. Some nonterminals
in the grammar are just shortcuts, in particular those having just one
production. You don't want them in the AST.
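
To illustrate the mapping, here is a rough sketch for three alternatives of
the "definition" rule quoted above. The class names and the MakeDefinition
helper are invented for this example; none of this is existing OpenC++ code.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Nonterminal "definition" -> abstract base class.
struct Definition {
    virtual ~Definition() {}
    virtual std::string Kind() const = 0;
};

// Each alternative of "definition" -> one concrete class.
struct Typedef : Definition {
    std::string Kind() const { return "typedef"; }
};
struct NamespaceSpec : Definition {
    std::string Kind() const { return "namespace.spec"; }
};
struct UsingDeclaration : Definition {
    std::string Kind() const { return "using.declaration"; }
};

// Hypothetical stand-in for a parser routine: it picks the concrete
// class from the leading keyword and returns null for anything else.
std::unique_ptr<Definition> MakeDefinition(const std::string& keyword) {
    if (keyword == "typedef")
        return std::unique_ptr<Definition>(new Typedef());
    if (keyword == "namespace")
        return std::unique_ptr<Definition>(new NamespaceSpec());
    if (keyword == "using")
        return std::unique_ptr<Definition>(new UsingDeclaration());
    return std::unique_ptr<Definition>();
}
```

A rule with a single production would add an abstract class with exactly one
concrete subclass, which is the kind of node I would rather collapse.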

> It seems if the parser is currently able to construct
> a 'Definition' object as a ptree, it should be possible to do the same but with
> more typed objects instead...

I can see potential problems where existing code keeps e.g. definitions and
expressions on the same list, which is now a list of Ptree* but could no
longer be one once we inject more type information.

>
> * I'm looking into the 'ctool' code I imported into synopsis.
>    http://synopsis.fresco.org/viewsvn/synopsis-Synopsis/trunk/Synopsis/Parsers/C/
>    contains a class hierarchy for C, which shouldn't be that different from a C++ grammar,
>    so it may serve as inspiration.

Observe that the existing Ptree classes already constitute a hierarchy
(however, the majority of the code flattens it by upcasting to Leaf/NonLeaf).

>
> >          template <>
> >          class Node<IfStatement>
> >          {
> >          public:
> >              Node(Node<Expr> c, Node<Statement> t, Node<Statement> e);
> >              Node<Expr>      Cond() { return p_->Cdr()->Car(); }
> >              Node<Statement> Then() { ... }
> >              Node<Statement> Else() { ... }
> >          private:
> >              Node(Ptree* p) : p_(p) {}
> >              friend class AstFactory;
> >              Ptree* p_;
> >          };
> >
> >          class AstFactory
> >          {
> >          public:
> >              template <class T>
> >              static Node<T> Create( /* ... */ );
> >          };
>
> Interesting idea ! The price for this non-intrusiveness, however, would be that we
> essentially have to have two parsing stages, the first generating the parse tree,
> the second to add the AST as a superstructure on top. Right ?

Not at all. Node<IfStatement> is meant to work as a smart pointer to
Ptree. It is to be used by value and not stored anywhere, e.g.:

  Node<Definition> d = ParseDefinition("int main() {}");

  string id = d.GetIdentifier(); // instead of d->Cdr()->Car()-> etc.

or

  Node<Definition> d =
      AstFactory::Create<FunctionDefinition>(
          ParseType("void")
        , "main"
        , std::vector<Node<ArgDecl> >()
        , std::vector<Node<Statement> >()
      );

or

  void DoSomething(Node<Expr> n)
  {
     ...
  }

Node<>'s by themselves do not have links to other Node<>'s. Each contains
just one wrapped Ptree* and that's it. The only purpose of a node is to keep
the downcasting and Cdr/Car magic hidden from the client.

As an extra bonus, the wrapped Ptree* could be replaced with boost::shared_ptr
or Loki::SmartPtr or whatever without clients even knowing about it.
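
A minimal self-contained sketch of the idea follows. The Ptree class here is
a toy stand-in (a cons cell or a leaf), the (if cond then) layout is an
assumption made for the example, and the Node<> constructors are public only
to keep it short.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for Ptree: either a leaf holding text or a cons cell.
// Cells are never freed in this sketch.
class Ptree {
public:
    explicit Ptree(const std::string& text) : text_(text), car_(0), cdr_(0) {}
    Ptree(Ptree* car, Ptree* cdr) : car_(car), cdr_(cdr) {}
    Ptree* Car() const { return car_; }
    Ptree* Cdr() const { return cdr_; }
    const std::string& Text() const { return text_; }
private:
    std::string text_;
    Ptree* car_;
    Ptree* cdr_;
};

// Tag types; they only select a Node<> specialization.
struct Expr {};
struct IfStatement {};

template <class T> class Node;

template <>
class Node<Expr> {
public:
    explicit Node(Ptree* p) : p_(p) {}
    std::string Text() const { return p_->Text(); }
private:
    Ptree* p_;  // the only data member
};

template <>
class Node<IfStatement> {
public:
    explicit Node(Ptree* p) : p_(p) {}
    // Assumed layout: (if cond then). The Cdr/Car path lives only here.
    Node<Expr> Cond() const {
        assert(p_ && p_->Cdr());
        return Node<Expr>(p_->Cdr()->Car());
    }
private:
    Ptree* p_;  // the only data member
};

// Helper for the example: builds the three-element list (a b c).
inline Ptree* List(Ptree* a, Ptree* b, Ptree* c) {
    return new Ptree(a, new Ptree(b, new Ptree(c, 0)));
}
```

A client then writes Node<IfStatement>(tree).Cond().Text() and never sees
Car or Cdr; the wrapper is the size of one pointer, so passing it by value
is cheap.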

> I was really thinking of modifying the Parser itself so it only
> generates the typed objects (which could still derive from Ptree so we
> don't have to rewrite everything...)

In fact Ptree's are typed even today (see PtreeIfStatement). The issue is
that the elaborator and the translator do not use this type information much.

> > (5) At this point we will have usable, type-safe interface.
> >     Moreover, clients not satisfied with this interface
> >     (e.g. those who prefer multiary "+") will have a chance
> >     to reuse it nonintrusively or just write another
> >     interface to Ptree structure from scratch.
>
> That is a good point ! The generated tree could be visited in terms
> of ptree, *as well* as a typed syntax tree. A little like the other
> poster referring to the different DOM APIs...
>
> > (6) Having a wrapper interface atop Ptree structure allows
> >     for changing Ptree structure without affecting clients
> >     of wrapper interface. (Changes in Ptree, e.g. addition
> >     of new nodes, can be compensated in wrapper layer.)
>
> I'm not sure about that. Could you elaborate ? In my mind an AST node
> would rely on a specific ptree structure it is wrapping, so if you can
> modify the parse tree underneath, wouldn't that invalidate the AST nodes,
> i.e. break some invariants ?

Example 1: You decide to change the structure of the Ptree nodes
representing some C++ construct (e.g. to store more information in them).
This will break all clients depending directly on Ptree, because
they need to update their Car/Cdr paths to reflect the new structure. However,
if there is a Node<> iface in the middle, you update the Car/Cdr paths
only in Node<>.
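
Example 1 can be sketched as below. Everything here is hypothetical: the
tree is a flat vector of strings rather than a real Ptree, and the two
layouts (with and without stored parentheses) are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> Tree;  // toy stand-in for a Ptree list

namespace layout_v1 {  // tree is [if, cond, then]
class IfNode {
public:
    explicit IfNode(const Tree& t) : t_(t) {}
    const std::string& Cond() const { return t_.at(1); }
private:
    Tree t_;
};
}

namespace layout_v2 {  // tree is [if, (, cond, ), then]
class IfNode {
public:
    explicit IfNode(const Tree& t) : t_(t) {}
    const std::string& Cond() const { return t_.at(2); }  // only the path changed
private:
    Tree t_;
};
}

// Client code: written once against the wrapper, never against indices,
// so it compiles unchanged with either layout.
template <class If>
std::string Describe(const If& n) {
    return "condition: " + n.Cond();
}
```

A client that indexed the tree directly would have to be edited when the
layout changed; Describe() would not.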

Example 2: You add a new kind of node, e.g. for the "using" declaration.
All existing clients of Node<> will break if you just add a new type to
the Node visitor. However, you can branch the Node<> iface into two versions:

(a) new, incompatible with existing clients, but exposing "using"
    in visitor.

(b) old (transitional), which hides the "using" node from clients
    (or e.g. aborts on an attempt to visit a Node<Declaration>
    that in fact represents "using"; or maybe the Node<> iface should
    provide a Node<Unknown> to handle such situations). In some
    cases there is a reasonable visitation for a new node, e.g.
    if it has a "decorative" character (in terms of the Decorator pattern),
    like a parentheses node.

Old clients would then be able to compile against (b) and work without
regressions.
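
One possible shape for (a) and (b), as a sketch only: all class names below
are invented, and a plain class hierarchy stands in for the Node<> iface to
keep the example short.

```cpp
#include <cassert>

struct FunctionDef;
struct UsingDecl;

// (a) New visitor interface: exposes "using" to clients.
class VisitorV2 {
public:
    virtual ~VisitorV2() {}
    virtual void Visit(const FunctionDef&) = 0;
    virtual void Visit(const UsingDecl&) = 0;
};

struct Decl {
    virtual ~Decl() {}
    virtual void Accept(VisitorV2& v) const = 0;
};
struct FunctionDef : Decl {
    void Accept(VisitorV2& v) const { v.Visit(*this); }
};
struct UsingDecl : Decl {
    void Accept(VisitorV2& v) const { v.Visit(*this); }
};

// Old visitor interface: what existing clients were written against.
class VisitorV1 {
public:
    virtual ~VisitorV1() {}
    virtual void Visit(const FunctionDef&) = 0;
};

// (b) Transitional adapter: lets an old client walk the new tree
// by hiding "using" nodes from it.
class HideUsing : public VisitorV2 {
public:
    explicit HideUsing(VisitorV1& old) : old_(old) {}
    void Visit(const FunctionDef& f) { old_.Visit(f); }
    void Visit(const UsingDecl&) { /* skipped: old clients never see it */ }
private:
    VisitorV1& old_;
};

// An "old" client, unaware that "using" nodes exist.
class CountFunctions : public VisitorV1 {
public:
    CountFunctions() : count(0) {}
    void Visit(const FunctionDef&) { ++count; }
    int count;
};
```

The aborting variant of (b) would differ only in the body of
HideUsing::Visit(const UsingDecl&).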

> > I second that. However, I think the interfaces should not be published as
> > they are now, because IMO they are not encapsulating enough. I think the parser
> > interface is OK, but the other interfaces should be seriously reviewed before we
> > commit to them. For example, the program object model is very much coupled with
> > the translator. I think we should untangle them first and transform the translator
> > framework into an exemplary, nonprivileged client of the frontend
> > library (libraries).
>
> agreed. On the other hand the whole process of changing the APIs will be
> incremental and iterative, so as long as we don't commit to a fixed and
> stable API I don't see why we can not apply the changes 'in public'.

Everything happens "in public", but some parts are not "published", in the
sense that I don't feel we are bound to keep them stable.

> >>As I'm already maintaining an opencxx 'branch' as part of the synopsis
> >>project, I'm experimenting with things there. Synopsis uses subversion, so
> >>directory-layout related refactoring is much simpler than with cvs.
> >
> >
> > As for Subversion in OpenC++: I have heard very positive opinions about
> > Subversion, however I don't see an easy way to move OpenC++ development to
> > Subversion now. Currently we rely on SF.net, which provides CVS,
> > CompileFarm, mailing lists, shell+cron accounts and web hosting (and other
> > features which we don't use at the moment). I don't see any other
> > organization that would provide this level of service and commitment to
> > the OpenC++ project.
>
> Well, as I said, synopsis presently contains a branch of opencxx. As synopsis
> is my central focus, and as for me opencxx could well be part of synopsis
> (with re-defined scope), I could imagine moving development efforts there.
> The synopsis project is hosted by SPI ('software in the public interest')
> and it has its own set of infrastructure tools ('roundup' issue tracker,
> 'qmtest' unit testing framework, 'subversion' configuration management tool,
> 'mailman' mailing lists, etc.)
> Of course, such a move is not a simple decision. You really have to evaluate
> what future directions you want to take, and whether that fits with the
> synopsis framework. As I said, my interest in opencxx is in the context
> of synopsis, so I'll probably do most of my work from there. Please take
> this as an offer and invitation, not an attempt to fork your project.

This requires some thought indeed.

> > I very much support extending the lexer/parser to support full C syntax,
> > however I don't know how much work it takes to get there. What issues do you
> > see?
>
> I see different steps:
>
> * optionally remove the tokens that are not keywords in ordinary C
>    ('class', 'virtual', ...)
>
> * find all expressions that are valid C but not C++, i.e. define
>    additions to the parser that have to be enabled for C but not C++
>
> * do the contrary, i.e. find expressions that are valid C++ but not C...
>
> It shouldn't be hard, and it should be possible to do that incrementally.
> The important step is some basic restructuring of the Parser class into
> an interface and implementation of the statements that are common between
> all flavours of C and C++, and put the rest into subclasses (K&R, ansi C, C89, etc.)

An important thing to consider here is whether we want OpenC++ to be validating.
If not, then we need not care that "class" occurs in C source.
Personally I think that a validating parser/elaborator is much harder to write,
and IMO this should not be our priority, since we have no chance of reaching
the quality of validation that e.g. g++, MSVC or EDG offer today.
However, we do have a chance to do something new and useful by providing
a refactoring tool and frontend libraries, and I think this should be our
focus now.

Best regards
Grzegorz

##################################################################
# Grzegorz Jakacki                       Huada Electronic Design #
# Senior Engineer, CAD Dept.              1 Gaojiayuan, Chaoyang #
# tel. +86-10-64365577 x2074               Beijing 100015, China #
# Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################