Re: [Opencxx-users] future directions

Hi Grzegorz,

Grzegorz Jakacki wrote:

> Nevertheless, I think that *replacing* Ptree hierarchy with more typed
> form will be extremely difficult, because:

[...]

I fully agree with your arguments. I believe that before we undertake
any serious effort to refactor this code, we need to

* document the existing API to be able to fully understand what the individual
   types / methods do and what ramifications any change would have on those types.

* provide much better regression test coverage at different levels so we can
   measure to what degree changes break compatibility (some breakage will be unavoidable)

Once we have a good grasp of the complete workflow involved in the implementation
of the various use cases (parse tree / syntax tree construction, code generation,
etc.) we can suggest migration paths that provide optimal control.

> (1) Find out and write down mappings from the "less typed" AST
>     to "more typed" AST, e.g.:

I have two sources of documentation:

* documentation of the Parser API, such as:

/*
   definition
   : null.declaration
   | typedef
   | template.decl
   | metaclass.decl
   | linkage.spec
   | namespace.spec
   | namespace.alias
   | using.declaration
   | extern.template.decl
   | declaration
*/
bool Parser::rDefinition(Ptree*& p)
...

which indicates that the method 'rDefinition' produces a ptree that is a definition
of one of the subtypes listed in the comment above. Wouldn't it be possible to
go over all these comments and build a class tree that models this grammar ? Is
that possible at all ? It seems that if the parser is currently able to construct
a 'Definition' object as a ptree, it should be possible to do the same with
more typed objects instead...
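
To make the idea a bit more concrete, here is a rough sketch of what such a
class tree could look like for two of the alternatives listed in the comment.
All class names and the MakeExample function are hypothetical; nothing like
this exists in OpenC++ today, and the real productions carry far more detail.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical typed counterpart of the 'definition' production.
class Definition {
public:
    virtual ~Definition() = default;
    virtual std::string Kind() const = 0;  // name of the grammar alternative
};

// Corresponds to the 'typedef' alternative (heavily simplified).
class Typedef : public Definition {
public:
    explicit Typedef(std::string name) : name_(std::move(name)) {}
    std::string Kind() const override { return "typedef"; }
    const std::string& Name() const { return name_; }
private:
    std::string name_;
};

// Corresponds to 'namespace.spec'; its body is again a list of definitions.
class NamespaceSpec : public Definition {
public:
    explicit NamespaceSpec(std::string name) : name_(std::move(name)) {}
    std::string Kind() const override { return "namespace.spec"; }
    const std::string& Name() const { return name_; }
    void Add(std::unique_ptr<Definition> d) {
        assert(d != nullptr);
        body_.push_back(std::move(d));
    }
    const std::vector<std::unique_ptr<Definition>>& Body() const { return body_; }
private:
    std::string name_;
    std::vector<std::unique_ptr<Definition>> body_;
};

// What a typed variant of rDefinition might hand back instead of a bare Ptree*.
std::unique_ptr<Definition> MakeExample() {
    std::unique_ptr<NamespaceSpec> ns(new NamespaceSpec("demo"));
    ns->Add(std::unique_ptr<Definition>(new Typedef("size_type")));
    return std::unique_ptr<Definition>(std::move(ns));
}
```

A client would then dispatch on the static type (or a visitor) instead of
poking at Car() / Cdr() positions.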

* I'm looking into the 'ctool' code I imported into synopsis.
   http://synopsis.fresco.org/viewsvn/synopsis-Synopsis/trunk/Synopsis/Parsers/C/
   contains a class hierarchy for C, which shouldn't be that different from a C++ grammar,
   so it may serve as inspiration.

>          template <>
>          class Node<IfStatement>
>          {
>          public:
>              Node(Node<Expr> c, Node<Statement> t, Node<Statement> e);
>              Node<Expr>      Cond() { return p_->Cdr()->Car(); }
>              Node<Statement> Then() { ... }
>              Node<Statement> Else() { ... }
>          private:
>              Node(Ptree* p) : p_(p) {}
>              friend class AstFactory;
>              Ptree* p_;
>          };
> 
>          class AstFactory
>          {
>          public:
>              template <class T>
>              static Node<T> Create( /* ... */ );
>          };

Interesting idea ! The price for this non-intrusiveness, however, would be that we
essentially have to have two parsing stages, the first generating the parse tree,
the second to add the AST as a superstructure on top. Right ?

I was really thinking of modifying the Parser itself so it only generates the
typed objects (which could still derive from Ptree so we don't have to rewrite everything...)
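
Something along these lines, as a minimal sketch. The Ptree class below is a
tiny stand-in, not the real OpenC++ class; the Arena helper and the simplified
node layout (if cond then else) are assumptions made only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for OpenC++'s Ptree (NOT the real class): a leaf token or a cons cell.
class Ptree {
public:
    explicit Ptree(std::string text) : text_(std::move(text)) {}
    Ptree(Ptree* car, Ptree* cdr) : car_(car), cdr_(cdr) {}
    virtual ~Ptree() = default;
    Ptree* Car() const { return car_; }
    Ptree* Cdr() const { return cdr_; }
    const std::string& Text() const { return text_; }
private:
    Ptree* car_ = nullptr;
    Ptree* cdr_ = nullptr;
    std::string text_;
};

// Typed node that still IS a Ptree. Assumed layout: (if cond then else).
class IfStatement : public Ptree {
public:
    IfStatement(Ptree* keyword, Ptree* rest) : Ptree(keyword, rest) {
        assert(keyword != nullptr && rest != nullptr);
    }
    Ptree* Cond() const { return Cdr()->Car(); }
    Ptree* Then() const { return Cdr()->Cdr()->Car(); }
    Ptree* Else() const { return Cdr()->Cdr()->Cdr()->Car(); }
};

// Owns every node. This replaces garbage collection in the sketch only.
class Arena {
public:
    Ptree* Leaf(const std::string& text) {
        return Keep(std::unique_ptr<Ptree>(new Ptree(text)));
    }
    Ptree* Cons(Ptree* car, Ptree* cdr) {
        return Keep(std::unique_ptr<Ptree>(new Ptree(car, cdr)));
    }
    // The parser would create the typed node directly, in a single pass.
    IfStatement* If(Ptree* cond, Ptree* then_, Ptree* else_) {
        Ptree* rest = Cons(cond, Cons(then_, Cons(else_, nullptr)));
        IfStatement* node = new IfStatement(Leaf("if"), rest);
        Keep(std::unique_ptr<Ptree>(node));
        return node;
    }
private:
    Ptree* Keep(std::unique_ptr<Ptree> node) {
        nodes_.push_back(std::move(node));
        return nodes_.back().get();
    }
    std::vector<std::unique_ptr<Ptree>> nodes_;
};
```

Existing code that walks the tree via Car() / Cdr() keeps working, while new
code can ask the node for Cond(), Then() and Else().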

> (5) At this point we will have usable, type-safe interface.
>     Moreover, clients not satisfied with this interface
>     (e.g. those who prefer multiary "+") will have a chance
>     to reuse it nonintrusively or just write another
>     interface to Ptree structure from scratch.

That is a good point ! The generated tree could be visited in terms
of ptree, *as well* as a typed syntax tree. A little like the other
poster referring to the different DOM APIs...

> (6) Having a wrapper interface atop Ptree structure allows
>     for changing Ptree structure without affecting clients
>     of wrapper interface. (Changes in Ptree, e.g. addition
>     of new nodes, can be compensated in wrapper layer.)

I'm not sure about that. Could you elaborate ? In my mind an AST node
would rely on a specific ptree structure it is wrapping, so if you can
modify the parse tree underneath, wouldn't that invalidate the AST nodes,
i.e. break some invariants ?
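
To illustrate the kind of invariant I mean, here is a small sketch. RawNode is
a hypothetical stand-in for an untyped tree node (not the real Ptree), and the
layout [if, cond, then, else] is an assumption. The view is only valid as long
as the node underneath keeps the shape that was checked when wrapping it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical untyped node: a token plus a list of children.
struct RawNode {
    std::string token;
    std::vector<RawNode> children;
};

RawNode Token(const std::string& text) {
    RawNode n;
    n.token = text;
    return n;
}

// Builds an 'if' node with the assumed layout [if, cond, then, else].
RawNode MakeIf(const std::string& cond, const std::string& then_,
               const std::string& else_) {
    RawNode n;
    n.children.push_back(Token("if"));
    n.children.push_back(Token(cond));
    n.children.push_back(Token(then_));
    n.children.push_back(Token(else_));
    return n;
}

// Typed view over an untyped 'if' node.
class IfView {
public:
    // The layout invariant lives in exactly one place.
    static bool Matches(const RawNode& n) {
        return n.children.size() == 4 && n.children[0].token == "if";
    }
    explicit IfView(const RawNode& n) : node_(&n) { assert(Matches(n)); }
    const RawNode& Cond() const { return node_->children[1]; }
    const RawNode& Then() const { return node_->children[2]; }
    const RawNode& Else() const { return node_->children[3]; }
private:
    const RawNode* node_;  // non-owning: the view relies on the node's shape
};
```

If somebody removes or reorders children after the view was created, Matches()
no longer holds and the accessors read the wrong slots. A deliberate layout
change could be absorbed by adapting Matches() and the accessors, but an
uncoordinated modification of the tree cannot.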

> I second that. However, I think the interfaces should not be published as
> they are now, because IMO they are not encapsulating enough. I think parser
> interface is OK, but other interfaces should be seriously reviewed before we
> commit to them. For example program object model is very much coupled with
> translator. I think we should untangle them first and transform translator
> framework into exemplary, nonprivileged client of frontend
> library (libraries).

Agreed. On the other hand, the whole process of changing the APIs will be
incremental and iterative, so as long as we don't commit to a fixed and
stable API I don't see why we cannot apply the changes 'in public'.

>>As I'm already maintaining an opencxx 'branch' as part of the synopsis
>>project, I'm experimenting with things there. Synopsis uses subversion, so
>>directory-layout related refactoring is much simpler than with cvs.
> 
> 
> As for Subversion in OpenC++: I have heard very positive opinions about
> Subversion, however I don't see an easy way to move OpenC++ development to
> Subversion now. Currently we rely on SF.net, which provides CVS,
> CompileFarm, mailing lists, shell+cron accounts and web hosting (and other
> features which we don't use at the moment). I don't see any other
> organization that would provide this level of service and commitment to
> OpenC++ project.

Well, as I said, synopsis presently contains a branch of opencxx. As synopsis
is my central focus, and as for me opencxx could well be part of synopsis
(with re-defined scope), I could imagine moving development efforts there.
The synopsis project is hosted by SPI ('software in the public interest')
and it has its own set of infrastructure tools ('roundup' issue tracker,
'qmtest' unit testing framework, 'subversion' configuration management tool,
'mailman' mailing lists, etc.)
Of course, such a move is not a simple decision. You really have to evaluate
what future directions you want to take, and whether that fits with the
synopsis framework. As I said, my interest in opencxx is in the context
of synopsis, so I'll probably do most of my work from there. Please take
this as an offer and invitation, not an attempt to fork your project.

> I very much support extending the lexer/parser to support full C syntax,
> however I don't know how much work it takes to get there. What issues do you
> see?

I see different steps:

* optionally remove the tokens that are not keywords in ordinary C
   ('class', 'virtual', ...)

* find all expressions that are valid C but not C++, i.e. define
   additions to the parser that have to be enabled for C but not C++

* do the contrary, i.e. find expressions that are valid C++ but not C...

It shouldn't be hard, and it should be possible to do it incrementally.
The important step is a basic restructuring of the Parser class into
an interface plus an implementation of the statements that are common to
all flavours of C and C++, with the rest put into subclasses (K&R, ANSI C, C89, etc.)
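
As a minimal sketch of that restructuring, limited to the keyword question
from the first step: the class names ParserBase, CParser and CxxParser are
hypothetical, and the keyword lists are deliberately incomplete.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical common base: everything shared by all flavours of C and C++.
class ParserBase {
public:
    virtual ~ParserBase() = default;
    bool IsKeyword(const std::string& word) const {
        assert(!word.empty());
        return CommonKeywords().count(word) != 0 || IsDialectKeyword(word);
    }
protected:
    // Each dialect contributes only what differs from the common core.
    virtual bool IsDialectKeyword(const std::string& word) const = 0;
private:
    static const std::set<std::string>& CommonKeywords() {
        static const std::set<std::string> kw = {
            "if", "else", "while", "return", "struct", "typedef"};
        return kw;
    }
};

// Plain C: 'class', 'virtual', ... are ordinary identifiers here.
class CParser : public ParserBase {
protected:
    bool IsDialectKeyword(const std::string&) const override { return false; }
};

// C++: adds the keywords that have to be disabled for C.
class CxxParser : public ParserBase {
protected:
    bool IsDialectKeyword(const std::string& word) const override {
        static const std::set<std::string> kw = {
            "class", "virtual", "template", "namespace"};
        return kw.count(word) != 0;
    }
};
```

With this split, `CParser` treats `class` as an identifier while `CxxParser`
treats it as a keyword, and the same pattern would extend to the
dialect-specific grammar rules.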

Regards,
		Stefan