From: Grzegorz J. <ja...@he...> - 2004-06-08 01:25:17
On Fri, 4 Jun 2004, Stefan Seefeld wrote:

> Grzegorz Jakacki wrote:
>
>>> * provide a much better regression test coverage on different levels so we can
>>>   measure to what degree changes break compatibility (some will be unavoidable)
>>
>> This seems to be a lot of work.
>
> but it's worth it, I believe.

My point is that it is too much for one leap.

> [...ptree -> AST abstraction...]
>
>> I can see potential problems where existing code keeps e.g. definitions and
>> expressions on the same list, which now is a list of Ptree*, but could be no
>> more if we want to inject more type information.
>
> agreed. On the other hand, I'm not suggesting that we change the ptree structure,
> just that we make the ptree class tree more rich (type- and API-wise) such that
> users *can* use the high-level type information and API. Walking the ptree via
> the 'untyped' Ptree nodes will still be possible.

I think I get your general idea, but most likely the devil is in the details.

>>> * I'm looking into the 'ctool' code I imported into synopsis.
>>>   http://synopsis.fresco.org/viewsvn/synopsis-Synopsis/trunk/Synopsis/Parsers/C/
>>>   contains a class hierarchy for C, which shouldn't be that different from a C++ grammar,
>>>   so it may serve as inspiration.
>>
>> Observe that the existing Ptree classes already constitute a hierarchy
>> (however, the majority of the code flattens it by upcasting to Leaf/NonLeaf).
>
> exactly. I realize the attempt to get more type info into the ptree, but it doesn't
> look complete. I think completeness could/should be defined by the criterion that
> I could write a Visitor that is able to *completely* traverse the ptree, recovering
> all the type info without ever touching the methods 'Cdr()' and 'Car()'.
>
>> Not at all. Node<IfStatement> is meant to work as a smart pointer to
>> Ptree.
>> It is to be used by value and not stored anywhere, e.g.:
>>
>>     Node<Definition> d = ParseDefinition("int main() {}");
>>
>>     string id = d.GetIdentifier(); // instead of d->Cdr()->Car()-> etc.
>
> hmm, but then the user already knows that he is parsing a definition,
> so the type info isn't adding much value. Read: I want an *abstract*
> syntax tree, such that I can run 'StatementList *ast = parse(my_file);'
> and then inspect the returned 'ast' with some custom visitors.

I did not mean what you understood. In fact I want Node<Definition> to be
"abstract". The second part of the example was a miss; let me fix it:

    Node<Definition> d = ParseDefinition("int main() {}");
    SomeVisitor v;
    v(d); // <--- calls v.Visit(Node<FunctionDefinition>)

> In particular, if I want to expose this ast to a scripting frontend
> such as python, it is impractical to have these wrapper classes be
> temporary objects, as that would make the binding quite complex and
> slow.

(1) Why? (I have never seen how you create a binding, so I have no idea
what happens.)

(2) What if, instead of coding these wrappers, we generate them based on
the "Cdr/Car Ptree -> high-level Ptree" mapping? We could generate "C++
wrappers" and "Python wrappers".

My point is to avoid making intrusive changes to the Ptree hierarchy, as
that breaks essentially all of OpenC++, so compensating for such changes
is both expensive and error-prone.

>> Example 1: You decide to change the structure of the Ptree nodes
>> representing some C++ construct (e.g. to store more information in them).
>> This will break all clients depending directly on Ptree, because
>> they need to update Car/Cdr paths to reflect the new structure. However,
>> if there is a Node<> iface in the middle, you update the Car/Cdr paths
>> only in Node<>.
>
> But what about the Node's type? If my wrapped ptree is a declaration,
> but by simply modifying the ptree I change that to be a function call,
> the wrapper's type ('Node<Declaration>') would be wrong.
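To make the Node<Definition> example above concrete, here is a minimal compilable sketch of how a by-value wrapper could dispatch a visitor on the dynamic shape of the underlying tree. All names here (this Ptree stand-in, Node<>, SomeVisitor, ClassifyDefinition) are invented for illustration; they are not the real OpenC++ classes:

```cpp
#include <cassert>
#include <string>

// Stand-in for the untyped parse tree; just enough to show the dispatch.
struct Ptree {
    enum Kind { FunctionDef, VariableDef } kind;
};

// Cheap, copyable wrapper meant to be used by value, never stored.
template <class Tag>
struct Node {
    Ptree* tree;
};

struct Definition {};          // "abstract" tag: any kind of definition
struct FunctionDefinition {};  // concrete tags
struct VariableDefinition {};

struct SomeVisitor {
    std::string seen;
    void Visit(Node<FunctionDefinition>) { seen = "function"; }
    void Visit(Node<VariableDefinition>) { seen = "variable"; }
    // Applying the visitor to the abstract node inspects the underlying
    // ptree shape and forwards to the matching concrete overload.
    void operator()(Node<Definition> d) {
        if (d.tree->kind == Ptree::FunctionDef)
            Visit(Node<FunctionDefinition>{d.tree});
        else
            Visit(Node<VariableDefinition>{d.tree});
    }
};

std::string ClassifyDefinition(Ptree& p) {
    Node<Definition> d{&p};  // by value, not stored anywhere
    SomeVisitor v;
    v(d);                    // dispatches on the dynamic shape
    return v.seen;
}
```

The point of the sketch is that the client writes `v(d)` against the abstract Node<Definition> and still ends up in the concrete Visit overload.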
I think we are not talking about the same thing. This is what I meant: you
decide to store more information, say a type annotation, at the declaration
node. I mean as a design decision, not a run-time decision. There are
several kinds of declarations, each encoded with some Ptree shape. You
decide that the type annotation will be added atop the declaration tree, so
where up to now you had

         NonLeaf(Decl)
         /           \
     NonLeaf       NonLeaf
       ...           ...

you want to have

         NonLeaf(Decl)
         /           \
   [annotation]    NonLeaf      <--- the old tree
                   /     \
               NonLeaf  NonLeaf
                 ...      ...

This change breaks all clients using the Leaf/NonLeaf interface, since the
Cdr/Car access path for declarations changes. However, it is quite easy to
compensate for this modification in the Node<> iface, so that clients of
the Node<> iface are kept unaware that the Ptree shapes changed. Moreover,
it is easy to make the annotation available to new clients of the Node<>
iface without breaking the old clients.

> Of course, if we expose the two APIs in parallel we'll always have
> this problem. Maybe the ptree API should not expose any modifiers,
> i.e. exclusively operate on const ptrees.

AFAIU this applies to run-time tree modification, not design changes, but
it is also a valid point, so let me address it. My understanding would be
that the Ptree API is low-level and the Node<> API is high-level. Clients
should be safe as long as they commit exclusively to the high-level API.
This warranty is void once they start tampering with the tree using the
Ptree API, as clearly the Ptree API lets you create a structure that does
not map onto any type-correct tree in the sense of the Node<> API. I would
be happy with the Node<> API coredumping or throwing as soon as it finds
out that somebody put some kind of rubbish into the underlying Ptree tree.
Alternatively, the Node<> API could contain something like an "Invalid"
node type that would be exposed in places where the underlying Ptree
structure is broken in the sense of the Node<> API.

>> Example 2: You add a new kind of node, e.g. for the "using" declaration.
>> All existing clients of Node<> will break if you just add a new type to
>> the Node visitor. However, you can branch the Node<> iface into two
>> versions.
>
> That's a good point. The Visitor pattern really assumes the type hierarchy
> of the visited objects to be stable.
>
> [...]
>
>>> Of course, such a move is not a simple decision. You really have to evaluate
>>> what future directions you want to take, and whether that fits with the
>>> synopsis framework. As I said, my interest in opencxx is in the context
>>> of synopsis, so I'll probably do most of my work from there. Please take
>>> this as an offer and invitation, not an attempt to fork your project.
>>
>> This requires some thought indeed.
>
> There is no need for a quick decision. I'm just observing that right now
> we are each working on a separate branch, so merging them would be practical.
> And that even more so if we are going to look into adapting qmtest as a
> unit testing framework. (Right now I don't do unit testing on the opencxx
> backend, just on the generated synopsis AST, which I dump.)

Agreed. Let's see how things work out.

> The most important thing, I believe, is that we define what we each expect
> from opencxx (and synopsis) in the future, i.e. whether we are aiming at
> the same things, and whether the common goals suggest that we both work
> from a common code base, or whether the overlap is simply not large enough
> to be worth a merge.

Yes.

>>> It shouldn't be hard, and it should be possible to do it incrementally.
>>> The important step is some basic restructuring of the Parser class into
>>> an interface and an implementation of the statements that are common to
>>> all flavours of C and C++, putting the rest into subclasses (K&R, ANSI C, C89, etc.)
>>
>> An important thing to consider here is whether we want OpenC++ to be validating.
>> If not, then we don't need to bother that "class" occurs in C source.
>> Personally I think that a validating parser/elaborator is much harder to write,
>> and IMO this should not be our priority, since we have no chance to reach
>> the quality of validation that e.g. g++, MSVC or EDG present today.
>
> I don't understand what you mean by 'validate'. opencxx (and ctool)
> are well able to indicate 'parse errors', even though the error
> message is not very high-level, as that would indeed require that more
> language-specific semantics be available to the parser (or to the object
> that is trying to issue a meaningful error message).

I don't think that is true in general. OpenC++ is deliberately not strict
about syntax. If I am not mistaken, it will let you have something like a
list ("{1,2,3}" maybe?) as an expression. This was meant to support
language extensions.

> What do you mean
> by 'bother that "class" occurs in C'? Right, the lexer will return
> a 'CLASS' token if it runs into the 'class' string. That doesn't make
> sense in C, as it would have to be an ordinary identifier. Similarly
> for all the other keywords. So removing those C++-specific keywords
> when scanning in C mode sounds like the (easy) first step towards C
> compatibility.

Oh, I see now. So to restate my point: I think we should not invest time in
validating the OpenC++ input. I would say that we should assume the input
source code is valid C++.

>> However, we do have a chance to do something new and useful in providing
>> a refactoring tool and frontend libraries, and I think this should be our
>> focus now.
>
> agreed. I'm just wondering how much effort it would be to extend opencxx's
> scope to C, as I can see a lot of use for a tool like that, for example for
> the GNOME folks.

My concern is that we are trying to go in too many directions:

* making the type elaborator and program object model into a library
* typesafe API
* Python bindings
* C compatibility

(Not to mention areas where we need quality improvements, such as templates
and overloading.)
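As a postscript, here is a minimal compilable sketch of the "compensation" point from the annotation example earlier in this mail. All names (this Ptree stand-in, DeclarationNode, GetAnnotation, GetBody) are invented for illustration and are not the actual OpenC++ API; the idea is only that when an [annotation] child is added atop the declaration tree, the changed Car/Cdr path is absorbed inside the wrapper, so its clients never see the new shape:

```cpp
#include <cassert>
#include <string>

// Stand-in for the untyped parse tree: a leaf payload plus car/cdr links.
struct Ptree {
    std::string value;      // leaf payload ("" for interior nodes)
    Ptree* car = nullptr;   // first child
    Ptree* cdr = nullptr;   // remaining children
};

// Stand-in for Node<Declaration>: a by-value wrapper over the raw tree.
struct DeclarationNode {
    Ptree* tree;
    // New shape: NonLeaf(Decl) -> ([annotation], <old tree>).
    // Only these two access paths had to change when the shape changed;
    // clients calling GetAnnotation()/GetBody() are unaffected.
    std::string GetAnnotation() const { return tree->car->value; }
    Ptree*      GetBody()       const { return tree->cdr; }
};

std::string AnnotationOf(Ptree& decl) {
    return DeclarationNode{&decl}.GetAnnotation();  // wrapper used by value
}
```

A client written against DeclarationNode compiles and behaves identically before and after the shape change; only the wrapper's two accessors are edited.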
BR
Grzegorz

##################################################################
# Grzegorz Jakacki              Huada Electronic Design          #
# Senior Engineer, CAD Dept.    1 Gaojiayuan, Chaoyang           #
# tel. +86-10-64365577 x2074    Beijing 100015, China            #
# Copyright (C) 2004 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################