Re: [RBMetacode] AST generation
Status: Planning
Brought to you by:
jstrout
From: Joe S. <jo...@st...> - 2008-03-04 20:17:01
|
On Mar 4, 2008, at 11:55 AM, Thomas Tempelmann wrote: >> Option "A" is a lot cheaper in such an environment, because you just >> rerun the mini-parser when an external interface changes. > > I'd rather go with Option B, though: It should be faster (parsing the > original source code is rather slow compared to traversing the > generated > tree and looking for the right types) and it's more universal as > we're first > building a more general tree which we then attribute with extra > information. > > With Option B, we also can see better what's left (e.g. all the > identifiers > from the RB framework), filter them and recreate a table from that > to use as > a prototype of the RB framework. > > What do you and others think? I'll wait for other opinions, and let this all sink in a bit, before I opine any more on this issue myself. > In any case, what should be the structure we're generating for the > tree? > Does the one the TTsASTParser tries to generate make sense to you? > Basically, it uses separate classes for each type of grammar node, for > example it has a unique class for each possible operator. I think > that's a > bit of overkill - I'd rather have a class for each of the operator > priorities (one for plus/minus/or, one for mul/div/and, one for > comparisons, > one for not and negate, etc., and then add a property naming the > particular > operator). That way, the hierarchy of operators is seen right in > the tree. I agree that a separate class for each operator is overkill. I'm not even convinced you need a separate class for different priorities. The parser needs separate rules for them (well, maybe, depending on what type of parser it is), but the AST shouldn't care about that -- by the time you have an AST, operator precedence no longer matters. It's already been taken care of. So I've been using just a BinopNode to represent all binary operator nodes, and it's been working fine for me. (Actually, that's a bit of an exaggeration; it's only for binary operators within expressions. The assignment operator is a separate node.) > Similarly, one class for all kind of loops, maybe, and a sub-class > identifying the different kinds of loops with their parameters. But > maybe > this won't work because it requires that the code is free of errors > such as > a missing "next". Maybe it's better that the initial tree is free > of such > semantics? Well, I think you do need nodes that represent various kinds of loops -- and organizing these into a class for all loops with a subclass for specific kinds of loops makes sense to me. That way, any code that needs to deal with loops in general can address the superclass, and code that needs to be more specific can deal with the subclasses. We need to take care to distinguish between an AST and a CST (Concrete Syntax Tree), though. A CST can always be converted into an AST, but the reverse is not true. There area lot of minor differences, but for example, an AST won't contain any parentheses, since those don't affect the abstract syntax; the guts of the parentheses will simply appear as an expression node directly within whatever node is using that parenthesized expression. But in a CST, the parens are like an operator, represented with their own node in the tree. I've been generating a CST, and even keeping track of noncoding tokens (i.e. whitespace), since that's actually more general -- you can do pretty much anything with it you could do with an AST, but also more, such as reconstructing the original source. That'll be important for things like refactoring tools, where you want to rearrange the source tree a bit, but maintain the user's original syntax as much as possible. Best, - Joe |