Re: [RBMetacode] very interesting parsing discussion in a LLVM tutorial
Status: Planning
Brought to you by:
jstrout
From: Thomas T. <tt...@te...> - 2008-03-13 08:36:43
|
On 11.03.2008 15:36 Uhr, "Joe Strout" <jo...@st...> wrote: > I ran across this very interesting page today: > > <http://www.llvm.org/docs/tutorial/LangImpl2.html> Thanks. Good reading on some principles, e.g. it explains what the outcome and purpose of a AST shall, and how to design the AST classes. I see that Jonanathan and my attempt at the AST with Morphe2 comes pretty close to this design, which is a relief to me :) (After all, Jonathan did his homework on parser generator design and this was already his second version, rebuilt from scratch, and then had another overhaul once we tried it with the RB grammar.) If you check out the "TTsASTParser" from out "RBMetacode" svn repository at sourceforge.net, you'll find a set of classes in there for the AST. There is an "Expression" class, which then has direct subclasses such as "Operator" and "LiteralTerm". The "Operator" has subclasses for unary and binary operators, and so on. The classes are still missing their data and constructor members, though (that's why this project not finished yet, basically). All AST classes are derived from "ASTNode". This base class has a "items" array in which the "arguments" of any subclass are assembled. E.g, an expression such as "1 + x" will consist of a "TermExpr" (for operators such as "+", "-", "or") with two array items "IntegerLiteral" and, uh .. whoops, I see that there's still missing an "Identifier" node class as a subclass of "Expression". Anyhow, I hope you get the basic idea. To store the direct item ref'd by a ASTNode subclass, maybe a simple Variant would do which then stores the current token (i.e. the TermExpr would store the "+", while the IntegerLiteral would store the "1" from the example above). That way, not every subclass needs their own constructor and accessors for this (too much work to type them all...) So, yes, this is a good article and if you read this, you should have no problem understanding how Morphe2 and TTsASTParser should work. > this chapter of the tutorial in particular has some very useful > nuggets for us -- in particular, the operator precedence parsing that > they use to handle binary operators. That would be more efficient > than the approach currently used in the ParserProto (not sure how it > compares to what Morphe2 generates). The article, under "Binary Expression Parsing", suggests to use a precedence table for all binary operators and groups them by this. So, instead of using special handlers for terms (plus, minus, or) and products (mul, div, and) and so on, it uses one central binary operator handler that deals with them all based on their value from the precedence table. While this makes sense in a hand-coded parser as keeps the code leaner, using Morphe2, which interprets a grammar and converts this into program code, the programmer (us) usually works with the grammar, not with the parser code. When you have the grammar in front of you, it's IMO easier to write it in the classic way where you outspell the rules for each precedence case (i.e. term, product, etc.) because using the precedence table in the grammar makes it less readable. I really encourage you all to check out the TTsRBParser and TTsASTParser which both use the Morphe2 grammar and output a tree (the former as a string, which looks similar to what ParserProto does, and the latter generates a tree of ASTNodes). Sure, when you like to learn how parsing works, do it by hand, but once you want to keep up with such a complex language as RB, with its often-changing syntax, you'll be better off with a grammar you can adjust quickly to cope with the changes. Let's not keep re-inventing the wheel here but get some _solutions_ done! If you se reasons why Morphe2 is not suitable well, then let's hear them, but keep in mind that Jonathan wrote Morphe2 especially to write a RB languge parsers, so it should meet the needs by design. And if we should ever get him to join this group (where is he, he has not commented on anything yet), he could maybe give us some guidedance on this as well. Thomas |