Re: [RBMetacode] very interesting parsing discussion in a LLVM tutorial

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On 11.03.2008 15:36 Uhr, "Joe Strout" <jo...@st...> wrote:

> I ran across this very interesting page today:
> 
>      <http://www.llvm.org/docs/tutorial/LangImpl2.html>

Thanks. Good reading on some principles, e.g. it explains what the outcome
and purpose of a AST shall, and how to design the AST classes.

I see that Jonanathan and my attempt at the AST with Morphe2 comes pretty
close to this design, which is a relief to me :)

(After all, Jonathan did his homework on parser generator design and this
was already his second version, rebuilt from scratch, and then had another
overhaul once we tried it with the RB grammar.)

If you check out the "TTsASTParser" from out "RBMetacode" svn repository at
sourceforge.net, you'll find a set of classes in there for the AST. There is
an "Expression" class, which then has direct subclasses such as "Operator"
and "LiteralTerm". The "Operator" has subclasses for unary and binary
operators, and so on. The classes are still missing their data and
constructor members, though (that's why this project not finished yet,
basically).

All AST classes are derived from "ASTNode". This base class has a "items"
array in which the "arguments" of any subclass are assembled. E.g, an
expression such as "1 + x" will consist of a "TermExpr" (for operators such
as "+", "-", "or") with two array items "IntegerLiteral" and, uh .. whoops,
I see that there's still missing an "Identifier" node class as a subclass of
"Expression".

Anyhow, I hope you get the basic idea. To store the direct item ref'd by a
ASTNode subclass, maybe a simple Variant would do which then stores the
current token (i.e. the TermExpr would store the "+", while the
IntegerLiteral would store the "1" from the example above). That way, not
every subclass needs their own constructor and accessors for this (too much
work to type them all...)

So, yes, this is a good article and if you read this, you should have no
problem understanding how Morphe2 and TTsASTParser should work.

> this chapter of the tutorial in particular has some very useful
> nuggets for us -- in particular, the operator precedence parsing that
> they use to handle binary operators.  That would be more efficient
> than the approach currently used in the ParserProto (not sure how it
> compares to what Morphe2 generates).

The article, under "Binary Expression Parsing", suggests to use a precedence
table for all binary operators and groups them by this. So, instead of using
special handlers for terms (plus, minus, or) and products (mul, div, and)
and so on, it uses one central binary operator handler that deals with them
all based on their value from the precedence table.

While this makes sense in a hand-coded parser as keeps the code leaner,
using Morphe2, which interprets a grammar and converts this into program
code, the programmer (us) usually works with the grammar, not with the
parser code. When you have the grammar in front of you, it's IMO easier to
write it in the classic way where you outspell the rules for each precedence
case (i.e. term, product, etc.) because using the precedence table in the
grammar makes it less readable.

I really encourage you all to check out the TTsRBParser and TTsASTParser
which both use the Morphe2 grammar and output a tree (the former as a
string, which looks similar to what ParserProto does, and the latter
generates a tree of ASTNodes).

Sure, when you like to learn how parsing works, do it by hand, but once you
want to keep up with such a complex language as RB, with its often-changing
syntax, you'll be better off with a grammar you can adjust quickly to cope
with the changes.

Let's not keep re-inventing the wheel here but get some _solutions_ done! If
you se reasons why Morphe2 is not suitable well, then let's hear them, but
keep in mind that Jonathan wrote Morphe2 especially to write a RB languge
parsers, so it should meet the needs by design. And if we should ever get
him to join this group (where is he, he has not commented on anything yet),
he could maybe give us some guidedance on this as well.

Thomas