[RBMetacode] Tail-Recursive Parser prototype

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

I thought perhaps the parser issue was too big and slippery to attack  
on a purely intellectual level, so I've decided to try making a small  
parser so we can start getting some experience with it.

I should say that I know Thomas and Jon (and perhaps others?) have  
already done a lot of work in this direction using Jon's nifty parser  
generator, and that's still an option we should explore.  But I want  
to also explore the hand-written approach, not least because I have  
an experienced compiler engineer whom I trust advising me to do so.

So, if you do a svn update, you'll find a "ParserProto" project.  It  
handles this grammar:

	Exp: Term
	Term: Term ('+' Factor)* | Factor
	Factor: Factor ('*' Atom)* | Atom
	Atom: <identifier> | <number>

So it can handle things like 2+3*4.  It parses this into a concrete  
syntax tree (which is to say, if we were handling parentheses, they  
would actually be represented in the tree rather than abstracted away  
right away).  There is just one SyntaxNode class, though I imagine  
we'd probably want a hierarchy of classes if we got more serious  
about it.  Each node corresponds to a coding token, pretty much, as  
well as a series of noncoding tokens (i.e. whitespace) that may  
follow it.

For inspection purposes, these nodes can print themselves out with or  
without the noncoding tokens, or can try to reconstruct the original  
source.  This all seems to be working fine, except for any leading  
whitespace -- I haven't found a good place to put that.  (I also  
didn't have much luck trying to capture a trailing comment; I think  
to handle both of these, we need a StatementNode subclass that can  
keep track of leading and trailing noncoding tokens separately.)

Big-picture wise, I think we should take a simple subset of the  
grammar like this, and see how far we can take it as far as error  
handling, relating any node in the tree to a range of text in the  
original input, dealing with code fragments, etc.  Actually, the  
current grammar is probably TOO simple; we should add line  
continuation, a couple more operators, and maybe function calls too.

Then maybe we take that grammar subset, and try it out several  
different ways if necessary to convince ourselves that we've found  
the best approach for our purposes -- and then push on to implement  
the full language grammar.

Any comments?

Best,
- Joe