Here are notes on various parsing tools that might help develop specifications or implementations of sweet-expressions.
We want an LL-based tool, to help check recursive descent, enable us to create a good spec, and possibly develop an implementation.
A list of parser generators is here: http://en.wikipedia.org/wiki/Comparison_of_parser_generators
Requirements:
Hints:
Below are a list of tools, followed by other notes.
ANTLR is probably one of the most popular LL-based parser generators today.
As a practical matter must buy book ($24 for v3 electrons, inc. PDF). Extremely popular, active development, lively support. Many generated languages, including Javascript (for v3). On Fedora, install "antlr3" (just "antlr" installs v2) - very convenient!
The current version as of 2012 is version 3 (v3). ANTLR v4 due out Dec. 2012, v4 book draft out. But ANTLR v4 is radically different; it accepts far more than LL (which is probably a good thing for many users, but not what we want for our purposes). Version 4 is Java-only for now, and looks like documentation is really sketchy for v4. It might be best to stick with v3 for now.
Notation looks really clean.
Example (v3): expr : term ( ( PLUS | MINUS ) term )* ; term : factor ( ( MULT | DIV ) factor )* ; factor : NUMBER ; Example (v4) (it appears it has some LL left-recursion extensions): grammar Expr; prog: (expr NEWLINE)* ; expr: expr ('*'|'/') expr | expr ('+'|'-') expr | INT | '(' expr ')' ;
Here are some potential ANTLR pointers:
Sadly, changing modes in a lexer isn't so easy in most LL tools.
Some old ANTLR v2 docs: http://www.antlr2.org/doc/lexer.html http://ftp.camk.edu.pl/camk/chris/antlrman/antlrman.pdf
capable: Attributed grammar, LL(k). GPL (with exceptions, no prob).
Different programs for different generated languages (ugh,
but not relevant?) Coco/R EXAMPLE:
CompilationUnit = [ "package" Qualident ';' ] { ImportDeclaration } { TypeDeclaration } (. CODE HERE .) . http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/
"C# and Java parser generator (compiler compiler). It improves upon simlar tools (like yacc and ANTLR) by creating well-commented and readable source code, by having automatic error recovery and detailed error messages, and by support for testing and debugging grammars without generating source code."
"Grammatica supports LL(k) grammars with an unlimited number of look-ahead tokens." GNU LGPL. Example:
ImportList = "IMPORTS" SymbolsFromModule* ";" ; http://grammatica.percederberg.net/
Claims "most popular" for Java. Originally by Sun.
Originally wasn't OSS, now it is.
Slightly ugly notation; productions have type declarations and look
like Java. Reasonable if you're doing Java of course, but
not the point in this case. You enter ".jj" files. Example:
void enumerator() :
{}
{
<ID> ("=" constant_expression())?
}
Capable, GPL, web version. Lots of capabilities. Can gen Javascript,
and that's a big plus.
Ugh: Uses prefix repetition notation *(...), a BNF notation
I (wheeler) really hate (it's not what the textbooks do,
so it's painful to read).
Haskell's Parsec library combined with Haskell's
Monad Transformer library allows using the same syntax for both INDENT
/ DEDENT / SAME -guarded parsers, and basic parsers (like n-expr) that
don't have INDENT/DEDENT/SAME, and allowing the second to be used
inside the first.
Wheeler agrees that having a parser with significant indent-processing capabilities would be a big plus. But Wheeler didn't include Parsec for several reasons, all based on the fact that Parsec is totally tied to Haskell:
AXE - Baked into C++.
HiLexed - looks unfinished.
SableCC - LALR.
The traditional way to parse indentation-sensitive languages is based
on Python's approach, inserting INDENT and DEDENT tokens as appropriate.
http://erezsh.wordpress.com/2008/07/12/python-parsing-1-lexing/
- gives pseudocode and shows a C# ANTLR implementation
http://stackoverflow.com/questions/232682/how-would-you-go-about-implementing-off-side-rule
- Inserts LINE (newline) as first stage (with indents), then adds
INDENT/DEDENT while keeping LINE. Says it has better results
http://www.antlr.org/pipermail/antlr-interest/2012-June/044736.html
- ANTLR indent/dedent
https://github.com/sirthias/parboiled/wiki/Indentation-Based-Grammars
- Parboiled builds in indentation processing
http://mail.python.org/pipermail/tutor/2009-January/066395.html
- Email discussion about parsing, with INDENT/DEDENT
http://stackoverflow.com/questions/1513020/how-do-i-do-python-style-indent-dedent-tokens-with-alex-haskell
- Discussion about implementing indentation with Alex & Haskell.
Asserts that most people do this in the lexer.
http://antlr.1301665.n2.nabble.com/emitting-multiple-tokens-INDENT-DEDENT-td7578225.html
- Sample indentation processing with ANTLR
https://github.com/antlr/examples-v3/blob/master/java/python/PythonTokenSource.java
- ANTLR source Python tokenizer
http://mobilesystemsengineering.appspot.com/staticdocs/AdvancedMobileSystemsEngineering/Solution1.txt
- ANTLR repairs for Python; may be OBE
https://github.com/bamboo/boo/tree/master/src
http://shaurz.wordpress.com/2008/03/11/haskell-style-parser-combinators-in-scheme/
- Haskell-style parser combinators in Scheme. "We can define a parser as a procedure which takes an input string and an index (i.e. the index of the current character on the input). If the parse succeeds, it returns a value and an index ≥ the original index to the unconsumed input. On failure it returns #f for the both the value and index (the index is checked for failure and the value is ignored)."
- Links to more.
http://weblambda.blogspot.com/2009/10/building-parser-combinators-in-scheme.html
- Comments on How to build a parser combinator in Scheme.
Several sources talk about sllgen, a simple LL(1) tool in Scheme
used in many courses per "Essentials of Programming Languages"
http://www.ccs.neu.edu/course/cs7400/lectures/lecture02.pdf
- Simple example use of sllgen
http://docs.racket-lang.org/eopl/index.html
- Talks about "Essentials of Programming Languages" language in DrRacket,
cross-refs sllgen stuff.
http://www.cs.indiana.edu/eopl/code/interps/sllgen.scm
- sllgen source code. Scheme LL(1) parser generator
http://3e8.org/zb/eggs/lexing.txt
"Notes on Scheme lexers and parsers"
- A text file of someone else's research on Scheme parsers.
Talks about sllgen, essence, packrat.
http://www.softpanorama.org/Algorithms/Compilers/recursive_descent_parsing.shtml
- One of the many pages on recursive descent.