Joop Eggen

Grammar and Parser

- the AnyParser library 1.0

Having a background in programming languages, grammars, parsers, and compiler construction,
I wanted, for once, an easy, unproblematic tool at which anything can be thrown, and which simply works:
unproblematic grammars, simple parsing, an understandable process,
and elegant parsing of even ambiguous grammars. And the code must be short and readable.

Note that the git version is by far the most recent.

        Joop  Eggen, 2016-09-18

Hello, World!

So let us start with a demo: first, a simple expression grammar. Take the following input:

10 + xscale * (6 - diff) + 30

The resulting grammar is unsurprising:

    Grammar exprGrammar = new Grammar() {

        Symbol NUMBER = regex("\\d+");
        Symbol IDENT = regex("\\p{L}[_\\p{L}\\p{M}\\d]*");
        Symbol LBRACE = regex("\\(");
        Symbol RBRACE = regex("\\)");
        Symbol ADDOP = regex("\\+|-");
        Symbol MULOP = regex("\\*|/|%|\\^");

        Symbol factor = nonterminal();
        Symbol term = nonterminal();
        Symbol expr = nonterminal();

        {
            define(start,
                    seq( expr, END ));

            define(factor,
                    or( NUMBER,
                        IDENT,
                        seq(LBRACE, expr, RBRACE) ));

            define(term,
                    or( seq(factor, MULOP, term),
                        factor ));

            define(expr,
                    or( seq(term, ADDOP, expr),
                        term ));
        }
    };

start is the default start rule of the grammar, often called S in
the literature. seq stands for sequence.
There is a redundancy between a variable name, NUMBER, and its label,
"NUMBER", but it does not seem worth using reflection to
resolve this.
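
The terminal symbols above are plain regular expressions. As a minimal sketch, assuming they follow java.util.regex syntax and that white space between tokens is simply skipped, the following stand-alone class splits the demo input into labelled tokens. The class TokenSketch and its tokenize method are hypothetical helpers for illustration; they are not part of the AnyParser API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of AnyParser: tries the terminal patterns
// of the demo grammar at each position, skipping white space first.
class TokenSketch {

    // Label and pattern of every terminal, in the order of the grammar.
    private static final String[][] TERMINALS = {
        {"NUMBER", "\\d+"},
        {"IDENT", "\\p{L}[_\\p{L}\\p{M}\\d]*"},
        {"LBRACE", "\\("},
        {"RBRACE", "\\)"},
        {"ADDOP", "\\+|-"},
        {"MULOP", "\\*|/|%|\\^"},
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++; // assumption: white space separates tokens but is not part of them
                continue;
            }
            boolean matched = false;
            for (String[] terminal : TERMINALS) {
                Matcher m = Pattern.compile(terminal[1]).matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt()) { // match anchored at the current position
                    tokens.add(terminal[0] + ":" + m.group());
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("No terminal matches at position " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("10 + xscale * (6 - diff) + 30"));
    }
}
```

For the demo input this yields eleven tokens, starting with NUMBER:10, ADDOP:+ and IDENT:xscale.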

The parse tree is simple too. One note: a NUMBER token in the tree
is shown with white space preceding it, but its real text contains
only the digits.

start     10 + xscale * (6 - diff) + 30  
 ├─ expr  10 + xscale * (6 - diff) + 30
    ├─ term ── factor ── NUMBER  10
    ├─ ADDOP   +
    └─ expr   xscale * (6 - diff) + 30
        ├─ term  xscale * (6 - diff)
           ├─ factor ── IDENT  xscale
           ├─ MULOP   *
           └─ term   (6 - diff)
               └─ factor  (6 - diff)
                   ├─ LBRACE  (
                   ├─ expr  6 - diff
                      ├─ term ── factor ── NUMBER  6
                      ├─ ADDOP   -
                      └─ expr ── term ── factor ── IDENT   diff
                   └─ RBRACE  )
        ├─ ADDOP   +
        └─ expr ── term ── factor ── NUMBER   30
 └─ END    
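
To see why the tree nests to the right, the same three rules can be written out by hand as a recursive descent. This is only a sketch under assumptions: DescentSketch, accept and parse are made-up names, white space is skipped before every terminal, and nothing here reflects how AnyParser works internally. Instead of a tree it returns a fully parenthesised string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written recursive descent over the demo rules; illustrative only, not AnyParser code.
class DescentSketch {
    private final String input;
    private int pos = 0;

    DescentSketch(String input) {
        this.input = input;
    }

    // Position of the next non-white-space character at or after pos.
    private int skipSpace() {
        int p = pos;
        while (p < input.length() && Character.isWhitespace(input.charAt(p))) {
            p++;
        }
        return p;
    }

    // Consumes a terminal and returns its text, or returns null and consumes nothing.
    private String accept(String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(skipSpace(), input.length());
        if (!m.lookingAt()) {
            return null;
        }
        pos = m.end();
        return m.group();
    }

    // expr := term ADDOP expr | term
    String expr() {
        String left = term();
        String op = accept("\\+|-");
        return op == null ? left : "(" + left + " " + op + " " + expr() + ")";
    }

    // term := factor MULOP term | factor
    String term() {
        String left = factor();
        String op = accept("\\*|/|%|\\^");
        return op == null ? left : "(" + left + " " + op + " " + term() + ")";
    }

    // factor := NUMBER | IDENT | LBRACE expr RBRACE
    String factor() {
        String text = accept("\\d+");
        if (text == null) {
            text = accept("\\p{L}[_\\p{L}\\p{M}\\d]*");
        }
        if (text != null) {
            return text;
        }
        if (accept("\\(") == null) {
            throw new IllegalStateException("factor expected at position " + skipSpace());
        }
        String inner = expr();
        if (accept("\\)") == null) {
            throw new IllegalStateException("RBRACE expected at position " + skipSpace());
        }
        return inner;
    }

    // start := expr END
    static String parse(String text) {
        DescentSketch p = new DescentSketch(text);
        String result = p.expr();
        if (p.skipSpace() != text.length()) {
            throw new IllegalStateException("END expected at position " + p.skipSpace());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("10 + xscale * (6 - diff) + 30"));
    }
}
```

For the demo input, parse returns `(10 + ((xscale * (6 - diff)) + 30))`, which mirrors the nesting of the tree above. Because expr and term recurse on their right-hand side, an input such as `10 - 3 - 2` groups as `(10 - (3 - 2))` under these rules.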

The library

The library can deal with ambiguous grammars, seemingly infinite
recursion, and any order of or-alternatives. It is also very efficient.
The error handling, on the other hand, is primitive. It can only
deal with one error at a time, and will yield the farthest failed,
highest symbol as the expected construct. Contextual or corrective
error handling is missing but certainly feasible.
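
The reporting strategy described above (a single error, located at the farthest failed symbol) can be sketched on the demo grammar as follows. This is an assumption-laden illustration, not the library's implementation: FarthestFailureSketch, check and fail are hypothetical names, and "highest symbol" is approximated by letting a later failure at the same position overwrite an earlier one, since an enclosing rule fails after its parts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of "farthest failure" reporting; hypothetical, not AnyParser's implementation.
class FarthestFailureSketch {
    private final String input;
    private int pos = 0;
    private int failPos = -1;       // farthest position at which a symbol failed
    private String failSymbol = ""; // symbol recorded for that position

    FarthestFailureSketch(String input) {
        this.input = input;
    }

    private int skipSpace(int p) {
        while (p < input.length() && Character.isWhitespace(input.charAt(p))) {
            p++;
        }
        return p;
    }

    // Records that symbol failed at start, backtracks to start, and returns false.
    private boolean fail(String symbol, int start) {
        if (start >= failPos) { // same position: the later (enclosing) symbol wins
            failPos = start;
            failSymbol = symbol;
        }
        pos = start;
        return false;
    }

    private boolean terminal(String name, String regex) {
        int start = skipSpace(pos);
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(start, input.length());
        if (!m.lookingAt()) {
            return fail(name, start);
        }
        pos = m.end();
        return true;
    }

    // factor := NUMBER | IDENT | LBRACE expr RBRACE
    private boolean factor() {
        int start = skipSpace(pos);
        if (terminal("NUMBER", "\\d+") || terminal("IDENT", "\\p{L}[_\\p{L}\\p{M}\\d]*")) {
            return true;
        }
        if (terminal("LBRACE", "\\(") && expr() && terminal("RBRACE", "\\)")) {
            return true;
        }
        return fail("factor", start);
    }

    // term := factor MULOP term | factor
    private boolean term() {
        int start = skipSpace(pos);
        if (!factor()) {
            return fail("term", start);
        }
        int afterFactor = pos;
        if (terminal("MULOP", "\\*|/|%|\\^") && term()) {
            return true;
        }
        pos = afterFactor; // fall back to the second alternative
        return true;
    }

    // expr := term ADDOP expr | term
    private boolean expr() {
        int start = skipSpace(pos);
        if (!term()) {
            return fail("expr", start);
        }
        int afterTerm = pos;
        if (terminal("ADDOP", "\\+|-") && expr()) {
            return true;
        }
        pos = afterTerm; // fall back to the second alternative
        return true;
    }

    private boolean end() {
        int start = skipSpace(pos);
        return start == input.length() || fail("END", start);
    }

    // Returns null if the text parses, otherwise a one-line error message.
    static String check(String text) {
        FarthestFailureSketch p = new FarthestFailureSketch(text);
        if (p.expr() && p.end()) {
            return null;
        }
        return p.failSymbol + " expected at position " + p.failPos;
    }

    public static void main(String[] args) {
        System.out.println(check("10 + * 3"));
        System.out.println(check("(6 - diff"));
    }
}
```

With this sketch, `10 + * 3` is reported as "expr expected at position 5" and `(6 - diff` as "RBRACE expected at position 9"; positions are zero-based character offsets.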

29 KB jar, Java 8, Maven, git, 18 classes of which 6 are public API.
I have used many packages and numbered them so that the classes are ordered
from bottom layer to top layer. The package paths contain:

  • grammar
  • parser
  • demos

The library should be highly readable. It has no dependencies on
other libraries. The javadoc is complete. FindBugs has been run over the sources.

Collaboration, Feedback, and simply Hello

I am an experienced developer, and quite approachable. You can contact
me in Esperanto, Dutch, English and German. I am aware
that the real work only starts when such a library is applied.
Here the parser delivers a parse tree of a predefined class, which is
just the start. The terse coding style and the (for now) absence
of unit tests should not deter you either. I do write unit tests and do TDD.

As of this moment, 2016-09-18, I have made the first version available to the
public. I will not seek publicity before I have used the library
intensively myself, so I wonder how many will find this product.
