Work at SourceForge, help us to make it a better place! We have an immediate need for a Support Technician in our San Francisco or Denver office.

Close

Help with this grammar

Help
2012-06-27
2012-12-18
  • Joel Shellman
    Joel Shellman
    2012-06-27

    I'm using the new pre 1.0 version of Beaver. I'm trying to parse "(a b c)" with the grammar below. I get "cannot recover from the syntax error @1,1 - '(' in 0"

    %grammar
    expr = {Atom} atom | {Expr} LPAREN list RPAREN | {Nil} nil;

    list = list expr;

    atom = SYMBOL;

    nil = LPAREN RPAREN;

    %macros
    namestart = ;
    namechar = ;

       
    %tokens
    SYMBOL = namestart namechar*;

    LPAREN = "(";
    RPAREN = ")";

    WS  = + { "Skip" };

     
  • Try it with

    list = expr | list expr;
    

    otherwise it cannot start matching anything. I have not tried it, but this one look like an obvious omission.

    BTW, you do not really need all those extra non-terminals. Well, you might want them if you are shaping your AST classes, but otherwise the following would be faster (less reductions):

    %grammar
        Expr
            = { Atom } SYMBOL
            | { List } "(" ExprList ")"
            | { EmptyList } "(" ")"
            ;
        ExprList
            = Expr
            | ExprList Expr
            ;
    
     
  • BTW (another one :-)) This (defining list via analog of *) would be even better:

    %grammar
        Expr
            = { Atom } SYMBOL
            | { List } "(" ExprList ")"
            ;
        ExprList
            = 
            | ExprList Expr
            ;
    

    Note that initial list element rule is now "empty".

     
  • Joel Shellman
    Joel Shellman
    2012-06-28

    I tried exactly what you did:

    %grammar
    Expr = { Atom } SYMBOL | { List } "(" ExprList ")";
    ExprList = | ExprList Expr;
    %macros
    namestart = [A-Za-z_];
    namechar = [A-Za-z_0-9];
        
    %tokens
    SYMBOL = namestart namechar*;
    WS  = [ \t]+ { "Skip" };
    [code]
    But I get the exact same error:  "cannot recover from the syntax error @1,1 - '(' in 0"
    Here's the code I'm running to run it:
    [code]        Object result = new Parser().parse(new Scanner(new StringReader("(a b c)")));
            System.out.println("goal: " + result);
    [code]
    And the parameters to the beaver-cc:
    [code]TheGrammar.g --src-dir src/main/java --package com.pkg.parser --scanner-base BaseScanner --parser-base BaseParser[code]
    
     
  • (rewrote the reply - too many typos :-))

    Well, the root course of this is a bug of sorts in the Compiler. When a goal symbol - Expr in this case - is used recursively the algorithm that constructs the automaton gets confused. The usual way out of this (and this is what 0.9 does) is to add a synthetic goal symbol. I forget to do that in 1.0…
    While I'm working on that (among other things :-)) you can "fix" this by adding that new goal explicitly. Something like this:

    %grammar
        Goal
            = Expr
            ;
        Expr
            = { Atom } SYMBOL
            | { List } "(" ExprList ")"
            ;
        ExprList
            =
            | ExprList Expr
            ;
    %macros
        namestart = [A-Za-z_];
        namechar  = [A-Za-z_0-9];
    %tokens
        WS      = [ \t]+ { "Skip" };
        SYMBOL  = namestart namechar*;
    

    Note, that because Goal reduction represents a "pass-through" action it will be inlined by the compiler, so it will effectively disappear. Your parser will return an instance of AST.Expr.
    Here's the test runner I used:

    public class Test {
        public static void main(String[] args) throws IOException {
            Reader reader = new StringReader(args[0]);
            Object goal = new Parser().parse(new Scanner(reader));
            if (!(goal instanceof AST.Expr)) {
                throw new IllegalStateException("Oops!");
            }
            System.out.println(goal);
            
            AST.Expr expr = (AST.Expr) goal;
            expr.accept(new AST.Walker() {
                int indent = 0;
                void indentText() {
                    for (int i = 0; i < indent; i++) {
                        System.out.print("  ");
                    }
                }
                void visit(AST.Term node) {
                    indentText();
                    System.out.print(node);
                    System.out.print(": ");
                    System.out.println(node.text);
                }
                boolean enter(AST.Expr.Atom node) {
                    indentText();
                    System.out.println("Atom");
                    indent++;
                    return true;
                }
                void leave(AST.Expr.Atom node) {
                    indent--;
                }
                boolean enter(AST.Expr.List node) {
                    indentText();
                    System.out.println("List");
                    indent++;
                    return true;
                }
                void leave(AST.Expr.List node) {
                    indent--;
                }
            });
        }
    }
    

    I used tree Walker here for a short example of how it can be used.

     
  • Joel Shellman
    Joel Shellman
    2012-06-29

    Thank you very much for your help. I have it working!

    By the way, it's generating:

            void visit(Expr node) {
                if (enter(node)) {
                    node.atom.accept(this);
                    node.exprList.accept(this);
                    leave(node);
                }
            }
    

    for

    Expr = Atom
         | "(" ExprList ")";
    

    so it throws a null pointer exception since only one of the two are non-null. I'm not sure if I'm doing something wrong, but…

     
  • Oops, this is a bug. If productions are not named, compiler creates a concrete class for a nonterminal and productions are used to "construct" it. I probably forgot to extract "uncommon" sub-nodes and "protect" their traversal. Need to fix this ASAP.

    if I'm doing something wrong

    No, nothing wrong. Maybe just a little unexpected :-) The idea was that each production in most cases will be reduced to an instance of its own AST node class. That's what those production names are (mostly) used for - to name those classes. So for this declaration:

    Expr = { Atom} Atom
         | { List} "(" ExprList ")";
    

    It'll generate the following AST structure:

    public static abstract class Expr extends Node {
        public static class Atom extends Expr {
            Atom atom;
            public Atom(Atom atom) { this.atom = atom; }
        }
        public static class List extends Expr {
            ExprList exprList;
            public List(ExprList exprList) { this.exprList = exprList; }
        }
    }
    

    If productions are left unnamed, like in your example, the Expr node class becomes concrete and it gets 2 constructors:

    public static class Expr extends Node {
        Atom atom;
        ExprList exprList;
        public Expr(Atom atom) { this.atom = atom; }
        public Expr(ExprList exprList) { this.exprList = exprList; }
    }
    

    As you see in this case there will be uninitialized fields. Anyway, the idea behind these unnamed productions was that there cases when productions look almost the same with only a symbol or two that are different. Mostly to be able to optimize cases like these:

    A = B C?;
    

    ? closure can be rewritten as:

    A = B OptC;
    OptC = | C;
    

    But that adds a reduction, which we'd like to avoid to make a speedier parser. So another possible rewrite would be:

    A = B | B C;
    

    Which is better, but it needs to account for the optional presence of C. And that's apparently something I missed.

     
  • Joel Shellman
    Joel Shellman
    2012-06-30

    Thank you so much for your help!

    On that previous thing, if I do name those productions, I run into a different problem, so I left them unnamed.

    My parser seems to be working mostly… it's doing weird things when it tries to parse }. I'm pretty sure I need to at least fix the alpha line in macros. I need to allow all those special characters. But I don't know which ones I need to escape, nor how to escape them. But the really wacky thing is that if I try to parse something that has } in it such as "a (})" it gives:
    ListOfExpr
      Symbol a
      ExprList
        Bool

    Where did a Bool come from? Weird.

    %grammar
    Goal = ListOfExpr;
    ListOfExpr = ListOfExpr Expr
               | ;
    Expr = Atom
         | "(" ExprList ")";
    ExprList = ExprList Expr
             | ;
    Atom = {Bool} BOOL
         | {Str} STRING
         | {_Int} _INT
         | {_Float} _FLOAT
         | {Symbol} SYMBOL;
    %macros
    alpha = [A-Za-z=*/+_?$!@~><&%'#`;:{}-];
    digit = [0-9];
    nzdigit = [1-9];
    quotable = [^] \ ["];
    %tokens
    STRING = "\"" (quotable | "\\\"")* "\"";
    BOOL = "true" | "false";
    _INT    = ("+" | "-")? (digit+ | digit* "." "0"*);
    _FLOAT  = ("+" | "-")? (digit* "." digit* nzdigit+ digit*);
    SYMBOL = alpha ( alpha | digit )*;
    WS     = [ \t]+ { "Skip" };
    NL     = "\r" | "\n" | "\r\n" { "NewLine", "Skip" };
    
     
  • You found another bug. Range that ended with  was missing the character before the dash. I fixed it, but I'd like to make those other 2 changes - null field check in unnamed productions and synthetic goal generation when needed - before I release the next build.

    You can work around this (for now) by moving that dash in front of the range. Like this:

    alpha = [-A-Za-z=*/+_?$!@~><&%'#`;:{}];
    
     
  • PS: BTW, that BOOL was inserted by error recovery. When scanner returned unrecognized token "}", parser entered error recovery and replaced it with first that allowed it to continue parsing. That BOOL is synthentic as it has no payload (there is no "true" or "false" value attached to it. If you run your parser with the logger, it'll tell you all about this. Just to see this recovery try the following call:

    Object goal = new Parser().parse(new Scanner(reader), new beaver.Parser.NullLog() {
        public void unexpectedTokenReplaced(Term unexpectedToken, Term replacementToken) {
            System.err.println("unexpected token: replaced " + unexpectedToken + " with " + replacementToken);
        }                
    });
    

    Usually you would implement beaver.Parser.ErrorLog interface to intercept all parser messages.