Thread: [cedet-semantic] Noob wisent grammar/parser question

From: Thomas J. <tja...@gm...> - 2013-10-28 13:44:14

Hi!

I'm currently working on a new Erlang grammar for semantic/wisent. Lexical
analysis works fine, but I can't for the life of me figure out how to get
the parser to return a tag with the bounds properly set.

So far I just have a bare bones grammar with one rule available here:
https://github.com/tjarvstrand/erl-parse

To "reproduce" in Emacs 24:
- Install erlang-mode
- Clone erl-parse and add it to the load-path
- require erl-parse
- Hit M-x eval-expression RET (erl-parse-string "foo()" 'function-call) RET

Any help would be very much appreciated!

Thanks,
Thomas

Re: [cedet-semantic] Noob wisent grammar/parser question

From: David E. <de...@ra...> - 2013-10-28 17:07:38

Thomas Järvstrand writes:
> I'm currently working on a new Erlang grammar for semantic/wisent. Lexical
> analysis works fine, but I can't for the life of me figure out how to get the
> parser to return a tag with the bounds properly set.

This is because you've defined your own tag generating macro
CALL-TAG. If you look into wisent/grammar-macros.el, you'll see that for
example in `wisent-grammar-FUNCTION-TAG', everything is finalized by
`wisent-raw-tag', which appends the positional information. You either
have to do this in your CALL-TAG macro, or you use the macros provided
by Wisent (like FUNCTION-TAG).

Good luck,
David

Re: [cedet-semantic] Noob wisent grammar/parser question

From: Thomas J. <tja...@gm...> - 2013-10-29 07:13:32

A true rookie mistake :-/ Awesome, thank you!

T



Re: [cedet-semantic] Noob wisent grammar/parser question

From: David E. <de...@ra...> - 2013-10-29 18:45:02

Thomas Järvstrand writes:
> A true rookie mistake :-/

I'm afraid you've left rookie-land quite some time ago when you delved
into the grammar stuff. :-) I had to look this stuff up in the sources
myself, because the Bovine parser appends the location information
automatically...

-David

Re: [cedet-semantic] Noob wisent grammar/parser question

From: Thomas J. <tja...@gm...> - 2013-11-01 09:13:39

Another rookie question.

I'm getting a shift/reduce conflict in my Erlang grammar, similar to the
dangling else problem
<http://www.gnu.org/software/bison/manual/html_node/Shift_002fReduce.html>
in the Bison manual, and I don't seem to be able to solve it using the
semantic precedence declarations.

In Erlang a macro is written as ?atom or ?atom([arguments])

In my grammar this translates to the rule:
macro
  : WHY ATOM PAREN_BLOCK
  | WHY ATOM
  ;

I've tried solving this by changing this to (using %nonassoc because I
couldn't find any evidence of the %precedence declaration existing in
semantic):
%nonassoc PARAMETERIZED-MACRO
%nonassoc MACRO
...
%%
macro
  : WHY ATOM PAREN_BLOCK %prec PARAMETERIZED-MACRO
  | WHY ATOM %prec MACRO
  ;

But I still get a warning for a shift/reduce conflict when compiling the
grammar. What is the correct way of solving this issue?

Thanks,
Thomas


Re: [cedet-semantic] Noob wisent grammar/parser question

From: David E. <de...@ra...> - 2013-11-01 20:54:43

Thomas Järvstrand writes:
> In erlang a macro is written as ?atom or ?atom([arguments])
>
> In my grammar this translates to the rule:
> macro
>   : WHY ATOM PAREN_BLOCK
>   | WHY ATOM
>   ;

First off, even if you get a shift/reduce conflict for this, the default
is to do shift in such a case, so it should still work. Of course, if
the problem is fixable in the grammar, it should be fixed.

> I've tried solving this by changing this to (using %nonassoc because I couldn't
> find any evidence of the %precedence declaration existing in semantic):
> %nonassoc PARAMETERIZED-MACRO
> %nonassoc MACRO

No, %precedence does not exist. However, I think using %nonassoc has
pretty much the same effect. As far as I can see, the difference in
Bison is whether using the operator in an associative way is a run-time
or compile-time error. That being said, I don't think you need to fiddle
with precedence here.

> macro
>   : WHY ATOM PAREN_BLOCK %prec PARAMETERIZED-MACRO
>   | WHY ATOM %prec MACRO
>   ;
>
> But I still get a warning for a shift/reduce conflict when compiling the
> grammar. What is the correct way of solving this issue?

I'd really need to see the full grammar to see the problem (the conflict
might be due to interaction of separate rules), but this problem is
usually dealt with by using an additional rule for an optional argument
which contains an empty match, like

macro
   : WHY ATOM optional-args
   ;

optional-args
   : ;; EMPTY
   | PAREN_BLOCK
     (EXPAND $1 argument-list)
   ;

argument-list
   : ... deal with open/close-paren and list of arguments ...
   ;

You should find many examples like this in other grammars, like in c.by
the optional initialization of variables:

 varname-opt-initializer
   : semantic-list
   | opt-assign
   | ;; EMPTY
   ;

or in java.wy you'll find lots of non-terminals ending in '_opt'.

-David
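
A hand-written C analogue of the factored rule above may make the idea
concrete. This is only a sketch, not Wisent code: the token names mirror
the terminals WHY, ATOM and PAREN_BLOCK from the thread, while enum token,
TOK_END, parse_optional_args and parse_macro are hypothetical names
invented for illustration. The optional part lives in one function that
may match nothing, so the macro rule itself has a single alternative.

```c
#include <assert.h>
#include <stddef.h>

/* Token kinds named after the grammar's terminals.
   TOK_END is an illustrative end-of-input marker (an assumption). */
enum token { TOK_WHY, TOK_ATOM, TOK_PAREN_BLOCK, TOK_END };

/* optional-args : EMPTY | PAREN_BLOCK
   Returns the number of tokens consumed: 1 for a block, 0 for the
   empty match. */
static size_t parse_optional_args(const enum token *toks)
{
    assert(toks != NULL);
    return toks[0] == TOK_PAREN_BLOCK ? 1 : 0;
}

/* macro : WHY ATOM optional-args
   Returns the number of tokens consumed, or 0 if the input does not
   start with a macro. The array must be terminated by TOK_END. */
static size_t parse_macro(const enum token *toks)
{
    assert(toks != NULL);
    if (toks[0] != TOK_WHY || toks[1] != TOK_ATOM)
        return 0;
    return 2 + parse_optional_args(toks + 2);
}
```

Both ?atom and ?atom(...) go through the same parse_macro path; the only
decision point is inside parse_optional_args.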

Re: [cedet-semantic] Noob wisent grammar/parser question

From: Thomas J. <tja...@gm...> - 2013-11-02 11:54:29

Yeah, it turns out that it was the way I was using the rule that was
causing the warning. The problem is that depending on the macro definitions
that are present ?foo(bar) can mean either "replace this expression with
the value of the parameterized macro foo(a)" or "replace foo with the value
of the unparameterized macro foo and call the result as a function with the
argument bar". I guess I'm going to have to build a pre-processor :-/

T
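
The ambiguity described above can be sketched as a lookup against the
macro definitions known at that point. Everything in this C sketch is
hypothetical (struct macro_def, enum macro_use, classify_macro_use), and
preferring the parameterized definition when both forms exist is an
assumption of the sketch, not a statement about the Erlang preprocessor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* How "?name(args)" should be read, given the known definitions. */
enum macro_use {
    MACRO_UNDEFINED,        /* no definition seen for this name */
    MACRO_TAKES_ARGS,       /* parameterized: the parens belong to the macro */
    MACRO_RESULT_IS_CALLED  /* unparameterized: expand, then call the result */
};

struct macro_def {
    const char *name;
    int has_params; /* 1 for -define(foo(X), ...), 0 for -define(foo, ...) */
};

static enum macro_use classify_macro_use(const struct macro_def *defs,
                                         size_t count, const char *name)
{
    enum macro_use result = MACRO_UNDEFINED;

    assert(name != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(defs[i].name, name) != 0)
            continue;
        if (defs[i].has_params)
            return MACRO_TAKES_ARGS; /* assumption: parameterized form wins */
        result = MACRO_RESULT_IS_CALLED;
    }
    return result;
}
```

The MACRO_UNDEFINED case is the one that matters for a parser that has
not seen the header files: without the definitions, the two readings
cannot be told apart.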



Re: [cedet-semantic] Noob wisent grammar/parser question

From: David E. <de...@ra...> - 2013-11-02 14:54:26

Thomas Järvstrand writes:
> Yeah, it turns out that it was the way I was using the rule that was causing
> the warning. The problem is that depending on the macro definitions that are
> present ?foo(bar) can mean either "replace this expression with the value of
> the parameterized macro foo(a)" or "replace foo with the value of the
> unparameterized macro foo and call the result as a function with the argument
> bar". I guess I'm going to have to build a pre-processor :-/

Aah, the joys of pre-processing. So if I understand you correctly, the
problem you have is equivalent to this in C/C++:

#define THEFUNC1 some_func
#define THEFUNC2(x) 5*(x)

a = THEFUNC1(3)  // --> a = some_func(3)
b = THEFUNC2(3)  // --> b = 5*(3)

Is this a correct description of your problem? If so, then Semantic can
already deal with this through lex-spp, which does preprocessing as part
of the lexing process. If you look at the definition of the C lexer,
you'll see that it includes special lexers like
`semantic-lex-cpp-define', which parses #define macro definitions, and
`semantic-lex-spp-replace-or-symbol-or-keyword', which checks whether a
symbol is actually a macro, expands it in-place and returns the correct
lexical tokens. You can try it out by putting point on the 'a' and running
semantic-lex-test:

((symbol 55 . 56)
 (punctuation 57 . 58)
 (symbol "some_func" 59 . 67)
 (semantic-list 67 . 70)
 (symbol 96 . 97)
 (punctuation 98 . 99)
 (number "5" 100 . 111)
 (punctuation "*" 100 . 111)
 (semantic-list
  #("(x)" 0 1
    (macros
     (("x" number "3" 109 . 110))))
  100 . 111))

Handling the C/C++-preprocessor at the lexing stage is surprisingly
complex, which is why lex-spp is a large package; it might be overkill
for your use-case. But even if you don't end up using it, you might see
some hints there on how to deal with this problem.

Another possibility would be to not deal with this at the lexing stage,
but do it afterwards in the tag expansion. For instance, look at how
Semantic makes several tags out of things like

int a,b=5,c=3;

in semantic-expand-c-tag. This function could also do macro
expansion. It's really a matter of what would fit Erlang better (which I
know nothing of).

-David
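
The two macro forms in the C example above can be checked with a small
compilable sketch. Only the two #define lines come from the thread;
some_func is a hypothetical stand-in (it just adds one) and the two
wrapper functions are invented for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the function that THEFUNC1 names. */
static int some_func(int x)
{
    return x + 1;
}

#define THEFUNC1 some_func  /* object-like: expands to a function name */
#define THEFUNC2(x) 5*(x)   /* function-like: expands using its argument */

/* THEFUNC1(3) expands to some_func(3): the parentheses are an ordinary
   call applied to the result of the expansion. */
static int call_via_object_macro(void)
{
    return THEFUNC1(3);
}

/* THEFUNC2(3) expands to 5*(3): the parentheses are consumed by the
   macro itself. */
static int call_via_function_macro(void)
{
    return THEFUNC2(3);
}
```

The source text of the two uses looks identical in shape, NAME(3), yet
the meaning of the parentheses depends entirely on how NAME was defined.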

Re: [cedet-semantic] Noob wisent grammar/parser question

From: Thomas J. <tja...@gm...> - 2013-11-03 20:42:59

Thanks for the extensive reply! Yes, your example captures the gist of it.
Unfortunately I think macro expansion during lexing would be impractical
because macros are usually defined in header files that are often included
by using a path that is dependent on the compile-time environment. For
example, EUnit (Erlang's xUnit-style framework) has assertion macros that are
normally included with:
-include_lib("eunit/include/eunit.hrl").

The compiler then expects to find something on its path named eunit or
eunit-<version> which must contain the include/eunit.hrl file.

I will have to think some more on it. As a start I'm going to use the
parsing to figure out the arity of a function call and for that it's enough
to be able to recognize ?foo(a)(b) as a single argument.

Thanks
Thomas

