Thread: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

cedet-semantic

[cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: Stephen L. <ste...@st...> - 2012-10-06 20:23:30

I'm rewriting the indentation engine for Emacs Ada mode. The trigger was
the recent sytax introduced by Ada 2012; it breaks the old indentation
engine.

I've got a smie-based engine working fairly well, but it has a major
problem; there are some keywords that can only be properly recognized by
doing a full parse from the start of the file.

So I'm looking at semantic, which could be a better parser for that
style.

First, is anyone aware of an indentation engine based on semantic? A
working example would be a big help.

For an indentation engine, I need to cache information for each found
token; mainly where the start of the enclosing statement is.

I'm having a hard time getting started; I can't find the main parsing
loop, to see what hooks are available for caching things.

I found the top level function 'bovinate'; when I run it on a python
buffer (for example), it outputs semantic information; that looks very
promising. However, I can't trace via edebug down into
`semantic-parse-region'; that is defined by define-overloadable-function. 

I don't see where python-mode defines an overload for that, and I also
don't see where the default is defined.

How do I find the local binding for an overloadable function? I tried
(mode-local-symbol 'semantic-parse-region) in the python buffer; that
returned nil.

-- 
-- Stephe

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: Stephen L. <ste...@st...> - 2012-10-06 21:14:53

Stephen Leake <ste...@st...> writes:

> I found the top level function 'bovinate'; when I run it on a python
> buffer (for example), it outputs semantic information; that looks very
> promising. However, I can't trace via edebug down into
> `semantic-parse-region'; that is defined by define-overloadable-function. 
>
> I don't see where python-mode defines an overload for that, and I also
> don't see where the default is defined.

Well, I figured out this part; the default of an overloaded symbol is
<name>-default.

So I found the core loop for semantic-parse-region; I'll have to study
it some more.

Next question: how do I compile a wisent grammar file in Emacs 24.2?

I'd like to build a very simple grammar for a small subset of Ada, so I
can play with it interactively and see if I can do what I need for an
indentation engine.

In Emacs 23, there was a file '../etc/grammars/wisent-grammar.el' that
would read a .wy file and produce a -wy.el file. I don't see the
equivalent in Emacs 24.2.

Do I need to download something from sourceforge?

-- 
-- Stephe

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: Stephen L. <ste...@st...> - 2012-10-08 23:20:00

Attachments: wisent-ada-simple.wy

David Engster <de...@ra...> writes:

> Stephen Leake writes:
>> David Engster <de...@ra...> writes:
>
>>> Also, IMO there are too many indentation issues that are more a matter
>>> of taste; just look at how many comment-styles the cc-mode indentation
>>> engine supports.
>>
>> Yes. That's not a problem in Ada; the Ada community is pretty uniform,
>> so a _small_ choice of styles is adequate. GPS (the AdaCore IDE) only
>> has about 4 settings for indentation!
>
> That must be a nice community to be in. ;-)

Yes, it is. It's also a great language to write code in. In only Google
appreciated that, writing code for Android would be so much easier :(.

>> Back to semantic; I have a very small grammar written, compiled, and
>> running. Here's the entire rules part:
>
> Could you please send the entire .wy file? That will it make it much
> easier for me to test.

Attached.

> I'm guessing you still need to add some boilerplate, for example you
> have to tell Semantic what kind of Lexer it should use for Ada
> buffers. 

There is an Analyzers section in the -wy.el file output by
semantic-grammar-batch-build-packages, and I see appropriate tokens when
I step thru bovinate.

> It's been a while since I've written a language mode myself, so I'm a
> bit rusty in that area. Have a look at the small examples like
> wisent-calc.el. 

Doesn't look significantly different to me, but I'm clearly missing
something. I'll try running that with bovinate, as well.

-- 
-- Stephe

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: Stephen L. <ste...@st...> - 2012-10-09 05:00:46

Stephen Leake <ste...@st...> writes:

> David Engster <de...@ra...> writes:
>
>> It's been a while since I've written a language mode myself, so I'm a
>> bit rusty in that area. Have a look at the small examples like
>> wisent-calc.el.
>
> Doesn't look significantly different to me, but I'm clearly missing
> something. I'll try running that with bovinate, as well.

Did some experimenting. wisent-calc.el defines the main function
wisent-calc, which calls semantic/wisent/wisent.el wisent-parse to parse
the string. That code looks like the LALR parser I was expecting, and I
can see it executing the actions.

But that's plain wisent, not semantic on top of wisent.

I think I need to use semantic, eventually?

I managed to configure a buffer containing calc source (just "1 + 1;")
to do something when I call bovinate on it; it calls sematic.el
semantic-repeat-parse-whole-stream. But it puts no tags in the *Parser
Output* buffer. Stepping thru semantic-repeat-parse-whole-stream, I
don't see any lexer tokens. So I guess that lexer is not set properly
(which you suggested). 

wisent-calc calls (semantic-lex-init), which I did not have in my Ada
buffer init. Calling that in my "calc buffer" doesn't help.

semantic/semantic.el bovinate calls semantic-fetch-tags, which calls
semantic-parse-region, which is an overloadable function, which I still
have not figured out.

How do I find out what the overlaodable function actually calls? I can't
step into it in edebug, and (mode-local-symbol 'semantic-parse-region)
in an Ada buffer returns nil!

Which I guess means "run the default", which is
semantic-parse-region-default, which eventually runs
semantic-repeat-parse-whole-stream, which calls semantic-parse-stream
(another overloadable), which for Ada is semantic-parse-stream-default,
which is semantic-bovine.el semantic-bovinate-stream.

That code looks like a parser, although it's got some features I don't
understand, and it doesn't look like an LALR parser. It does have code
that seems to evaluate actions, but that never gets triggered on my Ada
code.

Is there some documentation I can read that explains how semantic is
supposed to work? It's clearly very flexible/customizable, but that also
makes it hard to understand.

I guess I can write my top level indent code to call wisent.el
wisent-parse, at least for now, so I can play with the grammar and the
indentation algorithm design.

Running semantic-grammar-batch-build-packages on wisent-calc.wy produces
the same wisent-calc-wy.el file that's in the cedet tarball. The main
difference (as far as I can tell) from my Ada file is the Epilogue, so
apparently I need to specify an Epilogue. But nothing in that file seems
to point to wisent.el wisent-parse; that's simply called directly by
wisent-calc.

--
-- Stephe

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: David E. <de...@ra...> - 2012-10-18 15:03:06

Eric M. Ludlam writes:
>    Performance, as David pointed out, is a problem, which is why most 
> grammars are just tagging grammars.  The c++ parser is particularly slow 
> even though it is just a tagging parser, but that is because it uses the 
> 'bovine' parser.  The wisent parser is much better.  There are two java 
> parsers, one is a tagging parser, and the other parsers the whole file, 
> and is still quite  competitive with the tagging only version.

I must admit that even after all those years, the Semantic grammar
framework is confusing to me. This surely has to do with the fact that
I'm no computer science guy, so I never *properly* had to learn that
stuff. ;-)

Anyway, I think the difference between Bovine and Wisent should be
documented better in the docs. Here's what (I think) I know:

- Bovine is a LL parser. Wisent is a LALR parser.
- This entails:
  - Bovine is top-bottom, while Wisent is bottom-up
  - Bovine uses a predict/match algorithm, while Wisent uses shift/reduce

What really confuses me is the fact that the grammars fed into those
parsers look pretty much the same to me. Yes, there are some minor
issues, like Wisent having several %start rules and some changes in the
lexing part, but otherwise I don't see much difference.

Also, I learned that every LL grammar is a LR grammar (but not the other
way round). So I'm wondering: if Wisent is so much more efficient, can't
we somehow convert Bovine grammars to Wisent? I'm guessing we cannot,
but I'd like to know why. :-)

-David

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: Stephen L. <ste...@st...> - 2012-10-06 22:46:24

Stephen Leake <ste...@st...> writes:

> Next question: how do I compile a wisent grammar file in Emacs 24.2?
>
> I'd like to build a very simple grammar for a small subset of Ada, so I
> can play with it interactively and see if I can do what I need for an
> indentation engine.
>
> In Emacs 23, there was a file '../etc/grammars/wisent-grammar.el' that
> would read a .wy file and produce a -wy.el file. I don't see the
> equivalent in Emacs 24.2.
>
> Do I need to download something from sourceforge?

So apparently the answer to that is "yes"; in cedet-1.1/semantic/wisent
there is a Makefile that compiles .wy files. Copied bits of that to my
Makefile, and now I have a file wisent-ada-simple-wy.el.

Which defines a function wisent-ada-simple-wy--install-parser, but I'm
having trouble getting that to run.

I think it may be a compatibility issue between cedet-1.1 and Emacs 24;
I'll try cedet-1.0.1 (tomorrow :)

-- 
-- Stephe

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: David E. <de...@ra...> - 2012-10-07 13:43:42

Stephen Leake writes:
> Stephen Leake <ste...@st...> writes:
>
>> Next question: how do I compile a wisent grammar file in Emacs 24.2?
>>
>> I'd like to build a very simple grammar for a small subset of Ada, so I
>> can play with it interactively and see if I can do what I need for an
>> indentation engine.
>>
>> In Emacs 23, there was a file '../etc/grammars/wisent-grammar.el' that
>> would read a .wy file and produce a -wy.el file. I don't see the
>> equivalent in Emacs 24.2.
>>
>> Do I need to download something from sourceforge?
>
> So apparently the answer to that is "yes"; in cedet-1.1/semantic/wisent
> there is a Makefile that compiles .wy files. Copied bits of that to my
> Makefile, and now I have a file wisent-ada-simple-wy.el.
>
> Which defines a function wisent-ada-simple-wy--install-parser, but I'm
> having trouble getting that to run.
>
> I think it may be a compatibility issue between cedet-1.1 and Emacs 24;
> I'll try cedet-1.0.1 (tomorrow :)

I'm afraid you're going the wrong way. :-)

Emacs had the necessary grammar framework in admin/grammars since 23.4,
I think. Those packages were just upgraded in Emacs-bzr to first-class
citizens and moved to lisp/cedet/, so wisent-grammar-mode should kick in
as soon as you load a .wy file. You can then generate the parser by
hitting C-c C-c. If it doesn't work, please let me know (but make sure
you're using latest Emacs from bzr).

You can also use CEDET from bzr, but that shouldn't be necessary anymore
for doing grammar work.

Regarding using Wisent for indentation: no one has done that yet. In
fact, our parsers don't look at every line of code. Not even close; it
would just be way too slow. Instead, we are only parsing pretty large
semantical chunks like functions, classes, etc. and only look inside
them if we have to (for getting a list of local variables, for
instance).

On a meta-note: I've always been doubtful about using grammars for
indentation. Speed is just one problem; the real issue is that *while*
code is written, you are very often confronted with structures the
grammar considers illegal and hence cannot parse. Also, IMO there are
too many indentation issues that are more a matter of taste; just look
at how many comment-styles the cc-mode indentation engine supports.

Anyway, I'd love to be proven wrong (and maybe SMIE already does, I
haven't tried it yet). BTW, a good read on indentation are Steve Yegges
experiences while writing js2-mode:

http://steve-yegge.blogspot.de/2008/03/js2-mode-new-javascript-mode-for-emacs.html

-David

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: Stephen L. <ste...@st...> - 2012-10-07 22:19:38

David Engster <de...@ra...> writes:

> Stephen Leake writes:
>>
>> I think it may be a compatibility issue between cedet-1.1 and Emacs 24;
>> I'll try cedet-1.0.1 (tomorrow :)
>
> I'm afraid you're going the wrong way. :-)
>
> Emacs had the necessary grammar framework in admin/grammars since 23.4,
> I think. Those packages were just upgraded in Emacs-bzr to first-class
> citizens and moved to lisp/cedet/, so wisent-grammar-mode should kick in
> as soon as you load a .wy file. You can then generate the parser by
> hitting C-c C-c. If it doesn't work, please let me know (but make sure
> you're using latest Emacs from bzr).
>
> You can also use CEDET from bzr, but that shouldn't be necessary anymore
> for doing grammar work.

Ok. I'd rather not mess with building from trunk (I have enough else to
do!). 

I managed to get things running with CEDET 1.1 and Emacs 24, by putting
the CEDET tree at the front of load-path. That will do for now; if this
project ever gets finished, I'll worry about merging to trunk.

> Regarding using Wisent for indentation: no one has done that yet. 

I was afraid so.

> In fact, our parsers don't look at every line of code. Not even close;
> it would just be way too slow. 

Do you have recent actual numbers? I'm always wary of premature
optimization in cases like this.

Which is why I'd like to get a semantic Ada parser working, so I can
measure it, and compare it to the equivalent SMIE parser.

At the same time, it's good to hear that the grammar does not have to be
exactly the same as the Ada grammar; I can leave out things that don't
affect indentation; I'm also doing that with SMIE, and it simplifies
things a lot.

> On a meta-note: I've always been doubtful about using grammars for
> indentation. 

I've been maintaining Emacs Ada mode for a while now, and every time
someone reports a bug or wish for the indentation engine, I get the urge
to rewrite it using a grammar-based approach :). So I'm finally giving
in to that.

> Speed is just one problem; the real issue is that *while* code is
> written, you are very often confronted with structures the grammar
> considers illegal and hence cannot parse. 

I'll have to include partially written code in my tests. My philosophy
here is that as long as the indentation engine doesn't crash or hang,
the bad indentation reminds you that the code is not yet legal; I find
that helpful.

In addition, one advantage of the SMIE parser is that it is mostly
local, so illegal code far away tends not to affect it.

And, as I said above, I simply leave out large parts of the grammar,
that don't affect indentation. So it just doesn't see illegalities in
those part of the grammar.

So far, I'm much happier with the SMIE approach than the former ad-hoc
approach! 

Not sure about semantic yet; I haven't got a parser working (more
below).

> Also, IMO there are too many indentation issues that are more a matter
> of taste; just look at how many comment-styles the cc-mode indentation
> engine supports.

Yes. That's not a problem in Ada; the Ada community is pretty uniform,
so a _small_ choice of styles is adequate. GPS (the AdaCore IDE) only
has about 4 settings for indentation!

> Anyway, I'd love to be proven wrong (and maybe SMIE already does, I
> haven't tried it yet). BTW, a good read on indentation are Steve Yegges
> experiences while writing js2-mode:
>
> http://steve-yegge.blogspot.de/2008/03/js2-mode-new-javascript-mode-for-emacs.html

That is interesting. He implies that most of the grammar is useless,
which I agree with. I suspect that a grammar approach to Ada is more
fruitful, because Ada is a much more structured language. SMIE does take
advantage of parse-partial-sexp when it can.

And the actual trigger for me to give semantic a serious try was the
post from Stefan Monnier pointing out that I was using the SMIE parser
to do a full parse from the beginning of the buffer, so I might as well
use an LALR parser. Which is semantic. If it works, it will reduce the
amount of ad-hoc code in my SMIE ada-indent engine (it does seem very
hard to avoid ad-hoc code!).

Back to semantic; I have a very small grammar written, compiled, and
running. Here's the entire rules part:

package_body
  : PACKAGE BODY name IS declarative_part BEGIN statement END name SEMICOLON
    (TAG $3 'block)
  ;

name
  : IDENTIFIER
  | name DOT IDENTIFIER
    (TAG $1 'name)
  ;

declarative_part
  : subprogram_body
  ;

subprogram_body
  : FUNCTION name RETURN name IS BEGIN statement END name SEMICOLON
    (TAG $2 'block)
  ;

statement
  : RETURN NUMERIC_LITERAL SEMICOLON
    (TAG $2 'return)
  ;

And the test text that it parses:

package body Ada_Mode.Nominal is

   function Function_1 return Integer
   is begin
      return 1;
   end Function_1;

begin
   Function_1;
end Ada_Mode.Nominal;

I'm running "bovinate" to test this. But I never see any tags in the
bovinate output. 

I've stepped thru semantic-repeat-parse-whole-stream, and the 'tag'
variable is always nil.

I'm expecting to see 'name, 'block, 'return when the corresponding
non-terminals are reduced. Is that what I should expect?

Any hints?

-- 
-- Stephe

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: David E. <de...@ra...> - 2012-10-08 15:18:47

Stephen Leake writes:
> David Engster <de...@ra...> writes:
>> In fact, our parsers don't look at every line of code. Not even close;
>> it would just be way too slow. 
>
> Do you have recent actual numbers? I'm always wary of premature
> optimization in cases like this.

Well, let's say it this way: people often complain about CEDET being too
slow. Parsing a big, complicated C++ file can easily take seconds. But
that's C++, where especially the pre-processor can make things rather
difficult.

>> Speed is just one problem; the real issue is that *while* code is
>> written, you are very often confronted with structures the grammar
>> considers illegal and hence cannot parse. 
>
> I'll have to include partially written code in my tests. My philosophy
> here is that as long as the indentation engine doesn't crash or hang,
> the bad indentation reminds you that the code is not yet legal; I find
> that helpful.

Sounds reasonable. Maybe this is not a big issue for indentation; for
smart completions or stuff like Flymake this is a pretty big problem.

>> Also, IMO there are too many indentation issues that are more a matter
>> of taste; just look at how many comment-styles the cc-mode indentation
>> engine supports.
>
> Yes. That's not a problem in Ada; the Ada community is pretty uniform,
> so a _small_ choice of styles is adequate. GPS (the AdaCore IDE) only
> has about 4 settings for indentation!

That must be a nice community to be in. ;-)

> Back to semantic; I have a very small grammar written, compiled, and
> running. Here's the entire rules part:

Could you please send the entire .wy file? That will it make it much
easier for me to test.

> I'm running "bovinate" to test this. But I never see any tags in the
> bovinate output. 
>
> I've stepped thru semantic-repeat-parse-whole-stream, and the 'tag'
> variable is always nil.
>
> I'm expecting to see 'name, 'block, 'return when the corresponding
> non-terminals are reduced. Is that what I should expect?

I'm guessing you still need to add some boilerplate, for example you
have to tell Semantic what kind of Lexer it should use for Ada
buffers. It's been a while since I've written a language mode myself, so
I'm a bit rusty in that area. Have a look at the small examples like
wisent-calc.el. If you send me the complete .wy file, I can take a shot
at getting it work.

-David

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: David E. <de...@ra...> - 2012-10-09 20:59:56

Stephen Leake writes:
>> I'm guessing you still need to add some boilerplate, for example you
>> have to tell Semantic what kind of Lexer it should use for Ada
>> buffers. 
>
> There is an Analyzers section in the -wy.el file output by
> semantic-grammar-batch-build-packages, and I see appropriate tokens when
> I step thru bovinate.

I didn't have much time this evening, but I attached a wisent-ada.el
file which contains some boilerplate to hopefully get you started. I
think most of your current problems stem from the lexing part. As you
can see in the generated parser, there are lexers for keywords, symbols,
etc. defined for you, but you still have to put those in the right
positions in the definition of the lexer (for example, do not put the
symbol lexer before the keyword lexer). This is done in the attached
wisent-ada.el file (see the `define-lex' command). BTW, I know nothing
about Ada, so this lexer surely isn't the right thing yet.

You can test the lexer by going to the beginning of the buffer and then
call `semantic-lex-test'. It should output

((PACKAGE 1 . 8)
 (BODY 9 . 13)
 (IDENTIFIER 14 . 22)
 (DOT 22 . 23)
 (IDENTIFIER 23 . 30)
 (IS 31 . 33)
 (FUNCTION 38 . 46)
 ....

and so on. Note that it has to output IDENTIFIER instead of just
'symbol', otherwise it doesn't work.

I'm still not getting any tags with your grammar because some rule does
not properly match. Especially at the beginning, writing Semantic
grammars can be pretty tedious. Unfortunately, Wisent does not have a
step debugger (Bovine has one), hence a good strategy is to start with
very, *very* simple rules. For example, just start with

compilation_unit
  : package_body
  ;

package_body
  : PACKAGE BODY IDENTIFIER
    (TAG $3 'block)
  ;

and try to parse "package body foobar". This at least works for me:

(("foobar" block nil
  (:filename "/home/void/test.ada")
  #<overlay from 1 to 20 in test.ada>))

Then slowly make things more complicated. If you get stuck, remember
that you can always call regular Elisp functions from the grammar. For
instance, in the above example, you can write

package_body
  : PACKAGE BODY IDENTIFIER
    (debug $3 'block)
  ;

If you don't get a backtrace, then obviously the rule does not match and
you have to investigate why. If you get a backtrace, you can then look
at the argument list and see what '$3' actually is.

So much for now; I hope that this helps you a bit.

-David

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: David E. <de...@ra...> - 2012-10-09 21:01:44

Attachments: wisent-ada.el

David Engster writes:
> but I attached a wisent-ada.el file

No I did not. Classic. :-)

-David

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: Stephen L. <ste...@st...> - 2012-10-12 09:05:04

David Engster <de...@ra...> writes:

> Stephen Leake writes:
>>> I'm guessing you still need to add some boilerplate, for example you
>>> have to tell Semantic what kind of Lexer it should use for Ada
>>> buffers. 
>>
>> There is an Analyzers section in the -wy.el file output by
>> semantic-grammar-batch-build-packages, and I see appropriate tokens when
>> I step thru bovinate.
>
> I didn't have much time this evening, but I attached a wisent-ada.el
> file which contains some boilerplate to hopefully get you started. 

<snip>

Thanks. I haven't had a chance to look at this yet; I went down another
path (see my other post today).


-- 
-- Stephe

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: Eric M. L. <eri...@gm...> - 2012-10-12 01:42:17

Hi Stephen & David

   Sorry about being slow to reply.  Punkin Chunkin season is upon me, 
and keeps me distracted.

   David is right about using semantic for indentation.  There are no 
examples, and no good patterns to follow.  I've thought about this 
problem on and off for a long time but I've been too focused on getting 
the rest of the system working well to try it out.

   Performance, as David pointed out, is a problem, which is why most 
grammars are just tagging grammars.  The c++ parser is particularly slow 
even though it is just a tagging parser, but that is because it uses the 
'bovine' parser.  The wisent parser is much better.  There are two java 
parsers, one is a tagging parser, and the other parsers the whole file, 
and is still quite  competitive with the tagging only version.

   My conclusion has been that an indentation parser would start with a 
tagging parser, exactly as Stephen is starting out.  Step one, make that 
work.  Once that works, you can create additional goals for function 
bodies, and use the partial reparse engine, or something like it to 
start at the beginning of your function, block, or other solid anchor 
point, and parse up to the cursor.  Wherever it stops, you would then 
determine indentation from that.  How you get that data when it stops, I 
have no idea, but that's been my guess so far, and might not be too bad 
performance wise.

   The attachment with the example code didn't extract for me so I 
didn't look at it.  The most important bit to get your parser running is 
the setup function.  For example, wisent-java-default-setup is in the 
java-mode-hook.  That function installs the parser and sets up lots of 
customizations needed for the java language.  That function can also be 
quite simple, such as wisent-dot-setup-parser in cogre/wisent-dot.el.

   After re-creating you grammar, you often have to eval the newly 
created variables in the -wy.el file by hand w/ C-M-x because they are 
defvar, and just reloading the file won't work.  It is a pain, but it 
will force things to work.

   You can use `semantic-describe-buffer' to make sure key overrides 
have been set in your buffer.

   Use semantic-lex-debug to make sure the lexer is creating the tokens 
your grammar expects.

   Use wisent-debug-on-entry in a rule to make sure your actions are 
being called when you expect.  Sticking calls to (message ...) is also 
an easy way to make sure your actions are called.

Good Luck
Eric

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: Stephen L. <ste...@st...> - 2012-10-12 09:02:48

"Eric M. Ludlam" <eri...@gm...> writes:

>   Sorry about being slow to reply.  Punkin Chunkin season is upon me,
> and keeps me distracted.

<snip>

Thanks for the advice. I must admit this project has me a little
obsessed; my real work is suffering :)

I'm currently experimenting with using the Wisent parser, and a lexer
built directly on the Emacs C scanning functions forward-comment and
skip-syntax-forward. That's the lexer SMIE uses, and it's quite fast,
and certainly adequate for the task - everything is either a keyword or
an identifier, I don't need any more detailed info than that for
indentation. I don't see any fundamental reason why an LALR parser
should be slower than an operator precedence parser, or at least, not
enough slower to matter. In both cases, I end up parsing large portions
of the text (whole functions, anyway), in order to compute indentation,
and the SMIE parser is more than fast enough.

I'm using the wisent parser because I understand it, and because it
seemed easier to bypass the semantic lexer that way. Is there any
documentation on the semantic parser? 

It works quite nicely for a very simple subset of the Ada syntax. But
when I try to expand it, the wisent grammar compiler complains that all
the rules and nonterminals are useless. Apparently it sees some conflict
I don't see. So I wrote the same grammar in OpenToken
(http://stephe-leake.org/ada/opentoken.html, an LALR parser generator
written in Ada that I maintain), because it has lots of nice diagnostics
to help debug grammar issues, and it does not complain, but generates a
parser that works. So now I'm puzzled.

There is a strong motivation to use an LALR parser for indentation; I
can encode most of the information needed for indenting directly in the
grammar actions. For example:

package_body
  : PACKAGE BODY name IS declarations BEGIN statements END name SEMICOLON
    (wisi-cache-keywords
     '($1 'other nil $region1)
     '($4 'block-start nil $region4)
     '($6 'block-both nil $region6)
     '($8 'block-end nil $region8)
     '($10 'statement-end nil $region10))
  ;

In SMIE, that same information is embedded in ad-hoc code to "refine"
the tokens.

So the actual indentation engine code is quite simple; _much_ simpler,
and _much_ less language-specific, than in SMIE. So the final system
will be easier to maintain (because it is more closely related to the
language grammar), and _much_ easier to port to a new language.

So I'll keep working at it. I may try writing Ada code to dump the
OpenToken parser table as a wisent lisp parser table. That will be
easier for me than figuring out what's going on in the Wisent grammar
compiler. 

-- 
-- Stephe

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: Eric L. <eri...@gm...> - 2012-10-19 03:02:21

  On 10/18/2012 11:02 AM, David Engster wrote:
> Eric M. Ludlam writes:
>>     Performance, as David pointed out, is a problem, which is why most
>> grammars are just tagging grammars.  The c++ parser is particularly slow
>> even though it is just a tagging parser, but that is because it uses the
>> 'bovine' parser.  The wisent parser is much better.  There are two java
>> parsers, one is a tagging parser, and the other parsers the whole file,
>> and is still quite  competitive with the tagging only version.
> I must admit that even after all those years, the Semantic grammar
> framework is confusing to me. This surely has to do with the fact that
> I'm no computer science guy, so I never *properly* had to learn that
> stuff. ;-)
>
> Anyway, I think the difference between Bovine and Wisent should be
> documented better in the docs. Here's what (I think) I know:
>
> - Bovine is a LL parser. Wisent is a LALR parser.
> - This entails:
>    - Bovine is top-bottom, while Wisent is bottom-up
>    - Bovine uses a predict/match algorithm, while Wisent uses shift/reduce
>
> What really confuses me is the fact that the grammars fed into those
> parsers look pretty much the same to me. Yes, there are some minor
> issues, like Wisent having several %start rules and some changes in the
> lexing part, but otherwise I don't see much difference.
>
> Also, I learned that every LL grammar is a LR grammar (but not the other
> way round). So I'm wondering: if Wisent is so much more efficient, can't
> we somehow convert Bovine grammars to Wisent? I'm guessing we cannot,
> but I'd like to know why. :-)
>

It hadn't occurred to me to try to port bovine grammars to wisent.  
Early on, that would have been very challenging, but when David Ponce 
merged the two language types together, it should be much easier now.

That would be a big win to do that, since one whole style of parser 
generator could be eliminated.  Sadly, I'd have to relearn all the 
specifics again to explain the differences.  I seem to remember one 
difference was related to how the actions and tag quoting worked, but I 
don't really remember.

Eric

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: David E. <de...@ra...> - 2012-10-19 06:26:52

Eric Ludlam writes:
> It hadn't occurred to me to try to port bovine grammars to wisent.
> Early on, that would have been very challenging, but when David Ponce
> merged the two language types together, it should be much easier now.
>
> That would be a big win to do that, since one whole style of parser
> generator could be eliminated.  Sadly, I'd have to relearn all the
> specifics again to explain the differences.  I seem to remember one
> difference was related to how the actions and tag quoting worked, but
> I don't really remember.

Yes, I saw some differences there, but I think - while tedious - it's
not difficult to convert that. One could probably write a converter
which does of most of that stuff.

I also thing that would be a big improvement, since we could drop Bovine
and would probably gain some speed in the parsing department. Maybe I'll
just try it with my Fortran90 tagging parser and see how difficult it
actually is. OTOH, there are currently more urgent things to do (but
much less interesting)...

-David

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: David E. <de...@ra...> - 2012-10-21 20:14:26

David Engster writes:
> Eric Ludlam writes:
>> It hadn't occurred to me to try to port bovine grammars to wisent.
>> Early on, that would have been very challenging, but when David Ponce
>> merged the two language types together, it should be much easier now.
>>
>> That would be a big win to do that, since one whole style of parser
>> generator could be eliminated.  Sadly, I'd have to relearn all the
>> specifics again to explain the differences.  I seem to remember one
>> difference was related to how the actions and tag quoting worked, but
>> I don't really remember.
>
> Yes, I saw some differences there, but I think - while tedious - it's
> not difficult to convert that. One could probably write a converter
> which does of most of that stuff.

I looked into it a bit, and it is definitely doable. Most of the
differences I've seen so far are in the lexing part and in how lambda
expressions have to be quoted.

The F90 tagging parser worked pretty much after 30 minutes. I then
turned to the C grammar, and here things don't go as smoothly. The main
problem is that it is nigh impossible to debug the actual parser. Even
if we had a stepping debugger for Wisent, it actually wouldn't help
much. I mean, I *can* output the shifts and reduces the parser is doing,
and it also outputs what token it would expect next to match "some
rule", but it does not say which. All it has are these magic tables
which tell the parser what to do next; it's just so removed from the
actual grammar. I can see now why people are still fond of LL parsing,
it's way easier to step through the rules. Anyway, I'll try a bit
more. I have a hunch it's a just little thing that's not working yet.

-David

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: David E. <de...@ra...> - 2012-10-21 21:24:36

David Engster writes:
> The F90 tagging parser worked pretty much after 30 minutes. I then
> turned to the C grammar, and here things don't go as smoothly.

What is really strange is that my c.wy produces oodles of shift/reduce
and reduce/reduce conflicts. So I guess there still must be something
I'm missing, but I cannot figure out why I get these ambiguities in the
grammar.

-David

Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started

From: David E. <de...@ra...> - 2012-10-22 18:02:24

David Engster writes:
> David Engster writes:
>> The F90 tagging parser worked pretty much after 30 minutes. I then
>> turned to the C grammar, and here things don't go as smoothly.
>
> What is really strange is that my c.wy produces oodles of shift/reduce
> and reduce/reduce conflicts. So I guess there still must be something
> I'm missing, but I cannot figure out why I get these ambiguities in the
> grammar.

I removed the lambda expressions and send the grammar through Bison to
get a more verbose output. Indeed, the grammar has several ambiguities.
I think I could remove the reduce/reduce conflicts, but then there are
also a few dozen(!) shift/reduce errors... so I'd rather not go further
with this. It's a shame, I'd really like to now how much faster a LALR
parser would be. Also, I'm wondering why we even get a working LL parser
from that grammar.

-David