Re: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started


David Engster <de...@ra...> writes:

> Stephen Leake writes:
>>
>> I think it may be a compatibility issue between cedet-1.1 and Emacs 24;
>> I'll try cedet-1.0.1 (tomorrow :)
>
> I'm afraid you're going the wrong way. :-)
>
> Emacs had the necessary grammar framework in admin/grammars since 23.4,
> I think. Those packages were just upgraded in Emacs-bzr to first-class
> citizens and moved to lisp/cedet/, so wisent-grammar-mode should kick in
> as soon as you load a .wy file. You can then generate the parser by
> hitting C-c C-c. If it doesn't work, please let me know (but make sure
> you're using latest Emacs from bzr).
>
> You can also use CEDET from bzr, but that shouldn't be necessary anymore
> for doing grammar work.

Ok. I'd rather not mess with building from trunk (I have enough else to
do!). 

I managed to get things running with CEDET 1.1 and Emacs 24, by putting
the CEDET tree at the front of load-path. That will do for now; if this
project ever gets finished, I'll worry about merging to trunk.
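
For anyone wanting to reproduce that setup, a minimal sketch of the
load-path arrangement might look like the following. The install path is
a placeholder, the variable name is made up for illustration, and the
subdirectory list assumes the stock CEDET 1.1 tarball layout:

;; Hypothetical location of the unpacked CEDET 1.1 tree; adjust as needed.
(defvar my-cedet-root (expand-file-name "~/cedet-1.1/"))

;; add-to-list prepends by default, so these directories end up ahead
;; of the CEDET copy bundled with Emacs 24.
(dolist (sub '("common" "eieio" "semantic"
               "semantic/bovine" "semantic/wisent"))
  (add-to-list 'load-path (expand-file-name sub my-cedet-root)))

;; Report which copy of semantic.el would now be loaded.
(locate-library "semantic")

This has to run before anything pulls in the bundled semantic;
otherwise the built-in version is already loaded and the load-path
order no longer matters.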

> Regarding using Wisent for indentation: no one has done that yet. 

I was afraid so.

> In fact, our parsers don't look at every line of code. Not even close;
> it would just be way too slow. 

Do you have recent actual numbers? I'm always wary of premature
optimization in cases like this.

Which is why I'd like to get a semantic Ada parser working, so I can
measure it, and compare it to the equivalent SMIE parser.

At the same time, it's good to hear that the grammar does not have to be
exactly the same as the Ada grammar; I can leave out things that don't
affect indentation; I'm also doing that with SMIE, and it simplifies
things a lot.

> On a meta-note: I've always been doubtful about using grammars for
> indentation. 

I've been maintaining Emacs Ada mode for a while now, and every time
someone reports a bug or wish for the indentation engine, I get the urge
to rewrite it using a grammar-based approach :). So I'm finally giving
in to that.

> Speed is just one problem; the real issue is that *while* code is
> written, you are very often confronted with structures the grammar
> considers illegal and hence cannot parse. 

I'll have to include partially written code in my tests. My philosophy
here is that as long as the indentation engine doesn't crash or hang,
the bad indentation reminds you that the code is not yet legal; I find
that helpful.

In addition, one advantage of the SMIE parser is that it is mostly
local, so illegal code far away tends not to affect it.

And, as I said above, I simply leave out large parts of the grammar that
don't affect indentation, so the parser just doesn't see illegalities in
those parts of the grammar.

So far, I'm much happier with the SMIE approach than the former ad-hoc
approach! 

Not sure about semantic yet; I haven't got a parser working (more
below).

> Also, IMO there are too many indentation issues that are more a matter
> of taste; just look at how many comment-styles the cc-mode indentation
> engine supports.

Yes. That's not a problem in Ada; the Ada community is pretty uniform,
so a _small_ choice of styles is adequate. GPS (the AdaCore IDE) only
has about 4 settings for indentation!

> Anyway, I'd love to be proven wrong (and maybe SMIE already does, I
> haven't tried it yet). BTW, a good read on indentation is Steve Yegge's
> account of his experiences while writing js2-mode:
>
> http://steve-yegge.blogspot.de/2008/03/js2-mode-new-javascript-mode-for-emacs.html

That is interesting. He implies that most of the grammar is useless,
which I agree with. I suspect that a grammar approach to Ada is more
fruitful, because Ada is a much more structured language. SMIE does take
advantage of parse-partial-sexp when it can.

And the actual trigger for me to give semantic a serious try was the
post from Stefan Monnier pointing out that I was using the SMIE parser
to do a full parse from the beginning of the buffer, so I might as well
use an LALR parser, which is what semantic's Wisent provides. If it
works, it will reduce the amount of ad-hoc code in my SMIE ada-indent
engine (it does seem very hard to avoid ad-hoc code!).

Back to semantic; I have a very small grammar written, compiled, and
running. Here's the entire rules part:

package_body
  : PACKAGE BODY name IS declarative_part BEGIN statement END name SEMICOLON
    (TAG $3 'block)
  ;

name
  : IDENTIFIER
  | name DOT IDENTIFIER
    (TAG $1 'name)
  ;

declarative_part
  : subprogram_body
  ;

subprogram_body
  : FUNCTION name RETURN name IS BEGIN statement END name SEMICOLON
    (TAG $2 'block)
  ;

statement
  : RETURN NUMERIC_LITERAL SEMICOLON
    (TAG $2 'return)
  ;

And the test text that it parses:

package body Ada_Mode.Nominal is

   function Function_1 return Integer
   is begin
      return 1;
   end Function_1;

begin
   Function_1;
end Ada_Mode.Nominal;

I'm running "bovinate" to test this, but I never see any tags in its
output.

I've stepped through semantic-repeat-parse-whole-stream, and the 'tag'
variable is always nil.

I'm expecting to see 'name, 'block, 'return when the corresponding
non-terminals are reduced. Is that what I should expect?

Any hints?

-- 
-- Stephe