[cedet-semantic] Wisent vs. Bovinate; -flex vs. -lex

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi all,

I've been trying to get a semantic-based parser to work for the PVS
language for some time now.  As I watch work on CEDET-related project
charge forward, and as often Wisent is suggested for new developers
and grammars, I decided to give it a whirl.

I find the state of the documentation and source quite confusing at
times (most of the time).  This has led me to a partial understanding
of how semantic and wisent fit together.

My state at the moment:

o I have a Wisent-based grammar for PVS written.

  It has basic lambda actions written for most important rules.

  It has what I consider a reasonable set of conflicts (i.e., walking
  the state transition table by-hand through the rules with conflicts
  seems to do the right thing).

o I have tested semantic-flex to determine that I am getting the
  correct token stream from the semantic (f)lexer (*).  This stream
  looks fine; all keywords are returned as such, punctuation is good,
  etc.

o I have started to debug rules in a bottom-up fashion.

  I set a single %start rule, "testrule".  testrule refers to a single
  'leaf' rule that I wish to test.  Some of these leaf rules work
  fine, when testing with semantic-show-unmatched-syntax-mode, some
  partially work, and some do not work at all.

  I believe my current problem must come down to the scanning (**)
  process (conversion of semantic-flex results to a wisent-based
  form).

o I have defined multi-character <punctuation> tokens as nearly all
  of the example wisent grammars do.  E.g.,

    %token <punctuation> COLON_EQ       ":="

  I presume that the wisent-flex is supposed to properly convert
  semantic-flex single-character punctuation sequences into
  multi-character punctuation tokens for wisent?

  How can I test this?  Perhaps something equivalent to 
    (setq lex-result (semantic-flex (point-min) (point-max) 0 500))

o If I edebug trace the behavior of bovinate, wisent-parse-region (!)
  is called rather than wisent-parse-stream.  Compiling my grammar
  results in a *-install-parser function that uses
  semantic-install-function-overrides to install a stream-based
  parser, like all of the included wisent grammars.

  Why is the compiler generating a stream-based parser?  When I run
  bovinate it tries to use the region-based parser (which is,
  according to the docs, mutually exclusive), and thus the parser
  falls back to the default wisent parser.

o Running bovinate always results in nil.

Here is an example of a leaf rule that works fine: 

formulaname : AXIOM | CHALLENGE | CLAIM | CONJECTURE | COROLLARY
            | FACT | FORMULA | LAW | LEMMA | OBLIGATION
            | POSTULATE | PROPOSITION | SUBLEMMA | THEOREM ;

Note that consists entirely of keywords (symbols).

Here is a rule that partially works:

unaryop : NOT | TILDE | LBRACK_RBRACK | LT_GT | UMINUS ;

The "NOT" works, but the other options do not.  TILDE is the
single-character punctuation '~', LBRACK_RBRACK is defined as the
double character punctuation "[]", but will (also?) be matched as a
semantic-list as I'm parsing with a scanner (!) depth of 0, LT_GT is
the double character punctuation "<>", and UMINUS is <punctuation>
"-*-" and is meant to be used as an illegal token sequence to provide
a placeholder for unary minus's precedence and nonassociativity.

Obviously, if I cannot get even the most simple rule to work, I have
no hope in debugging the full grammar.

The current grammar is available at 
              <http://kind.cs.kun.nl/~kiniry/tmp/pvs.wy>
if someone has interest, expertise, and generosity.

Thanks!
Joe Kiniry
-- 
Joseph R. Kiniry                    ID 78860581      ICQ 4344804 
SOS Group, University of Nijmegen  http://www.cs.kun.nl/~kiniry/
KindSoftware, LLC                   http://www.kindsoftware.com/
Board Chair: NICE                    http://www.eiffel-nice.org/

--

(*) One primary source of confusion is the hugely overloaded set of
terms used for these products.  -lex- vs. -flex-, "semantic"
vs. "bovinate" etc.

(**) This term is used to avoid this whole lex vs. flex craziness.