[cedet-semantic] Wisent vs. Bovinate; -flex vs. -lex
Brought to you by:
zappo
|
From: Joseph K. <ki...@ac...> - 2003-10-13 13:21:07
|
Hi all,
I've been trying to get a semantic-based parser to work for the PVS
language for some time now. As I watch work on CEDET-related project
charge forward, and as often Wisent is suggested for new developers
and grammars, I decided to give it a whirl.
I find the state of the documentation and source quite confusing at
times (most of the time). This has led me to a partial understanding
of how semantic and wisent fit together.
My state at the moment:
o I have a Wisent-based grammar for PVS written.
It has basic lambda actions written for most important rules.
It has what I consider a reasonable set of conflicts (i.e., walking
the state transition table by-hand through the rules with conflicts
seems to do the right thing).
o I have tested semantic-flex to determine that I am getting the
correct token stream from the semantic (f)lexer (*). This stream
looks fine; all keywords are returned as such, punctuation is good,
etc.
o I have started to debug rules in a bottom-up fashion.
I set a single %start rule, "testrule". testrule refers to a single
'leaf' rule that I wish to test. Some of these leaf rules work
fine, when testing with semantic-show-unmatched-syntax-mode, some
partially work, and some do not work at all.
I believe my current problem must come down to the scanning (**)
process (conversion of semantic-flex results to a wisent-based
form).
o I have defined multi-character <punctuation> tokens as nearly all
of the example wisent grammars do. E.g.,
%token <punctuation> COLON_EQ ":="
I presume that the wisent-flex is supposed to properly convert
semantic-flex single-character punctuation sequences into
multi-character punctuation tokens for wisent?
How can I test this? Perhaps something equivalent to
(setq lex-result (semantic-flex (point-min) (point-max) 0 500))
o If I edebug trace the behavior of bovinate, wisent-parse-region (!)
is called rather than wisent-parse-stream. Compiling my grammar
results in a *-install-parser function that uses
semantic-install-function-overrides to install a stream-based
parser, like all of the included wisent grammars.
Why is the compiler generating a stream-based parser? When I run
bovinate it tries to use the region-based parser (which is,
according to the docs, mutually exclusive), and thus the parser
falls back to the default wisent parser.
o Running bovinate always results in nil.
Here is an example of a leaf rule that works fine:
formulaname : AXIOM | CHALLENGE | CLAIM | CONJECTURE | COROLLARY
| FACT | FORMULA | LAW | LEMMA | OBLIGATION
| POSTULATE | PROPOSITION | SUBLEMMA | THEOREM ;
Note that consists entirely of keywords (symbols).
Here is a rule that partially works:
unaryop : NOT | TILDE | LBRACK_RBRACK | LT_GT | UMINUS ;
The "NOT" works, but the other options do not. TILDE is the
single-character punctuation '~', LBRACK_RBRACK is defined as the
double character punctuation "[]", but will (also?) be matched as a
semantic-list as I'm parsing with a scanner (!) depth of 0, LT_GT is
the double character punctuation "<>", and UMINUS is <punctuation>
"-*-" and is meant to be used as an illegal token sequence to provide
a placeholder for unary minus's precedence and nonassociativity.
Obviously, if I cannot get even the most simple rule to work, I have
no hope in debugging the full grammar.
The current grammar is available at
<http://kind.cs.kun.nl/~kiniry/tmp/pvs.wy>
if someone has interest, expertise, and generosity.
Thanks!
Joe Kiniry
--
Joseph R. Kiniry ID 78860581 ICQ 4344804
SOS Group, University of Nijmegen http://www.cs.kun.nl/~kiniry/
KindSoftware, LLC http://www.kindsoftware.com/
Board Chair: NICE http://www.eiffel-nice.org/
--
(*) One primary source of confusion is the hugely overloaded set of
terms used for these products. -lex- vs. -flex-, "semantic"
vs. "bovinate" etc.
(**) This term is used to avoid this whole lex vs. flex craziness.
|