Thread: [cedet-semantic] Avoiding redundant rules (EXPANDFULL-related).

From: Joseph K. <ki...@ac...> - 2004-02-04 12:32:47

Hi Cedet/Semantic/Wisent folks,

I have the following rule for parsing PVS typed identifiers and
operators.

typedids : idops colon_typeexpr_opt bar_expr_opt
           (mapcar (function (lambda (idop)
                     (if $3
                         (VARIABLE-TAG idop (car $2) nil 'predicate $3)
                         (VARIABLE-TAG idop (car $2) nil))))
                     $1)
         ;

I have another rule called typedid corresponding to the single
identifier/operator case.

Both of these rules perform as I expect.

A (full) example of such an expression is

  a, b, c, -, ~ : int | n >= 0

Unfortunately, typed identifiers can also occur in parentheses.

My lex depth is zero, thus I have a rule of the form:

binding : typedid
        | PAREN_BLOCK
          ***
        ;

I don't know what to put where the *** is.  While I could put
something like

          (EXPANDFULL $1 some_typedids-expandfull)

where the rule some_typedids-expandfull is going to be semantically
equivalent to typedids, but will have to be written in the
"expandfull" style.  I wish to avoid this redundancy.

I have tried something like

  (EXPAND $1 parenthesized-typedids)

parenthesized-typedids : LPAREN typedids RPAREN
                         (identity $2)
                       ;

but, for some reason, with this expression,
semantic-show-unmatched-syntax-mode shows no unmatched characters in
my test buffer, but bovinate returns only *unparenthesized* typedids
when binding is the top-most rule.

Is this a standard pattern for wisent parsers?  What is the suggested
structuring solution?

Thanks,
Joe

P.S. Any comments on my previous post?
 http://sourceforge.net/mailarchive/message.php?msg_id=7112833

-- 
Joseph R. Kiniry                    ID 78860581      ICQ 4344804 
SOS Group, University of Nijmegen  http://www.cs.kun.nl/~kiniry/
KindSoftware, LLC                   http://www.kindsoftware.com/
Board Chair: NICE                    http://www.eiffel-nice.org/

Re[1]: [cedet-semantic] Avoiding redundant rules (EXPANDFULL-related).

From: Eric M. L. <er...@si...> - 2004-02-04 13:52:13

Howdy!

>>> Joseph Kiniry <ki...@ac...> seems to think that:
>Hi Cedet/Semantic/Wisent folks,
>
>I have the following rule for parsing PVS typed identifiers and
>operators.
>
>typedids : idops colon_typeexpr_opt bar_expr_opt
>           (mapcar (function (lambda (idop)
>                     (if $3
>                         (VARIABLE-TAG idop (car $2) nil 'predicate $3)
>                         (VARIABLE-TAG idop (car $2) nil))))
>                     $1)
>         ;
>
>I have another rule called typedid corresponding to the single
>identifier/operator case.
>
>Both of these rules perform as I expect.
>
>A (full) example of such an expression is
>
>  a, b, c, -, ~ : int | n >= 0

Hmmm, let me try and translate:

  VARNAME1, VARNAME2 : DATATYPE | EXPRESSION

Is that what I'm looking at?

>Unfortunately, typed identifiers can also occur in parentheses.

as such?

  ( VARNAME1, VARNAME2 : DATATYPE | EXPRESSION )

A handy trick for this situation would be to make a tag like this:

(VARIABLE-TAG list-o-names-here (car $2) ...)

so the representation might be:

(("a" "b" "c") 'variable (:type "int"))

and then write a function for the variable
semantic-tag-expand-function.  That function would turn your one
compound tag into three little tags overlapping the same area.

This used to be the only way to do what you are doing with mapcar
above.  I have not looked closely at which style is better, so you can
use your judgment.
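
As an illustration of the compound-tag trick described above, here is a
minimal Emacs Lisp sketch of such an expander.  It assumes the
`semantic/tag` library as shipped with current Emacs (older standalone
CEDET releases may name the feature differently), and the function name
`my-pvs-expand-compound-variable` is made up for this example; it is not
taken from any existing grammar.  It relies on `semantic-tag-clone`,
which copies a tag under a new name, so every clone keeps the attributes
and bounds of the compound tag.

```elisp
(require 'semantic/tag)

(defun my-pvs-expand-compound-variable (tag)
  "Split a compound variable TAG into one tag per name.
Return nil when TAG is not compound, so that Semantic keeps it as is."
  (when (and (eq (semantic-tag-class tag) 'variable)
             ;; A compound tag carries a list of names in its NAME slot.
             (consp (semantic-tag-name tag)))
    (mapcar (lambda (name)
              ;; Clone TAG under NAME; attributes are copied and the
              ;; clone shares the original's bounds.
              (semantic-tag-clone tag name))
            (semantic-tag-name tag))))

;; Hypothetical installation in the language's mode setup function:
;; (setq-local semantic-tag-expand-function
;;             #'my-pvs-expand-compound-variable)
```

With this installed, a tag such as (("a" "b" "c") variable (:type "int"))
would come out of the parser as three variable tags named "a", "b" and
"c", each with :type "int".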

>My lex depth is zero, thus I have a rule of the form:
>
>binding : typedid
>        | PAREN_BLOCK
>          ***
>        ;
>
>I don't know what to put where the *** is.  While I could put
>something like
>
>          (EXPANDFULL $1 some_typedids-expandfull)

EXPAND and EXPANDFULL have two different purposes.

Use EXPAND if you want to require a perfect match on rules within the
block.  A missed match will cause the upper expression to become a
failed match as well.  I'm pretty sure it still does that.

If you have a repeating entity in the block and you use EXPAND, you
need to use yacc/bison style iteration to capture them, so EXPAND is
also useful if you have only a single entity to match.

Use EXPANDFULL when you have a repeating entity, or if you want
parser failures inside the block to be ignored, and treated as
unmatched syntax.

>where the rule some_typedids-expandfull is going to be semantically
>equivalent to typedids, but will have to be written in the
>"expandfull" style.  I wish to avoid this redundancy.

In this case, you can use a helper-rule that calls back to the
original rule you want to use.  In c.by, see arg-list, and
arg-sub-list.

>I have tried something like
>
>  (EXPAND $1 parenthesized-typedids)
>
>parenthesized-typedids : LPAREN typedids RPAREN
>                         (identity $2)
>                       ;
>
>but, for some reason, with this expression,
>semantic-show-unmatched-syntax-mode shows no unmatched characters in
>my test buffer, but bovinate returns only *unparenthesized* typedids
>when binding is the top-most rule.

Short answer: You need to save the return value of the EXPAND macros.

Long answer:

The EXPAND or EXPANDFULL macros have a return value.  If you do not
use the return value, it is lost.  For example, you might do this:

mythingy : pre-thing PAREN_BLOCK post_thing
           (TAG $1 'thing :innerthing (car (EXPAND $2 inner-thing)))
         ;

inner-thing: ...
        ;


If the result of the EXPAND above is not stored in some way (here, in
the :innerthing attribute), it will be lost.  In c.by, see the rule
"extern-c" for an example.  Then look in semantic-c.el at the function
semantic-expand-c-tag, and how it treats tags of class 'extern.

Your case may be simpler.  Perhaps a new macro to promote a tag
created in an EXPAND directly into the local list is needed.  That is
unclear to me.

>Is this a standard pattern for wisent parsers?  What is the suggested
>structuring solution?

The semantic-tag-expand-function is how we have done this in the past.
Your recent approach with mapcar is an intriguing idea (to me at least,
David may have other thoughts.)

>P.S. Any comments on my previous post?
> http://sourceforge.net/mailarchive/message.php?msg_id=7112833

I grepped through my local mail cache and did not see it, so it has
not been delivered to my mailbox yet.

>>>> In the archives but not in my mailbox, Joseph Kiniry thinks that:
>
> What are the suggestions of the Cedet developers?  Should I hold off
> with what I have until you release a 1.0, or should I attempt to use
> a more recent version of the repository?  Is there a stable tag I can
> upgrade to, rather than using HEAD?  Any other comments?

Short Answer: Keep going.  No changes are needed for you.

Long Answer:

David's changes do not break existing lexical analyzer creation
schemes.  The old schemes will (probably) always be required for
complex languages.  Simple languages (or more specifically,
languages with simple lexical rules) can use David's new support.

It may be that working in the beta1c style lexical rules will benefit
you in the future because you will better understand what is going on
with the auto-generated lexical analyzer.

>I have one question about the results of bovination.  Some of my
> resulting sexps are of the form:
> 
>  ("{ |=, y }" type
>               (:members
>                ("y" "|=")
>                :type "enumeration")
>               (reparse-symbol braced_typeexpr-expandfull)
>               #<overlay from 665 to 668 in typeexpr.pvs>)
> 
> What does the "reparse-symbol" bit mean?  Am I forgetting to use an
> EXPANDFULL somewhere, or is this just a representation issue?
> Something else?

Short answer: It is added by wisent automatically.

Long Answer:

A tag's structure is:

( "NAME" CLASS ATTRIBUTES PROPERTIES OVERLAY )

See the manual entry "Tag Basics" for more.

The TAG, VARIABLE-TAG and related macros only allow you to specify a
NAME, CLASS, and ATTRIBUTES.  PROPERTIES and OVERLAY are internal,
and not related to parsing.

The bovine and wisent parsing frameworks set some properties
automatically, one of which is 'reparse-symbol'.  When a tag is
created, the parser knows which specific %start rule created it and
saves that rule's name here.

When you edit a buffer, and changes are made to the body of a tag
created with a reparse-symbol, the partial reparse mechanism will
know where to start because of the reparse symbol.  This makes
reparsing an edited buffer much faster than if the entire parent tag
had to be reparsed from scratch.
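
To make the five slots concrete, here is a small Emacs Lisp sketch that
reads them back from the tag quoted in the question.  It assumes the
accessors in the `semantic/tag` library of current Emacs; the variable
name `my-example-tag` is made up, and the overlay slot is written as nil
because an overlay object has no read syntax.  Note that
`semantic--tag-get-property` is an internal (double-dash) function, used
here only for inspection.

```elisp
(require 'semantic/tag)

;; The tag from the question, with the overlay slot replaced by nil so
;; that it can be written as a literal.
(defvar my-example-tag
  '("{ |=, y }" type
    (:members ("y" "|=") :type "enumeration")
    (reparse-symbol braced_typeexpr-expandfull)
    nil))

(semantic-tag-name my-example-tag)        ; NAME slot
(semantic-tag-class my-example-tag)       ; CLASS slot
(semantic-tag-attributes my-example-tag)  ; ATTRIBUTES slot (a plist)
(semantic-tag-properties my-example-tag)  ; PROPERTIES slot (a plist)
(semantic-tag-overlay my-example-tag)     ; OVERLAY slot

;; Read one attribute set by the grammar action ...
(semantic-tag-get-attribute my-example-tag :type)
;; ... and the property recorded by the parsing framework.
(semantic--tag-get-property my-example-tag 'reparse-symbol)
```

Only the first three slots come from the grammar action; the last two
are filled in by the framework.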

-- 
          Eric Ludlam:                 za...@gn..., er...@si...
   Home: http://www.ludlam.net            Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net               GNU: www.gnu.org

[cedet-semantic] Re: Avoiding redundant rules (EXPANDFULL-related).

From: Joseph K. <ki...@ac...> - 2004-02-05 12:32:20

Why can a rule used in an EXPANDFULL not return a list?

E.g.,

binding_list-expandfull : LPAREN
                          ()
                        | RPAREN
                          ()
                        | COMMA
                          ()
                        | typedid
                        | typedids
                        ;

typedid returns a single variable tag like
  ("n" 'variable (:type "BOOL"))
and typedids returns a list of variable tags like
  ( ("n" 'variable (:type "BOOL")) ("m" 'variable (:type "BOOL")))

In such a case, semantic signals an error and provides the backtrace:

Debugger entered: ((("n" variable (:type "BOOL") nil nil 637 647) ("m" variable (:type "BOOL") nil nil 637 647)))
  semantic--tag-expand((("n" variable (:type "BOOL") nil nil 637 647) ("m" variable (:type "BOOL") nil nil 637 647)))
  semantic-repeat-parse-whole-stream(((LPAREN 636 . 637) (IDENTIFIER 637 . 638) (COMMA 638 . 639) (IDENTIFIER 640 . 641) (COLON 641 . 642) (IDENTIFIER 643 . 647) (RPAREN 647 . 648)) binding_list-expandfull nil)
  semantic-parse-region-default(636 648 binding_list-expandfull 1 nil)
  semantic-parse-region(636 648 binding_list-expandfull 1)

Perhaps semantic--tag-expand should be modified to handle lists of
raw tags?

I guess I must lift the handling of the grouped variable tags
yet again?

Joe

[cedet-semantic] Moving to Semantic 2.0 & %start nonterminal that generate multiple tags.

From: Joseph K. <ki...@ac...> - 2004-02-12 16:41:09

"Eric M. Ludlam" <er...@si...> writes:

>> Joe wrote:
>> P.S. Any comments on my previous post?
>> http://sourceforge.net/mailarchive/message.php?msg_id=7112833
>
> I grepped through my local mail cache and did not see it, so it has
> not been delivered to my mailbox yet.
>
>>>>> In the archives but not in my mailbox, Joseph Kiniry thinks that:
>>
>> What are the suggestions of the Cedet developers?  Should I hold off
>> with what I have until you release a 1.0, or should I attempt to use
>> a more recent version of the repository?  Is there a stable tag I can
>> upgrade to, rather than using HEAD?  Any other comments?
>
> Short Answer: Keep going.  No changes are needed for you.
>
> Long Answer:
>
> David's changes do not break existing lexical analyzer creation
> schemes.  The old schemes will (probably) always be required for
> complex languages.  Simple languages (or more specifically,
> languages with simple lexical rules) can use David's new support.
>
> It may be that working in the beta1c style lexical rules will benefit
> you in the future because you will better understand what is going on
> with the auto-generated lexical analyzer.

I've begun testing my grammar with CVS HEAD today and things look
good so far.

I'm now generating good looking sets of tags, but my current problem
is one of structure.  It is just the standard loosely-typed Lispisms;
things like something should be a list of lists, but I only have a
list, etc.

Here is a problem that has come up though that I must ask about.

I have top-level constructs that have multiple meanings.  For example,
a PVS datatype is a datatype, a type, and a function.  Unfortunately,
a top-level %start rule can only return a *single* tag, whereas I
would like to return a list of tags.

E.g.,
datatype : id theoryformals_opt COLON datatype_or_codatatype with_subtypes_ids_opt
           BEGIN
           importing_semicolon_opt
           assumingpart_opt
           datatypepart
           END id
           (list
            (TYPE-TAG $1 $4 (append $7 $8 $9) nil)
            (when $2
               (FUNCTION-TAG $1 $1 $2))
            (TAG $1 'datatype (append $7 $8 $9)))
         ;

Any ideas on how to handle this problem?

Thanks,
Joe

[cedet-semantic] Re[2]: Avoiding redundant rules (EXPANDFULL-related).

From: Eric M. L. <er...@si...> - 2004-02-05 14:12:16

Hi,

  The reasons are historical.  I started long ago with the premise that
one call into the parser returned one or fewer tags.  This worked
well, and the bovine parser was pretty easy to use, but quite
limiting.  You could write a grammar with none of the optional lambda
expressions and still get an interesting parse tree.

  We have since made things more flexible, and added the TAG style
macros to help simplify things which, apparently, leads to some
confusion.

  It is unclear to me what the right answer is.  At a minimum, a
useful error message out of that routine would be good.

  If you put your tag list in the NAME slot of a tag, then your
extraction mechanism would probably be pretty easy.

Eric

>>> Joseph Kiniry <ki...@ac...> seems to think that:
>Why can a rule used in an EXPANDFULL not return a list?
>
>E.g.,
>
>binding_list-expandfull : LPAREN
>                          ()
>                        | RPAREN
>                          ()
>                        | COMMA
>                          ()
>                        | typedid
>                        | typedids
>                        ;
>
>typedid returns a single variable tag like
>  ("n" 'variable (:type "BOOL"))
>and typedids returns a list of variable tags like
>  ( ("n" 'variable (:type "BOOL")) ("m" 'variable (:type "BOOL")))
>
>In such a case, semantic signals an error and provides the backtrace:
>
>Debugger entered: ((("n" variable (:type "BOOL") nil nil 637 647) ("m" variable (:type "BOOL") nil nil 637 647)))
>  semantic--tag-expand((("n" variable (:type "BOOL") nil nil 637 647) ("m" variable (:type "BOOL") nil nil 637 647)))
>  semantic-repeat-parse-whole-stream(((LPAREN 636 . 637) (IDENTIFIER 637 . 638) (COMMA 638 . 639) (IDENTIFIER 640 . 641) (COLON 641 . 642) (IDENTIFIER 643 . 647) (RPAREN 647 . 648)) binding_list-expandfull nil)
>  semantic-parse-region-default(636 648 binding_list-expandfull 1 nil)
>  semantic-parse-region(636 648 binding_list-expandfull 1)
>
>Perhaps semantic--tag-expand should be modified to handle lists of
>raw tags?
>
>I guess I must lift the handling of the grouped variable tags
>yet again?
>
>Joe
>


[cedet-semantic] Re[1]: Moving to Semantic 2.0 & %start nonterminal that generate multiple tags.

From: Eric M. L. <er...@si...> - 2004-02-12 18:43:14

>>> Joseph Kiniry <ki...@ac...> seems to think that:
  [ ... ]
>I've begun testing my grammar with CVS HEAD today and things look
>good so far.

Yay!

>I'm now generating good looking sets of tags, but my current problem
>is one of structure.  It is just the standard loosely-typed Lispisms;
>things like something should be a list of lists, but I only have a
>list, etc.
>
>Here is a problem that has come up though that I must ask about.
>
>I have top-level constructs that have multiple meanings.  For example,
>a PVS datatype is a datatype, a type, and a function.  Unfortunately,
>a top-level %start rule can only return a *single* tag, whereas I
>would like to return a list of tags.

I had identified that if I iterate in code over the same rule to
generate the tags, it was more robust than using typical looping via
recursive grammar rules.  This is how I can identify unmatched syntax.

The assumption was that a given call into the parser would then return
only one tag.

>E.g.,
>datatype : id theoryformals_opt COLON datatype_or_codatatype with_subtypes_ids_opt
>           BEGIN
>           importing_semicolon_opt
>           assumingpart_opt
>           datatypepart
>           END id
>           (list
>            (TYPE-TAG $1 $4 (append $7 $8 $9) nil)
>            (when $2
>               (FUNCTION-TAG $1 $1 $2))
>            (TAG $1 'datatype (append $7 $8 $9)))
>         ;
>
>Any ideas on how to handle this problem?
  [ ... ]

I think I would return a tag of a new class.  Give it a name specific
to your language.

Then implement a function for `semantic-tag-expand-function'.  When
it sees a tag of that new class, it will replace it with two or more
new tags.

If the function part of your declaration is a constructor, don't
forget to set the attribute :constructor to non-nil.

Hmmm.  I think it is the symbol `constructor' without the :.  I should
fix that.

Anyway, that is the official way to do that, and is used in C and
Java for statements like this:

int a, b;
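
For the datatype case, the approach above might look roughly like the
following Emacs Lisp sketch.  Everything specific to PVS here is an
assumption made for illustration: the tag class `pvs-datatype`, the
attributes :kind, :formals and :members, and the function name
`my-pvs-expand-datatype`.  The grammar action would then return a single
tag of that class, for example
(TAG $1 'pvs-datatype :kind $4 :formals $2 :members (append $7 $8 $9)).
The sketch assumes the `semantic/tag` library of current Emacs.

```elisp
(require 'semantic/tag)

(defun my-pvs-expand-datatype (tag)
  "Replace a `pvs-datatype' TAG with a type tag and, if needed, a function tag.
Return nil for any other tag, so that Semantic keeps it unchanged."
  (when (eq (semantic-tag-class tag) 'pvs-datatype)
    (let* ((name    (semantic-tag-name tag))
           (kind    (semantic-tag-get-attribute tag :kind))
           (members (semantic-tag-get-attribute tag :members))
           (formals (semantic-tag-get-attribute tag :formals))
           (new (delq nil
                      (list
                       ;; The datatype seen as a type.
                       (semantic-tag-new-type name kind members nil)
                       ;; The datatype seen as a function, only when
                       ;; it has formal parameters.
                       (when formals
                         (semantic-tag-new-function name name formals))))))
      ;; Give every replacement the source extent of the original tag.
      (when (semantic-tag-with-position-p tag)
        (dolist (tg new)
          (semantic-tag-set-bounds tg
                                   (semantic-tag-start tag)
                                   (semantic-tag-end tag))))
      new)))

;; Hypothetical installation in the language's mode setup function:
;; (setq-local semantic-tag-expand-function #'my-pvs-expand-datatype)
```

A third tag of class 'datatype could be added to the list in the same
way if it is wanted as well.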

Good Luck
Eric


[cedet-semantic] Re: Moving to Semantic 2.0 & %start nonterminal that generate multiple tags.

From: Joseph K. <ki...@ac...> - 2004-02-16 15:30:00

Hello again Eric,

"Eric M. Ludlam" <er...@si...> writes:

>>>> Joseph Kiniry <ki...@ac...> seems to think that:
>   [ ... ]
>>I've begun testing my grammar with CVS HEAD today and things look
>>good so far.
>
> Yay!

Right-on.

>>I'm now generating good looking sets of tags, but my current problem
>>is one of structure.  It is just the standard loosely-typed Lispisms;
>>things like something should be a list of lists, but I only have a
>>list, etc.
>>
>>Here is a problem that has come up though that I must ask about.
>>
>>I have top-level constructs that have multiple meanings.  For example,
>>a PVS datatype is a datatype, a type, and a function.  Unfortunately,
>>a top-level %start rule can only return a *single* tag, whereas I
>>would like to return a list of tags.
>
> I had identified that if I iterate in code over the same rule to
> generate the tags, it was more robust than using typical looping via
> recursive grammar rules.  This is how I can identify unmatched syntax.

I understand this choice.

> The assumption was that a given call into the parser would then return
> only one tag.
>
>>E.g.,
>>datatype : id theoryformals_opt COLON datatype_or_codatatype with_subtypes_ids_opt
>>           BEGIN
>>           importing_semicolon_opt
>>           assumingpart_opt
>>           datatypepart
>>           END id
>>           (list
>>            (TYPE-TAG $1 $4 (append $7 $8 $9) nil)
>>            (when $2
>>               (FUNCTION-TAG $1 $1 $2))
>>            (TAG $1 'datatype (append $7 $8 $9)))
>>         ;
>>
>>Any ideas on how to handle this problem?
>   [ ... ]
>
> I think I would return a tag of a new class.  Give it a name specific
> to your language.
>
> Then implement a function for `semantic-tag-expand-function'.  When
> it sees a tag of that new class, it will replace it with two or more
> new tags.
>
> If the function part of your declaration is a constructor, don't
> forget to set the attribute :constructor to non-nil.
>
> Hmmm.  I think it is the symbol `constructor' without the :.  I should
> fix that.
>
> Anyway, that is the official way to do that, and is used in C and
> Java for statements like this:
>
> int a, b;

Must (sub)functions of -expand-tag be written for all non-core
semantic tags, or only for ones that are returned by %start-denoted
rules?

In other words, were I to use custom tags on other, non-expand(full)
related production rules, would they be expanded by my
semantic-tag-expand-function as well?  (I'm deep in refactor mode so I
cannot even compile my grammar at the moment, thus the silly
question.)

I don't see any discussion of semantic-tag-expand-function in the
current CVS head documentation (beyond an extremely brief mention in
the Semantic Tags chapter's Misc Tag Internals subsection), FWIW.

I've been updating the docs a bit in my local sandbox, fixing typos,
spelling errors, grammar, and clarifying issues.  Shall I send a diff
eventually to someone?

Joe