Re[2]: [CEDET-devel] Incremental parser behavior

>>> David Ponce <da...@dp...> seems to think that:
>Hi Eric,
>
>[...]
> > Originally, I felt there were two distinctions between "extra
> > specifiers" and "properties" on a token.
> >
> > Extra specifiers were things that a language knew about a token.
> > Properties were things added to tokens after the parser was done with
> > them, such as "dirty" when editing, or the "reparse symbol" set up
> > in the iterative parser.
>
>IMO, extra specifiers are more what a token [Semantic] knows about
>language syntax than what a language knows about a token ;-)

Yup, you got me there. ;)

  [ ... ]
> > I think there are a few options which should be considered.  I'm not
> > sure which is the best.  Obviously, the first is your solution with a
> > :properties specifier.  Benefit: this provides maximum flexibility.
> > Cons: sacrifices the purity of the language with details needed for
> > running aspects of semantic.  Basically, if we chose to change
> > something in the incremental parser like a property name, or desired
> > features, vast numbers of language locations would need to change to
> > handle it.
>[...]
>
>I think that properties like `reparse-symbol' or `reparse-safepoint'
>are different from properties like the `dirty' one.
>
>The latter is processing data, that can change during the life time of
>the parse tree.  Such properties are read/set by the particular tools
>which use them.  So, they can completely depend on the implementation
>of these tools, it does not really matter.
>
>The former are permanent data carrying important information about
>the parse tree structure.  For a given language, `reparse-symbol'
>links a source part with a token production rule in a grammar.
>`reparse-safepoint' marks a node as a leaf node, even if it has
>children.  [Currently, it seems that only the incremental parser uses
>these properties.  But maybe we could imagine other ways to use them
>(debug, trace, help)?  Perhaps the `reparse-' prefix is not a good
>idea and more general property names like "production" and "leaf-node"
>would be better?]

In this particular situation, perhaps having the reparse-symbol (or
whichever takes its place) be an extra specifier instead makes sense.
My reasoning is that there is a nice simple way for users (meaning
programmers using the semantic package as a library) to add/delete
properties from a token.  Not so for extra specifiers.  (Actually,
`semantic-token-add-extra-spec' does exist, but I like to think of it
as an internal function.)

If reparse symbol is so key, perhaps it should be an extra specifier
instead.

I do like the idea of renaming it too.  I think Hollywood when you
say 'production' though. ;)

>As the various parsers used by Semantic can be customized, I think it
>is worth having an equivalent flexibility to "propertize" the parse
>tree.  I agree with you that setting token properties in grammar can
>introduce details needed for running aspects of Semantic.  But is this
>really an issue?  IMO, grammars are already designed to take into
>account some running aspects of Semantic, particularly grammars that
>use the iterative parser.  For example, in such grammars it is very
>important to take care of rules where Semantic tokens are produced, so
>the `reparse-symbol' property will be correctly set!
>
>IMO, the fact that some of these properties can be automatically
>provided, and others cannot, is not fundamental.
>
>If a parser, or another tool, needs new attributes in the parse tree,
>the developer should have an easy way to provide new token properties
>in the grammar.
>
>Finally, I doubt one can plug a new parser in, or make major changes
>in the specification of existing parsers, without having to change
>something in grammars.
>
>And of course, we use Emacs, which simplifies the programmer's task a lot
>;-)

In the previous email, the format you suggested was something vaguely
like this:

(wisent-token normal args here :property prop info)

I find it a tad strange to have a mix of regular arguments followed
by a property list.

I found myself sometimes trying to put comments in my token generation
so I could keep track of which position means which value.  Perhaps all
of wisent-token could be this way.  Reading it would be clearer:

(wisent-token name 'function ; required args
              :args $3
              :type $1
              :something $5
              :else $6
              :property 'reparse 'cool-thingy)

Then the properties could be in random order, and :something and
:else would automatically get turned into ((something . $5) (else . $6))
in the extra specifier slot.
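
A minimal sketch of that conversion, in case it helps make the idea
concrete.  The function name `my-split-token-keywords' is hypothetical
and not part of Semantic or Wisent; it also assumes that each
`:property' keyword is followed by exactly two items (a name and a
value), as in the example above, and that every other keyword is
followed by one value.

```elisp
;; Hypothetical helper, not part of Semantic or Wisent.
(defun my-split-token-keywords (keyword-args)
  "Split KEYWORD-ARGS into (EXTRA-SPECS . PROPERTIES).
Each `:property NAME VALUE' triple becomes (NAME . VALUE) in PROPERTIES.
Every other `:KEY VALUE' pair becomes (KEY . VALUE) in EXTRA-SPECS,
with the leading colon dropped from KEY."
  (let ((rest keyword-args)
        extra-specs properties)
    (while rest
      (let ((key (car rest)))
        (unless (keywordp key)
          (error "Expected a keyword, got %S" key))
        (if (eq key :property)
            ;; Assumed shape: :property NAME VALUE (three items).
            (progn
              (push (cons (nth 1 rest) (nth 2 rest)) properties)
              (setq rest (nthcdr 3 rest)))
          ;; Ordinary shape: :KEY VALUE (two items).
          (push (cons (intern (substring (symbol-name key) 1))
                      (nth 1 rest))
                extra-specs)
          (setq rest (nthcdr 2 rest)))))
    ;; Restore the order in which the keywords were written.
    (cons (nreverse extra-specs) (nreverse properties))))

;; Example call; the keywords may appear in any order.
(my-split-token-keywords
 '(:something 5 :property reparse cool-thingy :else 6))
```

For the example call, the car of the result is the extra specifier
alist ((something . 5) (else . 6)) and the cdr is the property alist
((reparse . cool-thingy)).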

Dilemma: Languages introducing new token types would have to have a
way of specifying preferred order.

I think this would make things much more readable, AND let us change
the token format without re-writing language files.  Nifty!
Cons: make things slower, not faster. ;(  Perhaps a crazy macro could
compile them into the right format.  Zoiks.
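
One way such a macro might look, as a rough sketch only.  The name
`my-token' is hypothetical, and the (NAME CLASS EXTRA-SPECS PROPERTIES)
layout it produces is an assumption made for illustration, not the
actual Semantic token format.  The point is that the keywords are
sorted into slots when the macro is expanded, so byte-compiled code
would not pay for keyword parsing each time a token is built.

```elisp
;; Hypothetical macro; the (NAME CLASS EXTRA-SPECS PROPERTIES) layout is
;; an assumption for illustration, not Semantic's actual token format.
(defmacro my-token (name class &rest keyword-args)
  "Build a token list from NAME, CLASS and KEYWORD-ARGS.
Keywords are sorted into slots at macro-expansion time; only the
value forms are evaluated at run time."
  (let ((rest keyword-args)
        extra-specs properties)
    (while rest
      (let ((key (car rest)))
        (unless (keywordp key)
          (error "Expected a keyword, got %S" key))
        (if (eq key :property)
            ;; Assumed shape: :property NAME-FORM VALUE-FORM.
            (progn
              (push `(cons ,(nth 1 rest) ,(nth 2 rest)) properties)
              (setq rest (nthcdr 3 rest)))
          ;; :KEY VALUE-FORM becomes (cons 'KEY VALUE-FORM).
          (push `(cons ',(intern (substring (symbol-name key) 1))
                       ,(nth 1 rest))
                extra-specs)
          (setq rest (nthcdr 2 rest)))))
    `(list ,name ,class
           (list ,@(nreverse extra-specs))
           (list ,@(nreverse properties)))))

;; Example use, with literal values standing in for $1, $3, etc.
(my-token "foo" 'function
          :args '("a" "b")
          :type "int"
          :property 'reparse 'cool-thingy)
```

A language that needs a particular slot order would still have to say
so somewhere; this sketch simply keeps the order in which the keywords
were written.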

> > Of import, I think that, for your very specific example,
> > individual rules in a rule should not have overlays.  I can think of
> > no specific benefit, nor name that could be attributed to their
> > overlays to warrant the confusion it could add to your language
> > definitions.  I'm sure there are other situations where it would be
> > useful though.
>[...]
>
>If I correctly understood your point, what you suggest is to produce a
>"true" token (with an overlay) only for language syntax elements that
>are self contained, that is, that can be [re-]parsed independently.
>
>Another form to introduce details needed for running aspects of
>Semantic?
>
>Also, I really like to be able to use senator navigation, search,
>completion, etc., on grammar rules, because they are an important part
>of a context free grammar.  For now, most of these features probably
>won't work without overlays.  It would be a pity to lose some of them
>because of incremental parser constraints!
  [ ... ]

Hmm, I suspected that there was some reason but wasn't sure what it
was.  You could enable the reparse if you had specialty rules for the
first token, and following tokens separate from the rule which
generates the master list during a full parse.  Then the incremental
parser would work.

Have fun
Eric

-- 
          Eric Ludlam:                 za...@gn..., er...@si...
   Home: http://www.ludlam.net            Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net               GNU: www.gnu.org