>>After converting wisent-java.wy, I will be able to completely remove
>>wisent-java-lex.el. And probably some specific predefined analyzers
>>in semantic-lex.el could be removed too ;-)
> Wow, that is pretty impressive. I like the idea of having everything
> together in one file that explains how to parse a file.
> Don't you still need wisent-java-lex.el for the other java (non tag)
No, wisent-java-lex is only used in wisent-java-tags.el and
wisent-java.el to provide the lexers. Once both lexers are
auto-generated, that file won't be necessary anymore.
>>- Defined two new `matchdatatype' generated analyzers:
>> * A sexp analyzer, defined by `define-lex-sexp-type-analyzer', to
>> handle tokens which are s-expressions anchored by a `syntax'
>> regexp. A trivial example is given by strings, anchored by
>> "\\s"" and whose end is given by a "guarded" `forward-sexp'.
> Very nice.
> It might not be much of a stretch for this to imply any kind of
> matching, not just those available in the syntax table? Like BEGIN ->
> END in Ada, for example.
That's a good idea. Perhaps a solution could be to allow overriding
the hard-coded `forward-sexp' with a user-defined function that can
handle language-specific expressions like BEGIN -> END in Ada. The
sexp and block type analyzers could both benefit from that feature.
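As a sketch of that idea (the names below are illustrative, not part
of the current semantic-lex.el API), a variable could hold the
function used to skip over one expression:

```elisp
;; Hypothetical override point for the sexp/block type analyzers.
(defvar my-lex-forward-expression-function #'forward-sexp
  "Function used to move point past one expression.
An Ada setup could bind this to a function that skips a
BEGIN ... END block instead of relying on the syntax table.")

(defun my-lex-forward-expression ()
  "Move over one expression, guarding against scan errors.
Return non-nil if point moved."
  (let ((start (point)))
    (condition-case nil
        (funcall my-lex-forward-expression-function 1)
      (error nil))
    (/= start (point))))
```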
>> * A keyword analyzer, defined by `define-lex-keyword-type-analyzer',
>>   to handle only language keywords. That permits separating the
>>   analysis of true keywords from other symbols.
> Does this mean a separate entry in the lexer construction?
> ie. 1 call to find 'keywords' and a second to find 'symbol' %types?
> That could slow things down some.
Yes, it means that. From what I've seen using the new Java lexer,
parsing hasn't slowed down. Matching a keyword is very fast, as it
uses a symbol table. The only extra cost I see is a possibly
redundant `looking-at'. IMO this tiny extra cost is largely
compensated by the flexibility given to the programmer. Keywords and
other symbols no longer need to be handled in the same way, and can
be matched by different syntax regexps. For example, we can imagine
that keywords start with a percent sign (as in grammars) while
ordinary symbols don't:
%type <keyword> syntax "%\\(\\sw\\|\\s_\\)+" matchdatatype keyword
%type <symbol> syntax "\\(\\sw\\|\\s_\\)+" matchdatatype regexp
In such a case, using two different regexps to match keywords and
symbols respectively could be faster than looking up each symbol in
the keyword table.
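The symbol-table lookup mentioned above can be sketched with an
obarray; the names here are illustrative, not the actual semantic-lex
internals:

```elisp
;; Keywords interned in a private obarray; `intern-soft' then gives a
;; hashed lookup instead of comparing NAME against every keyword.
(defvar my-java-keywords (make-vector 13 0)
  "Obarray holding the Java keywords.")

(dolist (kw '("class" "interface" "extends" "implements" "import"))
  (intern kw my-java-keywords))

(defun my-keyword-p (name)
  "Return non-nil if string NAME is a language keyword."
  (intern-soft name my-java-keywords))
```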
A more interesting case would be a language whose keywords are not
built from Emacs symbol characters, unlike its ordinary symbols ;-)
Separating the keyword analyzer from the symbol analyzer makes it
possible to handle such a language.
>>- <type> in the generated analyzer name is now enclosed between < > to
>> improve readability. For example:
>> %package my-foo
>> %type <symbol> syntax "\\(\\sw\\|\\s_\\)+" matchdatatype regexp
> This entry is beginning to strike me as being a bit long.
> Technically, the word "syntax" and "matchdatatype" are sugar, meaning
> they add anchors for the developer to know what is coming.
> Are sections of this optional? eg:
> %type <symbol> syntax "blahblah"
> implies a matchdatatype of regexp
> %type <symbol> matchdatatype regexp
> implies some sort of default syntax regular expression?
> I guess just
> %type <symbol>
> could mean something if there are defaults.
It will be easy to preset useful defaults in the type table for
well-known types like <symbol>, <string>, <punctuation>, etc., by
adding things like the following in `semantic-lex-make-type-table':
;; Set up some useful default properties
For now, generation of a <type> analyzer is controlled by the
existence of a syntax property for that <type>.
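As a sketch of such defaults, assuming a `semantic-lex-type-put'
helper that sets PROPERTY to VALUE on TYPE (creating the type entry
when the ADD flag is non-nil):

```elisp
;; Sketch only: presets a default syntax regexp for common types, so
;; a grammar can write just "%type <symbol>" and inherit it.
(semantic-lex-type-put "symbol"      'syntax "\\(\\sw\\|\\s_\\)+" t)
(semantic-lex-type-put "string"      'syntax "\\s\"" t)
(semantic-lex-type-put "punctuation" 'syntax "\\(\\s.\\|\\s$\\|\\s'\\)+" t)
```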
To let the developer control which analyzers are generated, we could
use a special flag like "%type <xxx> generate t", or simply check
that a %type statement is present.
> That's fine. I'd probably adapt it to use all your new stuff anyway. ;)
> I like the new representation as it says a lot more with the <> around
I committed the changes.