
Scintilla

Tracker: Feature Requests

1 Lexer customization - ID: 2892640
Last Update: Comment added ( nobody )

Hi,

I was directed to you from the Notepad++ project (see
https://sourceforge.net/projects/notepad-plus/forums/forum/331753/topic/3449665).
I didn't have the time to look at the code (I might do that
someday), so my requests may not be properly worded or may not make much
sense within this project (sorry for that), but here they are.

1. As I assume all lexers work with some token categories (keywords,
operators, etc.), my request would be to provide an interface that lets the
user add elements to each category, at least for the languages where this
makes sense (e.g. Lisp/Scheme).
2. All lexers should probably allow a few (4 maybe) user-defined token
categories (to allow highlighting customization) and at least one category of
user-defined operators (to accommodate less common pre-processors).
3. I don't know if that would make sense, but a generic, fully
configurable lexer may be useful for special situations. I know that this
couldn't really be efficient, and there might need to be a few of them to
accommodate fundamentally different ways of defining a syntax, but I think
this could prove useful in many situations.
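Request 3 can be pictured as a lexer whose word categories and operators come entirely from user data instead of being hard-coded. The sketch below is a toy Python illustration under our own assumptions: the `LexerConfig` class, the category names, and the Lisp-style handling of parentheses are hypothetical, and a real Scintilla lexer would be C++ code assigning styles to character ranges rather than returning token tuples.

```python
from dataclasses import dataclass, field


@dataclass
class LexerConfig:
    # Hypothetical user-editable data: category name -> words in that category
    word_categories: dict[str, set[str]] = field(default_factory=dict)
    # Hypothetical user-defined operator-like tokens, e.g. "#'" or "'"
    operators: list[str] = field(default_factory=list)


def tokenize(text: str, config: LexerConfig) -> list[tuple[str, str]]:
    """Split text into (category, token) pairs using only user-supplied data."""
    # Try longer operators first so "#'" is not split into "#" and "'"
    ops = sorted((op for op in config.operators if op), key=len, reverse=True)

    def operator_at(pos: int):
        return next((op for op in ops if text.startswith(op, pos)), None)

    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        op = operator_at(i)
        if op is not None:
            tokens.append(("operator", op))
            i += len(op)
            continue
        if ch in "()":  # Lisp-style structure, hard-coded in this sketch
            tokens.append(("paren", ch))
            i += 1
            continue
        # A word runs until whitespace, a parenthesis, or the start of an operator
        j = i
        while (j < len(text) and not text[j].isspace()
               and text[j] not in "()" and operator_at(j) is None):
            j += 1
        word = text[i:j]
        # First category that lists the word wins; otherwise it is an identifier
        category = next((name for name, words in config.word_categories.items()
                         if word in words), "identifier")
        tokens.append((category, word))
        i = j
    return tokens


config = LexerConfig(
    word_categories={"keyword": {"defun", "lambda"}, "user1": {"my-macro"}},
    operators=["#'", "'"],
)
for category, token in tokenize("(defun f (x) (my-macro #'car 'x))", config):
    print(f"{category:10} {token}")
```

Checking every operator at every position is deliberately naive, which matches the caveat above that such a lexer couldn't really be efficient.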

Thanks and sorry if I'm too naive.

Vlad D.


Vlad ( dobrotescu ) - 2009-11-05 14:33

1

Open

None

Neil Hodgson

Scintilla

Won't Implement

Public


Comments ( 7 )




Date: 2009-11-11 22:04
Sender: nobody

The XML configuration files are used by Notepad++ to initialize Scintilla
wordlists and other items. Scintilla has a similar construct for SciTE,
where the equivalent of Notepad++'s langs and styles XML files (and possibly
more) is stored in .properties files (a different format). I have recently
been doing a little work across the two environments.
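To make the correspondence concrete, here is a minimal Python sketch that turns the keyword lists of one `<Language>` entry in a langs.xml-style file into SciTE-style `.properties` lines. It assumes a simplified XML layout, and the mapping `instre1` → `keywords`, `instre2` → `keywords2` is our assumption for illustration only; other keyword classes are skipped rather than guessed.

```python
import xml.etree.ElementTree as ET

# Assumed correspondence between Notepad++ keyword-class names and SciTE's
# numbered keyword properties; only these two are covered in this sketch.
CLASS_TO_PROPERTY = {"instre1": "keywords", "instre2": "keywords2"}


def langs_xml_to_properties(xml_text: str, language: str) -> list[str]:
    """Turn one <Language> entry's keyword lists into .properties-style lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for lang in root.iter("Language"):
        if lang.get("name") != language:
            continue
        for kw in lang.findall("Keywords"):
            prop = CLASS_TO_PROPERTY.get(kw.get("name"))
            if prop is None:
                continue  # no mapping in this sketch; skipped rather than guessed
            # Collapse the whitespace-separated word list onto one line
            words = " ".join((kw.text or "").split())
            lines.append(f"{prop}.$(file.patterns.{language})={words}")
    return lines


SAMPLE = """<NotepadPlus><Languages>
  <Language name="lisp" ext="lsp">
    <Keywords name="instre1">defun lambda
      setq</Keywords>
  </Language>
</Languages></NotepadPlus>"""

for line in langs_xml_to_properties(SAMPLE, "lisp"):
    print(line)  # keywords.$(file.patterns.lisp)=defun lambda setq
```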


Date: 2009-11-08 20:42
Sender: donho

"ScintillaEditView.h" is part of the Notepad++ project, so it's definitely a
Notepad++ issue, not a Scintilla one.

Don


Date: 2009-11-07 07:45
Sender: dobrotescu

OK ... I've done some more digging. In LexLisp.cxx, in the classifyWordLisp
function, only SCE_LISP_KEYWORD and SCE_LISP_KEYWORD_KW are determined
after the invocation of the InList method of the associated objects. The
fact that those two list objects exist gave me hope.
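The two-list check described here can be modelled as an if/else-if chain. This Python sketch is a simplified stand-in, not the actual LexLisp.cxx code: plain sets replace Scintilla's word-list objects, and the example words are illustrative only. It also shows the consequence discussed in the following comment: if the host application never fills the second list, nothing is ever classified as SCE_LISP_KEYWORD_KW.

```python
def classify_word_lisp(word: str, keywords: set[str], keywords_kw: set[str]) -> str:
    """Simplified model of the two-list check (not the real LexLisp.cxx code)."""
    # Stands in for an InList() test on the first keyword list
    if word in keywords:
        return "SCE_LISP_KEYWORD"
    # Stands in for an InList() test on the second keyword list
    if word in keywords_kw:
        return "SCE_LISP_KEYWORD_KW"
    # Any other scanned word stays an identifier
    return "SCE_LISP_IDENTIFIER"


first_list = {"defun", "lambda", "setq"}
# Hypothetical situation: the host application passed nothing for the second list
second_list_as_passed: set[str] = set()

for w in ["defun", ":key", "foo"]:
    print(w, "->", classify_word_lisp(w, first_list, second_list_as_passed))
```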

But I did some more digging and I found out that Notepad++ has a header
file in its ScitillaComponent source folder, called ScintillaEditView.h,
in which the setLispLexer function is defined as {setLexer(SCLEX_LISP,
L_LISP, LIST_0);}; ... comparing this with the similar VHDL function,
defined as {setLexer(SCLEX_VHDL, L_VHDL, LIST_0 | LIST_1 | LIST_2 | LIST_3
| LIST_4 | LIST_5 | LIST_6);}, it turns out that when Notepad++ sets up the
Lisp lexer, it only gives it access to ONE keyword list (which turns out to
be the "instre1" XML tag I was talking about before).

So, as I haven't found any equivalent of ScintillaEditView.h in your
source code tree, I conclude that the issue of not being able to access the
second keyword list stays with Notepad++. It would seem that just adding |
LIST_1 to the lexer definition would provide access to the second keyword
list as "instre2".
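The `LIST_0 | LIST_1 | ...` arguments behave like a bitmask selecting which keyword lists a lexer receives. The Python sketch below models that idea with `enum.IntFlag`; the bit values are our assumption, and only the `LIST_0` → "instre1" and `LIST_1` → "instre2" correspondences come from this thread (other lists are shown by flag name). The real setLexer is C++ code in Notepad++.

```python
from enum import IntFlag


class KeywordList(IntFlag):
    # Names follow the LIST_n flags quoted from ScintillaEditView.h;
    # the bit values are an assumption made for this illustration.
    LIST_0 = 1 << 0
    LIST_1 = 1 << 1
    LIST_2 = 1 << 2
    LIST_3 = 1 << 3
    LIST_4 = 1 << 4
    LIST_5 = 1 << 5
    LIST_6 = 1 << 6


# Only these two correspondences are stated in the thread
XML_NAME = {KeywordList.LIST_0: "instre1", KeywordList.LIST_1: "instre2"}


def lists_passed(mask: KeywordList) -> list[str]:
    """Return the keyword lists a lexer would receive for a given LIST_n mask."""
    return [XML_NAME.get(flag, flag.name) for flag in KeywordList if mask & flag]


LISP_CURRENT = KeywordList.LIST_0
LISP_PROPOSED = KeywordList.LIST_0 | KeywordList.LIST_1  # the suggested "| LIST_1"
VHDL = (KeywordList.LIST_0 | KeywordList.LIST_1 | KeywordList.LIST_2
        | KeywordList.LIST_3 | KeywordList.LIST_4 | KeywordList.LIST_5
        | KeywordList.LIST_6)

print(lists_passed(LISP_CURRENT))   # ['instre1']
print(lists_passed(LISP_PROPOSED))  # ['instre1', 'instre2']
```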

Now my wish for a few more lists still stands with your project, but until
the first issue is solved, any list enhancement stays inaccessible.
Anyhow, all this digging showed me that adding more generic lists should be
a pretty simple job ... if anybody is touching that part of the code for
whatever reason, please keep my wish in mind.


Date: 2009-11-06 22:57
Sender: nyamatongwe (Project Admin)

I don't use or understand Notepad++, instead using SciTE and Komodo. The
Lisp lexer supports two sets of keywords which divide lexemes recognized as
identifiers into one of SCE_LISP_IDENTIFIER, SCE_LISP_KEYWORD,
SCE_LISP_KEYWORD_KW. I thought your point (1) meant you wanted to add
SCE_LISP_KEYWORD_3.
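One way to see why an extra list such as SCE_LISP_KEYWORD_3 would be a small change is a table-driven classifier, where each keyword list is paired with a style and checked in order. This is a hypothetical Python sketch of that design, not how LexLisp.cxx is written; SCE_LISP_KEYWORD_3 does not exist and the word lists are illustrative.

```python
from typing import Sequence


def classify_word(word: str,
                  keyword_lists: Sequence[tuple[frozenset[str], str]],
                  default: str = "SCE_LISP_IDENTIFIER") -> str:
    # Check each (word list, style) pair in order; the first match wins,
    # just like a chain of if / else-if list lookups.
    for words, style in keyword_lists:
        if word in words:
            return style
    return default


LISP_LISTS = [
    (frozenset({"defun", "lambda"}), "SCE_LISP_KEYWORD"),
    (frozenset({":test", ":key"}), "SCE_LISP_KEYWORD_KW"),
    # Hypothetical third list/style: adding it is one more table row
    (frozenset({"my-macro"}), "SCE_LISP_KEYWORD_3"),
]

for w in ["defun", ":key", "my-macro", "car"]:
    print(w, "->", classify_word(w, LISP_LISTS))
```

Because the first matching row wins, the order of the table decides the style of a word that appears in more than one list.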


Date: 2009-11-06 12:57
Sender: dobrotescu

I've browsed the source code of both Scintilla/SciTE and Notepad++ a bit.
While I still haven't figured out how the XML definitions are used by the C++
code (I've only done some superficial code scanning), it seems to me that
the relevant XML information is the styleID from stylers.xml rather than
the keywordClass. If this is true, it means that the "type1" problem comes
from Notepad++. Please confirm.


Date: 2009-11-06 05:56
Sender: dobrotescu

Thanks for the quick answer. Yes, the first two requests go in the
direction of making the lexers more generic. The third one goes all the way
there, and I had in mind something along the lines of setting up some kind
of voluntary compliance rules/guidelines. But since you say that totally
different groups work on each lexer, I'll drop it for now.

The comments that follow are therefore targeted to the group that has
implemented the Scheme lexer (and probably the Lisp lexer, as it seems one
is a slightly modified clone of the other), and the details are related to
the way Notepad++ is using it/them.

So for both lexers, the language definition (in langs.xml) contains a
type (class) of keywords called "instre1" that encapsulates the keywords of
the language. In their style definition (in stylers.xml), this keywordClass
is associated with the "FUNCTION WORD" style, and more keywords can be added
there. Any keyword added there behaves on screen in the same way as the
"language" ones. Because it is a keywordClass, the style settings editor in
Notepad++ offers a GUI widget to add words. I would call this "the expected
behavior".

Now the style definition of both lexers also includes a keywordClass named
"type1", associated with the "OPERATOR" style. Therefore the settings editor
GUI offers a widget to add elements to this style as well. But there is no
language definition associated with "type1", and any token added to this
style is ignored. I tried to manually add a "type1" language definition, but
this had no effect. I would call this "unexpected behavior", and this whole
"type1" thing looks like a forgotten testing or cloning artifact.

Looking at other language definitions, it seems that both lexers were
cloned from the C lexer, which uses "instre1" for "INSTRUCTION WORD" and
"type1" for "TYPE WORD" (now the names used do make sense...). Along the same
line of thought, the ASM lexer is a good example of using multiple keyword
classes (instre1, instre2, type1, type2, type3, type4), but they don't
allow the addition of operators either (that wouldn't make too much sense
for ASM, would it?). In fact no other lexers allow the addition of
operators and I assume that this is because operator behavior is the most
important language element to hard-code.

So here is my "updated" bug-fix/feature request(s), for the Lisp/Scheme
lexers:

1. Please eliminate the useless and confusing "type1" reference (that
would be a bug fix).

2. Please take a look at ASM and add more keyword types and associated
styles to allow the lexer to be adapted to various language implementations
(that would be a feature request).

3. Along the lines of point #2, the addition of a customizable list of
operator-like tokens would be really nice (this is probably wishful
thinking, but just take a look at Notepad++'s "User defined language"
...).

Thanks (and sorry if I totally missed the point again)

Vlad D.


Date: 2009-11-06 02:39
Sender: nyamatongwe (Project Admin)

1) Sounds like a generic lexer. I don't know a good way to implement this.
2) Lexers are developed by people that use that language and it is up to
those people to decide what features they want.
3) Same as 1.






Attached File

No Files Currently Attached

Changes ( 3 )

Field               Old Value   Date               By
artifact_group_id   None        2009-11-06 02:39   nyamatongwe
priority            5           2009-11-06 02:39   nyamatongwe
assigned_to         nobody      2009-11-06 02:39   nyamatongwe