Dear Scintilla,
I have been working on a Raku (Perl 6) lexer implementation and have included a version for 4.2.1. I have covered support for the following so far:
$ hg clone http://hg.code.sf.net/p/scintilla/code -r rel-4-2-1 scintilla
$ cd scintilla
$ cp ../scintilla-4.2.1_raku_patch.diff .
$ hg import scintilla-4.2.1_raku_patch.diff
$ cd gtk/
$ make
$ wget https://excellmedia.dl.sourceforge.net/project/scintilla/SciTE/4.2.1/scite421.tgz
$ tar xzf scite421.tgz
$ cd scite
$ cp ../scite_4.2.1_raku_patch.diff .
$ patch -s -p0 < scite_4.2.1_raku_patch.diff
$ cd gtk/
$ make
$ sudo make install
There is still work to do, but I thought it would be worth throwing it open to the community. I also have an implementation for the Geany editor.
Best regards,
Mark.
Not related to the content of the lexer itself, but
1. The GPL is not suitable; applications that use Scintilla may not use the GPL.
2. 130 is already assigned to SCLEX_HOLLYWOOD (4.2.2, pending release)
3. The website for Raku is now https://raku.org/
Related [feature-requests:#1207]
Thanks Zufu,
Attached are two new patch files based on the latest pre 4.2.2 Scintilla release:
Patch / Build Scintilla 4.2.2 (pre)
$ hg clone http://hg.code.sf.net/p/scintilla/code -r 1b8ce5991cb9 scintilla
$ cd scintilla
$ hg import scintilla-4.2.2-pre_raku_patch.diff
$ cd gtk/
$ make
Patch / Build SciTE 4.2.1
$ wget https://excellmedia.dl.sourceforge.net/project/scintilla/SciTE/4.2.1/scite421.tgz
$ tar xzf scite421.tgz
$ cd scite
$ patch -s -p0 < scite_4.2.1_raku_patch2.diff
$ cd gtk/
$ make
$ sudo make install
Mark.
The property names added by this lexer should be namespaced with the lexer name, similar to the Perl lexer, so fold.raku.comment.multiline and fold.raku.comment.pod. Properties that are global since they are used in other lexers, fold, fold.compact and fold.comment, should omit descriptions so they are not treated as lexer-specific.
There are non-ASCII characters in comments which can lead to problems with Microsoft Visual C++ in non-English locales. It is generally simplest to replace the literal characters with their Unicode description so the source code is pure ASCII.
The license is normally included by reference so consumers don't have to check whether this file's license text differs from License.txt.
The unnamed namespace does the same job as 'static' so makes the use of 'static' redundant.
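As a minimal sketch of the point about 'static' (the helper name here is hypothetical and not taken from LexRaku.cxx), a function inside an unnamed namespace already has internal linkage:

```cpp
// Sketch only: IsSpaceOrTab is a made-up helper for illustration.
namespace {

// The unnamed namespace gives this function internal linkage,
// so it is invisible to other translation units without 'static'.
bool IsSpaceOrTab(int ch) noexcept {
	return ch == ' ' || ch == '\t';
}

// Redundant form: 'static' here would only repeat what the namespace does.
// static bool IsSpaceOrTab(int ch) noexcept;

}
```

Either form keeps the helper private to the file; using only the namespace avoids saying the same thing twice.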
There are some warnings from various tools. cppcheck is worth running although you should ignore 'constParameter' warnings.
For this warning from cppcheck 1.89, you are allowed to use the 'switch' statement. Coverity also doesn't like this code.
Scope limiting can be useful but it's a question of taste.
Unused variables are clutter and can also reveal unfinished plans.
It is possible that control flow ensures this is always set, but it's difficult to tell.
Debugging code is non-portable, never works for anyone else, and implies maintenance that doesn't happen, so DebugPrintSectionUnicode and printf shouldn't be included.
Thanks Neil,
I have made the following changes:
I have also taken the opportunity to add better Unicode character mapping for Raku. On a personal level, I am not a fan of a language with such broad character support. It's one thing to allow diverse language characters, but Raku also interprets number glyphs as numbers. It's all a bit broad.
Attached are the new patch files for Scintilla (rev: 1b8ce5991cb9) and SciTE 4.2.1
Mark.
Last edit: Mark Reay 2019-12-06
Hi Mark, personally, I think it's preferable to use an anonymous namespace instead of static for the entire file (after the includes and before LexerModule lmRaku).
The LexerRaku::IsWordChar() method looks like it can be simplified by using IsIdStart(), IsIdContinue(), etc. from the CharacterCategory header.
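A rough sketch of that suggestion follows. It is not the code from LexRaku.cxx: the two *Ascii functions are stand-ins written for this example that only approximate the Unicode-aware IsIdStart() and IsIdContinue() for ASCII input, and the firstChar parameter is an assumption about how the check might be split.

```cpp
// Stand-ins for this sketch only: ASCII approximations of the
// identifier-start and identifier-continue categories.
bool IsIdStartAscii(int ch) noexcept {
	return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

bool IsIdContinueAscii(int ch) noexcept {
	return IsIdStartAscii(ch) || (ch >= '0' && ch <= '9') || ch == '_';
}

// Hypothetical word-character test: the first character of a word must be
// an identifier start, later characters may be any identifier continuation.
bool IsWordChar(int ch, bool firstChar) noexcept {
	return firstChar ? IsIdStartAscii(ch) : IsIdContinueAscii(ch);
}
```

With the real category functions in place of the stand-ins, the same two-line body would cover non-ASCII identifiers without a hand-written table.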
Hi Zufu,
Good point. I just changed the Raku::IsWordChar() function. That makes it very compact and much more efficient, now it's just:
I also put the anonymous namespace back, I think it makes sense too, if that's okay.
Attached is LexRaku.cxx, as that contains the only changes.
Thanks,
Mark.
I think I was a bit tired when I implemented the CharacterCategory check last night. I've now used the following instead:
The categorisation also works well with allowed numbers so the two functions I've updated are:
I will work to simplify GetBracketCloseChar next. Valid opening and closing delimiters can be any bi-directional pair of Unicode characters, as described in the first section of: http://www.unicode.org/Public/5.1.0/ucd/BidiMirroring.txt
Mark.
Last edit: Mark Reay 2019-12-07
'const' is redundant on return types. From clang-tidy:
'alowNumber' should probably be spelled 'allowNumber'.
Scintilla uses a fixed #include order with C++ library headers after C library headers. The order is defined in scripts/HeaderOrder.txt and checked by scripts/HeaderCheck.py. <vector> goes between <string> and <map>.
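The include order and the return-type point can be sketched together. The helper functions below are hypothetical and exist only to make the fragment compile; only the relative position of <vector> between <string> and <map>, and C headers coming first, are taken from the comments above.

```cpp
// C library headers come first...
#include <cstring>

// ...followed by C++ library headers, with <vector> between <string> and <map>.
#include <string>
#include <vector>
#include <map>

// Hypothetical helper. Declaring it as 'const int CountDistinct(...)' would
// be flagged: the caller receives a copy, so 'const' there changes nothing.
int CountDistinct(const std::vector<std::string> &words) {
	std::map<std::string, bool> seen;
	for (const std::string &word : words)
		seen[word] = true;
	return static_cast<int>(seen.size());
}

// Hypothetical helper using a C library call, again with a plain return type.
bool IsEmptyText(const char *text) {
	return std::strlen(text) == 0;
}
```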
I have reduced the size of the switch statement that was in GetBracketCloseChar. What was 178 cases has now been simplified to three CharacterCategory types. Some opening characters have matching closing characters that are not simply opener + 1. These have been cased as follows:
I have also removed the superfluous 'const' qualifiers from function return types and moved <vector> to its appropriate position.
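To illustrate the idea of special-casing only the pairs that are not opener + 1, here is a small hypothetical sketch. It is not the code from LexRaku.cxx: CloseCharFor is a made-up name, and it covers just a handful of delimiters with a plain switch rather than using CharacterCategory.

```cpp
// Hypothetical sketch. Returns the closing delimiter for an opening one,
// or 0 if 'opener' is not one of the few delimiters listed here.
int CloseCharFor(int opener) noexcept {
	switch (opener) {
	// Pairs where the closer is not opener + 1 need explicit cases.
	case '[': return ']';
	case '{': return '}';
	case '<': return '>';
	case 0x00AB: return 0x00BB;	// double angle quotation marks
	// Pairs where the closer immediately follows the opener.
	case '(':
	case 0x2308:	// LEFT CEILING
	case 0x3008:	// LEFT ANGLE BRACKET
		return opener + 1;
	default:
		return 0;
	}
}
```

In this sketch the opener + 1 group is listed by hand; the approach described above replaces such a list with a category test, leaving only the exceptions as explicit cases.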
Last edit: Mark Reay 2019-12-10
The latest version seems reasonable to me.
Due to some upcoming changes to the way lexers work, the Raku lexer won't be committed until those changes have been committed. This will most likely occur in a couple of weeks and may require minor changes to this lexer.
New lexing features have been committed and the ILexer interface updated to ILexer5 which adds new metadata retrieval calls. A patch with the changes needed is attached as RakuILexer5.patch.
The new lexer testing framework uses example files which are controlled by SciTE.properties files to produce expected output in .styled files. A minimal example x.p6 is attached as RakuTest.patch. The test file should include an example of each possible style.
Is the numeric '0' supposed to be in SCE_RAKU_DEFAULT instead of SCE_RAKU_NUMBER?
Thank you Neil,
I have gone over the lexer for final checks and fixed a few bugs:
I have updated the new style test files with tests for all style types:
The raku.properties file has been updated for SciTE. Just fixed the keywords line wrapping.
Attached are the patch files for both Scintilla (tip: 295a6e54d582) and SciTE 4.2.3
Mark.
Last edit: Mark Reay 2020-01-03
Committed as [bcb951], [3be72c], [604485], [cb4c65].
In ProcessValidRegQlangStart, the decrementing of length inside a loop appears wrong as using the unchanging startPos turns a linear change into a multiplicative change and a possible early termination. Maybe something like
When using a file with \r\n line ends, as is common on Windows, there are often mismatched styles on the \r and \n when turning on visible line ends (SciTE: View | End of Line). The most common is at the end of a '#' line comment where the \r is green and the \n grey. While this is not an error and happens with some other lexers, it is a source of problems and lexers that style both the \r and \n with the same style are more robust.
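One way to keep \r and \n in the same style is to leave the comment state only after the last character of the line end. The following is a self-contained sketch under that assumption; the style numbers and the function name are invented for the example and it does not use the Scintilla styling API.

```cpp
#include <cstddef>

#include <string>
#include <vector>

// Hypothetical style numbers for this sketch only.
enum { styleDefault = 0, styleComment = 1 };

// Styles each '#' line comment through the end of its line, including both
// characters of a \r\n line end, so \r and \n never differ in style.
std::vector<int> StyleLineComments(const std::string &text) {
	std::vector<int> styles(text.size(), styleDefault);
	bool inComment = false;
	for (std::size_t i = 0; i < text.size(); i++) {
		if (text[i] == '#')
			inComment = true;
		if (inComment)
			styles[i] = styleComment;
		// Leave the comment only after the last character of the line end:
		// a '\n', or a lone '\r' that is not followed by '\n'.
		const bool lastOfLineEnd = text[i] == '\n' ||
			(text[i] == '\r' && (i + 1 == text.size() || text[i + 1] != '\n'));
		if (lastOfLineEnd)
			inComment = false;
	}
	return styles;
}
```

For the input "a #c\r\nb" this gives the comment style to '#', 'c', '\r' and '\n' alike, and the default style to everything else.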
I have found two places where CRLF was not being handled properly and corrected them for:
The length calculation in ProcessValidRegQlangStart was clearly in error. I have replaced it with:
Committed as [4bdfd4].