I've had a SCLEX_CONTAINER lexer for Scintilla for a while now, and thought it would be nice to give back, as I've got a lot of benefit from Scintilla over the years. It's fairly simple, but it gives a different style to segment headers, element and composite punctuation, and to special segment headers. Lastly, if the edifact file is not closed properly (with a ') it will show an error style.
Attached is a diff of the necessary files (the sce_edi defines, the calalogue addition, and a new LexEDIFSCT.cxx file).
I hope this benefits others!
Iain.
Thanks for the contribution. There is a good chance the lexer will be useful to some people in its current form.
There are a few warnings and other problems with the code currently which means it can't be included in the standard distribution.
There should be a reference to the license as is present in other lexers. Without any indication of a license, it is dangerous to include code this in a project.
Scintilla has to build with old compilers across several perating systems so C++11 features are not available, including auto, nullptr and override. This should change in 2017.
Version is the version of the ILexer interface supported which should be lvOriginal. See LexerBasic.
Add a virtual destructor.
Use '\377' which is a char.
VS2008 is particularly fun:
Calling pAccess->BufferPointer() can be expensive on large files as it copies (on average) half the document with each call to make all the text contiguous when there has been a change. So, if the user is typing, the copy occurs after each character is typed. Most lexer code should use LexAccessor::operator[] or LexAccessor::SafeGetCharAt which don't collapse the gap.
I've sorted out the above issues. I don't have access to GCC etc, but the new cpp file compiles cleanly in vs2015 and 2005. I think it's safe to assume it compiles cleanly in inbetween versions too!
Feel free to edit the comments at the start or suggest changes. I am happy for licensing to bewide open, though vanity makes me want my name to stay up there.
And here's one with folding operational too. It folds at UNH segments, headers for messages inside the overall package.
OK, that's better.
However there is a warning from VS 2015:
The code is a little weird, using a single iteration loop to be able to break instead of using ifs but it also appears wrong. The call to memcpy will always succeed since memcpy returns its destination argument. Perhaps you want memcmp?
FindPreviousEnd can be marked const.
LexerModule.h should be at the end of the list of #includes. Scintilla has a canonical header order defined in scripts/HeaderOrder.txt and a checking script scripts/HeaderCheck.py to ensure there aren't header ordering bugs and precompiled headers work reasonably.
The memcpy was a stupid typo - I had already fixed it, but had to go back in time to an earlier version when I messed up folding, and missed that memcpy.
I quite like the do { } while (0); with breaks when you have a lot of failure conditions to jump out from, as it prevents a lot of if nesting. but as there's not much of an endurance test here, I've changed to a couple of if's instead.
All my helper functions have been const'd, except for InitialiseFromUNA (). I am a big fan of const, and the new override modifiers - anything that makes it harder to do silly mistakes must be good.
The include order has been fixed, and a comment added to remind future Iain not to mess it up later.
Committed as [fa9991].
Related
Commit: [fa9991]
Additional changes committed to enable building on macOS: [94f0d4].
Related
Commit: [94f0d4]
Thanks for implementing this Lexer. I'd like to suggest some improvements...
Currently UNA line is colored in one color and most of the tags are same colors.
If you look at TextWrangler editor, they have made it a bit more flexible. Screenshot can be found here for inspiration; https://github.com/pliegl/textwrangler-edi
Hello Kalle,
The main difference that I can see is that "I" return three segment styles - normal (shared with segment separator), UNA and UNH. Whereas they have: Normal, UNA and UN*.
It would not be hard to change the match for UNH to UN. The question is whether that's the right thing to do.
I suppose we do have options, like "fold", so "make UN special" woud not be hard.... I shall have a think.
Iain.
I've made the changes, and attached a patch and new cxx file. Give them a play, and comment, before Neil need to have an opinion. - Iain.
And the cxx file.
OK. The name of the property should be namespaced with a "lexer.edifact." prefix which segregates each lexer's properties except for some of the earliest.
As well as runtime property discovery, there is also a script, scintilla/scripts/ScintillaData.py, that can discover all the properties for documentation or code generation. Some of SciTE's documentation is created by generating HTML from the property documentation. This is mostly built around the OptionSet class for new lexers or recognizing calls to GetProperty and some commenting conventions for old lexers but it can be triggered by placing a comment above the retrieval of the property plus a "// GetProperty" comment at the line end. Something like:
Be sure to run the script to make sure your lexer turns up in the list of lexers, the property name is in the list of property names and the documentation is there too.
Last edit: Neil Hodgson 2018-04-23
Here's an updated patch, using the better name above, and some // property comments added. I could not fine any // GetProperty comments in other lexers to compare against, but I did find the // property ones.
I currently have 4.0.3 - if they've been added later, I apologise.
There are 2 separate elements found by the script: documentation and property names. The documentation is found from your addition but the property name isn't - the script doesn't look for "strcmp" as that is quite common. It looks for lines that contain GetProperty or DefineProperty and with a quoted string. I added "// GetProperty" to the change set. No other lexers use this since they call either GetProperty* or DefineProperty.
The "fold" property is common to almost all lexers so adding many slightly different pieces of documentation would not be useful. Removing the comment above the declaration m_bFold omits that documentation.
The updated change set is committed as [5f511d].
Related
Commit: [5f511d]