This appears to be based on an older version of LexCPP.cxx.
8 months ago, lexers were changed to use Sci_Position and Sci_PositionU for 64/32-bit reasons. LexProgress should use these types in a similar way to LexCPP. Use an interactive differ between LexCPP and LexProgress to look for occurrences of these in LexCPP unmatched by LexProgress.
There are some features used by the C++ lexer to handle string literals in similar languages like Vala with names like hashquoted #", triplequoted """ and backquoted `. These do not appear to be used in Progress so should be removed. For each field in OptionsABL, see if it is referenced anywhere apart from a DefineProperty call and remove it if not used. Its difficult enough to understand code without extra unused entities.
Unless the Progress preprocessor handles macros with arguments, much of the complexity of the preprocesor handling in methods like EvaluateTokens can be removed. EvaluateTokens, Tokenize and EvaluateTokens are not called by other code, so can be completely removed. Other fields then become unused such as preprocessorDefinitions and preprocessorDefinitionsStart. The 'active' style flag mechanism appears inoperative and so PPDefinition, PPStates, and LinePPState, ... are dead. LexerFactoryABL can be removed as only the case-insensitive form is referenced. Thus the caseSensitive field is now a constant false and its uses should be removed. Possibly you want to enable some of these features later but, if so, they should be maintained in a separate fork since they are currently just code bloat making the file 4 times larger than before.
Include statements are in a specific order to ensure consistency and this is checked by an unpublished script. The #include <ctype.h> should move to the same place as in LexCPP.cxx. If possible, avoid ctype.h in favour of CharacterSet.h functions like IsAlphaNumeric since ctype.h functions may fail badly with non-ASCII values.</ctype.h>
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
This has introduced the term ABL and uses it in the state constants SCE_ABL_* but does not include the values for SCE_ABL_* in a header (SciLexer.h) or Scintilla.iface. Changing the names from SCE_4GL_* may cause failures in existing client code. Its possible that this change is sufficiently significant that the new lexer should be completely separate (in a new file) and exist in parallel to the current lexer which is then deprecated. However, if it is likely that client code will work if the existing states are used then that is a more compatible option.
The code LexerModule lmPS(SCLEX_PS is from the PostScript lexer and will cause confusion between PostScript mode and Progress mode. Each language/lexer gets a unique SCLEX_ value and SCLEX_PROGRESS was allocated to Progress. If this lexer is to be separate, it can have the next value
#define SCLEX_ABL 121
Every property should have a DefineProperty call so that the set of properties can be found. lexer.abl.allow.dollars doesn't have one unlike lexer.cpp.allow.dollars which it is based on. This is needed in C because some variants allow $ in identifiers and some don't. If ABL always allows $ in identifiers or never allows $ in identifiers then it shouldn't be an option.
Raw strings are not used in the lexer so rawStringTerminators, rawStringTerminator, rawSTNew and SparseState.h should be removed. Raw strings look like this in C++ where the "XXX" and brackets are part of the quoting, not part of the string:
R"XXX(I'm a "raw" string.)XXX
definitionsChanged is never changed so it and the call to ChangeLexerState should be removed.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Coverity doesn't like the way the isLastWordEnd variable is used. The effective life is just within the case for SCE_ABL_IDENTIFIER and the earlier assignments have no effect. Either the variable is global to the method and it should be used somewhere else, or it should be local within the if block after "case SCE_ABL_IDENTIFIER:".
If isLastWordEnd doesn't need to be initialized then maybe checkLastWordEnd isn't needed either.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Here is a revised code:
Cleaned up dead code
Fixed the lexer name
Attached the new values of SCE_ABL_* (SciLexer.h and Scintilla.iface)
Changing from SCE_4GL to SCE_ABL reflects a significant change (32 states --> 12 states). There are 2 clients using this lexer and both of them have already adopted the new scheme.
I recently made some changes to the old lexer. These changes were committed and are in LexProgress.cxx in the current release of Scintilla. They are:
Unlimited nesting of comments
Support for hexadecimal constants
Single-line comments (with a separate style identifier)
Support for abbreviated keywords
I downloaded the new lexer and tried it. It doesn't have support for single-line comments (which are new in OpenEdge 11.6) or abbreviated keywords. Adding support for abbreviated keywords is simple. Change the Inlist calls to InListAbbreviated and pass '(' as the second argument. The keyword list should contain words in the form "def(ine", "var(iable", etc. This is the standard that was established in OpenEdge when color-coding was introduced in the development tools.
The delimiter for single-line comments is "//". The "//" must be at the start of a line or be preceded by a space or tab. A single-line comment can't immediately follow other text on the line. Single-line comments should have their own style ID - for example, SCE_ABL_LINECOMMENT.
//valid comment - beginning of line
END. // valid comment - space before //
END.// invalid comment - no space before //
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Neil - I've fixed per your comments. Also removed the foldAtElse option which wasn't effective.
Matt - I've added all the features you've mentioned, with slight differences:
LineState might not be accurate enough for nested comments, so I'm calculating it by the char.
Numbers include a few less characters, similar to latest CPP implementation
LineComments can have any white space before them. It also supports line continuation. I don't have OE11.6 to test it so please fill free to check on me
Abbreviated keywords are supported as you've mentioned
I've attached SciLexer.h and Scintilla.iface with the additional value.
Line comments don't support the line continutation character. The comment ends at the end of the line even if there is a ~ on the line. Otherwise, the line comments look good.
I'm seeing a problem with nested comments. The editor isn't always updated immediately when I add a nesting level. This may be a bug in Scintilla rather than in the lexer.
I start with code like this:
/*
*/
DEF VAR
"DEF VAR" is correctly styled as SCE_ABL_WORD and is blue in my editor. Now I nest the comment by adding another "/*" at the beginning of the first line:
/ /
*/
DEF VAR
"DEF VAR" doesn't immediately turn green. If I move the cursor using the down arrow "DEF VAR" then turns green as expected. Can you reproduce this behavior?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
If the line end style of the last line lexed isn't changed by the lexer then Scintilla assumes that nothing past that point needs to be redrawn. If a change is made to the state at the end of lexing that is not represented by a change in the last style byte then the lexer should call styler.ChangeLexerState. For an example, see the C++ lexer which does this for raw strings and preprocessor changes.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The LineEndTypesSupported method shouldn't return SC_LINE_END_TYPE_UNICODE unless it really can handle the Unicode line ends LS, PS, and NEL. That would also require updating the Version method to return lvSubStyles (LineEndTypesSupported is in ILexerWithSubStyles) and implementing (with at least dummy methods) the ILexerWithSubStyles interface. If the lexer doesn't really support Unicode line ends then SC_LINE_END_TYPE_DEFAULT should be returned.
Attached is a file with some Unicode line ends to start testing if you want.
A revised code is attached with below changes:
Recovered the Postscript lexer id
Adjusted to the default EOL.
Removed the line-continuation from single line comment
Added the explicit lexer state change
Enhanced the support of single line comment
These changes resolve the problems I saw with single-line comments and nested comments. I haven't run our automated test suite on this version of the lexer yet but my manual testing looks good.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
This appears to be based on an older version of LexCPP.cxx.
8 months ago, lexers were changed to use Sci_Position and Sci_PositionU for 64/32-bit reasons. LexProgress should use these types in a similar way to LexCPP. Use an interactive differ between LexCPP and LexProgress to look for occurrences of these in LexCPP unmatched by LexProgress.
There are some features used by the C++ lexer to handle string literals in similar languages like Vala with names like hashquoted #", triplequoted """ and backquoted `. These do not appear to be used in Progress so should be removed. For each field in OptionsABL, see if it is referenced anywhere apart from a DefineProperty call and remove it if not used. Its difficult enough to understand code without extra unused entities.
Unless the Progress preprocessor handles macros with arguments, much of the complexity of the preprocesor handling in methods like EvaluateTokens can be removed. EvaluateTokens, Tokenize and EvaluateTokens are not called by other code, so can be completely removed. Other fields then become unused such as preprocessorDefinitions and preprocessorDefinitionsStart. The 'active' style flag mechanism appears inoperative and so PPDefinition, PPStates, and LinePPState, ... are dead. LexerFactoryABL can be removed as only the case-insensitive form is referenced. Thus the caseSensitive field is now a constant false and its uses should be removed. Possibly you want to enable some of these features later but, if so, they should be maintained in a separate fork since they are currently just code bloat making the file 4 times larger than before.
Include statements are in a specific order to ensure consistency and this is checked by an unpublished script. The #include <ctype.h> should move to the same place as in LexCPP.cxx. If possible, avoid ctype.h in favour of CharacterSet.h functions like IsAlphaNumeric since ctype.h functions may fail badly with non-ASCII values.</ctype.h>
Thanks a lot for your thourough code review.
I've used 100% of your insights, removing SubStates as well.
Revised source is attached.
A revised code is attached:
Fixing a bug where scrollng down has produced wrong styling.
Small styling, etc. fixes
This has introduced the term ABL and uses it in the state constants SCE_ABL_* but does not include the values for SCE_ABL_* in a header (SciLexer.h) or Scintilla.iface. Changing the names from SCE_4GL_* may cause failures in existing client code. Its possible that this change is sufficiently significant that the new lexer should be completely separate (in a new file) and exist in parallel to the current lexer which is then deprecated. However, if it is likely that client code will work if the existing states are used then that is a more compatible option.
The code LexerModule lmPS(SCLEX_PS is from the PostScript lexer and will cause confusion between PostScript mode and Progress mode. Each language/lexer gets a unique SCLEX_ value and SCLEX_PROGRESS was allocated to Progress. If this lexer is to be separate, it can have the next value
Every property should have a DefineProperty call so that the set of properties can be found. lexer.abl.allow.dollars doesn't have one unlike lexer.cpp.allow.dollars which it is based on. This is needed in C because some variants allow $ in identifiers and some don't. If ABL always allows $ in identifiers or never allows $ in identifiers then it shouldn't be an option.
Raw strings are not used in the lexer so rawStringTerminators, rawStringTerminator, rawSTNew and SparseState.h should be removed. Raw strings look like this in C++ where the "XXX" and brackets are part of the quoting, not part of the string:
definitionsChanged is never changed so it and the call to ChangeLexerState should be removed.
Coverity doesn't like the way the isLastWordEnd variable is used. The effective life is just within the case for SCE_ABL_IDENTIFIER and the earlier assignments have no effect. Either the variable is global to the method and it should be used somewhere else, or it should be local within the if block after "case SCE_ABL_IDENTIFIER:".
If isLastWordEnd doesn't need to be initialized then maybe checkLastWordEnd isn't needed either.
Here is a revised code:
Cleaned up dead code
Fixed the lexer name
Attached the new values of SCE_ABL_* (SciLexer.h and Scintilla.iface)
Changing from SCE_4GL to SCE_ABL reflects a significant change (32 states --> 12 states). There are 2 clients using this lexer and both of them have already adopted the new scheme.
I just found a regression issue, I'll submit a new source soon.
The comment for the "fold.comment" property appears wrong as "//{" has no effect.
styleSubable is not used so should be removed.
I recently made some changes to the old lexer. These changes were committed and are in LexProgress.cxx in the current release of Scintilla. They are:
Unlimited nesting of comments
Support for hexadecimal constants
Single-line comments (with a separate style identifier)
Support for abbreviated keywords
I downloaded the new lexer and tried it. It doesn't have support for single-line comments (which are new in OpenEdge 11.6) or abbreviated keywords. Adding support for abbreviated keywords is simple. Change the Inlist calls to InListAbbreviated and pass '(' as the second argument. The keyword list should contain words in the form "def(ine", "var(iable", etc. This is the standard that was established in OpenEdge when color-coding was introduced in the development tools.
The delimiter for single-line comments is "//". The "//" must be at the start of a line or be preceded by a space or tab. A single-line comment can't immediately follow other text on the line. Single-line comments should have their own style ID - for example, SCE_ABL_LINECOMMENT.
//valid comment - beginning of line
END. // valid comment - space before //
END.// invalid comment - no space before //
Neil - I've fixed per your comments. Also removed the foldAtElse option which wasn't effective.
Matt - I've added all the features you've mentioned, with slight differences:
LineState might not be accurate enough for nested comments, so I'm calculating it by the char.
Numbers include a few less characters, similar to latest CPP implementation
LineComments can have any white space before them. It also supports line continuation. I don't have OE11.6 to test it so please fill free to check on me
Abbreviated keywords are supported as you've mentioned
Line comments don't support the line continutation character. The comment ends at the end of the line even if there is a ~ on the line. Otherwise, the line comments look good.
I'm seeing a problem with nested comments. The editor isn't always updated immediately when I add a nesting level. This may be a bug in Scintilla rather than in the lexer.
I start with code like this:
/*
*/
DEF VAR
"DEF VAR" is correctly styled as SCE_ABL_WORD and is blue in my editor. Now I nest the comment by adding another "/*" at the beginning of the first line:
/ /
*/
DEF VAR
"DEF VAR" doesn't immediately turn green. If I move the cursor using the down arrow "DEF VAR" then turns green as expected. Can you reproduce this behavior?
If the line end style of the last line lexed isn't changed by the lexer then Scintilla assumes that nothing past that point needs to be redrawn. If a change is made to the state at the end of lexing that is not represented by a change in the last style byte then the lexer should call styler.ChangeLexerState. For an example, see the C++ lexer which does this for raw strings and preprocessor changes.
The Scintilla.iface has
This steals the PostScript lexer's ID and should be returned to
The LineEndTypesSupported method shouldn't return SC_LINE_END_TYPE_UNICODE unless it really can handle the Unicode line ends LS, PS, and NEL. That would also require updating the Version method to return lvSubStyles (LineEndTypesSupported is in ILexerWithSubStyles) and implementing (with at least dummy methods) the ILexerWithSubStyles interface. If the lexer doesn't really support Unicode line ends then SC_LINE_END_TYPE_DEFAULT should be returned.
Attached is a file with some Unicode line ends to start testing if you want.
A revised code is attached with below changes:
Recovered the Postscript lexer id
Adjusted to the default EOL.
Removed the line-continuation from single line comment
Added the explicit lexer state change
Enhanced the support of single line comment
These changes resolve the problems I saw with single-line comments and nested comments. I haven't run our automated test suite on this version of the lexer yet but my manual testing looks good.
I noticed that task markers don't work in single-line comments. Here's a proposed fix:
I fully agree Matt. Would you upload a revised version?
Thanks,
Yuval
Here is my version with the fix for task markers in single-line comments.
Last edit: Matt G 2016-09-19
Committed as [a965d9].
Related
Commit: [a965d9]