#1143 Updated Progress 4GL lexer - Code attached!!

Committed
closed
5
2016-10-16
2016-04-01
Yuval1
No

New revision of Prorgess 4GL / ABL includes:
ILexer implementation
Lexer can now correctly handle:
"end triggers" phrase
"last-event:function" phrase
Indefinite comment level depth

Cheers,
Yuval

1 Attachments

Discussion

  • Neil Hodgson

    Neil Hodgson - 2016-04-04

    This appears to be based on an older version of LexCPP.cxx.

    8 months ago, lexers were changed to use Sci_Position and Sci_PositionU for 64/32-bit reasons. LexProgress should use these types in a similar way to LexCPP. Use an interactive differ between LexCPP and LexProgress to look for occurrences of these in LexCPP unmatched by LexProgress.

    There are some features used by the C++ lexer to handle string literals in similar languages like Vala with names like hashquoted #", triplequoted """ and backquoted `. These do not appear to be used in Progress so should be removed. For each field in OptionsABL, see if it is referenced anywhere apart from a DefineProperty call and remove it if not used. Its difficult enough to understand code without extra unused entities.

    Unless the Progress preprocessor handles macros with arguments, much of the complexity of the preprocesor handling in methods like EvaluateTokens can be removed. EvaluateTokens, Tokenize and EvaluateTokens are not called by other code, so can be completely removed. Other fields then become unused such as preprocessorDefinitions and preprocessorDefinitionsStart. The 'active' style flag mechanism appears inoperative and so PPDefinition, PPStates, and LinePPState, ... are dead. LexerFactoryABL can be removed as only the case-insensitive form is referenced. Thus the caseSensitive field is now a constant false and its uses should be removed. Possibly you want to enable some of these features later but, if so, they should be maintained in a separate fork since they are currently just code bloat making the file 4 times larger than before.

    Include statements are in a specific order to ensure consistency and this is checked by an unpublished script. The #include <ctype.h> should move to the same place as in LexCPP.cxx. If possible, avoid ctype.h in favour of CharacterSet.h functions like IsAlphaNumeric since ctype.h functions may fail badly with non-ASCII values.

     
  • Yuval1

    Yuval1 - 2016-06-08

    Thanks a lot for your thourough code review.
    I've used 100% of your insights, removing SubStates as well.

    Revised source is attached.

     
  • Yuval1

    Yuval1 - 2016-06-18

    A revised code is attached:

    Fixing a bug where scrollng down has produced wrong styling.
    Small styling, etc. fixes

     
  • Neil Hodgson

    Neil Hodgson - 2016-08-15

    This has introduced the term ABL and uses it in the state constants SCE_ABL_* but does not include the values for SCE_ABL_* in a header (SciLexer.h) or Scintilla.iface. Changing the names from SCE_4GL_* may cause failures in existing client code. Its possible that this change is sufficiently significant that the new lexer should be completely separate (in a new file) and exist in parallel to the current lexer which is then deprecated. However, if it is likely that client code will work if the existing states are used then that is a more compatible option.

    The code LexerModule lmPS(SCLEX_PS is from the PostScript lexer and will cause confusion between PostScript mode and Progress mode. Each language/lexer gets a unique SCLEX_ value and SCLEX_PROGRESS was allocated to Progress. If this lexer is to be separate, it can have the next value

    #define SCLEX_ABL 121
    

    Every property should have a DefineProperty call so that the set of properties can be found. lexer.abl.allow.dollars doesn't have one unlike lexer.cpp.allow.dollars which it is based on. This is needed in C because some variants allow $ in identifiers and some don't. If ABL always allows $ in identifiers or never allows $ in identifiers then it shouldn't be an option.

    Raw strings are not used in the lexer so rawStringTerminators, rawStringTerminator, rawSTNew and SparseState.h should be removed. Raw strings look like this in C++ where the "XXX" and brackets are part of the quoting, not part of the string:
    ~~~c++11
    R"XXX(I'm a "raw" string.)XXX
    ~~~

    definitionsChanged is never changed so it and the call to ChangeLexerState should be removed.

     
  • Neil Hodgson

    Neil Hodgson - 2016-08-16

    Coverity doesn't like the way the isLastWordEnd variable is used. The effective life is just within the case for SCE_ABL_IDENTIFIER and the earlier assignments have no effect. Either the variable is global to the method and it should be used somewhere else, or it should be local within the if block after "case SCE_ABL_IDENTIFIER:".

    If isLastWordEnd doesn't need to be initialized then maybe checkLastWordEnd isn't needed either.

     
  • Yuval1

    Yuval1 - 2016-09-03

    Here is a revised code:
    Cleaned up dead code
    Fixed the lexer name
    Attached the new values of SCE_ABL_* (SciLexer.h and Scintilla.iface)

    Changing from SCE_4GL to SCE_ABL reflects a significant change (32 states --> 12 states). There are 2 clients using this lexer and both of them have already adopted the new scheme.

     
  • Yuval1

    Yuval1 - 2016-09-04

    I just found a regression issue, I'll submit a new source soon.

     
    • Neil Hodgson

      Neil Hodgson - 2016-09-04

      The comment for the "fold.comment" property appears wrong as "//{" has no effect.

       
    • Neil Hodgson

      Neil Hodgson - 2016-09-05

      styleSubable is not used so should be removed.

       
  • Matt G

    Matt G - 2016-09-06

    I recently made some changes to the old lexer. These changes were committed and are in LexProgress.cxx in the current release of Scintilla. They are:

    Unlimited nesting of comments
    Support for hexadecimal constants
    Single-line comments (with a separate style identifier)
    Support for abbreviated keywords

    I downloaded the new lexer and tried it. It doesn't have support for single-line comments (which are new in OpenEdge 11.6) or abbreviated keywords. Adding support for abbreviated keywords is simple. Change the Inlist calls to InListAbbreviated and pass '(' as the second argument. The keyword list should contain words in the form "def(ine", "var(iable", etc. This is the standard that was established in OpenEdge when color-coding was introduced in the development tools.

    The delimiter for single-line comments is "//". The "//" must be at the start of a line or be preceded by a space or tab. A single-line comment can't immediately follow other text on the line. Single-line comments should have their own style ID - for example, SCE_ABL_LINECOMMENT.

    //valid comment - beginning of line
    END. // valid comment - space before //
    END.// invalid comment - no space before //

     
  • Yuval1

    Yuval1 - 2016-09-06

    Neil - I've fixed per your comments. Also removed the foldAtElse option which wasn't effective.

    Matt - I've added all the features you've mentioned, with slight differences:
    LineState might not be accurate enough for nested comments, so I'm calculating it by the char.
    Numbers include a few less characters, similar to latest CPP implementation
    LineComments can have any white space before them. It also supports line continuation. I don't have OE11.6 to test it so please fill free to check on me
    Abbreviated keywords are supported as you've mentioned

         I've attached SciLexer.h and Scintilla.iface with the additional value.
    
     
    • Matt G

      Matt G - 2016-09-07

      Line comments don't support the line continutation character. The comment ends at the end of the line even if there is a ~ on the line. Otherwise, the line comments look good.

      I'm seeing a problem with nested comments. The editor isn't always updated immediately when I add a nesting level. This may be a bug in Scintilla rather than in the lexer.

      I start with code like this:

      /*

      */

      DEF VAR

      "DEF VAR" is correctly styled as SCE_ABL_WORD and is blue in my editor. Now I nest the comment by adding another "/*" at the beginning of the first line:

      / /

      */

      DEF VAR

      "DEF VAR" doesn't immediately turn green. If I move the cursor using the down arrow "DEF VAR" then turns green as expected. Can you reproduce this behavior?

       
      • Neil Hodgson

        Neil Hodgson - 2016-09-07

        If the line end style of the last line lexed isn't changed by the lexer then Scintilla assumes that nothing past that point needs to be redrawn. If a change is made to the state at the end of lexing that is not represented by a change in the last style byte then the lexer should call styler.ChangeLexerState. For an example, see the C++ lexer which does this for raw strings and preprocessor changes.

         
    • Neil Hodgson

      Neil Hodgson - 2016-09-07

      The Scintilla.iface has

      lex PSxxx=SCLEX_PSxxx SCE_PS_
      lex PS=SCLEX_PS SCE_ABL_
      

      This steals the PostScript lexer's ID and should be returned to

      lex PS=SCLEX_PS SCE_PS_
      lex Progress=SCLEX_PROGRESS SCE_ABL_
      

      The LineEndTypesSupported method shouldn't return SC_LINE_END_TYPE_UNICODE unless it really can handle the Unicode line ends LS, PS, and NEL. That would also require updating the Version method to return lvSubStyles (LineEndTypesSupported is in ILexerWithSubStyles) and implementing (with at least dummy methods) the ILexerWithSubStyles interface. If the lexer doesn't really support Unicode line ends then SC_LINE_END_TYPE_DEFAULT should be returned.

      Attached is a file with some Unicode line ends to start testing if you want.

       
  • Yuval1

    Yuval1 - 2016-09-10

    A revised code is attached with below changes:
    Recovered the Postscript lexer id
    Adjusted to the default EOL.
    Removed the line-continuation from single line comment
    Added the explicit lexer state change
    Enhanced the support of single line comment

     
    • Matt G

      Matt G - 2016-09-12

      These changes resolve the problems I saw with single-line comments and nested comments. I haven't run our automated test suite on this version of the lexer yet but my manual testing looks good.

       
  • Matt G

    Matt G - 2016-09-16

    I noticed that task markers don't work in single-line comments. Here's a proposed fix:

    OLD:
             case SCE_ABL_LINECOMMENT:
                if (sc.atLineStart && !continuationLine) {
                   sc.SetState(SCE_ABL_DEFAULT);
                   isSentenceStart = true;
                }
                break;
    
    NEW:            
             case SCE_ABL_LINECOMMENT:
                if (sc.atLineStart && !continuationLine) {
                   sc.SetState(SCE_ABL_DEFAULT);
                   isSentenceStart = true;
                } else {
                   styleBeforeTaskMarker = SCE_ABL_LINECOMMENT;
                   highlightTaskMarker(sc, styler, keywords4);
                }
                break;
    
     
  • Yuval1

    Yuval1 - 2016-09-17

    I fully agree Matt. Would you upload a revised version?

    Thanks,
    Yuval

     
    • Matt G

      Matt G - 2016-09-19

      Here is my version with the fix for task markers in single-line comments.

       
      Last edit: Matt G 2016-09-19
  • Neil Hodgson

    Neil Hodgson - 2016-09-21
    • labels: --> scintilla, lexer, progress
    • assigned_to: Neil Hodgson
    • Group: Completed --> Committed
     
  • Neil Hodgson

    Neil Hodgson - 2016-10-16
    • status: open --> closed
     

Log in to post a comment.

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:





No, thanks