
#1265 Lexer for Common Intermediate Language

Committed
closed
5
2019-03-07
2019-02-22
Jad Altahan
No

Lately I've been playing with .NET CIL and needed a syntax highlighter for the code, but couldn't find one, so I wrote a lexer for it. It's a lightweight one at just 400 lines, and it uses only 10 styles.

Here is a current draft of it:
https://gist.github.com/xv/4d109ade9dedc931e41906bf2e76da73

This is roughly what the language looks like (screenshot hosted on Imgur).

Would you be interested in adding it? I can submit a full patch for it within a few days.

Discussion

  • Neil Hodgson

    Neil Hodgson - 2019-02-23

    It looks like an assembly language. Did you check LexAsm and LexASY to see if they could work or could be made to work? Concentrating effort on existing lexers may lead to better outcomes than dispersing over more lexers.

    If there are advantages over the existing lexers, it can be included.

     
    • Neil Hodgson

      Neil Hodgson - 2019-02-23

      That should have been LexA68k, not LexASY.

       
      • Jad Altahan

        Jad Altahan - 2019-02-24

        I checked both LexAsm and LexA68k before creating a new lexer. There are multiple reasons why those two are incompatible with LexCIL (my new lexer):

        • LexCIL uses // and /* */ for comments, whereas the other two only recognize ;.
        • LexCIL has no style for single-quoted literals, whereas the other two do. CIL uses ' ', but not as a character or string literal, so using one of the other lexers would end up styling text that shouldn't be styled.
        • LexCIL makes use of labels (e.g. IL_xxxx:). Only LexA68k has an implementation for label styling.
        • CIL uses scopes, so the lexer folds braces { } like any C-style language. LexA68k has no folding capability, while LexAsm supports user-defined fold markers, so that's a plus for it.
        • CIL is entirely case-sensitive.
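
The comment and folding points above can be illustrated with a small sketch. This is not the LexCIL code, and the function name fold_levels is hypothetical; it only assumes the behavior described in the list: // and /* */ start comments, braces open and close fold scopes, and single quotes get no special treatment.

```python
from typing import List


def fold_levels(source: str) -> List[int]:
    """Return the brace nesting depth at the start of each line.

    Braces inside // comments, /* */ comments and "..." strings are
    ignored. Single quotes are deliberately not treated as literals.
    """
    levels = []
    depth = 0
    in_block_comment = False
    for line in source.splitlines():
        levels.append(depth)
        in_string = False  # strings do not continue onto the next line
        i = 0
        while i < len(line):
            ch = line[i]
            two = line[i:i + 2]
            if in_block_comment:
                if two == "*/":
                    in_block_comment = False
                    i += 2
                    continue
            elif in_string:
                if ch == "\\":
                    i += 2  # skip the escaped character
                    continue
                if ch == '"':
                    in_string = False
            elif two == "//":
                break  # the rest of the line is a comment
            elif two == "/*":
                in_block_comment = True
                i += 2
                continue
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth = max(depth - 1, 0)
            i += 1
    return levels
```

A lexer built on ; comments would treat the // and /* */ lines as code and count the braces inside them, which is the mismatch described in the first bullet.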


        A large example of CIL code:
        https://gist.github.com/xv/f75488a69f6d95cb9f72821cbac3a2d2

        Wiki page for CIL:
        https://en.wikipedia.org/wiki/Common_Intermediate_Language

        CIL is considered a low-level language, but it feels more like a high-level C-style language with assembly instructions than an actual assembly variant.

        CIL being a simple OOP language, this is all it needs for styling:

        SCE_CIL_DEFAULT
        SCE_CIL_COMMENT     # /* */
        SCE_CIL_COMMENTLINE # //
        SCE_CIL_WORD        # primary; void, intern, extern, static, etc
        SCE_CIL_WORD2       # metadata; .class, .method, .assembly, etc
        SCE_CIL_WORD3       # opcode; nop, jmp, xor, conv.i, stind.r4, etc
        SCE_CIL_STRING      # " ... "
        SCE_CIL_LABEL       # IL_0000:, blah:, blah_blah:, etc
        SCE_CIL_OPERATOR
        SCE_CIL_STRINGEOL
        SCE_CIL_IDENTIFIER
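
As a rough illustration of how text could map onto the styles listed above, here is a regex-based sketch. It is not how LexCIL is implemented. The keyword sets contain only the examples from the comments in the list, and the names classify, TOKEN_RE, PRIMARY, METADATA and OPCODES are assumptions made for this example.

```python
import re

# Tiny illustrative subsets taken from the style list above; a real
# configuration would supply complete keyword lists.
PRIMARY = {"void", "intern", "extern", "static"}
METADATA = {".class", ".method", ".assembly"}
OPCODES = {"nop", "jmp", "xor", "conv.i", "stind.r4"}

TOKEN_RE = re.compile(r"""
    (?P<COMMENT>/\*.*?\*/)                 # /* ... */
  | (?P<COMMENTLINE>//[^\n]*)              # // to end of line
  | (?P<STRING>"(?:\\[^\n]|[^"\\\n])*")    # closed " ... "
  | (?P<STRINGEOL>"[^\n]*)                 # string left open at end of line
  | (?P<LABEL>[A-Za-z_]\w*:(?!:))          # IL_0000: but not Foo::Bar
  | (?P<DEFAULT>\d[\w.]*)                  # numbers get no style of their own
  | (?P<WORD>\.?[A-Za-z_][\w.]*)           # keyword, opcode or identifier
  | (?P<OPERATOR>[^\s\w])
""", re.VERBOSE | re.DOTALL)


def classify(source):
    """Return a list of (style_name, text) pairs for the given source."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "WORD":
            # No lower-casing: CIL is case-sensitive.
            if text in PRIMARY:
                style = "SCE_CIL_WORD"
            elif text in METADATA:
                style = "SCE_CIL_WORD2"
            elif text in OPCODES:
                style = "SCE_CIL_WORD3"
            else:
                style = "SCE_CIL_IDENTIFIER"
        else:
            style = "SCE_CIL_" + kind
        tokens.append((style, text))
    return tokens
```

Single quotes fall through to SCE_CIL_OPERATOR here, so the text between them is styled like any other code rather than as a literal.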
        

        Do you think there's something we could do about it as an alternative to adding a new lexer? Or just forget about it altogether?

         

        Last edit: Jad Altahan 2019-02-24
        • Neil Hodgson

          Neil Hodgson - 2019-02-26

          Seems reasonable to add a new lexer if it's seen as separate from assemblers.

           
          • Jad Altahan

            Jad Altahan - 2019-02-26

            Ok. That's fine.

            I have attached the final release of the lexer along with its properties file for anyone who may come across this ticket in the future and needs the lexer for their own fork/project.

            You can close this, I guess.

             
  • Neil Hodgson

    Neil Hodgson - 2019-02-28
    • Group: Completed --> Committed
     
  • Neil Hodgson

    Neil Hodgson - 2019-02-28

    Committed as [aaeca7], [375695].

    For SciTE, to minimize menu length and other UI elements, cil.properties is excluded by default.

     

    Related

    Commit: [375695]
    Commit: [aaeca7]

  • Neil Hodgson

    Neil Hodgson - 2019-03-07
    • status: open --> closed
     
