#1459 Improvements to Haskell Lexer

Bug
closed-fixed
Neil Hodgson
3
2013-07-21
2013-04-03
kudah
No
  • Added support for the MagicHash extension (lexer.haskell.allow.hash).
  • $ and # are now colored as operators.
  • .0 and -0 are now properly colored as an operator followed by a number, not as a single number.
  • Operators starting with a double dash (e.g. ---->) are now properly colored as operators, not comments.
  • Added pragma highlighting.
  • Added basic C preprocessor highlighting.
  • Qualified names (e.g. ABC.xyz) are now properly highlighted as identifiers, not types.
  • Qualified operators (e.g. ABC.<$>) are now properly highlighted as operators.
  • The . operator is now properly highlighted as an operator, not as part of the identifier, when it is applied to a qualified and an unqualified value (e.g. in Abc.xyz.yzx the last . is an operator).
  • Operators starting with ':' are now properly highlighted as type constructors, not operators.
  • "family" after "data" is now highlighted, as per TypeFamilies.
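The qualified-name rules in this list can be illustrated with a small standalone scanner. This is a hypothetical, ASCII-only sketch (the names `ScanQualified`, `QualKind` and the helpers are invented here, not taken from LexHaskell.cxx): it consumes `Modid.` prefixes and then decides from the next character whether the token is a qualified variable, a qualified operator (a constructor operator when it starts with ':'), or a plain or qualified type/constructor.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical token kinds for this sketch only.
enum class QualKind { Identifier, Operator, ConstructorOperator, Type };

struct QualToken {
    QualKind kind;
    std::size_t length;  // characters consumed, including module prefixes
};

// ASCII symbol characters from the Haskell 2010 report (ascSymbol).
static bool IsSymbolChar(char c) {
    return std::string("!#$%&*+./<=>?@\\^|-~:").find(c) != std::string::npos;
}

static bool IsIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_' || c == '\'';
}

// Scans a token that starts with an uppercase letter at s[pos].
static QualToken ScanQualified(const std::string &s, std::size_t pos) {
    std::size_t start = pos;  // start of the current module/constructor segment
    while (true) {
        std::size_t end = start + 1;
        while (end < s.size() && IsIdentChar(s[end])) ++end;
        if (end + 1 < s.size() && s[end] == '.') {
            const char next = s[end + 1];
            const unsigned char un = static_cast<unsigned char>(next);
            if (std::isupper(un)) {          // another module segment: keep going
                start = end + 1;
                continue;
            }
            if (std::islower(un) || next == '_') {  // qualified variable, e.g. ABC.xyz
                std::size_t k = end + 2;
                while (k < s.size() && IsIdentChar(s[k])) ++k;
                return {QualKind::Identifier, k - pos};
            }
            if (IsSymbolChar(next)) {        // qualified operator, e.g. ABC.<$>
                std::size_t k = end + 1;
                while (k < s.size() && IsSymbolChar(s[k])) ++k;
                return {next == ':' ? QualKind::ConstructorOperator : QualKind::Operator, k - pos};
            }
        }
        return {QualKind::Type, end - pos};  // plain or qualified type/constructor
    }
}
```

For Abc.xyz.yzx the scanner stops after Abc.xyz (7 characters), leaving the final . for the ordinary operator rules, which matches the behavior described above.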
1 Attachment

Discussion

<< < 1 2 3 4 > >> (Page 2 of 4)
  • kudah
    2013-04-20

    I'd prefer not to distribute large Unicode tables.

    GLib and Qt provide their own, but I doubt you can do anything about that on Win32.

     
  • kudah
    2013-04-21

    • Fixed a bug where a dashes-only comment double-counted a line and could hang the editor.
    • Comments inside pragmas are now highlighted.
    • Reserved operators are now highlighted.
      (Note: this patch defines the u_is* functions as stubs for now, until Scintilla has full Unicode support; it does not depend on the GHC-tables patch.)
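The double-dash handling discussed in this thread follows the Haskell 2010 report's rule: a line comment starts with two or more dashes that are not followed by another symbol character, so "--" and "----" are comments while "-->" is an operator. A minimal ASCII-only sketch of that rule, with the hypothetical helper `StartsLineComment` assuming `pos` is the start of a lexeme (this is not the patch's code):

```cpp
#include <cstddef>
#include <string>

// ASCII symbol characters from the Haskell 2010 report (ascSymbol).
static bool IsAsciiSymbol(char c) {
    return std::string("!#$%&*+./<=>?@\\^|-~:").find(c) != std::string::npos;
}

// Hypothetical helper: true if the lexeme starting at s[pos] is a line comment.
static bool StartsLineComment(const std::string &s, std::size_t pos) {
    std::size_t i = pos;
    // The index always advances, so a dashes-only run cannot loop forever.
    while (i < s.size() && s[i] == '-')
        ++i;
    if (i - pos < 2)
        return false;  // a single '-' is an operator
    // Dashes at the end of the input, or followed by a non-symbol, start a comment.
    return i == s.size() || !IsAsciiSymbol(s[i]);
}
```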
     
    • Neil Hodgson
      2013-04-22

      Some warnings from MSVC:
      ..\lexers\LexHaskell.cxx(70) : warning C4100: 'ch' : unreferenced formal parameter
      ..\lexers\LexHaskell.cxx(74) : warning C4100: 'ch' : unreferenced formal parameter
      ..\lexers\LexHaskell.cxx(78) : warning C4100: 'ch' : unreferenced formal parameter
      ..\lexers\LexHaskell.cxx(82) : warning C4100: 'ch' : unreferenced formal parameter
      ..\lexers\LexHaskell.cxx(93) : warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning)
      ..\lexers\LexHaskell.cxx(101) : warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning)
      ..\lexers\LexHaskell.cxx(109) : warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning)
      ..\lexers\LexHaskell.cxx(122) : warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning)
      The int->bool warnings can be prevented with != 0.

      The u_* stubs should be static to avoid namespace leaks.
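A minimal sketch of these suggestions, using hypothetical helper names modelled loosely on the patch's u_is* stubs (the real names and signatures may differ): `static` gives the helpers internal linkage, an unnamed parameter avoids C4100, and comparing the int result of `isalpha` with 0 avoids C4800.

```cpp
#include <cctype>

// Stub: non-ASCII letters are not classified yet. 'static' keeps the name
// local to this translation unit, and leaving the parameter unnamed
// silences MSVC warning C4100 (unreferenced formal parameter).
static bool u_IsLetterStub(int /* ch */) {
    return false;
}

// std::isalpha returns int; "!= 0" yields a genuine bool and avoids C4800.
static bool IsHaskellLetter(int ch) {
    if (ch >= 0 && ch < 0x80)
        return std::isalpha(ch) != 0;
    return u_IsLetterStub(ch);
}
```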

      iswalpha and similar are likely to have some portability problems but could be wrapped in something that tries to handle platform differences.

      IsHaskellSymbol is more difficult, but an approximation like iswgraph && !iswalnum may be better than nothing.
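One way to read the iswgraph && !iswalnum suggestion is sketched below, with the hypothetical name `IsHaskellSymbolApprox`. ASCII keeps the exact symbol set from the Haskell 2010 report, because the approximation alone would wrongly accept characters such as ( ) , ; and _. Beyond ASCII the result depends on the C library and the current locale, which is exactly the portability concern raised here.

```cpp
#include <cstring>
#include <cwctype>

// Exact ASCII symbol set (ascSymbol); ch > 0 stops strchr matching the terminator.
static bool IsAsciiHaskellSymbol(int ch) {
    return ch > 0 && ch < 0x80 && std::strchr("!#$%&*+./<=>?@\\^|-~:", ch) != nullptr;
}

// Approximation for non-ASCII characters: printable and not alphanumeric.
// Platform and locale dependent; not an exact Unicode classification.
static bool IsHaskellSymbolApprox(int ch) {
    if (ch < 0x80)
        return IsAsciiHaskellSymbol(ch);
    const std::wint_t wc = static_cast<std::wint_t>(ch);
    return std::iswgraph(wc) && !std::iswalnum(wc);
}
```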

       
      • kudah
        2013-04-22

        > iswalpha and similar are likely to have some portability problems but could be wrapped in something that tries to handle platform differences.
        >
        > IsHaskellSymbol is more difficult but an approximation like iswgraph&&!iswalnum may be better than nothing.

        It makes little sense to use platform-specific wide-character functions (which may not even be Unicode-aware), because Haskell is officially Unicode-only.

         
        Last edit: kudah 2013-04-22
        • kudah
          2013-04-22

          I kept running into strange behavior with my own lexer. It turns out that atLineEnd in 3.3 is buggy and doesn't work at the end of the file, unlike in 3.2, so all the matches on it were incorrect (or rather, atLineEnd is incorrect).

           
          Last edit: kudah 2013-04-22
          • Neil Hodgson
            2013-04-22

            By 3.2, do you mean 3.2.4 and earlier, or 3.2.5, which is where support for the Unicode line end characters PS and LS was added?

             
            • kudah
              2013-04-23

              Tested with 3.2.3. Current Scintilla doesn't trigger atLineEnd when the line end is the last character in the file.

               
              Last edit: kudah 2013-04-23
  • Neil Hodgson
    2013-04-24

    This change may fix the line end issue
    http://www.scintilla.org/StyleContext.patch

     
    • kudah
      2013-04-24

      It did fix the issue for me.

       
  • Neil Hodgson
    2013-04-25

    Line end detection can be written as

    (sc.atLineEnd || sc.ch == '\n' || sc.ch == '\r')
    

    but this often leads to different styles for the CR and LF in a Windows line end. Some current lexers do this and it's been a recurring cause of problems, mostly when the lexer developer only works on one platform or the other. This is one of the reasons atLineEnd was implemented and why it is preferred.
    [Image: Difference LF vs CRLF]
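The CR/LF mismatch can be reproduced outside Scintilla with a toy line-comment styler. This is a hypothetical sketch, not StyleContext's code: `AtLineEnd` models the rule described in this thread (true only on the last character of a line end, so on the LF of a CRLF pair), `NaiveLineEnd` models the `sc.ch == '\n' || sc.ch == '\r'` test, and, as with a lexer that calls ForwardSetState when a comment ends, the line-end character that closes the comment keeps the comment style. 'C' marks comment style and 'D' default style.

```cpp
#include <cstddef>
#include <string>

// True only on the last character of a line end: LF, or a CR not followed by LF.
static bool AtLineEnd(const std::string &s, std::size_t i) {
    const char chNext = (i + 1 < s.size()) ? s[i + 1] : '\0';
    return s[i] == '\n' || (s[i] == '\r' && chNext != '\n');
}

// The naive test fires on both characters of a CRLF pair.
static bool NaiveLineEnd(const std::string &s, std::size_t i) {
    return s[i] == '\n' || s[i] == '\r';
}

// Styles "--" comments; the character that ends a comment keeps comment
// style, and styling reverts to default from the next character onwards.
template <typename LineEndTest>
std::string StyleComments(const std::string &s, LineEndTest atEnd) {
    std::string styles(s.size(), 'D');
    bool inComment = false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!inComment && s[i] == '-' && i + 1 < s.size() && s[i + 1] == '-')
            inComment = true;
        styles[i] = inComment ? 'C' : 'D';
        if (inComment && atEnd(s, i))
            inComment = false;
    }
    return styles;
}
```

For "--x\r\ny" the naive test gives the CR comment style but the LF default style ("CCCCDD"), while the atLineEnd-style test styles both the same ("CCCCCD"). On LF-only text both produce "CCCCD", which is why the problem tends to go unnoticed by developers who work on only one platform.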

     
    • kudah
      2013-04-25

       
      Last edit: kudah 2013-04-25
  • kudah
    2013-04-28

    Fixed the folder a bit:
    - Fixed incoherent folding at the end of the file
    - Comments are now folded with fold.compact
    - Comment blocks are now treated as whitespace by the folder
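"Treated as whitespace by the folder" can be illustrated with an indentation-based fold calculation. This is a sketch under assumptions, not the lexer's folder: it assumes per-line indentation and a comment-only flag are already known (the `LineInfo`, `FoldInfo` and `ComputeFolds` names are hypothetical). Blank and comment-only lines adopt the level of the next code line, and past the end of the file that level is taken as 0, so a comment block never opens or closes a fold by itself.

```cpp
#include <cstddef>
#include <vector>

struct LineInfo {
    int indent;           // width of leading whitespace
    bool blankOrComment;  // whitespace only, or only a comment
};

struct FoldInfo {
    int level;
    bool header;  // the next code line is indented further
};

static std::vector<FoldInfo> ComputeFolds(const std::vector<LineInfo> &lines) {
    std::vector<FoldInfo> result(lines.size());
    int nextCodeIndent = 0;  // indentation of the next code line below; 0 past end of file
    // Walk backwards so each line knows the indentation of the following code line.
    for (std::size_t n = lines.size(); n-- > 0;) {
        if (lines[n].blankOrComment) {
            result[n] = {nextCodeIndent, false};  // whitespace: follow the next code line
        } else {
            result[n] = {lines[n].indent, nextCodeIndent > lines[n].indent};
            nextCodeIndent = lines[n].indent;
        }
    }
    return result;
}
```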

     
    Last edit: kudah 2013-04-28
  • Neil Hodgson
    2013-04-30

    I'd like to avoid many change sets appearing in the main repository, particularly when some are just churning back and forth, so I will merge when it appears stable and I have some time.

     
    • kudah
      2013-04-30

      > I'd like to avoid many change sets appearing in the main repository, particularly when some are just churning back and forth, so I will merge when it appears stable and I have some time.

      I didn't ask you to merge that whole pile in one go; only 0001 needs to be in the next release. Failing that, you can revert https://sourceforge.net/p/scintilla/code/ci/3176ee2f4014c16509b742c1b304fa2b5dab60d9/ so that at least the editor won't hang on specially formatted comments.

       
      Last edit: kudah 2013-04-30
      • Neil Hodgson
        2013-05-01

        Merged the two changesets as [527231].

         


  • kudah
    2013-05-04

    Reformatted all fixes from the last merged patch, minus the line ending changes.

    • Allow an arbitrary number of # suffixes in identifiers with lexer.haskell.allow.hash
    • Allow only one dot in base 10 numeric literals
    • Comments are now treated as whitespace by the folder
    • Fixed inconsistent folding at the end of the file
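The first two items can be sketched as standalone scanners. Both are hypothetical, ASCII-only helpers (`ScanDecimal`, `ScanIdentifier` and the `allowHash` parameter, which stands in for lexer.haskell.allow.hash, are invented here). `ScanDecimal` accepts at most one '.' and only when a digit follows, so "1.2.3" scans as "1.2" and the ".." in "1..5" is left for the operator rules; exponents are ignored to keep it short.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

static bool IsDigit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Length of a base 10 literal starting at s[pos]; 0 if s[pos] is not a digit.
static std::size_t ScanDecimal(const std::string &s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() && IsDigit(s[i])) ++i;
    if (i > pos && i + 1 < s.size() && s[i] == '.' && IsDigit(s[i + 1])) {
        ++i;  // the single allowed dot
        while (i < s.size() && IsDigit(s[i])) ++i;
    }
    return i - pos;
}

// Length of an identifier starting at s[pos]; with allowHash, any number of
// trailing '#' characters becomes part of it (e.g. x## under MagicHash).
static std::size_t ScanIdentifier(const std::string &s, std::size_t pos, bool allowHash) {
    std::size_t i = pos;
    while (i < s.size() &&
           (std::isalnum(static_cast<unsigned char>(s[i])) != 0 || s[i] == '_' || s[i] == '\''))
        ++i;
    if (allowHash)
        while (i < s.size() && s[i] == '#') ++i;
    return i - pos;
}
```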
     
    • Neil Hodgson
      2013-05-07

      Committed as [e25d77].

       


  • Neil Hodgson
    2013-05-07

    With Unicode classification, the patch is bulked out by a subset of the case conversion table and could be made shorter by dropping that data. A simple binary-searchable lookup table of character ranges that share a type should need fewer than 4000 entries, each containing a 21-bit start character and a 5-bit type. It could either be accessed through an IDocument method, if one were added to Scintilla, or be a class in lexlib so that it is also useful for DLL lexers.
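A sketch of the table described above, assuming each entry packs the 21-bit start code point and a 5-bit category into one 32-bit word as (start << 5) | category. Thirty Unicode general categories fit in 5 bits. Because entries are sorted by start, the packed words are sorted too, and `std::upper_bound` finds the range containing a character. The `CharCategory` enum and the handful of entries are illustrative only; `Other` is a placeholder lumping together everything this toy table does not classify.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Small illustrative subset of categories; 'Other' is a placeholder.
enum class CharCategory : std::uint8_t { Cc, Zs, Nd, Lu, Ll, Other };

constexpr std::uint32_t Entry(std::uint32_t start, CharCategory cat) {
    return (start << 5) | static_cast<std::uint32_t>(cat);
}

// Sorted by start code point; each range runs up to the next entry's start.
static const std::uint32_t kRanges[] = {
    Entry(0x00, CharCategory::Cc), Entry(0x20, CharCategory::Zs),
    Entry(0x21, CharCategory::Other), Entry(0x30, CharCategory::Nd),
    Entry(0x3A, CharCategory::Other), Entry(0x41, CharCategory::Lu),
    Entry(0x5B, CharCategory::Other), Entry(0x61, CharCategory::Ll),
    Entry(0x7B, CharCategory::Other), Entry(0x7F, CharCategory::Cc),
    Entry(0xA0, CharCategory::Zs), Entry(0xA1, CharCategory::Other),
};

static CharCategory CategoryOf(std::uint32_t ch) {
    if (ch > 0x10FFFF)
        return CharCategory::Other;
    // Largest possible packed value for this start: any entry greater than it
    // must begin after ch, so the entry just before upper_bound contains ch.
    const std::uint32_t key = (ch << 5) | 0x1F;
    const auto it = std::upper_bound(std::begin(kRanges), std::end(kRanges), key);
    // The first entry starts at 0, so 'it' is never begin() here.
    return static_cast<CharCategory>(*std::prev(it) & 0x1F);
}
```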

     
    • kudah
      2013-05-08

      Okay.

       
      • Neil Hodgson
        2013-05-09

        That was just a note for anyone in the future (possibly me) that wants to add character categorization. I'm not implying you should do it.

         