#1459 Improvements to Haskell Lexer

Bug
closed-fixed
3
2013-07-21
2013-04-03
kudah
No
  • Added support for MagicHash extension (lexer.haskell.allow.hash)
  • $ and # are now colored as operators.
  • .0 and -0 are now properly colored as an operator and a number, not as just a number.
  • Operators starting with a double dash (e.g. ---->) are properly colored as operators, not comments.
  • Added pragma highlighting.
  • Added basic C-preprocessor highlighting.
  • Qualified names (e.g. ABC.xyz) are now properly highlighted as identifiers, not types.
  • Qualified operators (e.g. ABC.<$>) are now properly highlighted as operators.
  • Operator . is now properly highlighted as an operator, not part of the identifier, when applied to a qualified and an unqualified value (e.g. in Abc.xyz.yzx the last dot is an operator)
  • Operators starting with ':' are now properly highlighted as type constructors, not operators.
  • "family" after "data" is highlighted, as per TypeFamilies.
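The double-dash rule in the list above can be sketched as follows. This is a minimal illustration, not the code in LexHaskell.cxx: the function names are hypothetical, only ASCII symbol characters are considered, and `pos` is assumed to be the start of a lexeme. It follows the Haskell 2010 rule that a run of two or more dashes opens a line comment only when the next character is not a symbol character.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// ASCII symbol characters of Haskell 2010 (ascSymbol). Unicode symbols are ignored here.
bool IsAsciiSymbol(char ch) {
    return ch != '\0' && std::strchr("!#$%&*+./<=>?@\\^|-~:", ch) != nullptr;
}

// Hypothetical helper: true when the lexeme starting at pos opens a line comment.
// Assumes pos <= s.size() and that pos is the start of a lexeme.
bool StartsLineComment(const std::string &s, std::size_t pos) {
    if (s.compare(pos, 2, "--") != 0)
        return false;
    std::size_t i = pos + 2;
    while (i < s.size() && s[i] == '-')
        ++i;  // skip any further dashes
    // A following symbol character makes the whole run an operator, e.g. "---->".
    return i == s.size() || !IsAsciiSymbol(s[i]);
}
```

With this sketch, `-- text` and a dashes-only line are comments, while `---->` and `--|` are operators.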
1 Attachments

Discussion

<< < 1 2 3 4 > >> (Page 2 of 4)
  • kudah

    kudah - 2013-04-20

    I'd prefer not to distribute large Unicode tables.

    Glib and Qt provide their own, but I doubt you can do anything about that on Win32.

     
  • kudah

    kudah - 2013-04-21
    • Fixed a bug where a dashes-only comment double-counted a line and could hang the editor
    • Comments inside pragmas are now highlighted
    • Reserved operators are now highlighted
      (Note, this patch defines u_is* functions as stubs for now, until Scintilla goes full on Unicode; it does not depend on GHC-tables patch)
     
    • Neil Hodgson

      Neil Hodgson - 2013-04-22

      Some warnings from MSVC:
      ..\lexers\LexHaskell.cxx(70) : warning C4100: 'ch' : unreferenced formal parameter
      ..\lexers\LexHaskell.cxx(74) : warning C4100: 'ch' : unreferenced formal parameter
      ..\lexers\LexHaskell.cxx(78) : warning C4100: 'ch' : unreferenced formal parameter
      ..\lexers\LexHaskell.cxx(82) : warning C4100: 'ch' : unreferenced formal parameter
      ..\lexers\LexHaskell.cxx(93) : warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning)
      ..\lexers\LexHaskell.cxx(101) : warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning)
      ..\lexers\LexHaskell.cxx(109) : warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning)
      ..\lexers\LexHaskell.cxx(122) : warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning)
      Prevent warnings int->bool with !=0

      The u_* stubs should be static to avoid namespace leaks.

      iswalpha and similar are likely to have some portability problems but could be wrapped in something that tries to handle platform differences.

      IsHaskellSymbol is more difficult but an approximation like iswgraph&&!iswalnum may be better than nothing.
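The suggestions in this post can be sketched together as below. This is only an illustration under stated assumptions: `u_iswalpha` is a hypothetical stub name modelled on the "u_*" naming, `IsWideAlnum` and `IsHaskellSymbolApprox` are made-up helper names, and the `<cwctype>` functions depend on the current C locale, which is the portability concern raised above.

```cpp
#include <cwctype>

// Hypothetical stub in the spirit of the u_* functions mentioned above.
// 'static' gives the name internal linkage so it does not leak out of the file,
// and leaving the parameter unnamed avoids MSVC warning C4100.
static bool u_iswalpha(int /* ch */) {
    return false;
}

// Comparing with 0 converts the int result to bool explicitly (no C4800).
static bool IsWideAlnum(std::wint_t ch) {
    return std::iswalnum(ch) != 0;
}

// Approximation suggested above: graphic, but neither a letter nor a digit.
// Locale-dependent; not an exact test for Haskell symbol characters.
static bool IsHaskellSymbolApprox(std::wint_t ch) {
    return std::iswgraph(ch) != 0 && !IsWideAlnum(ch);
}
```

For ASCII input in the default "C" locale this accepts characters such as `+` and rejects letters, digits and spaces; it also accepts punctuation like parentheses, which is one reason it is only an approximation.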

       
      • kudah

        kudah - 2013-04-22

        > iswalpha and similar are likely to have some portability problems but could be wrapped in something that tries to handle platform differences.
        >
        > IsHaskellSymbol is more difficult but an approximation like iswgraph&&!iswalnum may be better than nothing.

        It makes little sense to use platform-specific wide char functions (which might as well be non-Unicode), because Haskell is officially Unicode-only.

         
        Last edit: kudah 2013-04-22
        • kudah

          kudah - 2013-04-22

          Kept running into strange behavior with my own lexer. It turns out atLineEnd in 3.3 is buggy and doesn't work at the end of the file, unlike in 3.2, so all the matches on it were incorrect (or rather atLineEnd is incorrect).

           
          Last edit: kudah 2013-04-22
          • Neil Hodgson

            Neil Hodgson - 2013-04-22

            By 3.2 do you mean 3.2.4 and earlier, or 3.2.5, which is where support for the Unicode line end PS and LS characters was added?

             
            • kudah

              kudah - 2013-04-23

              Tested with 3.2.3. Current Scintilla doesn't trigger atLineEnd when the line end is the last character in the file.

               
              Last edit: kudah 2013-04-23
  • Neil Hodgson

    Neil Hodgson - 2013-04-24

    This change may fix the line end issue
    http://www.scintilla.org/StyleContext.patch

     
    • kudah

      kudah - 2013-04-24

      It did fix the issue for me.

       
  • Neil Hodgson

    Neil Hodgson - 2013-04-25

    Line end detection can be written as

    (sc.atLineEnd || sc.ch == '\n' || sc.ch == '\r')
    

    but this often leads to different styles for the CR and LF in a Windows line end. Some current lexers do this and it's been a recurring cause of problems, mostly when the lexer developer only works on one platform or the other. This is one of the reasons for implementing atLineEnd and why it is preferred.
    Difference LF vs CRLF
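To illustrate the CR/LF point, here is a small standalone sketch. It does not use Scintilla and is not how StyleContext computes atLineEnd; both function names are hypothetical. The first function applies the per-character test, the second treats CR LF as a single line end.

```cpp
#include <cstddef>
#include <string>

// Per-character test: true on '\r' and again on '\n', so a CR LF pair fires twice.
std::size_t NaiveLineEndHits(const std::string &text) {
    std::size_t hits = 0;
    for (char ch : text) {
        if (ch == '\n' || ch == '\r')
            ++hits;
    }
    return hits;
}

// Hypothetical stand-in for a "once per line end" test: a '\r' that is
// followed by '\n' is not counted, so CR LF is treated as one unit.
std::size_t SingleLineEndHits(const std::string &text) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char ch = text[i];
        const char chNext = (i + 1 < text.size()) ? text[i + 1] : '\0';
        if (ch == '\n' || (ch == '\r' && chNext != '\n'))
            ++hits;
    }
    return hits;
}
```

On a two-line buffer with LF endings both functions report two line ends. With CR LF endings the per-character test reports four, so any state change made "at line end" happens between the CR and the LF, which is how the two characters end up with different styles.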

     
    • kudah

      kudah - 2013-04-25
       
      Last edit: kudah 2013-04-25
  • kudah

    kudah - 2013-04-28

    Fixed folder a bit
    - Fixed incoherent folding at end of the file
    - Comments are now folded with fold.compact
    - Comment blocks are now treated as whitespace by the folder

     
    Last edit: kudah 2013-04-28
  • Neil Hodgson

    Neil Hodgson - 2013-04-30

    I'd like to avoid many change sets appearing in the main repository particularly when some are just churning back and forth so will merge when it appears stable and I have some time.

     
    • kudah

      kudah - 2013-04-30

      > I'd like to avoid many change sets appearing in the main repository particularly when some are just churning back and forth so will merge when it appears stable and I have some time.

      I didn't ask you to merge the whole pile in one go; just 0001 should be in the next release. Failing that, you can revert https://sourceforge.net/p/scintilla/code/ci/3176ee2f4014c16509b742c1b304fa2b5dab60d9/ so at least the editor won't hang on specially formatted comments.

       
      Last edit: kudah 2013-04-30
      • Neil Hodgson

        Neil Hodgson - 2013-05-01

        Merged the two changesets as [527231].

         

        Related

        Commit: [527231]

  • kudah

    kudah - 2013-05-04

    Reformatted all fixes from last merged patch minus line ending changes.

    • Allow an arbitrary number of # suffixes in identifiers with lexer.haskell.allow.hash
    • Allow only one dot in base 10 numeric literals
    • Comments are now treated as whitespace by the folder
    • Fixed inconsistent folding at end of the file
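The "only one dot" rule for base 10 literals can be sketched like this. It is a simplified illustration rather than the lexer's code: the function name is hypothetical, exponents are ignored, and a dot is assumed to belong to the number only when a digit follows it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of the decimal literal starting at pos, or 0 if none.
// At most one '.' is consumed, and only when it sits between digits.
std::size_t DecimalLiteralLength(const std::string &s, std::size_t pos) {
    auto isDigit = [&s](std::size_t k) {
        return k < s.size() && s[k] >= '0' && s[k] <= '9';
    };
    std::size_t i = pos;
    while (isDigit(i))
        ++i;  // integer part
    if (i > pos && i < s.size() && s[i] == '.' && isDigit(i + 1)) {
        ++i;  // the single permitted dot
        while (isDigit(i))
            ++i;  // fractional part
    }
    return i - pos;
}
```

For `1.2.3` this returns 3, so only `1.2` is the number and the second dot is left for the operator rule.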
     
    • Neil Hodgson

      Neil Hodgson - 2013-05-07

      Committed as [e25d77].

       

      Related

      Commit: [e25d77]

  • Neil Hodgson

    Neil Hodgson - 2013-05-07

    With Unicode classification, the patch is bulked out with a subset of the case conversion table and can be made shorter by dropping that data. A simple binary searchable look up table of character ranges that share a type should require less than 4000 entries each containing a 21 bit start character and a 5 bit type. Could either be accessed through an IDocument method if added to Scintilla or be a class in lexlib to be useful for DLL lexers.
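A rough sketch of such a table, under assumptions: each entry packs a 21 bit start character and a 5 bit type into one 32 bit value, entries are sorted by start, and a range runs up to the next entry's start. The type codes, names and the tiny ASCII-only table are made up for illustration; real data would come from the Unicode character database.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Toy type codes (hypothetical); a real table would use Unicode general categories.
enum CharType : std::uint32_t { ctOther = 0, ctDigit = 1, ctUpper = 2, ctLower = 3 };

// Upper 21 bits: first character of the range. Lower 5 bits: its type.
constexpr std::uint32_t Pack(std::uint32_t start, CharType type) {
    return (start << 5) | type;
}

// Sorted by start character; illustrative ASCII-only data.
static const std::uint32_t ranges[] = {
    Pack(0x00, ctOther), Pack(0x30, ctDigit), Pack(0x3A, ctOther),
    Pack(0x41, ctUpper), Pack(0x5B, ctOther), Pack(0x61, ctLower),
    Pack(0x7B, ctOther),
};

CharType Classify(std::uint32_t ch) {
    assert(ch <= 0x10FFFF);  // must fit in 21 bits
    // Largest packed value any entry starting at ch could have.
    const std::uint32_t key = (ch << 5) | 0x1F;
    // First entry whose range starts after ch; the one before it contains ch.
    const std::uint32_t *it = std::upper_bound(std::begin(ranges), std::end(ranges), key);
    // The table begins at character 0, so there is always a preceding entry.
    return static_cast<CharType>(*(it - 1) & 0x1F);
}
```

Because the start character occupies the high bits, sorting the packed values sorts the ranges, and a lookup over fewer than 4000 entries takes about a dozen comparisons.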

     
    • kudah

      kudah - 2013-05-08

      Okay.

       
      • Neil Hodgson

        Neil Hodgson - 2013-05-09

        That was just a note for anyone in the future (possibly me) that wants to add character categorization. I'm not implying you should do it.

         
