#584 D lexer update

Completed
closed
Scintilla (356)
4
2009-07-03
2009-05-19
maXmo
No

Fixed nasty comment which was highlighted wrong by viewvc.
Added support for Unicode characters in identifiers, as per the D spec.
Added 3 extra keyword groups.
Strings are multiline in D.
Slightly more careful number parsing: don't parse 0..2 as number, parse decimal and hex floats.
Support for two types of wysiwyg strings.
Some support for hex strings (no escape sequences).
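
The number-parsing rule above (stop before `..`, accept decimal and hex floats) can be illustrated roughly as follows. This is a minimal Python sketch, not the C++ code from the patch; `scan_number` and the regular expressions are hypothetical names chosen for illustration, and literal suffixes are deliberately ignored.

```python
import re

# Hypothetical illustration only: match one D-style numeric literal,
# refusing to swallow the ".." slice operator (so "0..2" yields "0").
_HEX_FLOAT = r"0[xX][0-9a-fA-F_]*(?:\.(?!\.)[0-9a-fA-F_]*)?[pP][+-]?[0-9_]+"
_HEX_INT = r"0[xX][0-9a-fA-F_]+"
_DECIMAL = r"[0-9][0-9_]*(?:\.(?!\.)[0-9_]*)?(?:[eE][+-]?[0-9_]+)?"
_NUMBER = re.compile("|".join([_HEX_FLOAT, _HEX_INT, _DECIMAL]))


def scan_number(text, pos=0):
    """Return the numeric literal starting at `pos`, or None if there is none."""
    match = _NUMBER.match(text, pos)
    return match.group(0) if match else None
```

The negative lookahead `(?!\.)` is what keeps a fraction from starting when the dot is really the first half of `..`.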

Check if it compiles and works.
Example file: http://dsource.org/projects/phobos/browser/trunk/tools/rdmd.d
I'll post a more specialized test case later.

Discussion

(Page 1 of 3)
  • Neil Hodgson

    Neil Hodgson - 2009-05-19

    Committed to CVS.

     
  • Neil Hodgson

    Neil Hodgson - 2009-05-19
    • milestone: --> 897169
    • priority: 5 --> 4
    • assigned_to: nobody --> nyamatongwe
     
  • Vincent Thorn

    Vincent Thorn - 2009-05-20

    Hi, I'm a newbie to the project. Question: what is the reason for adding 3 more groups? Weren't the existing 4 enough for you? (plus 5 different strings, numbers, etc.)

    Yesterday I rewrote this lexer completely; I hope you'll find mine more usable.

     
  • maXmo

    maXmo - 2009-05-20

    Revision 1.3 supported only 3 usable keyword groups.
    I usually use different highlighting for statements, types and attributes (and D has a good set of attributes), plus a separate red style for casts and one for some platform types. The different string styles are needed only so that the parser knows what the current context is and how it can end.

     
  • maXmo

    maXmo - 2009-05-20

    I see no need for a complete rewrite either.

     
  • Vincent Thorn

    Vincent Thorn - 2009-05-20

    Well, if you're happy with just "strings" and 6 sorts of keywords, OK (I have the opposite preferences: 6 types of strings, and keywords matter less). I was disappointed with the ugly copy of the C++ parser, converted to D. See the description I prepared for the new lexer:

    What's done:
    1. All latest keywords, including spec.symbols (like __TIMESTAMP__, etc)
    2. Full support for normal, verbatim(WYSIWYG), hex-, delimited and token strings.
    3. Support(highlight) for one-char escape sequences inside normal strings.
    4. Limited support for numbers, including underscore and all prefixes/spec.chars. There is no semantic pass, so any mix of valid characters is allowed.
    5. All comments supported, except nested /+ +/ comments: they are not highlighted properly when nested.

    What is in plans (in priority order):
    1. Full support for escape sequences, such as \&blah;, \x0000 or \000
    2. Custom folding
    3. Nested comments
    4. Operator validation (such as == != >>>= !<>= etc.)
    5. Number validation
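
The nested-comment item in the plan above is usually handled with a depth counter. The following is a minimal Python sketch under that assumption, not code from either lexer; `nested_comment_end` is a hypothetical helper name.

```python
def nested_comment_end(text, start=0):
    """Return the index just past the /+ ... +/ comment that opens at `start`.

    Nested /+ +/ pairs are tracked with a depth counter. If the comment is
    never closed, the end of the text is returned.
    """
    if not text.startswith("/+", start):
        raise ValueError("no nesting comment opens at this position")
    depth = 0
    i = start
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/+":      # another level opens
            depth += 1
            i += 2
        elif pair == "+/":    # one level closes
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return len(text)
```

A real Scintilla lexer would also have to carry the depth across lines (for example in the line state) so that restyling can resume mid-comment; the sketch ignores that.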

     
  • Neil Hodgson

    Neil Hodgson - 2009-05-20
    • milestone: 897169 -->
     
  • Neil Hodgson

    Neil Hodgson - 2009-05-20

    It's really up to the people who use D to work out what to do. For now, I have reverted to the previous version while this is being discussed.

     
  • Vincent Thorn

    Vincent Thorn - 2009-05-20

    OK, I downloaded maXmo's latest improvements and will try to merge them with my code. I'm sure you'll like the new lexer! See example: http://i44.tinypic.com/29z4cok.png
    (sorry for the colors, I didn't adjust anything)

     
  • Nobody/Anonymous

    1. Ridiculous. It's not lexer's job to know latest keywords, especially "spec symbols".
    2. q{} string was a misdesign and will be probably dropped as it clashes with delegate literals.
    4. I would like to see exponent parsing no worse than in my code (though it still has two minor bugs for this matter).
    5. It's a regression.

     
  • maXmo

    maXmo - 2009-05-20

    It was me :3

     
  • maXmo

    maXmo - 2009-05-20

    Here is the test case for the D lexer, as required by lexTests.py

     
  • maXmo

    maXmo - 2009-05-20

    Oh... And these lines for lexTests.py (sorry if not in diff format)

    def testD(self):
        self.LexExample("x.d", b"d", [b"keyword1", b"keyword2", b"doxygenkw",
            b"keyword4", b"keyword5", b"keyword6", b"keyword7"])
    
     
  • Neil Hodgson

    Neil Hodgson - 2009-05-21

    Should any keyword lists be allocated now for use in inline assembler?

     
  • maXmo

    maXmo - 2009-05-21

    Why now? They should be allocated when support for them is implemented.

    PS I think it was unnecessary to revert my patch, since it was an improvement over the previous version and vth will try to merge my code.

     
  • maXmo

    maXmo - 2009-05-21

    Good idea: assign special keyword groups such as asm and doxygen via lexer properties; otherwise all keyword groups are regular.
    For example:
    lexer.d.doxygen.key=2
    lexer.d.asm.key=6
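
How such an indirection might be resolved is sketched below in Python. This only illustrates the proposal: the two property names come from the comment above, but they are not existing Scintilla properties, and `keyword_roles`, the zero-based indexing, and the count of 7 lists are assumptions made for the example.

```python
# Proposed (not existing) properties mapping a special role to a keyword list.
ROLE_PROPERTIES = {
    "doxygen": "lexer.d.doxygen.key",
    "asm": "lexer.d.asm.key",
}


def keyword_roles(props, group_count=7):
    """Map each keyword list index to a role name.

    `props` is a dict of property name -> string value. Lists are assumed to
    be numbered from 0; any list not claimed by a property stays "regular".
    """
    roles = {index: "regular" for index in range(group_count)}
    for role, key in ROLE_PROPERTIES.items():
        value = props.get(key)
        if value is None:
            continue
        index = int(value)
        if 0 <= index < group_count:
            roles[index] = role
    return roles
```

With the two example lines above, list 2 would be treated as doxygen keywords and list 6 as asm keywords, while the other five stay regular.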

     
  • Nobody/Anonymous

    Well, I see my work offends some people who are very proud of their improvements. Maybe I'm wrong, but I thought this was a COMMON project, where EVERYBODY can make Scintilla better. Well, maXmo, I don't want to disappoint you, but my lexer is WAY better than the current one. If you're worried about your priceless changes, no problem: I included them in my code (though some were superseded by the original algorithm). As I promised, I improved the escape handling; see what we've got: http://i39.tinypic.com/20sjcrd.png

     
  • Nobody/Anonymous

    What's done:
    1. All latest keywords, including spec.symbols (like __TIMESTAMP__, etc) - 7 groups.
    2. Support for normal, verbatim(WYSIWYG), hex-, delimited and token strings.
    3. Support(highlight) for escape sequences inside normal strings and chars.
    4. Limited support for numbers, including underscore and all prefixes/spec.chars. There is no semantic pass, so any mix of valid characters is allowed.
    5. All comments supported, including nested.

    What is in plans (in priority order):
    1. Custom folding
    2. Operator validation (such as == != >>>= !<>= etc.)
    3. Number validation (hex, binary, float, complex)
    4. String suffixes c/w/d

     
  • Vincent Thorn

    Vincent Thorn - 2009-05-21

    Damn... it was me :)

     
  • Vincent Thorn

    Vincent Thorn - 2009-05-21

    To maXmo:
    > 1. Ridiculous. It's not lexer's job to know latest keywords, especially
    "spec symbols".

    Want a laugh? Look at your parser. The lexer IS the place to determine what is a keyword and what is not. I call it "my work" since I added it to *.properties; what's the problem with that?

    > 2. q{} string was a misdesign and will be probably dropped as it clashes
    with delegate literals.

    Why did you write that here? Don't you know the Digital Mars site?

    > 4. I would like to see exponent parsing no worse than in my code (though
    it still has two minor bugs for this matter).

    No problem, but before you push these changes, at least try to finish them! Half a lexer is not a lexer, especially for something as rare as numbers (I use numbers twice per 1000 lines). And again, there are no obstacles to merging our code; look at my lexer, it's obvious.

    > 5. It's a regression.

    That's too bold. In Russia, after words like that you at least have to PROVE what you said. Do you have any proof?

    PS
    You were in such a hurry to blame me that you forgot item (3).

    PPS
    I don't claim to have made the "best parser in da world", but its current structure is smarter and closer to a real compiler's parser, which means we have a better chance of improving it later.

     
  • Vincent Thorn

    Vincent Thorn - 2009-05-21

    Sorry, guys, just a small change: I forgot to parse _hex_ characters in \u escapes. Updated sources are here: http://depositfiles.com/files/c186h50ot

     
  • Neil Hodgson

    Neil Hodgson - 2009-05-21

    The reason for allocating keyword lists up front is that otherwise they won't be available when needed and rearranging things damages existing use.

    I reverted the change because its benefits were in dispute.

    Adding an extra level of indirection to keyword lists is not consistent with current use so would require changes in client applications.

     
  • Neil Hodgson

    Neil Hodgson - 2009-05-21

    Files can be uploaded only by the original author of a feature request and only when you are logged in. If you want to upload to the tracker then start a new feature request.

     
  • Vincent Thorn

    Vincent Thorn - 2009-05-21

    OK, thank you for the explanation, Neil! But it seems I'm not on the developer list; is that OK?

    Regarding asm & doxygen: I'm not sure these are a priority. The most needed help (in any language) is completion. I have already reserved 7 groups for keywords; for now that's more than enough.

     