Menu

#1918 Misleading highlighting of Python f-string literals

Bug
closed-fixed
5
2017-03-21
2017-03-02
felix
No

Scintilla r6136.

In this code:

print(f"{2:04x}")
print(f"""{[
    x*x for
    x in (1,2,3)
]}""")
print(f"{" ".lstrip()}")
  • :04x is highlighted as if it were code, even though it is a format specifier and should be highlighted like a string;
  • The line containing for is not highlighted as code, nor is the line following it;
  • In the last line, lstrip() is highlighted as code, contrary to how CPython 3.6 interprets this line, which is as the sequence of tokens:
  • print
  • (
  • f"{" (this is a syntax error)
  • ".lstrip()}"
  • )

Scintilla should probably indicate somehow that the last line contains a syntax error. It's quite easy to misconstrue that this code is acceptable.

Discussion

  • John Ehresman

    John Ehresman - 2017-03-02

    Just to clarify, part of the problem is that {expressions} aren't restricted to a single line. I mistakenly thought they were. Neil, is there a good way to record that a line begins with {expression} characters in a triple quoted string?

     
    • Neil Hodgson

      Neil Hodgson - 2017-03-02

      Since the Python lexer is an object lexer, it can store arbitrary line end state, so a map<line_number, state=""> could hold each one. When a lex is started, the previous line's end state (if present) should be used and then lines after this removed from the map.</line_number,>

      This will use memory reasonably, as multiline f-strings will be rare and maps are stored as something equivalent to a sorted array of {line_number, state}.

      The SparseState template could be used (see LexCPP for examples) although its behaviour is a little different as its for storing mode changes and every line between changes reports the last value.

       
    • John Ehresman

      John Ehresman - 2017-03-02

      Could SetLineState() and GetLineState() on the document object be used? It doesn't seem to be used otherwise in the python lexer.

       
      • Neil Hodgson

        Neil Hodgson - 2017-03-02

        Yes, but that is less efficient as it is dense using an integer for each line.

         
    • John Ehresman

      John Ehresman - 2017-03-03

      Here's a patch to support multi-line expressions using a map to record the state (if any) at eol.

       

      Last edit: John Ehresman 2017-03-03
      • John Ehresman

        John Ehresman - 2017-03-04

        I have more changes coming -- probably in the next few days

         
  • Neil Hodgson

    Neil Hodgson - 2017-03-04

    Seems reasonable. I'll wait for the complete patch unless you want each piece committed.
    The LexPython.cxx file has also gained some "override" keywords in [7efe01] so you should probably merge that into your working copy so patches that touch lines close to the overrides still apply easily.

     

    Related

    Commit: [7efe01]

  • John Ehresman

    John Ehresman - 2017-03-06

    Here's a new version both as a single patch and an hg export of a series of commits. They're based on our sources and not on the commit with override. The new version handles multi-line expressions and expressions with {} characters in the expression. I originally thought f-string expressions were more restricted than they are. std::vector and std::map are used and may allocate a bit more dynamically than is strictly necessary but I think it should be acceptable.

    It does not color syntactically invalid f-strings like f'{1' differently because it can only catch a small subset of strings the are invalid -- also Wing and probably other IDEs will use red squiggly indicators for this.

    One thing to possibly add is to color the { and } characters on either side of an expression with a different color as well as probably any trailing ! or : clause. I think I've seen a screenshot with this approach and think it may be worth trying. It should probably be another ticket though.

     
  • Neil Hodgson

    Neil Hodgson - 2017-03-07

    Visual C++ fails with "not". I think allowing "not" for "!" was done to make it easier for programmers in countries where keyboards don't have "!" and there is likely to be some way to enable "not" in Visual C++. If you can work out how to get Visual C++ working with it then "not" is fine.

    One thing to possibly add is to color the { and } ...

    Sounds OK, although it will probably take a while before the preferred approach settles.

     
  • John Ehresman

    John Ehresman - 2017-03-07

    Oops -- I spend much more time writing Python these days so used not without thinking about it. Attached another diff with not replaced by !. Please let me know if you want the hg export.

     
  • Neil Hodgson

    Neil Hodgson - 2017-03-08

    Committed as [e58172] from f-string4.diff (+removed line end white space). Committing an export set has some benefits in allowing others to understand the sequence of changes and the reason for decisions. In this case, it seemed to me that it was more difficult to understand the stages than just a single squashed change set.
    Since the formatting had become a little inconsistent, astyle was run to standardise the format of this file with [c59523].

     

    Related

    Commit: [c59523]
    Commit: [e58172]


    Last edit: Neil Hodgson 2017-03-08
  • John Ehresman

    John Ehresman - 2017-03-09

    There's a bug -- the value of currentFStringExp isn't set if lexing starts on line 3+ of a multiline f-string. A patch is attached

     
  • Neil Hodgson

    Neil Hodgson - 2017-03-09
    • labels: --> scintilla, lexer, python, fstring
    • status: open --> open-fixed
    • assigned_to: John Ehresman
     
  • Neil Hodgson

    Neil Hodgson - 2017-03-09

    Committed as [2779eb].

     

    Related

    Commit: [2779eb]

  • Neil Hodgson

    Neil Hodgson - 2017-03-21
    • status: open-fixed --> closed-fixed
     

Log in to post a comment.

MongoDB Logo MongoDB