#1101 [ruby] quotes in here docs should be ignored

Bug
open-accepted
Neil Hodgson
Scintilla (788)
4
2014-08-14
2011-02-21
redstun
No

Expected: quote characters, both double and single, should be ignored inside Ruby here documents.

Actual: the quote characters are not ignored, which breaks the highlighting, especially when there is an odd number of quotes. See the attached screenshot for an example of messed-up Ruby syntax highlighting.

Thanks

Discussion

  • redstun
    2011-02-21

    A single quote inside a Ruby here document breaks the syntax highlighting.

     
    Attachments
  • redstun
    2011-02-21

    In the attached screenshot, look at the code on line 29 to see the problematic single quote.

    BTW, the Ruby keyword 'in' (as in "elsewhere in GEM_PATH") on line 29 is also highlighted, even though it is inside the here document.

    It looks like a here document should be treated as just another quoting mechanism, like single and double quotes, but it isn't.
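    For reference, here is a minimal, hypothetical Ruby snippet modeled on the description above (the screenshot's actual code is not reproduced here). Ruby runs it without complaint: everything between the <<EOS line and the terminator, including the lone single quote and the word "in", is plain string content, which is what the highlighter should show.

```ruby
# Hypothetical example modeled on the report: a here document whose body
# contains an odd number of quotes and the keyword "in". Ruby treats the
# whole body as string content, so none of it should be styled as code.
text = <<EOS
It's used elsewhere in GEM_PATH.
EOS

puts text
```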

     
  • Neil Hodgson
    2011-02-21

    It appears that the lexer is not treating this as a here document. As I don't use Ruby, I won't be working on this myself.

     
  • Neil Hodgson
    2011-02-21

    • assigned_to: nobody --> nyamatongwe
    • priority: 5 --> 4
    • labels: --> Scintilla
    • status: open --> open-accepted
     
  • Eric Promislow
    2012-05-04

    Could you attach a sample file that triggers this? I'm not seeing it in Komodo (which uses a slightly different lexer, but the synchronization code is mostly the same, and in fact is a subset of the standard lexer's).

     
  • Kevin Cox
    2013-08-27

    I have another example of this situation. I originally reported it to the Geany editor project, but they told me to report it upstream.

    Please see the attached files and the original report: https://sourceforge.net/p/geany/bugs/989/

     
  • After some digging: to decide whether something after << is a heredoc, the Ruby lexer looks ahead up to 50 lines for the terminator and assumes it is not a heredoc if it doesn't find one.

    See lexer/LexRuby.cxx:sureThisIsNotHeredoc(), especially lines 585-586.

    So heredocs shorter than 50 lines should mostly work, although all this lookahead/lookbehind seems to cause some refresh issues.
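    To make the failure mode concrete, here is a hypothetical Ruby sketch of a bounded look-ahead like the one described. It is not the code in LexRuby.cxx; the function name, the 50-line constant, and the `indented:` flag (for the <<- form, whose terminator may be indented) are assumptions for illustration. A heredoc whose terminator lies beyond the window is rejected, which is exactly the long-heredoc breakage reported here.

```ruby
# Hypothetical sketch of a bounded terminator look-ahead, loosely modeled on
# the description of sureThisIsNotHeredoc(); NOT the actual LexRuby.cxx code.
MAX_LOOKAHEAD_LINES = 50 # window size described in the discussion

# Returns true if a line equal to the terminator appears within the next
# MAX_LOOKAHEAD_LINES lines after the line at start_index.
def terminator_found?(lines, start_index, terminator, indented: false)
  last = [start_index + MAX_LOOKAHEAD_LINES, lines.length - 1].min
  ((start_index + 1)..last).any? do |i|
    candidate = lines[i].chomp
    candidate = candidate.lstrip if indented # <<- permits an indented terminator
    candidate == terminator
  end
end

short = ["puts <<EOS", "Hello", "EOS"]
p terminator_found?(short, 0, "EOS") # => true

# The terminator sits 61 lines below the opener, outside the window.
long = ["puts <<EOS"] + Array.new(60, "line") + ["EOS"]
p terminator_found?(long, 0, "EOS") # => false
```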

     
  • Kevin Cox
    2013-08-27

    That is a little hacky.

    An example that is highlighted as a HEREDOC but shouldn't be.

     
    Attachments
  • Kevin Cox
    2013-08-27

    I hate how stateful Ruby's syntax is. I don't see a real fix. You could look ahead to the end of the file, but that would be slow and still hacky. Unless you maintain a symbol table you are screwed, and unless you take a great deal of care with that symbol table, it will still have edge cases that are wrong.

    The only saving grace is that code is usually indented, and since a plain <<EOS terminator has to start at the beginning of a line, that avoids a number of false positives.

     
  • Kevin Cox
    2013-08-28

    After grepping through random Ruby projects, a fairly reliable heuristic appears to be that the << operator usually has a space after it (or, in rare cases, a character like "("), whereas the heredoc operator has no space after it (a space isn't allowed there).

    As a regex, /<<-?('[^']*'|"[^"]*"|[A-Za-z0-9]+)/ would be close. It doesn't handle escapes inside the quoted string, and I'm not exactly sure which characters are allowed in an unquoted heredoc identifier, but I think it captures the basic idea.

    It matched every heredoc in the projects I grepped, with no false positives. It obviously isn't perfect, but coding style appears to make it fairly reliable.
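    A minimal Ruby sketch of applying that regex as a line-level check; the helper name `heredoc_start?` is hypothetical. The last call shows the remaining weak spot: a shift written without a space, such as 1<<4, still matches.

```ruby
# Space-after-<< heuristic from the discussion, using the regex as posted.
# A heredoc opener is followed directly by its identifier; << used as an
# operator is usually followed by a space.
HEREDOC_START = /<<-?('[^']*'|"[^"]*"|[A-Za-z0-9]+)/

# Hypothetical helper: true when the line appears to open a here document.
def heredoc_start?(line)
  HEREDOC_START.match?(line)
end

p heredoc_start?("puts <<EOS")      # => true
p heredoc_start?("text = <<-'END'") # => true
p heredoc_start?("list << item")    # => false
p heredoc_start?("mask = 1<<4")     # => true (a false positive)
```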

     
    Last edit: Kevin Cox 2013-08-28
  • Kevin Cox
    2013-08-28

    I should also add that this heuristic lets any false positive be fixed simply by adding a space after the <<, which is better than the current behavior, where there is no workaround if a heredoc is longer than 50 lines.

     
  • Kevin Cox
    2013-08-28

    Some searches showing that regex at work.

    Should match all heredocs and only heredocs: http://searchcode.com/?q=lang%3Aruby+%2F%3C%3C-%3F%28%27[^%27]%27|%22[^%22]%22|[A-Za-z0-9]%2B%29%2F

    Should match append operations and no heredocs: http://searchcode.com/?q=lang%3Aruby+%2F%3C%3C[^-%27%22A-Za-z0-9]%2F

    The second one is roughly the opposite of the first except it doesn't validate the rest of the terminator string.

     
  • Kevin Cox
    2013-08-29

    I did more research and found a couple of rare cases where << was used as a bit shift without a space:

    1<<4
    

    And while code such as

    puts <<1
    Hi there!
    1
    

    is a heredoc, I couldn't find any examples of it in practice. So if the first character of the identifier is a digit, it should probably be treated as a shift operator.

    Also, this only matters for the plain <<EOS form: while 4 <<-1 == 2, anyone who writes that deserves whatever comes to them.
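    The digit rule can be folded into the earlier regex by requiring an unquoted identifier to start with a letter. The following is a hypothetical Ruby sketch (the regex variant and helper name are assumptions, not taken from any lexer); it deliberately gives up on the legal but rare <<1 heredoc in exchange for classifying 1<<4 and 4 <<-1 as shifts.

```ruby
# Hypothetical refinement of the posted regex: an unquoted identifier must
# start with a letter, so "<<4" and "<<-1" are treated as shifts even though
# Ruby would accept a digit-initial heredoc terminator.
HEREDOC_START_NO_DIGIT = /<<-?('[^']*'|"[^"]*"|[A-Za-z][A-Za-z0-9]*)/

def heredoc_start_no_digit?(line)
  HEREDOC_START_NO_DIGIT.match?(line)
end

p heredoc_start_no_digit?("mask = 1<<4") # => false
p heredoc_start_no_digit?("puts <<1")    # => false (legal heredoc, given up)
p heredoc_start_no_digit?("x = 4 <<-1")  # => false
p heredoc_start_no_digit?("puts <<EOS")  # => true
```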

    I also looked at how other Ruby syntax highlighters handle this, and a space after the << appears to be the most common heuristic for telling the two apart. So people won't complain about having to add spaces after their shifts, because yours won't be the only editor that gets confused.

     
    Last edit: Kevin Cox 2013-08-29
  • Eric Promislow
    2013-08-29

    You have to disambiguate a few operators in Ruby, not just "<<". For example, "/" can either start a regex or be a division, and ":" can be part of a conditional expression or start a symbol.

    For Komodo, we've been using a forked Ruby lexer that does a bit more for disambiguating "<<".

    Code is at
    https://github.com/Komodo/KomodoEdit/blob/trunk/contrib/patches/scintilla/lexers/LexRuby.cxx

     
  • Kevin Cox
    2013-08-29

    I just downloaded Komodo Edit, and the following is highlighted as a shift operator:

    puts <<EOS
    Hello
    EOS
    
     
  • Eric Promislow
    2013-08-29

    The "<<" part is always styled as an operator, since we consider it to
    be a prefix operator that takes two arguments: the terminator, and the
    string contents.
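    As a rough illustration of that three-part view, here is a hypothetical Ruby sketch that splits a plain, unquoted heredoc into the operator, the terminator, and the string contents. It is not Komodo's implementation; `HeredocParts` and `split_heredoc` are invented names, and only the simplest <<IDENT form is handled.

```ruby
# Hypothetical illustration of treating << as a prefix operator with two
# arguments (terminator and contents). Handles only plain <<IDENT heredocs
# whose opener is on the first line of the input.
HeredocParts = Struct.new(:operator, :terminator, :contents)

def split_heredoc(source)
  first_line, *rest = source.lines
  match = first_line&.match(/<<([A-Za-z_][A-Za-z0-9_]*)/)
  return nil unless match

  terminator = match[1]
  # The body ends at the first line that is exactly the terminator.
  end_index = rest.index { |line| line.chomp == terminator }
  return nil unless end_index

  HeredocParts.new("<<", terminator, rest[0...end_index].join)
end

parts = split_heredoc("puts <<EOS\nHello\nEOS\n")
p parts.terminator # => "EOS"
p parts.contents   # => "Hello\n"
```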

    Unfortunately, out of the box, Komodo colors here documents as default text.
    In a "bright" scheme I typically use [Preferences/Fonts & Colors] to
    make the heredoc strings a garish blue on yellow, and then they show up just fine.

     
  • Kevin Cox
    2013-08-29

    Ok, it appears to work well. Do you think there would be any issues replacing the old lexer with the new one, or would changes have to be made?

     
  • Eric Promislow
    2013-08-29

    Neither lexer is perfect. We have some downstream requirements which called
    for the choices we made. I wouldn't replace the standard lexer whole-hog for that reason, just apply changes as desired. For example, add those "sureThisIsHeredoc" and "sureThisIsNotHeredoc" functions and let the community refine them.