#1810 Ruby: here doc not recognized after 1.9.3-style symbol hash keys

Bug
Status: closed-fixed
Priority: 5
Updated: 2016-03-16
Created: 2016-02-13
Private: No

Ruby 1.9 introduced an alternative way to define symbol keys in hashes: key: rather than :key =>.

Scintilla's Ruby lexer does not recognize a here doc when it follows this new syntax. For example, this here doc is not recognized, and the lexer picks up syntax from inside it and screws up the rest of the file:

hash = {
  text: <<-TEXT
    There could be something in here that breaks
    Ruby syntax highlighting, like:
    <html></html>
  TEXT
}

But this one is properly recognized (note the change from text: to :text =>):

hash = {
  :text => <<-TEXT
    There could be something in here that breaks
    Ruby syntax highlighting, like:
    <html></html>
  TEXT
}
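
As a quick sanity check that Ruby itself treats the two forms identically (so the difference lies only in the lexer), here is a minimal sketch. The variable names new_style and old_style are made up for illustration:

```ruby
# Both hashes use the same here doc body; only the key syntax differs.
new_style = {
  text: <<-TEXT
    <html></html>
  TEXT
}

old_style = {
  :text => <<-TEXT
    <html></html>
  TEXT
}

# The interpreter sees the same symbol key and the same string value.
puts new_style == old_style
puts new_style.keys.inspect
```

Running this should show that both hashes compare equal and that the key is the symbol :text in either spelling.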

If it helps at all, I narrowed the cause of the issue down to the if block at line 506 of lexers/LexRuby.cxx:

// Skip next batch of white-space
firstWordPosn = skipWhitespace(firstWordPosn, lt2StartPos, styler);
if (firstWordPosn != lt2StartPos) {
    // Have [[^ws[identifier]ws[*something_else*]ws<<
    return definitely_not_a_here_doc;
}

Discussion

  • Neil Hodgson

    Neil Hodgson - 2016-02-13
    • status: open --> open-accepted
     
  • Colomban Wendling

    I unfortunately have no clue whether this is correct (I don't know Ruby :)), but it fixes the example. It treats foo: as a symbol, and allows a symbol before a heredoc.

    diff --git a/lexers/LexRuby.cxx b/lexers/LexRuby.cxx
    --- a/lexers/LexRuby.cxx
    +++ b/lexers/LexRuby.cxx
    @@ -466,6 +466,7 @@
         prevStyle = styler.StyleAt(firstWordPosn);
         // If we have '<<' following a keyword, it's not a heredoc
         if (prevStyle != SCE_RB_IDENTIFIER
    +            && prevStyle != SCE_RB_SYMBOL
                 && prevStyle != SCE_RB_INSTANCE_VAR
                 && prevStyle != SCE_RB_CLASS_VAR) {
             return definitely_not_a_here_doc;
    @@ -1088,6 +1089,10 @@
                         // <name>= is a name only when being def'd -- Get it the next time
                         // This means that <name>=<name> is always lexed as
                         // <name>, (op, =), <name>
    +                } else if (ch == ':'
    +                           && isSafeWordcharOrHigh(chPrev)
    +                           && strchr(" \t\n\r", chNext) != NULL) {
    +                    state = SCE_RB_SYMBOL;
                     } else if ((ch == '?' || ch == '!')
                                && isSafeWordcharOrHigh(chPrev)
                                && !isSafeWordcharOrHigh(chNext)) {
    

    Feel free to use, or not, depending on whether it makes sense (I can also provide a proper changeset export). I might also give it another shot if I get an easier description than "foo: is a symbol if inside a hash definition", which is clear but requires tracking hash declarations, which is non-trivial.
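
To make the condition added in the second hunk easier to reason about, here is a toy model of it in Ruby. It is not the lexer's code: the helper names safe_wordchar_or_high? and label_colon? are hypothetical, and it assumes that isSafeWordcharOrHigh accepts ASCII letters, digits, underscore, and non-ASCII characters.

```ruby
# Toy model of the patch's ':' check; not the actual Scintilla lexer.
# Assumption: a "safe word char or high" character is an ASCII letter,
# digit, underscore, or any non-ASCII character.
def safe_wordchar_or_high?(ch)
  return false if ch.nil? || ch.empty?
  ch.ord >= 0x80 || ch.match?(/\A[A-Za-z0-9_]\z/)
end

# True when the ':' at index i would switch to the symbol state:
# it must directly follow a word character and be followed by whitespace.
def label_colon?(line, i)
  return false unless line[i] == ':'
  prev = i.zero? ? nil : line[i - 1]
  nxt = line[i + 1]
  # Assumption: end of input is treated like whitespace.
  safe_wordchar_or_high?(prev) && (nxt.nil? || " \t\n\r".include?(nxt))
end

puts label_colon?("text: <<-TEXT", 4)  # key: form
puts label_colon?("Foo::Bar", 3)       # scope operator
puts label_colon?("a ? b : c", 6)      # spaced ternary
```

Under these assumptions only the first call matches. The check is purely local to three characters, which is why it needs no knowledge of whether the lexer is inside a hash definition.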

     
  • Neil Hodgson

    Neil Hodgson - 2016-02-16
    • status: open-accepted --> open-fixed
    • assigned_to: Neil Hodgson
     
  • Anthony Myre

    Anthony Myre - 2016-02-16

    Close, but it still doesn't work in the case that the symbol is not the first thing on the line, as in cases like these:

    render inline: <<HTML
        <div>Some HTML to render here</div>
    HTML
    

    (again, replacing inline: with :inline => results in correct behavior)
    (for reference, render is being passed a hash with a single key 'inline', and the curly braces around the hash are implied)

    hash = {str: <<-STR
            this is a string
        STR
    }
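
For anyone who wants to confirm how Ruby itself parses both cases, here is a minimal sketch. The render method below is a hypothetical stand-in that just returns its argument, not the real framework method:

```ruby
# Hypothetical stand-in: returns whatever hash it receives.
def render(options)
  options
end

# The braces around the hash are implied; the here doc is the value of :inline.
rendered = render inline: <<HTML
<div>Some HTML to render here</div>
HTML

# The symbol key is not the first token on the line here either.
hash = {str: <<-STR
    this is a string
  STR
}

puts rendered.inspect
puts hash.inspect
```

In both cases the here doc body becomes the string value stored under the symbol key.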
    

    I expect these might be very difficult to properly recognize due to Ruby's very flexible syntax, but if all else fails, I think it would be preferable to recognize heredocs too often than not often enough. An unrecognized heredoc can throw off the highlighting of the rest of the file, and it's more likely that someone would use one of the above syntaxes than use something that looks like a heredoc but isn't.

     
    • Colomban Wendling

      > Close, but it still doesn't work in the case that the symbol is not the first thing on the line, as in cases like these: […]

      OK, that's because the extra symbol makes sureThisIsNotHeredoc() refuse it. Fixed in the first attached patch.

      hash = {str: <<-STR
              this is a string
          STR
      }
      

      This actually is caused by another issue in sureThisIsNotHeredoc(), which doesn't recognize lines with combined or nested expressions. Should be fixed in the second attached patch.

      > I expect these might be very difficult to properly recognize due to Ruby's very flexible syntax, but if all else fails, I think it would be preferable to recognize heredocs too often than not often enough. An unrecognized heredoc can throw off the highlighting of the rest of the file […]

      Recognizing a HereDoc where there is none is almost certain to break highlighting of the rest of the file. Either way is bad, but if we really want to weigh which one is worse, I'd say it's recognizing non-existent HereDocs, because in most other cases one can at the very least hack around a misdetection by adding a comment containing the proper termination sequence, but that's not possible with a HereDoc. It's ugly, but well.

       
  • Neil Hodgson

    Neil Hodgson - 2016-03-16
    • status: open-fixed --> closed-fixed
     
