#1627 Geany incorrectly folds Ruby code


Some Ruby code won't fold correctly in Geany. A contributor to Geany states that this is a problem in Scintilla.

For example, if I have

def validate(arr)
if arr.any?(&:empty?)

validate(["a", "b", "c"])

then folding the if folds too much.

What I expected to happen: when folding on the if, the "end" correponding to the end of the "validate" method is visible.

What happened instead: when folding on the if, the "end" corresponding to the end of the "validate" method is not visible.

As evidence that what I expected to happen is correct, if you change :empty? into :"empty?", I get the expected behavior.

This happened on Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64. I was using Geany 0.21.

A contributor to Geany states that this is a Scintilla problem:

Confirmed with Git Geany and latest SCITE.

This is a problem in the Scintilla editing component (www.scintilla.org), not Geany, as shown by its existance in Scite (which also uses Scintilla).
Please report the problem to Scintilla upstream.

The Geany bug report was at https://sourceforge.net/p/geany/bugs/1058/

I've also reported this on the bug tracker for Ubuntu: https://bugs.launchpad.net/ubuntu/+source/geany/+bug/1337015

1 Attachments


  • Neil Hodgson

    Neil Hodgson - 2014-07-25
    • labels: --> scintilla, lexer, ruby
    • status: open --> open-accepted
    • assigned_to: Neil Hodgson
  • Colomban Wendling

    The problem is that the ? after :empty is recognized as "the start of a character code escape sequence" (c.f. LexRuby.cxx:1008) -- and I guess that breaks because it includes the following closing parenthesis.

    I unfortunately don't know anything about Ruby, so I don't know what this is supposed to mean. Should a "character code escape sequence" only include a known set of characters rather than anything but whitepsaces/newlines? Or should that ? be part of the :empty (SYMBOL) thing?

    So, if anyone knows Ruby they can either fix the lexer, or if they explain me exactly what should happen I might give it a try, if really no Ruby-aware people are willing to do it themselves.

    Anyway, what I gathered is that the fix should either be

    1. if the ? should be part of :empty (SYMBOL), then avoid the condition on line 1263 from matching ? when state == SCE_RB_SYMBOL
    2. if the ? should be an OPERATOR and "character code escape sequence"s cannot include ), make the condition on line 1005 match ) (or more if appropriate)
  • Neil Hodgson

    Neil Hodgson - 2014-07-25

    The original posting treats :"empty?" as equivalent so I suspect that is the desired lex. However,online examples ("setuid?" in http://www.ruby-doc.org/core-2.1.2/Symbol.html) use double quotes for symbols where there might be ambiguity. Couldn't find a good reference on the syntax of symbols.

  • Colomban Wendling

    OK, well. Reading such a complex Yacc file is not really what I call documentation, but if it's what is really used by Ruby it's at least accurate. Anyway, reading this, I extracted the following pseudo-syntax for symbol literals:

    IDENTIFIER := ( [a-z_] [A-Za-z0-9_]*
                    # means '=' not followed by any of '~', '=' or '>', or followed by =>
                  | [a-zA-Z_] [A-Za-z0-9_]* "=" ((?![~=>])|(?==>)) )
    CONSTANT   := [A-Z] [A-Za-z0-9_]*
    IDENT     := [a-zA-Z_] [A-Za-z0-9_]*
    FID       := IDENT [?!](?!=) # '!' or '?' not followed by '='
    GVAR      := "$" ( [_~*$?!@/\\;,.=:<>"]
                     | "-" .
                     | [&`'+]
                     | [1-9] [0-9]*
                     | "0" IDENT
                     | IDENT )
    IVAR      := "@" IDENT
    CVAR      := "@@" IDENT
    OP        := ( "|"
                 | "^"
                 | "&"
                 | "<=>"
                 | "=="
                 | "==="
                 | "=~" # tMATCH
                 | "!~" # tNMATCH
                 | ">"
                 | ">="
                 | "<"
                 | "<="
                 | "!="
                 | "<<"
                 | ">>"
                 | "+"
                 | "-"
                 | "*"
                 | "*" # tSTAR
                 | "/"
                 | "%"
                 | "**"
                 | "**" # tDSTAR
                 | "!"
                 | "~"
                 | "+" # tUPLUS
                 | "-" # tUMINUS
                 | "[]"
                 | "[]="
                 | "`" )
    SYM       := ( FNAME | IVAR | GVAR | CVAR )
    SYMBOL    := ":" SYM

    What is interesting for the bug at hand is FID: it is an identifier followed by either ? or ! (itself not followed by =). Attached is a patch that add support for this (0001-Ruby-allow-FIDs-as-symbol-literals.patch).

    Then, I saw that currently the parser doesn't accept GVAR/CVAR/IVAR symbol literals, so there is a second patch (a bit more complicated) that adds support for this (0002-Ruby-allow-global-class-instance-variables-as-symbol.patch).

    Note that those are only based on my reading of the Yacc syntax file linked above and validated by some test files successfully parsed by Ruby, but as I don't know Ruby I can't be 100% sure it really does what I think it does if I overlooked something.

    Last edit: Colomban Wendling 2014-07-29
  • Colomban Wendling

    It was pointed out to me I missed one case, that identifiers can end with = (and that's supported as part of symbol literals).

    Attached patch in to of the others (but only 0001 is required normally) that adds support for this (and I'll edit my previous comment's syntax tree to reflect this).

  • Neil Hodgson

    Neil Hodgson - 2014-07-30

    Committed as [797358], [d406ee], and [09c2de].

    Clang analyze shows a dead-store warning for line 902 since chNext2 is not used after that line:
    chNext2 = styler.SafeGetCharAt(i+2);

    Thanks for working out the syntax allowed. I got a little lost in the yacc.



    Commit: [09c2de]
    Commit: [797358]
    Commit: [d406ee]

    • Colomban Wendling

      Clang analyze shows a dead-store warning for line 902 since chNext2 is not used after that line:
      chNext2 = styler.SafeGetCharAt(i+2);

      Indeed. I initially had it set in case it was reused, but as it isn't and the other branches around don't update it I guess getting rid of it is just fine if you want.

    • Colomban Wendling

      When looking at this dead assignment, I also realized this branch doesn't properly update ch, which leads to improper updating of chPrev at the end of the loop. This isn't a problem in practice as chPrev won't be used in this case, but it could lead to tricky bugs if someone starts using it in the future.

      Attached is a patch removing the dead assignment and properly updating ch.

  • Neil Hodgson

    Neil Hodgson - 2014-07-30
    • status: open-accepted --> open-fixed
  • Neil Hodgson

    Neil Hodgson - 2014-08-14
    • status: open-fixed --> closed-fixed

Log in to post a comment.