#1553 Shell/bash: hash (#) within words wrongly interpreted as comment

Bug
closed-fixed
5
2014-08-21
2013-11-18
Cousteau
No

In shell scripts, a # only begins a comment when it is at the beginning of a word:

echo hello #world    # this prints "hello"; the "#world" is a comment
echo hello#world     # this prints "hello#world"

However, #world is highlighted as a comment in both cases.

From the bash manpage (but this behavior is also present in other shell interpreters such as dash):

COMMENTS
[...] a word beginning with # causes that word and all remaining characters on that line to be ignored.

This is a bit tricky: in cases such as echo hello&#world, the & marks the end of the command, so the # is considered to be at the beginning of a word. Checking whether the character before the # is whitespace is therefore not enough.
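The cases described above can be checked against a real shell. The following is a minimal sketch, assuming a POSIX sh is available on the PATH; run_line is a hypothetical helper, not part of any tool discussed here. It feeds one line of source to a child shell, followed by a wait on its own line so that a job backgrounded with & finishes before the child exits.

```shell
#!/bin/sh
# Hypothetical helper: run one line of shell source in a child shell.
# "wait" goes on a separate line; on the same line it could be swallowed
# by the very comment we are testing.
run_line() {
    printf '%s\nwait\n' "$1" | sh
}

run_line 'echo hello #world'     # prints: hello        (# starts a word)
run_line 'echo hello#world'      # prints: hello#world  (# is inside a word)
run_line 'echo hello;#world'     # prints: hello        (; ends the previous word)
run_line 'echo hello&#world'     # prints: hello        (& ends the command)
run_line 'echo hello\;#world'    # prints: hello;#world (escaped ; is part of the word)
run_line "echo 'hello'#world"    # prints: hello#world  (quoted text is part of the word)
```

The last two lines are the reason a plain "previous character" test is not sufficient on its own: an escaped or quoted character does not end the word.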

Discussion

  • Colomban Wendling

    Quoting the Bash manual:

    The following definitions are used throughout the rest of this document.

    • blank: A space or tab.
    • word: A sequence of characters considered as a single unit by the shell. Also known as a token.
    • name: A word consisting only of alphanumeric characters and underscores, and beginning with an alphabetic character or an underscore. Also referred to as an identifier.
    • metacharacter: A character that, when unquoted, separates words. One of the following: | & ; ( ) < > space tab
    • control operator: A token that performs a control function. It is one of the following symbols: || & && ; ;; ( ) | |& <newline>

    I guess the "right" solution would be to properly read words/tokens and then do something with them. This, however, would require quite a lot of changes.

    A simpler solution would be to have a set of characters that separate words (the metacharacters), and to check that the character before the # is one of them and that this character is not in word or identifier state (as it would be for escaped stuff like in foo\;).

    Attached is a kind of hackish patch implementing the second option since it's simpler. It properly handles constructs like foo#bar, foo\;#bar, 'foo'#bar, etc, but *#* should also be a word and the patch leaves those as a sequence of OPERATOR/IDENTIFIER/OPERATOR. Not sure how much of a problem it is though.
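The idea behind the simpler option can be sketched outside the lexer. This is not the attached patch: first_hash_starts_comment is a hypothetical POSIX sh function, it only looks at the first # on a line, and it approximates "not in word or identifier state" by checking for a single preceding backslash. It does not track quoting state, so a # after a space inside quotes would be misjudged.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the first '#' in $1 would start a
# comment under the "previous character is a metacharacter" heuristic.
first_hash_starts_comment() {
    line=$1
    tab=$(printf '\t')
    case $line in
        *'#'*) ;;                  # the line contains a '#'
        *) return 1 ;;             # no '#', so no comment
    esac
    before=${line%%#*}             # text before the first '#'
    [ -z "$before" ] && return 0   # '#' is the first character of the line
    prev=${before#"${before%?}"}   # last character before the '#'
    case $prev in
        ' '|"$tab"|'|'|'&'|';'|'('|')'|'<'|'>') ;;   # a metacharacter
        *) return 1 ;;             # any other character: '#' is mid-word
    esac
    case ${before%?} in
        *\\) return 1 ;;           # metacharacter is escaped, so it is part of a word
    esac
    return 0
}

for src in 'foo #bar' 'foo#bar' 'foo&#bar' 'foo\;#bar' "'foo'#bar"; do
    if first_hash_starts_comment "$src"; then verdict=comment; else verdict=word; fi
    printf '%s\t%s\n' "$verdict" "$src"
done
```

A real lexer already knows the style of the preceding character, which is why the patch can rely on state rather than on the backslash check used here.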

     
  • Kein-Hong Man

    Kein-Hong Man - 2013-11-18

    Fine by me. I imagine such usage is uncommon on this planet.

     
  • Neil Hodgson

    Neil Hodgson - 2013-11-18
    • labels: --> scintilla, lexer, bash
    • status: open --> open-fixed
    • assigned_to: Neil Hodgson
     
  • Neil Hodgson

    Neil Hodgson - 2013-12-01

    There was a problem with this when incrementally lexing from a line comment: sc.chPrev wasn't set to a line end but to 0, so the line was not treated as a comment. Fixed for now by adding \0 to the set of characters. A better fix may be to initialize chPrev correctly.

     
  • Kein-Hong Man

    Kein-Hong Man - 2013-12-02

    Okay, I'll check the chPrev thingy. Needed to look at LexBash anyway; on a recent build I noticed a minor regression in the handling of certain kinds of incomplete HEREDOCs (complete HEREDOCs are fine). Will post a patch elsewhere later.

     
  • Neil Hodgson

    Neil Hodgson - 2013-12-12
    • status: open-fixed --> closed-fixed
     
