Menu

#282 Filenames with spaces still don't work

open
nobody
None
5
2014-11-20
2013-04-29
Darryl
No

(I realize that this won't be fixed any time soon, due to the difficulty, but I'm still reporting it for posterity.)

While you can index files whose (properly quoted) pathnames contain spaces, a match in a file whose pathname contains a space will cause cscope to fail with a "File does not have expected format" error. It is important to note that cscope will appear to work fine, as long as all matches are in files whose pathnames do not contain spaces. However, if a match occurs in a file whose pathname contains a space, cscope will fail.

This is caused by the use of fscanf() to parse pathnames, where whitespace is used to separate the fields. This will, of course, fail if the pathname contains a space.

The main example for this is in countrefs(), in src/command.c, near line 897:

while (EOF != (i = fscanf(refsfound, 
                          "%" PATHLEN_STR "s%" PATLEN_STR "s%" NUMLEN_STR "s %" TEMPSTRING_LEN_STR "[^\n]",
                          file, function, linenum, tempstring
                         )
              )
      ) {

If the filename contains a space, the above will improperly parse the filename, resulting in an eventual error.

Unfortunately, fixing this is non-trivial, as the use of scanf() and friends occurs in a number of places, and would likely require a slightly modified database format.

Discussion

  • Stéphane Aulery

    Hello,

    A user Debiian submitted a more comprehensive patch to address this problem. You'll find it at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579531 or in attachment.

    Can integrate it please?

    Thank you for your help.

    Regards,

    --
    Stéphane Aulery

     

    Last edit: Stéphane Aulery 2014-11-20
    • Hans-Bernhard Broeker

      The patch only improves the situation from "problems with pathnames containing whitespace" to "problems with pathnames containing tabulators". While that's an improvement, I don't feel comfortable with the externally visible changes it causes to the files handled by commands '<', '>', '^' and '|'.

      I can accept the change to cscope-indexer as-is.

      I can't vouch for the change to cscope.el any more than the OP on the debian tracker could, and our own ELisp expert-in-charge, Darryl Okahata, vanished a long time ago. Do you have an experienced ELisp hacker who could check this?

       
      • Stéphane Aulery

        Hello,

        Le jeudi 20 novembre 2014 à 07:01:43, Hans-Bernhard Broeker a écrit :

        The patch only improves the situation from "problems with pathnames
        containing whitespace" to "problems with pathnames containing tabulators".
        While that's an improvement, I don't feel comfortable with the externally
        visible changes it causes to the files handled by commands '<', '>', '^' and
        '|'.

        I can accept the change to cscope-indexer as-is.

        I can't vouch for the change to cscope.el any more than the OP on the debian
        tracker could, and our own ELisp expert-in-charge, Darryl Okahata, vanished
        a long time ago. Do you have an experienced ELisp hacker who could check
        this?

        Below are the answers I got. I'm forcing you to anything. Please tell me
        only if you accept the patch in whole or in part. If the problem of space in
        file names can not be resolved properly in the current state, it could be
        documented, maybe.

        Josh Triplett josh@joshtriplett.org:

        • sed -e 's/. ./\"&\"/' > $LIST_FILE

        I can't speak for the rest of the patch without digging into quite a
        bit
        of the context and assumptions in the elisp, but regarding this bit,
        rather than checking for spaces and only quoting filenames then, just
        always quote all filenames. That also makes sure you test that path
        thoroughly.

        Matthias Urlichs matthias@urlichs.de:

        Hmm. I notice that there's no code in this patch which removes the quotes
        you just added around these file names, which makes me wonder what other
        assumptions the code which does interpret them makes.

        Also, this code may or may not fail on filenames which contain quotes, tabs,
        newlines, backslashes, single quotes, and/or dollar signs, depending on what
        then interprets that file. (Whose filename also should be quoted?)

        Some of these characters obviously wouldn't be used by anybody who's even
        remotely sane, esp. in the context of source code, but not all of them. I
        would at least check whether $ and/or single quote works.

        Regarding this bit of elisp, the pattern change looks reasonably sane.

        Remi Vanicat vanicat@debian.org:

        The change to elisp only touch one regexp, and only to its beginning.
        when one remove the first level of quoting it is:

        ^([^ \t]+)[ \t]+([^ \t]+)[ \t]+([0-9]+)[ \t]+(.)\n"
        replaced by
        ^([^\t]+)[ \t]+([^ \t]+)[ \t]+([0-9]+)[ \t]+(.
        )\n"

        So it was looking for 4 group of char, separate by one space or tab, the
        first and second being made of any char not char or space, the third
        being made of digit, and the last of everything until end of line.

        It is now mostly the same, the only difference is that the first group
        may also contain space, and in practice must be ended by one tabulation
        (as the regexp is greedy, if there is a space at the end of the first
        group, it will be added to the first group...)

        By looking at
        http://www.opensource.apple.com/source/cscope/cscope-5/cscope/contrib/xcscope/xcscope.el
        it seem that this regexp is use to parse the output of some program,
        each line should always match it, the first group being the file, the
        second the function-name, the third the line-number and the last the
        line.

        This seem good, the only problem being that the old regexp make the
        assumption that the file and the function-name are separated by a space
        or a tab, and the new one might failed if the separator is a space and
        not a tab.

        Regards,

        --
        Stéphane Aulery

         
  • Stéphane Aulery

    I must say that I am triaging bug and I haven't verified the accuracy of the patch. Sorry. I can try to dig up a ELisp dev. on a Debian list.

     

Log in to post a comment.