#74 Nonstandard whitespace in source confuses Unix display

Status: closed-fixed
Priority: 5
Updated: 2004-01-08
Created: 2002-01-24
Creator: Anonymous
Private: No

CSCOPE 15.3 compiled with GCC 3.0.x on Solaris 7 using
Solaris curses. The source code came from a DOS machine and
has CR/LF at the end of each line. Unless a source line
is too long to fit on the screen (a large one, 136x61),
the display for that source line is blank. Removing the
CR characters brings everything back to normal.

Discussion

  • Hans-Bernhard Broeker

    • assigned_to: nobody --> broeker
     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    The real breakage is outside cscope, of course: you should
    have transferred these files as text, and thus got rid of
    those CRs at the source already. That's what ftp's "ascii"
    transfer mode is for, see? Or the '-a' option of
    zip/unzip.

    I'm not quite sure it would be a good idea to artificially
    remove CR characters from cscope.out if they are found by
    cscope running on a system that doesn't usually have CRLF
    line ends. Plus, it might be rather hard to do.

    Can't you just

    recode ibmpc..latin1
    

    your files and be done with it?
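
    For cases where an external conversion tool is not at hand, the
    same CR/LF cleanup can be sketched in a few lines of C. This is
    only an illustration: crlf_to_lf is a hypothetical helper, not
    part of cscope or recode, and it only drops a CR that is
    immediately followed by LF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from cscope): remove the CR of every CR/LF
 * pair in buf[0..len), in place, and return the new length.
 * A lone CR that is not followed by LF is kept as it is. */
static size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in;
    size_t out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;           /* skip the CR of a CR/LF pair */
        buf[out++] = buf[in];   /* copy everything else unchanged */
    }
    return out;
}
```

    The caller would read a file into a buffer, call
    crlf_to_lf(buf, len), and write the first returned-length bytes
    back out.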

     
  • Jonathan Leffler

    Logged In: YES
    user_id=22937

    I submitted this bug - can we update the records to reflect
    this?

    I agree with Hans-Bernhard that I should have converted the
    source; indeed, that was the workaround. At the time when I
    ran CSCOPE on the source, I was not aware that it had been
    modified by a DOS box.

    I note that although I mentioned carriage return (CR or ^M)
    as the problem, vertical tab (VT or ^K) and formfeed (FF or
    ^L) also yield interesting effects. Note that GCC accepts
    the source code as valid - primarily because all those
    characters are (presumably) mapped to white space and none
    of the syntax depends on the type of white space. I think
    that it would be wise for CSCOPE to note that characters
    such as these do not have a printable representation and to
    ensure that when the character is displayed, it is handled
    suitably.

    What is suitable? That is harder to answer. I incline
    towards printing a space, at least for characters that
    satisfy isspace(c). For the more general case, I don't know
    that I have a good solution to offer -- maybe '@' since it
    is not a valid character in C outside of a string.

    You may well be right to decide this is outside the scope of
    your project. OTOH, it means there are some files that are
    accepted by some C compilers that cannot usefully be shown
    by CSCOPE on some platforms; it seems a pity not to handle
    it somehow. However, I have not looked at what it would
    take to handle it properly, so I am not in a position to
    offer detailed coding suggestions (yet).

    Question (for which I certainly don't have an answer): will
    C compilers start accepting Unicode characters? Probably
    not, but...
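
    As a rough illustration of the substitution described above (a
    space for characters that satisfy isspace(c), '@' for anything
    else without a printable representation), a minimal C sketch
    might look like this. The names display_char and
    sanitize_for_display are hypothetical and not taken from cscope;
    passing tabs through untouched is an extra assumption, and in the
    default "C" locale bytes above 127 would also come out as '@'.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from cscope): pick the character to show
 * on screen in place of c. */
static char display_char(unsigned char c)
{
    if (c == '\t' || isprint(c))
        return (char)c;   /* tabs and printable characters pass through */
    if (isspace(c))
        return ' ';       /* CR, VT, FF and LF are shown as a space */
    return '@';           /* any other unprintable character */
}

/* Copy src into dst (dstsize bytes including the terminator),
 * substituting characters as described above. Truncates if needed. */
static void sanitize_for_display(char *dst, size_t dstsize, const char *src)
{
    size_t i;

    assert(dst != NULL && src != NULL && dstsize > 0);
    for (i = 0; i + 1 < dstsize && src[i] != '\0'; i++)
        dst[i] = display_char((unsigned char)src[i]);
    dst[i] = '\0';
}
```

    A display routine could run each source line through
    sanitize_for_display before handing it to curses, so that a
    trailing CR no longer reaches the terminal.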

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    We can't update the records, but that doesn't really matter:
    you can subscribe to be emailed updates of the bug report
    without being the submitter.

    GCC may seem to accept those files as-is, but I wouldn't bet
    money on that before checking whether the preprocessor does,
    too. The typical breakage is that backslash-newline
    sequences aren't recognized if the newline is a CRLF. To GCC
    (at least in all versions I've tested), backslash-CR-LF
    doesn't qualify as backslash-newline, and that leads to all
    kinds of funny problems.
    To test, consider this two-liner:

    #define MACRO blabla \
    #error backslash-newline got eaten!

    And see what happens:

    broeker ~> gcc -E tt.c
    # 1 "tt.c"

    broeker ~> unix2dos tt.c
    unix2dos: converting file tt.c to DOS format ...
    broeker ~> gcc -E tt.c
    tt.c:2: #error backslash-newline got eaten!
    # 1 "tt.c"

    This is with gcc-2.8.1 on an ancient Linux box. But I very
    strongly suspect the same bug will occur if there's any other
    whitespace (FF, VT, HT) between the backslash and the
    newline, and also with more recent GCCs.

    I'll change the subject line to more closely reflect the
    scope of this problem then.
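
    Lines of the kind discussed above, where a backslash is
    separated from the newline by CR or other whitespace, can be
    spotted with a small check. The following is a sketch only:
    has_broken_continuation is a hypothetical helper, not part of
    cscope or GCC, and whether a given preprocessor actually rejects
    such a line depends on the compiler and version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if line ends in a backslash that is
 * followed by one or more of space, HT, CR, VT or FF before the
 * newline (or end of string), otherwise 0. */
static int has_broken_continuation(const char *line)
{
    size_t len;
    size_t end;

    assert(line != NULL);
    len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        len--;                      /* ignore the terminating LF */

    /* Walk back over whitespace sitting just before the LF. */
    end = len;
    while (end > 0 && strchr(" \t\r\v\f", line[end - 1]) != NULL)
        end--;

    /* Suspicious only if some whitespace was skipped and the
     * character before it is a backslash. */
    return end < len && end > 0 && line[end - 1] == '\\';
}
```

    Run over each line of a file, this flags the first line of the
    two-liner above once it has been converted to DOS format, and
    leaves the Unix version alone.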

     
  • Hans-Bernhard Broeker

    • summary: CR/LF in source confuses Unix display --> Nonstandard whitespace in source confuses Unix display
     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    I've checked in a patch to fix this at least for the
    flex-based scanner (fscanner.l).

     
  • Hans-Bernhard Broeker

    • status: open --> closed-fixed
     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    An equivalent patch for scanner.l is (finally) going in,
    too. Closing this.

     
