cscope / Bugs / #74 Nonstandard whitespace in source confuses Unix display

Hans-Bernhard Broeker - 2002-01-25

assigned_to: nobody --> broeker

Hans-Bernhard Broeker - 2002-01-25

Logged In: YES
user_id=27517

The real breakage is outside cscope, of course: you should
have transferred these files as text, and thus got rid of
those CRs at the source already. That's what ftp has that
"ascii" transfer mode for, see? Or zip/unzip, with their '-a'
option.

I'm not quite sure it would be a good idea to artificially
remove CR characters from cscope.out if they are found by
cscope running on a system that doesn't usually have CRLF
line ends. Plus, it might be rather hard to do.

Can't you just

recode ibmpc..latin1

your files and be done with it?
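
For readers without recode at hand, the line-ending part of that conversion can be sketched in C. This is only an illustration: the function name crlf_to_lf is hypothetical, it is not part of cscope, and unlike recode it does not touch the character set. It drops a CR only when an LF follows directly, so a lone CR is left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from cscope): rewrite CRLF pairs as LF,
 * in place, and return the new length of the buffer contents.
 * A CR that is not followed by LF is kept unchanged. */
static size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL);
    for (in = 0; in < len; in++) {
        /* Skip the CR of a CRLF pair; the LF is copied next round. */
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;
        buf[out++] = buf[in];
    }
    return out;
}
```

The caller would read the file into a buffer, call crlf_to_lf on it, and write the first `out` bytes back.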

Jonathan Leffler - 2002-01-28

Logged In: YES
user_id=22937

I submitted this bug - can we update the records to reflect
this?

I agree with Hans-Bernhard that I should have converted the
source; indeed, that was the workaround. At the time when I
ran CSCOPE on the source, I was not aware that it had been
modified by a DOS box.

I note that although I mentioned carriage return (CR or ^M)
as the problem, vertical tab (VT or ^K) and formfeed (FF or
^L) also yield interesting effects. Note that GCC accepts
the source code as valid - primarily because all those
characters are (presumably) mapped to white space and none
of the syntax depends on the type of white space. I think
that it would be wise for CSCOPE to note that characters
such as these do not have a printable representation and to
ensure that when the character is displayed, it is handled
suitably.

What is suitable? That is harder to answer. I incline
towards printing a space, at least for characters that
satisfy isspace(c). For the more general case, I don't know
that I have a good solution to offer -- maybe '@' since it
is not a valid character in C outside of a string.
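
To make that suggestion concrete, here is a minimal sketch in C. The names display_char and sanitize_line are hypothetical and this is not how cscope does it; it assumes the "C" locale and simply follows the rule above: printable characters pass through, anything else that satisfies isspace(c) becomes a space, and everything else becomes '@'.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from cscope): map one source byte to a
 * character that is safe to show on a single terminal line. */
static char display_char(unsigned char c)
{
    if (isprint(c))
        return (char)c;     /* ordinary text, including ' ' */
    if (isspace(c))
        return ' ';         /* CR, VT, FF, HT, LF all become a space */
    return '@';             /* other control characters */
}

/* Apply display_char to every byte of a NUL-terminated line, in place. */
static void sanitize_line(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = display_char((unsigned char)*s);
}
```

Note that this also turns a tab into a single space, which changes column alignment; whether that is acceptable is a separate decision.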

You may well be right to decide this is outside the scope of
your project. OTOH, it means there are some files that are
accepted by some C compilers that cannot usefully be shown
by CSCOPE on some platforms; it seems a pity not to handle
it somehow. However, I have not looked at what it would
take to handle it properly, so I am not in a position to
offer detailed coding suggestions (yet).

Question (for which I certainly don't have an answer): will
C compilers start accepting Unicode characters? Probably
not, but...


Hans-Bernhard Broeker - 2002-01-29

Logged In: YES
user_id=27517

We can't update the records, but that doesn't really matter
--- you can subscribe to be emailed updates of the bug report
without being the submitter.

GCC may seem to accept those files as-is, but I wouldn't bet
money on that before checking whether the preprocessor does,
too. The typical breakage is that backslash-newline
sequences aren't recognized if there's a CRLF as the
newline. To GCC (at least to all versions I've tested),
this doesn't qualify as backslash-newline, and that
leads to all kinds of funny problems.
To test, consider this two-liner:

#define MACRO blabla \
#error backslash-newline got eaten!

And see what happens:

broeker ~> gcc -E tt.c
# 1 "tt.c"

broeker ~> unix2dos tt.c
unix2dos: converting file tt.c to DOS format ...
broeker ~> gcc -E tt.c
tt.c:2: #error backslash-newline got eaten!
# 1 "tt.c"

This is with gcc-2.8.1 on an ancient Linux box. But I very
strongly suspect the same bug happens if there's any other
whitespace (FF, VT, HT) between the backslash and the
newline, and also with more recent GCCs.
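
A quick way to look for the pattern described above is to flag lines whose backslash is separated from the newline by stray whitespace. The following C sketch is only an illustration: broken_continuation is a hypothetical name, it expects a single line as fgets would return it, and it makes no claim about how any particular GCC version treats such a line.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical checker: return 1 if the line ends in a backslash
 * followed by whitespace (CR, FF, VT, HT, space) before the newline,
 * i.e. a continuation that a strict preprocessor may not recognize.
 * Return 0 for a clean backslash-newline or for no backslash at all. */
static int broken_continuation(const char *line)
{
    size_t len, end;

    assert(line != NULL);
    len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        len--;                      /* ignore the terminating LF */

    end = len;
    while (end > 0 && isspace((unsigned char)line[end - 1]))
        end--;                      /* skip trailing whitespace */

    /* Broken only if whitespace was skipped and a backslash precedes it. */
    return end < len && end > 0 && line[end - 1] == '\\';
}
```

Fed the first line of the two-liner above, this would return 0 for the Unix version and 1 after unix2dos has added the CR.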

I'll change the subject line to more closely reflect the
scope of this problem then.


Hans-Bernhard Broeker - 2002-01-29

summary: CR/LF in source confuses Unix display --> Nonstandard whitespace in source confuses Unix display

Hans-Bernhard Broeker - 2002-06-20

Logged In: YES
user_id=27517

I've checked in a patch to fix this at least for the
flex-based scanner (fscanner.l).


Hans-Bernhard Broeker - 2004-01-08

status: open --> closed-fixed

Hans-Bernhard Broeker - 2004-01-08

Logged In: YES
user_id=27517

An equivalent patch for scanner.l is (finally) going in,
too. Closing this.
