@Andrew: there is a question for you below.
As part of the comment style change I have been using "less" to look
at a lot of our source code, and that revealed some corrupt characters
(i.e., characters with the parity bit set) in two of our files.
To fix those issues I have (as of revision 11274) changed all
instances of 0xa0 (space with the parity bit set) to 0x20 (space) in
drivers/tkwin.c, and I have removed all instances of 0x85 (ctrl-E
with the parity bit set) from drivers/wingcc.c. All of the corrupted
characters were in comments except for one, which was in a menu.
These files were developed on Windows quite a few years ago, and I
ascribe these corrupted characters to problematic editors or bad CVS
commits back then.
I was obviously concerned that these corrupted characters might
represent a general problem in our source code. Therefore, I
implemented (revision 11280) the utils/parity_bit_check application
(source code in utils/parity_bit_check.c), which is built in the
build tree by "make parity_bit_check". This application finds the
first character on stdin with the parity bit set and returns that
character as its return code (or returns 0 if there are no characters
with the parity bit set). That application is run by
scripts/parity_bit_check.sh to check all files in our source code for
such issues except for those listed in
Currently that file has the following patterns that are excluded from
the check which I annotate here:
# Exclude these because they would not be part of a fresh checkout
# exclude various image formats
# Exclude UTF-8 files. The latter part of this subset (from NEWS on) has
# recently been converted from latin1 to UTF-8 using iconv so that
# developers' names that occur in those files will be rendered
# correctly in the UTF-8 locale that is the norm these days.
# latin1 encoding.
(Andrew, is there any reason to keep this Octave source file any more?
The idea behind it is to approximately render latin1 characters from
Octave, but latin1 is extremely outdated now, and Octave users would
be much better off using the default PLplot UTF-8 encoding for all
# These files were generated by some proprietary MS app that scattered
# some "smart" MS characters throughout. I haven't bothered to
# demoronize these files since they will likely be replaced in the
# future in any case.
# This PHP file was copied from some website by Werner and is used to give
# us a good-looking newsfeed on our website. Some of the characters in
# this file have their parity bit set (again probably thanks to our MS
# friends). It works in its present form on our website so I didn't
# bother to try and fix it.
There appear to be good reasons to exclude all the above files from
the parity bit check. Running the script then showed two README*
documentation files that needed to be fixed up (by replacing MS quotes
with ordinary quotes). After that fixup, running the script reveals no
remaining corrupted characters in our files, and in particular, there
are no corrupted characters in our language source files. That is
extremely good news, considering the large scope this problem _could_
have had.
Alan W. Irwin
Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).
Programming affiliations with the FreeEOS equation-of-state implementation
for stellar interiors (freeeos.sf.net); PLplot scientific plotting software
package (plplot.org); the libLASi project (unifont.org/lasi); the Loads of
Linux Links project (loll.sf.net); and the Linux Brochure Project