Update of /cvsroot/lxr/lxr
In directory sfp-cvs-1.v30.ch3.sourceforge.com:/tmp/cvs-serv10682
Modified Files:
source
Log Message:
source: UTF-8 truncation
This patch is an imperfect attempt to truncate the author's name at a valid UTF-8 character boundary when the display encoding is UTF-8. For other display encodings, truncation is done on a byte basis and may be incorrect if a UTF-8 sequence is dumped into a non-UTF-8 stream. Conversely, truncation is erroneous for a non-UTF-8 string dumped into a UTF-8 stream.
IMPORTANT: erroneous truncations may still occur in diff output, since sub htmljust() does not take UTF-8 into account.
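For illustration only, here is a minimal sketch of one way to cut a UTF-8 byte string on a character boundary. It is not the code in this commit: truncate_utf8() is a hypothetical helper, and the sketch assumes the author's name arrives as raw UTF-8 bytes. It relies on the core Encode module, because in Perl length() and substr() count characters only on decoded strings; the `use utf8` pragma declares the encoding of the source file itself.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of LXR: truncate a UTF-8 byte string to at
# most $max characters without splitting a multi-byte sequence.
sub truncate_utf8 {
    my ($bytes, $max) = @_;
    # Decode the bytes into a Perl character string. With the default
    # check mode, malformed input is replaced by U+FFFD instead of dying.
    my $chars = decode('UTF-8', $bytes);
    # Short enough already: return the input untouched.
    return $bytes if length($chars) <= $max;
    # substr() now counts characters, so the cut falls on a boundary.
    return encode('UTF-8', substr($chars, 0, $max));
}

# Assumed sample data: "Jérôme Dupont" written as UTF-8 bytes.
my $auth  = "J\xC3\xA9r\xC3\xB4me Dupont";
my $short = truncate_utf8($auth, 4);    # 4 characters, 6 bytes
```

Decoding and re-encoding costs time on every annotated line, which is the trade-off the in-code NOTE weighs when it declines to transcode.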
Index: source
===================================================================
RCS file: /cvsroot/lxr/lxr/source,v
retrieving revision 1.69
retrieving revision 1.70
diff -u -d -r1.69 -r1.70
--- source 25 Oct 2014 18:29:06 -0000 1.69
+++ source 26 Oct 2014 13:47:18 -0000 1.70
@@ -770,7 +770,24 @@
}
$$r = $rev;
- my $la = length($auth);
+ # NOTE: modern VCSes return their annotations in Unicode, but user
+ # may have requested another display encoding (e.g. ISO-8859-x).
+ # We don't try to transcode since this may be time-consuming for
+ # little benefit. We just hope that, on average, truncation will
+ # not occur too frequently in the middle of an UTF-8 sequence.
+ # UTF-8-aware length computation and truncation is attempted only
+ # on author's name. Nothing is done on the revision id because
+ # it usually does not contain fancy characters (read it is numeric
+ # with eventual ASCII punctuation). svn allows more freedom in
+ # revision naming and may conflict with this choice. You'll also
+ # be in trouble when displaying UTF-8 with CVS returning ISO-8859.
+ my $la;
+ if ('utf-8' ne $config->{'encoding'}) {
+ $la = length($auth);
+ } else {
+ use utf8;
+ $la = length($auth);
+ };
my $lr = length($rev);
# After this call to length, $rev may be edited to contain
# HTML element and $lr will be different from length($rev).
@@ -783,8 +800,13 @@
) { # truncate first author
$la = 14 - $lr;
$la = 4 if $la < 4;
- $auth = substr($auth, 0, $la++)
- . '<span class="error">*</span>';
+ if ('utf-8' ne $config->{'encoding'}) {
+ $auth = substr($auth, 0, $la++)
+ } else {
+ use utf8;
+ $auth = substr($auth, 0, $la++)
+ };
+ $auth .= '<span class="error">*</span>';
}
if ($lr+$la >15) { # now truncate revision
$lr = 14 - $la;