[Lxr-commits] CVS: lxr source,1.69,1.70
From: Andre-Littoz <ajl...@us...> - 2014-10-26 13:47:21
Update of /cvsroot/lxr/lxr
In directory sfp-cvs-1.v30.ch3.sourceforge.com:/tmp/cvs-serv10682

Modified Files:
    source
Log Message:
source: UTF-8 truncation

This patch is an imperfect attempt to truncate the author's name at a
valid UTF-8 character boundary when displaying UTF-8.
For other display encodings, truncation is made on a byte basis and
may not be correct if a UTF-8 sequence is dumped into a non-UTF-8
stream. Conversely, truncation is erroneous for a non-UTF-8 string
dumped into a UTF-8 stream.

IMPORTANT: erroneous truncations may occur in diff since we do not
take UTF-8 into account in sub htmljust().

Index: source
===================================================================
RCS file: /cvsroot/lxr/lxr/source,v
retrieving revision 1.69
retrieving revision 1.70
diff -u -d -r1.69 -r1.70
--- source 25 Oct 2014 18:29:06 -0000 1.69
+++ source 26 Oct 2014 13:47:18 -0000 1.70
@@ -770,7 +770,24 @@
     }
     $$r = $rev;
-    my $la = length($auth);
+    # NOTE: modern VCSes return their annotations in Unicode, but user
+    # may have requested another display encoding (e.g. ISO-8859-x).
+    # We don't try to transcode since this may be time-consuming for
+    # little benefit. We just hope that, on average, truncation will
+    # not occur too frequently in the middle of an UTF-8 sequence.
+    # UTF-8-aware length computation and truncation is attempted only
+    # on author's name. Nothing is done on the revision id because
+    # it usually does not contain fancy characters (read it is numeric
+    # with eventual ASCII punctuation). svn allows more freedom in
+    # revision naming and may conflict with this choice. You'll also
+    # be in trouble when displaying UTF-8 with CVS returning ISO-8859.
+    my $la;
+    if ('utf-8' ne $config->{'encoding'}) {
+        $la = length($auth);
+    } else {
+        use utf8;
+        $la = length($auth);
+    };
     my $lr = length($rev);
     # After this call to length, $rev may be edited to contain
     # HTML element and $lr will be different from length($rev).
@@ -783,8 +800,13 @@
     ) {    # truncate first author
         $la = 14 - $lr;
         $la = 4 if $la < 4;
-        $auth = substr($auth, 0, $la++)
-            . '<span class="error">*</span>';
+        if ('utf-8' ne $config->{'encoding'}) {
+            $auth = substr($auth, 0, $la++)
+        } else {
+            use utf8;
+            $auth = substr($auth, 0, $la++)
+        };
+        $auth .= '<span class="error">*</span>';
     }
     if ($lr+$la >15) {    # now truncate revision
         $lr = 14 - $la;
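
The difference between byte-based and character-based truncation described in the log message can be illustrated with a small standalone sketch. It is not part of the patch. It assumes the author string arrives as raw UTF-8 bytes, and the helper name truncate_utf8 is hypothetical. It relies on the core Encode module to decode the bytes before measuring, because Perl's "use utf8" pragma only declares the encoding of the source file itself and does not change how length() or substr() count a string obtained at run time.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of LXR): truncate a UTF-8 byte string
# to at most $max characters without splitting a multi-byte sequence.
sub truncate_utf8 {
    my ($bytes, $max) = @_;
    # Decode the bytes into a character string. With the default
    # CHECK value, malformed input is replaced by U+FFFD.
    my $chars = decode('UTF-8', $bytes);
    return $bytes if length($chars) <= $max;
    # substr() now counts characters, so the cut falls on a boundary.
    return encode('UTF-8', substr($chars, 0, $max));
}

# Sample author name as raw UTF-8 bytes; the accented "e" takes 2 bytes.
my $auth = "Andr\xC3\xA9 Littoz";

print length($auth), "\n";                      # byte count: 13
print length(decode('UTF-8', $auth)), "\n";     # character count: 12
print truncate_utf8($auth, 5), "\n";            # 6 bytes, valid UTF-8
```

By contrast, a plain substr($auth, 0, 5) on the undecoded bytes would stop after the first byte of the two-byte sequence and leave an invalid trailing byte, which is the failure mode the log message warns about for mixed encodings.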