From: Richard D. <rd...@us...> - 2005-10-11 16:55:40
> Richard Donkin wrote:
>
>>Here's what I get when trying to do an Edit on
>>http://develop.twiki.org/~develop/cgi-bin/view/Bugs/Item625 - although I
>>am already recognised as RichardDonkin in the left bar, it redirects me
>>to
>>http://develop.twiki.org/~develop/cgi-bin/login/Bugs/Item625 and gives
>>this message:
>>
> OK, so you have logged in.
>
>>TWiki detected an error or attempted hack - please check your TWiki logs
>>and webserver logs for more information.
>>
>>Malformed UTF-8 character (unexpected continuation byte 0xa4) in
>>substitution iterator
>>
> Och, that's a bug. We haven't worked out why it happens, but it appears
> to be unique to perl 5.6.1 installations that have locales switched on,
> when someone enters a character using a charset different to the one on
> the server - in this case, the euro character that Will entered, I
> suspect.
>
> Your guidance on correcting this would be very welcome.

I had a think and a Google about this UTF-8 issue - basically Perl's
UTF-8 mode is being turned on, hence the ISO-8859-x characters are
misinterpreted as UTF-8.

I believe the culprit is the 'use utf8' line in
http://search.cpan.org/src/SBURKE/Locale-Maketext-1.09/lib/Locale/Maketext/Guts.pm
and possibly some other Locale::Maketext or L::M::Lexicon modules.

'use utf8' in Perl 5.6 switches on UTF-8 mode and causes this issue. In
Perl 5.8, it does very little (just makes the Perl parser allow UTF-8
characters in literals, variable names, etc - see 'perldoc utf8').

The quick solution IMO is to comment out all 'use utf8' lines in these
modules - after all, TWiki is not yet meant to work in Perl UTF-8 mode
(it's a painful process as I discovered), and Perl 5.6's UTF-8 is so
deeply broken that its UTF-8 mode should never, ever be used. (In fact
only the very latest Perl 5.8.x should be considered for UTF-8 since
significant bugs are fixed in each release.)

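To make the error message concrete, here is a rough sketch of why a lone
0xa4 byte is reported as an "unexpected continuation byte" once the data
is treated as UTF-8. The helper name utf8_byte_role is made up for this
illustration and is not part of TWiki or Locale::Maketext. 0xa4 is where
ISO-8859-15 places the euro sign, which would fit the suspicion quoted
above; that the page was actually entered in that charset is an
assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one byte by the role it can play in a
# UTF-8 sequence.
sub utf8_byte_role {
    my ($byte) = @_;
    return 'ascii'        if $byte < 0x80;                    # 0xxxxxxx
    return 'continuation' if ( $byte & 0xC0 ) == 0x80;        # 10xxxxxx
    return 'lead'         if $byte >= 0xC2 && $byte <= 0xF4;  # starts a multi-byte sequence
    return 'invalid';                                         # 0xC0, 0xC1, 0xF5..0xFF
}

# A single-byte ISO-8859-x character in the 0x80..0xBF range has no lead
# byte in front of it, so a UTF-8 decoder sees a stray continuation byte.
printf "0x%02x is a %s byte\n", 0xA4, utf8_byte_role(0xA4);
```

Any ISO-8859-x byte from 0x80 to 0xbf would be classified the same way,
so the euro sign is probably just the first such character that turned
up.
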
Protecting the 'use utf8' with a test for $] in a BEGIN block, as in the
dynamic 'use locale' code in TWiki, might be a good idea, and the
Locale::Maketext maintainers might even accept this as a patch for
non-UTF-8 users.

Cheers,
Richard
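
For reference, the BEGIN-block guard suggested above might look roughly
like the following. This is only a sketch, not the TWiki code nor an
actual Locale::Maketext patch: the 5.008 cut-off and the
$UTF8_GUARD_ACTIVE flag are assumptions added for illustration. A plain
'use utf8' cannot be wrapped in a run-time 'if' because 'use' is
processed at compile time, hence the require/import pair inside BEGIN.

```perl
use strict;
use warnings;

# Hypothetical flag, only here so the effect of the guard can be inspected.
our $UTF8_GUARD_ACTIVE;

BEGIN {
    $UTF8_GUARD_ACTIVE = 0;

    # $] holds the Perl version as a number, e.g. 5.006001 for 5.6.1.
    # Assumption: only enable the pragma on 5.8.0 or later.
    if ( $] >= 5.008 ) {
        require utf8;
        utf8->import();    # same effect as 'use utf8' for the enclosing scope
        $UTF8_GUARD_ACTIVE = 1;
    }
}

print "utf8 pragma ", ( $UTF8_GUARD_ACTIVE ? "enabled" : "skipped" ),
      " on Perl $]\n";
```

On a 5.6.x installation the pragma would simply never be loaded, which
should have the same effect as commenting the line out, while 5.8 users
keep the original behaviour.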