Non-Latin 1 characters in the UI
Status: Beta
Brought to you by:
morbus
Non-Latin-1 characters are rendered as garbage in the UI.
For example, when adding two channels encoded in UTF-8, the UI reports:
AmphetaDesk has added 'IBM Aktuelles - Österreich' successfully.
AmphetaDesk has added 'IBM Новости - Российская Федерация' successfully.
Logged In: YES
user_id=69804
I'm not sure of the best way to handle this. Since AmphetaDesk just
passes the data off to the web browser for display, it seems the
proper thing to do is to change all non-Latin characters to their
equivalent HTML entity with HTML::Entities. A stronger solution would
be to convert them to decimal numeric character references (since only
a limited set of named HTML entities, such as &amp; and &copy;, is
available).
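A minimal sketch of the decimal-reference idea, assuming the channel data arrives as raw UTF-8 bytes. The helper names utf8_bytes_to_decimal_entities and _code_point are hypothetical and not part of AmphetaDesk; the code avoids any module so that it does not depend on a particular Perl version. Bytes that do not form a well-formed sequence are left untouched, and overlong encodings are not rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each multi-byte UTF-8 sequence in a byte
# string with a decimal character reference such as &#214;.
sub utf8_bytes_to_decimal_entities {
    my ($bytes) = @_;
    $bytes =~ s{
        ( [\xC2-\xDF][\x80-\xBF]        # two-byte sequence
        | [\xE0-\xEF][\x80-\xBF]{2}     # three-byte sequence
        | [\xF0-\xF4][\x80-\xBF]{3}     # four-byte sequence
        )
    }{ '&#' . _code_point($1) . ';' }gex;
    return $bytes;
}

# Hypothetical helper: compute the code point of one UTF-8 sequence.
sub _code_point {
    my @b    = map { ord } split //, $_[0];
    my $lead = shift @b;
    # The number of continuation bytes tells us how many payload bits
    # the lead byte carries.
    my $cp = @b == 1 ? $lead & 0x1F
           : @b == 2 ? $lead & 0x0F
           :           $lead & 0x07;
    $cp = ($cp << 6) | ($_ & 0x3F) for @b;
    return $cp;
}

print utf8_bytes_to_decimal_entities("IBM Aktuelles - \xC3\x96sterreich"), "\n";
```

ASCII text passes through unchanged, so the result can be handed to the browser regardless of the page's declared charset.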
Logged In: YES
user_id=365576
The browser interface uses UTF-8 and renders characters correctly.
The non-browser interface, however, renders all text as single byte characters and is broken.
Sample channel: http://www.ibm.com/news/at/de/index.rss
Logged In: YES
user_id=69804
Ah, I see what you mean. Missed the "UI reports" in your
original bug report. I hesitate to think I can fix this
though - the Win32 UI is a standard text widget box, and I
recall nothing about making it utf8 aware. Nor has
Win32::GUI been worked on for a while.
Logged In: YES
user_id=365576
Indeed, Win32::GUI does not seem to support UTF-8. A partial
solution would be to transcode the string to Latin-1, which
would preserve the characters that exist in Latin-1 and eliminate
from the title any other characters that cannot be rendered.
Alternatively, display the URL instead of the title.
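A rough sketch of that partial solution, assuming Perl 5.8 or later so that the core Encode module is available. The helper name gui_label is hypothetical. It returns the title as Latin-1 when every character fits, and falls back to the URL otherwise.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: pick a label the Latin-1-only widget can show.
sub gui_label {
    my ($title_utf8, $url) = @_;
    my $label = eval {
        # Both calls die if the input is malformed UTF-8 or contains a
        # character with no Latin-1 equivalent.
        my $chars = decode('UTF-8', $title_utf8, FB_CROAK | LEAVE_SRC);
        encode('ISO-8859-1', $chars, FB_CROAK | LEAVE_SRC);
    };
    return defined $label ? $label : $url;
}

print gui_label("IBM \xD0\x9D", 'http://www.ibm.com/news/at/de/index.rss'), "\n";
```

Falling back to the URL avoids showing a title that is mostly substitution characters, at the cost of a less readable message.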
Logged In: YES
user_id=69804
Klaus, a new version of Win32::GUI has recently been
released, but I see nothing magical about UTF-8 support.
Could you demonstrate some code, or provide a patch, that converts
the data received by AmphetaDesk::OS::Windows::gui_note to Latin-1?
Should this Latin-1 conversion be done in all gui_notes? How do UTF-8
characters get displayed in a normal Linux terminal?
Logged In: YES
user_id=365576
Unfortunately, the Encode module is only available for Perl 5.8
and higher (see http://www.mail-archive.com/perl-unicode@perl.org/msg01461.html),
and the encode/decode functions in utf8 are not available with
Perl 5.6 either. The only options would be to translate known UTF-8
characters manually (essentially recreating the functionality of
Encode.pm for UTF-8) or to strip every character above 0x7F.
Logged In: YES
user_id=365576
Something like
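    sub gui_note {
        use utf8;
        my ($message) = @_;
        # Replace everything outside 0x01-0x7F with '*'. On a byte string,
        # each byte of a multi-byte UTF-8 character becomes its own '*'.
        $message =~ s/[^\x01-\x7f]/*/g;
        # ... hand $message to the existing widget code here ...
    }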
might help strip the characters that cannot be rendered --
haven't found a good way to convert Latin-1 characters in
UTF-8 encoding back to Latin-1 codepoints.
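One possible way to do that conversion by hand, sketched under the assumption that the message is a raw UTF-8 byte string. The helper name utf8_bytes_to_latin1 is hypothetical. It relies on the fact that U+0080 through U+00FF are always encoded as a lead byte of 0xC2 or 0xC3 plus one continuation byte, and it needs neither Encode nor the utf8 functions.

```perl
use strict;
use warnings;

# Hypothetical helper: fold UTF-8 byte pairs for U+0080..U+00FF back
# into single Latin-1 bytes and substitute everything else above ASCII.
sub utf8_bytes_to_latin1 {
    my ($bytes, $substitute) = @_;
    $substitute = '?' unless defined $substitute;
    # A single pass, so bytes produced by the replacement are never
    # rescanned and mistaken for UTF-8 lead bytes.
    $bytes =~ s{
        ([\xC2\xC3])([\x80-\xBF])       # representable in Latin-1
      | [\xC4-\xDF][\x80-\xBF]          # other two-byte sequences
      | [\xE0-\xEF][\x80-\xBF]{2}       # three-byte sequences
      | [\xF0-\xF4][\x80-\xBF]{3}       # four-byte sequences
      | [\x80-\xFF]                     # stray byte, not valid UTF-8
    }{
        defined $1
            ? chr(((ord($1) & 0x03) << 6) | (ord($2) & 0x3F))
            : $substitute
    }gex;
    return $bytes;
}

my $latin1 = utf8_bytes_to_latin1("IBM Aktuelles - \xC3\x96sterreich");
```

Unlike the byte-wise strip, this yields one substitute per unrenderable character, so the length of the title stays recognizable.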