Non-Latin 1 characters in the UI

Status: Beta

Brought to you by: morbus

#8 Non-Latin 1 characters in the UI

Milestone: v1.0_(example)

Status: open

Owner: Morbus Iff

Labels: Interface (example) (1)

Priority: 5

Updated: 2003-09-06

Created: 2002-12-18

Creator: Klaus Johannes Rusch

Private: No

Non-Latin 1 characters are rendered as garbage in the UI.
For example, when adding two channels encoded in UTF-8, the UI reports:

AmphetaDesk has added 'IBM Aktuelles - Österreich'
successfully.
AmphetaDesk has added 'IBM Новости -
Российская Федерация'
successfully.

Discussion

Morbus Iff - 2003-08-27

assigned_to: nobody --> morbus

Morbus Iff - 2003-08-27

Logged In: YES
user_id=69804

I'm not sure of the best way to handle this. Since AmphetaDesk just
passes the data off to the web browser for display, it seems the
proper thing to do is change all non-Latin characters to their
equivalent HTML entities with HTML::Entities. A stronger solution
would be to convert them to decimal numeric character references,
since only a limited set of named HTML entities (&amp;, &copy;, etc.)
is available.
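
As a rough illustration of the decimal-reference idea, here is a minimal sketch using only core Perl. The helper name to_decimal_refs is hypothetical and not part of AmphetaDesk, and the sketch assumes the text has already been decoded into a character string rather than raw UTF-8 bytes.

```perl
use strict;

# Hypothetical helper (not part of AmphetaDesk): replace every character
# above 7-bit ASCII with a decimal numeric character reference.
# Assumes $text is a decoded character string, not raw UTF-8 bytes.
sub to_decimal_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7f])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# U+00D6 is the capital O with diaeresis in "Österreich".
print to_decimal_refs("IBM Aktuelles - \x{d6}sterreich"), "\n";
# prints: IBM Aktuelles - &#214;sterreich
```

This leaves characters such as & and < untouched, so it would complement ordinary HTML escaping rather than replace it.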

Klaus Johannes Rusch - 2003-08-27

Logged In: YES
user_id=365576

The browser interface uses UTF-8 and renders characters correctly.
The non-browser interface, however, renders all text as single byte characters and is broken.

Sample channel: http://www.ibm.com/news/at/de/index.rss

Morbus Iff - 2003-08-27

Logged In: YES
user_id=69804

Ah, I see what you mean. Missed the "UI reports" in your
original bug report. I hesitate to think I can fix this
though - the Win32 UI is a standard text widget box, and I
recall nothing about making it utf8 aware. Nor has
Win32::GUI been worked on for a while.

Klaus Johannes Rusch - 2003-09-06

Logged In: YES
user_id=365576

Indeed, Win32::GUI does not seem to support UTF-8. A partial
solution would be transcoding the string to Latin-1, which
would preserve the characters that exist in Latin-1 and eliminate
the other characters, which cannot be rendered, from the title.
Alternatively, display the URL instead of the title.
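
A minimal sketch of that transcoding step, assuming a Perl new enough to ship the Encode module (5.8 or later) and assuming the title arrives as raw UTF-8 bytes. The helper name utf8_to_latin1 is hypothetical.

```perl
use strict;
use Encode qw(decode encode);

# Hypothetical helper: turn UTF-8 bytes into Latin-1 bytes, dropping
# every character that Latin-1 cannot represent.
# Assumes Perl 5.8+ so that the Encode module is available.
sub utf8_to_latin1 {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> characters
    $chars =~ s/[^\x00-\xff]//g;           # drop anything above U+00FF
    return encode('iso-8859-1', $chars);   # characters -> Latin-1 bytes
}

# "\xC3\x96" is the UTF-8 encoding of U+00D6, so the result starts with
# the single Latin-1 byte 0xD6 followed by "sterreich".
my $latin1 = utf8_to_latin1("\xC3\x96sterreich");
```

A title made up entirely of non-Latin-1 characters comes back empty, which is the case where showing the URL instead would make sense.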

Klaus Johannes Rusch - 2003-09-06

summary: Non-Latin 1 characters in the U --> Non-Latin 1 characters in the UI

Morbus Iff - 2003-12-18

Logged In: YES
user_id=69804

Klaus, a new version of Win32::GUI has recently been
released, but I see nothing magical about utf8 support.
Could you demonstrate some code, or provide a patch, to convert
to Latin-1 the data received by AmphetaDesk::OS::Windows::gui_note?

Should this Latin-1 conversion be done in every gui_note? How do UTF-8
characters get displayed in a normal Linux terminal?

Klaus Johannes Rusch - 2003-12-21

Logged In: YES
user_id=365576

Unfortunately the Encode module is only available for Perl 5.8
and higher (see http://www.mail-archive.com/perl-unicode@perl.org/msg01461.html),
and the encode/decode functions in utf8 are not available with
Perl 5.6 either, so the only options would be manually translating
known UTF-8 characters (essentially recreating the functionality of
Encode.pm for UTF-8) or stripping every character above \x7f.

Klaus Johannes Rusch - 2003-12-21

Logged In: YES
user_id=365576

Something like

    sub gui_note {
        use utf8;
        my ($message) = @_;
        # Replace every character outside 7-bit ASCII with '*'.
        $message =~ s/[^\x01-\x7f]/*/g;
        # ... rest of gui_note unchanged
    }

might help strip the characters that cannot be rendered --
haven't found a good way to convert Latin-1 characters in
UTF-8 encoding back to Latin-1 codepoints.
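
One possible way to do that conversion by hand, without Encode, is sketched below. It relies on the fact that U+0080 through U+00FF are encoded in UTF-8 as a lead byte of 0xC2 or 0xC3 followed by one continuation byte. The helper name utf8_bytes_to_latin1 is hypothetical, the input is assumed to be a raw byte string, and the sketch has not been tried on an actual Perl 5.6 install.

```perl
use strict;

# Hypothetical helper: convert UTF-8 bytes to Latin-1 bytes by hand.
# Two-byte sequences starting with 0xC2 or 0xC3 map to U+0080..U+00FF
# and become one Latin-1 byte; every other non-ASCII sequence (or stray
# high byte) becomes a single '?'.
# Assumes $bytes is a raw byte string, not a decoded character string.
sub utf8_bytes_to_latin1 {
    my ($bytes) = @_;
    $bytes =~ s{
        ([\xC2\xC3])([\x80-\xBF])       # U+0080..U+00FF: representable
      | [\xC4-\xDF][\x80-\xBF]          # other two-byte sequences
      | [\xE0-\xEF][\x80-\xBF]{2}       # three-byte sequences
      | [\xF0-\xF7][\x80-\xBF]{3}       # four-byte sequences
      | [\x80-\xFF]                     # stray or malformed byte
    }{
        defined $1
            ? chr(((ord($1) & 0x03) << 6) | (ord($2) & 0x3F))
            : '?'
    }gex;
    return $bytes;
}

# "\xC3\x96" (U+00D6) becomes the single byte 0xD6, while the Cyrillic
# letter "\xD0\x9D" (U+041D) becomes '?'.
my $title = utf8_bytes_to_latin1("\xC3\x96sterreich \xD0\x9D");
```

Doing everything in a single substitution matters here: a second pass over the string would mistake the freshly produced Latin-1 bytes for stray UTF-8 bytes.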
