#8 Non-Latin 1 characters in the UI

v1.0_(example)
open
Morbus Iff
5
2003-09-06
2002-12-18
No

Non-Latin-1 characters are rendered as garbage in the UI.
For example, when adding two UTF-8 channels, the UI reports:

AmphetaDesk has added 'IBM Aktuelles - Österreich'
successfully.
AmphetaDesk has added 'IBM Новости -
Российская Федерация'
successfully.

Discussion

  • Morbus Iff
    2003-08-27

    • assigned_to: nobody --> morbus
     
  • Morbus Iff
    2003-08-27

    Logged In: YES
    user_id=69804

    I'm not sure of the best way to handle this. Since AmphetaDesk just
    passes the data off to the web browser for display, it seems the
    proper thing to do is to change all non-Latin-1 characters to their
    equivalent HTML entities with HTML::Entities. A stronger solution
    would be to convert them to decimal numeric character references,
    since only a limited set of named HTML entities (&amp;, &copy;,
    etc.) is available.
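
A minimal sketch of the decimal-reference idea described above. The helper name to_decimal_entities is hypothetical and not part of AmphetaDesk, and the sketch assumes the title is already a decoded Perl character string rather than raw UTF-8 bytes. It uses a plain regex instead of HTML::Entities so that it has no dependencies.

```perl
use strict;
use warnings;

# Hypothetical helper (not AmphetaDesk code): replace every character
# above 0x7F with a decimal numeric character reference.
# Assumes $text holds decoded characters, not raw UTF-8 bytes.
sub to_decimal_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# U+00D6 is the capital O with diaeresis in "Österreich".
print to_decimal_entities("IBM Aktuelles - \x{D6}sterreich"), "\n";
# prints: IBM Aktuelles - &#214;sterreich
```

This only helps the browser interface, where the output is interpreted as HTML; a plain text widget would show the references literally.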

     
  • Logged In: YES
    user_id=365576

    The browser interface uses UTF-8 and renders characters correctly.
    The non-browser interface, however, renders all text as single-byte characters and is broken.

    Sample channel: http://www.ibm.com/news/at/de/index.rss

     
  • Morbus Iff
    2003-08-27

    Logged In: YES
    user_id=69804

    Ah, I see what you mean. Missed the "UI reports" in your
    original bug report. I hesitate to think I can fix this
    though - the Win32 UI is a standard text widget box, and I
    recall nothing about making it utf8 aware. Nor has
    Win32::GUI been worked on for a while.

     
  • Logged In: YES
    user_id=365576

    Indeed, Win32::GUI does not seem to support UTF-8. A partial
    solution would be to transcode the string to Latin-1, which
    would preserve the characters that exist in Latin-1 and drop
    the other characters, which cannot be rendered, from the title.
    Alternatively, display the URL instead of the title.
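
The transcoding suggested above could look roughly like this. It is a sketch only: it assumes Perl 5.8 or later, where the core Encode module is available, and the helper name utf8_to_latin1 is made up for illustration. Characters with no Latin-1 equivalent are replaced by Encode's default substitution character rather than removed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not AmphetaDesk code): take raw UTF-8 bytes and
# return Latin-1 bytes suitable for a single-byte text widget.
sub utf8_to_latin1 {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> characters
    return encode('ISO-8859-1', $chars);   # characters -> Latin-1 bytes
}

# "\xC3\x96" is the UTF-8 encoding of U+00D6.
my $title = utf8_to_latin1("IBM Aktuelles - \xC3\x96sterreich");
printf "%d bytes\n", length $title;
```
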

     
    • summary: Non-Latin 1 characters in the U --> Non-Latin 1 characters in the UI
     
  • Morbus Iff
    2003-12-18

    Logged In: YES
    user_id=69804

    Klaus, a new version of Win32::GUI has recently been
    released, but I see nothing magical about utf8 support.
    Could you demonstrate some code, or provide a patch, to
    convert the data received by
    AmphetaDesk::OS::Windows::gui_note to Latin-1?

    Should this Latin-1 conversion be done in all gui_notes? How do
    UTF-8 characters get displayed in a normal Linux terminal?

     
  • Logged In: YES
    user_id=365576

    Unfortunately the Encode module is only available for Perl 5.8
    and higher (see
    http://www.mail-archive.com/perl-unicode@perl.org/msg01461.html),
    and the encode/decode functions in utf8 are not available with
    Perl 5.6 either. The only options would be to manually translate
    known UTF-8 characters (essentially recreating the functionality
    of Encode.pm for UTF-8) or to strip every character above \x7f.

     
  • Logged In: YES
    user_id=365576

    Something like

    sub gui_note {
        use utf8;
        my ($message) = @_;
        # replace every character outside the ASCII range with '*'
        $message =~ s/[^\x01-\x7f]/*/g;
        # ... rest of gui_note unchanged ...
    }

    might help strip the characters that cannot be rendered --
    haven't found a good way to convert Latin-1 characters in
    UTF-8 encoding back to Latin-1 codepoints.
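
One possible way to do that conversion without Encode, sketched under the assumption that the message arrives as raw UTF-8 bytes. The helper name utf8_bytes_to_latin1 is hypothetical. It relies on the fact that U+0080 through U+00FF are encoded in UTF-8 as a lead byte of 0xC2 or 0xC3 followed by one continuation byte; every other multi-byte sequence is replaced with '*'.

```perl
use strict;
use warnings;

# Hypothetical helper (not AmphetaDesk code): map UTF-8 byte sequences
# for U+0080 to U+00FF back to single Latin-1 bytes, and replace all
# other non-ASCII sequences with '*'. Done in one pass so that bytes
# produced by the conversion are never re-examined.
sub utf8_bytes_to_latin1 {
    my ($bytes) = @_;
    $bytes =~ s{
        ([\xC2\xC3])([\x80-\xBF])     # two-byte sequence within Latin-1
      | [\xC4-\xF4][\x80-\xBF]+       # any other multi-byte sequence
      | [\x80-\xFF]                   # stray or malformed high byte
    }{
        defined $1
            ? chr( ((ord($1) & 0x03) << 6) | (ord($2) & 0x3F) )
            : '*'
    }gex;
    return $bytes;
}

my $note = utf8_bytes_to_latin1("IBM Aktuelles - \xC3\x96sterreich");
printf "%d bytes\n", length $note;
```

The same substitution could be applied to $message at the top of gui_note in place of the blanket strip.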