#33 UTF-8 implementation missing?

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-04-10
Created: 2006-04-06
Creator: tokyoahead
Private: No

Hi,

You set your default encoding to UTF-8.
If you receive an email in an encoding other than
UTF-8, you cannot read it, although you have mb_string
enabled.

For example, SM uses EUC as its Japanese encoding, although
the standard might be set to UTF-8 (why?). So if you set
the language to Japanese, send an email to yourself, and
then switch back to English, it cannot be read.

If UTF-8 is used, the system should automatically
convert the encoding to UTF-8 with the mb_string
functions. Otherwise you can read only sent messages that
are in UTF-8, even though they all come from the same copy
of SquirrelMail.
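
The conversion requested above can be sketched roughly as follows. This is an illustration in Python, not SquirrelMail's PHP code, and it does not use mb_string. The helper name `to_utf8` and the sample string are hypothetical, and the sketch assumes the message's declared charset is known and has a matching Python codec.

```python
def to_utf8(raw: bytes, declared_charset: str) -> bytes:
    """Re-encode raw message bytes from the declared charset to UTF-8.

    Hypothetical helper for illustration only.
    """
    # errors="replace" substitutes U+FFFD for undecodable bytes
    # instead of raising, so a damaged message stays displayable.
    text = raw.decode(declared_charset, errors="replace")
    return text.encode("utf-8")


# Hypothetical example: a short Japanese string stored as EUC-JP bytes.
euc_bytes = "日本語".encode("euc_jp")
utf8_bytes = to_utf8(euc_bytes, "euc_jp")
```

Whether something like this fits SquirrelMail depends on which charsets the PHP installation can decode, which is the point raised in the discussion below.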

Also, I suggest _always_ using UTF-8 as the encoding if the
default encoding is UTF-8. Why set it so if it's not
used? UTF-8 was meant for that, IMHO.

Oliver

Discussion

  • Tomas Kuliavas

    Tomas Kuliavas - 2006-04-06

    > for example, SM uses EUC as Japanese encoding,
    > although the std. might be set to UTF-8 (why?).

    Historical reasons. The Japanese patch was written by a
    person who selected the euc-jp and iso-2022-jp charsets.

    > You set your default encoding to UTF-8.
    > if you receive an email in an encoding other than
    > UTF-8, you cannot read it, although you have mb_string
    > enabled.

    Please don't use generalizations. Decoding of character
    sets is not related to the mbstring extension. SquirrelMail
    doesn't use mbstring functions to read emails. You can
    read emails in other character sets. Stock SquirrelMail
    1.4.6 supports 33 different character sets. CJK
    character sets are not supported out of the box because
    CJK decoding requirements don't fit into SquirrelMail's
    minimal requirements. We can't use the PHP mbstring
    extension to read emails, because the PHP 4.1.x mbstring
    extension does not support the charsets needed for decoding.

    CJK character sets are supported by the extra decoding library.

    If you have problems with the extra decoding library, explain
    them and provide more information about the message that you
    can't read.

     
  • Tomas Kuliavas

    Tomas Kuliavas - 2006-04-10

    If you have SquirrelMail 1.4.6 with the extra decoding library
    and PHP with recode or iconv support, and some CJK or other
    character set is not supported or its support is broken, please
    file a separate bug report and provide details about your PHP
    configuration, the source of the broken email, and an image
    that shows how the text should look.

    Please note that the UTF-8 0x00200000 - 0x03FFFFFF and
    0x04000000 - 0x7FFFFFFF ranges (5- and 6-byte characters) are
    not decoded by default, in order to reduce the number of complex
    regexps and calculations. I'll enable the appropriate code only
    if people ask for it and provide test emails that use those
    UTF-8 ranges.
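
To make the byte counts for those ranges concrete, here is a minimal Python sketch of the original UTF-8 scheme, which allowed sequences of up to 6 bytes. It is not SquirrelMail's decoder (which the comment above describes as regexp-based PHP). Both function names are hypothetical, and the decoder deliberately skips overlong-form checks. Current UTF-8 (RFC 3629) stops at U+10FFFF, so the 5- and 6-byte forms no longer occur in valid data.

```python
def legacy_utf8_length(code_point: int) -> int:
    """Bytes needed for a code point in the original 1-6 byte UTF-8 scheme."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # (upper bound of range, sequence length)
    limits = [
        (0x0000007F, 1),
        (0x000007FF, 2),
        (0x0000FFFF, 3),
        (0x001FFFFF, 4),
        (0x03FFFFFF, 5),  # 0x00200000 - 0x03FFFFFF
        (0x7FFFFFFF, 6),  # 0x04000000 - 0x7FFFFFFF
    ]
    for upper, length in limits:
        if code_point <= upper:
            return length
    raise ValueError("code point exceeds 31 bits")


def decode_legacy_sequence(seq: bytes) -> int:
    """Decode one complete sequence to its code point (no overlong checks)."""
    lead = seq[0]
    if lead < 0x80:
        length, value = 1, lead
    elif lead >> 5 == 0b110:
        length, value = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:
        length, value = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:
        length, value = 4, lead & 0x07
    elif lead >> 2 == 0b111110:
        length, value = 5, lead & 0x03
    elif lead >> 1 == 0b1111110:
        length, value = 6, lead & 0x01
    else:
        raise ValueError("invalid lead byte")
    if len(seq) != length:
        raise ValueError("sequence length does not match lead byte")
    for byte in seq[1:]:
        if byte >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        # Each continuation byte contributes its low 6 bits.
        value = (value << 6) | (byte & 0x3F)
    return value
```

For instance, the five bytes F8 88 80 80 80 decode to 0x00200000, the first code point of the 5-byte range.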

     
  • Tomas Kuliavas

    Tomas Kuliavas - 2006-04-10
    • status: open --> closed
     
  • Thijs Kinkhorst

    Thijs Kinkhorst - 2006-04-14

    Would it be an idea to make UTF-8 the default charset (at
    least for new translations)? We could use another charset
    only if people have compelling reasons for it.

     
