#33 UTF-8 implementation missing?

Status: closed

Owner: nobody

Labels: None

Priority: 5

Updated: 2006-04-10

Created: 2006-04-06

Creator: tokyoahead

Private: No

Hi,

Suppose you set your default encoding to UTF-8.
If you receive an email in an encoding other than
UTF-8, you cannot read it, even though you have the
mbstring extension enabled.

For example, SM uses EUC as the Japanese encoding, although
the default might be set to UTF-8 (why?). So if you set
the language to Japanese, send an email to yourself, and
then switch back to English, the email cannot be read.

If UTF-8 is used, the system should automatically
convert the encoding to UTF-8 with the mbstring
functions. Otherwise you can read only sent messages
that are in UTF-8, even though they all come from the
same copy of SquirrelMail.
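
The conversion asked for here can be sketched roughly as follows. This is an illustration in Python, not SquirrelMail code (SquirrelMail is written in PHP); `to_utf8` and its parameters are hypothetical names, and the sketch assumes the message declares its charset correctly in its headers.

```python
def to_utf8(raw: bytes, declared_charset: str) -> bytes:
    """Transcode message bytes from their declared charset to UTF-8.

    Hypothetical helper for illustration only.
    """
    # Decode with the charset named in the message headers. Bytes that are
    # invalid in that charset become U+FFFD instead of raising an error.
    text = raw.decode(declared_charset, errors="replace")
    return text.encode("utf-8")


# A Japanese message body as it would arrive in EUC-JP.
euc_body = "日本語のメール".encode("euc_jp")
utf8_body = to_utf8(euc_body, "euc_jp")
```

A charset name that the runtime does not know raises `LookupError` in this sketch, which is the case where a mail client would need its own decoding tables or an external library.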

I also suggest _always_ using UTF-8 as the encoding if the
default encoding is UTF-8. Why set it that way if it's not
used? UTF-8 was meant for that, IMHO.

Oliver

Discussion

Tomas Kuliavas - 2006-04-06

labels: 102904 -->

milestone: 102172 -->

Tomas Kuliavas - 2006-04-06

> for example, SM uses EUC as Japanese encoding,
> although the std. might be set to UTF-8 (why?).

Historical reasons. The Japanese patch was written by
a person who selected the euc-jp and iso-2022-jp charsets.

> You set your default encoding to UTF-8.
> if you receive an email in an encoding other than
> UTF-8, you cannot read it, although you have mb_string
> enabled.

Please don't generalize. Decoding of character
sets is not related to the mbstring extension. SquirrelMail
doesn't use mbstring functions to read emails. You can
read emails in other character sets. Stock SquirrelMail
1.4.6 supports 33 different character sets. CJK
character sets are not supported in the stock package
because CJK decoding requirements don't fit into
SquirrelMail's minimal requirements. We can't use the PHP
mbstring extension to read emails, because the PHP 4.1.x
mbstring extension does not support the charsets needed
for decoding.

CJK character sets are supported by the extra decoding library.
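
One way a decoder can work without mbstring is a plain lookup table per character set. The sketch below is an assumption about the general technique, not taken from the SquirrelMail sources; `ISO_8859_2_SAMPLE` and `decode_to_entities` are hypothetical names, and the table lists only four ISO-8859-2 entries for brevity.

```python
# Partial ISO-8859-2 table: byte value -> Unicode code point.
# Illustrative subset only; a complete table covers 0xA0-0xFF.
ISO_8859_2_SAMPLE = {
    0xA1: 0x0104,  # LATIN CAPITAL LETTER A WITH OGONEK
    0xA3: 0x0141,  # LATIN CAPITAL LETTER L WITH STROKE
    0xB1: 0x0105,  # LATIN SMALL LETTER A WITH OGONEK
    0xB3: 0x0142,  # LATIN SMALL LETTER L WITH STROKE
}


def decode_to_entities(raw: bytes, table: dict) -> str:
    """Turn single-byte charset text into ASCII plus numeric HTML entities."""
    out = []
    for byte in raw:
        if byte < 0x80:
            # ASCII passes through; HTML escaping of <, > and & would be
            # a separate step and is not done here.
            out.append(chr(byte))
        elif byte in table:
            out.append("&#%d;" % table[byte])
        else:
            out.append("?")  # byte not covered by this (partial) table
    return "".join(out)
```

A table like this is small for single-byte charsets, while CJK charsets need tables with thousands of entries, which is consistent with handling them in a separate library.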

If you have problems with the extra decoding library, explain
them and provide more information about the message that you
can't read.

Tomas Kuliavas - 2006-04-10

If you have SquirrelMail 1.4.6 with the extra decoding library
and PHP with recode or iconv support, and some CJK or other
character set is not supported or its support is broken, please
file a separate bug report and provide details about your PHP
configuration, the source of the broken email, and an image that
shows how the text should look.

Please note that the UTF-8 ranges 0x00200000 - 0x03FFFFFF and
0x04000000 - 0x7FFFFFFF (5- and 6-byte characters) are not
decoded by default, in order to reduce the number of complex
regexps and calculations. I'll enable the appropriate code only
if people ask for it and provide test emails that use those
UTF-8 ranges.
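
For reference, the ranges above belong to the original 1-6 byte UTF-8 scheme; they lie beyond U+10FFFF, and RFC 3629 later limited UTF-8 to four bytes. The following Python sketch shows how those sequence lengths and values work out. It is an illustration only, not the SquirrelMail decoder, and both function names are hypothetical.

```python
def utf8_sequence_length(code_point: int) -> int:
    """Bytes needed for code_point under the original 1-6 byte UTF-8 scheme."""
    if code_point < 0:
        raise ValueError("negative value")
    # Upper bound of each length class, from 1 byte up to 6 bytes.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for length, upper in enumerate(limits, start=1):
        if code_point <= upper:
            return length
    raise ValueError("value does not fit in 31 bits")


def decode_long_sequence(seq: bytes) -> int:
    """Decode a single 5- or 6-byte sequence to its integer value."""
    lead = seq[0]
    if lead & 0xFC == 0xF8:    # 111110xx: 5-byte lead, 2 payload bits
        length, value = 5, lead & 0x03
    elif lead & 0xFE == 0xFC:  # 1111110x: 6-byte lead, 1 payload bit
        length, value = 6, lead & 0x01
    else:
        raise ValueError("not a 5- or 6-byte lead byte")
    if len(seq) != length:
        raise ValueError("wrong sequence length")
    for byte in seq[1:]:
        if byte & 0xC0 != 0x80:  # continuation bytes look like 10xxxxxx
            raise ValueError("bad continuation byte")
        value = (value << 6) | (byte & 0x3F)
    return value
```

This sketch does not reject overlong encodings, which a production decoder would normally check for.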

Tomas Kuliavas - 2006-04-10

status: open --> closed

Thijs Kinkhorst - 2006-04-14

Would it be an idea to make UTF-8 the charset used (at
least for new translations)? We could use another charset
only if people have compelling reasons for it.
