#61 Chinese charset support

Milestone: All Versions

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2001-05-26

Created: 2001-05-26

Creator: Anonymous

Private: No

Hello, my name is Alpha. I've found a problem with
UnrealIRCd's support for Chinese charsets. The support
in Unreal3.1.1 is not thorough enough: it only handles
the GB Chinese charset, which is common only in
mainland China. For users in Hong Kong (like me) and
Taiwan, the suitable Chinese charset is Big5. So I've
added some lines to the s_user.c file, and I hope you
will include them in a newer version.
P.S. UnrealIRCd is really great.
Here is the code:

int isvalidChinese(const unsigned char c1, const unsigned char c2)
{
    /* Inclusive start (_S) and end (_E) of each accepted 16-bit range. */
    const unsigned int GBK_S = 0xb0a1;
    const unsigned int GBK_E = 0xf7fe;
    const unsigned int GBK_2_S = 0x8140;
    const unsigned int GBK_2_E = 0xa0fe;
    const unsigned int GBK_3_S = 0xaa40;    /* declared but not used below */
    const unsigned int GBK_3_E = 0xfea0;    /* declared but not used below */
    const unsigned int BIG5_S = 0xa440;
    const unsigned int BIG5_E = 0xc67e;
    const unsigned int BIG5_2_S = 0xc940;
    const unsigned int BIG5_2_E = 0xf9d5;
    const unsigned int BIG5_3_S = 0xa140;
    const unsigned int BIG5_3_E = 0xa3e0;
    const unsigned int BIG5_4_S = 0xf9d6;
    const unsigned int BIG5_4_E = 0xf9fe;
    const unsigned int BIG5_5_S = 0xc6a1;
    const unsigned int BIG5_5_E = 0xc7fe;
    const unsigned int JPN_PING_S = 0xa4a1;
    const unsigned int JPN_PING_E = 0xa4f3;
    const unsigned int JPN_PIAN_S = 0xa5a1;
    const unsigned int JPN_PIAN_E = 0xa5f7;
    /* Combine lead byte and trail byte into one 16-bit word. */
    unsigned int AWord = c1 * 256 + c2;

#if defined(CHINESE_NICK) && defined(JAPANESE_NICK)
    return ((AWord >= GBK_S && AWord <= GBK_E) ||
        (AWord >= GBK_2_S && AWord <= GBK_2_E) ||
        (AWord >= JPN_PING_S && AWord <= JPN_PING_E) ||
        (AWord >= JPN_PIAN_S && AWord <= JPN_PIAN_E) ||
        (AWord >= BIG5_S && AWord <= BIG5_E) ||
        (AWord >= BIG5_2_S && AWord <= BIG5_2_E) ||
        (AWord >= BIG5_3_S && AWord <= BIG5_3_E) ||
        (AWord >= BIG5_4_S && AWord <= BIG5_4_E) ||
        (AWord >= BIG5_5_S && AWord <= BIG5_5_E)) ? 1 : 0;
#elif defined(CHINESE_NICK)
    return ((AWord >= GBK_S && AWord <= GBK_E) ||
        (AWord >= GBK_2_S && AWord <= GBK_2_E) ||
        (AWord >= BIG5_S && AWord <= BIG5_E) ||
        (AWord >= BIG5_2_S && AWord <= BIG5_2_E) ||
        (AWord >= BIG5_3_S && AWord <= BIG5_3_E) ||
        (AWord >= BIG5_4_S && AWord <= BIG5_4_E) ||
        (AWord >= BIG5_5_S && AWord <= BIG5_5_E)) ? 1 : 0;
#elif defined(JAPANESE_NICK)
    return ((AWord >= JPN_PING_S && AWord <= JPN_PING_E) ||
        (AWord >= JPN_PIAN_S && AWord <= JPN_PIAN_E)) ? 1 : 0;
#else
    /* Neither option is defined: reject instead of falling off the end. */
    return 0;
#endif
}

Discussion

Nobody/Anonymous - 2001-09-11

Logged In: NO

Big5 is not a sequential coding system.

Your test presumes that every "word" that falls between
A440 and C67E is a Chinese character in Big5.

In fact, the high byte of Big5 falls in
A1 - FE
8E - A0
81 - 8D

and the low byte falls in
40 - 7E
A1 - FE

So after A47E, the next VALID Big5 character is A4A1, not
A47F as you might presume.

------------------------------------

Hope this helps.
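
To make the point in this comment concrete, here is a minimal sketch in C that compares a plain 16-bit range test with a byte-wise test. Both function names are hypothetical and appear nowhere in UnrealIRCd. The range test uses only the first Big5 block from the original post (A440 to C67E), and the byte-wise test uses only the high-byte and low-byte ranges listed in the comment. It checks the structure of a byte pair, not whether a character is actually assigned at that code point.

```c
/* Hypothetical helper: the 16-bit range test from the original post,
 * limited to its first Big5 block (0xa440 .. 0xc67e). */
static int in_big5_word_range(unsigned char c1, unsigned char c2)
{
    unsigned int w = ((unsigned int)c1 << 8) | c2;
    return w >= 0xa440 && w <= 0xc67e;
}

/* Hypothetical helper: byte-wise structural test using the ranges
 * quoted in the comment. The three high-byte ranges (81-8D, 8E-A0,
 * A1-FE) together cover 0x81 .. 0xfe; the low byte must fall in
 * 0x40 .. 0x7e or 0xa1 .. 0xfe. */
static int is_big5_pair_bytewise(unsigned char c1, unsigned char c2)
{
    int lead_ok = (c1 >= 0x81 && c1 <= 0xfe);
    int trail_ok = (c2 >= 0x40 && c2 <= 0x7e) ||
                   (c2 >= 0xa1 && c2 <= 0xfe);
    return lead_ok && trail_ok;
}
```

With these definitions, the pair A4 7F passes the range test but fails the byte-wise test, while A4 7E and A4 A1 pass both. A stricter check would probably combine the two: first the byte-wise structure, then the per-block ranges.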


Nobody/Anonymous - 2002-02-01

Logged In: NO

What messy code. Why not use #define? Haha.
The GB/GBK code is still buggy!
Also, the JPN ranges just lie within GBK; they are not the
charset that Japanese systems use, right?

Part of my code:

int isvalidChinese(const unsigned char c1, const unsigned char c2)
{
    unsigned int w = (((unsigned int)c1) << 8) | c2;

/* Expands to one range test followed by "||", so tests can be chained. */
#define gbk(s, e) (w >= ((unsigned int)(s)) && w <= ((unsigned int)(e))) ||

    return (
        gbk(0xb0a1, 0xd7fa) // gb2312
        gbk(0xd8a1, 0xf7fe) // gb2312
        /* ............. so on */
        0
        // finally failed

        ? 1 : 0);
#undef gbk
}
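
As one possible alternative to the chained macro above, the same ranges could be kept in a small table and scanned in a loop. This is only a sketch, not code from UnrealIRCd: the struct, the array and the function name are hypothetical, and the table holds only the two gb2312 ranges quoted in the comment, since the rest of the list is elided there. The extra trail-byte test (A1 to FE) is an addition, based on the fact that both bytes of a GB2312 character lie in that range.

```c
#include <stddef.h>

/* Hypothetical type: one inclusive range of 16-bit words. */
struct word_range {
    unsigned int lo;
    unsigned int hi;
};

/* Only the two ranges quoted in the comment; the remaining
 * entries are elided there and are not guessed here. */
static const struct word_range gb2312_ranges[] = {
    { 0xb0a1, 0xd7fa },
    { 0xd8a1, 0xf7fe },
};

/* Hypothetical helper: returns 1 if (c1, c2) falls in one of the
 * table ranges and c2 is a valid GB2312 trail byte, else 0. */
static int in_gb2312_ranges(unsigned char c1, unsigned char c2)
{
    unsigned int w = ((unsigned int)c1 << 8) | c2;
    size_t i;

    /* Assumption added for this sketch: reject trail bytes outside
     * 0xa1 .. 0xfe, which a pure word-range test would let through. */
    if (c2 < 0xa1 || c2 > 0xfe)
        return 0;

    for (i = 0; i < sizeof gb2312_ranges / sizeof gb2312_ranges[0]; i++) {
        if (w >= gb2312_ranges[i].lo && w <= gb2312_ranges[i].hi)
            return 1;
    }
    return 0;
}
```

Adding another range then means adding one table row, and the trail-byte test rejects a pair such as B1 40, which lies numerically inside the first range.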
