
#61 Chinese charset support

Milestone: All Versions
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2001-05-26
Created: 2001-05-26
Creator: Anonymous
Private: No

Hello, my name is Alpha. I've found a problem with
UnrealIRCd's Chinese charset support. The support in
Unreal3.1.1 is not thorough enough: it only supports
the GB Chinese charset, which is common only in
mainland China. For Hong Kong users (like me) and
Taiwan users, the suitable Chinese charset is Big5. So
I've added some lines to the s_user.c file, and I hope
you will add them to a newer version.
P.S. UnrealIRCd is really great.
Here is the code:

int isvalidChinese(const unsigned char c1, const unsigned char c2)
{
    /* GB/GBK ranges */
    const unsigned int GBK_S = 0xb0a1;
    const unsigned int GBK_E = 0xf7fe;
    const unsigned int GBK_2_S = 0x8140;
    const unsigned int GBK_2_E = 0xa0fe;
    /* GBK_3 is defined but not used by any test below */
    const unsigned int GBK_3_S = 0xaa40;
    const unsigned int GBK_3_E = 0xfea0;
    /* Big5 ranges */
    const unsigned int BIG5_S = 0xa440;
    const unsigned int BIG5_E = 0xc67e;
    const unsigned int BIG5_2_S = 0xc940;
    const unsigned int BIG5_2_E = 0xf9d5;
    const unsigned int BIG5_3_S = 0xa140;
    const unsigned int BIG5_3_E = 0xa3e0;
    const unsigned int BIG5_4_S = 0xf9d6;
    const unsigned int BIG5_4_E = 0xf9fe;
    const unsigned int BIG5_5_S = 0xc6a1;
    const unsigned int BIG5_5_E = 0xc7fe;
    /* Japanese kana ranges */
    const unsigned int JPN_PING_S = 0xa4a1;
    const unsigned int JPN_PING_E = 0xa4f3;
    const unsigned int JPN_PIAN_S = 0xa5a1;
    const unsigned int JPN_PIAN_E = 0xa5f7;
    unsigned int AWord = c1 * 256 + c2;

#if defined(CHINESE_NICK)
    if ((AWord >= GBK_S && AWord <= GBK_E) ||
        (AWord >= GBK_2_S && AWord <= GBK_2_E) ||
        (AWord >= BIG5_S && AWord <= BIG5_E) ||
        (AWord >= BIG5_2_S && AWord <= BIG5_2_E) ||
        (AWord >= BIG5_3_S && AWord <= BIG5_3_E) ||
        (AWord >= BIG5_4_S && AWord <= BIG5_4_E) ||
        (AWord >= BIG5_5_S && AWord <= BIG5_5_E))
        return 1;
#endif
#if defined(JAPANESE_NICK)
    if ((AWord >= JPN_PING_S && AWord <= JPN_PING_E) ||
        (AWord >= JPN_PIAN_S && AWord <= JPN_PIAN_E))
        return 1;
#endif
    /* Also reached when neither macro is defined, so the function
       always returns a value. */
    return 0;
}

Discussion

  • Nobody/Anonymous

    Logged In: NO

    Big5 is not a sequential coding system...

    In your "test", you presume that all "words" that fall
    between A440 and C67E are "Chinese words in Big5".

    In fact, the high byte of Big5 is in
    A1 - FE
    8E - A0
    81 - 8D

    and the low byte is in
    40 - 7E
    A1 - FE

    so after A47E, the next VALID Big5 Chinese character is
    A4A1, not A47F as you might presume....

    ------------------------------------

    hope this can help..
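
The byte ranges listed above can be checked per byte instead of as one 16-bit interval. The following is only a minimal sketch under that assumption: is_big5_pair is a hypothetical helper, not part of UnrealIRCd, and it accepts every pair inside the quoted ranges, so it is a structural check and does not tell Chinese characters apart from symbols or user-defined areas.

```c
/* Hypothetical helper, not part of UnrealIRCd: checks one two-byte
 * sequence against the Big5 byte ranges quoted in the comment above. */
int is_big5_pair(unsigned char hi, unsigned char lo)
{
    /* High byte: 81-8D, 8E-A0 and A1-FE together cover 0x81..0xFE. */
    int hi_ok = (hi >= 0x81 && hi <= 0xFE);
    /* Low byte: 40-7E or A1-FE; anything in 7F-A0 is rejected. */
    int lo_ok = (lo >= 0x40 && lo <= 0x7E) || (lo >= 0xA1 && lo <= 0xFE);
    return hi_ok && lo_ok;
}
```

With this check A47E and A4A1 are accepted, while A47F is rejected even though it lies between A440 and C67E.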

     
  • Nobody/Anonymous

    Logged In: NO

    What messy code. Why not use #define? Haha.
    The GB/GBK code is still buggy!
    Also, the JPN ranges are just the ones inside GBK, not the
    charset that Japanese systems use, right?

    Part of my code:

    int isvalidChinese(const unsigned char c1, const unsigned char c2)
    {
        unsigned int w = (((unsigned int)c1) << 8) | c2;

    #define gbk(s, e) (w >= ((unsigned int)(s)) && w <= ((unsigned int)(e))) ||

        return (
            gbk(0xb0a1, 0xd7fa) // gb2312
            gbk(0xd8a1, 0xf7fe) // gb2312
            ............. so on
            0
            // finally failed

            ? 1 : 0);
    #undef gbk
    }
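
The same idea can also be written as a table of ranges walked by a loop, which avoids the macro with the trailing ||. This is only a sketch of that alternative: word_range, gb_ranges and in_gb_ranges are hypothetical names, and the table holds just the two gb2312 ranges quoted above. The ranges elided in the comment are deliberately not filled in.

```c
#include <stddef.h>

/* Hypothetical type and table, not part of UnrealIRCd. */
struct word_range {
    unsigned int start;
    unsigned int end;
};

/* Only the two ranges quoted in the comment above. */
static const struct word_range gb_ranges[] = {
    { 0xb0a1, 0xd7fa }, /* gb2312 */
    { 0xd8a1, 0xf7fe }, /* gb2312 */
};

/* Returns 1 if the two-byte word falls inside any listed range, else 0. */
int in_gb_ranges(unsigned char c1, unsigned char c2)
{
    unsigned int w = ((unsigned int)c1 << 8) | c2;
    size_t i;

    for (i = 0; i < sizeof gb_ranges / sizeof gb_ranges[0]; i++) {
        if (w >= gb_ranges[i].start && w <= gb_ranges[i].end)
            return 1;
    }
    return 0;
}
```

Adding a range is then a one-line change to the table. Like the macro version, this compares whole 16-bit words, so it does not reject invalid low bytes inside a range.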

     
