#22 Please change all UTF-8 characters in CXX to escaped form

closed
None
5
2007-06-14
2007-04-04
b6s
No

For example, please change the UTF-8 BOM to "\xEF\xBB\xBF", since some compilers (e.g. VC++) under some locales (e.g. Chinese) complain about "newline in constant" when they encounter raw UTF-8 characters.
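A minimal sketch of what the request amounts to, assuming a helper that strips a leading BOM from a line of input. The names `UTF8_BOM` and `skip_utf8_bom` are hypothetical and are not taken from the hunspell sources; the point is only that the literal is written with hex escapes, so the source file itself contains nothing but ASCII.

```cpp
#include <cstddef>
#include <cstring>

// UTF-8 byte order mark spelled with hex escapes (hypothetical name).
// The bytes are identical to a raw BOM, but the source file stays pure ASCII.
static const char UTF8_BOM[] = "\xEF\xBB\xBF";

// Hypothetical helper: return a pointer just past a leading BOM, if present.
const char *skip_utf8_bom(const char *line) {
    const std::size_t bom_len = sizeof(UTF8_BOM) - 1;  // exclude the trailing NUL
    if (std::strncmp(line, UTF8_BOM, bom_len) == 0) {
        return line + bom_len;
    }
    return line;
}
```

Because the compiler never sees a byte above 0x7F, the result should not depend on the code page the compiler assumes for the source file.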

Thank you.

Discussion

  • b6s

    b6s - 2007-04-04

    Logged In: YES
    user_id=1561489
    Originator: YES

    File Added: hunspell.cxx.diff

     
  • b6s

    b6s - 2007-04-04

    Logged In: YES
    user_id=1561489
    Originator: YES

    File Added: hashmgr.cxx.diff

     
  • b6s

    b6s - 2007-04-04

    Logged In: YES
    user_id=1561489
    Originator: YES

    File Added: affixmgr.cxx.diff

     
  • b6s

    b6s - 2007-04-04

    Logged In: YES
    user_id=1561489
    Originator: YES

    The attached diff files may not be correct:

    1. The original CXX files might have been corrupted by my OS locale.
    2. Because of 1, there are many "empty constant" characters, which I suspect were originally UTF-8 characters.

     
  • Németh László

    Logged In: YES
    user_id=726595
    Originator: NO

    Thank you for your help and patches. They will be in the next release. Laci

     
  • Németh László

    • assigned_to: nobody --> nemethl
     
  • Nobody/Anonymous

    Logged In: NO

    diff -r1.1 hunspell.cxx
    309c309
    < if (*source == '?) *--p = '?;
    ---
    > if (*source == '\x9F') *--p = '\xDF';
    314c314
    < // recursive search for right ss-?permutations
    ---
    > // recursive search for right ss-\xDF permutations
    319,320c319,320
    < *pos = '?;
    < *(pos + 1) = '?;
    ---
    > *pos = '\xC3';
    > *(pos + 1) = '\x9F';
    394c394
    < if ((nstate == NNUM) && ((cw[i] == '%') || (cw[i] == '?))
    ---
    > if ((nstate == NNUM) && ((cw[i] == '%') || (cw[i] == '\xB0'))
    450c450
    < // if CHECKSHARPS: KEEPCASE words with ?are allowed
    ---
    > // if CHECKSHARPS: KEEPCASE words with \xDF are allowed
    452,453c452,453
    < pAMgr->get_checksharps() && ((utf8 && strstr(wspace, "脽")) ||
    < (!utf8 && strchr(wspace, '?)))))) {
    ---
    > pAMgr->get_checksharps() && ((utf8 && strstr(wspace, "\xC3\x9F")) ||
    > (!utf8 && strchr(wspace, '\xDF')))))) {
    502c502
    < dash = (char *) strstr(cw,"鈥?);
    ---
    > dash = (char *) strstr(cw,"\xE2\x80\x93");
    507c507
    < *dash = '?;
    ---
    > *dash = '\xE2';
    510c510
    < *dash = '?;
    ---
    > *dash = '\xE2';
    748c748
    < pos = strstr((*slst)[j], "脽");
    ---
    > pos = strstr((*slst)[j], "\xC3\x9F");
    752c752
    < pos = strstr(pos+2, "脽");
    ---
    > pos = strstr(pos+2, "\xC3\x9F");
    755c755
    < pos = strchr((*slst)[j], '?);
    ---
    > pos = strchr((*slst)[j], '\xDF');
    758,759c758,759
    < mystrrep((*slst)[j], "?, "SS");
    < pos = strchr((*slst)[j], '?);
    ---
    > mystrrep((*slst)[j], "\xDF", "SS");
    > pos = strchr((*slst)[j], '\xDF');
    1280c1280
    < if ((n == wl) || ((n>0) && ((cw[n]=='%') || (cw[n]=='?)) && checkword(cw+n, NULL, NULL))) {
    ---
    > if ((n == wl) || ((n>0) && ((cw[n]=='%') || (cw[n]=='\xB0')) && checkword(cw+n, NULL, NULL))) {

     
  • Nobody/Anonymous

    Logged In: NO

    The attached hunspell.cxx.diff is wrong

     
  • b6s

    b6s - 2007-04-18

    Logged In: YES
    user_id=1561489
    Originator: YES

    Yes, that's why I said it could be incorrect: I may have generated the patch under an inappropriate locale, and I'm also not sure why there are some empty constants.

    Anyway, my point is that if all string constants in hunspell are replaced by their escaped forms, some compilers under some locales will be happy. :)

    Thank you.

     
  • Jungshik Shin

    Jungshik Shin - 2007-04-24

    Logged In: YES
    user_id=307557
    Originator: NO

    I concur with the above.

    Not just UTF-8 characters but also any non-ASCII characters in 'various encodings' need to be in escaped hex form.
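To illustrate the point about multiple encodings, here is a small sketch using the sharp s (U+00DF) that appears in the diff above: one byte, 0xDF, in ISO-8859-1 and two bytes, 0xC3 0x9F, in UTF-8. The constant names and the `contains_sharp_s` helper are hypothetical and only loosely modeled on the `strstr`/`strchr` calls in the diff.

```cpp
#include <cstring>

// Sharp s written as escaped bytes for two encodings (hypothetical names).
static const char SHARP_S_LATIN1 = '\xDF';      // one byte in ISO-8859-1
static const char SHARP_S_UTF8[] = "\xC3\x9F";  // two bytes in UTF-8

// Hypothetical helper: does `word` contain a sharp s in the given encoding?
bool contains_sharp_s(const char *word, bool utf8) {
    if (utf8) {
        return std::strstr(word, SHARP_S_UTF8) != NULL;
    }
    return std::strchr(word, SHARP_S_LATIN1) != NULL;
}
```

One caveat when converting literals by hand: a hex escape consumes every hex digit that follows it, so a literal such as "Stra\xDFe" would be read as the single escape \xDFe. Splitting the literal, as in "Stra\xDF" "e", avoids this.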

     
  • Németh László

    • status: open --> closed
     
  • Németh László

    Logged In: YES
    user_id=726595
    Originator: NO

    Fixed (in comments, too). Thanks, Laci

     
  • b6s

    b6s - 2007-06-15

    Logged In: YES
    user_id=1561489
    Originator: YES

    Great! Thanks to you all.

    May I have the source code to test?

    Cheers,
    /Mike/

     
