From: Steven R. L. <sr...@ic...> - 2008-08-21 20:19:46
Right, that would be a good solution, or using u_unescape() from a char* to convert to unicode. Or, you could use a Hex transliterator to unescape.

-s

Jordan Rose wrote:
> The docs for U_STRING_DECL/U_STRING_INIT say that the string has to
> contain only "invariant characters" (loosely, most of ISO-8859-1).
> There's no way it's going to work for "中文".
>
> But, since you're using UTF-16 anyway, why not just use a usual C
> array initializer?
> const UChar input[] = { 0x4E2D, 0x6587, 0x0 };
>
> Jordan
>
>
> On Aug 21, 20 Heisei, at 12:13, Aaron Fernandes wrote:
>
>> Steven and Will, thanks for your responses.
>> Using U_STRING_DECL("\\u4E2D\\u6587",15) with GCC on Linux or
>> Solaris gives us "\u4E2D\u6587" as output.
>> Should we be doing anything else, in addition?
>>
>> -- Aaron
>>
>> Date: Wed, 20 Aug 2008 08:59:18 -0700
>> From: "Steven R. Loomis" <sr...@ic...>
>> Subject: Re: [icu-support] Instantiating a UChar* in C on *nix
>> To: ICU support mailing list <icu...@li...>
>> Message-ID: <48A...@ic...>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Are you using \u or \\u? You should use \\u - otherwise your compiler
>> may be interpreting \u
>>
>> -s
>>
>> Aaron Fernandes wrote:
>>
>>> Hi Steven,
>>>
>>> Thanks for pointing that out. We changed the U_STRING_DECL and
>>> U_STRING_INIT lines to:
>>> U_STRING_DECL(input,"\u4E2D\u6587",13);
>>> U_STRING_INIT(input,"\u4E2D\u6587",13);
>>>
>>> We still get the same erroneous results.
>>>
>>> - Aaron
>>>
>>>
>>> Date: Mon, 18 Aug 2008 10:37:24 -0700
>>> From: "Steven R. Loomis" <sr...@ic...>
>>> Subject: Re: [icu-support] Instantiating a UChar* in C on *nix
>>> To: ICU support mailing list <icu...@li...>
>>> Message-ID: <48A...@ic...>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> You have a 12 character string and are passing in '10' to
>>> U_STRING_DECL and U_STRING_INIT.
>>>
>>> unistr.h: "The length of the string, not including the terminating
>>> <code>NUL</code>, must be specified as a constant."
>>>
>>> -s
>>>
>>>
>>> Aaron Fernandes wrote:
>>>
>>>> Sample test code is as follows:
>>>>
>>>> extern "C" int transliterate(const UChar* unicode, int unicode_length,
>>>>                              const char* lang, char* result);
>>>>
>>>> int main(int argc, char* argv[])
>>>> {
>>>>     int rc, length;
>>>>     char *ret;
>>>>     const char *ascii_string = "\\u4E2D\\u6587";
>>>>
>>>>     /* C++ version - working as expected */
>>>>     UnicodeString input2=UNICODE_STRING(ascii_string, 50).unescape();
>>>>     const UChar* input=input2.getBuffer();
>>>>
>>>>     /* C version - not working as expected */
>>>>     U_STRING_DECL(input,ascii_string,10);
>>>>     U_STRING_INIT(input,ascii_string,10);
>>>>
>>>>     ret = (char*) malloc(MAXCHARS * sizeof(char));
>>>>     length=u_strlen(input);
>>>>
>>>>     /* Passing "Any" as the language to transliterate from */
>>>>     rc=transliterate(input,length,"Any",ret);
>>>>
>>>>     printf("\nReturn Code: %d", rc);
>>>>     printf("\nAfter Conversion: %s\n", ret);
>>>>
>>>>     free(ret);
>>>>     return 0;
>>>> }
>>>>
>>>> - Aaron
>>>>
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Wed, 20 Aug 2008 11:10:57 -0500
>> From: "Will Mason" <wil...@us...>
>> Subject: Re: [icu-support] Instantiating a UChar* in C on *nix
>> To: "ICU support mailing list" <icu...@li...>
>> Message-ID:
>> <ab2...@ma...>
>> Content-Type: text/plain; charset="utf-8"
>>
>> It looks like you're not escaping the \ and you're including the
>> terminating null. Shouldn't it be
>> U_STRING_DECL(input, "\\u4E2D\\u6587", 12) ?
>>
>> On Wed, Aug 20, 2008 at 10:48 AM, Aaron Fernandes <
>> a.f...@in...> wrote:
>>
>>> Hi Steven,
>>>
>>> Thanks for pointing that out. We changed the U_STRING_DECL and
>>> U_STRING_INIT lines to:
>>> U_STRING_DECL(input,"\u4E2D\u6587",13);
>>> U_STRING_INIT(input,"\u4E2D\u6587",13);
>>>
>>> We still get the same erroneous results.
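
For reference, the two suggestions in this thread (unescaping the "\\uXXXX" text at run time, or writing the code units directly in an array initializer) can be sketched in plain C. This is only an illustration under stated assumptions: utf16_unit, escaped_text, literal_text, hex_value and unescape_bmp are hypothetical names made up for this sketch, not ICU APIs. With ICU available, u_unescape() from unicode/ustring.h would presumably take the place of the hand-written helper, and UChar the place of the typedef. The helper only handles 4-digit \u escapes and copies everything else through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ICU's UChar: one 16-bit UTF-16 code unit. */
typedef uint16_t utf16_unit;

/* Escaped source text from the thread. Each "\\u" in C source is two
   characters at run time (backslash, 'u'), so the length is 12, not 10. */
static const char escaped_text[] = "\\u4E2D\\u6587";

/* The array-initializer alternative: spell out the code units directly. */
static const utf16_unit literal_text[] = { 0x4E2D, 0x6587, 0x0 };

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper (not ICU): decode \uXXXX escapes in src into
 * UTF-16 code units. Any other byte is copied through unchanged, which
 * is only meaningful for ASCII input. Writes at most capacity - 1 units
 * followed by a terminating 0, and returns the number of units written,
 * not counting the terminator.
 */
static size_t unescape_bmp(const char *src, utf16_unit *dest, size_t capacity)
{
    size_t n = 0;

    assert(src != NULL && dest != NULL && capacity > 0);

    while (*src != '\0' && n + 1 < capacity) {
        if (src[0] == '\\' && src[1] == 'u') {
            unsigned value = 0;
            int ok = 1;
            int i;

            /* Stop at the first non-hex character, so the loop never
               reads past the string's terminating NUL. */
            for (i = 2; i < 6; i++) {
                int h = hex_value(src[i]);
                if (h < 0) { ok = 0; break; }
                value = value * 16 + (unsigned)h;
            }
            if (ok) {
                dest[n++] = (utf16_unit)value;
                src += 6;
                continue;
            }
        }
        /* Not a complete \uXXXX escape: copy the byte as-is. */
        dest[n++] = (utf16_unit)(unsigned char)*src++;
    }
    dest[n] = 0;
    return n;
}
```

Calling unescape_bmp(escaped_text, buf, 8) on an 8-element buffer should yield the same two code units as literal_text, followed by a terminating 0. When the string is known at compile time, the array initializer is the simpler choice, since it needs no run-time step at all.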