From: Robert B. <rb...@ma...> - 2003-02-27 02:00:29
|
Please forgive the sloppy test case below; I put it together quickly to
illustrate an issue I am running into. It is likely just a matter of my
understanding of an ICU API.
Here is some familiar test data that I borrowed from the ICU test suite:
UChar testData16[]={
    //  0       1       2       3       4       5       6       7
    0xd841, 0xdc02, 0x0071, 0xdc02, 0xd841, 0x0071, 0xd841, 0xdc02,
    //  8       9       10      11      12      13      14      15
    0x0071, 0x0072, 0xd841, 0xdc02, 0x0071, 0xd841, 0xdc02, 0x0071,
    //  16      17      18      19
    0xdc02, 0xd841, 0x0073, 0x0000
};
const UChar* u16Ptr = testData16;
const UChar* u16Limit = u16Ptr + u_strlen(testData16);
UErrorCode err = U_ZERO_ERROR;
const size_t u8BufLen = 1024;
UChar8 u8Buffer[u8BufLen];
UChar8* u8BufPtr = u8Buffer;
int32_t dstLen = 0;
u_strToUTF8((char*)u8BufPtr, u8BufLen, &dstLen, u16Ptr,
u_strlen(u16Ptr), &err);
// err now == U_INVALID_CHAR_FOUND
UnicodeString us(testData16);
UErrorCode u8cnvStatus = U_ZERO_ERROR;
UConverter* u8cnv = ucnv_open("UTF-8", &u8cnvStatus);
UChar8 u8Buffer2[1024];
u8cnvStatus = U_ZERO_ERROR;
us.extract((char*)u8Buffer2, 1024, u8cnv, u8cnvStatus);
ucnv_close(u8cnv);
// but here, using the C converter API, u8cnvStatus == U_ZERO_ERROR
Why do the two APIs return different error codes?
Stepping through u_strToUTF8:
Indices 0 and 1 are interpreted as the first character.
Index 2 is interpreted as the second character.
Indices 3 and 4 are examined as the third character, but conversion fails
because 0xdc02 is not a lead (high) surrogate and 0xd841 is not a trail
(low) surrogate.
Thanks in advance,
Bob