Quick question regarding using u_strToUTF8

Please forgive the sloppy test case below; I put it together quickly to
illustrate an issue I am running into. It is likely just a gap in my
understanding of an ICU API.

Here is familiar test data from the ICU test suite that I borrowed:

    UChar testData16[]={
        //   0       1       2       3       4       5       6       7
        0xd841, 0xdc02, 0x0071, 0xdc02, 0xd841, 0x0071, 0xd841, 0xdc02,
        //   8       9      10      11      12      13      14      15
        0x0071, 0x0072, 0xd841, 0xdc02, 0x0071, 0xd841, 0xdc02, 0x0071,
        //  16      17      18      19
        0xdc02, 0xd841, 0x0073, 0x0000
    };

    const UChar* u16Ptr = testData16;
    const UChar* u16Limit = u16Ptr + u_strlen(testData16);
    UErrorCode err = U_ZERO_ERROR;
    const size_t u8BufLen = 1024;
    UChar8 u8Buffer[u8BufLen];
    UChar8* u8BufPtr = u8Buffer;
    int32_t dstLen = 0;
    u_strToUTF8((char*)u8BufPtr, u8BufLen, &dstLen, u16Ptr,
                u_strlen(u16Ptr), &err);

    // err now == U_INVALID_CHAR_FOUND

    UnicodeString us(testData16);
    UErrorCode u8cnvStatus = U_ZERO_ERROR;
    UConverter* u8cnv = ucnv_open("UTF-8", &u8cnvStatus);
    UChar8 u8Buffer2[1024];
    u8cnvStatus = U_ZERO_ERROR;
    us.extract((char*)u8Buffer2, 1024, u8cnv, u8cnvStatus);
    ucnv_close(u8cnv);

    // but here, using the converter API, I get U_ZERO_ERROR

Why do the two APIs report different error statuses for the same input?

Stepping through u_strToUTF8:

0,1 are interpreted as the first character (a valid surrogate pair).
2   is interpreted as the second character.
3,4 It attempts to interpret these as the third character but fails, because
    0xdc02 is not a leading surrogate and 0xd841 is not a trailing one.
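
To make the step-through concrete, here is a minimal sketch in plain C (no ICU
calls) of the pairing check I believe is being applied. The helper names
is_lead, is_trail and first_unpaired_surrogate are hypothetical and are not
part of ICU; this only illustrates the rule, not how ICU actually implements it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, NOT ICU APIs: classify a UTF-16 code unit. */
static int is_lead(uint16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
static int is_trail(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Returns the index of the first unpaired surrogate in s[0..len),
 * or -1 if every surrogate is part of a lead+trail pair. */
static int32_t first_unpaired_surrogate(const uint16_t *s, int32_t len) {
    for (int32_t i = 0; i < len; ++i) {
        if (is_lead(s[i])) {
            if (i + 1 < len && is_trail(s[i + 1])) {
                ++i;            /* valid pair: skip the trail unit too */
                continue;
            }
            return i;           /* lead with no trail after it */
        }
        if (is_trail(s[i])) {
            return i;           /* trail with no lead before it */
        }
    }
    return -1;
}

/* The same 19 code units as testData16 above, without the terminator. */
static const uint16_t sample[] = {
    0xd841, 0xdc02, 0x0071, 0xdc02, 0xd841, 0x0071, 0xd841, 0xdc02,
    0x0071, 0x0072, 0xd841, 0xdc02, 0x0071, 0xd841, 0xdc02, 0x0071,
    0xdc02, 0xd841, 0x0073
};
```

Calling first_unpaired_surrogate(sample, 19) stops at index 3, the same
position where I see u_strToUTF8 give up.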

Thanks in advance,

Bob
