[icu-support] convert DBCS to UTF-8


Hi,

I'm working on a project where I have to support Double Byte Character Sets (DBCS). The input string I receive from a user can be in English (SBCS), Japanese (DBCS), etc., and I have to convert it to UTF-8 before saving it in a UTF-8 database. I have no problem with SBCS input, but when the user provides DBCS input, the string gets corrupted by the conversion.
The function I am using is:

    int32_t ucnv_fromUChars(UConverter *cnv,
                            char *dest,
                            int32_t destCapacity,
                            const UChar *src,
                            int32_t srcLength,
                            UErrorCode *pErrorCode);

An example input, in Japanese:

src in hexadecimal: 65e5 672c 8a9e
After the call, dest in hex is: 93 fa 96 7b 8c ea fc fc fc fc fc fc
I printed dest after the call, both with "cout << dest;" and with a loop printing each character, and I saw garbage after the first three characters.
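Printing with "cout << dest;" treats dest as a NUL-terminated string and sends the raw bytes to a console that may be using a different encoding, so it is not a reliable way to inspect the result. A sketch of a safer check, assuming the length returned by ucnv_fromUChars is kept: format exactly that many bytes as hex. The cast to unsigned char matters when printing in a loop, because on platforms where char is signed a byte such as 0x93 would otherwise be sign-extended and print as ffffff93. format_hex is a hypothetical helper, not an ICU function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "xx xx xx" for exactly `length` bytes into out,
   using the explicit length instead of relying on a NUL terminator.
   Returns the number of characters written, or -1 if out is too small. */
int format_hex(const char *bytes, int32_t length, char *out, size_t outSize) {
    assert(length >= 0 && (bytes != NULL || length == 0));
    /* 2 hex digits per byte, a space between bytes, plus the final NUL. */
    size_t needed = length > 0 ? (size_t)length * 3 : 1;
    if (out == NULL || outSize < needed) return -1;

    size_t pos = 0;
    out[0] = '\0';
    for (int32_t i = 0; i < length; i++) {
        /* unsigned char avoids sign extension for bytes >= 0x80 */
        unsigned int b = (unsigned char)bytes[i];
        pos += (size_t)snprintf(out + pos, outSize - pos,
                                i == 0 ? "%02x" : " %02x", b);
    }
    return (int)pos;
}
```

For example, after `int32_t len = ucnv_fromUChars(...)`, calling `format_hex(dest, len, buf, sizeof buf)` and printing buf shows only the bytes the conversion produced.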

Why does the string get longer after the conversion (from 3 to 12)?
Can someone tell me whether this function works with DBCS? What am I doing wrong here?

Thanks for your help!
-Andy.
