Mateusz Loskot - 2004-12-13

Hello,

I'm trying to understand how TinyXML interfaces with the outside world, that is, its API.
TinyXML returns all strings as char (one-byte) strings, am I right?
So, if the parsed XML document is encoded in UTF-8, TinyXML returns char (byte) strings that hold the UTF-8 encoded Unicode content, am I right?
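
To illustrate what "Unicode content in a byte string" means in practice, here is a minimal portable sketch. It is not TinyXML code, and the helper name Utf8CodePointCount is hypothetical. It assumes well-formed UTF-8 input and counts code points by skipping continuation bytes (those of the form 10xxxxxx).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of TinyXML: counts the Unicode code points
// in a UTF-8 encoded byte string. Assumes the input is well-formed UTF-8.
std::size_t Utf8CodePointCount(const std::string& s)
{
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        unsigned char b = static_cast<unsigned char>(s[i]);
        // Continuation bytes look like 10xxxxxx; every other byte
        // starts a new code point.
        if ((b & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For example, the UTF-8 bytes of "Łoskot" written as "\xC5\x81oskot" give a std::string whose size() is 7, while Utf8CodePointCount reports 6, because "Ł" (U+0141) takes two bytes.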

I use TinyXML on Windows CE 4.2 (Pocket PC 2003) and I have to work with Microsoft Unicode strings (wide-char strings).
So, I have the following function to convert Unicode strings encoded in one-byte chars (seems strange, right? ;-) to wide-char Unicode strings.
It returns a std::wstring (I compile with _UNICODE defined).

std::wstring UnicodeStringToWString(const std::string& s)
{
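    // The Unicode content is UTF-8 encoded in one-byte-char strings, so
    // we have to use the CP_UTF8 code page in the conversion.

    // Get the required buffer size in wide chars, including the
    // terminating null (the input length is passed as -1)
    int len = ::MultiByteToWideChar(CP_UTF8, 0, s.c_str(), -1, NULL, 0);
    if (len <= 0)
        return std::wstring(); // conversion failed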

    // Allocate wide-char buffer (needs #include <vector>);
    // std::vector releases it even if an exception is thrown
    std::vector<wchar_t> buffer(len);

    // Translate the UTF-8 bytes to wide chars
    if (::MultiByteToWideChar(CP_UTF8, 0, s.c_str(), -1, &buffer[0], len) == 0)
        return std::wstring(); // conversion failed

    // The buffer is null-terminated, so the string ends at the first null
    return std::wstring(&buffer[0]);
}

It may seem strange that I use CP_UTF8, because MultiByteToWideChar is usually called with the CP_ACP code page.
But CP_UTF8 has to be used to tell the API call that it should treat the input char string as UTF-8 and convert it to a wide-char string.
If CP_ACP is used, the bytes are interpreted in the system's ANSI code page instead, so non-ASCII characters (anything beyond plain English text) come out wrong.
I tested it with Simplified Chinese, Russian and Polish characters.
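
The difference can be shown without the Windows API. The sketch below is only an illustration under stated assumptions. WidenBytewise is a hypothetical stand-in for a single-byte code page (it behaves like Latin-1, which is not necessarily what CP_ACP maps to on a given device). DecodeUtf8Bmp is a hypothetical minimal decoder that handles only well-formed 1- to 3-byte sequences and does no validation, so it is not a replacement for MultiByteToWideChar.

```cpp
#include <string>

// Hypothetical stand-in for a single-byte code page conversion:
// every input byte becomes exactly one wide character (Latin-1 style).
std::wstring WidenBytewise(const std::string& s)
{
    std::wstring out;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        out += static_cast<wchar_t>(static_cast<unsigned char>(s[i]));
    return out;
}

// Hypothetical minimal UTF-8 decoder: handles 1- to 3-byte sequences
// (code points up to U+FFFF) and assumes well-formed input.
std::wstring DecodeUtf8Bmp(const std::string& s)
{
    std::wstring out;
    std::string::size_type i = 0;
    while (i < s.size())
    {
        unsigned char b0 = static_cast<unsigned char>(s[i]);
        unsigned long cp;
        std::string::size_type n;
        if (b0 < 0x80)                { cp = b0;        n = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; n = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; n = 3; }
        else break;                    // 4-byte sequences are out of scope here
        if (i + n > s.size()) break;   // truncated sequence, stop
        // Each continuation byte contributes its low six bits.
        for (std::string::size_type k = 1; k < n; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out += static_cast<wchar_t>(cp);
        i += n;
    }
    return out;
}
```

With the two UTF-8 bytes of "Ł" ("\xC5\x81") as input, WidenBytewise produces two wide characters (0xC5 and 0x81), while DecodeUtf8Bmp produces the single character U+0141. That is the kind of damage a non-UTF-8 code page causes.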

I would like to make sure that I understand TinyXML correctly.
So, any comments are welcome.

Mateusz Łoskot