TinyXML / Discussion / Open Discussion: Is Unicode encoded in string of char?

Hello,

I'm trying to understand how TinyXML interfaces with the outside world, that is, its API.
TinyXML returns all strings as char (one-byte) strings, am I right?
So, if the parsed XML document is encoded in UTF-8, TinyXML returns a string of chars (a byte string) holding UTF-8 encoded Unicode content, am I right?
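
To make the "Unicode inside a string of char" idea concrete, here is a minimal sketch that does not depend on TinyXML or Windows. CountUtf8CodePoints is a hypothetical helper name of mine, and the sketch assumes the input is well-formed UTF-8. It shows that the number of bytes in the std::string and the number of Unicode characters it carries can differ.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded byte string.
// Every code point has exactly one byte that is NOT a continuation
// byte (continuation bytes look like 10xxxxxx), so counting those
// bytes counts the code points. Assumes well-formed UTF-8 input.
std::size_t CountUtf8CodePoints(const std::string& s)
{
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80) // not a continuation byte
            ++count;
    }
    return count;
}
```

For example, the UTF-8 bytes of "Łoskot" are C5 81 6F 73 6B 6F 74, so size() reports 7 chars while the helper reports 6 code points.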

I use TinyXML on Windows CE 4.2 (Pocket PC 2003) and I have to work with Microsoft Unicode strings (wide-char strings).
So, I have the following function to convert Unicode strings encoded in one-byte chars (seems strange, right? ;-) to wide-char Unicode strings.
Here I use std::wstring (compiled with _UNICODE defined).

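#include <windows.h>
#include <string>
#include <vector>

std::wstring UnicodeStringToWString(const std::string& s)
{
    // Unicode is encoded as UTF-8 in one-byte-char strings, so
    // we have to use the CP_UTF8 code page in the conversion.
    if (s.empty())
        return std::wstring();

    // Get the required buffer size in wide chars. Passing -1 as the
    // input length makes the count include the terminating null.
    int len = ::MultiByteToWideChar(CP_UTF8, 0, s.c_str(), -1, NULL, 0);
    if (len <= 0)
        return std::wstring(); // conversion failed, see GetLastError()

    // Allocate the wide-char buffer; std::vector releases it
    // automatically, even if an exception is thrown later.
    std::vector<wchar_t> buffer(len);

    // Translate the UTF-8 chars to wide chars
    if (::MultiByteToWideChar(CP_UTF8, 0, s.c_str(), -1, &buffer[0], len) == 0)
        return std::wstring(); // conversion failed, see GetLastError()

    // Build the result without the terminating null
    return std::wstring(&buffer[0], len - 1);
}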

It may seem strange that I use CP_UTF8, because MultiByteToWideChar is usually called with the CP_ACP code page.
But CP_UTF8 has to be used to tell the API call to treat the input char string as UTF-8 and simply convert it to a wide-char string.
If CP_ACP is used, the bytes are interpreted in the system ANSI code page instead, so characters outside plain ASCII are lost.
I tested it with Simplified Chinese, Russian and Polish characters.
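
To illustrate why the code page matters, here is a small portable sketch that does not call the Windows API at all. DecodeUtf8 and DecodeAsSingleBytes are hypothetical helper names of mine, not part of TinyXML or Win32. The decoder assumes well-formed UTF-8 and does no validation. The byte-wise version only mimics the general effect of a single-byte code page by taking each byte as one character; the actual characters CP_ACP would produce depend on the system's ANSI code page.

```cpp
#include <string>
#include <vector>

// Decodes a UTF-8 byte string into Unicode code points.
// Minimal sketch: assumes well-formed input, performs no validation.
std::vector<unsigned long> DecodeUtf8(const std::string& s)
{
    std::vector<unsigned long> out;
    std::string::size_type i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        unsigned long cp;
        int extra; // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; } // 0xxxxxxx
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; } // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; } // 1110xxxx
        else                  { cp = lead & 0x07; extra = 3; } // 11110xxx
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (; extra > 0 && i < s.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Treats every byte as one character, which is roughly what happens
// when UTF-8 data is read through a single-byte code page.
std::vector<unsigned long> DecodeAsSingleBytes(const std::string& s)
{
    std::vector<unsigned long> out;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        out.push_back(static_cast<unsigned char>(s[i]));
    return out;
}
```

For the Polish letter "Ł" (UTF-8 bytes C5 81), DecodeUtf8 gives the single code point U+0141, while DecodeAsSingleBytes gives two unrelated values, 0xC5 and 0x81. The same kind of split happens to the Russian and Chinese characters, which is consistent with what I saw when using CP_ACP.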

I would like to make sure that I understand TinyXML correctly.
So, any comments are welcome.

Mateusz Łoskot
