Processing Unicode strings with TinyXml

Anonymous
2004-08-13
2004-08-26
  • Anonymous

    Anonymous - 2004-08-13

    If you want to get or set Unicode values with TinyXml on a Win32 system, you need to use WideCharToMultiByte to convert wchar_t* (UTF-16) to char* (UTF-8).

    If you use plain char* strings to handle Unicode values without converting them, you may get incorrect results.

    The code looks like this:

    #include <windows.h>   // WideCharToMultiByte, CP_UTF8 (Win32 only)
    #include <cstdio>      // printf
    #include <cstdlib>     // exit
    #include <cwchar>      // wcslen
    #include <string>

    #include "tinyxml.h"
    #define XRB_DOM_TRANS_BUF_LEN 200

    // Convert a wide (UTF-16 on Win32) string to an 8-bit string in the
    // given code page; UTF-8 by default.
    std::string Transcode(const wchar_t *pwszString, unsigned int nCodePage = CP_UTF8)
    {
        std::string strRet;
        int nRawLen = (int)wcslen(pwszString);
        int nReqLen = nRawLen<<2; // max possible length of converted 8bit string

        // use the stack buffer for short strings, the heap for long ones
        char szDst[XRB_DOM_TRANS_BUF_LEN], *pszDst=szDst; //initially, point to array

        // check if szDst is long enough (one extra byte for the null terminator)
        if (nReqLen >= XRB_DOM_TRANS_BUF_LEN){ // szDst not big enough
            pszDst = new char[nReqLen + 1];
        }

        // convert; nLen is the number of bytes written (0 on failure)
        int nLen = WideCharToMultiByte(nCodePage,0,pwszString,nRawLen,pszDst,nReqLen,0,0);

        if (nLen){
            pszDst[nLen] = 0; // null terminator
            strRet = pszDst; // copy to STL string
        }
        if (pszDst!=szDst) delete[] pszDst; // delete, if allocated

        return strRet;
    }

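    // Set the text of parent's first child node named `name` to `value`,
    // creating a text node if the element has none.
    void SetChildNodeTextValue(TiXmlNode* parent, const wchar_t* name, const wchar_t* value)
    {
        if (parent == 0) return;
        TiXmlNode* tNode = parent->FirstChild(Transcode(name).c_str());
        if (tNode)
        {
            // FirstChild() returns 0 for an empty element, so check before ToText()
            TiXmlNode* tChild = tNode->FirstChild();
            TiXmlText* tText = tChild ? tChild->ToText() : 0;
            if (tText == 0)
            {
                TiXmlText tNewText( Transcode(value).c_str() );
                tNode->InsertEndChild(tNewText);
            }
            else
            {
                tText->SetValue( Transcode(value).c_str() );
            }
        }
    }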

    int main()
    {
        TiXmlDocument doc( "utf8test.xml" );
        //TiXmlBase::SetCondenseWhiteSpace( false );
        bool loadOkay = doc.LoadFile();

        if ( !loadOkay )
        {
            printf( "Could not load test file 'utf8test.xml'. Error='%s'. Exiting.\n", doc.ErrorDesc() );
            exit( 1 );
        }

        // AppSetting
        TiXmlNode* tNode = doc.FirstChildElement( "document" );
        // element name 汉语 ("Chinese"), new text 中文测试 ("Chinese test");
        // \u escapes avoid depending on the compiler's source-file encoding
        SetChildNodeTextValue(tNode, L"\u6C49\u8BED", L"\u4E2D\u6587\u6D4B\u8BD5");

        doc.SaveFile("demotest.xml");

        return 0;
    }
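For readers who are not on Win32, or who want to see what the CP_UTF8 conversion actually involves, here is a minimal portable sketch of the same wide-to-UTF-8 step. It is an illustration under stated assumptions, not part of TinyXml or the Win32 API: the name Utf16ToUtf8 is hypothetical, the input is a std::u16string because wchar_t is 16-bit UTF-16 on Win32 but usually 32-bit elsewhere, and this sketch chooses to replace unpaired surrogates with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical portable helper: encode UTF-16 (the wchar_t layout on Win32)
// as UTF-8. Unpaired surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine it with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // stray low surrogate
        }

        // Emit 1 to 4 bytes depending on the code point's range.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The result can be handed to TinyXml wherever the post uses Transcode(...).c_str(), for example Utf16ToUtf8(u"\u6C49\u8BED").c_str().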

     
    • kazi

      kazi - 2004-08-26

      Yeah!
      After I used your Transcode(), all my Simplified Chinese characters work fine.
      But when I want to read characters back, I need to call MultiByteToWideChar() to transcode them too. Could you build the transcoding into the TinyXml project? Then I wouldn't need to call these functions whenever I use UTF-8 characters.
      Thanks.
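The reverse step described here (turning the UTF-8 text that TinyXml returns back into wide characters, which on Win32 is MultiByteToWideChar with CP_UTF8) can likewise be sketched portably. Utf8ToUtf16 is a hypothetical helper, not a TinyXml or Windows function. It decodes UTF-8 into UTF-16, rejects overlong forms, surrogate code points and values above U+10FFFF, and replaces each byte that does not begin a well-formed sequence with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical portable helper: decode UTF-8 into UTF-16 (the wchar_t layout
// on Win32). Each byte that does not begin a well-formed sequence becomes U+FFFD.
std::u16string Utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    const std::size_t n = in.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length.
        if (c < 0x80)                    { cp = c;        len = 1; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; len = 2; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; len = 3; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; len = 4; }
        else { out += u'\uFFFD'; ++i; continue; } // not a valid lead byte

        // Gather the continuation bytes (each must look like 10xxxxxx).
        bool ok = (i + len <= n);
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, surrogates and out-of-range code points.
        if (ok && ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
                   (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF))
            ok = false;
        if (!ok) { out += u'\uFFFD'; ++i; continue; }

        // Code points above U+FFFF need a surrogate pair in UTF-16.
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}
```

For example, Utf8ToUtf16(tText->Value()) would give the text of a TiXmlText node as UTF-16.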

       
      • Lee Thomason

        Lee Thomason - 2004-08-26

        Encoding conversion between UTF-8, UTF-16, Big-5 (probably your Chinese system), Latin-1, ISO-xyz and so on is a huge problem, a far bigger problem than XML parsing. The OS (Windows, Mac, Linux, etc.) has orders of magnitude more code and data for encoding conversion than all of TinyXml put together.

        Encoding conversion is much too big a problem to put in TinyXml.

        You've got a nice little bit of code to solve your problem, and I'm hoping it will solve lots of people's troubles. But it's one little part of the encoding issue, and putting encoding conversion into the TinyXml code would be the first bucket of an ocean of code I would need to solve this across operating systems.

        TinyXml tries to do simple things well:
        - Process UTF-8 correctly. UTF-8 in, UTF-8 out.
        - It will also process in legacy mode: default system encoding in, sometimes default system encoding out. (Legacy is broken, but tries to be broken in exactly the same way it was prior to 2.3.)

        Any patches or utilities people would like to post and share are always appreciated by me and others, especially code like this that solves an immediate problem. I will not put them into the mainline, however, because the issue is just too big.

        lee

         
      • Ellers

        Ellers - 2004-08-26

        Further to what Lee said...

        If you want a C++ DOM XML parsing library with extensive support for various encodings, then go to Xerces.

        But as long as the Chinese characters are entered as valid UTF-8, TinyXml as-is should work fine...
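Since everything hinges on the input being valid UTF-8, a quick check before handing bytes to TinyXml can save some confusion. IsValidUtf8 below is a hypothetical helper, not part of TinyXml. It follows the well-formed byte sequences defined in RFC 3629 and rejects stray continuation bytes, truncated or overlong sequences, surrogates, and code points above U+10FFFF. Text saved in a legacy Chinese code page is likely to fail this check.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if s is well-formed UTF-8 (RFC 3629).
bool IsValidUtf8(const std::string& s)
{
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80) { ++i; continue; }                        // ASCII
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false; // 0x80..0xC1 and 0xF5..0xFF cannot start a sequence

        if (i + len > n) return false; // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong encodings, surrogates and out-of-range values.
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}
```

A failing result suggests the text is still in the system code page and needs converting (for example with the Transcode() function from the first post) before it goes into TinyXml.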

         
