
UTF-8 in XMLWriter

VaKa78
2015-03-17
2016-02-12
  • VaKa78

    VaKa78 - 2015-03-17

    Hello everyone!

    To quickly add UTF-8 support to the XmlStream class (in XmlWriter.h), I overloaded operator<< for _tstring:

    XmlStream& operator<<(const _tstring& value) 
    {
        return PutUtf8String( value );
    }
    
    XmlStream& operator<<(_tstring& value) 
    {
        return PutUtf8String ( value );
    }
    

    And added the following function:

    XmlStream& PutUtf8String (const _tstring& value) 
    {
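        // Assumes an ANSI (non-Unicode) build, where _tstring is std::string.
        // First call: ask for the number of wide characters needed.
        const int srcLen = static_cast<int>(value.size());
        int len = MultiByteToWideChar(CP_ACP, 0, value.c_str(), srcLen, NULL, 0);
    
        // Allocate the buffer (plus terminator) and convert ANSI -> UTF-16.
        wchar_t* szUnicode = new wchar_t[len+1];
    
        MultiByteToWideChar(CP_ACP, 0, value.c_str(), srcLen, szUnicode, len);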
        szUnicode[len] = 0;
        std::wstring tempVal = szUnicode;
        delete[] szUnicode;
    
        int nInputLen = tempVal.length();
        int nChars = WideCharToMultiByte(CP_UTF8, 0, tempVal.c_str(), nInputLen, NULL, 0, NULL, NULL); 
        if (nChars)
        {
            char* pszUTF8 = new char[nChars+1];
            nChars = WideCharToMultiByte(CP_UTF8, 0, tempVal.c_str(), nInputLen, pszUTF8, nChars, NULL, NULL);
            if (nChars)
            {
                pszUTF8[nChars] = '\0';
    
                if (stateTagName == state)
                    tagName << pszUTF8;
                s << pszUTF8;
            }
            delete [] pszUTF8;
        }
        return *this;
    }
    

    Now the three provided tests work with UTF-8 (at least Russian text is displayed as typed).
    Yes, it is quick and dirty, but it works ;)
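
    For readers who want to see what the final UTF-16 to UTF-8 step produces, here is a minimal portable sketch of UTF-8 encoding. It is not the code used above and not part of SimpleXlsx: AppendUtf8 and ToUtf8 are hypothetical helper names, and the input is assumed to be UTF-32 code points (on Windows, wchar_t holds UTF-16, so WideCharToMultiByte also has to combine surrogate pairs, which this sketch does not do).

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point to `out`.
// Returns false for values that cannot be encoded (surrogates, > U+10FFFF).
bool AppendUtf8(char32_t cp, std::string& out)
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return false;

    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Hypothetical helper: encodes a whole UTF-32 string as UTF-8.
// Invalid code points are replaced with U+FFFD (REPLACEMENT CHARACTER).
std::string ToUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (!AppendUtf8(cp, out))
            AppendUtf8(0xFFFD, out);
    }
    return out;
}
```

    With this sketch, the Cyrillic letter U+041F comes out as the two bytes D0 9F, which is the kind of byte sequence the XML file has to contain for Russian text to display correctly.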

     
  • FinnG

    FinnG - 2016-02-12

    The WideCharToMultiByte approach might not work in my case because I use a Unicode build: the problem is std::wofstream << std::string.
    Instead, I figured it out using Boost.

    If you have built Boost.Filesystem,
    then you only have to modify xmlwriter.h.
    Add:

    #include <boost/filesystem/detail/utf8_codecvt_facet.hpp>
    

    and at line 38, change
    s.imbue(std::locale("C"));
    to

    s.imbue(std::locale(std::locale(), new boost::filesystem::detail::utf8_codecvt_facet));
    

    Done.
    However, if you have not built Boost.Filesystem and do not plan to,
    the steps are more complicated, as described below.

    .............................................................................
    Insert the code below into xmlwriter.h at line 8:

    #define BOOST_UTF8_BEGIN_NAMESPACE namespace name_it {
    #define BOOST_UTF8_DECL
    #define BOOST_UTF8_END_NAMESPACE }
    #include <boost/detail/utf8_codecvt_facet.hpp>
    #undef BOOST_UTF8_END_NAMESPACE
    #undef BOOST_UTF8_DECL
    #undef BOOST_UTF8_BEGIN_NAMESPACE
    

    In xmlwriter.h, line 38, delete the line or comment it out:
    // s.imbue(std::locale("C"));
    and insert the code below in its place:

    s.imbue(std::locale(std::locale(), new name_it::utf8_codecvt_facet));
    

    Insert the code below into Workbook.cpp at line 54:

    #define BOOST_UTF8_BEGIN_NAMESPACE namespace name_it {
    #define BOOST_UTF8_DECL
    #define BOOST_UTF8_END_NAMESPACE }
    #include <boost/detail/utf8_codecvt_facet.ipp>
    #undef BOOST_UTF8_END_NAMESPACE
    #undef BOOST_UTF8_DECL
    #undef BOOST_UTF8_BEGIN_NAMESPACE
    

    ....................................................................................................
    Done. Try it with:
    SimpleXlsx::CWorkbook book;
    SimpleXlsx::CWorksheet &sheet = book.AddSheet(L"檊");
    SimpleXlsx::CellDataStr cds;
    cds.value = L"檊";
    sheet.BeginRow();
    sheet.AddCell(cds);
    sheet.EndRow();
    It works.
    Remarks:
    SimpleXlsx version 0.20
    Visual Studio 2010 Express, Win32, unicode, boost_1_58_0
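
    The imbue trick above is not specific to Boost: any UTF-8 codecvt facet attached to the stream's locale does the conversion. As a rough sketch under that assumption, the same idea can be tried outside SimpleXlsx with the standard std::codecvt_utf8 facet from <codecvt> (swapped in here for the Boost facet, and deprecated since C++17). WriteUtf8File and ReadBytes are hypothetical helper names used only for illustration.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` through a wide file stream whose locale
// carries a UTF-8 conversion facet, so the bytes on disk are UTF-8.
bool WriteUtf8File(const std::string& path, const std::wstring& text)
{
    std::wofstream out;
    // Imbue before opening so the file buffer uses the facet from the start.
    // The locale takes ownership of the facet allocated with new.
    out.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
    out.open(path.c_str(), std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    out << text;
    out.close();
    return !out.fail();
}

// Hypothetical helper: reads a file back as raw bytes to inspect the encoding.
std::string ReadBytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

    Writing a wide string with WriteUtf8File and reading it back with ReadBytes shows the raw UTF-8 bytes, which is a quick way to check that the facet is doing the conversion rather than the stream silently failing on non-ASCII characters.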

     

    Last edit: FinnG 2016-02-15
