MinGW-w64 - for 32 and 64 bit Windows / Bugs / #538 Incorrect conversion result from utf-8 to wchar_t by codecvt

#538 Incorrect conversion result from utf-8 to wchar_t by codecvt_utf8 on windows

Milestone: v1.0 (example)

Status: open

Owner: niXman

Labels: locale (1) codecvt (1) libstdc++ (3)

Priority: 5

Updated: 2021-07-15

Created: 2016-05-06

Creator: Li Xiang

Private: No

Environment Tried: Win 10 / Win 2012 64bit, zh_CN/en_US locale
Version Tested: x86_64, seh/sjlj, posix, 5.3.0/5.2.0

Consider the following code:

#include <codecvt>
#include <locale>
#include <cstdio>
#include <string>
#include <windows.h>

using std::wstring_convert;
using std::codecvt_utf8;
using std::wstring;

int main()
{
    wstring_convert<codecvt_utf8<wchar_t>, wchar_t> cv;
    const char* s = u8"file.txt";
    wstring filename = cv.from_bytes(s);

    wchar_t buffer[256];
    MultiByteToWideChar(CP_UTF8, 0, s, -1, buffer, 256);

    for (wchar_t c : filename)
    {
        printf("%d ", (int)c);
    }
    printf("\n");

    for (int i = 0; buffer[i] != 0; ++i)
    {
        printf("%d ", (int)buffer[i]);
    }
    printf("\n");

    return 0;
}

compile command line:
g++ 1.cc -O2 -std=c++14 -s

expected result:
102 105 108 101 46 116 120 116
102 105 108 101 46 116 120 116

actual result:
26112 26880 27648 25856 11776 29696 30720 29696
102 105 108 101 46 116 120 116

Every character value from codecvt_utf8 (the first line) is multiplied by 256.

It looks like a regression introduced in 5.2.0; 5.1.0 is OK.

Discussion

Li Xiang - 2016-05-06

Double-checking shows that:

Clang 3.7/3.8 on Windows has the same issue.

g++ on Linux is not affected.

It seems the issue is in libstdc++.dll.

Last edit: Li Xiang 2017-01-02

Li Xiang - 2016-05-09

It seems codecvt incorrectly chose big endian. Setting little endian did not work.


Emily Leiviskä - 2016-11-02

We are also affected by this: Win7, Latin-1 locale, MinGW 6.2.0-2.

Last edit: Emily Leiviskä 2016-11-02


niXman - 2016-11-02

assigned_to: niXman

dejan crnila - 2017-06-07

We are using 6.3.0 and are also affected by this. Is there any information on a solution?

- niXman - 2017-06-07
  
  What about version 7.1?
  
  I have no solution yet...
  

Jan Niklas Hasse - 2018-01-29

Still happens in 7.2.0.


Zufu Liu - 2018-01-31

Changing the line

wstring_convert<codecvt_utf8<wchar_t>, wchar_t> cv;

to

wstring_convert<codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>, wchar_t> cv;

works fine on Win10 x64 1709 with gcc version 7.2.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 project), and output as expected:

D:\>g++ -Wall -Wextra -O2 t1.cpp -s
D:\>a
102 105 108 101 46 116 120 116
102 105 108 101 46 116 120 116

Last edit: Zufu Liu 2018-01-31

Roman Khazanskii - 2021-07-15

Still happens in 8.1.0, but the workaround by Zufu Liu works!
