#943 __mingw_printf() does not print non-ASCII chars correctly.

Milestone: v1.0 (example)
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2022-07-25
Created: 2022-06-17
Private: No

Overview

With the following code, __mingw_printf() does not print non-ASCII chars correctly while __ms_printf() does.

/* Save this source code in UTF-8 */
#include <stdio.h>
#include <locale.h>

int main()
{
    const wchar_t *wstr = L"αβγδ";
    setlocale(LC_CTYPE, "");
    __ms_printf("%ls\n", wstr);
    __mingw_printf("%ls\n", wstr);
    return 0;
}

Environment

  • OS: Windows 10 Pro 64 bit (21H2)
  • cygwin 3.3.5 + mingw64-x86_64-gcc-g++ 11.3.0-1
  • msys2 3.3.5 + mingw64/mingw-w64-x86_64-gcc 12.1.0-2
  • System locale: Japanese
  • Default code page: 932 (Japanese)

How to reproduce

  1. Compile the test code above with x86_64-w64-mingw32-gcc.
  2. Execute the a.exe file in command prompt.

Expected result

αβγδ
αβγδ

Actual result

Output of a.exe is:

αβγδ
ソタチツ

However, when the output is piped or redirected (a.exe | more or a.exe > output.txt), non-ASCII characters are output as expected by both __ms_printf() and __mingw_printf().

Is this behavior intentional?

Discussion

  • Takashi Yano

    Takashi Yano - 2022-06-17

    I have tracked down the issue and found that the problem is caused by the following code in mingw_pformat.c.

    /* Emit the data, converting each character from the wide
     * to the multibyte domain as we go...
     */
    while( (count-- > 0) && ((len = wcrtomb( buf, *s++, &state )) > 0) )
    {
      char *p = buf;
      while( len-- > 0 )
        __pformat_putc( *p++, stream );
    }
    

    When the locale is set to Japanese, a multibyte character is not shown correctly unless all of its bytes are output atomically, that is, in a single call.

    #include <stdio.h>
    #include <locale.h>
    
    void pr()
    {
        char buf[] = {0x83, 0xbf, '\n', 0}; /* "α\n" in CP932 */
        fputs(buf, stdout);
        for (int i=0; i<sizeof(buf)-1; i++) fputc(buf[i], stdout);
    }
    
    int main()
    {
        pr();
        setlocale(LC_CTYPE, "");
        pr();
        return 0;
    }
    

    The code above outputs:

    α
    α
    α
    ソ
    

    After setlocale(LC_CTYPE, ""), printing 'α' byte by byte with an fputc() loop is broken, while printing it with a single fputs() call still works.
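
    Given that observation, one possible way to avoid byte-by-byte output is to hand each converted character to the stream in one call. The following is only a minimal sketch, not code from mingw_pformat.c: emit_wide_string() is a hypothetical helper, and it assumes that fwrite() reaches the console as a single write the way fputs() does in the test above, which has not been verified here.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of mingw-w64.
 * Converts each wide character with wcrtomb() and passes all of its
 * bytes to the stream in ONE fwrite() call instead of one putc() per byte.
 * Returns the number of wide characters written, or -1 on error. */
static int emit_wide_string(const wchar_t *s, FILE *stream)
{
    char buf[MB_LEN_MAX];
    mbstate_t state;
    int written = 0;

    memset(&state, 0, sizeof state);      /* start in the initial shift state */
    while (*s != L'\0') {
        size_t len = wcrtomb(buf, *s++, &state);
        if (len == (size_t)-1)
            return -1;                    /* not representable in this locale */
        if (len > 0 && fwrite(buf, 1, len, stream) != len)
            return -1;                    /* short write */
        written++;
    }
    return written;
}
```

    A caller would use it as emit_wide_string(L"αβγδ", stdout) after setlocale(LC_CTYPE, "").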

     
  • Ozkan Sezer

    Ozkan Sezer - 2022-06-17

    Please send the analysis, and a patch if you have one, to the mingw-w64-public mailing list. Very unfortunately, the issue tracker doesn't get much attention.

     
  • Takashi Yano

    Takashi Yano - 2022-06-19

    This seems to be a problem in msvcrt.dll itself. The following code, which calls msvcrt.dll directly, also fails in the same way.

    #include <windows.h>
    #include <stdio.h>
    #include <locale.h>
    
    int main()
    {
        char buf[] = {0x83, 0xbf, '\n', 0}; /* "α\n" in CP932 */
        int (*__putchar)(int);
        char *(*__setlocale)(int, char *);
        HMODULE h = LoadLibrary("msvcrt.dll");
        __setlocale = (char *(*)(int, char *))GetProcAddress(h, "setlocale");
        __setlocale(LC_CTYPE, "");
        __putchar = (int (*)(int))GetProcAddress(h, "putchar");
        for (int i=0; i<sizeof(buf)-1; i++) __putchar(buf[i]);
        return 0;
    }
    

    What can be done about that?

     
  • Takashi Yano

    Takashi Yano - 2022-06-24

    I found a workaround for this issue.
    Adding

    if (_isatty(_fileno(stdout))) _setmode(_fileno(stdout), _O_BINARY);
    

    after setlocale() solves the issue. Now I am trying to implement this in mingw.
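
    For reference, the workaround can be wrapped up as below. This is a sketch only: the helper names stdout_console_to_binary() and init_locale_and_console() are hypothetical, and the #ifdef guard is an addition so that the file also compiles on non-Windows systems, where it does nothing.

```c
#include <locale.h>
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _isatty, _setmode */
#endif

/* Hypothetical helper: switch stdout to binary mode only when it is a
 * console. Returns 1 if the mode was changed, 0 if stdout was left
 * alone, and -1 if _setmode() failed. */
static int stdout_console_to_binary(void)
{
#ifdef _WIN32
    int fd = _fileno(stdout);

    if (!_isatty(fd))
        return 0;                 /* pipe or file: already prints correctly */
    return _setmode(fd, _O_BINARY) == -1 ? -1 : 1;
#else
    return 0;                     /* nothing to do outside Windows */
#endif
}

/* Call order used in the workaround: setlocale() first, then the mode switch. */
static int init_locale_and_console(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return -1;
    return stdout_console_to_binary();
}
```

    Calling init_locale_and_console() at the start of main() in the test programs above corresponds to the one-line workaround.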

     
  • Takashi Yano

    Takashi Yano - 2022-07-08

    I have built a patch for this issue (attached). This is a PoC patch, and the file name and its location may not be appropriate.

     

    Last edit: Takashi Yano 2022-07-08
  • Takashi Yano

    Takashi Yano - 2022-07-11

    Revised the patch.

     
  • Takashi Yano

    Takashi Yano - 2022-07-12

    Revised again.

     
  • Takashi Yano

    Takashi Yano - 2022-07-13

    Revised the patch further.

     
  • Takashi Yano

    Takashi Yano - 2022-07-13

    Revised the patch a bit.

     
  • Takashi Yano

    Takashi Yano - 2022-07-14

    Revised the patch once again.

     
  • Takashi Yano

    Takashi Yano - 2022-07-15

    Revised the patch a bit again.

     
  • Takashi Yano

    Takashi Yano - 2022-07-15

    Revised.

     
  • Takashi Yano

    Takashi Yano - 2022-07-15

    Revised just a bit.

     
  • Takashi Yano

    Takashi Yano - 2022-07-15

    Revised again on one point.

     
  • Takashi Yano

    Takashi Yano - 2022-07-16

    Revised once more.

     
  • Takashi Yano

    Takashi Yano - 2022-07-17

    The previous patches have undesired side effects.
    The new patch takes a very different approach instead of using _setmode(). With this patch, a multibyte character is written in a single operation through a buffer, even if user code outputs its bytes in separate calls.
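
    The attached patch is not reproduced in this thread, so the following is only an illustrative sketch of the general idea, not the patch itself. It assumes code page 932, whose lead bytes are 0x81-0x9F and 0xE0-0xFC; struct mb_coalescer, is_cp932_lead_byte() and coalescing_putc() are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: CP932 lead bytes are 0x81-0x9F and 0xE0-0xFC. */
static int is_cp932_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Hypothetical per-stream state: a lead byte waiting for its trail byte. */
struct mb_coalescer {
    unsigned char pending;
    int has_pending;
};

/* Accept one byte. A lead byte is held back; when the trail byte
 * arrives, both bytes go to the stream in a single fwrite() call.
 * Returns the number of bytes written by this call (0 while a lead
 * byte is being held), or -1 on a write error. */
static int coalescing_putc(struct mb_coalescer *st, int ch, FILE *stream)
{
    unsigned char out[2];
    size_t n = 0;
    unsigned char c = (unsigned char)ch;

    if (st->has_pending) {
        out[n++] = st->pending;   /* complete the held character */
        out[n++] = c;
        st->has_pending = 0;
    } else if (is_cp932_lead_byte(c)) {
        st->pending = c;          /* wait for the trail byte */
        st->has_pending = 1;
        return 0;
    } else {
        out[n++] = c;             /* single-byte character */
    }
    return fwrite(out, 1, n, stream) == n ? (int)n : -1;
}
```

    With a zero-initialized struct mb_coalescer, feeding 0x83, 0xbf and '\n' one byte at a time results in two writes: one for the two bytes of 'α' and one for the newline.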

     
  • Takashi Yano

    Takashi Yano - 2022-07-18

    Fixed a few bugs.

     
  • Takashi Yano

    Takashi Yano - 2022-07-20
    • Fix a problem caused when console output is buffered.
    • Change the condition to call the code to fix the problem.
    • Fix behaviour for UNICODE mode.
    • Some bug fixes and improvements.
    • Add missing error handling.
     

    Last edit: Takashi Yano 2022-07-24
  • Takashi Yano

    Takashi Yano - 2022-07-25

    I noticed that the previous patch breaks UCRT compatibility.
    A fixed version is attached.

     

    Last edit: Takashi Yano 2022-07-25
