With the following code, __mingw_printf() does not print non-ASCII characters correctly, while __ms_printf() does.
/* Save this source code in UTF-8 */
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    const wchar_t *wstr = L"αβγδ";

    /* Use the locale from the environment for multibyte conversion */
    setlocale(LC_CTYPE, "");
    __ms_printf("%ls\n", wstr);
    __mingw_printf("%ls\n", wstr);
    return 0;
}
αβγδ
αβγδ
Output of a.exe is:
αβγδ
ソタチツ
However, a.exe | more or a.exe > output.txt outputs non-ASCII characters as expected with both __ms_printf() and __mingw_printf().
Is this intentional behavior?
I have tracked down the issue and found that the problem is caused by the following code in mingw_pformat.c.
When the locale is set to Japanese, a multibyte character is not shown correctly if its bytes are not output atomically.
The code above outputs:
After setlocale(LC_CTYPE, ""), printing 'α' using an fputc() loop is broken.

Please send the analysis, and a patch if you have one, to the mingw-w64-public mailing list. Very unfortunately, the issue tracker doesn't get much attention.
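The fputc() observation above can be illustrated with a minimal sketch. It is not the code from mingw_pformat.c; the helper names put_wchar_bytewise() and put_wchar_atomic() are hypothetical. Both convert one wide character with wcrtomb(), but the first emits the result byte by byte, which is the pattern reported as broken on the console, while the second hands all bytes to a single fwrite() call. Whether a single fwrite() actually reaches the console as one write depends on the C runtime's buffering, so this only shows the difference in calling pattern.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: emit one wide character byte by byte.
   Returns the number of bytes written, or -1 on error. */
int put_wchar_bytewise(wchar_t wc, FILE *fp)
{
    char buf[MB_LEN_MAX];
    mbstate_t st;
    size_t n, i;

    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    n = wcrtomb(buf, wc, &st);
    if (n == (size_t)-1)
        return -1;                      /* not representable in this locale */
    for (i = 0; i < n; i++)
        if (fputc((unsigned char)buf[i], fp) == EOF)
            return -1;
    return (int)n;
}

/* Hypothetical helper: emit one wide character with a single fwrite().
   Returns the number of bytes written, or -1 on error. */
int put_wchar_atomic(wchar_t wc, FILE *fp)
{
    char buf[MB_LEN_MAX];
    mbstate_t st;
    size_t n;

    memset(&st, 0, sizeof st);
    n = wcrtomb(buf, wc, &st);
    if (n == (size_t)-1)
        return -1;
    if (fwrite(buf, 1, n, fp) != n)
        return -1;
    return (int)n;
}
```

Calling either helper once per character of the wide string, after setlocale(LC_CTYPE, ""), would reproduce the two calling patterns being compared.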
This seems to be a problem in msvcrt.dll. The following code also fails in the same way.
What can be done for that?
I found a workaround for this issue.
Adding
after setlocale() solves the issue. Now I am trying to implement this in mingw.

I have built a patch for this issue, attached. This is a PoC patch, and the file name and its location may not be appropriate.
Last edit: Takashi Yano 2022-07-08
The patch revised.
Revised again.
The patch revised further.
The patch revised a bit.
The patch revised furthermore.
The patch revised a bit again.
Revised.
Revised just a bit.
Revised again on one point.
Revised once more.
The previous patches have undesired side effects.
The new patch introduces a very different strategy instead of using _setmode(). With this patch, a multibyte character is output in one action using a buffer, even if user code outputs it with separate calls.

Fixed a few bugs.
The patch is now in git format-patch format.
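The buffering idea described above can be sketched roughly as follows. This is not the attached patch, whose contents are not shown here; the type mb_accum and its functions are hypothetical names. The sketch collects bytes one at a time, uses mbrtowc() with a persistent mbstate_t to detect when a complete multibyte character has arrived in the current locale, and only then writes the pending bytes with a single fwrite().

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical accumulator for the bytes of one multibyte character. */
typedef struct {
    char buf[MB_LEN_MAX];  /* bytes of the character seen so far */
    size_t len;            /* number of pending bytes in buf */
    mbstate_t state;       /* conversion state carried across calls */
} mb_accum;

void mb_accum_init(mb_accum *a)
{
    memset(a, 0, sizeof *a);
}

/* Write all pending bytes with one fwrite(). Returns 0 or EOF. */
int mb_accum_flush(mb_accum *a, FILE *fp)
{
    size_t n = a->len;

    a->len = 0;
    if (n != 0 && fwrite(a->buf, 1, n, fp) != n)
        return EOF;
    return 0;
}

/* Accept one byte; write only once a whole character is collected.
   Returns 0 on success, EOF on a write error. */
int mb_accum_putc(mb_accum *a, int c, FILE *fp)
{
    char byte = (char)c;
    size_t r;

    a->buf[a->len++] = byte;
    r = mbrtowc(NULL, &byte, 1, &a->state);
    if (r == (size_t)-2 && a->len < sizeof a->buf)
        return 0;                       /* incomplete: keep buffering */

    /* Invalid or overlong sequence: reset the state, pass bytes through. */
    if (r == (size_t)-1 || r == (size_t)-2)
        memset(&a->state, 0, sizeof a->state);
    return mb_accum_flush(a, fp);
}
```

A caller would replace each fputc(c, fp) with mb_accum_putc(&acc, c, fp) and call mb_accum_flush() before closing the stream. Passing invalid bytes through unchanged is a design choice of this sketch, made so that no output is silently dropped.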
Last edit: Takashi Yano 2022-07-24
I noticed that the previous patch breaks UCRT compatibility.
A fixed version is attached.
Last edit: Takashi Yano 2022-07-25