Hi,
This is forwarded from http://bugs.debian.org/625918
The attached program computes the basename of a 3-byte filename (which encodes 2 characters in some encodings). Everything works fine if a single-byte character set is used:
$ LC_ALL=pl_PL.utf8 ./test.exe
basename("\312\253\172") = "\312\253\172"
However, in the Chinese locale the last byte is truncated:
$ LC_ALL=zh_CN.utf8 ./test.exe
basename("\312\253\172") = "\312\253"
The original reporter believes the culprit is the following fragment of mingwex/basename.c:
if( (len = wcstombs( path, refcopy, len )) != (size_t)(-1) )
path[ len ] = '\0';
where len was previously initialized to the number of _characters_ of the input string.
Looking at the implementation of dirname(), it might be affected by a similar bug as well.
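To illustrate the distinction described above: the third argument of wcstombs() is a limit in bytes, not in characters, so a character count is too small whenever a character needs more than one byte. The following is only a minimal sketch of sizing the output buffer in bytes, not the fix applied in mingwex. The helper name wide_to_mb_dup is hypothetical, and the sketch assumes that wcstombs() accepts a NULL destination to report the required length (as glibc and the Microsoft runtime do).

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not part of mingwex): convert a wide string to a
 * newly allocated multibyte string in the current locale.
 * Returns NULL on conversion or allocation failure; caller frees. */
char *wide_to_mb_dup(const wchar_t *ws)
{
    assert(ws != NULL);

    /* Ask for the required size in BYTES (excluding the terminator).
     * Assumption: a NULL destination makes wcstombs() only count. */
    size_t nbytes = wcstombs(NULL, ws, 0);
    if (nbytes == (size_t)-1)
        return NULL;            /* a character cannot be represented */

    char *out = malloc(nbytes + 1);
    if (out == NULL)
        return NULL;

    /* Pass the byte capacity, not the number of wide characters. */
    if (wcstombs(out, ws, nbytes + 1) == (size_t)-1) {
        free(out);
        return NULL;
    }
    return out;                 /* NUL-terminated by wcstombs() */
}
```

With a 2-character name that occupies 3 bytes, passing 2 as the limit would leave room for only the first character, which matches the truncation shown in the zh_CN output above.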
Test program
Is the text encoded as CP936 or UTF8?
According to the comment in the attached file, it's CP936.
The code is from mingw.org; do you know whether the problem also shows up with mingw? (Do they know about this?)
The problem also shows itself with mingw (runtime 3.18 with w32api 3.17); I'll file a bug with them too.
The function in dirname.c or basename.c sets the current locale to the code page returned by GetACP(). Removing that line is a temporary workaround for the bug: dirname() and basename() work correctly in the "C" locale.
Last edit: 张天师 2023-03-21
The len value returned at line 51, len = mbstowcs(NULL, path, 0), is not the same quantity as the len parameter passed at line 57, mbstowcs(refpath, path, len). The former is the number of wide characters needed, while the latter is the number of multibyte characters needed. The original programmer confused the two in many places (in the "C" locale they happen to have the same value), which causes the truncation.
Last edit: 张天师 2023-03-22
Processing in wide characters is not a good idea. I have now fixed it by rewriting it completely, without converting to wide characters. It's in the attachment; give it a try :)
Last edit: 张天师 2023-03-22
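The attachment is not reproduced here. As a rough illustration of the approach described (working on the multibyte string directly rather than converting to wide characters), here is a minimal sketch. It is not the attached code: the function name last_component is hypothetical, it does not strip trailing separators, and it does not treat drive letters specially. It steps through the string with mblen() so that a trail byte that happens to equal '\' (possible in double-byte code pages such as CP936) is not mistaken for a separator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the last component of `path`
 * without modifying it and without any wide-character conversion.
 * Simplification: a trailing separator yields an empty string. */
const char *last_component(const char *path)
{
    assert(path != NULL);

    const char *base = path;
    const char *p = path;
    size_t remaining = strlen(path);

    (void)mblen(NULL, 0);           /* reset any conversion state */
    while (remaining > 0) {
        /* Length in bytes of the multibyte character at p. */
        int n = mblen(p, remaining);
        if (n <= 0)
            n = 1;                  /* invalid sequence: step one byte */

        /* Only a single-byte character can be a real separator. */
        if (n == 1 && (*p == '/' || *p == '\\'))
            base = p + 1;

        p += n;
        remaining -= (size_t)n;
    }
    return base;
}
```

Because the result points into the caller's buffer, no length is ever passed to a conversion routine, so the mix-up between character counts and byte counts cannot occur here.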