#227 basename() truncates filenames with variable-width encoding

open
nobody
crt (85)
5
2011-05-22
2011-05-22
No

Hi,

This is forwarded from http://bugs.debian.org/625918

The attached program computes the basename of a 3-byte filename, which denotes 2 characters in some encodings. Everything works fine if a single-byte character set is used:

$ LC_ALL=pl_PL.utf8 ./test.exe
basename("\312\253\172") = "\312\253\172"

However, in the Chinese locale the last byte is truncated:

$ LC_ALL=zh_CN.utf8 ./test.exe
basename("\312\253\172") = "\312\253"

The original reporter believes the culprit is the following fragment of mingwex/basename.c:

if( (len = wcstombs( path, refcopy, len )) != (size_t)(-1) )
  path[ len ] = '\0';

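where len was previously initialized to the number of _characters_ in the input string. The third argument of wcstombs() is the maximum number of _bytes_ to store, so whenever a character needs more than one byte the result is cut short: here the 2 characters need 3 bytes, but only 2 bytes are written.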
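One way to avoid the mix-up is to size the conversion in bytes rather than in characters. The sketch below assumes, as in the fragment above, that the target is the current locale's multibyte encoding; wide_to_multibyte is a hypothetical helper, not code from mingwex. It relies on wcstombs() with a NULL destination returning the required byte count, which POSIX and the Microsoft CRT both document.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not part of mingwex): convert a wide string to the
 * current locale's multibyte encoding, sizing the buffer in bytes rather
 * than in characters. Returns a malloc'ed string, or NULL on failure. */
char *wide_to_multibyte(const wchar_t *src)
{
    /* With a NULL destination, wcstombs() returns the number of bytes the
     * conversion needs, excluding the terminating '\0'. */
    size_t need = wcstombs(NULL, src, 0);
    if (need == (size_t)-1)
        return NULL;            /* src contains an unconvertible character */

    char *dst = malloc(need + 1);
    if (dst == NULL)
        return NULL;

    /* The limit is now a byte count with room for the terminator, so a
     * 2-character CP936 name that occupies 3 bytes is not cut short. */
    wcstombs(dst, src, need + 1);
    return dst;
}
```

The caller frees the result. A fix inside basename.c could equally keep its existing buffer and pass that buffer's size in bytes as the limit; the essential point is that the third argument of wcstombs() counts bytes.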

Judging from its implementation, dirname() may be affected by a similar bug.
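Both basename() and dirname() ultimately need the position of the last path separator. As an alternative sketch, not what mingwex does, the multibyte string can be scanned one character at a time with mblen(), so that a trail byte which happens to equal '\\' (0x5C, possible in double-byte code pages such as CP936) is never taken for a separator, and no wide-character round trip (and hence no byte/character length mix-up) is needed. last_separator is a hypothetical name used only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not the mingwex code: return a pointer to the last
 * '/' or '\\' in a multibyte path, or NULL if there is none. The string is
 * walked one character at a time with mblen(), so a trail byte of a
 * double-byte character is never tested as a separator. */
const char *last_separator(const char *path)
{
    const char *last = NULL;

    (void)mblen(NULL, 0);              /* reset any shift state */
    while (*path != '\0') {
        int n = mblen(path, MB_CUR_MAX);
        if (n <= 0)
            n = 1;                     /* invalid sequence: skip one byte */
        if (n == 1 && (*path == '/' || *path == '\\'))
            last = path;
        path += n;
    }
    return last;
}
```

A dirname() built on this would keep the `last - path` bytes before the separator, and a basename() would start at `last + 1`; trailing separators, drive letters such as `C:` and the root directory would still need separate handling.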

Discussion

  • Stephen Kitt - 2011-05-22

    Test program

  • Jonathan Yong - 2011-05-23

    Is the text encoded as CP936 or UTF-8?

  • Stephen Kitt - 2011-05-23

    According to the comment in the attached file, it's CP936.

  • Ozkan Sezer - 2011-05-23

    The code is from mingw.org; do you know whether the problem also shows up with mingw? (Are they aware of it?)

  • Stephen Kitt - 2011-05-24

    The problem also occurs with mingw (runtime 3.18 with w32api 3.17); I'll file a bug with them too.
