Setting the locale on Windows means clicking through "Control Panel",
"Regional Settings", and then finding the option "Language for
non-Unicode programs". Unfortunately, "UTF-8" is not an option.
Another approach is to use the setlocale() function.
First, the locale name argument does not accept the string "UTF-8" or "UTF8".
Second, there is a comment on the web page:
"The character-handling functions (except isdigit, isxdigit, mbstowcs,
and mbtowc, which are unaffected)."
Here mbstowcs and mbtowc happen to be the string encoding conversion
functions that we are interested in.
In short, I believe UTF-8 is not a valid language option on Windows.
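One way to check this on a given machine is to ask the C runtime directly. The following is only a minimal sketch: locale_is_accepted is a hypothetical helper name (not part of VXL or the standard library), and the answer it gives depends on the platform and on the C runtime version.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: returns true if the C runtime accepts `name`
// as a locale for `category`. The previous locale is restored before
// returning, so the probe has no lasting effect.
bool locale_is_accepted(int category, const char* name)
{
  const char* current = std::setlocale(category, nullptr);
  // Copy the name: the runtime may reuse the buffer on the next call.
  const std::string saved = current ? current : "C";
  const bool ok = std::setlocale(category, name) != nullptr;
  std::setlocale(category, saved.c_str());
  return ok;
}
```

Calling locale_is_accepted(LC_ALL, "UTF-8") and getting false back would match what I describe above.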
From: Ian Scott [mailto:ian.m.scott@...]
Sent: Thursday, September 10, 2009 12:57 PM
To: Gehua Yang
Subject: Re: [Vxl-maintainers] Internationalization / Unicode representation
in vil and vul
Have you tried setting the LOCALE (specifically "LC_ALL", "LC_CTYPE" and
"LANG") to "UTF-8" before trying all those functions below? Some of them
might start working.
Gehua Yang wrote:
> Sorry for missing this email discussion.
> Ian's comments are accurate on the subject. In addition, I have my two
> cents to share, which I found out during my research on this topic:
> 1. Microsoft provides two sets of functions to convert strings between
> narrow (char*) and wide (wchar_t*) representations:
> a) MultiByteToWideChar and WideCharToMultiByte, defined in Winnls.h
> (include Windows.h) and Kernel32.dll since Windows 2000
> b) wcstombs and mbstowcs, defined in stdlib.h since Windows 95
> It is worth noting that Option B explicitly forbids conversion with
> UTF-8 encoding, whereas Option A does support it.
> 2. For most of the system level API functions, Windows provides two
> versions of each function: one supports narrow char and the other wide char. In
> many cases, the narrow char version converts the string to wide char and
> calls the other version which does the actual job. But how the conversion
> is done is an interesting question. See Remark 3.
> 3. The system level API on Windows does *NOT* behave consistently in how
> it converts a narrow char string to a wide one. This is an observation I
> made during my experiment inside Visual Studio 2005. The conversion
> behavior differs when characters are beyond the ASCII table. (I have been
> using Chinese characters in the file path and file names during this
> experiment.)
> For example, I was able to call chdir(char*) to change into a directory
> with Chinese characters in its name. HOWEVER, when I tried to open a file
> for reading with ifstream::ifstream(const char * filename), it failed to open
> the file with Chinese characters in its name. After tracing inside the
> constructor function, it turns out that it fails at the call of
> mbstowcs(), which converts the file name to wide char representation before
> calling _wfopen(). Even though the system language locale is set to Chinese,
> this mbstowcs() function fails to convert any string with Chinese characters
> in it. In comparison, if I chose instead to convert the file name to wide
> char with MultiByteToWideChar() and call ifstream::ifstream(const wchar_t *
> filename), it successfully opened the file.
> It is worth noting that I obtained the narrow char string representation
> using QString::toLocal8Bit() in Qt, which (not too surprisingly) in turn
> calls WideCharToMultiByte, which does the actual job.
> 4. So I decided to implement a "wide char extension" in vil and vul in my
> local VXL copy. These extension functions are available in the attached
> header files for anyone who is interested. The implementation is mostly
> copy and paste from the original implementation.
> We can also make this extension optional by introducing a macro such as
> While these extension functions work as intended, they do pose some
> difficulties, in particular for code modification and code testing. In
> other words, any modification or test case has to be done twice.
> 5. If we do not take the extension approach, but instead convert a filename
> *everywhere* that VXL calls the standard library, I feel it may be too
> daunting a task.
> 6. In my private project, I have the following macro definitions in a
> header file and use "DCHAR" and "DSTDSTRING" everywhere else in the project
> whenever a string is required. "DSTR" is used to define a string literal.
> Though not elegant, doing so guarantees the library to be "Unicode-ready".
> #ifdef USE_WIN_UNICODE
> #include <string>
> typedef wchar_t DCHAR;
> typedef std::wstring DSTDSTRING;
> #define DSTR(s) L##s
> #else // for Linux and Mac
> #include <string>
> typedef char DCHAR;
> typedef std::string DSTDSTRING;
> #define DSTR(s) s
> #endif
> Gehua Yang
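As an illustration of how code written against these definitions might look, here is a small sketch. The helper with_extension is hypothetical and only for illustration. The typedefs and the DSTR macro are repeated so that the fragment compiles on its own.

```cpp
#include <string>

#ifdef USE_WIN_UNICODE
typedef wchar_t DCHAR;
typedef std::wstring DSTDSTRING;
#define DSTR(s) L##s
#else // for Linux and Mac
typedef char DCHAR;
typedef std::string DSTDSTRING;
#define DSTR(s) s
#endif

// Hypothetical helper: joins a file stem and an extension with a dot.
// The same source compiles for narrow and wide strings, because every
// literal goes through DSTR and every type through DCHAR / DSTDSTRING.
DSTDSTRING with_extension(const DSTDSTRING& stem, const DCHAR* ext)
{
  return stem + DSTR(".") + ext;
}
```

For example, with_extension(DSTR("image"), DSTR("png")) yields "image.png" as a std::string by default, or as a std::wstring when USE_WIN_UNICODE is defined.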
> -----Original Message-----
> From: Ian Scott [mailto:ian.m.scott@...]
> Sent: Friday, August 28, 2009 1:31 PM
> To: p.vanroose@...; Gehua Yang
> Cc: Vxl-maintainers
> Subject: Re: [Vxl-maintainers] Internationalization / Unicode representation
> in vil and vul
> If I understand Unicode and the Windows API correctly (which I possibly
> do not), Peter's approach will not work.
> Since UTF16 encodes ASCII characters as the ASCII value and a zero, then
> for simple ASCII filenames you will get lots of zeros. If you ask VXL
> (and ultimately the C++ runtime) to interpret this as a null-terminated
> char*, you will lose everything but the first letter. In order to have
> Windows interpret it as UTF16 instead, you would need VXL, and the
> iostream library, to call _wfopen() rather than fopen(), etc.
> I don't know what the ideal solution is.
> One possibility might be to declare that VXL prefers UTF8. Then
> everywhere that VXL calls the standard library with a filename, have
> some CMAKE/ifdeffed-controlled code convert UTF8 to UTF16, and call
> ifstream with a wchar_t.
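A rough sketch of what such a conversion point could look like follows. It is not existing VXL code: open_utf8 is a hypothetical helper, the guard uses the compiler's own _MSC_VER macro rather than a CMake-provided one, and the wide-char branch assumes the Microsoft standard library, whose ifstream accepts a wchar_t file name.

```cpp
#include <fstream>
#include <string>
#ifdef _MSC_VER
#include <windows.h>
#endif

// Hypothetical helper, not an existing VXL function.
// Opens `in` on a file whose name is given as a UTF-8 encoded string.
bool open_utf8(std::ifstream& in, const std::string& utf8_path)
{
#ifdef _MSC_VER
  // First call: ask how many wchar_t units are needed. Passing -1 as the
  // input length makes the count include the terminating null.
  const int n = MultiByteToWideChar(CP_UTF8, 0, utf8_path.c_str(), -1,
                                    nullptr, 0);
  if (n <= 0) return false;
  std::wstring wide(static_cast<std::size_t>(n), L'\0');
  // Second call: perform the UTF-8 to UTF-16 conversion.
  if (MultiByteToWideChar(CP_UTF8, 0, utf8_path.c_str(), -1,
                          &wide[0], n) <= 0)
    return false;
  // Microsoft-specific overload taking a wchar_t file name.
  in.open(wide.c_str(), std::ios::in | std::ios::binary);
#else
  // Elsewhere the UTF-8 bytes are passed through unchanged.
  in.open(utf8_path.c_str(), std::ios::in | std::ios::binary);
#endif
  return in.is_open();
}
```

Callers would then pass UTF-8 file names on every platform, and only this one function would need to know about wchar_t.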
> This ifstream::ifstream(const wchar_t* filename) is a Microsoft-specific
> library extension. See
> Peter Vanroose wrote:
>> Never done this myself, so I could be mistaken. But from what I know of
>> Unicode, and if it's indeed easy to "just encode the path in utf-8" on
>> Linux, I would say that you need to "just encode the path in utf-16",
>> that's what MS-Windows uses.
>> Essentially, a wchar_t* (containing "real" utf-16 characters) and a char*
>> (containing pairs of bytes which together form the utf-16 encoding of a
>> character) are not distinguishable by a function which expects to see
>> -- Peter.
>>> For Mac / Linux / BSD, it is simple --- just encode the path in
>>> utf-8 encoding and pass the char* to the function. However, for
>>> Windows, it seems the only proper way to handle Unicode is
>>> to use wide char string (wchar_t*) instead. But as far as
>>> I can see, none of the functions in VXL core libraries takes
>>> wchar_t* as an argument. Does anyone have experience on
>>> this aspect?
>>> Gary Yang
>>> DualAlign LLC