From: Gehua Y. <yan...@gm...> - 2009-08-27 18:56:23
Hi folks,

I am wondering if anyone has experience handling Unicode file paths when working with vil and vul. For Mac / Linux / BSD it is simple: just encode the path in UTF-8 and pass the char* to the function. On Windows, however, it seems the only proper way to handle Unicode is to use wide-character strings (wchar_t*) instead. But as far as I can see, none of the functions in the VXL core libraries takes a wchar_t* argument. Does anyone have experience with this?

Thanks.
Gary Yang
DualAlign LLC
From: Peter V. <pet...@ya...> - 2009-08-28 06:49:08
Never done this myself, so I could be mistaken. But from what I know of Unicode, and if it's indeed easy to "just encode the path in utf-8" on Linux, I would say that you need to "just encode the path in utf-16", since that's what MS-Windows uses.

Essentially, a wchar_t* (containing "real" utf-16 characters) and a char* (containing pairs of bytes which together form the utf-16 encoding of a character) are not distinguishable by a function which expects to see wchar_t*.

-- Peter.
From: Gehua Y. <yan...@gm...> - 2009-08-28 14:20:36
Peter,

Thanks for the response. Maybe I did not get your second point correctly, but I would like to point out that compilers do catch mismatches between string pointer types. For instance, passing a char* to a function that expects a wchar_t* results in an error (with MSVC):

error C2664: 'foo' : cannot convert parameter 1 from 'char *' to 'wchar_t *'

Regards,
Gehua
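Gehua's point can be expressed as a compile-time check. In this small sketch, takes_wide is a hypothetical function standing in for the 'foo' in the error message; the static assertions show that neither pointer type converts implicitly to the other, so a narrow string cannot be passed where a wide one is expected without an explicit cast.

```cpp
#include <cassert>
#include <type_traits>

// char* and wchar_t* are unrelated pointer types: neither converts
// implicitly to the other, which is the mismatch MSVC reports as C2664.
static_assert(!std::is_convertible<char*, wchar_t*>::value,
              "char* must not implicitly convert to wchar_t*");
static_assert(!std::is_convertible<wchar_t*, char*>::value,
              "wchar_t* must not implicitly convert to char*");

// Hypothetical stand-in for 'foo'. A call such as takes_wide("abc")
// (a narrow literal) would not compile; the wide literal L"abc" is accepted.
inline bool takes_wide(const wchar_t* s) {
  return s != nullptr && *s != L'\0';
}
```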
From: Peter V. <pet...@ya...> - 2009-08-28 16:11:20
> error C2664: 'foo' : cannot convert parameter 1
> from 'char *' to 'wchar_t *'

I see; but since both are pointers, one could blindly cast from one to the other (possibly via void*, if the compiler protests against a direct explicit cast):

char* filename = ...;
wchar_t* utf16_filename = (wchar_t*)my_string_to_utf16(filename);

(Only, I don't know which function is available as "my_string_to_utf16". Actually, it should not be too difficult to write one yourself based on a "string_to_utf8" function.)

-- Peter.
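Peter's "my_string_to_utf16" is hypothetical. A minimal sketch of such a converter is below, assuming C++11 char16_t/std::u16string (which postdate this thread); utf8_to_utf16 is an illustrative name, not a VXL function. As the following messages point out, its result has to reach a wide-character API (on Windows wchar_t is 16 bits, so the code units can be copied into a std::wstring) rather than be cast and passed as a char*.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Validation is deliberately minimal (bad lead/continuation bytes,
// truncated sequences, values above U+10FFFF); overlong forms and
// encoded surrogates are not rejected.
std::u16string utf8_to_utf16(const std::string& in) {
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const unsigned char lead = static_cast<unsigned char>(in[i]);
    char32_t cp = 0;
    std::size_t extra = 0;  // number of continuation bytes that follow
    if (lead < 0x80) {
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F; extra = 1;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F; extra = 2;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07; extra = 3;
    } else {
      throw std::invalid_argument("invalid UTF-8 lead byte");
    }
    if (i + extra >= in.size())
      throw std::invalid_argument("truncated UTF-8 sequence");
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char cont = static_cast<unsigned char>(in[i + k]);
      if ((cont & 0xC0) != 0x80)
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      cp = (cp << 6) | (cont & 0x3F);
    }
    i += extra + 1;
    if (cp > 0x10FFFF)
      throw std::invalid_argument("code point out of range");
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      cp -= 0x10000;  // code points above the BMP become a surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```

For example, the UTF-8 bytes E4 B8 AD (the character U+4E2D) decode to a single code unit, while U+1F600 decodes to the surrogate pair D83D DE00.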
From: Ian S. <ian...@st...> - 2009-08-28 17:31:01
If I understand Unicode and the Windows API correctly (which I possibly do not), Peter's approach will not work.

Since UTF-16 encodes ASCII characters as the ASCII value and a zero, then for simple ASCII filenames you will get lots of zero bytes. If you ask VXL (and ultimately the C++ runtime) to interpret this as a null-terminated char*, you will lose everything but the first letter. In order to have Windows interpret it as UTF-16 instead, you would need VXL and the iostream library to call _wfopen() rather than fopen(), etc.

I don't know what the ideal solution is.

One possibility might be to declare that VXL prefers UTF-8. Then, everywhere that VXL calls the standard library with a filename, have some CMake/#ifdef-controlled code (e.g. VXL_CONVERT_FILENAMES_TO_UTF16) convert UTF-8 to UTF-16 and call ifstream with a wchar_t. This ifstream::ifstream(const wchar_t* filename) is a Microsoft-specific library extension. See http://stackoverflow.com/questions/821873/how-to-open-an-stdfstream-ofstream-or-ifstream-with-a-unicode-filename

Ian.
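Ian's point about embedded zero bytes can be checked directly. The sketch below uses a hypothetical helper, utf16le_bytes, to lay out a UTF-16 string as little-endian bytes (how a wchar_t string sits in memory on x86 Windows) and then reads the buffer back as a narrow C string.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Serialize UTF-16 code units as little-endian bytes and append a
// UTF-16 terminator (two zero bytes).
std::vector<char> utf16le_bytes(const std::u16string& s) {
  std::vector<char> bytes;
  for (char16_t u : s) {
    bytes.push_back(static_cast<char>(u & 0xFF));         // low byte first
    bytes.push_back(static_cast<char>((u >> 8) & 0xFF));  // then high byte
  }
  bytes.push_back('\0');
  bytes.push_back('\0');
  return bytes;
}
```

For u"myfile.txt" the buffer holds 22 bytes, but std::strlen stops at the first zero byte and reports a length of 1, so a char*-based API would see the file name "m".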
From: Brad K. <bra...@ki...> - 2009-08-28 21:16:45
Ian Scott wrote:
> One possibility might be to declare that VXL prefers UTF8. Then
> everywhere that VXL calls the standard library with a filename, have
> some CMAKE/ifdeffed-controlled code convert UTF8 to UTF16, and call
> ifstream with a wchar_t.
> This ifstream::ifstream(const wchar_t* filename) is a Microsoft-specific
> library extension. See
> http://stackoverflow.com/questions/821873/how-to-open-an-stdfstream-ofstream-or-ifstream-with-a-unicode-filename

FYI, I was able to read a unicode filename with the GNU compiler on cygwin like this:

$ cat myfile.txt
hello, world
$ cat stdio_filebuf.cxx
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/fcntl.h>
#include <ext/stdio_filebuf.h>
#include <iostream>
#include <io.h>
int main()
{
  int fd = _wopen(L"myfile.txt", O_RDONLY);
  __gnu_cxx::stdio_filebuf<char> ibuf(fd, std::ios::in);
  std::istream in(&ibuf);
  std::cout << in.rdbuf();
  return 0;
}
$ g++ -mno-cygwin stdio_filebuf.cxx
$ ./a.exe
hello, world

I think it also works with stdio.h-style C FILE* buffers.

-Brad
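Brad's closing remark about C FILE* buffers can be sketched as follows. This is an untested-on-cygwin sketch under assumptions: it relies on the libstdc++-specific __gnu_cxx::stdio_filebuf constructor that takes a FILE*, which (unlike the file-descriptor constructor in Brad's example) does not close the FILE* when the buffer is destroyed. read_all_via_file_ptr is a hypothetical name. Plain fopen is used so the sketch also builds on POSIX; on a MinGW build the FILE* could instead come from _wfopen with a UTF-16 path.

```cpp
#include <cassert>
#include <cstdio>
#include <ext/stdio_filebuf.h>  // libstdc++ (GCC) extension
#include <iostream>
#include <sstream>
#include <string>

// Read a whole file through a std::istream layered on an open C FILE*.
// The caller keeps ownership of fp and must fclose() it afterwards.
std::string read_all_via_file_ptr(std::FILE* fp) {
  __gnu_cxx::stdio_filebuf<char> buf(fp, std::ios::in);
  std::istream in(&buf);
  std::ostringstream out;
  out << in.rdbuf();  // copy the entire stream contents
  return out.str();
}
```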
From: Peter V. <pet...@ya...> - 2009-08-28 22:09:21
> ... If you ask VXL (and ultimately the C++ runtime) to
> interpret this as a null-terminated char*, you will lose
> everything but the first letter ...

You're right. I didn't think about this side effect of NUL bytes.

-- Peter.
From: Gehua Y. <yan...@gm...> - 2009-09-10 16:01:18
Sorry for missing this email discussion.

Ian's comments are accurate on the subject. In addition, here are my two cents, which I found out during my research on this topic:

1. Microsoft provides two sets of functions to convert strings between narrow (char*) and wide (wchar_t*) representations:
   a) MultiByteToWideChar and WideCharToMultiByte, declared in Winnls.h (include Windows.h) and provided by Kernel32.dll since Windows 2000
      http://msdn.microsoft.com/en-us/library/dd319072%28VS.85%29.aspx
   b) wcstombs and mbstowcs, declared in stdlib.h since Windows 95
      http://msdn.microsoft.com/en-us/library/5d7tc9zw%28VS.80%29.aspx
   It is worth noting that option b) explicitly rules out conversion with UTF-8 encoding, whereas option a) supports it.

2. For most system-level API functions, Windows provides two versions of each function: one takes narrow strings and the other wide strings. In many cases, the narrow version converts the string to wide characters and calls the wide version, which does the actual work. How that conversion is done is an interesting question; see point 3.

3. The system-level API on Windows does *NOT* behave consistently in how it converts a narrow string to a wide one. I observed this during experiments in Visual Studio 2005. The conversion behavior differs for characters beyond the ASCII range. (I used Chinese characters in the file paths and file names in this test.)

   For example, I was able to call chdir(char*) to change into a directory with Chinese characters in its name. HOWEVER, when I tried to open a file for reading with ifstream::ifstream(const char* filename), it failed to open a file with Chinese characters in its name. Tracing into the constructor showed that it fails in the call to mbstowcs(), which converts the file name to a wide-character representation before calling _wfopen(). Even though the system language locale was set to Chinese, mbstowcs() failed to convert any string containing Chinese characters. By comparison, when I instead converted the file name to wide characters with MultiByteToWideChar() and called ifstream::ifstream(const wchar_t* filename), the file opened successfully.

   It is worth noting that I obtained the narrow string using QString::toLocal8Bit() in Qt, which (not too surprisingly) in turn calls WideCharToMultiByte to do the actual work.

4. So I decided to implement a "wide char extension" in vil and vul in my local VXL copy. These extension functions are available in the attached header files for anyone who is interested. The implementation is about 90% copy and paste from the original implementation.

   We could also make this extension optional by introducing a macro such as VXL_USE_WIN_WCHAR_EXTENSION.

   While these extension functions work as intended, they do pose some burdens, in particular for code modification and testing: any modification or test case has to be done twice.

5. If we do not take the extension approach, and instead convert the filename *everywhere* that VXL calls the standard library, I feel it may be too intimidating a task.

6. In my private project, I have the following macro definitions in a header, and I use DCHAR and DSTDSTRING everywhere else in the project whenever a string is required. DSTR is used to define a string literal. Though not elegant, this guarantees that the library is "Unicode-ready".

#ifdef USE_WIN_UNICODE
#include <string>
typedef wchar_t DCHAR;
typedef std::wstring DSTDSTRING;
#define DSTR(s) L##s
#else // for Linux and Mac
#include <string>
typedef char DCHAR;
typedef std::string DSTDSTRING;
#define DSTR(s) s
#endif

Regards,
Gehua Yang

-----Original Message-----
From: Ian Scott [mailto:ian...@st...]
Sent: Friday, August 28, 2009 1:31 PM
To: p.v...@ie...; Gehua Yang
Cc: Vxl-maintainers
Subject: Re: [Vxl-maintainers] Internationalization / Unicode representation in vil and vul
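Gehua's observation that MultiByteToWideChar succeeded where mbstowcs failed, combined with Ian's "VXL prefers UTF-8" idea, suggests a small wrapper like the one below. It is a sketch under assumptions: open_utf8_path and utf8_to_wide are hypothetical names, the wide branch relies on MSVC's non-standard std::ifstream::open(const wchar_t*) overload, and other Windows toolchains (such as MinGW) fall through to the narrow branch, which would not handle non-ASCII names there.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#ifdef _MSC_VER
#include <windows.h>
#endif

#ifdef _MSC_VER
// UTF-8 -> UTF-16 through the Win32 API, which accepts CP_UTF8.
static std::wstring utf8_to_wide(const std::string& s) {
  if (s.empty()) return std::wstring();
  const int n = MultiByteToWideChar(CP_UTF8, 0, s.data(),
                                    static_cast<int>(s.size()), nullptr, 0);
  if (n <= 0) return std::wstring();  // conversion failed
  std::wstring w(static_cast<std::size_t>(n), L'\0');
  MultiByteToWideChar(CP_UTF8, 0, s.data(), static_cast<int>(s.size()),
                      &w[0], n);
  return w;
}
#endif

// Open a file whose name is given as UTF-8.
bool open_utf8_path(std::ifstream& in, const std::string& utf8_path) {
#ifdef _MSC_VER
  in.open(utf8_to_wide(utf8_path).c_str());  // MSVC wchar_t* extension
#else
  in.open(utf8_path.c_str());  // POSIX: the UTF-8 bytes are the file name
#endif
  return in.is_open();
}
```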
From: Ian S. <ian...@st...> - 2009-09-10 16:57:19
Have you tried setting the locale (specifically "LC_ALL", "LC_CTYPE" and "LANG") to "UTF-8" before trying the functions you describe? Some of them might start working.

Ian.
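Ian's suggestion can be tried on a POSIX system roughly as below: switch LC_CTYPE to a UTF-8 locale, then let mbstowcs do the conversion. The locale names are assumptions (they vary by system, and the hypothetical function returns false if none is installed), and setlocale changes process-wide state. Per Gehua's reply, the Visual Studio 2005 runtime did not accept a UTF-8 locale name, so this route did not help there.

```cpp
#include <cassert>
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Try to switch LC_CTYPE (process-wide!) to a UTF-8 locale, then convert
// with mbstowcs. Returns false if no listed locale exists or the input is
// not valid in that locale.
bool narrow_to_wide_via_locale(const std::string& utf8, std::wstring& out) {
  const char* names[] = {"C.UTF-8", "en_US.UTF-8"};  // assumed names
  bool have_locale = false;
  for (const char* n : names) {
    if (std::setlocale(LC_CTYPE, n) != nullptr) { have_locale = true; break; }
  }
  if (!have_locale) return false;
  // POSIX: a null destination asks only for the required length.
  const std::size_t len = std::mbstowcs(nullptr, utf8.c_str(), 0);
  if (len == static_cast<std::size_t>(-1)) return false;  // invalid sequence
  std::vector<wchar_t> buf(len + 1);
  std::mbstowcs(buf.data(), utf8.c_str(), len + 1);
  out.assign(buf.data(), len);
  return true;
}
```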
From: Gehua Y. <yan...@gm...> - 2009-09-11 15:49:42
Hi Ian,

Setting the locale on Windows translates to clicking through "Control Panel", "Regional Settings", and then finding the option for "Language for non-Unicode programs". Unfortunately, "UTF-8" is not an option there.

Another approach is to use the setlocale() function (http://msdn.microsoft.com/en-us/library/x99tb11d%28VS.71%29.aspx). First, the locale name argument does not accept "UTF-8" or "UTF8". Second, the page contains this comment: "LC_CTYPE The character-handling functions (except isdigit, isxdigit, mbstowcs, and mbtowc, which are unaffected)." mbstowcs and mbtowc happen to be exactly the string encoding conversion functions we are interested in.

In short, I believe UTF-8 is not a valid language option on Windows.

Gary
From: Ian S. <ian...@st...> - 2009-09-11 11:06:17
I'm not a big fan of your proposal. I much prefer the Unix approach of setting the locale to UTF-8, passing UTF-8 char* filenames, and letting the OS/CRT handle the conversion.

On the other hand, this "duplicate the API" approach (e.g. function(char*) alongside function(wchar_t*)) is exactly what the C and C++ standards bodies have chosen to implement. So we should probably do the same.

I'd suggest going ahead and committing your code. If you have time, it would be useful to duplicate the tests as well.

Ian.

Gehua Yang wrote:
> 4. So I decided to implement a "wide char extension" in vil and vul in my
> local VXL copy. These extension functions are available in the attached
> header files for anyone who is interested. The implementation is about 90%
> copy and paste from the original implementation.
>
> We can also make this extension optional by introducing a macro such as
> VXL_USE_WIN_WCHAR_EXTENSION
>
> While these extension functions work as intended, they do pose some burdens,
> in particular, code modification and code testing. In other words, any
> modification or test cases have to be repeated twice.
>
> 5. If we do not take the extension approach, but convert a filename
> *everywhere* that VXL calls the standard library, I feel it may be too
> intimidating a task.
From: Gehua Y. <yan...@gm...> - 2009-10-16 13:57:27
Hi Ian and others,

I have committed the changes to vul_file and vul_expand_path that provide overloaded functions taking a "wchar_t*" argument. Their compilation is governed by a CMake option, VXL_SUPPORT_WIN_UNICODE. This option is only available on Windows machines that pass the try-compile test for the "wchar_t" type.

I'll commit the changes to vil_open, vil_load, and vil_save in a few days.

I would like to thank Ian, Peter, Brad and the community for the ideas and comments.

As a side comment, I do not like Microsoft's approach. But we must have Unicode functionality on Windows for a project. That is all it is...

Best Regards,
Gehua Yang
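The overload pattern described above (narrow and wchar_t* versions of the same function, the wide one compiled conditionally) might look roughly like this. All names are hypothetical: example_is_readable is not a real vul_file function, and EXAMPLE_SUPPORT_WIN_UNICODE stands in for whatever preprocessor define the VXL_SUPPORT_WIN_UNICODE option actually produces. Routing both overloads through one template is one possible way to reduce the copy-and-paste burden Gehua mentioned; the wchar_t instantiation relies on MSVC's non-standard std::ifstream(const wchar_t*) constructor.

```cpp
#include <cassert>
#include <fstream>

// Shared body for both overloads; CharT is char, or wchar_t on MSVC only.
template <class CharT>
bool example_is_readable_impl(const CharT* path) {
  std::ifstream in(path);  // the wchar_t* constructor is an MSVC extension
  return in.is_open();
}

// Narrow overload: always available (UTF-8 on POSIX systems).
inline bool example_is_readable(const char* path) {
  return example_is_readable_impl(path);
}

#if defined(_MSC_VER) && defined(EXAMPLE_SUPPORT_WIN_UNICODE)
// Wide overload: UTF-16 path, compiled only when the option is enabled.
inline bool example_is_readable(const wchar_t* path) {
  return example_is_readable_impl(path);
}
#endif
```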