From: Tor L. <tm...@ik...> - 2007-08-21 14:43:18
tuitfun writes:

> it's been suggested in another list (thanks Tor Lillqvist)

You're welcome ;) I'll reply to this message on this list, too. But hopefully somebody else will also reply, maybe with different opinions.

> after some more searching, i've found tchar.h in mingw which can be used in
> both unicode & non-unicode systems. is tchar.h still the proper way to
> accomplish this?

I guess that depends on who you ask ;) My personal opinion is that if you come from a Unix background and intend to write portable code, perhaps code that needs to be read and understood by other Unixish programmers, you should ignore tchar.h and TCHAR, TEXT(), etc. Instead, just use UTF-8 internally and convert to/from wchar_t only at the lowest level, when passing filenames to (or receiving them from) the C library or the Win32 API.

> i am using ffmpeg libraries in this program. so i cannot modify all
> code to use tchar.h.

If you use some external library whose API takes only "narrow" char* filenames, you should pass it filenames in the system codepage (for instance CP1252 on Western European and US Windows machines).

For names of *existing* files that are not expressible in the system codepage (for instance names containing Hebrew characters on an English Windows machine), if the volume the file is on has short name (8.3) generation turned on (NTFS volumes have it on by default, AFAIK), you can call GetShortPathNameW() to get the ASCII-only 8.3 name of the file and pass that to the external library instead.

However, if you need such an external library to *create* a file with a name not expressible in the system codepage, you can't. You need to do something like have the library create the file under an ASCII-only (or system-codepage-only) temporary name first, and then rename the file afterwards.

> i need to call functions that expect regular char * arguments. so,
> before calling those functions, i need to convert the wchar_t * to
> utf-8 and pass as char *.

Umm, no.
You can't pass UTF-8 filenames to some random library that then passes those filenames on to the Win32 API or the Microsoft C library. The Microsoft C library and the Win32 API expect the system codepage in char* file names (and UTF-16 in wchar_t* ones). UTF-8 is useful on Windows mainly because of its general niceness and "modern" feel, because you can use the normal C string functions on UTF-8 strings, and because of commonality with most modern Unixes, which use UTF-8 locales. But in the Win32 API and the Microsoft C library, UTF-8 is not used.

> - unix doesn't have tchar.h, so i'll need a header file that maps
> _t* functions back to the normal ones. is there such a header file?

Maybe, but I haven't come across one.

> - i cant get mbstowcs() & wcstombs() to convert to utf-8. what am i doing
> wrong?

mbstowcs() and wcstombs() convert from/to the system codepage to/from UTF-16. UTF-8 is not the system codepage in any locale on Windows. The East Asian multi-byte codepages are double-byte ones (where characters are either one or two chars). There is a pseudo-codepage CP_UTF8 that can be used in some functions, but the system codepage is never UTF-8.

> - how to get argv[] in utf-8?

Use the functions GetCommandLineW() and CommandLineToArgvW() to get it in UTF-16, then convert the UTF-16 strings to UTF-8.

--tml