From: Tor L. <tm...@ik...> - 2007-08-21 14:43:18
tuitfun writes:

> it's been suggested in another list (thanks Tor Lillqvist)

You're welcome ;) I'll reply to this message on this list, too. But hopefully somebody else will also reply, maybe with different opinions.

> after some more searching, i've found tchar.h in mingw which can be used in
> both unicode & non-unicode systems. is tchar.h still the proper way to
> accomplish this?

I guess that depends on who you ask ;) My personal opinion is that if you come from a Unix background and intend to write portable code, perhaps code that needs to be read and understood by other Unixish programmers, you should ignore tchar.h and TCHAR, TEXT(), etc. Instead, just use UTF-8 internally and convert to/from wchar_t only at the lowest level, when passing filenames to (or receiving them from) the C library or the Win32 API.

> i am using ffmpeg libraries in this program. so i cannot modify all
> code to use tchar.h.

If you use some external library whose API takes only "narrow" char* filenames, you should pass it filenames in the system codepage (for instance CP1252 on Western European and US Windows machines).

For names of *existing* files that are not expressible in the system codepage (for instance names containing Hebrew characters on an English Windows machine), if the volume the file is on has short name (8.3) generation turned on (NTFS volumes have it on by default, AFAIK), you can call GetShortPathNameW() to get the ASCII-only 8.3 name of the file and pass that to the external library instead.

However, if you need such an external library to *create* a file with a name not expressible in the system codepage, you can't. You need to do something like have the library create the file under an ASCII-only (or system-codepage-only) temporary name first, and then rename the file afterwards.

> i need to call functions that expect regular char * arguments. so,
> before calling those functions, i need to convert the wchar_t * to
> utf-8 and pass as char *.

Umm, no.
You can't pass UTF-8 filenames to some random library that then passes those filenames on to the Win32 API or the Microsoft C library. The Microsoft C library and the Win32 API expect the system codepage in char* file names (and UTF-16 in wchar_t* ones). UTF-8 is useful on Windows mainly because of its general niceness and "modern" feel, because you can use the normal C string functions on UTF-8 strings, and because of commonality with most modern Unixes, which use UTF-8 locales. But in the Win32 API and the Microsoft C library, UTF-8 is not used.

> - unix doesn't have tchar.h, so i'll need a header file that maps
> _t* functions back to the normal ones. is there such a header file?

Maybe, but I haven't come across one.

> - i cant get mbstowcs() & wcstombs() to convert to utf-8. what am i doing
> wrong?

mbstowcs() and wcstombs() convert from/to the system codepage to/from UTF-16. UTF-8 is not the system codepage in any locale on Windows. The East Asian multi-byte codepages are double-byte ones (where characters are either one or two chars). There is a pseudo-codepage CP_UTF8 that can be used in some functions, but the system codepage is never UTF-8.

> - how to get argv[] in utf-8?

Use the functions GetCommandLineW() and CommandLineToArgvW() to get it in UTF-16, then convert the UTF-16 strings to UTF-8.

--tml