> These functions have the same names as their normal "narrow char"
> counterparts, but prefixed with _w. For instance _wfopen(),
> _wreaddir(), _wstat(). As for C++ stuff like the ifstream you mention
> I have no personal experience, but a quick glance does show me that
> there is something called wifstream (etc, check the <iosfwd> header).
> I am not interested enough in C++ to bother finding out whether the
> "wideness" of these classes relate to just the data being written to /
> read from them or also the names of files, though. (It does seem so,
> unfortunately, so in that case you probably need to use a mix of C and
> C++ to use wide character file names in C++.)
Thank you for these details, though I already knew most of them; I was just hoping that I had missed something, because what happens doesn't make sense to me. As far as I know, Microsoft has been trying to move coders from 8-bit local codepages to Unicode since at least Windows 2000 (by using CreateFileW() or _wfopen() rather than CreateFileA() or fopen()), but it has to provide the "A" functions for "legacy" applications. (Well, I'm aware that many new programs just use the extended ASCII charset and don't care about Unicode, but they weren't supposed to.)
OTOH MinGW never had the "legacy" issue, so it could have gone UTF8 from the beginning. To me it makes a lot more sense to do it this way, because it both removes inconsistent behavior and makes it easier to write programs that work in various countries, on Linux and on Windows. Since the conversion between UTF8 and UTF16 is trivial, calling the 16-bit functions instead of the 8-bit ones doesn't seem like a big deal.
So my idea of how fopen() should work in MinGW is this: it should allocate a temporary buffer, run a UTF8-to-UTF16 conversion into it, and then call Microsoft's _wfopen() (or just pass "ccs=UTF-8" in the mode string of newer versions of MS's fopen()). This would make writing programs that are portable between Windows and Linux significantly easier.
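To make the idea concrete, here is a minimal sketch of such a wrapper. The name fopen_utf8() is my own invention, not an existing MinGW function; on Windows it would use the Win32 MultiByteToWideChar() conversion and Microsoft's _wfopen(), while on Unix it just passes the bytes through to fopen():

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
// Sketch: convert a UTF-8 path and mode to UTF-16, then call _wfopen().
std::FILE *fopen_utf8(const char *path, const char *mode)
{
    // First call with a NULL buffer returns the required length
    // (including the terminating null, because we pass -1).
    int plen = MultiByteToWideChar(CP_UTF8, 0, path, -1, NULL, 0);
    int mlen = MultiByteToWideChar(CP_UTF8, 0, mode, -1, NULL, 0);
    if (plen == 0 || mlen == 0)
        return NULL;  // invalid UTF-8 input
    std::wstring wpath(plen, L'\0'), wmode(mlen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, path, -1, &wpath[0], plen);
    MultiByteToWideChar(CP_UTF8, 0, mode, -1, &wmode[0], mlen);
    return _wfopen(wpath.c_str(), wmode.c_str());
}
#else
// On Unix the kernel takes file names as raw bytes, so plain fopen()
// already does the right thing with UTF-8 names.
std::FILE *fopen_utf8(const char *path, const char *mode)
{
    return std::fopen(path, mode);
}
#endif
```

With something like this, portable code could call fopen_utf8() everywhere and get Unicode file names on Windows without any #ifdefs at the call sites.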
I realize that there could be some issues with this approach, but all I could think of were: some performance penalty (insignificant in most cases), the possibility of introducing bugs (since the code would be more complicated), passing UTF-8 names to non-MinGW DLLs that expect local-codepage names, and the human-resource requirement that somebody would actually have to do it. However, none of these issues seems big enough to me to justify not supporting Unicode names. OTOH I obviously don't know enough about MinGW's history, and quite likely there are issues with my approach that I don't see yet.
> The way file names are handled is a fundamental difference between
> Windows and Unix. On Windows, file names are UTF-16. On Unix, file
> names are arbitrary sequences of bytes ("char" in C). So if you want
Yes, that's true in theory, but there are some practical considerations: as far as I know, Linux uses UTF8 by default on its partitions. Sure, in your particular program you can use Latin-1 or something else to create and read files, and it will work just fine, except that some characters will not look right when running "ls". However, programs don't generally force their own encoding, so for practical purposes the names are UTF8.
This brings me back to this question: I have a program that processes files on Linux and I want to make it work on Windows; so what do I do? I'm not willing to pepper my code with #ifdefs, and I HAVE to use a C++ class to handle the files anyway. The answer is that I'm going to get rid of ofstream, readdir() and the rest, and use QFile and QDir from Qt instead, since this is a Qt program anyway.

I should also keep in mind that ofstream cannot be used in programs that hope to achieve some portability. That's a bit sad, given that ofstream is supposed to be THE portable class to use in C++ to write to files. Note that on Linux one can use ofstream even if the file names are not UTF8, but with MinGW's ofstream it's just not possible to create files whose names contain some foreign characters, and I guess this is my main issue. (The "w" in wifstream is about text files that contain wide, wchar_t characters; to open one of them you still pass a char* file name.) At least MS's ofstream has a wchar_t* constructor, so you can take care of portability by calling a macro that converts UTF8 to UTF16 on Windows and does nothing on Linux.
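That last workaround could look roughly like this. It is only a sketch: the helper name to_native() is made up, and the Windows branch assumes MSVC's nonstandard ofstream(const wchar_t*) constructor, which MinGW's libstdc++ does not provide:

```cpp
#include <fstream>
#include <string>

#if defined(_WIN32)
#include <windows.h>
// Sketch: convert UTF-8 to the UTF-16 string expected by MSVC's
// nonstandard wchar_t* ofstream constructor. (to_native is a
// made-up helper name, not an existing API.)
std::wstring to_native(const std::string &utf8)
{
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, NULL, 0);
    std::wstring w(n > 0 ? n : 0, L'\0');
    if (n > 0) {
        MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, &w[0], n);
        w.resize(n - 1);  // drop the converted terminating null
    }
    return w;
}
#else
// On Linux file names are plain bytes, so there is nothing to convert.
inline std::string to_native(const std::string &utf8) { return utf8; }
#endif
```

Usage would then be uniform on both platforms, e.g. `std::ofstream out(to_native("caf\xC3\xA9.txt").c_str());` — though again, this helps only with MSVC's library, not with MinGW's ofstream.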
Now I don't want to seem too ungrateful. I appreciate the effort that was put in to get MinGW where it is today, and I'm thankful for that, but, from my limited point of view, some things could have been done better.