Re: [Tapioca-devel] Archive format

On Sun, 22 Jul 2001, Eric Lee Green wrote:

> Re: Internationalization: Will think about that. You're right, we need to
> do something there. Agree about the feebleness of the C++ String class,
> even the Java String class is better (at least it can represent all known
> international character sets!). More after I've thunk on it :-).

C++ already handles wide characters.  string is really a typedef of
basic_string<char>, and wstring is a typedef of basic_string<wchar_t>.
All we really need there is a couple of simple macros and a typedef:

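#include <string>

#ifdef UNICODE
#define TCHAR wchar_t
#define _T(x) L##x    /* wide literal: _T("abc") becomes L"abc" */
#else
#define TCHAR char
#define _T(x) x       /* narrow literal, left unchanged */
#endif

typedef std::basic_string<TCHAR> tstring;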

Then, we just use tstring in all places we would use string, and wrap all
of our string constants in the _T macro.  This way, we can switch between
narrow and wide characters just by defining UNICODE.  There are probably a
few other things to handle this way, like input and output streams, etc.
But they are fairly easy to deal with.  FYI, this is (sort of) what Windows
programs do to maintain source portability between NT/2000, which support
Unicode throughout, and 95/98, which have very limited Unicode support.
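
To illustrate how the stream side might be handled the same way, here is
a minimal sketch.  The names tostringstream and make_greeting are my own
placeholders, not anything that exists in Tapioca, and the macros are
repeated only so the snippet compiles on its own:

```cpp
#include <sstream>
#include <string>

#ifdef UNICODE
#define TCHAR wchar_t
#define _T(x) L##x
#else
#define TCHAR char
#define _T(x) x
#endif

typedef std::basic_string<TCHAR> tstring;
// Hypothetical companion typedef: a string stream over the same TCHAR.
typedef std::basic_ostringstream<TCHAR> tostringstream;

// Hypothetical helper: builds a message using only TCHAR-based types,
// so it compiles unchanged with or without UNICODE defined.
tstring make_greeting(const tstring& name) {
    tostringstream out;
    out << _T("Hello, ") << name << _T("!");
    return out.str();
}
```

The same source then builds as a narrow or wide version depending only on
whether UNICODE is defined at compile time.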

The trick is we need an enhanced basic_string class that can handle the
useful operations like parameter substitution and search-and-replace.  And
it, or a super-class, should also be able to handle the transparent
language translation.
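
As a rough idea of what such an operation could look like, here is a
sketch of search-and-replace written as a free function template over
basic_string, so it works for both char and wchar_t.  The name
replace_all is an assumption for illustration only, and simple parameter
substitution is shown as replacing a "%1" style placeholder, which is
just one possible convention:

```cpp
#include <string>

// Hypothetical helper: returns a copy of `text` with every occurrence
// of `from` replaced by `to`. Works for any character type.
template <typename CharT>
std::basic_string<CharT> replace_all(std::basic_string<CharT> text,
                                     const std::basic_string<CharT>& from,
                                     const std::basic_string<CharT>& to) {
    // An empty search string would match everywhere; return unchanged.
    if (from.empty()) return text;

    typename std::basic_string<CharT>::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::basic_string<CharT>::npos) {
        text.replace(pos, from.length(), to);
        // Skip past the inserted text so it is never rescanned.
        pos += to.length();
    }
    return text;
}
```

All three arguments have to be the same basic_string type, so string
literals need to be wrapped explicitly, e.g.
replace_all(tstring(_T("%1 of %1")), tstring(_T("%1")), tstring(_T("tape"))).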

What I don't know is if Java can handle ANSI strings passed to it from C++
code.  Any comments?

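BTW, there are only a couple of problems with wide-character strings.
The most important is that tstring.length() returns the number of
characters, not the number of bytes (and sizeof(tstring.data()) is no
help, since it only gives the size of a pointer).  If you want to know the
actual byte length of a string for an IO operation, you have to compute
tstring.length() * sizeof(TCHAR).  Note also that wchar_t is 16 bits on
Windows but typically 32 bits on Linux.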
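
The byte-length calculation could be wrapped in a small helper so nobody
has to remember it at each IO call.  This is only a sketch, and
byte_length is a made-up name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes occupied by the characters of
// `s` (excluding any terminator), suitable for a raw read/write size.
template <typename CharT>
std::size_t byte_length(const std::basic_string<CharT>& s) {
    return s.length() * sizeof(CharT);
}
```

For a narrow string this equals length(); for a wide string it is
length() times whatever sizeof(wchar_t) happens to be on the platform.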

-- 
Richard Fish, Unix/Linux Software Engineer, rj...@fi...