I am a newbie to C++. I want to use Unicode in my app, but I don't know how. Any examples,
please? I want standard C++, not Windows-specific code.
Thanks!
My English is not very good. I am sorry.
Anonymous
-
2004-05-07
There is no standard for Unicode in C++. Support for Unicode is OS specific.
There IS a "standard for Unicode in C++"!
But the libg++ implementation lacks support for it :(
A Unicode character is wchar_t.
A Unicode string is wstring.
For writing Unicode to standard output, use wcout.
Etc., etc.
There are Unicode functions in C too: wcslen, wcscpy, wprintf, etc.
Adrian
MinGW uses MSVCRT.DLL (the Microsoft C runtime library), which does support the wide character library (include <wchar.h>). See wchar_t: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang98/html/WCHAR.asp
Windows also supports Multi-byte Character Sets (MBCS), which are also not Unicode.
While MinGW uses Microsoft's C run-time, it uses the GNU C++ library, so you are likely to be restricted to the C library for Unicode support.
Clifford.
Unicode and wide character are not the same thing!!
Wide character can have more than bytes. You can say Unicode is wide character but you can't say wide character is unicode!
Anonymous
-
2004-05-10
I think I said that (more or less). The msvcrt library implements wide characters using Unicode. This is not to say that all other development tools or platforms will also do so. Caution needs to be applied!
Clifford
There are several encodings under the name Unicode.
There is UTF-8, UTF-16, UTF-32 and others.
UTF-8 is a multibyte encoding, which means the number of bytes for one character is not always the same.
All characters present in ASCII take only 1 byte in UTF-8 and are identical to their ASCII values, while other characters take 2 to 4 bytes.
UTF-16 is a wide-character encoding: every character in the Basic Multilingual Plane takes exactly 2 bytes, and characters outside it take 4 bytes (a surrogate pair). That makes mostly-ASCII files bigger, but a lot of things are easier (you'll know why when you try to work with UTF-8).
The Windows API natively works with UTF-16, uses it internally and calls it UNICODE as if there were nothing else.
To store a UTF-8 string you can use "char" as usual; to store UTF-16 on Windows you use "wchar_t" (which is 16 bits wide there; on other platforms it is often 32 bits).
Most functions like strcpy/wcscpy, strcmp/wcscmp and so on do not care what encoding is used inside char/wchar_t; they don't know and they don't have to.
For most Windows API functions there are 2 versions: one that handles narrow "ANSI" (code page) strings (they end in A, like MessageBoxA) and one that handles UTF-16 (they end in W, like MessageBoxW).
The functions you usually use (MessageBox) are nothing more than an alias for one of the functions mentioned above. By default they point to the A version, so if you call MessageBox in your program, it will interpret strings as narrow strings and won't work with wide-character strings.
To use Unicode, you have to define "UNICODE" (-DUNICODE as a compiler option).
This makes all Windows headers use the Unicode versions of the functions.
Anonymous
-
2004-05-10
Such an eloquent explanation, such a shame you chose to post anonymously and without signing.
Thanks anyway.
Clifford
Steps to use Unicode safely? Are there any?
I read on MSDN that:
By defining UNICODE, the W versions of the Windows functions are the ones that get called.
TCHAR variables and other Windows types with a T in their names should be preferred, to allow an easy switch between the ANSI and Unicode types.
TEXT("a literal") in Windows also converts the literal to Unicode when necessary.
For C, it seems we can do nearly the same with:
#define _UNICODE
which makes the generic types and function names defined in TCHAR.H resolve either to the ANSI names we are used to or to their wide-character counterparts, depending on whether _UNICODE is defined.
Again, it seems Windows has followed the same idea as the C runtime, since there is also a _TEXT() or _T() macro that transforms literals the same way Windows does with its TEXT().
Again, C also has a TCHAR evaluating to either char or wchar_t.
OK, having come this far, I may well have typed or understood something wrong, and I highly advise anyone reading this to check all of it against the MSDN pages and the <tchar.h> header file.
My big question now is:
What about C++?
I reckon it will benefit from all these things C has.
But what about the streams? According to a book I've been reading, they do handle wide chars: stringstream is a typedef (typedef basic_stringstream<char> stringstream), and since this is done for all the stream classes, one can easily get the wide versions with, for instance, stringstream<wchar_t>, or through the respective typedef, wstringstream in this case.
The thing is,
it doesn't look that neat to use stringstream<TCHAR> every time we need a string stream. While I hope it will work fine most of the time, the book I've been reading even suggests that when you define your own inserters, you'd better do it in template form.
So I was just wondering: is that the way to go? Am I missing something?
In fact, I hadn't even noticed until I wrote it here that the form isn't so ugly: stringstream<TCHAR>. I kinda like it already :P But I'm still wondering whether this is the way to go or whether it is flawed in some way, and I'm even starting to wonder if it is worth the effort to make code Unicode-friendly :P
The strongest argument in favor of it, to me, is that as the libraries grow thicker and wilder, it may become a herculean job to add later if it wasn't done in the earlier stages of development.
I'm just a newbie here trying to find out what I got right and what I got wrong. Sorry for any misinformation, which is very likely, but I hope you are able to check, or to ask if I typed something wrong :)
I wonder if I can use all of these things in the Dev-C++ environment.
Thanks to all for your attention and patience :)
Sorry, mistake detected.
Where I wrote stringstream<TCHAR>, you should read basic_stringstream<TCHAR>.
Hmm, it is beginning to look ugly again :P
Anonymous
-
2007-08-10
stringstream is defined in terms of basic_stringstream<>; you can hide the 'ugliness' by creating your own derived class, perhaps tstringstream.
BTW, digging up and tagging on to a three-year-old thread on a vaguely related subject is not a good idea. Yours is a new question; start a new thread. The danger is that someone might start answering the original question!
Clifford
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/HTML/_core_unicode_programming_tasks.asp
Clifford
however, the standard calls them "wide characters" instead of "unicode"
AFAIK, they're the same thing (only the encoding is not specified)
Adrian
Maybe I stand corrected, but I am not certain that wide characters and Unicode are necessarily the same thing. However, in Windows they are. See, for example, the MSDN page for printf/wprintf: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_printf.2c_.wprintf.asp
Thank you, Clifford! Good guy.
thanks for the hint. I hadn't noticed it was that old till I had posted it :P
I have a question that will follow up in a new thread though.
Just use wxString. Unicode on Windows is busted to shit.
Start here:
http://www.ubuntu.com/download
Kip