
How to use Unicode?

2004-05-07
2012-09-26
  • Nobody/Anonymous

    I am a newbie to C++. I want to use Unicode in my app, but I don't know how. Any examples,
    please? I want standard C++, not Windows-specific code.
    Thanks!
    My English is not very good. I am sorry.

     
    • Anonymous

      Anonymous - 2004-05-07

      There is no standard for Unicode in C++. Support for Unicode is OS specific.

      http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/HTML/_core_unicode_programming_tasks.asp

      Clifford

       
    • aditsu

      aditsu - 2004-05-07

      There IS a "standard for Unicode in C++"!
      But the GNU C++ library (libstdc++) implementation lacks support for it :(
      A Unicode character is wchar_t.
      A Unicode string is wstring.
      For writing Unicode to standard output, use wcout.
      Etc. etc.
      There are Unicode functions in C too: wcslen, wcscpy, wprintf etc.
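A minimal sketch of the pieces listed above, assuming a toolchain whose C++ library actually ships the wide-character classes. The helper names `wide_length` and `make_greeting` are made up for illustration, and nothing here fixes which encoding the wchar_t values use.

```cpp
#include <cstddef>  // std::size_t
#include <cwchar>   // std::wcslen
#include <string>   // std::wstring

// Hypothetical helper: length of a C-style wide string, counted in wchar_t units.
std::size_t wide_length(const wchar_t* s) {
    return std::wcslen(s);
}

// Hypothetical helper: build a std::wstring from wide literals (note the L prefix).
std::wstring make_greeting(const std::wstring& name) {
    return L"Hello, " + name + L"!";
}
```

With <iostream> included, the result could be printed with `std::wcout << make_greeting(L"world") << L'\n';`, provided the library supports wcout.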

      Adrian

       
    • aditsu

      aditsu - 2004-05-07

      However, the standard calls them "wide characters" instead of "Unicode".
      AFAIK, they're the same thing (only the encoding is not specified).

      Adrian

       
    • Anonymous

      Anonymous - 2004-05-07

      Maybe I stand corrected, but I am not certain that wide characters and Unicode are necessarily the same thing. However, in Windows they are. See for example the MSDN page for printf/wprintf: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_printf.2c_.wprintf.asp

      MinGW uses MSVCRT.DLL (the Microsoft C runtime library), which does have support for the wide character library (include <wchar.h>). For wchar_t, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang98/html/WCHAR.asp.

      Windows also supports Multi-byte Character Sets (MBCS), which are also not Unicode.

      While MinGW uses Microsoft's C run-time, it uses the GNU C++ library, so you are likely to be restricted to the C library for Unicode support.

      Clifford.

       
    • Nobody/Anonymous

      Thank you, Clifford! Good guy.

       
    • Nobody/Anonymous

      Unicode and wide characters are not the same thing!!

      A wide character can have more than two bytes. You can say Unicode is a wide character, but you can't say a wide character is Unicode!

       
      • Anonymous

        Anonymous - 2004-05-10

        I think I said that (more or less). The msvcrt library implements wide characters using Unicode. This is not to say that all other development tools or platforms will also do so. Caution needs to be applied!

        Clifford

         
    • Nobody/Anonymous

      There are several encodings under the name Unicode.
      There is UTF-8, UTF-16, UTF-32 and others.
      UTF-8 is a multibyte encoding, which means the number of bytes for one character is not always the same.
      All characters present in ASCII take only 1 byte in UTF-8 and are identical to their ASCII values, while other characters take 2 to 4 bytes.
      UTF-16 is a wide-character encoding: most characters take 2 bytes (characters outside the Basic Multilingual Plane take 4, as a surrogate pair), which makes mostly-ASCII files bigger but makes a lot of things easier (you'll know why when you try to work with UTF-8).
      The wide-character Windows API only takes UTF-16; Windows uses it internally and calls it UNICODE as if there was nothing else.
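To make the variable-length idea concrete: UTF-8 uses one to four bytes per code point, depending on its value. The function below is only a sketch of that scheme, with the hypothetical name `encode_utf8`; it is not taken from any library.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point (U+0000..U+10FFFF) as UTF-8.
// Returns an empty string for surrogates and out-of-range values.
std::string encode_utf8(unsigned long cp) {
    std::string out;
    if (cp < 0x80) {                       // ASCII range: 1 byte, same value as ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {           // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, `encode_utf8(0x41)` gives the single byte 'A', while `encode_utf8(0x20AC)` (the euro sign) gives three bytes.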

      To store a UTF-8 string you can use "char" as you are used to; to store UTF-16 on Windows (where wchar_t is 16 bits wide) you have to use "wchar_t".
      Most functions like strcpy/wcscpy, strcmp/wcscmp and so on do not care what encoding is used inside char/wchar_t; they don't know and they don't have to.
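One consequence of the functions not knowing the encoding: strlen on a UTF-8 string counts bytes, not characters. The sketch below, with the hypothetical helper `utf8_code_points`, counts code points instead and assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::strlen

// Hypothetical helper: count code points in a well-formed UTF-8 string
// by skipping continuation bytes, which always have the form 10xxxxxx.
std::size_t utf8_code_points(const char* s) {
    std::size_t n = 0;
    for (; *s; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++n;  // a lead byte or an ASCII byte starts a new code point
        }
    }
    return n;
}
```

For the string "na\xC3\xAF" "ve" (the word "naive" with a diaeresis on the i), std::strlen reports 6 bytes while utf8_code_points reports 5 characters.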

      For most Windows API functions there are 2 versions: one that handles ANSI (8-bit code page) strings, ending in A like MessageBoxA, and one that handles UTF-16, ending in W like MessageBoxW.
      The functions you usually use (MessageBox) are nothing more than an alias for one of those functions mentioned above. By default they point to the A version, so if you call MessageBox in your program, it will interpret strings as 8-bit strings and won't work with wide character strings.
      To use Unicode, you have to define "UNICODE" (-DUNICODE as a compiler option).
      This will make all Windows headers use the Unicode version of the functions.
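The aliasing described above can be imitated without the Windows headers. The sketch below is not Windows code: `describe_a`, `describe_w`, `describe`, `MY_TEXT` and `which_variant` are hypothetical stand-ins that only mimic how a generic name and a text macro are switched by UNICODE.

```cpp
#include <string>

// Hypothetical stand-ins for an API's narrow ("A") and wide ("W") entry points.
// They only report which variant was called.
inline std::string describe_a(const char*)    { return "A"; }
inline std::string describe_w(const wchar_t*) { return "W"; }

// One generic name and one literal macro, both selected by whether
// UNICODE is defined (for example with -DUNICODE on the command line).
#ifdef UNICODE
  #define describe   describe_w
  #define MY_TEXT(s) L##s
#else
  #define describe   describe_a
  #define MY_TEXT(s) s
#endif

// The calling code is written once and follows whichever variant is active.
inline std::string which_variant() {
    return describe(MY_TEXT("hello"));
}
```

With the real headers the equivalent call would look like `MessageBox(NULL, TEXT("text"), TEXT("caption"), MB_OK);`, which resolves to MessageBoxW when UNICODE is defined and to MessageBoxA otherwise.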

       
      • Anonymous

        Anonymous - 2004-05-10

        Such an eloquent explanation, such a shame you chose to post anonymously and without signing.

        Thanks anyway.

        Clifford

         
        • aventura_alex

          aventura_alex - 2007-08-10

          Are there any steps to use Unicode safely?
          I read on MSDN that:
          Defining UNICODE makes the Windows functions resolve to the proper versions, the ones with the W suffix.
          TCHAR variables, and other Windows types with a T in their names, should be preferred because they allow an easy switch between the ANSI and Unicode types.
          TEXT("a literal") likewise converts the literal to Unicode on Windows when necessary.

          For C (Microsoft's C runtime, that is), it seems we can do nearly the same with:

          #define _UNICODE

          which makes the generic types and function names defined in TCHAR.H evaluate to either the ANSI or the Unicode names we are used to, depending on whether _UNICODE is defined.
          Again, it seems Windows has been following the same idea as C, since there is also a _TEXT() or _T() macro that transforms literals the same way Windows does with its TEXT().
          C also has a TCHAR that evaluates to either char or wchar_t.
          Having come this far, I have probably typed or understood something wrong, so I strongly advise anyone reading this to check all of it against the MSDN pages and the <tchar.h> header file.
          My big question now is:
          what about C++?
          I reckon it will benefit from all these things C has.
          But what about the streams? According to a book I've been reading, they do handle wide chars: stringstream is just a typedef (typedef basic_stringstream<char> stringstream), and the same is done for all the stream classes, so one can easily reach the wide versions with, for instance, stringstream<wchar_t>, or through the respective typedefs, wstringstream in this case.
          The thing is,
          it doesn't look that neat to write stringstream<TCHAR> every time we need a string stream. While I hope it will work fine most of the time, the book even suggests that when you define your own inserters you had better write them in the template form.
          So I was just wondering: is that the way to go? Am I missing something?
          In fact, I hadn't even noticed until I wrote it here that stringstream<TCHAR> is not such an ugly form. I kinda like it already :P But I'm still wondering whether this is the way to go or whether it is flawed in some way, and I'm even starting to wonder if it is worth the effort to make code UNICODE-friendly :P
          The strongest argument in favor of it, to me, is that as the libraries grow thicker and wilder, it may become a herculean job if it wasn't done in the earlier stages of development.
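For the "template form" of an inserter mentioned above, a minimal sketch might look like this. The `Point` type and the two helper functions are hypothetical; the idea is only that one operator<< template serves both narrow and wide streams.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical example type.
struct Point { int x; int y; };

// Inserter written once for every character type, instead of one version
// for std::ostream and another for std::wostream.
template <typename CharT, typename Traits>
std::basic_ostream<CharT, Traits>&
operator<<(std::basic_ostream<CharT, Traits>& os, const Point& p) {
    // widen() converts a narrow char to the stream's own character type.
    return os << os.widen('(') << p.x << os.widen(',') << p.y << os.widen(')');
}

// Hypothetical helpers showing the same inserter used with both stream kinds.
std::string point_to_string(const Point& p) {
    std::ostringstream ss;
    ss << p;
    return ss.str();
}

std::wstring point_to_wstring(const Point& p) {
    std::wostringstream ss;
    ss << p;
    return ss.str();
}
```

The same operator would also work with a stream of TCHAR, since that is just one of the two character types.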

          I'm just a newbie here trying to find out what I got right and what I got wrong. Sorry for any misinformation, which is very likely, but I hope you are able to check it or ask if I typed something wrong :)

          I wonder if I can use all of these things in the Dev-C++ environment.
          Thanks to all for your attention and patience :)

           
          • aventura_alex

            aventura_alex - 2007-08-10

            Sorry, mistake detected.
            Where I wrote stringstream<TCHAR>, you should read basic_stringstream<TCHAR>.
            Hmm, it is beginning to look ugly again :P

             
            • Anonymous

              Anonymous - 2007-08-10

              stringstream is defined in terms of basic_stringstream<>; you can hide the 'ugliness' by creating your own derived class, perhaps tstringstream.
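A sketch of that idea, using a typedef rather than a derived class because it is the lighter option. So that it builds without <tchar.h>, the hypothetical `tchar_t` stands in for TCHAR; `tstring`, `tstringstream` and `int_to_tstring` are likewise made-up names.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for TCHAR from <tchar.h>, switched by _UNICODE.
#ifdef _UNICODE
typedef wchar_t tchar_t;
#else
typedef char tchar_t;
#endif

// Generic string and string-stream names that follow the character type.
typedef std::basic_string<tchar_t>       tstring;
typedef std::basic_stringstream<tchar_t> tstringstream;

// Format an int through the generic stream; works for either character type.
tstring int_to_tstring(int value) {
    tstringstream ss;
    ss << value;
    return ss.str();
}
```

On Windows, the stand-in could be replaced by including <tchar.h> and using TCHAR directly.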

              BTW digging up and tagging on to a three year old thread on a vaguely related subject is not a good idea. Yours is a new question, start a new thread. The danger is that someone might start answering the original question!

              Clifford

               
    • aventura_alex

      aventura_alex - 2007-08-10

      thanks for the hint. I hadn't noticed it was that old till I had posted it :P
      I have a question that will follow up in a new thread though.

       
    • Kip

      Kip - 2007-08-13

      Just use wxString. Unicode on Windows is busted to shit.

      Start here:
      http://www.ubuntu.com/download

      Kip

       
