From: Ruslan N. <syl...@nm...> - 2004-07-30 23:44:16
Again??? Please, no. I would rather not discuss this problem now. But if you want to hear real arguments, read this:

NOTE: There are two variants of UTF-16. I prefer the variant that always uses 2 bytes per character (covering, of course, only Basic Multilingual Plane characters; this fixed-width form is also known as UCS-2). All my arguments apply only to this variant (fixed length, 2 bytes).

Arguments:

1. Performance. UTF-16 is much faster than UTF-8. A 32-bit processor handles a whole 32-bit number at a time, so processing a 1-byte, 2-byte, or 4-byte value takes the same time. With UTF-8 the processor handles only 1 byte at a time, plus the time needed to decode it into the actual Unicode code point.

2. Random access. With UTF-16 you can access any character directly. With UTF-8 you have to compute the character's position every time by scanning the string, which is too slow.

3. UTF-8 is only a kludge for backward compatibility with ASCII.

4. UTF-16 spends no space on recovering damaged text (even so, only a limited number of characters will be lost), while UTF-8 spends extra bits in every byte for this purpose.

5. Copying strings is better with UTF-16, because you can copy 2 bytes at a time (glibc provides such functions specifically for wchar_t). Of course, copying English text takes the same time as with UTF-8, but for other languages UTF-16 is better.

6. Searching text is also better with UTF-16, for similar reasons: it handles 2 bytes at a time.

I don't have time to list all the arguments in favor of UTF-16. As I said, UTF-8 is only a kludge and nothing more.

Big-endian versus little-endian byte order is not a big problem for internal OS use. Of course, UTF-8 is endian-independent, but handling it takes too much time. You can use UTF-16 big-endian on an x86 computer and still handle it faster than UTF-8. By the way, you can also (virtually) reassign characters, treating 0x0020 as 0x2000 and so on; in that case UTF-16BE works just as well as UTF-16LE on an x86 computer.

PLEASE STOP DISCUSSING IT!!!
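Arguments 1 and 2 (performance and random access) can be made concrete with a small C sketch. It assumes BMP-only text held in the fixed-width 2-byte form described above, and well-formed UTF-8 for the comparison; `ucs2_at`, `utf8_seq_len`, and `utf8_at` are hypothetical names written for illustration, not functions from glibc or any other library. Looking up the n-th character in the fixed-width buffer is a single array access, while the UTF-8 version has to step over every preceding sequence and then decode the bytes it lands on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: n-th character of a fixed-width 2-byte (UCS-2)
   string of len units. Constant time: the character sits at byte offset n * 2. */
static uint16_t ucs2_at(const uint16_t *s, size_t len, size_t n)
{
    assert(n < len);
    return s[n];
}

/* Length in bytes of the UTF-8 sequence starting with lead byte b.
   Assumes b is a valid lead byte, not a 10xxxxxx continuation byte. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;   /* 0xxxxxxx */
    if (b < 0xE0) return 2;   /* 110xxxxx */
    if (b < 0xF0) return 3;   /* 1110xxxx */
    return 4;                 /* 11110xxx */
}

/* Hypothetical helper: n-th code point of a UTF-8 string of len bytes.
   Linear time: every preceding sequence must be stepped over first.
   Assumes well-formed UTF-8; no validation is done. */
static uint32_t utf8_at(const unsigned char *s, size_t len, size_t n)
{
    size_t i = 0;
    while (n > 0) {                 /* skip n code points */
        assert(i < len);
        i += utf8_seq_len(s[i]);
        n--;
    }
    assert(i < len);
    const unsigned char *p = s + i;
    switch (utf8_seq_len(p[0])) {   /* decode the sequence found there */
    case 1:
        return p[0];
    case 2:
        return ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    case 3:
        return ((uint32_t)(p[0] & 0x0F) << 12) | ((uint32_t)(p[1] & 0x3F) << 6)
             | (p[2] & 0x3F);
    default:
        return ((uint32_t)(p[0] & 0x07) << 18) | ((uint32_t)(p[1] & 0x3F) << 12)
             | ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
    }
}
```

The constant-time lookup relies on the 2-byte restriction stated in the NOTE: once characters outside the BMP are allowed, UTF-16 uses surrogate pairs and a similar scan becomes necessary.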
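The byte-order remarks can be sketched the same way. This assumes text is kept in UTF-16BE in memory on a little-endian host such as x86; `swap16`, `utf16be_unit`, and `native_unit` are hypothetical helpers written for illustration. Converting a big-endian code unit costs one fixed-size byte swap, and loading big-endian bytes with a plain native 16-bit load on x86 turns a space (bytes 00 20) into 0x2000, which appears to be what the "virtual reassigning" of 0x0020 as 0x2000 refers to.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Swap the two bytes of one 16-bit code unit. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Read code unit i from a UTF-16BE byte buffer, independent of host byte order. */
static uint16_t utf16be_unit(const unsigned char *buf, size_t i)
{
    return (uint16_t)((buf[2 * i] << 8) | buf[2 * i + 1]);
}

/* Load code unit i with a plain native 16-bit load (memcpy avoids alignment
   problems). On a little-endian host such as x86, the UTF-16BE bytes 00 20
   (a space) load as 0x2000, i.e. the byte-swapped value. */
static uint16_t native_unit(const unsigned char *buf, size_t i)
{
    uint16_t v;
    memcpy(&v, buf + 2 * i, sizeof v);
    return v;
}
```

One reading of the reassignment idea is that code comparing against constants could use the swapped values (0x2000 for a space) and skip the conversion entirely; that interpretation is an assumption, not something spelled out in the post.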