Re: [Modeling-users] type('') and unicode
From: Yannick G. <ygi...@yg...> - 2003-07-16 22:05:27
On Wednesday 16 July 2003 17:46, you wrote:
> Before I appear too stupid when trying things: UTF-8 is just a
> specific way of encoding unicode in eight bits, right, i.e. one
> character is translated to a series of one to many characters, right?
> It has nothing to do w/ ISO-8859-1, ISO-8859-15 etc. which are just
> correspondence tables between a number coded in one byte and a given
> character, right?

Yup, the ISO-8859-X encodings are one-byte encodings covering a limited
subset of characters, whereas UTF-8 encodes each Unicode char on one or
more 8-bit bytes (hence the 8 in its name).

A nice property of UTF-8 is that the first 128 chars (codes 0 to 127)
are encoded on a single byte and keep the same char code as their ASCII
equivalents. So a pure ASCII string (no ISO-8859-X here) is 100% valid
UTF-8. UTF-8 is not the most efficient encoding for Unicode, but this
particular compatibility makes it the most widespread.

UTF-8 is NOT compatible with Latin-1 (aka ISO-8859-1). Latin-1 chars
with a char code over 127 are encoded on 2 bytes in UTF-8.

Why UTF-8 and not pure Unicode? Because everything is damn too buggy to
handle it! The C++ coder will in fact want to die converting all those
std::string declarations, and the C coder simply can't use pointer
arithmetic anymore (the width is not fixed).

Luckily we use Python! :D

-- 
Yannick Gingras
Coder for OBB : Offside Bumptious Bastnaesite
http://OpenBeatBox.org
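
The points in the message above can be checked at the Python prompt. This is only a rough sketch: it assumes Python 3, where str is Unicode and encode() returns bytes, and the sample strings are made up for illustration rather than taken from the thread.

```python
# Illustrative sketch (assumes Python 3); sample strings are hypothetical.
ascii_text = "Modeling"
accented = "\u00e9"  # e with acute accent, char code 233 in Latin-1

# A pure ASCII string has the same bytes in ASCII and in UTF-8.
ascii_is_valid_utf8 = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# The same character takes one byte in Latin-1 but two bytes in UTF-8.
latin1_bytes = accented.encode("iso-8859-1")
utf8_bytes = accented.encode("utf-8")

# A lone Latin-1 byte above 127 is not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    latin1_decodes_as_utf8 = True
except UnicodeDecodeError:
    latin1_decodes_as_utf8 = False

# UTF-8 is variable width: one to four bytes per character.
samples = ["a", "\u00e9", "\u20ac", "\U0001d11e"]
widths = [len(ch.encode("utf-8")) for ch in samples]

print(ascii_is_valid_utf8, len(latin1_bytes), len(utf8_bytes))
print(latin1_decodes_as_utf8, widths)
```

The widths list is the reason pointer arithmetic stops working in C: the number of bytes per character depends on the character itself.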