Re: [Modeling-users] type('') and unicode



On Wednesday 16 July 2003 17:46, you wrote:
>   Before I appear too stupid when trying things: UTF-8 is just a
>   specific way of encoding unicode in eight bits, right, i.e. one
>   character is translated to a series of one or more characters, right?
>   It has nothing to do w/ ISO-8859-1, ISO-8859-15 etc. which are just
>   correspondence tables between a number coded in one byte and a given
>   character, right?

Yup.  ISO-8859-X are one-byte encodings, each covering a limited subset
of characters, whereas UTF-8 puts each Unicode character into one or
more 8-bit bytes (hence the 8 in its name).  A nice property of UTF-8 is
that the first 128 characters (codes 0 to 127) are encoded on a single
byte with the same value as their ASCII equivalent.  So a pure ASCII
string (no ISO-8859-X characters above 127 here) is 100% valid UTF-8.
UTF-8 is not the most compact Unicode encoding for every script, but
this ASCII compatibility is what made it so widespread.  UTF-8 is NOT
compatible with Latin-1 (aka ISO-8859-1): every Latin-1 character with a
code above 127 is encoded on 2 bytes in UTF-8.
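
A quick way to check these claims is to encode the same text both ways.
The sketch below uses Python 3's built-in str.encode / bytes.decode (the
thread predates Python 3, where str and unicode were still separate
types); the sample strings are arbitrary choices for illustration.

```python
# ASCII text produces identical bytes in ASCII and in UTF-8.
ascii_text = "Hello"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Code points 0..127 take one byte in UTF-8; the Latin-1 range 128..255 takes two.
assert all(len(chr(cp).encode("utf-8")) == 1 for cp in range(128))
assert all(len(chr(cp).encode("utf-8")) == 2 for cp in range(128, 256))

# The same character above 127: one byte in ISO-8859-1, two in UTF-8.
e_acute = "\u00e9"  # 'é'
latin1_bytes = e_acute.encode("iso-8859-1")
utf8_bytes = e_acute.encode("utf-8")
print(latin1_bytes)  # b'\xe9'
print(utf8_bytes)    # b'\xc3\xa9'

# Reading UTF-8 bytes as Latin-1 silently produces mojibake...
print(utf8_bytes.decode("iso-8859-1"))  # Ã©

# ...while a lone Latin-1 byte above 127 is rejected as invalid UTF-8.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

Note the asymmetry: decoding UTF-8 as Latin-1 never fails (every byte is
a valid Latin-1 character), so that mistake goes unnoticed until someone
reads the garbled text.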

Why UTF-8 and not pure Unicode?  Because everything is damn too buggy
to handle it!  The C++ coder will in fact want to die converting all
those std::string declarations, and the C coder simply can't step
through characters with pointer arithmetic anymore (the width of a
character is not fixed).
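
The "width is not fixed" problem is easy to see: in UTF-8 a character's
byte offset is not its index, so cutting the raw bytes at an arbitrary
offset can split a character in half.  A minimal sketch in Python 3; the
sample phrase is an arbitrary choice.

```python
text = "d\u00e9j\u00e0 vu"  # 'déjà vu': 7 characters
data = text.encode("utf-8")

print(len(text))  # 7 characters
print(len(data))  # 9 bytes: 'é' and 'à' take two bytes each

# Byte offsets no longer match character indices: keeping only the
# first 2 bytes cuts 'é' (0xC3 0xA9) in the middle.
try:
    data[:2].decode("utf-8")
except UnicodeDecodeError:
    print("byte slice ends inside a multibyte character")

# Working on the decoded string restores per-character indexing.
print(text[:2])  # dé
```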

Luckily, we use Python! :D

-- 
Yannick Gingras
Coder for OBB : Offside Bumptious Bastnaesite
http://OpenBeatBox.org