Re: [wxCode-users] Unicode support in DatabaseLayer

> 1) I'm currently encoding strings to UTF-8 before sending them to the
> database and converting strings from UTF-8 into the local character
> set when reading from the database.  Is there a better way? 

Using UTF-8 is usually the best way to store Unicode strings in a
database, unless the database has direct Unicode support. UTF-8 does
have a drawback, though: any string containing non-ASCII characters
needs more bytes in UTF-8 than it has Unicode characters. In most
databases a string field is declared as, for example, VARCHAR(n),
where n is the maximum number of characters that can be stored. If you
store UTF-8 strings in a database without Unicode support, n is
effectively the number of bytes, not the number of characters. A
Unicode character may occupy 1, 2, 3 or 4 bytes in UTF-8.
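
To make the difference between bytes and characters concrete, here is a
minimal sketch in plain C++ (no wxWidgets or DatabaseLayer calls). The
helper names utf8_code_points and fits_in_byte_column are made up for
illustration, and the code assumes the input is already valid UTF-8:

```cpp
#include <cstddef>
#include <string>

// Count the Unicode code points in a UTF-8 encoded string.
// Continuation bytes have the bit pattern 10xxxxxx; every other byte
// starts a new code point. Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical check: does the encoded string fit into a field whose
// limit is counted in bytes rather than in characters?
bool fits_in_byte_column(const std::string& utf8, std::size_t max_bytes)
{
    return utf8.size() <= max_bytes;
}
```

For example, "Grüße" has 5 characters but occupies 7 bytes in UTF-8, so
it would not fit into a field limited to 5 bytes.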

> 2) I'd like to add unit tests to make sure that the original Unicode
> text is preserved.  For this I was planning on reading in text files
> of different character set encodings, placing them in the database as
> strings (VARCHAR), reading the values from the database again, and
> comparing the text file to the retrieved value.  Does anyone know a
> good place to find sample Unicode documents for this purpose? 

The following pages may be a good starting point for finding examples:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

http://www.i18nguy.com/unicode/unicode-example-intro.html

Regards,

Ulrich
-- 
E-Mail privat:  Ulr...@gm...
E-Mail Studium: Ulr...@Fe...
World Wide Web: http://www.stud.fernuni-hagen.de/q1471341