From: Ulrich T. <Ulr...@gm...> - 2005-12-03 16:27:36
> 1) I'm currently encoding strings to UTF-8 before sending them to the
> database and converting strings from UTF-8 into the local character
> set when reading from the database. Is there a better way?

Using UTF-8 is usually the best way to store Unicode strings in the database, unless the database has direct Unicode support.

But UTF-8 has its drawbacks: as soon as a string contains non-ASCII characters, the number of bytes needed in UTF-8 is greater than the number of Unicode characters in the string. In most databases a string field is declared as, for example, VARCHAR(n), where n is the maximum number of characters that can be stored. If you store UTF-8 strings in a database without Unicode support, n is the number of bytes, not the number of characters. A Unicode character may occupy 1, 2, 3 or 4 bytes in UTF-8.

> 2) I'd like to add unit tests to make sure that the original Unicode
> text is preserved. For this I was planning on reading in text files
> of different character set encodings, placing them in the database as
> strings (VARCHAR), reading the values from the database again, and
> comparing the text file to the retrieved value. Does anyone know a
> good place to find sample Unicode documents for this purpose?

Maybe the following pages are a good starting point for finding examples:

http://www.cl.cam.ac.uk/~mgk25/unicode.html
http://www.i18nguy.com/unicode/unicode-example-intro.html

Regards,

Ulrich
--
E-mail (private): Ulr...@gm...
E-mail (university): Ulr...@Fe...
World Wide Web: http://www.stud.fernuni-hagen.de/q1471341
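
To illustrate the point about VARCHAR(n) and UTF-8 byte counts, here is a minimal Python sketch. It is not tied to any particular database; the helper names `utf8_len` and `fits_in_column` and the sample strings are made up for this example, and it assumes a column whose limit n is counted in bytes.

```python
# Sketch only: compares character counts with UTF-8 byte counts.
# utf8_len and fits_in_column are hypothetical helpers, not part of any
# database API. The column limit n is assumed to be counted in bytes.

def utf8_len(text: str) -> int:
    """Return the number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_column(text: str, n: int) -> bool:
    """Return True if text fits a column limited to n bytes."""
    return utf8_len(text) <= n


def survives_round_trip(text: str) -> bool:
    """Encode to UTF-8 and decode again, as done around a database call."""
    return text.encode("utf-8").decode("utf-8") == text


samples = [
    "abc",                # ASCII only: 1 byte per character
    "Gr\u00fc\u00dfe",    # u-umlaut and sharp s: 2 bytes each
    "\u20acuro",          # euro sign: 3 bytes
    "\U0001D11E",         # musical G clef (outside the BMP): 4 bytes
]

for s in samples:
    print(f"{s!r}: {len(s)} characters, {utf8_len(s)} bytes")
```

A field sized for the character count can therefore be too small: the second sample has 5 characters but needs 7 bytes, so `fits_in_column("Gr\u00fc\u00dfe", 5)` is false. The same kind of comparison, applied to the value read back from the database, could serve as the basis for the round-trip unit tests asked about in question 2.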