From: Stuart B. <st...@st...> - 2004-12-29 22:56:58
Ian Bicking wrote:
| That is true. I'm not sure how to best resolve this. There were a
| couple ideas when unicode columns first came up. One was adding an
| encoding to converters.UnicodeConverter; really there should be *some*
| UnicodeConverter even in converters, and perhaps a default encoding
| (which might default to ASCII, which is implicitly the case now).
|
| Ideally, each column would do its own quoting, so that a UnicodeCol
| would know its own encoding. But while that would allow for a database
| with multiple encodings (or maybe multiple databases with multiple
| encodings), that might not be a common-enough use case. I want to get
| rid of .q entirely, and make columns descriptors with a __sqlrepr__
| method; at that point it would be much easier to make this addition.

Is there actually a use case for allowing each column to have a different encoding? I know that for PostgreSQL it is simply a matter of setting the database encoding to Unicode and sending everything as UTF-8, by encoding the entire query (which takes care of other issues, like Unicode column names, as well).

The only use cases I can come up with for your scenario should be using BINARY columns instead of VARCHAR - in particular, since the database doesn't know the encoding you are using, all your basic string operations, sorting etc. are broken. Hmm... perhaps if you need to store text in some encoding that doesn't contain the ASCII character set it might be necessary, but I don't know what character sets those are or whether any databases actually support them. I've gone through the list of encodings PostgreSQL supports, and they all contain the basic latin letters and can be used to encode SQL statements, so I suspect this is not a requirement.

As I previously mentioned on this list, we are using an SQLObject patched to do just this - no need for UnicodeCol at all.
Just encode the entire query before sending it to the backend, and decode all strings to Unicode on the way back out. Best practice, and no risk of accidentally polluting your database with badly encoded data, or of booby traps set off when code that assumed ASCII gets who-knows-what encoded data.

--
Stuart Bishop <st...@st...>
http://www.stuartbishop.net/
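(A rough sketch of the encode-at-the-boundary approach described above, in modern Python. `UnicodeWrapper` and the fake backend are illustrative names, not part of SQLObject's real API; the point is only that encoding and decoding happen in one place, at the connection, rather than per column.)

```python
class UnicodeWrapper:
    """Wrap a DB-API-ish connection: send UTF-8 bytes, get unicode back.

    Hypothetical sketch -- not SQLObject's actual patch. It assumes the
    wrapped connection has a query() method taking bytes and returning
    rows of bytes/other values.
    """

    def __init__(self, connection, encoding="utf-8"):
        self.connection = connection
        self.encoding = encoding

    def query(self, sql):
        # Encode the *entire* statement (including any unicode column
        # names or literals) before it reaches the backend.
        if isinstance(sql, str):
            sql = sql.encode(self.encoding)
        rows = self.connection.query(sql)
        # Decode every byte string on the way back out; leave other
        # values (ints, None, ...) untouched.
        return [
            tuple(
                col.decode(self.encoding) if isinstance(col, bytes) else col
                for col in row
            )
            for row in rows
        ]


class FakeBackend:
    """Stand-in backend for illustration: always returns one fixed row."""

    def query(self, sql):
        assert isinstance(sql, bytes)  # wrapper guarantees encoded input
        return [("caf\u00e9".encode("utf-8"), 1)]


conn = UnicodeWrapper(FakeBackend())
rows = conn.query("SELECT name, id FROM caf\u00e9s")
# rows[0][0] is the unicode string 'café'; the integer passes through.
```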