From: Stuart B. <stu...@ca...> - 2004-12-03 08:54:11
Ian Bicking wrote:
| Max Ischenko wrote:
|
|>>> Most modern dbs and python drivers handle this problem
|>>> transparently. At least, that's my experience.
|>>
|>> Can you show an example? Using a snippet of code, with a DB API
|>> driver...
|>
|> Guess not. ;-)
|>
|> Just checked with psycopg -- it does require a parameter to be encoded
|> into something like utf-8. psycopg2 does it automatically, I believe.
|> Either my memory is doing me a disservice or this psycopg is somehow
|> broken. ;-)
|
| I suspect it's something about Unicode in databases being a pain in the
| ass. At least, that's what I'm guessing; I've never tried to do it,
| I've only stored ASCII and stuff that I treat as though it's binary data.

We happily throw Unicode strings through SQLObject and it gives us
nothing but Unicode strings back. Welcome to the new millennium :-)

To do this, we patched DBAPI._executeRetry and the StringValidator class:

    def _executeRetry(self, conn, cursor, query):
        if isinstance(query, unicode):
            query = query.encode('utf8')
        else:
            # raise UnicodeError if it is not valid utf8 already
            query.decode('utf8')
        return cursor.execute(query)

class StringValidator(validators.Validator):
    def fromPython(self, value, state):
        if isinstance(value, unicode):
            return value.encode('utf8')
        return value
    def toPython(self, value, state):
        if isinstance(value, str):
            return value.decode('utf8')
        return value

This of course should really be done in the PostgreSQL driver somewhere,
but the above hack is fine for our needs at the moment. And psycopg2
might make it all irrelevant anyway.

| I might note when I installed postgres on Debian, it asked me questions
| about encoding. This might imply that encoding is set up in an
| installation-wide (not per-database or per-session) fashion. Then it
| also asked me about how I wanted to format my dates.
| I answered ISO, but what madness would happen if someone selected US
| format dates? I doubt psycopg knows anything about what date format the
| server is using. Maybe these are just defaults, and by explicitly setting
| up the configuration for the connection you can avoid the madness.

PostgreSQL hard-codes the locale at initdb time, I believe because the
locale you use for collation order affects index creation and cannot be
changed. It is a PITA, because most people really want to use the C
locale and the only way of reverting is to blow away your data directory
and recreate it. But the locale has nothing to do with the encoding: you
can happily create databases with whatever encoding you like, no matter
what locale you selected at initdb time.

-- 
Stuart Bishop <stu...@ca...>          http://www.canonical.com/
Canonical Ltd.                        http://www.ubuntulinux.com/
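The _executeRetry / StringValidator patch in this message is Python 2 code, where `unicode` and `str` are the text and byte types. On Python 3 the same "encode on the way in, decode on the way out" idea might be sketched roughly as follows. This is a minimal illustration, not SQLObject's or psycopg's API: `to_db`, `from_db`, `execute_utf8` and `FakeCursor` are hypothetical names, and `FakeCursor` simply stands in for a driver that only accepts UTF-8 bytes.

```python
# Hypothetical Python 3 sketch of the UTF-8 round-trip pattern described above.
# None of these names come from SQLObject or psycopg.


def to_db(value):
    """Encode text to UTF-8 bytes before handing it to a bytes-only driver."""
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, bytes):
        # Like the original patch: raise if the bytes are not valid UTF-8 already.
        value.decode("utf-8")
    return value


def from_db(value):
    """Decode UTF-8 bytes coming back from the driver into text."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


class FakeCursor:
    """Hypothetical stand-in for a DB API cursor that only accepts bytes."""

    def __init__(self):
        self.executed = []

    def execute(self, query):
        if not isinstance(query, bytes):
            raise TypeError("this fake driver only accepts bytes")
        self.executed.append(query)


def execute_utf8(cursor, query):
    """Analogue of the patched _executeRetry: always pass UTF-8 bytes."""
    return cursor.execute(to_db(query))


if __name__ == "__main__":
    cur = FakeCursor()
    execute_utf8(cur, "SELECT 'café'")
    # Decoding what the driver received gives back the original text.
    print(from_db(cur.executed[0]))
```

Keeping the conversion in two small functions mirrors the split in the patch: one hook for outgoing queries and parameters, one for values read back, so the rest of the application only ever sees text.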