From: Gerhard <ger...@gm...> - 2001-11-22 05:46:13
Ok, maybe I'll just describe what I've done so far (locally).

- Changed the PgSQL module to also accept UnicodeType wherever it
  accepts StringType
- Before sending the query string to the libpq module, check whether the
  query string is of type Unicode; if so, encode it as UTF-8 into a
  StringType and send that instead
- In pgconnection.c, added a read-write attribute clientencoding to the
  PgConnection_Type

All of this works pretty well so far. For example, the following works as
expected (never mind if you see weird chars, it's 'Internet' in Russian
KOI-8 encoding):

    #!/usr/bin/env python
    from pyPgSQL import PgSQL
    con = PgSQL.connect(database="testu")
    cursor = con.cursor()
    name = unicode("éÎÔÅÒÎÅÔ", "koi8-r")   # 'Internet' in Russian
    cursor.execute("insert into gh (name) values ('%s')" % name)
    print con.conn.clientencoding          # 'UNICODE'
    con.conn.clientencoding = 'KOI8'
    print con.conn.clientencoding          # 'KOI-8'
    cursor.execute("select * from gh")
    print cursor.fetchone()[0]             # works, is automatically converted

For languages that cannot be encoded in 8 bits, I fear it will get more
complicated. So I propose the following:

- Strings sent to the backend: Unicode is encoded as UTF-8. StringType is
  sent as-is like before (with escaping as needed). If people set the
  clientencoding, PostgreSQL will even do the charset conversion (to
  Unicode or whatever) for them.
- Strings retrieved from the backend: If the client-encoding is UNICODE,
  strings are always retrieved as UnicodeType. This is a major change, but
  it's IMO necessary to make using East Asian languages possible at all.
  If people want to receive StringType but the data can possibly be
  Unicode, they have to set the client-encoding accordingly. For German,
  I'd have to set clientencoding to 'LATIN1', for example.
- If the PostgreSQL client-encoding is any of the special non-Unicode ones
  like SJIS, BIG5 or whatever, major reality failure happens ;-) I have no
  idea about these encodings, and neither has Python.
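To make the first two proposed rules concrete, here is a minimal sketch. It is not the actual pyPgSQL code: it is written in Python 3 terms, where str stands in for UnicodeType and bytes for StringType, and the function names encode_query and decode_result are made up purely for illustration.

```python
# Sketch only: str plays the role of UnicodeType, bytes the role of StringType.
# encode_query and decode_result are hypothetical names, not pyPgSQL functions.

def encode_query(query):
    """Return the byte string that would be handed to libpq."""
    if isinstance(query, str):
        # Unicode strings are always encoded as UTF-8 before sending.
        return query.encode("utf-8")
    # Byte strings are passed through untouched, as before.
    return query


def decode_result(raw, client_encoding):
    """Convert a raw byte string from the backend according to the proposal."""
    if client_encoding.upper() == "UNICODE":
        # With client-encoding UNICODE the backend delivers UTF-8,
        # so the caller always gets a Unicode string.
        return raw.decode("utf-8")
    # Any other client-encoding: hand back the byte string as-is and
    # leave its interpretation to the caller.
    return raw


# 'Internet' in Russian, given as KOI8-R bytes and decoded to Unicode.
name = b"\xe9\xce\xd4\xc5\xd2\xce\xc5\xd4".decode("koi8-r")
wire = encode_query("insert into gh (name) values ('%s')" % name)
value = decode_result(name.encode("utf-8"), "UNICODE")
```

The SJIS/BIG5 case from the last point is deliberately not handled here; those byte strings would simply fall through the "as-is" branch.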

Gerhard
-- 
mail:   gerhard <at> bigfoot <dot> de       registered Linux user #64239
web:    http://www.cs.fhm.edu/~ifw00065/    OpenPGP public key id 86AB43C0
public key fingerprint: DEC1 1D02 5743 1159 CD20 A4B6 7B22 6575 86AB 43C0
reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))