From: Adam B. <ad...@po...> - 2001-11-23 01:34:35
Hello,

I'm also interested in pyPgSQL working well with various string encodings. I mainly use ISO 8859-2 on the server side and Windows CP 1250 or UTF-8 on the client side.

On Thu, Nov 22, 2001 at 06:46:03AM +0100, Gerhard Häring wrote:
> - Changed the PgSQL module to accept also UnicodeType where it accepts
>   StringType

That sounds great to me :)

> - Before sending the query string to the libpq module, check if the query
>   string is of type Unicode, if so, encode it via UTF-8 to a StringType and
>   send this one instead

IMHO it should rather be converted into the current database client encoding. You shouldn't assume that someone who uses Python Unicode strings also wants to use UNICODE on the server side. The reason is that PostgreSQL still does not handle Unicode/UTF-8 completely (for example, there are problems with Polish diacritical characters that do not occur when only an 8-bit encoding is used on the server side).

> - in pgconnection.c, added a read-write attribute clientencoding to the
>   PgConnection_Type

I cannot agree with changing anything in pyPgSQL.libpq. It is a low-level module that has the same functionality as PostgreSQL's native libpq library. It should only send data to the server and allow reading results, nothing more. In particular, it shouldn't change character encodings implicitly. At the very least, changing the way libpq deals with strings would break some of my programs. ;((

However, such functionality should obviously be added to the pyPgSQL.PgSQL module. It would be nice to write something like this (an example):

    conn = PgSQL.connect(database = 'dbname',
                         client_encoding = 'iso8859-2',
                         unicode_results = 0)

Then the PgSQL module should create a new Connection object, make a connection to the database, and send:

    SET CLIENT_ENCODING TO 'LATIN2';

to the PostgreSQL backend.
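
A minimal sketch of how the command above could be derived from the client_encoding argument. It is written for current Python rather than the Python of this thread, and it is not part of pyPgSQL: the helper name set_client_encoding_sql and the mapping table are hypothetical, and the table lists only the three encodings mentioned in this message. Passing None stands for "no client encoding configured", in which case no command is produced.

```python
import codecs

# Hypothetical table: canonical Python codec name -> PostgreSQL encoding name.
# Only the encodings mentioned in this message are listed; extend as needed.
_PG_ENCODING_NAMES = {
    "iso8859-2": "LATIN2",
    "cp1250": "WIN1250",
    "utf-8": "UNICODE",
}


def set_client_encoding_sql(client_encoding):
    """Return the SET CLIENT_ENCODING command for a Python codec name.

    Returns None when no client encoding is configured, so the caller
    sends nothing to the backend in that case.
    """
    if client_encoding is None:
        return None
    # codecs.lookup() normalizes aliases such as 'latin2' or 'ISO-8859-2'.
    canonical = codecs.lookup(client_encoding).name
    try:
        pg_name = _PG_ENCODING_NAMES[canonical]
    except KeyError:
        raise ValueError("no PostgreSQL name known for %r" % client_encoding)
    return "SET CLIENT_ENCODING TO '%s';" % pg_name
```

With this sketch, set_client_encoding_sql('iso8859-2') yields the command shown above, and an alias such as 'latin2' resolves to the same command.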

Later, instructions like:

    c = conn.cursor()
    c.execute(u'select sth from tab where field = %s;', u'aaaa')

should convert both Unicode strings to ISO 8859-2, perform argument substitution, and send the query to the backend. Results should be left unchanged (encoded in client_encoding) unless "unicode_results == 1", in which case all strings should be converted back to Unicode strings.

Please also remember that someone may use PostgreSQL without Unicode and on-the-fly conversion facilities. In such circumstances the "client_encoding" and "unicode_results" variables should not be set to anything, and PgSQL should neither recode any strings (using Unicode strings should be illegal) nor send "SET CLIENT_ENCODING" commands to the backend.

I attached a small Python program that checks how PgSQL works with various client-backend encodings. I wrote it for Billy G. Allie some time ago. Feel free to use and modify it according to your needs.

Regards,

--
Adam Buraczewski <ad...@po...> * Linux registered user #165585
GCS/TW d- s-:+>+:- a- C+++(++++) UL++++$ P++ L++++ E++ W+ N++ o? K? w-- O M- V- PS+ !PE Y PGP+ t+ 5 X+ R tv- b+ DI? D G++ e+++>++++ h r+>++ y?