From: Adam B. <ad...@po...> - 2001-11-23 01:34:35
Hello,
I'm also interested in pyPgSQL working well with various string
encodings. I mainly use ISO 8859-2 on the server side and Windows CP 1250
or UTF-8 on the client side.
On Thu, Nov 22, 2001 at 06:46:03AM +0100, Gerhard Häring wrote:
> - Changed the PgSQL module to accept also UnicodeType where it accepts
> StringType
That sounds great to me :)
> - Before sending the query string to the libpq module, check if the query
> string is of type Unicode, if so, encode it via UTF-8 to a StringType and
> send this one instead
Well, IMHO it should rather be converted into the current database client
encoding. You shouldn't assume that someone who uses Python Unicode
strings also wants to use UNICODE on the server side. The reason is that
PostgreSQL still does not handle Unicode/UTF-8 completely (for example,
there are problems with Polish diacritical characters which do not occur
when only an 8-bit encoding is used on the server side).
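To illustrate the point about encoding to the client encoding rather than always to UTF-8, here is a minimal sketch. It is not pyPgSQL code: the helper name `encode_for_client` is hypothetical, and it is written for Python 3, where `str` plays the role of the Unicode string and `bytes` the role of the 8-bit string.

```python
# Sketch only: Python 3, where str is Unicode text and bytes is an 8-bit string.
def encode_for_client(query, client_encoding):
    """Encode a text query with the connection's client encoding.

    client_encoding is a Python codec name such as 'iso8859-2'.
    Raises UnicodeEncodeError if a character does not exist in that encoding.
    """
    if isinstance(query, str):
        return query.encode(client_encoding)
    return query  # already an 8-bit string: pass it through untouched


polish = "zażółć gęślą jaźń"
as_latin2 = encode_for_client(polish, "iso8859-2")
as_utf8 = encode_for_client(polish, "utf-8")

# The two byte strings differ: ISO 8859-2 uses one byte per character,
# UTF-8 uses two bytes for each Polish diacritic.
print(len(polish), len(as_latin2), len(as_utf8))
```

A backend that expects LATIN2 would misread the UTF-8 bytes, which is why the target encoding should follow the connection rather than be fixed.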
> - in pgconnection.c, added a read-write attribute clientencoding to the
> PgConnection_Type
I cannot agree with changing anything in pyPgSQL.libpq. It is a
low-level module which has the same functionality as PostgreSQL's
native libpq library. It should only send data to the server and
allow reading the results, nothing more. In particular, it shouldn't
change character encodings implicitly.
At the very least, changing the way libpq deals with strings would break
some of my programs. ;((
However, such functionality should obviously be added to the
pyPgSQL.PgSQL module. It would be nice to be able to write something like
this (an example):
    conn = PgSQL.connect(database = 'dbname',
                         client_encoding = 'iso8859-2',
                         unicode_results = 0)
Then the PgSQL module should create a new Connection object, make a
connection to the database, and send:
SET CLIENT_ENCODING TO 'LATIN2';
to the PostgreSQL backend. Later, instructions like:
    c = conn.cursor()
    c.execute(u'select sth from tab where field = %s;', u'aaaa')
should convert both Unicode strings to ISO 8859-2, perform argument
substitution, and send the query to the backend. Results should be left
unchanged (encoded in client_encoding) unless "unicode_results == 1",
in which case all strings should be converted back to Unicode strings.
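The proposed behaviour can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PgSQL implementation: the function names are hypothetical, the quoting is deliberately naive, only text or 8-bit string parameters are handled, and the code is Python 3 (`str` for Unicode strings, `bytes` for 8-bit strings).

```python
def to_client(value, client_encoding):
    """Encode Unicode text to the client encoding; leave 8-bit strings alone."""
    if isinstance(value, str):
        return value.encode(client_encoding)
    return value


def quote_literal(raw):
    # Naive quoting for illustration only; a real module must use the
    # quoting rules of the driver or backend.
    return b"'" + raw.replace(b"'", b"''") + b"'"


def prepare_query(query, params, client_encoding):
    """Recode query and parameters, then perform %s substitution on bytes."""
    raw_query = to_client(query, client_encoding)
    raw_params = tuple(quote_literal(to_client(p, client_encoding))
                       for p in params)
    return raw_query % raw_params


def convert_result(raw, client_encoding, unicode_results):
    """Decode a result string only when unicode_results is switched on."""
    if unicode_results and isinstance(raw, bytes):
        return raw.decode(client_encoding)
    return raw


sent = prepare_query("select sth from tab where field = %s;",
                     ("żółw",), "iso8859-2")
print(sent)
```

With `unicode_results` set to 0, `convert_result` returns the bytes exactly as the backend delivered them, which keeps existing programs that expect 8-bit strings working.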
Please also remember that someone may use a PostgreSQL build without
Unicode and on-the-fly conversion facilities. In such circumstances the
"client_encoding" and "unicode_results" variables should not be set to
anything, and PgSQL should neither recode any strings (using Unicode
strings should be illegal) nor send "SET CLIENT_ENCODING" commands to the
backend.
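A rough sketch of this "no recoding" mode, together with the SET CLIENT_ENCODING step described earlier, might look like the following. Everything here is an assumption made for illustration: the function names, and in particular the table mapping Python codec names to PostgreSQL encoding names, which covers only the encodings mentioned in this message. The code is Python 3, so `str` stands in for the Unicode string type.

```python
import codecs

# Hypothetical table: normalized Python codec name -> PostgreSQL encoding name.
PG_ENCODING_NAMES = {
    "iso8859-2": "LATIN2",
    "cp1250": "WIN1250",
    "utf-8": "UNICODE",
}


def startup_commands(client_encoding=None):
    """Return the commands to send right after connecting.

    With no client_encoding, nothing is sent, so a backend built without
    conversion support never sees SET CLIENT_ENCODING.
    """
    if client_encoding is None:
        return []
    codec_name = codecs.lookup(client_encoding).name
    return ["SET CLIENT_ENCODING TO '%s';" % PG_ENCODING_NAMES[codec_name]]


def check_query(query, client_encoding=None):
    """Reject Unicode strings when no client encoding has been configured."""
    if client_encoding is None and isinstance(query, str):
        raise TypeError("Unicode strings require client_encoding to be set")
    return query


print(startup_commands("iso8859-2"))
print(startup_commands())
```

Raising an error instead of guessing an encoding keeps the module from silently recoding data on connections that never asked for it.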
I have attached a small Python program which checks how PgSQL works with
various client-backend encodings. I wrote it for Billy G. Allie some
time ago. Feel free to use and modify it according to your needs.
Regards,
-- 
Adam Buraczewski <ad...@po...> * Linux registered user #165585
GCS/TW d- s-:+>+:- a- C+++(++++) UL++++$ P++ L++++ E++ W+ N++ o? K? w--
O M- V- PS+ !PE Y PGP+ t+ 5 X+ R tv- b+ DI? D G++ e+++>++++ h r+>++ y?