Re: [Modeling-users] type('') and unicode

On Wednesday, Jul 16, 2003, at 21:35 Europe/Amsterdam, Sebastien
Bigaret wrote:
> Yannick Gingras <yan...@sa...> wrote:

...

>> Indeed it does the job but as it was discussed some time ago on the
>> mailing list, it does not enable case insensitive match.  A case
>> insensitive match with u"éé" encoded in utf-8 will look for "Ã©Ã©",
>> "ã©Ã©", "Ã©ã©" and "ã©ã©" which does not make any sense once put back in
>> unicode.  "éé", "Éé" and "ÉÉ" are respectively "Ã©Ã©", "ÃÃ©" and "ÃÃ"
>> once encoded.
>>
>> So it may be wise to let the user make the utf-8 trick.  That way he
>> won't blame you for the weird result of case insensitive match.  On
>> the other hand, some databases like PostgreSQL detect encoding and
>> perform a decent case insensitive match with utf-8 data.
>
> This needs investigation. If some of you could provide working python
> code with unicode and psycopg/pypgsql/pgdb/mysqldb/sqlitedb, please
> share. I've not time for this now.

The option I had adopted when I came across this problem was to work
completely in utf-8, from front to back. [For web, this is easily done by
encoding the pages in utf-8, i.e. setting the response header:
setHeader('Content-Type', 'text/html; charset=utf-8').]
Thus, the modeling layer always gets utf-8, which is exchanged with
the db as is. This, however, means that case insensitive searches do not
work -- if you case-insensitive search for é or É, you will only
get one or the other. This is not nice, but I can live with it, at least
for now.
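
A minimal sketch of why the case-insensitive search fails when everything is
handled as utf-8 bytes. It is written for Python 3 (an assumption, the code
in this thread is Python 2) and uses only built-in string methods; the
variable names are purely illustrative. Byte-wise upper-casing only touches
ASCII letters, so the two bytes of é never become the two bytes of É, and
case-folding the bytes as if they were Latin-1 text produces the kind of
nonsense sequences described in the quoted message.

```python
# Python 3 sketch: str is Unicode text, bytes is its UTF-8 encoded form.
lower = "clé"
upper = "CLÉ"

lower_utf8 = lower.encode("utf-8")  # é becomes the two bytes C3 A9
upper_utf8 = upper.encode("utf-8")  # É becomes the two bytes C3 89

# bytes.upper() only converts ASCII letters, so the bytes of é are untouched
# and the result is not the UTF-8 encoding of "CLÉ".
bytewise_upper = lower_utf8.upper()

# Upper-casing the Unicode text first, then encoding, gives the right bytes.
textwise_upper = lower.upper().encode("utf-8")

# Reading the UTF-8 bytes of "éé" as Latin-1 gives "Ã©Ã©"; lower-casing
# that text turns Ã into ã, which is no longer valid UTF-8 when written back.
mojibake = "éé".encode("utf-8").decode("latin-1")
folded = mojibake.lower()
try:
    folded.encode("latin-1").decode("utf-8")
    folded_is_valid_utf8 = True
except UnicodeDecodeError:
    folded_is_valid_utf8 = False

print(bytewise_upper == upper_utf8, textwise_upper == upper_utf8)
print(mojibake, folded, folded_is_valid_utf8)
```

So a layer that only ever sees utf-8 bytes cannot do the case mapping itself;
it has to happen either on Unicode text in the client or in a database whose
upper() is aware of the encoding.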

But, it is possible to work in unicode on the client side, and Postgres
allows unicode queries, i.e. the sql query itself is in unicode.
The data in the DB being in utf-8, I would expect that
u"select * from sometable  where upper(someprop) like upper('%é%')"
would give all rows where someprop contains é or É.
But no, it does not seem to work... at least I have not figured
it out yet. If anyone wants to play, I have tried to understand
this with the code below (working without modeling):

'''
Assume an i18nText table, that contains at least the 2 rows:
en		fr		...
key		clé
KEY		CLÉ
'''

#
dbname =
dbuser =
dbpass =
#
from pyPgSQL import PgSQL
con = PgSQL.connect(database=dbname, user=dbuser, password=dbpass,
                    client_encoding=('utf-8', 'replace'), unicode_results=1)
cur = con.cursor()
cur.execute('SET CLIENT_ENCODING TO UNICODE')
cur.execute(u"SELECT fr FROM i18nText WHERE en = 'key' ")
_dbrset = cur.fetchall()
match_on = _dbrset[0][0] # evaluates to unicode string "clé"
cur.execute(u"SELECT * FROM i18nText WHERE upper(fr) LIKE upper('%"
            + match_on + "%') ")
dbrset = cur.fetchall()
cur.close()
#

However, this only returns the row for fr='clé'.
If I change to match on en='KEY', the uppercase row for fr='CLÉ' is
returned.
Is this the behaviour that I should expect?
How should the upper function behave on unicode strings?
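
For comparison, a sketch of the result I was hoping for: both rows come back
whichever case is searched for. It is only an illustration under several
assumptions: it uses Python 3 and the standard sqlite3 module as a stand-in
for PostgreSQL, the table is an in-memory copy of the two rows above, and
py_upper and search are hypothetical names. The case mapping is done by
Python's Unicode-aware str.upper, registered as an SQL function, rather than
by the database's own upper().

```python
import sqlite3

# In-memory stand-in for the i18nText table (assumption: SQLite is used only
# because it ships with Python; the original test ran against PostgreSQL).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE i18nText (en TEXT, fr TEXT)")
con.executemany(
    "INSERT INTO i18nText VALUES (?, ?)",
    [("key", "clé"), ("KEY", "CLÉ")],
)

# Hypothetical helper: expose Python's Unicode-aware upper() to SQL.
con.create_function(
    "py_upper", 1, lambda s: s.upper() if s is not None else None
)


def search(match_on):
    """Return the fr values that contain match_on, ignoring case."""
    needle = match_on.upper()  # upper-case the Unicode text on the client
    cur = con.execute(
        "SELECT fr FROM i18nText WHERE instr(py_upper(fr), ?) > 0",
        (needle,),  # bound parameter instead of string concatenation
    )
    return [row[0] for row in cur.fetchall()]


print(sorted(search("clé")))
print(sorted(search("CLÉ")))
```

Whether PostgreSQL's own upper() gives the same result presumably depends on
how the database and its locale were set up, which is the part I have not
been able to pin down.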

(Note that I have sys.setdefaultencoding('utf-8') in my sitecustomize.py,
and I am not sure about all the effects that this has. As for any other
special settings on the DB, I do not remember any.)

mario