Thread: [Modeling-users] type('') and unicode
Status: Abandoned
From: Yannick G. <yan...@sa...> - 2003-07-16 15:08:38
I stumbled upon some "if type(foo)==type(''):" in the code
(grep -r -E "type\(''\)" .). This fails to match unicode, which behaves
like a string but is not the same type:

    >>> type(u'') == type('')
    0

Is this some kind of obscure feature?

This is used in the Qualifier code and it seems likely to me that
someone will eventually try to make a fetch with unicode. In fact I
might just try that right now...

    UnicodeError: ASCII encoding error: ordinal not in range(128)

Argh!

There is "type(foo) in (type(''), type(u''))", or I could encode my
query, or who knows what. Since some RDBMSs (such as MySQL) choke on
unicode, maybe it would be best to have every query encoded in UTF-8,
but I prefer to have your opinion first.

-- 
Yannick Gingras
Byte Gardener, Savoir-faire Linux inc.
(514) 276-5468
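The "test against both string types" idea can be sketched with isinstance() and a tuple of types instead of comparing type() results. This is only an illustration, not the Modeling code: the names STRING_TYPES and is_string are hypothetical, and the try/except is an assumption added so that the snippet also runs on Python 3, where the unicode builtin no longer exists.

```python
# Hypothetical helper, not part of Modeling: accept both kinds of strings.
try:
    STRING_TYPES = (str, unicode)   # Python 2: byte strings and unicode strings
except NameError:
    STRING_TYPES = (str,)           # Python 3: str is already a unicode string


def is_string(value):
    """Return True for any string type, False for everything else."""
    return isinstance(value, STRING_TYPES)
```

With such a helper, a check like "if is_string(foo):" treats '' and u'' the same way, which is what the Qualifier code would need.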
From: Sebastien B. <sbi...@us...> - 2003-07-16 15:39:48
Yannick Gingras <yan...@sa...> writes:
> I stumbled upon some "if type(foo)==type(''):" in the code
> (grep -r -E "type\(''\)" .). This fails to match unicode, which behaves
> like a string but is not the same type:
>
>     >>> type(u'') == type('')
>     0
>
> Is this some kind of obscure feature?
>
> This is used in the Qualifier code and it seems likely to me that
> someone will eventually try to make a fetch with unicode. In fact I
> might just try that right now...
>
>     UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> Argh!

Funny... There's a message from you in the archives (20 Apr 2003, in
the thread named "Working with unicode"; I remember Mario also
discussed this there) suggesting that this was working... Did I
misunderstand what you were saying?

In fact, this surprised me a lot, since I've never done anything in
particular to support unicode. Python's unicode support is not
particularly wonderful (well, it wasn't when I looked at it a year and
a half ago: I had to dive into the code to find the encoders/decoders,
the documentation was almost nonexistent, and in the end everything was
messed up in my mind), so I just didn't care -- and never needed it, to
be honest (except for XML models, because we were putting latin1
characters in them at that time).

> There is "type(foo) in (type(''), type(u''))", or I could encode my
> query, or who knows what. Since some RDBMSs (such as MySQL) choke on
> unicode, maybe it would be best to have every query encoded in UTF-8,
> but I prefer to have your opinion first.

As you can see, my opinion is that I have no opinion :/ I don't even
know how the different databases *and* the different Python db-adaptors
behave, and I must admit that I do not really want to look at that.
You're right in saying that such tests for strings should be made
against both regular and unicode strings, but I suspect this is only
the easiest part of it.

My opinion, though... I've been bitten by unicode too hard to be really
objective about it. So if you feel like looking at these things and
summarizing them (either by proposing a procedure for using unicode w/
the framework, or by submitting patches), I'll be happy to collaborate
to the best of my knowledge -- again, this would mainly mean my
knowledge of the framework, because my unicode background is something
like... empty...

Others interested in this topic may react here too.

Regards,

-- Sébastien.
From: <so...@la...> - 2003-07-16 15:48:08
On Wed, Jul 16, 2003 at 11:08:30AM -0400, Yannick Gingras wrote:
> I stumbled upon some "if type(foo)==type(''):" in the code
> (grep -r -E "type\(''\)" .). This fails to match unicode, which behaves
> like a string but is not the same type:
>
>     >>> type(u'') == type('')
>     0
>
> Is this some kind of obscure feature?
>
> This is used in the Qualifier code and it seems likely to me that
> someone will eventually try to make a fetch with unicode. In fact I
> might just try that right now...
>
>     UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> Argh!

You get this because you try to do something like this:

    ' %s ' % my_unicode

Could you please give us a bigger traceback?

I talked with Sebastien last week about this. In fact I had some
trouble with MySQL and unicode (not using Modeling), and asked him what
he did to cover this issue in Modeling.

It really seems that Modeling doesn't take care of unicode.
(Read: Seb hasn't done much testing with unicode.)

I haven't done much testing either, but I think that things like
enabling LOG will generate a lot of unicode tracebacks.

> There is "type(foo) in (type(''), type(u''))", or I could encode my
> query, or who knows what. Since some RDBMSs (such as MySQL) choke on
> unicode, maybe it would be best to have every query encoded in UTF-8,
> but I prefer to have your opinion first.

Another trick that might help you: MySQL 4.0 supports unicode in
queries, but MySQLdb doesn't by default. In fact you can pass a special
encoding charset at connection time, but you need the latest version of
the package (I had to rebuild it from source on my Debian, since it
isn't in the default unstable install).

Bye bye.
From: Yannick G. <yan...@sa...> - 2003-07-16 18:06:06
On July 16, 2003 11:44 am, you wrote:
> > UnicodeError: ASCII encoding error: ordinal not in range(128)
> >
> > Argh!
>
> You get this because you try to do something like this:
>     ' %s ' % my_unicode

It seems to be fine here:

    >>> "%s" % u"éé"
    u'\xe9\xe9'

It's the print (or write()) that may choke:

    >>> print u"éé"
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)

> Could you please give us a bigger traceback?
> I talked with Sebastien last week about this. In fact I had some
> trouble with MySQL and unicode (not using Modeling), and asked
> him what he did to cover this issue in Modeling.

Sure!

  File "/usr/lib/python2.2/site-packages/Modeling/EditingContext.py", line 1304, in fetch
    return self.objectsWithFetchSpecification(fs)
  File "/usr/lib/python2.2/site-packages/Modeling/EditingContext.py", line 1218, in objectsWithFetchSpecification
    objects=self.parentObjectStore().objectsWithFetchSpecification(fs, ec)
  File "/usr/lib/python2.2/site-packages/Modeling/ObjectStoreCoordinator.py", line 420, in objectsWithFetchSpecification
    return store.objectsWithFetchSpecification(aFetchSpecification, anEditingContext)
  File "/usr/lib/python2.2/site-packages/Modeling/DatabaseContext.py", line 1521, in objectsWithFetchSpecification
    anEditingContext)
  File "/usr/lib/python2.2/site-packages/Modeling/DatabaseChannel.py", line 381, in selectObjectsWithFetchSpecification
    entity)
  File "/usr/lib/python2.2/site-packages/Modeling/DatabaseAdaptors/AbstractDBAPI2AdaptorLayer/AbstractDBAPI2AdaptorChannel.py", line 295, in selectAttributes
    db_error(msg)
  File "/usr/lib/python2.2/site-packages/Modeling/logging.py", line 56, in log_stderr
    sys.stderr.write('%s\n'%msg)
UnicodeError: ASCII encoding error: ordinal not in range(128)

The unicode error is triggered by the logging, which tries to report an
error, probably a unicode error...

> It really seems that Modeling doesn't take care of unicode.
> (Read: Seb hasn't done much testing with unicode.)
>
> I haven't done much testing either, but I think that things like
> enabling LOG will generate a lot of unicode tracebacks.
>
> > There is "type(foo) in (type(''), type(u''))", or I could encode my
> > query, or who knows what. Since some RDBMSs (such as MySQL) choke on
> > unicode, maybe it would be best to have every query encoded in UTF-8,
> > but I prefer to have your opinion first.
>
> Another trick that might help you: MySQL 4.0 supports unicode in
> queries, but MySQLdb doesn't by default. In fact you can pass a special
> encoding charset at connection time, but you need the latest version of
> the package (I had to rebuild it from source on my Debian, since it
> isn't in the default unstable install).

What I said earlier is that I manually UTF-8 encode everything that I
*store* in the DB. I thought that I might be safe with queries, but
well, UTF-8 encoding queries too is not that much work.

-- 
Yannick Gingras
From: Yannick G. <yan...@sa...> - 2003-07-16 18:12:09
On July 16, 2003 02:05 pm, Yannick Gingras wrote:
> What I said earlier is that I manually UTF-8 encode everything that I
> *store* in the DB. I thought that I might be safe with queries, but
> well, UTF-8 encoding queries too is not that much work.

Like this (since I make my qualifiers by hand, I have a nice spot to
trap unicode requests):

    def makeMatchQual(self, key, matchType, matchPatern):
        if type(matchPatern) == type(u""):
            matchPatern = matchPatern.encode("utf-8")

        if matchType == LK:
            return Qualifier.KeyValueQualifier(key,
                                               Qualifier.QualifierOperatorLike,
                                               "*%s*" % matchPatern)
        [...]

-- 
Yannick Gingras
From: Sebastien B. <sbi...@us...> - 2003-07-16 18:29:49
Yannick Gingras <yan...@sa...> writes:
> Soif> Could you please give us a bigger traceback?
> Soif> I talked with Sebastien last week about this. In fact I had some
> Soif> trouble with MySQL and unicode (not using Modeling), and asked
> Soif> him what he did to cover this issue in Modeling.
>
> Sure!
>
[...]
>   File "/usr/lib/python2.2/site-packages/Modeling/DatabaseAdaptors/AbstractDBAPI2AdaptorLayer/AbstractDBAPI2AdaptorChannel.py", line 295, in selectAttributes
>     db_error(msg)
>   File "/usr/lib/python2.2/site-packages/Modeling/logging.py", line 56, in log_stderr
>     sys.stderr.write('%s\n'%msg)
> UnicodeError: ASCII encoding error: ordinal not in range(128)

Okay, so Soif was right: database logging makes it fail
(unsurprisingly, I must admit).

Could you tell what happens if you disable MDL_ENABLE_DATABASE_LOGGING?
Does it fail, and if yes, where? (A traceback would be fine as well.)
[given that you disable the encoding in your makeMatchQual]

> What I said earlier is that I manually UTF-8 encode everything that I
> *store* in the DB. I thought that I might be safe with queries, but
> well, UTF-8 encoding queries too is not that much work.

Ok. Sorry I did not understand you right. So it is as simple as that:
just encode the attributes' values and that's it? I must try that one
of these days.

-- Sébastien.
From: Yannick G. <yan...@sa...> - 2003-07-16 19:14:30
> Okay, so Soif was right: database logging makes it fail
> (unsurprisingly, I must admit).
>
> Could you tell what happens if you disable MDL_ENABLE_DATABASE_LOGGING?
> Does it fail, and if yes, where? (A traceback would be fine as well.)
> [given that you disable the encoding in your makeMatchQual]

Sure!

[...]
  File "/usr/lib/python2.2/site-packages/Modeling/EditingContext.py", line 1304, in fetch
    return self.objectsWithFetchSpecification(fs)
  File "/usr/lib/python2.2/site-packages/Modeling/EditingContext.py", line 1218, in objectsWithFetchSpecification
    objects=self.parentObjectStore().objectsWithFetchSpecification(fs, ec)
  File "/usr/lib/python2.2/site-packages/Modeling/ObjectStoreCoordinator.py", line 420, in objectsWithFetchSpecification
    return store.objectsWithFetchSpecification(aFetchSpecification, anEditingContext)
  File "/usr/lib/python2.2/site-packages/Modeling/DatabaseContext.py", line 1521, in objectsWithFetchSpecification
    anEditingContext)
  File "/usr/lib/python2.2/site-packages/Modeling/DatabaseChannel.py", line 381, in selectObjectsWithFetchSpecification
    entity)
  File "/usr/lib/python2.2/site-packages/Modeling/DatabaseAdaptors/AbstractDBAPI2AdaptorLayer/AbstractDBAPI2AdaptorChannel.py", line 288, in selectAttributes
    db_info('Evaluating: %s'%statement)
  File "/usr/lib/python2.2/site-packages/Modeling/logging.py", line 56, in log_stderr
    sys.stderr.write('%s\n'%msg)
UnicodeError: ASCII encoding error: ordinal not in range(128)

It sounds pretty much the same to me...

> > What I said earlier is that I manually UTF-8 encode everything that I
> > *store* in the DB. I thought that I might be safe with queries, but
> > well, UTF-8 encoding queries too is not that much work.
>
> Ok. Sorry I did not understand you right. So it is as simple as that:
> just encode the attributes' values and that's it? I must try that one
> of these days.

Indeed it does the job, but as was discussed some time ago on the
mailing list, it does not enable case-insensitive matching. A
case-insensitive match with u"éé" encoded in UTF-8 will look for
"Ã©Ã©", "ã©Ã©", "Ã©ã©" and "ã©ã©", which does not make any sense once
put back in unicode. "éé", "Éé" and "ÉÉ" are respectively "Ã©Ã©",
"ÃÃ©" and "ÃÃ" once encoded.

So it may be wise to let the user do the UTF-8 trick. That way he
won't blame you for the weird results of case-insensitive matches. On
the other hand, some databases like PostgreSQL detect the encoding and
perform a decent case-insensitive match with UTF-8 data.

-- 
Yannick Gingras
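The byte-level reason why case-insensitive matching breaks on UTF-8 encoded data can be checked directly. The sketch below is written in Python 3 syntax rather than the Python 2.2 used in this thread (an assumption, made only so that it runs on a current interpreter), and the variable names are illustrative.

```python
# Illustrative sketch: why upper-casing UTF-8 *bytes* misses accented letters.
lower_text = "\u00e9\u00e9"                # "éé"
upper_text = "\u00c9\u00c9"                # "ÉÉ"

lower_bytes = lower_text.encode("utf-8")   # two bytes per character: C3 A9
upper_bytes = upper_text.encode("utf-8")   # two bytes per character: C3 89

# Upper-casing the encoded bytes only touches ASCII letters, so the
# encoded "éé" comes back unchanged and never equals the encoded "ÉÉ".
folded_bytes = lower_bytes.upper()

# Upper-casing the text first, then encoding, gives the expected bytes.
folded_text_bytes = lower_text.upper().encode("utf-8")
```

A byte-oriented UPPER() in a database would be in the position of folded_bytes here, which is consistent with the "weird results" described above; a database that knows the column encoding can behave like folded_text_bytes instead.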
From: Sebastien B. <sbi...@us...> - 2003-07-16 19:35:51
Yannick Gingras <yan...@sa...> wrote:
> > Okay, so Soif was right: database logging makes it fail
> > (unsurprisingly, I must admit).
> >
> > Could you tell what happens if you disable MDL_ENABLE_DATABASE_LOGGING?
> > Does it fail, and if yes, where? (A traceback would be fine as well.)
> > [given that you disable the encoding in your makeMatchQual]
>
> Sure!
>
[...]
>   File "/usr/lib/python2.2/site-packages/Modeling/DatabaseAdaptors/AbstractDBAPI2AdaptorLayer/AbstractDBAPI2AdaptorChannel.py", line 288, in selectAttributes
>     db_info('Evaluating: %s'%statement)
>   File "/usr/lib/python2.2/site-packages/Modeling/logging.py", line 56, in log_stderr
>     sys.stderr.write('%s\n'%msg)
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> It sounds pretty much the same to me...

Okay, sorry, I didn't read it right. I suspect that you *enabled* it
there, and that it was disabled before. Could you retry this (without
MDL_ENABLE_DATABASE_LOGGING) with:

------------------------------------------------------------------------
--- logging.py  20 Feb 2003 11:48:58 -0000      1.5
+++ logging.py  16 Jul 2003 19:28:49 -0000
@@ -53,6 +53,8 @@
 import os, sys
 
 def log_stderr(msg):
+  if type(msg) is type(u''):
+    msg=msg.encode('utf-8')
   sys.stderr.write('%s\n'%msg)
 no_log=lambda msg, severity=0: None
 trace=debug=info=log=warn=error=fatal=no_log
------------------------------------------------------------------------

The first traceback you gave was referring to db_error(), and it would
be interesting to see what kind of error this was, wouldn't it ?-)

> > Ok. Sorry I did not understand you right. So it is as simple as that:
> > just encode the attributes' values and that's it? I must try that one
> > of these days.
>
> Indeed it does the job, but as was discussed some time ago on the
> mailing list, it does not enable case-insensitive matching. A
> case-insensitive match with u"éé" encoded in UTF-8 will look for
> "Ã©Ã©", "ã©Ã©", "Ã©ã©" and "ã©ã©", which does not make any sense once
> put back in unicode. "éé", "Éé" and "ÉÉ" are respectively "Ã©Ã©",
> "ÃÃ©" and "ÃÃ" once encoded.
>
> So it may be wise to let the user do the UTF-8 trick. That way he
> won't blame you for the weird results of case-insensitive matches. On
> the other hand, some databases like PostgreSQL detect the encoding and
> perform a decent case-insensitive match with UTF-8 data.

This needs investigation. If some of you could provide working Python
code with unicode and psycopg/pypgsql/pgdb/mysqldb/sqlitedb, please
share. I have no time for this now.

However, speaking of case-insensitive match: if PostgreSQL supports it,
then it should work, since the SQL WHERE clause behind it is UPPER(...)
LIKE UPPER(...) -- pure theory and not tested, so if someone feels like
testing it, go ahead :)

-- Sébastien.
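The idea behind the log_stderr patch (encode unicode messages before they reach a byte-oriented stream) can be sketched independently of Modeling. The snippet is Python 3 and writes to an in-memory buffer; the function names to_log_bytes and log_to are hypothetical and do not exist in Modeling's logging module, and the "replace" error handler is an assumption rather than something the patch does.

```python
import io


def to_log_bytes(msg, encoding="utf-8"):
    """Return msg as bytes; text is encoded, bytes pass through unchanged."""
    if isinstance(msg, bytes):
        return msg
    # "replace" keeps logging from raising on characters the target
    # encoding cannot represent (irrelevant for UTF-8, useful for ASCII).
    return str(msg).encode(encoding, "replace")


def log_to(stream, msg):
    """Write one log line to a binary stream without raising UnicodeError."""
    stream.write(to_log_bytes(msg) + b"\n")


# Usage: a message containing non-ASCII characters is logged safely.
buf = io.BytesIO()
log_to(buf, "Evaluating: ... LIKE '%\u00e9\u00e9%'")
```

The point of the design is that the logger, not every caller, decides how text becomes bytes, so an error report can never itself die on an encoding error.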
From: SoaF at H. <so...@la...> - 2003-07-16 21:48:15
Sebastien Bigaret wrote:
> This needs investigation. If some of you could provide working Python
> code with unicode and psycopg/pypgsql/pgdb/mysqldb/sqlitedb, please
> share. I have no time for this now.

I don't have any working code, since I resolved this by calling
XMLUtils.unicode2Str() before doing the request... :(

> However, speaking of case-insensitive match: if PostgreSQL supports it,
> then it should work, since the SQL WHERE clause behind it is UPPER(...)
> LIKE UPPER(...) -- pure theory and not tested, so if someone feels like
> testing it, go ahead :)
>
> -- Sébastien.

In fact I think this is MySQL-dependent, since I haven't managed to put
anything unicode in the database (even without Modeling and with the
latest MySQLdb).

Something else: be careful, "éé" isn't unicode-only. So I think the
DBA does some translation to UTF-8 on the fly (so it works).

Bye bye
From: Mario R. <ma...@ru...> - 2003-07-17 13:17:51
On Wednesday, Jul 16, 2003, at 21:35 Europe/Amsterdam, Sebastien
Bigaret wrote:
> Yannick Gingras <yan...@sa...> wrote:
...
>> Indeed it does the job, but as was discussed some time ago on the
>> mailing list, it does not enable case-insensitive matching. A
>> case-insensitive match with u"éé" encoded in UTF-8 will look for
>> "Ã©Ã©", "ã©Ã©", "Ã©ã©" and "ã©ã©", which does not make any sense once
>> put back in unicode. "éé", "Éé" and "ÉÉ" are respectively "Ã©Ã©",
>> "ÃÃ©" and "ÃÃ" once encoded.
>>
>> So it may be wise to let the user do the UTF-8 trick. That way he
>> won't blame you for the weird results of case-insensitive matches. On
>> the other hand, some databases like PostgreSQL detect the encoding and
>> perform a decent case-insensitive match with UTF-8 data.
>
> This needs investigation. If some of you could provide working Python
> code with unicode and psycopg/pypgsql/pgdb/mysqldb/sqlitedb, please
> share. I have no time for this now.

The option I adopted when I came across this problem was to work
completely in UTF-8, from front to back. [For the web, this is easily
done by encoding the pages in UTF-8, i.e. setting the response header:
setHeader('Content-Type', 'text/html; charset=utf-8').]
Thus, the modeling layer always gets UTF-8, which is exchanged with the
db as is. This, however, means that case-insensitive searches do not
work -- if you do a case-insensitive search for é or É, you will only
get one or the other. This is not nice, but I can live with it, at
least for now.

But it is possible to work in unicode on the client side, and Postgres
allows unicode queries, i.e. the SQL query itself is in unicode.
The data in the DB being in UTF-8, I would expect that

    u"select * from sometable where upper(someprop) like upper('%é%')"

would give all rows where someprop contains é or É.
But no, it does not seem to work... at least I have not figured it out
yet.

If anyone wants to play, I have tried to understand this with
the code below (working without modeling):

    '''
    Assume an i18nText table, that contains at least the 2 rows:
    en   fr   ...
    key  clé
    KEY  CLÉ
    '''
    #
    dbname = dbuser= dbpass =
    #
    from pyPgSQL import PgSQL
    con = PgSQL.connect(database=dbname,user=dbuser,password=dbpass,
                        client_encoding=('utf-8','replace'),unicode_results=1)
    cur = con.cursor()
    cur.execute('SET CLIENT_ENCODING TO UNICODE')
    cur.execute(u"SELECT fr FROM i18nText WHERE en = 'key' ")
    _dbrset = cur.fetchall()
    match_on = _dbrset[0][0] # evaluates to unicode string "clé"
    cur.execute(u"SELECT * FROM i18nText WHERE upper(fr) LIKE upper('%"
                +match_on+ "%') " )
    dbrset = cur.fetchall()
    cur.close()
    #

However, this only returns the row for fr='clé'.
If I change to match on en='KEY', the uppercase row for fr='CLÉ' is
returned.

Is this the behaviour that I should expect?
How should the upper function behave on unicode strings?

(Note that I have sys.setdefaultencoding('utf-8') in my
sitecustomize.py, and I am not sure about all the effects of this.
As for any other special settings on the DB, I do not remember any.)

mario
From: Sebastien B. <sbi...@us...> - 2003-07-24 11:58:17
Hi,

Federico Di Gregorio, maintainer of psycopg, asked today for
suggestions <<about unicode strings in postgresql and client
conversions>>.

http://lists.initd.org/pipermail/psycopg/2003-July/002171.html

Those of you using unicode w/ psycopg may want to follow or
participate in the discussion there.

-- Sébastien.
From: Yannick G. <yan...@sa...> - 2003-07-16 19:52:08
On July 16, 2003 03:35 pm, Sebastien Bigaret wrote:
> [...]
> The first traceback you gave was referring to db_error(), and it would
> be interesting to see what kind of error this was, wouldn't it ?-)

Sure!

Couldn't evaluate expression SELECT t0.gl_id, t0.account,
t0.control_acct, t0.uom_id, t0.acct_type, t0.is_active, t0.gvt_code
FROM GL t0 WHERE (t0.is_active <> -255 AND t0.account LIKE '%Ã©Ã©%'
AND t0.gvt_code LIKE '%%'). Reason: exceptions.UnicodeError:ASCII
encoding error: ordinal not in range(128)
Traceback (most recent call last):
  File "belugaerp/modules/gl/GLModule.py", line 302, in ?
    mod.getGLAccountWithSpec( spec )
  File "/home/ygingras/BelugaERP/belugaerp/modules/gl/GLAccountManager.py", line 62, in getGLAccountWithSpec
    recs = self.getRecsWithSpec(recSpec)
  File "/home/ygingras/BelugaERP/belugaerp/modules/gl/I18NedManager.py", line 113, in getRecsWithSpec
    recs = self._mainManager.getRecsWithSpec(recSpec)
  File "/home/ygingras/BelugaERP/belugaerp/modules/gl/SimpleManager.py", line 104, in getRecsWithSpec
    recs = self._mm.fetch(self._tableName, qual)
  File "/home/ygingras/BelugaERP/belugaerp/modules/gl/ModelManager.py", line 55, in fetch
    self.__ec.fetch(entName, qualifier, rawRow=raw) )
  File "/usr/lib/python2.2/site-packages/Modeling/EditingContext.py", line 1304, in fetch
    return self.objectsWithFetchSpecification(fs)
  File "/usr/lib/python2.2/site-packages/Modeling/EditingContext.py", line 1218, in objectsWithFetchSpecification
    objects=self.parentObjectStore().objectsWithFetchSpecification(fs, ec)
  File "/usr/lib/python2.2/site-packages/Modeling/ObjectStoreCoordinator.py", line 420, in objectsWithFetchSpecification
    return store.objectsWithFetchSpecification(aFetchSpecification, anEditingContext)
  File "/usr/lib/python2.2/site-packages/Modeling/DatabaseContext.py", line 1521, in objectsWithFetchSpecification
    anEditingContext)
  File "/usr/lib/python2.2/site-packages/Modeling/DatabaseChannel.py", line 381, in selectObjectsWithFetchSpecification
    entity)
  File "/usr/lib/python2.2/site-packages/Modeling/DatabaseAdaptors/AbstractDBAPI2AdaptorLayer/AbstractDBAPI2AdaptorChannel.py", line 297, in selectAttributes
    raise GeneralAdaptorException, msg
Modeling.Adaptor.GeneralAdaptorException

> > Indeed it does the job, but as was discussed some time ago on the
> > mailing list, it does not enable case-insensitive matching. A
> > case-insensitive match with u"éé" encoded in UTF-8 will look for
> > "Ã©Ã©", "ã©Ã©", "Ã©ã©" and "ã©ã©", which does not make any sense once
> > put back in unicode. "éé", "Éé" and "ÉÉ" are respectively "Ã©Ã©",
> > "ÃÃ©" and "ÃÃ" once encoded.
> >
> > So it may be wise to let the user do the UTF-8 trick. That way he
> > won't blame you for the weird results of case-insensitive matches. On
> > the other hand, some databases like PostgreSQL detect the encoding and
> > perform a decent case-insensitive match with UTF-8 data.
>
> This needs investigation. If some of you could provide working Python
> code with unicode and psycopg/pypgsql/pgdb/mysqldb/sqlitedb, please
> share. I have no time for this now.

I'll see what I can do, but unicode support does not come with MySQL
4.0; it comes with 4.1, which is still alpha...

http://www.mysql.com/doc/en/Nutshell_4.1_features.html

> However, speaking of case-insensitive match: if PostgreSQL supports it,
> then it should work, since the SQL WHERE clause behind it is UPPER(...)
> LIKE UPPER(...) -- pure theory and not tested, so if someone feels like
> testing it, go ahead :)

No PostgreSQL here to run tests, but I'd like to hear from people who
try this.

-- 
Yannick Gingras
From: Sebastien B. <sbi...@us...> - 2003-07-16 21:46:56
Yannick Gingras <yan...@sa...> writes:
> Couldn't evaluate expression SELECT t0.gl_id, t0.account,
> t0.control_acct, t0.uom_id, t0.acct_type, t0.is_active, t0.gvt_code
> FROM GL t0 WHERE (t0.is_active <> -255 AND t0.account LIKE '%Ã©Ã©%'
> AND t0.gvt_code LIKE '%%'). Reason: exceptions.UnicodeError:ASCII
> encoding error: ordinal not in range(128)
> Traceback (most recent call last):
>   File "/usr/lib/python2.2/site-packages/Modeling/DatabaseChannel.py", line 381, in selectObjectsWithFetchSpecification
>     entity)
>   File "/usr/lib/python2.2/site-packages/Modeling/DatabaseAdaptors/AbstractDBAPI2AdaptorLayer/AbstractDBAPI2AdaptorChannel.py", line 297, in selectAttributes
>     raise GeneralAdaptorException, msg
> Modeling.Adaptor.GeneralAdaptorException
>
[...]

Okay, that comes from the database then, if we're at this point. The
data was encoded w/ UTF-8 before being sent.

The fact is that the framework is definitely not ready to accept raw
unicode strings (the validation fails; then, if we solve this,
SQLExpression fails to build the SQL statement, etc.).

Before I appear too stupid when trying things: UTF-8 is just a
specific way of encoding unicode in eight bits, right, i.e. one
character is translated to a series of one to many characters, right?
It has nothing to do w/ ISO-8859-1, ISO-8859-15 etc., which are just
correspondence tables between a number coded in one byte and a given
character, right?

> I'll see what I can do, but unicode support does not come with MySQL
> 4.0; it comes with 4.1, which is still alpha...
>
> http://www.mysql.com/doc/en/Nutshell_4.1_features.html

MySQL is known to be quite active; do you know when this will become
the production release?

PostgreSQL supports unicode/UTF-8:
http://developer.postgresql.org/docs/postgres/multibyte.html

SQLite supports it also (w/ some precautions for ORDER BY, LIKE, etc.):
http://groups.yahoo.com/group/sqlite/message/1675

These links are for the archives. I /think/ I begin to see the light in
this "mess" :)

-- Sébastien.
From: Yannick G. <ygi...@yg...> - 2003-07-16 22:05:27
On Wednesday 16 July 2003 17:46, you wrote:
> Before I appear too stupid when trying things: UTF-8 is just a
> specific way of encoding unicode in eight bits, right, i.e. one
> character is translated to a series of one to many characters, right?
> It has nothing to do w/ ISO-8859-1, ISO-8859-15 etc., which are just
> correspondence tables between a number coded in one byte and a given
> character, right?

Yup, the ISO-8859-X encodings are one-byte encodings covering a limited
subset of characters, whereas UTF-8 puts a unicode character on one or
more 8-bit bytes (hence the 8 in its name). A nice property of UTF-8 is
that the first 128 characters (codes 0 to 127) are encoded on a single
byte and keep the same character code as their ASCII equivalents. So a
pure ASCII string (no ISO-8859-X here) is 100% valid UTF-8. UTF-8 is
not the most efficient encoding for unicode, but this particular
compatibility makes it the most widespread.

UTF-8 is NOT compatible with Latin-1 (aka ISO-8859-1). Latin-1
characters with a character code over 127 are encoded on 2 bytes in
UTF-8.

Why UTF-8 and not pure unicode? Because everything is damn too buggy
to handle it! The C++ coder will in fact want to die converting all
those std::string declarations, and the C coder simply can't use
pointer arithmetic anymore (the width is not fixed).

Luckily we use Python! :D

-- 
Yannick Gingras
Coder for OBB : Offside Bumptious Bastnaesite
http://OpenBeatBox.org
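The three properties described in this last message (ASCII is valid UTF-8, Latin-1 characters above 127 take two bytes in UTF-8, and the two encodings are not interchangeable) can be verified with a few lines. This is a Python 3 sketch, not the Python 2.2 of the thread, and the variable names are illustrative.

```python
# Illustrative sketch of the ASCII / Latin-1 / UTF-8 relationship.
ascii_text = "SELECT fr FROM i18nText"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

e_acute = "\u00e9"                        # "é", code 233 in Latin-1
latin1_bytes = e_acute.encode("latin-1")  # one byte: E9
utf8_bytes = e_acute.encode("utf-8")      # two bytes: C3 A9

# Latin-1 bytes above 127 are not valid UTF-8 on their own.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# Reading UTF-8 bytes as Latin-1 "works" but yields the familiar garbage.
mojibake = utf8_bytes.decode("latin-1")   # two characters: "Ã" and "©"
```

The last line is the same effect that makes the encoded "éé" show up as "Ã©Ã©" in the SQL statement logged earlier in the thread.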