Thread: [SQLObject] still an unicode problem
SQLObject is a Python ORM.
Brought to you by: ianbicking, phd
From: sophana <so...@zi...> - 2006-09-26 21:01:32
|
Hi, I thought I had resolved my unicode problems, but there is still a bug related to MySQLdb. I connect SQLObject 0.7.1 with ?use_unicode=1&sqlobject_encoding=utf8. When I put French accented characters in a unicode string and store it in an SQLObject UnicodeCol, it works. The pages on my website are also in UTF-8. However, I got an error message like this:

  File "/usr/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1926-py2.4.egg/sqlobject/main.py", line 1143, in set
    self._connection._SO_update(self, args)
  File "/usr/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1926-py2.4.egg/sqlobject/dbconnection.py", line 567, in _SO_update
    self.query("UPDATE %s SET %s WHERE %s = (%s)" %
  File "/usr/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1926-py2.4.egg/sqlobject/dbconnection.py", line 307, in query
    return self._runWithConnection(self._query, s)
  File "/usr/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1926-py2.4.egg/sqlobject/dbconnection.py", line 221, in _runWithConnection
    val = meth(conn, *args)
  File "/usr/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1926-py2.4.egg/sqlobject/dbconnection.py", line 304, in _query
    self._executeRetry(conn, conn.cursor(), s)
  File "/usr/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1926-py2.4.egg/sqlobject/mysql/mysqlconnection.py", line 74, in _executeRetry
    return cursor.execute(myquery)
  File "build/bdist.linux-i686/egg/MySQLdb/cursors.py", line 149, in execute
UnicodeEncodeError: 'latin-1' codec can't encode character u'\ufffd' in position 722: ordinal not in range(256)

I could reproduce it on the command line and traced through it. I can see SQLObject calling _query with a plain string that is UTF-8 encoded (I can decode it back into a unicode string and see the u'\ufffd' character displayed). I can't see why MySQLdb re-encodes the string into latin-1, which does not support this character. Unfortunately I don't have the MySQLdb source to investigate further. If someone has a workaround, I'd be very interested.

NOTE: I still get the "server has gone away" message from the MySQL 5 server with the new MySQLdb 1.2.2 (unpatched).
|
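A short Python 3 sketch, not taken from the thread, of one way a u'\ufffd' can reach a query: U+FFFD is the Unicode replacement character, which a decoder inserts when bytes are invalid in the assumed encoding and errors="replace" is in effect. The input string is an assumption for illustration; the thread does not establish that this is what happened in sophana's setup.

```python
# Sketch (Python 3): one way a U+FFFD replacement character can appear,
# and why the latin-1 codec then refuses to encode it.

# Assumed input: "café" stored as latin-1 bytes, where é is the single byte 0xE9.
raw = "café".encode("latin-1")

# Decoding those bytes as UTF-8 with errors="replace" does not raise;
# the invalid byte is silently swapped for U+FFFD.
text = raw.decode("utf-8", errors="replace")
print(ascii(text))  # 'caf\ufffd'

# latin-1 only covers code points 0-255, so U+FFFD cannot be encoded.
try:
    text.encode("latin-1")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)  # ends with: ordinal not in range(256)
```

The accented character is already lost at the decoding step; the later UnicodeEncodeError only reports the damage.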
From: Oleg B. <ph...@ph...> - 2006-09-27 10:03:35
|
On Tue, Sep 26, 2006 at 11:07:41PM +0200, sophana wrote:
> I connect sqlobject 0.7.1 with ?use_unicode=1&sqlobject_encoding=utf8

Try to add "&charset=utf-8". This is the encoding MySQLdb uses.

Oleg.
--
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
|
From: Markus G. <m.g...@gm...> - 2006-09-27 11:10:49
|
On 9/27/06, Oleg Broytmann <ph...@ph...> wrote:
> On Tue, Sep 26, 2006 at 11:07:41PM +0200, sophana wrote:
> > I connect sqlobject 0.7.1 with ?use_unicode=1&sqlobject_encoding=utf8
>
> Try to add "&charset=utf-8". This is the encoding MySQLdb uses.

IMO MySQL requires the name of the encoding to be without the dash:

&charset=utf8
|
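The dash matters because Python and MySQL spell character sets differently: Python's codec registry accepts both "utf-8" and "utf8", while MySQL's charset names are "utf8", "latin1" and so on. The helper below is a hypothetical sketch (not part of SQLObject or MySQLdb) of translating one naming scheme into the other, with a deliberately small, assumed mapping table.

```python
import codecs

# Hypothetical mapping from Python's canonical codec names to MySQL charset
# names. Only two entries are included here; extend it as needed.
PY_TO_MYSQL_CHARSET = {
    "utf-8": "utf8",
    "iso8859-1": "latin1",
}


def mysql_charset_name(python_encoding):
    """Return the MySQL charset name for a Python encoding name (sketch)."""
    # codecs.lookup() resolves aliases, so "utf8", "UTF-8" and "utf-8"
    # all map to the codec whose canonical name is "utf-8".
    canonical = codecs.lookup(python_encoding).name
    try:
        return PY_TO_MYSQL_CHARSET[canonical]
    except KeyError:
        raise ValueError("no MySQL charset known for %r" % python_encoding) from None


print(mysql_charset_name("utf-8"))    # utf8
print(mysql_charset_name("latin-1"))  # latin1
```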
From: Oleg B. <ph...@ph...> - 2006-09-27 12:02:46
|
On Wed, Sep 27, 2006 at 01:10:42PM +0200, Markus Gritsch wrote:
> On 9/27/06, Oleg Broytmann <ph...@ph...> wrote:
> >On Tue, Sep 26, 2006 at 11:07:41PM +0200, sophana wrote:
> >> I connect sqlobject 0.7.1 with ?use_unicode=1&sqlobject_encoding=utf8
> >
> >   Try to add "&charset=utf-8". This is the encoding MySQLdb uses.
>
> IMO MySQL requires the name of the encoding to be without the dash:
>
> &charset=utf8

BTW, is it the same as client_encoding in the trunk? Should I merge "charset" and "client_encoding" into one parameter?

Oleg.
|
From: sophana <so...@zi...> - 2006-09-27 12:01:12
|
Oleg Broytmann wrote:
> On Tue, Sep 26, 2006 at 11:07:41PM +0200, sophana wrote:
>> I connect sqlobject 0.7.1 with ?use_unicode=1&sqlobject_encoding=utf8
>
> Try to add "&charset=utf-8". This is the encoding MySQLdb uses.
>
> Oleg.

I will try this, thanks. But shouldn't this be automatic? I mean, doesn't the existence of sqlobject_encoding imply setting charset to the same value? Why would someone put different values in them?

Another point is that I don't really need unicode strings. These unicode characters come from web forms as UTF-8 encoded standard strings. Why does SQLObject automatically re-encode these strings into ASCII? Even with unicode strings, I saw that the MySQLdb query is made with a standard string. All these successive encodes, decodes and re-encodes are not very efficient in terms of CPU... I definitely don't understand the logic behind all this.
|
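sophana's suggestion, letting sqlobject_encoding imply charset when charset is not given, could look roughly like this. default_charset() is a hypothetical helper that rewrites the connection URI string; it is not something SQLObject 0.7 does, and the URI is a made-up example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def default_charset(uri):
    """Hypothetical helper: copy sqlobject_encoding into charset if charset is absent."""
    parts = urlsplit(uri)
    # parse_qsl keeps the parameter order; a dict preserves it (Python 3.7+).
    params = dict(parse_qsl(parts.query))
    if "charset" not in params and "sqlobject_encoding" in params:
        params["charset"] = params["sqlobject_encoding"]
    return urlunsplit(parts._replace(query=urlencode(params)))


uri = "mysql://user:secret@localhost/test?use_unicode=1&sqlobject_encoding=utf8"
print(default_charset(uri))
# mysql://user:secret@localhost/test?use_unicode=1&sqlobject_encoding=utf8&charset=utf8
```

The value is copied verbatim, so sqlobject_encoding=utf-8 would produce charset=utf-8, the dashed spelling Markus suggests MySQL does not accept; an explicit charset is left untouched.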
From: Oleg B. <ph...@ph...> - 2006-09-27 14:10:04
|
On Wed, Sep 27, 2006 at 03:52:20PM +0200, Markus Gritsch wrote:
> > BTW, is it the same as client_encoding in the trunk? Should I merge
> >"charset" and "client_encoding" into one parameter?
>
> Yes.

Thank you for the help!

Oleg.
|
From: sophana <so...@zi...> - 2006-09-27 21:16:40
|
Oleg Broytmann wrote:
> If you don't need unicode - don't use it. Declare an encoding for your
> database (utf-8, of course) and use it everywhere.

I would like to, but I can't, because even standard strings are encoded into ASCII in StringCol.

> It is MySQLdb that insists on converting from/to unicode. SQLObject
> internally uses strings, not unicode.

In the pdb session I saw that the MySQLdb query was made with a standard string, even with a unicode column and the use_unicode option.

Oleg Broytmann wrote:
>> And what about removing the .encode('ascii') in the StringCol?
>
> It will be replaced by .encode(charset), of course.
>
> Oleg.

Why not just remove the .encode for StringCol and leave the encoding problem to the backend? It seems useless to me...
|
From: Oleg B. <ph...@ph...> - 2006-09-27 21:37:21
|
On Wed, Sep 27, 2006 at 11:16:21PM +0200, sophana wrote:
> Oleg Broytmann wrote:
> > If you don't need unicode - don't use it. Declare an encoding for your
> > database (utf-8, of course) and use it everywhere.
> >
> I would like to, but I can't because even standard strings are encoded
> into ascii in StringCol.

Only because MySQLdb returns unicode instead of a string, and because I didn't know a way to get the encoding from MySQLdb. Now that we agree to use charset and sqlobject_encoding, it'd be easy to settle.

> > It is MySQLdb that insists on converting from/to unicode. SQLObject
> > internally uses strings, not unicode.
> >
> In the pdb session I saw that the mysqldb query was made with a standard
> string even with an unicode col and use-unicode option.

SQLObject internally uses only strings, not unicode.

> Why not just removing the .encode for StringCol and leave the encoding
> problem to the backend?
> It seems to me that it is just useless...

It is there to convert after MySQLdb. MySQLdb returns unicode, and I have to convert it to a string.

Oleg.
|
From: sophana <so...@zi...> - 2006-09-29 11:34:14
|
Oleg Broytmann wrote:
> On Tue, Sep 26, 2006 at 11:07:41PM +0200, sophana wrote:
>> I connect sqlobject 0.7.1 with ?use_unicode=1&sqlobject_encoding=utf8
>
> Try to add "&charset=utf-8". This is the encoding MySQLdb uses.
>
> Oleg.

I tried adding charset=utf8 (or utf-8, I don't remember which). I get the same error, still with the latin-1 encoding.
|
From: Oleg B. <ph...@ph...> - 2006-09-30 13:38:26
|
Hello!

On Wed, Sep 27, 2006 at 02:11:48PM +0200, sophana wrote:
> Oleg Broytmann wrote:
> > BTW, is it the same as client_encoding in the trunk? Should I merge
> > "charset" and "client_encoding" into one parameter?
> >
> I think so.
> And what about removing the .encode('ascii') in the StringCol?

The "charset" parameter is now stored as dbEncoding in the connection. Its value is used instead of "ascii", but "ascii" is still the default if there is no "charset", and for other databases.

In the trunk I also merged "client_encoding" with "charset": there is no "client_encoding" now, only "charset".

I backported the "SET NAMES" query to the 0.7 branch.

Revision 1961 in the 0.7-bugfix branch, 1962 in the trunk. Please test and report whether all this helps, and to what extent.

Oleg.
|
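Oleg's description of the change can be modelled in a few lines. SketchConnection and string_from_db() are hypothetical stand-ins written for illustration, not SQLObject's actual MySQLConnection or StringCol code; the only behaviour taken from the thread is that "charset" is kept as dbEncoding, with "ascii" as the fallback, and that unicode returned by MySQLdb is encoded with it. Python 3 str plays the role of Python 2 unicode, and bytes the role of a plain string.

```python
class SketchConnection:
    """Hypothetical connection: keeps the "charset" option as dbEncoding."""

    def __init__(self, charset=None):
        # Fall back to "ascii" when no charset is given, as described above.
        self.dbEncoding = charset or "ascii"


def string_from_db(value, connection):
    """Hypothetical StringCol-style conversion of a driver value to a byte string."""
    if isinstance(value, str):  # the driver returned text (unicode)
        return value.encode(connection.dbEncoding)
    return value  # already a byte string; pass it through unchanged


accented = "Café"
print(string_from_db(accented, SketchConnection(charset="utf8")))  # b'Caf\xc3\xa9'

try:
    string_from_db(accented, SketchConnection())  # no charset: "ascii" is used
except UnicodeEncodeError as exc:
    print("ascii fallback fails:", exc.reason)  # ordinal not in range(128)
```

The sketch works with charset="utf8" only because "utf8" happens to be both a MySQL charset name and a valid Python codec alias.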
From: Oleg B. <ph...@ph...> - 2006-09-30 16:00:21
|
On Sat, Sep 30, 2006 at 05:19:54PM +0200, Markus Gritsch wrote:
> As I already said, IMO it is not
> necessary to do this SQL query in SQLObject, because it is already
> performed by MySQLdb in case the MySQL database itself is too old to
> know about setting the charset by an API call, which is tried first.

That is, your advice is to remove the query altogether from the trunk?

Oleg.
|
From: Oleg B. <ph...@ph...> - 2006-09-30 17:02:57
|
On Sat, Sep 30, 2006 at 06:25:21PM +0200, Markus Gritsch wrote:
> On 9/30/06, Oleg Broytmann <ph...@ph...> wrote:
> >On Sat, Sep 30, 2006 at 05:19:54PM +0200, Markus Gritsch wrote:
> >> As I already said, IMO it is not
> >> necessary to do this SQL query in SQLObject, because it is already
> >> performed by MySQLdb in case the MySQL database itself is too old to
> >> know about setting the charset by an API call, which is tried first.
> >
> >   That is, your advice is to remove the query altogether from the trunk?
>
> Yes, and also from the 0.7 bugfix branch.

What do other people say? I remember the query helped someone not so long ago...

Oleg.
|
From: Oleg B. <ph...@ph...> - 2006-09-30 17:34:01
|
On Sat, Sep 30, 2006 at 07:29:26PM +0200, Markus Gritsch wrote:
> Have you looked at the code of the MySQLdb connector I was pointing

I have.

> to? The query IS performed, by the MySQLdb connector. The additional
> query in SQLObject could only have been helpful if the charset
> parameter was not used by the person.

I see now. There was no "charset", I suppose.

Oleg.
|
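Markus's point is that the driver already covers both cases: it tries the client library's API call for setting the charset first, and issues a SET NAMES query only when that call is unavailable. The sketch below models that decision with stub connection objects; apply_charset(), OldClient and NewClient are hypothetical names, and this is not a copy of MySQLdb's code.

```python
class OldClient:
    """Stub for a client library without an API call for the charset."""

    def __init__(self):
        self.queries = []

    def query(self, sql):
        self.queries.append(sql)  # record instead of talking to a server


class NewClient(OldClient):
    """Stub for a client library that can set the charset through its API."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def set_character_set(self, charset):
        self.charset = charset


def apply_charset(conn, charset):
    """Try the API call first; fall back to a SET NAMES query (sketch)."""
    setter = getattr(conn, "set_character_set", None)
    if setter is not None:
        setter(charset)
    else:
        conn.query("SET NAMES %s" % charset)


old, new = OldClient(), NewClient()
apply_charset(old, "utf8")
apply_charset(new, "utf8")
print(old.queries, new.charset)  # ['SET NAMES utf8'] utf8
```

Once "charset" is passed through to the driver, one of these two paths has already run, so a second SET NAMES from SQLObject would only repeat it; that is the argument for dropping the query.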
From: Oleg B. <ph...@ph...> - 2006-10-02 13:50:21
|
On Sat, Sep 30, 2006 at 06:25:21PM +0200, Markus Gritsch wrote:
> On 9/30/06, Oleg Broytmann <ph...@ph...> wrote:
> >On Sat, Sep 30, 2006 at 05:19:54PM +0200, Markus Gritsch wrote:
> >> As I already said, IMO it is not
> >> necessary to do this SQL query in SQLObject, because it is already
> >> performed by MySQLdb in case the MySQL database itself is too old to
> >> know about setting the charset by an API call, which is tried first.
> >
> >   That is, your advice is to remove the query altogether from the trunk?
>
> Yes, and also from the 0.7 bugfix branch.

Done, in revisions 1963-1965 (one additional revision for a documentation update in the trunk). Thank you!

Oleg.
|
From: Oleg B. <ph...@ph...> - 2006-09-27 12:42:56
|
On Wed, Sep 27, 2006 at 02:01:02PM +0200, sophana wrote:
> But shouldn't this be automatic?
> I mean that the existence of sqlobject_encoding implies charset to the
> same value?
> Why would someone put different values in it?

I don't know. I don't use MySQL. It is the users who use it who should tell me their use cases.

> Another point is that I don't really need unicode strings. These unicode
> characters come from web forms in utf-8 standard strings.
> Why does sqlobject automatically reencode these strings into ascii?
[skip]
> I definitely don't understand all the logic behind this.

If you don't need unicode - don't use it. Declare an encoding for your database (utf-8, of course) and use it everywhere.

It is MySQLdb that insists on converting from/to unicode. SQLObject internally uses strings, not unicode.

Oleg.
|