I get an error when I try to execute a select query that contains non-ASCII characters, as follows:
query = u"select id,ad_vorname,ad_name,ad_unternehmen_institution from tblTeve where ad_vorname like 'Ruedi' and ad_name like 'Moster' and ad_unternehmen_institution like 'Amt für Information'"
...
r = self._cursor.execute( query )
File "/usr/lib/python2.3/site-packages/MySQLdb/cursors.py", line 137, in execute
self.errorhandler(self, exc, value)
File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
raise errorclass, errorvalue
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 167: ordinal not in range(128)
The only way I can fix this is by changing Python's default character set from ascii to utf8.
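The failure itself can be reproduced without a database. The following is a minimal sketch written for Python 3, which is an assumption (the thread uses Python 2.3, where the same encode step happens implicitly with the default ascii codec). The helper name `try_encode` is made up for illustration and the query is shortened.

```python
def try_encode(text, charset):
    """Return the encoded bytes, or the exception object if encoding fails."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError as exc:
        return exc

# Shortened version of the query from the post; \xfc is the u-umlaut.
query = "select id from tblTeve where ad_unternehmen_institution like 'Amt f\xfcr Information'"

ascii_result = try_encode(query, "ascii")  # fails: \xfc is outside the ASCII range
utf8_result = try_encode(query, "utf-8")   # succeeds: the umlaut becomes two bytes

print(type(ascii_result).__name__)
print(type(utf8_result).__name__)
```

Encoding with the charset the connection actually uses avoids the error, which is why changing the interpreter-wide default only hides the problem.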
I tried setting it when creating the connection, using the following config file:
[client]
user="root"
database=test_energie
default-character-set=utf8
When I execute show variables I get the following values:
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
I would be very grateful for tips on how to fix this situation.
thanks
Robert
1.2.1c3 should have fixed this problem.
I'm using 1.2.1c3 and this particular problem is indeed fixed. But IMO there's a new one:
If you call the cursor's execute method with a unicode object as the query, and with one or more unicode objects in the args dict that contain non-ASCII characters, the following is doomed to fail:
r = self._query(query % self.connection.literal(args))
literal(args) will encode the unicode objects in the args dict to whatever charset the database connection has (in my case utf8). Python will then do the %-formatting, which must result in a unicode object because the left side of % is unicode. To do so it needs to convert the utf8-encoded args back to unicode, which won't work, because Python will assume they are ascii (sys.getdefaultencoding()) and fail with a UnicodeDecodeError.
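The decode step described above can be imitated in isolation. This is a rough sketch in Python 3 (an assumption; Python 3 no longer performs this implicit coercion, so it is spelled out here). The function `coerce_like_python2` is a hypothetical stand-in for what Python 2 does when a unicode format string meets an encoded str argument.

```python
charset = "utf-8"                  # connection charset, as in the post
arg = "f\xfcr"                     # an argument containing a non-ASCII character
encoded_arg = arg.encode(charset)  # roughly what literal(args) hands back

def coerce_like_python2(raw, default_encoding="ascii"):
    """Mimic Python 2's implicit str -> unicode coercion in unicode % str."""
    return raw.decode(default_encoding)

try:
    coerce_like_python2(encoded_arg)
    outcome = "ok"
except UnicodeDecodeError:
    # The utf8 bytes for the umlaut are not valid ASCII.
    outcome = "UnicodeDecodeError"

print(outcome)
```

Pure-ASCII arguments pass through this coercion unharmed, which would explain why the problem only shows up with characters such as the umlaut.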
My solution would be (in execute()):
If query is a unicode object, convert it to the database connection charset before doing the string formatting. The formatting would then involve only strings and no unicode objects, and would therefore result in a string containing the query and args in the charset of the database connection. This would make the encoding in _do_query() obsolete, I guess.
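The proposal could be sketched roughly as follows. This is not MySQLdb code: it is written for Python 3.5 or later (an assumption, since bytes %-formatting is needed), `format_query` is a hypothetical name, and the naive quoting below only stands in for connection.literal(args) and does no real escaping.

```python
def format_query(query, args, charset):
    """Encode a text query first, then format it with already-encoded args."""
    encoded_query = query.encode(charset)
    # Both sides of % are bytes now, so no implicit decode can happen.
    return encoded_query % args

charset = "utf-8"
query = ("select id from tblTeve "
         "where ad_name like %s and ad_unternehmen_institution like %s")

# Stand-in for connection.literal(args): quoted values in the connection charset.
args = tuple(("'%s'" % value).encode(charset)
             for value in ("Moster", "Amt f\xfcr Information"))

statement = format_query(query, args, charset)
print(type(statement).__name__)
```

The result is a single encoded statement, so nothing further would need to be encoded before it is sent to the server.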
Thoughts?
What version are you using?
I've worried that there might be a bug in the way unicode queries are handled, and you might have confirmed it. Go into MySQLdb/cursors.py, find the _do_query() method, and change:
db.query(q)
to:
db.query(q.encode(db.charset))
If that fixes your problem, then this is the actual bug: When you pass in unicode values as parameters, they are correctly converted to strings with the right encoding; see MySQLdb.connections.Connection.__init__() for how this happens (local unicode_literal() function). However, you are building your own literal query with some non-ASCII characters (ü), so these aren't encoded properly.
Also, you can change your code to do this instead and it should work:
query = "select id,ad_vorname,ad_name,ad_unternehmen_institution from tblTeve where ad_vorname like %s and ad_name like %s and ad_unternehmen_institution like %s"
...
r = self._cursor.execute( query, (u'Ruedi', u'Moster', u'Amt für Information') )
Your query variable, in this case, does not have to be a unicode string. However it will probably work as a unicode string. The important thing is that the above example allows MySQLdb to encode the parameters.
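The same idea, passing values as parameters instead of embedding them in the query text, can be tried without a MySQL server. The sketch below swaps in the standard-library sqlite3 module, which is an assumption made purely so the example is self-contained: sqlite3 uses ? placeholders where MySQLdb uses %s, and the table layout is guessed from the column names in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tblTeve (id integer primary key, ad_vorname text, "
    "ad_name text, ad_unternehmen_institution text)"
)
conn.execute(
    "insert into tblTeve (ad_vorname, ad_name, ad_unternehmen_institution) "
    "values (?, ?, ?)",
    ("Ruedi", "Moster", "Amt f\xfcr Information"),
)

# The query text stays pure ASCII; the driver handles the parameter values.
query = (
    "select id, ad_vorname, ad_name, ad_unternehmen_institution from tblTeve "
    "where ad_vorname like ? and ad_name like ? "
    "and ad_unternehmen_institution like ?"
)
rows = conn.execute(query, ("Ruedi", "Moster", "Amt f\xfcr Information")).fetchall()
conn.close()
print(len(rows))
```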
I think the way I should fix this is to have _do_query() perform a q.encode(db.charset) only if q is unicode, to avoid unnecessary encoding.
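A rough sketch of that conditional encode, in Python 3 terms (an assumption: str plays the role of Python 2's unicode and bytes the role of its str). `FakeDb` and `do_query` are hypothetical stand-ins, not the MySQLdb classes; only the `charset` attribute and the `query()` call come from the snippet above.

```python
class FakeDb:
    """Hypothetical stand-in for the connection; records what query() receives."""
    charset = "latin1"

    def __init__(self):
        self.sent = []

    def query(self, q):
        self.sent.append(q)

def do_query(db, q):
    # Encode only text; bytes are assumed to be in db.charset already.
    if isinstance(q, str):
        q = q.encode(db.charset)
    db.query(q)

db = FakeDb()
do_query(db, "select 'f\xfcr'")  # text: encoded to latin1 before sending
do_query(db, b"select 1")        # bytes: passed through untouched
print(len(db.sent))
```

Skipping the encode for already-encoded queries also avoids the implicit ascii decode that calling encode on a byte string triggers in Python 2.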