problems with characterset / utf8

Help
2005-05-11
2012-09-19
  • robert rottermann

    I get an error when I try to execute a select query that contains non-ASCII characters, as follows:

    query = u"select id,ad_vorname,ad_name,ad_unternehmen_institution from tblTeve where ad_vorname like 'Ruedi' and ad_name like 'Moster' and ad_unternehmen_institution like 'Amt für Information'"

    ...
    r = self._cursor.execute( query )
    File "/usr/lib/python2.3/site-packages/MySQLdb/cursors.py", line 137, in execute
    self.errorhandler(self, exc, value)
    File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
    raise errorclass, errorvalue
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 167: ordinal not in range(128)

    The only way I can fix this is by changing Python's default character set from ascii to utf8.

    I tried setting it when creating the connection, using the following config file:
    [client]
    user="root"
    database=test_energie

    default-character-set=utf8

    When I execute show variables, I get the following values:

    | character_set_client | latin1 |
    | character_set_connection | latin1 |
    | character_set_database | latin1 |
    | character_set_results | latin1 |
    | character_set_server | latin1 |
    | character_set_system | utf8 |
    | character_sets_dir | /usr/share/mysql/charsets/ |
    | collation_connection | latin1_swedish_ci |
    | collation_database | latin1_swedish_ci |
    | collation_server | latin1_swedish_ci |

    I would be very grateful for tips on how to fix this situation.

    thanks
    Robert

     
    • Andy Dustman

      Andy Dustman - 2005-05-12

      1.2.1c3 should have fixed this problem.

       
    • Markus Gebert

      Markus Gebert - 2005-10-04

      I'm using 1.2.1c3 and this particular problem is indeed fixed. But IMO there's a new one:

      If you call the cursor's execute method with a unicode object as the query, and with one or more unicode objects containing non-ASCII characters in the args dict, the following is doomed to fail:

      r = self._query(query % self.connection.literal(args))

      literal(args) encodes the unicode objects in the args dict to the charset of the database connection (in my case utf8). Python then does the %-formatting, which must produce a unicode object because the left side of % is unicode. To do so, it has to convert the utf8-encoded args back to unicode, and that fails: Python assumes they are ASCII (sys.getdefaultencoding()) and raises a UnicodeDecodeError.
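The failing step can be reproduced in isolation. The following is only a sketch, written for Python 3, where `bytes` plays the role of the Python 2 `str`. Python 3 no longer decodes implicitly, so the ASCII decode that Python 2 performs during %-formatting is spelled out by hand. The helper name `implicit_py2_decode` is hypothetical; the sample value is taken from the query in the original post.

```python
# A utf8-encoded argument, as literal() would return it on a utf8 connection.
encoded_arg = u"Amt f\xfcr Information".encode("utf8")


def implicit_py2_decode(value):
    # Hypothetical helper: mimics the implicit conversion Python 2 applies
    # when a byte string is formatted into a unicode format string.
    # Python 2 uses sys.getdefaultencoding(), which is 'ascii' by default.
    return value.decode("ascii")


try:
    implicit_py2_decode(encoded_arg)
    error = None
except UnicodeDecodeError as exc:
    # The two bytes that encode u'\xfc' in utf8 are outside the ASCII range.
    error = exc

print(type(error).__name__)
```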

      My solution would be (in execute()):

      If query is a unicode object, convert it to the database connection charset before doing the string formatting. The formatting then involves only strings, no unicode objects, and therefore yields a string containing query and args in the charset of the database connection. I guess this would make the encoding in _do_query() obsolete.
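The proposal above could be sketched roughly as follows. This is not the MySQLdb code: `quote_literal` is a hypothetical stand-in for `connection.literal()`, `build_query` is a hypothetical name, and the escaping shown is deliberately naive. It is written for Python 3 (3.5 or later, where %-formatting works on `bytes`), with `bytes` standing in for the Python 2 `str`.

```python
def quote_literal(value, charset):
    # Hypothetical stand-in for connection.literal(): encode to the
    # connection charset, then quote. Real escaping belongs to the driver;
    # this only handles backslashes and single quotes.
    raw = value.encode(charset)
    raw = raw.replace(b"\\", b"\\\\").replace(b"'", b"\\'")
    return b"'" + raw + b"'"


def build_query(query, args, charset="utf8"):
    # Encode the query first, so the %-formatting below only ever
    # combines byte strings and never triggers a decode.
    if isinstance(query, str):
        query = query.encode(charset)
    literals = tuple(quote_literal(arg, charset) for arg in args)
    return query % literals


result = build_query(
    u"select id from tblTeve where ad_unternehmen_institution like %s",
    (u"Amt f\xfcr Information",),
)
```

With this ordering the result is already a byte string in the connection charset, so no further encoding would be needed before sending it.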

      Thoughts?

       
    • Andy Dustman

      Andy Dustman - 2005-05-11

      What version are you using?

      I've worried that there might be a bug in the way unicode queries are handled, and you might have confirmed it. Go into MySQLdb/cursors.py, find the _do_query() method, and change:

          db.query(q)
      

      to:

          db.query(q.encode(db.charset))
      

      If that fixes your problem, then this is the actual bug: when you pass in unicode values as parameters, they are correctly converted to strings with the right encoding; see MySQLdb.connections.Connection.__init__() for how this happens (local unicode_literal() function). However, you are building your own literal query with some non-ASCII characters (ü), so these aren't encoded properly.

      Also, you can change your code to do this instead and it should work:

      query = "select id,ad_vorname,ad_name,ad_unternehmen_institution from tblTeve where ad_vorname like %s and ad_name like %s and ad_unternehmen_institution like %s"

      ...
      r = self._cursor.execute( query, (u'Ruedi', u'Moster', u'Amt für Information') )

      Your query variable, in this case, does not have to be a unicode string. However it will probably work as a unicode string. The important thing is that the above example allows MySQLdb to encode the parameters.

      I think the way I should fix this is have _do_query() perform a q.encode(db.charset) only if q is unicode, to avoid unnecessary encoding.
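A minimal sketch of that conditional encoding, assuming nothing about how it was eventually implemented in MySQLdb. The function name `encode_if_unicode` is hypothetical. It is written for Python 3, where `str` corresponds to the Python 2 `unicode` type and `bytes` to the Python 2 `str`.

```python
def encode_if_unicode(q, charset):
    # Hypothetical helper, not the actual MySQLdb code.
    # Only unicode queries are encoded; byte strings are assumed to be
    # in the connection charset already and are passed through untouched.
    if isinstance(q, str):
        return q.encode(charset)
    return q


unicode_query = u"select id from tblTeve where ad_name like 'f\xfcr'"
byte_query = b"select id from tblTeve"

encoded = encode_if_unicode(unicode_query, "latin1")
unchanged = encode_if_unicode(byte_query, "latin1")
```

The type check keeps already-encoded queries from being encoded a second time, which is the "unnecessary encoding" the post wants to avoid.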

       
