
MySQLdb crashing when given unicode data

Help
Brad Smith
2006-11-09
2012-09-19
  • Brad Smith - 2006-11-09

    Hello, I've been having a heckuva time getting my Fedora Tracker project, which uses MySQLdb as its back end, to play nicely with some Unicode data that has crept into one of the RPM descriptions it is supposed to index. I've established that the problem is that MySQLdb is using 'ascii' as the default charset, which can't handle non-ASCII characters, but here's the kicker: as with some other problems I've seen reported here, changing the character set doesn't seem to work!

    Here is an illustration:

    Note the "Intel® PRO..." in the description.

    >>> q = "INSERT INTO package_fedora_6 SET name = 'ipw2100-kmdl-2.6.18-1.2798.fc6', version = '1.2.1', release = '44.fc6.at', url = 'http://ipw2100.sourceforge.net/', dlurl = 'http://dl.atrpms.net/fc6-x86_64/atrpms/testing/ipw2100-kmdl-2.6.18-1.2798.fc6-1.2.1-44.fc6.at.x86_64.rpm', description = 'This package contains kernel drivers for the Intel® PRO/Wireless 2100.\n\n\nThis package contains the ipw2100-kmdl-2.6.18-1.2798.fc6 kernel modules for the Linux kernel package:\nkernel-2.6.18-1.2798.fc6.x86_64.rpm.', rpmgroup = 'System Environment/Kernel', vendor = 'ATrpms.net', packager = 'ATrpms <http://ATrpms.net/>', prein = 'NULL', postin = 'NULL', preun = 'NULL', postun = 'NULL', arch = 'x86_64', checksum = 'sha:1e669218d974a310ad08088567c34d115c4db4c3', changelog = 'NULL', fileList = '', package_id = NULL, repo_id = 62, epoch = 0, numfiles = 0"

    >>> c = MySQLdb.connect(host=hostName,user=userName,passwd=passWord,db=dbName)
    >>> c.character_set_name()
    'latin1'

    Note: ® is encoded in UTF-8 as the two bytes 0xc2 0xae, so 0xc2 is its first byte.

    >>> crs = c.cursor()
    >>> crs.execute(q)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/home/tracker/install/lib/python2.4/site-packages/MySQLdb/cursors.py", line 146, in execute
        query = query.encode(charset)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 350: ordinal not in range(128)

    Why does it say 'ascii' if the charset is 'latin1'? Well, let's try utf-8...

    >>> c.set_character_set('utf8')
    >>> c.character_set_name()
    'utf8'
    >>> crs = c.cursor()
    >>> crs.execute(q)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/home/tracker/install/lib/python2.4/site-packages/MySQLdb/cursors.py", line 146, in execute
        query = query.encode(charset)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 350: ordinal not in range(128)

    Same crash, and the same "ascii" reported as the charset! Passing the charset directly to MySQLdb.connect() produces the same result. I'm using MySQL-python-1.2.1_p2 with Python 2.4.4 and MySQL 5.0.20 (on which I have confirmed that utf8 is a supported charset).

    This bug is currently preventing Fedora Tracker from performing further updates, including indexing Fedora Core 6 repositories. Any help with it would be GREATLY appreciated!
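
    The 'ascii' in the message most likely does not come from the connection charset at all. In Python 2, calling .encode(charset) on a byte string (str) first decodes it with the default codec, which is 'ascii', and only then encodes to the target charset, so any non-ASCII byte fails before the charset is ever consulted. The sketch below reproduces that implicit step in Python 3 terms; py2_str_encode is a hypothetical helper written for illustration, not part of MySQLdb, and it assumes the description bytes are UTF-8.

```python
def py2_str_encode(raw: bytes, charset: str) -> bytes:
    """Mimic what Python 2's str.encode(charset) does to a byte string.

    Python 2 first decodes the bytes with the default codec ('ascii')
    and only then encodes to `charset`, so the target charset cannot
    help when the bytes contain anything outside ASCII.
    """
    return raw.decode("ascii").encode(charset)


# Assumption: the description arrives as UTF-8 bytes (the 0xc2 byte in
# the traceback is the first byte of "®" in UTF-8).
description = "Intel\u00ae PRO/Wireless 2100".encode("utf-8")

for charset in ("latin1", "utf8"):
    try:
        py2_str_encode(description, charset)
    except UnicodeDecodeError as exc:
        # Identical failure for both charsets: the ascii decode runs first.
        print(charset, "->", exc)

# Decoding explicitly with the correct codec skips the implicit ascii step.
text = description.decode("utf-8")
print(text.encode("utf8"))    # b'Intel\xc2\xae PRO/Wireless 2100'
print(text.encode("latin1"))  # b'Intel\xae PRO/Wireless 2100'
```

    Under Python 2 the analogous change would be to decode the query yourself (for example q.decode('utf-8')) so that execute() receives a unicode object, which should then encode cleanly to either charset.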

     
    • Gerald Forster - 2006-12-16

      Hi!

      I just want to state that I did not continue with version 1.2.1_p2. My solution was to use PostgreSQL and PyGreSQL. Looks good so far.

      Best regards
      Gerald

       
    • Gerald Forster - 2006-11-17

      Hello!

      Regarding mysql-python version 1.2.1_p2, I can confirm problems when using Unicode data.


      My code:

      import MySQLdb
      [...]
      myCursor = conn.cursor()

      # myQuery is a UTF-8-encoded byte string containing German umlauts
      myCursor.execute(myQuery)


      Code located in .../site-packages/MySQLdb/cursors.py

      127 def execute(self, query, args=None):
      [...]
      145 charset = db.character_set_name()
      146 query = query.encode(charset)


      query is already encoded.
      Method execute() tries to encode it a second time and raises a UnicodeDecodeError.

      A workaround may be to comment out line 146.

      Best regards
      Gerald
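
      Rather than deleting line 146 outright, a guard that encodes only text and passes already-encoded byte strings through would keep both calling styles working. The sketch below shows the idea in Python 3 terms (str is text, bytes is encoded data); encode_query is a hypothetical name and is not taken from MySQLdb's source. In Python 2 the equivalent check would be isinstance(query, unicode).

```python
def encode_query(query, charset: str) -> bytes:
    """Encode a query for the wire only if it is still text.

    Hypothetical replacement for the unconditional
    `query = query.encode(charset)` at cursors.py line 146.
    """
    if isinstance(query, str):
        # Text: encode exactly once, to the connection charset.
        return query.encode(charset)
    if isinstance(query, (bytes, bytearray)):
        # Already encoded by the caller: assume it matches the
        # connection charset and send it unchanged.
        return bytes(query)
    raise TypeError(f"query must be str or bytes, not {type(query).__name__}")


umlauts = "SELECT * FROM t WHERE name = 'M\u00fcller'"
print(encode_query(umlauts, "utf8"))                  # text is encoded once
print(encode_query(umlauts.encode("utf-8"), "utf8"))  # bytes pass through as-is
```

      The trade-off is that byte strings are trusted blindly: if the caller encoded in UTF-8 but the connection charset is latin1, the server would store garbled text. Passing text and letting the driver encode it avoids that mismatch.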

       
      • Andy Dustman - 2006-11-17

        1.2.2b2 doesn't have this problem.

         
        • Gerald Forster - 2006-11-20

          Hi!

          Obviously there has been an interface change between version 1.2.1c3 (which worked perfectly for me) and version 1.2.1_p2.
          I wasn't aware of this.
          Is there a document that describes such interface changes?

          With version 1.2.2b2 there are two problems:
          1) It is labelled "mysql-python-test", which doesn't seem to be the same as "stable".
          2) There is no Windows installer package.
          (I am working on an application that should run on both Linux and Windows.)

          So I will continue working with 1.2.1_p2.
          I will try passing Unicode objects as parameters to execute().

          Best regards
          Gerald
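
          One way to read "Unicode objects as parameters" is to keep values out of the SQL string entirely and hand them to execute() as a separate argument, so the driver handles quoting and encoding. MySQLdb uses the 'format' paramstyle (%s placeholders); the sketch below instead uses Python 3's built-in sqlite3 module as a stand-in so it runs without a MySQL server, and sqlite3 uses ? placeholders. The table, columns, and row values are made up for illustration.

```python
import sqlite3

# Stand-in for a MySQLdb connection; with MySQLdb the placeholders
# would be %s instead of ?.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE package (name TEXT, description TEXT)")

# Hypothetical row: decode raw bytes once at the edge, then work with text.
raw_description = "Kernel drivers for the Intel\u00ae PRO/Wireless 2100.".encode("utf-8")
description = raw_description.decode("utf-8")

# Values travel separately from the SQL text, so the driver handles
# both quoting and encoding.
cur.execute(
    "INSERT INTO package (name, description) VALUES (?, ?)",
    ("ipw2100-kmdl", description),
)
conn.commit()

cur.execute("SELECT description FROM package WHERE name = ?", ("ipw2100-kmdl",))
stored = cur.fetchone()[0]
print(stored == description)  # True
```

          A side benefit of this style is that quotes and backslashes in values such as RPM descriptions no longer need manual escaping.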

           
