
MySQLdb and sys.setdefaultencoding() ?

Help
2009-04-18
2012-09-19
  • james kirin

    james kirin - 2009-04-18

    Hi everyone,

    I would appreciate any help with the following problem. I have a database that uses the utf8 character set (with its default collation), yet inserting a unicode string into a row with MySQLdb fails unless I explicitly encode it as UTF-8 first. I expected this to work because I have set Python's default encoding to UTF-8:

    $ cat /usr/lib/python2.5/site-packages/sitecustomize.py
    import sys
    sys.setdefaultencoding('utf-8')

    Python seems to be honoring this:

    $ python
    Python 2.5.2 (r252:60911, Sep 11 2008, 13:43:31)
    [GCC 4.2.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.getdefaultencoding()
    'utf-8'
    >>> str(u'\u2022')
    '\xe2\x80\xa2'
    >>> u'\u2022'.encode('utf-8')
    '\xe2\x80\xa2'

    Both calls above (str() and .encode('utf-8')) produce the same result in the interactive shell. However, arguments passed to MySQLdb's [cursor].execute() do not seem to be encoded with the UTF-8 default:

    >>> import MySQLdb
    >>> db=MySQLdb.connect(db='site', init_command='SET NAMES utf8')
    >>> c=db.cursor()
    >>> c.execute('INSERT INTO t VALUES(%s);', u'\u2022');
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "build/bdist.linux-i686/egg/MySQLdb/cursors.py", line 151, in execute
      File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 247, in literal
      File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 185, in unicode_literal
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2022' in position 0: ordinal not in range(256)
    >>> c.execute('INSERT INTO t VALUES(%s);', u'\u2022'.encode('utf-8'));
    1L

    Why, in the first case (where I passed the unicode object straight to execute()), is the default UTF-8 encoding not applied?

    Any help would be much appreciated; it is late and I must be missing something.

    Thank you,

    James

     
    • james kirin

      james kirin - 2009-04-18

      I have finally figured this out. For the benefit of others struggling to use Python unicode objects with a database that uses the utf8 character set:

      For Python unicode objects to pass seamlessly to MySQL and back, create the MySQLdb connection with the following arguments:

      MySQLdb.connect(host=...,user=...,passwd=...,db=...,
      init_command='SET NAMES utf8', use_unicode=True, charset='utf8' )
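
      A likely reading of the traceback in the question is that MySQLdb encodes unicode arguments with the connection's character set (latin-1 in that session) rather than with sys.getdefaultencoding(), which would explain why passing charset='utf8' makes the difference. The sketch below reproduces the codec behaviour without a database. The helper encode_for_connection is hypothetical and only stands in for what the driver appears to do; it is not part of MySQLdb.

```python
# Sketch only: mimics the encoding step suggested by the traceback,
# without touching MySQLdb or a database.

def encode_for_connection(text, charset):
    """Encode a unicode string with the given connection charset.

    Hypothetical helper for illustration; not a MySQLdb function.
    """
    return text.encode(charset)


bullet = u'\u2022'  # BULLET, the character from the failing INSERT

# latin-1 only covers code points 0-255, so U+2022 cannot be encoded.
try:
    encode_for_connection(bullet, 'latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# With a UTF-8 connection charset the same string encodes to three bytes,
# matching the '\xe2\x80\xa2' shown in the interactive session.
utf8_bytes = encode_for_connection(bullet, 'utf-8')
```

      Under that assumption, sys.setdefaultencoding() never enters the picture, because the codec is named explicitly at the point where the argument is encoded.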

      Just remember to also create your database so that it uses utf8 as its character set:

      [in the mysql client]

      CREATE DATABASE [dbname] CHARACTER SET = 'utf8';
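
      To confirm that both the database and the connection ended up on utf8, one option is to read MySQL's character_set_* variables through a cursor. This is a minimal sketch: connection_charsets and non_utf8_settings are hypothetical helper names, and the cursor is assumed to be any DB-API cursor such as one returned by MySQLdb. Newer servers may report utf8mb3 or utf8mb4 instead of utf8, so the expected name is a parameter.

```python
def connection_charsets(cursor):
    """Return the server's character_set_* variables as a dict.

    Hypothetical helper; `cursor` is assumed to be a DB-API cursor.
    """
    cursor.execute("SHOW VARIABLES LIKE 'character_set_%'")
    return dict((name, value) for name, value in cursor.fetchall())


def non_utf8_settings(charsets, expected='utf8'):
    """List the client/connection/results/database settings that differ
    from the expected character set name."""
    keys = ('character_set_client', 'character_set_connection',
            'character_set_results', 'character_set_database')
    return sorted(k for k in keys if charsets.get(k) != expected)


# Usage against a real server (assumes a database named 'site' exists):
#   import MySQLdb
#   db = MySQLdb.connect(db='site', use_unicode=True, charset='utf8')
#   print(non_utf8_settings(connection_charsets(db.cursor())))
```

      An empty list would suggest that unicode objects can round-trip without explicit .encode() calls.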

      Hope this helps

      James

       
