Hi everyone,
I would appreciate any help with the following problem. I have a database using the utf-8 charset (and its default collation), yet inserting a unicode string into a row with MySQLdb fails unless I explicitly encode it as utf-8 first. I expected this to work because I have set Python's default encoding to utf-8:
$ cat /usr/lib/python2.5/site-packages/sitecustomize.py
import sys
sys.setdefaultencoding('utf-8')
Python seems to be honoring this:
$ python
Python 2.5.2 (r252:60911, Sep 11 2008, 13:43:31)
[GCC 4.2.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> str(u'\u2022')
'\xe2\x80\xa2'
>>> u'\u2022'.encode('utf-8')
'\xe2\x80\xa2'
Both calls above, str() and .encode('utf-8'), produce the same result in the shell. However, the default utf-8 encoding does not seem to be applied to arguments passed to MySQLdb's [cursor].execute():
>>> import MySQLdb
>>> db=MySQLdb.connect(db='site', init_command='SET NAMES utf8')
>>> c=db.cursor()
>>> c.execute('INSERT INTO t VALUES(%s);', u'\u2022');
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.linux-i686/egg/MySQLdb/cursors.py", line 151, in execute
File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 247, in literal
File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 185, in unicode_literal
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2022' in position 0: ordinal not in range(256)
>>> c.execute('INSERT INTO t VALUES(%s);', u'\u2022'.encode('utf-8'));
1L
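The codec named in the traceback can be checked without a database. The sketch below is plain Python and does not touch MySQLdb; it only assumes, from the 'latin-1' message in the traceback, that the failing step is an encode with that codec rather than with Python's default encoding. The helper name try_encode is made up for illustration.

```python
# Plain-Python illustration of the two codecs; no MySQLdb involved.

BULLET = u'\u2022'  # the character from the transcript above


def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# latin-1 only covers code points 0-255, so U+2022 cannot be encoded.
latin1_result = try_encode(BULLET, 'latin-1')

# utf-8 can encode any code point; U+2022 becomes three bytes.
utf8_result = try_encode(BULLET, 'utf-8')

print(latin1_result)
print(repr(utf8_result))
```

The first result is None because the encode fails in the same way as in the traceback; the second is the same three bytes that the explicit .encode('utf-8') call produced.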
Why, in the first case (where I passed the unicode object straight to execute()), is the default utf-8 encoding not being applied?
Any help would be much appreciated; it is late and I must be missing something.
Thank you,
James
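I have finally figured this out. For the benefit of others struggling to use Python unicode objects with a database that uses the utf-8 charset:
As the traceback shows, MySQLdb encodes unicode arguments itself, here with latin-1, and does not consult Python's default encoding; init_command='SET NAMES utf8' on its own did not change that. For Python unicode objects to pass seamlessly to MySQL and back, you need to create the MySQLdb connection with the following arguments: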
MySQLdb.connect(host=..., user=..., passwd=..., db=...,
                init_command='SET NAMES utf8', use_unicode=True, charset='utf8')
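A minimal sketch of how these arguments might be wrapped for reuse, so every connection in a project gets the same charset settings. The function names utf8_connect_kwargs and insert_bullet are hypothetical, and the table t with a single text column is taken from the transcript in the question. The second function needs a running MySQL server and the MySQLdb package, so it is only defined here, not called.

```python
def utf8_connect_kwargs(**extra):
    """Build keyword arguments for MySQLdb.connect() with the utf8 settings
    from the post above. `extra` carries host, user, passwd, db and so on."""
    kwargs = {
        'init_command': 'SET NAMES utf8',
        'use_unicode': True,
        'charset': 'utf8',
    }
    kwargs.update(extra)
    return kwargs


def insert_bullet(**extra):
    """Insert u'\u2022' into table t without encoding it by hand (hypothetical)."""
    import MySQLdb  # imported lazily so the helper above works without it

    db = MySQLdb.connect(**utf8_connect_kwargs(**extra))
    try:
        c = db.cursor()
        # Parameters are passed as a tuple; the unicode object goes in as-is.
        c.execute('INSERT INTO t VALUES (%s)', (u'\u2022',))
        db.commit()  # needed for transactional tables such as InnoDB
    finally:
        db.close()


# Inspect the arguments that would be passed to MySQLdb.connect().
print(sorted(utf8_connect_kwargs(db='site').items()))
```

With a server available, the call would look like insert_bullet(db='site'), matching the database name used in the question.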
Also remember to create your database so that it uses utf-8 as its charset:
[in the mysql client]
CREATE DATABASE [dbname] CHARACTER SET = 'utf8';
Hope this helps
James