Hello,
A string type column with a utf8_bin collation will not be converted to a
Python Unicode string, but instead will be returned as a utf8 (byte) string.
The MySQL documentation though clearly states: "A nonbinary string has a
character set and is converted to another character set in many cases, even
when the string has a _bin collation"[1].
I understand that a string with utf8_bin collation is still a string and
thus should not be dealt with differently. The utf8_bin collation is
essential when working with Unicode without wanting the Unicode collation
algorithm to kick in.
How to reproduce:
CREATE TABLE t1 (
a CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin,
);
INSERT INTO t1 VALUES ('ü');
In Python:
import MySQLdb
db = MySQLdb.connect(db='pymysqltest', charset='utf8', use_unicode=True)
cur = db.cursor()
cur.execute("SELECT a FROM t1;")
1L
cur.fetchall()
(('\xc3\xbc',),)
Chosing utf8_general_ci instead of utf8_bin will properly yield Unicode
objects:
cur.execute("SELECT a COLLATE utf8_general_ci FROM t1;")
1L
cur.fetchall()
((u'\xfc',),)
[1] http://dev.mysql.com/doc/refman/5.1/en/charset-binary-collations.html
this bug has been reported as bug #541198 against python-mysqldb 1.2.2-8
in debian: http://bugs.debian.org/541198
original submitter was: Christoph Burgmer chrislb@gmx.de
greetings,
jonas
Thank you for the bug report! After spending nearly two days wondering why sqlalchemy was returning strings when using use_unicode=True in the connection string whereas it returns unicode when using use_unicode=False I finally found that MySQL for Python fails to detect columns defined like
page_title
varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULLto be "text" columns - and they are treated as being binary. The definition above is from MediaWiki - so it's not that uncommon to have such column definitions. I'm using MySQL-python 1.2.3.
Thanks, Philipp
MySQL-python 1.2.4 seemed to fix this issue. The column mentioned in the previous post is returned as UnicodeType now when using use_unicode=True in the connection string.
Thanks for fixing this.
Philipp, I created a SourceForge account just so I could thank you! I was having the same exact problem and seriously beginning to question my sanity when I finally entered the right search terms and found this. :)