#289 utf8_bin collation will not convert to Unicode strings

MySQLdb-1.2
open
Andy Dustman
MySQLdb (285)
5
2014-02-17
2009-08-13
Jonas Meurer
No

Hello,

A string type column with a utf8_bin collation will not be converted to a
Python Unicode string, but instead will be returned as a utf8 (byte) string.

The MySQL documentation though clearly states: "A nonbinary string has a
character set and is converted to another character set in many cases, even
when the string has a _bin collation"[1].

I understand that a string with utf8_bin collation is still a string and
thus should not be dealt with differently. The utf8_bin collation is
essential when working with Unicode without wanting the Unicode collation
algorithm to kick in.

How to reproduce:

CREATE TABLE t1 (
a CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin,
);

INSERT INTO t1 VALUES ('ü');

In Python:

import MySQLdb
db = MySQLdb.connect(db='pymysqltest', charset='utf8', use_unicode=True)
cur = db.cursor()
cur.execute("SELECT a FROM t1;")
1L
cur.fetchall()
(('\xc3\xbc',),)

Chosing utf8_general_ci instead of utf8_bin will properly yield Unicode
objects:

cur.execute("SELECT a COLLATE utf8_general_ci FROM t1;")
1L
cur.fetchall()
((u'\xfc',),)

[1] http://dev.mysql.com/doc/refman/5.1/en/charset-binary-collations.html

this bug has been reported as bug #541198 against python-mysqldb 1.2.2-8
in debian: http://bugs.debian.org/541198
original submitter was: Christoph Burgmer chrislb@gmx.de

greetings,
jonas

Discussion

  • Thank you for the bug report! After spending nearly two days wondering why sqlalchemy was returning strings when using use_unicode=True in the connection string whereas it returns unicode when using use_unicode=False I finally found that MySQL for Python fails to detect columns defined like

    page_title varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL

    to be "text" columns - and they are treated as being binary. The definition above is from MediaWiki - so it's not that uncommon to have such column definitions. I'm using MySQL-python 1.2.3.

    Thanks, Philipp

     
  • MySQL-python 1.2.4 seemed to fix this issue. The column mentioned in the previous post is returned as UnicodeType now when using use_unicode=True in the connection string.
    Thanks for fixing this.

     
    • Karin Lin
      Karin Lin
      2013-09-27

      Philipp, I created a SourceForge account just so I could thank you! I was having the same exact problem and seriously beginning to question my sanity when I finally entered the right search terms and found this. :)