

#335 utf8mb4 charset support

Milestone: MySQLdb-1.2
Status: open
Owner: nobody
Labels: None
Priority: 1
Created: 2014-02-25
Updated: 2014-02-25
Creator: Jiri Baum
Private: No

Summary: In MySQL, the charset name 'utf8' refers to a subset of UTF-8 that stores at most three bytes per character, which covers only plane 0 (the Basic Multilingual Plane). To use the full character range, MySQL must be told to use 'utf8mb4', but Python and MySQLdb don't recognise 'utf8mb4' as an encoding name.
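The "subset" can be made concrete with byte counts. The following Python 3 sketch (the helper `fits_mysql_utf8` is hypothetical, not part of MySQLdb) checks whether every character of a string encodes to at most three UTF-8 bytes, which is the per-character limit of MySQL's 'utf8' charset:

```python
def fits_mysql_utf8(text):
    """Return True if every character of `text` encodes to at most 3 UTF-8 bytes.

    MySQL's 'utf8' charset (also called utf8mb3) is limited to 3 bytes per
    character, i.e. plane 0; 'utf8mb4' allows the 4-byte sequences as well.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(len("\u00e9".encode("utf-8")))      # e with acute accent: 2 bytes
print(len("\u20ac".encode("utf-8")))      # euro sign: 3 bytes
print(len("\U0001F600".encode("utf-8")))  # emoji in plane 1: 4 bytes
print(fits_mysql_utf8("caf\u00e9"))        # True
print(fits_mysql_utf8("smile \U0001F600")) # False
```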

How to reproduce: Use a database with the 'utf8mb4' charset and connect to it using MySQLdb. Make sure the test data being inserted and/or retrieved includes text with Unicode characters outside plane 0, such as emoji.
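To confirm that the test data really exercises the problem, it helps to check which Unicode planes it uses. A small Python 3 sketch; `planes_used` and `TEST_ROWS` are illustrative names, and the sample strings are arbitrary:

```python
def planes_used(text):
    """Return the sorted Unicode planes (0-16) used by the characters in `text`.

    Each plane holds 0x10000 code points, so the plane number is ord(ch) >> 16.
    """
    return sorted({ord(ch) >> 16 for ch in text})


# Hypothetical test data: mixes plane 0 text with plane 1 characters.
TEST_ROWS = [
    "plain ASCII",
    "plane 0 only: caf\u00e9, \u20ac, \u4e2d\u6587",
    "outside plane 0: \U0001F600 \U0001D11E",  # emoji, musical G clef
]

for row in TEST_ROWS:
    # ascii() escapes non-ASCII characters, so printing works on any terminal.
    print(planes_used(row), ascii(row))
```

Only rows whose plane list contains something other than 0 will trigger the 'utf8' versus 'utf8mb4' difference.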

Expected behaviour: Text containing Unicode characters outside plane 0 is stored and retrieved intact.

Actual behaviour: If charset='utf8' is passed to connect(), warnings are issued and data containing Unicode characters outside plane 0 is partly discarded. If charset='utf8mb4' is passed to connect(), exceptions are raised on insertion and/or retrieval.
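The exceptions with 'utf8mb4' are consistent with the summary: Python's codec registry has no encoding of that name, so any code path that passes the MySQL charset name straight to `str.encode()` or `bytes.decode()` will fail. That this is the exact code path inside MySQLdb 1.2.3 is an assumption; the Python side can be checked in isolation (Python 3, hypothetical helper `codec_exists`):

```python
import codecs


def codec_exists(name):
    """Return True if Python's codec registry knows the encoding `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# The two MySQL charset names from the report, checked against Python's codecs.
for name in ("utf8", "utf8mb4"):
    print(name, codec_exists(name))  # utf8 True, then utf8mb4 False

raw = "\U0001F600".encode("utf-8")   # 4-byte UTF-8 sequence for an emoji
try:
    raw.decode("utf8mb4")            # decoding with the MySQL name fails
except LookupError as exc:
    print("LookupError:", exc)
```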

Workaround: I think the following code works around the issue, but I don't know whether it's a complete and/or correct solution.

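import MySQLdb
# Ask the server for the full 4-byte charset...
conn = MySQLdb.connect(db='dbname', read_default_file="~/.my.cnf", charset='utf8mb4')
# ...but encode/decode on the Python side with 'utf8', since Python has no 'utf8mb4' codec.
conn.unicode_literal.charset = 'utf8'
conn.string_decoder.charset = 'utf8'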
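The workaround hard-codes 'utf8' for the Python side. If the connection charset is configurable, one option is to translate MySQL charset names into Python codec names before assigning them. A sketch under that assumption; `MYSQL_TO_PYTHON_CODEC` and `python_codec_for` are hypothetical and cover only the UTF-8 variants:

```python
# Hypothetical mapping from MySQL charset names to Python codec names.
# Only the UTF-8 variants are listed; any other name is passed through
# unchanged and must already be a valid Python codec name.
MYSQL_TO_PYTHON_CODEC = {
    "utf8": "utf8",     # MySQL's 3-byte subset (utf8mb3)
    "utf8mb3": "utf8",
    "utf8mb4": "utf8",  # full UTF-8; Python's 'utf8' codec already covers it
}


def python_codec_for(mysql_charset):
    """Translate a MySQL charset name into a name Python's codecs accept."""
    return MYSQL_TO_PYTHON_CODEC.get(mysql_charset.lower(), mysql_charset)


charset = "utf8mb4"
codec = python_codec_for(charset)
print(codec)                                              # utf8
print("\U0001F600".encode(codec) == b"\xf0\x9f\x98\x80")  # True
```

With MySQLdb this would become, for example, `conn.string_decoder.charset = python_codec_for('utf8mb4')`; whether setting these two attributes covers every code path in MySQLdb 1.2.3 is untested.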

Version: I'm using MySQLdb 1.2.3 as packaged with Ubuntu.
