utf8mb4 charset support

MySQL database connector for Python programming

Brought to you by: adustman

#335 utf8mb4 charset support

Milestone: MySQLdb-1.2

Status: open

Owner: nobody

Labels: None

Priority: 1

Updated: 2014-02-25

Created: 2014-02-25

Creator: Jiri Baum

Private: No

Summary: In MySQL, the charset name 'utf8' covers only a subset of the UTF-8 encoding (sequences of at most three bytes, i.e. characters in plane 0). To use the full character range, MySQL needs to be told to use 'utf8mb4', but neither Python nor MySQLdb recognises 'utf8mb4' as a charset name.
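The Python half of this mismatch can be checked without a database. The following is a minimal sketch using only the standard library; codec_known is a name made up for illustration and is not part of MySQLdb.

```python
import codecs


def codec_known(name):
    """Return True if Python's codec registry recognises the given name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# 'utf8' is a standard alias for Python's UTF-8 codec.
print(codec_known("utf8"))
# 'utf8mb4' is a MySQL charset name, not a Python codec name.
print(codec_known("utf8mb4"))
```

On the Python versions I know of, the second lookup fails, which would explain why passing the MySQL name straight through to Python's encode/decode calls ends in an exception.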

How to reproduce: Create a database that uses the 'utf8mb4' charset and connect to it using MySQLdb. Make sure the test data being inserted and/or retrieved includes text with Unicode characters outside plane 0 (the Basic Multilingual Plane).
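To pick suitable test data, it may help to check whether a string contains any character outside plane 0. The sketch below relies on the fact that such characters take four bytes in UTF-8, and that four-byte sequences start with a lead byte of 0xF0 or above. needs_utf8mb4 is a hypothetical helper name, not something MySQLdb provides.

```python
def needs_utf8mb4(text):
    """Return True if text has a character that takes four bytes in UTF-8.

    Four-byte UTF-8 sequences are the only ones whose lead byte is
    0xF0 or higher, so scanning the encoded bytes is enough.
    """
    encoded = bytearray(text.encode("utf-8"))
    return any(byte >= 0xF0 for byte in encoded)


# Plane 0 only: Latin and CJK characters fit in MySQL's 'utf8'.
print(needs_utf8mb4(u"caf\u00e9 \u4e2d\u6587"))
# U+1F600 is in plane 1 and should trigger the problem described here.
print(needs_utf8mb4(u"hello \U0001F600"))
```

A string for which this returns True is the kind of test data the reproduction steps call for.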

Expected behaviour: Text containing Unicode characters outside plane 0 is handled correctly.

Actual behaviour: If charset='utf8' is passed to the connect() call, warnings are issued and data containing Unicode characters outside plane 0 is partly discarded. If charset='utf8mb4' is passed to the connect() call, exceptions are raised on insertion and/or retrieval.

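Workaround: I think the following code works around the issue, but I don't know whether it's a complete or correct solution. It connects with 'utf8mb4' so that the server uses the full character range, then points MySQLdb's encoding and decoding at the 'utf8' codec, which Python does know.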

import MySQLdb

conn = MySQLdb.connect(db='dbname', read_default_file="~/.my.cnf", charset='utf8mb4')
# Python has no 'utf8mb4' codec, so make MySQLdb encode/decode with 'utf8' instead.
conn.unicode_literal.charset = 'utf8'
conn.string_decoder.charset = 'utf8'
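
The workaround above amounts to translating a MySQL charset name into a Python codec name before any encoding or decoding happens. A rough sketch of that translation as a standalone function follows. MYSQL_TO_PYTHON_CODEC and python_codec_for are hypothetical names invented for illustration; this is not how MySQLdb is implemented, and the table holds only the one mapping this ticket is about.

```python
import codecs

# Hypothetical table: MySQL charset names that differ from Python codec names.
MYSQL_TO_PYTHON_CODEC = {
    "utf8mb4": "utf8",
}


def python_codec_for(mysql_charset):
    """Return a Python codec name for a MySQL charset name.

    Names without an entry in the table are passed through unchanged.
    Raises LookupError if Python does not know the resulting codec.
    """
    name = MYSQL_TO_PYTHON_CODEC.get(mysql_charset, mysql_charset)
    codecs.lookup(name)  # fail early on names Python cannot handle
    return name


codec = python_codec_for("utf8mb4")
encoded = u"hello \U0001F600".encode(codec)
print(codec, len(encoded))
```

With a translation like this, the connection could keep 'utf8mb4' for the server while the Python side encodes and decodes with 'utf8'.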

Version: I'm using MySQLdb 1.2.3 as packaged with Ubuntu.
