Summary: In MySQL, the charset name 'utf8' denotes only a subset of the UTF-8 encoding (at most three bytes per character, which covers plane 0 only). To store the full Unicode range, MySQL must be told to use 'utf8mb4', but neither Python nor MySQLdb recognises 'utf8mb4' as a codec name.
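To make the mismatch concrete, here is a minimal sketch (assuming Python 3 and a stock codec registry; the helper names python_knows_codec and utf8_length are made up for illustration). It checks whether Python can resolve a charset name and how many bytes a character needs in UTF-8.

```python
import codecs


def python_knows_codec(name):
    """Return True if Python's codec registry can resolve `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def utf8_length(ch):
    """Number of bytes needed to encode `ch` in real UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # 'utf8' is a Python alias for UTF-8; 'utf8mb4' is a MySQL-only name.
    print(python_knows_codec("utf8"))
    print(python_knows_codec("utf8mb4"))
    # A plane 0 character versus one outside plane 0.
    print(utf8_length("\u20ac"), utf8_length("\U0001F600"))
```

Characters that need four bytes are exactly the ones MySQL's 'utf8' cannot store.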
How to reproduce: Use a database with the 'utf8mb4' charset and connect to it using MySQLdb. Make sure the test data being inserted and/or retrieved includes text with Unicode characters outside plane 0 (code points above U+FFFF).
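One way to make sure the test data really exercises the problem is sketched below. It assumes Python 3, where iterating a str yields whole code points; has_non_bmp and SAMPLE are hypothetical names used only for this example.

```python
def has_non_bmp(text):
    """Return True if `text` contains a code point above U+FFFF."""
    return any(ord(ch) > 0xFFFF for ch in text)


# U+1D11E (musical symbol G clef) lies outside plane 0.
SAMPLE = "G clef: \U0001D11E"

if __name__ == "__main__":
    print(has_non_bmp("plain ASCII"))
    print(has_non_bmp(SAMPLE))
```

On a narrow Python 2 build such characters appear as surrogate pairs, so this check would need adjusting there.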
Expected behaviour: Text containing Unicode characters outside plane 0 is handled correctly.
Actual behaviour: If 'utf8' is passed to the connect() call, warnings are issued and data containing Unicode characters outside plane 0 is partly discarded. If 'utf8mb4' is passed to the connect() call, exceptions are raised on insertion and/or retrieval.
Workaround: I think the following code works around the issue, but I have no idea whether it's a full and/or correct solution.
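import MySQLdb
# Ask the server for utf8mb4, then point the Python-side encode/decode steps back at 'utf8', which Python recognises.
conn = MySQLdb.connect(db='dbname', read_default_file='~/.my.cnf', charset='utf8mb4')
conn.unicode_literal.charset = 'utf8'
conn.string_decoder.charset = 'utf8'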
Version: I'm using MySQLdb 1.2.3 as packaged with Ubuntu.