Insert of UTF-8 data with MySQL 4.1 causes encoding error
MySQL database connector for Python programming
Brought to you by:
adustman
Attempting to insert a row with UTF-8 data into a MySQL table causes the
following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 1: ordinal not in range(128)
Using:
MySQL 4.1.18-standard
Python 2.4.2 (ActivePython 2.4.2 Build 248)
Mac OS X 10.4.4 (PowerPC)
MySQLdb 2.0
Attached is a minimal test case including:
- SQL to create a schema
- Configuration file passed when opening connection
- Python code attempting to insert a single-character Unicode string
- Transcript of running the above
Archive containing minimal test case
Logged In: YES
user_id=71372
OK, I've figured out how to reproduce this. It happens when
you use a unicode query string. This is generally
unnecessary: when you have unicode parameters, the
resulting substituted query string is unicode anyway. The
only time you need a unicode query string is when the query
itself contains unicode literal values.
Here's the problem. Internally, parameters are converted to
SQL literals via self.connection.literal(). This actually
results in byte strings in the encoding of the connection.
When you try to insert these back into a unicode query
string, Python assumes they are ASCII, but they contain
8-bit characters, so you get the encoding error.
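The failure described above can be reproduced without a database. This is a minimal sketch, not the MySQLdb code and not the fix that went into CVS: to_sql_literal is a hypothetical stand-in for connection.literal() with deliberately simplistic quoting, the table and column names are made up, and the connection charset is assumed to be UTF-8. Python 2 performs the ASCII decode implicitly when a byte string is substituted into a unicode string; the sketch makes that step explicit so it also runs on Python 3.

```python
# Assumption: the connection charset is UTF-8.
CONNECTION_CHARSET = "utf-8"


def to_sql_literal(value, charset=CONNECTION_CHARSET):
    """Hypothetical stand-in for connection.literal().

    Quotes the value (simplistically, for illustration only) and
    returns it as a byte string in the connection's encoding.
    """
    quoted = u"'" + value.replace(u"'", u"''") + u"'"
    return quoted.encode(charset)


def substitute_assuming_ascii(query, literal):
    # Mirrors what Python 2 does implicitly for u"..." % byte_string:
    # the byte string is decoded as ASCII before substitution.
    return query % literal.decode("ascii")


def substitute_with_charset(query, literal, charset=CONNECTION_CHARSET):
    # One possible remedy: decode with the connection's charset instead.
    return query % literal.decode(charset)


# Hypothetical table and column names.
query = u"INSERT INTO t (name) VALUES (%s)"
literal = to_sql_literal(u"\uc548")  # one character, bytes EC 95 88 in UTF-8

try:
    substitute_assuming_ascii(query, literal)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

fixed_query = substitute_with_charset(query, literal)
```

The byte at position 0 of the literal is the opening quote, so the first non-ASCII byte sits at position 1, which matches the position reported in the original error.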
As previously noted, using a regular string for the query
solves the problem, but I will come up with a fix for this
since it is likely to crop up periodically.
FWIW: You do not need to specify both CHARACTER SET for your
column definition and DEFAULT CHARACTER SET for the table if
they are both the same. Also, as far as the client is
concerned, the configuration file only needs to set
default-character-set in the [client] section, and not other
sections. Once you have this, it is unnecessary to execute
SET NAMES.
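For illustration, a client-side configuration file reduced to what the comment above says is needed might look like this. The value utf8 is an assumption based on the report being about UTF-8 data; the file in the attached test case may differ.

```ini
[client]
default-character-set = utf8
```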
Logged In: YES
user_id=71372
Fixed in CVS. 1.2.1 should be released within a week.
Sprinting at PyCon 2006.
Logged In: YES
user_id=1460504
OK, this does fix the problem (thanks!).
Note, however, that it's not unreasonable to have unicode characters in the
query string: not only literals but also column names can be in UTF-8 (it may
not be that common, but it's certainly supported by MySQL 4.1 and upwards).
Logged In: YES
user_id=71372
Confirmed fixed