MySQLdb's connect() takes a keyword argument that makes it
convert between unicode strings and a given encoding, such
as utf8. If, however, MySQL returns data that cannot be
decoded with the specified encoding, MySQLdb raises an
error like this one:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/site-packages/MySQLdb/cursors.py", line 95, in execute
    return self._execute(query, args)
  File "/usr/lib/python2.3/site-packages/MySQLdb/cursors.py", line 114, in _execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
    raise errorclass, errorvalue
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 2: unexpected code byte
Python's unicode constructor and the encode() method
take an extra argument called "errors", which specifies
how encoding and decoding errors are handled, as
documented here:
    unicode(string [, encoding[, errors]]) -> object

    Create a new Unicode object from the given encoded string.
    encoding defaults to the current default string encoding.
    errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
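
To illustrate the three handlers, here is a minimal sketch. It is written for Python 3, where bytes.decode() takes the place of the unicode constructor quoted above. The sample byte string and the helper name decode_with are made up for illustration and do not come from MySQLdb.

```python
# Python 3 sketch: bytes.decode() plays the role of Python 2's unicode().
raw = b"ab\x80cd"  # 0x80 at position 2 is not valid UTF-8 on its own


def decode_with(errors):
    """Decode the sample bytes as UTF-8 using the given error handler."""
    return raw.decode("utf8", errors)


try:
    decode_with("strict")
    strict_position = None
except UnicodeDecodeError as exc:
    strict_position = exc.start  # index of the offending byte

replaced = decode_with("replace")  # bad byte becomes U+FFFD
ignored = decode_with("ignore")    # bad byte is silently dropped
```

With 'strict' the decode fails at the offending byte, much as in the traceback above. 'replace' keeps the row readable and marks the damage, while 'ignore' drops the byte without a trace.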
Because MySQLdb does not pass errors, it always gets the
default, 'strict', which may not be what the developer
needs. The attached patch lets the developer pass
MySQLdb.connect an additional keyword argument,
unicode_errors, which is forwarded to Python's unicode
constructor, so the developer can specify how unicode
encoding problems should be handled.
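
The idea of forwarding a connection-level setting to the decoder can be sketched as follows. This is not the attached patch or MySQLdb's actual code. The factory make_string_decoder is a hypothetical stand-in, written for Python 3, that only shows how a unicode_errors value chosen once could be applied to every decoded value.

```python
def make_string_decoder(encoding, unicode_errors="strict"):
    """Return a converter that decodes raw column bytes.

    Hypothetical helper: it mirrors the idea of forwarding a
    unicode_errors setting to the decoder and is not MySQLdb code.
    """
    def decode(value):
        # The errors policy chosen at "connect" time is applied here.
        return value.decode(encoding, unicode_errors)
    return decode


# Default behaviour stays strict; callers opt in to leniency.
strict = make_string_decoder("utf8")
lenient = make_string_decoder("utf8", unicode_errors="replace")

lenient_result = lenient(b"ab\x80")
try:
    strict(b"ab\x80")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Keeping 'strict' as the default means existing callers see no change in behaviour unless they ask for it.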
Patch to allow developers to set unicode encoding type
Logged In: YES
user_id=71372
I'll probably put this in both 1.1.6 and 1.0.1; it seems
quite reasonable.
Logged In: YES
user_id=71372
Your patch, or a variation, has been applied to the current CVS tree.