#171 Insert of UTF-8 data with MySQL 4.1 causes encoding error

Milestone: MySQLdb-1.2
Status: closed
Labels: MySQLdb (285)
Priority: 5
Updated: 2012-09-19
Created: 2006-02-26
Private: No

Attempting to insert a row with UTF-8 data into a MySQL table causes the
following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 1: ordinal not in range(128)

Using:

MySQL 4.1.18-standard
Python 2.4.2 (ActivePython 2.4.2 Build 248)
Mac OS X 10.4.4 (PowerPC)
MySQLdb 1.2.0

Attached is a minimal test case including:
- SQL to create a schema
- Configuration file passed when opening connection
- Python code attempting to insert a single-character Unicode string
- Transcript of running the above

Discussion

  • Brett Powley

    Brett Powley - 2006-02-26

    Archive containing minimal test case

     
  • Andy Dustman

    Andy Dustman - 2006-02-26

    OK, I've figured out how to reproduce this. It happens when
    you use a unicode query string. Generally this is
    unnecessary because when you have unicode parameters, the
    resulting substituted query string is unicode. The only time
    you would have to have a unicode query string is if you had
    some unicode literal values in there.

    Here's the problem. Internally, parameters are converted to SQL
    literals via self.connection.literal(). This actually
    results in byte strings in the encoding of the connection. When
    you try to substitute these back into a unicode query string,
    Python implicitly decodes them as ASCII; because they contain
    8-bit characters, the decode fails with the UnicodeDecodeError.

    As previously noted, using a regular string for the query
    solves the problem, but I will come up with a fix for this
    since it is likely to crop up periodically.

    FWIW: You do not need to specify both CHARACTER SET for your
    column definition and DEFAULT CHARACTER SET for the table if
    they are both the same. Also, as far as the client is
    concerned, the configuration file only needs to set
    default-character-set in the [client] section, and not other
    sections. Once you have this, it is unnecessary to execute
    SET NAMES.
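
    A minimal sketch of the client configuration described above,
    assuming the connection character set is utf8 and that the file is
    handed to MySQLdb through the read_default_file connect argument.
    The file name and the value are illustrative and are not taken from
    the attached test case:

    ```ini
    # my.cnf (hypothetical file name)
    # Only the [client] section is needed on the client side.
    [client]
    default-character-set = utf8
    ```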

     
  • Andy Dustman

    Andy Dustman - 2006-02-26

    Fixed in CVS. 1.2.1 should be released within a week.
    Sprinting at PyCon 2006.

     
  • Brett Powley

    Brett Powley - 2006-02-26

    OK, this does fix the problem (thanks!).

    Note, however, that it's not unreasonable to have Unicode characters in the
    query string: not only literals but also column names can be in UTF-8 (it may
    not be that common, but it's certainly supported by MySQL 4.1 and later).

     
  • Andy Dustman

    Andy Dustman - 2006-02-27

    Confirmed fixed
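

The failure mechanism discussed in this thread can be sketched as follows. This is an illustration only, written for Python 3, where the ASCII decode that Python 2 performed implicitly has to be spelled out. The helper names (quote_literal, substitute_like_python2, substitute_with_connection_encoding), the table and column names, and the sample character are all hypothetical. The code is not MySQLdb's own and does not claim to show the fix that went into CVS.

```python
def quote_literal(value, encoding="utf-8"):
    """Hypothetical stand-in for connection.literal(): returns a quoted
    byte string in the connection's encoding (no escaping is done here)."""
    return b"'" + value.encode(encoding) + b"'"


def substitute_like_python2(query, literal):
    """Mimic Python 2's `unicode % str`: the byte string is decoded as
    ASCII before being substituted into the unicode query string."""
    return query % literal.decode("ascii")


def substitute_with_connection_encoding(query, literal, encoding="utf-8"):
    """One possible remedy (an assumption, not necessarily the real fix):
    decode the literal with the connection's encoding instead of ASCII."""
    return query % literal.decode(encoding)


query = u"INSERT INTO t (name) VALUES (%s)"  # hypothetical table and column
literal = quote_literal(u"\uc548")           # UTF-8 bytes: 0xEC 0x95 0x88

try:
    substitute_like_python2(query, literal)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

fixed_query = substitute_with_connection_encoding(query, literal)
```

Here the opening quote sits at position 0 of the literal, so the first non-ASCII byte (0xec) is at position 1, which matches the error text in the report.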

     