#6 translate strings returned to Unicode

Milestone: MySQLdb
Status: closed
Owner: Andy Dustman
Labels: MySQLdb (53)
Priority: 5
Updated: 2012-09-19
Created: 2002-04-15
Creator: Skip Montanaro
Private: No

I just whipped up this little extension to
MySQLdb/cursors.py today. It automatically
translates strings coming out of MySQL to Unicode
if appropriate. I suspect it's not quite what you'll
want (it does double the number of cursor classes),
but it does give you some code to start with.

Typical usage for people using predominantly
western European languages would be to just pick
the appropriate Unicode variant of the various
cursor classes when connecting to the database.
People who use encodings other than the defaults
listed in UnicodeMixIn.encodings can simply
subclass the appropriate cursor class and define
their own list of encodings to try.

The patch is against 0.9.1 but I compared my version
of cursors.py with that in 0.9.2b1 and didn't see
any obvious conflicts.

Skip
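
The patch itself is not reproduced in this ticket, so the following is only a rough sketch of the subclass-and-override pattern described above, written for Python 3 (bytes in, str out). The names UnicodeMixIn, encodings and encode_string come from the discussion; the default encoding list, the method body and the KoiMixIn subclass are assumptions made for illustration.

```python
class UnicodeMixIn:
    # Assumed defaults; the list used by the actual patch is not shown here.
    encodings = ["utf-8", "latin-1"]

    def encode_string(self, value):
        """Return value decoded with the first encoding that succeeds."""
        if not isinstance(value, bytes):
            return value  # numbers, None, etc. pass through untouched
        for encoding in self.encodings:
            try:
                return value.decode(encoding)
            except UnicodeDecodeError:
                continue
        return value  # nothing matched; hand back the raw bytes


class KoiMixIn(UnicodeMixIn):
    # Hypothetical subclass for data stored as KOI8-R.
    encodings = ["utf-8", "koi8-r"]


decoded = KoiMixIn().encode_string("Привет".encode("koi8-r"))
```

Order matters in such a list: latin-1 can decode any byte sequence, so an encoding listed after it would never be tried.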

Discussion

  • Skip Montanaro
    2002-04-15

     
    Attachments
  • Andy Dustman
    2002-04-15

    I'm probably not going to use this patch. It seems to me
    that the easiest way to do this is to add a new converter, i.e.

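    import MySQLdb
    import MySQLdb.converters
    from MySQLdb.constants import FIELD_TYPE

    # Start from the default conversions and override the string column type.
    conv = MySQLdb.converters.conversions.copy()
    conv[FIELD_TYPE.VAR_STRING] = Char2Unicode
    # XXX maybe also FIELD_TYPE.STRING

    db = MySQLdb.connect(..., conv=conv)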

    where Char2Unicode is pretty much your encode_string method
    as a function.

    If anything, I think you would want to subclass the
    Connection object to add the unicode translation stuff,
    rather than the various Cursor classes. Take a look at
    Connection.__init__() to see how it handles writing out
    unicode objects.

     
  • Skip Montanaro
    2002-04-15

    My first thought was to add a new converter, but it
    seems that only has an effect when passing data to
    MySQL. I need something that works for data coming
    out of MySQL. I don't see where the connection object
    gets involved with data coming out of the database
    either.

    Skip

     
  • Andy Dustman
    2002-04-15

    The converter works both ways. When sending data to the
    database, it looks for the Python type or class as the key.
    When retrieving data from the database, it uses a MySQL
    FIELD_TYPE. These conversions are actually done in _mysql.c;
    see _mysql_field_to_python(), _mysql_row_to_tuple(), and
    _mysql_ResultObject_New() (for reading); and
    _mysql_escape*() (for writing).

    Another reason you would want to do this as part of the
    connection is that MySQL (3.23.21+) has a default character
    set associated with each connection
    (connection.character_set_name()), which is usually latin1.
    Thus some of the default conversion functions are overridden
    by bound Connection object methods.

     
  • Andy Dustman
    2002-06-23

    0.9.2c1 returns CHAR and VARCHAR columns as unicode if the
    correct connection option is used. Can you give 0.9.2c1 a try?

     
  • Andy Dustman
    2002-07-01

    0.9.2c2 should resolve this issue for you.
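
The two-way converter lookup described in the discussion can be illustrated with a small sketch that runs without MySQLdb installed. It is a simplified model, not the code in _mysql.c: char2unicode, convert_row and the str entry are hypothetical helpers, the encoding list is an assumption, and the integer constants stand in for the corresponding MySQLdb.constants.FIELD_TYPE members. The real write path also escapes values, which is omitted here.

```python
# Stand-ins for MySQLdb.constants.FIELD_TYPE members (MySQL column type codes).
LONG = 3
VAR_STRING = 253
STRING = 254


def char2unicode(raw, encodings=("utf-8", "latin-1")):
    """Decode bytes with the first encoding that succeeds (hypothetical helper)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw


def convert_row(row, field_types, conv):
    """Simplified stand-in for the per-column lookup done when reading rows."""
    converted = []
    for value, field_type in zip(row, field_types):
        func = conv.get(field_type)
        converted.append(func(value) if func and value is not None else value)
    return tuple(converted)


# One dictionary serves both directions: integer field-type keys are looked up
# when reading results, Python type keys when sending values to the server.
conv = {
    VAR_STRING: char2unicode,
    STRING: char2unicode,
    str: lambda text: text.encode("utf-8"),  # write side, escaping omitted
}

row = convert_row((b"caf\xc3\xa9", b"42", None), (VAR_STRING, LONG, VAR_STRING), conv)
```

With a real connection, a dictionary built this way (starting from a copy of MySQLdb.converters.conversions) would be passed as the conv argument to MySQLdb.connect(), as in the snippet quoted in the discussion.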