#121 Character set/Unicode support in 1.1.x

MySQLdb-1.1
closed
Andy Dustman
MySQLdb (285)
5
2012-09-19
2005-01-12
aaron
No

Error adding audit record: 'Connection' object has no attribute 'charset'
Traceback (most recent call last):
  File "C:\Conalarm2.0\src\AuditManager\BaseAuditSession.py", line 223, in doControllerSessions
    self.processLogSession(sessinfo.dict)
  File "C:\Conalarm2.0\src\AuditManager\SMTPAuditor.py", line 94, in processLogSession
    audit = self._checkPMResult(pmresult, manifest)
  File "C:\Conalarm2.0\src\AuditManager\BaseAuditSession.py", line 295, in _checkPMResult
    self.audit.addAuditRecord(self.logRecord, manifests)
  File "C:\Conalarm2.0\src\AuditLib\Audit.py", line 368, in addAuditRecord
    auditID = self.db.insert(query, self.AUDITS, **kwargs)
  File "DBAccess.py", line 530, in insert
  File "C:\Python24\Lib\site-packages\MySQLdb\connections.py", line 145, in literal
    return self.escape(o, self.converter)
  File "C:\Python24\Lib\site-packages\MySQLdb\connections.py", line 157, in unicode_literal
    return self.literal(u.encode(self.charset))
AttributeError: 'Connection' object has no attribute 'charset'

Discussion

  • Andy Dustman
    2005-01-12

    Logged In: YES
    user_id=71372

    I need version numbers before I touch this.

     
  • aaron
    2005-01-12

    Logged In: YES
    user_id=1195294

    installed from: MySQL-python-1.1.8.win32-my4.1-py2.4.exe
    Python Version: ActivePython 2.4 Build 243
    MySQL: 4.1.8

     
  • rmorris
    2005-01-13

    Logged In: YES
    user_id=1196564

    Seeing a similar error:
      File "C:\Python23\Lib\site-packages\MySQLdb\cursors.py", line 134, in execute
        self.errorhandler(self, exc, value)
      File "C:\Python23\Lib\site-packages\MySQLdb\connections.py", line 33, in defaulterrorhandler
        raise errorclass, errorvalue
    AttributeError: 'Connection' object has no attribute 'charset'

    Versions:
    - MySQL-python-1.1.8.win32-my4.1-py2.3.exe
    - Python 2.3.3 (#51, Dec 18 2003, 20:22:39)
    - MySQL 4.1.8

     
  • Andy Dustman
    2005-01-17

    Logged In: YES
    user_id=71372

    There is a case where charset could go unset, and that will
    be fixed in CVS shortly.

    Judging from the traceback, it appears this insert() method
    of yours is calling connection.escape(). Don't do that.
    There's no reason for user code to ever call escape().

     
  • aaron
    2005-01-17

    Logged In: YES
    user_id=1195294

    Checked my code; I do not call connection.escape().

     
  • Logged In: NO

    One possible quick fix for this problem would be to
    replace each problematic call to execute() with
    executemany(). For INSERT and UPDATE (which are causing the
    problems, I believe), executemany() should behave nearly
    identically to execute(). It would also require minimal
    rewriting, since I believe the two have similar syntax. The
    only change would be to turn the parameter tuple into a list
    of tuples, which could be done by putting the parameter
    tuple inside square brackets ([]).

    Just a thought...

    Note: I haven't tested this yet; I am just thinking out loud.
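
For reference, the change in call shape described in this comment can be sketched as follows. This is only an illustration under assumptions: the standard-library sqlite3 module stands in for MySQLdb (both follow the Python DB-API), the table and column names are made up, and sqlite3 uses "?" placeholders where MySQLdb uses "%s". It shows the parameter shapes only and says nothing about whether the swap avoids the charset error.

```python
import sqlite3

# sqlite3 is a stand-in here; the runs table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE runs (test TEXT, name TEXT)")

# execute() takes a single parameter tuple.
cur.execute("INSERT INTO runs VALUES (?, ?)", ("t1", "run-a"))

# executemany() takes a sequence of parameter tuples, so the same
# tuple is simply wrapped in a list.
cur.executemany("INSERT INTO runs VALUES (?, ?)", [("t1", "run-b")])

rows = cur.execute("SELECT name FROM runs ORDER BY name").fetchall()
conn.close()
```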

     
  • Logged In: NO

    False alarm. I tried the executemany() fix and it didn't
    help anything.

     
  • Logged In: NO

    I tried version 1.27 of connections.py, since that was
    listed in CVS as a possible fix for the charset problem. It
    seems only to have shifted the problem, however. The new
    stack trace I get is as follows:

    Traceback (most recent call last):
      File "H:\sandbox\A7433\releases\install\ctb\testmanager\RunManager.py", line 328, in CreateNewRun
        (associatedTest, runDatabaseName, runName,
      File "C:\Python23\Lib\site-packages\MySQLdb\cursors.py", line 129, in execute
        self.errorhandler(self, TypeError, m)
      File "C:\Python23\Lib\site-packages\MySQLdb\connections.py", line 33, in defaulterrorhandler
        raise errorclass, errorvalue
    TypeError: encode() argument 1 must be string, not None

    This comes at the same place that I used to get the
    AttributeError regarding charset.

    Micah

     
  • Andy Dustman
    2005-01-19

    Logged In: YES
    user_id=71372

    Are you passing unicode=<charset> to connect()? I'm guessing
    no. I probably should change the default handling. Until
    then, either set the unicode parameter to whatever character
    set you like, or change connections.py so that the default
    is what you want. I will probably make the default this
    (line 95):

        self.charset = sys.getdefaultencoding()
    

    or this:

        self.charset = self.character_set_name()
    

    Note that MySQL 4.1.8 and earlier have a bug in the client
    library that returns the collation instead of the character
    set name; this is fixed in 4.1.9.

    The unicode parameter to connect is primarily to return
    string and text-like columns as unicode. However, the
    default encoding is used when passing unicode Python values
    to MySQL. Generally, these are assumed to be the same
    character set. It might be necessary to have them be separate.
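
A minimal sketch of the default handling discussed in this comment, assuming a modern Python interpreter rather than the Python 2.3/2.4 used in the thread. The helper names resolve_charset and encode_param are hypothetical and are not part of MySQLdb; they only mimic the u.encode(self.charset) call seen in the first traceback.

```python
import sys


def resolve_charset(requested=None):
    """Hypothetical helper: choose the encoding for outgoing text values."""
    # Fall back to the interpreter default when no charset was requested,
    # which is the first of the two defaults proposed above.
    if requested is not None:
        return requested
    return sys.getdefaultencoding()


def encode_param(value, charset):
    """Encode a text parameter, mirroring u.encode(self.charset)."""
    return value.encode(charset)


# With charset left as None, encode() refuses it with a TypeError,
# which matches the kind of failure reported earlier in the thread.
try:
    encode_param(u"caf\xe9", None)
    none_rejected = False
except TypeError:
    none_rejected = True

# With an explicit charset (or the fallback), encoding succeeds.
encoded = encode_param(u"caf\xe9", resolve_charset("latin1"))
fallback = resolve_charset()
```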

     
  • Logged In: NO

    Prior to now, I was not passing any "unicode = <charset>" I
    tried passing in " unicode='latin1' ". This is leading to
    other problems that I am having trouble tracking down. I'm
    pretty sure all my tables are actually using latin1.

    Can you explain how this is changing the return type when I
    select data through the connection? Is it now coming back
    as a unicode object instead of a string object?

    Is the following comment in connections.py now invalid?
    [snip]
    unicode -- If set to a string, character columns are
    returned as unicode objects with this encoding. If set to
    None,the default encoding is used. If not set at all,
    character columns are returned as normal strings.
    [/snip]

    Is it no longer possible to get character columns back as
    normal strings?

    Sorry if my questions are basic or nonsensical. I don't
    know a lot about low-level database stuff or character
    encodings.

    Thanks for the help,
    Micah

     
  • Andy Dustman
    2005-01-19

    Logged In: YES
    user_id=71372

    That comment is mostly correct. If you don't pass the
    unicode parameter, all your text-like columns are returned
    as normal Python strings. If you pass it the name of a
    character set, then all text-like columns are returned as
    unicode objects with the given encoding. If you pass it
    None, then all text-like columns are returned as unicode
    objects with the default system encoding.

    In all cases, if you pass a unicode object to
    cursor.execute(), it tries to encode it using the character
    set passed via the unicode parameter; if you didn't pass
    one, it's None, which is the problem.

    Try changing line 95 of connections.py to this:

        self.charset = self.character_set_name().split('_')[0]
    

    This should work around the bug in MySQL. You'll still get
    text-like columns as normal Python strings if you don't pass
    the unicode parameter, and if you pass unicode objects to
    .execute(), they'll have the right encoding. This is most
    likely the fix I'm going to use, but I also need to fix
    converters.py a bit, even though the Unicode2Str() there is
    almost never used (I'll use sys.getdefaultencoding() there).
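
The split('_')[0] workaround in this comment can be checked in isolation. The sketch below assumes that the buggy client library returns a collation name such as latin1_swedish_ci where a character set name such as latin1 is expected; charset_from_collation is a hypothetical helper name, not a MySQLdb function.

```python
def charset_from_collation(name):
    """Return the character set prefix of a collation name.

    MySQL collation names begin with the character set name followed by
    an underscore, so everything before the first underscore is the
    character set. A plain character set name passes through unchanged.
    """
    return name.split("_")[0]


from_collation = charset_from_collation("latin1_swedish_ci")
from_charset = charset_from_collation("latin1")
from_utf8 = charset_from_collation("utf8_general_ci")
```

Because a name with no underscore is returned as-is, the same line keeps working once the client library is fixed and returns the character set name directly.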

     
  • Logged In: YES
    user_id=274034

    The fix you specified seems to cause a new error:

    import MySQLdb
    con = MySQLdb.connect(db = "sandbox")
    Traceback (most recent call last):
      File "<interactive input>", line 1, in ?
      File "C:\Python23\Lib\site-packages\MySQLdb\__init__.py", line 64, in Connect
        return Connection(*args, **kwargs)
      File "C:\Python23\Lib\site-packages\MySQLdb\connections.py", line 95, in __init__
        self.charset = self.character_set_name().split('_')[0]
    InternalError: (-1, 'server not initialized')

    Did I not place it correctly?

     
  • Andy Dustman
    2005-01-19

    Logged In: YES
    user_id=71372

    You did, but I forgot to account for initialization of the
    super class; the charset assignment has to occur after
    that. I'll fix this properly later tonight.
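
The ordering problem described here can be sketched with stand-in classes. Everything below is hypothetical: BaseConnection only imitates a low-level connection that cannot answer character_set_name() until its own __init__ has run, and RuntimeError stands in for the InternalError in the traceback. It is not the actual MySQLdb code.

```python
class BaseConnection(object):
    """Hypothetical stand-in for the low-level connection class."""

    def __init__(self):
        self._ready = True

    def character_set_name(self):
        # Refuse to answer until the base class has been initialized.
        if not getattr(self, "_ready", False):
            raise RuntimeError("server not initialized")
        return "latin1_swedish_ci"


class BrokenConnection(BaseConnection):
    def __init__(self):
        # Asks the connection for its charset before the base class is set up.
        self.charset = self.character_set_name().split("_")[0]
        BaseConnection.__init__(self)


class FixedConnection(BaseConnection):
    def __init__(self):
        # Initialize the base class first, then query it.
        BaseConnection.__init__(self)
        self.charset = self.character_set_name().split("_")[0]


try:
    BrokenConnection()
    broken_failed = False
except RuntimeError:
    broken_failed = True

fixed_charset = FixedConnection().charset
```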

     
  • Andy Dustman
    2005-01-20

    Logged In: YES
    user_id=71372

    Try the CVS version, which will probably be 1.1.9 this weekend.

    I've removed the unicode parameter to connect and replaced
    it with use_unicode, which is just a boolean. If true, any
    text-like columns are returned as unicode in the
    connection's character set; otherwise they are returned as
    normal strings. No matter what you set this to, if you pass
    unicode objects to execute(), it will encode them to the
    connection's character set, for now; this will have to change
    when I do full 4.1 support, since even columns may have
    their own character set. In some cases, UnicodeEncodeErrors
    will be raised; for example, if your connection character
    set is latin1, passing in unicode objects containing
    characters that latin1 cannot represent (text that started
    out as shift-jis, say) will make it blow up. There's
    really no good solution for something like this, from what I
    can see, unless you set your connection character set to be
    compatible, perhaps utf8.

    There was a unicode_error parameter and that too has been
    removed. I'm no longer convinced it's a good idea. I don't
    do a lot of stuff with unicode so I need some user feedback
    on this.

    I'm changing the summary in hopes that it attracts some
    attention.
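
The behaviour described in this comment can be sketched without a database, assuming a modern Python interpreter. The function names encode_for_connection and decode_column are hypothetical and only model the two directions: outgoing text is always encoded to the connection character set, and incoming column bytes are decoded only when use_unicode is true.

```python
def encode_for_connection(text, charset):
    """Encode outgoing text to the connection character set."""
    return text.encode(charset)


def decode_column(raw, charset, use_unicode):
    """Return a text-like column as unicode or as the raw byte string."""
    return raw.decode(charset) if use_unicode else raw


japanese = u"\u65e5\u672c\u8a9e"

# A latin1 connection cannot represent these characters.
try:
    encode_for_connection(japanese, "latin1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# A utf8 connection can, and the value round-trips when use_unicode is true.
utf8_bytes = encode_for_connection(japanese, "utf8")
as_unicode = decode_column(utf8_bytes, "utf8", True)
as_bytes = decode_column(utf8_bytes, "utf8", False)
```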

     
  • Logged In: YES
    user_id=274034

    I'll try to use the CVS version, but I'm stuck in a Windows
    environment. I'm trying to figure out right now how to
    install Python-mysql in Windows, but it's slow going.

     
  • Logged In: YES
    user_id=274034

    The version in CVS seems to fix this bug. I have not seen
    it since I started using the CVS version.

    Thanks for the quick fix-up!