#59 Unicode handling flakey

Status: open
Owner: nobody
Labels: None
Priority: 5
Created: 2009-02-27
Updated: 2009-02-27
Creator: Anonymous
Private: No

This happens with the latest downloaded archive. The particular string it chokes on is "Martin v. Löwis".

Traceback (most recent call last):
  File "updateCache.py", line 371, in <module>
    if not ProcessSearchResult(limitedSearchResult, data, minimumToCheck=minimumToCheck):
  File "updateCache.py", line 176, in ProcessSearchResult
    messages = thread[:]
  File "D:\SVN\_googlecode\gmail-backup\libgmail\libgmail.py", line 1376, in __getitem__
    self._messages = self._getMessages(self)
  File "D:\SVN\_googlecode\gmail-backup\libgmail\libgmail.py", line 1406, in _getMessages
    result += [GmailMessage(thread, msg, isDraft = isDraft)]
  File "D:\SVN\_googlecode\gmail-backup\libgmail\libgmail.py", line 1446, in __init__
    self.author_fullname = msgData[MI_AUTHORNAME].decode('utf-8')
  File "c:\Python26\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 12-13: ordinal not in range(128)
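The final error is a UnicodeEncodeError even though the failing call is .decode('utf-8'). In Python 2, calling .decode() on a value that is already a unicode object first encodes it with the default codec (normally ASCII), and that hidden encode step is what fails on a character such as "ö". The minimal Python 3 sketch below spells out that implicit step; py2_style_decode is a hypothetical helper for illustration only, not part of libgmail or the standard library.

```python
def py2_style_decode(value, encoding="utf-8"):
    """Hypothetical helper mimicking Python 2's unicode.decode().

    Python 2 first encodes an already-decoded (unicode) value with the
    default ASCII codec and only then decodes the resulting bytes, so any
    non-ASCII character fails in the first step with UnicodeEncodeError.
    """
    if isinstance(value, str):
        # The implicit step Python 2 performs behind the scenes.
        value = value.encode("ascii")
    return value.decode(encoding)


# Raw UTF-8 bytes decode fine; already-decoded text triggers the error.
try:
    py2_style_decode("Martin v. L\u00f6wis")
except UnicodeEncodeError as exc:
    print(type(exc).__name__, exc)
```

This suggests that msgData[MI_AUTHORNAME] was already a unicode object by the time GmailMessage.__init__ ran, so decoding it a second time was the actual bug.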

Discussion

  • Lars Erik Jordet

    I'm getting the same error on the sender name "Pål".

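    My fix is to add the following function at the top of libgmail.py:

        def to_unicode(xstr):
            '''
            Forces string to unicode
            '''
            if isinstance(xstr, unicode):
                # Already decoded: calling .decode() again would trigger an implicit ASCII encode
                return xstr
            elif isinstance(xstr, str):
                # Byte string: decode it as UTF-8
                return xstr.decode('utf-8')
            else:
                # Anything else (e.g. None) is passed through unchanged
                return xstr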

    ...and then change every call of the form strvar.decode('utf-8') to to_unicode(strvar):

    In GmailMessage.__init__:

        self.author = to_unicode(msgData[MI_AUTHORFIRSTNAME])
        self.author_fullname = to_unicode(msgData[MI_AUTHORNAME])
        self.id = msgData[MI_MSGID]
        self.number = msgData[MI_NUM]
        self.subject = to_unicode(msgData[MI_SUBJECT])
        self.to = [to_unicode(x) for x in msgData[MI_TO]]
        self.cc = [to_unicode(x) for x in msgData[MI_CC]]
        self.bcc = [to_unicode(x) for x in msgData[MI_BCC]]
        self.sender = to_unicode(msgData[MI_AUTHOREMAIL])

    and in ._getSource:

        return to_unicode(self._source)

    (I can make a proper patch if that's necessary ;))

     
  • Waseem Daher

    Waseem Daher - 2009-04-16

    Ok, I've committed lejordet's patch to CVS. Let me know if that works for you. Thanks.

     
  • Nobody/Anonymous

    It doesn't work if the name of the attached file is Unicode.
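The to_unicode helper works because it decodes only values that are still byte strings, so applying it to a field that is already text is harmless. The posted version targets Python 2 (unicode/str). A minimal Python 3 sketch of the same idea, assuming message fields may arrive either as bytes or as already-decoded str, might look like this; to_text and the sample addresses are hypothetical and not part of libgmail.

```python
from typing import Any


def to_text(value: Any, encoding: str = "utf-8") -> Any:
    """Python 3 counterpart of to_unicode (hypothetical name, not libgmail API).

    bytes/bytearray are decoded, str is returned as-is, and any other
    value (None, ints, ...) is passed through unchanged.
    """
    if isinstance(value, str):
        # Already text: decoding it again is what triggered the original error.
        return value
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


# Hypothetical recipient list mixing raw UTF-8 bytes and already-decoded text.
recipients = [b"P\xc3\xa5l <pal@example.com>", "Martin v. L\u00f6wis <mvl@example.org>"]
decoded = [to_text(r) for r in recipients]
```

Because already-decoded values pass through untouched, calling the helper twice on the same field does no harm, which is what makes it safe to substitute for every .decode('utf-8') call.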

     
