UnicodeDecodeError: 'utf8' codec can't decode

Help
2008-08-01
2013-05-21
  • Stephane Perinelle

    When running a simple (example) script I get a UnicodeDecodeError: 'utf8' error, but that obviously depends on the source of the message... Any idea how to fix that?
    Thanks a lot
    Stephane

    SCRIPT:
      import libgmail
      ga = libgmail.GmailAccount("XX@gmail.com", "XXX")
      ga.login()
      folder = ga.getMessagesByFolder('inbox')

      for thread in folder:
        print thread.id, len(thread), thread.subject
        for msg in thread:
          print "  ", msg.id, msg.number, msg.subject
          print msg.source

    I get the following error:
      UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1671-1673: invalid data
    Whether it occurs depends on the most recently received mail, so it obviously has something to do with the email content?

    FULL TRACEBACK:

    Traceback (most recent call last):
      File "testx2.py", line 12, in ?
        print msg.source
      File "C:\Python24\Lib\site-packages\libgmail.py", line 1473, in _getSource
        self._source = self._account.getRawMessage(self.id)
      File "C:\Python24\Lib\site-packages\libgmail.py", line 498, in getRawMessage
        return self._retrievePage(
      File "C:\Python24\Lib\site-packages\libgmail.py", line 358, in _retrievePage
        return pageData.decode('utf-8')
      File "C:\Python24\lib\encodings\utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1671-1673: invalid data
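
The failure in the traceback can be reproduced without libgmail. The sketch below is only an illustration: `page_data` is a hypothetical stand-in for the downloaded page, and Latin-1 is an assumed encoding chosen to produce invalid UTF-8 (the real encoding of the offending message is not known). It shows that a strict decode raises, while passing an error handler such as "replace" does not.

```python
# Hypothetical page data: the Latin-1 bytes for "cafe" with an accented e,
# which do not form a valid UTF-8 sequence.
page_data = u"caf\u00e9 au lait".encode("latin-1")

# A strict decode (the default) raises on the first invalid byte.
try:
    page_data.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With "replace", each undecodable byte becomes U+FFFD instead of raising.
tolerant = page_data.decode("utf-8", "replace")
```

Whether lossy decoding is acceptable depends on what the message source is used for; the replaced characters cannot be recovered afterwards.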

     
    • Thomas Järvstrand

      I also get the same error!

      Could someone please help?

      //T

       
    • Michael Goerz

      Michael Goerz - 2009-06-30

      This is normal. You can't use 'print' for unicode text on a normal terminal. However, you could write it to a text file, provided that you open the file with a suitable encoding. E.g.

      import codecs
      fh = codecs.open(filename, "w", "utf-8")
      fh.write(msg.subject)
      fh.close()

      Alternatively, you can convert the unicode to ASCII before printing it, for example by dropping characters > 127. Maybe you can also reopen stdout with an encoding; I haven't tried that.
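
A minimal sketch of the ASCII conversion mentioned above, assuming the text is already a unicode string. The `subject` value is made up for illustration and does not come from libgmail.

```python
# Hypothetical subject line containing two non-ASCII characters.
subject = u"J\u00e4rvstrand caf\u00e9"

# "ignore" silently drops every character that ASCII cannot represent.
ascii_dropped = subject.encode("ascii", "ignore")

# "replace" keeps a visible "?" placeholder for each such character.
ascii_marked = subject.encode("ascii", "replace")
```

Both results are plain byte strings, so they can be printed on a terminal that only handles ASCII, at the cost of losing the original characters.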

      Incidentally: there's a bug in the archive.py example script: The mbox file is NOT opened with an encoding, which leads to a crash just like the one described above.

       
