
Encode Error when inserting

Help
time flys
2004-09-21
2012-09-19
  • time flys

    time flys - 2004-09-21

    I am parsing a feed using the Universal Feed Parser and I get the following error when inserting into mysql.

    =====
    Traceback (most recent call last):
      File "test.py", line 20, in ?
        cursor.execute("""INSERT INTO news (content) VALUES (%s)""", (content))
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/MySQLdb/cursors.py", line 95, in execute
        return self._execute(query, args)
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/MySQLdb/cursors.py", line 114, in _execute
        self.errorhandler(self, exc, value)
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
        raise errorclass, errorvalue
    UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 6: ordinal not in range(256)
    =====

    I am running OSX Panther, Python 2.3, and MySQLdb 1.0. The character encoding that was used to parse the feed is utf-8. This also happens with Fedora.

    My MySQL insert code is as follows:

    =====
    try:
        cursor.execute("""INSERT INTO news (content) VALUES (%s)""", (content))
    except MySQLdb.Error, e:
        print "Error %d: %s" % (e.args[0], e.args[1])
    =====

    The error above happens, for example, when I set content to either of the following strings, which were parsed by Universal Feed Parser:

    Arthur’s vade mecum - on Competitive Intelligence

    or

    =====
    Let’s all be individuals…now, everybody repeat after me, ‘Let’s all be individuals’
    =====

    I realize this is an encoding issue, but do not know how to handle it. Any ideas are appreciated.

     
    • deelan

      deelan - 2004-09-21

      oops, safeEncode() indentation is gone. check this instead:

      http://pastebin.de/pastebin.py?id=1278

       
    • time flys

      time flys - 2004-09-21

      Thank you deelan, it works great.

      I happen to be storing data captured by the universal feed parser for search purposes, thus the ignore option is slightly better than replace.

       
    • deelan

      deelan - 2004-09-21

      this probably happens because Universal Feed Parser returns character data as python unicode objects (decoded here from UTF-8), but mysql does not (yet) support unicode (4.1 will include UTF-8 support), so MySQLdb tries to encode the character data into plain 8-bit strings (python's str type) using latin-1 (AFAIK mysql's default encoding).

      the problem arises when a unicode character isn't in the latin-1 table, say a curly quote or a curly apostrophe.
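
The failure can be reproduced without a database. The following is a minimal sketch in Python 3 syntax (the thread itself uses Python 2.3); `try_latin1` is a hypothetical helper name, not part of MySQLdb or Universal Feed Parser.

```python
# Python 3 sketch: reproduce the latin-1 encoding failure without MySQL.
text = "Arthur\u2019s vade mecum"  # U+2019 is the curly apostrophe from the feed


def try_latin1(value):
    """Return the latin-1 bytes, or the error message if encoding fails."""
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError as exc:
        return str(exc)


result = try_latin1(text)
print(result)
```

The curly apostrophe sits at index 6 of the string, which matches the position reported in the traceback.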

      i've faced the same problem in the past, so i wrote a python function that performs the unicode --> latin-1 encoding itself, instead of relying on MySQLdb's built-in converter, and forces the system to forget about possible encoding errors:

      def safeEncode(v):  # v is a unicode value
          try:
              # force encoding to latin-1; unencodable characters become '?'
              v = v.encode('latin-1', 'replace')
          except UnicodeEncodeError, ex:  # unlikely
              print ex  # let us know
          return v
      

      'replace' will replace troublesome characters with a question mark. you can try 'ignore' instead, which drops them silently, and see what happens.
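
To compare the two error handlers side by side, here is a small sketch in Python 3 syntax using a shortened version of the sample text from the question. The variable names are illustrative only.

```python
# Python 3 sketch: compare the 'replace' and 'ignore' error handlers.
text = "Let\u2019s all be individuals\u2026now"  # curly apostrophe and ellipsis

replaced = text.encode("latin-1", "replace")  # unencodable characters become '?'
ignored = text.encode("latin-1", "ignore")    # unencodable characters are dropped

print(replaced)  # b'Let?s all be individuals?now'
print(ignored)   # b'Lets all be individualsnow'
```

Both handlers lose information, so the choice depends on whether a visible placeholder or a shorter string is less harmful for the stored data.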

      so, your query becomes:

      cursor.execute("""INSERT INTO news (content) VALUES (%s)""", (safeEncode(content), ))
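
Note the trailing comma in the parameter argument above. In Python, parentheses alone do not create a tuple, as this short sketch (Python 3 syntax, illustrative names) shows. How a particular MySQLdb version treats a bare string argument is not covered here.

```python
# Python 3 sketch: parentheses group, the comma makes the tuple.
content = "some text"

not_a_tuple = (content)   # just the string itself
one_tuple = (content,)    # a tuple holding one element

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
```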

      HTH,
      deelan

       
      • Pomin Wu

        Pomin Wu - 2004-12-01

        Perhaps the converter for types.UnicodeType should also respect the 'unicode' setting of connection objects. Setting the 'unicode' parameter for a connection means the programmer wants to read unicode objects from queries, so it's reasonable to expect that writing unicode objects into queries works too. I did this and can insert unicode objects without problems:

        import types
        from datetime import datetime

        import MySQLdb

        encoding = 'utf-8'
        db = MySQLdb.connect(
                db = "test", user = "", passwd = "", unicode = encoding)
        
        def unicode_literal(u, dummy = None) :
            """ Unicode converter that respects our encoding. """
            return db.literal(u.encode(encoding))
        
        def install_encoding(db) :
            """ Install our unicode converter.
        
            Simply passing converter dictionary into initializer isn't going to
            work since they will be replaced.
            """
            db.converter[types.UnicodeType] = unicode_literal
        
        install_encoding(db)
        c = db.cursor()
        c.execute("""
            INSERT `test_project` VALUES (%s, %s, %s, %s)
            """, ("hello", u"some unicode literals", None, datetime.now()))
        
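
The reason encoding to UTF-8 avoids the original error can be checked without a database. This is a sketch in Python 3 syntax; `to_utf8` is a hypothetical helper, and it assumes the table and connection are also set up to store UTF-8 (per the thread, MySQL 4.1 or later).

```python
# Python 3 sketch: the sample strings survive a UTF-8 round trip unchanged.
samples = [
    "Arthur\u2019s vade mecum - on Competitive Intelligence",
    "Let\u2019s all be individuals\u2026now, everybody repeat after me, "
    "\u2018Let\u2019s all be individuals\u2019",
]


def to_utf8(value):
    """Encode text as UTF-8 bytes; the curly quotes and ellipsis are representable."""
    return value.encode("utf-8")


round_trips = [to_utf8(s).decode("utf-8") == s for s in samples]
print(round_trips)  # [True, True]
```

Unlike the latin-1 approach with 'replace' or 'ignore', nothing is lost here, which matters if the stored text is later searched or displayed.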
         
