Menu

problem encoding queries

2006-03-22
2012-09-19
  • Flávio Codeço Coelho

    Hi since I moved to MySQL 4.1/MySQdb 1.2.1c7, I am having problems with queries encoded in iso-8859-1.

    Before I could execute my queries transparently independent of their encoding. Now the query only works if I first decode the strings that contain non-ascii characters:
    'são paulo'.decode('iso-8859-1')

    Why is this? Is this the only solution or is there a way to make MySQLdb recognize the encoding and handle it accordingly?

    thanks

     
    • Andy Dustman

      Andy Dustman - 2006-03-22

      An error message/traceback and code example would be helpful.

       
    • Flávio Codeço Coelho

      OK, here is the traceback:

      Traceback (most recent call last):
      File "./epigrass.py", line 260, in onRun
      S.start()
      File "/home/fccoelho/Documents/Projects_software/epigrass/Epigrass-devel/Epigrass/manager.py", line 346, in start
      self.outToODb(modelname)
      File "/home/fccoelho/Documents/Projects_software/epigrass/Epigrass-devel/Epigrass/manager.py", line 430, in outToODb
      DO.Site(dicin)
      File "/usr/lib/python2.4/site-packages/sqlobject/main.py", line 890, in init
      self._create(id,
      kw)
      File "/usr/lib/python2.4/site-packages/sqlobject/main.py", line 923, in _create
      self._SO_finishCreate(id)
      File "/usr/lib/python2.4/site-packages/sqlobject/main.py", line 947, in _SO_finishCreate
      id, names, values)
      File "/usr/lib/python2.4/site-packages/sqlobject/dbconnection.py", line 241, in queryInsertID
      return self._runWithConnection(self._queryInsertID, soInstance, id, names, values)
      File "/usr/lib/python2.4/site-packages/sqlobject/dbconnection.py", line 125, in _runWithConnection
      val = meth(conn, *args)
      File "/usr/lib/python2.4/site-packages/sqlobject/mysql/mysqlconnection.py", line 53, in _queryInsertID
      self._executeRetry(conn, c, q)
      File "/usr/lib/python2.4/site-packages/sqlobject/mysql/mysqlconnection.py", line 35, in _executeRetry
      return cursor.execute(query)
      File "/usr/lib/python2.4/site-packages/MySQLdb/cursors.py", line 142, in execute
      query = query.encode(self.connection.charset)
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xc1 in position 483: ordinal not in range(128)

       
      • Andy Dustman

        Andy Dustman - 2006-03-22

        That really doesn't show me what you are doing. Can you write a little standalone Python example that produces the error?

        Plus you are using sqlobject which obscures what's going on. What version of sqlobject?

         
    • Flávio Codeço Coelho

      I use sqlobject o.6.1

      here is a little script that recreates the error:

      by the way, it is not a problem of Mysql. I did this same query directly into mysql and it worked perfectly.

      import MySQLdb

      def sitesToDb(table,db='epigrass',usr='root',passw='mysql', host='localhost',port=3306):
      """
      Creates a site table on a mysql db from a sites file (fname).
      """
      try:
      #now = time.asctime().replace(' ','').replace(':','')
      #table = #table+'
      '+now
      con = MySQLdb.connect(host=host, port=port, user=usr,passwd=passw, db=db)
      Cursor = con.cursor()
      sql = """CREATE TABLE %s(
      locality VARCHAR(64)
      );
      """ % table
      Cursor.execute(sql)

          sql2 = 'INSERT INTO %s  set locality="são paulo"' % table
      
      Cursor.execute(sql2)
      finally:
          con.close()
      

      if name=='main':
      sitesToDb('teste')

      if you run the above code you should get the following error:

      Traceback (most recent call last):
      File "testemysqldb.py", line 27, in ?
      sitesToDb('teste')
      File "testemysqldb.py", line 22, in sitesToDb
      Cursor.execute(sql2)
      File "/usr/lib/python2.4/site-packages/MySQLdb/cursors.py", line 142, in execute
      query = query.encode(self.connection.charset)
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 34: ordinal not in range(128)

       
      • Andy Dustman

        Andy Dustman - 2006-03-23

        What is your default character set? Before running execute(), do a print con.charset

        I was able to replicate it, but here's the right way to do it as a workaround:

        >>> print db.charset
        latin1
        >>> sql2 = 'INSERT INTO %s set locality="são paulo"' % table
        >>> c.execute(sql2)
        Traceback (most recent call last):
        File "<stdin>", line 1, in ?
        File "/usr/lib64/python2.4/site-packages/MySQLdb/cursors.py", line 142, in execute
        query = query.encode(self.connection.charset)
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 36: ordinal not in range(128)
        >>> sql2 = 'INSERT INTO %s set locality=%%s' % table
        >>> c.execute(sql2, ("são paulo",))
        1L
        >>>

         
    • Andy Dustman

      Andy Dustman - 2006-03-28

      OK, here is the answer. BTW, I am using ipython for this example.

      In [9]: query="insert into loc set locality='são paulo'"

      In [10]: type(query)
      Out[10]: <type 'str'>

      In [11]: query=u"insert into loc set locality='são paulo'"

      In [12]: query="insert into loc set locality='são paulo'"

      In [13]: c.execute(query)

      exceptions.UnicodeDecodeError Traceback (most recent call last)

      /home/andy/devel/MySQLdb-1.2/<ipython console>

      /home/andy/devel/MySQLdb-1.2/MySQLdb/cursors.py in execute(self, query, args)
      141 del self.messages[:]
      142 charset = self.connection.character_set_name()
      --> 143 query = query.encode(charset)
      144 if args is not None:
      145 query = query % self.connection.literal(args)

      UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)

      In [14]: query=u"insert into loc set locality='são paulo'"

      In [15]: c.execute(query)
      Out[15]: 1L

      When you created your query string (sql2), it has extended Latin-1 characters in it (ã), but it is still a string and not a unicode string. Since it's non-ascii, .encode() can't decode it properly, which is really what "can't decode byte 0xc3" means: It's a non-ascii character.

      However, by prefixing your string with u, i.e. u'insert...', it is created a unicode string so it can be correctly encoded.

      Of course the other way is the way I mentioned: You should be passing 'são paulo' as a parameter to .execute() and let it handle the encoding. In this case your query string (prior to substitution) is valid ascii data.

      Note that MySQL only accepts string data, not unicode data, but of course that string data may contain a unicode encoding, or latin-1 or something else.

       
      • bartx

        bartx - 2006-09-04

        These code works fine for "são paulo", but doesn't work for chinese word like "中文"

        >>> wordA = "são paulo"
        >>> wordA.encode("latin-1")
        's?o paulo'
        >>> wordB = "中文"
        >>> wordB.encode("latin-1")
        Traceback (most recent call last):
        File "<stdin>", line 1, in ?
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal
        not in range(128)

        The source code in cursors.py line 145 & 146 are:
        charset = db.character_set_name()
        query = query.encode(charset)
        and the db.character_set_name() always return "latin-1", so I really dont know how to handle chinese word with MySQLdb

        All I can do is comment these two lines, and encode the query string myself, is there any way else? thanks alot:)

         
        • Andy Dustman

          Andy Dustman - 2006-09-04

          1.2.2b1 ought to fix this problem, give it a try. However, make sure your connection has the correct character set configured.

           
        • bartx

          bartx - 2006-09-04

          btw: changint to utf8 won't work

          >>> wordA = u"são paulo"
          >>> wordA.encode("latin-1")
          's?o paulo'
          >>> wordB = u"中文"
          >>> wordB.encode("latin-1")
          Traceback (most recent call last):
          File "<stdin>", line 1, in ?
          UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ord
          inal not in range(256)

           

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.