Hi since I moved to MySQL 4.1/MySQdb 1.2.1c7, I am having problems with queries encoded in iso-8859-1.
Before I could execute my queries transparently independent of their encoding. Now the query only works if I first decode the strings that contain non-ascii characters:
'são paulo'.decode('iso-8859-1')
Why is this? Is this the only solution or is there a way to make MySQLdb recognize the encoding and handle it accordingly?
thanks
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Traceback (most recent call last):
File "./epigrass.py", line 260, in onRun
S.start()
File "/home/fccoelho/Documents/Projects_software/epigrass/Epigrass-devel/Epigrass/manager.py", line 346, in start
self.outToODb(modelname)
File "/home/fccoelho/Documents/Projects_software/epigrass/Epigrass-devel/Epigrass/manager.py", line 430, in outToODb
DO.Site(dicin)
File "/usr/lib/python2.4/site-packages/sqlobject/main.py", line 890, in init
self._create(id, kw)
File "/usr/lib/python2.4/site-packages/sqlobject/main.py", line 923, in _create
self._SO_finishCreate(id)
File "/usr/lib/python2.4/site-packages/sqlobject/main.py", line 947, in _SO_finishCreate
id, names, values)
File "/usr/lib/python2.4/site-packages/sqlobject/dbconnection.py", line 241, in queryInsertID
return self._runWithConnection(self._queryInsertID, soInstance, id, names, values)
File "/usr/lib/python2.4/site-packages/sqlobject/dbconnection.py", line 125, in _runWithConnection
val = meth(conn, *args)
File "/usr/lib/python2.4/site-packages/sqlobject/mysql/mysqlconnection.py", line 53, in _queryInsertID
self._executeRetry(conn, c, q)
File "/usr/lib/python2.4/site-packages/sqlobject/mysql/mysqlconnection.py", line 35, in _executeRetry
return cursor.execute(query)
File "/usr/lib/python2.4/site-packages/MySQLdb/cursors.py", line 142, in execute
query = query.encode(self.connection.charset)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc1 in position 483: ordinal not in range(128)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
by the way, it is not a problem of Mysql. I did this same query directly into mysql and it worked perfectly.
import MySQLdb
def sitesToDb(table,db='epigrass',usr='root',passw='mysql', host='localhost',port=3306):
"""
Creates a site table on a mysql db from a sites file (fname).
"""
try:
#now = time.asctime().replace(' ','').replace(':','')
#table = #table+''+now
con = MySQLdb.connect(host=host, port=port, user=usr,passwd=passw, db=db)
Cursor = con.cursor()
sql = """CREATE TABLE %s( locality VARCHAR(64)
);
""" % table
Cursor.execute(sql)
sql2 = 'INSERT INTO %s set locality="são paulo"' % table
Cursor.execute(sql2)
finally:
con.close()
if name=='main':
sitesToDb('teste')
if you run the above code you should get the following error:
Traceback (most recent call last):
File "testemysqldb.py", line 27, in ?
sitesToDb('teste')
File "testemysqldb.py", line 22, in sitesToDb
Cursor.execute(sql2)
File "/usr/lib/python2.4/site-packages/MySQLdb/cursors.py", line 142, in execute
query = query.encode(self.connection.charset)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 34: ordinal not in range(128)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
/home/andy/devel/MySQLdb-1.2/MySQLdb/cursors.py in execute(self, query, args)
141 del self.messages[:]
142 charset = self.connection.character_set_name()
--> 143 query = query.encode(charset)
144 if args is not None:
145 query = query % self.connection.literal(args)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)
In [14]: query=u"insert into loc set locality='são paulo'"
In [15]: c.execute(query)
Out[15]: 1L
When you created your query string (sql2), it has extended Latin-1 characters in it (ã), but it is still a string and not a unicode string. Since it's non-ascii, .encode() can't decode it properly, which is really what "can't decode byte 0xc3" means: It's a non-ascii character.
However, by prefixing your string with u, i.e. u'insert...', it is created a unicode string so it can be correctly encoded.
Of course the other way is the way I mentioned: You should be passing 'são paulo' as a parameter to .execute() and let it handle the encoding. In this case your query string (prior to substitution) is valid ascii data.
Note that MySQL only accepts string data, not unicode data, but of course that string data may contain a unicode encoding, or latin-1 or something else.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
These code works fine for "são paulo", but doesn't work for chinese word like "中文"
>>> wordA = "são paulo"
>>> wordA.encode("latin-1")
's?o paulo'
>>> wordB = "中文"
>>> wordB.encode("latin-1")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal
not in range(128)
The source code in cursors.py line 145 & 146 are:
charset = db.character_set_name()
query = query.encode(charset)
and the db.character_set_name() always return "latin-1", so I really dont know how to handle chinese word with MySQLdb
All I can do is comment these two lines, and encode the query string myself, is there any way else? thanks alot:)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi since I moved to MySQL 4.1/MySQdb 1.2.1c7, I am having problems with queries encoded in iso-8859-1.
Before I could execute my queries transparently independent of their encoding. Now the query only works if I first decode the strings that contain non-ascii characters:
'são paulo'.decode('iso-8859-1')
Why is this? Is this the only solution or is there a way to make MySQLdb recognize the encoding and handle it accordingly?
thanks
An error message/traceback and code example would be helpful.
OK, here is the traceback:
Traceback (most recent call last):
File "./epigrass.py", line 260, in onRun
S.start()
File "/home/fccoelho/Documents/Projects_software/epigrass/Epigrass-devel/Epigrass/manager.py", line 346, in start
self.outToODb(modelname)
File "/home/fccoelho/Documents/Projects_software/epigrass/Epigrass-devel/Epigrass/manager.py", line 430, in outToODb
DO.Site(dicin)
File "/usr/lib/python2.4/site-packages/sqlobject/main.py", line 890, in init
self._create(id, kw)
File "/usr/lib/python2.4/site-packages/sqlobject/main.py", line 923, in _create
self._SO_finishCreate(id)
File "/usr/lib/python2.4/site-packages/sqlobject/main.py", line 947, in _SO_finishCreate
id, names, values)
File "/usr/lib/python2.4/site-packages/sqlobject/dbconnection.py", line 241, in queryInsertID
return self._runWithConnection(self._queryInsertID, soInstance, id, names, values)
File "/usr/lib/python2.4/site-packages/sqlobject/dbconnection.py", line 125, in _runWithConnection
val = meth(conn, *args)
File "/usr/lib/python2.4/site-packages/sqlobject/mysql/mysqlconnection.py", line 53, in _queryInsertID
self._executeRetry(conn, c, q)
File "/usr/lib/python2.4/site-packages/sqlobject/mysql/mysqlconnection.py", line 35, in _executeRetry
return cursor.execute(query)
File "/usr/lib/python2.4/site-packages/MySQLdb/cursors.py", line 142, in execute
query = query.encode(self.connection.charset)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc1 in position 483: ordinal not in range(128)
That really doesn't show me what you are doing. Can you write a little standalone Python example that produces the error?
Plus you are using sqlobject which obscures what's going on. What version of sqlobject?
I use sqlobject o.6.1
here is a little script that recreates the error:
by the way, it is not a problem of Mysql. I did this same query directly into mysql and it worked perfectly.
import MySQLdb
def sitesToDb(table,db='epigrass',usr='root',passw='mysql', host='localhost',port=3306):
"""
Creates a site table on a mysql db from a sites file (fname).
"""
try:
#now = time.asctime().replace(' ','').replace(':','')
#table = #table+''+now
con = MySQLdb.connect(host=host, port=port, user=usr,passwd=passw, db=db)
Cursor = con.cursor()
sql = """CREATE TABLE %s(
locality
VARCHAR(64));
""" % table
Cursor.execute(sql)
if name=='main':
sitesToDb('teste')
if you run the above code you should get the following error:
Traceback (most recent call last):
File "testemysqldb.py", line 27, in ?
sitesToDb('teste')
File "testemysqldb.py", line 22, in sitesToDb
Cursor.execute(sql2)
File "/usr/lib/python2.4/site-packages/MySQLdb/cursors.py", line 142, in execute
query = query.encode(self.connection.charset)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 34: ordinal not in range(128)
What is your default character set? Before running execute(), do a print con.charset
I was able to replicate it, but here's the right way to do it as a workaround:
>>> print db.charset
latin1
>>> sql2 = 'INSERT INTO %s set locality="são paulo"' % table
>>> c.execute(sql2)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/lib64/python2.4/site-packages/MySQLdb/cursors.py", line 142, in execute
query = query.encode(self.connection.charset)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 36: ordinal not in range(128)
>>> sql2 = 'INSERT INTO %s set locality=%%s' % table
>>> c.execute(sql2, ("são paulo",))
1L
>>>
OK, here is the answer. BTW, I am using ipython for this example.
In [9]: query="insert into loc set locality='são paulo'"
In [10]: type(query)
Out[10]: <type 'str'>
In [11]: query=u"insert into loc set locality='são paulo'"
In [12]: query="insert into loc set locality='são paulo'"
In [13]: c.execute(query)
exceptions.UnicodeDecodeError Traceback (most recent call last)
/home/andy/devel/MySQLdb-1.2/<ipython console>
/home/andy/devel/MySQLdb-1.2/MySQLdb/cursors.py in execute(self, query, args)
141 del self.messages[:]
142 charset = self.connection.character_set_name()
--> 143 query = query.encode(charset)
144 if args is not None:
145 query = query % self.connection.literal(args)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)
In [14]: query=u"insert into loc set locality='são paulo'"
In [15]: c.execute(query)
Out[15]: 1L
When you created your query string (sql2), it has extended Latin-1 characters in it (ã), but it is still a string and not a unicode string. Since it's non-ascii, .encode() can't decode it properly, which is really what "can't decode byte 0xc3" means: It's a non-ascii character.
However, by prefixing your string with u, i.e. u'insert...', it is created a unicode string so it can be correctly encoded.
Of course the other way is the way I mentioned: You should be passing 'são paulo' as a parameter to .execute() and let it handle the encoding. In this case your query string (prior to substitution) is valid ascii data.
Note that MySQL only accepts string data, not unicode data, but of course that string data may contain a unicode encoding, or latin-1 or something else.
These code works fine for "são paulo", but doesn't work for chinese word like "中文"
>>> wordA = "são paulo"
>>> wordA.encode("latin-1")
's?o paulo'
>>> wordB = "中文"
>>> wordB.encode("latin-1")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal
not in range(128)
The source code in cursors.py line 145 & 146 are:
charset = db.character_set_name()
query = query.encode(charset)
and the db.character_set_name() always return "latin-1", so I really dont know how to handle chinese word with MySQLdb
All I can do is comment these two lines, and encode the query string myself, is there any way else? thanks alot:)
1.2.2b1 ought to fix this problem, give it a try. However, make sure your connection has the correct character set configured.
btw: changint to utf8 won't work
>>> wordA = u"são paulo"
>>> wordA.encode("latin-1")
's?o paulo'
>>> wordB = u"中文"
>>> wordB.encode("latin-1")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ord
inal not in range(256)