I'm using MySQLdb 0.9.0 w/ Python 2.2 and MySQL 3.23.41 on Mandrake Linux 8.1. I'm getting a segfault at cursors.py, line 67:
r = self._query(query % escape(args, qc))
when there is Latin-1 data in the args. I can reliably segfault the Python interpreter with this code:
import MySQLdb
db = MySQLdb.Connect(...)
c = db.cursor()
c.execute("select min(id) from venues where venue=%s",
          (u'Caff\xe8 Lena',))
Is this a known problem? I don't really understand Unicode manipulation very well (read: not at all), so it's quite possible that I'm doing something wrong, or failing to do something that would avoid this problem. Still, the interpreter shouldn't crash.
If you can't reproduce this, I can rebuild Python and MySQLdb with -g so I can get a gdb stack trace.
Skip Montanaro
skip@pobox.com
Quick follow-up. Here's a rather suspicious bit of code in _mysql_string_literal:
s = PyObject_Str(o);
in = PyString_AsString(s);
If I run pyo o from gdb, I see that o is a Unicode string:
(gdb) pyo o
object : u'Caff\xe8 Lena'
type : unicode
refcount: 4
address : 0x81e4370
Calling PyString_AsString on s is probably not a good idea.
Skip
My understanding is that s = PyObject_Str(o) effectively calls str(o) on the object and thus should return a string s. Therefore it should be safe to call PyString_AsString(s).
I can reproduce this with your data, Python 2.2b2 and MySQLdb-0.9.1, but I would need to build with debugging symbols myself to get a stack trace. However:
>>> u=u'Caff\xe8 Lena'
>>> str(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
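So _mysql_string_literal needs to check the result of PyObject_Str() to prevent core dumps: when str() raises, PyObject_Str() returns NULL, and passing that NULL to PyString_AsString() crashes. This will not completely solve your problem, because str() still fails on this value, so string_literal() will raise an exception instead of crashing.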
>>> u.encode('latin1')
'Caff\xe8 Lena'
It is unfortunate that you cannot set the encoding for str() on a unicode object.
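To see why str() fails while encode('latin1') works, here is a small sketch. It assumes Python 2's default encoding is ASCII (as the error message above indicates) and uses an explicit .encode('ascii') to stand in for what str(u) does implicitly, so the same code also runs under Python 3:

```python
# Stand-in for Python 2's str(u): an implicit encode with the default
# (ASCII) codec. Explicit .encode('ascii') reproduces the same failure.
u = u'Caff\xe8 Lena'

try:
    u.encode('ascii')          # what str(u) does implicitly in Python 2
    ascii_ok = True
except UnicodeError:           # Python 2.2 raises UnicodeError itself;
    ascii_ok = False           # later versions raise a subclass of it

# With an explicit codec, U+00E8 maps to the single Latin-1 byte 0xE8.
latin1 = u.encode('latin1')

print(ascii_ok)
print(repr(latin1))
```

The character that breaks ASCII (U+00E8, above 127) is exactly the one the traceback complains about; Latin-1 has a byte for it, so the explicit encode succeeds.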
One possibility is to add a unicode converter to MySQLdb's conversion dictionary. Another is to call u.encode('latin1') before passing this value to MySQLdb. The following patch will prevent a core dump but will not fix your problem.
Index: _mysql.c
RCS file: /cvsroot/mysql-python/MySQLdb/_mysql.c,v
retrieving revision 1.16
diff -u -r1.16 _mysql.c
--- _mysql.c 2001/10/17 03:21:22 1.16
+++ _mysql.c 2001/12/22 19:20:14
@@ -434,6 +434,7 @@
 	int len, size;
 	if (!PyArg_ParseTuple(args, "O|O:string_literal", &o, &d)) return NULL;
 	s = PyObject_Str(o);
+	if (!s) return NULL;
 	in = PyString_AsString(s);
 	size = PyString_GET_SIZE(s);
 	str = PyString_FromStringAndSize((char *) NULL, size*2+3);
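The second workaround suggested above, calling u.encode('latin1') before passing the value to MySQLdb, can be applied to a whole parameter tuple at once. This is a minimal sketch for the Python 2 setting of this thread; encode_args is a hypothetical helper (not part of MySQLdb), and it assumes the database column expects Latin-1 bytes:

```python
# Hypothetical helper, not part of MySQLdb: encode every unicode value in a
# parameter tuple before handing the tuple to cursor.execute().
TEXT_TYPE = type(u'')  # unicode in Python 2, str in Python 3


def encode_args(args, encoding='latin1'):
    """Return a tuple where unicode values are encoded; others pass through."""
    encoded = []
    for a in args:
        if isinstance(a, TEXT_TYPE):
            a = a.encode(encoding)  # raises UnicodeError if not representable
        encoded.append(a)
    return tuple(encoded)


params = encode_args((u'Caff\xe8 Lena', 42))
# With a live connection this would be used as:
#   c.execute("select min(id) from venues where venue=%s", params)
```

Encoding at the call site leaves the driver untouched; the trade-off is that every caller has to remember to do it, which a unicode converter in the conversion dictionary would avoid.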
I just submitted a patch that seems to work if the data is just Latin-1. Try my patch, attached to yours.
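The Latin-1 restriction mentioned above follows from the codec itself: Latin-1 maps only the code points U+0000 to U+00FF to single bytes, so any character above U+00FF cannot be encoded. A small sketch, assuming a Latin-1 based fix encodes Unicode values with the 'latin1' codec; fits_latin1 is a hypothetical name used only for illustration:

```python
def fits_latin1(text):
    """Return True if every character of text is representable in Latin-1."""
    try:
        text.encode('latin1')
    except UnicodeError:  # raised for any code point above U+00FF
        return False
    return True


print(fits_latin1(u'Caff\xe8 Lena'))  # U+00E8 is inside Latin-1
print(fits_latin1(u'\u20ac 5'))       # the euro sign is U+20AC, outside it
```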