Dear forum,
I've been having a terrible time trying to process submissions from a website
that may contain international characters, use MySQLdb to get those values
into a database, and successfully read them back out again.
The going has been rough, and I've got a lot of questions, but right now I'd
just like to ask a very specific one:
Why is it that if I specify "utf8" as a parameter to "connect", thus:
>>> db = MySQLdb.connect( ... use_unicode=True, charset="utf8")
calling db.character_set_name() still returns "latin1"?
What exactly does "character_set_name" tell you? Is it supposed to
come back different from what I specified in the connect() method?
By the way, I created all my tables with "DEFAULT CHARACTER SET utf8".
I'm running MySQL 5.0.13 and am using MySQLdb 1.2.1. This is all on Linux machines.
With thanks,
nttaylor
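For context on why the reported connection character set matters at all, here is a small sketch that needs no database. It assumes the application hands UTF-8 bytes to a layer that treats them as latin1; the sample string is made up purely for illustration.

```python
# Illustration only: what happens to non-ASCII text when UTF-8 bytes
# are interpreted as latin1 somewhere between client and server.
text = u"na\u00efve caf\u00e9"            # made-up sample with two non-ASCII characters

utf8_bytes = text.encode("utf-8")        # what a UTF-8 client would send
misread = utf8_bytes.decode("latin-1")   # the same bytes read as latin1
roundtrip = utf8_bytes.decode("utf-8")   # the same bytes read as UTF-8

print(repr(misread))        # each accented character has turned into two characters
print(roundtrip == text)    # decoding with the matching charset restores the text
```

If both ends agree on the character set the text survives; if they disagree, each accented character comes back as two unrelated characters.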
I've downloaded the source code for MySQL-python-1.2.1_p2. In "_mysql.c" I found a strange section of code at lines 1492 to 1506:
static PyObject *
_mysql_ConnectionObject_character_set_name(
    _mysql_ConnectionObject *self,
    PyObject *args)
{
    const char *s;
    if (!PyArg_ParseTuple(args, "")) return NULL;
    check_connection(self);
#if MYSQL_VERSION_ID >= 32321
    s = mysql_character_set_name(&(self->connection));
#else
    s = "latin1";
#endif
    return PyString_FromString(s);
}
Once this is compiled into the Python library, the lines beginning with "#" would not run, so character_set_name() will always return "latin1".
Forgive my bad English; I hope you can understand my meaning.
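A note on the "#if" lines quoted above: preprocessor conditionals are resolved when the extension is compiled, not when it runs, so which assignment ends up in the binary depends on the MYSQL_VERSION_ID in the client headers used for the build. The sketch below only mimics that compile-time decision in Python. It assumes the usual major*10000 + minor*100 + patch encoding of MYSQL_VERSION_ID; version_id and compiled_branch are hypothetical helper names, not part of MySQLdb.

```python
# Sketch of the compile-time choice made by the #if block in _mysql.c.
# Helper names are invented for illustration.

def version_id(major, minor, patch):
    """Encode a MySQL version the way MYSQL_VERSION_ID is assumed to be encoded."""
    return major * 10000 + minor * 100 + patch


def compiled_branch(client_version_id):
    """Return which assignment the preprocessor would keep for this header version."""
    if client_version_id >= 32321:          # i.e. client headers from 3.23.21 or newer
        return "mysql_character_set_name()"
    return 'hard-coded "latin1"'


print(version_id(5, 0, 13))                   # 50013
print(compiled_branch(version_id(5, 0, 13)))
```

Under that assumption, a build against 5.0.x headers keeps the real mysql_character_set_name() call, so the hard-coded "latin1" branch would only apply to a build against very old client headers.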
I can't reproduce this.
Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>> db=MySQLdb.connect(db="test",read_default_file="~/.my.cnf",use_unicode=True,charset="utf8")
>>> db.character_set_name()
'utf8'
>>> db.set_character_set("latin1")
>>> db.character_set_name()
'latin1'
>>> db=MySQLdb.connect(db="test",read_default_file="~/.my.cnf",use_unicode=True,charset="latin1")
>>> db.character_set_name()
'latin1'
>>> db.set_character_set("utf8")
>>> db.character_set_name()
'utf8'
>>>
Amazing. I tried your experiment exactly and I got back "latin1"
every time:
>>> import MySQLdb
>>> db = MySQLdb.connect( ... use_unicode=True, charset="utf8")
>>> db.character_set_name()
'latin1'
>>> db.set_character_set("utf8")
>>> db.character_set_name()
'latin1'
I didn't even know about "set_character_set", but you can see it didn't help.
The only difference between yours and mine is that I don't have a ".my.cnf", but
that shouldn't make a difference, should it? Anyway, one would still expect
"set_character_set" to work regardless.
Could it be something about the MySQL database itself? Did you create your
"test" database with a default character set? Like:
CREATE DATABASE test DEFAULT CHARACTER SET utf8;
I did not do this, although I did define the "DEFAULT CHARACTER SET" for all
of my tables.
Did you maybe pass a charset to the "mysqld_safe" binary when starting the
server?
Thank you for your help,
nttaylor
Make sure you really have UTF-8 support:
$ mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 6 to server version: 5.0.24a-Debian_9-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> show character set like 'utf%';
+---------+---------------+-------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)
Why do you have such an old version of MySQL-5.0 anyway? I seem to remember some issues with character sets in older 5.0 versions, but I could be wrong.
I'm actually using the trunk version of MySQLdb (future 1.3-2.0) but that really shouldn't affect anything.
I upgraded to MySQL-5.0.24a and MySQLdb 1.2.2b2, but nothing changed from what I described above.
set_character_set("utf8") still doesn't affect the return value of "character_set_name()".
Is there any way we can get to the bottom of this?
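One way to narrow this down might be to compare what the client library reports with what the server believes about the same session. The following is a minimal sketch, assuming "db" is an already-open MySQLdb connection using the default (tuple-returning) cursor; charset_report is a hypothetical helper name, not part of MySQLdb.

```python
# Diagnostic sketch: gather the client-side and server-side view of the
# connection character set in one dictionary.

def charset_report(db):
    """Return charset information for an open connection.

    `db` is assumed to provide character_set_name() and a DB-API cursor(),
    as a MySQLdb connection does.
    """
    report = {"client_library": db.character_set_name()}
    cur = db.cursor()
    try:
        # Server-side session variables such as character_set_client,
        # character_set_connection and character_set_results.
        cur.execute("SHOW VARIABLES LIKE 'character_set%'")
        for name, value in cur.fetchall():
            report[name] = value
    finally:
        cur.close()
    return report

# Possible usage (needs a running server, so it is left as a comment):
#   import MySQLdb
#   db = MySQLdb.connect(db="test", use_unicode=True, charset="utf8")
#   for key, value in sorted(charset_report(db).items()):
#       print(key, value)
```

If the server-side rows say utf8 while "client_library" still says latin1, that would suggest the session was switched on the server but the client library was never told. That is only one possible explanation, not a confirmed diagnosis.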
Did you verify that your MySQL server has UTF-8 support?
Sorry Andy, yes, I did verify that I do have utf-8 support,
using the query that you specified. I got back a positive
answer just like the one you posted.
nttaylor
Having experienced behavior similar to yours, nttaylor, I have a suggestion:
Have you tried omitting the charset attribute from the connect() call and setting it afterwards instead?
I'm using 1.2.2.b1 in a Windows environment, so this is somewhat of a long shot, but passing the charset in the connect() call never worked for me, whereas setting it right after connecting works fine.
In short:
this: db=MySQLdb.connect(db="test",read_default_file="~/.my.cnf",use_unicode=True,charset="utf8") never worked for me.
but this: db.set_character_set("utf8") works like a charm.
Guess Andy is just lucky to have everything working (not using released versions probably helps with the 'luck'? ;)).
I realise this is not an in-depth explanation; it is merely a suggestion for a fix that may or may not work.
Keeping fingers crossed that you'll eventually find a solution.
Cheers,
Ulrik
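The suggestion above could be sketched roughly as follows. This is only an illustration: connect_then_set_charset is a hypothetical helper, and it takes the connect function as an argument so that it does not depend on MySQLdb being installed.

```python
# Sketch of "connect without charset, then set it right afterwards".

def connect_then_set_charset(connect, charset="utf8", **connect_args):
    """Open a connection without a charset argument, then set it afterwards.

    `connect` is assumed to be a callable such as MySQLdb.connect; the
    returned object must offer set_character_set() and character_set_name().
    """
    db = connect(**connect_args)
    db.set_character_set(charset)
    # Return the reported name too, so the caller can see whether it changed.
    return db, db.character_set_name()

# Possible usage (needs MySQLdb and a running server):
#   import MySQLdb
#   db, reported = connect_then_set_charset(MySQLdb.connect, db="test", use_unicode=True)
#   print(reported)
```

Whether the reported name actually changes depends on the installed versions, as the rest of this thread shows.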
Hi Ulrik,
I tried instantiating the 'db' object without using the 'charset'
parameter like you suggested, but unfortunately got the same old
problem. The return value from "character_set_name()" is still
always "latin1". Attempting to set it with "set_character_set()"
still has no effect.
nttaylor