I'm working on enabling Unicode in my application. I've gotten it so that users can enter UTF-8 data into my application in most areas, but I can't seem to get new database names to work.
First, if I want to create the database "Unicde", I can create that database but the SHOW DATABASES command returns it as "Unico?de" and the comparison test to the real database name fails. (The Database name shows up properly encoded using the MySQL tools; I'm pretty sure the encoding problems are related to MySQL for Python.) I am unable to USE this database, receiving an "Access denied" error for my user to that database, even though I've specified proper access to it through MySQL Administrator and can access the database outside of Python programs. I've tried all the encoding/decoding specification tricks I can think of with no luck.
Second, though, if I want to create a database named "亰亱亲亳亴" (5 sequential Chinese characters, which can be displayed using u'\u4eb0\u4eb1\u4eb2\u4eb3\u4eb4' in Python), I get the following traceback:
Traceback (most recent call last):
File "D:\My Documents\Python\Unicode\UnicodeDBAccess.py", line 73, in ?
dbCursor.execute(u"CREATE DATABASE IF NOT EXISTS %s CHARACTER SET utf8" % database)
File "D:\PROGRA~1\PYTHON~2\Lib\site-packages\MySQLdb\cursors.py", line 137, in execute
self.errorhandler(self, exc, value)
File "D:\PROGRA~1\PYTHON~2\Lib\site-packages\MySQLdb\connections.py", line 33, in defaulterrorhandler
raise errorclass, errorvalue
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 30-34: ordinal not in range(256)
My app uses UTF-8 encoding throughout, but MySQL for Python seems to be using Latin-1 encoding, which chokes on the Chinese characters. I can't create the database at all. I haven't been able to figure out how to change this.
My app works perfectly with Chinese data in the tables as long as I name my database with English-only characters. I can store German in my database tables, as long as it's named in English. However, I need to support user-named databases.
I'm using MySQL 4.1.14. I'm using MySQL for Python 1.2.1c3 on Windows and MySQL for Python 1.2.0 on Mac OS X, and the problems are identical on both platforms.
Any help you can offer would be greatly appreciated.
David
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
You probably need to change your connection encoding to utf-8. You can do this in the my.cnf file. MySQLdb only ever uses the default encoding for your connection.
Make sure you are setting use_unicode=True when you open the connection.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
That solution didn't work for me. First off, I don't have access to the user's server and want to avoid anything that would complicate the setup. (Many of my users are not technically savvy.) Second, I couldn't figure out how to alter the setting for the connection character set from within the config file, nor could I find that information anywhere on the MySQL web site. Third, more of my users use embedded MySQL than server-based MySQL, and there is no my.cnf config file in that situation that I know of.
Fortunately, given your explanation, I was able to solve the problem by adding the following lines early in the process:
dbCursor.execute(u'SET CHARACTER SET utf8')
dbCursor.execute(u'SET character_set_connection = utf8')
work to access (create, etc.) databases with UTF-8 encoded names.
Now if I can just figure out how to get MySQL for Python 1.2.1c3 working with embedded MySQL 4.1.14 on Windows, I'll be all set. But nothing I've tried has worked on that front and I'm out of ideas.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
You don't need access to the user's server. Create a local my.cnf file and tell MySQLdb.connect() to use it with the read_default_file parameter.
In that file, you will want something like this:
[mysql]
default-character-set=utf8
To use embedded MySQL on Windows, you'll have to recompile MySQLdb; you cannot do it with the current packages. The main thing you'll need to do is set an environment variable:
set mysqlclient=mysqld
(not really sure what the right syntax is for Windows/DOS)
This tells setup.py to use libmysqld (embedded library). Alternately, you can hack setup.py a little bit to make that the default (linstead of mysqlclient_r).
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
>You don't need access to the user's server.
>Create a local my.cnf file and tell
>MySQLdb.connect() to use it with the
>read_default_file parameter.
>
>In that file, you will want something like this:
>
>[mysql]
>default-character-set=utf8
This does NOT affect the character_set_connection setting under MySQL 4.1.14. There are 6 distinct character set variables in that version of MySQL, and that change to the configuration only changes one or two, at least on Mac OS X and Linux. On Windows, there's an option during setup for setting all of your character set values to UTF-8, or any other encoding you want.
I like the capacity to provide my own config file, and that may help with other issues, but as I said, I've resolved the character_set_connection issue successfully.
>To use embedded MySQL on Windows, you'll
>have to recompile MySQLdb; you cannot do it
>with the current packages. The main thing you'll
>need to do is set an environment variable:
>
>set mysqlclient=mysqld
>
>(not really sure what the right syntax is for
>Windows/DOS)
>
>This tells setup.py to use libmysqld (embedded
>library). Alternately, you can hack setup.py a
>little bit to make that the default (linstead of
>mysqlclient_r).
Yes, I've successfully gotten embedded MySQL 4.0.25 to compile with both MySQL for Python 1.2.0 and MySQL for Python 1.2.1c3 that way. I've also gotten (non-embedded) MySQL 4.1.14 working with both of those versions of MySQL for Python. However, when I try to build with embedded MySQL 4.1.14, I get an error message during linking that there is an "unresolved external symbol _mysql_set_server_options@8".
I suppose this is probably a bug in the embedded MySQL files, but I haven't got a clue how to proceed.
You can see a description of exactly what I had to do to compile on Windows through links on http://transana.wceruw.org/developers/setup/MySQLPython_Win.htm. (That page notes a missing line in setup.py that you might want to fix as a courtesy to your Windows users.) There are a couple of windows 1.2.1c3 distros there you're welcome to add to your download section if you'd like as well.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
I'm working with MySql 4.0.22 under linux RH and I'm using MySql-Python 1.2.0, and I've to set the character set = utf8, but when I insert inside my code:
dbCursor.execute(u'SET CHARACTER SET utf8')
dbCursor.execute(u'SET character_set_connection = utf8')
I obtain an error:
Traceback (most recent call last):
File "parser.py", line 77, in ?
c.execute(u'SET CHARACTER SET Unicode')
File "/usr/lib/python2.2/site-packages/MySQLdb/cursors.py", line 137, in execute
self.errorhandler(self, exc, value)
File "/usr/lib/python2.2/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (1115, "Unknown character set: 'Unicode'")
I've checked on MySql manual, and I guess that the utf8 character set is not in this MySql version....
Can you help me?
thank you
Marco
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Your first problem is, MySQL-4.0.22 doesn't have full character set support. You need to use a 4.1 or newer version. I am pretty sure in 4.0 you can only change the character set when you restart the server (it's a server option).
To do this correctly, create a configuration file like this:
[mysql]
default-character-set = utf8
Then when you create you connection, set the read_default_file parameter to the path to the file. This will set the connection character set and correctly set the converters in MySQLdb to use utf8.
Your error seems a little suspicious, since the code says utf8 but the error says Unicode.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
UTF-8 character support was one of the changes between MySQL 4.0 and MySQL 4.1. You definitely need to upgrade to MySQL 4.1 to use UTF-8.
Andy is slightly off with his config specification, I think. The correct config information is actually
[mysqld]
default-character-set = utf8
There's a typo on the page in the MySQL manual. It's the [mysqld] section, not the [mysql] section. At least, that's what worked for me on several platforms. However, that change did not make a critical difference in being able to use non-english usernames, passwords, or database names while making the connection, and while it affected a couple of the character settings, it did not affect the character_set_connection value that is critical to what I'm trying to do.
I could never get the read_default_file parameter to have an effect on my MySQL connection. It just ignored the config file I specified no matter what I tried and used the server's default configuration. If you figure out how to make it work, please post something here about how you did it.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Both are right. [mysqld] has settings for the server; [mysql] has settings for the client. If you want to set that on the server side, then you don't have to do anything in the client. But for using read_default_file, you need to put it in [mysql], unless you are using the embedded server.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I'm working on enabling Unicode in my application. I've gotten it so that users can enter UTF-8 data into my application in most areas, but I can't seem to get new database names to work.
First, if I want to create the database "Unicde", I can create that database but the SHOW DATABASES command returns it as "Unico?de" and the comparison test to the real database name fails. (The Database name shows up properly encoded using the MySQL tools; I'm pretty sure the encoding problems are related to MySQL for Python.) I am unable to USE this database, receiving an "Access denied" error for my user to that database, even though I've specified proper access to it through MySQL Administrator and can access the database outside of Python programs. I've tried all the encoding/decoding specification tricks I can think of with no luck.
Second, though, if I want to create a database named "亰亱亲亳亴" (5 sequential Chinese characters, which can be displayed using u'\u4eb0\u4eb1\u4eb2\u4eb3\u4eb4' in Python), I get the following traceback:
Traceback (most recent call last):
File "D:\My Documents\Python\Unicode\UnicodeDBAccess.py", line 73, in ?
dbCursor.execute(u"CREATE DATABASE IF NOT EXISTS %s CHARACTER SET utf8" % database)
File "D:\PROGRA~1\PYTHON~2\Lib\site-packages\MySQLdb\cursors.py", line 137, in execute
self.errorhandler(self, exc, value)
File "D:\PROGRA~1\PYTHON~2\Lib\site-packages\MySQLdb\connections.py", line 33, in defaulterrorhandler
raise errorclass, errorvalue
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 30-34: ordinal not in range(256)
My app uses UTF-8 encoding throughout, but MySQL for Python seems to be using Latin-1 encoding, which chokes on the Chinese characters. I can't create the database at all. I haven't been able to figure out how to change this.
My app works perfectly with Chinese data in the tables as long as I name my database with English-only characters. I can store German in my database tables, as long as it's named in English. However, I need to support user-named databases.
I'm using MySQL 4.1.14. I'm using MySQL for Python 1.2.1c3 on Windows and MySQL for Python 1.2.0 on Mac OS X, and the problems are identical on both platforms.
Any help you can offer would be greatly appreciated.
David
You probably need to change your connection encoding to utf-8. You can do this in the my.cnf file. MySQLdb only ever uses the default encoding for your connection.
Make sure you are setting use_unicode=True when you open the connection.
That solution didn't work for me. First off, I don't have access to the user's server and want to avoid anything that would complicate the setup. (Many of my users are not technically savvy.) Second, I couldn't figure out how to alter the setting for the connection character set from within the config file, nor could I find that information anywhere on the MySQL web site. Third, more of my users use embedded MySQL than server-based MySQL, and there is no my.cnf config file in that situation that I know of.
Fortunately, given your explanation, I was able to solve the problem by adding the following lines early in the process:
dbCursor.execute(u'SET CHARACTER SET utf8')
dbCursor.execute(u'SET character_set_connection = utf8')
Then, commands such as:
dbCursor.execute("USE %s" % database.encode('utf8'))
work to access (create, etc.) databases with UTF-8 encoded names.
Now if I can just figure out how to get MySQL for Python 1.2.1c3 working with embedded MySQL 4.1.14 on Windows, I'll be all set. But nothing I've tried has worked on that front and I'm out of ideas.
You don't need access to the user's server. Create a local my.cnf file and tell MySQLdb.connect() to use it with the read_default_file parameter.
In that file, you will want something like this:
[mysql]
default-character-set=utf8
To use embedded MySQL on Windows, you'll have to recompile MySQLdb; you cannot do it with the current packages. The main thing you'll need to do is set an environment variable:
set mysqlclient=mysqld
(not really sure what the right syntax is for Windows/DOS)
This tells setup.py to use libmysqld (embedded library). Alternately, you can hack setup.py a little bit to make that the default (linstead of mysqlclient_r).
>You don't need access to the user's server.
>Create a local my.cnf file and tell
>MySQLdb.connect() to use it with the
>read_default_file parameter.
>
>In that file, you will want something like this:
>
>[mysql]
>default-character-set=utf8
This does NOT affect the character_set_connection setting under MySQL 4.1.14. There are 6 distinct character set variables in that version of MySQL, and that change to the configuration only changes one or two, at least on Mac OS X and Linux. On Windows, there's an option during setup for setting all of your character set values to UTF-8, or any other encoding you want.
I like the capacity to provide my own config file, and that may help with other issues, but as I said, I've resolved the character_set_connection issue successfully.
>To use embedded MySQL on Windows, you'll
>have to recompile MySQLdb; you cannot do it
>with the current packages. The main thing you'll
>need to do is set an environment variable:
>
>set mysqlclient=mysqld
>
>(not really sure what the right syntax is for
>Windows/DOS)
>
>This tells setup.py to use libmysqld (embedded
>library). Alternately, you can hack setup.py a
>little bit to make that the default (linstead of
>mysqlclient_r).
Yes, I've successfully gotten embedded MySQL 4.0.25 to compile with both MySQL for Python 1.2.0 and MySQL for Python 1.2.1c3 that way. I've also gotten (non-embedded) MySQL 4.1.14 working with both of those versions of MySQL for Python. However, when I try to build with embedded MySQL 4.1.14, I get an error message during linking that there is an "unresolved external symbol _mysql_set_server_options@8".
I suppose this is probably a bug in the embedded MySQL files, but I haven't got a clue how to proceed.
You can see a description of exactly what I had to do to compile on Windows through links on http://transana.wceruw.org/developers/setup/MySQLPython_Win.htm. (That page notes a missing line in setup.py that you might want to fix as a courtesy to your Windows users.) There are a couple of windows 1.2.1c3 distros there you're welcome to add to your download section if you'd like as well.
If you are having to manually encode strings to pass to execute(), then you only have a workaround.
My reading of http://dev.mysql.com/doc/mysql/en/charset-connection.html is that it has the effect of doing this:
SET NAMES 'charset_name'
which really means:
SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;
where x is charset_name.
Not sure what the issue is with that unresolved symbol.
Hi,
I'm working with MySql 4.0.22 under linux RH and I'm using MySql-Python 1.2.0, and I've to set the character set = utf8, but when I insert inside my code:
dbCursor.execute(u'SET CHARACTER SET utf8')
dbCursor.execute(u'SET character_set_connection = utf8')
I obtain an error:
Traceback (most recent call last):
File "parser.py", line 77, in ?
c.execute(u'SET CHARACTER SET Unicode')
File "/usr/lib/python2.2/site-packages/MySQLdb/cursors.py", line 137, in execute
self.errorhandler(self, exc, value)
File "/usr/lib/python2.2/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (1115, "Unknown character set: 'Unicode'")
I've checked on MySql manual, and I guess that the utf8 character set is not in this MySql version....
Can you help me?
thank you
Marco
Your first problem is, MySQL-4.0.22 doesn't have full character set support. You need to use a 4.1 or newer version. I am pretty sure in 4.0 you can only change the character set when you restart the server (it's a server option).
To do this correctly, create a configuration file like this:
[mysql]
default-character-set = utf8
Then when you create you connection, set the read_default_file parameter to the path to the file. This will set the connection character set and correctly set the converters in MySQLdb to use utf8.
Your error seems a little suspicious, since the code says utf8 but the error says Unicode.
UTF-8 character support was one of the changes between MySQL 4.0 and MySQL 4.1. You definitely need to upgrade to MySQL 4.1 to use UTF-8.
Andy is slightly off with his config specification, I think. The correct config information is actually
[mysqld]
default-character-set = utf8
There's a typo on the page in the MySQL manual. It's the [mysqld] section, not the [mysql] section. At least, that's what worked for me on several platforms. However, that change did not make a critical difference in being able to use non-english usernames, passwords, or database names while making the connection, and while it affected a couple of the character settings, it did not affect the character_set_connection value that is critical to what I'm trying to do.
I could never get the read_default_file parameter to have an effect on my MySQL connection. It just ignored the config file I specified no matter what I tried and used the server's default configuration. If you figure out how to make it work, please post something here about how you did it.
Both are right. [mysqld] has settings for the server; [mysql] has settings for the client. If you want to set that on the server side, then you don't have to do anything in the client. But for using read_default_file, you need to put it in [mysql], unless you are using the embedded server.