Menu

Unicode Issues creating databases

Help
2005-09-12
2012-09-19
  • David Woods

    David Woods - 2005-09-12

    I'm working on enabling Unicode in my application. I've gotten it so that users can enter UTF-8 data into my application in most areas, but I can't seem to get new database names to work.

    First, if I want to create the database "Unicde", I can create that database but the SHOW DATABASES command returns it as "Unico?de" and the comparison test to the real database name fails. (The Database name shows up properly encoded using the MySQL tools; I'm pretty sure the encoding problems are related to MySQL for Python.) I am unable to USE this database, receiving an "Access denied" error for my user to that database, even though I've specified proper access to it through MySQL Administrator and can access the database outside of Python programs. I've tried all the encoding/decoding specification tricks I can think of with no luck.

    Second, though, if I want to create a database named "亰亱亲亳亴" (5 sequential Chinese characters, which can be displayed using u'\u4eb0\u4eb1\u4eb2\u4eb3\u4eb4' in Python), I get the following traceback:

    Traceback (most recent call last):
    File "D:\My Documents\Python\Unicode\UnicodeDBAccess.py", line 73, in ?
    dbCursor.execute(u"CREATE DATABASE IF NOT EXISTS %s CHARACTER SET utf8" % database)
    File "D:\PROGRA~1\PYTHON~2\Lib\site-packages\MySQLdb\cursors.py", line 137, in execute
    self.errorhandler(self, exc, value)
    File "D:\PROGRA~1\PYTHON~2\Lib\site-packages\MySQLdb\connections.py", line 33, in defaulterrorhandler
    raise errorclass, errorvalue
    UnicodeEncodeError: 'latin-1' codec can't encode characters in position 30-34: ordinal not in range(256)

    My app uses UTF-8 encoding throughout, but MySQL for Python seems to be using Latin-1 encoding, which chokes on the Chinese characters. I can't create the database at all. I haven't been able to figure out how to change this.

    My app works perfectly with Chinese data in the tables as long as I name my database with English-only characters. I can store German in my database tables, as long as it's named in English. However, I need to support user-named databases.

    I'm using MySQL 4.1.14. I'm using MySQL for Python 1.2.1c3 on Windows and MySQL for Python 1.2.0 on Mac OS X, and the problems are identical on both platforms.

    Any help you can offer would be greatly appreciated.

    David

     
    • Andy Dustman

      Andy Dustman - 2005-09-12

      You probably need to change your connection encoding to utf-8. You can do this in the my.cnf file. MySQLdb only ever uses the default encoding for your connection.

      Make sure you are setting use_unicode=True when you open the connection.

       
      • David Woods

        David Woods - 2005-09-13

        That solution didn't work for me. First off, I don't have access to the user's server and want to avoid anything that would complicate the setup. (Many of my users are not technically savvy.) Second, I couldn't figure out how to alter the setting for the connection character set from within the config file, nor could I find that information anywhere on the MySQL web site. Third, more of my users use embedded MySQL than server-based MySQL, and there is no my.cnf config file in that situation that I know of.

        Fortunately, given your explanation, I was able to solve the problem by adding the following lines early in the process:

        dbCursor.execute(u'SET CHARACTER SET utf8')
        dbCursor.execute(u'SET character_set_connection = utf8')

        Then, commands such as:

        dbCursor.execute("USE %s" % database.encode('utf8'))

        work to access (create, etc.) databases with UTF-8 encoded names.

        Now if I can just figure out how to get MySQL for Python 1.2.1c3 working with embedded MySQL 4.1.14 on Windows, I'll be all set. But nothing I've tried has worked on that front and I'm out of ideas.

         
        • Andy Dustman

          Andy Dustman - 2005-09-14

          You don't need access to the user's server. Create a local my.cnf file and tell MySQLdb.connect() to use it with the read_default_file parameter.

          In that file, you will want something like this:

          [mysql]
          default-character-set=utf8

          To use embedded MySQL on Windows, you'll have to recompile MySQLdb; you cannot do it with the current packages. The main thing you'll need to do is set an environment variable:

          set mysqlclient=mysqld

          (not really sure what the right syntax is for Windows/DOS)

          This tells setup.py to use libmysqld (embedded library). Alternately, you can hack setup.py a little bit to make that the default (linstead of mysqlclient_r).

           
    • David Woods

      David Woods - 2005-09-14

      >You don't need access to the user's server.
      >Create a local my.cnf file and tell
      >MySQLdb.connect() to use it with the
      >read_default_file parameter.
      >
      >In that file, you will want something like this:
      >
      >[mysql]
      >default-character-set=utf8

      This does NOT affect the character_set_connection setting under MySQL 4.1.14. There are 6 distinct character set variables in that version of MySQL, and that change to the configuration only changes one or two, at least on Mac OS X and Linux. On Windows, there's an option during setup for setting all of your character set values to UTF-8, or any other encoding you want.

      I like the capacity to provide my own config file, and that may help with other issues, but as I said, I've resolved the character_set_connection issue successfully.

      >To use embedded MySQL on Windows, you'll
      >have to recompile MySQLdb; you cannot do it
      >with the current packages. The main thing you'll
      >need to do is set an environment variable:
      >
      >set mysqlclient=mysqld
      >
      >(not really sure what the right syntax is for
      >Windows/DOS)
      >
      >This tells setup.py to use libmysqld (embedded
      >library). Alternately, you can hack setup.py a
      >little bit to make that the default (linstead of
      >mysqlclient_r).

      Yes, I've successfully gotten embedded MySQL 4.0.25 to compile with both MySQL for Python 1.2.0 and MySQL for Python 1.2.1c3 that way. I've also gotten (non-embedded) MySQL 4.1.14 working with both of those versions of MySQL for Python. However, when I try to build with embedded MySQL 4.1.14, I get an error message during linking that there is an "unresolved external symbol _mysql_set_server_options@8".

      I suppose this is probably a bug in the embedded MySQL files, but I haven't got a clue how to proceed.

      You can see a description of exactly what I had to do to compile on Windows through links on http://transana.wceruw.org/developers/setup/MySQLPython_Win.htm. (That page notes a missing line in setup.py that you might want to fix as a courtesy to your Windows users.) There are a couple of windows 1.2.1c3 distros there you're welcome to add to your download section if you'd like as well.

       
      • Andy Dustman

        Andy Dustman - 2005-09-14

        If you are having to manually encode strings to pass to execute(), then you only have a workaround.

        My reading of http://dev.mysql.com/doc/mysql/en/charset-connection.html is that it has the effect of doing this:

        SET NAMES 'charset_name'

        which really means:

        SET character_set_client = x;
        SET character_set_results = x;
        SET character_set_connection = x;

        where x is charset_name.

        Not sure what the issue is with that unresolved symbol.

         
    • Lisca75

      Lisca75 - 2005-09-16

      Hi,
      I'm working with MySql 4.0.22 under linux RH and I'm using MySql-Python 1.2.0, and I've to set the character set = utf8, but when I insert inside my code:
      dbCursor.execute(u'SET CHARACTER SET utf8')
      dbCursor.execute(u'SET character_set_connection = utf8')

      I obtain an error:

      Traceback (most recent call last):
      File "parser.py", line 77, in ?
      c.execute(u'SET CHARACTER SET Unicode')
      File "/usr/lib/python2.2/site-packages/MySQLdb/cursors.py", line 137, in execute
      self.errorhandler(self, exc, value)
      File "/usr/lib/python2.2/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
      raise errorclass, errorvalue
      _mysql_exceptions.OperationalError: (1115, "Unknown character set: 'Unicode'")

      I've checked on MySql manual, and I guess that the utf8 character set is not in this MySql version....

      Can you help me?

      thank you
      Marco

       
      • Andy Dustman

        Andy Dustman - 2005-09-17

        Your first problem is, MySQL-4.0.22 doesn't have full character set support. You need to use a 4.1 or newer version. I am pretty sure in 4.0 you can only change the character set when you restart the server (it's a server option).

        To do this correctly, create a configuration file like this:

        [mysql]
        default-character-set = utf8

        Then when you create you connection, set the read_default_file parameter to the path to the file. This will set the connection character set and correctly set the converters in MySQLdb to use utf8.

        Your error seems a little suspicious, since the code says utf8 but the error says Unicode.

         
        • David Woods

          David Woods - 2005-09-17

          UTF-8 character support was one of the changes between MySQL 4.0 and MySQL 4.1. You definitely need to upgrade to MySQL 4.1 to use UTF-8.

          Andy is slightly off with his config specification, I think. The correct config information is actually

          [mysqld]
          default-character-set = utf8

          There's a typo on the page in the MySQL manual. It's the [mysqld] section, not the [mysql] section. At least, that's what worked for me on several platforms. However, that change did not make a critical difference in being able to use non-english usernames, passwords, or database names while making the connection, and while it affected a couple of the character settings, it did not affect the character_set_connection value that is critical to what I'm trying to do.

          I could never get the read_default_file parameter to have an effect on my MySQL connection. It just ignored the config file I specified no matter what I tried and used the server's default configuration. If you figure out how to make it work, please post something here about how you did it.

           
          • Andy Dustman

            Andy Dustman - 2005-09-17

            Both are right. [mysqld] has settings for the server; [mysql] has settings for the client. If you want to set that on the server side, then you don't have to do anything in the client. But for using read_default_file, you need to put it in [mysql], unless you are using the embedded server.

             

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.