
effect of 'charset' param in connect method

Help
nttaylor
2006-10-26
2012-09-19
  • nttaylor

    nttaylor - 2006-10-26

    Dear forum,

    I've been having a terrible time trying to process submissions from a website
    that may contain international characters, use MySQLdb to get those values
    into a database, and successfully read them back out again.

    The going has been rough, and I've got a lot of questions, but right now I'd
    just like to ask a very specific one:

    Why is it that if I specify "utf8" as a parameter to "connect" thus:

    >>> db = MySQLdb.connect( ... use_unicode=True, charset="utf8")

    Calling db.character_set_name() still returns "latin1"?

    What is it that "character_set_name" tells you exactly? Is it supposed to be
    coming back differently from what I specified in the connect() method?

    By the way, I created all my tables with "DEFAULT CHARACTER SET utf8".

    I'm running MySQL 5.0.13 and using MySQLdb 1.2.1. This is all on Linux machines.

    with thanks,

    nttaylor
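As background to what a "latin1" versus "utf8" connection character set means in practice: if text is stored as UTF-8 bytes but read back by something that assumes Latin-1, accented characters come out garbled. The pure-Python 3 sketch below involves no MySQL at all, and both function names are hypothetical; it only illustrates the kind of mismatch a wrong connection character set can cause, not a diagnosis of this thread's problem.

```python
# Hypothetical helpers illustrating a UTF-8 / Latin-1 mismatch.
# No database is involved; this only shows what the bytes turn into.


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_latin1_misread(garbled: str) -> str:
    """Undo misread_as_latin1: re-encode as Latin-1, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "café"
    garbled = misread_as_latin1(original)
    print(garbled)                         # cafÃ©
    print(repair_latin1_misread(garbled))  # café
```

The two UTF-8 bytes of "é" (0xC3 0xA9) each become a separate Latin-1 character, which is why one accented letter turns into two odd ones.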

     
    • 文祥

      文祥 - 2006-12-14

      I've downloaded the source code for MySQL-python-1.2.1_p2. In "_mysql.c" I found a strange section of code at lines 1492 to 1506:
        static PyObject *
        _mysql_ConnectionObject_character_set_name(
            _mysql_ConnectionObject *self,
            PyObject *args)
        {
            const char *s;
            if (!PyArg_ParseTuple(args, "")) return NULL;
            check_connection(self);
        #if MYSQL_VERSION_ID >= 32321
            s = mysql_character_set_name(&(self->connection));
        #else
            s = "latin1";
        #endif
            return PyString_FromString(s);
        }
      Once this is compiled into the Python library, the lines starting with "#" won't run, so character_set_name() will always return "latin1".

      Forgive my bad English; I hope you can understand what I mean.
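The `#if`/`#else` lines in that excerpt are C preprocessor directives, so which branch ends up in the compiled module is decided when `_mysql.c` is built, by comparing the `MYSQL_VERSION_ID` of the MySQL client headers against 32321. As a rough aid for reading that guard, the sketch below decodes such an ID, assuming the conventional layout major*10000 + minor*100 + patch (under which 32321 would mean 3.23.21). The helper names are made up for illustration and are not part of MySQLdb or MySQL.

```python
# Sketch for interpreting the "#if MYSQL_VERSION_ID >= 32321" guard in _mysql.c.
# Assumes MYSQL_VERSION_ID = major * 10000 + minor * 100 + patch.
from typing import Tuple

GUARD_VERSION_ID = 32321  # the value tested by the #if in the excerpt above


def decode_version_id(version_id: int) -> Tuple[int, int, int]:
    """Split a MYSQL_VERSION_ID integer into (major, minor, patch)."""
    major, rest = divmod(version_id, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def encode_version(major: int, minor: int, patch: int) -> int:
    """Inverse of decode_version_id."""
    return major * 10000 + minor * 100 + patch


def uses_real_character_set_name(version_id: int) -> bool:
    """True if the branch calling mysql_character_set_name() would be compiled in."""
    return version_id >= GUARD_VERSION_ID


if __name__ == "__main__":
    print(decode_version_id(GUARD_VERSION_ID))                      # (3, 23, 21)
    print(uses_real_character_set_name(encode_version(5, 0, 13)))   # True
```

Under that assumption, headers from the MySQL 5.0.x versions mentioned in this thread (for example 50013 for 5.0.13) would satisfy the comparison.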

       
    • Andy Dustman

      Andy Dustman - 2006-10-26

      I can't reproduce this.

      Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
      [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import MySQLdb
      >>> db=MySQLdb.connect(db="test",read_default_file="~/.my.cnf",use_unicode=True,charset="utf8")
      >>> db.character_set_name()
      'utf8'
      >>> db.set_character_set("latin1")
      >>> db.character_set_name()
      'latin1'
      >>> db=MySQLdb.connect(db="test",read_default_file="~/.my.cnf",use_unicode=True,charset="latin1")
      >>> db.character_set_name()
      'latin1'
      >>> db.set_character_set("utf8")
      >>> db.character_set_name()
      'utf8'
      >>>

       
    • nttaylor

      nttaylor - 2006-10-26

      Amazing. I tried your experiment exactly and got back "latin1"
      every time:

      >>> import MySQLdb
      >>> db = MySQLdb.connect( ... use_unicode=True, charset="utf8")
      >>> db.character_set_name()
      'latin1'
      >>> db.set_character_set("utf8")
      >>> db.character_set_name()
      'latin1'

      I didn't even know about "set_character_set" but you can see it didn't help.

      The only difference between yours and mine is that I don't have a ".my.cnf", but
      that shouldn't make a difference, should it? Anyway, one would still expect
      "set_character_set" to work regardless.

      Could it be something about the MySQL database itself? Did you create your
      "test" database with a default character set, like this:

      CREATE DATABASE test DEFAULT CHARACTER SET utf8;

      I did not do this, although I did define the "DEFAULT CHARACTER SET" for all
      of my tables.

      Did you maybe pass a charset to the "mysqld_safe" binary when starting the
      server?

      Thank you for your help,

      nttaylor
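Questions about database and server defaults like these can be narrowed down by asking the server which `character_set_*` variables the session is actually using. A minimal sketch, assuming the rows come from `cursor.fetchall()` after `cursor.execute("SHOW VARIABLES LIKE 'character_set%'")` and arrive as (Variable_name, Value) pairs; `summarize_charset_vars` and `variables_not_using` are hypothetical helper names, not part of MySQLdb.

```python
# Sketch: summarize rows from  SHOW VARIABLES LIKE 'character_set%'
# Each row is assumed to be a (Variable_name, Value) pair.
from typing import Dict, Iterable, List, Tuple


def summarize_charset_vars(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each character_set_* variable to its value, skipping the directory path."""
    return {
        name: value
        for name, value in rows
        # LIKE 'character_set%' also matches character_sets_dir, which is a path
        if name != "character_sets_dir"
    }


def variables_not_using(summary: Dict[str, str], charset: str) -> List[str]:
    """Sorted names of variables whose value differs from the given charset."""
    return sorted(name for name, value in summary.items() if value != charset)


if __name__ == "__main__":
    sample = [  # made-up sample rows, not output from a real server
        ("character_set_client", "utf8"),
        ("character_set_database", "latin1"),
        ("character_sets_dir", "/usr/share/mysql/charsets/"),
    ]
    print(variables_not_using(summarize_charset_vars(sample), "utf8"))
```

If `character_set_database` is the one reported as latin1, that would fit a database created without `DEFAULT CHARACTER SET`. Some variables, such as `character_set_filesystem` on servers that have it, may legitimately report something other than utf8.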

       
      • Andy Dustman

        Andy Dustman - 2006-10-27

        Make sure you really have UTF-8 support:

        $ mysql
        Welcome to the MySQL monitor. Commands end with ; or \g.
        Your MySQL connection id is 6 to server version: 5.0.24a-Debian_9-log

        Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

        mysql> show character set like 'utf%';
        +---------+---------------+-------------------+--------+
        | Charset | Description   | Default collation | Maxlen |
        +---------+---------------+-------------------+--------+
        | utf8    | UTF-8 Unicode | utf8_general_ci   |      3 |
        +---------+---------------+-------------------+--------+
        1 row in set (0.00 sec)

        Why do you have such an old version of MySQL 5.0 anyway? I seem to remember some issues with character sets in older 5.0 versions, but I could be wrong.

        I'm actually using the trunk version of MySQLdb (future 1.3-2.0) but that really shouldn't affect anything.
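Andy's check can also be run from a script. A minimal sketch, assuming the rows come from `cursor.fetchall()` after executing `SHOW CHARACTER SET LIKE 'utf%'` and keep the column order of the mysql client output above (Charset, Description, Default collation, Maxlen); `charset_maxlen` is a hypothetical helper name.

```python
# Sketch: Andy's "show character set like 'utf%'" check, done from Python.
# Rows are assumed to be (Charset, Description, Default collation, Maxlen).
from typing import Iterable, Optional, Sequence


def charset_maxlen(rows: Iterable[Sequence], charset: str = "utf8") -> Optional[int]:
    """Return the Maxlen column for charset, or None if the server did not list it."""
    for row in rows:
        # row[0] is the Charset column, row[3] is Maxlen
        if row[0] == charset:
            return int(row[3])
    return None
```

With a live connection this might be used as `cur = db.cursor(); cur.execute("SHOW CHARACTER SET LIKE 'utf%'"); print(charset_maxlen(cur.fetchall()))`; on a server like Andy's one would expect 3, and `None` would mean utf8 support is missing.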

         
    • nttaylor

      nttaylor - 2006-10-27

      I upgraded to MySQL 5.0.24a and MySQLdb 1.2.2b2, but nothing changed from what I described above:
      set_character_set("utf8") still doesn't affect the return value of "character_set_name()".

      Is there any way we can get to the bottom of this?

       
      • Andy Dustman

        Andy Dustman - 2006-10-27

        Did you verify that your MySQL server has UTF-8 support?

         
        • nttaylor

          nttaylor - 2006-10-27

          Sorry Andy, yes, I did verify that I do have utf-8 support,
          using the query that you specified. I got back a positive
          answer just like the one you posted.

          nttaylor

           
    • Ulrik Thoerner

      Ulrik Thoerner - 2006-10-27

      Having experienced behavior similar to yours, nttaylor, I have a suggestion:
      Have you tried omitting the charset attribute from the connection string and setting it afterwards instead?

      I'm using MySQLdb 1.2.2b1 in a Windows environment, so this is somewhat of a long shot, but I never managed to get the 'in-connection-string' attribute to work, whereas the 'set-the-attribute-right-after' version works fine.

      In short, this never worked for me:
      db=MySQLdb.connect(db="test",read_default_file="~/.my.cnf",use_unicode=True,charset="utf8")

      but this works like a charm:
      db.set_character_set("utf8")

      Guess Andy is just lucky to have everything working (not using released versions probably helps with the 'luck'? ;)).

      I realise this is not an in-depth explanation - it is merely a suggestion to a fix that may or may not work.

      Keeping fingers crossed that you'll eventually find a solution.

      Cheers,
      Ulrik
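Ulrik's workaround (connect without `charset=...`, then call `set_character_set()` and check the result) can be wrapped in a small helper. This is a sketch under the assumption that the connection object offers the two methods used throughout this thread, `character_set_name()` and `set_character_set()`; `ensure_charset` and `FakeConnection` are hypothetical names, and the fake exists only so the logic can be tried without a MySQL server.

```python
# Sketch of the "set the character set right after connecting" workaround.
# conn can be any object providing character_set_name() and set_character_set().


def ensure_charset(conn, charset: str = "utf8") -> bool:
    """Try to switch conn to charset; return True if character_set_name() agrees."""
    if conn.character_set_name() == charset:
        return True  # already using the requested character set
    conn.set_character_set(charset)
    return conn.character_set_name() == charset


class FakeConnection:
    """Hypothetical stand-in for a MySQLdb connection, for testing without a server."""

    def __init__(self, current: str = "latin1", applies_set: bool = True):
        self.current = current
        # applies_set=False mimics the behavior nttaylor reports: the call has no effect
        self.applies_set = applies_set

    def character_set_name(self) -> str:
        return self.current

    def set_character_set(self, charset: str) -> None:
        if self.applies_set:
            self.current = charset
```

With a real connection the call would look like `db = MySQLdb.connect(db="test", read_default_file="~/.my.cnf", use_unicode=True)` followed by `ensure_charset(db, "utf8")`; a `False` result corresponds to the situation nttaylor describes, where `set_character_set()` appears to have no effect.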

       
      • nttaylor

        nttaylor - 2006-10-27

        Hi Ulrik,

        I tried instantiating the 'db' object without using the 'charset'
        parameter like you suggested, but unfortunately got the same old
        problem. The return value from "character_set_name()" is still
        always "latin1". Attempting to set it with "set_character_set()"
        still has no effect.

        nttaylor

         
