MySQL for Python / Discussion / Help: effect of 'charset' param in connect method

nttaylor - 2006-10-26

Dear forum,

I've been having a terrible time trying to process submissions from a website
that may contain international characters, use MySQLdb to get those values
into a database, and successfully read them back out again.

The going has been rough, and I've got a lot of questions, but right now I'd
just like to ask a very specific one:

Why is it that if I specify "utf8" as a parameter to "connect" thus:

>>> db = MySQLdb.connect( ... use_unicode=True, charset="utf8")

calling db.character_set_name() still returns "latin1"?

What is it that "character_set_name" tells you exactly? Is it supposed to be
coming back differently from what I specified in the connect() method?

By the way, I created all my tables with "DEFAULT CHARACTER SET utf8".

I'm running MySQL 5.0.13 and MySQLdb 1.2.1, all on Linux machines.

with thanks,

nttaylor
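
For readers following along, here is a minimal sketch (plain Python 3, no database needed) of the kind of damage a client/server charset mismatch can cause: UTF-8 bytes that get read as latin1 come back as mojibake. The helper names `misread_as_latin1` and `repair` are made up for illustration, and this only models the byte-level effect, not what MySQLdb does internally.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes as if they were latin1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Undo the misreading: back to bytes via latin1, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "caf\u00e9"
    garbled = misread_as_latin1(original)
    print(garbled)          # the two UTF-8 bytes of the accented letter appear as two characters
    print(repair(garbled))  # round-trips back to the original text
```

If values read back from the database look like the garbled form, that usually points at one side of the connection treating UTF-8 bytes as latin1.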

- 文祥 - 2006-12-14
  
  I've downloaded the source code for MySQL-python-1.2.1_p2. In "_mysql.c" I found a strange section of code at lines 1492 to 1506:
  static PyObject *
  _mysql_ConnectionObject_character_set_name(
      _mysql_ConnectionObject *self,
      PyObject *args)
  {
      const char *s;
      if (!PyArg_ParseTuple(args, "")) return NULL;
      check_connection(self);
  #if MYSQL_VERSION_ID >= 32321
      s = mysql_character_set_name(&(self->connection));
  #else
      s = "latin1";
  #endif
      return PyString_FromString(s);
  }
  Once this is compiled into the Python library, the lines starting with "#" won't run, so character_set_name() will always return "latin1".
  
  Forgive my bad English; I hope you can understand what I mean.
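
A small sketch of how the `#if` guard quoted above would be evaluated at compile time may help here. `MYSQL_VERSION_ID` is defined by the MySQL client headers as major*10000 + minor*100 + patch, so the threshold 32321 corresponds to 3.23.21. The function names below are hypothetical, and the sketch assumes `_mysql.c` was built against client headers of roughly the same version as the server mentioned in this thread, which the thread does not confirm.

```python
def mysql_version_id(major, minor, patch):
    """Compute the integer that MySQL headers expose as MYSQL_VERSION_ID."""
    return major * 10000 + minor * 100 + patch


def guard_compiles_real_call(version_id, threshold=32321):
    """Mirror `#if MYSQL_VERSION_ID >= 32321`: True means the call to
    mysql_character_set_name() is compiled in, False means the
    hard-coded "latin1" branch is compiled in instead."""
    return version_id >= threshold


if __name__ == "__main__":
    for version in [(3, 23, 20), (3, 23, 21), (5, 0, 13)]:
        vid = mysql_version_id(*version)
        print(version, vid, guard_compiles_real_call(vid))
```

Under that assumption, a build against 5.0.x headers would take the first branch, so the hard-coded "latin1" fallback would only apply to very old client libraries.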
  
- Andy Dustman - 2006-10-26
  
  I can't reproduce this.
  
  Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
  [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import MySQLdb
  >>> db=MySQLdb.connect(db="test",read_default_file="~/.my.cnf",use_unicode=True,charset="utf8")
  >>> db.character_set_name()
  'utf8'
  >>> db.set_character_set("latin1")
  >>> db.character_set_name()
  'latin1'
  >>> db=MySQLdb.connect(db="test",read_default_file="~/.my.cnf",use_unicode=True,charset="latin1")
  >>> db.character_set_name()
  'latin1'
  >>> db.set_character_set("utf8")
  >>> db.character_set_name()
  'utf8'
  >>>
  
- nttaylor - 2006-10-26
  
  Amazing. I tried your experiment exactly and I got back "latin1"
  every time:
  
  >>> import MySQLdb
  >>> db = MySQLdb.connect( ... use_unicode=True, charset="utf8")
  >>> db.character_set_name()
  'latin1'
  >>> db.set_character_set("utf8")
  >>> db.character_set_name()
  'latin1'
  
  I didn't even know about "set_character_set" but you can see it didn't help.
  
  The only difference between yours and mine is that I don't have a ".my.cnf", but
  that shouldn't make a difference, should it? Anyway, one would still expect
  "set_character_set" to work regardless.
  
  Could it be something about the MySQL database itself? Did you create your
  "test" database with a default character set, like this:
  
  CREATE DATABASE test DEFAULT CHARACTER SET utf8
  
  I did not do this, although I did define the "DEFAULT CHARACTER SET" for all
  of my tables.
  
  Did you maybe pass a charset to the "mysqld_safe" binary when starting the
  server?
  
  Thank you for your help,
  
  nttaylor
  
  - Andy Dustman - 2006-10-27
    
    Make sure you really have UTF-8 support:
    
    $ mysql
    Welcome to the MySQL monitor. Commands end with ; or \g.
    Your MySQL connection id is 6 to server version: 5.0.24a-Debian_9-log
    
    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
    
    mysql> show character set like 'utf%';
    +---------+---------------+-------------------+--------+
    | Charset | Description | Default collation | Maxlen |
    +---------+---------------+-------------------+--------+
    | utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
    +---------+---------------+-------------------+--------+
    1 row in set (0.00 sec)
    
    Why do you have such an old version of MySQL 5.0 anyway? I seem to remember some issues with character sets in older 5.0 versions, but I could be wrong.
    
    I'm actually using the trunk version of MySQLdb (future 1.3-2.0) but that really shouldn't affect anything.
    
- nttaylor - 2006-10-27
  
  I upgraded to MySQL 5.0.24a and MySQLdb 1.2.2b2, but nothing changed from what I described above.
  set_character_set("utf8") still doesn't affect the return value of "character_set_name()".
  
  Is there any way we can get to the bottom of this?
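
One way to dig further, sketched here under assumptions, is to ask the server which character sets it has recorded for the session instead of relying only on character_set_name(). `SHOW VARIABLES LIKE 'character_set%'` is a standard MySQL statement; the helper names below are hypothetical, and `conn` is assumed to be any open DB-API connection such as the `db` object from MySQLdb.connect().

```python
SESSION_KEYS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def charsets_from_rows(rows):
    """Turn (Variable_name, Value) rows into a dict."""
    return {name: value for name, value in rows}


def session_charsets(conn):
    """Fetch the server's character_set_* variables over a DB-API connection."""
    cur = conn.cursor()
    try:
        cur.execute("SHOW VARIABLES LIKE 'character_set%'")
        return charsets_from_rows(cur.fetchall())
    finally:
        cur.close()


def unexpected_session_settings(charsets, expected="utf8"):
    """Return the per-session settings whose value differs from `expected`."""
    return {
        key: charsets[key]
        for key in SESSION_KEYS
        if key in charsets and charsets[key] != expected
    }
```

Calling `unexpected_session_settings(session_charsets(db))` right after connecting would show whether the server side of the session is actually using utf8, independently of what the client library reports.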
  
  - Andy Dustman - 2006-10-27
    
    Did you verify that your MySQL server has UTF-8 support?
    
    - nttaylor - 2006-10-27
      
      Sorry Andy, yes, I did verify that I do have utf-8 support,
      using the query that you specified. I got back a positive
      answer just like the one you posted.
      
      nttaylor
      
- Ulrik Thoerner - 2006-10-27
  
  Having seen behavior similar to yours, nttaylor, I have a suggestion:
  Have you tried omitting the charset argument from the connect() call and setting it afterwards instead?
  
  I'm using 1.2.2.b1 in a Windows environment, so this is somewhat of a long shot, but passing the charset in the connect() call never worked for me, whereas setting it right after connecting works fine.
  
  In short, this never worked for me:
  db=MySQLdb.connect(db="test",read_default_file="~/.my.cnf",use_unicode=True,charset="utf8")
  
  but this works like a charm:
  db.set_character_set("utf8")
  
  Guess Andy is just lucky to have everything working (not using released versions probably helps with the 'luck'? ;)).
  
  I realise this is not an in-depth explanation; it is merely a suggestion for a fix that may or may not work.
  
  Keeping fingers crossed that you'll eventually find a solution.
  
  Cheers,
  Ulrik
  
  - nttaylor - 2006-10-27
    
    Hi Ulrik,
    
    I tried instantiating the 'db' object without using the 'charset'
    parameter like you suggested, but unfortunately got the same old
    problem. The return value from "character_set_name()" is still
    always "latin1". Attempting to set it with "set_character_set()"
    still has no effect.
    
    nttaylor
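
If set_character_set() appears to have no effect, one fallback that is sometimes tried is issuing `SET NAMES` directly through a cursor. The sketch below assumes `conn` is an open DB-API connection, and the helper name `set_names` is hypothetical. As far as I know, `SET NAMES` only changes the server-side session variables, so it may well leave the value reported by character_set_name() unchanged.

```python
def set_names(conn, charset="utf8"):
    """Issue SET NAMES <charset> on an open DB-API connection.

    The charset name is checked first because it is interpolated into
    the statement rather than passed as a query parameter.
    """
    if not charset.replace("_", "").isalnum():
        raise ValueError("unexpected charset name: %r" % (charset,))
    cur = conn.cursor()
    try:
        cur.execute("SET NAMES " + charset)
    finally:
        cur.close()
```

Whether this helps with the original problem is untested here; checking the session's character_set_* variables afterwards is a more direct way to see what changed.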
    
