#295 wrong charset detection when no charset is specified

MySQLdb-1.3
open
Andy Dustman
MySQLdb (285)
5
2014-12-13
2010-04-01
francoise
No

Used with Python 2.4.6 (Plone 3.3.4) and MySQL 5.

MySQL is configured to use utf8 encoding:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+

The "my.cnf" file contains the following options:

my.cnf file

[mysqld]
character_set_server = utf8
character_set_client = utf8

When no charset is specified, MySQLdb calls the mysql_character_set_name function, but the value it reports seems wrong: "latin1" instead of "utf8":

>>> import MySQLdb
>>> MySQLdb.get_client_info()
'5.0.51a'
>>> import sys
>>> sys.version
'2.4.6 (#1, Mar 23 2010, 14:39:19) \n[GCC 4.3.2]'
>>> db = MySQLdb.connect(host="host", user="login", passwd="pwd", db="database")
>>> db.character_set_name()
'latin1'
>>> db.get_character_set_info()
{'collation': 'latin1_swedish_ci', 'comment': 'cp1252 West European', 'mbminlen': 1, 'name': 'latin1', 'mbmaxlen': 1}

If I call the mysql_character_set_name C function directly, I get the correct character set:

#include <stdio.h>

#include <mysql.h>

int main(int argc, char **argv) {
    MYSQL mysql;

    mysql_init(&mysql);
    /* Read the [client] group and the named group from the option files */
    mysql_options(&mysql, MYSQL_READ_DEFAULT_GROUP, "hello-mysql");
    if (!mysql_real_connect(&mysql, "host", "login", "pwd", "database", 0, NULL, 0))
    {
        fprintf(stderr, "Failed to connect to database: Error: %s\n", mysql_error(&mysql));
        return 1;
    }
    else
    {
        printf("Connected\n");

        /* Character set of the client connection */
        fprintf(stdout, "mysql_character_set_name for database : %s\n", mysql_character_set_name(&mysql));

        mysql_close(&mysql);
    }

    return 0;
}

./hello-mysql
Connected
mysql_character_set_name for database : utf8

Discussion

  • Andy Dustman
    2010-04-01

    Your config file is only setting the database character set, not the client character set. Try specifying it in a [mysql] section instead of the [mysqld] section.

    Also, your C code is reading a different group (hello-mysql). Try using read_default_group="hello-mysql" in your connect() call.

     
  • francoise
    2010-04-01

    I also have the following lines in my.cnf:

    [client]
    default-character-set = utf8

    [mysql]
    default-character-set=utf8

    I think that:
    character_set_server = utf8
    character_set_client = utf8
    are [mysqld] parameters; see http://dev.mysql.com/doc/refman/5.0/fr/charset-connection.html

    There is no hello-mysql group in any configuration file, but I was forced to pass a third parameter to mysql_options, so I passed a "fake" string.

     
  • Andy Dustman
    2010-04-01

    Try read_default_group="mysql" then. The config files aren't read unless either read_default_file or read_default_group is specified.

    To address the subject of this bug, there is nothing wrong with the character set detection; it's just not being set to what you think it should be.

     
  • Andy Dustman
    2010-04-01

    Also... You aren't forced to use mysql_options at all. Just comment that line out and see what happens.

     
  • francoise
    2010-04-01

    With:
    mysql_options(&mysql, MYSQL_READ_DEFAULT_GROUP, "mysql");

    I also get the "utf8" result:
    ./hello-mysql
    Connected
    mysql_character_set_name for database : utf8

    But when I comment out the mysql_options line, I get "latin1"!
    ./hello-mysql
    Connected
    mysql_character_set_name for database : latin1

     
  • Andy Dustman
    2010-04-01

    Right. In the C API, you must use either MYSQL_READ_DEFAULT_GROUP or MYSQL_READ_DEFAULT_FILE in order for the defaults to be read at all. You can see that they are not read if you don't do this. The same applies for MySQLdb.
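    The rule described above can be modelled in a few lines of Python 3. This is a sketch, not the client library's code: the function effective_client_charset is hypothetical, "latin1" is assumed to be the compiled-in default (as the output in the report suggests), and reading [client] whenever a group is requested is an assumption taken from the mysql_options() documentation linked further down, consistent with the "hello-mysql" result in the original report.

```python
import configparser

# The reporter's my.cnf, combining the groups quoted in this thread.
MY_CNF = """
[mysqld]
character_set_server = utf8
character_set_client = utf8

[client]
default-character-set = utf8

[mysql]
default-character-set=utf8
"""


def effective_client_charset(cnf_text, read_default_group=None,
                             compiled_default="latin1"):
    """Model (not the real client library) of how a C API or MySQLdb
    client ends up with its connection character set.

    Assumptions: option files are consulted only when a group is
    requested; [client] is always read in that case; the requested
    group is read after [client], so its value wins.
    """
    if read_default_group is None:
        # No MYSQL_READ_DEFAULT_GROUP / read_default_group: my.cnf is ignored.
        return compiled_default

    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)

    charset = compiled_default
    for group in ("client", read_default_group):
        if not parser.has_section(group):
            continue  # e.g. the nonexistent "hello-mysql" group
        for key, value in parser.items(group):
            # MySQL treats '-' and '_' in option names as equivalent.
            if key.replace("_", "-") == "default-character-set" and value:
                charset = value
    return charset
```

    With the reporter's my.cnf this reproduces the observations in this thread: effective_client_charset(MY_CNF) gives "latin1", while the "mysql" group, or even the nonexistent "hello-mysql" group, gives "utf8" because [client] is read. A file that sets character_set_client only under [mysqld] leaves the client at "latin1".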

     
  • francoise
    2010-04-01

    With the following call:
    >>> db = MySQLdb.connect(host="host", user="login", passwd="pwd", db="database", read_default_group="mysql")

    I get the right encoding:

    >>> db.character_set_name()
    'utf8'
    >>> db.get_character_set_info()
    {'collation': 'utf8_general_ci', 'comment': 'UTF-8 Unicode', 'mbminlen': 1, 'name': 'utf8', 'mbmaxlen': 3}
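    If you would rather not depend on option files at all, MySQLdb (1.2.1 and later, as far as I know) also accepts a charset keyword argument in connect(), which sets the connection character set directly and implies use_unicode=True. A minimal sketch assuming that version; connect_kwargs and open_connection are hypothetical helpers, and "host", "login", "pwd" and "database" are the placeholders used in this report. MySQLdb is imported lazily so the keyword-building part runs without the driver or a server.

```python
def connect_kwargs(host, user, passwd, db, charset=None, read_default_group=None):
    """Build keyword arguments for MySQLdb.connect() (hypothetical helper)."""
    kwargs = {"host": host, "user": user, "passwd": passwd, "db": db}
    if charset is not None:
        # Set the connection character set explicitly, independent of my.cnf.
        kwargs["charset"] = charset
    if read_default_group is not None:
        # Ask the client library to read [client] and this group from my.cnf.
        kwargs["read_default_group"] = read_default_group
    return kwargs


def open_connection(**kwargs):
    """Open a connection; needs MySQLdb installed and a reachable server."""
    import MySQLdb  # lazy import: connect_kwargs() works without the driver
    return MySQLdb.connect(**kwargs)
```

    For example, open_connection(**connect_kwargs("host", "login", "pwd", "database", charset="utf8")).character_set_name() should then report 'utf8' regardless of the option files, assuming the server supports utf8.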

    Thanks for your help

     
  • francoise
    2010-04-02

    This is not a MySQLdb bug, but if I understand the documentation (http://dev.mysql.com/doc/refman/5.0/en/mysql-options.html) correctly:
    * MYSQL_READ_DEFAULT_FILE is used to "Read options" from a file other than my.cnf
    * MYSQL_READ_DEFAULT_GROUP is used to "Read options" from a group in my.cnf or in the file specified with MYSQL_READ_DEFAULT_FILE

    When I do not call mysql_options(), I would expect the defaults to be read from the [client] or [mysql] groups of my.cnf, where I configured the encoding, instead of falling back to a built-in default such as "latin1".

    This MySQL behaviour is not completely clear to me: what are the [client] and [mysql] groups of my.cnf for if they are not read by default?
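    As background to this question, each program reads its own list of option groups. The summary below is a partial sketch based on my reading of the MySQL option-file documentation (it omits, for example, the server's version-specific groups); the two "application" entries reflect the behaviour established earlier in this thread rather than an official table.

```python
# Option-file groups read by a few programs (partial summary; see lead-in).
GROUPS_READ = {
    "mysql": ("client", "mysql"),          # command-line client
    "mysqldump": ("client", "mysqldump"),  # dump utility
    "mysqld": ("mysqld", "server"),        # the server itself
    # An application using the C API or MySQLdb reads nothing by default...
    "application, no option group": (),
    # ...and [client] plus the named group when one is requested.
    "application, read_default_group='mysql'": ("client", "mysql"),
}

for program, groups in GROUPS_READ.items():
    print(program + ": " + (", ".join(groups) or "(none)"))
```

    On this reading, [client] and [mysql] are picked up by the mysql command-line tool and by applications that request a group, while [mysqld] only configures the server.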

     
  • Andy Dustman
    2010-04-02

    I suspect that if mysql_options() is never called, the option files are not read at all.

     
  • francoise
    2010-04-08

    I have asked on the MySQL forum why the "default values" set in the "default configuration file" are not used as defaults.

    http://forums.mysql.com/read.php?45,362257