utf-8 install problem

izahn
2008-08-29
2013-05-28
  • izahn

    Hi,
    Just wanted to mention that I'm having trouble with the UTF-8 install. The install seems to go OK, but when I try to retrieve references I get errors saying:

    Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 0 in /Library/WebServer/Documents/SDTdemo/refs/includes/include.inc.php on line 4700

    Thanks,
    Ista

     
    • I can't reproduce this.

      Please post your PHP version and the output of:

      svn info /Library/WebServer/Documents/SDTdemo/refs/includes/include.inc.php

      Please also confirm that your initialize/ini.inc.php has $contentTypeCharset="UTF-8", that the scripts are saved with UTF-8 encoding, and that the MySQL database and tables you created use UTF-8 and contain only UTF-8 data.
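      The PCRE message "Compilation failed: invalid UTF-8 string at offset 0" most likely means that the pattern handed to preg_match() in UTF-8 mode contained bytes that are not valid UTF-8, starting at byte 0; Latin-1 (ISO-8859-1) text reaching UTF-8 code is a typical cause. A minimal Python sketch of the same check, assuming you read a script or data file as raw bytes (the helper name `first_invalid_utf8_offset` is hypothetical, not part of refbase or PHP):

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError as err:
        return err.start  # index of the first offending byte
    return None


latin1_text = "é".encode("latin-1")  # b'\xe9'
utf8_text = "é".encode("utf-8")      # b'\xc3\xa9'
print(first_invalid_utf8_offset(latin1_text))  # 0
print(first_invalid_utf8_offset(utf8_text))    # None
```

      To check a script on disk, pass it the result of open(path, "rb").read().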

       
    • svn info /Library/WebServer/Documents/SDTdemo/refs/includes/include.inc.php
      Path: /Library/WebServer/Documents/SDTdemo/refs/includes/include.inc.php
      Name: include.inc.php
      URL: https://refbase.svn.sourceforge.net/svnroot/refbase/trunk/includes/include.inc.php
      Repository Root: https://refbase.svn.sourceforge.net/svnroot/refbase
      Repository UUID: 47a94560-c220-0410-a5a2-efdbd790cbb4
      Revision: 1214
      Node Kind: file
      Schedule: normal
      Last Changed Author: msteffens
      Last Changed Rev: 1214
      Last Changed Date: 2008-08-28 20:36:28 -0400 (Thu, 28 Aug 2008)
      Text Last Updated: 2008-08-29 11:01:45 -0400 (Fri, 29 Aug 2008)
      Checksum: e00610814cf5b145b69478ccd6f19d84

      I'm on OS X 10.5, with MySQL version 5.0.51b. The relevant section of my ini.inc.php file reads:

      $contentTypeCharset = "UTF-8"; // possible values: "ISO-8859-1", "UTF-8"

      and the file is saved with UTF-8 encoding.

      The mysql database was created using the refbase installer. I don't know much about MySQL, but maybe this will help:

      mysql> SHOW VARIABLES LIKE '%character%';
      +--------------------------+------------------------------------------------------------+
      | Variable_name            | Value                                                      |
      +--------------------------+------------------------------------------------------------+
      | character_set_client     | latin1                                                     |
      | character_set_connection | latin1                                                     |
      | character_set_database   | utf8                                                       |
      | character_set_filesystem | binary                                                     |
      | character_set_results    | latin1                                                     |
      | character_set_server     | utf8                                                       |
      | character_set_system     | utf8                                                       |
      | character_sets_dir       | /usr/local/mysql-5.0.51b-osx10.5-x86/share/mysql/charsets/ |
      +--------------------------+------------------------------------------------------------+

      mysql> SHOW VARIABLES LIKE '%collation%';
      +----------------------+-------------------+
      | Variable_name        | Value             |
      +----------------------+-------------------+
      | collation_connection | latin1_swedish_ci |
      | collation_database   | utf8_general_ci   |
      | collation_server     | utf8_general_ci   |
      +----------------------+-------------------+
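      Comparing these values against the suggestion later in the thread (every character_set_* variable set to utf8, every collation_* variable set to utf8_general_ci) shows which ones differ. A minimal Python sketch that audits the values copied from the output above; the function name is hypothetical, and with your own server you would fill the dictionary from your own SHOW VARIABLES output:

```python
# Values copied from the SHOW VARIABLES output posted above
# (character_sets_dir is a path, not a character set, so it is left out).
posted = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_database": "utf8",
    "character_set_filesystem": "binary",
    "character_set_results": "latin1",
    "character_set_server": "utf8",
    "character_set_system": "utf8",
    "collation_connection": "latin1_swedish_ci",
    "collation_database": "utf8_general_ci",
    "collation_server": "utf8_general_ci",
}


def non_utf8_settings(variables):
    """Return the sorted names of charset/collation variables not set to UTF-8."""
    bad = []
    for name, value in variables.items():
        if name.startswith("character_set_") and value != "utf8":
            bad.append(name)
        elif name.startswith("collation_") and value != "utf8_general_ci":
            bad.append(name)
    return sorted(bad)


print(non_utf8_settings(posted))
```

      For the posted values this reports the client, connection, filesystem and results character sets plus collation_connection.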

      Thanks,
      -Ista

       
    • I just noticed that you asked if the "scripts" (plural) were saved as UTF-8. The ini.inc.php script is saved as UTF-8 but I did not change the encoding of any of the other scripts. Could this be my problem? If so, which scripts should I save as UTF-8?
      Thanks again,
      Ista

       
    • Oh, and the php version is 5.2.6.
      -Ista

       
    • Hi Ista,

      sorry you're facing trouble with the new version.

      If you haven't already, please read through the section in our online wiki entitled "Problems with special characters":

      http://wiki.refbase.net/index.php/Installation-Troubleshooting#Problems_with_special_characters

      and closely follow the steps and advice given there. In particular, enter

      SHOW CREATE DATABASE [YOUR_DATABASE_NAME_HERE];

      in your MySQL command line interpreter and make sure that the end of the returned string reads:

      ... DEFAULT CHARACTER SET utf8
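      If you want to script this check, you only need the character set named after DEFAULT CHARACTER SET. A minimal Python sketch, assuming the returned statement looks roughly like the sample below (the database name `refbase` is only a placeholder, and the exact text can vary between MySQL versions):

```python
import re

# Assumed shape of the statement returned by SHOW CREATE DATABASE, e.g.
#   CREATE DATABASE `refbase` /*!40100 DEFAULT CHARACTER SET utf8 */
CHARSET_RE = re.compile(r"DEFAULT CHARACTER SET\s+(\w+)", re.IGNORECASE)


def default_charset(create_statement):
    """Return the DEFAULT CHARACTER SET named in the statement, or None."""
    match = CHARSET_RE.search(create_statement)
    return match.group(1).lower() if match else None


stmt = "CREATE DATABASE `refbase` /*!40100 DEFAULT CHARACTER SET utf8 */"
print(default_charset(stmt) == "utf8")  # True
```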

      Also note that if you've installed refbase previously using the same database name (but with a different database encoding), it's important to drop the old database first.

      If you can change the config settings of your MySQL server, it may also be worth setting the various encoding variables directly. See:

      http://wiki.refbase.net/index.php/Troubleshooting#MySQL_migration_and_character_set_problems

      If you can, try setting ALL of the various MySQL character_set_* and collation_* variables to 'utf8' or 'utf8_general_ci', respectively. Does this help?
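      Why the connection variables matter: if an application sends UTF-8 bytes over a connection that MySQL believes is latin1, the server converts those bytes "from latin1" into the utf8 column, so each multi-byte character ends up double-encoded. A minimal Python sketch of that effect, simulating the server's conversion with Python's own latin-1 codec (no MySQL involved; the function name is hypothetical):

```python
def misread_as_latin1(text):
    """Simulate UTF-8 bytes being read as latin1 and re-encoded as UTF-8."""
    utf8_bytes = text.encode("utf-8")       # what the application sends
    misread = utf8_bytes.decode("latin-1")  # connection claims latin1
    return misread.encode("utf-8")          # what lands in the utf8 column


original = "é"
stored = misread_as_latin1(original)
print(original.encode("utf-8"))  # b'\xc3\xa9'
print(stored)                    # b'\xc3\x83\xc2\xa9'
print(stored.decode("utf-8"))    # Ã©
```

      Read back over the same latin1 connection, the conversion is reversed, so the text can look fine on screen while the stored bytes are wrong; that is one reason to test searching as well as display.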

      In addition, a good check to see whether refbase has been installed correctly (w.r.t. character encoding & collation) is to do the following:

      Make a new record in refbase, enter a few special (i.e. Unicode) characters, and save the record. If *ALL* of the following are true, then refbase should have been set up correctly:

      1) you can successfully enter special (Unicode) characters in refbase
      2) you can have them displayed correctly after saving your edits
      3) you can successfully search for these newly entered characters

      If one or more of these actions (enter/display/search) do not work as expected, then something is still not right with your setup.

      > which scripts should I save as UFT-8?

      If you've installed refbase with a UTF-8 based MySQL database, you'll need to re-save with UTF-8 encoding any scripts in which you've entered non-ASCII characters. So, if any custom values that you've entered in 'ini.inc.php' contain non-ASCII characters, you'll need to re-save that script with encoding "Unicode (UTF-8, no BOM)". However, you don't need to change the encoding of every refbase script to UTF-8: those refbase scripts that do contain non-ASCII characters have already been saved with the correct encoding.
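      Following that rule, a script needs attention only if it contains non-ASCII bytes that are not valid UTF-8, or if it was saved with a byte order mark. A minimal Python sketch, assuming you read each script as raw bytes; the helper name and its return strings are hypothetical, not part of refbase:

```python
UTF8_BOM = b"\xef\xbb\xbf"


def resave_advice(data):
    """Suggest what to do with a script, given its raw bytes."""
    if data.startswith(UTF8_BOM):
        return "remove BOM"  # scripts should be UTF-8 *without* a BOM
    if data.isascii():
        return "ok (ASCII only)"  # pure ASCII is already valid UTF-8
    try:
        data.decode("utf-8")
        return "ok (UTF-8, no BOM)"
    except UnicodeDecodeError:
        return "re-save as UTF-8, no BOM"  # e.g. Latin-1 encoded text


print(resave_advice(b'$contentTypeCharset = "UTF-8";'))  # ok (ASCII only)
print(resave_advice('$x = "Müller";'.encode("latin-1")))  # re-save as UTF-8, no BOM
```

      To check a real script, pass it open("initialize/ini.inc.php", "rb").read(). bytes.isascii() needs Python 3.7 or later.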

      Let us know how it goes.

      Best, Matthias

       
    • OK, I got it working. Unfortunately I'm not exactly sure what did the trick. I installed a different version of PHP (for an unrelated reason) and added the following to my my.cnf file:

      collation_server = utf8_general_ci
      character_set_client = utf8
      character_set_server = utf8
      character_set_filesystem = utf8

      Now it works like a charm! As a side note, I tried setting ALL the character sets and collations to utf8 or utf8_general_ci as Matthias suggested, but doing so broke my MySQL connection completely. The settings above were the only ones I could set to utf8 and still connect to the database.

      Thanks for the help, I'm really amazed by the software as well as your swift and helpful replies to my questions in this forum. Keep up the good work!
      -Ista

       
    • Hi Ista,

      thanks for reporting back to us.

      > OK, I got it working.

      Character set & encoding issues can be quite tricky, so I'm glad to hear you could resolve your problem!

      > As a side note, I tried setting ALL the character sets and
      > collations to utf8 or utf8_general_ci as Matthias suggested, but
      > doing so broke my MySQL connection completely. The settings above
      > were the only ones I could set to utf8 and still connect to the
      > database.

      That's good to know, thanks, and it may help other users with similar problems.

      > Thanks for the help, I'm really amazed by the software as well as
      > your swift and helpful replies to my questions in this forum. Keep
      > up the good work!

      You're welcome, and thanks for the kind words. It's the feedback from our users that keeps us going.

      Matthias