Menu

#505 Database import/export/conversion utility

open
5
2008-11-20
2008-11-20
Greg Roach
No

This module allows you to export and import the contents of your database to text files containing SQL scripts. This is not the most compact or efficient format, but it simple, reliable and portable. If you want efficent, there are much more appropriate tools available.

A separate file is generated for each table, and each can be imported/exported separately. Although an "import all" option is provided, you may find that your server time limits prevent this from completing.

Files are created in the index directory, so this can be useful for creating backups of your system. They can also be useful for transferring data between different databases, for example, between sqlite on your home PC and MySQL on your webserver.

**FOR MYSQL USERS ONLY**, there is also an option to convert your database tables from latin (single byte) encoding to UTF8 (multibyte) encoding. This change is necessary to allow the database to perform case-insensitive searching of letters containing diacritics and non-latin character sets.

By default, PHP communicates with MySQL in latin text. When PGV sends UTF8 text to the database, MySQL treats multibyte characters as a number of "extended ascii" characters, and converts these characters to UTF. This incorrect encoding (and possible double-encoding) is what prevents the database from searching data. Luckily, the reverse encodings take place when we fetch the data back, so apart from the wasted conversion effort, this process is transparant to PGV.

In order to use UTF8, we need to tell MySQL that we are using UTF8. However, to do this on an existing database means that we will fetch old/existing data using different conversions to those used to store it - and our data will be corrupted.

We therefore need to convert our data before changing the way we talk to MySQL. The steps to be taken are

1) export all the tables
2) convert all the tables to utf8
3) update your config.php to set $DB_UTF8_COLLATION=true;
4) import all the tables

NOTE: If your login name contains non-ascii characters, then you *may* get logged out between steps 3 and 4. To prevent this, create an admin account with an all-ascii username before starting.

NOTE: If you have the default MySQL configuration (latin tables and latin communication), then steps 1 and 4 are optional. However, we can't reliably detect this configuration, so unless you are certain, don't skip them. Also, running step 2 repeatedly may corrupt data, meaning you do need to run steps 1 and 4. In fact, unless you have a very good reason (e.g. you know exactly what you are doing and have large tables on a server with very low execution time limits), then don't skip these steps.

NOTE: If you have low server time limits, then it may not be possible to import some of your large tables before it times out. These tables are likely to be your gedcom-data tables. They can easily be reconstructed by reimporting the gedcom.

NOTE: This procedure can destroy your data and prevent you from logging in. Do not do it unless you understand what you are doing. Even if you know what you are doing, you should make a backup before you start.

Discussion

  • Greg Roach

    Greg Roach - 2008-11-20

    Database import/export/utf8-conversion utility

     
  • KosherJava

    KosherJava - 2008-11-20

    Greg,
    I maintain my data out of PhpGedView, and import it. Besides the user table and the Google Maps tables, do all the tables get dropped and recreated on a re-import, or do I need to manually drop them?

     
  • Greg Roach

    Greg Roach - 2008-11-20

    <<I maintain my data out of PhpGedView, and import it.>>

    This makes no difference. *ALL* data must be updated; gedcom data and application data (messages, block layouts, news, users, etc.)

    The best way to do this is to follow steps 1-4 above.

    If you drop the *genealogy* tables, then the import will recreate them. But you can't drop the *application* tables, or pgv will stop working. Particularly the user table(!)

     
  • KosherJava

    KosherJava - 2008-11-21

    Greg,
    It worked without a hitch. Thanks for the very clear directions. We can probably add $DB_UTF8_COLLATION=false;
    to config.dist.
    Do you plan on integrating this tool into the core PGV for the next release?

     
  • Greg Roach

    Greg Roach - 2008-11-21

    <<We can probably add $DB_UTF8_COLLATION=false;>>

    We already have - and install.php also allows you to edit it. So, it will be present
    for new installations. It will be present on old installations that have recently
    run install.php. But, it won't be present for everyone. Hence I wrote the
    instructions as I did. I'm guessing that everyone who uses this tool is capable of
    editing a file.

    <<Do you plan on integrating this tool into the core PGV for the next release?>>

    It depends on how much code I manage to write to take advantage of it. If not 4.2,
    it will be in 4.3.

     
  • KosherJava

    KosherJava - 2008-11-21

    Greg,
    I had missed the $DB_UTF8_COLLATION, and appended mine, but I now see it and removed the redundant one. I found many old entries in the file, and basically just used a new copy from .dist. I had also been using authentication_mysql.php and changed to authentication.php based on the latest .dist file. Was this OK?
    Is there any way to make a diacritic insensitive search? when I search for a place called Vysny Orlík, I would have hoped that it would return Vyšný Orlík.
    Thanks

     
  • Greg Roach

    Greg Roach - 2008-11-21

    <<as this OK?>>

    The support for merging the distribution config.php into the working config.php is very poor. Lots of hacks.

    I intend to move many of these settings into the database, so the problem will disappear.

    <<Is there any way to make a diacritic insensitive search?>>

    Yes. I've done half the work. The DB queries *do* fetch these matches. However, search_ctrl.php filters them out (when is checks to make sure that the matches aren't in private records).

    I'm sure we used to have a function to strip diacritics, but I couldn't find it. Otherwise I would be using it already.

     
  • Greg Roach

    Greg Roach - 2008-11-23

    Updated version (conversion works better with MySQL4)

     
  • Greg Roach

    Greg Roach - 2008-11-23

    I've updated the conversion script. I tried on an older MySQL database, and the conversion script failed to run, but gave no error. I've updated the sql syntax to be a little stricter.
    File Added: db_export.php

     
  • Warren Meads

    Warren Meads - 2008-11-25

    I ran this but it only changed the table and not the fields in each table so I have errors that are now arrising on individual and family listings

     
  • Greg Roach

    Greg Roach - 2008-11-25

    The Pear::DB interface that we use to connect to the database doesn't seem to report errors in these statements. They just fail silently.

    But, it shows the SQL statements that it is executing. Can you try running them in phpMyAdmin or at a command line.

    You should get some error messages/feedback there.

    It would appear that different versions of MySQL handle different ALTER TABLE statements.
    Also, the binary collation is sometimes called "bin" and sometimes "ascii_bin". You can find out what is available on your system with a command like "show collation;"

     
  • Warren Meads

    Warren Meads - 2008-11-26

    Greg I changed these all manually last night as I was faced with reverting to SVN 4287 or trying to fix it myself and as I am not totally comfortable with complex SQL statements I thought that the quickest and safest route was to change all the fields manually. It took about half an hour which is the time I would have had to spend on reverting any way.

    Every thing now appears to be running ok.

    I have tried running the Show collation but all i get is a list of the available collations with some other data but nothing about bin or ascii_bin

     
  • Greg Roach

    Greg Roach - 2008-11-26

    <<Every thing now appears to be running ok.>>

    Good! As for the collation name, I guess it depends on which ones were installed on your system.

    This is not actually critical. By converting to BIN before converting to UTF8, it can save you having to import the data in step 4 (in certain circumstances).

    As long as you exported/converted/imported, you should be fine. You should now be able to view UTF chars (e.g. letters with accents, greek/russian/hebrew names, etc.) while browsing MySQL using phpMyAdmin, etc.

    I've writen some new code to take advantage of this, but I want to do some more testing before I submit it.

     
  • John Slee

    John Slee - 2009-02-10

    What is the point of the Backup column - all mine show "Not Present". Should I backup my tables - and if so, how do I do so?

     
  • Greg Roach

    Greg Roach - 2009-02-10

    The "not present" means that the .SQL (backup) file isn't present.

    It won't be until you export the table (export button). After that, it will be present, and you will be able to restore (import) it.

     
  • Łukasz Wileński

    It doesn't work with the latest svn (PDO)

     
  • Greg Roach

    Greg Roach - 2009-06-23
     
  • Greg Roach

    Greg Roach - 2009-06-23

    <<It doesn't work with the latest svn (PDO)>>

    Try the latest version

     

Log in to post a comment.