Search causes SQL Updates

  • Anonymous - 2009-10-13

    Greetings,

    Since upgrading to 4.2.3, I have noticed some performance issues with MySQL.

    If I do a search for “Peters” on the home page (skatekey.net), upper right-hand side, there is a lot of database thrashing.  Using EXPLAIN, I looked at the queries, and here is what I found.

    Searching for "Peters" actually runs 748 separate queries, many of which are LIKE queries that force a full scan of the table data (no indexes are used).

    An odder thing is that PGV does lots of UPDATEs, too - things like:

    UPDATE pgv_individuals SET i_isdead='0' WHERE i_id='I2447' AND i_file='1'

    I don't understand that at all (why does PGV searching cause table updates?), and that's certainly slowing it down.

    Next, the hosting company moved the MySQL database to a new server for testing purposes, and the search suddenly became much faster. We believe the reason is that the tables are now stored contiguously (optimized) on one part of the disk, so the LIKE queries can load them with sequential reads
    instead of doing lots of disk seeking. So, the conclusion is that all the UPDATEs are having the effect of fragmenting the data table on disk - even though this database sees very few updates!
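
    For anyone who wants to check this theory without moving servers, here is a minimal sketch (assuming the pgv_ table prefix seen in the UPDATE above):

        -- Data_free reports space allocated to the table but unused;
        -- a large value is a rough sign of on-disk fragmentation.
        SHOW TABLE STATUS LIKE 'pgv_individuals';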

    Ideally, it wouldn't do all those UPDATEs at all. But assuming PGV needs to do these updates, I need to make sure that the tables get optimized every so often. The MySQL manual says:

    "In most setups, you need not run OPTIMIZE TABLE at all. Even if you
      do a lot of updates to variable-length rows, it is not likely that
      you need to do this more than once a week or month and only on
      certain tables."

    Since PGV is doing a lot of updates to “variable-length rows", it makes sense to optimize the database every week or so; a sketch of how is below.
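
    For example, something like this could be run weekly from cron or phpMyAdmin (a minimal sketch; it assumes the default pgv_ table prefix and that these are your largest, most-updated tables):

        -- Rebuild the heavily updated PGV tables to defragment them on disk.
        OPTIMIZE TABLE pgv_individuals, pgv_families, pgv_sources;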

    I have all the logs from these tests if you'd like to see them.

    Rob Peters

     
  • Anonymous - 2009-10-14

    Rob, I'm no expert in this area, but I can offer a couple of suggestions that I hope might help. But first, a question: do you really mean "upgraded to 4.2.3" (which is the SVN code), or 4.2.2? If SVN, to what SVN number have you upgraded (remembering that the SVN code changes almost every day, sometimes more often)?

    1 - First check your database collation settings. They need to be the same in every table AND every field in each table. Mine are all set to

    > `utf8_unicode_ci`

    If any are different, it WILL slow searching down substantially. I know that from experience.
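
    A quick way to spot mismatches is to query information_schema (a sketch, assuming the default pgv_ table prefix):

        -- List the collation of every column in the PGV tables;
        -- anything that is not utf8_unicode_ci stands out immediately.
        SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME
        FROM information_schema.COLUMNS
        WHERE TABLE_SCHEMA = DATABASE()
          AND TABLE_NAME LIKE 'pgv\_%'
          AND COLLATION_NAME IS NOT NULL
        ORDER BY TABLE_NAME, COLUMN_NAME;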

    2 - The updates you are seeing are part of PGV's privacy checking. The more detailed that checking needs to be (according to your preferred privacy rules), the more checks have to be done. The fact that it is calculating the 'isdead' setting suggests that not everyone who should be recorded as 'dead' in your GEDCOM has that information recorded. For optimum speed, EVERY indi that is likely to be deceased should have either a date attached to a DEAT tag (1 DEAT / 2 DATE 01 JAN 1800) or at least a 1 DEAT Y tag. This avoids the need for PGV to attempt to calculate whether a person can be displayed or not. Batch Update has a tool to set this for you; see also the sketch below for one way to find the records that are missing it.
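
    One rough way to find such records directly in SQL - a sketch only, and it assumes the raw GEDCOM is stored in an i_gedcom column alongside the i_id/i_file columns seen in Rob's UPDATE:

        -- List individuals whose raw GEDCOM contains no DEAT line at all,
        -- forcing PGV to estimate dead/alive from relatives' dates.
        SELECT i_id
        FROM pgv_individuals
        WHERE i_file = '1'
          AND i_gedcom NOT LIKE '%1 DEAT%';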

    3 - The worst offender in the privacy calculations is "Relationship Privacy". If it is necessary to use it, at least keep the relationship length as short as possible.

     
  • Anonymous - 2009-10-14

    Incidentally, you can check this in PGV without going through the SQL logs yourself. In the GEDCOM configuration, under Display & Layout; Hide & Show, set "Show execution Stats" to Yes. The figures will then display in the footer area. If I search for the surname of my largest family group (238 INDIs, 86 FAMs, plus SOUR and NOTE records), I get these if I'm logged in:

    Execution time: 9.724 sec. Total Database Queries: 761. Total privacy checks: 482. Total Memory Usage: 21760 KB

    or these if I'm not:

    Execution time: 9.116 sec. Total Database Queries: 1682. Total privacy checks: 453. Total Memory Usage: 19712 KB

     
  • Greg Roach - 2009-10-14

    <<I don't understand that at all (why does PGV searching cause table updates?), and that's certainly slowing it down.>>

    Privacy rules require knowledge of whether a person is dead or alive.  These calculations are non-trivial, and involve checking the dates of spouses, children, and grandchildren.

    We can't do this during import, as we may not have loaded the other relatives yet.

    So, we calculate it first time we check privacy for an individual, and store it in the DB.

    We only do this once per individual.

    Perhaps your subsequent "speed-up" was caused by PGV having finished calculating these for all indis?

    The most common cause of performance issues is, as kiwi says, mismatched collations.

    MySQL has *lots* of collation settings (comms, queries, results, etc., as well as data storage), and if they are not all the same, you can get huge performance hits, particularly on queries that join two tables with different collations.
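
    A sketch of how to see all of those settings at once on your own server:

        -- Server- and connection-level charset/collation settings;
        -- ideally these agree with your table and column collations.
        SHOW VARIABLES LIKE 'collation%';
        SHOW VARIABLES LIKE 'character_set%';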

     
  • Anonymous - 2009-10-14

    Kiwi: Sorry about the version number, it's 4.2.2. Duh.
    I use the standard PGV product and have not modified any table or collation settings. How would this get changed?

    Fisharebest: Why the LIKE queries? These ignore indexes and do a full table scan. And why update the i_isdead column in the pgv_individuals table when it's already established? I looked through my table and all the values are either 0, 1, or -1.
    If there is a need to do updates on searches, would it be better to use a temp table instead of updating the largest table in the DB?

     
  • Greg Roach - 2009-10-14

    <<I looked through my table and all the values are either " 0, 1, or -1>>

    0=alive
    1=dead
    -1=still needs to be calculated
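
    A quick way to see how far through the calculation PGV has got - a sketch using the table and column names from the UPDATE quoted earlier:

        -- Count records in each state; rows at -1 still await calculation.
        SELECT i_isdead, COUNT(*) AS n
        FROM pgv_individuals
        GROUP BY i_isdead;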

    <<Why the LIKE queries?>>

    Which LIKE queries?  Generally, these exist to give case-insensitive searching of GEDCOM identifiers, e.g. searching for i1234 and finding I1234.
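
    A hypothetical illustration (not taken from the PGV source) of the kind of query meant here:

        -- Under a case-insensitive collation this matches 'I1234' even
        -- though the user typed 'i1234'; the i_file value is made up.
        SELECT i_id
        FROM pgv_individuals
        WHERE i_id LIKE 'i1234' AND i_file = '1';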

    PS - I'm not defending the current database structure.  It was there long before I joined the project.  With hindsight, lots of things would be done differently.  Particularly with regards to collation.

    <<have not modified any table or collation settings. How would this get changed?>>

    Just changing the collation of your tables/columns using phpMyAdmin or similar is virtually guaranteed to destroy any data containing non-ASCII characters.

    If the task was simple or could be automated, we'd probably have done it already.

    Look in the patches tracker for a "db export/import" utility.

     
