Thread: Re: [Phpgedview-talk] Hello and suggested optimization
Brought to you by:
canajun2eh,
yalnifj
From: John F. <Joh...@ne...> - 2006-06-06 22:48:58
|
Hi Shane, Thanks for looking into these optimizations. Those indexes should already be created when your database is setup. See the create_dates_table() function (about line 1078) in functions_import.php. Unless postgresql treats the two fields indexed together differently then the two indexed seperately. I did some testing on this and I wasn't able to duplicate your results. But I might not have the same configuration as you. Can you post the SQL queries that were taking so long to run? What blocks do you have configured on your welcome page? Thanks, --John -----Original Message----- From: php...@li... [mailto:php...@li...] On Behalf Of Shane Hathaway Sent: Sunday, June 04, 2006 11:15 PM To: php...@li... Subject: [Phpgedview-talk] Hello and suggested optimization Hello, By way of introduction, I'm a software engineer working on genealogy software at the LDS church. I wrote a lot of the functionality (and perhaps bugs too) of Scan Stone, a high speed microfilm scanning system that was recently featured in the Church News. I'm also a Zope expert and one of the founders of the Content Management Framework (CMF), the technology behind Plone, a major open source content management system. I just installed PGV 4.0 beta 8 and uploaded a GEDCOM with 5,000 names. It worked well and I can see an impressive list of features. I've been interested in the project for a long time. I used PostgreSQL 8.0 as the backend. I noticed that the welcome page takes a long time to load, so I did a little investigating. I set "log_min_duration_statement =3D 100" in postgresql.conf, which causes PostgreSQL to log any query that takes longer than 100 ms. Then, postgresql.log revealed two queries that were each taking 33 seconds on every visit to the welcome page. I used EXPLAIN ANALYZE to find out why it was taking so long, and it became apparent that it didn't have the right indexes to perform this query. I tried various combinations of indexes until I settled on the following simple index, which reduces the 33 second queries to 20 ms: create index date_death_opt on pgv_dates (d_fact, d_gid); This is a whopping 1,650 times (165000%) faster. :-) Just thought you'd like a simple optimization. I haven't read a single line of code of PGV yet, so I don't know where the optimization should go. Shane _______________________________________________ Phpgedview-talk mailing list Php...@li... https://lists.sourceforge.net/lists/listinfo/phpgedview-talk |
From: John F. <Joh...@ne...> - 2006-06-07 17:25:50
|
Those queries are for the statistics block. I reinstalled Postgres and I was able to duplicate the results you had. For the next release, I will make sure that that index is created. Most users use MySQL so if you find any other slow queries in pgsql please let me know. Thanks, --John -----Original Message----- From: php...@li... [mailto:php...@li...] On Behalf Of Shane Hathaway Sent: Wednesday, June 07, 2006 12:57 AM To: All About PhpGedView Subject: Re: [Phpgedview-talk] Hello and suggested optimization John Finlay wrote: > Those indexes should already be created when your database is setup. > See the create_dates_table() function (about line 1078) in > functions_import.php. Unless postgresql treats the two fields indexed > together differently then the two indexed seperately. Yes, it treats them very differently, as does MySQL and any relational database. Computing a result based on multiple independent indexes is typically not an efficient operation. Queries that look up a row based on the values in multiple columns are often much faster if they use an index that combines all of the required columns. Imagine having a database table of the geospatial coordinates of every house on earth. One column would have the longitude and another would have the latitude. Create one index for the longitude column and another for the latitude column. If you want to look up a house at a particular latitude and longitude, the longitude index can quickly tell you about the thousands (millions?) of houses along a slice of the earth, and the latitude index can quickly tell you about the houses along a different slice, but no index can tell you about the houses at particular coordinates. You have to merge the lists of houses from the two indexes, or fall back to scanning the table. If you instead create an index that combines the longitude and the latitude, you can look up the house directly, without merging anything. It's as if the table were pre-sorted by both the latitude and longitude. The only problem is that the index may consume a lot of disk space, so it's a trade-off. > I did some testing on this and I wasn't able to duplicate your results. > But I might not have the same configuration as you. Can you post the > SQL queries that were taking so long to run? What blocks do you have > configured on your welcome page? It's a default install. I am a newbie to PGV. This is the first I've heard that you can actually configure the welcome page. ;-) The offending queries are below. Each originally took about 33000 ms, dropping to 20 ms with a more appropriate index. (Interestingly, it only took 130 ms to "create index"!) select avg(death.d_year-birth.d_year) as age from pgv_dates as death, pgv_dates as birth where birth.d_gid=3Ddeath.d_gid AND = death.d_file=3D'1' and birth.d_file=3Ddeath.d_file AND birth.d_fact=3D'BIRT' and death.d_fact=3D'DEAT' AND birth.d_year>0 and death.d_year>0 and birth.d_type is null and death.d_type is null; select death.d_year-birth.d_year as age, death.d_gid from pgv_dates as death, pgv_dates as birth where birth.d_gid=3Ddeath.d_gid AND death.d_file=3D'1' and birth.d_file=3Ddeath.d_file AND = birth.d_fact=3D'BIRT' and death.d_fact=3D'DEAT' AND birth.d_year>0 and death.d_year>0 and birth.d_type is null and death.d_type is null ORDER BY age DESC LIMIT 1 OFFSET 0; Shane _______________________________________________ Phpgedview-talk mailing list Php...@li... https://lists.sourceforge.net/lists/listinfo/phpgedview-talk |
From: Shane H. <sh...@ha...> - 2006-06-07 06:57:26
|
John Finlay wrote: > Those indexes should already be created when your database is setup. > See the create_dates_table() function (about line 1078) in > functions_import.php. Unless postgresql treats the two fields indexed > together differently then the two indexed seperately. Yes, it treats them very differently, as does MySQL and any relational database. Computing a result based on multiple independent indexes is typically not an efficient operation. Queries that look up a row based on the values in multiple columns are often much faster if they use an index that combines all of the required columns. Imagine having a database table of the geospatial coordinates of every house on earth. One column would have the longitude and another would have the latitude. Create one index for the longitude column and another for the latitude column. If you want to look up a house at a particular latitude and longitude, the longitude index can quickly tell you about the thousands (millions?) of houses along a slice of the earth, and the latitude index can quickly tell you about the houses along a different slice, but no index can tell you about the houses at particular coordinates. You have to merge the lists of houses from the two indexes, or fall back to scanning the table. If you instead create an index that combines the longitude and the latitude, you can look up the house directly, without merging anything. It's as if the table were pre-sorted by both the latitude and longitude. The only problem is that the index may consume a lot of disk space, so it's a trade-off. > I did some testing on this and I wasn't able to duplicate your results. > But I might not have the same configuration as you. Can you post the > SQL queries that were taking so long to run? What blocks do you have > configured on your welcome page? It's a default install. I am a newbie to PGV. This is the first I've heard that you can actually configure the welcome page. ;-) The offending queries are below. Each originally took about 33000 ms, dropping to 20 ms with a more appropriate index. (Interestingly, it only took 130 ms to "create index"!) select avg(death.d_year-birth.d_year) as age from pgv_dates as death, pgv_dates as birth where birth.d_gid=death.d_gid AND death.d_file='1' and birth.d_file=death.d_file AND birth.d_fact='BIRT' and death.d_fact='DEAT' AND birth.d_year>0 and death.d_year>0 and birth.d_type is null and death.d_type is null; select death.d_year-birth.d_year as age, death.d_gid from pgv_dates as death, pgv_dates as birth where birth.d_gid=death.d_gid AND death.d_file='1' and birth.d_file=death.d_file AND birth.d_fact='BIRT' and death.d_fact='DEAT' AND birth.d_year>0 and death.d_year>0 and birth.d_type is null and death.d_type is null ORDER BY age DESC LIMIT 1 OFFSET 0; Shane |