[Phpgedview-talk] DB index selectivity and postgresql v4.0.2
Brought to you by:
canajun2eh,
yalnifj
From: vincebr549 <vi...@pl...> - 2006-11-15 19:12:32
|
I'm running 4.0.2 with postgresql v8.0.? I recently attended a session given by Jay Pipes the North American MYSQL manager and he started me looking at some of the tables in phpgedview. First I wrote a quick and dirty perl script to analyze the indexes for what he called selectivity. The SQL statement is: SELECT (COUNT(DISTINCT field) * 100)/COUNT(*) FROM table; where 'field' is the name of a field for which there is an index (or you are considering building an index) within 'table'. Unfortunately I'm a newby on sql and indexes and Jay didn't give any Rules of Thumb (rot) on what to do with this knowedge, so I thought I would share what I found.... My data base has about 19K entries in it. Here is the output of running the above sql on each of the indexed fields: 1 pgv_blocks with 24 rows keyed on b_id sel= 100% 2 pgv_blocks with 24 rows keyed on b_username sel= 12% 3 pgv_dates with 32018 rows keyed on d_datestamp sel= 16% 4 pgv_dates with 32018 rows keyed on d_day sel= 0% 5 pgv_dates with 32018 rows keyed on d_fact sel= 0% 6 pgv_dates with 32018 rows keyed on d_file sel= 0% 7 pgv_dates with 32018 rows keyed on d_gid sel= 77% 8 pgv_dates with 32018 rows keyed on d_mon sel= 0% 9 pgv_dates with 32018 rows keyed on d_month sel= 0% 10 pgv_dates with 32018 rows keyed on d_type sel= 0% 11 pgv_dates with 32018 rows keyed on d_year sel= 1% 12 pgv_families with 5873 rows keyed on f_file sel= 0% 13 pgv_families with 5873 rows keyed on f_id sel= 100% 14 pgv_favorites with 3 rows keyed on fv_id sel= 100% 15 pgv_favorites with 3 rows keyed on fv_username sel= 33% 16 pgv_individuals with 19623 rows keyed on i_file sel= 0% 17 pgv_individuals with 19623 rows keyed on i_id sel= 99% 18 pgv_individuals with 19623 rows keyed on i_letter sel= 0% 19 pgv_individuals with 19623 rows keyed on i_name sel= 92% 20 pgv_individuals with 19623 rows keyed on i_surname sel= 12% 21 pgv_media with 1 rows keyed on m_id sel= 100% 22 pgv_media with 1 rows keyed on m_media sel= 100% 23 Table pgv_media_mapping is empty 24 Table pgv_media_mapping is empty 25 Table pgv_media_mapping is empty 26 Table pgv_media_mapping is empty 27 pgv_messages with 4 rows keyed on m_id sel= 100% 28 pgv_messages with 4 rows keyed on m_to sel= 100% 29 pgv_names with 1 rows keyed on n_gid sel= 100% 30 pgv_names with 1 rows keyed on n_letter sel= 100% 31 pgv_names with 1 rows keyed on n_name sel= 100% 32 pgv_names with 1 rows keyed on n_surname sel= 100% 33 pgv_names with 1 rows keyed on n_type sel= 100% 34 pgv_news with 1 rows keyed on n_id sel= 100% 35 pgv_news with 1 rows keyed on n_username sel= 100% 36 pgv_other with 3 rows keyed on o_file sel= 33% 37 pgv_other with 3 rows keyed on o_id sel= 100% 38 pgv_placelinks with 93557 rows keyed on pl_file sel= 0% 39 pgv_placelinks with 93557 rows keyed on pl_gid sel= 22% 40 pgv_placelinks with 93557 rows keyed on pl_p_id sel= 2% 41 pgv_places with 2803 rows keyed on p_id sel= 100% 42 pgv_places with 2803 rows keyed on p_file sel= 0% 43 pgv_places with 2803 rows keyed on p_level sel= 0% 44 pgv_places with 2803 rows keyed on p_parent_id sel= 35% 45 pgv_places with 2803 rows keyed on p_place sel= 63% 46 Table pgv_remotelinks is empty 47 Table pgv_remotelinks is empty 48 Table pgv_sources is empty 49 Table pgv_sources is empty 50 Table pgv_sources is empty 51 pgv_users with 14 rows keyed on u_username sel= 100% 52 pgv_users with 14 rows keyed on u_username sel= 100% I looked at some of the tables that had non trivial number of entries and with a small selectivity (sel). Item 12 above... phpgedview=# select distinct f_file from pgv_families; f_file -------- 3 (1 row) phpgedview=# Huh? is it just my particular data or what? Why would one need an index on a column that always has the same value, well, why would one even need that column, if it is truly always = 3. Item #16 above phpgedview=# select distinct i_file from pgv_individuals; i_file -------- 3 (1 row) phpgedview=# Here again, is it just my data or is this field truely always going to be = 3??? Item 18 above i_letter gave interesting results. I found some corrections that needed to be made there were 34 entyries when I ran the same select distinct. Surprising I found '(', '-', '@', 'CH' , 'GY', and 'TY' I'm guessing this field is used for soundex (I only support english in my installation)??? item # 38 in the above list??? phpgedview=# select distinct pl_file from pgv_placelinks; pl_file --------- 3 (1 row) phpgedview=# same for p_file in pgv_places. I've not yet run a study to see what postgresql is doing with any searches involving these indexes but, It would seem to me that, at least in my system these indexes can be safely be deleted. For the other indexes does any one have a feel for a ROT? Another thought, what about slow DB loads. web searches strongly suggest that you disable indexes while doing a load and then turn them back on after. They also caution that performance to access existing data while these indexes are disabled may suffer, as always your milage may vary. On my system i've done several loads after running some large sed updates to correct and clean up the data. A load on my system takes on the order of 10 minutes or so. I'm thinking on trying a load with all the indexes removed to see what the difference is. I'm looking into truning on logging to log "long" running commands and hope to be able to gain some information on how this stuff is working by using the EXPLAIN command. v -- View this message in context: http://www.nabble.com/DB-index-selectivity-and-postgresql-v4.0.2-tf2638223.html#a7364249 Sent from the phpgedview-talk mailing list archive at Nabble.com. |