[Phpgedview-talk] DB index selectivity and postgresql v4.0.2
Brought to you by:
canajun2eh,
yalnifj
|
From: vincebr549 <vi...@pl...> - 2006-11-15 19:12:32
|
I'm running 4.0.2 with postgresql v8.0.?
I recently attended a session given by Jay Pipes the North American MYSQL
manager and he started me
looking at some of the tables in phpgedview.
First I wrote a quick and dirty perl script to analyze the indexes for what
he called
selectivity. The SQL statement is:
SELECT (COUNT(DISTINCT field) * 100)/COUNT(*) FROM table;
where 'field' is the name of a field for which there is an index (or you are
considering
building an index) within 'table'.
Unfortunately I'm a newby on sql and indexes and Jay didn't give any Rules
of Thumb (rot)
on what to do with this knowedge, so I thought I would share what I
found....
My data base has about 19K entries in it. Here is the output of running
the above sql on each of the
indexed fields:
1 pgv_blocks with 24 rows keyed on b_id sel= 100%
2 pgv_blocks with 24 rows keyed on b_username sel= 12%
3 pgv_dates with 32018 rows keyed on d_datestamp sel= 16%
4 pgv_dates with 32018 rows keyed on d_day sel= 0%
5 pgv_dates with 32018 rows keyed on d_fact sel= 0%
6 pgv_dates with 32018 rows keyed on d_file sel= 0%
7 pgv_dates with 32018 rows keyed on d_gid sel= 77%
8 pgv_dates with 32018 rows keyed on d_mon sel= 0%
9 pgv_dates with 32018 rows keyed on d_month sel= 0%
10 pgv_dates with 32018 rows keyed on d_type sel= 0%
11 pgv_dates with 32018 rows keyed on d_year sel= 1%
12 pgv_families with 5873 rows keyed on f_file sel= 0%
13 pgv_families with 5873 rows keyed on f_id sel= 100%
14 pgv_favorites with 3 rows keyed on fv_id sel= 100%
15 pgv_favorites with 3 rows keyed on fv_username sel= 33%
16 pgv_individuals with 19623 rows keyed on i_file sel= 0%
17 pgv_individuals with 19623 rows keyed on i_id sel= 99%
18 pgv_individuals with 19623 rows keyed on i_letter sel= 0%
19 pgv_individuals with 19623 rows keyed on i_name sel= 92%
20 pgv_individuals with 19623 rows keyed on i_surname sel= 12%
21 pgv_media with 1 rows keyed on m_id sel= 100%
22 pgv_media with 1 rows keyed on m_media sel= 100%
23 Table pgv_media_mapping is empty
24 Table pgv_media_mapping is empty
25 Table pgv_media_mapping is empty
26 Table pgv_media_mapping is empty
27 pgv_messages with 4 rows keyed on m_id sel= 100%
28 pgv_messages with 4 rows keyed on m_to sel= 100%
29 pgv_names with 1 rows keyed on n_gid sel= 100%
30 pgv_names with 1 rows keyed on n_letter sel= 100%
31 pgv_names with 1 rows keyed on n_name sel= 100%
32 pgv_names with 1 rows keyed on n_surname sel= 100%
33 pgv_names with 1 rows keyed on n_type sel= 100%
34 pgv_news with 1 rows keyed on n_id sel= 100%
35 pgv_news with 1 rows keyed on n_username sel= 100%
36 pgv_other with 3 rows keyed on o_file sel= 33%
37 pgv_other with 3 rows keyed on o_id sel= 100%
38 pgv_placelinks with 93557 rows keyed on pl_file sel= 0%
39 pgv_placelinks with 93557 rows keyed on pl_gid sel= 22%
40 pgv_placelinks with 93557 rows keyed on pl_p_id sel= 2%
41 pgv_places with 2803 rows keyed on p_id sel= 100%
42 pgv_places with 2803 rows keyed on p_file sel= 0%
43 pgv_places with 2803 rows keyed on p_level sel= 0%
44 pgv_places with 2803 rows keyed on p_parent_id sel= 35%
45 pgv_places with 2803 rows keyed on p_place sel= 63%
46 Table pgv_remotelinks is empty
47 Table pgv_remotelinks is empty
48 Table pgv_sources is empty
49 Table pgv_sources is empty
50 Table pgv_sources is empty
51 pgv_users with 14 rows keyed on u_username sel= 100%
52 pgv_users with 14 rows keyed on u_username sel= 100%
I looked at some of the tables that had non trivial number of entries and
with a small selectivity (sel).
Item 12 above...
phpgedview=# select distinct f_file from pgv_families;
f_file
--------
3
(1 row)
phpgedview=#
Huh? is it just my particular data or what? Why would one need an index on
a column that always has the
same value, well, why would one even need that column, if it is truly always
= 3.
Item #16 above
phpgedview=# select distinct i_file from pgv_individuals;
i_file
--------
3
(1 row)
phpgedview=#
Here again, is it just my data or is this field truely always going to be =
3???
Item 18 above i_letter gave interesting results. I found some corrections
that needed to be made
there were 34 entyries when I ran the same select distinct. Surprising I
found
'(', '-', '@', 'CH' , 'GY', and 'TY'
I'm guessing this field is used for soundex (I only support english in my
installation)???
item # 38 in the above list???
phpgedview=# select distinct pl_file from pgv_placelinks;
pl_file
---------
3
(1 row)
phpgedview=#
same for p_file in pgv_places.
I've not yet run a study to see what postgresql is doing with any searches
involving these
indexes but,
It would seem to me that, at least in my system these indexes can be safely
be deleted.
For the other indexes does any one have a feel for a ROT?
Another thought, what about slow DB loads. web searches strongly suggest
that
you disable indexes while doing a load and then turn them back on after.
They
also caution that performance to access existing data while these indexes
are
disabled may suffer, as always your milage may vary.
On my system
i've done several loads after running some large sed updates to correct and
clean up the data.
A load on my system takes on the order of 10 minutes or so. I'm thinking on
trying a load with all the indexes removed to see what the difference is.
I'm looking into truning on logging to log "long" running commands and hope
to be
able to gain some information on how this stuff is working by using the
EXPLAIN command.
v
--
View this message in context: http://www.nabble.com/DB-index-selectivity-and-postgresql-v4.0.2-tf2638223.html#a7364249
Sent from the phpgedview-talk mailing list archive at Nabble.com.
|