|
From: David E. <em...@mo...> - 2003-02-26 20:29:23
|
Hi Scott,
Which Pg server are you using?
Explaining your query1 on Pg 7.3 using srcfeature_id = 1, 4, 6, I'm not seeing
any difference in index usage. Eg:
explain select distinct f.name,fl.min,fl.max,fl.strand,f.type_id,f.feature_id
from feature f, featureloc fl
where
fl.srcfeature_id = 6 and
f.feature_id = fl.feature_id and
fl.max >= 390956 and
fl.min <= 393164
;
-Dave
From: Scott Cain <ca...@cs...>
>> To: gmod schema <gmo...@li...>
>> Subject: [Gmod-schema] Postgresql DBA-type question
>>
>> Hello,
>>
>> This is going to be fairly long.
>>
>> I am on a continuing quest to improve query performance for queries used
>> frequently by gbrowse. The most common time consuming query is to find
>> all of the features in a given range. To do that, there are two types
>> of queries that can be used, which I will refer to as Query1 and Query2.
>> Query1 uses the coordinate system I have advocated the last couple of
>> weeks, with min and max columns and min < max always. (Those column
>> names should be changed to avoid conflicts with restricted words.)
>> Query2 uses a union of two queries that use the other coordinate system
>> (nbeg and nend) combined with strand. Here are examples:
>>
>> Query1:
>> select distinct f.name,fl.min,fl.max,fl.strand,f.type_id,f.feature_id
>> from feature f, featureloc fl
>> where
>> fl.srcfeature_id = 1 and
>> f.feature_id = fl.feature_id and
>> fl.max >= 390956 and
>> fl.min <= 393164
>>
>> Query2:
>> select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
>> from feature f, featureloc fl
>> where
>> fl.srcfeature_id = 1 and
>> f.feature_id = fl.feature_id and
>> fl.nbeg <= 393164 and fl.nend >= 390956 and strand=1
>> union
>> select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
>> from feature f, featureloc fl
>> where
>> fl.srcfeature_id = 1 and
>> f.feature_id = fl.feature_id and
>> fl.nbeg >= 390956 and fl.nend <= 393164 and strand=-1
>>
>> Here is the odd thing that I can't explain: the performance (read:index
>> usage) seems to vary by srcfeature_id. Here's a table showing what I
>> mean (values are msec from explain analyze):
>>
>> src_id Query1 Query2
>> 1 116433.34 209532.05
>> 2 979.55 1039.81
>> 4 36.76 251.25
>> 5 369.95 901.55
>> 6 108404.04 208322.54
>> 7 798.28 1131.86
>>
>> Note that srcfeature_id 4 is much faster because there are relatively
>> few features on it. The biggest difference is srcfeature_ids 1 and 6
>> taking 3 orders of magnitude longer than the other queries. That is
>> because the query planner uses seq scans when the srcfeature_id is 1 or
>> 6, but for other srcfeature_ids, it uses the appropriate index (either
>> featureloc_src_nbeg_nend or featureloc_src_min_max depending on the
>> query). I am at a total loss as to why postgres would do this; can
>> anyone with more experience with postgres enlighten me?
>>
>> Note that this was done with gadfly 3 (filename gadfly_chado_dump2.gz)
>> if it matters to anyone. I will probably take this to the postgres
>> mailing list as well to get there input.
>>
>> Thanks,
>> Scott
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D. ca...@cs...
>> GMOD Coordinator (http://www.gmod.org/) 216-392-3087
>> Cold Spring Harbor Laboratory
>>
>>
>>
>> -------------------------------------------------------
>> This SF.net email is sponsored by: Scholarships for Techies!
>> Can't afford IT training? All 2003 ictp students receive scholarships.
>> Get hands-on training in Microsoft, Cisco, Sun, Linux/UNIX, and more.
>> www.ictp.com/training/sourceforge.asp
>> _______________________________________________
>> Gmod-schema mailing list
>> Gmo...@li...
>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>
>>
|