gmod-schema Mailing List for Generic Model Organism Database Project (Page 262)

Scott,
I developed a 3D data blade in Illustra (a commercial descendant of
postgres) some years ago for MRI images. There is a built-in 2D index
using R-tree indexing. However, for the 1D case that bioinformatics is most
commonly interested in, all that is overkill. A compound index of the form
	(contig_id, fmin, fmax)
(recently renamed to nbeg and nend) should suffice to make range queries
efficient. It may be a good idea to run the cluster command on this index
periodically as well, so that rows are physically stored on disk in index
order and range scans touch fewer pages. You may also need to experiment with
ways of expressing the query. I am not sure how smart the postgres optimizer is
about knowing which indexes to use when, or whether the order of the conjuncts
affects this. I have done this in Sybase in the past and gotten very good
performance on both "contains" and "overlaps" queries.

Contains: where fmin >= {query_interval_fmin} and fmax <= {query_interval_fmax}
Overlaps: where fmin <= {query_interval_fmax} and fmax >= {query_interval_fmin}
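
To make the two predicates concrete, here is a minimal Python sketch of the same conditions applied to a single feature. The function names and argument order are made up for illustration; both functions assume fmin <= fmax for the feature and for the query interval.

```python
def contains(query_fmin, query_fmax, fmin, fmax):
    """True when the feature [fmin, fmax] lies entirely inside the query interval."""
    return fmin >= query_fmin and fmax <= query_fmax


def overlaps(query_fmin, query_fmax, fmin, fmax):
    """True when the feature [fmin, fmax] shares at least one position with the query interval."""
    return fmin <= query_fmax and fmax >= query_fmin
```

A feature that is contained in the query interval always overlaps it as well, so "contains" returns a subset of what "overlaps" returns.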

But it has to use the index. Someone should experiment with postgres and
make sure it does the right thing here.
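
One way to run that kind of experiment is sketched below. It uses SQLite through Python's sqlite3 module purely as a stand-in, so it says nothing about what the postgres optimizer does; on postgres the corresponding check would be to run EXPLAIN on the query. The table layout, the index name feature_loc_idx, and the sample rows are assumptions made up for this example.

```python
import sqlite3

# In-memory stand-in for the feature table (hypothetical layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature ("
    " feature_id INTEGER PRIMARY KEY,"
    " contig_id INTEGER NOT NULL,"
    " fmin INTEGER NOT NULL,"
    " fmax INTEGER NOT NULL)"
)
# The compound index discussed above.
conn.execute("CREATE INDEX feature_loc_idx ON feature (contig_id, fmin, fmax)")

rows = [(1, 1, 100, 200), (2, 1, 150, 400), (3, 1, 500, 600), (4, 2, 100, 200)]
conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)", rows)

OVERLAPS = (
    "SELECT feature_id FROM feature "
    "WHERE contig_id = ? AND fmin <= ? AND fmax >= ?"
)
params = (1, 300, 180)  # contig 1, query_fmax = 300, query_fmin = 180

# The fourth column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + OVERLAPS, params)]
hits = sorted(row[0] for row in conn.execute(OVERLAPS, params))

for detail in plan:
    print(detail)
print(hits)
```

If the plan mentions feature_loc_idx, the query is being answered from the compound index rather than by a full table scan.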

Note that the proposed policy of having nbeg > nend when on the opposite
strand will complicate these queries, possibly making it harder to get
the optimizer to use the index. The versions above only work when fmin <= fmax.
If people are wedded to the strand/reversal idea, you may want to materialize
fmin = min(nbeg, nend)
fmax = max(nbeg, nend)
as extra columns in feature for indexing purposes, unless someone can
verify that postgres can efficiently index the nbeg/nend queries with
strand reversal.
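
As a rough sketch of that materialization, the helper below derives the two extra columns from nbeg/nend at load time. The function name and the returned strand value (+1 or -1) are assumptions for illustration, not part of the proposed schema.

```python
def normalize(nbeg, nend):
    """Return (fmin, fmax, strand) for a feature stored as nbeg/nend.

    strand is a hypothetical convenience value: -1 when nbeg > nend
    (the reversed, opposite-strand case), otherwise +1.
    """
    fmin = min(nbeg, nend)
    fmax = max(nbeg, nend)
    strand = -1 if nbeg > nend else 1
    return fmin, fmax, strand
```

With fmin and fmax stored this way, the "contains" and "overlaps" conditions above apply unchanged to features on either strand.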

Cheers, -Stan


gmod-schema — For discussion of GMOD schema development