From: Eric J. <e-...@no...> - 2007-10-19 01:39:57
|
Hi Don, Thanks for bringing this up. I agree that they should not be denormalized and that synonyms should not replicate the name and uniquename in the feature table. In addition to being inefficient in terms of space any denormalization make data management more difficult. As I remember, this denormalization was introduced as a performance enhancement for the Gbrowse adaptor. Eric On 10/17/07, Don Gilbert <gil...@cr...> wrote: > > Dear folks, > > I ran into a memory error with feature table dumps from the > parameciumDB chado database which confused me, as the SQL > is fairly simple, and the data set is smaller than others > I've worked with. The sql dump does however include a view > extracting per feature properties, synonyms, dbxrefs, etc. > > On inspection I found that each feature had its name and > uniquename also stored via feature_synonym and synonym > tables. This produced some 3 million feature_synonym and a > like number of synonym entries for only 1.5 million features > in parameciumDB. In contrast, the flybase dros.mel chado > (v5.1) has some 20 million features, but only 800 K feature > synonyms. > > This to me seems a page from the department of redundancy > department, i.e., unnecessary duplication of names. > > My reading of chado (*) is that the synonym table is for use > of extra names, from mostly experimental literature. Not > needed especially for those throw away auto-generated IDs > that come from analysis features that no person memorizes (and often > are not publicly viewed). > > Do others believe otherwise? If not, this can be cured in > gmod_bulk_load_gff3.pl, which seems to have generated > parameciumDB's generous copies of names. > > * http://www.gmod.org/wiki/index.php/Chado_Sequence_Module#Feature_Synonyms > > PS, after dropping those redundant synonyms, the feature dump worked fine. > > Example table row sizes (=reltuples) for Dros and Paramecium chado dbs: > > chado_dmel_r5_1=# > select relname,reltype,relpages,reltuples,relkind from pg_class > where relkind = 'r' and relnamespace = 2200 order by reltuples DESC ; > relname | reltype | relpages | reltuples | relkind > ------------------------------+---------+----------+-------------+--------- > feature | 59810 | 326737 | 2.0808e+07 | r <<< > analysisfeature | 59795 | 122570 | 1.80887e+07 | r > featureloc | 59778 | 281770 | 1.68201e+07 | r > feature_relationship | 59825 | 119071 | 1.61936e+07 | r > featureprop | 60089 | 63267 | 4.11791e+06 | r > dbxref | 59908 | 17867 | 2.19424e+06 | r > feature_dbxref | 59997 | 12478 | 1.95898e+06 | r > featureprop_pub | 60098 | 8788 | 1.37968e+06 | r > feature_pub | 60018 | 8719 | 1.36878e+06 | r > feature_relationship_pub | 60032 | 6836 | 1.07321e+06 | r > feature_synonym | 60051 | 6448 | 876919 | r <<< > synonym | 60343 | 6796 | 672363 | r > library_feature | 60142 | 3812 | 598451 | r > feature_relationshipprop | 60037 | 4821 | 534885 | r > feature_cvterm | 59977 | 3788 | 515107 | r > feature_genotype | 60008 | 3921 | 470480 | r > feature_pubprop | 60023 | 2448 | 384298 | r > pubauthor | 60249 | 3498 | 378145 | r > featureloc_pub | 60063 | 2153 | 337944 | r > feature_cvtermprop | 59988 | 2596 | 311867 | r > > parameciumDB=# > select relname,reltype,relpages,reltuples,relkind from pg_class > where relkind = 'r' and relnamespace = 2200 order by reltuples DESC ; > relname | reltype | relpages | reltuples | relkind > ------------------------------+---------+----------+-------------+--------- > feature_synonym | 62007 | 24262 | 3.29954e+06 | r <<< > synonym | 62002 | 32681 | 2.6971e+06 | r > featureloc | 61906 | 30890 | 2.31602e+06 | r > feature_dbxref | 61943 | 13475 | 2.11543e+06 | r > feature | 61892 | 41981 | 1.67906e+06 | r <<< > tmp_gff_load_cache | 62791 | 15138 | 1.67906e+06 | r > analysisfeature | 62117 | 11606 | 1.66231e+06 | r > feature_relationship | 61949 | 8075 | 1.09814e+06 | r > featureprop | 61929 | 6951 | 589385 | r > dbxref | 61725 | 1993 | 243508 | r > feature_cvterm | 61977 | 1149 | 156184 | r > > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. > Still grepping through log files to find problems? Stop. > Now Search log events and configuration files using AJAX and a browser. > Download your FREE copy of Splunk now >> http://get.splunk.com/ > _______________________________________________ > Gmod-schema mailing list > Gmo...@li... > https://lists.sourceforge.net/lists/listinfo/gmod-schema > |