From: John I. <io...@pc...> - 2004-09-17 19:11:39
|
Folks, Shailesh and I are planning to add the phylogenetic profile data he has generated (http://apropos.icmb.utexas.edu/plex/plex.html) to PlasmoDB. This will require storing the data somewhere in GUS. We propose four new tables, described below, to do this. The technique involves creating a "profile" for a gene by BLASTing it against each sequenced genome. A matrix is created with the rows labelled by genes and the column by organisms. In each cell is stored the E value of the best BLAST hit from searching the column's organism's genome with the row's gene's sequence. For pairs of profiles (i.e. pairs of genes), a mutual information score is calculated; genes with a high mutual information score are likely to be part of the same pathway. For GUS purposes, I propose to call a row of the matrix (i.e. the "profile" of one gene) a "PhylogeneticProfile", and an entire run a "PhylogeneticProfileSet". One element of a row is a "PhylogeneticProfileMember". I propose the "MutualInformationScore" table to store the relative score of two genes. This operation is not associative, so that table has a primary_profile_id and secondary_profile_id, together with a numeric mutual_information_score. This means that the full name of this last-mentioned column will be the somewhat awkward "DoTS.MutualInformationScore.mutual_information_score". The schema is below. Thanks in advance for any comments you can offer. John --------------------------------------------------------------- create table PhylogeneticProfileSet ( phylogenetic_profile_set_id NUMBER(10) primary key, description varchar2(255), modification_date DATE NOT NULL, user_read NUMBER(1) NOT NULL, user_write NUMBER(1) NOT NULL, group_read NUMBER(1) NOT NULL, group_write NUMBER(1) NOT NULL, other_read NUMBER(1) NOT NULL, other_write NUMBER(1) NOT NULL, row_user_id NUMBER(12) NOT NULL, row_group_id NUMBER(3) NOT NULL, row_project_id NUMBER(4) NOT NULL, row_alg_invocation_id NUMBER(12) NOT NULL ); create table PhylogeneticProfile ( phylogenetic_profile_id NUMBER(10) primary key, na_feature_id NUMBER(10) not null, phylogenetic_profile_set_id NUMBER(10) not null, minus_log_entropy NUMBER(10), modification_date DATE NOT NULL, user_read NUMBER(1) NOT NULL, user_write NUMBER(1) NOT NULL, group_read NUMBER(1) NOT NULL, group_write NUMBER(1) NOT NULL, other_read NUMBER(1) NOT NULL, other_write NUMBER(1) NOT NULL, row_user_id NUMBER(12) NOT NULL, row_group_id NUMBER(3) NOT NULL, row_project_id NUMBER(4) NOT NULL, row_alg_invocation_id NUMBER(12) NOT NULL, CONSTRAINT PhylogeneticProfile_fk01 FOREIGN KEY (na_feature_id) REFERENCES DoTS.NaFeatureImp (na_feature_id), CONSTRAINT PhylogeneticProfile_fk02 FOREIGN KEY (phylogenetic_profile_set_id) REFERENCES PhylogeneticProfileSet (phylogenetic_profile_set_id) ); create table PhylogeneticProfileMember ( phylogenetic_profile_member_id NUMBER(10) primary key, phylogenetic_profile_id NUMBER(10) not null, taxon_id NUMBER(10) not null, minus_log_e_value NUMBER(10), modification_date DATE NOT NULL, user_read NUMBER(1) NOT NULL, user_write NUMBER(1) NOT NULL, group_read NUMBER(1) NOT NULL, group_write NUMBER(1) NOT NULL, other_read NUMBER(1) NOT NULL, other_write NUMBER(1) NOT NULL, row_user_id NUMBER(12) NOT NULL, row_group_id NUMBER(3) NOT NULL, row_project_id NUMBER(4) NOT NULL, row_alg_invocation_id NUMBER(12) NOT NULL, CONSTRAINT PhylogeneticProfileMember_fk01 FOREIGN KEY (taxon_id) REFERENCES SRes.Taxon (taxon_id) ); create table MutualInformationScore ( primary_profile_id NUMBER(10) primary key, secondary_profile_id NUMBER(10) not null, mutual_information_score NUMBER(10) not null, modification_date DATE NOT NULL, user_read NUMBER(1) NOT NULL, user_write NUMBER(1) NOT NULL, group_read NUMBER(1) NOT NULL, group_write NUMBER(1) NOT NULL, other_read NUMBER(1) NOT NULL, other_write NUMBER(1) NOT NULL, row_user_id NUMBER(12) NOT NULL, row_group_id NUMBER(3) NOT NULL, row_project_id NUMBER(4) NOT NULL, row_alg_invocation_id NUMBER(12) NOT NULL, CONSTRAINT MutualInformationScore_fk01 FOREIGN KEY (primary_profile_id) REFERENCES PhylogeneticProfile (phylogenetic_profile_id), CONSTRAINT MutualInformationScore_fk02 FOREIGN KEY (secondary_profile_id) REFERENCES PhylogeneticProfile (phylogenetic_profile_id) ); |