From: Angel P. <an...@pc...> - 2003-12-16 21:20:44
John,

In the table defs you don't state which schema space you are slating for these tables. I will assume that you mean DoTS.

It is debatable whether this is the proper place for these tables. Here are my reasons:

1) ProDom is an external resource that we are mirroring. We do not want to know anything about their algorithms for identifying motifs; we want to import the motifs wholesale. This is slightly different from saying we are rejecting a ProDom motif (e.g. it is still a valid ProDom motif, although not a very informative one).

2) What you really want is to make quality assessment calls on the motifs prior to running some learning algorithm for GO assignments. Is this application-specific information? Genome annotation that is useful for other folks? I do not know. GUSDEVers, please chime in here.

Unless it is useful for folks to have this information tied to the motif itself, I would place these tables in some application-specific space. If there is utility to a quality assessment for imported motifs, then we should also track "good" quality motifs, etc., not just rejected or misleading motifs. Information content algorithms come to mind...

Angel

John Iodice wrote:
>
> GUS folks,
>
> I'm working on the GO term predictor, which uses BLAST similarities
> between proteins of known function and domains from CDD or ProDom to
> automatically assign functions to novel proteins. This system is
> built around rules, which are domain - GO-term pairs. In some cases,
> bad, repeat-rich domains have given rise to bad rules. We want to
> create the ability to mark these domains so they are not used for the
> generation or application of rules (and possibly other functions
> unrelated to the GO term predictor).
>
> We propose to do this by means of two new tables. A motif will be
> marked as rejected by its addition to the rejectedMotif table. It
> will be identified by a source_id/external_database_id pair. The
> record will also include, for documentation, an
> external_database_release_id and a motif_rejection_reason_id. The
> latter will be the primary key of the motifRejectionReason table,
> which will store a name and description for each reason.
>
> The request is number 854957. Here's a link. I include the text below:
> https://sourceforge.net/tracker/?func=detail&aid=854957&group_id=54213&atid=479181
>
> Thanks in advance for any comments.
> John
>
> ----------------------------------------------------------------------
>
> Summary: Two new tables to support motif rejection
>
> We need two new tables in the DoTS schema to support
> marking a motif as a bad motif. They look like this:
>
> rejectedMotif
> (
>   rejected_motif_id number(10) not null,
>   source_id varchar2(32) not null,
>   external_database_id number(10) not null,
>   external_database_release_id number(10) not null,
>   motif_rejection_reason_id number(10) not null,
>   {plus housekeeping columns}
> )
>
> motifRejectionReason
> (
>   motif_rejection_reason_id number(10) not null,
>   name varchar2(255) not null,
>   description varchar2(255),
>   {plus housekeeping columns}
> )
>
> rejectedMotif should have an index on (source_id,
> external_database_id).
>
> rejectedMotif rows will number in the hundreds at most
> for the foreseeable future. There will likely never be more
> than 20 rows in motifRejectionReason.
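
For anyone who wants to try the proposal out, the two quoted tables and the lookup they are meant to support could be sketched roughly as below. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 instead of Oracle, so number(10) and varchar2 are approximated by INTEGER and TEXT, and the housekeeping columns are left out. The primary keys on the two id columns, the index name rejected_motif_ix, the helpers is_rejected and build_demo, and the sample rows are all hypothetical and not part of the request.

```python
import sqlite3

# Approximation of the proposed tables. Oracle types are mapped to SQLite
# ones and the housekeeping columns are omitted. The primary keys and the
# index name are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE motifRejectionReason (
    motif_rejection_reason_id INTEGER PRIMARY KEY,
    name                      TEXT NOT NULL,
    description               TEXT
);
CREATE TABLE rejectedMotif (
    rejected_motif_id            INTEGER PRIMARY KEY,
    source_id                    TEXT    NOT NULL,
    external_database_id         INTEGER NOT NULL,
    external_database_release_id INTEGER NOT NULL,
    motif_rejection_reason_id    INTEGER NOT NULL
        REFERENCES motifRejectionReason (motif_rejection_reason_id)
);
CREATE INDEX rejected_motif_ix
    ON rejectedMotif (source_id, external_database_id);
"""


def is_rejected(conn, source_id, external_database_id):
    """Return True if the (source_id, external_database_id) pair is listed
    in rejectedMotif. Hypothetical helper, not an existing GUS function."""
    row = conn.execute(
        "SELECT 1 FROM rejectedMotif "
        "WHERE source_id = ? AND external_database_id = ? LIMIT 1",
        (source_id, external_database_id),
    ).fetchone()
    return row is not None


def build_demo():
    """Create an in-memory database holding one made-up rejected motif."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO motifRejectionReason VALUES (?, ?, ?)",
        (1, "repeat-rich", "Domain is dominated by repeats"),
    )
    conn.execute(
        "INSERT INTO rejectedMotif VALUES (?, ?, ?, ?, ?)",
        (1, "PD000001", 10, 100, 1),  # made-up ids for illustration
    )
    conn.commit()
    return conn
```

A rule generator would call something like is_rejected(conn, source_id, external_database_id) before building or applying a rule for a domain. Because the lookup keys on the same two columns as the proposed index, it stays cheap even though the table is expected to remain small.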