From: Steve F. <sfi...@pc...> - 2003-05-05 15:11:31
folks- right now in GUS, we have a bunch of tables and attributes that relate to gene symbols, names and aliases:

  Dots::Gene.name
  Dots::Gene.gene_symbol
  Dots::GeneAlias
  Sres::DbRef.gene_symbol

(Sres::DbRef.gene_symbol is pretty clearly a hack. DbRef is intended to store references to external database entries. it is hackish to encode in the schema the assumption that such entries are gene records. they could just as easily be proteins, journals, whatever.)

This schema is being used by the DoTS project to hold both automated assignments of gene_symbol (Sres::DbRef) and manual assignments. The problem for the DoTS project is that these disparate ways of making assignments are not managed as a coherent whole. The manual and automated assignments are not queried together.

I am thinking that we should consider a different approach, one modeled on how we store GO assignments. It seems that gene symbols and GO terms are very similar: they are both amenable to controlled vocabularies, and are both assigned by automated and manual operations. This pattern may apply to other types of annotation as well.

1. introduce a GeneName table:

  GeneName.gene_name_id
  GeneName.name    -- the full name
  GeneName.symbol  -- the symbol

2. introduce a GeneSynonym table:

  GeneSynonym.gene_name_id  -- the GeneName it is a synonym for
  GeneSynonym.name          -- the full name of the synonym
  GeneSynonym.symbol        -- the symbol

these tables are treated as controlled vocabularies, downloaded from sites such as HUGO and MGI.

3. introduce a GeneNameAssociation table -- a mapping between Gene and GeneName (better name for this??):

  GeneNameAssociation.gene_id
  GeneNameAssociation.gene_name_id
  GeneNameAssociation.review_status_id
  GeneNameAssociation.is_not

probably adopt here an instance and evidence mechanism similar to GO association.

note that this implies a many-to-many (m-m) relationship between gene and gene name. while this might not be true in the ideal sense, it may well be true for tentative data, which is what we often have.
so, this model accepts that unfortunate fact, and does its best to preserve as much info as we can.
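
To make the proposal a bit more concrete, here is a rough sketch of the three tables using Python's built-in sqlite3 module. Only the table and column names come from the proposal. The column types, the meaning of the review_status_id codes (0 = automated, 1 = manually reviewed), the helper names build_demo_db and symbols_for_gene, and every row of data are made up for illustration. It is not GUS code, and the instance/evidence mechanism is left out.

```python
import sqlite3

# Sketch only: column types, the review_status_id codes, and all rows
# below are assumptions made for illustration; none of this is GUS code.
SCHEMA = """
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,   -- the full name
    symbol       TEXT NOT NULL    -- the symbol
);
CREATE TABLE GeneSynonym (
    gene_name_id INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    name         TEXT,            -- the full name of the synonym
    symbol       TEXT             -- the symbol of the synonym
);
CREATE TABLE GeneNameAssociation (
    gene_id          INTEGER NOT NULL,
    gene_name_id     INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    review_status_id INTEGER NOT NULL,  -- assumed: 0 = automated, 1 = manually reviewed
    is_not           INTEGER NOT NULL DEFAULT 0
);
"""


def build_demo_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO GeneName VALUES (?, ?, ?)",
        [(1, "example gene one", "EX1"), (2, "example gene two", "EX2")],
    )
    conn.executemany(
        "INSERT INTO GeneSynonym VALUES (?, ?, ?)",
        [(1, "example gene one, old name", "EXA")],
    )
    conn.executemany(
        "INSERT INTO GeneNameAssociation VALUES (?, ?, ?, ?)",
        [
            (100, 1, 0, 0),  # gene 100: automated assignment of EX1
            (100, 2, 1, 0),  # gene 100: manual assignment of EX2 (m-m case)
            (101, 1, 1, 1),  # gene 101: reviewer asserts it is NOT EX1
        ],
    )
    return conn


def symbols_for_gene(conn, gene_id):
    """Return (symbol, review_status_id) pairs for one gene, automated and
    manual assignments together, skipping negated (is_not) associations."""
    rows = conn.execute(
        "SELECT n.symbol, a.review_status_id "
        "FROM GeneNameAssociation a JOIN GeneName n USING (gene_name_id) "
        "WHERE a.gene_id = ? AND a.is_not = 0 "
        "ORDER BY n.symbol",
        (gene_id,),
    )
    return rows.fetchall()


conn = build_demo_db()
print(symbols_for_gene(conn, 100))  # both assignments, one query
print(symbols_for_gene(conn, 101))  # empty: the only association is negated
```

the point of the sketch is that one query over GeneNameAssociation returns the automated and manual assignments side by side, which is what the current split between Dots::Gene and Sres::DbRef does not give us.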