From: Arnaud K. <ax...@sa...> - 2003-05-08 10:08:40
Steve,

What do you call a full gene name, and what is the gene symbol?

cheers
Arnaud

Steve Fischer wrote:
> folks-
>
> right now in GUS, we have a bunch of tables and attributes that relate
> to gene symbols, names and aliases:
>
>   Dots::Gene.name
>   Dots::Gene.gene_symbol
>   Dots::GeneAlias
>   Sres::DbRef.gene_symbol (this is pretty clearly a hack. DbRef is
>   intended to store references to external database entries. it is
>   hackish to encode in the schema that we assume that such entries are
>   gene records. they could easily be proteins or journals, whatever)
>
> This schema is being used by the DoTS project to hold both automated
> assignments of gene_symbol (Sres::DbRef) and manual assignments. The
> problem for the DoTS project is that these disparate ways of making
> assignments are not managed as a coherent whole. The manual and
> automated assignments are not queried together.
>
> I am thinking that we should consider a different approach, one
> modeled on how we store GO assignments. It seems that Gene symbols
> and GO terms are very similar. they are both amenable to controlled
> vocabs, and are both assigned by automated and manual operations.
> This pattern may apply to other types of annotation as well.
>
>
> 1. introduce a GeneName table:
>      GeneName.gene_name_id
>      GeneName.name   -- the full name
>      GeneName.symbol -- the symbol
>
> 2. introduce a GeneSynonym table:
>      GeneSynonym.gene_name_id -- the GeneName it is a synonym for
>      GeneSynonym.name         -- the full name of the synonym
>      GeneSynonym.symbol       -- the symbol
>
> these tables are treated as controlled vocabularies, downloaded from
> sites such as HUGO and MGI.
>
>
> 3. introduce a GeneNameAssociation table -- a mapping between Gene and
>    GeneName (better name for this??)
>      GeneNameAssociation.gene_id
>      GeneNameAssociation.gene_name_id
>      GeneNameAssociation.review_status_id
>      GeneNameAssociation.is_not
>    probably adopt here an instance and evidence mechanism similar to go
>    association.
>
> note that this implies an m-m relationship between gene and gene name.
> while this might not be true in the ideal sense, it may well be true
> for tentative data, which is what we often have. so, this model
> accepts that unfortunate fact, and does the best to preserve as much
> info as we can.
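
To make the three proposed tables concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than GUS itself. The table and column names follow the proposal above; everything else is an assumption made for illustration: the column types, the surrogate key gene_synonym_id, the composite primary key, the review_status_id values (0 = automated, 1 = manually reviewed), the helper function names, and the made-up symbols EX1, EX2 and OLDEX. The instance and evidence mechanism is left out.

```python
import sqlite3

# Hypothetical DDL: column names come from the proposal, types and keys are assumed.
SCHEMA = """
CREATE TABLE Gene (
    gene_id INTEGER PRIMARY KEY
);
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,   -- the full name
    symbol TEXT NOT NULL    -- the symbol
);
CREATE TABLE GeneSynonym (
    gene_synonym_id INTEGER PRIMARY KEY,  -- assumed surrogate key, not in the proposal
    gene_name_id INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    name   TEXT,
    symbol TEXT
);
CREATE TABLE GeneNameAssociation (
    gene_id          INTEGER NOT NULL REFERENCES Gene(gene_id),
    gene_name_id     INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    review_status_id INTEGER NOT NULL,    -- assumed: 0 = automated, 1 = manual
    is_not           INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (gene_id, gene_name_id)
);
"""


def make_db():
    """Create an in-memory database holding the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_demo(conn):
    """Insert made-up rows that exercise the m-m relationship."""
    conn.executemany("INSERT INTO Gene (gene_id) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO GeneName VALUES (?, ?, ?)",
        [(10, "example gene one", "EX1"), (11, "example gene two", "EX2")],
    )
    conn.execute(
        "INSERT INTO GeneSynonym (gene_name_id, name, symbol) VALUES (?, ?, ?)",
        (10, "old example gene", "OLDEX"),
    )
    conn.executemany(
        "INSERT INTO GeneNameAssociation VALUES (?, ?, ?, ?)",
        [
            (1, 10, 1, 0),  # gene 1 -> EX1, manual
            (1, 11, 0, 0),  # gene 1 -> EX2, automated
            (2, 10, 0, 0),  # gene 2 -> EX1, automated
            (2, 11, 1, 1),  # gene 2 is NOT EX2, manual
        ],
    )


def symbols_for_gene(conn, gene_id):
    """Return (symbol, review_status_id) pairs, manual and automated together."""
    return conn.execute(
        """
        SELECT gn.symbol, a.review_status_id
        FROM GeneNameAssociation a
        JOIN GeneName gn ON gn.gene_name_id = a.gene_name_id
        WHERE a.gene_id = ? AND a.is_not = 0
        ORDER BY gn.symbol
        """,
        (gene_id,),
    ).fetchall()


def resolve_symbol(conn, symbol):
    """Map a symbol or a synonym symbol to the gene_name_id values it refers to."""
    rows = conn.execute(
        """
        SELECT gene_name_id FROM GeneName WHERE symbol = ?
        UNION
        SELECT gene_name_id FROM GeneSynonym WHERE symbol = ?
        """,
        (symbol, symbol),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_db()
    load_demo(conn)
    print(symbols_for_gene(conn, 1))
    print(resolve_symbol(conn, "OLDEX"))
```

In this sketch a single query over GeneNameAssociation returns manual and automated assignments side by side, which is the coherence the proposal is after. The composite key allows only one row per gene and gene name pair; recording several independent assignments of the same pair would be the job of the instance and evidence tables, which are not modeled here.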