Thread: [Gusdev-gusdev] representing gene symbols


folks-

right now in GUS, we have a bunch of tables and attributes that relate to 
gene symbols, names and aliases:

Dots::Gene.name
Dots::Gene.gene_symbol
Dots::GeneAlias
Sres::DbRef.gene_symbol   (this is pretty clearly a hack.  DbRef is 
intended to store references to external database entries, so it is 
hackish to encode in the schema an assumption that such entries are 
gene records.  they could just as easily be proteins, journals, whatever)

This schema is being used by the DoTS project to hold both automated 
assignments of gene_symbol (Sres::DbRef) and manual assignments.  The 
problem for the DoTS project is that these disparate ways of making 
assignments are not managed as a coherent whole. The manual and 
automated assignments are not queried together.  

I am thinking that we should consider a different approach, one modeled 
on how we store GO assignments.  It seems that gene symbols and GO terms 
are very similar.  They are both amenable to controlled vocabs, and are 
both assigned by automated and manual operations.  This pattern may 
apply to other types of annotation as well.

1. introduce a GeneName table:
   GeneName.gene_name_id
   GeneName.name            -- the full name
   GeneName.symbol          -- the symbol

2. introduce a GeneSynonym table:
   GeneSynonym.gene_name_id -- the GeneName it is a synonym for
   GeneSynonym.name         -- the full name of the synonym
   GeneSynonym.symbol       -- the symbol

these tables are treated as controlled vocabularies, downloaded from 
sites such as HUGO and MGI.

3. introduce a GeneNameAssociation table -- a mapping between Gene and 
GeneName (better name for this??)
   GeneNameAssociation.gene_id
   GeneNameAssociation.gene_name_id
   GeneNameAssociation.review_status_id
   GeneNameAssociation.is_not
   probably adopt an instance and evidence mechanism here, similar to the 
one used for GO associations.
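
to make the three proposed tables concrete, here is a rough sketch of them as DDL, run against an in-memory SQLite database from Python.  only the table and column names come from the proposal above.  the column types, the constraints, the surrogate key on GeneSynonym, the composite key on GeneNameAssociation, and the one-column stand-in for Gene are all assumptions made for illustration, and this is not GUS schema code.

```python
import sqlite3

# Hypothetical DDL. Table and column names follow the proposal; the types,
# constraints, keys, and the stand-in Gene table are assumptions.
SCHEMA = """
CREATE TABLE Gene (
    gene_id INTEGER PRIMARY KEY           -- stand-in for Dots::Gene
);
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY,
    name         TEXT,                    -- the full name
    symbol       TEXT NOT NULL            -- the symbol
);
CREATE TABLE GeneSynonym (
    gene_synonym_id INTEGER PRIMARY KEY,  -- assumed surrogate key
    gene_name_id    INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    name            TEXT,                 -- the full name of the synonym
    symbol          TEXT                  -- the symbol
);
CREATE TABLE GeneNameAssociation (
    gene_id          INTEGER NOT NULL REFERENCES Gene(gene_id),
    gene_name_id     INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    review_status_id INTEGER,
    is_not           INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (gene_id, gene_name_id)   -- assumed: one row per pair
);
"""


def create_schema(conn):
    """Create the sketched tables on an open SQLite connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

the composite key assumes that separate manual and automated calls for the same gene and name would hang off a single association row as instances or evidence, which is only one way of reading the proposal.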

note that this implies an m-m (many-to-many) relationship between gene and 
gene name.  while this might not be true in the ideal sense, it may well be 
true for tentative data, which is what we often have.  so, this model accepts 
that unfortunate fact, and does its best to preserve as much info as we can.
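
here is a small sketch of what the m-m mapping looks like in practice, and of how manual and automated assignments could be pulled back in one query.  the symbols, names and gene ids are made-up placeholders.  the coding of review_status_id (0 = automated and unreviewed, 1 = manually reviewed) is an assumption, as is the reading of is_not as "explicitly not this name", borrowed from how NOT is used in GO annotation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY, name TEXT, symbol TEXT);
CREATE TABLE GeneNameAssociation (
    gene_id INTEGER, gene_name_id INTEGER,
    review_status_id INTEGER, is_not INTEGER DEFAULT 0);
""")

# Placeholder vocabulary entries, not real gene symbols.
conn.executemany(
    "INSERT INTO GeneName VALUES (?, ?, ?)",
    [(1, "placeholder name one", "SYM1"),
     (2, "placeholder name two", "SYM2")],
)

# Assumed coding: review_status_id 0 = automated, 1 = manually reviewed.
conn.executemany(
    "INSERT INTO GeneNameAssociation VALUES (?, ?, ?, ?)",
    [(100, 1, 0, 0),   # gene 100 <-> SYM1, automated
     (100, 2, 1, 0),   # gene 100 <-> SYM2, manual (one gene, two names)
     (101, 1, 0, 0),   # gene 101 <-> SYM1, automated (one name, two genes)
     (101, 2, 1, 1)],  # gene 101 is explicitly NOT SYM2, manual
)


def symbols_for_gene(conn, gene_id):
    """Return (symbol, review_status_id) pairs asserted for a gene."""
    return conn.execute(
        """SELECT gn.symbol, a.review_status_id
             FROM GeneNameAssociation a
             JOIN GeneName gn ON gn.gene_name_id = a.gene_name_id
            WHERE a.gene_id = ? AND a.is_not = 0
            ORDER BY gn.symbol""",
        (gene_id,),
    ).fetchall()


def genes_for_symbol(conn, symbol):
    """Return the ids of genes currently asserted to carry a symbol."""
    rows = conn.execute(
        """SELECT a.gene_id
             FROM GeneNameAssociation a
             JOIN GeneName gn ON gn.gene_name_id = a.gene_name_id
            WHERE gn.symbol = ? AND a.is_not = 0
            ORDER BY a.gene_id""",
        (symbol,),
    ).fetchall()
    return [gene_id for (gene_id,) in rows]


print(symbols_for_gene(conn, 100))
print(genes_for_symbol(conn, "SYM1"))
```

because both kinds of assignment live in the same table, filtering on review_status_id is enough to see manual calls, automated calls, or both together.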
