From: Steve F. <sfi...@pc...> - 2003-05-05 15:11:31
folks-
right now in GUS, we have a bunch of tables and attributes that relate to
gene symbols, names and aliases:
Dots::Gene.name
Dots::Gene.gene_symbol
Dots::GeneAlias
Sres::DbRef.gene_symbol (this is pretty clearly a hack. DbRef is
intended to store references to external database entries. it is
hackish to encode in the schema that we assume that such entries are
gene records. they could easily be proteins or journals, whatever)
This schema is being used by the DoTS project to hold both automated
assignments of gene_symbol (Sres::DbRef) and manual assignments. The
problem for the DoTS project is that these disparate ways of making
assignments are not managed as a coherent whole. The manual and
automated assignments are not queried together.
I am thinking that we should consider a different approach, one modeled
on how we store GO assignments. It seems that Gene symbols and GO terms
are very similar. they are both amenable to controlled vocabs, and are
both assigned by automated and manual operations. This pattern may
apply to other types of annotation as well.
1. introduce a GeneName table:
GeneName.gene_name_id
GeneName.name --- the full name
GeneName.symbol -- the symbol
2. introduce a GeneSynonym table:
GeneSynonym.gene_name_id -- the GeneName it is a synonym for
GeneSynonym.name -- the full name of the synonym
GeneSynonym.symbol -- the symbol
these tables are treated as controlled vocabularies, downloaded from
sites such as HUGO and MGI.
3. introduce a GeneNameAssociation table -- a mapping between Gene and
GeneName (better name for this??)
GeneNameAssociation.gene_id
GeneNameAssociation.gene_name_id
GeneNameAssociation.review_status_id
GeneNameAssociation.is_not
probably adopt here an instance and evidence mechanism similar to GO
association.
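To make the three tables concrete, here is a rough sketch using Python's sqlite3 module. It is only an illustration: the column types, the gene_synonym_id surrogate key, the resolve_symbol helper, and the sample rows are assumptions, not part of the proposal or of GUS conventions, and the instance/evidence tables are left out.

```python
import sqlite3

# Hypothetical DDL: types, constraints, and gene_synonym_id are assumptions.
SCHEMA = """
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,   -- the full name
    symbol       TEXT NOT NULL    -- the symbol
);
CREATE TABLE GeneSynonym (
    gene_synonym_id INTEGER PRIMARY KEY,  -- assumed surrogate key
    gene_name_id    INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    name            TEXT,                 -- full name of the synonym
    symbol          TEXT NOT NULL
);
CREATE TABLE GeneNameAssociation (
    gene_id          INTEGER NOT NULL,
    gene_name_id     INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    review_status_id INTEGER NOT NULL,
    is_not           INTEGER NOT NULL DEFAULT 0
);
"""


def resolve_symbol(conn, symbol):
    """Return the gene_name_ids whose symbol or synonym symbol matches."""
    rows = conn.execute(
        """
        SELECT gene_name_id FROM GeneName WHERE symbol = ?
        UNION
        SELECT gene_name_id FROM GeneSynonym WHERE symbol = ?
        ORDER BY gene_name_id
        """,
        (symbol, symbol),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative vocabulary rows, standing in for a HUGO or MGI download.
conn.execute("INSERT INTO GeneName VALUES (1, 'tumor protein p53', 'TP53')")
conn.execute("INSERT INTO GeneSynonym VALUES (1, 1, NULL, 'P53')")
```

With the vocabulary loaded this way, a lookup by either the official symbol or a synonym lands on the same GeneName row, which is the row GeneNameAssociation would point at.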
note that this implies an m-m (many-to-many) relationship between gene and gene name.
while this might not be true in the ideal sense, it may well be true
for tentative data, which is what we often have. so, this model accepts
that unfortunate fact, and does its best to preserve as much info as we can.
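The many-to-many point, and the goal of querying manual and automated assignments together, could look something like the sketch below. The review_status_id codes (0 = automated, 1 = manually reviewed), the reading of is_not as a negative assertion (by analogy with GO association), the names_for_gene helper, and all of the rows are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY, name TEXT, symbol TEXT
);
CREATE TABLE GeneNameAssociation (
    gene_id          INTEGER,
    gene_name_id     INTEGER,
    review_status_id INTEGER,   -- assumed: 0 = automated, 1 = manual
    is_not           INTEGER DEFAULT 0  -- assumed: 1 = "gene is NOT this name"
);
""")

# Placeholder vocabulary entries, not real gene names.
conn.executemany(
    "INSERT INTO GeneName VALUES (?, ?, ?)",
    [(1, "placeholder name A", "SYMA"), (2, "placeholder name B", "SYMB")],
)
conn.executemany(
    "INSERT INTO GeneNameAssociation VALUES (?, ?, ?, ?)",
    [
        (100, 1, 0, 0),  # automated: gene 100 -> SYMA
        (100, 2, 1, 0),  # manual: gene 100 -> SYMB (both tentative names kept)
        (101, 1, 0, 0),  # automated: gene 101 -> SYMA as well (many-to-many)
        (101, 2, 1, 1),  # manual: gene 101 is NOT SYMB
    ],
)


def names_for_gene(conn, gene_id):
    """Return (symbol, review_status_id) for a gene, manual assignments first.

    Manual and automated rows come back from one query; negative
    assertions (is_not = 1) are excluded.
    """
    return conn.execute(
        """
        SELECT n.symbol, a.review_status_id
        FROM GeneNameAssociation a
        JOIN GeneName n ON n.gene_name_id = a.gene_name_id
        WHERE a.gene_id = ? AND a.is_not = 0
        ORDER BY a.review_status_id DESC, n.symbol
        """,
        (gene_id,),
    ).fetchall()
```

Here gene 100 carries two names and SYMA is attached to two genes, so nothing is thrown away while the data is still tentative; a consumer that wants only curated names can filter on review_status_id.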