Re: [Gusdev-gusdev] representing gene symbols


Steve Fischer wrote:
> right now in GUS, we have a bunch of tables and attributes that relate to 
> gene symbols, names and aliases:
> 
> Dots::Gene.name
> Dots::Gene.gene_symbol
> Dots::GeneAlias
> Sres::DbRef.gene_symbol   (this is pretty clearly a hack.  DbRef is 
> intended to store references to external database entries.  it is 
> hackish to encode in the schema that we assume that such entries are 
> gene records.  they could easily be proteins or journals, whatever)

Yes, this is definitely a hack; I added some columns to the DbRef table
because I wanted to store 2-3 specific pieces of information for MGI and
GeneCards entries, without creating another table.  However, I disagree
that I "encoded" in the schema the assumption that these DbRef entries
are gene records; I think if you look more closely you will see that all
of the newly-added columns (gene_symbol, chromosome, centimorgans) are
NULLable.  Therefore the only assumption I am making is that one or more
of these columns *may* be applicable to certain DbRefs.
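
To make the NULLable point concrete, here is a minimal sketch using Python's sqlite3 module rather than the real GUS schema. Only the column names gene_symbol, chromosome and centimorgans come from the discussion above; db_ref_id, primary_identifier, the column types and the row values are made-up placeholders.

```python
import sqlite3

# In-memory database; this is an illustration, not the actual Sres::DbRef DDL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DbRef (
        db_ref_id          INTEGER PRIMARY KEY,  -- hypothetical key column
        primary_identifier TEXT NOT NULL,        -- hypothetical column
        gene_symbol        TEXT,                 -- NULLable
        chromosome         TEXT,                 -- NULLable
        centimorgans       REAL                  -- NULLable
    )
""")

# A reference to a gene record fills in the optional columns.
conn.execute(
    "INSERT INTO DbRef (primary_identifier, gene_symbol, chromosome, centimorgans) "
    "VALUES (?, ?, ?, ?)",
    ("EXAMPLE-GENE-1", "Abc1", "7", 12.5),
)

# A reference to something that is not a gene simply leaves them NULL.
conn.execute(
    "INSERT INTO DbRef (primary_identifier) VALUES (?)",
    ("EXAMPLE-JOURNAL-1",),
)

rows = conn.execute(
    "SELECT primary_identifier, gene_symbol, chromosome, centimorgans "
    "FROM DbRef ORDER BY db_ref_id"
).fetchall()
for row in rows:
    print(row)
```

The second row is accepted even though it has no gene information, which is the sense in which the columns only *may* apply to a given DbRef.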

> 1. introduce a GeneName table:
>   GeneName.gene_name_id
>   GeneName.name    --- the full name
>   GeneName.symbol  -- the symbol
> 
> 2. introduce a GeneSynonym table:
>    GeneSynonym.gene_name_id     -- the GeneName it is a synonym for
>    GeneSynonym.name                  -- the full name of the synonym
>    GeneSynonym.symbol               -- the symbol

Arnaud's point, that a gene may have names but no approved name, is a good
one.  It suggests that GeneSynonym should reference Gene, not GeneName.
We might also consider renaming "GeneName" to "ApprovedGeneName" and
"GeneSynonym" to "GeneName".  Arnaud's second point, that there are
potentially several different categories of names, suggests that we
follow the example of the TaxonName table, and add a 'name_class' column
to GeneSynonym.  (This could also be a controlled vocabulary.)  Then I
think the only remaining question is whether we are sure that the only
kinds of approved names we will ever have are "gene name" and "gene symbol".
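
As a rough sketch of what the renamed tables might look like (again using Python's sqlite3, not the real GUS DDL), assuming the renaming suggested above: every column other than name, symbol and name_class is a guess, the name_class values are invented, and folding a synonym's name and symbol into one 'name' column distinguished by name_class is only one possible reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Gene (
        gene_id INTEGER PRIMARY KEY
    );
    -- Formerly "GeneName": at most one approved name/symbol per gene.
    CREATE TABLE ApprovedGeneName (
        approved_gene_name_id INTEGER PRIMARY KEY,
        gene_id INTEGER NOT NULL UNIQUE REFERENCES Gene(gene_id),
        name    TEXT,
        symbol  TEXT
    );
    -- Formerly "GeneSynonym": references Gene directly, so a gene can
    -- have names without having an approved name.
    CREATE TABLE GeneName (
        gene_name_id INTEGER PRIMARY KEY,
        gene_id      INTEGER NOT NULL REFERENCES Gene(gene_id),
        name         TEXT NOT NULL,
        name_class   TEXT NOT NULL  -- could instead reference a controlled vocabulary
    );
""")

# Gene 1 has an approved name plus a synonym; gene 2 has names only.
conn.executemany("INSERT INTO Gene (gene_id) VALUES (?)", [(1,), (2,)])
conn.execute(
    "INSERT INTO ApprovedGeneName (gene_id, name, symbol) VALUES (?, ?, ?)",
    (1, "example gene one", "EG1"),
)
conn.executemany(
    "INSERT INTO GeneName (gene_id, name, name_class) VALUES (?, ?, ?)",
    [
        (1, "EG-1", "synonym"),
        (2, "unnamed example locus", "synonym"),
        (2, "UEL", "alias symbol"),
    ],
)

# Count the names for each gene, with the approved symbol where one exists.
summary = conn.execute("""
    SELECT g.gene_id, a.symbol, COUNT(n.gene_name_id)
    FROM Gene g
    LEFT JOIN ApprovedGeneName a ON a.gene_id = g.gene_id
    LEFT JOIN GeneName n ON n.gene_id = g.gene_id
    GROUP BY g.gene_id, a.symbol
    ORDER BY g.gene_id
""").fetchall()
for row in summary:
    print(row)
```

Gene 2 shows up with two names and no approved symbol, which is the case the original GeneSynonym-references-GeneName design could not represent.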

Jonathan