From: Jonathan C. <cra...@pc...> - 2003-05-14 16:11:21

Steve Fischer wrote:
> right now in GUS, we have a bunch of tables and attribute that relate to
> gene symbols, names and aliases:
>
> Dots::Gene.name
> Dots::Gene.gene_symbol
> Dots::GeneAlias
> Sres::DbRef.gene_symbol (this is pretty clearly a hack. DbRef is
> intended to store references to external database entries. it is
> hackish to encode in the schema that we assume that such entries are
> gene records. they could easily be proteins or journals, whatever)

Yes, this is definitely a hack; I added some columns to the DbRef table
because I wanted to store 2-3 specific pieces of information for MGI and
GeneCards entries, without creating another table. However, I disagree
that I "encoded" in the schema the assumption that these DbRef entries
are gene records; I think if you look more closely you will see that all
of the newly-added columns (gene_symbol, chromosome, centimorgans) are
NULLable. Therefore the only assumption I am making is that one or more
of these columns *may* be applicable to certain DbRefs.

> 1. introduce a GeneName table:
> GeneName.gene_name_id
> GeneName.name --- the full name
> GeneName.symbol -- the symbol
>
> 2. introduce a GeneSynonym table:
> GeneSynonym.gene_name_id -- the GeneName it is a synonym for
> GeneSynonym.name -- the full name of the synonym
> GeneSynonym.symbol -- the symbol

Arnaud's point that a gene may have names, but no approved name is a
good one. It suggests that GeneSynonym should reference Gene, not
GeneName. We might also consider renaming "GeneName" to
"ApprovedGeneName" and "GeneSynonym" to "GeneName".

Arnaud's second point, that there are potentially several different
categories of names, suggests that we follow the example of the
TaxonName table, and add a 'name_class' column to GeneSynonym. (This
could also be a controlled vocabulary.)

Then I think the only remaining question is whether we are sure that the
only kinds of approved names we will ever have are "gene name" and "gene
symbol".

Jonathan
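
For concreteness, the layout discussed above (GeneSynonym pointing at Gene rather than GeneName, plus a 'name_class' column) could be sketched roughly as follows. This is only an illustration using Python's sqlite3 module, not the actual GUS DDL: the Gene.gene_id and Gene.gene_name_id columns, the gene_synonym_id key, the column types, and the sample rows are all assumptions made up for the example.

```python
import sqlite3

# Illustrative schema only. Table names follow the thread; key columns,
# types, and the Gene -> GeneName link are assumptions, not GUS DDL.
SCHEMA = """
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY,
    name   TEXT,   -- the full approved name
    symbol TEXT    -- the approved symbol
);
CREATE TABLE Gene (
    gene_id      INTEGER PRIMARY KEY,
    -- NULLable: a gene may have names but no approved name
    gene_name_id INTEGER REFERENCES GeneName(gene_name_id)
);
CREATE TABLE GeneSynonym (
    gene_synonym_id INTEGER PRIMARY KEY,
    -- references Gene directly, not GeneName
    gene_id    INTEGER NOT NULL REFERENCES Gene(gene_id),
    name       TEXT,
    symbol     TEXT,
    -- category of name, by analogy with TaxonName
    name_class TEXT NOT NULL
);
"""


def build_demo():
    """Create an in-memory database with one gene that has no approved name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Gene (gene_id, gene_name_id) VALUES (1, NULL)")
    # Made-up placeholder synonyms; the name_class values are hypothetical.
    conn.executemany(
        "INSERT INTO GeneSynonym (gene_id, name, symbol, name_class) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "hypothetical gene one", "HG1", "alias"),
            (1, "hypothetical gene one, old", "HG1-OLD", "previous symbol"),
        ],
    )
    return conn


def synonyms_for(conn, gene_id):
    """Return (symbol, name_class) pairs for a gene, in insertion order."""
    return conn.execute(
        "SELECT symbol, name_class FROM GeneSynonym "
        "WHERE gene_id = ? ORDER BY gene_synonym_id",
        (gene_id,),
    ).fetchall()
```

Because GeneSynonym hangs off Gene, the synonyms in this sketch remain reachable even though the gene's approved-name reference is NULL, which is the case Arnaud raised. If 'name_class' became a controlled vocabulary, the free-text column would presumably be replaced by a reference to a vocabulary table.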