From: Arnaud K. <ax...@sa...> - 2003-05-08 09:30:10
Hi Steve

What about genes which have synonyms but don't have an approved primary name yet?

See below the complete list of gene names we are using. This list includes not only the primary name and its synonyms, which are, as far as I understand, inferred from the function of the gene, but also the systematic names assigned by the sequencing centres.

Arnaud

Additional qualifiers to be used in place of /gene for GeneDB purposes

* /systematic_id - final systematic name for when chromosome is finished or stable sequence is submitted, will be title for gene page in absence of standard name. Could be /locus_tag, the EMBL equivalent (to be discussed??)
* /temporary_systematic_id - for temporary systematic name used during projects where sequence is unfinished, i.e. temporary name for the shotgun sequences.
* /previous_systematic_id - for systematic names no longer in use.
* /synonym - used for other gene names still in use and to be displayed on the gene page
* /obsolete_name - redundant gene names that can be queried but are not visible on gene page, e.g. errors
* /primary_name - for published or agreed unique user friendly gene name, following the convention set out for kinetoplastids, will be the title for gene page. NB. this is an EMBL-compliant qualifier so it should be used "to give full gene name, but use /gene to give gene symbol".
* /reserved_name - pre publication names that will, presumably, become the standard_name

Steve Fischer wrote:
> folks-
>
> right now in GUS, we have a bunch of tables and attributes that relate
> to gene symbols, names and aliases:
>
> Dots::Gene.name
> Dots::Gene.gene_symbol
> Dots::GeneAlias
> Sres::DbRef.gene_symbol (this is pretty clearly a hack. DbRef is
> intended to store references to external database entries. it is
> hackish to encode in the schema that we assume that such entries are
> gene records.
> they could easily be proteins or journals, whatever)
>
> This schema is being used by the DoTS project to hold both automated
> assignments of gene_symbol (Sres::DbRef) and manual assignments. The
> problem for the DoTS project is that these disparate ways of making
> assignments are not managed as a coherent whole. The manual and
> automated assignments are not queried together.
>
> I am thinking that we should consider a different approach, one
> modeled on how we store GO assignments. It seems that Gene symbols
> and GO terms are very similar. they are both amenable to controlled
> vocabs, and are both assigned by automated and manual operations.
> This pattern may apply to other types of annotation as well.
>
> 1. introduce a GeneName table:
> GeneName.gene_name_id
> GeneName.name -- the full name
> GeneName.symbol -- the symbol
>
> 2. introduce a GeneSynonym table:
> GeneSynonym.gene_name_id -- the GeneName it is a synonym for
> GeneSynonym.name -- the full name of the synonym
> GeneSynonym.symbol -- the symbol
>
> these tables are treated as controlled vocabularies, downloaded from
> sites such as HUGO and MGI.
>
> 3. introduce a GeneNameAssociation table -- a mapping between Gene and
> GeneName (better name for this??)
> GeneNameAssociation.gene_id
> GeneNameAssociation.gene_name_id
> GeneNameAssociation.review_status_id
> GeneNameAssociation.is_not
> probably adopt here an instance and evidence mechanism similar to go
> association.
>
> note that this implies a m-m relationship between gene and gene name.
> while this might not be true in the ideal sense, it may well be true
> for tentative data, which is what we often have. so, this model
> accepts that unfortunate fact, and does the best to preserve as much
> info as we can.
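
To make the three proposed tables concrete, here is a minimal sketch written as Python over an in-memory SQLite database rather than against GUS itself. The stub Gene table, the gene_synonym_id key, the column types, and the review_status_id values (0 = automated, 1 = manually reviewed) are assumptions made for illustration only; none of them come from the proposal. The query reads manual and automated assignments together and leaves out rows flagged is_not.

```python
import sqlite3


def build_schema(conn):
    """Create the proposed tables (types and keys are illustrative assumptions)."""
    conn.executescript(
        """
        -- Stub standing in for Dots::Gene (assumption).
        CREATE TABLE Gene (gene_id INTEGER PRIMARY KEY);

        -- Controlled vocabulary of names, e.g. loaded from HUGO or MGI.
        CREATE TABLE GeneName (
            gene_name_id INTEGER PRIMARY KEY,
            name   TEXT,   -- the full name
            symbol TEXT    -- the symbol
        );

        CREATE TABLE GeneSynonym (
            gene_synonym_id INTEGER PRIMARY KEY,  -- hypothetical surrogate key
            gene_name_id INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
            name   TEXT,
            symbol TEXT
        );

        -- Many-to-many mapping between Gene and GeneName.
        CREATE TABLE GeneNameAssociation (
            gene_id          INTEGER NOT NULL REFERENCES Gene(gene_id),
            gene_name_id     INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
            review_status_id INTEGER NOT NULL,  -- assumed: 0 automated, 1 manual
            is_not           INTEGER NOT NULL DEFAULT 0
        );
        """
    )


def names_for_gene(conn, gene_id):
    """Return (symbol, review_status_id) pairs for one gene.

    Manual and automated assignments come back from the same query;
    associations marked is_not are excluded.
    """
    rows = conn.execute(
        """
        SELECT n.symbol, a.review_status_id
        FROM GeneNameAssociation AS a
        JOIN GeneName AS n ON n.gene_name_id = a.gene_name_id
        WHERE a.gene_id = ? AND a.is_not = 0
        ORDER BY n.symbol, a.review_status_id
        """,
        (gene_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    conn.execute("INSERT INTO Gene (gene_id) VALUES (1)")
    # Made-up names, used only to exercise the query.
    conn.execute("INSERT INTO GeneName VALUES (10, 'example name one', 'EX1')")
    conn.execute("INSERT INTO GeneName VALUES (11, 'example name two', 'EX2')")
    conn.execute("INSERT INTO GeneNameAssociation VALUES (1, 10, 0, 0)")
    conn.execute("INSERT INTO GeneNameAssociation VALUES (1, 11, 1, 0)")
    print(names_for_gene(conn, 1))
```

Because the association table has no uniqueness constraint on gene_id, one gene can carry several tentative names at once, which is the m-m relationship described above.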