Re: [Gusdev-gusdev] representing gene symbols

Hi Steve

What about genes that have synonyms but don't have an approved primary 
name yet?

See below for the complete list of gene names we are using. This list 
includes not only the primary name and its synonyms, which are, as far 
as I understand, inferred from the function of the gene, but also the 
systematic names assigned by the sequencing centres.

Arnaud

        Additional qualifiers to be used in place of /gene for GeneDB
        purposes

    * /systematic_id - final systematic name for when chromosome is
      finished or stable sequence is submitted, will be title for gene
      page in absence of standard name. Could be /locus_tag, the EMBL
      equivalent (to be discussed??)
    * /temporary_systematic_id - for temporary systematic name used
      during projects where sequence is unfinished, i.e. temporary name
      for the shotgun sequences.
    * /previous_systematic_id - for systematic names no longer in use.
    * /synonym - used for other gene names still in use and to be
      displayed on the gene page
    * /obsolete_name - redundant gene names that can be queried but are
      not visible on gene page, e.g. errors
    * /primary_name - for published or agreed unique user friendly gene
      name, following the convention set out for kinetoplastids, will be
      the title for gene page. NB. this is an EMBL-compliant qualifier
      so it should be used "to give full gene name, but use /gene to
      give gene symbol".
    * /reserved_name - pre publication names that will, presumably,
      become the standard_name
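
To make the intended behaviour of these qualifiers concrete, here is a
minimal Python sketch of how a gene page might pick its title and decide
which names are displayed versus merely searchable. The function names
and the dict layout are hypothetical, not GeneDB code, and the fallback
to /temporary_systematic_id for the title is an assumption; the list
above only states the /primary_name and /systematic_id cases.

```python
# Hypothetical sketch: a gene is a dict mapping qualifier name -> list of values.

def page_title(qualifiers):
    """Title for the gene page: primary name if set, otherwise a systematic id."""
    # The last fallback (temporary id) is an assumption, not stated in the list.
    for key in ("primary_name", "systematic_id", "temporary_systematic_id"):
        values = qualifiers.get(key)
        if values:
            return values[0]
    return None


def visible_names(qualifiers):
    """Names displayed on the gene page: the title followed by the synonyms."""
    title = page_title(qualifiers)
    names = [title] if title else []
    names += [s for s in qualifiers.get("synonym", []) if s != title]
    return names


def searchable_names(qualifiers):
    """Every name a query should match, including names hidden from the page."""
    keys = (
        "primary_name", "systematic_id", "temporary_systematic_id",
        "previous_systematic_id", "synonym", "obsolete_name", "reserved_name",
    )
    seen = []
    for key in keys:
        for name in qualifiers.get(key, []):
            if name not in seen:
                seen.append(name)
    return seen


# Made-up example: a gene with synonyms but no approved primary name yet.
gene = {
    "systematic_id": ["SYS0001"],
    "synonym": ["abc1", "abc2"],
    "obsolete_name": ["abx1"],
}
print(page_title(gene))
print(visible_names(gene))
print(searchable_names(gene))
```

In this sketch a gene without a primary name is still titled by its
systematic name, and the obsolete name can be queried without ever
appearing on the page.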

Steve Fischer wrote:

> folks-
>
> right now in GUS, we have a bunch of tables and attributes that relate 
> to gene symbols, names and aliases:
>
> Dots::Gene.name
> Dots::Gene.gene_symbol
> Dots::GeneAlias
> Sres::DbRef.gene_symbol   (this is pretty clearly a hack.  DbRef is 
> intended to store references to external database entries.  it is 
> hackish to encode in the schema that we assume that such entries are 
> gene records.  they could easily be proteins or journals, whatever)
>
> This schema is being used by the DoTS project to hold both automated 
> assignments of gene_symbol (Sres::DbRef) and manual assignments.  The 
> problem for the DoTS project is that these disparate ways of making 
> assignments are not managed as a coherent whole. The manual and 
> automated assignments are not queried together. 
> I am thinking that we should consider a different approach, one 
> modeled on how we store GO assignments.  It seems that Gene symbols 
> and GO terms are very similar.  they are both amenable to controlled 
> vocabs, and are both assigned by automated and manual operations.  
> This pattern may apply to other types of annotation as well.
>
>
> 1. introduce a GeneName table:
>   GeneName.gene_name_id
>   GeneName.name    --- the full name
>   GeneName.symbol  -- the symbol
>
> 2. introduce a GeneSynonym table:
>    GeneSynonym.gene_name_id     -- the GeneName it is a synonym for
>    GeneSynonym.name                  -- the full name of the synonym
>    GeneSynonym.symbol               -- the symbol
>
> these tables are treated as controlled vocabularies, downloaded from 
> sites such as HUGO and MGI.
>
>
> 3. introduce a GeneNameAssociation table -- a mapping between Gene and 
> GeneName (better name for this??)
>   GeneNameAssociation.gene_id
>   GeneNameAssociation.gene_name_id
>   GeneNameAssociation.review_status_id
>   GeneNameAssociation.is_not
>   probably adopt here an instance and evidence mechanism similar to go 
> association.
>
> note that this implies an m-m (many-to-many) relationship between gene 
> and gene name. while this might not be true in the ideal sense, it may 
> well be true for tentative data, which is what we often have.  so, this model 
> accepts that unfortunate fact, and does the best to preserve as much 
> info as we can.
>
>
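
For discussion, the three tables proposed above could be sketched roughly
as follows, using Python's sqlite3 module and an in-memory database. The
column names come from the proposal; the column types, the foreign-key
declarations, the helper names (build_demo, symbols_for_gene) and all
sample rows are assumptions made for illustration only. This is not the
GUS schema, and the instance/evidence mechanism is left out.

```python
import sqlite3

# Column names follow the proposal; types and keys are assumptions.
SCHEMA = """
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY,
    name         TEXT,   -- the full name
    symbol       TEXT    -- the symbol
);
CREATE TABLE GeneSynonym (
    gene_name_id INTEGER REFERENCES GeneName(gene_name_id),
    name         TEXT,   -- the full name of the synonym
    symbol       TEXT
);
CREATE TABLE GeneNameAssociation (
    gene_id          INTEGER,
    gene_name_id     INTEGER REFERENCES GeneName(gene_name_id),
    review_status_id INTEGER,
    is_not           INTEGER DEFAULT 0
);
"""


def build_demo():
    """Create an in-memory database holding made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO GeneName VALUES (?, ?, ?)",
        [(1, "placeholder name one", "SYM1"),
         (2, "placeholder name two", "SYM2"),
         (3, "placeholder name three", "SYM3")],
    )
    conn.executemany(
        "INSERT INTO GeneSynonym VALUES (?, ?, ?)",
        [(1, "placeholder alias", "ALIAS1")],
    )
    # Gene 100 carries two tentative names and one negated (is_not) name;
    # the name SYM1 is also attached to gene 101, so the mapping is many-to-many.
    conn.executemany(
        "INSERT INTO GeneNameAssociation VALUES (?, ?, ?, ?)",
        [(100, 1, 1, 0), (100, 2, 2, 0), (100, 3, 2, 1), (101, 1, 1, 0)],
    )
    conn.commit()
    return conn


def symbols_for_gene(conn, gene_id):
    """Return all symbols associated with a gene, skipping negated rows."""
    rows = conn.execute(
        "SELECT n.symbol FROM GeneNameAssociation a "
        "JOIN GeneName n ON n.gene_name_id = a.gene_name_id "
        "WHERE a.gene_id = ? AND a.is_not = 0 "
        "ORDER BY n.symbol",
        (gene_id,),
    )
    return [row[0] for row in rows]


conn = build_demo()
print(symbols_for_gene(conn, 100))
print(symbols_for_gene(conn, 101))
```

Because every assignment lives in one association table, a single query
returns them together, whatever review_status_id they carry.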