Re: [Gusdev-gusdev] representing gene symbols

Hi all,

Steve Fischer wrote:

> The model I was going for is a controlled vocab, ie, that gene names, 
> symbols and synonyms are knowable without reference to a Gene object. 
> The act of associating a Name with a Gene is "annotating" the Gene, 
> and may be tentative.  And, there may be more than one Gene that 
> tentatively lays claim to that name (eg across species?).  IF that is 
> the model we are going for, then i don't think i agree that synonyms 
> should reference a gene directly.

The aim of the approved-nomenclature effort, at least at MGI and HUGO, 
is to assign a unique gene symbol and a unique gene name to each gene.
They research a gene name or symbol before approving its assignment to 
a gene.

A non-approved gene name or symbol may be assigned to more than one 
gene.  Even though this may happen, I am inclined to say that we should 
still have the gene_id referenced.
By calling it a gene synonym or alias, we are saying that it is an 
alternative designation for the gene.
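
A minimal Python sketch of this point, for illustration only: every symbol row carries a gene_id, and an unapproved alias is still allowed to point at more than one gene. The gene ids, the symbols and the helper names (SymbolRow, genes_for_symbol, approved_symbol) are made up here and are not part of GUS.

```python
from typing import NamedTuple, Optional


class SymbolRow(NamedTuple):
    gene_id: int       # every symbol, approved or not, references a gene
    symbol: str
    is_approved: bool


# Hypothetical rows: 'ABC1' is an unapproved alias claimed by two genes.
ROWS = [
    SymbolRow(1, "GENEA", True),
    SymbolRow(1, "ABC1", False),
    SymbolRow(2, "GENEB", True),
    SymbolRow(2, "ABC1", False),
]


def genes_for_symbol(rows, symbol):
    """Return the sorted gene_ids that carry this symbol, approved or not."""
    return sorted({r.gene_id for r in rows if r.symbol == symbol})


def approved_symbol(rows, gene_id) -> Optional[str]:
    """Return the approved symbol of a gene, or None if it has none yet."""
    approved = [r.symbol for r in rows if r.gene_id == gene_id and r.is_approved]
    return approved[0] if approved else None
```

Looking up the alias returns both genes, while each gene still resolves to at most one approved symbol.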

>
> We have seen a similar problem with "reference" sequence, ie, choosing 
> one of a set to be representative.

This is true, but we are phasing this out by creating a gene 
model/sequence instead of choosing a reference RNA.

>
> Here is how i think it can work (my original w/ modifications as per 
> this discussion and pending Joan's explanation of aliases).  The 
> GeneName has a boolean 'approved' attribute.  If it is set, then that 
> is the approved name.  Otherwise, the GeneName is equal to its 
> synonyms, but has been (arbitrarily) chosen as the representative.   
> (The other way to do this is to lose GeneName.is_approved and allow 
> GeneName.name and GeneName.symbol be nullable, indicating that there 
> is no approved name yet).

I have made some changes in the text below and removed a table:

1. GeneSymbol table

>      gene_symbol_id
>      gene_id
>      symbol           -- the symbol (a gene can have more than one
>                          symbol but only one is approved)
>      is_approved      -- boolean (point to evidence of why this is the
>                          approved symbol, if MGI gene symbol for example)

         review_status_id  (manually reviewed = 1, from external base
                            (not reviewed) = 2, updated = 3)
         external_db_id    (where this symbol was obtained from, or
                            external_db_release_id)

>
> 2. GeneFullName table:

>
>   gene_fullname_id
>   gene_id
>   name           -- the full name of the gene
>   is_approved    (a gene can only have one approved full name; point
>                   to evidence)

      review_status_id
      external_db_id (where this name was obtained from)
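
The two tables above can be sketched as follows, using Python's built-in sqlite3 module purely for illustration. The column types, the NOT NULL choices and the index names are assumptions, and the foreign keys to the Gene and external database tables are left out. The partial unique indexes are one possible way to enforce "only one approved symbol (or full name) per gene" while leaving unapproved rows unrestricted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE GeneSymbol (
    gene_symbol_id   INTEGER PRIMARY KEY,
    gene_id          INTEGER NOT NULL,
    symbol           TEXT    NOT NULL,
    is_approved      INTEGER NOT NULL DEFAULT 0 CHECK (is_approved IN (0, 1)),
    review_status_id INTEGER NOT NULL,
    external_db_id   INTEGER
);
-- At most one approved symbol per gene; unapproved rows are not restricted.
CREATE UNIQUE INDEX one_approved_symbol
    ON GeneSymbol (gene_id) WHERE is_approved = 1;

CREATE TABLE GeneFullName (
    gene_fullname_id INTEGER PRIMARY KEY,
    gene_id          INTEGER NOT NULL,
    name             TEXT    NOT NULL,
    is_approved      INTEGER NOT NULL DEFAULT 0 CHECK (is_approved IN (0, 1)),
    review_status_id INTEGER NOT NULL,
    external_db_id   INTEGER
);
-- At most one approved full name per gene.
CREATE UNIQUE INDEX one_approved_fullname
    ON GeneFullName (gene_id) WHERE is_approved = 1;
"""


def make_db():
    """Create an in-memory database holding the two sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

With this layout, the same unapproved symbol can be stored for several genes, but a second approved symbol for one gene is rejected with sqlite3.IntegrityError.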

>
>
>  is_not  
>  gene_name_type_id          -- points to a controlled vocab of gene 
> name types such as mentioned by Arnaud.

Arnaud, is is_not necessary if, in your case, is_approved is moved 
from one gene symbol to another, with evidence added to explain why 
this was done?
Also, what are the controlled vocabulary types?
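
A rough sketch of what "moving is_approved from one symbol to another, with evidence" could look like, again using sqlite3 only for illustration. The simplified table, the free-text evidence column and the function name reassign_approved are all assumptions; how evidence would actually be attached is not settled in this thread.

```python
import sqlite3


def reassign_approved(conn, gene_id, new_symbol, evidence):
    """Move the approved flag of a gene to new_symbol in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE GeneSymbol SET is_approved = 0 "
            "WHERE gene_id = ? AND is_approved = 1",
            (gene_id,),
        )
        cur = conn.execute(
            "UPDATE GeneSymbol SET is_approved = 1, evidence = ? "
            "WHERE gene_id = ? AND symbol = ?",
            (evidence, gene_id, new_symbol),
        )
        if cur.rowcount != 1:
            raise ValueError(f"{new_symbol!r} is not a symbol of gene {gene_id}")


# Hypothetical data: gene 1 currently has OLDSYM approved.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GeneSymbol (gene_id INTEGER, symbol TEXT, "
    "is_approved INTEGER, evidence TEXT)"
)
conn.executemany(
    "INSERT INTO GeneSymbol VALUES (?, ?, ?, ?)",
    [(1, "OLDSYM", 1, None), (1, "NEWSYM", 0, None)],
)
conn.commit()
reassign_approved(conn, 1, "NEWSYM", "hypothetical evidence text")
```

The old symbol stays in the table as an unapproved row, so nothing is lost when approval moves; whether that makes is_not unnecessary is the open question.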

Anyway, I am inclined to think that both the symbol and the full name 
of a gene should have the gene_id referenced in their tables.

Joan

>
> Jonathan Crabtree wrote:
>
>>
>> Steve Fischer wrote:
>>
>>> right now in GUS, we have a bunch of tables and attributes that 
>>> relate to gene symbols, names and aliases:
>>>
>>> Dots::Gene.name
>>> Dots::Gene.gene_symbol
>>> Dots::GeneAlias
>>> Sres::DbRef.gene_symbol   (this is pretty clearly a hack.  DbRef is 
>>> intended to store references to external database entries.  it is 
>>> hackish to encode in the schema that we assume that such entries are 
>>> gene records.  they could easily be proteins or journals, whatever)
>>
>>
>>
>> Yes, this is definitely a hack; I added some columns to the DbRef table
>> because I wanted to store 2-3 specific pieces of information for MGI and
>> GeneCards entries, without creating another table.  However, I disagree
>> that I "encoded" in the schema the assumption that these DbRef entries
>> are gene records; I think if you look more closely you will see that all
>> of the newly-added columns (gene_symbol, chromosome, centimorgans) are
>> NULLable.  Therefore the only assumption I am making is that one or more
>> of these columns *may* be applicable to certain DbRefs.
>>
>>> 1. introduce a GeneName table:
>>>   GeneName.gene_name_id
>>>   GeneName.name    --- the full name
>>>   GeneName.symbol  -- the symbol
>>>
>>> 2. introduce a GeneSynonym table:
>>>    GeneSynonym.gene_name_id     -- the GeneName it is a synonym for
>>>    GeneSynonym.name                  -- the full name of the synonym
>>>    GeneSynonym.symbol               -- the symbol
>>
>>
>>
>> Arnaud's point that a gene may have names, but no approved name is a
>> good one.  It suggests that GeneSynonym should reference Gene, not
>> GeneName.
>> We might also consider renaming "GeneName" to "ApprovedGeneName" and
>> "GeneSynonym" to "GeneName".  Arnaud's second point, that there are
>> potentially several different categories of names, suggests that we
>> follow the example of the TaxonName table, and add a 'name_class' column
>> to GeneSynonym.  (This could also be a controlled vocabulary.)  Then I
>> think the only remaining question is whether we are sure that the only
>> kinds of approved names we will ever have are "gene name" and
>> "gene symbol".
>>
>> Jonathan
>>
>>
>>
>> -------------------------------------------------------
>> Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
>> The only event dedicated to issues related to Linux enterprise solutions
>> www.enterpriselinuxforum.com
>>
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev