From: Joan M. <ma...@pc...> - 2003-05-15 15:23:58
Hi all,
Steve Fischer wrote:
> The model I was going for is a controlled vocab, ie, that gene names,
> symbols and synonyms are knowable without reference to a Gene object.
> The act of associating a Name with a Gene is "annotating" the Gene,
> and may be tentative. And, there may be more than one Gene that
> tentatively lays claim to that name (eg across species?). IF that is
> the model we are going for, then i don't think i agree that synonyms
> should reference a gene directly.
MGI and HUGO, at least, aim to assign a unique approved gene symbol and
gene name to each gene. They research a gene name or symbol before
approving its assignment to a gene.
A non-approved gene name or symbol may be assigned to more than one
gene. Even though this can happen, I am inclined to say that we should
still have the gene_id referenced.
By calling it a gene synonym or alias, we are saying that it is an
alternative designation for the gene.
>
> We have seen a similar problem with "reference" sequence, ie, choosing
> one of a set to be representative.
This is true, but we are phasing this out by creating a gene
model/sequence instead of choosing a reference RNA.
>
> Here is how i think it can work (my original w/ modifications as per
> this discussion and pending Joan's explanation of aliases). The
> GeneName has a boolean 'approved' attribute. If it is set, then that
> is the approved name. Otherwise, the GeneName is equal to its
> synonyms, but has been (arbitrarily) chosen as the representative.
> (The other way to do this is to lose GeneName.is_approved and allow
> GeneName.name and GeneName.symbol be nullable, indicating that there
> is no approved name yet).
I have made some changes in the text below and removed a table:
1. GeneSymbol table
> gene_symbol_id
> gene_id
> symbol -- the symbol (a gene can have more than one
> symbol but only one is approved)
> is_approved -- boolean (point to evidence of why this is
> the approved symbol, if MGI gene symbol for example)
review_status_id (manually reviewed = 1, from external database
(not reviewed) = 2, updated = 3)
external_db_id (where this symbol was obtained from, or
external_db_release_id)
>
> 2. GeneFullName table:
>
> gene_fullname_id
> gene_id
> name -- the full name of the gene
> is_approved (a gene can only have one approved full name, point
> to evidence)
review_status_id
external_db_id (where this name was obtained from)
>
>
> is_not
> gene_name_type_id -- points to a controlled vocab of gene
> name types such as mentioned by Arnaud.
Arnaud, is is_not necessary if, in your case, is_approved is changed
from one gene symbol to another, with the addition of evidence of why
this was done?
Also, what are the controlled vocabulary types?
Anyway, I am inclined to think that the symbol table and the full name
table should each have the gene_id referenced.
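
To make the two tables above concrete, here is a minimal sketch, not a
definitive implementation. It assumes SQLite through Python's sqlite3
module purely for illustration. The column names come from the listing
above. The stub Gene table, the index names, the helper functions, and
the gene symbols are hypothetical. A partial unique index is one possible
way to allow many symbols per gene while permitting only one approved
symbol, and it still lets a non-approved symbol be attached to more than
one gene.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Gene (
    gene_id INTEGER PRIMARY KEY          -- stub; the real Gene table has more columns
);

CREATE TABLE GeneSymbol (
    gene_symbol_id   INTEGER PRIMARY KEY,
    gene_id          INTEGER NOT NULL REFERENCES Gene(gene_id),
    symbol           TEXT NOT NULL,
    is_approved      INTEGER NOT NULL DEFAULT 0 CHECK (is_approved IN (0, 1)),
    review_status_id INTEGER,            -- 1 = manually reviewed, 2 = external (not reviewed), 3 = updated
    external_db_id   INTEGER             -- where this symbol was obtained from
);

CREATE TABLE GeneFullName (
    gene_fullname_id INTEGER PRIMARY KEY,
    gene_id          INTEGER NOT NULL REFERENCES Gene(gene_id),
    name             TEXT NOT NULL,
    is_approved      INTEGER NOT NULL DEFAULT 0 CHECK (is_approved IN (0, 1)),
    review_status_id INTEGER,
    external_db_id   INTEGER             -- where this name was obtained from
);

-- Assumption: at most one approved row per gene, enforced by partial unique indexes.
CREATE UNIQUE INDEX one_approved_symbol_per_gene
    ON GeneSymbol (gene_id) WHERE is_approved = 1;
CREATE UNIQUE INDEX one_approved_fullname_per_gene
    ON GeneFullName (gene_id) WHERE is_approved = 1;
"""


def create_schema(conn):
    """Create the sketch tables on an open connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def add_symbol(conn, gene_id, symbol, approved=False,
               review_status_id=2, external_db_id=None):
    """Attach a symbol to a gene; raises sqlite3.IntegrityError if the
    gene already has an approved symbol and approved=True."""
    conn.execute(
        "INSERT INTO GeneSymbol "
        "(gene_id, symbol, is_approved, review_status_id, external_db_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (gene_id, symbol, int(approved), review_status_id, external_db_id),
    )


def genes_for_symbol(conn, symbol):
    """Return every gene_id that lays claim to the given symbol."""
    rows = conn.execute(
        "SELECT gene_id FROM GeneSymbol WHERE symbol = ? ORDER BY gene_id",
        (symbol,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany("INSERT INTO Gene (gene_id) VALUES (?)", [(1,), (2,)])
    add_symbol(conn, 1, "Abc1", approved=True, review_status_id=1)  # made-up symbol
    add_symbol(conn, 1, "Xyz")   # non-approved alias of gene 1
    add_symbol(conn, 2, "Xyz")   # the same alias tentatively claimed by gene 2
    print(genes_for_symbol(conn, "Xyz"))
```

With this layout a lookup by symbol can return several genes, which
matches the case where a non-approved symbol is shared, while a second
approved symbol for the same gene is rejected by the database.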
Joan
>
> Jonathan Crabtree wrote:
>
>>
>> Steve Fischer wrote:
>>
>>> right now in GUS, we have a bunch of tables and attribute that
>>> relate to gene symbols, names and aliases:
>>>
>>> Dots::Gene.name
>>> Dots::Gene.gene_symbol
>>> Dots::GeneAlias
>>> Sres::DbRef.gene_symbol (this is pretty clearly a hack. DbRef is
>>> intended to store references to external database entries. it is
>>> hackish to encode in the schema that we assume that such entries are
>>> gene records. they could easily be proteins or journals, whatever)
>>
>>
>>
>> Yes, this is definitely a hack; I added some columns to the DbRef table
>> because I wanted to store 2-3 specific pieces of information for MGI and
>> GeneCards entries, without creating another table. However, I disagree
>> that I "encoded" in the schema the assumption that these DbRef entries
>> are gene records; I think if you look more closely you will see that all
>> of the newly-added columns (gene_symbol, chromosome, centimorgans) are
>> NULLable. Therefore the only assumption I am making is that one or more
>> of these columns *may* be applicable to certain DbRefs.
>>
>>> 1. introduce a GeneName table:
>>> GeneName.gene_name_id
>>> GeneName.name --- the full name
>>> GeneName.symbol -- the symbol
>>>
>>> 2. introduce a GeneSynonym table:
>>> GeneSynonym.gene_name_id -- the GeneName it is a synonym for
>>> GeneSynonym.name -- the full name of the synonym
>>> GeneSynonym.symbol -- the symbol
>>
>>
>>
>> Arnaud's point that a gene may have names, but no approved name, is a
>> good one. It suggests that GeneSynonym should reference Gene, not GeneName.
>> We might also consider renaming "GeneName" to "ApprovedGeneName" and
>> "GeneSynonym" to "GeneName". Arnaud's second point, that there are
>> potentially several different categories of names, suggests that we
>> follow the example of the TaxonName table, and add a 'name_class' column
>> to GeneSynonym. (This could also be a controlled vocabulary.) Then I
>> think the only remaining question is whether we are sure that the only
>> kinds of approved names we will ever have are "gene name" and "gene
>> symbol".
>>
>> Jonathan
>>
>>
>>
>> -------------------------------------------------------
>> Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
>> The only event dedicated to issues related to Linux enterprise solutions
>> www.enterpriselinuxforum.com
>>
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev