From: Joan M. <ma...@pc...> - 2003-05-15 15:23:58
Hi all,

Steve Fischer wrote:
> The model I was going for is a controlled vocab, ie, that gene names,
> symbols and synonyms are knowable without reference to a Gene object.
> The act of associating a Name with a Gene is "annotating" the Gene,
> and may be tentative. And, there may be more than one Gene that
> tentatively lays claim to that name (eg across species?). IF that is
> the model we are going for, then i don't think i agree that synonyms
> should reference a gene directly.

The purpose of assigning approved gene symbols and gene names, at least at MGI and HUGO, is to give each gene a unique symbol and name. They research a gene name or symbol before it is approved and assigned to a gene. A non-approved gene name or symbol, however, may be assigned to more than one gene. I am inclined to say that even though this may happen, we should still reference the gene_id. By calling it a gene synonym or alias, we are saying that it is an alternative designation for the gene.

> We have seen a similar problem with "reference" sequence, ie, choosing
> one of a set to be representative.

This is true, but we are phasing this out by creating a gene model/sequence instead of choosing a reference RNA.

> Here is how i think it can work (my original w/ modifications as per
> this discussion and pending Joan's explanation of aliases). The
> GeneName has a boolean 'approved' attribute. If it is set, then that
> is the approved name. Otherwise, the GeneName is equal to its
> synonyms, but has been (arbitrarily) chosen as the representative.
> (The other way to do this is to lose GeneName.is_approved and allow
> GeneName.name and GeneName.symbol to be nullable, indicating that there
> is no approved name yet).
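Steve's rule for picking a representative name can be sketched in Python. This is a minimal, hypothetical illustration: the GeneName class and the representative_name() helper are not part of GUS, and breaking the "arbitrary" choice by lowest gene_name_id is an assumption made only so the result is deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneName:
    """Hypothetical in-memory stand-in for a GeneName row (not a GUS API)."""
    gene_name_id: int
    name: str
    symbol: str
    is_approved: bool = False


def representative_name(names: List[GeneName]) -> Optional[GeneName]:
    """Return the approved name if there is one, else a representative.

    Assumption: the "arbitrary" representative is the lowest gene_name_id,
    chosen only to make the result deterministic.
    """
    if not names:
        return None
    approved = [n for n in names if n.is_approved]
    if len(approved) > 1:
        # The thread assumes at most one approved name per gene.
        raise ValueError("more than one approved name for this gene")
    if approved:
        return approved[0]
    return min(names, key=lambda n: n.gene_name_id)


# Usage with made-up names: no approved name yet, so the lowest id wins.
names = [GeneName(7, "example gene alpha", "EGA"), GeneName(3, "example gene beta", "EGB")]
print(representative_name(names).symbol)  # prints EGB
```

Steve's alternative (dropping is_approved and making name/symbol nullable) would replace the flag check with a test for a non-null approved name; the selection logic for the fallback would stay the same.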
I have made some changes in the text below and removed a table:

1. GeneSymbol table
> gene_symbol_id
> gene_id
> symbol -- the symbol (a gene can have more than one
> symbol but only one is approved)
> is_approved -- boolean (point to evidence of why this is
> the approved symbol, if MGI gene symbol for example)
review_status_id (manually reviewed = 1, from external database (not reviewed) = 2, updated = 3)
external_db_id (where this symbol was obtained from, or external_db_release_id)
>
> 2. GeneFullName table:
>
> gene_fullname_id
> gene_id
> name -- the full name of the gene
> is_approved (a gene can only have one approved full name, point
> to evidence)
review_status_id
external_db_id (where this name was obtained from)
>
> is_not
> gene_name_type_id -- points to a controlled vocab of gene
> name types such as mentioned by Arnaud.

Arnaud, is is_not necessary if, in your case, is_approved is moved from one gene symbol to another, along with evidence of why this was done? Also, what are the controlled vocabulary types?

Anyway, I am inclined to think that both the symbol table and the full-name table should reference the gene_id.

Joan

> Jonathan Crabtree wrote:
>
>> Steve Fischer wrote:
>>
>>> right now in GUS, we have a bunch of tables and attributes that
>>> relate to gene symbols, names and aliases:
>>>
>>> Dots::Gene.name
>>> Dots::Gene.gene_symbol
>>> Dots::GeneAlias
>>> Sres::DbRef.gene_symbol (this is pretty clearly a hack. DbRef is
>>> intended to store references to external database entries. it is
>>> hackish to encode in the schema that we assume that such entries are
>>> gene records. they could easily be proteins or journals, whatever)
>>
>> Yes, this is definitely a hack; I added some columns to the DbRef table
>> because I wanted to store 2-3 specific pieces of information for MGI and
>> GeneCards entries, without creating another table.
>> However, I disagree that I "encoded" in the schema the assumption that
>> these DbRef entries are gene records; I think if you look more closely
>> you will see that all of the newly-added columns (gene_symbol,
>> chromosome, centimorgans) are NULLable. Therefore the only assumption
>> I am making is that one or more of these columns *may* be applicable
>> to certain DbRefs.
>>
>>> 1. introduce a GeneName table:
>>> GeneName.gene_name_id
>>> GeneName.name --- the full name
>>> GeneName.symbol -- the symbol
>>>
>>> 2. introduce a GeneSynonym table:
>>> GeneSynonym.gene_name_id -- the GeneName it is a synonym for
>>> GeneSynonym.name -- the full name of the synonym
>>> GeneSynonym.symbol -- the symbol
>>
>> Arnaud's point that a gene may have names, but no approved name is a
>> good one. It suggests that GeneSynonym should reference Gene, not GeneName.
>> We might also consider renaming "GeneName" to "ApprovedGeneName" and
>> "GeneSynonym" to "GeneName". Arnaud's second point, that there are
>> potentially several different categories of names, suggests that we
>> follow the example of the TaxonName table, and add a 'name_class' column
>> to GeneSynonym. (This could also be a controlled vocabulary.) Then I
>> think the only remaining question is whether we are sure that the only
>> kinds of approved names we will ever have are "gene name" and "gene
>> symbol".
>>
>> Jonathan
>>
>> -------------------------------------------------------
>> Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
>> The only event dedicated to issues related to Linux enterprise solutions
>> www.enterpriselinuxforum.com
>>
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
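Joan's GeneSymbol proposal (many symbols per gene, each referencing gene_id, but only one approved) can be enforced with a unique index restricted to approved rows. The sketch below uses Python's sqlite3 module and a SQLite partial index. Column names come from the GeneSymbol listing in the thread; the types, the CHECK constraints, the stub Gene table and the approved_symbol() helper are assumptions, and other database engines may express the "one approved row per gene" rule differently. GeneFullName would follow the same pattern with name in place of symbol.

```python
import sqlite3

# Hypothetical DDL for the proposed GeneSymbol table. Column names follow the
# thread; types, CHECK constraints and the Gene stub table are assumptions.
SCHEMA = """
CREATE TABLE Gene (gene_id INTEGER PRIMARY KEY);

CREATE TABLE GeneSymbol (
    gene_symbol_id   INTEGER PRIMARY KEY,
    gene_id          INTEGER NOT NULL REFERENCES Gene(gene_id),
    symbol           TEXT    NOT NULL,
    is_approved      INTEGER NOT NULL DEFAULT 0 CHECK (is_approved IN (0, 1)),
    -- 1 = manually reviewed, 2 = external database (not reviewed), 3 = updated
    review_status_id INTEGER NOT NULL CHECK (review_status_id IN (1, 2, 3)),
    external_db_id   INTEGER  -- where the symbol came from (FK omitted here)
);

-- Many symbols per gene, but at most one approved one (SQLite partial index).
CREATE UNIQUE INDEX one_approved_symbol_per_gene
    ON GeneSymbol (gene_id) WHERE is_approved = 1;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn


def approved_symbol(conn: sqlite3.Connection, gene_id: int):
    """Return the approved symbol for gene_id, or None if none is approved yet."""
    row = conn.execute(
        "SELECT symbol FROM GeneSymbol WHERE gene_id = ? AND is_approved = 1",
        (gene_id,),
    ).fetchone()
    return row[0] if row else None


# Usage with made-up symbols.
conn = make_db()
conn.execute("INSERT INTO Gene VALUES (1)")
conn.execute("INSERT INTO GeneSymbol VALUES (1, 1, 'EGA', 1, 1, NULL)")
conn.execute("INSERT INTO GeneSymbol VALUES (2, 1, 'EGA-old', 0, 2, NULL)")
print(approved_symbol(conn, 1))  # prints EGA
```

Moving approval from one symbol to another then becomes two updates (clear the old flag, set the new one), which is the situation Joan suggests could replace a separate is_not column when accompanied by evidence.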
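Jonathan's suggestion, that name rows reference Gene directly and carry a TaxonName-style name_class drawn from a controlled vocabulary, might look like the following SQLite sketch. Only the name_class idea and the GeneSynonym name come from the thread; the GeneNameClass table, the remaining column names and names_by_class() are hypothetical. Because names hang off Gene rather than off an approved name, a gene that has aliases but no approved name is still representable, which was Arnaud's point.

```python
import sqlite3

# Hypothetical sketch: every name row references Gene directly and carries a
# name class from a small controlled vocabulary (in the spirit of TaxonName's
# name_class). GeneNameClass and most column names are assumptions.
SCHEMA = """
CREATE TABLE Gene (gene_id INTEGER PRIMARY KEY);

CREATE TABLE GeneNameClass (
    name_class_id INTEGER PRIMARY KEY,
    name_class    TEXT NOT NULL UNIQUE  -- e.g. 'symbol', 'full name', 'alias'
);

CREATE TABLE GeneSynonym (
    gene_synonym_id INTEGER PRIMARY KEY,
    gene_id         INTEGER NOT NULL REFERENCES Gene(gene_id),
    name            TEXT    NOT NULL,
    name_class_id   INTEGER NOT NULL REFERENCES GeneNameClass(name_class_id)
);
"""


def names_by_class(conn: sqlite3.Connection, gene_id: int) -> dict:
    """Group one gene's names by class. Needs no approved name to exist."""
    rows = conn.execute(
        """SELECT c.name_class, s.name
             FROM GeneSynonym AS s
             JOIN GeneNameClass AS c ON c.name_class_id = s.name_class_id
            WHERE s.gene_id = ?
            ORDER BY c.name_class, s.name""",
        (gene_id,),
    ).fetchall()
    grouped = {}
    for name_class, name in rows:
        grouped.setdefault(name_class, []).append(name)
    return grouped


# Usage with made-up names for a gene that has no approved name.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO GeneNameClass VALUES (?, ?)", [(1, "symbol"), (2, "alias")])
conn.execute("INSERT INTO Gene VALUES (1)")
conn.executemany(
    "INSERT INTO GeneSynonym VALUES (?, ?, ?, ?)",
    [(1, 1, "EGA", 1), (2, 1, "gene-a", 2), (3, 1, "EGA1", 2)],
)
print(names_by_class(conn, 1))  # prints {'alias': ['EGA1', 'gene-a'], 'symbol': ['EGA']}
```

Keeping the classes in their own table rather than a free-text column is one way to make name_class "a controlled vocabulary" as Jonathan suggests; adding a new kind of name is then a row insert rather than a schema change.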