From: Steve F. <st...@pc...> - 2003-05-15 17:42:18
I have had a long talk with Joan about this. I think it is a slightly complicated issue. Joan's approach and mine agree in that they make gene names, synonyms and aliases a coherent whole. They differ in that mine treats gene names as a controlled vocabulary which can exist without reference to our genes (and are associated with them through an association table), while Joan's approach understands them to be values directly associated with genes, having no life in our db otherwise.

For example, on my approach a gene name can be loaded into the db even if the db does not contain the relevant gene, or the relevant gene has not been figured out yet. I.e., it is truly a controlled vocab. On Joan's approach, a gene name belongs to exactly one gene. (However, the gene name table might hold multiple rows which contain the same name value and/or symbol value. That is, the name and symbol are specifically not alternate keys.)

Joan's approach has the advantage of being simpler (at least fewer tables), and mine is only necessary if we do indeed want to treat the gene names as a controlled vocab.

steve

Joan Mazzarelli wrote:
> Hi all,
>
> Steve Fischer wrote:
>
>> The model I was going for is a controlled vocab, i.e., that gene names,
>> symbols and synonyms are knowable without reference to a Gene object.
>> The act of associating a Name with a Gene is "annotating" the Gene,
>> and may be tentative. And, there may be more than one Gene that
>> tentatively lays claim to that name (e.g., across species?). IF that is
>> the model we are going for, then I don't think I agree that synonyms
>> should reference a gene directly.
>
> The effort to assign approved gene symbols and gene names, at least by
> MGI and HUGO, is to assign a unique gene symbol and gene name to a gene.
> They research a gene name or symbol prior to its approved assignment
> to a gene.
>
> A non-approved gene name or symbol may possibly be assigned to more
> than one gene.
> I am inclined to say that even though this may happen
> we should still have the gene_id referenced.
> By calling it a gene synonym or alias, we are saying that it is an
> alternative designation for the gene.
>
>> We have seen a similar problem with "reference" sequence, i.e., choosing
>> one of a set to be representative.
>
> This is true, but we are phasing this out by creating a gene
> model/sequence instead of choosing a reference RNA.
>
>> Here is how I think it can work (my original w/ modifications as per
>> this discussion and pending Joan's explanation of aliases). The
>> GeneName has a boolean 'approved' attribute. If it is set, then that
>> is the approved name. Otherwise, the GeneName is equal to its
>> synonyms, but has been (arbitrarily) chosen as the representative.
>> (The other way to do this is to lose GeneName.is_approved and allow
>> GeneName.name and GeneName.symbol to be nullable, indicating that there
>> is no approved name yet.)
>
> I have made some changes in the text below and removed a table:
>
> 1. GeneSymbol table
>
>> gene_symbol_id
>> gene_id
>> symbol       -- the symbol (a gene can have more than
>>                 one symbol but only one is approved)
>> is_approved  -- boolean (point to evidence of why this is
>>                 the approved symbol, if MGI gene symbol for example)
>
> review_status_id (manually reviewed = 1, from external database
> (not reviewed) = 2, updated = 3)
> external_db_id (where this symbol was obtained from, or
> external_db_release_id)
>
>> 2. GeneFullName table:
>
>> gene_fullname_id
>> gene_id
>> name         -- the full name of the gene
>> is_approved  (a gene can only have one approved full name, point
>>               to evidence)
>
> review_status_id
> external_db_id (where this name was obtained from)
>
>> is_not gene_name_type_id -- points to a controlled vocab
>> of gene name types such as mentioned by Arnaud.
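A minimal sketch of the GeneSymbol table above, assuming SQLite through Python's sqlite3 module (the thread does not name a database). It shows one way to enforce "a gene can have more than one symbol but only one is approved": a partial unique index over gene_id restricted to approved rows. The index name, column types, CHECK constraints and the mapping of approved symbols to review_status_id 1 are illustrative assumptions, not part of the proposal. Because gene_id is NOT NULL with a foreign key, a symbol cannot be stored for a gene that is not in the db, which is the "directly associated with genes" behaviour Joan argues for.

```python
import sqlite3

# Sketch of the revised GeneSymbol table. SQLite, the column types, the
# CHECK constraints and the index name are illustrative assumptions.
SCHEMA = """
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY
);
CREATE TABLE gene_symbol (
    gene_symbol_id   INTEGER PRIMARY KEY,
    gene_id          INTEGER NOT NULL REFERENCES gene(gene_id),
    symbol           TEXT    NOT NULL,
    is_approved      INTEGER NOT NULL DEFAULT 0 CHECK (is_approved IN (0, 1)),
    -- 1 = manually reviewed, 2 = from external database (not reviewed), 3 = updated
    review_status_id INTEGER CHECK (review_status_id IN (1, 2, 3)),
    external_db_id   INTEGER  -- where this symbol was obtained from
);
-- Many symbols per gene, but at most one approved symbol per gene.
CREATE UNIQUE INDEX gene_symbol_one_approved
    ON gene_symbol (gene_id) WHERE is_approved = 1;
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_symbol(conn: sqlite3.Connection, gene_id: int, symbol: str,
               approved: bool = False) -> bool:
    """Insert one symbol; return False if a constraint rejects it."""
    # Illustrative choice: approved symbols are recorded as manually
    # reviewed (1), others as coming from an external database (2).
    review_status = 1 if approved else 2
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO gene_symbol (gene_id, symbol, is_approved, review_status_id)"
                " VALUES (?, ?, ?, ?)",
                (gene_id, symbol, int(approved), review_status),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    with conn:
        conn.execute("INSERT INTO gene (gene_id) VALUES (1)")
    print(add_symbol(conn, 1, "SYM1", approved=True))  # True
    print(add_symbol(conn, 1, "SYM1-ALT"))             # True: unapproved symbols are unlimited
    print(add_symbol(conn, 1, "SYM2", approved=True))  # False: a second approved symbol
    print(add_symbol(conn, 99, "ORPHAN"))              # False: gene 99 does not exist
```

The same partial-index pattern would carry over to GeneFullName's "only one approved full name" rule.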
>
> Arnaud, is is_not necessary if, in your case, is_approved is
> changed from one gene symbol to another, with the addition of evidence
> of why this was done?
> Also, what are the controlled vocabulary types?
>
> Anyway, I am inclined to think that a symbol and a full name of the
> gene should have the gene_id referenced in the table.
>
> Joan
>
>> Jonathan Crabtree wrote:
>>
>>> Steve Fischer wrote:
>>>
>>>> Right now in GUS, we have a bunch of tables and attributes that
>>>> relate to gene symbols, names and aliases:
>>>>
>>>> Dots::Gene.name
>>>> Dots::Gene.gene_symbol
>>>> Dots::GeneAlias
>>>> Sres::DbRef.gene_symbol (this is pretty clearly a hack. DbRef is
>>>> intended to store references to external database entries. It is
>>>> hackish to encode in the schema that we assume that such entries
>>>> are gene records. They could easily be proteins or journals,
>>>> whatever.)
>>>
>>> Yes, this is definitely a hack; I added some columns to the DbRef table
>>> because I wanted to store 2-3 specific pieces of information for MGI and
>>> GeneCards entries, without creating another table. However, I disagree
>>> that I "encoded" in the schema the assumption that these DbRef entries
>>> are gene records; I think if you look more closely you will see that all
>>> of the newly-added columns (gene_symbol, chromosome, centimorgans) are
>>> NULLable. Therefore the only assumption I am making is that one or more
>>> of these columns *may* be applicable to certain DbRefs.
>>>
>>>> 1. Introduce a GeneName table:
>>>> GeneName.gene_name_id
>>>> GeneName.name   -- the full name
>>>> GeneName.symbol -- the symbol
>>>>
>>>> 2. Introduce a GeneSynonym table:
>>>> GeneSynonym.gene_name_id -- the GeneName it is a synonym for
>>>> GeneSynonym.name         -- the full name of the synonym
>>>> GeneSynonym.symbol       -- the symbol
>>>
>>> Arnaud's point that a gene may have names but no approved name is a
>>> good one.
>>> It suggests that GeneSynonym should reference Gene, not GeneName.
>>> We might also consider renaming "GeneName" to "ApprovedGeneName" and
>>> "GeneSynonym" to "GeneName". Arnaud's second point, that there are
>>> potentially several different categories of names, suggests that we
>>> follow the example of the TaxonName table, and add a 'name_class' column
>>> to GeneSynonym. (This could also be a controlled vocabulary.) Then I
>>> think the only remaining question is whether we are sure that the only
>>> kinds of approved names we will ever have are "gene name" and "gene
>>> symbol".
>>>
>>> Jonathan
>>>
>>> -------------------------------------------------------
>>> Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
>>> The only event dedicated to issues related to Linux enterprise solutions
>>> www.enterpriselinuxforum.com
>>>
>>> _______________________________________________
>>> Gusdev-gusdev mailing list
>>> Gus...@li...
>>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
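A minimal sketch of the two designs Steve contrasts at the top of this message, assuming SQLite through Python's sqlite3 module; the table names gene_name and gene_name_assignment are hypothetical, not GUS tables. In the controlled-vocab design, a name row needs no gene and two genes may tentatively claim the same name through the association table. In the direct design, every name row carries a NOT NULL gene_id, while name and symbol are deliberately not alternate keys, so the same values may repeat across rows.

```python
import sqlite3

# Hypothetical table names; SQLite stands in for whatever RDBMS is used.
CONTROLLED_VOCAB = """
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY);
-- Names exist on their own, with no reference to a gene.
CREATE TABLE gene_name (
    gene_name_id INTEGER PRIMARY KEY,
    name   TEXT,
    symbol TEXT
);
-- Associating a name with a gene is a separate (possibly tentative) annotation.
CREATE TABLE gene_name_assignment (
    gene_id      INTEGER NOT NULL REFERENCES gene(gene_id),
    gene_name_id INTEGER NOT NULL REFERENCES gene_name(gene_name_id),
    PRIMARY KEY (gene_id, gene_name_id)
);
"""

DIRECT = """
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY);
-- Each row belongs to exactly one gene; name and symbol are NOT alternate
-- keys, so the same values may appear on several rows.
CREATE TABLE gene_name (
    gene_name_id INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    name   TEXT,
    symbol TEXT
);
"""


def connect(schema: str) -> sqlite3.Connection:
    """Create an in-memory database for one design, with foreign keys on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(schema)
    return conn


def try_exec(conn: sqlite3.Connection, sql: str) -> bool:
    """Run one statement in its own transaction; report whether it was accepted."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


def demo() -> dict:
    results = {}

    cv = connect(CONTROLLED_VOCAB)
    results["cv_name_without_gene"] = try_exec(
        cv, "INSERT INTO gene_name (gene_name_id, name, symbol) VALUES (1, 'example name', 'EX1')")
    try_exec(cv, "INSERT INTO gene (gene_id) VALUES (10), (20)")
    results["cv_two_genes_claim_name"] = (
        try_exec(cv, "INSERT INTO gene_name_assignment VALUES (10, 1)")
        and try_exec(cv, "INSERT INTO gene_name_assignment VALUES (20, 1)"))

    d = connect(DIRECT)
    results["direct_name_without_gene"] = try_exec(
        d, "INSERT INTO gene_name (gene_id, name, symbol) VALUES (NULL, 'example name', 'EX1')")
    try_exec(d, "INSERT INTO gene (gene_id) VALUES (10), (20)")
    results["direct_same_name_on_two_rows"] = (
        try_exec(d, "INSERT INTO gene_name (gene_id, name, symbol) VALUES (10, 'example name', 'EX1')")
        and try_exec(d, "INSERT INTO gene_name (gene_id, name, symbol) VALUES (20, 'example name', 'EX1')"))
    return results


if __name__ == "__main__":
    for key, accepted in demo().items():
        print(key, accepted)
```

The trade-off is the one Steve describes: the direct design needs one table fewer, while the association table is only worth having if names must be loadable before (or independently of) the genes they will annotate.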