From: Steve F. <sfi...@pc...> - 2003-05-05 15:11:31
|
folks-

right now in GUS, we have a bunch of tables and attributes that relate to gene symbols, names and aliases:

  Dots::Gene.name
  Dots::Gene.gene_symbol
  Dots::GeneAlias
  Sres::DbRef.gene_symbol (this is pretty clearly a hack. DbRef is intended to store references to external database entries. it is hackish to encode in the schema that we assume that such entries are gene records. they could easily be proteins or journals, whatever)

This schema is being used by the DoTS project to hold both automated assignments of gene_symbol (Sres::DbRef) and manual assignments. The problem for the DoTS project is that these disparate ways of making assignments are not managed as a coherent whole. The manual and automated assignments are not queried together.

I am thinking that we should consider a different approach, one modeled on how we store GO assignments. It seems that gene symbols and GO terms are very similar. They are both amenable to controlled vocabs, and are both assigned by automated and manual operations. This pattern may apply to other types of annotation as well.

1. introduce a GeneName table:
   GeneName.gene_name_id
   GeneName.name   -- the full name
   GeneName.symbol -- the symbol

2. introduce a GeneSynonym table:
   GeneSynonym.gene_name_id -- the GeneName it is a synonym for
   GeneSynonym.name         -- the full name of the synonym
   GeneSynonym.symbol       -- the symbol

these tables are treated as controlled vocabularies, downloaded from sites such as HUGO and MGI.

3. introduce a GeneNameAssociation table -- a mapping between Gene and GeneName (better name for this??)
   GeneNameAssociation.gene_id
   GeneNameAssociation.gene_name_id
   GeneNameAssociation.review_status_id
   GeneNameAssociation.is_not
   probably adopt here an instance and evidence mechanism similar to the GO association.

note that this implies a m-m (many-to-many) relationship between gene and gene name. while this might not be true in the ideal sense, it may well be true for tentative data, which is what we often have. so, this model accepts that unfortunate fact, and does its best to preserve as much info as we can. |
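For readers who want to see the proposal in concrete form, here is a minimal sketch of the three tables using Python's built-in sqlite3 module. The table and column names come from the message above; everything else is an assumption made for illustration: the column types, the stub Gene table, the reading of review_status_id (0 = automated, 1 = curated), and the sample rows (the names are borrowed from examples given elsewhere in this thread and assigned to one gene purely to show the many-to-many case).

```python
import sqlite3

# Hypothetical rendering of the three proposed tables. Column types, the
# Gene stub, and the meaning of review_status_id values are assumptions.
SCHEMA = """
CREATE TABLE Gene (gene_id INTEGER PRIMARY KEY);
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY,
    name   TEXT,              -- the full name
    symbol TEXT               -- the symbol
);
CREATE TABLE GeneSynonym (
    gene_name_id INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    name   TEXT,              -- the full name of the synonym
    symbol TEXT               -- the symbol
);
CREATE TABLE GeneNameAssociation (
    gene_id          INTEGER NOT NULL REFERENCES Gene(gene_id),
    gene_name_id     INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    review_status_id INTEGER NOT NULL,   -- assumed: 0 = automated, 1 = curated
    is_not           INTEGER NOT NULL DEFAULT 0
);
"""


def build_demo():
    """Create an in-memory database with one gene and two tentative names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Gene VALUES (1)")
    conn.executemany(
        "INSERT INTO GeneName VALUES (?, ?, ?)",
        [(10, "frizzled homolog 4 (Drosophila)", "Fzd4"),
         (11, "Breast Cancer 1", "BRCA1")],
    )
    conn.executemany(
        "INSERT INTO GeneNameAssociation VALUES (?, ?, ?, ?)",
        [(1, 10, 1, 0),    # curated assignment
         (1, 11, 0, 0)],   # automated assignment
    )
    return conn


def names_for_gene(conn, gene_id):
    """Return (symbol, review_status_id) for every association of a gene,
    so manual and automated assignments come back from a single query."""
    rows = conn.execute(
        """SELECT n.symbol, a.review_status_id
           FROM GeneNameAssociation a
           JOIN GeneName n USING (gene_name_id)
           WHERE a.gene_id = ?
           ORDER BY n.symbol""",
        (gene_id,),
    )
    return rows.fetchall()
```

Calling `names_for_gene(build_demo(), 1)` returns both associations from one table, which is the "queried together" property the message asks for.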
From: Angel P. <an...@pc...> - 2003-05-05 15:30:04
|
Steve Fischer wrote:

> folks-
> 1. introduce a GeneName table:
>    GeneName.gene_name_id
>    GeneName.name   -- the full name
>    GeneName.symbol -- the symbol
>
> 2. introduce a GeneSynonym table:
>    GeneSynonym.gene_name_id -- the GeneName it is a synonym for
>    GeneSynonym.name         -- the full name of the synonym
>    GeneSynonym.symbol       -- the symbol
> these tables are treated as controlled vocabularies, downloaded from
> sites such as HUGO and MGI.

Why do you want to separate the synonyms? It implies that the GeneName table has an "approved" name, and only one approved name, but approved by whom, and what about alternate sources of information? Also, there should be an ExternalDatabaseRelease FK reference here. I would store all names in a single table and handle the 'approved' names either by a query to the ExternalDBRel (if you always prefer one authority to others) or in the GeneNameAssociation table with a bit column.

> 3. introduce a GeneNameAssociation table -- a mapping between Gene and
> GeneName (better name for this??)
>    GeneNameAssociation.gene_id
>    GeneNameAssociation.gene_name_id
>    GeneNameAssociation.review_status_id
>    GeneNameAssociation.is_not

Why "is_not"? Is this a hold-over from GO terms? I don't see how it applies to GeneName.

Angel |
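The single-table alternative described in this reply can be sketched as follows, again with Python's sqlite3. This is only one reading of the suggestion: the `source` column on ExternalDatabaseRelease, the `is_approved` column name for the "bit column", and the sample rows (including the made-up alias) are all assumptions for illustration and are not taken from the GUS schema.

```python
import sqlite3

# One table holds every name; "approved" is decided either by the source
# release (option 1) or by a flag on the association row (option 2).
# The 'source' and 'is_approved' column names are hypothetical.
SCHEMA = """
CREATE TABLE ExternalDatabaseRelease (
    external_database_release_id INTEGER PRIMARY KEY,
    source TEXT NOT NULL
);
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY,
    name   TEXT,
    symbol TEXT,
    external_database_release_id INTEGER
        REFERENCES ExternalDatabaseRelease(external_database_release_id)
);
CREATE TABLE GeneNameAssociation (
    gene_id      INTEGER NOT NULL,
    gene_name_id INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    is_approved  INTEGER NOT NULL DEFAULT 0
);
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO ExternalDatabaseRelease VALUES (1, 'MGI')")
    conn.executemany(
        "INSERT INTO GeneName VALUES (?, ?, ?, ?)",
        [(10, "frizzled homolog 4 (Drosophila)", "Fzd4", 1),
         (11, None, "made-up-alias", None)],   # no authority behind this one
    )
    conn.executemany(
        "INSERT INTO GeneNameAssociation VALUES (?, ?, ?)",
        [(1, 10, 1), (1, 11, 0)],
    )
    return conn


def approved_symbols(conn, gene_id):
    """Bit-column option: trust the flag stored on the association."""
    rows = conn.execute(
        """SELECT n.symbol
           FROM GeneNameAssociation a
           JOIN GeneName n ON n.gene_name_id = a.gene_name_id
           WHERE a.gene_id = ? AND a.is_approved = 1
           ORDER BY n.symbol""",
        (gene_id,),
    )
    return [r[0] for r in rows]


def symbols_from_authority(conn, gene_id, source):
    """Authority option: keep only names loaded from one preferred source."""
    rows = conn.execute(
        """SELECT n.symbol
           FROM GeneNameAssociation a
           JOIN GeneName n ON n.gene_name_id = a.gene_name_id
           JOIN ExternalDatabaseRelease r
             ON r.external_database_release_id = n.external_database_release_id
           WHERE a.gene_id = ? AND r.source = ?
           ORDER BY n.symbol""",
        (gene_id, source),
    )
    return [r[0] for r in rows]
```

The trade-off in this sketch is that nothing in the schema stops an unapproved name from being the only name attached to a gene; whether that matters is a design question, not something the code settles.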
From: Steve F. <sfi...@pc...> - 2003-05-05 15:52:25
|
i separated the synonyms from the names because i understood that HUGO and MGI have standardized that information, ie, that there is an approved name.

about the ExternalDatabaseRelease FK, yes, that sounds good. i was only providing a sketch here.

the is_not is there for the same reason it is there for GO associations: the association was made automatically, but a curator says it "is not" true.

steve |
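The role of is_not described above can be illustrated with a small sqlite3 sketch. The column names follow the proposal; the review_status_id values and the sample rows are assumptions, and the real GO association tables in GUS may record this differently.

```python
import sqlite3


def build_demo():
    """One gene with three name associations in different review states."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE GeneNameAssociation (
               gene_id          INTEGER NOT NULL,
               gene_name_id     INTEGER NOT NULL,
               review_status_id INTEGER NOT NULL,  -- assumed: 0 = unreviewed, 1 = reviewed
               is_not           INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.executemany(
        "INSERT INTO GeneNameAssociation VALUES (?, ?, ?, ?)",
        [(1, 10, 0, 0),    # automated, not yet reviewed
         (1, 11, 1, 1),    # automated, then rejected by a curator
         (1, 12, 1, 0)],   # reviewed and accepted
    )
    return conn


def accepted_name_ids(conn, gene_id):
    """Associations that no curator has marked as 'is not' true."""
    rows = conn.execute(
        """SELECT gene_name_id FROM GeneNameAssociation
           WHERE gene_id = ? AND is_not = 0
           ORDER BY gene_name_id""",
        (gene_id,),
    )
    return [r[0] for r in rows]


def rejected_name_ids(conn, gene_id):
    """Associations a curator has explicitly vetoed."""
    rows = conn.execute(
        """SELECT gene_name_id FROM GeneNameAssociation
           WHERE gene_id = ? AND is_not = 1
           ORDER BY gene_name_id""",
        (gene_id,),
    )
    return [r[0] for r in rows]
```

Keeping the rejected row instead of deleting it presumably lets a later automated run see that the assignment was already reviewed and turned down.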
From: Chris S. <sto...@pc...> - 2003-05-05 17:06:05
|
On Mon, 5 May 2003, Steve Fischer wrote:

> i separated the synonyms from the names because i understood that HUGO
> and MGI have standardized that information, ie, that there is an
> approved name.

This is correct.

Chris

--
Chris Stoeckert, Ph.D.
Research Associate Professor, Dept. of Genetics
Center for Bioinformatics, University of Pennsylvania
423 Guardian Dr., Philadelphia, PA 19104
Ph: 215-573-4409 FAX: 215-573-3111 |
From: Angel P. <an...@pc...> - 2003-05-05 18:32:25
|
Chris Stoeckert wrote:

> On Mon, 5 May 2003, Steve Fischer wrote:
>
>> i separated the synonyms from the names because i understood that HUGO
>> and MGI have standardized that information, ie, that there is an
>> approved name.
>
> This is correct.

I was thinking more in terms of other organisms outside of human and mouse. But in talking with Steve directly, I have come around to his way of thinking. Basically, we don't want assignments of arbitrary gene names. If the synonym is in the same table as the approved name, then the possibility exists for someone to assign the synonym as the sole gene name. This may happen anyway when there is no naming authority for the particular organism being annotated, but at least you are forced to explicitly graduate synonyms to an official gene name, as far as GUS is concerned.

Angel |
From: Arnaud K. <ax...@sa...> - 2003-05-08 09:30:10
|
Hi Steve

What about genes which have synonyms but don't have an approved primary name yet?

See below the complete list of gene names we are using. This list doesn't only include the primary name and its synonyms which are, as far as I understood, inferred from the function of the gene. It also includes systematic names, assigned by the sequencing centres.

Arnaud

Additional qualifiers to be used in place of /gene for GeneDB purposes

* /systematic_id - final systematic name for when chromosome is finished or stable sequence is submitted; will be title for gene page in absence of standard name. Could be /locus_tag, the EMBL equivalent (to be discussed??)
* /temporary_systematic_id - for temporary systematic name used during projects where sequence is unfinished, i.e. temporary name for the shotgun sequences.
* /previous_systematic_id - for systematic names no longer in use.
* /synonym - used for other gene names still in use and to be displayed on the gene page
* /obsolete_name - redundant gene names that can be queried but are not visible on gene page, e.g. errors
* /primary_name - for published or agreed unique user friendly gene name, following the convention set out for kinetoplastids; will be the title for gene page. NB. this is an EMBL-compliant qualifier so it should be used "to give full gene name, but use /gene to give gene symbol".
* /reserved_name - pre publication names that will, presumably, become the standard_name |
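The qualifier list above reads like a small controlled vocabulary with display rules attached. A rough Python sketch of those rules follows; the enum members mirror the qualifiers, while the function names, the (class, value) pair representation, and the sample names are invented for illustration. It implements only the rules the list states: primary_name titles the gene page, systematic_id is the fallback, synonyms are displayed, and obsolete names stay queryable but hidden.

```python
from enum import Enum


class NameClass(Enum):
    """Name categories taken from the GeneDB qualifier list."""
    SYSTEMATIC_ID = "systematic_id"
    TEMPORARY_SYSTEMATIC_ID = "temporary_systematic_id"
    PREVIOUS_SYSTEMATIC_ID = "previous_systematic_id"
    SYNONYM = "synonym"
    OBSOLETE_NAME = "obsolete_name"
    PRIMARY_NAME = "primary_name"
    RESERVED_NAME = "reserved_name"


def page_title(names):
    """Pick the gene page title from a list of (NameClass, str) pairs.

    The primary name wins; otherwise fall back to the systematic id.
    Returns None if the gene has neither.
    """
    for wanted in (NameClass.PRIMARY_NAME, NameClass.SYSTEMATIC_ID):
        for name_class, value in names:
            if name_class is wanted:
                return value
    return None


def displayed_synonyms(names):
    """Names shown on the gene page as synonyms (obsolete ones are hidden)."""
    return [value for name_class, value in names
            if name_class is NameClass.SYNONYM]


def matches_query(names, term):
    """Every name class is searchable, including obsolete names."""
    return any(value == term for _, value in names)
```

A gene with a systematic id and synonyms but no primary name still gets a title here, which is the case raised in the question at the top of the message.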
From: Arnaud K. <ax...@sa...> - 2003-05-08 10:08:40
|
Steve

What do you call a full gene name and what is the gene symbol?

cheers
Arnaud |
From: Steve F. <st...@pc...> - 2003-05-09 01:13:48
|
the full gene name is something like (i'm making this up) "Breast Cancer 1" and the symbol is "BRCA1"

steve |
From: Jonathan C. <cra...@pc...> - 2003-05-14 16:11:21
|
Steve Fischer wrote:

> right now in GUS, we have a bunch of tables and attributes that relate to
> gene symbols, names and aliases:
>
> Dots::Gene.name
> Dots::Gene.gene_symbol
> Dots::GeneAlias
> Sres::DbRef.gene_symbol (this is pretty clearly a hack. DbRef is
> intended to store references to external database entries. it is
> hackish to encode in the schema that we assume that such entries are
> gene records. they could easily be proteins or journals, whatever)

Yes, this is definitely a hack; I added some columns to the DbRef table because I wanted to store 2-3 specific pieces of information for MGI and GeneCards entries, without creating another table. However, I disagree that I "encoded" in the schema the assumption that these DbRef entries are gene records; I think if you look more closely you will see that all of the newly-added columns (gene_symbol, chromosome, centimorgans) are NULLable. Therefore the only assumption I am making is that one or more of these columns *may* be applicable to certain DbRefs.

> 1. introduce a GeneName table:
>    GeneName.gene_name_id
>    GeneName.name   -- the full name
>    GeneName.symbol -- the symbol
>
> 2. introduce a GeneSynonym table:
>    GeneSynonym.gene_name_id -- the GeneName it is a synonym for
>    GeneSynonym.name         -- the full name of the synonym
>    GeneSynonym.symbol       -- the symbol

Arnaud's point that a gene may have names, but no approved name, is a good one. It suggests that GeneSynonym should reference Gene, not GeneName. We might also consider renaming "GeneName" to "ApprovedGeneName" and "GeneSynonym" to "GeneName".

Arnaud's second point, that there are potentially several different categories of names, suggests that we follow the example of the TaxonName table, and add a 'name_class' column to GeneSynonym. (This could also be a controlled vocabulary.) Then I think the only remaining question is whether we are sure that the only kinds of approved names we will ever have are "gene name" and "gene symbol".

Jonathan |
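A minimal sqlite3 sketch of the variant suggested in this message, where the renamed GeneName table (the one originally called GeneSynonym) points at Gene directly and carries a name_class column. The column types, the free-text name_class values, and the sample names are assumptions; ApprovedGeneName and its link to Gene are left out because the message does not spell them out.

```python
import sqlite3

# GeneName here is the table originally proposed as GeneSynonym: it now
# references Gene directly, so a gene can carry names without having any
# approved name. Storing name_class as free text is an assumption; it
# could equally be a foreign key into a controlled vocabulary.
SCHEMA = """
CREATE TABLE Gene (gene_id INTEGER PRIMARY KEY);
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY,
    gene_id      INTEGER NOT NULL REFERENCES Gene(gene_id),
    name         TEXT NOT NULL,
    name_class   TEXT NOT NULL
);
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Gene VALUES (2)")
    conn.executemany(
        "INSERT INTO GeneName (gene_id, name, name_class) VALUES (?, ?, ?)",
        [(2, "SYS0002", "systematic_id"),   # made-up example names
         (2, "abc1", "synonym")],
    )
    return conn


def names_by_class(conn, gene_id):
    """Group a gene's names by class, e.g. {'synonym': ['abc1'], ...}."""
    grouped = {}
    rows = conn.execute(
        """SELECT name_class, name FROM GeneName
           WHERE gene_id = ?
           ORDER BY name_class, name""",
        (gene_id,),
    )
    for name_class, name in rows:
        grouped.setdefault(name_class, []).append(name)
    return grouped
```

In this layout a name belongs to exactly one gene, so the same synonym claimed by two genes would need two rows; that differs from the controlled-vocabulary reading of the original proposal.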
From: Joan M. <ma...@pc...> - 2003-05-14 17:20:08
|
Hi all,

I thought the point of this discussion was to figure out how to integrate the gene information which we get from MGI/GeneCards sequence mappings into the tables which contain (or were created to contain) manual gene annotation assignments (although we may want to recreate these tables for this and/or if PSU has certain needs).

BTW, although a gene symbol is approved it can also change (MGI versions for instance, and they also have -pending), so this is another case where changes can occur.

As it stands now, in the gene table we have the attribute gene_symbol, where the approved human or mouse gene symbol is written for each gene when added by the annotator. Also, in the gene table there is name, where I envisioned the approved gene name would be written using the new annotation tool.

approved gene_symbol = Fzd4
approved gene name = frizzled homolog 4 (Drosophila)

https://www.cbil.upenn.edu/cgi-bin/dotsgenes-curator/schemaBrowser.pl?db=GUSdev&table=DoTS::Gene&path=DoTS::Gene

Now for the current dots.GeneSynonym table, the annotator can add gene symbol synonyms for the gene and this is where they are written.

https://www.cbil.upenn.edu/cgi-bin/dotsgenes-curator/schemaBrowser.pl?db=GUSdev&table=DoTS::GeneSynonym&path=DoTS::GeneSynonym

I created GeneAlias for other (not approved) gene names for a gene, to be used by the new annotation tool.

https://www.cbil.upenn.edu/cgi-bin/dotsgenes-curator/schemaBrowser.pl?db=GUSdev&table=DoTS::GeneAlias&path=DoTS::GeneAlias

Genes can have gene symbol synonyms and also gene name aliases. It is not necessarily the case that every gene symbol synonym has a gene name alias which corresponds to it (as in the approved case above), or vice versa.

Joan |
From: Steve F. <st...@pc...> - 2003-05-14 17:56:59
|
Joan-

can you explain what a gene alias is as opposed to a gene synonym?

thanks,
steve |
From: Joan M. <ma...@pc...> - 2003-05-14 18:17:45
|
Steve,

Alias is another gene name for a gene. Synonym is another gene_symbol for a gene.

Perhaps to make it clearer it should be gene_symbol alternative (i.e. not approved) instead of synonym, and gene name alternative instead of alias.

Joan |
From: Steve F. <st...@pc...> - 2003-05-14 18:09:51
|
The model I was going for is a controlled vocab, i.e., that gene names, symbols and synonyms are knowable without reference to a Gene object. The act of associating a Name with a Gene is "annotating" the Gene, and may be tentative. And there may be more than one Gene that tentatively lays claim to that name (e.g. across species?). IF that is the model we are going for, then I don't think I agree that synonyms should reference a gene directly.

We have seen a similar problem with "reference" sequences, i.e., choosing one of a set to be representative.

Here is how I think it can work (my original with modifications as per this discussion, and pending Joan's explanation of aliases). The GeneName has a boolean 'approved' attribute. If it is set, then that is the approved name. Otherwise, the GeneName is equal to its synonyms, but has been (arbitrarily) chosen as the representative. (The other way to do this is to lose GeneName.is_approved and allow GeneName.name and GeneName.symbol to be nullable, indicating that there is no approved name yet.)

1. GeneName table:
     gene_name_id
     name -- the full name
     symbol -- the symbol
     is_approved -- boolean

2. GeneSynonym table:
     gene_synonym_id
     gene_name_id -- the GeneName it is a synonym for
     name -- the full name of the synonym
     symbol -- the symbol

3. GeneNameAssociation table -- a mapping between Gene and GeneName (better name for this??)
     gene_id
     gene_name_id
     review_status_id
     is_not
     gene_name_type_id -- points to a controlled vocab of gene name types such as mentioned by Arnaud.

   Probably adopt here an instance and evidence mechanism similar to GO association.
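The three tables proposed in the message above can be sketched as follows. This is only a minimal illustration, not the GUS schema: it assumes SQLite through Python's standard `sqlite3` module, and the column types, the stub `Gene` table, the helper names (`open_vocab_db`, `genes_claiming`, `demo`) and the sample values (`EX1`, `OLDEX`, gene ids 10 and 20) are all hypothetical. The instance and evidence mechanism mentioned in the message is left out.

```python
import sqlite3

# Column names follow the message; types and constraints are assumptions.
VOCAB_SCHEMA = """
CREATE TABLE Gene (gene_id INTEGER PRIMARY KEY);  -- stub standing in for the real Gene table
CREATE TABLE GeneName (
    gene_name_id INTEGER PRIMARY KEY,
    name         TEXT,                       -- the full name
    symbol       TEXT,                       -- the symbol
    is_approved  INTEGER NOT NULL DEFAULT 0  -- boolean
);
CREATE TABLE GeneSynonym (
    gene_synonym_id INTEGER PRIMARY KEY,
    gene_name_id    INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    name            TEXT,
    symbol          TEXT
);
CREATE TABLE GeneNameAssociation (
    gene_id           INTEGER NOT NULL REFERENCES Gene(gene_id),
    gene_name_id      INTEGER NOT NULL REFERENCES GeneName(gene_name_id),
    review_status_id  INTEGER,
    is_not            INTEGER NOT NULL DEFAULT 0,
    gene_name_type_id INTEGER
);
"""


def open_vocab_db():
    """Create an in-memory database holding the sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(VOCAB_SCHEMA)
    return conn


def genes_claiming(conn, symbol):
    """Return the gene_ids positively associated with a symbol or one of its synonyms."""
    rows = conn.execute(
        """
        SELECT DISTINCT a.gene_id
        FROM GeneNameAssociation a
        JOIN GeneName n ON n.gene_name_id = a.gene_name_id
        LEFT JOIN GeneSynonym s ON s.gene_name_id = n.gene_name_id
        WHERE a.is_not = 0 AND (n.symbol = ? OR s.symbol = ?)
        ORDER BY a.gene_id
        """,
        (symbol, symbol),
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Load one vocabulary entry, then let two genes tentatively claim it."""
    conn = open_vocab_db()
    conn.execute("INSERT INTO GeneName VALUES (1, 'example gene one', 'EX1', 1)")
    conn.execute("INSERT INTO GeneSynonym VALUES (1, 1, 'old example name', 'OLDEX')")
    conn.executemany("INSERT INTO Gene VALUES (?)", [(10,), (20,)])
    # Both genes point at the same GeneName row; only review_status_id differs.
    conn.executemany(
        "INSERT INTO GeneNameAssociation VALUES (?, 1, ?, 0, NULL)",
        [(10, 1), (20, 2)],
    )
    return genes_claiming(conn, "OLDEX")
```

Calling `demo()` returns both gene ids, which is the many-to-many case the proposal allows for tentative data: the name and its synonym live in the vocabulary tables, and each claim on them is a separate association row.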
From: Joan M. <ma...@pc...> - 2003-05-15 15:23:58
|
Hi all,

Steve Fischer wrote:

> The model I was going for is a controlled vocab, ie, that gene names,
> symbols and synonyms are knowable without reference to a Gene object.
> The act of associating a Name with a Gene is "annotating" the Gene,
> and may be tentative. And, there may be more than one Gene that
> tentatively lays claim to that name (eg across species?). IF that is
> the model we are going for, then i don't think i agree that synonyms
> should reference a gene directly.

The effort by MGI and HUGO, at least, is to assign a unique approved gene symbol and gene name to each gene. They research a gene name or symbol prior to its approved assignment to a gene.

A non-approved gene name or symbol may possibly be assigned to more than one gene. I am inclined to say that even though this may happen, we should still have the gene_id referenced. By calling it a gene synonym or alias, we are saying that it is an alternative designation for the gene.

> We have seen a similar problem with "reference" sequence, ie, chosing
> one of a set to be representative.

This is true, but we are phasing this out by creating a gene model/sequence instead of choosing a reference RNA.

> Here is how i think it can work (my original w/ modfications as per
> this discussion and pending Joan's explanation of aliases). The
> GeneName has a boolean 'approved' attribute. If it is set, then that
> is the approved name. Otherwise, the GeneName is equal to its
> synonyms, but has been (arbitrarily) chosen as the representative.
> (The other way to do this is to lose GeneName.is_approved and allow
> GeneName.name and GeneName.symbol be nullable, indicating that there
> is no approved name yet).

I have made some changes in the text below and removed a table:

1. GeneSymbol table:
     gene_symbol_id
     gene_id
     symbol -- the symbol (a gene can have more than one symbol but only one is approved)
     is_approved -- boolean (point to evidence of why this is the approved symbol, if MGI gene symbol for example)
     review_status_id -- manually reviewed = 1, from external database (not reviewed) = 2, updated = 3
     external_db_id -- where this symbol was obtained from (or external_db_release_id)

2. GeneFullName table:
     gene_fullname_id
     gene_id
     name -- the full name of the gene
     is_approved -- a gene can only have one approved full name (point to evidence)
     review_status_id
     external_db_id -- where this name was obtained from

> is_not
> gene_name_type_id -- points to a controlled vocab of gene
> name types such as mentioned by Arnaud.

Arnaud, is is_not necessary if, in your case, is_approved is changed from one gene symbol to another, with the addition of evidence of why this was done? Also, what are the controlled vocabulary types?

Anyway, I am inclined to think that a symbol and a full name of the gene should each have the gene_id referenced in their table.

Joan
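The two gene-referencing tables described in the message above, together with the rule that a gene has at most one approved symbol and one approved full name, might look like the sketch below. It is an illustration under assumptions, not an agreed design: it uses SQLite through Python's `sqlite3` module, the column types and the stub `Gene` table are guesses, and enforcing the "only one approved" rule with a partial unique index is a choice made here, since the thread does not say how that rule would be enforced. The helper names and sample symbols are hypothetical.

```python
import sqlite3

# Column names follow the message; types, indexes and the Gene stub are assumptions.
DIRECT_SCHEMA = """
CREATE TABLE Gene (gene_id INTEGER PRIMARY KEY);
CREATE TABLE GeneSymbol (
    gene_symbol_id   INTEGER PRIMARY KEY,
    gene_id          INTEGER NOT NULL REFERENCES Gene(gene_id),
    symbol           TEXT NOT NULL,
    is_approved      INTEGER NOT NULL DEFAULT 0,
    review_status_id INTEGER,
    external_db_id   INTEGER
);
CREATE TABLE GeneFullName (
    gene_fullname_id INTEGER PRIMARY KEY,
    gene_id          INTEGER NOT NULL REFERENCES Gene(gene_id),
    name             TEXT NOT NULL,
    is_approved      INTEGER NOT NULL DEFAULT 0,
    review_status_id INTEGER,
    external_db_id   INTEGER
);
-- Partial unique indexes: at most one approved row per gene in each table.
CREATE UNIQUE INDEX one_approved_symbol
    ON GeneSymbol(gene_id) WHERE is_approved = 1;
CREATE UNIQUE INDEX one_approved_fullname
    ON GeneFullName(gene_id) WHERE is_approved = 1;
"""


def open_direct_db():
    """Create an in-memory database holding the sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DIRECT_SCHEMA)
    return conn


def add_symbol(conn, gene_id, symbol, is_approved=False):
    """Insert a symbol for a gene.

    Returns True if the row was stored and False if a constraint rejected it,
    for example a second approved symbol for the same gene.
    """
    try:
        conn.execute(
            "INSERT INTO GeneSymbol (gene_id, symbol, is_approved) VALUES (?, ?, ?)",
            (gene_id, symbol, int(is_approved)),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With a gene row in place, `add_symbol(conn, 1, "EX1", True)` and `add_symbol(conn, 1, "OLDEX")` both succeed, while a further `add_symbol(conn, 1, "EX1B", True)` is rejected by the index. Changing the approved symbol would then mean clearing `is_approved` on the old row before setting it on the new one.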
From: Steve F. <st...@pc...> - 2003-05-15 17:42:18
|
I have had a long talk with Joan about this. I think it is a slightly complicated issue.

Joan's approach and mine agree in that they make gene names, synonyms and aliases a coherent whole. They differ in that mine treats gene names as a controlled vocabulary which can exist without reference to our genes (and are associated with them through an association table), while Joan's approach understands them to be values directly associated with genes, having no life in our db otherwise.

For example, on my approach a gene name can be loaded into the db even if the db does not contain the relevant gene, or the relevant gene has not been figured out yet. I.e., it is truly a controlled vocab. On Joan's approach, a gene name belongs to exactly one gene. (However, the gene name table might hold multiple rows which contain the same name value and/or symbol value. That is, the name and symbol are specifically not alternate keys.)

Joan's approach has the advantage of being simpler (at least fewer tables), and mine is only necessary if we do indeed want to treat the gene names as a controlled vocab.

steve
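The difference summarized in the message above can be made concrete with two stripped-down tables. This is a hedged sketch only: the table names `VocabGeneName` and `DirectGeneSymbol`, the function `compare_models`, and the sample symbol `EX1` are hypothetical, and SQLite via Python's `sqlite3` module is assumed purely for illustration.

```python
import sqlite3


def compare_models():
    """Contrast the controlled-vocabulary model with the gene-referencing model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Controlled-vocab style: a name row carries no gene reference.
        CREATE TABLE VocabGeneName (
            gene_name_id INTEGER PRIMARY KEY,
            symbol       TEXT NOT NULL
        );
        -- Gene-referencing style: every row must belong to a gene,
        -- and symbol is deliberately not a unique (alternate) key.
        CREATE TABLE DirectGeneSymbol (
            gene_symbol_id INTEGER PRIMARY KEY,
            gene_id        INTEGER NOT NULL,
            symbol         TEXT NOT NULL
        );
        """
    )
    results = {}

    # A vocabulary entry can be loaded before any gene is known.
    conn.execute("INSERT INTO VocabGeneName (symbol) VALUES ('EX1')")
    results["vocab_name_without_gene"] = True

    # The same attempt fails when gene_id is required.
    try:
        conn.execute(
            "INSERT INTO DirectGeneSymbol (gene_id, symbol) VALUES (NULL, 'EX1')"
        )
        results["direct_name_without_gene"] = True
    except sqlite3.IntegrityError:
        results["direct_name_without_gene"] = False

    # The same symbol value may still appear in several rows, one per gene.
    conn.executemany(
        "INSERT INTO DirectGeneSymbol (gene_id, symbol) VALUES (?, 'EX1')",
        [(10,), (20,)],
    )
    results["direct_rows_for_EX1"] = conn.execute(
        "SELECT COUNT(*) FROM DirectGeneSymbol WHERE symbol = 'EX1'"
    ).fetchone()[0]
    return results
```

In the gene-referencing table a shared symbol is stored once per gene, whereas in the vocabulary model it would be stored once and shared through association rows, which is where the extra table in that design comes from.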