From: Joan M. <ma...@pc...> - 2003-01-23 16:57:59
|
Dear Jonathan,

(OK, although I'd like to get something settled soon, because I'm on a very tight schedule. Can you be more specific about the adverse impact you think this change will have on our existing code? In particular, why does having 5 terms in the controlled vocabulary make things any more difficult than having 3? It seems that in either case code that previously relied on a single bit (manually_reviewed) will now have to query a controlled vocabulary instead.)

I am not against having a more extensive controlled vocabulary. (and by the way I am pretty busy myself, and this is not how I planned to spend my morning).

(manually_reviewed -> ReviewStatus
0 unreviewed
1 reviewed, correct

I believe this correctly represents our existing semantics, *except* for entries that have been manually created (which are simply marked as manually_reviewed = 1). Doing this we don't lose any information and could go back later and attempt to identify entries that had been manually created. Or we could later decide that we want to drop the "manually created" term completely, and could do so without affecting any of the migrated gusdev data.)

The above makes the most sense to me for now. Also, as defined (1 manually created Entry was created by hand; review is not needed) this does not make sense to me, why set a review status id?

Joan

Jonathan Crabtree wrote:

> Joan-
>
> Joan Mazzarelli wrote:
> > Hi Jonathan,
> >
> > I think we need to discuss this a bit more in terms of how this will impact other code we have
> > which recognizes this assignment, and it is not clear what type of entry has this set in GUS30
> > now (1 manually created Entry was created by hand; review is not needed.) for it to be taken.
> >
>
> OK, although I'd like to get something settled soon, because I'm on a very
> tight schedule. Can you be more specific about the adverse impact you think
> this change will have on our existing code? In particular, why does having
> 5 terms in the controlled vocabulary make things any more difficult than
> having 3? It seems that in either case code that previously relied on a
> single bit (manually_reviewed) will now have to query a controlled vocabulary
> instead.
>
> Your second point is a good one, which is that we may not have an easy way to
> determine which entries currently in gusdev should be assigned the ReviewStatus
> "manually created." I'm certainly open to dropping this term if people don't
> think it will be useful. Jonathan, can you tell us how you're using this term
> now? In the short term (i.e. later today), however, I would propose the
> following mapping (for entries currently in gusdev with a non-null
> manually_reviewed column):
>
> manually_reviewed -> ReviewStatus
> 0 unreviewed
> 1 reviewed, correct
>
> I believe this correctly represents our existing semantics, *except* for entries
> that have been manually created (which are simply marked as manually_reviewed = 1).
> Doing this we don't lose any information and could go back later and attempt to
> identify entries that had been manually created. Or we could later decide that
> we want to drop the "manually created" term completely, and could do so without
> affecting any of the migrated gusdev data.
>
> Jonathan
>
> -------------------------------------------------------
> This SF.NET email is sponsored by:
> SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
> http://www.vasoftware.com
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

--
Joan Mazzarelli
Computational Biology and Informatics Laboratory
Center for Bioinformatics
1429 Blockley Hall
University of Pennsylvania
Philadelphia, PA 19104
From: Arnaud K. <ax...@sa...> - 2003-01-23 16:55:00
|
Hi

Sorry for the delay. We've had some trouble getting gusdev emails.

I think the entry set looks fine. Two comments though:

* What about an extra "automatically created" entry, alongside the "manually created" one?
* Curators here have raised another point: they want to be able to track when a feature was last reviewed. By reviewed I mean checked, even if the review status is already set to "reviewed, correct". Is there any way of storing a "last_checked_date"? I'm thinking of curated similarity evidence. New searches would be run regularly, and a curator would want to check whether any new hit confirms or cancels a prediction.

Arnaud

Marie-Adele Rajandream wrote:

>-----Original Message-----
>From: gus...@li...
>[mailto:gus...@li...]On Behalf Of Jonathan Crabtree
>Sent: 23 January 2003 15:23
>To: gus...@li...
>Cc: Joan Mazzarelli
>Subject: Re: [Gusdev-gusdev] SRes.ReviewStatus
>
>On Thu, 23 Jan 2003, Joan Mazzarelli wrote:
>
>>In its most simplest vocabulary, I thought that the review_status_id would represent.
>>
>>never reviewed = 0
>>reviewed = 1
>>updated thus review status becomes = 2 (needs to be re-reviewed)
>
>Yes, that's right, although I think that Jonathan's addition ("manually
>created") is likely to be a useful one. Also, based on the feedback thus
>far, I think the consensus is to have a slightly more complex vocabulary
>than the (0,1,2) that we originally talked about. Here's the current
>proposal, based on Angel and Chris's feedback:
>
>0 unreviewed            Entry has never been manually reviewed.
>1 manually created      Entry was created by hand; review is not needed.
>2 reviewed, correct     Entry has been manually reviewed and is deemed to be correct.
>3 reviewed, incorrect   Entry has been manually reviewed and is deemed to be incorrect.
>4 updated               Entry has been updated since last being reviewed or manually created.
>
>The one thing that I don't like about this is that the names "reviewed,
>correct" and "reviewed, incorrect" are somewhat long. However, it will be
>possible to do an SQL 'like' query on the ReviewStatus table to find all
>of the reviewed entries (correct or incorrect.) By the way, the reason
>that I didn't use the original mapping of ids to terms (0 = unreviewed, 1
>= reviewed, 2 = updated) is that id 1 was already in use. Also, I had
>originally wanted to keep the categories sorted like so:
>
>1 manually created
>2 reviewed, correct
>3 reviewed, incorrect
>4 updated
>5 unreviewed
>
>Doing this one would be able to do range queries; all entries with
>review_status_id <= 2 would be manually reviewed and correct. All entries
>with review_status_id >= 3 would still require action of some sort.
>Anyway, I don't think it's worth the trouble to do this, and it also means
>that you potentially have to renumber the terms if and when more are
>added. Anyway, unless anyone has strong objections I'll probably
>implement the 5-term vocabulary described above sometime later today.
>
>Jonathan
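Arnaud's "last_checked_date" suggestion could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module: the `Feature` table, its columns, and the `record_check` helper are hypothetical names, not part of the GUS schema. The point it shows is that a check date can be stored independently of the review status, so re-checking an entry that is already "reviewed, correct" leaves its status untouched.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")

# Hypothetical table: a review status plus a separate "last checked" date.
conn.execute(
    "CREATE TABLE Feature ("
    " feature_id INTEGER PRIMARY KEY,"
    " review_status_id INTEGER NOT NULL,"
    " last_checked_date TEXT)"
)
# Status 2 = "reviewed, correct" in the 5-term proposal quoted above.
conn.execute("INSERT INTO Feature VALUES (1, 2, '2003-01-10')")


def record_check(conn, feature_id, checked_on):
    """Record that a curator re-checked a feature, without changing its status."""
    conn.execute(
        "UPDATE Feature SET last_checked_date = ? WHERE feature_id = ?",
        (checked_on.isoformat(), feature_id),
    )


record_check(conn, 1, date(2003, 1, 23))
row = conn.execute(
    "SELECT review_status_id, last_checked_date FROM Feature WHERE feature_id = 1"
).fetchone()
print(row)
```

Whether such a date belongs on each table or in a separate evidence/audit table is a design question the thread leaves open.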
From: Deborah F. P. <pi...@sn...> - 2003-01-23 16:35:25
|
On Thu, 23 Jan 2003, Joan Mazzarelli wrote:

IMHO: "manually created" doesn't seem as if it is a review status and this situation could be covered with "manually reviewed correct" with evidence being that it was manually created. I have to agree with Joan, that it may be safer to stick as closely to the existing manually_reviewed values as possible, 0=unreviewed and 1=manually reviewed correct and add 2=manually reviewed incorrect as well as 4=updated.

Debbie

> Hi Jonathan,
>
> I think we need to discuss this a bit more in terms of how this will impact other code we have
> which recognizes this assignment, and it is not clear what type of entry has this set in GUS30
> now (1 manually created Entry was created by hand; review is not needed.) for it to be taken.
>
> Joan
>
> Jonathan Crabtree wrote:
>
> > On Thu, 23 Jan 2003, Joan Mazzarelli wrote:
> >
> > > In its most simplest vocabulary, I thought that the review_status_id would represent.
> > >
> > > never reviewed = 0
> > > reviewed = 1
> > > updated thus review status becomes = 2 (needs to be re-reviewed)
> > >
> >
> > Yes, that's right, although I think that Jonathan's addition ("manually
> > created") is likely to be a useful one. Also, based on the feedback thus
> > far, I think the consensus is to have a slightly more complex vocabulary
> > than the (0,1,2) that we originally talked about. Here's the current
> > proposal, based on Angel and Chris's feedback:
> >
> > 0 unreviewed            Entry has never been manually reviewed.
> > 1 manually created      Entry was created by hand; review is not needed.
> > 2 reviewed, correct     Entry has been manually reviewed and is deemed to be correct.
> > 3 reviewed, incorrect   Entry has been manually reviewed and is deemed to be incorrect.
> > 4 updated               Entry has been updated since last being reviewed or manually created.
> >
> > The one thing that I don't like about this is that the names "reviewed,
> > correct" and "reviewed, incorrect" are somewhat long. However, it will be
> > possible to do an SQL 'like' query on the ReviewStatus table to find all
> > of the reviewed entries (correct or incorrect.) By the way, the reason
> > that I didn't use the original mapping of ids to terms (0 = unreviewed, 1
> > = reviewed, 2 = updated) is that id 1 was already in use. Also, I had
> > originally wanted to keep the categories sorted like so:
> >
> > 1 manually created
> > 2 reviewed, correct
> > 3 reviewed, incorrect
> > 4 updated
> > 5 unreviewed
> >
> > Doing this one would be able to do range queries; all entries with
> > review_status_id <= 2 would be manually reviewed and correct. All entries
> > with review_status_id >= 3 would still require action of some sort.
> > Anyway, I don't think it's worth the trouble to do this, and it also means
> > that you potentially have to renumber the terms if and when more are
> > added. Anyway, unless anyone has strong objections I'll probably
> > implement the 5-term vocabulary described above sometime later today.
> >
> > Jonathan
> >
> > -------------------------------------------------------
> > This SF.NET email is sponsored by:
> > SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
> > http://www.vasoftware.com
> > _______________________________________________
> > Gusdev-gusdev mailing list
> > Gus...@li...
> > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>
> --
> Joan Mazzarelli
> Computational Biology and Informatics Laboratory
> Center for Bioinformatics
> 1429 Blockley Hall
> University of Pennsylvania
> Philadelphia, PA 19104
>
> -------------------------------------------------------
> This SF.NET email is sponsored by:
> SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
> http://www.vasoftware.com
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
From: Jonathan C. <cra...@pc...> - 2003-01-23 16:15:19
|
Joan-

Joan Mazzarelli wrote:
> Hi Jonathan,
>
> I think we need to discuss this a bit more in terms of how this will impact other code we have
> which recognizes this assignment, and it is not clear what type of entry has this set in GUS30
> now (1 manually created Entry was created by hand; review is not needed.) for it to be taken.
>

OK, although I'd like to get something settled soon, because I'm on a very
tight schedule. Can you be more specific about the adverse impact you think
this change will have on our existing code? In particular, why does having
5 terms in the controlled vocabulary make things any more difficult than
having 3? It seems that in either case code that previously relied on a
single bit (manually_reviewed) will now have to query a controlled vocabulary
instead.

Your second point is a good one, which is that we may not have an easy way to
determine which entries currently in gusdev should be assigned the ReviewStatus
"manually created." I'm certainly open to dropping this term if people don't
think it will be useful. Jonathan, can you tell us how you're using this term
now? In the short term (i.e. later today), however, I would propose the
following mapping (for entries currently in gusdev with a non-null
manually_reviewed column):

manually_reviewed -> ReviewStatus
0                    unreviewed
1                    reviewed, correct

I believe this correctly represents our existing semantics, *except* for entries
that have been manually created (which are simply marked as manually_reviewed = 1).
Doing this we don't lose any information and could go back later and attempt to
identify entries that had been manually created. Or we could later decide that
we want to drop the "manually created" term completely, and could do so without
affecting any of the migrated gusdev data.

Jonathan
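The short-term mapping proposed in this message might look something like the sketch below. It uses Python's built-in sqlite3 module rather than the project's actual database, the `Entry` table is a hypothetical stand-in for any gusdev table carrying the old column, and the numeric ids (0 and 2) assume the 5-term numbering from the earlier proposal is the one adopted.

```python
import sqlite3

# Assumed ids from the 5-term proposal: 0 = unreviewed, 2 = "reviewed, correct".
BIT_TO_STATUS_ID = {0: 0, 1: 2}

conn = sqlite3.connect(":memory:")
# Hypothetical table holding both the old bit and the new status column.
conn.execute(
    "CREATE TABLE Entry ("
    " entry_id INTEGER PRIMARY KEY,"
    " manually_reviewed INTEGER,"
    " review_status_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Entry (entry_id, manually_reviewed) VALUES (?, ?)",
    [(1, 0), (2, 1), (3, None), (4, 1)],
)

# Only rows with a non-null manually_reviewed value are touched, because
# "manually_reviewed = ?" never matches NULL.
for bit, status_id in BIT_TO_STATUS_ID.items():
    conn.execute(
        "UPDATE Entry SET review_status_id = ? WHERE manually_reviewed = ?",
        (status_id, bit),
    )

migrated = conn.execute(
    "SELECT entry_id, review_status_id FROM Entry ORDER BY entry_id"
).fetchall()
print(migrated)
```

Because the old column is left in place, manually created entries could still be re-labelled later, which is the "we don't lose any information" point made above.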
From: Joan M. <ma...@pc...> - 2003-01-23 15:53:04
|
Hi Jonathan,

I think we need to discuss this a bit more in terms of how this will impact other code we have which recognizes this assignment, and it is not clear what type of entry has this set in GUS30 now (1 manually created Entry was created by hand; review is not needed.) for it to be taken.

Joan

Jonathan Crabtree wrote:

> On Thu, 23 Jan 2003, Joan Mazzarelli wrote:
>
> > In its most simplest vocabulary, I thought that the review_status_id would represent.
> >
> > never reviewed = 0
> > reviewed = 1
> > updated thus review status becomes = 2 (needs to be re-reviewed)
> >
>
> Yes, that's right, although I think that Jonathan's addition ("manually
> created") is likely to be a useful one. Also, based on the feedback thus
> far, I think the consensus is to have a slightly more complex vocabulary
> than the (0,1,2) that we originally talked about. Here's the current
> proposal, based on Angel and Chris's feedback:
>
> 0 unreviewed            Entry has never been manually reviewed.
> 1 manually created      Entry was created by hand; review is not needed.
> 2 reviewed, correct     Entry has been manually reviewed and is deemed to be correct.
> 3 reviewed, incorrect   Entry has been manually reviewed and is deemed to be incorrect.
> 4 updated               Entry has been updated since last being reviewed or manually created.
>
> The one thing that I don't like about this is that the names "reviewed,
> correct" and "reviewed, incorrect" are somewhat long. However, it will be
> possible to do an SQL 'like' query on the ReviewStatus table to find all
> of the reviewed entries (correct or incorrect.) By the way, the reason
> that I didn't use the original mapping of ids to terms (0 = unreviewed, 1
> = reviewed, 2 = updated) is that id 1 was already in use. Also, I had
> originally wanted to keep the categories sorted like so:
>
> 1 manually created
> 2 reviewed, correct
> 3 reviewed, incorrect
> 4 updated
> 5 unreviewed
>
> Doing this one would be able to do range queries; all entries with
> review_status_id <= 2 would be manually reviewed and correct. All entries
> with review_status_id >= 3 would still require action of some sort.
> Anyway, I don't think it's worth the trouble to do this, and it also means
> that you potentially have to renumber the terms if and when more are
> added. Anyway, unless anyone has strong objections I'll probably
> implement the 5-term vocabulary described above sometime later today.
>
> Jonathan
>
> -------------------------------------------------------
> This SF.NET email is sponsored by:
> SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
> http://www.vasoftware.com
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

--
Joan Mazzarelli
Computational Biology and Informatics Laboratory
Center for Bioinformatics
1429 Blockley Hall
University of Pennsylvania
Philadelphia, PA 19104
From: Jonathan C. <cra...@sn...> - 2003-01-23 15:24:04
|
On Thu, 23 Jan 2003, Joan Mazzarelli wrote:

> In its most simplest vocabulary, I thought that the review_status_id would represent.
>
> never reviewed = 0
> reviewed = 1
> updated thus review status becomes = 2 (needs to be re-reviewed)
>

Yes, that's right, although I think that Jonathan's addition ("manually
created") is likely to be a useful one. Also, based on the feedback thus
far, I think the consensus is to have a slightly more complex vocabulary
than the (0,1,2) that we originally talked about. Here's the current
proposal, based on Angel and Chris's feedback:

0 unreviewed            Entry has never been manually reviewed.
1 manually created      Entry was created by hand; review is not needed.
2 reviewed, correct     Entry has been manually reviewed and is deemed to be correct.
3 reviewed, incorrect   Entry has been manually reviewed and is deemed to be incorrect.
4 updated               Entry has been updated since last being reviewed or manually created.

The one thing that I don't like about this is that the names "reviewed,
correct" and "reviewed, incorrect" are somewhat long. However, it will be
possible to do an SQL 'like' query on the ReviewStatus table to find all
of the reviewed entries (correct or incorrect.) By the way, the reason
that I didn't use the original mapping of ids to terms (0 = unreviewed, 1
= reviewed, 2 = updated) is that id 1 was already in use. Also, I had
originally wanted to keep the categories sorted like so:

1 manually created
2 reviewed, correct
3 reviewed, incorrect
4 updated
5 unreviewed

Doing this one would be able to do range queries; all entries with
review_status_id <= 2 would be manually reviewed and correct. All entries
with review_status_id >= 3 would still require action of some sort.
Anyway, I don't think it's worth the trouble to do this, and it also means
that you potentially have to renumber the terms if and when more are
added. Anyway, unless anyone has strong objections I'll probably
implement the 5-term vocabulary described above sometime later today.

Jonathan
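For readers who want to try the two query styles discussed in this message, here is a rough sketch using Python's built-in sqlite3 module. The three-column `ReviewStatus` table is a simplified, assumed stand-in for SRes.ReviewStatus (the real table's full column list is not given in this thread), and `SORTED_IDS` is just the alternative numbering described above written as a dictionary.

```python
import sqlite3

# The 5-term vocabulary as proposed in this message.
TERMS = [
    (0, "unreviewed", "Entry has never been manually reviewed."),
    (1, "manually created", "Entry was created by hand; review is not needed."),
    (2, "reviewed, correct", "Entry has been manually reviewed and is deemed to be correct."),
    (3, "reviewed, incorrect", "Entry has been manually reviewed and is deemed to be incorrect."),
    (4, "updated", "Entry has been updated since last being reviewed or manually created."),
]

conn = sqlite3.connect(":memory:")
# Simplified stand-in for SRes.ReviewStatus (assumed columns only).
conn.execute(
    "CREATE TABLE ReviewStatus ("
    " review_status_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL UNIQUE,"
    " description TEXT NOT NULL)"
)
conn.executemany("INSERT INTO ReviewStatus VALUES (?, ?, ?)", TERMS)

# The SQL 'like' query: every term whose name starts with "reviewed",
# i.e. both the correct and the incorrect variants.
reviewed_ids = [
    row[0]
    for row in conn.execute(
        "SELECT review_status_id FROM ReviewStatus"
        " WHERE name LIKE 'reviewed%' ORDER BY review_status_id"
    )
]
print(reviewed_ids)

# The alternative sorted numbering, where a simple range test works:
# ids >= 3 are the terms that still require some action.
SORTED_IDS = {
    "manually created": 1,
    "reviewed, correct": 2,
    "reviewed, incorrect": 3,
    "updated": 4,
    "unreviewed": 5,
}
needs_action = sorted(name for name, sid in SORTED_IDS.items() if sid >= 3)
print(needs_action)
```

The range test is shorter, but it only stays valid while new terms can be slotted in without renumbering, which is the drawback noted above.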
From: Joan M. <ma...@pc...> - 2003-01-23 14:52:41
|
Dear Jonathan,

In its most simplest vocabulary, I thought that the review_status_id would represent.

never reviewed = 0
reviewed = 1
updated thus review status becomes = 2 (needs to be re-reviewed)

We needed the #2 for those cases where the computational analysis may have affected (or changed) the entry that had the review status #1. (this was tracking entries that had changed)

Joan

Jonathan Crabtree wrote:

> While we're on the topic of populating the GUS 3.0 controlled vocabularies,
> does anyone remember how we originally decided to use the SRes.ReviewStatus
> table? We currently have a single row in this table, courtesy of Jonathan
> Schug:
>
> REVIEW_STATUS_ID
> ----------------
> NAME
> --------------------------------------------------------------------------------
> DESCRIPTION
> --------------------------------------------------------------------------------
> 1
> manually created
> Entry was created by hand; review is not needed.
>
> I think I was one of the people who originally suggested that we do away
> with the binary "manually_reviewed" column in favor of this controlled
> vocabulary. One reason for doing this was to be able to distinguish the
> above case (e.g., something that was *entered* manually and should
> therefore automatically be considered manually reviewed) from entries that
> are created automatically and then subsequently reviewed. Another reason
> for creating the vocabulary was to allow us to distinguish between entries
> that have *never* been reviewed, versus those that have been reviewed, but
> that have been updated in some way since their last review. Therefore I
> suggest adding the following terms to the vocabulary:
>
> 2|manually reviewed|Entry has been manually reviewed and is deemed to be correct.
> 3|updated|Entry has been updated since it was last reviewed.
> 4|unreviewed|Entry has never been manually reviewed.
>
> Comments? Implicit in the above is that if an annotator is able to review
> something then he/she also has the ability to correct mistakes in that
> thing, and/or is able to delete the entry completely if it is erroneous.
> If we do not make this assumption then we need another term to represent
> the case where an "Entry has been manually reviewed and is deemed to be
> INcorrect." We might want this even if the assumption I mentioned holds,
> because it would allow us to mark the entry as "manually reviewed -
> incorrect" before deleting it (i.e., moving into the version table), in
> cases where the annotator decides to delete an erroneous entry. Note also
> that once an entry is marked with term #3, "updated", we no longer track
> (in ReviewStatus, at least) whether the entry was originally of type #1 or
> type #2. I don't see this as a major problem, however, since the
> provenance of the entry can (and should) be tracked through other means.
>
> Jonathan
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Scholarships for Techies!
> Can't afford IT training? All 2003 ictp students receive scholarships.
> Get hands-on training in Microsoft, Cisco, Sun, Linux/UNIX, and more.
> www.ictp.com/training/sourceforge.asp
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

--
Joan Mazzarelli
Computational Biology and Informatics Laboratory
Center for Bioinformatics
1429 Blockley Hall
University of Pennsylvania
Philadelphia, PA 19104
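The transition Joan describes for status #2 can be written as a tiny rule. The sketch below is only an illustration in Python: the constant names and the `status_after_change` helper are hypothetical, and the numbering is the simple three-term scheme from her message, not the 5-term proposal.

```python
# Joan's three-term scheme (ids as given in her message).
NEVER_REVIEWED = 0
REVIEWED = 1
UPDATED = 2  # needs to be re-reviewed


def status_after_change(current_status):
    """Return the review status an entry should have after an automated change.

    Only a reviewed entry is demoted to UPDATED; an entry that was never
    reviewed, or is already flagged as updated, keeps its status.
    """
    if current_status == REVIEWED:
        return UPDATED
    return current_status


print(status_after_change(REVIEWED))
print(status_after_change(NEVER_REVIEWED))
```

Where such a rule would live (application code, a database trigger, or the object layer) is not discussed in the thread, so that part is left open here.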
From: Angel P. <an...@sn...> - 2003-01-23 14:40:07
|
On Thu, 23 Jan 2003, Jonathan Crabtree wrote:

>
> While we're on the topic of populating the GUS 3.0 controlled vocabularies,
> does anyone remember how we originally decided to use the SRes.ReviewStatus
> table? We currently have a single row in this table, courtesy of Jonathan
> Schug:
>
> REVIEW_STATUS_ID
> ----------------
> NAME
> --------------------------------------------------------------------------------
> DESCRIPTION
> --------------------------------------------------------------------------------
> 1
> manually created
> Entry was created by hand; review is not needed.
>
>
> I think I was one of the people who originally suggested that we do away
> with the binary "manually_reviewed" column in favor of this controlled
> vocabulary. One reason for doing this was to be able to distinguish the
> above case (e.g., something that was *entered* manually and should
> therefore automatically be considered manually reviewed) from entries that
> are created automatically and then subsequently reviewed. Another reason
> for creating the vocabulary was to allow us to distinguish between entries
> that have *never* been reviewed, versus those that have been reviewed, but
> that have been updated in some way since their last review. Therefore I
> suggest adding the following terms to the vocabulary:
>
> 2|manually reviewed|Entry has been manually reviewed and is deemed to be correct.
> 3|updated|Entry has been updated since it was last reviewed.
> 4|unreviewed|Entry has never been manually reviewed.
>

I believe that this is roughly what we had agreed on before but never implemented.

> Comments? Implicit in the above is that if an annotator is able to review
> something then he/she also has the ability to correct mistakes in that
> thing, and/or is able to delete the entry completely if it is erroneous.
> If we do not make this assumption then we need another term to represent
> the case where an "Entry has been manually reviewed and is deemed to be
> INcorrect." We might want this even if the assumption I mentioned holds,
> because it would allow us to mark the entry as "manually reviewed -
> incorrect" before deleting it (i.e., moving into the version table), in
> cases where the annotator decides to delete an erroneous entry. Note also
> that once an entry is marked with term #3, "updated", we no longer track
> (in ReviewStatus, at least) whether the entry was originally of type #1 or
> type #2. I don't see this as a major problem, however, since the
> provenance of the entry can (and should) be tracked through other means.

We should add this.

angel
From: Chris S. <sto...@pc...> - 2003-01-23 14:39:43
|
Jonathan,

These sound good to me including manually reviewed and found to be incorrect.

Chris

On Thursday, January 23, 2003, at 01:17 AM, Jonathan Crabtree wrote:

>
> While we're on the topic of populating the GUS 3.0 controlled vocabularies,
> does anyone remember how we originally decided to use the SRes.ReviewStatus
> table? We currently have a single row in this table, courtesy of Jonathan
> Schug:
>
> REVIEW_STATUS_ID
> ----------------
> NAME
> --------------------------------------------------------------------------------
> DESCRIPTION
> --------------------------------------------------------------------------------
> 1
> manually created
> Entry was created by hand; review is not needed.
>
>
> I think I was one of the people who originally suggested that we do away
> with the binary "manually_reviewed" column in favor of this controlled
> vocabulary. One reason for doing this was to be able to distinguish the
> above case (e.g., something that was *entered* manually and should
> therefore automatically be considered manually reviewed) from entries that
> are created automatically and then subsequently reviewed. Another reason
> for creating the vocabulary was to allow us to distinguish between entries
> that have *never* been reviewed, versus those that have been reviewed, but
> that have been updated in some way since their last review. Therefore I
> suggest adding the following terms to the vocabulary:
>
> 2|manually reviewed|Entry has been manually reviewed and is deemed to be correct.
> 3|updated|Entry has been updated since it was last reviewed.
> 4|unreviewed|Entry has never been manually reviewed.
>
> Comments? Implicit in the above is that if an annotator is able to review
> something then he/she also has the ability to correct mistakes in that
> thing, and/or is able to delete the entry completely if it is erroneous.
> If we do not make this assumption then we need another term to represent
> the case where an "Entry has been manually reviewed and is deemed to be
> INcorrect." We might want this even if the assumption I mentioned holds,
> because it would allow us to mark the entry as "manually reviewed -
> incorrect" before deleting it (i.e., moving into the version table), in
> cases where the annotator decides to delete an erroneous entry. Note also
> that once an entry is marked with term #3, "updated", we no longer track
> (in ReviewStatus, at least) whether the entry was originally of type #1 or
> type #2. I don't see this as a major problem, however, since the
> provenance of the entry can (and should) be tracked through other means.
>
> Jonathan
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Scholarships for Techies!
> Can't afford IT training? All 2003 ictp students receive scholarships.
> Get hands-on training in Microsoft, Cisco, Sun, Linux/UNIX, and more.
> www.ictp.com/training/sourceforge.asp
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>
From: Jonathan C. <cra...@sn...> - 2003-01-23 06:18:24
|
While we're on the topic of populating the GUS 3.0 controlled vocabularies, does anyone remember how we originally decided to use the SRes.ReviewStatus table? We currently have a single row in this table, courtesy of Jonathan Schug:

REVIEW_STATUS_ID|NAME|DESCRIPTION
1|manually created|Entry was created by hand; review is not needed.

I think I was one of the people who originally suggested that we do away with the binary "manually_reviewed" column in favor of this controlled vocabulary. One reason for doing this was to be able to distinguish the above case (e.g., something that was *entered* manually and should therefore automatically be considered manually reviewed) from entries that are created automatically and then subsequently reviewed. Another reason for creating the vocabulary was to allow us to distinguish between entries that have *never* been reviewed and those that have been reviewed but have been updated in some way since their last review. Therefore I suggest adding the following terms to the vocabulary:

2|manually reviewed|Entry has been manually reviewed and is deemed to be correct.
3|updated|Entry has been updated since it was last reviewed.
4|unreviewed|Entry has never been manually reviewed.

Comments? Implicit in the above is that if an annotator is able to review something then he/she also has the ability to correct mistakes in that thing, and/or is able to delete the entry completely if it is erroneous. If we do not make this assumption then we need another term to represent the case where an "Entry has been manually reviewed and is deemed to be INcorrect." We might want this even if the assumption I mentioned holds, because it would allow us to mark the entry as "manually reviewed - incorrect" before deleting it (i.e., moving it into the version table), in cases where the annotator decides to delete an erroneous entry.

Note also that once an entry is marked with term #3, "updated", we no longer track (in ReviewStatus, at least) whether the entry was originally of type #1 or type #2. I don't see this as a major problem, however, since the provenance of the entry can (and should) be tracked through other means.

Jonathan |
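For anyone who wants to see the proposed terms as code, here is a minimal Python sketch. The ids and names follow the message above; the helper `status_after_update` is hypothetical and only encodes one reading of the proposal (an update moves a type #1 or type #2 entry to "updated", and leaves other entries alone). The optional "manually reviewed - incorrect" term is left out because it has not been agreed on.

```python
from enum import IntEnum


class ReviewStatus(IntEnum):
    """Proposed SRes.ReviewStatus terms; ids follow the message above."""
    MANUALLY_CREATED = 1   # entered by hand; review is not needed
    MANUALLY_REVIEWED = 2  # reviewed and deemed to be correct
    UPDATED = 3            # updated since it was last reviewed
    UNREVIEWED = 4         # never manually reviewed


def status_after_update(current: ReviewStatus) -> ReviewStatus:
    """Hypothetical rule, assumed for illustration only.

    Updating a manually created or manually reviewed entry marks it as
    UPDATED; the original status (#1 or #2) is not kept. Unreviewed and
    already-updated entries keep their status.
    """
    if current in (ReviewStatus.MANUALLY_CREATED, ReviewStatus.MANUALLY_REVIEWED):
        return ReviewStatus.UPDATED
    return current
```

Because the status after an update no longer says whether the entry started as #1 or #2, any code that needs that distinction would have to get it from the entry's provenance rather than from ReviewStatus.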
From: Angel P. <an...@sn...> - 2003-01-21 14:17:42
|
The XML parsing is currently handled by the GUSRow module (formerly RelationalRow). As such, there are a few constraints on the creation of GUS XML objects, in addition to using the fully qualified object names.

1) It relies on newlines to get valid input. YOU MUST put the object declaration on a separate line from the attribute declarations. For instance, the following do not work:

<GUS::Model::RAD::Array><name> PancChip </name> </GUS::Model::RAD::Array>

<GUS::Model::RAD::Array> <name> PancChip </name> <version> 1.2</version> </GUS::Model::RAD::Array>

2) To force a submit while parsing a GUS XML doc, enter the characters "//" on a separate line. I am not sure this is valid XML, but I suspect it is due to backwards compatibility issues with SGML.

I am working in my spare time (of which there is not much) to switch to an actual XML parser, rather than using regular expressions (as is now the case). At the same time, I will switch the XML syntax to match a more database-centric scheme, by using the tag attributes as the table columns instead of subelements:

<Object att1 = 'val1' att2 = 'val2' ... <Child Object .../> </Object>

But this is not anywhere near finished (in fact just started), so for now we must live with the constraints as they stand.

Angel

-- Angel Pizarro Programmer Analyst Center for Bioinformatics an...@pc... |
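To make the newline constraint concrete, here is a small Python sketch of a line-oriented reader. It is not the GUSRow code (which is Perl and is not shown in this thread); the function name `read_gus_xml`, the two regular expressions, and the one-attribute-per-line restriction are all assumptions made for illustration. It only shows why a regex-per-line approach rejects input where the object tag shares a line with an attribute, and how a "//" line can act as a submit marker.

```python
import re

# Assumed patterns, for illustration only.
OPEN_TAG = re.compile(r"^<([\w:]+)>$")         # object tag alone on its line
ATTRIBUTE = re.compile(r"^<(\w+)>(.*)</\1>$")  # exactly one attribute per line


def read_gus_xml(text):
    """Return a list of batches; each batch is a list of (table, row) pairs.

    A line holding only '//' closes the current batch, standing in for the
    forced submit described above.
    """
    batches, batch = [], []
    table, row = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":
            batches.append(batch)
            batch = []
            continue
        if table is None:
            match = OPEN_TAG.match(line)
            if not match:
                raise ValueError(f"object tag must be alone on its line: {line!r}")
            table, row = match.group(1), {}
        elif line == f"</{table}>":
            batch.append((table, row))
            table, row = None, None
        else:
            match = ATTRIBUTE.match(line)
            if not match:
                raise ValueError(f"expected one attribute per line: {line!r}")
            row[match.group(1)] = match.group(2).strip()
    if table is not None:
        raise ValueError(f"missing closing tag for {table}")
    if batch:
        batches.append(batch)
    return batches
```

With this reader, input that puts `<GUS::Model::RAD::Array>` on its own line followed by one attribute per line is accepted, while `<GUS::Model::RAD::Array><name> PancChip </name>` on a single line raises an error. A real XML parser would not care about the line breaks, which is the motivation for the switch described above.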
From: mazz <ma...@sn...> - 2003-01-20 22:13:46
|
Dear Chris and Arnaud,

It seems that it has changed. I think that perhaps Steve can clarify this. I tested this bit of XML with the UpdateGusFromXML plugin, and although I cannot commit it to GUS30 now, it recognized this tag designation for the table (<GUS::Model::DoTS::GeneSynonym>). This plugin (UpdateGusFromXML) was recently re-converted for GUS30.

Arnaud, this is something I can completely clarify with you later. I still want to go through and determine what controlled vocabulary tables need to be populated and create XML files for them (even if slight changes need to be made to the XML later).

Joan |
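As a rough illustration of producing such XML files, the Python sketch below writes rows using a table tag of the form GUS::Model::<Schema>::<Table>, which is the designation the plugin is reported to recognize in this thread. The function `gus_xml` is hypothetical, and it assumes values contain no markup characters, since they are written verbatim. Each tag goes on its own line.

```python
def gus_xml(schema, table, rows):
    """Render rows as line-oriented GUS XML, one tag per line.

    schema and table are combined into a tag such as GUS::Model::DoTS::Gene.
    Each row is a dict mapping column name to value; values are written
    verbatim (assumption: no '<', '>' or '&' in the data).
    """
    tag = f"GUS::Model::{schema}::{table}"
    lines = []
    for row in rows:
        lines.append(f"<{tag}>")
        for column, value in row.items():
            lines.append(f"<{column}>{value}</{column}>")
        lines.append(f"</{tag}>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example row modeled on the DoTS::Gene test entry quoted in this thread.
    print(gus_xml("DoTS", "Gene", [{"gene_id": 10288603, "name": "test"}]), end="")
```

If the tag convention changes (for example to a Database.Namespace.Table form), only the line that builds `tag` would need to change, so the XML files could be regenerated rather than edited by hand.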
From: Chris S. <sto...@sn...> - 2003-01-20 21:09:32
|
Joan, Right, the Perl packages used by the plugin indicate the directory tree that they are stored in but I thought the actual objects that the XML is used to populate directly reflect the database structure (not the directory structure) as they are generated from the tables. Has this changed? Chris On Mon, 20 Jan 2003, mazz wrote: > Chris, > > This is the designation for the table Object. > > package GUS::Model::DoTS::Gene; > > I thought this is what is needed for the table tag. > > Joan > > This is what it is for a RAD table > > package GUS::Model::RAD3::ElementAnnotation; > > > > > > Chris Stoeckert wrote: > > > Dear Joan and Arnaud, > > The CVS structure should not come into the XML used by the plug-in. It > > is my understanding that only the actual schema of the structure: > > Database.Namespace.Table.Attribute should be used. > > > > Chris > > > > On Monday, January 20, 2003, at 10:23 AM, mazz wrote: > > > > > Dear Arnaud, > > > > > > Model is a directory of Steve's new CVS structure under which the DoTS > > > table Objects (eg > > > > > > Gene) are found. > > > I do not know why Steve named the directory Model. > > > > > > Joan > > > > > > Arnaud Kerhornou wrote: > > > > > >> Hi Joan > > >> > > >> Thanks. Just a quick question, what is Model for ? > > >> > > >> <GUS::Model::DoTS::Gene> > > >> > > >> Arnaud > > >> > > >> mazz wrote: > > >> > > >>> Hi Arnaud, > > >>> > > >>> > > >>> Below is a sample of the XML for a table (e.g. Gene) the plugin > > >>> will > > >>> use. > > >>> The controlled vocabulary table DoTS::EffectorActionType also needs > > >>> to > > >>> be populated. > > >>> > > >>> I will try to go though and make a list of the new controlled > > >>> vocabulary > > >>> tables. > > >>> Tables such as geneCategory & rnaCategory are tables I created for my > > >>> planned future annotation tasks. 
> > >>> > > >>> > > >>> Joan > > >>> > > >>> <GUS::Model::DoTS::Gene> > > >>> <gene_id>10288603</gene_id> > > >>> <name>test</name> > > >>> <review_status_id>1</review_status_id> > > >>> <description>gene desc test</description> > > >>> <reviewer_summary>test</reviewer_summary> > > >>> </GUS::Model::DoTS::Gene> > > >>> > > >>> Arnaud Kerhornou wrote: > > >>> > > >>> Arnaud Kerhornou wrote: > > >>> > > >>> > > >>> > > >>>> Hi Joan > > >>>> > > >>>> I'll get the new controlled vocabularies ready for population. If > > >>>> you're planning to use the UpdateFromXML.pm plugin for populating > > >>>> GUS > > >>>> I should have examples. > > >>>> > > >>>> Regarding ComplexType it should be covered by GO component. > > >>>> Regarding InteractionType, we need to find a controlled vocabulary > > >>>> which I'm not aware of yet ! > > >>>> > > >>>> cheers > > >>>> Arnaud > > >>>> > > >>>> mazz wrote: > > >>>> > > >>>> > > >>>> > > >>>>> Hi Jonathan, > > >>>>> > > >>>>> Perhaps we can ask Matt to revisit his documentation plugin. There > > >>>>> are probably > > >>>>> additional changes he will have to make for its use with GUS30 now. > > >>>>> Also, I can send Arnaud an example of the XML for a table. We can > > >>>>> use the XML to > > >>>>> populate the rows of the controlled vocabulary tables (ids, terms > > >>>>> (names) and > > >>>>> definitions (descriptions). > > >>>>> > > >>>>> > > >>>>> Joan > > >>>>> > > >>>>> Jonathan Crabtree wrote: > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>>> Hi Joan- > > >>>>>> > > >>>>>> Arnaud did supply us with documentation (attached) for the new > > >>>>>> Phenotype tables, > > >>>>>> but I just haven't loaded it into the database yet (I've also been > > >>>>>> quite busy :)) > > >>>>>> I started working on updating the documentation a couple of days > > >>>>>> ago, but in the > > >>>>>> process discovered that there are some invalid rows in > > >>>>>> core.DatabaseDocumentation > > >>>>>> that should be corrected first. 
A query shows that there are 73 > > >>>>>> rows in this > > >>>>>> table that reference nonexistent columns in GUS 3.0. For the most > > >>>>>> part I think > > >>>>>> that these are relatively minor problems stemming from the fact > > >>>>>> that the schema > > >>>>>> has been updated more recently than the documentation. However, > > >>>>>> there are also > > >>>>>> a few rows that suggest we need to improve the plugin and/or > > >>>>>> procedure used to > > >>>>>> populate this table. For example, the following rows have spaces > > >>>>>> in the column > > >>>>>> name (attribute_name), probably because the input files were > > >>>>>> invalid and the plugin > > >>>>>> has no restrictions on the format of the attribute_name: > > >>>>>> > > >>>>>> DATABASE_DOCUMENTATION_ID > > >>>>>> ------------------------- > > >>>>>> ATTRIBUTE_NAME > > >>>>>> ------------------------------------------------------------------ > > >>>>>> -------------- > > >>>>>> 1419 > > >>>>>> bio_material_id fk to LabelledExtract view of BioMaterial > > >>>>>> > > >>>>>> 1103 > > >>>>>> bio_source_characteristic_id primary key > > >>>>>> > > >>>>>> 1120 > > >>>>>> treatment_id fk to Treatment > > >>>>>> > > >>>>>> DATABASE_DOCUMENTATION_ID > > >>>>>> ------------------------- > > >>>>>> ATTRIBUTE_NAME > > >>>>>> ------------------------------------------------------------------ > > >>>>>> -------------- > > >>>>>> 1374 > > >>>>>> review_status_id The identifer of the review status > > >>>>>> > > >>>>>> 1418 > > >>>>>> assay_id fk to Assay > > >>>>>> > > >>>>>> 1373 > > >>>>>> synonym_name The gene symbol > > >>>>>> > > >>>>>> 6 rows selected. > > >>>>>> > > >>>>>> Also, as an aside (and not a comment to you in particular), it > > >>>>>> strikes me that > > >>>>>> column "documentation" of the form "fk to Table X" and "Primary > > >>>>>> key" could be > > >>>>>> generated automatically from the schema. 
However, comments on > > >>>>>> foreign keys > > >>>>>> are useful if they identify the specific subclass (i.e. view) to > > >>>>>> which the > > >>>>>> reference is expected to link, or if they explain what the > > >>>>>> referenced value is > > >>>>>> used for (if not obvious). Anyway, since there are still some > > >>>>>> minor schema > > >>>>>> changes taking place, I think that next week might be a good time > > >>>>>> to worry > > >>>>>> about updating all the documentation, since the database will be > > >>>>>> locked down > > >>>>>> for the migration at that point anyway. As for the controlled > > >>>>>> vocabularies, > > >>>>>> I think you're right, and we should try to populate these as soon > > >>>>>> as we can, > > >>>>>> even if it will be an iterative process in some cases. > > >>>>>> > > >>>>>> Jonathan > -- Chris Stoeckert, Ph.D. Research Associate Professor, Dept. of Genetics Center for Bioinformatics, University of Pennsylvania 423 Guardian Dr., Philadelphia, PA 19104 Ph: 215-573-4409 FAX:215-573-3111 |
From: mazz <ma...@sn...> - 2003-01-20 20:51:18
|
Chris,

This is the designation for the table Object.

package GUS::Model::DoTS::Gene;

I thought this is what is needed for the table tag.

Joan

This is what it is for a RAD table

package GUS::Model::RAD3::ElementAnnotation;

Chris Stoeckert wrote:

> Dear Joan and Arnaud,
> The CVS structure should not come into the XML used by the plug-in. It
> is my understanding that only the actual schema of the structure:
> Database.Namespace.Table.Attribute should be used.
>
> Chris
|
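The naming question in this exchange (Perl package name versus schema name) can be sketched as a small mapping. This is only an illustration: the function name `schema_table_name` and the idea of stripping a `GUS::Model` prefix are assumptions, and the thread does not settle which tag form the plugin actually expects.

```python
def schema_table_name(perl_package: str, code_prefix: str = "GUS::Model") -> str:
    """Map a Perl object package name to a 'Namespace.Table' schema name.

    Hypothetical helper: the prefix to strip is an assumption based on the
    package names quoted in the thread.
    """
    parts = perl_package.split("::")
    prefix = code_prefix.split("::")
    # Drop the code-layout prefix (e.g. GUS::Model) when it is present.
    if parts[:len(prefix)] == prefix:
        parts = parts[len(prefix):]
    # What remains should be exactly a namespace and a table name.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected Namespace::Table, got {perl_package!r}")
    return ".".join(parts)


print(schema_table_name("GUS::Model::DoTS::Gene"))               # DoTS.Gene
print(schema_table_name("GUS::Model::RAD3::ElementAnnotation"))  # RAD3.ElementAnnotation
```

Keeping the code-layout prefix out of the data files would mean a later reorganisation of the CVS tree does not invalidate existing XML.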
From: Chris S. <sto...@pc...> - 2003-01-20 19:38:12
|
Dear Joan and Arnaud,

The CVS structure should not come into the XML used by the plug-in. It is my understanding that only the actual schema of the structure: Database.Namespace.Table.Attribute should be used.

Chris

On Monday, January 20, 2003, at 10:23 AM, mazz wrote:

> Dear Arnaud,
>
> Model is a directory of Steve's new CVS structure under which the DoTS
> table Objects (eg Gene) are found.
> I do not know why Steve named the directory Model.
>
> Joan
|
From: Chris S. <sto...@pc...> - 2003-01-20 19:33:43
|
Hi Arnaud and Jonathan,

>>>> -Modified DoTS.ProteinProperty table to reference ProteinPropertyType
>>>> One question I have regarding these tables is how will the units be
>>>> specified?
>>>> Should I make the "property_value" column a varchar2 column? It may
>>>> have had this type originally, and I might have changed it without
>>>> considering the consequences. One option would be to specify in the
>>>> ProteinPropertyType table what units are to be used, though this is
>>>> clumsy if there is more than one choice of units for a given property.
>>>>
>>> Whatever the unit they're in, they should all be numbers (some would
>>> be integer) so we can go for the "number" data type but float or
>>> varchar could also be fine!
>>
>> Right, but the question is how does somebody querying the table know what
>> a mass of "25" means? Are molecular masses always expressed in the same
>> units, no matter what? My recollection is that you can sometimes have
>> some pretty big polypeptides, but I don't know what the convention is.
>>
> If we want to query the "value" attribute it might be better to have it as
> a number. It doesn't matter for charge and isoelectric point pH, but
> you're right re. the molecular mass: "25" could mean 25 Da but also 25
> kDa. Why not store the molecular mass in Daltons as a convention, and
> have the API code do the conversion to kiloDaltons if needed? This way
> we don't need a "unit" attribute in ProteinPropertyType.

This sounds dangerous and unenforceable. We should add a units field. This can either be a varchar or a foreign key to units stored as Sres:MGEDOntology terms.

Chris
|
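To make the units concern concrete, here is a minimal sketch of a property value that carries its unit explicitly. The names `MassValue`, `to_daltons` and the `DALTONS_PER_UNIT` table are hypothetical and not part of GUS; in the schema the unit could equally be a foreign key to an ontology term, as suggested above.

```python
from dataclasses import dataclass

# Conversion factors to daltons; the unit labels here are illustrative only.
DALTONS_PER_UNIT = {"Da": 1.0, "kDa": 1000.0}


@dataclass(frozen=True)
class MassValue:
    """A molecular mass stored together with its unit (hypothetical record)."""
    value: float
    unit: str


def to_daltons(mass: MassValue) -> float:
    """Normalise a mass to daltons, rejecting units we do not know about."""
    try:
        factor = DALTONS_PER_UNIT[mass.unit]
    except KeyError:
        raise ValueError(f"unknown mass unit: {mass.unit!r}") from None
    return mass.value * factor


print(to_daltons(MassValue(25, "Da")))   # 25.0
print(to_daltons(MassValue(25, "kDa")))  # 25000.0
```

With the unit stored next to the value, the bare number "25" is never ambiguous, and an unexpected unit fails loudly instead of being silently read as daltons.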
From: Arnaud K. <ax...@sa...> - 2003-01-20 17:01:02
|
Hi Jonathan

Jonathan Crabtree wrote:

> Arnaud-
>
>> A quick question regarding evidences, you're mentioning that the
>> Evidence table will connect Features and Experimental evidences.
>> Where will the latter be stored ?
>
> Hopefully others will chime in if I get this wrong... I believe that the
> relevant tables are DoTS.Comments (for free text notes/comments entered by
> an annotator) and SRes.BibliographicReference (for published experiments.)
> However, I don't think that we have a generic table to represent unpublished
> laboratory experiments in a structured way. Perhaps we need some use cases
> here? We do have your new table for representing RNAi constructs, but I
> don't think that we have a corresponding table to represent the actual
> RNAi experiment. Do we need/want such a table (either for RNAi experiments
> or in general) and, if so, how detailed does it need to be?

Should be fine for now. In the future the evidence design could be extended for assignment of evidence codes, the same way it is done for GO annotations.

>>> -Modified DoTS.ProteinProperty table to reference ProteinPropertyType
>>> One question I have regarding these tables is how will the units be
>>> specified?
>>> Should I make the "property_value" column a varchar2 column? It may
>>> have had this type originally, and I might have changed it without
>>> considering the consequences. One option would be to specify in the
>>> ProteinPropertyType table what units are to be used, though this is
>>> clumsy if there is more than one choice of units for a given property.
>>>
>> Whatever the unit they're in, they should all be numbers (some would
>> be integer) so we can go for the "number" data type but float or
>> varchar could also be fine!
>
> Right, but the question is how does somebody querying the table know what
> a mass of "25" means? Are molecular masses always expressed in the same
> units, no matter what? My recollection is that you can sometimes have
> some pretty big polypeptides, but I don't know what the convention is.

If we want to query the "value" attribute it might be better to have it as a number. It doesn't matter for charge and isoelectric point pH, but you're right re. the molecular mass: "25" could mean 25 Da but also 25 kDa. Why not store the molecular mass in Daltons as a convention, and have the API code do the conversion to kiloDaltons if needed? This way we don't need a "unit" attribute in ProteinPropertyType.

> Jonathan

Arnaud
|
From: mazz <ma...@sn...> - 2003-01-20 15:22:34
|
Dear Arnaud,

Model is a directory of Steve's new CVS structure under which the DoTS table Objects (eg Gene) are found.
I do not know why Steve named the directory Model.

Joan

Arnaud Kerhornou wrote:

> Hi Joan
>
> Thanks. Just a quick question, what is Model for ?
>
> <GUS::Model::DoTS::Gene>
>
> Arnaud
|
From: Arnaud K. <ax...@sa...> - 2003-01-20 12:59:47
|
Hi Joan

Thanks. Just a quick question, what is Model for ?

<GUS::Model::DoTS::Gene>

Arnaud

mazz wrote:

> Hi Arnaud,
>
> Below is a sample of the XML for a table (e.g. Gene) the plugin will use.
> The controlled vocabulary table DoTS::EffectorActionType also needs to
> be populated.
>
> I will try to go though and make a list of the new controlled vocabulary
> tables.
> Tables such as geneCategory & rnaCategory are tables I created for my
> planned future annotation tasks.
>
> Joan
>
> <GUS::Model::DoTS::Gene>
>   <gene_id>10288603</gene_id>
>   <name>test</name>
>   <review_status_id>1</review_status_id>
>   <description>gene desc test</description>
>   <reviewer_summary>test</reviewer_summary>
> </GUS::Model::DoTS::Gene>
|
From: mazz <ma...@sn...> - 2003-01-19 18:45:13
|
Hi Arnaud,

Below is a sample of the XML for a table (e.g. Gene) the plugin will use.
The controlled vocabulary table DoTS::EffectorActionType also needs to be populated.

I will try to go through and make a list of the new controlled vocabulary tables.
Tables such as geneCategory & rnaCategory are tables I created for my planned future annotation tasks.

Joan

<GUS::Model::DoTS::Gene>
  <gene_id>10288603</gene_id>
  <name>test</name>
  <review_status_id>1</review_status_id>
  <description>gene desc test</description>
  <reviewer_summary>test</reviewer_summary>
</GUS::Model::DoTS::Gene>

Arnaud Kerhornou wrote:

> Hi Joan
>
> I'll get the new controlled vocabularies ready for population. If
> you're planning to use the UpdateFromXML.pm plugin for populating GUS
> I should have examples.
>
> Regarding ComplexType it should be covered by GO component.
> Regarding InteractionType, we need to find a controlled vocabulary
> which I'm not aware of yet !
>
> cheers
> Arnaud
>
> mazz wrote:
>
>> Hi Jonathan,
>>
>> Perhaps we can ask Matt to revisit his documentation plugin. There
>> are probably additional changes he will have to make for its use with
>> GUS30 now. Also, I can send Arnaud an example of the XML for a table.
>> We can use the XML to populate the rows of the controlled vocabulary
>> tables (ids, terms (names) and definitions (descriptions).
>>
>> Joan
>>
>> Jonathan Crabtree wrote:
>>
>> > Hi Joan-
>> >
>> > Arnaud did supply us with documentation (attached) for the new
>> > Phenotype tables, but I just haven't loaded it into the database yet
>> > (I've also been quite busy :))
>> > I started working on updating the documentation a couple of days ago,
>> > but in the process discovered that there are some invalid rows in
>> > core.DatabaseDocumentation that should be corrected first. A query
>> > shows that there are 73 rows in this table that reference nonexistent
>> > columns in GUS 3.0. For the most part I think that these are relatively
>> > minor problems stemming from the fact that the schema has been updated
>> > more recently than the documentation. However, there are also a few
>> > rows that suggest we need to improve the plugin and/or procedure used
>> > to populate this table. For example, the following rows have spaces in
>> > the column name (attribute_name), probably because the input files were
>> > invalid and the plugin has no restrictions on the format of the
>> > attribute_name:
>> >
>> > DATABASE_DOCUMENTATION_ID  ATTRIBUTE_NAME
>> > -------------------------  ---------------------------------------------------------
>> >                      1419  bio_material_id fk to LabelledExtract view of BioMaterial
>> >                      1103  bio_source_characteristic_id primary key
>> >                      1120  treatment_id fk to Treatment
>> >                      1374  review_status_id The identifer of the review status
>> >                      1418  assay_id fk to Assay
>> >                      1373  synonym_name The gene symbol
>> >
>> > 6 rows selected.
>> >
>> > Also, as an aside (and not a comment to you in particular), it strikes
>> > me that column "documentation" of the form "fk to Table X" and "Primary
>> > key" could be generated automatically from the schema. However, comments
>> > on foreign keys are useful if they identify the specific subclass (i.e.
>> > view) to which the reference is expected to link, or if they explain
>> > what the referenced value is used for (if not obvious). Anyway, since
>> > there are still some minor schema changes taking place, I think that
>> > next week might be a good time to worry about updating all the
>> > documentation, since the database will be locked down for the migration
>> > at that point anyway. As for the controlled vocabularies, I think you're
>> > right, and we should try to populate these as soon as we can, even if it
>> > will be an iterative process in some cases.
>> >
>> > Jonathan
>> >
>> > --
>> > Jonathan Crabtree
>> > Center for Bioinformatics, University of Pennsylvania
>> > 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021
>> > 215-573-3115
>> >
>> > ------------------------------------------------------------------------
>> > Name: gus.phenotype_draft.doc
>> > gus.phenotype_draft.doc Type: Winword File (application/msword)
>> > Encoding: base64
|
From: Arnaud K. <ax...@sa...> - 2003-01-17 23:34:40
|
Quoting Jonathan Crabtree <cra...@pc...>: > > Arnaud- > > > Here two examples of transposable elements annotations, one is from > > Tbrucei, the other one is a common one in procaryote genomes. > > > > The first one in the inclusion of a INGI transposon within an ORF, the > > RHS gene. The transposon includes two RIME flanking repeats and another > ORF. > > So in GUS, the INGI transposon could be stored as a transposable element > > feature, attached to a RHS gene feature. The transposable element > > feature will have three sub features, a gene feature, tagged as a > > pseudo-gene and two repeat features, which repeat_type is RIME and with > > a given location. > > So in the "current" schema (meaning that I'm assuming we have only a single > repeat-related view, called RepeatRegionNAFeature, which is the NA > equivalent > of RepeatRegionAAFeature), the picture would look like this: > > <DoTS::GenomicSequence> > ^ ^ ^ ^ > | | | | > <DoTS::GeneFeature (RHS)> | | | > ^ | | | > | | | | > <DoTS::TransposableElement (INGI)> | | > ^ ^ | | > | | | | > | 2 x <DoTS::RepeatRegionNAFeature (RIME)> | > | | > ------------------------<DoTS::GeneFeature (pseudo)> > > -For each feature the leftmost arrow shows the parent_id, the rightmost > arrow shows the na_sequence_id. > -All of the features will have a location specified in terms of the > genomic sequence (because that's what their na_sequence_id references.) > -I have to create 2 RepeatRegionNAFeatures under my definition, because > the RIME repeats are not adjacent to one another. > -Presumably the transposable element is contained in the coding region > of a single exon, so the parent feature could be an ExonFeature instead > of a GeneFeature. > -Note that parent_id is typically used to indicate a part-whole > relationship, in the sense that the part *must* have a corresponding > whole (e.g. Exon to Gene). 
In the above picture and our discussions > on this topic we've generalized its usage to also encompass the > concept that one feature "happens to be" part of another i.e., > that its NALocation is strictly within the bounds of its parent's > NALocation, but that this need not be the case by definition. > > And I believe your proposal is for something that looks more like this: > > <DoTS::GenomicSequence> > ^ ^ ^ ^ ^ > | | | | | > <DoTS::GeneFeature (RHS)> | | | | > ^ | | | | > | | | | | > <DoTS::TransposableElement (INGI)> | | | > ^ ^ | | | > | | | | | > | <DoTS::RepeatRegionNAFeature> | | > | ^ | | > | | | | > | 2 x <DoTS::RepeatFeature (RIME)> | > | | > | | > ------------------------<DoTS::GeneFeature (pseudo)> > My proposal is this representation without the repeat region feature. I would see the repeat region feature to cluster together a sequence, whatever the sequence is (even one base, or more), repeated X times, but not being used in this situation. > In other words, the RepeatRegionNAFeature serves only to group the two RIME > repeats (which aren't even immediately adjacent to one another.) Is this > what you had in mind? I don't think we need to group them with a repeat region feature, as the transposable element would do it. Or did you mean to make the RepeatRegionNAFeature a > child of the GeneFeature and then make the TransposableElement a child of > the RepeatRegionNAFeature? I'm just not clear on your definition of > "repeat > region". Specifically, can a repeat region contain things that are not > repeats, Yes ! a gene for example !! A repeat region would be used to cluster tandemly repeated genes. But this should be fine as long as a gene feature can be attached to a repeat region. and can it contain more than one type of repeat? I think we agree on only one type of repeat unit and if it has more, we would nest the repeat region features. 
We haven't come across a repeat region made of interlaced repeat units, which would require making the schema more generic. And, if so, how > does one assign bounds to the region in a non-arbitrary way? > > > The second example is nested transposable elements in procaryote > > genomes, ie insertion of a transposable element within another one. Each > > transposable element can have a similar structure including the > > following sub features : two flanking Inverted Repeats, a gene and its > > promoter and/or a promoter, functional on the other strand ! > > I won't try to draw the pictures for this one! In both the current schema > and your proposal I think we have the problem that we have no way of > explicitly representing the relationship between the two flanking inverted > repeats. But we don't need to !? Apart from that, however, I think that we can handle this case > just as well as the first. You have to create quite a few features, but > I don't think there's any way to avoid that unless we want to come up with > some "exemplar" transposons and use them to classify the instances we > encounter. The promoter/gene that's functional on the opposite strand > would be represented simply as reverse-strand features (i.e., we'd set > the is_reversed flag in their NALocations, but still use their parent_ids > to indicate their place in the nested repeat structure.) > > > So if there is no repeat feature, the flanking repeats will have to be > > annotated part of the transposable element feature. > > Let me know what you think about these. > > But shouldn't they be part of the transposable element feature? I don't > know the details of this specific type of transposon, but are you trying > to make the distinction between: 1) the core transposon, i.e., the > machinery > that enables that part of the genome (encompassing both the machinery and > perhaps some variable-sized flanking regions) to move around and 2) the > "transposed" element, i.e. 
the core machinery plus whatever flanking > regions happened to be carried along on the element's most recent trip > (the one that brought it to its current location.)? > I think we want to represent a transposable element in a given context, ie at a given location because this insertion may have consequences, (in)activating a gene or shifting the frame of a gene etc. A core transposon should be represented as an entity on its own like genes are. > > Jonathan > > -- > Jonathan Crabtree > Center for Bioinformatics, University of Pennsylvania > 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021 > 215-573-3115 > > Arnaud |
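The representation discussed in this message (a TransposableElement under the RHS gene, with two RIME repeats and a pseudo-gene as direct children, and no grouping RepeatRegionNAFeature) can be sketched in a few lines of Python. This is only an illustration: the `Feature` class, its field names other than `parent_id` and `is_reversed`, and all coordinates are hypothetical and are not taken from the GUS schema. The check at the end reflects the generalized reading of parent_id, where a child's location is expected to fall within its parent's bounds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Feature:
    """One row of a hypothetical, simplified feature table."""
    feature_id: int
    view: str                  # e.g. "GeneFeature", "TransposableElement"
    name: str
    start: int                 # inclusive, in genomic sequence coordinates
    end: int                   # inclusive
    parent_id: Optional[int]   # None for a top-level feature
    is_reversed: bool = False


def build_ingi_example() -> Dict[int, Feature]:
    """INGI transposon inside the RHS gene; all coordinates are made up."""
    rows = [
        Feature(1, "GeneFeature", "RHS", 1000, 9000, None),
        Feature(2, "TransposableElement", "INGI", 3000, 8000, 1),
        Feature(3, "RepeatFeature", "RIME", 3000, 3250, 2),
        Feature(4, "RepeatFeature", "RIME", 7750, 8000, 2),
        Feature(5, "GeneFeature", "pseudo-gene", 3300, 7700, 2),
    ]
    return {f.feature_id: f for f in rows}


def children_of(features: Dict[int, Feature], parent_id: int) -> List[Feature]:
    """Direct sub-features of a feature, ordered by start coordinate."""
    kids = [f for f in features.values() if f.parent_id == parent_id]
    return sorted(kids, key=lambda f: f.start)


def out_of_bounds(features: Dict[int, Feature]) -> List[int]:
    """Ids of features whose location is not within their parent's location."""
    bad = []
    for f in features.values():
        if f.parent_id is None:
            continue
        parent = features[f.parent_id]
        if f.start < parent.start or f.end > parent.end:
            bad.append(f.feature_id)
    return bad


features = build_ingi_example()
print([c.name for c in children_of(features, 2)])
print(out_of_bounds(features))
```

A nested transposon (the second example) would simply be another TransposableElement row whose parent_id points at the enclosing element, with is_reversed set on any sub-feature that is functional on the other strand.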
From: Jonathan C. <cra...@pc...> - 2003-01-17 19:23:33
|
Arnaud- > A quick question regarding evidences, you're mentioning that the > Evidence table will connect Features and Experimental evidences. Where > will the latter be stored ? Hopefully others will chime in if I get this wrong... I believe that the relevant tables are DoTS.Comments (for free text notes/comments entered by an annotator) and SRes.BibliographicReference (for published experiments.) However, I don't think that we have a generic table to represent unpublished laboratory experiments in a structured way. Perhaps we need some use cases here? We do have your new table for representing RNAi constructs, but I don't think that we have a corresponding table to represent the actual RNAi experiment. Do we need/want such a table (either for RNAi experiments or in general) and, if so, how detailed does it need to be? > Here two examples of transposable elements annotations, one is from > Tbrucei, the other one is a common one in procaryote genomes. > > The first one in the inclusion of a INGI transposon within an ORF, the > RHS gene. The transposon includes two RIME flanking repeats and another ORF. > So in GUS, the INGI transposon could be stored as a transposable element > feature, attached to a RHS gene feature. The transposable element > feature will have three sub features, a gene feature, tagged as a > pseudo-gene and two repeat features, which repeat_type is RIME and with > a given location. So in the "current" schema (meaning that I'm assuming we have only a single repeat-related view, called RepeatRegionNAFeature, which is the NA equivalent of RepeatRegionAAFeature), the picture would look like this: <DoTS::GenomicSequence> ^ ^ ^ ^ | | | | <DoTS::GeneFeature (RHS)> | | | ^ | | | | | | | <DoTS::TransposableElement (INGI)> | | ^ ^ | | | | | | | 2 x <DoTS::RepeatRegionNAFeature (RIME)> | | | ------------------------<DoTS::GeneFeature (pseudo)> -For each feature the leftmost arrow shows the parent_id, the rightmost arrow shows the na_sequence_id. 
-All of the features will have a location specified in terms of the genomic sequence (because that's what their na_sequence_id references.) -I have to create 2 RepeatRegionNAFeatures under my definition, because the RIME repeats are not adjacent to one another. -Presumably the transposable element is contained in the coding region of a single exon, so the parent feature could be an ExonFeature instead of a GeneFeature. -Note that parent_id is typically used to indicate a part-whole relationship, in the sense that the part *must* have a corresponding whole (e.g. Exon to Gene). In the above picture and our discussions on this topic we've generalized its usage to also encompass the concept that one feature "happens to be" part of another i.e., that its NALocation is strictly within the bounds of its parent's NALocation, but that this need not be the case by definition. And I believe your proposal is for something that looks more like this: <DoTS::GenomicSequence> ^ ^ ^ ^ ^ | | | | | <DoTS::GeneFeature (RHS)> | | | | ^ | | | | | | | | | <DoTS::TransposableElement (INGI)> | | | ^ ^ | | | | | | | | | <DoTS::RepeatRegionNAFeature> | | | ^ | | | | | | | 2 x <DoTS::RepeatFeature (RIME)> | | | | | ------------------------<DoTS::GeneFeature (pseudo)> In other words, the RepeatRegionNAFeature serves only to group the two RIME repeats (which aren't even immediately adjacent to one another.) Is this what you had in mind? Or did you mean to make the RepeatRegionNAFeature a child of the GeneFeature and then make the TransposableElement a child of the RepeatRegionNAFeature? I'm just not clear on your definition of "repeat region". Specifically, can a repeat region contain things that are not repeats, and can it contain more than one type of repeat? And, if so, how does one assign bounds to the region in a non-arbitrary way? > The second example is nested transposable elements in procaryote > genomes, ie insertion of a transposable element within another one. 
Each > transposable element can have a similar structure including the > following sub features : two flanking Inverted Repeats, a gene and its > promoter and/or a promoter, functional on the other strand ! I won't try to draw the pictures for this one! In both the current schema and your proposal I think we have the problem that we have no way of explicitly representing the relationship between the two flanking inverted repeats. Apart from that, however, I think that we can handle this case just as well as the first. You have to create quite a few features, but I don't think there's any way to avoid that unless we want to come up with some "exemplar" transposons and use them to classify the instances we encounter. The promoter/gene that's functional on the opposite strand would be represented simply as reverse-strand features (i.e., we'd set the is_reversed flag in their NALocations, but still use their parent_ids to indicate their place in the nested repeat structure.) > So if there is no repeat feature, the flanking repeats will have to be > annotated part of the transposable element feature. > Let me know what you think about these. But shouldn't they be part of the transposable element feature? I don't know the details of this specific type of transposon, but are you trying to make the distinction between: 1) the core transposon, i.e., the machinery that enables that part of the genome (encompassing both the machinery and perhaps some variable-sized flanking regions) to move around and 2) the "transposed" element, i.e. the core machinery plus whatever flanking regions happened to be carried along on the element's most recent trip (the one that brought it to its current location.)? >>-Modified DoTS.ProteinProperty table to reference ProteinPropertyType >> One question I have regarding these tables is how will the units be specified? >> Should I make the "property_value" column a varchar2 column? 
It may have had >> this type originally, and I might have changed it without considering the >> consequences. One option would be to specify in the ProteinPropertyType table >> what units are to be used, though this is clumsy if there is more than one >> choice of units for a given property. >> > Whatever the unit they're in, they should all be numbers (some would be > integer) so we can go for the "number" data type but float or varchar > could also be fine! Right, but the question is how does somebody querying the table know what a mass of "25" means? Are molecular masses always expressed in the same units, no matter what? My recollection is that you can sometimes have some pretty big polypeptides, but I don't know what the convention is. > I reckon ReplicationOriginFeature would make more sense OK, I'll make this change. Jonathan -- Jonathan Crabtree Center for Bioinformatics, University of Pennsylvania 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021 215-573-3115 |
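One of the options raised above for the units question is to record the unit once per property type, so that property_value can stay numeric. A minimal Python sketch of that option follows. The `unit` field is a hypothetical addition, the specific units shown (Da, pH, elementary charges) are assumptions rather than anything agreed in this thread, and the classes are simplified stand-ins for the DoTS tables.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ProteinPropertyType:
    """Simplified stand-in for DoTS.ProteinPropertyType."""
    protein_property_type_id: int
    name: str
    unit: str  # hypothetical column: one fixed unit per property type


# The units below are assumptions made for the sake of the example.
PROPERTY_TYPES: Dict[int, ProteinPropertyType] = {
    t.protein_property_type_id: t
    for t in (
        ProteinPropertyType(1, "isoelectric point", "pH"),
        ProteinPropertyType(2, "molecular mass", "Da"),
        ProteinPropertyType(3, "charge", "e"),
        ProteinPropertyType(4, "average residue mass", "Da"),
    )
}


@dataclass(frozen=True)
class ProteinProperty:
    """Simplified stand-in for DoTS.ProteinProperty with a numeric value."""
    protein_property_type_id: int
    property_value: float


def describe(prop: ProteinProperty) -> str:
    """Render a value together with the unit recorded on its type."""
    ptype = PROPERTY_TYPES[prop.protein_property_type_id]
    return f"{ptype.name}: {prop.property_value:g} {ptype.unit}"


print(describe(ProteinProperty(2, 25000.0)))
```

With this layout a bare "25" in the mass column is never ambiguous, at the cost noted above: a property that is legitimately reported in more than one unit would need either a conversion on load or a second property type.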
From: Arnaud K. <ax...@sa...> - 2003-01-17 16:34:02
|
Hi Jonathan Jonathan Crabtree wrote: >Arnaud - > > > >>>Which DNA/RNA features do you mean (other than those mentioned above)? >>> >>> >>The file I sent you should include views on the top of NAFeatureImp >>table. Here the list : >> >> > >Yes, you're absolutely right; there was a period when I wasn't paying very >close attention to the schema mailing list, and I'm afraid I misplaced a >couple of the files you sent, at least temporarily. I believe I've >now added all the views and tables that you originally proposed, with >some minor modifications to take into account discussions we've had since >then. See the attached text file for a complete list of the changes I've >made this time around. > > > >>Yes we had! So regarding chromosome regions, shall we keep >>TelomereFeature and CentromereFeature ? >> >> > >No, I think we should use ChromosomeElementFeature instead; I've created >this view based on the ChromosomeElement view you suggested, but with a >couple of additional columns to handle the data currently in >gusdev.TelomereFeature and gusdev.CentromereFeature. > > > >>>At >>>the other extreme, we could continue what we're doing now, i.e. using >>>an ad-hoc classification of features based on the data we actually have >>>available, and just make sure that every feature is tagged with the >>>correct sequence ontology term. Any thoughts? >>> >>> >>It makes sense as SO may undergo revisions this year. >> >> > >OK, as noted in the attachment, I've added sequence_ontology_id to *all* >views of NAFeatureImp and AAFeatureImp. > > > >>>>A controlled vocabulary table with the four attributes you've >>>>mentioned is fine. >>>> >>>> > >Done; it's called ProteinPropertyType, and the schema/contents are >described in the attached list of changes. > > > >>>>As you're going to add a extra attribute sequence_ontology_id to the >>>>NA Features, could you do the same to any AA Features ? >>>> >>>> > >OK, done. 
> > > >>The way the SignalPeptideFeature is designed make difficult the >>annotation of localization signal features. We can leave >>SignalPeptideFeature as it is as it fits with SignalP software >>prediction and in the future create a new feature LocalizationSignalFeature. >> >> > >OK, based on our discussion today the only change I've made to >SignalPeptideFeature is to add the sequence_ontology_id, which can be >used to reference the different localization ontology terms that you >mentioned. A column has been added to SequenceOntology to let us store >multiple ontologies (and versions thereof) in the same table. >Experimental evidence, references, and annotator's comments can be linked >to SignalPeptideFeature (or a future LocalizationSignalFeature view) using >DoTS.Evidence. > > A quick question regarding evidences, you're mentioning that the Evidence table will connect Features and Experimental evidences. Where will the latter be stored ? > > >>>>I reckon they could be merged. >>>> >>>> > >(This comment was in reference to incorporating TM domain features into >the DomainFeature view.) I've added a "number_of_domains" column to >DomainFeature to permit this. We will *not* have a separate view >specifically for TM domain features. > > > >>>I also realized belatedly that I could have left the Interaction table >>>unchanged, rather than introducing specific references to RowSet. This >>>would have allowed us to represent either singleton effectors/targets or >>>set-valued effectors/targets, without having to always join through >>>RowSet >>>in the singleton case. On the other hand, if we do associate some >>>additional information with the RowSets, then the current representation >>>is correct. >>> >>> >>It depends if we want to represent many-to-many relationship between >>interaction and members of this interaction. Without the RowSet table, >>we can't assign a set of several effectors/targets, right ? 
Unless we >>consider that this set of effectors are being part of a complex and act >>as the whole. >> >> > >It's true that without the RowSet table we can't assign a set of several >effectors or targets. What I was trying to say was that I replaced the >following rows in DoTS.Interaction-- > effector_table_id > effector_row_id (or something to that effect) > >using instead a single row that references a RowSet: > effector_row_set_id > >However, I could have left the Interaction table unchanged, and used the >effector_table_id and effector_row_id to reference entries in the RowSet >table (in the case where there are multiple effectors.) With this >approach one would have the choice of either using or not using the RowSet >table on a case-by-case basis. I don't think it's too important which way >we do this; on the one hand you save a join when you only need to reference >a single effector/target (using the table_id/row_id approach) but on the >other hand with the row_set_id approach you can write uniform code and >also have an enforceable referential integrity constraint. So barring any >strong objection, I'll leave the table as it is now (i.e., with explicit >references to RowSet, meaning that you always have to have a RowSet even >when the effector or target is a single object.) > > fine, I think this way is more consistent as storing one and storing more than one effectors will be done the same way. > > >>A case we came across here for Tbrucei is nested repeat regions (at the >>DNA level). Each repeat region has coordinates and is annotated with a >>unique repeat unit type. This repeat region can be within a bigger >>repeat region annotated with a different repeat unit type. >>... which is in other words your suggestion with parent_id as an extra >>attribute ... >> >> > >I haven't added the parent_id yet, but I'll do so. 
> > > >>Regarding transposon repeat types, if we have a TransposableElement >>feature and its type is given as an attribute, a repeat feature will >>just be useful to locate the LTRs within a given transposable element. >>Can we keep this functionality? Then the feature will be simple, with just >>repeat_type and parent_id attributes. >> >> > >Are you saying that we still need the two tables/features, one for >RepeatFeature, the other for RepeatRegionFeature? Could you give me a >specific example of how you would envision using these tables (and also >these tables in conjunction with the TransposableElement view, under the >assumption that they're all equipped with parent_ids)? > > Here are two examples of transposable element annotations: one is from T. brucei, the other is a common one in procaryote genomes. The first one is the inclusion of an INGI transposon within an ORF, the RHS gene. The transposon includes two RIME flanking repeats and another ORF. So in GUS, the INGI transposon could be stored as a transposable element feature, attached to an RHS gene feature. The transposable element feature will have three sub-features: a gene feature tagged as a pseudo-gene, and two repeat features whose repeat_type is RIME, each with a given location. The second example is nested transposable elements in procaryote genomes, i.e. the insertion of a transposable element within another one. Each transposable element can have a similar structure including the following sub-features: two flanking Inverted Repeats, a gene and its promoter and/or a promoter, functional on the other strand! So if there is no repeat feature, the flanking repeats will have to be annotated as part of the transposable element feature. Let me know what you think about these. > > >>Let's leave the design as it is for now. Curators are not going to >>curate interaction data in the short term. We shall come back later >>with more precise ideas/use cases about them. >> >> > >Sounds good. 
Let me know if there's anything I've missed. I'll try to
>generate updated SQL scripts tomorrow, and also update the schema browser
>so that everyone can review the changes one last time. Cheers,
>
>Jonathan
>
>------------------------------------------------------------------------
>
>-Added nullable 'is_obsolete' column to DoTS.GeneSynonym
>-Added and populated DoTS.ProteinPropertyType table (please correct/improve my
> protein property descriptions, shown below.) I did not include a source_id column,
> because that usually implies a reference to an external database (in conjunction
> with an external_database_release_id to specify which database).
>
>   1  isoelectric point     The pH at which the net charge of the entire polypeptide is zero.
>   2  molecular mass        The mass of the entire polypeptide.
>   3  charge                The net charge of the entire polypeptide.
>   4  average residue mass  The average mass of a single residue in the polypeptide chain.
>
>-Modified DoTS.ProteinProperty table to reference ProteinPropertyType
> One question I have regarding these tables is how will the units be specified?
> Should I make the "property_value" column a varchar2 column? It may have had
> this type originally, and I might have changed it without considering the
> consequences. One option would be to specify in the ProteinPropertyType table
> what units are to be used, though this is clumsy if there is more than one
> choice of units for a given property.
>
Whatever the unit they're in, they should all be numbers (some would be integer) so we can go for the "number" data type but float or varchar could also be fine!

>-Created DoTS.SecondaryStructureAAFeature (instead of AASecondaryStructure)
>-Created DoTS.TertiaryStructureAAFeature (instead of AATertiaryStructure)
>-Created DoTS.ChromosomeElementFeature (instead of ChromosomeElement), with
> a few additional columns to handle the data currently in gusdev.TelomereFeature
> and gusdev.CentromereFeature
>-Added "probability" column to DoTS.DomainFeature.
>-Added "number_of_domains" column to DoTS.DomainFeature, so that it can be used
> instead of the proposed TransmembraneDomainFeature to represent TM domains.
>-Added DoTS.GenomicSequence view, with sequencing_center_contact_id instead of
> the proposed free text column, "sequencing_center".
>-Added sequencing_center_contact_id to DoTS.NASequenceImp to support this.
>-Created DoTS.InflectionPointFeature
>-Added columns to ProteinProperty to more closely reflect the original proposal
> (e.g. prediction_algorithm_id, is_predicted, review_status_id, source_id)
>-Modified DoTS.PostTranslationalModFeature as per Arnaud's original proposal
>-Created DoTS.ReplicationFeature (should this be ReplicationOriginFeature?)
>
I reckon ReplicationOriginFeature would make more sense

>-Added "type_of_cut" column to DoTS.RestrictionFragmentFeature
>-Created DoTS.RNARegulatoryFeature (instead of RNARegulatory), but omitted the
> "evidence" column; shouldn't the Evidence table be used for this purpose?
>-Created DoTS.RNASecondaryStructureFeature (instead of RNASecondaryStructure)
>-Created DoTS.SpliceSiteFeature
>-Created DoTS.TransposableElement
>-Added external_database_release_id to any view that has a source_id; these two
> fields should always appear together, since by convention they are used to
> specify a reference to an external database. (Admittedly this is somewhat
> obscure, and we should probably think about using something more obvious.)
>-Added sequence_ontology_id to AAFeatureImp and all of its views
>-Added "ontology_name" column to SequenceOntology to allow us to store multiple
> ontologies (na sequence + aa sequence) in the table. We *could* have used
> the existing so_version column for this purpose, but I think adding an extra
> column is a slightly better idea. Alternatively we could switch to using an
> external_database_release_id, which I think we might have done for the GO
> terms already.
>

cheers
Arnaud
|
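The RowSet decision agreed earlier in this message (every Interaction references a RowSet for its effectors, even when there is only one) can be illustrated with a small Python sketch. Only `effector_row_set_id` and the RowSet idea come from the thread. The `target_row_set_id` field, the in-memory `ROW_SET_MEMBERS` mapping, and all ids are hypothetical stand-ins for whatever tables actually hold the set members.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A reference to any row in any table: (table_id, row_id).
RowRef = Tuple[int, int]


@dataclass(frozen=True)
class Interaction:
    """Simplified stand-in for DoTS.Interaction using RowSet references."""
    interaction_id: int
    effector_row_set_id: int
    target_row_set_id: int  # assumed to mirror the effector column


# Hypothetical contents: row_set_id -> members of that set.
ROW_SET_MEMBERS: Dict[int, List[RowRef]] = {
    10: [(101, 1)],            # a single effector still gets its own RowSet
    11: [(101, 2), (101, 3)],  # two effectors acting together
    12: [(102, 7)],            # a single target
}


def effectors(interaction: Interaction,
              members: Dict[int, List[RowRef]] = ROW_SET_MEMBERS) -> List[RowRef]:
    """Same lookup whether the interaction has one effector or several."""
    return list(members[interaction.effector_row_set_id])


single = Interaction(1, effector_row_set_id=10, target_row_set_id=12)
multi = Interaction(2, effector_row_set_id=11, target_row_set_id=12)
print(effectors(single))
print(effectors(multi))
```

The singleton case pays for one extra lookup, which is the trade-off described above in exchange for uniform code and an enforceable reference to RowSet.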
From: Valerie W. <va...@sa...> - 2003-01-17 15:10:23
|
Monthly stats (GeneDB is updated weekly)
----------------------------------------
http://www.genedb.org/genedb/pombe/index.jsp

Number of PMID-supported curations for characterized genes increased from 1827 to 1940
(coverage increased from 801 to 842 genes)

S. pombe description lines increased from 16407 to 16705 (9731 types)
http://www.genedb.org/genedb/Products?organism=pombe&startLetter=a&keywords=Browse

S. pombe GO assignments decreased from 15107 to 15050
(due to an overhaul of the configuration files to remove false assignments)

Number of manually curated S. cerevisiae (confirmed or predicted) orthologs annotated increased from 3207 to 3223

Number of orphans decreased from 486 to 483

Genes with NO biological process assignment increased from 1439 to 1460
(due to an overhaul of the configuration files to remove false assignments)
http://www.genedb.org/genedb/pombe/GOprocess.jsp

Experimentally confirmed introns annotated increased from 748 to 775
3' UTRs annotated increased from 730 to 733
5' UTRs annotated static at 360

These features will not be available in GeneDB until a future release, but the datasets are available from the ftp site.
http://www.sanger.ac.uk/Projects/S_pombe/DNA_download.shtml

--
----------------------------------------------------------------------------------
Valerie Wood                    Tel: 01223 494954
S. Pombe Genome Project         Fax: 01223 494919
The Sanger Institute            email: va...@sa...
Wellcome Trust Genome Campus    http://www.sanger.ac.uk/Projects/S_pombe
Cambridge CB10 1SA
|