From: Jonathan C. <cra...@sn...> - 2003-01-23 15:24:04
|
On Thu, 23 Jan 2003, Joan Mazzarelli wrote:

> In its most simplest vocabulary, I thought that the review_status_id would represent.
>
> never reviewed = 0
> reviewed = 1
> updated thus review status becomes = 2 (needs to be re-reviewed)

Yes, that's right, although I think that Jonathan's addition ("manually created") is likely to be a useful one. Also, based on the feedback thus far, I think the consensus is to have a slightly more complex vocabulary than the (0,1,2) that we originally talked about. Here's the current proposal, based on Angel and Chris's feedback:

0  unreviewed           Entry has never been manually reviewed.
1  manually created     Entry was created by hand; review is not needed.
2  reviewed, correct    Entry has been manually reviewed and is deemed to be correct.
3  reviewed, incorrect  Entry has been manually reviewed and is deemed to be incorrect.
4  updated              Entry has been updated since last being reviewed or manually created.

The one thing that I don't like about this is that the names "reviewed, correct" and "reviewed, incorrect" are somewhat long. However, it will be possible to do an SQL 'like' query on the ReviewStatus table to find all of the reviewed entries (correct or incorrect).

By the way, the reason that I didn't use the original mapping of ids to terms (0 = unreviewed, 1 = reviewed, 2 = updated) is that id 1 was already in use. Also, I had originally wanted to keep the categories sorted like so:

1  manually created
2  reviewed, correct
3  reviewed, incorrect
4  updated
5  unreviewed

Doing this, one would be able to do range queries: all entries with review_status_id <= 2 would be manually reviewed and correct, and all entries with review_status_id >= 3 would still require action of some sort. Anyway, I don't think it's worth the trouble to do this, and it also means that you potentially have to renumber the terms if and when more are added.

Anyway, unless anyone has strong objections I'll probably implement the 5-term vocabulary described above sometime later today.

Jonathan
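The SQL 'like' query mentioned above could look roughly like the sketch below. It is only an illustration: the two-column ReviewStatus table and the helper name reviewed_status_ids are assumptions made for the example (the real GUS table has more columns), and SQLite stands in for the production database.

```python
import sqlite3

# Hypothetical, simplified stand-in for the ReviewStatus table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ReviewStatus ("
    "review_status_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO ReviewStatus VALUES (?, ?)",
    [
        (0, "unreviewed"),
        (1, "manually created"),
        (2, "reviewed, correct"),
        (3, "reviewed, incorrect"),
        (4, "updated"),
    ],
)


def reviewed_status_ids(conn):
    """Return the ids of every term whose name starts with 'reviewed'."""
    rows = conn.execute(
        "SELECT review_status_id FROM ReviewStatus "
        "WHERE name LIKE 'reviewed%' ORDER BY review_status_id"
    )
    return [row[0] for row in rows]


print(reviewed_status_ids(conn))
```

Note that the pattern is anchored at the start of the name. A pattern such as '%reviewed%' would also match "unreviewed", which is presumably not what is wanted.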
From: Arnaud K. <ax...@sa...> - 2003-01-23 16:55:00
|
Hi

Sorry for the delay. We've had some trouble getting gusdev emails.

I think the entry set looks fine. Two comments though:

* What about an extra "automatically created" entry, alongside the "manually created" one?
* Curators here have raised another point: they want to be able to track when a feature was last reviewed. By reviewed I mean checked, even if the review status is already set to "reviewed, correct". Is there any way of storing a "last_checked_date"? I'm thinking of curated similarity evidence. New searches would be run regularly, and a curator would want to check whether any new hit confirms or cancels a prediction.

Arnaud

Marie-Adele Rajandream wrote:

>-----Original Message-----
>From: gus...@li... [mailto:gus...@li...] On Behalf Of Jonathan Crabtree
>Sent: 23 January 2003 15:23
>To: gus...@li...
>Cc: Joan Mazzarelli
>Subject: Re: [Gusdev-gusdev] SRes.ReviewStatus
From: Jonathan C. <cra...@pc...> - 2003-01-23 18:13:49
|
Arnaud-

> * What about an extra "automatically created" entry, along the "manually
> created" one ?

We've decided to drop the "manually created" term, and while it would be useful to know which entries were created automatically vs. manually, this information can be tracked using a combination of the Evidence and/or AlgorithmInvocation tables. If you think it's crucial to record how an entry was *originally* created, then we should consider adding an extra column in the relevant tables to record this (and also to enable fast queries to retrieve entries based on their origin).

> * Curators here has raised another point : they want to be able to track
> when was the last time the feature has been reviewed. By reviewed I mean
> checked even if the review status is already set on "reviewed, correct".
> Is there any way of storing a "last_checked_date" ?
> I'm thinking of curated similarity evidences. Regularly new searches
> would be done and a curator would want to check that any new hit would
> confirm or cancel a prediction.

I think that we're currently working under the assumption that we'll use the modification_date for this (which may or may not meet your needs). When an annotator first reviews an unreviewed entry, its status is set to either "manually reviewed, correct" or "manually reviewed, incorrect" and its modification_date is updated (and the old row versioned). The modification_date now records the date of last review.

Now, when something happens that might change the status of the entry (e.g., new searches are performed), its status gets changed to "updated" and its modification_date is updated. At this point the only way to tell the date of last manual review is to look in the version table. (Although one does know that the last manual review must have been before the stated modification_date.) When the entry comes up for review again (since it's now marked "updated"), its status is changed once more, and its modification_date will once again reflect the time of the most recent manual review.

So the reason (in our current system) that we don't independently store the last_checked_date is that we don't care when the entry was last checked in absolute terms; we only care whether anything has changed since then. One problem with this (versus just storing a last_checked_date) is that it means that any program that makes changes (e.g., the one that runs the similarity searches) must determine which entries in the database *may* have been affected, and update their review_status.

I think I agree that last_manual_review_date would be a useful thing to have, but I think that its addition will have to be deferred until after I've finished working on the migration, because it will affect several tables. I'll put it on my list of things to deal with after the migration is done.

Jonathan
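The modification_date scheme described above can be sketched as follows. Everything in the sketch is an assumption made for illustration: the table names Entry and EntryVer, their three columns, storing the status as text rather than as a review_status_id, and the helper last_manual_review_date are all hypothetical, and SQLite stands in for the real database. The point is only that, once an entry is marked "updated", the date of its last manual review has to be recovered from the versioned rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical current-row table and its version table.
    CREATE TABLE Entry    (entry_id INTEGER PRIMARY KEY,
                           review_status TEXT, modification_date TEXT);
    CREATE TABLE EntryVer (entry_id INTEGER,
                           review_status TEXT, modification_date TEXT);
    """
)
# Entry 1 was reviewed on 2003-01-10; a later search run marked it "updated".
conn.execute("INSERT INTO EntryVer VALUES (1, 'unreviewed', '2003-01-02')")
conn.execute("INSERT INTO EntryVer VALUES (1, 'reviewed, correct', '2003-01-10')")
conn.execute("INSERT INTO Entry VALUES (1, 'updated', '2003-01-20')")
# Entry 2 has never been reviewed.
conn.execute("INSERT INTO Entry VALUES (2, 'unreviewed', '2003-01-05')")


def last_manual_review_date(conn, entry_id):
    """Return the ISO date of the last manual review, or None if there is none."""
    row = conn.execute(
        "SELECT review_status, modification_date FROM Entry WHERE entry_id = ?",
        (entry_id,),
    ).fetchone()
    if row is None:
        return None
    status, modified = row
    if status.startswith("reviewed"):
        # The current row is itself the result of a manual review.
        return modified
    # Otherwise fall back to the most recent reviewed row in the version table.
    older = conn.execute(
        "SELECT MAX(modification_date) FROM EntryVer "
        "WHERE entry_id = ? AND review_status LIKE 'reviewed%'",
        (entry_id,),
    ).fetchone()
    return older[0]


print(last_manual_review_date(conn, 1))
print(last_manual_review_date(conn, 2))
```

A dedicated last_manual_review_date column, as proposed in the thread, would replace the second query with a single column read.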
From: Arnaud K. <ax...@sa...> - 2003-01-23 22:45:22
|
Hi

Quoting Jonathan Crabtree <cra...@pc...>:

> So the reason (in our current system) that we don't independently store the
> last_checked_date is because we don't care when the entry was last checked
> in absolute terms; we only care whether anything has changed since then.
> One problem with this (versus just storing a last_checked_date) is that it
> means that any program that makes changes (e.g., the one that runs the
> similarity searches) must determine what entries in the database *may* have
> been affected, and update their review_status.

I think it would be an interesting piece of functionality: a way for the annotators/curators to be informed that a run may affect some entries in the database and that they should be reviewed. It sounds like some sort of "triggers" with a set of rules that specify, for a given run, which entries may be affected. But it's probably not a simple task to implement!

> I think I agree that last_manual_review_date would be a useful thing to
> have, but I think that its addition will have to be deferred until after
> I've finished working on the migration, because it will affect several
> tables. I'll put it on my list of things to deal with after the migration
> is done.

fine

Arnaud
From: Joan M. <ma...@pc...> - 2003-01-23 15:53:04
|
Hi Jonathan,

I think we need to discuss this a bit more in terms of how this will impact other code we have which recognizes this assignment, and it is not clear what type of entry has this set in GUS30 now (1 manually created Entry was created by hand; review is not needed.) for it to be taken.

Joan

--
Joan Mazzarelli
Computational Biology and Informatics Laboratory
Center for Bioinformatics
1429 Blockley Hall
University of Pennsylvania
Philadelphia, PA 19104

_______________________________________________
Gusdev-gusdev mailing list
Gus...@li...
https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
From: Jonathan C. <cra...@pc...> - 2003-01-23 16:15:19
|
Joan-

Joan Mazzarelli wrote:

> Hi Jonathan,
>
> I think we need to discuss this a bit more in terms of how this will impact other code we have
> which recognizes this assignment, and it is not clear what type of entry has this set in GUS30
> now (1 manually created Entry was created by hand; review is not needed.) for it to be taken.

OK, although I'd like to get something settled soon, because I'm on a very tight schedule. Can you be more specific about the adverse impact you think this change will have on our existing code? In particular, why does having 5 terms in the controlled vocabulary make things any more difficult than having 3? It seems that in either case code that previously relied on a single bit (manually_reviewed) will now have to query a controlled vocabulary instead.

Your second point is a good one, which is that we may not have an easy way to determine which entries currently in gusdev should be assigned the ReviewStatus "manually created." I'm certainly open to dropping this term if people don't think it will be useful. Jonathan, can you tell us how you're using this term now?

In the short term (i.e. later today), however, I would propose the following mapping (for entries currently in gusdev with a non-null manually_reviewed column):

manually_reviewed -> ReviewStatus
0                    unreviewed
1                    reviewed, correct

I believe this correctly represents our existing semantics, *except* for entries that have been manually created (which are simply marked as manually_reviewed = 1). Doing this we don't lose any information and could go back later and attempt to identify entries that had been manually created. Or we could later decide that we want to drop the "manually created" term completely, and could do so without affecting any of the migrated gusdev data.

Jonathan
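As a rough illustration of the short-term mapping proposed above, the sketch below translates the old manually_reviewed bit into a ReviewStatus term. The names LEGACY_TO_REVIEW_STATUS and migrate_review_status are hypothetical, and returning None for a null column is an assumption: the thread only defines the mapping for non-null values.

```python
# Mapping from the legacy manually_reviewed bit to ReviewStatus term names,
# as proposed in the message above.
LEGACY_TO_REVIEW_STATUS = {
    0: "unreviewed",
    1: "reviewed, correct",
}


def migrate_review_status(manually_reviewed):
    """Translate a legacy manually_reviewed value into a ReviewStatus name.

    Null values are passed through as None (an assumption; the proposal
    covers only non-null columns). Any other value raises KeyError so that
    unexpected legacy data is noticed rather than silently mapped.
    """
    if manually_reviewed is None:
        return None
    return LEGACY_TO_REVIEW_STATUS[manually_reviewed]


legacy_rows = [0, 1, None, 1]
print([migrate_review_status(value) for value in legacy_rows])
```

Because manually created entries were stored as manually_reviewed = 1, they come out as "reviewed, correct" here, which matches the exception noted in the message.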
From: Joan M. <ma...@pc...> - 2003-01-23 16:57:59
|
Dear Jonathan,

> OK, although I'd like to get something settled soon, because I'm on a very
> tight schedule. Can you be more specific about the adverse impact you think
> this change will have on our existing code? In particular, why does having
> 5 terms in the controlled vocabulary make things any more difficult than
> having 3? It seems that in either case code that previously relied on a
> single bit (manually_reviewed) will now have to query a controlled vocabulary
> instead.

I am not against having a more extensive controlled vocabulary. (And by the way, I am pretty busy myself, and this is not how I planned to spend my morning.)

> manually_reviewed -> ReviewStatus
> 0 unreviewed
> 1 reviewed, correct
>
> I believe this correctly represents our existing semantics, *except* for entries
> that have been manually created (which are simply marked as manually_reviewed = 1).
> Doing this we don't lose any information and could go back later and attempt to
> identify entries that had been manually created. Or we could later decide that
> we want to drop the "manually created" term completely, and could do so without
> affecting any of the migrated gusdev data.

The above makes the most sense to me for now. Also, as defined (1 manually created Entry was created by hand; review is not needed), this does not make sense to me: why set a review status id?

Joan

--
Joan Mazzarelli
Computational Biology and Informatics Laboratory
Center for Bioinformatics
1429 Blockley Hall
University of Pennsylvania
Philadelphia, PA 19104
From: Deborah F. P. <pi...@sn...> - 2003-01-23 16:35:25
|
On Thu, 23 Jan 2003, Joan Mazzarelli wrote:

IMHO: "manually created" doesn't seem as if it is a review status and this situation could be covered with "manually reviewed correct" with evidence being that it was manually created.

I have to agree with Joan, that it may be safer to stick as closely to the existing manually_reviewed values as possible, 0=unreviewed and 1=manually reviewed correct and add 2=manually reviewed incorrect as well as 4=updated.

Debbie

> Hi Jonathan,
>
> I think we need to discuss this a bit more in terms of how this will impact other code we have
> which recognizes this assignment, and it is not clear what type of entry has this set in GUS30
> now (1 manually created Entry was created by hand; review is not needed.) for it to be taken.
>
> Joan
From: Jonathan C. <cra...@pc...> - 2003-01-23 17:55:26
|
Debbie-

> "manually created" doesn't seem as if it is a review status and this
> situation could be covered with "manually reviewed correct" with evidence
> being that it was manually created.

It is a review status if you want to differentiate between "implicitly reviewed" (i.e., the annotator created it, and he/she would not have done so if he/she did not believe it to be correct) and "explicitly reviewed" (i.e., the entry, which already exists in the database, was retrieved and then examined to determine whether it's correct.) However, it's not clear that this is a distinction (between two different kinds of "reviewed, correct") that we should be making in ReviewStatus. There are at least two separate questions here:

1. Do we want to track which entries in the database were created manually, versus those that were created automatically and then approved by an annotator?

I think that we're all in agreement that the answer to this question is a resounding "yes". Given that, the second question is:

2. Where should this information be stored?

As you point out, we could record this information using the Evidence table. And, as I mentioned in a previous e-mail, we *have* to do it this way unless we change our ReviewStatus vocabulary so that each and every term in the vocabulary records whether the entry was originally created manually or automatically (so that we can track its original status through one or more rounds of update/re-review.) I don't think that this is a good idea, and after talking to Jonathan about it I think we're in agreement that we should drop the term for "manually created."

We also have to bear in mind that our current notion of ReviewStatus is something that's fairly closely tied to the annotation process that we use in DoTS. There's nothing wrong with that, but it's quite possible that other sites will have different ideas about how ReviewStatus should be used. So at some point we should revisit this, but as long as the revised set of terms (see below) is agreeable to everyone on the mailing list, I think that we should stick with it for the time being.

> I have to agree with Joan, that it may be safer to stick as closely to the
> existing manually_reviewed values as possible, 0=unreviewed and 1=manually
> reviewed correct and add 2=manually reviewed incorrect as well as
> 4=updated.

Can you be more specific about why changing the actual ids would be unsafe? (I hope you're not threatening me :)) I trust that you're not planning to rely on having hard-coded review_status_ids in your GUS 3.0 programs and queries, right?

I myself have plenty of GUS 2.x scripts and queries that contain hard-coded internal identifiers (e.g., sequence_type_ids and external_db_ids, to name two of the most frequently-used ones.) However, when I convert these scripts to GUS 3.0 I'm going to have to rewrite them to be portable, meaning that I can't assume that other copies of GUS (perhaps running at other sites) will have the same internal ids. Unless we're willing to take these ids and publish them (as, for example, the GO consortium has done with their GO IDs), we can't rely on their being constant across different copies of GUS; it's just not good programming practice.

Jonathan
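The portability point above, looking a term up by name instead of hard-coding its internal id, can be sketched like this. The helpers make_site and review_status_id, the two-column ReviewStatus table, and the ids used for the second site are all invented for the example; SQLite stands in for whatever database a given GUS installation runs on.

```python
import sqlite3


def make_site(terms):
    """Build an in-memory stand-in for one site's ReviewStatus table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ReviewStatus ("
        "review_status_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
    )
    conn.executemany("INSERT INTO ReviewStatus VALUES (?, ?)", terms)
    return conn


def review_status_id(conn, name):
    """Resolve a term name to this installation's internal id."""
    row = conn.execute(
        "SELECT review_status_id FROM ReviewStatus WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no ReviewStatus term named {name!r}")
    return row[0]


# Two installations that happen to have assigned different internal ids.
site_a = make_site([(0, "unreviewed"), (1, "reviewed, correct")])
site_b = make_site([(10, "unreviewed"), (11, "reviewed, correct")])

for site in (site_a, site_b):
    print(review_status_id(site, "reviewed, correct"))
```

A script written this way gives the right answer at both sites, whereas one that hard-codes review_status_id = 1 would only work at the first.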
From: Jonathan C. <cra...@pc...> - 2003-01-23 18:06:14
|
I forgot to include the latest proposal in the previous e-mail; it's the same as before, but with "manually created" removed:

0  unreviewed           Entry has never been manually reviewed.
1  reviewed, correct    Entry has been manually reviewed and is deemed to be correct.
2  reviewed, incorrect  Entry has been manually reviewed and is deemed to be incorrect.
3  updated              Entry has been updated since last being manually reviewed.

Manually-created entries will initially be assigned "reviewed, correct".

Jonathan
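One way to read the four-term proposal above is as a small state machine. The sketch below is only that reading, not an agreed design: the enum and the function names are hypothetical, and leaving a never-reviewed entry as "unreviewed" when it changes is an assumption drawn from the definition of "updated" (changed since last being manually reviewed).

```python
from enum import Enum


class ReviewStatus(Enum):
    UNREVIEWED = "unreviewed"
    REVIEWED_CORRECT = "reviewed, correct"
    REVIEWED_INCORRECT = "reviewed, incorrect"
    UPDATED = "updated"


def initial_status(created_manually):
    """Manually created entries start out as 'reviewed, correct'."""
    if created_manually:
        return ReviewStatus.REVIEWED_CORRECT
    return ReviewStatus.UNREVIEWED


def after_review(is_correct):
    """Status assigned when an annotator reviews an entry."""
    if is_correct:
        return ReviewStatus.REVIEWED_CORRECT
    return ReviewStatus.REVIEWED_INCORRECT


def after_change(status):
    """Status after a program changes an entry.

    Assumption: only previously reviewed entries become 'updated'; an entry
    that was never reviewed simply stays 'unreviewed'.
    """
    if status in (ReviewStatus.REVIEWED_CORRECT, ReviewStatus.REVIEWED_INCORRECT):
        return ReviewStatus.UPDATED
    return status


status = initial_status(created_manually=True)
status = after_change(status)
print(status.value)
```

Under this reading, a manually created entry that is later touched by an automated run ends up as "updated" and comes back for review like any other reviewed entry.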
From: Joan M. <ma...@pc...> - 2003-01-23 19:34:56
|
Jonathan,

For now, I think this is the best proposal.

Joan

--
Joan Mazzarelli
Computational Biology and Informatics Laboratory
Center for Bioinformatics
1429 Blockley Hall
University of Pennsylvania
Philadelphia, PA 19104