From: Arnaud K. <ax...@sa...> - 2002-11-12 15:29:58
Hi Jonathan,

Sorry for the delay in getting back to you with some thoughts on attribution data. Here is a case of what could happen on a given project:

* The sequences would come from TIGR,
* The gene models would come from SBRI,
* The manual annotation of the gene models and the GO curation would be done by TIGR,
* The curation would be done by the Sanger,
* Some curated comments would be sent by members of the community.

Instead of using the evidence table, would it be possible to attribute data by using the user_id attribute? E.g. if the gene models come from SBRI, the user_id would acknowledge the gene features as owned by SBRI. Any update would keep the ownership and would acknowledge who made the update.

The other point was the attribution of data coming from a publication or a personal communication. I had a look at FlyBase. FlyBase treats personal communications as references. To differentiate them, they have an extra attribute in the reference table that classifies the different kinds of reference. For more information about the reference class controlled vocabulary, see http://flybase.bio.indiana.edu/.data/docs/refman/refman-B.html#B.13.2.

cheers
Arnaud

------------------------------

Item 3: Attribution of data from multiple sources.

Three methods are available in GUS3.0 to attach information to tables:

* Evidence, which allows attributions to be linked to any row.
* NAComment, which allows multiple comments to be attached to a sequence.
* Comment, which is attached to a review_status_id; each NAFeature has a review_status_id.

Use cases are needed to determine whether any of these mechanisms are appropriate. See the addendum from Jonathan Crabtree below.

Addendum to item 3 from Jonathan.

I spent a little time looking into this and the number of methods differs depending on how you count them (and also because in most situations the number of alternatives differs depending on which table you're commenting on.)
But here are the ways we currently support in GUS 3.0 for adding comments to things (external to the tables themselves):

1. DoTS.Comments (not "Comment") + DoTS.Evidence
   I list these together because the Comments table relies on the Evidence table to link its rows to other objects in the db. This method can be used with any table and supports CLOB comments.

2. DoTS.AAComment + DoTS.CommentName
   Can be used only with AASequence entries and supports VARCHAR2(4000).

3. DoTS.NAComment
   Can be used only with NASequence entries and supports VARCHAR2(4000). (Does *not* have a link to DoTS.CommentName)

4. DoTS.Note
   Can be used only with NAFeature entries and supports VARCHAR2(4000). (Note that this is different from gusdev.Note, which has a VARCHAR(255) AND a CLOB column.)

Note that DoTS.Comments is the only generic option (that I found) for associating notes/comments with rows. Note also that AASequence, NASequence, and NAFeature all have their own specialized comment tables, but AAFeature doesn't appear to (at least not one with "comment" in its name!)

Conceptually speaking I'm also not sure that I agree with the use of the "Evidence" table to link comments to rows in general. For example, during the conference call I gave the example of a note in PlasmoDB that basically says "the second exon of this predicted gene is incorrect"; this would actually be evidence *against* the GeneFeature, not *for* it (the typical use of the Evidence table.) Likewise, one could merely be commenting on an aspect of a predicted feature, without actually providing any further evidence for its existence or correctness. In other words, an implicit statement of the form "if this thing exists, then it's interesting that such and such would be true...".

Another thing to point out is that none of these tables (as far as I can remember) has a pointer to SRES.Contact, so they don't really address the question of attribution.
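
To make method 1 above a little more concrete, here is a rough sketch of the "generic link table" pattern, in which a comment is attached to an arbitrary row through an Evidence-style table. This is only an illustration: the column names (target_table, target_id, fact_table, fact_id, comment_text) and the use of SQLite are assumptions made for the example and are not taken from the actual GUS 3.0 schema.

```python
import sqlite3

# Hypothetical, heavily simplified schema. Column names are illustrative
# assumptions and are NOT copied from the real GUS 3.0 DDL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GeneFeature (na_feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Comments    (comments_id   INTEGER PRIMARY KEY, comment_text TEXT);
-- Generic link row: names the commented-on table/row and the table/row
-- that holds the comment itself.
CREATE TABLE Evidence (
    evidence_id  INTEGER PRIMARY KEY,
    target_table TEXT    NOT NULL,
    target_id    INTEGER NOT NULL,
    fact_table   TEXT    NOT NULL,
    fact_id      INTEGER NOT NULL
);
""")


def add_comment(conn, target_table, target_id, text):
    """Store a comment and link it to any row via the Evidence table."""
    cur = conn.execute("INSERT INTO Comments (comment_text) VALUES (?)", (text,))
    comment_id = cur.lastrowid
    conn.execute(
        "INSERT INTO Evidence (target_table, target_id, fact_table, fact_id) "
        "VALUES (?, ?, 'Comments', ?)",
        (target_table, target_id, comment_id),
    )
    return comment_id


def comments_for(conn, target_table, target_id):
    """Return all comment texts linked to the given row, oldest first."""
    rows = conn.execute(
        "SELECT c.comment_text FROM Evidence e "
        "JOIN Comments c ON c.comments_id = e.fact_id "
        "WHERE e.fact_table = 'Comments' "
        "AND e.target_table = ? AND e.target_id = ? "
        "ORDER BY c.comments_id",
        (target_table, target_id),
    )
    return [row[0] for row in rows]


conn.execute("INSERT INTO GeneFeature (na_feature_id, name) VALUES (1, 'predicted_gene_1')")
add_comment(conn, "GeneFeature", 1, "the second exon of this predicted gene is incorrect")
```

Note that nothing in this sketch records who made the comment, and the link row cannot say whether the comment speaks for or against the feature, which are the two gaps described above.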
In PlasmoDB right now we handle attribution mainly through creative use of the ExternalDatabase table (external_db_id in the current GUSdev). In GUS 3.0 I believe that external database releases will be linked to Contacts, so perhaps the thing to do is to allow a single entry in the database to be associated with multiple external databases? This gets slightly messy if you want to be able to attribute something to a personal communication with somebody, or to a journal article (neither of which is expressed particularly well as an "external database".) Although both might be nicely represented as References, perhaps?

There are enough possibilities that maybe we should just find out exactly what the PSU folks have in mind, and tailor a solution that works for them (using the existing schema as much as possible.)
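
One possible shape for "a single entry associated with multiple sources" is a many-to-many attribution table in which each source carries a class, along the lines of the FlyBase reference class mentioned earlier in the thread. The sketch below is purely hypothetical: the Source and Attribution tables, their columns, the three class values, and the role labels are all invented for illustration and do not correspond to existing GUS tables.

```python
import sqlite3

# Hypothetical tables, invented for this example only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Source (
    source_id    INTEGER PRIMARY KEY,
    source_class TEXT NOT NULL CHECK (source_class IN
        ('external database', 'publication', 'personal communication')),
    label        TEXT NOT NULL
);
-- Many-to-many link: one row can be attributed to several sources,
-- each in a different role.
CREATE TABLE Attribution (
    target_table TEXT    NOT NULL,
    target_id    INTEGER NOT NULL,
    source_id    INTEGER NOT NULL REFERENCES Source(source_id),
    role         TEXT    NOT NULL,
    PRIMARY KEY (target_table, target_id, source_id, role)
);
""")


def add_source(conn, source_class, label):
    """Register a source; the CHECK constraint rejects unknown classes."""
    cur = conn.execute(
        "INSERT INTO Source (source_class, label) VALUES (?, ?)",
        (source_class, label),
    )
    return cur.lastrowid


def attribute(conn, target_table, target_id, source_id, role):
    """Attribute one row to one source in a given role."""
    conn.execute(
        "INSERT INTO Attribution (target_table, target_id, source_id, role) "
        "VALUES (?, ?, ?, ?)",
        (target_table, target_id, source_id, role),
    )


def sources_for(conn, target_table, target_id):
    """List (role, source_class, label) for every source of a row."""
    rows = conn.execute(
        "SELECT a.role, s.source_class, s.label FROM Attribution a "
        "JOIN Source s ON s.source_id = a.source_id "
        "WHERE a.target_table = ? AND a.target_id = ? "
        "ORDER BY s.source_id, a.role",
        (target_table, target_id),
    )
    return [tuple(row) for row in rows]


# Loosely modelled on the project case described at the top of the thread.
sbri = add_source(conn, "external database", "SBRI")
tigr = add_source(conn, "external database", "TIGR")
member = add_source(conn, "personal communication", "community member")
attribute(conn, "GeneFeature", 1, sbri, "gene model")
attribute(conn, "GeneFeature", 1, tigr, "manual annotation")
attribute(conn, "GeneFeature", 1, member, "curated comment")
```

Calling sources_for(conn, "GeneFeature", 1) then lists all three contributors for the same gene feature, each with its own class and role. Whether this would be better expressed with the existing external database release and Reference tables is exactly the open question above.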