Hi Jonathan
Sorry for the delay in getting back to you with some thoughts on
attribution data.
Here is an example of what could happen on a given project:
* The sequences would come from TIGR,
* The gene models would come from SBRI,
* The manual annotation of the gene models and the GO curation would be
done by TIGR,
* The curation would be done by the Sanger,
* Some curated comments would be sent by members of the community.
Instead of using the evidence table, would it be possible to attribute
data by using the user_id attribute?
For example, if the gene models come from SBRI, the user_id would
record the gene features as owned by SBRI. Any later update would keep
that ownership and would also record who made the update.
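A minimal sketch of this idea in Python, keeping ownership separate from the record of updates. The class and field names (FeatureRow, owner_user_id, update_history) are hypothetical and are not taken from the GUS schema; the sketch only assumes that a row stores the user who supplied it and, separately, the users who later changed it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureRow:
    """Hypothetical stand-in for a feature row; not an actual GUS table."""

    name: str
    owner_user_id: str  # who supplied the data originally, e.g. "SBRI"
    update_history: List[str] = field(default_factory=list)  # editors, in order

    def record_update(self, user_id: str) -> None:
        # Ownership is left untouched; only the update log grows.
        self.update_history.append(user_id)


gene = FeatureRow(name="gene_model_1", owner_user_id="SBRI")
gene.record_update("TIGR")    # manual annotation
gene.record_update("Sanger")  # curation
```

After these two updates the row is still owned by SBRI, while the history shows that TIGR and then the Sanger touched it.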
The other point was the attribution of data coming from publications or
personal communications. I had a look at FlyBase, which treats
personal communications as references. To differentiate them, it has
an extra attribute in the reference table that classifies the
different kinds of reference.
For more information about the reference class controlled vocabulary, see
http://flybase.bio.indiana.edu/.data/docs/refman/refman-B.html#B.13.2.
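As a rough illustration of a class attribute on a reference table, here is a small sketch using Python's sqlite3 module. The table name, the column names (reference_id, ref_class, citation) and the two class values are assumptions made for the example; they are not FlyBase's actual schema or controlled vocabulary.

```python
import sqlite3


def build_reference_demo() -> sqlite3.Connection:
    """Create an in-memory reference table with a hypothetical class column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE reference (
            reference_id INTEGER PRIMARY KEY,
            ref_class    TEXT NOT NULL,  -- illustrative values only
            citation     TEXT NOT NULL
        )
        """
    )
    conn.executemany(
        "INSERT INTO reference (ref_class, citation) VALUES (?, ?)",
        [
            ("publication", "Example journal article"),
            ("personal communication", "Example e-mail from a community member"),
        ],
    )
    return conn


def citations_of_class(conn: sqlite3.Connection, ref_class: str) -> list:
    """Return the citations whose class matches ref_class, in insertion order."""
    rows = conn.execute(
        "SELECT citation FROM reference WHERE ref_class = ? ORDER BY reference_id",
        (ref_class,),
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout, personal communications sit in the same table as papers and are picked out by filtering on the class column.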
cheers
Arnaud
------------------------------
Item 3: Attribution of data from multiple sources. Three methods are
available in GUS3.0 to attach information to tables: Evidence, which
allows attributions to be linked to any row; NAComment, which allows
multiple comments to be attached to a sequence; and Comment, which is
attached to a review_status_id (each NAFeature has a review_status_id).
Use cases are needed to determine if any of these mechanisms are
appropriate.
See the addendum from Jonathan Crabtree below.
Addendum to item 3 from Jonathan.
I spent a little time looking into this and the number of methods
differs depending on how you count them (and also because in most
situations the number of alternatives differs depending on which table
you're commenting on). But here are the ways we currently support in
GUS 3.0 for adding comments to things (external to the tables
themselves):
1. DoTS.Comments (not "Comment") + DoTS.Evidence
   I list these together because the Comments table relies on the
   Evidence table to link its rows to other objects in the db.
   This method can be used with any table and supports CLOB comments.
2. DoTS.AAComment + DoTS.CommentName
   Can be used only with AASequence entries and supports VARCHAR2(4000).
3. DoTS.NAComment
   Can be used only with NASequence entries and supports VARCHAR2(4000).
   (Does *not* have a link to DoTS.CommentName)
4. DoTS.Note
   Can be used only with NAFeature entries and supports VARCHAR2(4000).
   (Note that this is different from gusdev.Note, which has a
   VARCHAR(255) AND a CLOB column.)
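The four options above can be summarized as a small lookup, sketched here in Python. The table and type names are the ones listed above; the constant COMMENT_MECHANISMS and the helper mechanisms_for are hypothetical names introduced only for this illustration.

```python
# Each entry: (mechanism, table it can annotate or None for any table, text type).
COMMENT_MECHANISMS = [
    ("DoTS.Comments + DoTS.Evidence", None, "CLOB"),
    ("DoTS.AAComment + DoTS.CommentName", "AASequence", "VARCHAR2(4000)"),
    ("DoTS.NAComment", "NASequence", "VARCHAR2(4000)"),
    ("DoTS.Note", "NAFeature", "VARCHAR2(4000)"),
]


def mechanisms_for(table: str) -> list:
    """Return the names of the mechanisms that can annotate the given table."""
    return [
        name
        for name, target, _text_type in COMMENT_MECHANISMS
        if target is None or target == table
    ]
```

Calling mechanisms_for("AAFeature") returns only the generic DoTS.Comments + DoTS.Evidence option, since none of the specialized tables listed here targets AAFeature.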
Note that DoTS.Comments is the only generic option (that I found) for
associating notes/comments with rows. Note also that AASequence,
NASequence, and NAFeature all have their own specialized comment tables,
but AAFeature doesn't appear to (at least not one with "comment" in its
name!) Conceptually speaking I'm also not sure that I agree with the
use of the "Evidence" table to link comments to rows in general. For
example, during the conference call I gave the example of a note in
PlasmoDB that basically says "the second exon of this predicted gene is
incorrect"; this would actually be evidence *against* the GeneFeature,
not *for* it (the typical use of the Evidence table.) Likewise, one
could merely be commenting on an aspect of a predicted feature, without
actually providing any further evidence for its existence or
correctness. In other words, an implicit statement of the form "if this
thing exists, then it's interesting that such and such would be
true...".
Another thing to point out is that none of these tables (as far as I can
remember) has a pointer to SRES.Contact, so they don't really address
the question of attribution. In PlasmoDB right now we handle attribution
mainly through creative use of the ExternalDatabase table
(external_db_id in the current GUSdev). In GUS 3.0 I believe that
external database releases will be linked to Contacts, so perhaps the
thing to do is to allow a single entry in the database to be associated
with multiple external databases? This gets slightly messy if you want
to be able to attribute something to a personal communication with
somebody, or to a journal article (neither of which is expressed
particularly well as an "external database".) Although both might be
nicely represented as References, perhaps? There are enough
possibilities that maybe we should just find out exactly what the PSU
folks have in mind, and tailor a solution that works for them (using
the existing schema as much as possible.)
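To show what associating one entry with several sources could look like, here is a rough sketch using Python's sqlite3 module. Everything in it is an assumption made for illustration: the tables source and entry_source, their columns, and the example rows do not exist in GUS. It simply uses a link table so that one entry can point at an external database and a reference at the same time.

```python
import sqlite3


def build_attribution_demo() -> sqlite3.Connection:
    """Create hypothetical source and link tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source (
            source_id   INTEGER PRIMARY KEY,
            source_type TEXT NOT NULL,  -- e.g. 'external database', 'reference'
            label       TEXT NOT NULL
        );
        -- Link table: one entry may be attributed to many sources.
        CREATE TABLE entry_source (
            entry_id  INTEGER NOT NULL,
            source_id INTEGER NOT NULL REFERENCES source(source_id),
            PRIMARY KEY (entry_id, source_id)
        );
        INSERT INTO source (source_id, source_type, label) VALUES
            (1, 'external database', 'Example sequencing centre release'),
            (2, 'reference', 'Example personal communication');
        INSERT INTO entry_source (entry_id, source_id) VALUES (100, 1), (100, 2);
        """
    )
    return conn


def sources_for(conn: sqlite3.Connection, entry_id: int) -> list:
    """Return (source_type, label) pairs attributed to the given entry."""
    return conn.execute(
        """
        SELECT s.source_type, s.label
        FROM entry_source AS es
        JOIN source AS s ON s.source_id = es.source_id
        WHERE es.entry_id = ?
        ORDER BY s.source_id
        """,
        (entry_id,),
    ).fetchall()
```

Here entry 100 is attributed both to an external database release and to a personal communication recorded as a reference, without forcing the latter into the shape of an external database.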