From: Jonathan C. <cra...@pc...> - 2003-01-17 19:23:33
Arnaud-

> A quick question regarding evidences, you're mentioning that the
> Evidence table will connect Features and Experimental evidences. Where
> will the latter be stored ?

Hopefully others will chime in if I get this wrong... I believe that the
relevant tables are DoTS.Comments (for free text notes/comments entered
by an annotator) and SRes.BibliographicReference (for published
experiments.) However, I don't think that we have a generic table to
represent unpublished laboratory experiments in a structured way.
Perhaps we need some use cases here? We do have your new table for
representing RNAi constructs, but I don't think that we have a
corresponding table to represent the actual RNAi experiment. Do we
need/want such a table (either for RNAi experiments or in general) and,
if so, how detailed does it need to be?

> Here two examples of transposable elements annotations, one is from
> Tbrucei, the other one is a common one in procaryote genomes.
>
> The first one in the inclusion of a INGI transposon within an ORF, the
> RHS gene. The transposon includes two RIME flanking repeats and another ORF.
> So in GUS, the INGI transposon could be stored as a transposable element
> feature, attached to a RHS gene feature. The transposable element
> feature will have three sub features, a gene feature, tagged as a
> pseudo-gene and two repeat features, which repeat_type is RIME and with
> a given location.

So in the "current" schema (meaning that I'm assuming we have only a
single repeat-related view, called RepeatRegionNAFeature, which is the
NA equivalent of RepeatRegionAAFeature), the picture would look like
this:

                        <DoTS::GenomicSequence>
                        ^      ^       ^      ^
                        |      |       |      |
<DoTS::GeneFeature (RHS)>      |       |      |
  ^                            |       |      |
  |                            |       |      |
<DoTS::TransposableElement (INGI)>     |      |
  ^       ^                            |      |
  |       |                            |      |
  |  2 x <DoTS::RepeatRegionNAFeature (RIME)> |
  |                                           |
  ----------------------------<DoTS::GeneFeature (pseudo)>

-For each feature the leftmost arrow shows the parent_id, the rightmost
 arrow shows the na_sequence_id.
-All of the features will have a location specified in terms of the
 genomic sequence (because that's what their na_sequence_id references.)
-I have to create 2 RepeatRegionNAFeatures under my definition, because
 the RIME repeats are not adjacent to one another.
-Presumably the transposable element is contained in the coding region
 of a single exon, so the parent feature could be an ExonFeature instead
 of a GeneFeature.
-Note that parent_id is typically used to indicate a part-whole
 relationship, in the sense that the part *must* have a corresponding
 whole (e.g. Exon to Gene). In the above picture and our discussions on
 this topic we've generalized its usage to also encompass the concept
 that one feature "happens to be" part of another i.e., that its
 NALocation is strictly within the bounds of its parent's NALocation,
 but that this need not be the case by definition.

And I believe your proposal is for something that looks more like this:

                        <DoTS::GenomicSequence>
                        ^    ^     ^     ^    ^
                        |    |     |     |    |
<DoTS::GeneFeature (RHS)>    |     |     |    |
  ^                          |     |     |    |
  |                          |     |     |    |
<DoTS::TransposableElement (INGI)> |     |    |
  ^       ^                        |     |    |
  |       |                        |     |    |
  |     <DoTS::RepeatRegionNAFeature>    |    |
  |           ^                          |    |
  |           |                          |    |
  |         2 x <DoTS::RepeatFeature (RIME)>  |
  |                                           |
  ----------------------------<DoTS::GeneFeature (pseudo)>

In other words, the RepeatRegionNAFeature serves only to group the two
RIME repeats (which aren't even immediately adjacent to one another.)
Is this what you had in mind? Or did you mean to make the
RepeatRegionNAFeature a child of the GeneFeature and then make the
TransposableElement a child of the RepeatRegionNAFeature? I'm just not
clear on your definition of "repeat region". Specifically, can a repeat
region contain things that are not repeats, and can it contain more than
one type of repeat? And, if so, how does one assign bounds to the region
in a non-arbitrary way?

> The second example is nested transposable elements in procaryote
> genomes, ie insertion of a transposable element within another one.
> Each transposable element can have a similar structure including the
> following sub features : two flanking Inverted Repeats, a gene and its
> promoter and/or a promoter, functional on the other strand !

I won't try to draw the pictures for this one! In both the current
schema and your proposal I think we have the problem that we have no way
of explicitly representing the relationship between the two flanking
inverted repeats. Apart from that, however, I think that we can handle
this case just as well as the first. You have to create quite a few
features, but I don't think there's any way to avoid that unless we want
to come up with some "exemplar" transposons and use them to classify the
instances we encounter. The promoter/gene that's functional on the
opposite strand would be represented simply as reverse-strand features
(i.e., we'd set the is_reversed flag in their NALocations, but still use
their parent_ids to indicate their place in the nested repeat
structure.)

> So if there is no repeat feature, the flanking repeats will have to be
> annotated part of the transposable element feature.
> Let me know what you think about these.

But shouldn't they be part of the transposable element feature? I don't
know the details of this specific type of transposon, but are you trying
to make the distinction between:

1) the core transposon, i.e., the machinery that enables that part of
   the genome (encompassing both the machinery and perhaps some
   variable-sized flanking regions) to move around and
2) the "transposed" element, i.e. the core machinery plus whatever
   flanking regions happened to be carried along on the element's most
   recent trip (the one that brought it to its current location.)?

>>-Modified DoTS.ProteinProperty table to reference ProteinPropertyType
>> One question I have regarding these tables is how will the units be specified?
>> Should I make the "property_value" column a varchar2 column?
>> It may have had this type originally, and I might have changed it without
>> considering the consequences. One option would be to specify in the
>> ProteinPropertyType table what units are to be used, though this is clumsy
>> if there is more than one choice of units for a given property.
>>
> Whatever the unit they're in, they should all be numbers (some would be
> integer) so we can go for the "number" data type but float or varchar
> could also be fine!

Right, but the question is how does somebody querying the table know
what a mass of "25" means? Are molecular masses always expressed in the
same units, no matter what? My recollection is that you can sometimes
have some pretty big polypeptides, but I don't know what the convention
is.

> I reckon ReplicationOriginFeature would make more sense

OK, I'll make this change.

Jonathan

--
Jonathan Crabtree
Center for Bioinformatics, University of Pennsylvania
1406 Blockley Hall, 423 Guardian Drive
Philadelphia, PA 19104-6021
215-573-3115
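
For anyone who would rather read the first ("current schema") picture as data than as ASCII art, here is a rough Python sketch of the same hierarchy. It is only an illustration under stated assumptions, not GUS code: the Feature class, the numeric ids and every coordinate are made up, and the NALocation (including its is_reversed flag) is folded into the feature for brevity. Only the parent_id, na_sequence_id and is_reversed names and the view names come from the discussion above. The check encodes the generalized reading of parent_id, namely that a child's location lies within its parent's bounds on the same sequence (inclusive bounds are assumed here).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Feature:
    feature_id: int            # hypothetical primary key
    view: str                  # e.g. "GeneFeature", "TransposableElement"
    label: str
    na_sequence_id: int        # sequence the location is expressed against
    parent_id: Optional[int]   # None for a top-level feature
    start: int                 # stand-in for the NALocation start
    end: int                   # stand-in for the NALocation end
    is_reversed: bool = False  # stand-in for the NALocation strand flag


def outside_parent(features: Dict[int, Feature]) -> List[int]:
    """Return ids of features not located within their parent's bounds."""
    bad = []
    for f in features.values():
        if f.parent_id is None:
            continue
        p = features[f.parent_id]
        same_sequence = f.na_sequence_id == p.na_sequence_id
        within = p.start <= f.start and f.end <= p.end  # inclusive (assumed)
        if not (same_sequence and within):
            bad.append(f.feature_id)
    return bad


def children(features: Dict[int, Feature], parent_id: int) -> List[str]:
    """Labels of the direct children of one feature, ordered by start."""
    kids = [f for f in features.values() if f.parent_id == parent_id]
    return [f.label for f in sorted(kids, key=lambda f: f.start)]


GENOMIC = 1  # hypothetical na_sequence_id of the GenomicSequence row

# All coordinates below are invented purely for illustration.
rows = [
    Feature(10, "GeneFeature", "RHS", GENOMIC, None, 1000, 9000),
    Feature(20, "TransposableElement", "INGI", GENOMIC, 10, 3000, 8000),
    Feature(30, "RepeatRegionNAFeature", "RIME left", GENOMIC, 20, 3000, 3250),
    Feature(31, "RepeatRegionNAFeature", "RIME right", GENOMIC, 20, 7750, 8000),
    Feature(40, "GeneFeature", "pseudo", GENOMIC, 20, 3300, 7700),
]
features = {f.feature_id: f for f in rows}

print(outside_parent(features))  # ids that violate containment, if any
print(children(features, 20))    # sub-features of the INGI element
```

Note that nothing here ties the two RIME rows to each other except their shared parent, which is the same gap mentioned above for the flanking inverted repeats.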