From: Arnaud K. <ax...@sa...> - 2003-01-17 23:34:40
Quoting Jonathan Crabtree <cra...@pc...>:

> Arnaud-
>
> > Here are two examples of transposable element annotations: one is from
> > T. brucei, the other is a common one in prokaryote genomes.
> >
> > The first one is the inclusion of an INGI transposon within an ORF, the
> > RHS gene. The transposon includes two RIME flanking repeats and another
> > ORF.
> > So in GUS, the INGI transposon could be stored as a transposable element
> > feature, attached to an RHS gene feature. The transposable element
> > feature will have three subfeatures: a gene feature, tagged as a
> > pseudogene, and two repeat features whose repeat_type is RIME, each with
> > a given location.
>
> So in the "current" schema (meaning that I'm assuming we have only a single
> repeat-related view, called RepeatRegionNAFeature, which is the NA
> equivalent of RepeatRegionAAFeature), the picture would look like this:
>
> <DoTS::GenomicSequence>
>   ^                         ^           ^           ^
>   |                         |           |           |
> <DoTS::GeneFeature (RHS)>   |           |           |
>   ^                         |           |           |
>   |                         |           |           |
> <DoTS::TransposableElement (INGI)>      |           |
>   ^   ^                                 |           |
>   |   |                                 |           |
>   |   2 x <DoTS::RepeatRegionNAFeature (RIME)>      |
>   |                                                 |
>   ------------------------<DoTS::GeneFeature (pseudo)>
>
> - For each feature, the leftmost arrow shows the parent_id and the
>   rightmost arrow shows the na_sequence_id.
> - All of the features will have a location specified in terms of the
>   genomic sequence (because that's what their na_sequence_id references).
> - I have to create 2 RepeatRegionNAFeatures under my definition, because
>   the RIME repeats are not adjacent to one another.
> - Presumably the transposable element is contained in the coding region
>   of a single exon, so the parent feature could be an ExonFeature instead
>   of a GeneFeature.
> - Note that parent_id is typically used to indicate a part-whole
>   relationship, in the sense that the part *must* have a corresponding
>   whole (e.g. Exon to Gene).
>   In the above picture and in our discussions on this topic, we've
>   generalized its usage to also cover the case where one feature merely
>   "happens to be" part of another, i.e., its NALocation lies strictly
>   within the bounds of its parent's NALocation, even though this need not
>   be the case by definition.
>
> And I believe your proposal is for something that looks more like this:
>
> <DoTS::GenomicSequence>
>   ^                         ^       ^       ^       ^
>   |                         |       |       |       |
> <DoTS::GeneFeature (RHS)>   |       |       |       |
>   ^                         |       |       |       |
>   |                         |       |       |       |
> <DoTS::TransposableElement (INGI)>  |       |       |
>   ^       ^                         |       |       |
>   |       |                         |       |       |
>   |       <DoTS::RepeatRegionNAFeature>     |       |
>   |           ^                             |       |
>   |           |                             |       |
>   |           2 x <DoTS::RepeatFeature (RIME)>      |
>   |                                                 |
>   ------------------------<DoTS::GeneFeature (pseudo)>

My proposal is this representation without the repeat region feature. I
would use a repeat region feature to group together a sequence of any
length (even a single base) that is repeated X times; it is not needed in
this situation.

> In other words, the RepeatRegionNAFeature serves only to group the two RIME
> repeats (which aren't even immediately adjacent to one another). Is this
> what you had in mind?

I don't think we need to group them with a repeat region feature, as the
transposable element feature already does that.

> Or did you mean to make the RepeatRegionNAFeature a child of the
> GeneFeature and then make the TransposableElement a child of the
> RepeatRegionNAFeature? I'm just not clear on your definition of "repeat
> region". Specifically, can a repeat region contain things that are not
> repeats,

Yes! A gene, for example. A repeat region would be used to cluster
tandemly repeated genes. This should be fine as long as a gene feature can
be attached to a repeat region.

> and can it contain more than one type of repeat?

I think we agree on allowing only one type of repeat unit; if a region
contains more than one, we would nest repeat region features.
We haven't come across a repeat region made of interlaced repeat units,
which would require making the schema more generic.

> And, if so, how does one assign bounds to the region in a non-arbitrary
> way?
>
> > The second example is nested transposable elements in prokaryote
> > genomes, i.e. the insertion of one transposable element within another.
> > Each transposable element can have a similar structure, including the
> > following subfeatures: two flanking inverted repeats, a gene and its
> > promoter, and/or a promoter that is functional on the other strand!
>
> I won't try to draw the pictures for this one! In both the current schema
> and your proposal, I think we have the problem that we have no way of
> explicitly representing the relationship between the two flanking
> inverted repeats.

But we don't need to, do we?

> Apart from that, however, I think that we can handle this case just as
> well as the first. You have to create quite a few features, but I don't
> think there's any way to avoid that unless we want to come up with some
> "exemplar" transposons and use them to classify the instances we
> encounter. The promoter/gene that's functional on the opposite strand
> would be represented simply as reverse-strand features (i.e., we'd set
> the is_reversed flag in their NALocations, but still use their parent_ids
> to indicate their place in the nested repeat structure).
>
> > So if there is no repeat feature, the flanking repeats will have to be
> > annotated as part of the transposable element feature.
> > Let me know what you think about these.
>
> But shouldn't they be part of the transposable element feature? I don't
> know the details of this specific type of transposon, but are you trying
> to make the distinction between:
> 1) the core transposon, i.e., the machinery that enables that part of
>    the genome (encompassing both the machinery and perhaps some
>    variable-sized flanking regions) to move around, and
> 2) the "transposed" element, i.e. the core machinery plus whatever
>    flanking regions happened to be carried along on the element's most
>    recent trip (the one that brought it to its current location)?

I think we want to represent a transposable element in a given context,
i.e. at a given location, because the insertion may have consequences such
as (in)activating a gene or shifting the reading frame of a gene. A core
transposon should be represented as an entity on its own, as genes are.

> Jonathan
>
> --
> Jonathan Crabtree
> Center for Bioinformatics, University of Pennsylvania
> 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021
> 215-573-3115

Arnaud
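The feature tree for Arnaud's proposal (the INGI transposable element grouping its two RIME repeats and the pseudogene, with no RepeatRegionNAFeature) can be sketched as a small Python model. This is a minimal sketch under stated assumptions, not GUS code: the `NAFeature` and `NALocation` classes only mimic the parent_id, na_sequence_id and is_reversed columns mentioned in the thread, and the helper functions, feature ids and coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NALocation:
    # Hypothetical simplification of an NALocation: inclusive start/end on the
    # sequence named by the feature's na_sequence_id, plus the strand flag.
    start: int
    end: int
    is_reversed: bool = False


@dataclass
class NAFeature:
    # Hypothetical stand-in for a DoTS NA feature row; not GUS code.
    na_feature_id: int
    view: str                 # e.g. "GeneFeature", "TransposableElement"
    name: str                 # gene name or repeat_type, e.g. "RHS", "RIME"
    na_sequence_id: int       # the sequence the location refers to
    parent_id: Optional[int]  # part-whole (or "happens to be inside") link
    location: NALocation


def children(features: List[NAFeature], parent_id: int) -> List[NAFeature]:
    """Features whose parent_id points at the given feature, in list order."""
    return [f for f in features if f.parent_id == parent_id]


def contained_in_parent(features: List[NAFeature]) -> Dict[int, bool]:
    """For every feature that has a parent, report whether its location lies
    within the parent's location on the same sequence. parent_id alone does
    not guarantee this, so it is checked explicitly."""
    by_id = {f.na_feature_id: f for f in features}
    result: Dict[int, bool] = {}
    for f in features:
        if f.parent_id is None:
            continue
        p = by_id[f.parent_id]
        result[f.na_feature_id] = (
            f.na_sequence_id == p.na_sequence_id
            and p.location.start <= f.location.start
            and f.location.end <= p.location.end
        )
    return result


def print_tree(features: List[NAFeature], node: NAFeature, depth: int = 0) -> None:
    """Print the parent_id hierarchy below `node`, one feature per line."""
    loc = node.location
    strand = "-" if loc.is_reversed else "+"
    print(f"{'  ' * depth}{node.view} ({node.name}) {loc.start}..{loc.end} {strand}")
    for child in children(features, node.na_feature_id):
        print_tree(features, child, depth + 1)


GENOMIC_SEQ = 1  # hypothetical na_sequence_id of the DoTS::GenomicSequence

# Arnaud's proposal: no RepeatRegionNAFeature; the TransposableElement itself
# groups the two RIME repeats and the pseudogene. All ids and coordinates are
# arbitrary illustrative values, not real annotation data.
features = [
    NAFeature(10, "GeneFeature", "RHS", GENOMIC_SEQ, None, NALocation(1000, 9000)),
    NAFeature(11, "TransposableElement", "INGI", GENOMIC_SEQ, 10, NALocation(3000, 8300)),
    NAFeature(12, "RepeatFeature", "RIME", GENOMIC_SEQ, 11, NALocation(3000, 3250)),
    NAFeature(13, "GeneFeature", "pseudo", GENOMIC_SEQ, 11, NALocation(3400, 8000)),
    NAFeature(14, "RepeatFeature", "RIME", GENOMIC_SEQ, 11, NALocation(8050, 8300)),
]

if __name__ == "__main__":
    print_tree(features, features[0])
    print(contained_in_parent(features))
```

Keeping the containment check separate from parent_id mirrors Jonathan's remark that a child lying within its parent's NALocation is a usage convention here, not something guaranteed by definition. A reverse-strand promoter from the second example would simply be one more child with `is_reversed=True`.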