RE: [Gusdev-gusdev] DNA, RNA and Protein GUS Features + PeptidePropertyType Table

Hi Jessica and Chris,

With regard to some of the specific points you raised:

1. Homologous chromosomes:
Yes, we are trying to tackle this one; T. brucei and L. major are also
diploid and even possibly trisomic for some of the chromosomes. Arnaud is
already thinking about how to represent the potentially large
insertion/deletions between homologues of a given chromosome. It is in the
functional specifications for GeneDB, both to be represented at the sequence
level as well as graphically. However, because of the varied sequencing
approaches [i.e. for T. brucei, Sanger used PFGE separated homologues (where
possible) whereas TIGR doesn't map BACs to the homologues], we thought that
storing these kinds of data was quite a way off and thus concentrated
efforts on extending the schema to store other data first. The initial
inability to assign sequences to particular homologues will also hold true
for T. cruzi (with an as yet undefined karyotype), which is being sequenced
using a whole genome shotgun approach.

2. Polycistronic transcription:
As far as I am aware, in T. brucei and L. major trans-splicing and
polyadenylation are co-transcriptional. Occasionally, transcripts with one
spliced leader sequence, two CDSs (for example) and a polyA tail are observed
when amplified by PCR from cDNA. However, this is apparently an error in
processing, as these transcripts are unlikely to be functional and thus are
probably degraded. As a consequence, we didn't think it necessary to
represent these. Also, I think that at this stage pol II promoters for
protein-coding genes are poorly characterised (obviously, that will change)
and can't as yet be assigned to particular transcription units. It is also
clear that adjacent genes within the same transcription unit are regulated
independently, both in localisation and in expression levels
(e.g. the phosphoglycerate kinase cluster in T. brucei). Is this different
in T. cruzi? How can you at this stage assign genes to a given transcript?
However, we have been thinking of this in the "bacterial sense". The first
bacterium is now in the development version of GeneDB and as a consequence,
we would like GUS to be able to cope with operons. Again, Arnaud is thinking
about this.

3. Spliced leader:
The spliced leader is the same for all transcripts in T. brucei and L.
major. As a consequence (after very long discussions) we decided not to
attempt to represent this. Also, we understood that attaching a transcript
to two genes would be a problem (which is effectively what you'd want, i.e.
the gene of interest + the sequence encoding the SL). What Arnaud proposed
was to annotate the trans-spliced transcript with an additional note/qualifier about
the SL. Are there different SL sequences in T. cruzi? Also, the SL sequences
are transcribed from long arrays which are difficult to resolve in
sequencing. So, it would have to be annotated to the array rather than
individual genes.

4. Mitochondrial DNA:
We also thought about this. I am not sure to what extent the minicircles
have been (and will be) sequenced; there are 1000s of them. For maxicircle
encoded genes, Arnaud is proposing to use a single RNAFeature object for
both edited/unedited transcripts, and the distinction between the two
transcripts would be made using Sequence Ontology. The editing process would
be annotated by using a SeqVariation object.
As far as gRNA positions and sequences are concerned, we were thinking of
linking to comprehensive databases such as
http://www.rna.ucla.edu/trypanosome/database.html or
http://www.ebi.ac.uk/parasites/kDNA/Source.html. However, it would be great
if it were possible to store all this info in GUS.
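
To make the maxicircle proposal a little more concrete, here is a minimal
sketch in Python. It is only an illustration under our own assumptions: the
class layouts, field names and the "so_term" labels are hypothetical and are
not the actual GUS tables or verified Sequence Ontology terms. It shows one
RNAFeature holding the unedited sequence, with each editing event recorded
as a SeqVariation from which the edited form can be derived.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SeqVariation:
    """Hypothetical record of one editing event on a transcript."""
    position: int        # 0-based offset in the unedited sequence
    inserted: str = ""   # nucleotides added at this position
    deleted: int = 0     # number of nucleotides removed from this position


@dataclass
class RNAFeature:
    """Hypothetical single feature covering both transcript forms."""
    name: str
    unedited_seq: str
    edits: List[SeqVariation] = field(default_factory=list)

    def so_term(self) -> str:
        # Placeholder labels only; real Sequence Ontology terms would go here.
        return "edited transcript" if self.edits else "unedited transcript"

    def edited_seq(self) -> str:
        seq = self.unedited_seq
        # Apply edits from the 3' end backwards so earlier offsets stay valid.
        for e in sorted(self.edits, key=lambda e: e.position, reverse=True):
            seq = seq[:e.position] + e.inserted + seq[e.position + e.deleted:]
        return seq


# Made-up example: insert "UU" at offset 2 and delete one base at offset 5.
feature = RNAFeature(
    name="example_maxicircle_transcript",
    unedited_seq="AGGCAUG",
    edits=[SeqVariation(2, inserted="UU"), SeqVariation(5, deleted=1)],
)
print(feature.so_term(), feature.edited_seq())
```

Keeping the unedited sequence as the stored form and the edits as separate
records means the nature of each edit stays queryable, which is what the
point about recording "what change is made" seems to ask for.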

Cheers,

Christiane and Arnaud

--
Dr Christiane Hertz-Fowler
GeneDB Curator (T. brucei)
Pathogen Sequencing Unit
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Cambridge  CB10 1SA
Tel: 01223 494955

-----Original Message-----
From: gus...@li...
[mailto:gus...@li...]On Behalf Of Chris
Stoeckert
Sent: 16 October 2002 23:53
To: gusdev-gusdev
Cc: jki...@ar...
Subject: Fwd: [Gusdev-gusdev] DNA, RNA and Protein GUS Features +
PeptidePropertyType Table

Hi Folks,
Jessie Kissinger has set up gusdev at the University of Georgia and I
hope that she will be joining these discussions soon. As you can see
from her mail below, there are issues she needs to address that we've
been trying to avoid. Sigh. It may be time to address them.

Cheers,
Chris

Begin forwarded message:

> 	We are still setting up so, needless to say, we have not made a
> detailed walk through the schema and the features of every table yet.
> We have made a list of a few concepts that we presume will need to be
> added to the schema to accomplish some of our goals and many of these
> will also be needed by Sanger since they are particular to
> Kinetoplastid organisms and/or the sequencing strategy.
>
> 	Some issues that are on my list are the following:
>
> 1 - The concept of a homologous chromosome. T. cruzi is being
> sequenced as a diploid.
>
> 2 - The concept of multiple genes per transcript. Kinetoplastid
> organisms are eukaryotic but use polycistronic transcription. This
> feature is commonly ignored, but now that we have expression studies,
> we need to be able to study expression levels of genes on the same
> transcript to get testable ideas about post-transcriptional mechanisms
> of control.
>
> 3 - The concept of a 5' spliced leader sequence (the idea that it
> exists, and keeping track of which leader it was, since there are
> multiple leaders).  Currently, nobody keeps track of this; they just
> remove it and analyze the rest.
>
> 4 - Kinetoplastid mitochondria are quite weird: they consist of mini and
> maxi circle plasmid DNAs and heavily utilize RNA editing. Thus, in
> addition to keeping track of mini and maxi circle DNAs, we need
> the concept of a guide RNA and an 'edited' site in a message that is
> edited.  Ideally one would like to record the nature of the edit, i.e.
> what change is made, what nucleotides are added to the sequence.
> Transcripts can only encode ORFs after they have been edited.