From: Christiane Hertz-F. <ch...@sa...> - 2002-10-18 10:36:46
Hi Jessica and Chris,

With regard to some of the specific points you raised:

1. Homologous chromosomes: Yes, we are trying to tackle this one; T. brucei and L. major are also diploid and possibly even trisomic for some chromosomes. Arnaud is already thinking about how to represent the potentially large insertions/deletions between the homologues of a given chromosome. This is in the functional specification for GeneDB, to be represented both at the sequence level and graphically. However, because of the different sequencing approaches [i.e. for T. brucei, Sanger used PFGE-separated homologues (where possible), whereas TIGR doesn't map BACs to the homologues], we thought that storing these kinds of data was quite a way off, and so concentrated our efforts on extending the schema to store other data first. Being unable, initially, to assign sequences to particular homologues will also hold true for T. cruzi, which has an as yet undefined karyotype and is being sequenced using a whole-genome shotgun approach.

2. Polycistronic transcription: As far as I am aware, in T. brucei and L. major trans-splicing and polyadenylation are co-transcriptional. Occasionally, transcripts with, e.g., one spliced leader sequence, two CDSs and a polyA tail are observed when amplified by PCR from cDNA. However, this is apparently an error in processing: such transcripts are unlikely to be functional and are probably degraded. As a consequence, we didn't think it necessary to represent them. Also, I think that at this stage pol II promoters for protein-coding genes are poorly characterised (obviously, that will change) and can't yet be assigned to particular transcription units. It is also clear that adjacent genes within the same transcription unit are regulated independently, differing in both localisation and expression level (e.g. the phosphoglycerate kinase cluster in T. brucei). Is this different in T. cruzi? How, at this stage, can you assign genes to a given transcript?
However, we have been thinking about this in the "bacterial sense". The first bacterium is now in the development version of GeneDB, and as a consequence we would like GUS to be able to cope with operons. Again, Arnaud is thinking about this.

3. Spliced leader: The spliced leader is the same for all transcripts in T. brucei and L. major. As a consequence (after very long discussions), we decided not to attempt to represent it. We also understood that attaching a transcript to two genes would be a problem (which is effectively what you'd want, i.e. the gene of interest plus the sequence encoding the SL). What Arnaud proposed was to annotate the trans-spliced transcript with an additional note/qualifier about the SL. Are there different SL sequences in T. cruzi? Also, the SL sequences are transcribed from long arrays that are difficult to resolve by sequencing, so the SL would have to be annotated to the array rather than to individual genes.

4. Mitochondrial DNA: We also thought about this. I am not sure to what extent the minicircles have been (and will be) sequenced; there are thousands of them. For maxicircle-encoded genes, Arnaud is proposing to use a single RNAFeature object for both the edited and unedited transcripts, with the distinction between the two made using the Sequence Ontology. The editing process would be annotated using a SeqVariation object. As far as gRNA positions and sequences are concerned, we were thinking of linking to comprehensive databases such as http://www.rna.ucla.edu/trypanosome/database.html or http://www.ebi.ac.uk/parasites/kDNA/Source.html. However, it would be great if it were possible to store all this information in GUS.

Cheers,
Christiane and Arnaud

--
Dr Christiane Hertz-Fowler
GeneDB Curator (T. brucei)
Pathogen Sequencing Unit
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Cambridge CB10 1SA
Tel: 01223 494955

-----Original Message-----
From: gus...@li...
[mailto:gus...@li...] On Behalf Of Chris Stoeckert
Sent: 16 October 2002 23:53
To: gusdev-gusdev
Cc: jki...@ar...
Subject: Fwd: [Gusdev-gusdev] DNA, RNA and Protein GUS Features + PeptidePropertyType Table

Hi Folks,

Jessie Kissinger has set up gusdev at the University of Georgia, and I hope that she will be joining these discussions soon. As you can see from her mail below, there are issues she needs to address that we've been trying to avoid. Sigh. It may be time to address them.

Cheers,
Chris

Begin forwarded message:

> We are still setting up, so, needless to say, we have not yet walked
> through the schema and the features of every table in detail.
> We have made a list of a few concepts that we presume will need to be
> added to the schema to accomplish some of our goals, and many of these
> will also be needed by Sanger, since they are particular to
> Kinetoplastid organisms and/or the sequencing strategy.
>
> Some issues that are on my list are the following:
>
> 1 - The concept of a homologous chromosome. T. cruzi is being
> sequenced as a diploid.
>
> 2 - The concept of multiple genes per transcript: kinetoplastid
> organisms are eukaryotic but use polycistronic transcription. This
> feature is commonly ignored, but now that we have expression studies,
> we need to be able to study the expression levels of genes on the same
> transcript to get testable ideas about post-transcriptional mechanisms
> of control.
>
> 3 - The concept of a 5' spliced leader sequence (the idea that it
> exists, and keeping track of which leader it was; there are multiple
> leaders). Currently, nobody keeps track of this; they just remove it
> and analyze the rest.
>
> 4 - Kinetoplastid mitochondria are quite weird: they consist of
> minicircle and maxicircle plasmid DNAs and heavily utilize RNA editing.
> Thus, in addition to keeping track of the minicircle and maxicircle
> DNAs, we need the concept of a guide RNA and of an 'edited' site in a
> message that is edited.
> Ideally, one would like to record the nature of the edit, i.e.
> what change is made and what nucleotides are added to the sequence.
> Transcripts can only encode ORFs after they have been edited.
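Both messages touch on recording the nature of each edit in a maxicircle transcript, and the reply proposes annotating the editing process with a SeqVariation object. As a minimal sketch only, assuming each edit can be described as an offset in the unedited transcript plus the nucleotides inserted or deleted there (kinetoplastid editing mainly inserts and deletes uridines), the Python below stores edits as records and replays them to regenerate the edited sequence. `EditSite` and `apply_edits` are hypothetical, illustrative names; they are not part of GUS or GeneDB.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record for one editing site, loosely modelled on the idea of
# annotating edits as sequence variations; this is NOT an actual GUS class.
@dataclass(frozen=True)
class EditSite:
    position: int       # 0-based offset into the UNEDITED transcript
    inserted: str = ""  # nucleotides added at this offset, e.g. "UU"
    deleted: int = 0    # number of nucleotides removed starting at this offset


def apply_edits(unedited: str, edits: List[EditSite]) -> str:
    """Return the edited sequence by replaying non-overlapping edits."""
    ordered = sorted(edits, key=lambda e: e.position)

    # Reject edits that fall outside the unedited sequence.
    for site in ordered:
        if (site.position < 0 or site.deleted < 0
                or site.position + site.deleted > len(unedited)):
            raise ValueError(f"edit out of range: {site}")

    # Reject overlapping edits, or two edits at the same offset, since
    # their combined effect would be ambiguous.
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.position == prev.position or nxt.position < prev.position + prev.deleted:
            raise ValueError(f"overlapping edits: {prev} and {nxt}")

    # Apply from the 3' end towards the 5' end so that the offsets of the
    # edits still to be applied (all further upstream) remain valid.
    seq = unedited
    for site in reversed(ordered):
        end = site.position + site.deleted
        seq = seq[:site.position] + site.inserted + seq[end:]
    return seq


if __name__ == "__main__":
    # Insert "UU" before offset 3 and delete the U at offset 6.
    edited = apply_edits("GAACGAUUUG", [EditSite(3, "UU"), EditSite(6, deleted=1)])
    print(edited)  # GAAUUCGAUUG
```

With this design only the unedited sequence and its list of edits need to be stored, and the edited form is derived from them, which roughly matches the idea of keeping one RNAFeature and annotating the differences separately. A link to the guiding gRNA could be added as another field on each record, but that is left out here.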