From: Arnaud K. <ax...@sa...> - 2002-11-13 11:38:09
Hi Chris,

I didn't include anything about genetic interactions, even though we will want to store them in the future. I've reviewed the Interaction table and I have some thoughts about it.

Genetic interactions may involve more than one effector or target. If we want to make the Interaction table generic, it needs to store more than one effector and more than one target. I don't have any use cases yet, but I can ask around for some if needed.

An extra controlled vocabulary is also needed. It would be used to classify the behaviour of the effector in a given interaction, e.g. Allele1 "inhibits" the expression of Allele2.

Regarding physical interactions, there are two cases in which it will be useful to annotate them:

* Transient interactions associated with a function, e.g. a protein regulating transcription will interact with DNA.
* Structural interactions involved in the formation of a complex. In that case, we can associate component interactions with the complex they are involved in. Some of these interactions are experimentally characterized; others are hypothetical.

Currently in GUS, a complex is a set of components. Would it be possible to associate a complex with a set of interactions as well?

The other point I didn't mention in my previous email was the review of the Phenotype table. Would it be possible to associate phenotypic data with GO terms?

cheers
Arnaud
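A minimal sketch of the generic Interaction table Arnaud describes, written in Python against an in-memory SQLite database. All table and column names here (InteractionBehaviour, InteractionParticipant, ComplexInteraction, is_hypothetical) are hypothetical and not part of the GUS 3.0 schema. The point is only that moving effectors and targets into a participant table lets one interaction have any number of each, and that a separate link table can tie a complex to its component interactions.

```python
import sqlite3

# Hypothetical schema: none of these tables exist in GUS 3.0 as written.
SCHEMA = """
CREATE TABLE InteractionBehaviour (   -- controlled vocabulary, e.g. 'inhibits'
    behaviour_id INTEGER PRIMARY KEY,
    term         TEXT NOT NULL UNIQUE
);
CREATE TABLE Interaction (
    interaction_id   INTEGER PRIMARY KEY,
    interaction_type TEXT NOT NULL CHECK (interaction_type IN ('genetic', 'physical')),
    behaviour_id     INTEGER REFERENCES InteractionBehaviour (behaviour_id),
    is_hypothetical  INTEGER NOT NULL DEFAULT 0  -- 0 = experimentally characterized
);
-- One row per participant, so an interaction can have several effectors and targets.
CREATE TABLE InteractionParticipant (
    interaction_id INTEGER NOT NULL REFERENCES Interaction (interaction_id),
    participant    TEXT NOT NULL,
    role           TEXT NOT NULL CHECK (role IN ('effector', 'target')),
    PRIMARY KEY (interaction_id, participant, role)
);
-- Associates a complex with the interactions between its components.
CREATE TABLE ComplexInteraction (
    complex_id     INTEGER NOT NULL,
    interaction_id INTEGER NOT NULL REFERENCES Interaction (interaction_id),
    PRIMARY KEY (complex_id, interaction_id)
);
"""


def build_demo():
    """Create the schema and one genetic interaction with two effectors."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO InteractionBehaviour VALUES (1, 'inhibits')")
    conn.execute("INSERT INTO Interaction VALUES (1, 'genetic', 1, 0)")
    conn.executemany(
        "INSERT INTO InteractionParticipant VALUES (?, ?, ?)",
        [(1, "allele1", "effector"), (1, "allele3", "effector"), (1, "allele2", "target")],
    )
    return conn


def describe(conn, interaction_id):
    """Return (effectors, behaviour term, targets) for one interaction."""
    rows = conn.execute(
        "SELECT participant, role FROM InteractionParticipant "
        "WHERE interaction_id = ? ORDER BY participant",
        (interaction_id,),
    ).fetchall()
    (term,) = conn.execute(
        "SELECT b.term FROM Interaction AS i "
        "JOIN InteractionBehaviour AS b ON b.behaviour_id = i.behaviour_id "
        "WHERE i.interaction_id = ?",
        (interaction_id,),
    ).fetchone()
    effectors = [p for p, r in rows if r == "effector"]
    targets = [p for p, r in rows if r == "target"]
    return effectors, term, targets


if __name__ == "__main__":
    print(describe(build_demo(), 1))  # (['allele1', 'allele3'], 'inhibits', ['allele2'])
```

The allele names are placeholders. The CHECK on role keeps the effector/target distinction controlled, while the behaviour term lives in its own vocabulary table as Arnaud suggests.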
From: Chris S. <sto...@pc...> - 2002-11-12 15:33:17
Hi Arnaud,

These look fine to me and represent what we've discussed. Thanks for putting this together!

Cheers,
Chris
From: Arnaud K. <ax...@sa...> - 2002-11-12 15:29:58
Hi Jonathan,

Sorry for the delay in getting back to you with some thoughts on attribution data. Here is a case of what could happen on a given project:

* The sequences would come from TIGR,
* The gene models would come from SBRI,
* The manual annotation of the gene models and the GO curation would be done by TIGR,
* The curation would be done by the Sanger,
* Some curated comments would be sent by members of the community.

Instead of using the Evidence table, would it be possible to attribute data by using the user_id attribute? E.g. if the gene models come from SBRI, the user_id would mark the gene features as owned by SBRI. Any update would keep the ownership and would record who made the update.

The other point was the attribution of data coming from publications or personal communications. I had a look at FlyBase. FlyBase treats personal communications as references. To differentiate them, they have an extra attribute in the reference table that classifies the different kinds of reference. For more information about the reference class controlled vocabulary, see http://flybase.bio.indiana.edu/.data/docs/refman/refman-B.html#B.13.2.

cheers
Arnaud

------------------------------
Item 3: Attribution of data from multiple sources.

Three methods are available in GUS 3.0 to attach information to tables:

* Evidence, which allows attributions to be linked to any row.
* NAComment, which allows multiple comments to be attached to a sequence.
* Comment, which is attached to a review_status_id; each NAFeature has a review_status_id.

Use cases are needed to determine whether any of these mechanisms is appropriate. See the addendum from Jonathan Crabtree below.

Addendum to item 3 from Jonathan:

I spent a little time looking into this, and the number of methods differs depending on how you count them (and also because in most situations the number of alternatives depends on which table you're commenting on).
But here are the ways we currently support in GUS 3.0 for adding comments to things (external to the tables themselves):

1. DoTS.Comments (not "Comment") + DoTS.Evidence
   I list these together because the Comments table relies on the Evidence table to link its rows to other objects in the db. This method can be used with any table and supports CLOB comments.
2. DoTS.AAComment + DoTS.CommentName
   Can be used only with AASequence entries and supports VARCHAR2(4000).
3. DoTS.NAComment
   Can be used only with NASequence entries and supports VARCHAR2(4000). (Does *not* have a link to DoTS.CommentName.)
4. DoTS.Note
   Can be used only with NAFeature entries and supports VARCHAR2(4000). (Note that this is different from gusdev.Note, which has a VARCHAR(255) AND a CLOB column.)

Note that DoTS.Comments is the only generic option (that I found) for associating notes/comments with rows. Note also that AASequence, NASequence, and NAFeature all have their own specialized comment tables, but AAFeature doesn't appear to (at least not one with "comment" in its name!)

Conceptually speaking, I'm also not sure that I agree with the use of the Evidence table to link comments to rows in general. For example, during the conference call I gave the example of a note in PlasmoDB that basically says "the second exon of this predicted gene is incorrect"; this would actually be evidence *against* the GeneFeature, not *for* it (the typical use of the Evidence table). Likewise, one could merely be commenting on an aspect of a predicted feature, without providing any further evidence for its existence or correctness, i.e. an implicit statement of the form "if this thing exists, then it's interesting that such and such would be true...".

Another thing to point out is that none of these tables (as far as I can remember) has a pointer to SRES.Contact, so they don't really address the question of attribution.
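Jonathan's objection is that Evidence implies support, whereas a comment may contradict a feature or merely annotate it. The sketch below, again Python with SQLite and entirely hypothetical names (RowComment, stance, body), shows a generic comment table that points at any row by (table_name, row_id), records the comment's stance explicitly, and carries a link to a contact so the comment is also attributed. It is not the actual DoTS.Comments or DoTS.Evidence definition.

```python
import sqlite3

# Hypothetical schema, not the DoTS.Comments / DoTS.Evidence definitions.
SCHEMA = """
CREATE TABLE Contact (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE RowComment (
    comment_id INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,     -- which table the commented row lives in
    row_id     INTEGER NOT NULL,  -- primary key of that row
    stance     TEXT NOT NULL CHECK (stance IN ('supports', 'contradicts', 'neutral')),
    contact_id INTEGER REFERENCES Contact (contact_id),  -- attribution
    body       TEXT NOT NULL
);
"""


def build_demo():
    """One comment that argues against a (made-up) GeneFeature row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Contact VALUES (1, 'PlasmoDB curator')")
    conn.execute(
        "INSERT INTO RowComment VALUES (?, ?, ?, ?, ?, ?)",
        (1, "GeneFeature", 42, "contradicts", 1,
         "the second exon of this predicted gene is incorrect"),
    )
    return conn


def comments_for(conn, table_name, row_id):
    """List (stance, contact name, text) for every comment on one row."""
    return conn.execute(
        "SELECT c.stance, k.name, c.body FROM RowComment AS c "
        "LEFT JOIN Contact AS k ON k.contact_id = c.contact_id "
        "WHERE c.table_name = ? AND c.row_id = ? ORDER BY c.comment_id",
        (table_name, row_id),
    ).fetchall()


if __name__ == "__main__":
    print(comments_for(build_demo(), "GeneFeature", 42))
```

The row id 42 and the contact name are illustrative only. A (table_name, row_id) pair cannot be protected by a foreign key, which is the usual trade-off of this kind of generic link.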
In PlasmoDB right now we handle attribution mainly through creative use of the ExternalDatabase table (external_db_id in the current GUSdev). In GUS 3.0 I believe that external database releases will be linked to Contacts, so perhaps the thing to do is to allow a single entry in the database to be associated with multiple external databases? This gets slightly messy if you want to be able to attribute something to a personal communication with somebody, or to a journal article (neither of which is expressed particularly well as an "external database"), although both might perhaps be nicely represented as References.

There are enough possibilities that maybe we should just find out exactly what the PSU folks have in mind, and tailor a solution that works for them (using the existing schema as much as possible).
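One way to read the "multiple external databases per entry" suggestion together with the FlyBase approach is a single attribution link table whose sources are classified by a small controlled vocabulary. The sketch assumes hypothetical Source and Attribution tables (not GUS tables), with a plain contact column standing in for a link to SRes.Contact; the sample rows loosely mirror Arnaud's TIGR/SBRI/community scenario and are illustrative only.

```python
import sqlite3

# Hypothetical tables; 'contact' stands in for a link to SRes.Contact.
SCHEMA = """
CREATE TABLE Source (
    source_id    INTEGER PRIMARY KEY,
    source_class TEXT NOT NULL CHECK (source_class IN
        ('external_database', 'publication', 'personal_communication')),
    description  TEXT NOT NULL,
    contact      TEXT
);
-- Many-to-many: one row can be attributed to several sources.
CREATE TABLE Attribution (
    table_name TEXT NOT NULL,
    row_id     INTEGER NOT NULL,
    source_id  INTEGER NOT NULL REFERENCES Source (source_id),
    role       TEXT NOT NULL,  -- what the source contributed
    PRIMARY KEY (table_name, row_id, source_id, role)
);
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Source VALUES (?, ?, ?, ?)", [
        (1, "external_database", "gene model release", "SBRI"),
        (2, "external_database", "GO annotation release", "TIGR"),
        (3, "personal_communication", "curated comment", "community member"),
    ])
    conn.executemany("INSERT INTO Attribution VALUES (?, ?, ?, ?)", [
        ("GeneFeature", 7, 1, "gene model"),
        ("GeneFeature", 7, 2, "GO curation"),
        ("GeneFeature", 7, 3, "comment"),
    ])
    return conn


def attributions(conn, table_name, row_id):
    """Return (role, source_class, contact) for each source credited on a row."""
    return conn.execute(
        "SELECT a.role, s.source_class, s.contact FROM Attribution AS a "
        "JOIN Source AS s ON s.source_id = a.source_id "
        "WHERE a.table_name = ? AND a.row_id = ? ORDER BY a.source_id",
        (table_name, row_id),
    ).fetchall()


if __name__ == "__main__":
    for row in attributions(build_demo(), "GeneFeature", 7):
        print(row)
```

Keeping publications and personal communications as classes of the same Source table, rather than as fake external databases, is the design choice being illustrated.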
From: Arnaud K. <ax...@sa...> - 2002-11-11 11:53:35
Hi everyone,

I've attached SQL statements and documentation for draft tables and views for holding genetic data. You'll find statements for:

Allele tables:
* Allele table,
* AlleleInstance table,
* AlleleFeature view,
* A Complementation table, which stores complementation data. It works as a reference table to another allele, which can be an internal or an external reference. There are three types of complementation reference: "complements", "complemented_by" and "fails_to_complement".

Controlled vocabulary tables:
* PhenotypeClass - controlled vocabulary to classify the allele effects (recessive/dominant, lethal),
* Mutagen - controlled vocabulary of chemical products used to generate mutants,

and
* AllelePhenotypeClass,
* AlleleMutagen,
* AllelePhenotype,
* AlleleComplementation,
* RNAi table to store RNAi construct data,
* RNAiPhenotype table.

Please let me know if you have any questions or comments.

cheers
Arnaud
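To make the Complementation description concrete, here is a minimal SQLite sketch of how the three reference types and the internal-or-external reference might be constrained. The column names (relation, ref_allele_id, external_ref) are assumptions, not the definitions in the attached SQL, and the inverse mapping assumes that "complements" and "complemented_by" are mirror images while "fails_to_complement" is symmetric.

```python
import sqlite3

# Hypothetical columns; the attached SQL may define these differently.
SCHEMA = """
CREATE TABLE Allele (allele_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE AlleleComplementation (
    allele_id     INTEGER NOT NULL REFERENCES Allele (allele_id),
    relation      TEXT NOT NULL CHECK (relation IN
        ('complements', 'complemented_by', 'fails_to_complement')),
    ref_allele_id INTEGER REFERENCES Allele (allele_id),  -- internal reference
    external_ref  TEXT,                                    -- external reference
    -- exactly one kind of reference must be given
    CHECK ((ref_allele_id IS NULL) <> (external_ref IS NULL))
);
"""

# Relation as seen from the referenced allele's side (assumed semantics).
INVERSE = {
    "complements": "complemented_by",
    "complemented_by": "complements",
    "fails_to_complement": "fails_to_complement",
}


def inverse_relation(relation):
    return INVERSE[relation]


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Allele VALUES (?, ?)", [(1, "a1"), (2, "a2")])
    conn.execute("INSERT INTO AlleleComplementation VALUES (1, 'complements', 2, NULL)")
    conn.execute(
        "INSERT INTO AlleleComplementation VALUES (2, 'fails_to_complement', NULL, 'external-allele-1')")
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print(conn.execute("SELECT * FROM AlleleComplementation ORDER BY rowid").fetchall())
```

The table-level CHECK rejects rows that give both an internal and an external reference, or neither; the allele names and the external identifier are placeholders.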
From: Adrian R. T. <ar...@sa...> - 2002-10-28 00:52:14
Hi,

Following on from our discussion on Thursday, this is to announce where to find our latest code/site. http://www2.genedb.org/gusapp/servlet?page=boolq is a good place to start (please don't advertise!). It should be running again later today, but we currently have system problems and I can only log in intermittently.

Future plans
============

At the moment it's not tied in properly via the navbar, and there are a couple of extra queries/bug fixes we want to get working. Once they're there and it's been tested a bit, we'll switch it onto www.genedb.org. Our hope is that this can be virtually frozen from a development point of view, with weekly data updates. Then we can start focussing our efforts on GUS3.

Ideally we'd like to keep the hybrid pointing at our copy of GUSdev, simply to save having to migrate it to GUS3. Our development targets are then loading all our data into GUS3 and continuing to work on the new web front-end/schema. Once the web front-end is sufficiently developed we can switch www over to the new code/schema. The only downside is that it won't look like anything is happening to GeneDB over this period, as all the changes are happening behind the scenes.

Changes to the GUS code - why
=============================

The curators felt that the query options should be more organism-specific, e.g. if you're a tryp person and choose "GO component" you should only see the GO terms curated to features in T. brucei.

Changes to the GUS code - technical
===================================

The changes to the CBIL code are pretty minimal. The real change is moving some of the object creation away from the configFile into Factories. The configFile is still used to control the Factories. Deferring the object creation to runtime allows us to modify the objects based on organism.

The code is stored at:
http://cvs.sanger.ac.uk/cgi-bin/cvsweb.cgi/genedb/hybrid/?cvsroot=Pathogen

Please let us know if you have trouble accessing it.
I was originally planning to store any modified code in the org.gene.hybrid package, but a couple of the classes were hard-coded or required package-private access. In the former case I've modified the classes in situ; in the latter I've added new versions in the cbil hierarchy called GeneDB...

GeneDBQueryPage: based heavily on BooleanQueryPage. It first checks whether an organism has been selected and, if not, sends you away (currently to a static list) to choose one. The other change is, just before finally evaluating the query tree, to add a new root node which is "select all features for given organism" and "AND" that in. (It'd possibly be more efficient to restrict each query to be organism-specific, but doing it like this means the organism shows up on the result page and the history page.)

SqlQuerySetFactory: rather than passing a SqlQuerySet to the query page, you pass a factory. This factory is configured using the configFile as normal. You pass it a QuerySet which is a superset of all the queries available across all organisms. You can also pass in Options, which is a list of which queries are available for which organism. If an organism isn't specified here, it's assumed to want the entire range of queries. Format:

    queryFactory.Options=org1,queryName1:qn2:qn3,\
        org2,qn3,qn4,\
        etc

SqlQueryParamFactory: the SqlQuerySetFactory also has a reference to a SqlQueryParamFactory. It uses this to replace the parameter names in the query with real Params. For most types this is a trivial substitution; the only case I've overridden is SqlEnumParam. By convention this now has a substitution for organism name in the SQL query.
The factory makes the substitution and returns an organism-specific SqlEnumParam (which is slotted into the SqlQuery, which is put into an organism-specific SqlQuerySet, which is passed back to the QueryPage, which treats it as usual).

There were a few other minor changes (e.g. to SqlQuery, to allow it to take either Params or names of Params).

If you have any questions please don't hesitate to ask.

Adrian
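The organism-specific query handling can be sketched in a few lines of Python. These functions are hypothetical analogues of the Java classes above, not the code in CVS: parse_options assumes the property continuation lines have already been joined and that each entry is an organism name followed by a colon-separated list of query names (as in the org1 entry), queries_for falls back to the full query set for unlisted organisms, and restrict_to_organism mimics GeneDBQueryPage's extra AND root node using a made-up tuple representation of the query tree.

```python
def parse_options(value):
    """Parse an Options value such as 'org1,q1:q2:q3,org2,q4:q5'.

    Assumes entries alternate: organism name, then a colon-separated
    list of query names. Empty fields (e.g. from a trailing comma) are ignored.
    """
    fields = [f.strip() for f in value.split(",") if f.strip()]
    if len(fields) % 2:
        raise ValueError("expected organism,query-list pairs")
    return {org: queries.split(":") for org, queries in zip(fields[::2], fields[1::2])}


def queries_for(organism, all_queries, options):
    """Queries offered for one organism; unlisted organisms get the full set."""
    allowed = options.get(organism)
    if allowed is None:
        return list(all_queries)
    return [q for q in all_queries if q in allowed]


def restrict_to_organism(query_tree, organism):
    """Add a new root that ANDs the user's query with 'all features for organism'.

    Query trees are modelled as nested tuples here purely for illustration.
    """
    return ("AND", ("ALL_FEATURES_FOR_ORGANISM", organism), query_tree)


if __name__ == "__main__":
    opts = parse_options("org1,queryName1:qn2:qn3,org2,qn3:qn4,")
    print(queries_for("org2", ["queryName1", "qn2", "qn3", "qn4"], opts))  # ['qn3', 'qn4']
```

Note that under this pair-based reading the literal "org2,qn3,qn4" entry would be rejected as an odd number of fields, which is why colon separation is assumed for every entry.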
From: Arnaud K. <ax...@sa...> - 2002-10-21 13:50:28
Chris,

Please find attached the documentation file. Let me know if the syntax is not correct.

This doc file gives information about the proposal I sent. It also includes some comments following the feedback you sent. The proposal now needs to incorporate your feedback, in particular regarding the controlled vocabulary tables.

cheers
Arnaud
From: Arnaud K. <ax...@sa...> - 2002-10-18 13:33:13
Hi Chris

Chris Stoeckert wrote:
> Hi Arnaud,
> I finally went through your list. These will certainly enrich GUS!
> Some questions/issues though.
> First a general request for documentation of the tables and attributes
> to explain what they are to be used for. We have a plug-in that takes
> a file in the format:
>
> TableName\t\tdescription
> TableName\tAttributeName\tDescription

Sorry for the lack of documentation; I'm going to prepare a doc file.

> In particular, I am curious as to what InflectionPointFeature and
> ReplicationFeature are.

In Leishmania, but more generally for any organism that has polycistronic transcription, the inflection point represents the start of transcription. Some studies are trying to find out whether or not it corresponds to a conserved sequence. If so, it might be interesting for curators to annotate them.

ReplicationFeature represents origins of replication. The name ReplicationFeature sounds more generic, but these features will be given a more specific SO term.

> For the NAFeature views you propose, are you using "source_id" to
> point to SRes:SequenceOntology? If so, why not call the attribute "so_id"?
> Similarly, for GenomeSequence as a view of NASequence, is this what
> "source_id" is for?

The source_id is not related to Sequence Ontology. The main point of my proposal is to replace the controlled vocabularies specifying the type of a feature with SO. But to do so, I think we need a many-to-many relationship between feature views and SO. Could it be done by using the DoTS::GOTermAssociation table (http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30&table=DoTS::GOTermAssociation&path=DoTS::GOTermAssociation), or by cloning it? As I realise it's an important point for the GUS design, please let me know if you agree or if you want to propose something else.

I added a source_id to the sequence and NAFeature views because I can see that all feature objects have this attribute. What is this attribute for in GUS?
> The AAFeature views have "name" attributes and I wonder whether we
> should have a table in SRes for controlled vocabulary terms for
> protein features that we can point to (as with sequence ontology).
> This would avoid the uncontrolled use of "name." I notice that
> PeptideProperty has been given a controlled vocabulary table
> PeptidePropertyType in this regard. Rather than have a table for each,
> we could centralize them. Any choices for the resource to use for
> these names? SWISS-PROT?

SO doesn't cover protein features yet, but should eventually. In the meantime, it makes sense to have a controlled vocabulary, although I'm not aware of an existing one. Shall I replace the PeptidePropertyType table with a more generic one, AAFeatureName?
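A minimal sketch of the many-to-many link between feature views and SO that Arnaud asks about, modelled loosely on the idea behind DoTS::GOTermAssociation but not copying its columns. NAFeature here is a simplified stand-in for NAFeatureImp, NAFeatureSOTerm is a hypothetical table name, and the integer so_id values are placeholders rather than real SO accessions.

```python
import sqlite3

# Hypothetical tables; so_id values are placeholders, not real SO accessions.
SCHEMA = """
CREATE TABLE SequenceOntology (so_id INTEGER PRIMARY KEY, term TEXT NOT NULL UNIQUE);
CREATE TABLE NAFeature (na_feature_id INTEGER PRIMARY KEY, view_name TEXT NOT NULL);
-- Many-to-many link between features and SO terms.
CREATE TABLE NAFeatureSOTerm (
    na_feature_id INTEGER NOT NULL REFERENCES NAFeature (na_feature_id),
    so_id         INTEGER NOT NULL REFERENCES SequenceOntology (so_id),
    PRIMARY KEY (na_feature_id, so_id)
);
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO SequenceOntology VALUES (?, ?)",
                     [(1, "centromere"), (2, "telomere"), (3, "origin_of_replication")])
    conn.executemany("INSERT INTO NAFeature VALUES (?, ?)",
                     [(10, "ChromosomeElement"), (11, "ChromosomeElement"),
                      (12, "ReplicationFeature")])
    conn.executemany("INSERT INTO NAFeatureSOTerm VALUES (?, ?)",
                     [(10, 1), (11, 2), (12, 3)])
    return conn


def features_with_term(conn, term):
    """IDs of features typed with the given SO term, whatever view they belong to."""
    return [row[0] for row in conn.execute(
        "SELECT f.na_feature_id FROM NAFeature AS f "
        "JOIN NAFeatureSOTerm AS a ON a.na_feature_id = f.na_feature_id "
        "JOIN SequenceOntology AS s ON s.so_id = a.so_id "
        "WHERE s.term = ? ORDER BY f.na_feature_id",
        (term,),
    )]


if __name__ == "__main__":
    print(features_with_term(build_demo(), "telomere"))  # [11]
```

Because the view (ChromosomeElement) and the SO type (centromere, telomere) are stored separately, one generic view can carry many SO-refined types without a per-view controlled vocabulary table.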
From: Christiane Hertz-F. <ch...@sa...> - 2002-10-18 10:36:46
Hi Jessica and Chris,

With regard to some of the specific points you raised:

1. Homologous chromosomes

Yes, we are trying to tackle this one; T. brucei and L. major are also diploid and possibly even trisomic for some of the chromosomes. Arnaud is already thinking about how to represent the potentially large insertions/deletions between homologues of a given chromosome. It is in the functional specifications for GeneDB, to be represented both at the sequence level and graphically. However, because of the varied sequencing approaches [i.e. for T. brucei, Sanger used PFGE-separated homologues (where possible) whereas TIGR doesn't map BACs to the homologues], we thought that storing these kinds of data was quite a way off and thus concentrated our efforts on extending the schema to store other data first. Being unable, initially, to assign sequences to particular homologues will also hold true for T. cruzi (with an as yet undefined karyotype), which is being sequenced using a whole-genome shotgun approach.

2. Polycistronic transcription

As far as I am aware, for T. brucei and L. major, trans-splicing and polyadenylation are co-transcriptional. Occasionally, transcripts with one spliced leader sequence, two CDSs (for example) and a polyA tail are observed when amplified by PCR from cDNA. However, this is apparently an error in processing, as these transcripts are unlikely to be functional and are probably degraded. As a consequence, we didn't think it necessary to represent them. Also, I think that at this stage pol II promoters for protein-coding genes are poorly characterised (obviously, that will change) and can't yet be assigned to particular transcription units, and it is clear that adjacent genes within the same transcription unit are regulated independently, both in terms of differing localisation and expression levels (e.g. the phosphoglycerate kinase cluster in T. brucei). Is this different in T. cruzi? How can you at this stage assign genes to a given transcript?
However, we have been thinking of this in the "bacterial sense". The first bacterium is now in the development version of GeneDB and, as a consequence, we would like GUS to be able to cope with operons. Again, Arnaud is thinking about this.

3. Spliced leader

The spliced leader is the same for all transcripts in T. brucei and L. major. As a consequence (after very long discussions) we decided not to attempt to represent it. Also, we understood it to be a problem to attach a transcript to two genes (which is effectively what you'd want, i.e. the gene of interest plus the sequence encoding the SL). What Arnaud proposed was to annotate the trans-spliced transcript with an additional note/qualifier about the SL. Are there different SL sequences in T. cruzi? Also, the SL sequences are transcribed from long arrays which are difficult to resolve in sequencing, so the SL would have to be annotated to the array rather than to individual genes.

4. Mitochondrial DNA

We also thought about this. I am not sure to what extent the minicircles have been (and will be) sequenced; there are thousands of them. For maxicircle-encoded genes, Arnaud is proposing to use a single RNAFeature object for both edited and unedited transcripts, with the distinction between the two transcripts made using Sequence Ontology. The editing process would be annotated using a SeqVariation object. As far as gRNA positions and sequences are concerned, we were thinking of linking to comprehensive databases such as http://www.rna.ucla.edu/trypanosome/database.html or http://www.ebi.ac.uk/parasites/kDNA/Source.html. However, it would be great if it were possible to store all this info in GUS.

Cheers,
Christiane and Arnaud

--
Dr Christiane Hertz-Fowler
GeneDB Curator (T. brucei)
Pathogen Sequencing Unit
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Cambridge CB10 1SA
Tel: 01223 494955
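Recording "what change is made" for an edited transcript can be sketched as a list of edit records applied to the unedited sequence. Kinetoplastid mitochondrial editing mainly inserts and deletes uridines, so each hypothetical EditSite below (a stand-in for a SeqVariation row, not its real columns) holds a position in the unedited transcript, the nucleotides inserted there and the number deleted. The sketch assumes edits do not overlap; the sequences are toy strings, not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditSite:
    """Hypothetical stand-in for a SeqVariation row describing one edit."""
    position: int       # 0-based offset in the unedited transcript
    inserted: str = ""  # nucleotides added at this position, e.g. "UUU"
    deleted: int = 0    # number of nucleotides removed starting at position


def apply_edits(unedited, edits):
    """Return the edited sequence; every position refers to the unedited sequence.

    Edits are applied from the 3' end backwards so that earlier offsets are
    not shifted by insertions or deletions further downstream.
    """
    seq = unedited
    for e in sorted(edits, key=lambda e: e.position, reverse=True):
        # Rejects negative positions, negative deletions and edits past the end.
        if not 0 <= e.position <= e.position + e.deleted <= len(unedited):
            raise ValueError(f"edit out of range: {e}")
        seq = seq[:e.position] + e.inserted + seq[e.position + e.deleted:]
    return seq


if __name__ == "__main__":
    edits = [EditSite(1, inserted="UU"), EditSite(4, deleted=1)]
    print(apply_edits("AGGAC", edits))  # AUUGGA
```

Storing the edits rather than only the edited sequence keeps the unedited RNAFeature and the nature of each edit recoverable, which is what the request for "what nucleotides are added" needs.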
From: Chris S. <sto...@SN...> - 2002-10-16 22:51:23
Hi Folks,

Jessie Kissinger has set up gusdev at the University of Georgia and I hope that she will be joining these discussions soon. As you can see from her mail below, there are issues she needs to address that we've been trying to avoid. Sigh. It may be time to address them.

Cheers,
Chris

Begin forwarded message:

> We are still setting up so, needless to say, we have not made a
> detailed walk through the schema and the features of every table yet.
> We have made a list of a few concepts that we presume will need to be
> added to the schema to accomplish some of our goals, and many of these
> will also be needed by Sanger since they are particular to
> Kinetoplastid organisms and/or the sequencing strategy.
>
> Some issues that are on my list are the following:
>
> 1 - The concept of a homologous chromosome. T. cruzi is being
> sequenced as a diploid.
>
> 2 - The concept of multiple genes per transcript: kinetoplastid
> organisms are eukaryotic but use polycistronic transcription. This
> feature is commonly ignored, but now that we have expression studies,
> we need to be able to study expression levels of genes on the same
> transcript to get testable ideas about post-transcriptional mechanisms
> of control.
>
> 3 - The concept of a 5' spliced leader sequence (the idea that it
> exists, and keeping track of which leader it was; there are multiple
> leaders). Currently, nobody keeps track of this; they just remove it
> and analyze the rest.
>
> 4 - Kinetoplastid mitochondria are quite weird: they consist of mini- and
> maxicircle plasmid DNAs and heavily utilize RNA editing. Thus, in
> addition to keeping track of the minicircle and maxicircle DNAs, we need
> the concept of a guide RNA and of an 'edited' site in a message that is
> edited. Ideally one would like to record the nature of the edit, i.e.
> what change is made and what nucleotides are added to the sequence.
> Transcripts can only encode ORFs after they have been edited.
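For point 2, a minimal way to let several genes share one transcript is a link table that also records each gene's order along the transcript; a self-join then finds the genes co-transcribed with a given gene. TranscriptGene and the gene identifiers are hypothetical, chosen only for illustration, and this is not an existing GUS table.

```python
import sqlite3

# Hypothetical link table: a transcript may carry several genes (polycistronic),
# and a gene may appear on more than one transcript.
SCHEMA = """
CREATE TABLE TranscriptGene (
    transcript_id INTEGER NOT NULL,
    gene_id       TEXT NOT NULL,
    gene_order    INTEGER NOT NULL,  -- 5' to 3' position of the gene on the transcript
    PRIMARY KEY (transcript_id, gene_id)
);
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO TranscriptGene VALUES (?, ?, ?)", [
        (1, "geneA", 1), (1, "geneB", 2), (1, "geneC", 3),
        (2, "geneD", 1), (2, "geneE", 2),
    ])
    return conn


def co_transcribed(conn, gene_id):
    """Other genes sharing a transcript with gene_id, in transcript order."""
    rows = conn.execute(
        "SELECT m.gene_id FROM TranscriptGene AS g "
        "JOIN TranscriptGene AS m ON m.transcript_id = g.transcript_id "
        "WHERE g.gene_id = ? AND m.gene_id <> g.gene_id "
        "ORDER BY m.transcript_id, m.gene_order",
        (gene_id,),
    ).fetchall()
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(gid for (gid,) in rows))


if __name__ == "__main__":
    print(co_transcribed(build_demo(), "geneB"))  # ['geneA', 'geneC']
```

Expression values keyed by gene could then be compared across the genes returned for one transcript, which is the kind of study Jessie describes.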
From: Chris S. <sto...@SN...> - 2002-10-16 22:45:49
|
Hi Arnaud, I finally went through your list. These will certainly enrich GUS! Some questions/issues though. First a general request for documentation of the tables and attributes to explain what they are to be used for. We have a plug-in that takes a file in the format: TableName\t\tdescription TableName\tAttributeName\tDescription In particular, I am curious as to what InflectionPointFeature and ReplicationFeature are. For the NAFeature views you propose, are you using "source_id" to point to SRes:SequenceOntology? If so, why not call the attribute "so_id"? Similarly, for GenomeSequence as a view of NASequence, is this what "source_id" is for? The AAFeature views have "name" attributes and I wonder whether we should have a table in SRes for controlled vocabulary terms for protein features that we can point to (as with sequence ontology). This would avoid the uncontrolled use of "name." I notice that PeptideProperty has been given a controlled vocabulary table PeptidePropertyType in this regard. Rather than have a table for each, we could centralize them. Any choices for the resource to use for these names? SWISS-PROT? Cheers, Chris On Tuesday, October 8, 2002, at 09:00 AM, Arnaud Kerhornou wrote: > From: Arnaud Kerhornou <ax...@sa...> > Date: Tue Oct 8, 2002 9:00:32 AM US/Eastern > To: gusdev-gusdev <gus...@li...>, > gen...@li... > Subject: [Gusdev-gusdev] DNA, RNA and Protein GUS Features + > PeptidePropertyType Table > > Hi > > I've attached the SQL statements for new views/tables in GUS3, as well > as updates of existing views/tables. It covers a new sequence object > and new DNA, RNA and protein features that we would like to use. Some > of them have been designed to go along Sequence Ontology > classification (see below). 
> > Here is a summary of the new views and tables: > > Updated views/tables: > * NAFeatureImp - modified: > * name from varchar2(30) to varchar2(50) > * RestrictionFragmentFeature - added: > * type_of_cut (sticky or blunt) > * SignalPeptideFeature - added: > * targetting > * GeneSynonym - added: > * is_obsolete > > New views/tables: > > A new sequence object on top of NASequenceImp: > * GenomicSequence > > New views on top of NAFeatureImp: > * ChromosomeElement > NB : (centromere, telomere => SO) > * InflectionPointFeature > * RepeatFeature > NB : repeat type => SO > * RepeatRegionFeature > * ReplicationFeature > * TransposableElementFeature > NB: SO would give the type > * RNARegulatory > * RNASecondaryStructure > * SpliceSiteFeature > > New views on top of AAFeatureImp: > * AASecondaryStructure > * AATertiaryStructure > * DomainFeature > * PeptideProperty > * PostTranslationModification > * TransmembraneDomainFeature > > A new table: > * PeptidePropertyType > > > Note that the design takes into account the use of the Sequence > Ontology (SO) to refine the types of some of the features, e.g. to > differentiate the different types of transposable elements, of repeats > or of chromosome elements (centromere, telomere ...). |
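The documentation file Chris describes is tab-delimited, with an empty middle field marking a table-level description. A minimal Python reader for that layout might look like the following. It assumes exactly three tab-separated fields per line and no tabs inside descriptions, it is not the CBIL plug-in itself, and the sample descriptions are invented for illustration.

```python
def parse_documentation(text):
    """Return (tables, attributes) parsed from tab-delimited documentation text.

    A line "Table<TAB><TAB>description" documents a table; a line
    "Table<TAB>Attribute<TAB>description" documents one attribute.
    """
    tables, attributes = {}, {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {lineno}: expected 3 tab-separated fields, got {len(fields)}")
        table, attribute, description = (f.strip() for f in fields)
        if attribute:
            attributes[(table, attribute)] = description
        else:
            tables[table] = description  # empty attribute field: table-level entry
    return tables, attributes


sample = (
    "DomainFeature\t\tProtein domain located on an AA sequence\n"
    "DomainFeature\te_value\tE-value reported by the domain search\n"
)
tables, attributes = parse_documentation(sample)
print(tables)      # {'DomainFeature': 'Protein domain located on an AA sequence'}
print(attributes)  # {('DomainFeature', 'e_value'): 'E-value reported by the domain search'}
```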
From: Chris S. <sto...@SN...> - 2002-10-09 21:02:30
|
Hi Arnaud, Many thanks for compiling this. Just wanted to let you know that both Jonathan Crabtree and I plan to go through them to see if there are any proposals that require further discussion. Everyone else is encouraged to do so as well. Cheers, Chris On Tuesday, October 8, 2002, at 09:00 AM, Arnaud Kerhornou wrote: > Hi > > I've attached the SQL statements for new views/tables in GUS3, as well > as updates of existing views/tables. It covers a new sequence object > and new DNA, RNA and protein features that we would like to use. Some > of them have been designed to go along with Sequence Ontology > classification (see below). > > Here is a summary of the new views and tables: |
From: Arnaud K. <ax...@sa...> - 2002-10-08 13:00:36
|
Hi I've attached the SQL statements for new views/tables in GUS3, as well as updates of existing views/tables. It covers a new sequence object and new DNA, RNA and protein features that we would like to use. Some of them have been designed to go along with Sequence Ontology classification (see below). Here is a summary of the new views and tables: Updated views/tables: * NAFeatureImp - modified: * name from varchar2(30) to varchar2(50) * RestrictionFragmentFeature - added: * type_of_cut (sticky or blunt) * SignalPeptideFeature - added: * targetting * GeneSynonym - added: * is_obsolete New views/tables: A new sequence object on top of NASequenceImp: * GenomicSequence New views on top of NAFeatureImp: * ChromosomeElement NB : (centromere, telomere => SO) * InflectionPointFeature * RepeatFeature NB : repeat type => SO * RepeatRegionFeature * ReplicationFeature * TransposableElementFeature NB: SO would give the type * RNARegulatory * RNASecondaryStructure * SpliceSiteFeature New views on top of AAFeatureImp: * AASecondaryStructure * AATertiaryStructure * DomainFeature * PeptideProperty * PostTranslationModification * TransmembraneDomainFeature A new table: * PeptidePropertyType Note that the design takes into account the use of the Sequence Ontology (SO) to refine the types of some of the features, e.g. to differentiate the different types of transposable elements, of repeats or of chromosome elements (centromere, telomere ...). e.g. transposable element annotations: The different types of transposable elements would be given by specific SO terms. Bear in mind that prokaryotic transposable elements are not covered by SO, but we are working on adding prokaryote-specific SO terms.
Here is the current SO tree for transposable elements:
Transposable Element
  ---> Non Retrotransposon
         ---> TIR Element
                ----> Terminal Inverted Repeat
         ---> Foldback Element
  ---> Retrotransposon
         ---> LTR Retrotransposon
                ----> Long Terminal Repeat
         ---> non LTR Retrotransposon
                ----> LINE Element
                ----> SINE Element
LTRs, as well as genes that are part of a transposable element, would be features attached to a TransposableElement Feature. These genes would have the following SO term: transposable element gene, SO0000111. Regarding LTRs, they will be considered as Repeat Features, annotated with the appropriate Sequence Ontology terms. Let me know if you have any comments. cheers Arnaud PS: Sequence Ontology URL => http://www.geneontology.org/gobo/sequence.ontology/sequence.ontology |
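One way to read the transposable-element tree together with Chris's `so_id` suggestion: store SO terms with a parent pointer, give each feature row an `so_id`, and expose the feature type as a view over the implementation table. This SQLite sketch assumes a `subclass_view` discriminator column on `NAFeatureImp` and a heavily simplified `SequenceOntology` table; both are illustrative assumptions, and the recursive query treats every edge of the drawn tree alike, even though terms such as Long Terminal Repeat are parts of an element rather than subtypes.

```python
import sqlite3

# Hypothetical, simplified tables; real GUS tables carry many more columns.
SCHEMA = """
CREATE TABLE SequenceOntology (
    so_id     INTEGER PRIMARY KEY,
    term_name TEXT NOT NULL,
    parent_id INTEGER REFERENCES SequenceOntology(so_id)
);
CREATE TABLE NAFeatureImp (
    na_feature_id INTEGER PRIMARY KEY,
    subclass_view TEXT NOT NULL,      -- which view this row belongs to
    name          TEXT,
    so_id         INTEGER REFERENCES SequenceOntology(so_id)
);
CREATE VIEW TransposableElementFeature AS
    SELECT na_feature_id, name, so_id FROM NAFeatureImp
    WHERE subclass_view = 'TransposableElementFeature';
"""

# The transposable-element tree as (term, parent) pairs; parents come first.
SO_TREE = [
    ("Transposable Element", None),
    ("Non Retrotransposon", "Transposable Element"),
    ("TIR Element", "Non Retrotransposon"),
    ("Terminal Inverted Repeat", "TIR Element"),
    ("Foldback Element", "Non Retrotransposon"),
    ("Retrotransposon", "Transposable Element"),
    ("LTR Retrotransposon", "Retrotransposon"),
    ("Long Terminal Repeat", "LTR Retrotransposon"),
    ("non LTR Retrotransposon", "Retrotransposon"),
    ("LINE Element", "non LTR Retrotransposon"),
    ("SINE Element", "non LTR Retrotransposon"),
]

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    ids = {}
    for so_id, (term, parent) in enumerate(SO_TREE, start=1):
        ids[term] = so_id
        conn.execute("INSERT INTO SequenceOntology VALUES (?, ?, ?)",
                     (so_id, term, ids.get(parent)))
    # Hypothetical annotated features.
    features = [(1, "TE_1", "LINE Element"), (2, "TE_2", "Foldback Element"),
                (3, "TE_3", "LTR Retrotransposon")]
    for fid, name, term in features:
        conn.execute("INSERT INTO NAFeatureImp VALUES (?, ?, ?, ?)",
                     (fid, "TransposableElementFeature", name, ids[term]))
    return conn

def features_under(conn, term_name):
    """Names of TransposableElementFeatures typed with term_name or any term below it."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(so_id) AS (
            SELECT so_id FROM SequenceOntology WHERE term_name = ?
            UNION
            SELECT s.so_id FROM SequenceOntology s
            JOIN subtree t ON s.parent_id = t.so_id
        )
        SELECT f.name FROM TransposableElementFeature f
        JOIN subtree USING (so_id)
        ORDER BY f.name""", (term_name,))
    return [name for (name,) in rows]

conn = build_demo()
print(features_under(conn, "Retrotransposon"))  # ['TE_1', 'TE_3']
```

Querying by an internal node such as Retrotransposon then picks up features annotated with any term below it, which is the main benefit of pointing at SO terms rather than storing free-text type names.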
From: Chris S. <sto...@SN...> - 2002-10-02 12:00:28
|
Hi Arnaud, Yes, please make the proposal in the form of CREATE TABLE statements. BTW, some of these properties are included in mass spec data that we got for PlasmoDB, so we may want to use or call the view PeptideProperty rather than ProteinProperty where the latter can be construed as referring to the entire protein. Either way, the place to start is the SQL. Also thanks for the FlyBase info. Am just starting to take a serious look at it - looks pretty interesting. Cheers, Chris On Wednesday, October 2, 2002, at 06:12 AM, Arnaud Kerhornou wrote: > Chris > > I realise I made some proposals about new protein features but I've > never formalised them as SQL statements. > Shall I do that to prepare their incorporation into GUS? > > cheers > Arnaud > > Arnaud Kerhornou wrote: > >> Hi everyone >> >> I would like to propose a new table, ProteinProperty, and new views on >> top of the AAFeatureImp table for protein features such as domains. >> >> * Protein properties : >> >> There are 4 protein properties : >> * Isoelectric point (1), >> * Molecular mass (2), >> * Charge (3), >> * Average residue mass (4). >> >> The first 3 may have several values as they can be characterized >> experimentally. >> >> From a design point of view, we can have a unique ProteinProperty >> table or a view for each property (a ProteinPropertyImp table and >> three views: IsoElectricPointProperty, MolecularMassProperty and >> ChargeProperty). >> The number of properties is unlikely to change in the future, so it may be >> simpler to create a unique ProteinProperty table. >> >> Specification => A property would behave like a feature, i.e.: >> * it is attached to a sequence, except that it doesn't have a >> location within it, >> * it can be supported by evidence such as an experiment, published >> or from a personal communication. >> * have external db refs.
>> >> ProteinProperty table: >> * protein_property_id : number >> * property_name : varchar2(50) >> * property_value : number (5) >> * & stuff common to any GUS table: modification_date ... >> >> The 4th property, average residue mass, could be an extra attribute >> in the proteinSequence or TranslatedAASequence view. >> >> ****************** >> * Protein Features : >> >> Features attached to a protein sequence. >> >> The new feature objects are: >> (1) Signal Peptide Feature : >> It's already a view in GUS, but we will store curated data, such as >> targeting information. >> >> (2) Domains: >> A domain can be: >> * a Leucine Zipper domain, >> * a coiled-coil domain, >> * a Pfam, Smart or Prosite domain. >> >> DomainFeature view: >> * aa_feature_id : number (10), >> * aa_sequence_id : number (10), >> * name : varchar2 (50), >> * description : varchar2 (100), >> * score : number (4) >> * e_value : number (10), >> + external database link entries and a location object. >> >> (3) Transmembrane domain feature: >> Question: the PlasmoDB web site shows hydrophobicity graphics; where is >> this stored in GUS? >> >> (4) Post-translational modification feature: >> * type : varchar2 (50) (e.g. glycosylation, phosphorylation ...) >> * modified_by : use of the Interaction table? >> * Coordinates of the phosphorylation site in an AALocation object. >> >> (5) Repeat Features, following the same design as at the DNA >> level: >> * RepeatRegionFeature as a set of RepeatUnitFeatures, >> * RepeatUnitFeature, with the consensus sequence, name and size >> * RepeatType table >> >> Another question: What about 2D structures (beta-sheets and >> alpha-helices) in GUS? >> >> Let me know if you have any comments. I'll send another email for >> extra features at the DNA/RNA level. >> >> Cheers >> Arnaud >> > |
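A possible shape for the CREATE TABLE statements Chris asks for, using his suggested name PeptideProperty plus a small PeptidePropertyType vocabulary table. This is a sketch in SQLite with an assumed column layout; `property_value` is declared REAL here because an Oracle `number(5)` column has scale 0 and would round a value such as an isoelectric point of 6.45. The sample values are invented.

```python
import sqlite3

# Sketch only: names follow the draft where possible; the layout is an assumption.
SCHEMA = """
CREATE TABLE PeptidePropertyType (
    peptide_property_type_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE             -- e.g. 'isoelectric point'
);
CREATE TABLE PeptideProperty (
    peptide_property_id      INTEGER PRIMARY KEY,
    aa_sequence_id           INTEGER NOT NULL,  -- whole sequence, no location
    peptide_property_type_id INTEGER NOT NULL
        REFERENCES PeptidePropertyType(peptide_property_type_id),
    property_value           REAL NOT NULL,
    modification_date        TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO PeptidePropertyType VALUES (?, ?)",
                     [(1, "isoelectric point"), (2, "molecular mass"), (3, "charge")])
    # Hypothetical values: two pI values (e.g. predicted and measured), one mass.
    conn.executemany(
        "INSERT INTO PeptideProperty (peptide_property_id, aa_sequence_id,"
        " peptide_property_type_id, property_value) VALUES (?, ?, ?, ?)",
        [(1, 42, 1, 6.45), (2, 42, 1, 6.2), (3, 42, 2, 52345.6)])
    return conn

def values_for(conn, aa_sequence_id, type_name):
    """All recorded values of one property type for one sequence, ascending."""
    rows = conn.execute(
        """SELECT p.property_value FROM PeptideProperty p
           JOIN PeptidePropertyType t
             ON t.peptide_property_type_id = p.peptide_property_type_id
           WHERE p.aa_sequence_id = ? AND t.name = ?
           ORDER BY p.property_value""", (aa_sequence_id, type_name))
    return [value for (value,) in rows]

conn = build_demo()
print(values_for(conn, 42, "isoelectric point"))  # [6.2, 6.45]
```

Several rows per sequence and type cover the case where a property has both predicted and experimentally measured values.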
From: Arnaud K. <ax...@sa...> - 2002-10-02 10:12:52
|
Chris I realise I made some proposals about new protein features but I've never formalised them as SQL statements. Shall I do that to prepare their incorporation into GUS? cheers Arnaud Arnaud Kerhornou wrote: > Hi everyone > > I would like to propose a new table, ProteinProperty, and new views on > top of the AAFeatureImp table for protein features such as domains. > > * Protein properties : > > There are 4 protein properties : > * Isoelectric point (1), > * Molecular mass (2), > * Charge (3), > * Average residue mass (4). > > The first 3 may have several values as they can be characterized > experimentally. > > From a design point of view, we can have a unique ProteinProperty > table or a view for each property (a ProteinPropertyImp table and three > views: IsoElectricPointProperty, MolecularMassProperty and > ChargeProperty). > The number of properties is unlikely to change in the future, so it may be > simpler to create a unique ProteinProperty table. > > Specification => A property would behave like a feature, i.e.: > * it is attached to a sequence, except that it doesn't have a > location within it, > * it can be supported by evidence such as an experiment, published > or from a personal communication. > * have external db refs. > > ProteinProperty table: > * protein_property_id : number > * property_name : varchar2(50) > * property_value : number (5) > * & stuff common to any GUS table: modification_date ... > > The 4th property, average residue mass, could be an extra attribute in > the proteinSequence or TranslatedAASequence view. > > ****************** > * Protein Features : > > Features attached to a protein sequence. > > The new feature objects are: > (1) Signal Peptide Feature : > It's already a view in GUS, but we will store curated data, such as > targeting information. > > (2) Domains: > A domain can be: > * a Leucine Zipper domain, > * a coiled-coil domain, > * a Pfam, Smart or Prosite domain.
> > DomainFeature view: > * aa_feature_id : number (10), > * aa_sequence_id : number (10), > * name : varchar2 (50), > * description : varchar2 (100), > * score : number (4) > * e_value : number (10), > + external database link entries and a location object. > > (3) Transmembrane domain feature: > Question: the PlasmoDB web site shows hydrophobicity graphics; where is > this stored in GUS? > > (4) Post-translational modification feature: > * type : varchar2 (50) (e.g. glycosylation, phosphorylation ...) > * modified_by : use of the Interaction table? > * Coordinates of the phosphorylation site in an AALocation object. > > (5) Repeat Features, following the same design as at the DNA level: > * RepeatRegionFeature as a set of RepeatUnitFeatures, > * RepeatUnitFeature, with the consensus sequence, name and size > * RepeatType table > > Another question: What about 2D structures (beta-sheets and > alpha-helices) in GUS? > > Let me know if you have any comments. I'll send another email for > extra features at the DNA/RNA level. > > Cheers > Arnaud > |
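The DomainFeature view in this proposal has no coordinates of its own; its location lives in a separate location object. The sketch below joins a simplified view to a location table to ask which domains cover a given residue. The `AALocation` columns `start_pos`/`end_pos`, the `subclass_view` column, the feature names and the 1e-3 cut-off are all assumptions made for illustration. `e_value` is stored as a floating-point number here, since a `number (10)` column as drafted holds only integers and would round an e-value such as 1e-20 to 0.

```python
import sqlite3

# Simplified, hypothetical tables standing in for AAFeatureImp and AALocation.
SCHEMA = """
CREATE TABLE AAFeatureImp (
    aa_feature_id  INTEGER PRIMARY KEY,
    aa_sequence_id INTEGER NOT NULL,
    subclass_view  TEXT NOT NULL,
    name           TEXT,
    description    TEXT,
    score          REAL,
    e_value        REAL           -- floating point so tiny e-values survive
);
CREATE TABLE AALocation (
    aa_feature_id INTEGER NOT NULL REFERENCES AAFeatureImp(aa_feature_id),
    start_pos     INTEGER NOT NULL,   -- 1-based, inclusive
    end_pos       INTEGER NOT NULL
);
CREATE VIEW DomainFeature AS
    SELECT aa_feature_id, aa_sequence_id, name, description, score, e_value
    FROM AAFeatureImp WHERE subclass_view = 'DomainFeature';
"""

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    features = [
        # (id, seq, subclass_view, name, description, score, e_value, start, end)
        (1, 7, "DomainFeature", "example_pfam_hit", "Pfam match", 250.0, 1e-20, 10, 120),
        (2, 7, "DomainFeature", "weak_hit", "Pfam match", 12.0, 0.5, 50, 90),
        (3, 7, "DomainFeature", "leucine_zipper", "predicted", 30.0, 1e-5, 100, 130),
        (4, 7, "TransmembraneDomainFeature", "TM1", "predicted", None, None, 20, 40),
    ]
    for fid, seq, view, name, desc, score, evalue, start, end in features:
        conn.execute("INSERT INTO AAFeatureImp VALUES (?, ?, ?, ?, ?, ?, ?)",
                     (fid, seq, view, name, desc, score, evalue))
        conn.execute("INSERT INTO AALocation VALUES (?, ?, ?)", (fid, start, end))
    return conn

def domains_at(conn, aa_sequence_id, position, max_e_value=1e-3):
    """Names of domains covering one residue, filtered by e-value, ordered by start."""
    rows = conn.execute("""
        SELECT d.name FROM DomainFeature d
        JOIN AALocation l ON l.aa_feature_id = d.aa_feature_id
        WHERE d.aa_sequence_id = ? AND l.start_pos <= ? AND l.end_pos >= ?
          AND d.e_value <= ?
        ORDER BY l.start_pos""", (aa_sequence_id, position, position, max_e_value))
    return [name for (name,) in rows]

conn = build_demo()
print(domains_at(conn, 7, 110))  # ['example_pfam_hit', 'leucine_zipper']
```

The transmembrane row is excluded by the view's `subclass_view` filter, not by its coordinates.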
From: Chris S. <sto...@SN...> - 2002-09-18 21:00:57
|
Thanks to Jonathan Crabtree who generated this the day of the call. Sorry it took so long to distribute this. At the end are action items for which there has already been some action. I have added notes to that effect. Cheers, and best wishes for your meeting next week! Chris Conference call on September 11, 2002 ***Attending: CBIL: Jonathan Crabtree, Joan Mazzarelli, Deborah Pinney, Jonathan Schug, Yongchang Gan, Chris Stoeckert, Steve Fischer PSU: Marie-Adele Rajandream, Arnaud Kerhornou, Christiane Hertz-Fowler, Paul Mooney, Adrian Tivey, Matt Berriman (sorry if I missed anyone; I forgot that I was supposed to be taking notes at this point - JC) ***Loading data for orthologous groups Marie-Adele raised the issue of loading data on orthologous groups for the PSU databases, and specifically our earlier promise to make Li's code for doing this available by September. Nobody present had talked to Li recently, so we promised to follow up with her (see action item 1.) We confirmed that the information isn't needed immediately (i.e., for the Woods Hole meeting on the 22nd), but that the PSU group would like to have it sooner rather than later. ***GUS 3.0 migration As part of the above discussion the question of the CBIL GUS 3.0 migration was raised. This is also something that we had earlier said we would have finished by September. In Chris's absence Jonathan C. made the (perhaps controversial) statement that although it was still a high priority he suspected that the upcoming PlasmoDB release (in the first week of October) would quite likely delay substantial progress on the migration until then. That is, the decision to use GUSdev in both the next PlasmoDB release and also the GeneDB demo at Woods Hole has diverted development effort to GUSdev that could otherwise have been applied to the GUS 3.0 migration. ***Schema changes Arnaud raised the issue of schema changes needed in the short term (by next week) for RNAFeatures. 
We agreed that once consensus had been reached on a proposed schema change on the GUSdev mailing list we would implement it ASAP and update the create schema scripts in the CVS repository. We asked Arnaud to send/reconfirm his request by e-mail and promised to make the necessary changes forthwith (action items 2 and 3). ***CVS repository and new GUS/GUSdev home page The question of a CVS web interface was raised again but since the Sanger admins have been busy installing new hardware, no changes have been made to the existing CVS servers. With respect to having a dedicated GUSdev home page, it was resolved that Steve F. would set something up on one of the CBIL web servers, using the gusdev.org domain that we'd acquired previously (action item 4). The new home page will provide a friendly face to the GUS project and will link to other sites (e.g., Sourceforge or Sanger CVS) as necessary. If possible the site will be set up in such a way that both groups are able to contribute content. ***Woods Hole Requirements (GUSdev distribution and installation) Paul reported that the PSU's Oracle server was in the process of being moved, and so he hadn't yet had a chance to try generating the Perl object layer from the GUSdev tar file generated by Jonathan C. He also stated that he was collecting a list of problems with the installation process (mostly minor so far) that he would e-mail to Jonathan C. when complete (action item 5). Jonathan C. said that he would check the tar file into the Sourceforge CVS repository (under /scratch) after writing a short summary to put in the README file (action item 6). ***Representing phenotypes Chris S. pointed out that the notion of phenotype implied by Christiane's e-mail was somewhat wider in scope than what we'd originally had in mind. That is, the RNAi example given includes not only the observed phenotype (e.g., slow growth) but also the (RNAi-based) assay used to elicit that phenotype from a hapless collection of Trypanosomes. 
The conclusion was that we would definitely be able to come up with a set of tables to represent everything, but that it might take a bit of discussion. In the intermediate term it was agreed that the PSU annotators would be able to store this type of information in free text fields, specifically entries in the "Note" table, which is able to link to any NAFeature. Arnaud also suggested that we look at the new Flybase schema (in which they are also making a transition from free text phenotype representations to greater use of controlled vocabularies), and said that he would send around a link (action item 7). ***Biojava/JSP development for GUS 3.0 interfaces Biojava was mentioned and Jonathan C. said that he (and Dave Barkan, the programmer working on the project) would be evaluating whether it could be used productively in CBIL's new annotator interface. Jonathan promised to send periodic updates to the GUSdev mailing list (on the status of the CBIL annotator interface) to solicit feedback, and find out if there are any areas of development where we could collaborate (action item 8.) Adrian had earlier brought up the question of developing the JSP infrastructure for the GUS 3.0 web interfaces. He is going to come up with a list of requirements and we are going to start brainstorming about the design over the mailing list (action item 9). ***Meetings There was some discussion of who would be at what meetings; Chris S. and Trish Whetzel are the only ones from CBIL who will be attending the November meeting at Sanger. Matt also asked about the upcoming TIGR ontology meeting (action item 10); currently nobody from either group plans to attend. ***ACTION ITEMS: 1. Make Li's poster and code available [Li] Li's poster is available at http://www.cbil.upenn.edu/downloads/GUS/documentation/ISMB2002/. Li's code for loading ortholog tables in gusdev is at: http://www.cbil.upenn.edu/downloads/GUS/releases/2.0.1/source-code/ 2. E-mail schema changes [Arnaud] 3. 
Apply schema changes, update scripts/CVS [Jonathan C.] 4. Set up stunning new GUSdev web site on gusdev.org [Steve F.] See http://www.gusdb.org! 5. Collect list of problems with GUSdev source code tar file and e-mail to Jonathan C./GUSdev mailing list [Paul] Sent to Jonathan C. 6. Check GUSdev tar files (schema scripts + Perl/Servlet source code) into the Sourceforge CVS repository under /scratch [Jonathan C.] 7. Send around pointer to new Flybase schema [Arnaud] 8. Communicate status/design of annotator's interfaces on a periodic basis, maintain dialog over possible uses of biojava, collaboration on interfaces and/or implementation [Jonathan C., all developers] 9. Compile user interface "wish list" and send it out as an opening salvo in a discussion on the future JSP architecture [Adrian] 10. Send around pointer to TIGR ontology meeting [Matt] Sent to Chris. |
From: steve f. <st...@SN...> - 2002-09-17 17:01:33
|
folks- we have put together a preliminary GUS web site: http://www.gusdb.org as you can see, we have moved to a new domain name, gusdb.org, and propose to retire gusdev.org. comments are encouraged, particularly w/ respect to content, but keep in mind that the layout is a stopgap until we get a chance to make a snazzier site. steve |
From: Marie-Adele R. <ma...@sa...> - 2002-09-16 15:07:11
|
Chris, No changes from us. It is fine. I have forwarded it to the gusdev-mail list. Marie-Adele -----Original Message----- From: Chris Stoeckert [mailto:sto...@SN...] Sent: 11 September 2002 20:48 To: cra...@SN... Cc: Marie Adele Subject: Re: Draft of conference call summary Jonathan, Thanks. Very comprehensive and certainly more amusing than my summaries. I am forwarding this to Marie-Adele. Marie-Adele, Please forward this to the gusdev-mail list with any additions or edits that you feel are necessary. Thanks, Chris On Wednesday, September 11, 2002, at 11:55 AM, cra...@SN... wrote: > > Chris- > > A summary is attached; I think that you only missed the first 1 or 2 > items > on the list. > > Jonathan > > -- > Jonathan Crabtree > Center for Bioinformatics, University of Pennsylvania > 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021 > 215-573-3115 > > Conference call on September 11, 2002 > > ***Attending: > > CBIL: Jonathan Crabtree, Joan Mazzarelli, Deborah Pinney, Jonathan > Schug, > Yongchang Gan, Chris Stoeckert, Steve Fischer > PSU: Marie-Adele Rajandream, Arnaud Kerhornou, Christiane Hertz-Fowler, > Paul Mooney, Adrian Tivey, Matt Berriman, Christopher Peacock > I forgot that I was supposed to be taking notes at this point - > JC) > > ***Loading data for orthologous groups > Marie-Adele raised the issue of loading data on orthologous groups for > the PSU databases, and specifically our earlier promise to make Li's > code for doing this available by September. Nobody present had talked > to Li recently, so we promised to follow up with her (see action item > 1.) We confirmed that the information isn't needed immediately (i.e., > for the Woods Hole meeting on the 22nd), but that the PSU group would > like to have it sooner rather than later. > > ***GUS 3.0 migration > As part of the above discussion the question of the CBIL GUS 3.0 > migration was raised. This is also something that we had earlier said > we would have finished by September. 
In Chris's absence Jonathan C. > made the (perhaps controversial) statement that although it was still > a high priority he suspected that the upcoming PlasmoDB release (in the > first week of October) would quite likely delay substantial progress > on the migration until then. That is, the decision to use GUSdev in > both > the next PlasmoDB release and also the GeneDB demo at Woods Hole has > diverted development effort to GUSdev that could otherwise have been > applied to the GUS 3.0 migration. > > ***Schema changes > Arnaud raised the issue of schema changes needed in the short term > (by next week) for RNAFeatures. We agreed that once consensus had been > reached on a proposed schema change on the GUSdev mailing list we > would implement it ASAP and update the create schema scripts in the > CVS repository. We asked Arnaud to send/reconfirm his request by > e-mail and promised to make the necessary changes forthwith (action > items 2 and 3). > > ***CVS repository and new GUS/GUSdev home page > The question of a CVS web interface was raised again but since the > Sanger admins have been busy installing new hardware, no changes have > been made to the existing CVS servers. With respect to having a > dedicated GUSdev home page, it was resolved that Steve F. would set > something up on one of the CBIL web servers, using the gusdev.org > domain that we'd acquired previously (action item 4). The new home > page will provide a friendly face to the GUS project and will link to > other sites (e.g., Sourceforge or Sanger CVS) as necessary. If > possible the site will be set up in such a way that both groups are > able to contribute content. > > ***Woods Hole Requirements (GUSdev distribution and installation) > Paul reported that the PSU's Oracle server was in the process of being > moved, and so he hadn't yet had a chance to try generating the Perl > object layer from the GUSdev tar file generated by Jonathan C. 
He > also stated that he was collecting a list of problems with the > installation process (mostly minor so far) that he would e-mail to > Jonathan C. when complete (action item 5). Jonathan C. said that he > would check the tar file into the Sourceforge CVS repository (under > /scratch) after writing a short summary to put in the README file > (action item 6). > > ***Representing phenotypes > Chris S. pointed out that the notion of phenotype implied by > Christiane's > e-mail was somewhat wider in scope than what we'd originally had in > mind. That is, the RNAi example given includes not only the observed > phenotype (e.g., slow growth) but also the (RNAi-based) assay used to > elicit that phenotype from a hapless collection of Trypanosomes. The > conclusion was that we would definitely be able to come up with a set > of tables to represent everything, but that it might take a bit of > discussion. In the intermediate term it was agreed that the PSU > annotators would be able to store this type of information in free > text fields, specifically entries in the "Note" table, which is able > to link to any NAFeature. Arnaud also suggested that we look at the > new Flybase schema (in which they are also making a transition from > free text phenotype representations to greater use of controlled > vocabularies), and said that he would send around a link (action item > 7). > > ***Biojava/JSP development for GUS 3.0 interfaces > Biojava was mentioned and Jonathan C. said that he (and Dave Barkan, > the programmer working on the project) would be evaluating whether it > could be used productively in CBIL's new annotator interface. Jonathan > promised to send periodic updates to the GUSdev mailing list (on the > status of the CBIL annotator interface) to solicit feedback, and find > out if there are any areas of development where we could collaborate > (action item 8.) 
> Adrian had earlier brought up the question of developing the JSP > infrastructure for the GUS 3.0 web interfaces. He is going to come up > with a list of requirements and we are going to start brainstorming > about the design over the mailing list (action item 9). > > ***Meetings > There was some discussion of who would be at what meetings; Chris > S. and Trish Whetzel are the only ones from CBIL who will be attending > the November meeting at Sanger. Matt also asked about the upcoming > TIGR ontology meeting (action item 10); currently nobody from either > group plans to attend. > > ***ACTION ITEMS: > 1. Make Li's poster and code available [Li] > 2. E-mail schema changes [Arnaud] > 3. Apply schema changes, update scripts/CVS [Jonathan C.] > 4. Set up stunning new GUSdev web site on gusdev.org [Steve F.] > 5. Collect list of problems with GUSdev source code tar file and > e-mail to Jonathan C./GUSdev mailing list [Paul] > 6. Check GUSdev tar files (schema scripts + Perl/Servlet source code) > into the Sourceforge CVS repository under /scratch [Jonathan C.] > 7. Send around pointer to new Flybase schema [Arnaud] > 8. Communicate status/design of annotator's interfaces on a periodic > basis, maintain dialog over possible uses of biojava, collaboration > on interfaces and/or implementation [Jonathan C., all developers] > 9. Compile user interface "wish list" and send it out as an opening > salvo in a discussion on the future JSP architecture [Adrian] > 10. Send around pointer to TIGR ontology meeting [Matt] > |
From: Chris S. <sto...@SN...> - 2002-09-09 19:00:23
|
Hi Arnaud, > I'd like to illustrate the alternative splicing discussion by the worst > case scenario Bart could come up with: the major immediate early (MIE) > region of the Human Cytomegalovirus (HCMV). > > See attached the map of this region. Briefly, the upstream gene has 5 > exons and gives, after splicing, 5 transcript variants and consequently > 5 proteins whose size differs quite a lot. In this situation all exons > can have either behaviour, coding/non-coding, depending on which RNA > variant they're attached to. > Besides, note that the 5th exon can be split into 2 smaller exons and > an intron, so it's not only the behaviour of the exon which changes but > its internal structure as well !!!! This is scary. > What we would like is the flexibility of either attaching the RNA > variants to separate gene feature entries or to a unique gene feature > entry. As long as we have this flexibility, the underlying schema is > fine with us. Yes. Each RNA variant can be a separate gene if desired. Note that each ExonFeature can only have one parent_id (i.e., can belong to only one gene). So any common exons would have to be duplicated for each gene. > At the ExonFeature level, you're mentioning that the annotation of the > exons (first/last/middle - coding/non-coding) could move to the > RNAFeatureExon table. Would it avoid the duplication of the exons? > This would not prevent duplication of exons due to being shared by different genes but it would prevent duplication of exons shared by RNAs from the same gene. Chris |
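Chris's answer can be sketched as follows: if the coding attributes move from ExonFeature to a per-RNA link table (RNAFeatureExon in the discussion), an exon shared by two RNAs of the same gene is stored once, while its coding status can differ per RNA. Column names other than `parent_id` are hypothetical, the coordinates are invented, and SQLite stands in for Oracle.

```python
import sqlite3

# Hypothetical simplified tables; coding attributes live on the RNA-exon link.
SCHEMA = """
CREATE TABLE ExonFeature (
    exon_id   INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL,     -- the single gene this exon belongs to
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL
);
CREATE TABLE RNAFeature (
    rna_id    INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL      -- gene
);
CREATE TABLE RNAFeatureExon (
    rna_id        INTEGER NOT NULL REFERENCES RNAFeature(rna_id),
    exon_id       INTEGER NOT NULL REFERENCES ExonFeature(exon_id),
    is_coding     INTEGER NOT NULL, -- 0/1, recorded per RNA rather than per exon
    coding_start  INTEGER,          -- NULL when the exon is non-coding in this RNA
    coding_end    INTEGER,
    reading_frame INTEGER,
    PRIMARY KEY (rna_id, exon_id)
);
"""

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO ExonFeature VALUES (?, ?, ?, ?)",
                     [(100, 1, 1, 200), (101, 1, 300, 450)])
    conn.executemany("INSERT INTO RNAFeature VALUES (?, ?)", [(11, 1), (12, 1)])
    conn.executemany("INSERT INTO RNAFeatureExon VALUES (?, ?, ?, ?, ?, ?)", [
        (11, 100, 1, 50, 200, 0),        # exon 100 is coding in RNA 11 ...
        (11, 101, 1, 300, 380, 2),
        (12, 100, 0, None, None, None),  # ... but non-coding in RNA 12
        (12, 101, 1, 320, 450, 1),
    ])
    return conn

def coding_status(conn, exon_id):
    """Map rna_id -> whether this exon is coding in that RNA."""
    rows = conn.execute(
        "SELECT rna_id, is_coding FROM RNAFeatureExon WHERE exon_id = ? ORDER BY rna_id",
        (exon_id,))
    return {rna_id: bool(flag) for rna_id, flag in rows}

conn = build_demo()
print(conn.execute("SELECT COUNT(*) FROM ExonFeature").fetchone()[0])  # 2
print(coding_status(conn, 100))  # {11: True, 12: False}
```

Exons shared between RNAs assigned to different genes would still need one ExonFeature row per gene, because `parent_id` points to a single gene.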
From: <cra...@SN...> - 2002-09-09 17:14:47
|
steve fischer wrote: > > its called InsertNewExternalSequences > This is incorrect; Arnaud was asking about the population of the 'PfamEntry' table, and I don't believe that InsertNewExternalSequences can be used to populate this table. I wrote a plugin to do this, but I don't see it in the tar file I just sent to you guys, which probably just means that it needs updating and moving into the correct directory (Objects/GA_plugins versus GA_plugins) I'll look into this and get back to you. Jonathan -- Jonathan Crabtree Center for Bioinformatics, University of Pennsylvania 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021 215-573-3115 |
From: steve f. <st...@SN...> - 2002-09-09 16:38:09
|
it's called InsertNewExternalSequences; contact me if you need more details steve Arnaud Kerhornou wrote: > Hi > > Is there a gusdev GA plugin to populate Pfam entries into gusdev ? > > cheers > Arnaud > > -- > Arnaud Kerhornou > > The Wellcome Trust Sanger Institute > The Pathogen Sequencing Unit > Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK > Work: +44 (0) 1223 494955 > Fax: +44 (0) 1223 494919 > _______________________________________________ > Gusdev-gusdev mailing list > Gus...@li... > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev |
From: Arnaud K. <ax...@sa...> - 2002-09-09 16:18:56
|
Hi Is there a gusdev GA plugin to populate Pfam entries into gusdev ? cheers Arnaud -- Arnaud Kerhornou The Wellcome Trust Sanger Institute The Pathogen Sequencing Unit Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK Work: +44 (0) 1223 494955 Fax: +44 (0) 1223 494919 |
From: Adrian R. T. <ar...@sa...> - 2002-09-06 13:42:16
|
> I just noticed that the first "stable" version of Tomcat 4.1.x, version > 4.1.10, > was released this morning, so I'm going to use that for my testing and I'll > > probably upgrade most of our machines if it looks stable. Cheers. We're just upgrading too. Adrian |
From: Arnaud K. <ax...@sa...> - 2002-09-04 17:16:41
|
Hi everyone > Genes and alternative splicing > The topic addressed was defining when transcripts are alternative forms > of the same gene and when they are different genes. The most important > point is that GUS is completely flexible in which RNAs are assigned to a > Gene. Several cases were discussed as to how the different groups would > make assignments. There was consensus that RNAs should have some > overlap of translated regions in order to be considered as deriving from > the same gene. Overlaps on opposite strands or in intronic regions were > not considered as grounds for being the same gene. For CBIL, it did not > matter whether the proteins derived from alternative splice forms had > different functions. For PSU, if proteins had minimal overlap and had > different functions they were usually annotated as different genes. Both > approaches are allowed by the schema. The schema does however require > that an exon have one parent. Thus, if an exon is shared between two > RNAs that are assigned to different genes, the exon will be represented > twice, once for each gene. Also, if an exon is coding for one RNA and > non-coding for another RNA, it will be duplicated even if the RNAs are > from the same gene in order to specify the difference in coding. This > is because ExonFeature contains a number of attributes specifying > coding start and stop and reading frame as well as initial exon. In > future, these attributes may be moved to another table such as > RNAFeatureExon. > > I'd like to illustrate the alternative splicing discussion by the worst case scenario Bart could come up with: the major immediate early (MIE) region of the Human Cytomegalovirus (HCMV). See attached the map of this region. Briefly, the upstream gene has 5 exons and gives, after splicing, 5 transcript variants and consequently 5 proteins whose size differs quite a lot.
In this situation all exons can have either behaviour, coding/non-coding, depending on which RNA variant they're attached to. Besides, note that the 5th exon can be split into 2 smaller exons and an intron, so it's not only the behaviour of the exon which changes but its internal structure as well !!!! What we would like is the flexibility of either attaching the RNA variants to separate gene feature entries or to a unique gene feature entry. As long as we have this flexibility, the underlying schema is fine with us. At the ExonFeature level, you're mentioning that the annotation of the exons (first/last/middle - coding/non-coding) could move to the RNAFeatureExon table. Would it avoid the duplication of the exons? cheers Arnaud -- Arnaud Kerhornou The Wellcome Trust Sanger Institute The Pathogen Sequencing Unit Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK Work: +44 (0) 1223 494955 Fax: +44 (0) 1223 494919 |
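The consensus rule quoted in this message (same-strand overlap of translated regions, with intronic and opposite-strand overlaps not counting) can be expressed as a small check. This Python sketch models each RNA only by its strand and its translated segments, an assumption that keeps introns out of the comparison by construction; the example coordinates are invented. It tests a necessary condition only, since PSU may still split overlapping RNAs into separate genes when their functions differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RNA:
    name: str
    strand: str          # '+' or '-'
    cds_segments: tuple  # ((start, end), ...) translated segments, 1-based inclusive

def segments_overlap(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def may_share_gene(rna1, rna2):
    """Necessary condition from the meeting notes: translated regions overlap on the same strand."""
    if rna1.strand != rna2.strand:
        return False  # opposite-strand overlap does not count
    return any(segments_overlap(s1, s2)
               for s1 in rna1.cds_segments
               for s2 in rna2.cds_segments)

variant_a = RNA("variant_a", "+", ((100, 200), (300, 400)))
variant_b = RNA("variant_b", "+", ((350, 500),))
antisense = RNA("antisense", "-", ((150, 250),))
intronic = RNA("intronic", "+", ((220, 280),))  # lies inside variant_a's intron

print(may_share_gene(variant_a, variant_b))  # True
print(may_share_gene(variant_a, antisense))  # False
print(may_share_gene(variant_a, intronic))   # False
```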
From: Christiane Hertz-F. <ch...@sa...> - 2002-09-02 15:46:17
Hi,

further to recent conversations re. phenotype data, here is the kind of Trypanosoma data I was envisaging capturing. (This is all very much based around RNAi: there's little "classical genetics" in tryps and, due to the diploid nature of tryps, researchers are using RNAi extensively. Also, one of the functional genomics projects is based on RNAi, with the possibility of GeneDB housing its data in years to come.)

1. type of experiment: RNAi (double stranded)
2. construct: targets coding region
3. developmental stage the experiment is carried out in (covered by the life cycle ontology)
4. effect on expression of other proteins
5. phenotype (first thoughts, based on what has already been published):
   A. (no) growth (cell proliferation) or morphological (e.g. cytoskeleton, kinetoplast mitochondrial DNA) phenotype in a particular life cycle stage
   B. (no) differentiation defect
   C. cell cycle block, cytokinesis block, RNA editing block
   D. mitochondrial defect, Golgi/endocytic pathway defect, motility (i.e. non-motile, reduced motility)

The types of data generated by the RNAi project are graphs, video clips and microscopy images.

Arnaud pointed out during discussions today that the allele tables probably won't be the right place to store this information. Discussions are continuing...

Christiane
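To make the list above concrete, one RNAi experiment could be captured as a single record along these lines. This is only a hedged Python sketch: the class names, field names, the gene identifier "geneX" and the life-cycle term are hypothetical, and it takes no position on which GUS tables should eventually hold the data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PhenotypeCategory(Enum):
    """Hypothetical grouping mirroring phenotype classes A-D above."""
    GROWTH_OR_MORPHOLOGY = "A"
    DIFFERENTIATION = "B"
    CELL_CYCLE_OR_EDITING_BLOCK = "C"
    ORGANELLE_OR_MOTILITY = "D"


@dataclass
class PhenotypeObservation:
    category: PhenotypeCategory
    description: str        # e.g. "cytokinesis block"
    observed: bool = True   # False records a "(no) defect" result


@dataclass
class RNAiExperiment:
    target_gene: str
    experiment_type: str = "RNAi (double stranded)"
    construct: str = "targets coding region"
    life_cycle_stage: str = ""   # term from a life cycle ontology
    effects_on_other_proteins: List[str] = field(default_factory=list)
    phenotypes: List[PhenotypeObservation] = field(default_factory=list)
    media_files: List[str] = field(default_factory=list)  # graphs, videos, images

    def defects(self) -> List[str]:
        """Descriptions of the phenotypes that were actually observed."""
        return [p.description for p in self.phenotypes if p.observed]


# Illustrative record only; "geneX" is a placeholder identifier.
exp = RNAiExperiment(
    target_gene="geneX",
    life_cycle_stage="procyclic",
    phenotypes=[
        PhenotypeObservation(PhenotypeCategory.CELL_CYCLE_OR_EDITING_BLOCK,
                             "cytokinesis block"),
        PhenotypeObservation(PhenotypeCategory.DIFFERENTIATION,
                             "differentiation defect", observed=False),
    ],
)
```

Keeping an explicit `observed` flag preserves the "(no) growth" and "(no) differentiation defect" results rather than silently dropping negative outcomes.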
From: <cra...@SN...> - 2002-08-23 15:25:20
> 2. big: from what i can see, there isn't a way to access the .ppt
> files correctly. The "download" "HEAD" and "view" links bring up a
> Netscape window which attempts to display the binary files as text.
> Those are the only links offered. (I used cvs admin -kb to alert the
> CVS server that these files are binary, but that didn't help)
>
> Here is a sample page with the big problem:
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gusdev/GUS-PSU-meeting-06-2002/Biojava.ppt

Steve-

Whenever Netscape does this to you, try right-clicking on the link and choosing the option "Save Link As"; you may have to tell it what filename to use, but it should work. I was able to download and view the file this way using Netscape.

Jonathan

--
Jonathan Crabtree
Center for Bioinformatics, University of Pennsylvania
1406 Blockley Hall, 423 Guardian Drive
Philadelphia, PA 19104-6021
215-573-3115