From: Arnaud K. <ax...@sa...> - 2002-08-05 15:53:17
Jonathan

First, I agree on a 2-week timescale.

We're going to use a generic parser built on bioperl to populate an empty GUSdev instance. The first stage will be to generate bioperl objects from any format that bioperl recognises (embl, genbank), then gus objects. We have already written, for the pombe data, a GUS script that populates data from XML files, so it will just be made generic by changing the parsing stage.

The objects we're planning to generate are:

* Sequence:
  => ExternalNASequence objects
* Features:
  => GeneFeature objects,
  => ExonFeature,
  => RNAFeature,
  => TranslatedAAFeature,
  => TranslatedAASequence,
  => SignalPeptideFeature,
  => Re. Transmembrane and Pfam domains, should we generate PredictedAAFeature objects?
* Central Dogma:
  => Gene objects,
  => RNA,
  => Protein,
* GO associations:
  => ProteinGOProcess, ProteinGOFunction, ProteinGOComponent objects.

Do you have any comments on these objects? What does the website expect to be populated re. Pfam and transmembrane domains?

cheers
Arnaud

Jonathan Crabtree wrote:

>On Sun, 4 Aug 2002, Paul Mooney wrote:
>
>>hi all,
>>
>>I have finally got back on to a computer - no more cold turkey...
>>
>>Arnaud has told me the java layer makes some calls to perl which try to query
>>tables that do not exist in our schema because it is a couple of months older.
>>Rather than try to patch the schema and to avoid any other problems like this
>>it seems a good idea to get a point-in-time snapshot of GUSdev.
>>
>#
>
>>1st we need the schema - we are developing the EMBL parser with what we have
>>running now. The java layer will be needed once we are happy we have loaded a
>>good range of data so we can run the query servlet stuff. We will want to
>>modify this (hack :) so we can point to GeneDB gene pages in the prototype.
>>
>>Arnaud might be able to give time scales on when the data loading is in a good
>>enough state for a web interface but I imagine we will need the web stuff
>>within 2 weeks. Is this do-able?
>>
>>Does this help at all?
>>
>
>Paul-
>
>Yes, thanks. I think 2 weeks is doable. So basically the plan is to
>start with an empty snapshot of the latest GUSdev schema and use the EMBL
>parser to populate it? It might also help if I could have a quick look at
>the EMBL parser, or at least just find out which tables it's loading.
>Anyway, I should have an updated set of schema creation scripts ready in
>the next day or two.
>
>Jonathan
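
The two-stage plan in Arnaud's message (any input format, then format-neutral parser objects, then GUS objects) can be sketched roughly as follows. This is only an illustration in Python, not the actual bioperl or GUS code: `ParsedFeature`, `ParsedEntry`, `GusObject` and `to_gus_objects` are hypothetical names. The GUS object names come from the list in the message, but which feature key maps to which object is an assumption.

```python
from dataclasses import dataclass, field

# GUS object names are from the list in the message. The feature keys on the
# left, and the pairing itself, are assumptions made for illustration only.
FEATURE_KEY_TO_GUS = {
    "gene": "GeneFeature",
    "exon": "ExonFeature",
    "mRNA": "RNAFeature",
    "CDS": "TranslatedAAFeature",
    "sig_peptide": "SignalPeptideFeature",
}


@dataclass
class ParsedFeature:
    """Stage 1 output: a format-neutral feature (hypothetical stand-in)."""
    key: str
    start: int
    end: int
    qualifiers: dict = field(default_factory=dict)


@dataclass
class ParsedEntry:
    """Stage 1 output: one sequence entry with its features (hypothetical)."""
    accession: str
    sequence: str
    features: list


@dataclass
class GusObject:
    """Stage 2 output: a GUS-style object name plus plain attributes."""
    gus_class: str
    attributes: dict


def to_gus_objects(entry):
    """Convert one parsed entry into GUS-style objects.

    Returns (objects, skipped), where skipped lists the feature keys that
    have no mapping, so they can be reviewed rather than silently dropped.
    """
    objects = [
        GusObject(
            "ExternalNASequence",
            {"accession": entry.accession, "length": len(entry.sequence)},
        )
    ]
    skipped = []
    for feat in entry.features:
        gus_class = FEATURE_KEY_TO_GUS.get(feat.key)
        if gus_class is None:
            skipped.append(feat.key)
            continue
        attrs = {"start": feat.start, "end": feat.end, **feat.qualifiers}
        objects.append(GusObject(gus_class, attrs))
    return objects, skipped


# Usage with made-up data: only stage 2 is exercised here, since stage 1
# (the format-specific parsing) is what bioperl would provide.
entry = ParsedEntry(
    accession="EXAMPLE1",
    sequence="ATGGCCTAA",
    features=[
        ParsedFeature("CDS", 1, 9, {"product": "hypothetical protein"}),
        ParsedFeature("misc_feature", 2, 4),
    ],
)
objects, skipped = to_gus_objects(entry)
print([o.gus_class for o in objects])
print(skipped)
```

Keeping the mapping in a single table means that supporting another input format only changes stage 1, which is the point of making the earlier XML loader generic "by changing the parsing stage".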
From: Pjm <pj...@sa...> - 2002-08-12 14:08:16
Jonathan,

thanks for putting the file gusdev-sql-jul-25-2002.tar.gz on the download site. Since we have to put bootstrap/dictionary data into the rows of our schema, I was wondering how we could share some of these things. For example, the taxon table could easily be shared (does this relate to NCBI IDs at all?).

Another reason for doing this is so that when we (and others) install GUS we can run our parsers straight away and populate the schema with our favourite data. Could the bootstrap data be added to the install script please?

Paul.
From: <cra...@SN...> - 2002-08-12 14:29:27
Hi Paul-

Angel discovered some problems with the CREATE VIEW statements in that file. I'm just ironing out the last of the problems with it and will have an updated version on the site in a couple of hours. The new version has two files with actual data:

1) the 4 or 5 "bootstrap" rows needed for those common tables, and
2) the contents of the TableInfo table.

I had the Taxon table on my list of things to add, and I think it would be straightforward to just dump that as well. I *believe* that the current gusdev Taxon table is essentially a copy of a similar table from GSDB. However, we have a new set of taxon tables for GUS 3.0 ("Taxon3" in GUSdev) that Debbie populated using NCBI's flat files.

Jonathan