From: Don G. <gil...@bi...> - 2005-07-21 22:41:18
Nine eukaryote proteomes have been aligned to the D. pulex genome, with help from the Daphnia Genomics Consortium, TeraGrid and Generic Model Organism Database projects.

One focus of this work is to assess the feasibility of using shared cyberinfrastructure like TeraGrid for the genome database community. So far the stumbling blocks seem surmountable. The open question is whether we can make this easy and compelling enough that genomicists from groups such as the Daphnia Genomics Consortium can pick a new genome data set and run their BLAST (or other) analysis on TeraGrid.

My question for the GMOD group: is this the sort of shared need that new and old organism genome databases would like tools and support for? E.g., a TeraGrid Science Gateway project
http://www.teragrid.org/programs/sci_gateways/
could offer scheduled times to run large cross-species comparisons, offer grid-aware bioinformatics tools, and provide access to shared genome data sets for analyses.

.....................

The D. pulex genome is a 4x preliminary assembly available to the Daphnia Genomics Consortium. The nine proteomes, with 217,006 total protein sequences, are drawn from organism genome databases, Ensembl and NCBI. Alignment is done using NCBI tBLASTn, with a Grid-aware version of NCBI software developed at IU, and run on the TeraGrid. The TeraGrid run for this took 12 hours using 64 processors. BLAST output is converted to scaffold locations, and displayed for browsing and searching in GMOD GBrowse genome maps.

Find sample map snapshots at
http://wfleabase.org/maps/dpgenomesample/

The full genome annotation map is available to DGC members now, and will be open to everyone at the public release of this genome.

- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gil...@in...--http://marmot.bio.indiana.edu/
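
For readers who want to try the "BLAST output is converted to scaffold locations" step themselves, here is a minimal sketch of one way it could look. It is not the conversion code used for this run. It assumes the tBLASTn hits were written in NCBI's 12-column tabular format and that GFF3-style lines are wanted for loading into GBrowse; the post states neither. The function names, the `protein_match` feature type, and the sample IDs (`protA`, `scaffold_1`) are all hypothetical.

```python
import csv
import io

# Column order of NCBI BLAST tabular output (assumed input format).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def hit_to_gff(row, source="tblastn", feature="protein_match"):
    """Turn one tabular tBLASTn hit into a GFF-style line on the scaffold."""
    hit = dict(zip(FIELDS, row))
    sstart, send = int(hit["sstart"]), int(hit["send"])
    # tBLASTn reports reverse-strand hits with sstart > send;
    # GFF wants start <= end plus an explicit strand column.
    start, end = min(sstart, send), max(sstart, send)
    strand = "+" if sstart <= send else "-"
    # Hypothetical attribute layout: record which protein residues matched.
    attrs = "Target={} {} {}".format(hit["qseqid"], hit["qstart"], hit["qend"])
    return "\t".join([hit["sseqid"], source, feature, str(start), str(end),
                      hit["bitscore"], strand, ".", attrs])


def blast_to_gff(handle):
    """Convert every non-comment hit line in a tabular BLAST file."""
    reader = csv.reader(handle, delimiter="\t")
    return [hit_to_gff(row) for row in reader
            if row and not row[0].startswith("#")]


if __name__ == "__main__":
    # Made-up hit: a protein matching the reverse strand of a scaffold.
    sample = "protA\tscaffold_1\t55.0\t100\t45\t0\t1\t100\t900\t601\t1e-30\t120\n"
    for line in blast_to_gff(io.StringIO(sample)):
        print(line)
```

The one detail worth noting is strand handling: a hit on the reverse strand comes out of BLAST with subject start greater than subject end, so the sketch swaps the coordinates and records the strand separately.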