From: Hilmar L. <hl...@ne...> - 2011-04-14 14:31:53
Could we not, similar to what NCBI includes in their dumps, create simple downloadable mapping tables, for example one with (at least) two columns, one being GenBank accession, and the other being the TB matrix ID (and to make it more useful, columns for TB study, TB taxon, and citation could be added)? If we create a script that extracts that out of the database once a week, that might be pretty useful to people. NCBI produces a variety of these, to map accession to taxon, gene ID, etc., and people use them a lot.

-hilmar

On Apr 14, 2011, at 9:39 AM, Richard Ree wrote:

> Hi folks,
>
> Just joined the list, as I am interested in using TB to develop
> collaborative methods for synthesizing plant phylogeny, e.g., by
> grafting clades together and other kinds of agglomeration. So, I am
> particularly interested in the API and harvesting, especially with
> respect to taxonomic names and classifications, and GenBank
> identifiers of sequences.
>
> Speaking of which, how would I go about harvesting the GenBank
> numbers for a given study and associating them with their
> alignments? Is that possible with the current API?
>
> I like Rutger's plug-in ideas. In general, it's easier for someone
> like me to provide a simple web service, rather than contribute
> directly to TB development.
>
> -Rick
>
>
> On Thu, Apr 14, 2011 at 7:24 AM, Rutger Vos <R....@re...>
> wrote:
> I've been thinking about a redesign lately, and here's what I would
> do:
>
> - make sure we can export all the metadata from TreeBASE in (nexml)
> files, i.e. Laurel's GSoC project
>
> - get all the files for all the studies and create a simple folder
> structure with those files, more or less along the lines of the
> phylows urls (e.g. phylows/tree/TB2/nexml/T1312), in all file formats
> we can think of (json, nexus, phylip, newick, phyloxml, fasta, static
> html, etc...)
>
> - create some mod_rewrite rules to map our urls onto the folder
> structure (so going from phylows/tree/TB2:T1312?format=nexml to the
> actual file). Performance will obviously be much, much better.
>
> - also allow harvesting of those files using rsync and ftp. Now we
> have data dumps.
>
> - create a simple plug in architecture where, given a query string, a
> remote web service returns a list of hits which it somehow generates
> from its local, harvested data dump
>
> Here's a use case: Laurel wants to implement BLASTing into TreeBASE.
> So she periodically harvests everything in phylows/matrix/TB2/fasta
> with an rsync cron job. She runs formatdb on the fasta sequences,
> creating a standalone BLAST. She then writes a simple cgi script that
> accepts a target string (e.g.
> myblast.cgi?tb2.blast=acgctcgcatcgcatcgactacgac) and returns a list of
> phylows matrix urls from the matching results.
>
> On the TreeBASE side, we simply add a search widget that delegates the
> query to Laurel's service and integrates it into our interfaces
> (graphical, web services).
>
> With an architecture like that it would be so much easier for anyone
> to add functionality in whatever programming language they like
> without having to deal with a massive database schema. Of course any
> of those remote services might have its own little database, but it'd
> be more along the lines of a three-table SQLite database to store the
> ITIS taxonomy structure such that we can find all TreeBASE taxa
> subtended by a given ITIS higher taxon ID (for example).
>
> We'd have to implement some core plug ins ourselves, notably one that
> extracts the metadata (using RDFA2RDFXML) and sticks that in a triple
> store so we can search on, say, author names, journals, etc. I think
> that's scalable because it's only a few dozen triples for each study.
>
> The idea is a little bit inspired by DAS, which seems to work quite
> well: http://www.biodas.org/wiki/Main_Page
>
> (I'm leaving out the submission part as an exercise for the reader.
> Presumably there would have to be a restricted area where properly
> formatted files are uploaded and made available to reviewers.)
>
> Rutger
>
> p.s. I've done some harvesting a few weeks back, but unless that's
> created queries that are still running I'm innocent.
>
> On Thu, Apr 14, 2011 at 12:28 PM, Roderic Page
> <r....@bi...> wrote:
> > TreeBASE performance has nothing to do with me folks, I pretty
> > much gave up trying to download data from it a few weeks back.
> > Someone really, really needs to rethink the way TreeBASE works,
> > because it's virtually unusable.
> >
> > Regards
> >
> > Rod
> >
> > On 14 Apr 2011, at 04:20, William Piel wrote:
> >
> >>
> >> On Apr 13, 2011, at 9:36 PM, Hilmar Lapp wrote:
> >>
> >>> The trees in S11267 seem to silently fail to render in PhyloWidget.
> >>> All I get is a single dot. At least try the first three:
> >>>
> >>> http://www.treebase.org/treebase-web/search/study/trees.html?id=11267
> >>>
> >>> Is this a temporary glitch, a problem with the reconstructed file, or
> >>> something else? Should I best file this as a bug in the bug tracker?
> >>>
> >>> -hilmar
> >>>
> >>
> >> Thanks for the alert. This is actually a bug in PhyloWidget -- if
> >> the word "tree" appears in the title of the tree block, PhyloWidget
> >> confuses this word with the TREE command, and so goofs up the parsing.
> >>
> >> I have removed the word "tree" from the tree block, so it works now.
> >>
> >> However, you might need to refresh the PhyloWidget window after
> >> loading to get the tree to pull through -- TreeBASE feels quite slow
> >> lately; our API must be getting hit by someone (Rod?).
> >>
> >> bp
> >>
> >> ------------------------------------------------------------------------------
> >> Benefiting from Server Virtualization: Beyond Initial Workload
> >> Consolidation -- Increasing the use of server virtualization is a top
> >> priority.Virtualization can reduce costs, simplify management, and improve
> >> application availability and disaster protection. Learn more about boosting
> >> the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
> >> _______________________________________________
> >> Treebase-devel mailing list
> >> Tre...@li...
> >> https://lists.sourceforge.net/lists/listinfo/treebase-devel
> >>
> >
> > ---------------------------------------------------------
> > Roderic Page
> > Professor of Taxonomy
> > Institute of Biodiversity, Animal Health and Comparative Medicine
> > College of Medical, Veterinary and Life Sciences
> > Graham Kerr Building
> > University of Glasgow
> > Glasgow G12 8QQ, UK
> >
> > Email: r....@bi...
> > Tel: +44 141 330 4778
> > Fax: +44 141 330 2792
> > AIM: rod...@ai...
> > Facebook: http://www.facebook.com/profile.php?id=1112517192
> > Twitter: http://twitter.com/rdmpage
> > Blog: http://iphylo.blogspot.com
> > Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> >
>
> --
> Dr. Rutger A. Vos
> School of Biological Sciences
> Philip Lyle Building, Level 4
> University of Reading
> Reading
> RG6 6BX
> United Kingdom
> Tel: +44 (0) 118 378 7535
> http://www.nexml.org
> http://rutgervos.blogspot.com

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================
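
To make the mapping-table suggestion at the top of this message concrete, here is a minimal sketch of what a weekly export script might look like. It is an illustration only and makes several assumptions: the table name matrix_row and its columns are hypothetical and do not reflect the real TreeBASE schema, an in-memory SQLite database stands in for the production database, and the accessions and IDs in the demo are made up. It writes one tab-separated line per distinct accession/matrix pair, along the lines of the NCBI mapping files.

```python
import csv
import io
import sqlite3

# Hypothetical, simplified schema used only for this sketch;
# the real TreeBASE database is laid out differently.
SCHEMA = """
CREATE TABLE matrix_row (
    genbank_accession TEXT,
    matrix_id TEXT,
    study_id TEXT,
    taxon_label TEXT
);
"""

COLUMNS = ["genbank_accession", "matrix_id", "study_id", "taxon_label"]


def export_mapping(conn, out):
    """Write a header plus one tab-separated line per distinct row.

    Rows without a GenBank accession are skipped. Returns the number
    of data rows written.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    rows = conn.execute(
        "SELECT DISTINCT genbank_accession, matrix_id, study_id, taxon_label "
        "FROM matrix_row "
        "WHERE genbank_accession IS NOT NULL AND genbank_accession <> '' "
        "ORDER BY genbank_accession, matrix_id"
    ).fetchall()
    writer.writerows(rows)
    return len(rows)


def demo():
    """Build a tiny in-memory database with made-up rows and export it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO matrix_row VALUES (?, ?, ?, ?)",
        [
            ("XX000002", "M2", "S1", "Taxon b"),
            ("XX000001", "M1", "S1", "Taxon a"),
            (None, "M1", "S1", "Taxon c"),        # no accession: skipped
            ("XX000001", "M1", "S1", "Taxon a"),  # duplicate: collapsed
        ],
    )
    buf = io.StringIO()
    count = export_mapping(conn, buf)
    conn.close()
    return count, buf.getvalue()


if __name__ == "__main__":
    count, table = demo()
    print(table, end="")
```

In practice the same function could be pointed at a real database connection and an open file, and run from a weekly cron job; a citation column could be added to COLUMNS and the query in the same way.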