From: Richard R. <rr...@fi...> - 2011-04-14 14:01:47
Hi folks,

Just joined the list, as I am interested in using TB to develop collaborative methods for synthesizing plant phylogeny, e.g., by grafting clades together and other kinds of agglomeration. So, I am particularly interested in the API and harvesting, especially with respect to taxonomic names and classifications, and GenBank identifiers of sequences.

Speaking of which, how would I go about harvesting the GenBank numbers for a given study and associating them with their alignments? Is that possible with the current API?

I like Rutger's plug-in ideas. In general, it's easier for someone like me to provide a simple web service, rather than contribute directly to TB development.

-Rick

On Thu, Apr 14, 2011 at 7:24 AM, Rutger Vos <R....@re...> wrote:
> I've been thinking about a redesign lately, and here's what I would do:
>
> - make sure we can export all the metadata from TreeBASE in (nexml)
>   files, i.e. Laurel's GSoC project
>
> - get all the files for all the studies and create a simple folder
>   structure with those files, more or less along the lines of the
>   phylows urls (e.g. phylows/tree/TB2/nexml/T1312), in all file formats
>   we can think of (json, nexus, phylip, newick, phyloxml, fasta, static
>   html, etc...)
>
> - create some mod_rewrite rules to map our urls onto the folder
>   structure (so going from phylows/tree/TB2:T1312?format=nexml to the
>   actual file). Performance will obviously be much, much better.
>
> - also allow harvesting of those files using rsync and ftp. Now we
>   have data dumps.
>
> - create a simple plug in architecture where, given a query string, a
>   remote web service returns a list of hits which it somehow generates
>   from its local, harvested data dump
>
> Here's a use case: Laurel wants to implement BLASTing into TreeBASE.
> So she periodically harvests everything in phylows/matrix/TB2/fasta
> with an rsync cron job. She runs formatdb on the fasta sequences,
> creating a standalone BLAST.
> She then writes a simple cgi script that
> accepts a target string (e.g.
> myblast.cgi?tb2.blast=acgctcgcatcgcatcgactacgac) and returns a list of
> phylows matrix urls from the matching results.
>
> On the TreeBASE side, we simply add a search widget that delegates the
> query to Laurel's service and integrates it into our interfaces
> (graphical, web services).
>
> With an architecture like that it would be so much easier for anyone
> to add functionality in whatever programming language they like
> without having to deal with a massive database schema. Of course any
> of those remote services might have its own little database, but it'd
> be more along the lines of a three-table SQLite database to store the
> ITIS taxonomy structure such that we can find all TreeBASE taxa
> subtended by a given ITIS higher taxon ID (for example).
>
> We'd have to implement some core plug ins ourselves, notably one that
> extracts the metadata (using RDFA2RDFXML) and sticks that in a triple
> store so we can search on, say, author names, journals, etc. I think
> that's scalable because it's only a few dozen triples for each study.
>
> The idea is a little bit inspired by DAS, which seems to work quite
> well: http://www.biodas.org/wiki/Main_Page
>
> (I'm leaving out the submission part as an exercise for the reader.
> Presumably there would have to be a restricted area where properly
> formatted files are uploaded and made available to reviewers.)
>
> Rutger
>
> p.s. I've done some harvesting a few weeks back, but unless that's
> created queries that are still running I'm innocent.
>
> On Thu, Apr 14, 2011 at 12:28 PM, Roderic Page <r....@bi...> wrote:
> > TreeBASE performance has nothing to do with me folks, I pretty much gave
> > up trying to download data from it a few weeks back. Someone really, really
> > needs to rethink the way TreeBASE works, because it's virtually unusable.
> >
> > Regards
> >
> > Rod
> >
> > On 14 Apr 2011, at 04:20, William Piel wrote:
> >
> >>
> >> On Apr 13, 2011, at 9:36 PM, Hilmar Lapp wrote:
> >>
> >>> The trees in S11267 seem to silently fail to render in PhyloWidget.
> >>> All I get is a single dot. At least try the first three:
> >>>
> >>> http://www.treebase.org/treebase-web/search/study/trees.html?id=11267
> >>>
> >>> Is this a temporary glitch, a problem with the reconstructed file, or
> >>> something else? Should I best file this as a bug in the bug tracker?
> >>>
> >>> -hilmar
> >>>
> >>
> >> Thanks for the alert. This is actually a bug in PhyloWidget -- if the
> >> word "tree" appears in the title of the tree block, PhyloWidget confuses
> >> this word with the TREE command, and so goofs up the parsing.
> >>
> >> I have removed the word "tree" from the tree block, so it works now.
> >>
> >> However, you might need to refresh the PhyloWidget window after loading
> >> to get the tree to pull through -- TreeBASE feels quite slow lately; our API
> >> must be getting hit by someone (Rod?).
> >>
> >> bp
> >>
> >> ------------------------------------------------------------------------------
> >> Benefiting from Server Virtualization: Beyond Initial Workload
> >> Consolidation -- Increasing the use of server virtualization is a top
> >> priority. Virtualization can reduce costs, simplify management, and improve
> >> application availability and disaster protection. Learn more about boosting
> >> the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
> >> _______________________________________________
> >> Treebase-devel mailing list
> >> Tre...@li...
> >> https://lists.sourceforge.net/lists/listinfo/treebase-devel
> >>
> >
> > ---------------------------------------------------------
> > Roderic Page
> > Professor of Taxonomy
> > Institute of Biodiversity, Animal Health and Comparative Medicine
> > College of Medical, Veterinary and Life Sciences
> > Graham Kerr Building
> > University of Glasgow
> > Glasgow G12 8QQ, UK
> >
> > Email: r....@bi...
> > Tel: +44 141 330 4778
> > Fax: +44 141 330 2792
> > AIM: rod...@ai...
> > Facebook: http://www.facebook.com/profile.php?id=1112517192
> > Twitter: http://twitter.com/rdmpage
> > Blog: http://iphylo.blogspot.com
> > Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>
> --
> Dr. Rutger A. Vos
> School of Biological Sciences
> Philip Lyle Building, Level 4
> University of Reading
> Reading
> RG6 6BX
> United Kingdom
> Tel: +44 (0) 118 378 7535
> http://www.nexml.org
> http://rutgervos.blogspot.com
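
P.S. To make sure I understand the URL-to-folder mapping in Rutger's proposal quoted above, here is a minimal Python sketch of what the rewrite rule would have to do. It is only an illustration, not anything that exists in TreeBASE: the function name phylows_url_to_path, the default format, and the validation rules are my own assumptions. The only parts taken from the thread are the two example forms, phylows/tree/TB2:T1312?format=nexml and phylows/tree/TB2/nexml/T1312.

```python
def phylows_url_to_path(url, default_format="nexml"):
    """Map a phylows-style URL onto a static folder path.

    Hypothetical helper: mirrors the example in the thread, where
    phylows/tree/TB2:T1312?format=nexml maps to phylows/tree/TB2/nexml/T1312.
    Expects a relative URL without scheme or host.
    """
    path, _, query = url.partition("?")

    # Pick up the "format" query parameter, falling back to an assumed default.
    fmt = default_format
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if key == "format" and value:
            fmt = value

    segments = path.strip("/").split("/")
    namespace, sep, identifier = segments[-1].partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError("expected a final segment like 'TB2:T1312'")

    # Assumed safety check: only plain alphanumeric names may reach the
    # file system, so a request cannot escape the dump folder.
    for part in (namespace, fmt, identifier):
        if not part.isalnum():
            raise ValueError("unexpected characters in %r" % part)

    return "/".join(segments[:-1] + [namespace, fmt, identifier])


if __name__ == "__main__":
    print(phylows_url_to_path("phylows/tree/TB2:T1312?format=nexml"))
```

In a real deployment the same mapping would presumably live in the mod_rewrite rules themselves; writing it out as a function just makes the intended behaviour easy to check before anyone commits to a folder layout.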