From: Roderic P. <r....@bi...> - 2011-04-15 08:50:00
Dear Bill,

Yes, I was being a little glib, and I'm all for the data being available. But I wonder whether, given the rate at which new sequence data is being acquired, many people will redo analyses using more data, rather than reanalyse an older data set.

My comment about trees is that, at the end of the day, it's the one thing that makes TreeBASE unique. From my no doubt biased perspective, it could/should be the place I go to find out "what do we know about the phylogeny of group x?"

Why not just make a CouchDB database right now? I tried, believe me I tried, but a big chunk of TreeBASE didn't make it down the wire. For large studies the NeXML generation simply times out, so I gave up. I guess this is what Rutger's suggestion of having file dumps would address. If every study had a NeXML file sitting on the server then I could just fetch those, rather than hammer the database and get frustrated when it times out. Given that I have the attention span of a gnat, if something doesn't work I tend to drop it for a while and go on to something else.

If I could reliably get all TreeBASE studies in NeXML, I'd make the CouchDB version in a flash.

Regards

Rod

On 14 Apr 2011, at 21:00, William Piel wrote:

> On Apr 14, 2011, at 12:04 PM, Roderic Page wrote:
>> It's not about publications, it's not about sequences, it's not really about data (OK, a little bit about data), it's about trees
>
> From where I sit, alignments are an important resource for the community. Nobody emails me asking for a tree that is missing from TreeBASE, but I'm always being asked for an alignment that should be in TreeBASE but is not. Typically it is because the author started, but never finished, a submission. Earlier this year I had a case where an author wanted her alignments embargoed for a year post-publication.
> After several people independently emailed me to request access, I contacted the journal, they convened the board, and they passed a resolution stating that all data must be released immediately. So these are not without value. Alignments are collections of hypotheses of homology (NCHAR of them per alignment!) that are often difficult to rebuild from scratch -- trees are merely blended summary diagrams of these hypotheses. Plus, retyping a morphological dataset after OCR'ing a PDF is an enormous pain.
>
> On Apr 14, 2011, at 12:04 PM, Roderic Page wrote:
>> So I guess I'd do the following:
>
> Is there anything to stop anyone from doing exactly this?
>
> And you could have your CouchDB updated periodically by running a cron job against TreeBASE's OAI-PMH interface to get the IDs of all new or modified studies (e.g., since April 12th, GMT: http://treebase.org/treebase-web/top/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2011-04-12T00:00:00Z), and then pull down the NeXML for just the trees, convert to JSON, populate the CouchDB, etc. It should be relatively easy to maintain a CouchDB mirror.
>
> bp
>
>
> PS - Hmm... Rod, do you know you do have a way of loving-and-then-hating things? :-)
>
> On Apr 14, 2011, at 12:04 PM, Roderic Page wrote:
>> 7. Never, ever mention RDF.
>
> On Apr 14, 2011, at 1:05 PM, Roderic Page wrote:
>> I ... was once an enthusiast [of RDF]
>
> On Apr 14, 2011, at 12:04 PM, Roderic Page wrote:
>> Bonus points for not mentioning XML.
>
> On May 20, 2004, at 7:37 PM, Roderic D. M. Page wrote:
>> [I think TreeBASE should] store data (say, the character states for a taxon) as an XML formatted BLOB.
>
> On Jan 15, 2006, at 4:15 PM, Roderic Page wrote:
>> once one of the major providers adopts LSIDs (my money is on uBio), whatever they adopt will drive standards
>
> On Apr 1, 2009, at 9:20 AM, Roderic D. M. Page wrote:
>> I think that [LSIDs have] been the Achilles heel of biodiversity informatics.
>
> [etc..]
>
> _______________________________________________
> Treebase-devel mailing list
> Tre...@li...
> https://lists.sourceforge.net/lists/listinfo/treebase-devel

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r....@bi...
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rod...@ai...
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
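
For anyone wanting to try the periodic harvest suggested in the quoted message (OAI-PMH ListIdentifiers, then NeXML, then JSON into CouchDB), here is a rough Python sketch of the first and last steps. It is only an illustration under assumptions, not how TreeBASE or any existing mirror does it. The endpoint and query parameters are taken from the URL quoted above; the function names, the sample identifiers, and the document layout (raw NeXML stored as a string under a "nexml" key) are all made up for the example. It does not follow OAI-PMH resumption tokens, and the actual NeXML-to-JSON conversion is left out.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameters copied from the example URL in the quoted message.
OAI_ENDPOINT = "http://treebase.org/treebase-web/top/oai"
# Standard OAI-PMH 2.0 XML namespace, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_identifiers_url(since, endpoint=OAI_ENDPOINT):
    """Build a ListIdentifiers request for records changed since `since` (UTC)."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc", "from": since}
    # safe=":" keeps the timestamp readable, matching the URL in the message.
    return endpoint + "?" + urlencode(params, safe=":")


def fetch(url, timeout=60):
    """Fetch a URL with an explicit timeout so a slow response fails cleanly."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_identifiers(xml_text):
    """Return the text of every <identifier> element in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(OAI_NS + "identifier") if el.text]


def to_couch_doc(identifier, nexml_text):
    """Hypothetical CouchDB document: the OAI identifier as _id, raw NeXML as a string."""
    return {"_id": identifier, "nexml": nexml_text}


# Made-up response for illustration only; real identifiers will look different.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:study-1</identifier>
      <datestamp>2011-04-12</datestamp>
    </header>
    <header>
      <identifier>oai:example.org:study-2</identifier>
      <datestamp>2011-04-13</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(list_identifiers_url("2011-04-12T00:00:00Z"))
    for identifier in parse_identifiers(SAMPLE_RESPONSE):
        # In a real run, nexml_text would come from fetch() on the study's NeXML URL.
        print(json.dumps(to_couch_doc(identifier, "<nexml/>")))
```

Run from cron, a script along these lines would remember the timestamp of its last successful run and pass it as `since`, so that each pass asks only for new or modified studies. Keeping `fetch` separate from the parsing also means a study whose NeXML generation times out can be skipped and retried later without losing the rest of the batch.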