From: Roderic P. <r....@bi...> - 2011-04-14 17:05:41
Dear Rutger,

Not sure why CouchDB choked. I have databases with millions of JSON documents and it works OK. That said, the JSON you sent was pretty ugly (namespaces, brrr!).

It seems to me that much of what we do is pass around objects, and JSON (without namespaces) is a lightweight way to do this that we can also operate on very simply.

I lost patience with triple stores, partly because there's the overhead of having RDF vocabularies, and partly because the interesting queries are not supported (e.g., spatial queries). I get the idea behind RDF, and was once an enthusiast, but I think there are too many practical issues that make it less than useful.

Distributed editing makes sense to me in that for most things there's always a bigger fish. Does anybody think they can do a better job of providing online management of bibliographic metadata than Zotero or Mendeley? If not, why bother recreating that? Just enable people to use those tools and harvest the edits.

So, I'd want to focus on the one thing TreeBASE has that nobody else has, namely the trees. Although, having said that, if Phylota had a decent interface it would be awesome.

Regards

Rod

On 14 Apr 2011, at 17:31, Rutger Vos wrote:

> I tried loading the JSON files I shared with you the other day in
> couchdb and it turned out I would have to recompile it with more
> memory or else it chokes. So the idea is that it's just the metadata
> that goes into a database, and you like that to be couchdb as opposed
> to a triple store, right? And you like the idea of distributed
> editing, so does that mean you also like the idea of distributed
> searching, along the plug-in idea? With the various projects you've
> done to annotate/correct/taxon map treebase data, it would be great if
> those could be plugged into a common, easy-to-use front end.
>
> On Thu, Apr 14, 2011 at 5:04 PM, Roderic Page <r....@bi...> wrote:
>> So I guess I'd do the following:
>>
>> 1. Separate data entry from data access.
>>    SQL may have a place for data entry, but that's it. And MySQL is fine, really.
>>
>> 2. The data access end is a document database like CouchDB, which stores metadata (and trees) as JSON.
>>
>> 3. Simple query API that more or less wraps CouchDB queries: search by taxon, identifier, geography, or full text.
>>
>> 4. Store data on disk in original format, as well as derived formats as Rutger suggests. Being able to grab dumps in various formats is handy, especially if the data can be reliably obtained.
>>
>> 5. Have a web interface that's simple and easy to use, supports search without asking the user whether something is a number or not, uses SVG for trees, and lets users log in using Facebook/Twitter/Mendeley.
>>
>> 6. Devolve as much editing as possible to other places, e.g. Mendeley for bibliographic stuff.
>>
>> 7. Never, ever mention RDF. Bonus points for not mentioning XML.
>>
>> My sense as an outside observer is that much of the current iteration of TreeBASE has been driven by technology (PostgreSQL, Tomcat, RDF, Java, XML), not usability. I understand the rationale for the choices (I think), but at the end of the day TreeBASE should be about the trees. It's not about publications, it's not about sequences, it's not really about data (OK, a little bit about data), it's about trees. I should be able to find my trees, find trees from a paper, find trees for a taxon, find trees from a given part of the world, find trees that use a given sequence, find trees that look like my trees.
>>
>> Read Michael Wolfe's answer to the question "Why is Dropbox more popular than other programs with similar functionality?"
>> and you'll see where I'm coming from:
>>
>> http://www.quora.com/Dropbox/Why-is-Dropbox-more-popular-than-other-programs-with-similar-functionality
>>
>> Regards
>>
>> Rod
>>
>> ---------------------------------------------------------
>> Roderic Page
>> Professor of Taxonomy
>> Institute of Biodiversity, Animal Health and Comparative Medicine
>> College of Medical, Veterinary and Life Sciences
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QQ, UK
>>
>> Email: r....@bi...
>> Tel: +44 141 330 4778
>> Fax: +44 141 330 2792
>> AIM: rod...@ai...
>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>> Twitter: http://twitter.com/rdmpage
>> Blog: http://iphylo.blogspot.com
>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>
>> _______________________________________________
>> Treebase-devel mailing list
>> Tre...@li...
>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>
> --
> Dr. Rutger A. Vos
> School of Biological Sciences
> Philip Lyle Building, Level 4
> University of Reading
> Reading
> RG6 6BX
> United Kingdom
> Tel: +44 (0) 118 378 7535
> http://www.nexml.org
> http://rutgervos.blogspot.com
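
To make point 3 of the quoted list ("Simple query API that more or less wraps CouchDB queries") a bit more concrete, here is a minimal sketch of what such a wrapper might look like for exact-key lookups. The database name "treebase", the design document "search", and the view names are all hypothetical and do not come from this thread; only the general shape of a CouchDB view URL (/{db}/_design/{ddoc}/_view/{view}, with a JSON-encoded key parameter) is standard. Searches by geography or full text would probably need indexing beyond a plain view, so they are left out here.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical names: one CouchDB view per supported search type.
# Neither the view names nor the defaults below come from the thread.
VIEWS = {"taxon": "by_taxon", "identifier": "by_identifier"}


def view_url(base, kind, key, db="treebase", ddoc="search"):
    """Build a CouchDB view query URL for an exact-key lookup."""
    if kind not in VIEWS:
        raise ValueError(f"unsupported search type: {kind}")
    # CouchDB expects the `key` parameter to be JSON-encoded,
    # so a string key is sent with its surrounding double quotes.
    params = urlencode(
        {"key": json.dumps(key), "include_docs": "true"},
        quote_via=quote,
    )
    parts = (db, "_design", ddoc, "_view", VIEWS[kind])
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{base.rstrip('/')}/{path}?{params}"


print(view_url("http://localhost:5984", "taxon", "Homo sapiens"))
```

A front end would then only need to fetch that URL and read the "rows" array from the JSON response, which is roughly the "pass around objects" style argued for above.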