Re: [Chordless-devel] Java based CHORD system for your enjoyment
Status: Beta
Brought to you by:
zond
From: Martin K. <zon...@gm...> - 2009-07-10 09:51:26
Hello Steven, and thanks for your preliminary praise!

I ran some small benchmarks on Chordless a couple of months ago, but they are far from perfect, since I only had a small set of machines (4), of which only two (the two weakest) had similar hardware configurations. On the most powerful machine (my workstation, an "Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz" according to cpuinfo, with 2 mirrored disks and 4GB RAM) I was able to write 286 data entries per second.

I also noticed that the system seems to scale with a constant factor, at least up to those 4 machines, provided that I manually set the node id of each machine so that it gets responsibility for a part of the namespace that is proportional to the performance of that machine within the cluster. So 4 of my workstations (if I had them :) would probably be able to insert 286x4 (about 1144) entries per second.

Updating and inserting are the costly operations anyway, of course, because of the replication of copies and the disk operations. Lookups are lightning fast in comparison. (The GUI tool that I built has a built-in profiler that shows exactly how much time is spent executing different types of commands...)

This is, of course, nothing conclusive. I would love to get the chance to either test it on a bigger cluster myself, or assist someone doing it.

One should also note that it all depends on which persistence backend is used. Right now I distribute Chordless with HSQLDB (http://hsqldb.org/) running as a memory-only db, but with a logfile on disk to persist data between runs. This is the fastest of the stable variants I have tried, but any database that has a JDBC driver can (within SQL-standard limits) be plugged in as a replacement. It is also a task of no more than a couple of days to create a persistence backend using any other tool (I built one on Tokyo Cabinet in two days, but it became evident that Tokyo Cabinet with its Java interface was slower than HSQLDB).

And yes, it is a pure key/value system.
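
As a rough illustration of the manual node id assignment mentioned above, here is a minimal Java sketch. It is not taken from Chordless: the class NodeIdPlanner, the method planIds and the small 8-bit ring are made up for the example, and it assumes the usual Chord rule that a node is responsible for the keys between its predecessor's id (exclusive) and its own id (inclusive).

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Chordless: plans node ids on a Chord-style ring. */
final class NodeIdPlanner {

    /**
     * Returns one id per node so that the arc each node owns (from its predecessor's id,
     * exclusive, up to its own id, inclusive) is proportional to its weight.
     *
     * @param weights relative performance of each machine, e.g. measured writes per second
     * @param bits    size of the identifier space in bits (an assumption; pick what the ring uses)
     */
    static List<BigInteger> planIds(long[] weights, int bits) {
        BigInteger ringSize = BigInteger.ONE.shiftLeft(bits);
        BigInteger total = BigInteger.ZERO;
        for (long w : weights) {
            if (w <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            total = total.add(BigInteger.valueOf(w));
        }
        List<BigInteger> ids = new ArrayList<>();
        BigInteger cumulative = BigInteger.ZERO;
        for (long w : weights) {
            cumulative = cumulative.add(BigInteger.valueOf(w));
            // The id marks the end of this node's arc; the last node wraps around to id 0.
            ids.add(ringSize.multiply(cumulative).divide(total).mod(ringSize));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Three equal machines and one that is twice as fast, on an 8-bit ring for readability.
        System.out.println(planIds(new long[] {1, 1, 1, 2}, 8)); // prints [51, 102, 153, 0]
    }
}
```

With the weights 1, 1, 1, 2 the last machine ends up owning 103 of the 256 ids, about twice the 51 ids owned by each of the others.
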
I have created a set of collection classes (think java.util collections) that scale constantly as well (except the sorted tree, of course, which scales logarithmically) to facilitate more clever relations between classes.

I would say that it shines when it comes to ease of use and scalability. Where it falls down I am not really sure (so "being relatively untested" sounds like a reasonable guess :), but I have had no problems so far. Of course, a distributed system with Merkle tree-based triplicate redundancy (the default setup) costs some performance, so it is by no means as fast as a system without all the bells and whistles.

The binary object data should be no problem to put inside Chordless unless they are big blobs. I would say that Chordless is suitable for the same element sizes as most SQL databases, i.e. you can have small images, but not huge files. For those uses I would recommend MogileFS or Hadoop or something similar.

The scenario we are planning this for is a massively large webshop (that I hope will become reality :), with both facebook-like structures and financial transactions as dataset.

I would be happy to help you try it out and debug any problems you find, and I'm sure that your C# experience will prove useful. By the way, can't you compile Java code in .NET? In that case you could even try to compile it to .NET code and then use it from your familiar C# environment.

regards,
//Martin

On Fri, Jul 10, 2009 at 10:06 AM, Steven Taylor<ta...@gm...> wrote:
> Hi Martin,
> looks like some good work.
> I've had a few infatuations too ranging from Berkeley, distributed, ORM.
> Could you give me an idea about how fast your tool is + how far it might
> scale? I assume that it is a key/value system.
> I've got quite a large data model that needs to be implemented (upwards of
> 180 objects/classes/tables). Could you tell me where your distributed hash
> implementation shines + where it falls down? I have a mixture of row based
> and binary object data.
> I've noticed a tendency in this space to cater to simple blogging / facebook
> data structures (but on a large scale).
> My skillset is C/C#... so I'm not that comfortable with Java and would
> likely have a largish learning curve using your offering if I needed to
> debug something.
> --Steven
> On Fri, Jul 10, 2009 at 8:36 AM, Martin Kihlgren <zon...@gm...>
> wrote:
>>
>> Hello NOSQL!
>>
>> For the longest time I have felt that SQL is not my cup of tea, and
>> worked in various ways to avoid using it, and help others to avoid it.
>>
>> From the humble beginnings of infatuations with different ORMs (and
>> the creation of like) and the more ambitious creation of some sort of
>> scalable JavaSpaces hybrid, to the much more ambitious Archipelago
>> (http://rubyforge.org/projects/archipelago) I now finally got the
>> chance to work on such a system at my place of employment and release
>> it under the GPL.
>>
>> Therefore, let me present you with Chordless!
>> (http://chordless.wiki.sourceforge.net/)
>>
>> It is a rather complete implementation of Chord/DHash
>> (http://pdos.csail.mit.edu/chord/), but with some changes that seemed
>> reasonable for our usecase (built-in remote execution inside chord
>> nodes, transaction support, scalable data structures, lack of erasure
>> codes for data dissemination etc).
>>
>> Right now it passes all the tests, and is quite feature complete for
>> our application (a backend for a jruby on rails app (and yes, an
>> ActiveRecord replacement api is under development)), so we are going
>> to start porting the middleware to this system within the next couple
>> of weeks probably.
>>
>> To facilitate trying-it-out and giving-it-a-spin of Chordless I have
>> made it as easy as I could possibly think of to just get it started:
>> http://chordless.wiki.sourceforge.net/Getting+started
>>
>> I hope someone out there finds it interesting enough to take a look
>> at, and provide criticism of any kind :D
>>
>> regards,
>> //Martin Kihlgren