[Kai-devel] Performance of Kai
Kai is a distributed key-value datastore
From: Takeru I. <tak...@gm...> - 2008-08-09 10:13:33
Hi,

I conducted detailed experiments to reveal the performance of Kai. I'm happy to let you know that the results are pretty good.

* Experimental environments

The experiments were conducted in the following environments.

Kai nodes:
  CPU: Xeon 2.13 GHz x4
  Memory: 8 GB
  Network: 1000Base-T (RTT 90 us)
  OS: Debian Etch
  Erlang: OTP R12B-3

Clients:
  Perl Cache::Memcached

Kai:
  branches/takemaru_connection_pooling (r66)

Conditions:
  Data size: 1 KB
  # of operations: around hundreds of thousands
  # of nodes: shown in each experiment
  # of clients: shown in each experiment

* As a standalone server

In this experiment, Kai ran as a standalone server, like memcached.

Conditions:
  # of nodes: 1
  # of clients: 12-14

Configurations (kai.config):
  n: 1 (# of replicas)
  r: 1
  w: 1
  max_connections: 90
  memcache_max_connections: 30

Results:
  set:
    rate: 7136 qps
    latency: median: 1.55 ms, 99%: 2.76 ms
    CPU: 370 %
  get:
    rate: 10193 qps
    latency: median: 1.21 ms, 99%: 3.71 ms
    CPU: 355 %

I was surprised that Kai showed over 10,000 qps for get. The latency was also good: the 99th percentile, as well as the median, was less than 4 ms. Kai is a kind of persistent memory storage, not a transient cache like memcached; even so, its performance is not much worse than memcached's.

* As a cluster system

Next, I conducted larger-scale experiments with a Kai cluster consisting of five nodes.

Conditions:
  # of nodes: 5
  # of clients: 30-40

Configurations (kai.config):
  n: 2 (# of replicas)
  r: 1
  w: 2
  max_connections: 180
  memcache_max_connections: 60

The experimental results are presented below. A value of the form "x (y/z)" means "total (set/get)". For example, a rate of "12155 (2176/9979)" means the total rate is 12155 qps, the set rate is 2176 qps, and the get rate is 9979 qps.

Results:
  rate: 12155 (2176/9979) qps
  latency: median: (2.06/2.01) ms, 99%: (17.1/14.5) ms
  CPU: 175 %

Kai carries out fairly complicated tasks: it computes consistent hashing, routes requests to the appropriate nodes, replicates data, and manages versions.
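To illustrate the routing step mentioned above, here is a rough Python sketch of consistent hashing with a preference list of n replicas. This is only an illustrative model under my own assumptions (class and function names are invented), not Kai's actual Erlang implementation:

```python
import bisect
import hashlib

def _hash(s):
    # Map a string to a point on the ring (illustrative; Kai's hash may differ).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node owns `vnodes` points on the ring.
        self._points = sorted(
            (_hash("%s:%d" % (node, i)), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]

    def preference_list(self, key, n):
        # Walk clockwise from the key's position, collecting n distinct
        # nodes; assumes n <= number of physical nodes.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._points)
        result = []
        while len(result) < n:
            node = self._points[idx][1]
            if node not in result:
                result.append(node)
            idx = (idx + 1) % len(self._points)
        return result

ring = Ring(["node1", "node2", "node3", "node4", "node5"])
replicas = ring.preference_list("foo", n=2)  # the 2 nodes that store key "foo"
```

With n=2, r=1, and w=2 as in the cluster experiment, a write must reach both nodes in the preference list before it succeeds, while a read may be answered from either one.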
However, the performance is still good. Each node has four cores, so the CPU usage limit is 400 %, yet the measured usage stayed below 200 %. Unfortunately, the current implementation has some inefficiencies, such as in internal communication. I guess the performance could more than double once these issues are resolved.

-- 
Takeru INOUE <tak...@gm...>