Getting Started

Takeru INOUE

This article is based on Kai version 0.4 and Erlang R13B.

Building Kai

To build Kai, Erlang must be installed on your platform. If you're not familiar with Erlang, see [Deployment].

Kai is hosted at sourceforge.net, and the archive file can be found on its download page.

Extract the archive file and run "make", as follows:

% tar zxvf kai-0.4.0.tar.gz
% cd kai-0.4.0/
% make    # gmake for FreeBSD

Currently, Kai has no configure script, and no "make install" target is defined.

You can run the self-diagnostic tests with "make test" if Common Test is correctly configured in your Erlang environment. If you're not sure, skip this step.

Running Kai as a Stand-alone Server

For practice, we begin with a stand-alone server rather than a clustered system. A stand-alone Kai node is not very attractive, because it provides no reliability (no replication). Its behavior, however, is quite similar to that of the well-known memcached, so it makes a good starting point.

Before running Kai, stop any running memcached instance, because Kai uses the same port number (11211) by default.

Set the parameters "n", "r", and "w" to 1 in the configuration file "kai.config"; leave the other parameters at their defaults (the parameters are described in detail in [Configuration]).

kai.config

[{kai, [
    %{logfile, "kai.log"},
    %{hostname, "localhost"},
    {rpc_port, 11011},
    {rpc_max_processes, 30},
    {memcache_port, 11211},
    {memcache_max_processes, 10},
    {max_connections, 32},
    {n, 1},
    {r, 1},
    {w, 1},
    {number_of_buckets, 1024},
    {number_of_virtual_nodes, 128},
    {store, ets},
    %{dets_dir, "/path/to/dir"},
    {number_of_tables, 256}
]}].

Run a Kai node as a stand-alone server, as follows:

% erl -pa ebin -config kai
> application:start(kai).

A brief explanation of the arguments follows.

Arguments of the erl command.

 Argument       | Description
 -------------- | -----------
 -pa Dir        | Adds the specified directories to the code path.
 -config Config | Specifies the name of a configuration file, Config.config.

The configurations will be described in detail later.

Here, let's try writing and reading a value (bar) with a key (foo) against the node. We show example code in Perl (before running the following script, install the Cache::Memcached module from CPAN). Of course, you can use any language that provides a memcache library to access Kai nodes.

kai_standalone.pl

use Test::More 'no_plan';
use Cache::Memcached;

my $mem = Cache::Memcached->new({
    servers => ['127.0.0.1:11211'],
});

ok $mem->set(foo => 'bar', 0);    # Expiration time must be zero
is $mem->get('foo'), 'bar';

ok $mem->delete('foo');

If everything succeeds, the output looks like this:

% perl kai_standalone.pl
ok 1
ok 2
ok 3
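Under the hood, Cache::Memcached and every other memcache client speak the same memcached text protocol over the socket. As a language-agnostic illustration, here is a Python sketch that builds the raw protocol lines corresponding to the three operations above (the wire format follows the memcached protocol specification; nothing Kai-specific is assumed):

```python
# Sketch of the raw memcache text-protocol lines behind set/get/delete.
# Each command line is terminated by \r\n; "set" carries flags, an
# expiration time, and the byte length of the value.

def set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    # For Kai, the expiration time must stay 0, as noted above.
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"

def get_command(key: str) -> bytes:
    return f"get {key}\r\n".encode()

def delete_command(key: str) -> bytes:
    return f"delete {key}\r\n".encode()

print(set_command("foo", b"bar"))  # b'set foo 0 0 3\r\nbar\r\n'
```

Any client that emits these byte sequences to port 11211 can talk to a Kai node.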

Exit from the Erlang shell for the next section.

q().

Using Internal RPCs
===

Before going on to create a clustered system, we explain RPCs (remote procedure calls) in Kai. While Kai nodes provide the memcache API for data operations, they also provide internal RPCs for administrative purposes. The target node, given as an IP address and a port number, is passed as the first argument to each RPC. In Erlang, an IP address is written not in dotted-decimal notation but as a comma-separated tuple, like {127,0,0,1}. The port number is the one defined as rpc_port in the configuration file.

For practice, call the function "kai_rpc:node_list/1", which simply returns the node list.

% erl -pa ebin -config kai

application:start(kai).
kai_rpc:node_list({{127,0,0,1}, 11011}).
{node_list,[{{127,0,0,1},11011}]}
q().

In this example, we get a node list that includes only itself (127.0.0.1:11011).

The RPCs can be called from any Kai node (any Erlang shell on which the Kai application is loaded), even one running on a different physical machine.
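The tuple notation can trip up newcomers; a trivial Python sketch (purely illustrative, not part of Kai) that converts a familiar "host:port" string into the Erlang notation used in the RPC calls above:

```python
def to_erlang_addr(addr: str) -> str:
    # "127.0.0.1:11011" -> "{{127,0,0,1}, 11011}"
    host, port = addr.split(":")
    return "{{%s}, %s}" % (host.replace(".", ","), port)

print(to_erlang_addr("127.0.0.1:11011"))  # {{127,0,0,1}, 11011}
```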

Exit from the Erlang shell for the next section.

q().

Running Kai as a Clustered System
===

This section describes how to run multiple Kai nodes as a clustered system. While we will run all nodes on a single physical machine for convenience, the cluster can consist of different machines simply by changing the IP addresses and port numbers in the following example.

In this example, we run four distinct nodes on a single physical machine. Make four copies of the configuration file, named "kai1.config" to "kai4.config"; hereafter, the node configured by kai1.config is called Node1, and so on.

To replicate data, set the quorum parameter "n" to the desired degree of replication. In this example, we set "n" to 3 in each configuration file. The other quorum parameters, "r" and "w", are set according to the quorum conditions described in [Introduction].

Related parameters in kai1.config to kai4.config.

 Name | Value
 ---- | -----
 n    | 3
 r    | 2
 w    | 2
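As a quick sanity check on these values: the usual Dynamo-style quorum conditions (assumed here to be the ones referred to in [Introduction]) are R + W > N, so every read quorum overlaps every write quorum, and W > N/2, so any two write quorums overlap. A minimal Python sketch:

```python
def quorum_ok(n: int, r: int, w: int) -> bool:
    # R + W > N: every read quorum intersects every write quorum.
    # 2W > N: any two write quorums intersect.
    return r + w > n and 2 * w > n

print(quorum_ok(3, 2, 2))  # True  (the clustered example above)
print(quorum_ok(1, 1, 1))  # True  (the stand-alone example)
```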

To run multiple nodes on a single physical machine, each node needs distinct port numbers. Set them as follows.

Related parameters in kai1.config

 Name          | Value
 ------------- | -----
 rpc_port      | 11011
 memcache_port | 11211

Related parameters in kai2.config

 Name          | Value
 ------------- | -----
 rpc_port      | 11012
 memcache_port | 11212

Related parameters in kai3.config

 Name          | Value
 ------------- | -----
 rpc_port      | 11013
 memcache_port | 11213

Related parameters in kai4.config

 Name          | Value
 ------------- | -----
 rpc_port      | 11014
 memcache_port | 11214

Here, start Node1.

% erl -pa ebin -config kai1

application:start(kai).

To make a cluster, start Node2 and add it to the cluster (which currently includes only Node1) by making the nodes aware of each other with the function "kai_rpc:check_node/2".

% erl -pa ebin -config kai2

application:start(kai).
kai_rpc:check_node({{127,0,0,1}, 11012}, {{127,0,0,1}, 11011}).
kai_rpc:check_node({{127,0,0,1}, 11011}, {{127,0,0,1}, 11012}).

We give a brief explanation of this process. A Kai node maintains a node list containing the members of the cluster it belongs to. In the initial state, each node list includes only the node itself. When "kai_rpc:check_node/2" is called, the node given as the first argument retrieves the node list from the node given as the second argument and merges that list with its own. By calling this function twice, a new node can be added to the cluster: the first call informs the new node of an existing cluster node, and the second call does the reverse (wait for data synchronization to finish if some data is already stored in the cluster).

In this example, Node2 (127.0.0.1:11012) gets a node list of Node1 (127.0.0.1:11011) by the first "kai_rpc:check_node/2"; Node2 knows Node1. The second RPC informs Node1 of Node2.
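The merge itself is just a set union of (address, port) pairs. A Python sketch of the two calls above (a toy model of the behavior described here, not Kai's actual internals):

```python
# Node lists start out containing only the node itself.
node1 = {("127.0.0.1", 11011)}
node2 = {("127.0.0.1", 11012)}

def check_node(local: set, remote: set) -> None:
    # The local node retrieves the remote node's list and merges it in place.
    local |= remote

check_node(node2, node1)  # first call: Node2 learns about Node1
check_node(node1, node2)  # second call: Node1 learns about Node2

print(sorted(node1))      # [('127.0.0.1', 11011), ('127.0.0.1', 11012)]
print(node1 == node2)     # True
```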

Here, make sure Node1 and Node2 know each other.

kai_rpc:node_list({{127,0,0,1}, 11011}).
{node_list,[{{127,0,0,1},11011},
            {{127,0,0,1},11012}]}

kai_rpc:node_list({{127,0,0,1}, 11012}).
{node_list,[{{127,0,0,1},11011},
            {{127,0,0,1},11012}]}

In the same manner, add Node3 to the cluster.

% erl -pa ebin -config kai3

application:start(kai).
kai_rpc:check_node({{127,0,0,1}, 11013}, {{127,0,0,1}, 11011}).
kai_rpc:check_node({{127,0,0,1}, 11011}, {{127,0,0,1}, 11013}).

By the third "kai_rpc:check_node/2", Node3 gets the node list of Node1, which now includes Node2 as well as Node1 itself. The fourth "kai_rpc:check_node/2" informs Node1 of Node3.

Now, while Node1 and Node3 know all three nodes, Node2 does not know Node3 yet. Node2, however, gets to know Node3 eventually, because each node periodically exchanges its node list with a node randomly chosen from that list. Finally, every node knows all the others, forming a cluster of three nodes.
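This anti-entropy behavior can be sketched as a small simulation (a toy model assuming only that each node periodically merges lists with one random peer; the real exchange protocol is Kai's own):

```python
import random

random.seed(0)  # for reproducibility

# The state described above: Node1 and Node3 know everyone,
# Node2 only knows Node1 and itself.
nodes = {1: {1, 2, 3}, 2: {1, 2}, 3: {1, 2, 3}}

rounds = 0
while any(members != {1, 2, 3} for members in nodes.values()):
    rounds += 1
    for node, members in nodes.items():
        # Pick a random peer from the node's current list and merge both ways.
        peer = random.choice([m for m in members if m != node])
        merged = members | nodes[peer]
        nodes[node] = merged
        nodes[peer] = merged

print(rounds, all(m == {1, 2, 3} for m in nodes.values()))
```

Here a single round of exchanges is enough, since Node2's only possible peer is Node1, which already knows everyone.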

Finally, add Node4 to the cluster.

% erl -pa ebin -config kai4

application:start(kai).
kai_rpc:check_node({{127,0,0,1}, 11014}, {{127,0,0,1}, 11011}).
kai_rpc:check_node({{127,0,0,1}, 11011}, {{127,0,0,1}, 11014}).

Check that every node knows all the others with the function "kai_rpc:node_list/1". If the cluster has been set up correctly, the output includes all four nodes, like this:

kai_rpc:node_list({{127,0,0,1}, 11011}).
{node_list,[{{127,0,0,1},11011},
            {{127,0,0,1},11012},
            {{127,0,0,1},11013},
            {{127,0,0,1},11014}]}

To remove a node, no procedure is needed. This also means that no action is required when a node accidentally goes down. However, it is better to remove nodes one by one; never remove N or more nodes before data synchronization has finished.
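Why removing N or more nodes at once is dangerous can be seen with a toy calculation (a sketch assuming the simplest case, where a key's N = 3 replicas sit on 3 of the 4 nodes; Kai's real placement uses its hashing scheme):

```python
# With n = 3, each key is replicated on 3 of the 4 nodes.
replicas = {"foo": {"node1", "node2", "node3"}}

def surviving_copies(key: str, removed: set) -> int:
    return len(replicas[key] - removed)

# Removing fewer than N nodes always leaves at least one copy
# from which the remaining nodes can resynchronize.
print(surviving_copies("foo", {"node1", "node2"}))           # 1

# Removing N = 3 nodes at once can destroy every copy of a key.
print(surviving_copies("foo", {"node1", "node2", "node3"}))  # 0
```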

This article was originally published in Japanese at gihyo.jp June 2009.


Related

Wiki: Configuration
Wiki: Deployment
Wiki: Home
Wiki: Introduction
Wiki: Publications