kai Wiki

Kai is a distributed key-value datastore

Status: Beta

Brought to you by: takemaru

Configuration

Specifying Configuration File
Parameters

This article is based on Kai version 0.4 and Erlang R13B.

Specifying Configuration File

This page describes the configuration parameters for a Kai node. By default, the parameters are read from the configuration file "kai.config". A different filename can be specified when starting a Kai node.

% erl -pa ebin -config <filename>

Changes to the configuration values take effect only after the Kai node is restarted.
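Because the file is passed with the -config flag of erl, it follows the standard Erlang application configuration format: a single list of {Application, [{Parameter, Value}]} tuples terminated by a period. The fragment below is a minimal sketch of that shape. The application name kai is an assumption that this page does not state, and the two values are simply the documented defaults.

```erlang
%% kai.config (sketch). The application name `kai` is an assumption.
[{kai, [
    {rpc_port, 11011},       % documented default
    {memcache_port, 11211}   % documented default
]}].
```

Parameters left out of the file keep their default values.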

Parameters

Basic Parameters: logfile, hostname, rpc_port, memcache_port

Descriptions for the parameters "logfile", "hostname", "rpc_port", and "memcache_port".

Name	Type	Default value	Example
logfile	string	not specified	"/path/to/logfile"
hostname	string or tuple	not specified	"kai1.example.com" or {192,168,1,1}
rpc_port	integer	11011
memcache_port	integer	11211

First, we show some basic parameters.

The parameter "logfile" is the path to a log file. If not specified, log messages are written to the console.

The parameter "hostname" is the hostname or IP address that other nodes connect to. A hostname is given as a string, like "kai1.example.com". An IP address is given as a tuple of integers delimited by commas, like {192,168,1,1}. If not specified, the output of the "hostname" command is used.

The parameter "rpc_port" is the TCP port number for internal RPCs. The default is 11011.

The parameter "memcache_port" is the TCP port number for the external memcache API. The default is 11211 (the same as memcached).
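As an illustration, the basic parameters might be combined as in the following sketch. The application name kai is assumed, and the path and address are the placeholder values from the table above, not recommendations.

```erlang
%% Sketch of the basic parameters. The application name `kai` is assumed;
%% the log path and IP address are placeholders.
[{kai, [
    {logfile, "/path/to/logfile"},   % omit this entry to log to the console
    {hostname, {192,168,1,1}},       % or a string such as "kai1.example.com"
    {rpc_port, 11011},               % internal RPCs
    {memcache_port, 11211}           % external memcache API
]}].
```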

Quorum Protocol for Availability and Performance: n, r, w

Descriptions for the parameters "n", "r", and "w".

Name	Type	Default value
n	integer	3
r	integer	2
w	integer	2

The quorum protocol trades off availability against performance. A large N gives high reliability and availability at the cost of efficiency. A small R is preferable for read-intensive services. The default is N:R:W = 3:2:2, which provides good reliability for most services. The values of R and W are subject to the following two constraints:

R + W > N,
W > N/2.

The Kai cluster tries to replicate a given value (data) to N nodes. A write is considered successful once the value has been replicated to at least W of those nodes. To retrieve the value, R of the N replicas are read. See [Introduction] for details of the quorum protocol.
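The two constraints can be checked by hand for any candidate setting. The fragment below is a sketch that assumes the application name kai. It uses the default values and, in comments, works through one read-oriented alternative that also satisfies both constraints.

```erlang
%% Sketch of the quorum parameters. The application name `kai` is assumed.
[{kai, [
    %% Default N:R:W = 3:2:2
    %%   R + W > N   ->  2 + 2 = 4 > 3
    %%   W > N/2     ->  2 > 1.5
    {n, 3},
    {r, 2},
    {w, 2}
    %% Read-oriented alternative N:R:W = 3:1:3 also passes:
    %%   1 + 3 = 4 > 3  and  3 > 1.5
    %% N:R:W = 3:1:2 does not pass: 1 + 2 = 3 is not greater than 3.
]}].
```

With the 3:1:3 alternative every write has to reach all three replicas, so cheaper reads are paid for with lower write availability.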

Using Disk Storage for Large Amount of Data: store, dets_dir, number_of_tables

Descriptions for the parameters "store", "dets_dir", and "number_of_tables".

Name	Type	Default value	Example
store	atom	ets	ets or dets
dets_dir	string	not specified	"/path/to/kai/data"
number_of_tables	integer	128

Data is stored in memory by default, but Kai also supports disk storage for large amounts of data. Disk storage also makes it possible to reboot a whole cluster without data loss.

The parameter "store" indicates the storage type used by the node; ets means in-memory storage and dets means disk storage (both are storage systems built into Erlang). The capacity of ets is limited to 3 GB on a 32-bit machine.

The parameter "dets_dir" is used only for dets. It specifies the directory that contains the storage files (remember to create the directory before running Kai).

The parameter "number_of_tables" is also used only for dets. It specifies the number of storage files, each of which holds at most 2 GB, so the maximum capacity of a single Kai node is (2 * number_of_tables) GB.
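For example, a node using disk storage could be configured roughly as follows. This is a sketch: the application name kai is assumed and the directory is the placeholder path from the table above.

```erlang
%% Sketch of a disk-backed node. The application name `kai` is assumed;
%% the directory is a placeholder.
[{kai, [
    {store, dets},
    {dets_dir, "/path/to/kai/data"},  % create this directory before starting Kai
    {number_of_tables, 128}           % 2 GB * 128 = 256 GB maximum per node
]}].
```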

Load Balancing by Using Virtual Nodes: number_of_virtual_nodes, number_of_buckets

Descriptions for the parameters "number_of_virtual_nodes" and "number_of_buckets".

Name	Type	Default value
number_of_virtual_nodes	integer	128
number_of_buckets	integer	1024

The Kai cluster provides fine-grained load balancing to accommodate a broad range of machines.

The parameter "number_of_virtual_nodes" specifies the degree of load distribution. The more virtual nodes a Kai node has, the more load is assigned to it, because the Kai cluster distributes the load equally across virtual nodes. This parameter must be greater than one hundred so that the distribution is statistically even. The default is 128.

The parameter "number_of_buckets" determines the unit of load distribution. A larger number of buckets gives finer-grained distribution at the cost of more computation. This parameter should be greater than roughly one hundred times the cluster size. The default is 1024. All nodes in the Kai cluster must use the same value.

number_of_buckets > 100 * cluster size.
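To make the sizing rule concrete, consider a hypothetical 16-node cluster: 100 * 16 = 1600, which exceeds the default of 1024, so a larger value is needed. The sketch below assumes the application name kai, and 2048 is just one value that satisfies the inequality.

```erlang
%% Sketch for a hypothetical 16-node cluster. The application name `kai`
%% is assumed; 2048 is one illustrative value above 100 * 16 = 1600.
[{kai, [
    {number_of_virtual_nodes, 128},  % default, above the minimum of one hundred
    {number_of_buckets, 2048}        % must be identical on every node
]}].
```

A more powerful machine in the same cluster could be given a higher number_of_virtual_nodes than its peers so that it takes a larger share of the load.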

The Number of Processes and Sockets: memcache_max_processes, rpc_max_processes, max_connections

Descriptions for the parameters "rpc_max_processes", "memcache_max_processes", and "max_connections".

Name	Type	Default value
memcache_max_processes	integer	20
rpc_max_processes	integer	60
max_connections	integer	64

The number of processes and sockets determines the degree of parallelism.

The parameter "memcache_max_processes" is the maximum number of processes for the external memcache API. The value should not be less than the expected number of concurrent connections per node from memcache clients.

The parameter "rpc_max_processes" is the maximum number of processes for internal RPCs. The value should be N times the maximum number of memcache processes.

rpc_max_processes = memcache_max_processes * N (the quorum parameter n).

The parameter "max_connections" determines the maximum number of sockets connected to other nodes. In practice, more sockets than this maximum can be opened temporarily to avoid deadlock. The value should be roughly equal to the maximum number of processes for internal RPCs.

max_connections ~ rpc_max_processes
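The relationships above can be applied as in the following sketch, which assumes about 50 concurrent memcache client connections per node and N = 3. Both the workload figure and the application name kai are assumptions made for illustration.

```erlang
%% Sketch for roughly 50 concurrent memcache clients per node with N = 3.
%% The application name `kai` and the workload figure are assumptions.
[{kai, [
    {n, 3},
    {memcache_max_processes, 50},  % >= expected concurrent client connections
    {rpc_max_processes, 150},      % memcache_max_processes * N = 50 * 3
    {max_connections, 150}         % roughly equal to rpc_max_processes
]}].
```

The default values follow the same pattern: 20 * 3 = 60 for rpc_max_processes, and 64 for max_connections is close to 60.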

This article was originally published in Japanese at gihyo.jp in June 2009.

Wiki: Deployment
Wiki: Getting Started
Wiki: Home
Wiki: Introduction