This article is based on Kai version 0.4 and Erlang R13B.
This page describes the configuration parameters for a Kai node. By default, the parameters are read from the configuration file "kai.config". A different file can be specified when starting a Kai node:
% erl -pa ebin -config <filename>
Configuration values cannot be changed without restarting the Kai node.
Descriptions for the parameters "logfile", "hostname", "rpc_port", and "memcache_port".
Name | Type | Default value | Example |
---|---|---|---|
logfile | string | not specified | "/path/to/logfile" |
hostname | string or tuple | not specified | "kai1.example.com" or {192,168,1,1} |
rpc_port | integer | 11011 | |
memcache_port | integer | 11211 | |
First, we show some basic parameters.
The parameter "logfile" is the path to a log file. If not specified, log messages are written to the console.
The parameter "hostname" is the hostname or IP address to which other nodes connect. A hostname is given as a string, like "kai1.example.com". An IP address is given as a tuple whose elements are delimited by commas, like {192,168,1,1}. If not specified, the output of the "hostname" command is used.
The parameter "rpc_port" is the TCP port number for internal RPCs. The default is 11011.
The parameter "memcache_port" is the TCP port number for the external memcache API. The default is 11211 (the same as memcached).
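Because the tuple form of "hostname" is easy to mistype, a small helper can produce it from a dotted address. The following is only a sketch in Python; the function name `erlang_ip_tuple` is hypothetical and not part of Kai, and it assumes an IPv4 address.

```python
import ipaddress


def erlang_ip_tuple(address):
    """Format a dotted IPv4 address as an Erlang tuple literal."""
    # IPv4Address validates the input and raises ValueError if it is malformed.
    octets = ipaddress.IPv4Address(address).packed
    return "{" + ",".join(str(octet) for octet in octets) + "}"


print(erlang_ip_tuple("192.168.1.1"))  # {192,168,1,1}
```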
Descriptions for the parameters "n", "r", and "w".
Name | Type | Default value |
---|---|---|
n | integer | 3 |
r | integer | 2 |
w | integer | 2 |
The quorum protocol trades off availability against performance. A large N gives high reliability and availability at the cost of efficiency. A small R is preferable for read-intensive services. By default N:R:W = 3:2:2, which provides good reliability for most services. The values of R and W are subject to the following two constraints:
R + W > N, W > N/2.
The Kai cluster tries to replicate a given value (data) to N nodes. A write succeeds once the value has been replicated to at least W nodes in the cluster. To retrieve the value, R of the N replicas are read. See [Introduction] for details of the quorum protocol.
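The two constraints can be checked before a configuration is deployed. The sketch below is a standalone Python check, not something Kai itself provides, and the function name `quorum_violations` is hypothetical.

```python
def quorum_violations(n, r, w):
    """Return the quorum constraints that the given N, R, W values violate."""
    violations = []
    if not r + w > n:
        violations.append("R + W > N")
    if not 2 * w > n:  # equivalent to W > N/2, avoiding floating point
        violations.append("W > N/2")
    return violations


print(quorum_violations(3, 2, 2))  # default N:R:W, prints []
print(quorum_violations(3, 1, 2))  # prints ['R + W > N']
```

With the defaults, R + W = 4 > 3 and W = 2 > 1.5, so both constraints hold. Lowering R to 1 for faster reads would require raising W to 3 to keep R + W > N.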
Descriptions for the parameters "store", "dets_dir", and "number_of_tables".
Name | Type | Default value | Example |
---|---|---|---|
store | atom | ets | ets or dets |
dets_dir | string | not specified | "/path/to/kai/data" |
number_of_tables | integer | 128 | |
Data is stored in memory by default, but Kai also supports disk storage for large amounts of data. Disk storage also makes it possible to restart a whole cluster without data loss.
The parameter "store" indicates the storage type used by the node; ets selects in-memory storage and dets selects disk storage (both are storage systems built into Erlang). The capacity of ets is limited to 3 GB on a 32-bit machine.
The parameter "dets_dir" is used only for dets. It specifies the directory that contains the storage files (create the directory before starting Kai).
The parameter "number_of_tables" is also used only for dets. It specifies the number of storage files, each of which holds at most 2 GB; (2 * number_of_tables) GB is the maximum capacity of a single Kai node.
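The capacity rule can be worked through in a few lines. This Python sketch only restates the arithmetic above; the function names are hypothetical, and whether Kai places further restrictions on "number_of_tables" is not covered here.

```python
DETS_FILE_LIMIT_GB = 2  # maximum size of one dets storage file, as stated above


def max_capacity_gb(number_of_tables):
    """Maximum dets capacity of a single node, in GB."""
    return DETS_FILE_LIMIT_GB * number_of_tables


def tables_needed(target_gb):
    """Smallest number_of_tables whose capacity reaches target_gb (integer GB)."""
    return -(-target_gb // DETS_FILE_LIMIT_GB)  # ceiling division


print(max_capacity_gb(128))  # default number_of_tables, prints 256
print(tables_needed(500))    # prints 250
```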
Descriptions for the parameters "number_of_virtual_nodes" and "number_of_buckets".
Name | Type | Default value |
---|---|---|
number_of_virtual_nodes | integer | 128 |
number_of_buckets | integer | 1024 |
The Kai cluster provides fine-grained load balancing to accommodate a broad range of machines.
The parameter "number_of_virtual_nodes" specifies the degree of load distribution. The more virtual nodes a Kai node has, the more load is assigned to it, because the Kai cluster distributes load equally across virtual nodes. This parameter must be greater than one hundred so that the distribution is statistically even. The default is 128.
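Since load is spread equally over virtual nodes, the expected share of each Kai node is its virtual node count divided by the cluster total. The Python sketch below illustrates this; the node names and the function `expected_load_share` are hypothetical, and the result is a statistical expectation rather than an exact assignment.

```python
def expected_load_share(virtual_nodes):
    """Map each node name to its expected fraction of the cluster load."""
    total = sum(virtual_nodes.values())
    return {node: count / total for node, count in virtual_nodes.items()}


# Hypothetical cluster: kai3 is a larger machine given twice the virtual nodes.
shares = expected_load_share({"kai1": 128, "kai2": 128, "kai3": 256})
print(shares)  # {'kai1': 0.25, 'kai2': 0.25, 'kai3': 0.5}
```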
The parameter "number_of_buckets" determines the unit of load distribution. A large number of buckets provides finer-grained distribution at the cost of more computation. This parameter should be roughly more than one hundred times the cluster size. The default is 1024. This parameter must be set to the same value on every node in the Kai cluster.
number_of_buckets > 100 * cluster size.
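The rule of thumb can be turned around to see how large a cluster a given bucket count supports. This is a minimal Python sketch of that arithmetic; the function names are hypothetical, and since the guideline is approximate the result is a rough bound rather than a hard limit.

```python
def buckets_sufficient(number_of_buckets, cluster_size):
    """Check number_of_buckets > 100 * cluster size."""
    return number_of_buckets > 100 * cluster_size


def max_cluster_size(number_of_buckets):
    """Largest cluster size for which the guideline still holds."""
    return (number_of_buckets - 1) // 100


print(max_cluster_size(1024))        # default, prints 10
print(buckets_sufficient(1024, 11))  # prints False
```

Under this guideline the default of 1024 buckets suits clusters of up to about ten nodes.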
Descriptions for the parameters "rpc_max_processes", "memcache_max_processes", and "max_connections".
Name | Type | Default value |
---|---|---|
memcache_max_processes | integer | 20 |
rpc_max_processes | integer | 60 |
max_connections | integer | 64 |
The number of processes and sockets determines the degree of parallelism.
The parameter "memcache_max_processes" is the maximum number of processes for the external memcache API. The value should not be less than the expected number of concurrent connections per node from memcache clients.
The parameter "rpc_max_processes" is the maximum number of processes for internal RPCs. The value should be N times the maximum number of memcache processes, where N is the quorum parameter "n".
rpc_max_processes = memcache_max_processes * N
The parameter "max_connections" determines the maximum number of sockets connected to other nodes. In practice, more sockets than this maximum can be opened temporarily to avoid deadlock. The value should be roughly equal to the maximum number of processes for internal RPCs.
max_connections ~ rpc_max_processes
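The three values can be derived from one input, the expected number of concurrent client connections per node. The Python sketch below follows the relations above; the function name `size_process_limits` is hypothetical, and it takes "roughly equal" literally by setting max_connections to rpc_max_processes.

```python
def size_process_limits(client_connections_per_node, n):
    """Derive process and socket limits from expected client concurrency."""
    memcache_max_processes = client_connections_per_node
    rpc_max_processes = memcache_max_processes * n
    return {
        "memcache_max_processes": memcache_max_processes,
        "rpc_max_processes": rpc_max_processes,
        "max_connections": rpc_max_processes,  # "roughly equal" taken as equal
    }


print(size_process_limits(20, 3))
# {'memcache_max_processes': 20, 'rpc_max_processes': 60, 'max_connections': 60}
```

With 20 client connections and N = 3 this reproduces the default values of 20 and 60; the default max_connections of 64 is slightly above the derived 60.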
This article was originally published in Japanese on gihyo.jp in June 2009.
Wiki: Deployment
Wiki: Getting Started
Wiki: Home
Wiki: Introduction