kluster Wiki

Tools for GridEngine+NFS administration on EC2

Brought to you by: cpupa, danielpovey, majortal, vpeddin

SettingConfig

Setting up your Kluster config

The "kluster" scripts that add and remove nodes need certain information in addition to the more generic information in the file vars.sh. We put this in a file config.sh. An example config.sh file looks like this:

export KL_NAME=mycluster
export KL_IMAGE=ami-7b0ae310
export KL_ZONE=us-east-1c
export KL_GPU_IMAGE=
export KL_NETWORK=

To use it, simply source the files vars.sh and config.sh before calling kluster scripts such as kl-add-node and kl-remove-nodes. An explanation of the variables follows:

KL_NAME is the name of the cluster. It is used to name the keys and is prepended to the tags we attach to the instances in the cluster. It is also used for the security group if you do not specify KL_NETWORK.
KL_IMAGE is the Amazon image you use for most nodes in the cluster. This would be the image you prepared earlier, at the end of Customizing your Image (Phase 2).
KL_ZONE is the availability zone where you want your machines located.
KL_GPU_IMAGE is a different image type that you will need to prepare if you want to add any GPU instances (type g2.2xlarge or g2.8xlarge). You can leave this blank if you don't need to use GPUs.
KL_NETWORK is the argument to ec2run's -a (attach) option, which allows you to specify a virtual private cloud, if you have created one via the AWS console. If set, it will look something like: export KL_NETWORK=":0:subnet-d16d2aa6:::sg-4f3e042b", where 0 indicates the first network interface, subnet-d16d2aa6 is the subnet id, and sg-4f3e042b is the security group (if KL_NETWORK is specified, KL_NAME will not be used for the security group).
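
To make the layout of KL_NETWORK concrete, here is a minimal sketch that splits the example value on colons and picks out the fields described above. The helper names kl_network_field and kl_security_group are hypothetical and are not part of kluster. The fallback to KL_NAME assumes the security group is named after the cluster, as the description of KL_NAME suggests.

```shell
# Hypothetical helpers (not part of kluster) for inspecting KL_NETWORK.

# Print one colon-separated field of KL_NETWORK.
# Field 1 is the (empty) text before the first colon, so in the example
# value field 2 is the interface index, 3 the subnet and 6 the security group.
kl_network_field() {
  printf '%s\n' "${KL_NETWORK:-}" | cut -d: -f"$1"
}

# Print the security group: taken from KL_NETWORK when it is set,
# otherwise assumed to be named after the cluster (KL_NAME).
kl_security_group() {
  if [ -n "${KL_NETWORK:-}" ]; then
    kl_network_field 6
  else
    printf '%s\n' "$KL_NAME"
  fi
}

KL_NAME=mycluster
KL_NETWORK=":0:subnet-d16d2aa6:::sg-4f3e042b"
echo "interface index: $(kl_network_field 2)"
echo "subnet id:       $(kl_network_field 3)"
echo "security group:  $(kl_security_group)"
```

With KL_NETWORK left empty, the last line would print the value of KL_NAME instead.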

Once you have set up your Kluster config, you can test it as follows. First, let's shut down the master node we created in the last section, as we'll be starting with a fresh cluster name. Leave KL_NAME as mycluster, source the variables (by doing . config.sh), and run

kl-remove-nodes master

This command will look for a node tagged mycluster-master and delete it. Now change KL_NAME in config.sh to testcluster, and source it again (. config.sh). Since this is a new cluster, we have to create the security group and keys, so do

kl-create-sg testcluster
kl-create-key testcluster

To start working with this cluster, you first need the master node, so type:

kl-run-master c3.large

Note: you may receive a message "unable to ssh to instance as root". If so, AWS may simply be a little slow and the test timed out. To check the ssh connection, call kl-check-ssh with the instance id, for example: kl-check-ssh i-ac21717c
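
If the instance is merely slow to come up, repeating the check a few times is usually enough. The following is a minimal sketch of a generic retry loop; the function name retry is hypothetical and not part of kluster, and using it with kl-check-ssh assumes that script exits with a non-zero status on failure, which is not confirmed here.

```shell
# Hypothetical helper, not part of kluster: run a command until it succeeds,
# up to a maximum number of attempts, sleeping between attempts.
# usage: retry <max-attempts> <delay-seconds> <command> [args...]
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while ! "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      echo "giving up after $retry_n attempts: $*" >&2
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Example (assumes kl-check-ssh returns non-zero when ssh fails):
# retry 5 30 kl-check-ssh i-ac21717c
```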

The kl-run-master script is just a slightly more convenient way of creating the master node than the more manual way we did it in Spawning the Master Node. The argument is the type of machine we want to use for the master. Since we shouldn't be running anything very heavy on the master anyway, we don't need an xlarge instance, so I used c3.large. The instances in this cluster, including the master, will have the ssh public key for testcluster on them, because the init scripts add it. They will also have the ssh public and private keys for mycluster on them, which allows them to ssh to each other. In future we may find a more elegant way to do this, but this method was easier to implement. Since you created this image yourself and presumably didn't give out the ssh keys to anyone, there shouldn't be a security issue.

The kluster scripts use the name tag on the Amazon instances to keep track of which nodes belong to which cluster. The kluster tools do not store any "state" anywhere locally.
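
As an illustration of this tag-based bookkeeping, the sketch below decides cluster membership purely from a name tag. It assumes tags have the form KL_NAME followed by a hyphen and a node name, as with mycluster-master above. The function kl_tag_in_cluster and the tag testcluster-node1 are hypothetical, and the real scripts may match tags differently.

```shell
# Hypothetical helper, not part of kluster: succeed if the given name tag
# starts with "$KL_NAME-", i.e. looks like it belongs to the current cluster.
kl_tag_in_cluster() {
  case "$1" in
    "${KL_NAME}"-*) return 0 ;;
    *) return 1 ;;
  esac
}

KL_NAME=testcluster
# testcluster-node1 is a made-up tag used only for this example.
for tag in mycluster-master testcluster-master testcluster-node1; do
  if kl_tag_in_cluster "$tag"; then
    echo "$tag belongs to $KL_NAME"
  fi
done
```

Because membership is derived from the tags alone, changing KL_NAME and re-sourcing config.sh is all it takes to point the scripts at a different cluster.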

For your convenience, if you want to ssh to the master node of the cluster you can do so as follows:

kl-sshmaster

This script is quite simple; it just works out the IP address of the master node from the output of ec2din, and ssh's to it:

# tail -2 bin/kl-sshmaster
exec ssh -i ~/.ssh/${KL_NAME}.pem root@$public_ip

Once you are on the master you can check that everything is working OK:

# qhost -q
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
master                  lx26-amd64      2  0.00    7.5G  267.9M    2.9G     0.0
   all.q                BIP   0/0/1
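
Once more nodes have been added, a condensed view of this output can be handy. The following sketch lists each execution host with its CPU count, relying only on the column layout shown above. The function name kl_list_hosts is hypothetical and not part of kluster.

```shell
# Hypothetical helper, not part of kluster: read `qhost -q` output on stdin and
# print "<hostname> <ncpu>" for every host except the "global" pseudo-host.
# Skips the two header lines, blank lines, and the indented per-queue lines.
kl_list_hosts() {
  awk 'NR > 2 && NF > 0 && $0 !~ /^[ \t]/ && $1 != "global" { print $1, $3 }'
}

# On the master you would run: qhost -q | kl_list_hosts
# Here the sample output from above is fed in directly.
kl_list_hosts <<'EOF'
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
master                  lx26-amd64      2  0.00    7.5G  267.9M    2.9G     0.0
   all.q                BIP   0/0/1
EOF
# prints: master 2
```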

Previous: Spawning the Master Node
Next: Adding Nodes
Up: Kluster Wiki
