We'll assume that the AMI you just created is now ready. Execute the command below, replacing the AMI-ID with the one you just made:
ec2run ami-8eaf37e7 -g mycluster -k mycluster -z us-east-1c -t c3.large \
  -b "/dev/xvdb=ephemeral0" -b "/dev/xvdc=ephemeral1"
Here, the -b option attaches the machine's local storage ephemeral0 as device /dev/xvdb. When we later create an image from this machine, this block-device mapping becomes part of the AMI, so it will happen automatically. The command above will print the instance-id (i-something) of the new instance, along with other information. It is good practice to tag the instance with a meaningful name:
ec2tag i-b7a975d8 --tag Name=customizing_phase_2
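Once you can log in to the instance (as described next), you can sanity-check that the ephemeral volumes from the -b mappings were actually attached; this is optional, just a quick verification:

cat /proc/partitions                # xvdb and xvdc should be listed
ls -l /dev/xvdb /dev/xvdc           # device names can vary by kernel/AMI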
Type ec2din i-b7a975d8 (but with your new instance-id) to get the internet address (FQDN) of the instance, and ssh into it as the admin user (we cannot yet log in as root): something like,
ssh -i ~/.ssh/mycluster.pem admin@ec2-54-234-152-143.compute-1.amazonaws.com
This will hang or say "connection refused" until your machine is up, but retry and you should be able to get in after a minute or so. This works even though we previously deleted this key from the authorized_keys file on the image, because the start-up scripts on the image add it back.
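Rather than retrying by hand, you could poll with a small loop like the following (just a convenience sketch; substitute your own key path and hostname):

until ssh -o ConnectTimeout=5 -i ~/.ssh/mycluster.pem \
    admin@ec2-54-234-152-143.compute-1.amazonaws.com true; do
  sleep 10
done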
If you were able to ssh to the instance, then nothing serious went wrong, and you probably don't need the previous instance you made any more. If you can't remember its instance-id, then ec2din with no arguments will help you find it. Then terminate the old instance: from your local machine, type something like the following.
# ec2kill i-7877a019
INSTANCE        i-7877a019      stopped terminated
OK, now we're ready to continue configuring the image. On the new instance, type:
sudo su
apt-get install autofs nfs-kernel-server lockfile-progs lvm2 curl -y
Some of the next installs are a little finicky with regard to the hostname of the local machine, so we'll set the hostname at this point.
hostname master
echo master > /etc/hostname
Now edit the file /etc/hosts so the first line looks like this:
127.0.0.1 master localhost localhost.localdomain
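If you prefer to script that edit (for instance, when rebuilding this image later), a one-liner along these lines should work; it simply rewrites the 127.0.0.1 line in place:

sed -i 's/^127\.0\.0\.1.*/127.0.0.1 master localhost localhost.localdomain/' /etc/hosts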
The next thing to do is to install NIS. The first line below is just to make it harder for you to continue blindly if you failed to set the hostname properly.
[ `hostname` != master ] && exit
echo ypserver master > /etc/yp.conf
apt-get install nis -y
This is one of those Debian interactive installs that brings up what looks like a DOS menu on your screen (if your terminal is capable enough). It will ask you for the NIS domain; you can leave this at the value master. It will then warn about a conflict with the file /etc/yp.conf and ask what you want to do: choose N, which keeps our version. We only set yp.conf to stop the installation process from hanging for an annoyingly long time (it will still hang for a while). The installation will look like it failed:
Setting NIS domainname to: kluster.
Starting NIS services: ypbind
binding to YP server...........................................failed (backgrounded)
. ok
The next installation is interactive:
apt-get install gridengine-client -y
To the question "Install SGE Automatically?" reply "Yes". The "SGE cell name" should be left as "default", and the "SGE master hostname" should be set to "master". Next we install the "gridengine master" package. This is the queue manager, and it will run just on the "master" node, not on the regular nodes:
apt-get install gridengine-master -y
We next install the "execution client" part of GridEngine:
apt-get install gridengine-exec -y
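As an aside: if you ever want to run this image build end to end without interaction, the debconf questions above can in principle be preseeded before installing. The package and key names below are my recollection of how Debian 7 packages nis and gridengine, not something taken from this setup, so verify them first with debconf-get-selections | grep -E 'nis|gridengine':

echo "nis nis/domain string master" | debconf-set-selections
echo "gridengine-common shared/gridenginemaster string master" | debconf-set-selections
echo "gridengine-common shared/gridenginecell string default" | debconf-set-selections
echo "gridengine-common shared/gridengineconfig boolean true" | debconf-set-selections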
The following packages were things I needed for various reasons; you may find it useful to install them all now, in case you later need one of them.
apt-get install -y gawk automake1.10 libtool zlib1g-dev gfortran screen ntp \
  sudo rsync pkg-config gdb iftop libxml-simple-perl subversion \
  libatlas3-base g++ patch bzip2
Next we need to set up one more thing so that we can ssh in as root. On startup, Debian 7 inserts a command= option into the /root/.ssh/authorized_keys entry, which disables root login:
# cat /root/.ssh/authorized_keys
no-port-forwarding,no-agent-forwarding,no-X11-forwarding,command="echo 'Please login as the user \"admin\" rather than the user \"root\".';echo;sleep 10" ssh-rsa <snip long rsa entry>
We need to remove the ,command="echo 'Please login as the user \"admin\" rather than the user \"root\".';echo;sleep 10" part, so that it looks like:
no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-rsa <snip long rsa entry>
This could be done manually, or via the command line:
sed -i 's/,command=.*\bssh-rsa\b/ ssh-rsa/g' /root/.ssh/authorized_keys
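Either way, you can confirm the edit took effect: the following should now print nothing (while the key itself should still be present in the file):

grep 'command=' /root/.ssh/authorized_keys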
After editing /root/.ssh/authorized_keys (manually or with the sed command above), restart the ssh daemon:
service ssh restart
Next, from the instance, type
ssh master
just to verify that we can still ssh to ourselves as root without a password. You should get a prompt; just type "exit" to go back to your original session. If there is an error, you'll have to figure out what went wrong.
Then, from a separate window on your local machine, verify that you can now ssh to the instance as root: something like
ssh -i ~/.ssh/mycluster.pem root@ec2-54-235-5-54.compute-1.amazonaws.com
Now you are ready to transfer a large number of config files and scripts from your "kluster" distribution on the local machine to the instance. (If you want to see what kinds of configuration changes are taking place, look at the list of files in scripts/root/config_files.) Run the following from your local machine, in the kluster directory, using the internet name of your actual instance:
bin/push-configs.sh ~/.ssh/mycluster.pem ec2-54-234-152-143.compute-1.amazonaws.com \
  `cat scripts/root/config_files`
We now have to finish a few things before creating the image. First we have to initialize the NIS database. Do as follows on the instance:
shadowconfig off
cd /var/yp
/usr/lib/yp/ypinit -m
The ypinit command requires user interaction: you have to press Ctrl-D and then y. There will be some harmless warnings like failed to send 'clear' to local ypserv: RPC: Program not registered. Next, run service nis restart:
# service nis restart
Stopping NIS services: ypbind ypserv yppasswdd ypxfrd.
Starting NIS services: ypserv yppasswdd ypxfrd ypbind.
If the output does not look like the above, check that your /etc/hosts file has a line like 127.0.0.1 master localhost localhost.localdomain, that the command hostname prints master, and that /etc/hostname says master.
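Those three checks can be run in one go; each command below should print the value shown in the comment:

hostname             # should print: master
cat /etc/hostname    # should print: master
head -1 /etc/hosts   # should print: 127.0.0.1 master localhost localhost.localdomain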
Next, execute the following commands; this ensures that the user and group information will be propagated from the master via NIS.
echo '+::::::' >> /etc/passwd
echo "+:" >> /etc/group
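To check that these compat entries parse and that lookups still work, you can use getent, which consults both the local files and NIS according to /etc/nsswitch.conf:

getent passwd root   # local lookup should still work
ypcat passwd         # lists accounts exported via NIS (may be empty for now)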
We added some init scripts in /etc/init.d, so we need to register them with Debian as follows:
insserv -d kluster-misc-tasks kluster-mktemp kluster-set-hostname \
  mem-killer gridengine-exec kluster-configure-queue
This should not produce any output. It sets up soft links to the init scripts in /etc/init.d/, from the directories /etc/rcN.d for different runlevels. The directory for runlevel 4 (normal startup) should look as follows; this lets you know what order things will be started up in:
# ls /etc/rc4.d/
README               S12rpcbind
S01bootlogs          S13nfs-common
S01cloud-init-local  S13nis
S01motd              S14autofs
S01rsyslog           S14nfs-kernel-server
S01sudo              S15cron
S02cloud-init        S15gridengine-master
S02dbus              S16kluster-configure-queue
S02exim4             S17kluster-misc-tasks
S02mem-killer        S18gridengine-exec
S02ntp               S19cloud-final
S02rsync             S19kluster-mktemp
S03cloud-config      S19rc.local
S03ssh               S19rmnologin
S04kluster-set-hostname
(Note: the insserv command uses the "init info" in the comments at the top of the scripts in /etc/init.d to work out the dependencies between init jobs, which determines the startup order.)
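For reference, the "init info" block that insserv reads looks like the following; this is the standard LSB header format, and the field values here are illustrative rather than copied from the kluster scripts:

### BEGIN INIT INFO
# Provides:          kluster-set-hostname
# Required-Start:    $network $remote_fs
# Required-Stop:     $network $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Set this node's hostname at boot
### END INIT INFO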
At this point we can check that a few things are working before we shut down and make the image. First we check that NIS (which used to be called Yellow Pages/YP) is working:
# ypcat -k auto.master
/export auto.export -rw,nfsvers=3,intr,rsize=8192,wsize=8192,timeo=1000,retrans=5,bg,retry=5,proto=tcp,actimeo=10
/home auto.home -rw,nfsvers=3,intr,rsize=8192,wsize=8192,timeo=1000,retrans=5,bg,retry=5,proto=tcp,actimeo=10
Next we check that GridEngine is working OK:
# service gridengine-master restart
Restarting Sun Grid Engine Master Scheduler: sge_qmaster.
# service gridengine-exec restart
Restarting Sun Grid Engine Execution Daemon: sge_execd.
# qhost -q
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
master                  lx26-amd64      2  0.02    7.5G  280.8M    2.9G     0.0
#
If anything went wrong with NIS or GridEngine, check that the hostname and /etc/hosts are correct. Unfortunately there are many other things that can go wrong. With GridEngine in particular, if the initial installation goes wrong, e.g. the /etc/hosts or hostname was wrong at the time of installation, in my experience the only way to fix it is to start from scratch with an image that has never had GridEngine installed on it.
Next, we need to make some configuration changes in the queue. To make this easier I previously saved some configuration information from my own queue setup (see /root/queue/README for more info). On the instance, do:
cd /root/queue
qconf -as master
( echo 'group_name @allhosts'; echo "hostlist `qconf -sel`" ) > foo
qconf -Ahgrp foo
qconf -Ap sp_smp
qconf -Aq sq_all.q
qconf -Mc sc
qconf -Msconf ssconf
cp sconf global; qconf -Mconf global; rm global
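You can eyeball the results with the corresponding "show" options of qconf; the comments assume the config files above define a PE called smp and a queue called all.q, as the filenames suggest:

qconf -sel      # execution host list
qconf -shgrpl   # host group list (should include @allhosts)
qconf -spl      # parallel environment list (e.g. smp)
qconf -sql      # queue list (e.g. all.q)
qconf -sconf    # global configuration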
These commands set various configuration parameters of the queue to values that I generally work with, and that should work well for Kaldi system building. If you are going to administer a GridEngine cluster you should probably become familiar with GridEngine administration. Commands and associated options that you will likely use a lot include qstat, qhost -q, qconf -mq, qconf -ah, qconf -dh, qconf -ae, qconf -de, qconf -mc, and qconf -mconf. Type man qconf for more information.
Now we will create an image from the instance. From your local machine, stop the instance:
# ec2stop i-b7a975d8
INSTANCE        i-b7a975d8      running stopping
Now create an AMI from the stopped instance:
# ec2cim i-b7a975d8 -n 'customized_phase_2_try1'
IMAGE   ami-74128a1d
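Image creation takes a few minutes; the new AMI is not usable until its state changes from "pending" to "available". You can poll for this with ec2dim (ec2-describe-images), along these lines:

until ec2dim ami-74128a1d | grep -q available; do sleep 30; done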
Previous: Customize your image (Phase 1)
Next: Spawning the master node
Up: Kluster Wiki