Quick Install Tutorial

xCAT 2 Tutorial

vallard@us.ibm.com

This document is a brief step-by-step tutorial for installing xCAT 2. In our lab in Coppell, TX, we have set up a cluster that includes 1 management node (wopr) and 8 x336 x86_64 nodes. The operating system on this setup is RedHat 4 update 4, but the instructions apply to most supported operating systems. Deviations will be mentioned where appropriate, but you can assume that most instructions apply to any Red Hat derivative of this nature.

Install xCAT

Get xCAT 2 (connected to the internet)

Since our machine, wopr, is connected to the internet, we run the following to download and install xCAT 2:

# cd /etc/yum.repos.d
# wget http://xcat.sourceforge.net/yum/xcat-core/xCAT-core.repo
# wget http://xcat.sourceforge.net/yum/xcat-dep/rh5/x86_64/xCAT-dep.repo
# yum clean metadata
# yum -y install screen # do this to make sure your standard repo works!
# yum install xCAT.x86_64

Note: I had to remove some conflicting RPMs (tftp-server, OpenIPMI-tools, warewulf, system-config-netboot) and then rerun the install. On most systems I've installed, OpenIPMI-tools needed to be removed.

Get xCAT 2 (if not connected to the internet)

If you're not connected to the internet, you'll need to copy two files onto the system that will be your head node. Download the dep and core tarballs from the xCAT Download Page. When you are done you should have two files on your management server:

  • xcat-core-2.*.tar.bz2
  • xcat-dep-2.*.tar.bz2

Extract these files into the /install/xcat directory and create a yum repository, then install:

# mkdir -p /install/xcat
# cd /install/xcat
# tar jxvf <location of xcat tarballs>/xcat-core-2*.bz2
# tar jxvf <location of xcat tarballs>/xcat-dep-2*.bz2
# xcat-core/mklocalrepo.sh
# xcat-dep/rh4/x86_64/mklocalrepo.sh
# yum clean metadata
# yum install xCAT.x86_64

You'll also need a working repository for the RPMs of your current distribution, since the install pulls in packages from it.

Verify xCAT 2 install

After you install the RPMs via YUM, verify that everything works:

# source /etc/profile.d/xcat.sh
# tabdump site
#key,value,comments,disable
 "xcatdport","3001",,
 "xcatiport","3002",,
 "tftpdir","/tftpboot",,
 "master","172.20.0.1",,
 "domain",,,
 "installdir","/install",,
 "timezone","America/New_York",,
 "nameservers","172.30.0.1,172.20.0.1",,

If you get a connection error like:

 Connection failure: IO::Socket::SSL: Timeout at /opt/xcat/lib/perl/xCAT/Client.pm line 138.

Then something went awry during the RPM installation. Please don't go further until you make sure you can do a tabdump on your site database.


Configure xCAT 2 database tables and various services

Now the work begins. Let's modify the tables. This is similar to how xCAT 1.3 was set up. The difference is that the tables are no longer flat files but are held in a database. xCAT 2 provides various ways of editing these tables. We'll use several methods throughout this tutorial.

The Site Table

Let's start off by changing some things manually, like the timezone, NTP, and DNS:

# chtab key=timezone site.value="US/Central"
# chtab key=ntpservers site.value=wopr
# chtab key=forwarders site.value=9.0.7.1,9.0.6.11
# chtab key=domain site.value=cluster

NOTE: A word about DNS. A typical installation of xCAT has a dual-homed master server with one connection into the cluster and another connection to the outside world. The best thing to do is set the management node's /etc/resolv.conf to point to itself. Then put your external DNS servers in the site table's forwarders value. That way you can resolve names both internally and externally.

You can also use the new tabedit command to edit the rest of the table. Run 'tabedit site' and edit the site table so that it appears as follows:

#key,value,comments,disable
"xcatdport","3001",,
"xcatiport","3002",,
"tftpdir","/tftpboot",,
"master","172.20.0.1",,
"domain","cluster",,
"installdir","/install",,
"timezone","US/Central",,
"nameservers","172.20.0.1",,
"ntpservers","wopr",,
"forwarders","9.0.7.1,9.0.6.11",,
"xcatprefix","/opt/xcat",,
"xcatroot","/opt/xcat",,
"dhcpinterfaces","eth0",,

When you are done, you can run the 'tabdump site' command to see what your table looks like. Also note that the 'tabdump -d site' command will show you what values can be put into the table. It's handy for seeing what belongs where. The tabdump command without any arguments lists all the tables.


The Hosts File: /etc/hosts

The default RedHat installs seem to always put the host name of the management server on the 127.0.0.1 line. Make sure this doesn't happen. The top line should just say:

127.0.0.1 localhost localhost.localdomain

In most cases, you'll populate this file yourself with the appropriate IP addresses of your nodes, like in the old days. But xCAT 2 comes with a new table called hosts. With this table you can write out your /etc/hosts file using regular expressions. We won't go too much into it here, but once you edit this table you can run the makehosts command. See the man page for more details. Our /etc/hosts file with the relevant information looks like this:

172.20.0.1 wopr
172.20.1.1 x336001-bmc
172.20.1.2 x336002-bmc
172.20.1.3 x336003-bmc
172.20.1.4 x336004-bmc
172.20.1.5 x336005-bmc
172.20.1.6 x336006-bmc
172.20.1.7 x336007-bmc
172.20.1.8 x336008-bmc
172.20.11.1 x336001
172.20.11.2 x336002
172.20.11.3 x336003
172.20.11.4 x336004
172.20.11.5 x336005
172.20.11.6 x336006
172.20.11.7 x336007
172.20.11.8 x336008
172.20.1.201 ts001
172.20.0.254 smc001
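
Typing these entries by hand gets tedious as the cluster grows. As a rough sketch (not part of xCAT; the function name gen_hosts is made up for this example), a small shell loop can print the BMC and node lines shown above, assuming the same naming and addressing pattern: x336NNN-bmc at 172.20.1.N and x336NNN at 172.20.11.N.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/hosts lines for nodes 1..$1.
# Assumes the addressing used in this tutorial:
#   BMCs at 172.20.1.N, nodes at 172.20.11.N
gen_hosts() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        printf '172.20.1.%d x336%03d-bmc\n' "$i" "$i"
        i=$((i + 1))
    done
    i=1
    while [ "$i" -le "$count" ]; do
        printf '172.20.11.%d x336%03d\n' "$i" "$i"
        i=$((i + 1))
    done
}

gen_hosts 8
```

Review the output before appending any of it to /etc/hosts. The management node, terminal server and switch entries still need to be added by hand.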

The Networks Table

xCAT filled a lot of this table out when it installed. However, we may need to add a few changes. We edit our networks table as follows:

# tabedit networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,dynamicrange,nodehostname,comments,disable
"cluster","172.20.0.0","255.255.0.0","eth0",,,"172.20.0.1","172.20.0.1,172.20.0.1","172.20.0.200-172.20.0.250",,,

You only need to put in the networks that you want xCAT to own. eth1 on my cluster is connected to the public network, but I don't want xCAT to run DNS or DHCP on that network.

The changes we made from the default that xCAT set up are as follows. First, we need a dynamic range for MAC address discovery. Our nodes are on the 172.20.0.0 network, so we added the dynamic range 172.20.0.200-172.20.0.250. We also added name servers, DHCP servers and TFTP servers; all of them are the master node.
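
A dynamic range that falls outside the network it belongs to is an easy typo to make. The following is a minimal sketch for sanity-checking addresses against a network and netmask before entering them. It is plain shell arithmetic and not an xCAT tool; ip2int and in_network are names invented for this example.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip2int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside network $2 with netmask $3.
in_network() {
    addr=$(ip2int "$1")
    net=$(ip2int "$2")
    mask=$(ip2int "$3")
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Check both ends of a range, plus one address on a different network.
for ip in 172.20.0.200 172.20.0.250 172.10.0.200; do
    if in_network "$ip" 172.20.0.0 255.255.0.0; then
        echo "$ip is inside 172.20.0.0/255.255.0.0"
    else
        echo "$ip is OUTSIDE 172.20.0.0/255.255.0.0"
    fi
done
```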

DNS

You don't have to make xCAT the DNS server, but you will need to make sure names are resolved somewhere. xCAT can set up your DNS server and has had this capability since 2000! Let's turn our management node into a DNS server. This is necessary so that all nodes can resolve host names via our master node instead of using /etc/hosts. We run:

# makedns
# service named restart
# cat /etc/resolv.conf
search cluster
nameserver 172.20.0.1

Note: Make sure that your nameserver and domain match what you have in your site table! As mentioned before, you want your management node to resolve the nodes and then forward other queries outside the cluster.

Now try to test it:

# host x336001

The name should resolve into an IP address.

If you have errors then this means that the networks aren't set up quite right. We'll have to edit the networks table to make sure that makedns pulls in all the IP addresses that it is trying to resolve.

TFTP

Run:

# mknb x86_64

This will set up TFTP on wopr. It will also put files like pxelinux.0 in the TFTP directory (tftpdir in the site table, /tftpboot here) for our installations.

Node List Table

Let's add our nodes to the nodelist table.
Run:

# tabedit nodelist

Now fill it in so it looks like this:

#node,groups,status,comments,disable
"x336001","compute,ipmi,mrv,all",,,
"x336002","compute,ipmi,mrv,all",,,
"x336003","compute,ipmi,mrv,all",,,
"x336004","compute,ipmi,mrv,all",,,
"x336005","compute,ipmi,mrv,all",,,
"x336006","compute,ipmi,mrv,all",,,
"x336007","compute,ipmi,mrv,all",,,
"x336008","compute,ipmi,mrv,all",,,

You should now be able to run a few more commands:

# nodels compute
# nodels x336001-x336004

Node Hardware Management Table

Now we want to be able to power our machines on and off as well as view the remote console. This information is held in the nodehm and ipmi tables. Run:

# tabedit nodehm

Edit it so it appears as follows:

#node,power,mgt,cons,termserver,termport,conserver,serialport,serialspeed,serialflow,getmac,comments,disable
"mrv",,,"mrv",,,,,,,,,
"x336001",,"ipmi",,"ts001","21",,"0","19200","hard",,,
"x336002",,"ipmi",,"ts001","22",,"0","19200","hard",,,
"x336003",,"ipmi",,"ts001","23",,"0","19200","hard",,,
"x336004",,"ipmi",,"ts001","4",,"0","19200","hard",,,
"x336005",,"ipmi",,"ts001","5",,"0","19200","hard",,,
"x336006",,"ipmi",,"ts001","6",,"0","19200","hard",,,
"x336007",,"ipmi",,"ts001","7",,"0","19200","hard",,,
"x336008",,"ipmi",,"ts001","8",,"0","19200","hard",,,

Notice here that we added the mrv group. This is a nice way to be lazy. It means that all nodes in the mrv group will use mrv as the console method. In xCAT 1.3, this group method in the tables was only available in the noderes table. Now it's everywhere.

The next lines specify our nodes. Since each node is attached to a terminal server, we added the appropriate ports, serial device, serial speed and serial flow.

IPMI Table

All our nodes are managed with IPMI, so we won't modify the mp or mpa database tables. In the ipmi table we will use regular expressions:

# chtab node=ipmi ipmi.bmc="/\z/-bmc/"

Notice here that the expression is applied to the name of every node in the ipmi group: \z matches the end of the node name, so -bmc is appended to it. Since all of our nodes have <nodename>-bmc as their bmc, this is satisfied in one line. Regular expressions are good.
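
To see what the expression does without touching the database, the same substitution can be tried on the command line. This is only an illustration using sed, not how xCAT itself evaluates the table entry: sed has no \z, so the sketch uses $ (end of line) as a stand-in, and bmc_name is a made-up function name.

```shell
#!/bin/sh
# Illustration only: mimic the /\z/-bmc/ table expression with sed.
# "$" (end of line) stands in for Perl's "\z" (end of string).
bmc_name() {
    printf '%s\n' "$1" | sed 's/$/-bmc/'
}

for node in x336001 x336002 x336003; do
    echo "$node -> $(bmc_name "$node")"
done
```

Each resulting name should match a BMC entry in /etc/hosts, otherwise the BMCs will not be reachable by name.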

Interruption!

We now interrupt the exciting db creation to set up conserver. Since we have already defined the necessary attributes for it, we can now see if it works:

# makeconservercf
# service conserver stop
# service conserver start

Now see if it works by opening a console on a node:

# rcons x336001

If you see the console you're on the right track. If you have problems, check that there is only one conserver instance on your network connecting to the MRVs. MRVs only allow one connection and can be fickle if someone else is logged in. Also be sure the BIOS is set to redirect output. You may want to reboot the nodes and verify that you see the hardware setup.

We now resume our database building...

The Node Resources Table (noderes)

Run:

# tabedit noderes

Edit it so it appears as follows:

#node,servicenode,netboot,tftpserver,nfsserver,monserver,kernel,initrd,kcmdline,nfsdir,serialport,installnic,primarynic,xcatmaster,current_osimage,next_osimage,comments,disable
"compute",,"pxe","172.20.0.1","172.20.0.1",,,,,,"0","eth0","eth0",,,,,

The Password Table (passwd)

Here we add the password for our IPMI devices as well as the password we want for our nodes when they're installed:

# chtab key=ipmi passwd.username=xcat passwd.password=f00bar
# chtab key=system passwd.username=root passwd.password=cluster

Table Interruption 2!

We have entered all that is needed to remotely boot our nodes via the rpower command. Check that it works:

# rpower compute stat

Nice! And now let's finish the tabs...

The Node Type table (nodetype)

This is similar to the nodetype.tab in xCAT 1.3. The new item to be aware of is the nodetype.nodetype parameter. This is used to say whether it is an image, an RSA, a switch, or even a virtual machine. This will become more involved as xCAT 2 develops. We change it as follows:

# chtab node=compute nodetype.os=centos4.6 nodetype.arch=x86_64 \
    nodetype.profile=compute nodetype.nodetype=osi

The profile defaults are in /opt/xcat/share/xcat/install/centos since that is what we're installing. If you want to customize the install image you can change it there and then change the nodetype.profile key to what you want. It is very similar to xCAT 1.3.

The Switch Table

The big difference in the xCAT 2 model as opposed to the 1.3 model is the method for identifying bare hardware. In xCAT 2 the model of getting a MAC address via getmacs is largely done away with and identification is done via network switches. In the case of blades it is largely done the same as it was before but uses an SNMP method instead of screen scraping the web pages. So when you set xCAT up on a system it is imperative that you know which switch port each node is plugged into. Here we set up our switch table as follows:

# tabedit switch
#node,switch,port,vlan,comments,disable
"x336001","smc001","5",,,
"x336002","smc001","6",,,
"x336003","smc001","7",,,
"x336004","smc001","8",,,
"x336005","smc001","9",,,
"x336006","smc001","10",,,
"x336007","smc001","11",,,
"x336008","smc001","12",,,
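
When nodes are cabled to consecutive switch ports, as here (x336001 on port 5 through x336008 on port 12), the rows can be generated rather than typed. A minimal sketch, assuming that cabling pattern; gen_switch_rows is a hypothetical helper and not an xCAT command. It emits one field per column of the header line.

```shell
#!/bin/sh
# Hypothetical helper: print switch table rows for nodes 1..$1,
# assuming node N is cabled to port N + $3 on switch $2.
gen_switch_rows() {
    count=$1
    switch=$2
    offset=$3
    echo '#node,switch,port,vlan,comments,disable'
    i=1
    while [ "$i" -le "$count" ]; do
        printf '"x336%03d","%s","%d",,,\n' "$i" "$switch" $((i + offset))
        i=$((i + 1))
    done
}

gen_switch_rows 8 smc001 4
```

Check the output against the actual cabling before pasting it into tabedit, since discovery depends on these port numbers being right.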

The Chain Table

The chain table made a brief appearance in 1.3, but was largely ignored. However, this is the destiny control table and shows off the automatic provisioning abilities of xCAT, that is, its press-one-button-and-walk-away capability. We want a node to do the following: be discovered, set up the BMC, then stand by. So we do this:

# chtab node=compute chain.chain="runcmd=bmcsetup,standby" chain.ondiscover=nodediscover

Verify tables

Most of the work is done. Now run lsdef and see what you have. This is where you can see CSM and xCAT merging into xCAT 2.

# lsdef x336001

Discovery and node Provisioning

We now reboot the nodes and watch them get discovered:

# rpower compute reset

Watch the fun via:

# tail -f /var/log/messages

You'll see the PXE requests being served from the dynamic range and the nodes booting the xCAT netboot kernel. Once discovery is done, you can even ssh into the nodes to see what happened to them.

Copycds

This command hasn't changed:

# copycds CentOS-4.6-x86_64-binDVD.iso

We're now ready to install. In our example, we'll just leave the default post scripts. If you wanted something else, then you could modify the postscripts table. Note that xCAT post scripts are now contained in the /install/postscripts directory.

Install the Nodes (diskful)

Run:

# rinstall compute

Pretty easy. One other change is that the kickstart files get placed in /install/autoinst/<nodename>. That is also the location of the autoyast files.

Install Stateless Images (WIP)

To boot a node stateless, we do the following:

# cd /opt/xcat/share/xcat/netboot/centos
# ./genimage -i eth0 -n tg3,bnx2 -o centos4.6 -p compute
# cd /install/netboot/centos4.6/x86_64/compute/rootimg/etc/
# cp fstab fstab.ORIG

Edit fstab so it looks like this:

#tmpfs /dev/shm tmpfs defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
compute_x86_64 / tmpfs rw 0 1
none /tmp tmpfs defaults,size=10m 0 2
none /var/tmp tmpfs defaults,size=10m 0 2

Continue with commands:

# cd
# packimage -o centos4.6 -p compute -a x86_64

Test that it boots:

# nodeset x336008 netboot
# rpower x336008 boot

Removing xCAT

First, back up the old tables:

# mkdir -p /tmp/xcatdb.backup
# for i in $(tabdump); do echo "Dumping $i..."; tabdump "$i" >"/tmp/xcatdb.backup/$i.csv"; done

If you need to restore the old tables, run:

# cd /tmp/xcatdb.backup
# for i in *.csv; do echo "Restoring $i..."; tabrestore "$i"; done

Next do:

# service dhcpd stop
# rm -rf /var/lib/dhcp/*leases
# yum remove xCAT

Conclusions and recommendations

No tutorial can cover all the aspects of xCAT 2. For further support consider the mailing list, the IRC channel #xcat, the xCAT wiki, and other information on the xCAT website.