This document is no longer used! It's content has been integrated into the main xcat cookbooks.
This doc describes how to set up an xCAT management node and service node to boot compute nodes over the system p HFI network.
We are using the following node names as an example in this document:
Management node (MN): c250mgrs04-pvt Service node (SN): c250f07c04ap01 Compute node (CN): c250f07c04ap13
All the steps should be run on the xCAT management node. For those commands that need to run on the service node, we are using xdsh to run the command remotely on the service node.
After installing xCAT on the management node according [XCAT_pLinux_Clusters], increase the /boot file system size for service node since we need to use a customized kernel on service node, there will be two kernels existing on service nodes, /boot needs more space.
This is just for a workaround of a customized kernel. After the kernel is accepted by Linux community, there will be only one kernel existing on service node, and we don't need to increase the /boot space at that time.
In file /opt/xcat/share/xcat/install/rh/service.rhels6.ppc64.tmpl, change the line:
From:
part /boot --size 50 --fstype ext4
To:
part /boot --size 200 --fstype ext4
Refer to the rest of [XCAT_pLinux_Clusters] and [Setting_Up_a_Linux_Hierarchical_Cluster] for instructions for setting up a basic xCAT MN and SN and stateless compute nodes. The rest of the instructions below are additional steps you need to take to be able to boot the compute nodes over the HFI network. Eventually, this document will be integrated with the other two.
After the basic MN is set up and the SN is installed, install the HFI device drivers on xCAT MN
cd /hfi/dd rpm -ivh kernel-2.6.32hfi-3.ppc64.rpm rpm -ivh --nodeps hfi_util-1.0-0.el6.ppc64.rpm rpm -ivh --force net-tools-1.60-102.el6.ppc64.rpm
Configure the xCAT SN with the HFI device driver
Copy the HFI driver and DHCP packages to service node:
xdcp c250f07c04ap01 -R /hfi /hfi
Install the HFI device drivers on the xCAT SN
xdsh c250f07c04ap01 rpm -ivh /hfi/dd/kernel-2.6.32hfi-2.ppc64.rpm xdsh c250f07c04ap01 rpm -ivh --nodeps /hfi/dd/hfi_util-1.0-0.el6.ppc64.rpm xdsh c250f07c04ap01 rpm -ivh --force /hfi/dd/net-tools-1.60-102.el6.ppc64.rpm xdsh c250f07c04ap01 /sbin/new-kernel-pkg --mkinitrd --depmod --install 2.6.32hfi xdsh c250f07c04ap01 /sbin/new-kernel-pkg --rpmposttrans 2.6.32hfi
Create soft links and change yaboot.conf to boot from the customized kernel with HFI support
xdsh c250f07c04ap01 ln -sf /boot/vmlinuz-2.6.32hfi /boot/vmlinuz xdsh c250f07c04ap01 ln -sf /boot/System.map-2.6.32hfi /boot/System.map
Login to the service node and change the "default=" setting in /boot/etc/yaboot.conf to the new label with HFI support. For example, change it from:
boot=/dev/sda1 init-message="Welcome to Red Hat Enterprise Linux!\nHit <TAB> for boot options" partition=3 timeout=5 install=/usr/lib/yaboot/yaboot delay=5 enablecdboot enableofboot enablenetboot nonvram fstype=raw default=linux image=/vmlinuz-2.6.32hfi label=2.6.32hfi read-only initrd=/initrd-2.6.32hfi.img append="rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrhebsun16 KEYTABLE=us console=hvc0 crashkernel=auto rhgb quiet root=UUID=e2123609-7080-45f0-b583-23d5ef27dbba" image=/vmlinuz-2.6.32-71.el6.ppc64 label=linux read-only initrd=/initramfs-2.6.32-71.el6.ppc64.img append="rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrhebsun16 KEYTABLE=us console=hvc0 crashkernel=auto rhgb quiet root=UUID=e2123609-7080-45f0-b583-23d5ef27dbba"
TO:
boot=/dev/sda1 init-message="Welcome to Red Hat Enterprise Linux!\nHit <TAB> for boot options" partition=3 timeout=5 install=/usr/lib/yaboot/yaboot delay=5 enablecdboot enableofboot enablenetboot nonvram fstype=raw default=2.6.32hfi image=/vmlinuz-2.6.32hfi label=2.6.32hfi read-only initrd=/initrd-2.6.32hfi.img append="rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrhebsun16 KEYTABLE=us console=hvc0 crashkernel=auto rhgb quiet root=UUID=e2123609-7080-45f0-b583-23d5ef27dbba" image=/vmlinuz-2.6.32-71.el6.ppc64 label=linux read-only initrd=/initramfs-2.6.32-71.el6.ppc64.img append="rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrhebsun16 KEYTABLE=us console=hvc0 crashkernel=auto rhgb quiet root=UUID=e2123609-7080-45f0-b583-23d5ef27dbba" 4. Reset the service node to boot from the kernel with HFI xdsh c250f07c04ap01 reboot
Now the HFI interfaces should be available on the service node. Sync the /etc/hosts from MN to SN and run the hficonfig postscript to configure the HFI interfaces with IP addresses.
xdcp c250f07c04ap01 /etc/hosts /etc/hosts cp /hfi/scripts/hficonfig /install/postscripts/hficonfig chdef c250f07c04ap01 postscripts=servicenode,xcatserver,xcatclient,hficonfig updatenode c250f07c04ap01
In xCAT 2.7, the xcatserver and xcatclient postscripts are not longer needed in the postscripts table. You should use the following command:
chdef c250f07c04ap01 postscripts=servicenode,hficonfig
Install the DHCP server/client on MN and SN.
rpm -Uvh /hfi/dhcp/dhcp-4.1.1-13.P1.el6.ppc64.rpm rpm -Uvh /hfi/dhcp/dhclient-4.1.1-13.P1.el6.ppc64.rpm xdsh c250f07c04ap01 rpm -Uvh /hfi/dhcp/dhcp-4.1.1-13.P1.el6.ppc64.rpm xdsh c250f07c04ap01 rpm -Uvh /hfi/dhcp/dhclient-4.1.1-13.P1.el6.ppc64.rpm
Create a new networks definition in the xCAT database for the HFI interface. Generally we only need to create one new HFI network definition (for hf0) for diskless booting of compute nodes over HFI.
mkdef -t network -o hfinet net=20.0.0.0 mask=255.0.0.0 gateway=20.255.255.254 mgtifname=hf0 dhcpserver=20.7.4.1 tftpserver=20.7.4.1 nameservers=20.7.4.1
where the above net, mask, and gateway are appropriate for your hf0 network, and 20.7.4.1 should be changed to be the IP address of the hf0 NIC on your SN.
Create compute node definitions. Make sure the servicenode and xcatmaster attributes are set correctly.
servicenode attribute should be set to the IP address or hostname of the ethernet NIC on the service node that faces the MN. The xcatmaster attribute should be set to hf0 IP address on the service node. An example mkdef stanza file for defining the compute node:
c250f07c04ap13: objtype=node arch=ppc64 cons=fsp groups=lpar,all hcp=f07c04fsp1_a id=13 installnic=hf0 ip=20.7.4.13 mgt=fsp monserver=20.7.4.1 netboot=yaboot nfsserver=20.7.4.1 nodetype=ppc,osi hwtype=lpar os=rhels6 parent=f07c04fsp1_a postscripts=hficonfig pprofile=c250f07c04ap13 primarynic=hf0 profile=compute provmethod=netboot servicenode=10.7.4.1 tftpserver=20.7.4.1 xcatmaster=20.7.4.1
Create a diskless image for the compute node
copycds -n rhels6 /iso/RHEL6.0-20100922.1-Server-ppc64-DVD1.iso
Where rhels6 stands for the OS version (RHEL 6 Server).
The HFI kernel can be installed by xCAT automatically, but the other two packages, hfi_util and nettools, require the rpm options --nodeps and --force respectively, which xCAT cannot handle automatically. So we need to modify the postinstall file manually which will be run during the diskless image generation.
Add the following lines to /opt/xcat/share/xcat/netboot/rh/compute.rhels6.ppc64.postinstall. (rhels6 stands for the OS version - it should be the same as the previous step.)
cp /hfi/dd/hfi_util-1.0-0.el6.ppc64.rpm /install/netboot/rhels6/ppc64/compute/rootimg/tmp/hfi_util-1.0-0.el6.ppc64.rpm cp /hfi/dd/net-tools-1.60-102.el6.ppc64.rpm /install/netboot/rhels6/ppc64/compute/rootimg/tmp/nettools-1.60-102.el6.ppc64.rpm cp /hfi/dhcp/dhclient-4.1.1-13.P1.el6.ppc64.rpm /install/netboot/rhels6/ppc64/compute/rootimg/tmp/dhclient-.1.1-13.P1.el6.ppc64.rpm cp /hfi/dhcp/dhcp-4.1.1-13.P1.el6.ppc64.rpm /install/netboot/rhels6/ppc64/compute/rootimg/tmp/dhcp-4.1.1-13.P1.el6.ppc64.rpm chroot /install/netboot/rhels6/ppc64/compute/rootimg/ /bin/rpm -ivh /tmp/net-tools-1.60-102.el6.ppc64.rpm --force chroot /install/netboot/rhels6/ppc64/compute/rootimg/ /bin/rpm -ivh /tmp/hfi_util-1.0-0.el6.ppc64.rpm --nodeps --force chroot /install/netboot/rhels6/ppc64/compute/rootimg/ /bin/rpm -Uvh /tmp/dhclient-4.1.1-13.P1.el6.ppc64.rpm --force chroot /install/netboot/rhels6/ppc64/compute/rootimg/ /bin/rpm -Uvh /tmp/dhcp-4.1.1-13.P1.el6.ppc64.rpm --force
Sync /etc/hosts to the diskless image. This is used by the postscript hficonfig to configure all the HFI interfaces on the compute nodes.
cp /hfi/scripts/compute.rhels6.ppc64.synclist /install/custom/netboot/rh/compute.rhels6.ppc64.synclist
Generate the diskless image
cd /opt/xcat/share/xcat/netboot/rh/ && /opt/xcat/share/xcat/netboot/rh/genimage -i hf0 -n hf_if -o rhels6 -p compute -k 2.6.32hfi
where hf0 stands for the boot interface on compute nodes, and hf_if stands for the hfi device driver.
pack the image:
packimage -o rhels6 -p compute -a ppc64
Get the compute node mac address and set up dhcp services.
getmacs c250f07c04ap13 --hfi -D
Prepare for the compute node boot
nodeset c250f07c04ap13 netboot
Make sure your site.dhcpinterfaces attribute is set correctly to have the MN and SN listen only on the correct NICs.
> tabdump site | grep dhcp "dhcpinterfaces","c250mgrs04-pvt|eth0;c250f07c04ap01|hf0",,
Issue makedhcp to setup dhcp services. Specify --HFI option to identify this is a HFI devices.
makedhcp c250f07c04ap13 --HFI
Reboot the node to boot from network.
rnetboot c250f07c04ap13 --hfi
Open a console to watch the compute node installation.
rcons c250f07c04ap13
Instructions for SLES 11 will come at a later time.
Wiki: Setting_Up_a_Linux_Hierarchical_Cluster
Wiki: XCAT_pLinux_Clusters