Setting Up a Linux Hierarchical Cluster
Introduction
Service Nodes
In large clusters, it is desirable to have more than just the Management Node (MN) handle the installation and management of the compute nodes. We call these additional nodes service nodes (SN). The management node can delegate all management operations needed for a compute node to the SN that is managing that compute node. You can have one or more service nodes set up to install and manage groups of compute nodes. See xCAT Overview, Architecture, and Planning#xCAT Architecture for a high-level diagram showing this structure.
With xCAT, you have the choice of either having each service node install/manage a specific set of compute nodes, or having a pool of service nodes, any of which can respond to an installation request from a compute node. (Service node pools must be aligned with the network broadcast domains, because a compute node chooses its SN for a given boot by taking whichever SN responds first to its DHCP request broadcast.) You can also have a hybrid of the two approaches, in which for each specific set of compute nodes you have two or more SNs in a pool.
Where Did I Come From and Where Am I Going?
This document explains the basics for setting up a hierarchical Linux cluster using service nodes. The user of this document should be very familiar with setting up an xCAT non-hierarchical cluster. The document will cover only the additional steps needed to make the cluster hierarchical by setting up the SNs. It is assumed that you have already set up your management node according to the instructions in the relevant xCAT "cookbook" on the xCAT Documentation page. For example:
- xCAT iDataPlex Cluster Quick Start
- xCAT system x support for IBM Flex
- xCAT system p support for Linux on IBM Flex
- xCAT pLinux Clusters
Note: If you are trying to reference P775 Linux instructions, reference the xCAT pLinux Clusters link above
Once you have used this document to define your service nodes, you should return to the xCAT cookbook you are using to deploy the service nodes and then the compute nodes.
Service Node 101
The SNs each run an instance of xcatd, just like the MN does. The xcatd daemons communicate with each other using the same XML/SSL protocol that the xCAT client uses to communicate with xcatd on the MN.
The service nodes need to communicate with the xCAT database on the Management Node. They do this by using the remote client capability of the database (i.e. they don't go through xcatd for that). Therefore the Management Node must be running one of the daemon-based databases supported by xCAT (PostgreSQL, MySQL, or DB2). (Currently DB2 is only supported in xCAT clusters of Power 775 nodes.) The default SQLite database does not support remote clients and cannot be used in hierarchical clusters. This document includes instructions for migrating your cluster from SQLite to one of the other databases. Since the initial install of xCAT will always set up SQLite, you must migrate to a database that supports remote clients before installing your service nodes.
xCAT will help you install your service nodes, and will install on the SNs the xCAT software and other required rpms such as perl, the database client, and other prerequisites. Service nodes require all the same software as the MN (because they can perform all of the same functions), except that there is a special top-level xCAT rpm for SNs called xCATsn, as opposed to the xCAT rpm that is on the Management Node. The xCATsn rpm tells the SN that the xcatd on it should behave as an SN, not as the MN.
Setup the MN Hierarchical Database
Before setting up service nodes, you need to set up either MySQL, PostgreSQL, or DB2 as the xCAT Database on the Management Node. The database client on the Service Nodes will be set up later when the SNs are installed. MySQL and PostgreSQL are available with the Linux OS. DB2 is an IBM product and must be purchased and is only supported by xCAT in Power 775 clusters.
Follow the instructions in one of these documents for setting up the Management node to use the selected database:
- Setting_Up_MySQL_as_the_xCAT_DB
  - Follow this section, and be sure to use the xCAT-provided mysqlsetup command to set up the database for xCAT: Setting_Up_MySQL_as_the_xCAT_DB#Switch_to_MySQL_on_the_Management_Node
- Setting_Up_PostgreSQL_as_the_xCAT_DB
  - Follow this section, and be sure to use the xCAT-provided pgsqlsetup command to set up the database for xCAT: Setting_Up_PostgreSQL_as_the_xCAT_DB#Switching_to_PostgreSQL_Database_on_Management_Node
- Setting_Up_DB2_as_the_xCAT_DB
  - Follow the sections on setting up the Management Node. Be sure to use the xCAT-provided db2sqlsetup command to set up the database for xCAT. At this time, do not do anything to set up the DB2 Client on the Service Nodes; stop after you have run the db2sqlsetup script to set up the Management Node: Setting_Up_DB2_as_the_xCAT_DB#Overview_Setup_DB2_on_the_Management_Node
Define the service nodes in the database
This document assumes that you have previously defined your compute nodes in the database. It is also possible at this point that you have generic entries in your db for the nodes you will use as service nodes as a result of the node discovery process. We are now going to show you how to add all the relevant database data for the service nodes (SN) such that the SN can be installed and managed from the Management Node (MN). In addition, you will be adding the information to the database that will tell xCAT which service nodes (SN) will service which compute nodes (CN).
For this example, we have two service nodes: sn1 and sn2. We will call our Management Node: mn1. Note: service nodes are, by convention, in a group called service. Some of the commands in this document will use the group service to update all service nodes.
Note: a Service Node's service node is the Management Node; so a service node must have a direct connection to the management node. The compute nodes do not have to be directly attached to the Management Node, only to their service node. This will all have to be defined in your networks table.
Add Service Nodes to the nodelist Table
Define your service nodes (if not defined already). By convention, we put them in a service group. We usually have a group compute for our compute nodes, to distinguish between the two types of nodes. (If you want to use your own group name for service nodes, rather than service, you need to change some defaults in the xCAT db that use the group name service. For example, in the postscripts table there is by default a group entry for service, with the appropriate postscripts to run when installing a service node. Also, the default kickstart/autoyast template, pkglist, etc. that will be used have file names based on the profile name service.)
mkdef sn1,sn2 groups=service,ipmi,all
Add OS and Hardware Attributes to Service Nodes
When you ran copycds, it created several osimage definitions, including some appropriate for SNs. Display the list of osimages and choose one with "service" in the name:
lsdef -t osimage
For this example, let's assume you chose the stateful osimage definition for rhels 6.3: rhels6.3-x86_64-install-service . If you want to modify any of the osimage attributes (e.g. kickstart/autoyast template, pkglist, etc), make a copy of the osimage definition and also copy to /install/custom any files it points to that you are modifying.
Now set some of the common attributes for the SNs at the group level:
chdef -t group service arch=x86_64 os=rhels6.3 nodetype=osi profile=service netboot=xnba installnic=mac primarynic=mac provmethod=rhels6.3-x86_64-install-service
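The osimage name chosen above appears to follow the pattern <os>-<arch>-install-<profile>. The following is a minimal Python sketch, assuming that pattern (it is inferred from this example, not taken from xCAT code), that checks whether a set of group attributes agrees with its provmethod value. The function names are hypothetical.

```python
def expected_osimage(os_name: str, arch: str, profile: str, method: str = "install") -> str:
    """Compose an osimage name using the pattern seen in the example above (an assumption)."""
    return f"{os_name}-{arch}-{method}-{profile}"


def attrs_consistent(attrs: dict) -> bool:
    """Return True when provmethod matches the name implied by os, arch and profile."""
    implied = expected_osimage(attrs["os"], attrs["arch"], attrs["profile"])
    return attrs["provmethod"] == implied


# Attribute values copied from the chdef command above.
service_group = {
    "arch": "x86_64",
    "os": "rhels6.3",
    "profile": "service",
    "provmethod": "rhels6.3-x86_64-install-service",
}
print(attrs_consistent(service_group))
```

A check like this can catch a mismatch such as os=rhels6.3 combined with an osimage built for a different release before the nodes are deployed.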
Add Service Nodes to the servicenode Table
An entry must be created in the servicenode table for each service node or for the service group. This table describes all the services you would like xCAT to set up on the service nodes. (Even if you don't want xCAT to set up any services, which is unlikely, you must define the service nodes in the servicenode table with at least one attribute set (you can set it to 0); otherwise they will not be recognized as service nodes.)
When the xcatd daemon is started or restarted on the service node, it will make sure all of the requested services are configured and started. (To temporarily avoid this when restarting xcatd, use "service xcatd reload" instead.)
To set up the minimum recommended services on the service nodes:
chdef -t group -o service setupnfs=1 setupdhcp=1 setuptftp=1 setupnameserver=1 setupconserver=1
See the setup* attributes in the node object definition man page for the services available. (The HTTP server is also started when setupnfs is set.) If you are using the setupntp postscript on the compute nodes, you should also set setupntp=1. For clusters with subnetted management networks (i.e. the network between the SN and its compute nodes is separate from the network between the MN and the SNs) you might want to also set setupipforward=1.
Add Service Node Postscripts
By default, xCAT defines the service node group to have the "servicenode" postscript run when the SNs are installed or diskless booted. This postscript sets up the xcatd credentials and installs the xCAT software on the service nodes. If you have your own postscript that you want run on the SN during deployment of the SN, put it in /install/postscripts on the MN and add it to the service node postscripts or postbootscripts. For example:
chdef -t group -p service postscripts=<mypostscript>
Notes:
- For Red Hat type distros, the postscripts will be run before the reboot of a kickstart install, and the postbootscripts will be run after the reboot.
- Make sure that the servicenode postscript is set to run before the otherpkgs postscript or you will see errors during the service node deployment.
- The -p flag automatically adds the specified postscript at the end of the comma-separated list of postscripts (or postbootscripts).
If you are running additional software on the service nodes that needs ODBC to access the database (e.g. LoadLeveler or TEAL), use this command to add the xCAT-supplied postbootscript called "odbcsetup":
chdef -t group -p service postbootscripts=odbcsetup
If you are using DB2, follow the instructions in this document for setting up the postscripts table to enable DB2 during service node installs: Setting_Up_DB2_as_the_xCAT_DB
Assigning Nodes to their Service Nodes
The node attributes servicenode and xcatmaster define which SN services this particular node. The servicenode attribute for a compute node defines which SN the MN should send a command to (e.g. xdsh), and should be set to the hostname or IP address of the service node that the management node contacts it by. The xcatmaster attribute of the compute node defines which SN the compute node should boot from, and should be set to the hostname or IP address of the service node that the compute node contacts it by.
Host name resolution must have been set up in advance, with /etc/hosts, DNS or DHCP, to ensure that the names put in this table can be resolved on the Management Node, the service nodes, and the compute nodes. It is easiest to have a node group of the compute nodes for each service node. For example, if all the nodes in node group compute1 are serviced by sn1 and all the nodes in node group compute2 are serviced by sn2:
chdef -t group compute1 servicenode=sn1 xcatmaster=sn1-c
chdef -t group compute2 servicenode=sn2 xcatmaster=sn2-c
Note: in this example, sn1 and sn2 are the node names of the service nodes (and therefore the hostnames associated with the NICs that the MN talks to). The hostnames sn1-c and sn2-c are associated with the SN NICs that communicate with their compute nodes.
Note: the attribute tftpserver defaults to the value of xcatmaster if not set, but in some releases of xCAT it has not defaulted correctly, so it is safer just to set it to the same value as xcatmaster.
The conserver and monserver attributes allow you to specify which service node should run the conserver (console) and monserver (monitoring) daemons for the nodes in the group specified in the command. In this example, we are having each node's primary SN also act as its conserver and monserver (the most typical setup).
chdef -t group compute1 conserver=sn1 monserver=sn1,sn1-c
chdef -t group compute2 conserver=sn2 monserver=sn2,sn2-c
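When there are many compute groups, typing these commands by hand becomes error prone. The following Python sketch only builds and prints chdef command lines from a mapping; it does not run them. The function name and the mapping layout are hypothetical, and the group and host names are the ones used in this example.

```python
def chdef_commands(assignments: dict) -> list:
    """Build one 'chdef -t group' command line per compute group."""
    commands = []
    for group, attrs in assignments.items():
        pairs = " ".join(f"{key}={value}" for key, value in attrs.items())
        commands.append(f"chdef -t group {group} {pairs}")
    return commands


# Each compute group is served by one SN: the MN reaches it as snN,
# and the compute nodes reach it as snN-c.
assignments = {
    "compute1": {"servicenode": "sn1", "xcatmaster": "sn1-c",
                 "conserver": "sn1", "monserver": "sn1,sn1-c"},
    "compute2": {"servicenode": "sn2", "xcatmaster": "sn2-c",
                 "conserver": "sn2", "monserver": "sn2,sn2-c"},
}

for command in chdef_commands(assignments):
    print(command)
```

Review the printed commands before running them on the management node.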
Service Node Pools
Service Node Pools are multiple service nodes that service the same set of compute nodes. Having multiple service nodes allows backup service node(s) for a compute node when the primary service node is unavailable, or can be used for work-load balancing on the service nodes. But note that the selection of which SN will service which compute node is made at compute node boot time. After that, the selection of the SN for this compute node is fixed until the compute node is rebooted or the compute node is explicitly moved to another SN using the snmove command.
To use Service Node pools, you need to architect your network such that all of the compute nodes and service nodes in a particular pool are on the same flat network. If you don't want the management node to respond to/manage some of the compute nodes, it shouldn't be on that same flat network. The site.dhcpinterfaces attribute should be set such that the SNs' DHCP daemon only listens on the NIC that faces the compute nodes, not the NIC that faces the MN. This avoids some timing issues when the SNs are being deployed (so that they don't respond to each other before they are completely ready). You also need to make sure the networks table accurately reflects the physical network structure.
To define a list of service nodes that support a set of compute nodes, set the servicenode attribute to a comma-delimited list of the service nodes. When running an xCAT command like xdsh or updatenode for compute nodes, the list will be processed left to right, picking the first service node on the list to run the command. If that service node is not available, then the next service node on the list will be chosen until the command is successful. Errors will be logged. If no service node on the list can process the command, then the error will be returned. You can provide some load-balancing by assigning your service nodes as we do below.
When using service node pools, xcatmaster should not be set because the SN that responds first to the compute node's DHCP request during boot will be chosen.
For example:
chdef -t node compute1 servicenode=sn1,sn2
chdef -t node compute2 servicenode=sn2,sn1
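The left-to-right processing of the servicenode list described above can be illustrated with a short Python sketch. This is not xCAT's implementation; the function name and the availability callback are hypothetical stand-ins for whatever check xCAT performs internally.

```python
def pick_service_node(servicenodes: str, is_available) -> str:
    """Walk the comma-delimited list left to right and return the first usable SN."""
    for sn in servicenodes.split(","):
        if is_available(sn):
            return sn
    # Mirrors the behaviour described above: if no SN can process the command, report an error.
    raise RuntimeError(f"no service node available in pool: {servicenodes}")


# Hypothetical situation: sn1 is down.
down = {"sn1"}


def available(sn: str) -> bool:
    return sn not in down


print(pick_service_node("sn1,sn2", available))  # compute1 falls back to sn2
print(pick_service_node("sn2,sn1", available))  # compute2 uses its first choice
```

Listing the service nodes in a different order for each compute group, as in the example above, spreads the normal load while still giving every group a fallback.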
You also need to set the sharedtftp site attribute to 0 so that the SNs will not automatically mount the /tftpboot directory from the management node:
chdef -t site clustersite sharedtftp=0
Note: If your service nodes are stateless and site.sharedtftp=0, if you reboot any service node when using servicenode pools, any data written to the local /tftpboot directory of that SN is lost. You will need to run nodeset for all of the compute nodes serviced by that SN again.
Conserver and Monserver and Pools
Conserver and monserver are not yet supported with Service Node Pools. You must explicitly assign these functions to a service node using the nodehm.conserver and noderes.monserver attributes as above.
Setup Site Table
If you are not using the NFS-based statelite method of booting your compute nodes, set the installloc attribute to "/install". This instructs the service node to mount /install from the management node. (If you don't do this, you have to manually sync /install between the management node and the service nodes.)
chdef -t site clustersite installloc="/install"
For IPMI controlled nodes, if you want the out-of-band IPMI operations to be done directly from the management node (instead of being sent to the appropriate service node), set site.ipmidispatch=n.
If you want to throttle the rate at which nodes are booted up, you can set the following site attributes:
- syspowerinterval
- syspowermaxnodes
- powerinterval (system p only)
See the site table man page for details.
Setup networks Table
All networks in the cluster must be defined in the networks table. When xCAT was installed, it ran makenetworks, which created an entry in this table for each of the networks the management node is on. You need to add entries for each network the service nodes use to communicate to the compute nodes.
For example:
mkdef -t network net1 net=10.5.1.0 mask=255.255.255.224 gateway=10.5.1.1
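Before adding a network definition, it can help to confirm that the net, mask and gateway values are consistent with each other. The following sketch uses Python's standard ipaddress module with the values from the example above; the function name is hypothetical.

```python
import ipaddress


def describe_network(net: str, mask: str, gateway: str) -> dict:
    """Summarise a network definition and check that the gateway lies inside it."""
    # ip_network raises ValueError if 'net' has host bits set for this mask.
    network = ipaddress.ip_network(f"{net}/{mask}")
    return {
        "cidr": str(network),
        # Subtract the network and broadcast addresses.
        "usable_hosts": network.num_addresses - 2,
        "gateway_in_network": ipaddress.ip_address(gateway) in network,
    }


print(describe_network("10.5.1.0", "255.255.255.224", "10.5.1.1"))
```

With a mask of 255.255.255.224, the example network is a /27 and has room for 30 host addresses, so it suits a small group of compute nodes behind one service node.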
If you want to set the nodes' xcatmaster as the default gateway for the nodes, the gateway attribute can be set to keyword "<xcatmaster>". In this case, xCAT code will automatically substitute the IP address of the node's xcatmaster for the keyword. Here is an example:
mkdef -t network net1 net=10.5.1.0 mask=255.255.255.224 gateway='<xcatmaster>'
The ipforward attribute should be enabled on all the xcatmaster nodes that will be acting as default gateways. You can set ipforward to 1 in the servicenode table, or add the line "net.ipv4.ip_forward = 1" to the file /etc/sysctl.conf and then run "sysctl -p /etc/sysctl.conf" manually to enable IP forwarding.
Note: If using service node pools, the networks table dhcpserver attribute can be set to any single service node in your pool. The networks table tftpserver and nameserver attributes should be left blank.
Verify the Tables
To verify that the tables are set correctly, run lsdef on the service nodes, compute1, compute2:
lsdef service,compute1,compute2
Add additional adapters configuration script (optional)
It is possible to have additional adapter interfaces automatically configured when the nodes are booted. xCAT provides sample configuration scripts for ethernet, IB, and HFI adapters. These scripts can be used as-is or they can be modified to suit your particular environment. The ethernet sample is /install/postscripts/configeth. When you have the configuration script that you want, you can add it to the "postscripts" attribute as mentioned above. Make sure your script is in the /install/postscripts directory and that it is executable.
Note: For system p servers, if you plan to have your service node perform the hardware control functions for its compute nodes, it is necessary that the SN ethernet network adapters connected to the HW service VLAN be configured. For Power 775 clusters specifically, see Configuring xCAT SN Hierarchy Ethernet Adapter DFM Only for more information.
Configuring Secondary Adapters
This section applies only to xCAT 2.8 and later. For earlier xCAT versions, use Configuring_Secondary_Adapters_Old instead.
The nics table and the confignics postscript can be used to automatically configure additional ethernet and Infiniband adapters on nodes as they are being deployed. ("Additional adapters" means adapters other than the primary adapter that the node is being installed/booted over.)
The way the confignics postscript decides what IP address to give the secondary adapter is by checking the nics table, in which the nic configuration information is stored.
To use the nics table and confignics postscript to define a secondary adapter on one or more nodes, follow these steps:
Define configuration information for the Secondary Adapters in the nics table
You can either use tabedit to edit the nics table directly or use the *def commands to define the nics attributes for the nodes. Because the syntax of the fields in the nics table is somewhat complex, editing the table directly can be tedious and error prone. It is recommended to use the *def commands to define the nics attributes for the nodes.
Here is a sample nics table content:
[root@ls21n01 ~]# tabdump nics
#node,nicips,nichostnamesuffixes,nictypes,niccustomscripts,nicnetworks,nicaliases,comments,disable
"cn1","eth1!11.1.89.7|12.1.89.7,eth2!13.1.89.7|14.1.89.7","eth1!-eth1-1|-eth1-2,eth2!-eth2-1|-eth2-2","eth1!Ethernet,eth2!Ethernet",,"eth1!net11|net12,eth2!net13|net14",,,
[root@ls21n01 ~]#
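The field syntax in this sample can be read as: entries for different NICs are separated by commas, the NIC name is separated from its values by "!", and multiple values for one NIC are separated by "|". The following Python sketch parses one such field and shows the equivalent per-NIC attribute form (such as nicips.eth1) used by the *def commands. The function names are hypothetical, and the syntax is inferred from the sample rather than from xCAT source.

```python
def parse_nic_field(value: str) -> dict:
    """Split 'nic!v1|v2,nic!v1|v2' into {nic: [v1, v2]}."""
    result = {}
    if not value:
        return result
    for entry in value.split(","):
        nic, _, values = entry.partition("!")
        result[nic] = values.split("|")
    return result


def to_def_attrs(column: str, value: str) -> list:
    """Render a nics table field as the per-NIC attributes used by mkdef/chdef."""
    parsed = parse_nic_field(value)
    return [f"{column}.{nic}={'|'.join(vals)}" for nic, vals in parsed.items()]


nicips = "eth1!11.1.89.7|12.1.89.7,eth2!13.1.89.7|14.1.89.7"
print(parse_nic_field(nicips))
print(to_def_attrs("nicips", nicips))
```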
Using *def commands is much easier and cleaner:
Use stanza file
[root@ls21n01 ~]# cat cn1
# <xCAT data object stanza file>
cn1:
    objtype=node
    arch=x86_64
    groups=kvm,vm,all
    nichostnamesuffixes.eth1=-eth1-1|-eth1-2
    nichostnamesuffixes.eth2=-eth2-1|-eth2-2
    nicips.eth1=11.1.89.7|12.1.89.7
    nicips.eth2=13.1.89.7|14.1.89.7
    nicnetworks.eth1=net11|net12
    nicnetworks.eth2=net13|net14
    nictypes.eth1=Ethernet
    nictypes.eth2=Ethernet
[root@ls21n01 ~]# cat cn1 | mkdef -z
1 object definitions have been created or modified.
[root@ls21n01 ~]# lsdef cn1
Object name: cn1
    arch=x86_64
    groups=kvm,vm,all
    nichostnamesuffixes.eth1=-eth1-1|-eth1-2
    nichostnamesuffixes.eth2=-eth2-1|-eth2-2
    nicips.eth1=11.1.89.7|12.1.89.7
    nicips.eth2=13.1.89.7|14.1.89.7
    nicnetworks.eth1=net11|net12
    nicnetworks.eth2=net13|net14
    nictypes.eth1=Ethernet
    nictypes.eth2=Ethernet
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
[root@ls21n01 ~]#
Use command line input
[root@ls21n01 ~]# mkdef cn1 groups=all nicips.eth0="1.1.1.1|1.2.1.1" nicnetworks.eth0="net1|net2" nictypes.eth0="Ethernet"
1 object definitions have been created or modified.
[root@ls21n01 ~]#
[root@ls21n01 ~]# chdef cn1 nicips.eth0="2.1.1.1|2.1.1.1|3.1.1.1"
1 object definitions have been created or modified.
[root@ls21n01 ~]#
Once the node's NIC attributes are defined in the xCAT database, you can run:
makehosts cn1
and it will put these entries in the /etc/hosts table:
11.1.89.7 cn1-eth1-1 cn1-eth1-1.ppd.pok.ibm.com
12.1.89.7 cn1-eth1-2 cn1-eth1-2.ppd.pok.ibm.com
13.1.89.7 cn1-eth2-1 cn1-eth2-1.ppd.pok.ibm.com
14.1.89.7 cn1-eth2-2 cn1-eth2-2.ppd.pok.ibm.com
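Judging from this output, the nth IP address of a NIC is paired with the nth hostname suffix of that NIC, and the node name is prepended to the suffix. The following Python sketch reproduces the entries above under that assumption. It is an illustration only, not how makehosts is implemented, and the function name is hypothetical.

```python
def hosts_lines(node: str, nicips: dict, suffixes: dict, domain: str) -> list:
    """Pair each NIC's IPs with its hostname suffixes, position by position."""
    lines = []
    for nic in sorted(nicips):
        for ip, suffix in zip(nicips[nic], suffixes[nic]):
            short = f"{node}{suffix}"
            lines.append(f"{ip} {short} {short}.{domain}")
    return lines


# Values taken from the cn1 definition and the /etc/hosts output shown above.
nicips = {"eth1": ["11.1.89.7", "12.1.89.7"], "eth2": ["13.1.89.7", "14.1.89.7"]}
suffixes = {"eth1": ["-eth1-1", "-eth1-2"], "eth2": ["-eth2-1", "-eth2-2"]}

for line in hosts_lines("cn1", nicips, suffixes, "ppd.pok.ibm.com"):
    print(line)
```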
Add confignics into the node's postscripts list
chdef cn1 -p postscripts=confignics
If you wish to configure IB interfaces, the basic procedure is very similar to the one described above. Here are the minor differences:
1. In the nics table, the NIC names should be something like "ib0", "ib1", etc., and the nictypes should be "Infiniband". Here is an example:
[root@ls21n01 postscripts]# lsdef dx360m3n06 --nics
Object name: dx360m3n06
    nichostnamesuffixes.ib0=-ib0|-ib0-2
    nichostnamesuffixes.ib1=-ib1|-ib1-2
    nicips.ib0=11.1.89.10|21.1.89.10
    nicips.ib1=12.1.89.10|22.1.89.10
    nicnetworks.ib0=11_1_0_0-255_255_0_0|21_1_0_0-255_255_0_0
    nicnetworks.ib1=12_1_0_0-255_255_0_0|22_1_0_0-255_255_0_0
    nictypes.ib0=Infiniband
    nictypes.ib1=Infiniband
[root@ls21n01 postscripts]#
2. The --ibaports flag of confignics can be used to specify the number of ports on the IB adapters. The default value is 1. Quote the postscript together with its flag so that the shell passes them to chdef as a single value:
chdef cn1 -p postscripts="confignics --ibaports=2"
Gather MAC information for the install adapters
Note: You should first get the MAC information for the service nodes. After the OS has been provisioned on the service node, and the connections between hdwr_svr on the service node and the non-sn-CEC have been created, you can get the MAC information for the compute nodes.
Using getmacs
Use the xCAT getmacs command to gather adapter information from the nodes. This command will return the MAC information for each Ethernet or HFI adapter available on the target node. The command can be used to either display the results or write the information directly to the database. If there are multiple adapters the first one will be written to the database and used as the install adapter for that node.
The command can also be used to do a ping test on the adapter interfaces to determine which ones could be used to perform the network boot. In this case the first adapter that can be successfully used to ping the server will be written to the database.
getmacs working with P775
The P775 cec supports two networks with the getmacs command. The default network in the P775 is the HFI which is used to communicate between all the P775 octants. There is also support for an Ethernet network which is used to communicate between the xCAT EMS, and other P775 server octants. It is important that all the networks are properly defined in the xCAT networks table.
Before running getmacs you must first run the makeconservercf command. You need to run makeconservercf any time you add new nodes to the cluster.
makeconservercf
Shut down all the nodes that you will be querying for MAC addresses. For example, to shut down all nodes in the group "compute" you could run the following command.
rpower compute off
To display all adapter information but not write anything to the database you should use the "-d" flag. For example:
getmacs -d compute
To retrieve the Ethernet MAC address for a P775 xCAT service node, you will need to provide the -D flag (ping test) with the getmacs command.
getmacs <xcatsn> -D
The output would be similar to the following.
Type  Location Code               MAC Address   Full Path Name           Ping Result   Device Type
ent   U9125.F2A.024C362-V6-C2-T1  fef9dfb7c602  /vdevice/l-lan@30000002  successful    virtual
ent   U9125.F2A.024C362-V6-C3-T1  fef9dfb7c603  /vdevice/l-lan@30000003  unsuccessful  virtual
From this result you can see that "fef9dfb7c602" should be used for this service node's MAC address, because it belongs to the adapter whose ping test was successful.
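The selection rule described above (use the first adapter whose ping test succeeds) can be sketched in Python as follows. The sketch assumes each data row of the getmacs -D output has six whitespace-separated fields, as in the sample; real output may differ, and the function name is hypothetical.

```python
def first_pingable_mac(getmacs_output: str):
    """Return the MAC of the first adapter whose ping result is 'successful', or None."""
    for line in getmacs_output.splitlines():
        fields = line.split()
        # Data rows have exactly six fields; the header and blank lines do not.
        if len(fields) != 6:
            continue
        _type, _location, mac, _path, ping, _device = fields
        if ping == "successful":
            return mac
    return None


sample = """\
Type  Location Code  MAC Address  Full Path Name  Ping Result  Device Type
ent U9125.F2A.024C362-V6-C2-T1 fef9dfb7c602 /vdevice/l-lan@30000002 successful virtual
ent U9125.F2A.024C362-V6-C3-T1 fef9dfb7c603 /vdevice/l-lan@30000003 unsuccessful virtual
"""
print(first_pingable_mac(sample))
```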
To retrieve the HFI MAC address used for a P775 xCAT compute node, the getmacs command does not require the -D flag, since the first HFI adapter recognized will be used.
getmacs <HFInodes>
The output for HFI interface would be similar to the following.
# Type   Location Code         MAC Address   Full Path Name                              Ping Result   Device Type
hfi-ent  U78A9.001.1122233-P1  020004030004  /hfi-iohub@300000000000002/hfi-ethernet@10  unsuccessful  physical
hfi-ent  U78A9.001.1122233-P1  020004030004  /hfi-iohub@300000000000002/hfi-ethernet@11  unsuccessful  physical
For more information on using the getmacs command see the man page.
If you did not have the getmacs command write the MAC addresses directly to the database, you can do it manually using the chdef command. For example:
chdef -t node node01 mac=fef9dfb7c603
Configure DHCP
Add the relevant networks to the DHCP configuration. Refer to:
Add networks into DHCP
Add the defined nodes to the DHCP configuration. Refer to:
Add nodes into DHCP
Set Up the Service Nodes for Stateful (Diskful) Installation
Any cluster using statelite compute nodes (including P775 clusters) must use stateful (diskful) service nodes.
Note: If you are using diskless service nodes, go to the section "Setup the Service Node for Stateless Deployment".
First, go to the Download xCAT site and download the level of the xCAT tarball you desire. Then go to http://sourceforge.net/projects/xcat/files/xcat-dep/2.x_Linux and get the latest xCAT dependency tarball. Note: All xCAT service nodes must be at the exact same xCAT version as the xCAT Management Node. Copy the files to the Management Node (MN) and untar them in the appropriate sub-directory of /install/post/otherpkgs:
mkdir -p /install/post/otherpkgs/rhels6/ppc64/xcat
cd /install/post/otherpkgs/rhels6/ppc64/xcat
tar jxvf core-rpms-snap.tar.bz2
tar jxvf xcat-dep-*.tar.bz2
Next, add rpm names into your own version of service.<osver>.<arch>.otherpkgs.pkglist file. In most cases, you can find an initial copy of this file under /opt/xcat/share/xcat/install/<platform> . If not, copy one from a similar platform.
cp /opt/xcat/share/xcat/install/rh/service.rhels6.ppc64.otherpkgs.pkglist /install/custom/install/rh
vi /install/custom/install/rh/service.rhels6.ppc64.otherpkgs.pkglist
Add the following. You must include at least one rpm from each directory you want xCAT to use, because xCAT uses the paths to set up the yum/zypper repositories:
xcat/xcat-core/xCATsn
xcat/xcat-dep/rh6/x86_64/conserver-xcat
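Because the repositories are derived from the directory part of each entry, it can be useful to list which directories a pkglist actually covers. The following Python sketch does that for the two entries above. It is a simplification that skips blank lines and lines starting with "#", and it is not xCAT's own parser; the function name is hypothetical.

```python
import posixpath


def repo_dirs(pkglist_text: str) -> list:
    """Return the distinct directory prefixes of the pkglist entries, in order of appearance."""
    dirs = []
    for line in pkglist_text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        directory = posixpath.dirname(entry)
        if directory and directory not in dirs:
            dirs.append(directory)
    return dirs


pkglist = """\
xcat/xcat-core/xCATsn
xcat/xcat-dep/rh6/x86_64/conserver-xcat
"""
print(repo_dirs(pkglist))
```

If a directory that holds rpms you need is missing from the result, add at least one rpm from that directory to the pkglist.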
Note: you will be installing the xCAT Service Node rpm xCATsn meta-package on the Service Node, not the xCAT Management Node meta-package. Do not install both.
For Power 775 Clusters, you should add the DFM and hdwr_svr into the list of packages to be installed on the SN. Refer to: Installing DFM and hdwr_svr on Linux SN
If you want to set up disk mirroring (RAID1) on the service nodes, see Use_RAID1_In_xCAT_Cluster for more details.
Update the powerpc-utils-1.2.2-18.el6.ppc64.rpm in the rhels6 RPM repository (rhels6 only)
- This section could be removed after the powerpc-utils-1.2.2-18.el6.ppc64.rpm is built in the base rhels6 ISO.
- The direct rpm download link is: ftp://linuxpatch.ncsa.uiuc.edu/PERCS/powerpc-utils-1.2.2-18.el6.ppc64.rpm
- The update steps are as follows:
- put the new rpm in the base OS packages
cd /install/rhels6/ppc64/Server/Packages
mv powerpc-utils-1.2.2-17.el6.ppc64.rpm /tmp
cp /tmp/powerpc-utils-1.2.2-18.el6.ppc64.rpm .
chmod +r powerpc-utils-1.2.2-18.el6.ppc64.rpm   # make sure that the rpm is readable by other users
- create the repodata
cd /install/rhels6/ppc64/Server
ls -al repodata/
total 14316
dr-xr-xr-x 2 root root 4096 Jul 20 09:34 .
dr-xr-xr-x 3 root root 4096 Jul 20 09:34 ..
-r--r--r-- 1 root root 1305862 Sep 22 2010 20dfb74c144014854d3b16313907ebcf30c9ef63346d632369a19a4add8388e7-other.sqlite.bz2
-r--r--r-- 1 root root 1521372 Sep 22 2010 57b3c81512224bbb5cebbfcb6c7fd1f7eb99cca746c6c6a76fb64c64f47de102-primary.xml.gz
-r--r--r-- 1 root root 2823613 Sep 22 2010 5f664ea798d1714d67f66910a6c92777ecbbe0bf3068d3026e6e90cc646153e4-primary.sqlite.bz2
-r--r--r-- 1 root root 1418180 Sep 22 2010 7cec82d8ed95b8b60b3e1254f14ee8e0a479df002f98bb557c6ccad5724ae2c8-other.xml.gz
-r--r--r-- 1 root root 194113 Sep 22 2010 90cbb67096e81821a2150d2b0a4f3776ab1a0161b54072a0bd33d5cadd1c234a-comps-rhel6-Server.xml.gz
-r--r--r-- 1 root root 1054944 Sep 22 2010 98462d05248098ef1724eddb2c0a127954aade64d4bb7d4e693cff32ab1e463c-comps-rhel6-Server.xml
-r--r--r-- 1 root root 3341671 Sep 22 2010 bb3456b3482596ec3aa34d517affc42543e2db3f4f2856c0827d88477073aa45-filelists.sqlite.bz2
-r--r--r-- 1 root root 2965960 Sep 22 2010 eb991fd2bb9af16a24a066d840ce76365d396b364d3cdc81577e4cf6e03a15ae-filelists.xml.gz
-r--r--r-- 1 root root 3829 Sep 22 2010 repomd.xml
-r--r--r-- 1 root root 2581 Sep 22 2010 TRANS.TBL
createrepo -g repodata/98462d05248098ef1724eddb2c0a127954aade64d4bb7d4e693cff32ab1e463c-comps-rhel6-Server.xml .
# Note: you should use comps-rhel6-Server.xml with its key as the group file.
Additional Configuration for Power 775 Clusters Only
- Increase /boot filesystem size for service nodes
- A customized kernel is required on the service nodes to work with the HFI network. Since both the base and customized kernels will exist on the service nodes, the /boot filesystem will need to be increased from the standard default when installing a service node.
- Note, increasing /boot is a workaround for using the customized kernel. After this kernel is accepted by the Linux Kernel community, only one kernel will be required on the service node, and this step will no longer be needed.
- Copy the service node Kickstart install template provided by xCAT to a custom location. For example:
cp /opt/xcat/share/xcat/install/rh/service.rhels6.ppc64.tmpl /install/custom/install/rh
- Edit the copied file and change the line:
part /boot --size 50 --fstype ext4
- to:
part /boot --size 200 --fstype ext4
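If you maintain several templates, the same change can be scripted instead of edited by hand. The following Python sketch rewrites the --size value on the /boot partition line of a template's text. It assumes the line has the form shown above, and the function name is hypothetical.

```python
import re


def set_boot_size(template_text: str, size_mb: int) -> str:
    """Replace the --size value on the 'part /boot' line, leaving other lines untouched."""
    pattern = re.compile(r"^(part\s+/boot\s+--size\s+)\d+", flags=re.MULTILINE)
    return pattern.sub(lambda match: f"{match.group(1)}{size_mb}", template_text)


template = "part /boot --size 50 --fstype ext4\n"
print(set_boot_size(template, 200), end="")
```

To apply it, read the copied template under /install/custom/install/rh, pass its contents through the function, and write the result back to the same file.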
Additional Configuration when using DB2
To have DB2 installed and configured during the install of the Service Node, you need to do the additional setup as documented in Setting_Up_DB2_as_the_xCAT_DB#Automatic_install_of_DB2_and_Client_setup_on_SN.
Set the node status to ready for installation
nodeset service install
Initialize network boot to install Service Nodes
rnetboot service
Initialize network boot to install Service Nodes in Power 775
Starting with xCAT 2.6, there are two ways to initiate a network boot in a Power 775 cluster. One way is to use the rbootseq command to set the node's boot device to the network adapter, and then issue the rpower command to power on or reset the node so that it boots from the network. The other way is to run rnetboot against the node directly. Unlike rnetboot, the rbootseq/rpower method does not require console support and does not operate through the console, so it performs better. It is therefore recommended to use rbootseq/rpower to set the boot device to the network adapter and initiate the network boot in a Power 775 cluster.
Example of using rbootseq and rpower:
rbootseq service net
rpower service boot
Monitor the Installation
Watch the installation progress using either wcons or rcons:
wcons service # make sure DISPLAY is set to your X server/VNC or
rcons <one-node-at-a-time>
tail -f /var/log/messages
Note: We have seen one problem when installing a RHEL6 diskfull service node on SAS disks: the service node cannot reboot from the SAS disk after the RHEL6 operating system has been installed. We are waiting for a build with fixes from the RHEL6 team. If you hit this problem, manually select the SAS disk as the first boot device and boot from it.
Update Service Node Diskfull Image
If you need to update the service nodes later on with a new version of xCAT and its dependencies, obtain the new xCAT and xCAT dependencies rpms. (Follow the same steps that were followed in Setting Up a Linux Hierarchical Cluster#Set Up the Service Nodes for Diskfull Installation.)
Update the service nodes with the new xCAT rpms:
updatenode service -S
If you want to update the service nodes later on with a new release of Linux, follow the instructions in the Linux documentation. Alternatively, you can use the following procedure at your own risk. It is not fully tested and, according to the Linux documentation, not formally supported:
1. Run copycds on the management node to copy the installation image of the new Linux release, for example:
copycds /iso/RHEL6.2-20111117.0-Server-x86_64-DVD1.iso
2. Modify the os version of the service nodes, for example:
chdef <servicenode> os=rhels6.2
3. Run updatenode <servicenode> -P ospkgs to update the Linux on the service nodes to a new release:
updatenode <servicenode> -P ospkgs
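The three steps above can be collected into one script. This is only a sketch: the run and update_sn_os names are hypothetical, and by default the script prints the commands instead of executing them, so you can review them first. Set DRY_RUN=0 to actually run them on the management node.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical wrapper around the three steps above.
# Usage: update_sn_os <iso-path> <noderange> <osver>
update_sn_os() {
    run copycds "$1" &&
    run chdef "$2" "os=$3" &&
    run updatenode "$2" -P ospkgs
}

# Example (dry run):
# update_sn_os /iso/RHEL6.2-20111117.0-Server-x86_64-DVD1.iso service rhels6.2
```

Because the steps are chained with &&, a failure in copycds or chdef stops the sequence before updatenode is attempted.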
Setup the Service Node for Stateless Deployment
If you want, your service nodes can be stateless (diskless). The service node image must contain not only the OS, but also the xCAT software and its dependencies. In addition, a number of files are added to the service node to support PostgreSQL or MySQL database access from the service node to the Management Node, and ssh access to the nodes that the service node services. (DB2 is not supported on diskless Service Nodes.) The following sections explain how to accomplish this.
Build the Service Node Stateless Image
This section assumes you can build the stateless image on the management node because the service nodes are the same OS and architecture as the management node. If this is not the case, you need to build the image on a machine that matches the service node's OS/architecture.
- Check the service node packaging to see if it has all the rpms required. We ship a basic requirements list to bring up a service node, but you may want to add additional operating system packages.
cd /opt/xcat/share/xcat/netboot/fedora/
vi service.pkglist
vi service.exlist
Edit service.exlist and verify that nothing is excluded that you want on the service nodes.
While you are here, edit compute.pkglist and compute.exlist, adding and removing packages as necessary. Ensure that the pkglist contains bind-utils so that name resolution will work during boot.
Note: If you change these files, you should copy them to the /install/custom/netboot/fedora directory, so that the next install of xCAT will not wipe out your changes. The code will look in the /install/custom directory first.
- Add additional packages including xCAT and xCAT dependencies required on the Service Nodes. The Service Nodes require the same dependency packages needed for the Management Node.
Notice we are installing xCATsn, the servicenode meta package.
Use the otherpkgs function to install the additional software: (see https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Using_Updatenode)
First, go to the xCAT Download site, in the section titled RPMs in tarball - download tarball ..., and download the level of xCAT tarball you want. Then go to the xCAT Dependencies Download page and get the latest xCAT dependency tarball.
Download and copy the files to the Management Node (MN) and untar them:
mkdir -p /install/post/otherpkgs/fedora8/x86_64/xcat
cd /install/post/otherpkgs/fedora8/x86_64/xcat
tar jxvf core-rpms-snap.tar.bz2
tar jxvf xcat-dep-2*.tar.bz2
Next, add the rpm names to the service.<osver>.<arch>.otherpkgs.pkglist file. In most cases, this file already exists under the /opt/xcat/share/xcat/netboot/<platform> directory. If it does not, you can create your own, using the existing ones as a reference. The relationship between <platform> and <os>/<osver> is described in OS_versus_Platform.
vi /install/custom/netboot/fedora/service.fedora8.x86_64.otherpkgs.pkglist
Add the following:
xcat/xcat-core/xCATsn
xcat/xcat-dep/fedora8/x86_64/conserver
A leading '-' on an entry means remove that rpm before installing the other rpms. For SLES, the package to be removed is perl-doc.
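If you prefer not to edit the pkglist by hand, it can be generated. This is a minimal sketch that writes only the two entries listed above. The function name write_sn_otherpkgs is hypothetical, and the target path is whatever pkglist file you use for your service node image.

```shell
# Hypothetical helper: write the otherpkgs list for the service node image.
# Usage: write_sn_otherpkgs <pkglist-file> <osver> <arch>
write_sn_otherpkgs() {
    mkdir -p "$(dirname "$1")" || return 1
    # Entries are paths relative to the otherpkgs directory for this OS/arch.
    cat > "$1" <<EOF
xcat/xcat-core/xCATsn
xcat/xcat-dep/$2/$3/conserver
EOF
}

# Example:
# write_sn_otherpkgs \
#   /install/custom/netboot/fedora/service.fedora8.x86_64.otherpkgs.pkglist \
#   fedora8 x86_64
```

The function overwrites any existing file at that path, so add further packages after it has run.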
- Run image generation:
Note: we assume you have run copycds to place the OS image in /install/fedora8/...
rm -rf /install/netboot/fedora8/x86_64/service
cd /opt/xcat/share/xcat/netboot/fedora/
./genimage -i eth0 -n tg3,bnx2 -o fedora8 -p service
- For SLES 11, make sure that atftp rpm from xCAT is installed in the image not the atftp from the OS.
rpm --root /install/netboot/fedora8/x86_64/service/rootimg -qa | grep atftp
If the rpm is from the OS, remove it, copy the xCAT atftp rpm (atftp-0.7.1.*rpm) into the otherpkgs directory with the xCAT rpms as you did above, and run genimage again.
rpm --root /install/netboot/fedora8/x86_64/service/rootimg -e <atftp rpm> --nodeps
cp atftp-0.7.1.*rpm /install/post/otherpkgs/fedora8/x86_64
add the rpm name into:
/install/custom/netboot/fedora/service.otherpkgs.pkglist
Run genimage to pick up the additional package.
cd /opt/xcat/share/xcat/netboot/fedora/
./genimage -i eth0 -n tg3,bnx2 -o fedora8 -p service
- Prevent DHCP from starting up until xcatd has had a chance to configure it:
chroot /install/netboot/fedora8/x86_64/service/rootimg chkconfig dhcpd off
chroot /install/netboot/fedora8/x86_64/service/rootimg chkconfig dhcrelay off
- Edit fstab:
cd /install/netboot/fedora8/x86_64/service/rootimg/etc/
cp fstab fstab.ORIG
Put in fstab:
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
service_x86_64 / tmpfs rw 0 1
none /tmp tmpfs defaults,size=10m 0 2
none /var/tmp tmpfs defaults,size=10m 0 2
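The backup and the fstab contents from this step can be applied in one go. The following is a sketch under the assumption that the six entries above are exactly what you want in the image. The function name install_sn_fstab is hypothetical, and the entries are copied unchanged from this page.

```shell
# Hypothetical helper: back up and replace etc/fstab inside a root image.
# Usage: install_sn_fstab <rootimg-dir>
install_sn_fstab() {
    etc="$1/etc"
    [ -d "$etc" ] || { echo "no such directory: $etc" >&2; return 1; }
    # Keep the generated fstab for reference, as in the step above.
    if [ -f "$etc/fstab" ]; then
        cp "$etc/fstab" "$etc/fstab.ORIG" || return 1
    fi
    # Quoted heredoc delimiter: the text is written literally, no expansion.
    cat > "$etc/fstab" <<'EOF'
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
service_x86_64 / tmpfs rw 0 1
none /tmp tmpfs defaults,size=10m 0 2
none /var/tmp tmpfs defaults,size=10m 0 2
EOF
}

# Example:
# install_sn_fstab /install/netboot/fedora8/x86_64/service/rootimg
```

Note that running it a second time overwrites fstab.ORIG with the already modified file, so run it only once per generated image.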
- If using NFS hybrid mode, export /install read-only in the service node image:
cd /install/netboot/fedora8/x86_64/service/rootimg/etc
echo '/install *(ro,no_root_squash,sync,fsid=13)' >exports
- Pack the image
packimage -o fedora8 -p service -a x86_64
- Set the node status to ready for netboot
nodeset service netboot
- To diskless boot the service nodes
rnetboot service
- To diskless boot the service nodes in Power 775
Starting with xCAT 2.6, there are two ways to initiate a network boot in a Power 775 cluster. The first is to use the rbootseq command to set the node's boot device to the network adapter, and then issue the rpower command to power on or reset the node so that it boots from the network. The second is to run rnetboot against the node directly. Unlike rnetboot, the rbootseq/rpower method does not require console support and does not operate through the console, so it performs better. For this reason, rbootseq/rpower is the recommended way to initiate a network boot in a Power 775 cluster.
rbootseq service net
rpower service boot
Update Service Node Stateless Image
To update the xCAT software in the image at a later time:
yum --installroot=/install/netboot/fedora8/x86_64/service/rootimg update '*xCAT*'
packimage -o fedora8 -p service -a x86_64
Or use the updatenode otherpkgs function. See the previous section for downloading the xCAT tarballs.
Be sure to genimage and repack the image:
cd /opt/xcat/share/xcat/netboot/fedora/
./genimage -i eth0 -n tg3,bnx2 -o fedora8 -p service
packimage -o fedora8 -p service -a x86_64
Set the node status to ready for netboot
nodeset service netboot
To diskless boot the service nodes
rnetboot service
To diskless boot the service nodes in Power 775
Starting with xCAT 2.6, there are two ways to initiate a network boot in a Power 775 cluster. The first is to use the rbootseq command to set the node's boot device to the network adapter, and then issue the rpower command to power on or reset the node so that it boots from the network. The second is to run rnetboot against the node directly. Unlike rnetboot, the rbootseq/rpower method does not require console support and does not operate through the console, so it performs better. For this reason, rbootseq/rpower is the recommended way to initiate a network boot in a Power 775 cluster.
rbootseq service net
rpower service boot
Note: The service nodes are set up as NFS-root servers for the compute nodes. Any time changes are made to any compute image on the mgmt node it will be necessary to sync all changes to all service nodes. In our case the /install directory is mounted on the servicenodes, so the update to the compute node image is automatically available.
Monitor install and boot
wcons service # make sure DISPLAY is set to your X server/VNC or
rcons <one-node-at-a-time> # or do rcons for each node
tail -f /var/log/messages
Test Service Node installation
- ssh to the service nodes. You should not be prompted for a password.
- Check to see that the xcat daemon xcatd is running.
- Run a database command on the service node, e.g. tabdump site or nodels, and verify that the database can be accessed from the service node.
- Check that /install and /tftpboot are mounted on the service node from the Management Node, if appropriate.
- Make sure that the Service Node has name resolution for all the nodes it will service.
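The mount check in the list above is easy to script. The sketch below reads the output of the standard Linux mount command on stdin and reports whether each given mount point appears in it. The function name check_mounted is hypothetical. It only confirms that something is mounted at the path, not that it comes from the Management Node.

```shell
# Hypothetical helper: report whether each given mount point appears in
# `mount` output read from stdin. Returns non-zero if any is missing.
# Usage: mount | check_mounted /install /tftpboot
check_mounted() {
    mounts=$(cat)
    rc=0
    for mp in "$@"; do
        # `mount` prints lines of the form: <source> on <mountpoint> type <fs> (...)
        if printf '%s\n' "$mounts" | grep -qF -- " on $mp type "; then
            echo "$mp: mounted"
        else
            echo "$mp: NOT mounted"
            rc=1
        fi
    done
    return "$rc"
}
```

Run it on the service node itself, for example as mount | check_mounted /install /tftpboot, after confirming that ssh, xcatd, and tabdump site work.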
Additional Configuration of the Service Nodes for Power 775 Clusters Only
Switch xcat-yaboot to yaboot released by RHEL
Note: This is a workaround for the issue that xcat-yaboot is not working properly with netboot over HFI. After xcat-yaboot is rebuilt on top of yaboot version 1.3.17, this step is not required:
yum install yaboot
mv /tftpboot/yaboot /tftpboot/yaboot.back
cp /usr/lib/yaboot/yaboot /tftpboot/yaboot
Copy the HFI driver and DHCP packages to service node
xdcp c250f07c04ap01 -R /hfi /hfi
Install the HFI device drivers and HFI enabled dhcp server on the xCAT SN
xdsh c250f07c04ap01 rpm -ivh /hfi/dd/kernel-2.6.32-*.ppc64.rpm
xdsh c250f07c04ap01 rpm -ivh /hfi/dd/kernel-headers-2.6.32-*.ppc64.rpm --force
xdsh c250f07c04ap01 rpm -ivh /hfi/dd/hfi_util-*.el6.ppc64.rpm
xdsh c250f07c04ap01 rpm -ivh /hfi/dd/hfi_ndai-*.el6.ppc64.rpm
xdsh c250f07c04ap01 rpm -ivh /hfi/dhcp/net-tools-*.el6.ppc64.rpm --force
xdsh c250f07c04ap01 rpm -ivh /hfi/dhcp/dhcp-*.el6.ppc64.rpm --force
xdsh c250f07c04ap01 rpm -ivh /hfi/dhcp/dhclient-*.el6.ppc64.rpm
xdsh c250f07c04ap01 /sbin/new-kernel-pkg --mkinitrd --depmod --install 2.6.32-71.el6.20110617.ppc64
xdsh c250f07c04ap01 /sbin/new-kernel-pkg --rpmposttrans 2.6.32-71.el6.20110617.ppc64
Change yaboot to boot from customized kernel
Create soft links and change yaboot.conf to boot from the customized kernel with HFI support
xdsh c250f07c04ap01 ln -sf /boot/vmlinuz-2.6.32-71.el6.20110617.ppc64 /boot/vmlinuz
xdsh c250f07c04ap01 ln -sf /boot/System.map-2.6.32-71.el6.20110617.ppc64 /boot/System.map
Login to the service node and change the "default=" setting in /boot/etc/yaboot.conf to the new label with HFI support. For example, change it from:
boot=/dev/sda1
init-message="Welcome to Red Hat Enterprise Linux!\nHit <TAB> for boot options"
partition=3
timeout=5
install=/usr/lib/yaboot/yaboot
delay=5
enablecdboot
enableofboot
enablenetboot
nonvram
fstype=raw
default=linux
image=/vmlinuz-2.6.32hfi
label=2.6.32hfi
read-only
initrd=/initrd-2.6.32hfi.img
append="rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrhebsun16 KEYTABLE=us console=hvc0 crashkernel=auto rhgb quiet root=UUID=e2123609-7080-45f0-b583-23d5ef27dbba"
image=/vmlinuz-2.6.32-71.el6.ppc64
label=linux
read-only
initrd=/initramfs-2.6.32-71.el6.ppc64.img
append="rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrhebsun16 KEYTABLE=us console=hvc0 crashkernel=auto rhgb quiet root=UUID=e2123609-7080-45f0-b583-23d5ef27dbba"
to:
boot=/dev/sda1
init-message="Welcome to Red Hat Enterprise Linux!\nHit <TAB> for boot options"
partition=3
timeout=5
install=/usr/lib/yaboot/yaboot
delay=5
enablecdboot
enableofboot
enablenetboot
nonvram
fstype=raw
default=2.6.32-71.el6.20110617.ppc64
image=/vmlinuz-2.6.32-71.el6.20110617.ppc64
label=2.6.32-71.el6.20110617.ppc64
read-only
initrd=/initrd-2.6.32-71.el6.20110617.ppc64.img
append="rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrhebsun16 KEYTABLE=us console=hvc0 crashkernel=auto rhgb quiet root=UUID=e2123609-7080-45f0-b583-23d5ef27dbba"
image=/vmlinuz-2.6.32-71.el6.ppc64
label=linux
read-only
initrd=/initramfs-2.6.32-71.el6.ppc64.img
append="rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrhebsun16 KEYTABLE=us console=hvc0 crashkernel=auto rhgb quiet root=UUID=e2123609-7080-45f0-b583-23d5ef27dbba"
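If an image section with the HFI kernel label already exists in yaboot.conf, only the default= line needs to change, and that change can be scripted. The sketch below assumes exactly that: it changes default= and nothing else, and it refuses to act if no matching label= line is present. The function name set_yaboot_default is hypothetical.

```shell
# Hypothetical helper: point default= in a yaboot.conf at an existing label.
# Usage: set_yaboot_default <yaboot.conf> <label>
set_yaboot_default() {
    conf=$1
    label=$2
    # Refuse to set a default that has no matching "label=" line
    # (leading spaces or tabs before "label=" are ignored).
    awk -v want="label=$label" '
        { line = $0; sub(/^[ \t]+/, "", line); if (line == want) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$conf" || { echo "label '$label' not found in $conf" >&2; return 1; }
    tmp=$(mktemp) || return 1
    if sed "s|^default=.*|default=${label}|" "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"   # overwrite in place, keeping file permissions
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}

# Example, run on the service node after copying the file aside as a backup:
# set_yaboot_default /boot/etc/yaboot.conf 2.6.32-71.el6.20110617.ppc64
```

Checking for the label first avoids leaving the node with a default entry that yaboot cannot resolve at the next boot.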
Reset the service node to boot from the kernel with HFI
xdsh c250f07c04ap01 reboot
Sync /etc/hosts to Service Node and configure HFI interfaces
Now the HFI interfaces should be available on the service node. Sync the /etc/hosts from MN to SN and run the confighfi postscript to configure the HFI interfaces with IP addresses.
xdcp c250f07c04ap01 /etc/hosts /etc/hosts
chdef c250f07c04ap01 postscripts=confighfi
updatenode c250f07c04ap01
Create new network definition for HFI network
Create new network definitions in the xCAT database for the HFI interfaces. Generally there will be four HFI interfaces on the service nodes, so create one network entry for each.
mkdef -t network -o hfinet0 net=20.0.0.0 mask=255.0.0.0 gateway=20.7.4.1 mgtifname=hf0 dhcpserver=20.7.4.1 tftpserver=20.7.4.1 nameservers=20.7.4.1
mkdef -t network -o hfinet1 net=21.0.0.0 mask=255.0.0.0 gateway=21.7.4.1 mgtifname=hf1 dhcpserver=21.7.4.1 tftpserver=21.7.4.1 nameservers=21.7.4.1
mkdef -t network -o hfinet2 net=22.0.0.0 mask=255.0.0.0 gateway=22.7.4.1 mgtifname=hf2 dhcpserver=22.7.4.1 tftpserver=22.7.4.1 nameservers=22.7.4.1
mkdef -t network -o hfinet3 net=23.0.0.0 mask=255.0.0.0 gateway=23.7.4.1 mgtifname=hf3 dhcpserver=23.7.4.1 tftpserver=23.7.4.1 nameservers=23.7.4.1
where the above net, mask, and gateway are appropriate for your hf0,hf1,hf2 and hf3 network
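Since the four definitions differ only in the interface index, they can be generated in a loop. The sketch below reproduces the example addressing used on this page (network 2N.0.0.0 with 2N.7.4.1 as gateway and server for interface hfN), which is an assumption you would replace with your own values. The function name gen_hfi_networks is hypothetical, and it only prints the commands.

```shell
# Hypothetical helper: print one mkdef command per HFI interface (hf0..hf3),
# using the example addressing from this page.
gen_hfi_networks() {
    for i in 0 1 2 3; do
        ip="2$i.7.4.1"
        echo "mkdef -t network -o hfinet$i net=2$i.0.0.0 mask=255.0.0.0" \
             "gateway=$ip mgtifname=hf$i dhcpserver=$ip tftpserver=$ip nameservers=$ip"
    done
}

# Review the output, then run it on the management node, for example:
# gen_hfi_networks | sh
```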
Define and install your Compute Nodes
At this point you can return to the documentation for your cluster environment to define and deploy your compute nodes. For Power 775 Cluster, after the service node has been installed, you should create the connections between the hdwr_svr and non-sn_CEC, and then do other hardware control commands.
Note that all of the files and directories pointed to by your osimages should be placed under the directory referred to in site.installdir (usually /install), so they will be available to the service nodes. The installdir directory is mounted or copied to the service nodes during the hierarchical install of compute nodes from the service nodes.
Appendix A: Setup backup Service Nodes
For reliability, availability, and serviceability purposes you may wish to designate backup service nodes in your hierarchical cluster. The backup service node will be another active service node that is set up to easily take over from the original service node if a problem occurs. This is not an automatic fail over feature. You will have to initiate the switch from the primary service node to the backup manually. The xCAT support will handle most of the setup and transfer of the nodes to the new service node. This procedure can also be used to simply switch some compute nodes to a new service node, for example, for planned maintenance.
Abbreviations used below:
- MN - management node.
- SN - service node.
- CN - compute node.
Initial deployment
Integrate the following steps into the hierarchical deployment process described above.
- Make sure both the primary and backup service nodes are installed, configured, and can access the MN database.
- When defining the CNs add the necessary service node values to the "servicenode" and "xcatmaster" attributes of the node definitions.
- (Optional) Create an xCAT group for the nodes that are assigned to each SN. This will be useful when setting node attributes as well as providing an easy way to switch a set of nodes back to their original server.
To specify a backup service node you must specify a comma-separated list of two service nodes for the servicenode value of the compute node. The first one will be the primary and the second will be the backup (or new SN) for that node. Use the hostnames of the SNs as known by the MN.
For the xcatmaster value you should only include the primary SN, as known by the compute node.
In most hierarchical clusters, the SN's interface to the MN and its interface to the CNs are on different networks, so the name of the SN as known by the MN is different from the name as known by the CN.
In the following example assume the SN interface to the MN is on the "a" network and the interface to the CN is on the "b" network. To set the attributes you would run a command similar to the following.
chdef <noderange> servicenode="xcatsn1a,xcatsn2a" xcatmaster="xcatsn1b"
The process can be simplified by creating xCAT node groups to use as the <noderange> in the chdef command. To create an xCAT node group containing all the nodes that are served by one service node, you could run a command similar to the following.
mkdef -t group sn1group members=node[01-20]
Note: Normally backup service nodes are the primary SNs for other compute nodes. So, for example, if you have 2 SNs, configure half of the CNs to use the 1st SN as their primary SN, and the other half of CNs to use the 2nd SN as their primary SN. Then each SN would be configured to be the backup SN for the other half of CNs.
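For the two-SN arrangement described in the note, the pair of chdef commands mirror each other. The sketch below prints both commands so the symmetry is easy to check. The function name gen_sn_pair_chdef and the group name sn2group are hypothetical, and the SN names follow the "a"/"b" network naming of the earlier example.

```shell
# Hypothetical helper: print the chdef commands for two service nodes that
# act as each other's backup.
# Arguments: <groupA> <groupB> <sn1-as-known-by-MN> <sn2-as-known-by-MN>
#            <sn1-as-known-by-CNs> <sn2-as-known-by-CNs>
gen_sn_pair_chdef() {
    # Group A: sn1 is primary, sn2 is backup.
    echo "chdef $1 servicenode=\"$3,$4\" xcatmaster=\"$5\""
    # Group B: sn2 is primary, sn1 is backup.
    echo "chdef $2 servicenode=\"$4,$3\" xcatmaster=\"$6\""
}

# Example:
# gen_sn_pair_chdef sn1group sn2group xcatsn1a xcatsn2a xcatsn1b xcatsn2b
```

In each command the first name in servicenode and the value of xcatmaster refer to the same SN, seen from the MN network and the CN network respectively.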
When you run makedhcp, it will configure dhcp and tftp on both the primary and backup SNs, assuming they both have network access to the CNs. This will make it possible to do a quick SN takeover without having to wait for replication when you need to switch.
Synchronizing statelite persistent files
If you are using xCAT's "statelite" support, you may want to replicate your statelite files to the backup (or new) service node. This would be the case if you are using the service node as the server for the statelite persistent directory. In this case you need to copy your statelite files and directories to the backup service node and keep them synchronized over time. An easy and efficient way to do this would be to use the rsync command from the primary SN to the backup SN.
For example, to copy and/or update the /nodedata directory on the backup service node "sn2" you could run the following command on sn1:
rsync -auv /nodedata sn2:/
Note: The xCAT snmove command has a new option -l to synchronize statelite files from the primary service node to the backup service node, but it is currently only implemented on AIX.
See xCAT Linux Statelite for details on using the xCAT statelite support.
Monitoring the service nodes
In most cluster environments it is very important to monitor the state of the service nodes. If a SN fails for some reason you should switch nodes to the backup service node as soon as possible.
See Monitor_and_Recover_Service_Nodes#Monitoring_Service_Nodes for details on monitoring your service nodes.
Switch to the backup SN
When an SN fails, or you want to bring it down for maintenance, use this procedure to move its CNs over to the backup SN.
Move the nodes to the new service nodes
Use the xCAT snmove command to make the database updates necessary to move a set of nodes from one service node to another, and to make configuration modifications to the nodes.
For example, if you want to switch all the compute nodes that use service node "sn1" to the backup SN (sn2), run:
snmove -s sn1
Modified database attributes
The snmove command will check and set several node attribute values.
- servicenode:
- This will be set to either the second server name in the servicenode attribute list or the value provided on the command line.
- xcatmaster:
- Set with either the value provided on the command line or it will be automatically determined from the servicenode attribute.
- nfsserver:
- If the value is set with the source service node then it will be set to the destination service node.
- tftpserver:
- If the value is set with the source service node then it will be reset to the destination service node.
- monserver:
- If set to the source service node then reset it to the destination servicenode and xcatmaster values.
- conserver:
- If set to the source service node then reset it to the destination servicenode and run makeconservercf.
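The rule for the servicenode attribute can be illustrated in a few lines. This is not snmove's implementation, only a sketch of the selection described above: a destination given on the command line wins, otherwise the second name in the comma-separated list is used. The function name pick_dest_sn is hypothetical.

```shell
# Illustration only; this is not snmove's code.
# Usage: pick_dest_sn <servicenode-list> [destination-from-command-line]
pick_dest_sn() {
    if [ -n "${2:-}" ]; then
        echo "$2"
    else
        # Second name in the comma-separated list; -s prints nothing when
        # the list has no comma, i.e. no backup SN is defined.
        echo "$1" | cut -s -d, -f2
    fi
}

# Example:
# pick_dest_sn "xcatsn1a,xcatsn2a"
```

An empty result for a node means it has no backup listed, so a destination would have to be supplied explicitly when moving it.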
Run postscripts on the nodes
If the CNs are up at the time the snmove command is run, snmove will run postscripts on the CNs to reconfigure them for the new SN. The "syslog" postscript is always run. The "mkresolvconf" and "setupntp" scripts will be run if they were included in the node's postscript list.
You can also specify an additional list of postscripts to run.
Modify system configuration on the nodes
If the CNs are up the snmove command will also perform some configuration on the nodes such as setting the default gateway and modifying some configuration files used by xCAT.
Statelite migration
If you are using the xCAT statelite support you may need to modify the statelite and litetree tables. This would be necessary if any of the entries in the tables include the name of the primary service node as the server for the file or directory. In this case you would have to change those entries to the name of the backup service node. But a better solution is to use the variable $noderes.xcatmaster in the statelite and litetree tables. See xCAT Linux Statelite for details.
Boot the statelite nodes
For statelite nodes that do not use an external NFS server, if the original service node is down, the CNs it manages will be down too. You must run the nodeset command for those nodes and then boot the nodes after running snmove. For stateless nodes (and in some cases RAMDisk statelite nodes), the nodes will be up even if the original service node is down. However, make sure to run the nodeset command in case you need to reboot the nodes later.
Note: when moving p775 nodes, use the rbootseq and rpower commands to boot the nodes. If you do not use the rbootseq command, the nodes will still try to boot from the old SN.
Switching back
The process for switching nodes back will depend on what must be done to recover the original service node. If the SN needed to be reinstalled, you need to set it up as an SN again and make sure the CN images are replicated to it. Once you've done this, or if the SN's configuration was not lost, then follow these steps to move the CNs back to their original SN:
- Use snmove:
snmove sn1group -d sn1
- If these are statelite CNs:
- rsync persistent files back to the original SN
- run nodeset for these CNs
- boot the CNs
Appendix B: Diagnostics
- root ssh keys not set up -- If you are prompted for a password when you ssh to the service node, check whether /root/.ssh contains authorized_keys. If the directory does not exist or contains no keys, run xdsh service -K on the MN to exchange the ssh keys for root. You will be prompted for the root password, which should be the password you set for key=system in the passwd table.
- xCAT rpms not on SN -- On the SN, run rpm -qa | grep xCAT and make sure the appropriate xCAT rpms are installed on the service node. See the list of xCAT rpms in Set Up the Service Nodes for diskfull Installation. If rpms are missing, check your install setup as outlined in Build the Service Node Stateless Image for diskless installs or Set Up the Service Nodes for diskfull Installation for diskfull installs.
- otherpkgs (including xCAT rpms) installation failed on the SN -- The OS repository is not created on the SN. When yum resolves dependencies, the rpm packages required by xCATsn (including expect, nmap, and httpd) cannot be found. In this case, check whether the /install/postscripts/repos/<osver>/<arch>/ directory exists on the MN. If it does not, re-run the copycds command, which creates files under the /install/postscripts/repos/<osver>/<arch> directory on the MN. Then re-install the SN, and the issue should be resolved.
- Error finding the database/starting xcatd -- If you run tabdump site on the service node and get "Connection failure: IO::Socket::SSL: connect: Connection refused at /opt/xcat/lib/perl/xCAT/Client.pm", restart the xcatd daemon with service xcatd restart and try the command again. If it fails with the same error, check whether the /etc/xcat/cfgloc file exists. It should exist and be the same as /etc/xcat/cfgloc on the MN. If it is not there, copy it from the MN to the SN, then run service xcatd restart. A missing file indicates that the servicenode postscripts did not complete successfully. Check that your postscripts table was set up correctly in Add Service Nodes postscripts to the postscripts table.
- Error accessing database/starting xcatd credential failure -- If you run tabdump site on the service node and get "Connection failure: IO::Socket::SSL: SSL connect attempt failed because of handshake problemserror:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca at /opt/xcat/lib/perl/xCAT/Client.pm", check /etc/xcat/cert. The directory should contain the files ca.pem and server-cred.pem. These were supposed to be transferred from the MN /etc/xcat/cert directory during the install. Also check the /etc/xcat/ca directory. It should contain most of the files from the /etc/xcat/ca directory on the MN. You can manually copy them from the MN to the SN, recursively. Missing files indicate that the servicenode postscripts did not complete successfully. Check that your postscripts table was set up correctly in Add Service Nodes postscripts to the postscripts table. Then run service xcatd restart and try tabdump site again.
- Missing ssh hostkeys -- Check whether /etc/xcat/hostkeys on the SN has the same files as /etc/xcat/hostkeys on the MN. These are the ssh keys that will be installed on the compute nodes, so root can ssh between compute nodes without password prompting. If they are not there, copy them from the MN to the SN. Again, these should have been set up by the servicenode postscripts.
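Several of the checks above come down to asking whether a directory on the SN has the same files as on the MN. One way to compare is to produce a checksum listing on each side and diff the two. The sketch below uses only standard tools (find, cksum, sort). The function name dir_fingerprint is hypothetical, and it assumes file names without spaces.

```shell
# Hypothetical helper: print "checksum size path" for every regular file
# under a directory, sorted by path, so two listings can be compared.
# Usage: dir_fingerprint <directory>
dir_fingerprint() {
    # Subshell so the caller's working directory is not changed.
    ( cd "$1" && find . -type f -exec cksum {} + | sort -k3 )
}

# Example: run on the MN and on the SN, then diff the two listings.
# dir_fingerprint /etc/xcat/hostkeys > /tmp/hostkeys.list
```

The same listing works for /etc/xcat/cert, while /etc/xcat/ca is only expected to contain most of the MN's files, so differences there need to be read with that in mind.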
Appendix C: Migrating a Management Node to a Service Node
If you want to convert an existing Management Node to a Service Node, you need to work with the xCAT team. For now, the recommended approach is to back up your database, set up your new Management Server, and restore your database into it. Then take the old Management Node and remove xCAT, all xCAT directories, and your database. See Uninstalling xCAT, and then follow the process for setting up a SN as if it were a new node.
Appendix D: Set up Hierarchical Conserver
To allow you to open the rcons from the Management Node running the conserver daemon on the Service Nodes, do the following:
- Set nodehm.conserver to be the service node (using the ip that faces the management node)
- chdef <noderange> conserver=<servicenode as known by the MN>
- makeconservercf
- service conserver stop
- service conserver start
