Note: this is an xCAT design document, not an xCAT user document. If you are an xCAT user, you are welcome to glean information from this design, but be aware that it may not have complete or up to date procedures.
This document is the current design for Multiple zone support.
A new customer requirement is covered by this design.
The requirement is to be able to take an xCAT cluster managed by one xCAT Management Node and divide it into multiple zones. The nodes in each zone will share common root ssh keys. This allows the nodes in a zone to ssh to each other without a password, while preventing them from doing the same to any node in another zone. You might even call them secure zones.
Note: These zones share a common xCAT Management Node and database, including the site table, which defines the attributes of the entire cluster.
There will be no support for AIX.
The multiple zone support requires several enhancements to xCAT.
Currently xCAT replaces the root ssh keys that are generated at install time on the service nodes (SN) and compute nodes (CN) with the root ssh keys from the Management Node (MN). It also changes the ssh hostkeys on the SN and CN to a set of pre-generated hostkeys from the MN. Putting the public key in the authorized_keys file on the service nodes and compute nodes allows passwordless ssh to the SN and the CN from the MN. This setup also allows passwordless ssh between all compute nodes and service nodes. The pre-generated hostkeys make all nodes look the same to ssh, so you are never prompted for updates to known_hosts.
Having zones whose nodes cannot ssh without a password to nodes in other zones requires xCAT to generate a set of root ssh keys for each zone and install them on the compute nodes in that zone. In addition, the MN public key must still be put in the authorized_keys file on all the nodes in a non-hierarchical cluster, or the Service Node public key on all the nodes that it services in a hierarchical cluster.
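The rule above can be sketched in a few lines. This is only an illustration of the design, not xCAT code: the function name and its parameters are hypothetical, and keys are represented as plain strings.

```python
def authorized_keys_for_node(zone_pubkey, mn_pubkey, sn_pubkey=None):
    """Return the public keys to place in root's authorized_keys on a node.

    Hypothetical helper for illustration only.
    zone_pubkey: public key generated for the node's zone.
    mn_pubkey:   Management Node public key (non-hierarchical cluster).
    sn_pubkey:   public key of the Service Node servicing this node,
                 or None when the node is managed directly by the MN.
    """
    keys = []
    # The managing node must still reach the node without a password:
    # the SN key in a hierarchical cluster, otherwise the MN key.
    keys.append(sn_pubkey if sn_pubkey is not None else mn_pubkey)
    # The zone key lets nodes in the same zone reach each other.
    if zone_pubkey not in keys:
        keys.append(zone_pubkey)
    return keys
```

Because each zone has its own key pair, a node in one zone never holds a private key that matches an authorized_keys entry on a node in another zone.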
We will still use the MN root ssh keys on any devices, switches, and hardware control points. All ssh access to these is done from the MN or SN, so they will not be part of any zone.
The Management Node cannot be assigned to a zone. xdsh -K and updatenode -k are already not allowed against the Management Node.
Service nodes may be assigned to a zone. We need to document that if they are not in the same zone, they will not be able to ssh to each other without a password; service node pools will not work, for example.
To support multiple zones, we propose the following changes:
A new table zone will be created.
key: zone name
sshkeydir - directory containing root ssh RSA keys.
defaultzone - yes/1 or no/0
Note: If defaultzone is not set, the default is no. Only one zone may have defaultzone=yes, and there must be one default zone.
sshbetweennodes - yes/1 or no/0. The default is yes, which means we set up passwordless root ssh between nodes.
The nodelist table will have a new attribute:
zonename - this will be the name of the zone that the node is currently assigned to.
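As a rough sketch of the table semantics above, the following models a zone table row and the "exactly one default zone" rule. It is not the xCAT implementation: the class and function names and the example key directories are hypothetical. Treating an empty table as valid is an assumption, based on the upgrade behavior described later in this design.

```python
from dataclasses import dataclass

_TRUE = {"yes", "1"}
_FALSE = {"no", "0"}


def parse_flag(value, default):
    """Interpret a yes/1 or no/0 attribute; unset values fall back to default."""
    if value is None or str(value).strip() == "":
        return default
    v = str(value).strip().lower()
    if v in _TRUE:
        return True
    if v in _FALSE:
        return False
    raise ValueError("expected yes/1 or no/0, got %r" % (value,))


@dataclass
class ZoneRow:
    """One row of the proposed zone table (hypothetical model)."""
    zonename: str              # table key
    sshkeydir: str             # directory containing the root ssh RSA keys
    defaultzone: str = ""      # unset means no
    sshbetweennodes: str = ""  # unset means yes

    @property
    def is_default(self):
        return parse_flag(self.defaultzone, False)

    @property
    def allows_ssh_between_nodes(self):
        return parse_flag(self.sshbetweennodes, True)


def check_default_zone(rows):
    """Return a list of error strings; empty when the table is consistent."""
    if not rows:
        # Assumption: an empty zone table means zones are not in use.
        return []
    defaults = [r.zonename for r in rows if r.is_default]
    if len(defaults) == 1:
        return []
    if not defaults:
        return ["no default zone defined"]
    return ["more than one default zone: " + ", ".join(defaults)]
```

For example, a table with zone1 (defaultzone=yes) and zone2 (sshbetweennodes=0) passes the check, and zone2 is reported as not allowing ssh between its nodes.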
For this implementation we propose the following commands:
Note: These commands will be packaged in the xCAT-client rpm. They must run on the Linux Management Node. There will be no support for AIX in this release. I think checking that the MN is Linux is enough for now when the command runs. We will check whether any node in the noderange is AIX (mixed clusters).
mkzone will be used to do the following:
mkzone will have the following interface:
mkzone <zonename> [--defaultzone] [-k <full path to the ssh RSA private key>] [-a <noderange>] [-g] [-f] [-s <yes|no>] [-V]
mkzone <zonename>/.ssh.
If there is no currently defined default zone, an error will be reported. There must be one default zone in the zone table.
rmzone zonename [-g] [-f] [-V]
rmzone <zonename> on MN .
Note: rmzone checks for id_rsa and id_rsa.pub in the directory. If they are not there, it will not remove the directory. This is to make sure that the rm -rf on the directory does not remove some arbitrary directory, since the directory is taken from the zone table sshkeydir attribute. Also, if the directory is /root/.ssh, I will not remove the id_rsa or id_rsa.pub file. That will be left to the admin.
Note: This means that if there is a default zone still in the zone table, it will be used as the node's new zone. If the zone table is empty, then it goes back to using the <roothome>/.ssh keys.
Note: If -f is not used then you will get an error and the zone will not be removed. Removing the default zone means that any node that does not have a zonename defined will not have ssh keys assigned. The admin should define a new default zone using chzone or mkzone before removing the default.
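The directory safety check described in the rmzone notes above could look roughly like the sketch below. The function name and the root_ssh_dir parameter are hypothetical, and the choice to leave /root/.ssh entirely untouched is one reading of "that will be left to the admin".

```python
import os
import shutil


def remove_zone_keydir(sshkeydir, root_ssh_dir="/root/.ssh"):
    """Remove a zone key directory only when it is safe to do so.

    Returns True if the directory was removed, False if it was left alone.
    Hypothetical helper for illustration only.
    """
    priv = os.path.join(sshkeydir, "id_rsa")
    pub = os.path.join(sshkeydir, "id_rsa.pub")
    # Only directories that really hold a key pair are candidates, so a
    # bad sshkeydir value cannot cause an arbitrary directory to be removed.
    if not (os.path.isfile(priv) and os.path.isfile(pub)):
        return False
    # Never touch root's own ssh directory; that is left to the admin.
    if os.path.realpath(sshkeydir) == os.path.realpath(root_ssh_dir):
        return False
    shutil.rmtree(sshkeydir)
    return True
```

Comparing resolved paths rather than raw strings means a symlink or a trailing slash in sshkeydir does not bypass the /root/.ssh check.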
chzone zonename [-k <full path to the ssh private key>] [-K] [-a <noderange> | -r <noderange>] [-g] [-f] [--default] [-s <yes|no>] [-V]
chzone <noderange> entered, add nodes to zone
Note: When you add or remove nodes, the change will not take effect until the nodes are reinstalled or xdsh -K / updatenode -k is run to update the ssh keys.
With this implementation, existing xCAT commands can list the needed zone information.
This support affects several existing xCAT components:
Some of the issues discussed:
Hierarchy support
If a node is not defined in a zone, root ssh keys and passwords must work as they do today. This makes sure that an xCAT upgrade does not disrupt an existing xCAT installation. This should work with this design, because we come up with an empty zone table. As long as no zones are defined, we use the old code paths and the site table sshbetweennodes attribute. Once they start using zones, we no longer support the site table sshbetweennodes attribute. You will set the zone table sshbetweennodes attribute for each zone. The groups option supported in the site table sshbetweennodes attribute will not be supported in the zone table.
We would like to have all customers using a generated root ssh key. I think with this support documented, the mkzone command gives them the ability to switch from using the root/.ssh keys to a newly generated key. They can define their zone as all the compute nodes in their cluster. They can use the --defaultzone option. This leaves the change under their control.
We would need a new document on setting this type of cluster up and managing it. Hierarchy adds even more complexity.
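The upgrade-compatibility rule discussed above (empty zone table means the old code paths, otherwise the zone table decides) can be sketched as follows. The function name, the dictionary layout, and the "legacy" / "zone" / "unassigned" labels are all hypothetical; the site table value is passed through untouched.

```python
def resolve_key_source(node_zonename, zones, site_sshbetweennodes):
    """Decide which ssh key settings apply to a node.

    Hypothetical sketch for illustration only.
    node_zonename: the nodelist zonename attribute, or None/"" if unset.
    zones: dict of zonename -> {"defaultzone": bool, "sshbetweennodes": bool}.
    site_sshbetweennodes: the site table attribute, used only when no
        zones are defined.
    """
    if not zones:
        # Empty zone table: behave exactly as before zones existed.
        return {"mode": "legacy", "zone": None,
                "sshbetweennodes": site_sshbetweennodes}
    name = node_zonename
    if not name:
        defaults = [z for z, attrs in zones.items() if attrs.get("defaultzone")]
        if not defaults:
            # Zones are in use but there is no default zone: the node
            # has no ssh keys assigned.
            return {"mode": "unassigned", "zone": None,
                    "sshbetweennodes": None}
        name = defaults[0]
    if name not in zones:
        raise KeyError("node refers to undefined zone %r" % (name,))
    # With zones in use, only the zone table attribute is honored;
    # an unset value defaults to yes.
    return {"mode": "zone", "zone": name,
            "sshbetweennodes": zones[name].get("sshbetweennodes", True)}
```

A node without a zonename therefore lands in the default zone once zones exist, while a cluster that never defines a zone keeps its current behavior.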