Note: this is an xCAT design document, not an xCAT user document. If you are an xCAT user, you are welcome to glean information from this design, but be aware that it may not have complete or up to date procedures.
This design has been replaced by [Multiple_Zone_Support].
Two new customer requirements are covered by this design.
The first requirement is to be able to take an xCAT cluster managed by one xCAT Management Node and divide it into multiple subclusters. The nodes in each subcluster will share common ssh keys and a common root password. This allows the nodes in a subcluster to ssh to each other without a password, while preventing them from doing the same to any node in another subcluster.
Note: We are calling these subclusters because they share a common xCAT Management Node and database, including the site table, which defines the attributes of the entire cluster.
The second requirement is for mkvlan enhancements (TBD).
The multiple subcluster support requires several enhancements to xCAT.
Currently xCAT replaces the root ssh keys that are generated at install time on the service nodes (SN) and compute nodes (CN) with the root ssh keys from the Management Node (MN). It also replaces the ssh hostkeys on the SN and CN with a set of pre-generated hostkeys from the MN. Putting the MN public key in the authorized_keys file on the service nodes and compute nodes allows passwordless ssh to them from the MN. Because every node receives the same root keys, this setup also allows passwordless ssh between all compute nodes and service nodes. The pre-generated hostkeys make all nodes look the same to ssh, so you are never prompted for updates to known_hosts.
Having subclusters that cannot passwordless ssh to nodes in other subclusters requires xCAT to generate a set of root ssh keys for each subcluster and install them on the compute nodes in that subcluster. In addition, the MN public key must still be put in the authorized_keys file on the nodes in a non-hierarchical cluster, or the SN public key in a hierarchical cluster.
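As a rough illustration of the key layout described above, the following sketch assembles the authorized_keys content for a compute node in a subcluster. The function name, its parameters, and the example key strings are hypothetical and are not part of xCAT; it only assumes that a node accepts its own subcluster's public key plus the public key of whichever node manages it (MN, or SN in a hierarchical cluster).

```python
def build_authorized_keys(subcluster_pubkey, mn_pubkey, sn_pubkey=None):
    """Return authorized_keys content for a compute node in a subcluster.

    subcluster_pubkey: root public key shared by the nodes of one subcluster.
    mn_pubkey: Management Node root public key (non-hierarchical cluster).
    sn_pubkey: Service Node root public key; when given, the node is assumed
               to be managed through a service node (hierarchical cluster).
    All names here are illustrative assumptions, not xCAT APIs.
    """
    manager_key = sn_pubkey if sn_pubkey is not None else mn_pubkey
    lines = []
    for key in (subcluster_pubkey, manager_key):
        key = key.strip()
        # Avoid writing the same key twice if both inputs happen to match.
        if key and key not in lines:
            lines.append(key)
    return "\n".join(lines) + "\n"


# Example with placeholder key material (not real keys).
content = build_authorized_keys("ssh-rsa KEYA root@subA", "ssh-rsa KEYMN root@mn")
```

Because a node from another subcluster holds a different private key, its public key never appears in this file, which is what keeps cross-subcluster ssh from being passwordless.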
Question: How about the common ssh hostkeys? Should we generate a set of those for each subcluster?
We will still use the MN root ssh keys on any service nodes. Service Nodes would not be allowed to be a member of a subcluster.
Currently xCAT puts the root password on the node only during install. It is taken from the passwd table where key=system. The new subcluster support requires a unique password for each subcluster to be installed.
To support multiple subclusters, we propose the following changes:
A new table, Cluster, will be created.
key - subcluster name
password - root password for this subcluster
sshkeydir - directory containing root ssh RSA keys.
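To make the proposed table concrete, here is a minimal in-memory sketch of one row. The class name, the dictionary used as a stand-in for the database, and the example values are assumptions made for illustration; the design does not say how the password would be stored or protected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRow:
    """One row of the proposed Cluster table (attribute names from the design)."""
    subcluster: str  # table key: the subcluster name
    password: str    # root password for this subcluster
    sshkeydir: str   # directory containing the subcluster's root ssh RSA keys


# Hypothetical stand-in for the table, keyed by subcluster name.
cluster_table = {}


def put_row(row):
    """Insert or overwrite the row for row.subcluster."""
    cluster_table[row.subcluster] = row


put_row(ClusterRow("subA", "example-password", "/etc/xcat/sshkeys/subA"))
```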
For this implementation, we propose being able to make a subcluster and remove a subcluster, but not to move nodes from one subcluster to another. I think moving nodes is very complex and out of scope for 2.8.4. This can be debated.
mksubcluster will be used to do the following:
mksubcluster will have the following interface:
mksubcluster <noderange> -n <subclustername> [-k <full path to the ssh private key>]
Note: The command will prompt for the subcluster root password or take it from an environment variable.
It will do the following:
For each node in the noderange it will add to the nodelist.groups attribute, a new group by the subclustername.
If an ssh private key is supplied (-k), it will generate the matching ssh public key and store both in the /etc/xcat/sshkeys/<subclustername> directory.
If -k is not supplied, it will generate a set of root ssh keys for the subcluster and store them in /etc/xcat/sshkeys/<subclustername>.
It will create a cluster table entry with key=subclustername, cluster.password set to the input root password, and cluster.sshkeydir set to the directory containing the keys, /etc/xcat/sshkeys/<subclustername>.
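The mksubcluster steps listed above can be sketched roughly as follows. This is not the xCAT implementation: the tables are plain dictionaries, the function names are hypothetical, and the key file name id_rsa is an assumption. The sketch only builds the ssh-keygen command line (using the standard -t, -N, -f and -y options) instead of running it.

```python
import posixpath

SSHKEY_ROOT = "/etc/xcat/sshkeys"  # base directory named in the design


def add_group(groups, subcluster):
    """Append the subcluster name to a comma-separated nodelist.groups value."""
    current = [g for g in groups.split(",") if g]
    if subcluster not in current:
        current.append(subcluster)
    return ",".join(current)


def keygen_command(subcluster, private_key=None):
    """Return the ssh-keygen argv for the subcluster keys (not executed here)."""
    if private_key is not None:
        # -y prints the public key that matches an existing private key.
        return ["ssh-keygen", "-y", "-f", private_key]
    # Assumption: keys are stored as id_rsa / id_rsa.pub in the key directory.
    keyfile = posixpath.join(SSHKEY_ROOT, subcluster, "id_rsa")
    return ["ssh-keygen", "-t", "rsa", "-N", "", "-f", keyfile]


def mksubcluster(nodelist, cluster_table, nodes, subcluster, password,
                 private_key=None):
    """Apply the table updates and return the key command to run."""
    for node in nodes:
        nodelist[node] = add_group(nodelist.get(node, ""), subcluster)
    cluster_table[subcluster] = {
        "password": password,
        "sshkeydir": posixpath.join(SSHKEY_ROOT, subcluster),
    }
    return keygen_command(subcluster, private_key)


# Example with made-up nodes and groups.
nodelist = {"n1": "compute,all", "n2": ""}
cluster = {}
command = mksubcluster(nodelist, cluster, ["n1", "n2"], "subA", "example-password")
```

Returning the command rather than running it keeps the table updates easy to test in isolation; a real command would also need to create the directory and set restrictive permissions on the private key.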
rmsubcluster -n <subclustername>
rmsubcluster will be used to do the following:
chsubcluster -n <subclustername> [-p] [-k <full path to the ssh private key>] [-K] [-a <noderange>] [-r <noderange>]
Note: If the -p flag is used, the command will prompt for the password or take it from an environment variable.
chsubcluster will be used to do the following:
This support affects several existing xCAT components:
Some of the issues discussed:
If a node is not defined in a subcluster, root ssh keys and passwords must work as they do today. This ensures that an xCAT upgrade does not disrupt an existing xCAT cluster.
We would like to have all customers using a generated root ssh key even if they are not using subclusters. If a node is not defined in a subcluster, the key would be generated and stored in /etc/xcat/sshkeys/xcat (or maybe system). How could we migrate current customers without disruption?
We may need a new document on setting up and managing this type of cluster. Hierarchy adds even more complexity.
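The fallback rule discussed above (a node outside any subcluster keeps today's password and keys) could look something like the sketch below. The function, the dictionary layout, and the default key directory are assumptions for illustration; the default location in particular is only the one floated in the discussion and is not decided.

```python
def resolve_credentials(node_groups, cluster_table, system_password,
                        default_keydir):
    """Return (root password, ssh key directory) for one node.

    node_groups: the node's comma-separated nodelist.groups value.
    cluster_table: subcluster name -> {"password": ..., "sshkeydir": ...}.
    system_password: today's value from the passwd table (key=system).
    default_keydir: where the keys for nodes outside any subcluster live.
    """
    for group in node_groups.split(","):
        if group in cluster_table:
            row = cluster_table[group]
            return row["password"], row["sshkeydir"]
    # Not in any subcluster: behave as xCAT does today.
    return system_password, default_keydir


# Example data (all values are placeholders).
table = {"subA": {"password": "pw-subA", "sshkeydir": "/etc/xcat/sshkeys/subA"}}
DEFAULT_KEYDIR = "/etc/xcat/sshkeys/xcat"  # suggested above, not decided

in_subcluster = resolve_credentials("compute,subA", table, "pw-system", DEFAULT_KEYDIR)
standalone = resolve_credentials("compute,all", table, "pw-system", DEFAULT_KEYDIR)
```

Since the design does not allow a node to move between subclusters, the sketch simply takes the first matching group and does not try to handle a node listed in two subclusters.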
mkvlan enhancements are needed (TBD). Here are some comments about the current support.
Currently it only supports Cisco and some modules of BNT switches (EN4093, G8000, G8124, G8264, 8264E). To support more BNT modules, we need to update the OID table because each BNT module uses different OIDs for the same function (a very bad design by BNT). To support other switch vendors such as Juniper, a significant code change is needed because Juniper currently does not support the vlan function through its SNMP interface; we have to use its own libraries instead. This requires a framework change in our vlan code.
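One possible shape for such a framework change is sketched below: a common backend interface, an SNMP backend driven by a per-module OID table, and a separate backend for vendors that need their own library. Everything here is hypothetical, including the class names and the "example-model" name; the OID strings are placeholders and not real OIDs, and nothing is sent to a switch.

```python
from abc import ABC, abstractmethod


class VlanBackend(ABC):
    """Hypothetical interface that each switch family would implement."""

    @abstractmethod
    def plan_create_vlan(self, switch_name, model, vlan_id):
        """Return a tuple describing the operation; no switch is contacted."""


class SnmpBackend(VlanBackend):
    def __init__(self, oid_table):
        # oid_table maps module name -> {function name -> OID string}
        self.oid_table = oid_table

    def plan_create_vlan(self, switch_name, model, vlan_id):
        if model not in self.oid_table:
            raise KeyError("no OID entry for module " + model)
        oid = self.oid_table[model]["create_vlan"]
        return ("snmp", switch_name, oid, vlan_id)


class VendorLibraryBackend(VlanBackend):
    def plan_create_vlan(self, switch_name, model, vlan_id):
        # A real backend would call the vendor's own library here.
        return ("vendor-library", switch_name, None, vlan_id)


# Placeholder values only; these are NOT real OIDs.
OID_TABLE = {
    "EN4093": {"create_vlan": "OID-PLACEHOLDER-EN4093"},
    "G8264": {"create_vlan": "OID-PLACEHOLDER-G8264"},
}

BACKENDS = {"BNT": SnmpBackend(OID_TABLE), "Juniper": VendorLibraryBackend()}


def plan_create_vlan(vendor, switch_name, model, vlan_id):
    """Dispatch to the backend registered for the switch vendor."""
    return BACKENDS[vendor].plan_create_vlan(switch_name, model, vlan_id)


bnt_plan = plan_create_vlan("BNT", "switch1", "G8264", 100)
juniper_plan = plan_create_vlan("Juniper", "switch2", "example-model", 100)
```

With this split, supporting another BNT module would only mean adding a row to the OID table, while a non-SNMP vendor would get its own backend class.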