Stateless

Generically speaking, stateless operation means that no changes made on a node ultimately persist in node-specific filesystem space. In a clustering context, this means that all nodes boot from a relatively small number of role-oriented images.

One approach, taken by an alternative implementation, is to modify the distribution to use a read-only root filesystem. This means that init scripts and many basic utilities must have their assumptions adjusted, particularly about the contents of /etc. Some third-party utilities may not work well in this situation, but for the core platform this is a fairly straightforward way for many hosts to share the same filesystem namespace.

A popular approach in clustering is to store the entire filesystem in memory and rely upon bulk shared storage for much of the data. xCAT 2.0 implements this method as the default because it requires the least effort on the part of the user. The filesystem is wiped on every reboot. Once a node has acquired its image, it is fundamentally independent and is not brought down by a boot server outage. In xCAT, the image is built entirely from distribution-provided pieces (unmodified kernel and utilities). The downside is that large amounts of RAM are required and that the images are necessarily sparse in functionality: even a fairly stripped-down image may consume on the order of 200 MB of memory.
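
To get a rough feel for where a figure like 200 MB comes from, the following Python sketch walks an unpacked image tree and adds up file sizes, rounding each file up to whole pages, since a RAM-backed filesystem stores file data a page at a time. It is an approximation under stated assumptions: the 4096-byte page size is an assumed default (typical on x86), inode and directory metadata are ignored, and the function name `ram_footprint` is hypothetical, not part of xCAT.

```python
import os
from typing import Tuple


def ram_footprint(root: str, page_size: int = 4096) -> Tuple[int, int]:
    """Return (raw_bytes, page_rounded_bytes) for regular files under root.

    Hypothetical helper: approximates the data pages a ramfs/tmpfs copy of
    the tree would occupy. Metadata is ignored and hard links are counted
    once per name.
    """
    raw = 0
    rounded = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Count regular files only; skip symlinks, sockets, device nodes.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            raw += size
            # Round up to a whole number of pages (integer arithmetic).
            rounded += (size + page_size - 1) // page_size * page_size
    return raw, rounded
```

Pointing it at an image root directory under /install/ gives both totals; the gap between them grows with the number of small files, which is one reason the in-memory footprint exceeds the sum of file sizes.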

A new approach in xCAT 2.0 is analogous to many live CD implementations. Memory is still used to host the filesystem, but in compressed squashfs form. Read-write access is provided through a union mount of that filesystem with a ramfs layered on top. Depending on content and page size, this can save up to an order of magnitude of space. Apart from the reduced footprint, it behaves like the uncompressed in-memory approach above. One downside is that an extra kernel module (aufs) must be compiled, although the distribution-provided kernel itself does not need to be modified. If files in the image are updated after boot, the updated files are held uncompressed in memory until a reboot retrieves the new compressed edition of the filesystem. Fairly stripped-down images can consume on the order of 31-50 MB of memory.
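
The copy-on-write behaviour of the union mount, including why updated files end up expanded in memory, can be modelled with a toy Python class. This is a hypothetical illustration, not aufs itself: `UnionFS` and its methods are invented names, plain dicts stand in for the squashfs image (lower layer) and the ramfs (upper layer), and compression is not modelled.

```python
from typing import Dict, Optional


class UnionFS:
    """Toy union mount: a read-only lower layer under a writable upper layer.

    `lower` stands in for the squashfs image and `upper` for the ramfs on
    top; both map paths to file contents.
    """

    def __init__(self, lower: Dict[str, bytes]) -> None:
        self.lower = dict(lower)  # never modified after construction
        self.upper: Dict[str, bytes] = {}

    def read(self, path: str) -> Optional[bytes]:
        # The upper layer shadows the lower one; otherwise fall through.
        if path in self.upper:
            return self.upper[path]
        return self.lower.get(path)

    def write(self, path: str, data: bytes) -> None:
        # Replacing a file stores its full new contents in the upper layer.
        self.upper[path] = data

    def append(self, path: str, data: bytes) -> None:
        # Copy-up: modifying a lower-layer file first copies all of it into RAM.
        current = self.read(path) or b""
        self.upper[path] = current + data

    def upper_bytes(self) -> int:
        """Bytes held uncompressed in the writable RAM layer."""
        return sum(len(v) for v in self.upper.values())
```

Reads reach the lower layer only when the upper layer has no copy, while any modification, even a one-byte append, copies the whole file up into RAM. That is why post-boot updates erode the compression savings until the next reboot.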

A variant of the squashfs approach above is an NFS hybrid. It has the lowest memory overhead of all (one example consumed 5 MB of memory in non-droppable cache) for a no-compromises image. Updates to the shared image propagate to every node's filesystem effectively instantly, without adding per-node memory overhead. It has two disadvantages: in most configurations the NFS server becomes a single point of failure for as long as the nodes are running, and the distribution kernel source must be patched for it to work.
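
The trade-off of the NFS hybrid can be sketched the same way: many nodes share one lower layer served by a single server, and each node keeps only its own writes in RAM. The classes below (`SharedImage`, `HybridNode`) are hypothetical, and the model deliberately omits client-side caching, so it shows the worst case in which every read of unmodified data must reach the server.

```python
from typing import Dict, Optional


class SharedImage:
    """Hypothetical stand-in for the read-only image exported by one server."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = dict(files)
        self.online = True

    def fetch(self, path: str) -> Optional[bytes]:
        if not self.online:
            raise ConnectionError("image server unreachable")
        return self.files.get(path)


class HybridNode:
    """A node whose only private state is a small RAM overlay of its writes."""

    def __init__(self, image: SharedImage) -> None:
        self.image = image
        self.overlay: Dict[str, bytes] = {}

    def read(self, path: str) -> Optional[bytes]:
        if path in self.overlay:
            return self.overlay[path]
        # Every miss goes to the shared server (no client cache in this model).
        return self.image.fetch(path)

    def write(self, path: str, data: bytes) -> None:
        self.overlay[path] = data
```

Changing `SharedImage.files` is immediately visible to every node, and setting `online = False` makes any read of unmodified data fail on all nodes at once, which is the single point of failure described above.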

With that introduction, stateless provisioning in xCAT consists of a few steps:

  • /opt/xcat/share/netboot/<platform>/genimage: This command should be run on a system matching the architecture and distribution of the intended image. It requires access to /install/ to read the source and write out the image. In most configurations, the management node is a reasonable place to do this.
  • packimage: This command is executed on your management server to prepare the image for netboot. This may include minor manipulations of the environment and, optionally, creating a compressed image with certain files excluded, or a token nfs file that signals NFS usage. This is the step where the normal, squashfs, or nfs method is chosen and the environment is customized for that usage pattern.
  • nodeset <noderange> netboot: This command points <noderange> at the image prepared by the previous two steps. As with installations, OSes and profiles are supported, so it is easy to prepare a set of profiles for multiple OS variants and rapidly provision between them.
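
The three steps above can be strung together as a small driver. This Python sketch uses only the command names and the `nodeset <noderange> netboot` form given in the list; the real genimage and packimage invocations need additional options (OS, architecture, profile, and the normal/squashfs/nfs choice) that are intentionally omitted here rather than guessed. The function names and the example values `rh` and `compute` are hypothetical. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List


def stateless_plan(platform: str, noderange: str) -> List[List[str]]:
    """Return the three stateless steps, in order, as argv lists.

    Hypothetical helper. genimage and packimage options are NOT filled in;
    consult their man pages before running this for real.
    """
    return [
        [f"/opt/xcat/share/netboot/{platform}/genimage"],  # build the image tree
        ["packimage"],  # prepare the image; the image method is chosen here
        ["nodeset", noderange, "netboot"],  # point the nodes at the image
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    for argv in plan:
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(argv, check=True)
```

Running the steps strictly in order, and stopping at the first failure, avoids pointing nodes at an image that was never packed.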
