[SSI] RFC: Etherboot/PXE to simplify installation and management

In an SSI cluster, it should only be necessary to install software 
on a single node. Most other nodes can be thin clients, using 
Etherboot or PXE to load their kernel and ramdisk from the 
CLMS master. A potential CLMS master node needs to have its kernel
and ramdisk stored locally on a SCSI or IDE disk, in case it's
the first node booted in the cluster. Even a potential CLMS master, 
however, can initially get its kernel and ramdisk via Etherboot/PXE 
and install them onto its hard disk with minimal sysadmin involvement.

Etherboot is an open-source software package for creating ROM images 
that allow a computer to boot off the network using DHCP or BOOTP. 
For those who cannot or will not flash their ROM with one of these
images, Etherboot includes a special boot block for loading the image
from a floppy or hard drive. Etherboot appears to support about
a hundred different NIC models. Unfortunately, it only supports
the x86 platform right now.

For more information, visit the Etherboot website:
        http://etherboot.sourceforge.net/

PXE (Preboot Execution Environment) is an Intel specification for
doing pretty much the same thing. An advantage is that PXE images
come pre-loaded on certain NICs, but I suspect most PXE images are
closed source.

To read Intel's PXE spec:
        ftp://download.intel.com/ial/wfm/pxespec.pdf

To support this new dependent node booting model, changes to initial 
node installation would include:
  - Making sure dhcpd and tftpd are installed as part of the base 
    Linux distribution.
  - Installing mknbi (part of Etherboot) on the shared root for 
    building a tagged image of the kernel and ramdisk.
  - Adding an /etc/ssitab file for specifying the MAC address, 
    IP address, node number, and local boot flag for each node
    allowed to join the cluster. For each node with the local boot
    flag set, a device for the boot partition must also be specified.
    The local boot flag should only be set for potential CLMS master 
    nodes on the x86 platform. For platforms not supported by 
    Etherboot/PXE, such as Alpha, _all_ nodes should have the local 
    boot flag set.
  - Eliminating /etc/cluster.conf, which is obsoleted by /etc/ssitab.
  - Installing a new mkdhcpd.ssi command that builds /etc/dhcpd.conf
    from the data in /etc/ssitab. To support non-SSI uses of DHCP,
    it copies anything it finds in /etc/dhcpd.proto before appending 
    the generated lines.
  - Installing a new lilo.ssi command that does the following:
      * reads /etc/lilo.conf and /etc/ssitab, and uses onnode and lilo 
        to sync the default kernel and ramdisk out to all nodes that 
        are up and have the local boot flag set
      * runs mknbi to generate a tagged image of the default kernel 
        and ramdisk in /tftpboot/, so that dependent nodes can 
        download it while booting

In addition, changes will have to be made to the ramdisk, which means
changes to the mkinitrd.ssi script:
  - Copy /etc/ssitab into the ramdisk.
  - Enhance /linuxrc to match a local MAC address to an entry in 
    /etc/ssitab to determine the local IP address and node number.
  - If the local boot flag is set, then /linuxrc compares the default
    kernel and ramdisk on the shared root to those on the local disk. 
    If they differ, it runs lilo.ssi with a special flag to just sync
    the local disk.
  - The hack in VI.3 of the installation instructions will go away. 
    Dave Zafman and I cooked up a scheme for /linuxrc to read 
    /proc/partitions and make all the devices it finds there.
    That removes the need for the sysadmin to figure out the local 
    device names of the two GFS partitions.
  - As well as building the ramdisk, mkinitrd.ssi also runs 
    mkdhcpd.ssi, since the sysadmin likely changed /etc/ssitab.
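
The /proc/partitions scheme might look roughly like the sketch below.
It is written in Python only to show the logic; the real /linuxrc
would presumably be a shell script or small C program, and the function
names here are invented for illustration. It assumes the first four
columns of /proc/partitions are major, minor, #blocks, and name, and it
only prints the mknod commands rather than running them.

```python
from typing import List, Tuple


def parse_partitions(text: str) -> List[Tuple[int, int, str]]:
    """Return (major, minor, name) for each device in /proc/partitions text."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the "major minor #blocks name" header and blank lines.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        devices.append((int(fields[0]), int(fields[1]), fields[3]))
    return devices


def mknod_commands(devices: List[Tuple[int, int, str]],
                   dev_dir: str = "/dev") -> List[str]:
    """Build one block-device mknod command per discovered device."""
    return [f"mknod {dev_dir}/{name} b {major} {minor}"
            for major, minor, name in devices]


if __name__ == "__main__":
    with open("/proc/partitions") as f:
        for cmd in mknod_commands(parse_partitions(f.read())):
            print(cmd)
```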

Adding new nodes -- this is the beautiful part:
  - Make sure there are enough available journals for the new nodes 
    on the GFS shared root. Note that the Cluster Filesystem (CFS) 
    that Dave is porting doesn't have this requirement, which makes 
    it better suited for large clusters.
  - Edit /etc/ssitab to add records for each new node. The MAC 
    address can be determined by booting the new node with an 
    Etherboot floppy or ROM image. Although the DHCP server will 
    not respond to this unknown MAC address just yet, the node will 
    display on its console the MAC address of the card it discovered.
  - Run mkinitrd.ssi to rebuild the SSI ramdisk and /etc/dhcpd.conf.
  - Run lilo.ssi to distribute the new ramdisk to all nodes that are
    up with the local boot flag set, and to rebuild the tagged image
    in /tftpboot/.
  - If a new node does not have the local boot flag set, just boot it
    with the appropriate Etherboot/PXE ROM image or floppy. Like magic,
    it'll join the cluster.
  - If the local boot flag is set, and the platform is x86, boot it 
    with the ROM image or floppy. While running /linuxrc, it'll sync 
    the local disk if the boot partition has already been created.
  - If the boot partition has not been created, /linuxrc will proceed
    with joining the cluster. Once it has joined, run fdisk and mkfs
    to set up the boot partition. Then reboot the node one more time 
    with the ROM image or floppy, so it can sync the local disk the 
    next time it joins.
  - On a platform that does not support Etherboot/PXE, the PITA factor
    is a bit higher for adding a new node (which must have the
    local boot flag set). To avoid needless installation of the base 
    OS, try booting off a distribution CD into rescue mode. Use fdisk
    and mkfs to set up the boot partition. Mount it. Either use a 
    floppy or set up networking to copy the default kernel and ramdisk 
    from the cluster to the boot partition. Also, copy the appropriate 
    stanza for your bootloader (e.g., aboot), and run it to install 
    the boot block. Now it's ready to join the cluster. Finally, 
    consider adding support for your platform to Etherboot or an
    equivalent software package.
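
The per-node cases above can be summarized as a small decision
function. This is only a restatement of the steps in this mail; the
function and parameter names are made up, and it assumes the ssitab,
mkinitrd.ssi, and lilo.ssi steps have already been done.

```python
from typing import List


def steps_for_new_node(local_boot: bool, netboot_supported: bool,
                       boot_partition_exists: bool) -> List[str]:
    """List the sysadmin steps for bringing one new node into the cluster."""
    netboot = "boot with the Etherboot/PXE ROM image or floppy"
    if not netboot_supported:
        # Platforms without Etherboot/PXE (e.g., Alpha) must boot locally.
        if not local_boot:
            raise ValueError("local boot flag is required on this platform")
        return [
            "boot a distribution CD into rescue mode",
            "run fdisk and mkfs to set up the boot partition, then mount it",
            "copy the default kernel and ramdisk from the cluster",
            "copy the bootloader stanza and install the boot block",
            "boot from the local disk to join the cluster",
        ]
    if not local_boot:
        return [netboot]  # thin client: nothing else to do
    if boot_partition_exists:
        return [netboot + " (/linuxrc syncs the local disk)"]
    return [
        netboot + " (joins without syncing)",
        "run fdisk and mkfs to set up the boot partition",
        netboot + " again (/linuxrc syncs the local disk)",
    ]
```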

Some weaknesses in this proposal are support for non-x86 platforms,
to which I've given some thought, and support for User Mode Linux,
to which I've given very little thought. There are probably other
weaknesses, but overall I think this improves the installation and 
management of OpenSSI on the x86 platform.

Suggestions are definitely welcome, especially since I haven't 
started the implementation yet. ;)

-- 
Brian Watson                | "Now I don't know, but I been told it's
Linux Kernel Developer      |  hard to run with the weight of gold,
Open SSI Clustering Project |  Other hand I heard it said, it's
Compaq Computer Corp        |  just as hard with the weight of lead."
Los Angeles, CA             |     -Robert Hunter, 1970

mailto:Bri...@co...
http://opensource.compaq.com/