[SSI] RFC: Etherboot/PXE to simplify installation and management
From: Brian J. W. <Bri...@co...> - 2001-11-09 03:10:53
In an SSI cluster, it should only be necessary to install software on a
single node. Most other nodes can be thin clients, using Etherboot or
PXE to load their kernel and ramdisk from the CLMS master. A potential
CLMS master node needs to have its kernel and ramdisk stored locally on
a SCSI or IDE disk, in case it's the first node booted in the cluster.
Even a potential CLMS master, however, can initially get its kernel and
ramdisk via Etherboot/PXE and install them onto its hard disk with
minimal sysadmin involvement.

Etherboot is an open-source software package for creating ROM images
that allow a computer to boot off the network using DHCP or BOOTP. For
those who cannot or will not flash their ROM with one of these images,
Etherboot includes a special boot block for loading the image from a
floppy or hard drive. Etherboot appears to support about a hundred
different NIC models. Unfortunately, it only supports the x86 platform
right now. For more information, visit the Etherboot website:

    http://etherboot.sourceforge.net/

PXE (Preboot Execution Environment) is an Intel specification for doing
pretty much the same thing. An advantage is that PXE images come
pre-loaded on certain NICs, but I suspect most PXE images are closed
source. To read Intel's PXE spec:

    ftp://download.intel.com/ial/wfm/pxespec.pdf

To support this new dependent node booting model, changes to initial
node installation would include:

- Making sure dhcpd and tftpd are installed as part of the base Linux
  distribution.

- Installing mknbi (part of Etherboot) on the shared root for building
  a tagged image of the kernel and ramdisk.

- Adding an /etc/ssitab file for specifying the MAC address, IP
  address, node number, and local boot flag for each node allowed to
  join the cluster. For each node with the local boot flag set, a
  device for the boot partition must also be specified. The local boot
  flag should only be set for potential CLMS master nodes on the x86
  platform.
  For platforms not supported by Etherboot/PXE, such as Alpha, _all_
  nodes should have the local boot flag set.

- Eliminating /etc/cluster.conf, which is obsoleted by /etc/ssitab.

- Installing a new mkdhcpd.ssi command that builds /etc/dhcpd.conf from
  the data in /etc/ssitab. To support non-SSI uses of DHCP, it copies
  anything it finds in /etc/dhcpd.proto before appending the generated
  lines.

- Installing a new lilo.ssi command that does the following:

  * reads /etc/lilo.conf and /etc/ssitab, and uses onnode and lilo to
    sync the default kernel and ramdisk out to all potential nodes
    that are up with the local boot flag set

  * runs mknbi to generate a tagged image of the default kernel and
    ramdisk in /tftpboot/, so that dependent nodes can download it
    while booting

In addition, changes will have to be made to the ramdisk, which means
changes to the mkinitrd.ssi script:

- Copy /etc/ssitab into the ramdisk.

- Enhance /linuxrc to match a local MAC address to an entry in
  /etc/ssitab to determine the local IP address and node number.

- If the local boot flag is set, then /linuxrc compares the default
  kernel and ramdisk on the shared root to those on the local disk. If
  they differ, it runs lilo.ssi with a special flag to just sync the
  local disk.

- The hack in VI.3 of the installation instructions will go away. Dave
  Zafman and I cooked up a scheme for /linuxrc to read /proc/partitions
  and make all the devices it finds there. That removes the need for
  the sysadmin to figure out the local device names of the two GFS
  partitions.

- As well as building the ramdisk, mkinitrd.ssi also runs mkdhcpd.ssi,
  since the sysadmin likely changed /etc/ssitab.

Adding new nodes -- this is the beautiful part:

- Make sure there are enough available journals for the new nodes on
  the GFS shared root. Note that the Cluster Filesystem (CFS) that Dave
  is porting doesn't have this requirement, which makes it better
  suited for large clusters.

- Edit /etc/ssitab to add records for each new node.
  The MAC address can be determined by booting the new node with an
  Etherboot floppy or ROM image. Although the DHCP server will not
  respond to this unknown MAC address just yet, the node will display
  on its console the MAC address of the card it discovered.

- Run mkinitrd.ssi to rebuild the SSI ramdisk and /etc/dhcpd.conf.

- Run lilo.ssi to distribute the new ramdisk to all nodes that are up
  with the local boot flag set, and to rebuild the tagged image in
  /tftpboot/.

- If a new node does not have the local boot flag set, just boot it
  with the appropriate Etherboot/PXE ROM image or floppy. Like magic,
  it'll join the cluster.

- If the local boot flag is set, and the platform is x86, boot it with
  the ROM image or floppy. While running /linuxrc, it'll sync the local
  disk if the boot partition has already been created.

- If the boot partition has not been created, /linuxrc will proceed
  with joining the cluster. Once it has joined, run fdisk and mkfs to
  set up the boot partition. Then reboot the node one more time with
  the ROM image or floppy, so it can sync the local disk the next time
  it joins.

- On a platform that does not support Etherboot/PXE, the PITA factor is
  a bit higher for adding a new node (which must have the local boot
  flag set). To avoid needless installation of the base OS, try booting
  off a distribution CD into rescue mode. Use fdisk and mkfs to set up
  the boot partition. Mount it. Either use a floppy or set up
  networking to copy the default kernel and ramdisk from the cluster to
  the boot partition. Also, copy the appropriate stanza for your
  bootloader (e.g., aboot), and run it to install the boot block. Now
  it's ready to join the cluster. Finally, consider adding support for
  your platform to Etherboot or an equivalent software package.

Some weaknesses in this proposal are support for non-x86 platforms, to
which I've given some thought, and support for User Mode Linux, to
which I've given very little thought.
There are probably other weaknesses, but overall I think this improves
the installation and management of OpenSSI on the x86 platform.

Suggestions are definitely welcome, especially since I haven't started
the implementation, yet. ;)

--
Brian Watson                | "Now I don't know, but I been told it's
Linux Kernel Developer      |  hard to run with the weight of gold,
Open SSI Clustering Project |  Other hand I heard it said, it's
Compaq Computer Corp        |  just as hard with the weight of lead."
Los Angeles, CA             |     -Robert Hunter, 1970

mailto:Bri...@co...
http://opensource.compaq.com/
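
For illustration only, the mkdhcpd.ssi step described in the post (copy
/etc/dhcpd.proto, then append lines generated from /etc/ssitab) could be
sketched roughly as below. The post does not define the /etc/ssitab
format, so everything about the layout here is an assumption: one node
per line, whitespace-separated fields in the order MAC, IP, node number,
local boot flag (1 or 0), optional boot device, with "#" starting a
comment. The names Node, parse_ssitab and render_dhcpd_conf are
hypothetical, and the "host nodeN" naming is made up for the example.
The generated stanzas use ordinary ISC dhcpd host declarations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One /etc/ssitab record (field layout is an assumption)."""
    mac: str
    ip: str
    number: int
    local_boot: bool
    boot_device: Optional[str] = None


def parse_ssitab(text: str) -> List[Node]:
    """Parse the assumed format: MAC IP NODE# FLAG [BOOTDEV]."""
    nodes = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"line {lineno}: expected at least 4 fields")
        mac, ip, number, flag = fields[:4]
        local_boot = flag == "1"
        device = fields[4] if len(fields) > 4 else None
        # The proposal requires a boot device whenever the flag is set.
        if local_boot and device is None:
            raise ValueError(f"line {lineno}: local boot flag set, no device")
        nodes.append(Node(mac.lower(), ip, int(number), local_boot, device))
    return nodes


def render_dhcpd_conf(nodes: List[Node], proto: str = "") -> str:
    """Copy the prototype text first, then append one host stanza per node."""
    parts = [proto.rstrip("\n")] if proto.strip() else []
    for n in nodes:
        parts.append(
            f"host node{n.number} {{\n"
            f"  hardware ethernet {n.mac};\n"
            f"  fixed-address {n.ip};\n"
            f"}}"
        )
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    # Made-up sample data, not taken from any real cluster.
    sample = (
        "# MAC               IP        node flag bootdev\n"
        "00:a0:c9:00:00:01   10.0.0.1  1    1    /dev/hda1\n"
        "00:a0:c9:00:00:02   10.0.0.2  2    0\n"
    )
    print(render_dhcpd_conf(parse_ssitab(sample)))
```

A real mkdhcpd.ssi would also need to point dependent nodes at the
tagged image in /tftpboot/; the image name is not given in the post, so
that part is left out of the sketch.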