From: Larry B. <ba...@us...> - 2002-08-08 23:11:26
This supersedes the posting I made on July 25, 2002. I have worked more on the support for NFS locking on the slave nodes. While I was at it, I made some additional fixes and enhancements to the beoboot slave node setup process.

This fixes most of the problems I had running the ANL MPICH IO validation suite. However, I still occasionally have problems with parallel IO -- between the master and a slave node, usually. Also, it seems to matter whether the test file exists before the test is run. I am suspicious of the MPICH code. (E.g., errno comes back with a non-zero value, even though an MPI call returns MPI_SUCCESS.)

One problem I have seen with Clustermatic/bproc that I don't understand: sometimes input redirection (< or <<) results in an empty input stream. For example, "cat <file | bpsh -n 0 cat" will echo nothing, even though "cat <file" works fine. I found this out because my previous submission created a zero-length slave node /etc/nsswitch.conf file when I added more lines to the <<EOF input stream in /usr/lib/beoboot/bin/node_up. That's why I now use a bpcp option to copy /etc/beowulf/nsswitch.conf. If someone can explain what this is symptomatic of, I'd like to know how to fix it or avoid it.

I am using the Clustermatic CD image distribution of March 2002.

Larry Baker
US Geological Survey

Steps in slave node file system setup:

  Create directories/soft links specified in /etc/beowulf/config (mkdir option).
  As specified in /etc/beowulf/fstab[.$NODE]:
    Load kernel modules for all file system types.
    Create device nodes for all local file systems.
    Mount all local and "nolock" network file systems without "noauto".
    Create cooked version in /etc/fstab on slave node.
  Copy files specified in /etc/beowulf/config (bpcp option).
  Create a default slave node /etc/nsswitch.conf file, if none exists.
  If there are NFS file systems without the "nolock" option:
    Create the statd database directories.
    Start the portmapper and statd daemons.
    Complete any deferred NFS mounts.
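The last step above hinges on one test: does the cooked fstab contain any NFS entry without "nolock"? Here is a minimal stand-alone sketch of that test, for trying the logic on the master without bpsh. The function name needs_lockd is made up for this illustration and is not part of beoboot. Unlike the grep-based test in node_up, it compares the file system type exactly and matches "nolock" as a whole comma-separated option.

```shell
#!/bin/bash
# needs_lockd (hypothetical helper, not part of beoboot):
# read fstab text on stdin; return 0 if at least one NFS entry lacks the
# "nolock" option, i.e., the portmapper and statd would be needed.
needs_lockd() {
    local device mountpt fstype options rest
    while read -r device mountpt fstype options rest ; do
        # Skip blank lines and comments
        case "$device" in
            ""|\#*) continue ;;
        esac
        if [ "$fstype" = "nfs" ] ; then
            case ",$options," in
                *,nolock,*) ;;      # locking disabled, no daemons needed
                *) return 0 ;;      # this mount needs portmap + statd
            esac
        fi
    done
    return 1
}

# Example: the second entry has no "nolock", so the daemons are needed.
if needs_lockd <<'EOF'
$MASTER:/usr  /usr  nfs ro,nolock,rsize=8192 0 0
$MASTER:/home /home nfs rw,rsize=8192,wsize=8192,noac 0 0
EOF
then
    echo "lock daemons needed"
else
    echo "no lock daemons needed"
fi
```

The exact match on the option name is a design choice of this sketch only; it avoids a false hit on an option that merely contains the string "nolock".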
Summary of file changes:

/etc/beowulf/config
  Add mkdir option to create directories and soft links (no more hard-coded directories in /usr/lib/beoboot/bin/node_up).
  Add bpcp option to copy files to slave node.

/etc/beowulf/fstab
  Add NODE to list of variables that will get substituted.
  "noauto" option is now honored.
  Cooked version is created as slave node /etc/fstab.

/etc/beowulf/nsswitch.conf (new file)
  Name Server Switch configuration file to add local passwd, group, and rpc files to NSS search lists. (See bpcp entry in /etc/beowulf/config.)

/etc/beowulf/node_up
  Define NODE, MASTER, and PATH variables.

/usr/lib/beoboot/bin/node_up (#--- 1.17.1 --- brackets changes)
  Remove hard-coded creation of slave node /dev, /etc, /tmp and /scratch directories.
  Copy configuration files (bpcp option in config).
  Conditionally create a default slave node /etc/nsswitch.conf.
  If there are any NFS mounts in the slave node /etc/fstab without the "nolock" option: create the slave node statd database files, start the portmap and rpc.statd daemons, and "mount -a -t nfs".

/usr/lib/beoboot/bin/setup_fs (#--- 1.4.1 --- brackets changes)
  Create default directories/soft links (mkdir option in config).
  Only tar device nodes that begin with "/dev".
  Always load file system kernel modules.
  Don't mount file systems with "noauto" option.
  Defer network mounts without "nolock" option until the portmapper and status daemons are running (completed in node_up).
  Add support for ext3 file systems.
  Create cooked version of /etc/beowulf/fstab[.$NODE] in slave node /etc/fstab.
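To make "cooked version" concrete, here is a small sketch of the substitution applied to one fstab line. It uses the same eval/echo idiom as setup_fs, but the function name cook_line and its argument order are made up for this illustration. Because it relies on eval, it is only safe on trusted input such as the master's own /etc/beowulf/fstab.

```shell
#!/bin/bash
# cook_line (hypothetical helper): expand $MASTER and $NODE in one fstab line.
# Usage: cook_line master_ip node_number 'raw fstab line'
cook_line() {
    local MASTER=$1 NODE=$2 line=$3
    # eval re-parses the line so that $MASTER and $NODE are expanded;
    # runs of white space between fields collapse to single spaces.
    eval echo "$line"
}

cook_line 192.168.50.209 0 \
    '$MASTER:/var/node.$NODE /var nfs rw,nolock,rsize=8192,wsize=8192 0 0'
# prints: 192.168.50.209:/var/node.0 /var nfs rw,nolock,rsize=8192,wsize=8192 0 0
```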
Below are the files I have modified/use (watch out for extra e-mail line breaks):

/etc/exports                   The NFS file systems exported by the master
/etc/beowulf/config            The bproc/beoboot configuration file
/etc/beowulf/fstab             The file systems file for the nodes
/etc/beowulf/nsswitch.conf     The Name Server Switch configuration file for the nodes
/etc/beowulf/node_up           The beoboot stub node startup script
/usr/lib/beoboot/bin/node_up   The beoboot node startup script
/usr/lib/beoboot/bin/setup_fs  The beoboot node file system setup script

After rebooting, this is what /var/log/beowulf/node.0 looks like:

node_up: Setting system clock.
node_up: Configuring loopback interface.
setup_fs: Configuring node filesystems...
setup_fs: mkdir -p /dev
setup_fs: mkdir -p /etc
setup_fs: ln -s /var/tmp /tmp
setup_fs: ln -s /home/node.0 /scratch
setup_fs: Using /etc/beowulf/fstab.
setup_fs: Checking 192.168.50.209:/bin (type=nfs)...
setup_fs: Mounting 192.168.50.209:/bin on /rootfs/bin... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/home (type=nfs)...
setup_fs: Mounting 192.168.50.209:/home on /rootfs/home... (type=nfs; options=rw,rsize=8192,wsize=8192,noac)
setup_fs: Mount deferred until lock daemon running.
setup_fs: Checking 192.168.50.209:/opt (type=nfs)...
setup_fs: Mounting 192.168.50.209:/opt on /rootfs/opt... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/sbin (type=nfs)...
setup_fs: Mounting 192.168.50.209:/sbin on /rootfs/sbin... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/usr (type=nfs)...
setup_fs: Mounting 192.168.50.209:/usr on /rootfs/usr... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/var/node.0 (type=nfs)...
setup_fs: Mounting 192.168.50.209:/var/node.0 on /rootfs/var... (type=nfs; options=rw,nolock,rsize=8192,wsize=8192)
setup_fs: Checking none (type=proc)...
setup_fs: Mounting none on /rootfs/proc... (type=proc; options=defaults)
setup_fs: Checking none (type=devpts)...
setup_fs: Mounting none on /rootfs/dev/pts... (type=devpts; options=gid=5,mode=620)
node_up: Copying over device nodes.
node_up: Copying over time zone info.
node_up: Copying /etc/{passwd,group,rpc} /etc/beowulf/nsswitch.conf to 0:/etc.
node_up: Starting the RPC portmapper and status daemon.
node_up: Completing deferred NFS mounts.
node_up: Node setup finished.

---------- /etc/exports ----------
#
# /etc/exports
#
# Read-only exports
#
/bin    192.168.50.209/255.255.255.224(ro)
/opt    192.168.50.209/255.255.255.224(ro)
/sbin   192.168.50.209/255.255.255.224(ro)
/usr    192.168.50.209/255.255.255.224(ro)
#
# Private read-write exports
#
/var/node.0     192.168.50.210(rw,no_root_squash)
/var/node.1     192.168.50.211(rw,no_root_squash)
#
# Shared read-write exports (MPICH 1.2.4, section 4.11.1: use "noac")
#
/home   130.118.45.45/255.255.252.0(rw) \
        192.168.50.209/255.255.255.224(rw,no_root_squash)

---------- /etc/beowulf/config ----------
#
# /etc/beowulf/config
#
# Sample Beowulf Configuration file
#
# $Id: config,v 1.7 2002/03/12 20:54:58 hendriks Exp $
# $Id: config,v 1.7.1 2002/08/05 L. M. Baker $
#
#
# Default cluster configuration (uses eth1, and 192.168.1.0/24)
# interface: internal cluster interface (the one connected to the nodes)
#
# iprange: range of IP addresses for nodes.
interface eth1 192.168.50.209 255.255.255.224

# Setup addresses in the cluster. The "nodes" line is REQUIRED here to specify
# cluster size. "iprange" and "ip" assign addresses to nodes. The "0" in
# iprange here tells it to start assigning at node zero.
nodes 2
iprange 0 192.168.50.210 192.168.50.211

# Default libraries (These are the libraries which will automagically be made
# available to the slaves.)
# No line continuation; multiple lines are concatenated.
libraries /lib /usr/lib /usr/X11R6/lib
libraries /opt/intel/compiler60/ia32/lib /opt/intel/mkl/lib/32

# Default directories. Syntax: mkdir { [ { -m mode | -s target } ] name } ...
# $NODE is slave node no. No line continuation; multiple lines are
# concatenated.
# /dev and /etc are required.
mkdir /dev /etc
# Useful (local) temporary and scratch directories.
#mkdir -m 1777 /tmp -m 1777 /scratch
# Use NFS for /tmp and /scratch.
# (NFS exports for /var and /home must be no_root_squash.)
mkdir -s /var/tmp /tmp -s /home/node.$NODE /scratch

# Optional bpcp file copy commands, executed one line at a time.
# Syntax: bpcp [ options ] from ... to. Do not specify slave node no. -- the
# destination is automatically translated to $NODE:to. $NODE is slave node no.
# Enable the following line for NFS file locking support.
bpcp /etc/{passwd,group,rpc} /etc/beowulf/nsswitch.conf /etc

# Default file system policies.
fsck full
mkfs if_needed

# Default location of boot images
bootfile /var/beowulf/boot.img
kernelimage /boot/vmlinuz-2.4.18-lanl.16
kernelcommandline apm=power-off

# Here we assign MAC addresses to nodes. Nodes can have multiple MAC
# addresses. Here the optional "0" zero argument states that the address
# should be assigned to node zero. Node lines following that will assign
# addresses to nodes sequentially
# Onboard RealTek RTL8100BL chip
node 0 00:40:63:c0:5e:08
node 00:40:63:c0:5f:b4

---------- /etc/beowulf/fstab ----------
#
# /etc/beowulf/fstab
#
# This file is the fstab for nodes.
# One difference is that we allow for shell variable expansions...
#
# Variables that will get substituted:
# MASTER = IP address of the master node. (good for doing NFS mounts)
# NODE = slave's node no.
# RAMDISK = device name (/dev/<ramdev>) of a device suitable for a root fs
#
# A cooked version (with variable substitution) of this file will be copied
# to /etc/fstab on the slave node.
#
# The root file system is a tmpfs provided by the boot scripts. You
# can mount something on / if you'd like but due to oddities in the file
# caching code it's not recommended right now.
# This is the default setup from beofdisk, once you setup your disks.
#/dev/hda2 swap swap defaults 0 0
#/dev/hda3 / ext2 defaults 0 0

# These should always be added
none /proc proc defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0

# NFS (for example and default friendliness)
# Note: Mounts without the "nolock" option are deferred until the RPC portmapper
# and status daemons are running -- see /usr/lib/beoboot/bin/{node_up,setup_fs}.
#
# Read-only mount points
#
$MASTER:/bin /bin nfs ro,nolock,rsize=8192 0 0
$MASTER:/opt /opt nfs ro,nolock,rsize=8192 0 0
$MASTER:/sbin /sbin nfs ro,nolock,rsize=8192 0 0
$MASTER:/usr /usr nfs ro,nolock,rsize=8192 0 0
#
# Private read-write mount points
#
$MASTER:/var/node.$NODE /var nfs rw,nolock,rsize=8192,wsize=8192 0 0
#
# Shared read-write mount points (MPICH 1.2.4, section 4.11.1: use "noac")
#
$MASTER:/home /home nfs rw,rsize=8192,wsize=8192,noac 0 0

---------- /etc/beowulf/nsswitch.conf ----------
#
# /etc/beowulf/nsswitch.conf
#
hosts: bproc
passwd: bproc files
group: bproc files
rpc: files

---------- /etc/beowulf/node_up ----------
#!/bin/sh
#
# /etc/beowulf/node_up
#
# This shell script is called automatically by BProc to perform any
# steps necessary to bring up the nodes. This is just a stub script
# pointing to the real script

NODE=$1
MASTER=`bpstat -a master`
BINDIR=/usr/lib/beoboot/bin
PATH=$BINDIR:/sbin:/usr/sbin:$PATH

$BINDIR/node_up $* || exit 1

# Clean out /tmp every boot
bpsh -n $NODE rm -r -f /tmp/*
bpsh -n $NODE rm -r -f /tmp/.* 2>/dev/null  # Ignore rm errors

exit 0

---------- /usr/lib/beoboot/bin/node_up ----------
#!/bin/sh
#--- 1.17.1 ---
#
# /usr/lib/beoboot/bin/node_up
#
#--- 1.17.1 ---
#---------------------------------------------------------------------
# Erik Arjan Hendriks <hen...@la...>
# Copyright (C) 2000 Scyld Computing Corporation
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# $Id: node_up,v 1.17 2002/01/04 00:39:59 hendriks Exp $
# $Id: node_up,v 1.17.1 2002/08/05 L. M. Baker $
#---------------------------------------------------------------------
umask 022  # Default umask for this stuff.
cd /

# Argument sanity checking
if [ "$1" = "" ] ; then
    echo "Usage: node_up <nodenumber>"
    exit 1
fi

NODE=$1
CONFIG=/etc/beowulf/config
BINDIR=/usr/lib/beoboot/bin
#--- 1.17.1 ---
PATH=$BINDIR:/sbin:/usr/sbin:$PATH
# Standard location of statd database files
SMDIR=/var/lib/nfs
# Location of statd database files on Red Hat Linux
if [ -f /etc/redhat-release ] ; then
    SMDIR=$SMDIR/statd
fi
#--- 1.17.1 ---

#--- 1.17.1 ---
# Usage: do_bpcp node [ options ] from [ ... ] to
do_bpcp() {
    if [ -z "$1" ] ; then
        return
    fi
    local NODE=$1
    shift
    local OPTS=
    while [ "${1:0:1}" = "-" ] ; do
        local OPTS="$OPTS $1"
        shift
    done
    local NFILES=$(( $# - 1 ))
    if [ $NFILES -lt 1 ] ; then
        return 1
    fi
    local FILES=
    for (( i = $NFILES ; i ; i-- )) ; do
        local FILES="$FILES $1"
        shift
    done
    echo "node_up: Copying$FILES to $NODE:$1."
    eval bpcp $OPTS $FILES $NODE:/rootfs$1
}
#--- 1.17.1 ---

# Usage: beoconfig tag [config_file]
beoconfig() {
    local FILE=$2
    if [ -z "$FILE" ] ; then FILE=${CONFIG} ; fi
    if [ ! -f ${FILE} ] ; then
        echo "Warning: ${FILE} file not found." >&2
        return
    fi
    # These sed bits:
    # - strip spaces
    # - strip leading + trailing space
    # - if line starts with $1, strip off $1 and print it.
    sed -ne "s/#.*//" < ${FILE} \
        -e "s/^[[:space:]]\+//;s/[[:space:]]\+\$//" \
        -e "/^$1[[:space:]]/{s/^$1[[:space:]]\+//;p;}"
}

die() {
    if [ -n "$1" ] ; then
        echo 1>&2 "$1"
    fi
    if [ -n "$2" ] ; then
        echo 1>&2 "Fatal error performing: $*"
    fi
    if [ -n "$MOUNTED" ] ; then
        umount $INITRD_BUILD
        rmdir $INITRD_BUILD
    fi
    exit 1
}

run_cmd() {
    eval "$*" || die "" "$*"
}

# A message for the console on the remote end.
bpsh $NODE --stdout /dev/console \
    echo -e "node_up: This is node $NODE.\nnode_up: boot log available in /var/log/beowulf/node.$NODE on the master."

#---------------------------------------------------------------------
# First things first... set the system clock
echo "node_up: Setting system clock."
run_cmd $BINDIR/bdate $NODE

# mapping of ram devices at this point.
# /dev/ram0 <- initrd goes here

#run_cmd bpsh $NODE mount -nt proc none /proc

# XXX We need a way to figure out what interface is up at this point
# so that we know which one to slap a netmask onto.
#--- 1.17.1 ---
#echo "node_up: TODO set interface netmask."
#--- 1.17.1 ---

# ... and kick on that loop back interface
echo "node_up: Configuring loopback interface."
run_cmd bpsh $NODE ifconfig lo 127.0.0.1 netmask 255.0.0.0
run_cmd bpsh $NODE route add -net 127.0.0.0 netmask 255.0.0.0 lo

#---------------------------------------------------------------------
# Kernel Modules
#
# We should probably pay attention to "insmod" lines in the config
# file here...
KVER=`bpsh $NODE uname -r`  # Make note of the remote kernel version
for module in `$BINDIR/pcilookup $NODE`; do
    modprobe --node $NODE $module
done

#---------------------------------------------------------------------
# File Systems
#
# We need a way for setup_fs to let us know where the root filesystem
# is mounted...
$BINDIR/setup_fs $NODE || exit 1

# Populate it ?

# Setup scratch and tmp space...
#--- 1.17.1 ---
#run_cmd bpsh $NODE mkdir -p /rootfs/{tmp,scratch}
#run_cmd bpsh $NODE chmod 1777 /rootfs/{tmp,scratch}
#--- 1.17.1 ---

bplib -l | bpsh $NODE bplib -a -
#$BINDIR/setup_libs $NODE /rootfs || exit 1

# Copy over device nodes from the front end.
#--- 1.17.1 ---
#echo "node_up: Populating /dev and /etc."
#run_cmd bpsh $NODE mkdir -p /rootfs/{dev,etc}
#--- 1.17.1 ---
echo "node_up: Copying over device nodes."
run_cmd bpsh $NODE mkdir -p /rootfs/dev
#find /dev -mount -type b -o -type c | \
#    sed -e 's!^/!!' | tar cf - -T - | bpsh $NODE tar -C /rootfs -xf -
DEVLIST="console zero null"
tar -C /dev -cf - $DEVLIST | bpsh $NODE tar -C /rootfs/dev -xf -
[ "$?" = "0" ] || die "" "copying device nodes"

echo "node_up: Copying over time zone info."
run_cmd bpcp /etc/localtime $NODE:/rootfs/etc/localtime

#--- 1.17.1 ---
# Copy configuration files
beoconfig bpcp | (
    while read line ; do
        if ! do_bpcp $NODE $line ; then
            echo 1>&2 "Failed to copy files."
            exit 1
        fi
    done
) || die

# Supply a default /etc/nsswitch.conf, if needed
if ! bpsh -n $NODE ls /rootfs/etc/nsswitch.conf >/dev/null 2>&1 ; then
    echo "node_up: Copy over default nsswitch info."
    run_cmd cat << EOF | bpsh -n $NODE --stdout /rootfs/etc/nsswitch.conf cat
passwd: bproc
hosts: bproc
EOF
fi
#--- 1.17.1 ---

# nss_bproc is optional equipment so ignore errors....
#echo "node_up: Copying over bproc nss library."
#bpcp /lib/libnss_bproc.so.2 $NODE:/rootfs/lib

#---------------------------------------------------------------------
# Finish up...
#run_cmd bpsh $NODE umount -n /proc
run_cmd bpctl -S $NODE -r /rootfs

# This is a hack to make the dynamic linker work for things which are
# exec'ed remotely.
run_cmd bpsh -N $NODE /sbin/ldconfig -l /lib/ld-*

run_cmd bpsh -N $NODE hostname n$NODE

run_cmd $BINDIR/nodeinfo $NODE  # Update node information DB

#--- 1.17.1 ---
# At this point, all file systems in $NODE:/etc/fstab have been mounted,
# except for network devices (host:export) without the "nolock" option.
# NFS devices without the "nolock" option require the RPC portmapper and
# status daemons. The status daemon requires read/write access to the
# $SMDIR/sm and $SMDIR/sm.bak directories, which must exist and be owned
# 700 by rpcuser (on Red Hat, see http://nfs.sourceforge.net, item 17).

# True if there are any NFS mounts in $NODE:/etc/fstab without the "nolock"
# option, i.e., that need the RPC portmapper and status daemon.
if [ `bpsh -n $NODE cat /etc/fstab | \
      while read line ; do
          if [ -n "$line" -a "${line:0:1}" != "#" ] ; then
              echo "$line" | (
                  read device mountpt fstype options rest && \
                  echo "$fstype" | grep -q nfs && \
                  echo "$options" | grep -q -v nolock \
              ) && echo "$line"
          fi
      done | \
      wc -l` -gt 0 ] ; then
    # Create $SMDIR/sm and $SMDIR/sm.bak owned 700 by rpcuser (on Red Hat)
    bpsh -n $NODE mkdir -m 700 -p $SMDIR/{sm,sm.bak}
    if [ -f /etc/redhat-release ] ; then
        bpsh -n $NODE chmod 700 $SMDIR
        bpsh -n $NODE chown rpcuser $SMDIR
        bpsh -n $NODE chgrp rpcuser $SMDIR
        bpsh -n $NODE chown rpcuser $SMDIR/{sm,sm.bak}
        bpsh -n $NODE chgrp rpcuser $SMDIR/{sm,sm.bak}
    fi
    # Start the RPC portmapper and status daemon
    echo "node_up: Starting the RPC portmapper and status daemon."
    bpsh -n $NODE initlog -c portmap
    bpsh -n $NODE initlog -c rpc.statd
    # Mount the network devices that were deferred earlier
    echo "node_up: Completing deferred NFS mounts."
    bpsh -n $NODE mount -a -t nfs
fi
#--- 1.17.1 ---

#--- A message for the log file and node's console.
echo "node_up: Node setup finished."
bpsh $NODE --stdout /dev/console echo "node_up: Node setup finished."
exit 0

---------- /usr/lib/beoboot/bin/setup_fs ----------
#!/bin/sh
#--- 1.4.1 ---
#
# /usr/lib/beoboot/bin/setup_fs
#
#--- 1.4.1 ---
#
# Erik Hendriks <hen...@la...>
#
# $Id: setup_fs,v 1.4 2001/11/30 17:52:40 hendriks Exp $
# $Id: setup_fs,v 1.4.1 2002/08/05 L. M. Baker $
#
# This bit of code is a first stab at understanding fstab for mount.
# It's a lot like mount dealing with its own fstab.
# Differences with just allowing mount to chew on an fstab:
# We can do fsck checks before attempting to mount.
# We can (re)create file systems before mounting.
# We can create mount points before mounting.
#
#--------------------------------------------------------------------------
# Generic functions to do operations on varUseful functions
#--------------------------------------------------------------------------
#--- 1.4.1 ---
# Usage: do_mkdir node { [ -s target ] name } ...
do_mkdir() {
    if [ -z "$1" ] ; then
        return
    fi
    local NODE=$1
    shift
    if [ -z "$1" ] ; then
        return
    fi
    while [ -n "$1" ] ; do
        if [ "$1" == "-s" ] ; then
            shift
            if [ -z "$1" -o -z "$2" ] ; then
                return 1
            fi
            local target=`eval echo "$1"`
            local name=`eval echo "$2"`
            echo "setup_fs: ln -s $target $name"
            if ! bpsh -n $NODE ln -s $target /rootfs$name ; then
                return 1
            fi
            shift
        else
            if [ "$1" == "-m" ] ; then
                shift
                if [ -z "$1" -o -z "$2" ] ; then
                    return 1
                fi
                local mode=$1
                local name=`eval echo "$2"`
                echo "setup_fs: mkdir -m $mode -p $name"
                if ! bpsh -n $NODE mkdir -m $mode -p /rootfs$name ; then
                    return 1
                fi
                shift
            else
                local name=`eval echo "$1"`
                echo "setup_fs: mkdir -p $name"
                if ! bpsh -n $NODE mkdir -p /rootfs$name ; then
                    return 1
                fi
            fi
        fi
        shift
    done
}

# Usage: do_safefsck node device fstype
#--- 1.4.1 ---
do_safefsck() {
    case $2 in
    /dev/ram*)
        echo "setup_fs: Hmmm...This appears to be a ramdisk. "
        echo -n "setup_fs: I'm going to try to try checking the "
        echo "filesystem (fsck) anyway."
        echo -n "setup_fs: If it is a RAM disk the following will "
        echo "fail harmlessly."
        ;;
    esac
    case $3 in
#--- 1.4.1 ---
    ext*) bpsh -n $1 e2fsck -p $2 ; ret=$?
#--- 1.4.1 ---
          if [ "$ret" = 1 ] ; then ret=0; fi ;;
    swap) bpsh -n $1 chkswap $2 ; ret=$? ;;
    *)    ret=0;;
    esac
    [ "$ret" = 0 ]
}

do_fsck() {
    echo "setup_fs: Checking $2 (type=$3)..."
    case $2 in
    /dev/ram*)
        echo "setup_fs: Hmmm...This appears to be a ramdisk. "
        echo -n "setup_fs: I'm going to try to try checking the "
        echo "filesystem (fsck) anyway."
        echo -n "setup_fs: If it is a RAM disk the following will "
        echo "fail harmlessly."
        ;;
    esac
    case $3 in
#--- 1.4.1 ---
    ext*) bpsh -n $1 e2fsck -y $2 ; ret=$?
#--- 1.4.1 ---
          if [ "$ret" = 1 ] ; then ret=0; fi ;;
    swap) bpsh -n $1 chkswap $2 ; ret=$? ;;
    *)    ret=0;;
    esac
    [ "$ret" = 0 ]
}

# Usage: do_mkfs node device fstype fssize
do_mkfs() {
    echo "setup_fs: Creating $3 on $2..."
    case $3 in
    ext2) bpsh -n $1 mke2fs -q $2 $4 ; ret=$? ;;
#--- 1.4.1 ---
    ext3) bpsh -n $1 mke2fs -q -j $2 $4 ; ret=$? ;;
#--- 1.4.1 ---
    swap) bpsh -n $1 mkswap $2 $4 ; ret=$? ;;
    *)    ret=0;;
    esac
    [ "$ret" = 0 ]
}

# Usage: load_fs node fstype
load_fs () {
    if [ -z "`bpsh -n $1 grep $2 /proc/filesystems`" ] ; then
        modprobe --node $1 $2
    fi
}

# Usage: do_mount node device mountpt fstype options
do_mount() {
#--- 1.4.1 ---
    # Load file system module for all fstypes so they can be mounted later
    if [ "$4" != "swap" ] ; then
        load_fs $1 $4
    fi
    # Don't mount devices with the "noauto" option
    if echo $5 | grep -q noauto ; then
        return
    fi
#--- 1.4.1 ---
    echo "setup_fs: Mounting $2 on $3... (type=$4; options=$5)"
    case $4 in
    swap) bpsh -n $1 swapon $2 ;;
#--- 1.4.1 ---
    # Defer mounts of network devices (host:export) without the "nolock" option
    *)  if [ -z "`echo $2 | grep :`" -o \
             -n "`echo $5 | grep nolock`" ] ; then
            if bpsh -n $1 mount -nt $4 -o $5 $2 $3 ; then
                if [ "${mountpt:0:1}" == "/" ] ; then
                    echo "$device $mountpt $fstype $options" >>$MTABFILE
                fi
            fi
        else
            echo "setup_fs: Mount deferred until lock daemon running."
        fi
        ;;
#--- 1.4.1 ---
    esac
}

# Usage: beoconfig tag [config_file]
beoconfig() {
    local FILE=$2
    if [ -z "$FILE" ] ; then FILE=${CONFIG} ; fi
    if [ ! -f ${FILE} ] ; then
        echo "Warning: ${FILE} file not found." >&2
        return
    fi
    # These sed bits:
    # - strip spaces
    # - strip leading + trailing space
    # - if line starts with $1, strip off $1 and print it.
    sed -ne "s/#.*//" < ${FILE} \
        -e "s/^[[:space:]]\+//;s/[[:space:]]\+\$//" \
        -e "/^$1[[:space:]]/{s/^$1[[:space:]]\+//;p;}"
}

#--------------------------------------------------------------------------
# Argument sanity checking
if [ "$1" = "" ] ; then
    echo "Usage: setup_fs <nodenumber>"
    exit 1
fi

echo "setup_fs: Configuring node filesystems..."

NODE=$1
CONFIG=/etc/beowulf/config
#--- 1.4.1 ---
BINDIR=/usr/lib/beoboot/bin
PATH=$BINDIR:/sbin:/usr/sbin:$PATH
#--- 1.4.1 ---
MASTER=`bpstat -a master`
RAMDISK=/dev/ram3
FSCK=`beoconfig fsck`
MKFS=`beoconfig mkfs`
#--- 1.4.1 ---
MKDIR=`beoconfig mkdir`
#--- 1.4.1 ---

#--- 1.4.1 ---
# Select which FSTAB to use.
#if [ -r /etc/beowulf/fstab.$NODE ] ; then
#    FSTAB=/etc/beowulf/fstab.$NODE
#else
#    FSTAB=/etc/beowulf/fstab
#fi
#echo "setup_fs: Using $FSTAB"
#--- 1.4.1 ---

# XXX We need a way to pick up per-node commands!

# Control flags
#
#--- 1.4.1 ---
# FSCK =
#--- 1.4.1 ---
# 0 = Don't touch anything, just try to mount.
# 1 = Ok to fsck but don't do anything if it fails.
# 2 = fsck and do mkfs if it fails.
# 3 = skip fsck go straight to mkfs
#
#--- 1.4.1 ---
# Sanity check FSCK (default = 1)
#--- 1.4.1 ---
case $FSCK in
"never"|"safe"|"full") ;;
"") FSCK=safe ;;
*)  echo 1>&2 "Invalid value '$FSCK' for fsck tag in $CONFIG."
    exit 1 ;;
esac
case $MKFS in
"never"|"if_needed"|"always") ;;
"") MKFS=if_needed ;;
*)  echo 1>&2 "Invalid value '$MKFS' for mkfs tag in $CONFIG."
    exit 1 ;;
esac

#--- 1.4.1 ---
# Select which FSTAB to use.
FSTAB=/etc/beowulf/fstab.$NODE
if [ ! -r $FSTAB ] ; then
    FSTAB=/etc/beowulf/fstab
fi
#--- 1.4.1 ---
if [ ! -f $FSTAB ] ; then
    echo 1>&2 "setup_fs: $FSTAB (file system table) is missing."
    exit 1
fi

#--- 1.4.1 ---
# Create default directories
if ! do_mkdir $NODE $MKDIR ; then
    echo 1>&2 "Failed to create default directories."
    exit 1
fi
#--- 1.4.1 ---

# Ok... This is one big nasty pipe line... Here's what this mess does:
# * Use sed to remove comments. (starting with #)
# * Run it all though eval to do variable substitutions.
# * Go through all the lines doing:
# + Ignore the empty lines
# + Remove trailing slashes from the mount points
# + Prepend a number that will allow us to sort the mount points.
# * Sort the mount points
#--- 1.4.1 ---
# * On each point point (depending on the FSCK policy):
#--- 1.4.1 ---
# + fsck the file system
# + if bad, possibly recreate the file system.
# + mount the file system
#--- 1.4.1 ---
# * Create /etc/fstab for the new node.
#--- 1.4.1 ---
# * Create /etc/mtab for the new node.
MTABFILE=/tmp/.setup_fs.mtab.$$
if ! rm -f $MTABFILE ; then
    echo 1>&2 "setup_fs: $MTABFILE already exists and can't remove."
    exit 1
fi
touch $MTABFILE
#--- 1.4.1 ---
FSTABFILE=/tmp/.setup_fs.fstab.$$
if ! rm -f $FSTABFILE ; then
    echo 1>&2 "setup_fs: $FSTABFILE already exists and can't remove."
    exit 1
fi
touch $FSTABFILE

echo "setup_fs: Using $FSTAB."
cat $FSTAB | \
while read line ; do
    if [ -z "$line" -o "${line:0:1}" == "#" ] ; then
        echo "$line" >>$FSTABFILE
    else
        line=`eval echo "$line"`
        echo "$line" >>$FSTABFILE
        echo "$line"
    fi
done | \
#--- 1.4.1 ---
while read device mountpt fstype options junk ; do
    if [ -z "$options" ] ; then
#--- 1.4.1 ---
#       if [ -n "$device" ] ; then
#--- 1.4.1 ---
            echo 1>&2 "Ignoring incomplete line: $device $mountpt $fstype $options $junk"
#--- 1.4.1 ---
#       fi
#--- 1.4.1 ---
        continue
    fi
    # Sanitize mount point... (squeeze multiple slashes, remove
    # any trailing slashes)
    mountpt=`echo $mountpt | sed -e 's!/\+!/!g' -e 's!/\+$!!'`
    slashct=`echo $mountpt | tr -cd / | wc -c`
    if [ -z $mountpt ] ; then mountpt=/ ; fi
    echo $slashct $device $mountpt $fstype $options
done | \
sort -n | \
(while read slashct device mountpt fstype options junk ; do
    if [ -z "$options" ] ; then
#--- 1.4.1 ---
#       if [ -n "$device" ] ; then
#--- 1.4.1 ---
            echo 1>&2 "Ignoring incomplete line: $device $mountpt $fstype $options $junk"
#--- 1.4.1 ---
#       fi
#--- 1.4.1 ---
        continue
    fi

    # Get a file system size option if it's there...
    fssize=`echo $options | sed -e 's/.*fs_size=\([0-9]\+\).*/\1/p;d'`
    options=`echo $options | sed -e 's/fs_size=[0-9]\+//g'`
    if [ -z "$options" ] ; then options=defaults; fi

    # Everything gets a "/rootfs" prefix at this stage. Also we create the
    # mount points as needed. This requires that people have their fstab
    # in some resonable order. (It might be hard for us to sort it....)
#--- 1.4.1 ---
#   if echo $mountpt | grep -q '^/' ; then
#       echo "$device $mountpt $fstype $options" >> $MTABFILE
#   fi
#--- 1.4.1 ---

    # see to it that the device node exists on the remote machine
#--- 1.4.1 ---
    if [ "${device:0:4}" == "/dev" ] ; then
        (cd / ; tar cf - $device) | bpsh -n $NODE tar xf -
#--- 1.4.1 ---
    fi

    mknewfs=0
    if [ $MKFS = "always" ]; then
        mknewfs=1
    else
        case $FSCK in
        "never") ;;  # No FSCK!
        "safe")
            if ! do_safefsck $NODE $device $fstype ; then
                echo 1>&2 "setup_fs: RAM disks fail FSCK, that's OK"
                echo 1>&2 "setup_fs: FSCK failure. (OK for RAM disks)"
                mknewfs=1
            fi
            ;;
        "full")
            if ! do_fsck $NODE $device $fstype ; then
                echo 1>&2 "setup_fs: FSCK failure. (OK for RAM disks)"
                mknewfs=1
            fi
            ;;
        esac
    fi
    if [ $MKFS != "never" -a "$mknewfs" = 1 ] ; then
        if ! do_mkfs $NODE $device $fstype $fssize ; then
            echo 1>&2 "Failed to create $fstype file system on $device."
            exit 1
        fi
    fi

    # See to it that the mount point exists before trying to mount.
#--- 1.4.1 ---
    if [ "${mountpt:0:1}" == "/" ] ; then
        if ! bpsh -n $NODE mkdir -p /rootfs$mountpt ; then
#--- 1.4.1 ---
            echo 1>&2 "Failed to create mount point."
            exit 1
        fi
    fi

#--- 1.4.1 ---
    if ! do_mount $NODE $device /rootfs$mountpt $fstype $options ; then
#--- 1.4.1 ---
        echo 1>&2 "Failed to mount $device on $mountpt."
        exit 1
    fi
done

#--- 1.4.1 ---
# Create fstab on the remote node...
if ! bpcp $FSTABFILE $NODE:/rootfs/etc/fstab ; then
    echo 1>&2 "Failed to create /etc/fstab."
    exit 1
fi
rm -f $FSTABFILE
#--- 1.4.1 ---

# Finally, create mtab on the remote node...
#--- 1.4.1 ---
# if ! bpsh -n $NODE mkdir -p /rootfs/etc ; then
#     echo 1>&2 "Failed to create /etc."
#     exit 1
# fi
#--- 1.4.1 ---
if ! bpcp $MTABFILE $NODE:/rootfs/etc/mtab ; then
    echo 1>&2 "Failed to create /etc/mtab."
    exit 1
fi
rm -f $MTABFILE
)  # Exit with status of this nutty pipeline.
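The depth-ordering trick in the setup_fs pipeline (prefix each entry with the number of slashes in its mount point, then sort numerically so parents are mounted before children) can be tried in isolation. What follows is only a sketch for experimenting on the master: the function name sort_by_depth is made up here, and the sed expressions are written in a portable form ("//*" rather than the GNU-only "/\+"), so they are not character-for-character what setup_fs uses.

```shell
#!/bin/bash
# sort_by_depth (hypothetical helper): read cooked fstab entries on stdin and
# print them prefixed with the slash count of the mount point, shallowest first.
sort_by_depth() {
    local device mountpt fstype options junk slashct
    while read -r device mountpt fstype options junk ; do
        # Ignore incomplete lines
        [ -z "$options" ] && continue
        # Squeeze repeated slashes and drop any trailing slash
        mountpt=$(echo "$mountpt" | sed -e 's!//*!/!g' -e 's!/*$!!')
        slashct=$(echo "$mountpt" | tr -cd / | wc -c)
        # The root mount point becomes empty above; restore it
        [ -z "$mountpt" ] && mountpt=/
        echo $slashct $device $mountpt $fstype $options
    done | sort -n
}

sort_by_depth <<'EOF'
none /dev/pts devpts gid=5,mode=620 0 0
/dev/hda3 / ext2 defaults 0 0
192.168.50.209:/home /home/ nfs rw,noac 0 0
EOF
```

Entries at the same depth come out in whatever order sort gives them, which is fine for mounting since siblings do not depend on each other.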