|
From: Larry B. <ba...@us...> - 2002-08-09 21:33:25
|
I noticed that the web page posting of my code for fixing NFS file locking turns all
double < and > into single < and >, so beware. The >$MTABFILE and >$FSTABFILE
redirections in /usr/lib/beoboot/bin/setup_fs, and <EOF in /usr/lib/beoboot/bin/node_up,
should be double arrows (>> and <<).
Larry Baker
US Geological Survey |
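To see why a lost second arrow matters, here is a minimal sketch in plain sh. It is not taken from the beoboot scripts; demo_redirect and the temporary file are made up for illustration. A single > truncates the target on every write, so a script that builds a file one line at a time ends up with only the last line, while >> keeps them all. In the same way, a single <EOF makes the shell try to read a file named EOF instead of starting a here-document.

```shell
#!/bin/sh
# Illustration only: demo_redirect is a hypothetical helper, and the file
# it writes is a throwaway temp file, not one of the beoboot files.
demo_redirect() {
    tmp=$(mktemp) || return 1

    echo "first"  > "$tmp"     # ">" truncates the file, then writes
    echo "second" > "$tmp"     # truncates again, so "first" is lost
    single=$(wc -l < "$tmp")

    : > "$tmp"                 # start over with an empty file
    echo "first"  >> "$tmp"    # ">>" appends
    echo "second" >> "$tmp"    # both lines survive
    double=$(wc -l < "$tmp")

    rm -f "$tmp"
    # $(( )) normalizes any padding that wc may add around the count
    echo "single=$((single)) double=$((double))"
}

demo_redirect
```

The two counts differ: one line is left after the pair of > writes, two after the pair of >> writes.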
|
From: Larry B. <ba...@us...> - 2002-08-09 21:25:57
|
My posting of 08/08/2002, "This time it's right: NFS locking support and other
enhancements", fixes Daniel Widyono's posting of 05/29/2002, "getpwuid and getgrgid
bombing". Part of getting NFS file locking to work required getting the getpwxxx() functions
to work. Without resorting to code modifications, I found the simplest fix is to copy
the /etc/passwd and /etc/group files from the master node to the slave node, and create an
nsswitch.conf on the slave node with entries that reference those files. I don't have a clue
how the "bproc" NSS stuff works, so I left it first in the list of places NSS tries.
For example, given the /etc/beowulf/nsswitch.conf:
#
# /etc/beowulf/nsswitch.conf
#
hosts: bproc
passwd: bproc files
group: bproc files
rpc: files
Copy the following files to each slave node (e.g., in /etc/beowulf/node_up):
bpcp 0 /etc/{passwd,group,rpc} /etc/beowulf/nsswitch.conf 0:/etc
This seems to be sufficient, as shown by my terminal session below.
Larry Baker
US Geological Survey
[baker@sfsmanet tmp]$ cat getpwuid.c
#include <stdio.h>
#include <sys/bproc.h>
#include <pwd.h>
#include <sys/types.h>
main() {
struct passwd *pwent = getpwuid(0);
printf("%s\n",pwent->pw_name);
}
[baker@sfsmanet tmp]$ su
Password:
[root@sfsmanet tmp]# bpcp nsswitch.conf 0:/etc
[root@sfsmanet tmp]# bpsh 0 cat /etc/nsswitch.conf
hosts: bproc
passwd: bproc
[root@sfsmanet tmp]# exit
exit
[baker@sfsmanet tmp]$ bpsh 0 ./a.out
bpsh: Child process exited abnormally.
[baker@sfsmanet tmp]$ su
Password:
[root@sfsmanet tmp]# bpcp /etc/beowulf/nsswitch.conf 0:/etc
[root@sfsmanet tmp]# bpsh 0 cat /etc/nsswitch.conf
#
# /etc/beowulf/nsswitch.conf
#
hosts: bproc
passwd: bproc files
group: bproc files
rpc: files
[root@sfsmanet tmp]# exit
exit
[baker@sfsmanet tmp]$ bpsh 0 ./a.out
root
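A quick way to confirm that a copied nsswitch.conf really lists the "files" source can be sketched in sh. nss_has_source is a hypothetical helper, not part of BProc or glibc. It only parses the text of the file and says nothing about whether the bproc NSS module itself works.

```shell
#!/bin/sh
# Usage: nss_has_source file database source
# Succeeds (exit 0) if the "database:" line in file lists source.
# Hypothetical helper for illustration; it ignores NSS action items
# such as [NOTFOUND=return].
nss_has_source() {
    sed -e 's/#.*//' "$1" |
    awk -v db="$2:" -v src="$3" '
        $1 == db { for (i = 2; i <= NF; i++) if ($i == src) found = 1 }
        END { exit (found ? 0 : 1) }'
}
```

For example, after saving the output of "bpsh 0 cat /etc/nsswitch.conf" to a local file, "nss_has_source that_file passwd files" succeeds for the second configuration in the session above and fails for the first.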
|
|
From: Erik A. H. <er...@he...> - 2002-08-09 21:11:47
|
I've released the scheduler we've been working on for BProc. PLEASE READ THE RELEASE NOTES!
http://sourceforge.net/project/showfiles.php?group_id=24453&release_id=104390
Here are the release notes and a change log:
1.0
------------------------------------------------------------------------
This is the first public release of BJS. BJS is a simple/efficient
scheduler for clusters running BProc.
--- WARNING --- WARNING --- WARNING --- WARNING --- WARNING --- WARNING ---
THIS IS NOT PRODUCTION QUALITY CODE. THERE ARE ALMOST CERTAINLY SEVERE
SECURITY HOLES IN THIS SYSTEM. SINCE THE BJS SERVER MUST RUN AS ROOT, IT
IS LIKELY THAT ANY SUCH HOLES CAN/WILL LEAD TO ROOT COMPROMISE.
--- WARNING --- WARNING --- WARNING --- WARNING --- WARNING --- WARNING ---
Ok, got that out of the way. Normally, I wouldn't call code like this
version 1.0 but there are political reasons for doing so. This code is
being released in this state so that people willing to work with
something that's not 100% finished or secure will have something to work
with. I hope that those people will also help us get this scheduler into
a production ready state.
Please note that the student who wrote this code (Paul Ruth
<ru...@ac...>) will be returning to school. It's good to include him in
any discussions regarding the scheduler but I will be the one merging
patches, making new releases, etc.
- Erik Hendriks <hen...@la...>
Version 1.0 - First release
* Scheduler crawls out of the slime.
--
Erik Arjan Hendriks Printed On 100 Percent Recycled Electrons
er...@he... Contents may settle during shipment |
|
From: Erik A. H. <er...@he...> - 2002-08-09 20:08:29
|
You'll need a modified mpirun to work with BProc 3.2.0 and MPICH. I've put the tarball (which contains patches for MPICH) in the BProc files section on sourceforge as well. - Erik |
|
From: Erik A. H. <er...@he...> - 2002-08-09 20:07:24
|
On Fri, Aug 09, 2002 at 10:33:55AM -0400, Mike Snitzer wrote:
> On Mar 25, 2002 Erik Arjan Hendriks (er...@he...) said:
>
> > A new version of the clustermatic CD was placed on
> > www.clustermatic.org today. The diff looks like this:
> >
> > * Updated BProc (3.1.9): fewer bugs, remote exec hacks, etc.
> >
> > * Updated Beoboot (lanl 1.2): minor changes.
> <snip>
> > The MPICH hacks which I've mentioned are included on the CD in the
> > mpirun-0.1 tar ball in the tarballs directory. We used those hacks to
> > run a 396 process job on Chiba city at Argonne National Lab the other
> > day. This clustermatic also survived some boot time torture testing
> > on about 200 nodes there. (Thanks guys!) The load spike on the front
> > end is severe when they all boot at once but all the nodes we started
> > with came up. We've got a guy working on that problem here.
> > Hopefully beoboot lanl 1.3 will mostly eliminate that problem.
>
> Has there been any progress on beoboot lanl 1.3 development?
Yup. Lots. It's gotten a bit more complicated than I had in mind but
it's actually started working (as of a few days ago). I'm planning on
cleaning up a bit and throwing it up "soon". I expect next week.
It's looking good right now. I set up 100 nodes (with 40MB of libraries)
in 12 seconds yesterday. There's basically no more load spike, though 12
seconds isn't really long enough to tell via the usual mechanisms.
- Erik |
|
From: Erik A. H. <er...@he...> - 2002-08-09 19:58:15
|
I've put BProc 3.2.0 up on sourceforge in the usual place:
http://sourceforge.net/project/showfiles.php?group_id=24453&release_id=104379
Here are the release notes and change log.
3.2.0
----------------------------------------------------------------------
Demand loading of libraries
The file request stuff has been ifdef'ed so that building it in is
optional. I anticipate removing it as soon as our boot-time
environment which doesn't use it becomes stable. Support for it can be
switched on and off with the FILEREQ:= line in Makefile.conf. I've been
testing with this turned off. It builds fine with it turned on and I
presume it still works. The file request stuff is essentially untested
from here on in.
vrfork() and vexecmove()
vrfork() and vexecmove() have been mostly rewritten. The interface
changed a little bit to separate the input (node numbers) and the
return values (pids). They are now more resilient for failure of
individual moves. The returned array of pids may include negative
numbers which are error values. The return value is now a process's
index in the list of children or the total number of nodes (regardless
of failure) to the parent. Hopefully this interface will be
stabilizing somewhat.
There's also the usual round of bug fixes. See the change log for
details.
Changes from 3.1.10 to 3.2.0
* Added another work around a Linux TCP bug. This problem resulted in
occasional segfaults when using the ghost execve hook.
* Fixed a locking bug that could lead to master node crashes.
* Fixed a lingering bpsh problem which could lose output from child
processes. (Patch from Sean Dilda <ag...@sc...>)
* Made flush_icache in vmadump conditional as suggested by Grant
Taylor <gt...@sw...>
* Fixed problems with using PTRACE_TRACEME on slave nodes.
* Fixed a vrfork buglet which could lead to kernel oopses.
* Fixed a problem with the ghost execve hook not doing mm_release on
the slave node like a real execve. This led to the parent process
hanging if it used vfork() on the slave node.
* Changed vrfork return value semantics. The vrfork return value is
now -1, your index in the list of nodes or the total number of nodes.
* Changed vrfork and vexecmove interface to separate input and output.
* Added a bunch of BProc-specific errno values to allow for more
detailed error reporting.
* Fixed process migration hangs in the case of sender failure.
* Added rank reporting to vexecmove via the environment variable
BPROC_RANK=XXXXXXX.
* Changed bpsh to use vexecmove instead of migrating each copy of the
process off the front end manually.
* Fixed a case where permissions of the ghost process and the real
process could get out of sync.
* Changed behavior for failed power off on the slave node from a
reboot to a halt.
* Added a patch for Linux 2.4.19. |
|
From: <gor...@ph...> - 2002-08-09 15:53:15
|
On Tue, 16 Jul 2002 08:29:52 -0600 Erik Arjan Hendriks wrote:
>It is possible for a machine to be a master and slave for itself at
>the same time. It's a pretty confusing arrangement though since
>processes will show up several times in the process tree.
>
> . . .
Ideally, all of the processes in a parallel job would be created on the
master node and migrated to slave nodes. It also appears to be possible
(see perl script at end) for a single bproc process on a slave node to
rfork itself off to other slave nodes. The "ghost" processes only end up
on the master node, which is fine.
If this is the case, what advantage is there to having bpmaster run on
every node?
#!/usr/bin/perl
use Parallel::Bproc;
sub printhost {
$host=`hostname`;
print "I am $host\n";
}
printhost;
Parallel::Bproc::bproc_rfork(1) ;
printhost;
Parallel::Bproc::bproc_rfork(10) ;
printhost;
sleep 1000;
It appeared to work:
# /tmp/a.pl
I am lxsrvr0
I am lxsrva2
I am lxsrva11
|
|
From: Mike S. <msn...@pl...> - 2002-08-09 14:30:25
|
On Mar 25, 2002 Erik Arjan Hendriks (er...@he...) said:
> A new version of the clustermatic CD was placed on
> www.clustermatic.org today. The diff looks like this:
>
> * Updated BProc (3.1.9): fewer bugs, remote exec hacks, etc.
>
> * Updated Beoboot (lanl 1.2): minor changes.
<snip>
> The MPICH hacks which I've mentioned are included on the CD in the
> mpirun-0.1 tar ball in the tarballs directory. We used those hacks to
> run a 396 process job on Chiba city at Argonne National Lab the other
> day. This clustermatic also survived some boot time torture testing
> on about 200 nodes there. (Thanks guys!) The load spike on the front
> end is severe when they all boot at once but all the nodes we started
> with came up. We've got a guy working on that problem here.
> Hopefully beoboot lanl 1.3 will mostly eliminate that problem.
Has there been any progress on beoboot lanl 1.3 development?
Thanks,
Mike |
|
From: Larry B. <ba...@us...> - 2002-08-08 23:11:26
|
This supersedes the posting I made on July 25, 2002.
I have worked more on the support for NFS locking on the slave
nodes. While I was at it, I made some additional fixes and
enhancements to the beoboot slave node setup process.
This fixes most of the problems I had running the ANL MPICH IO
validation suite. However, I still occasionally have problems with
parallel IO -- between the master and a slave node, usually. Also,
it seems to matter whether the test file exists before the test is run.
I am suspicious of the MPICH code. (E.g., errno comes back with
a non-zero value, even though an MPI call returns MPI_SUCCESS.)
One problem I have seen with Clustermatic/bproc that I don't
understand: sometimes input redirection (< or <<) results in an
empty input stream. For example, "cat <file | bpsh -n 0 cat" will
echo nothing, even though "cat <file" works fine. I found this out
because my previous submission created a zero-length slave node
/etc/nsswitch.conf file when I added more lines to the <<EOF
input stream in /usr/lib/beoboot/bin/node_up. That's why I now
use a bpcp option to copy /etc/beowulf/nsswitch.conf. If someone
can explain what this is symptomatic of, I'd like to know how to fix
it or avoid it.
I am using the Clustermatic CD image distribution of March 2002.
Larry Baker
US Geological Survey
Steps in slave node file system setup:
Create directories/soft links specified in /etc/beowulf/config
(mkdir option).
As specified in /etc/beowulf/fstab[.$NODE]:
Load kernel modules for all file system types.
Create device nodes for all local file systems.
Mount all local and "nolock" network file systems without
"noauto".
Create cooked version in /etc/fstab on slave node.
Copy files specified in /etc/beowulf/config (bpcp option).
Create a default slave node /etc/nsswitch.conf file, if none
exists.
If there are NFS file systems without the "nolock" option:
Create the statd database directories.
Start the portmapper and statd daemons.
Complete any deferred NFS mounts.
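The rule behind the last three steps (an NFS entry without "nolock" has to wait for the portmapper and statd) can be sketched as a small sh filter. This is a simplified illustration, not the code from node_up or setup_fs. deferred_nfs_mounts is a hypothetical name, and the sketch assumes an already cooked, whitespace-separated fstab.

```shell
#!/bin/sh
# Usage: deferred_nfs_mounts fstab_file
# Prints the mount point of every NFS entry whose option list lacks
# "nolock", i.e. the mounts that need the lock daemons first.
# Hypothetical helper; comments and blank lines are skipped.
deferred_nfs_mounts() {
    sed -e 's/#.*//' "$1" |
    while read -r device mountpt fstype options rest ; do
        [ -n "$fstype" ] || continue              # blank or short line
        case $fstype in
            nfs*) ;;                              # network file system
            *) continue ;;
        esac
        case ",$options," in
            *,nolock,*) continue ;;               # safe to mount at once
        esac
        echo "$mountpt"
    done
}
```

Matching on ",$options," compares whole comma-separated options, so an option that merely contains the string "nolock" would not be mistaken for it.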
Summary of file changes:
/etc/beowulf/config
Add mkdir option to create directories and soft links (no more
hard-coded directories in /usr/lib/beoboot/bin/node_up).
Add bpcp option to copy files to slave node.
/etc/beowulf/fstab
Add NODE to list of variables that will get substituted.
"noauto" option is now honored.
Cooked version is created as slave node /etc/fstab.
/etc/beowulf/nsswitch.conf (new file)
Name Server Switch configuration file to add local passwd, group,
and rpc files to NSS search lists. (See bpcp entry in
/etc/beowulf/config.)
/etc/beowulf/node_up
Define NODE, MASTER, and PATH variables.
/usr/lib/beoboot/bin/node_up (#--- 1.17.1 --- brackets changes)
Remove hard-coded creation of slave node /dev, /etc, /tmp and
/scratch directories.
Copy configuration files (bpcp option in config).
Conditionally create a default slave node /etc/nsswitch.conf.
If there are any NFS mounts in the slave node /etc/fstab without the
"nolock" option: create the slave node statd database files, start
the portmap and rpc.statd daemons, and "mount -a -t nfs".
/usr/lib/beoboot/bin/setup_fs (#--- 1.4.1 --- brackets changes)
Create default directories/soft links (mkdir option in config).
Only tar device nodes that begin with "/dev".
Always load file system kernel modules.
Don't mount file systems with "noauto" option.
Defer network mounts without "nolock" option until the portmapper
and status daemons are running (completed in node_up).
Add support for ext3 file systems.
Create cooked version of /etc/beowulf/fstab[.$NODE] in slave node
/etc/fstab.
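As an illustration of what a "cooked" fstab means, the sketch below substitutes $MASTER and $NODE with sed. This is a swapped-in technique for demonstration, not necessarily how setup_fs performs the substitution. cook_fstab is a hypothetical name, RAMDISK is left out for brevity, and the replacement values are assumed to contain no "/" characters.

```shell
#!/bin/sh
# Usage: cook_fstab fstab_file master_ip node_number
# Prints fstab_file with the literal strings $MASTER and $NODE replaced.
# Hypothetical helper; comment lines are passed through unchanged.
cook_fstab() {
    sed -e 's/\$MASTER/'"$2"'/g' \
        -e 's/\$NODE/'"$3"'/g' "$1"
}
```

With master 192.168.50.209 and node 0, a line such as "$MASTER:/var/node.$NODE /var nfs ..." becomes "192.168.50.209:/var/node.0 /var nfs ...", which is the device name that appears in the node.0 boot log.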
Below are the files I have modified/use (watch out for extra e-mail line
breaks):
/etc/exports The NFS file systems exported by the master
/etc/beowulf/config The bproc/beoboot configuration file
/etc/beowulf/fstab The file systems file for the nodes
/etc/beowulf/nsswitch.conf The Name Server Switch configuration file
for the nodes
/etc/beowulf/node_up The beoboot stub node startup script
/usr/lib/beoboot/bin/node_up The beoboot node startup script
/usr/lib/beoboot/bin/setup_fs The beoboot node file system setup script
After rebooting, this is what /var/log/beowulf/node.0 looks like:
node_up: Setting system clock.
node_up: Configuring loopback interface.
setup_fs: Configuring node filesystems...
setup_fs: mkdir -p /dev
setup_fs: mkdir -p /etc
setup_fs: ln -s /var/tmp /tmp
setup_fs: ln -s /home/node.0 /scratch
setup_fs: Using /etc/beowulf/fstab.
setup_fs: Checking 192.168.50.209:/bin (type=nfs)...
setup_fs: Mounting 192.168.50.209:/bin on /rootfs/bin... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/home (type=nfs)...
setup_fs: Mounting 192.168.50.209:/home on /rootfs/home... (type=nfs; options=rw,rsize=8192,wsize=8192,noac)
setup_fs: Mount deferred until lock daemon running.
setup_fs: Checking 192.168.50.209:/opt (type=nfs)...
setup_fs: Mounting 192.168.50.209:/opt on /rootfs/opt... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/sbin (type=nfs)...
setup_fs: Mounting 192.168.50.209:/sbin on /rootfs/sbin... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/usr (type=nfs)...
setup_fs: Mounting 192.168.50.209:/usr on /rootfs/usr... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/var/node.0 (type=nfs)...
setup_fs: Mounting 192.168.50.209:/var/node.0 on /rootfs/var... (type=nfs; options=rw,nolock,rsize=8192,wsize=8192)
setup_fs: Checking none (type=proc)...
setup_fs: Mounting none on /rootfs/proc... (type=proc; options=defaults)
setup_fs: Checking none (type=devpts)...
setup_fs: Mounting none on /rootfs/dev/pts... (type=devpts; options=gid=5,mode=620)
node_up: Copying over device nodes.
node_up: Copying over time zone info.
node_up: Copying /etc/{passwd,group,rpc} /etc/beowulf/nsswitch.conf to 0:/etc.
node_up: Starting the RPC portmapper and status daemon.
node_up: Completing deferred NFS mounts.
node_up: Node setup finished.
---------- /etc/exports ----------
#
# /etc/exports
#
# Read-only exports
#
/bin 192.168.50.209/255.255.255.224(ro)
/opt 192.168.50.209/255.255.255.224(ro)
/sbin 192.168.50.209/255.255.255.224(ro)
/usr 192.168.50.209/255.255.255.224(ro)
#
# Private read-write exports
#
/var/node.0 192.168.50.210(rw,no_root_squash)
/var/node.1 192.168.50.211(rw,no_root_squash)
#
# Shared read-write exports (MPICH 1.2.4, section 4.11.1: use "noac")
#
/home 130.118.45.45/255.255.252.0(rw) \
192.168.50.209/255.255.255.224(rw,no_root_squash)
---------- /etc/beowulf/config ----------
#
# /etc/beowulf/config
#
# Sample Beowulf Configuration file
#
# $Id: config,v 1.7 2002/03/12 20:54:58 hendriks Exp $
# $Id: config,v 1.7.1 2002/08/05 L. M. Baker $
#
#
# Default cluster configuration (uses eth1, and 192.168.1.0/24)
# interface: internal cluster interface (the one connected to the nodes)
#
# iprange: range of IP addresses for nodes.
interface eth1 192.168.50.209 255.255.255.224
# Setup addresses in the cluster. The "nodes" line is REQUIRED here to specify
# cluster size. "iprange" and "ip" assign addresses to nodes. The "0" in
# iprange here tells it to start assigning at node zero.
nodes 2
iprange 0 192.168.50.210 192.168.50.211
# Default libraries (These are the libraries which will automagically be made
# available to the slaves.)
# No line continuation; multiple lines are concatenated.
libraries /lib /usr/lib /usr/X11R6/lib
libraries /opt/intel/compiler60/ia32/lib /opt/intel/mkl/lib/32
# Default directories. Syntax: mkdir { [ { -m mode | -s target } ] name } ...
# $NODE is slave node no. No line continuation; multiple lines are
# concatenated.
# /dev and /etc are required.
mkdir /dev /etc
# Useful (local) temporary and scratch directories.
#mkdir -m 1777 /tmp -m 1777 /scratch
# Use NFS for /tmp and /scratch.
# (NFS exports for /var and /home must be no_root_squash.)
mkdir -s /var/tmp /tmp -s /home/node.$NODE /scratch
# Optional bpcp file copy commands, executed one line at a time.
# Syntax: bpcp [ options ] from ... to. Do not specify slave node no. -- the
# destination is automatically translated to $NODE:to. $NODE is slave node no.
# Enable the following line for NFS file locking support.
bpcp /etc/{passwd,group,rpc} /etc/beowulf/nsswitch.conf /etc
# Default file system policies.
fsck full
mkfs if_needed
# Default location of boot images
bootfile /var/beowulf/boot.img
kernelimage /boot/vmlinuz-2.4.18-lanl.16
kernelcommandline apm=power-off
# Here we assign MAC addresses to nodes. Nodes can have multiple MAC
# addresses. Here the optional "0" zero argument states that the address
# should be assigned to node zero. Node lines following that will assign
# addresses to nodes sequentially
# Onboard RealTek RTL8100BL chip
node 0 00:40:63:c0:5e:08
node 00:40:63:c0:5f:b4
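The comments above say that repeated "libraries" and "mkdir" lines are concatenated rather than continued. A minimal sketch of that reading, in sh with awk, is shown here. config_values is a hypothetical helper, not the beoconfig function from the scripts, which prints one line per match. It does not apply to "bpcp" lines, which are executed one line at a time.

```shell
#!/bin/sh
# Usage: config_values tag config_file
# Prints the arguments of every "tag ..." line in config_file, joined
# into a single space-separated line. Comments are stripped first.
# Hypothetical helper for illustration only.
config_values() {
    sed -e 's/#.*//' "$2" |
    awk -v tag="$1" '
        $1 == tag {
            for (i = 2; i <= NF; i++)
                out = out (out == "" ? "" : " ") $i
        }
        END { print out }'
}
```

Run against the config file above, "config_values libraries /etc/beowulf/config" would list all five library directories on one line, and a commented-out line such as "#mkdir -m 1777 /tmp ..." contributes nothing.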
---------- /etc/beowulf/fstab ----------
#
# /etc/beowulf/fstab
#
# This file is the fstab for nodes.
# One difference is that we allow for shell variable expansions...
#
# Variables that will get substituted:
# MASTER = IP address of the master node. (good for doing NFS mounts)
# NODE = slave's node no.
# RAMDISK = device name (/dev/<ramdev>) of a device suitable for a root fs
#
# A cooked version (with variable substitution) of this file will be copied
# to /etc/fstab on the slave node.
#
# The root file system is a tmpfs provided by the boot scripts. You
# can mount something on / if you'd like but due to oddities in the file
# caching code it's not recommended right now.
# This is the default setup from beofdisk, once you setup your disks.
#/dev/hda2 swap swap defaults 0 0
#/dev/hda3 / ext2 defaults 0 0
# These should always be added
none /proc proc defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
# NFS (for example and default friendliness)
# Note: Mounts without the "nolock" option are deferred until the RPC portmapper
# and status daemons are running -- see /usr/lib/beoboot/bin/{node_up,setup_fs}.
#
# Read-only mount points
#
$MASTER:/bin /bin nfs ro,nolock,rsize=8192 0 0
$MASTER:/opt /opt nfs ro,nolock,rsize=8192 0 0
$MASTER:/sbin /sbin nfs ro,nolock,rsize=8192 0 0
$MASTER:/usr /usr nfs ro,nolock,rsize=8192 0 0
#
# Private read-write mount points
#
$MASTER:/var/node.$NODE /var nfs rw,nolock,rsize=8192,wsize=8192 0 0
#
# Shared read-write mount points (MPICH 1.2.4, section 4.11.1: use "noac")
#
$MASTER:/home /home nfs rw,rsize=8192,wsize=8192,noac 0 0
---------- /etc/beowulf/nsswitch.conf ----------
#
# /etc/beowulf/nsswitch.conf
#
hosts: bproc
passwd: bproc files
group: bproc files
rpc: files
---------- /etc/beowulf/node_up ----------
#!/bin/sh
#
# /etc/beowulf/node_up
#
# This shell script is called automatically by BProc to perform any
# steps necessary to bring up the nodes. This is just a stub script
# pointing to the real script
NODE=$1
MASTER=`bpstat -a master`
BINDIR=/usr/lib/beoboot/bin
PATH=$BINDIR:/sbin:/usr/sbin:$PATH
$BINDIR/node_up $* || exit 1
# Clean out /tmp every boot
bpsh -n $NODE rm -r -f /tmp/*
bpsh -n $NODE rm -r -f /tmp/.* 2>/dev/null
# Ignore rm errors
exit 0
---------- /usr/lib/beoboot/bin/node_up ----------
#!/bin/sh
#--- 1.17.1 ---
#
# /usr/lib/beoboot/bin/node_up
#
#--- 1.17.1 ---
#---------------------------------------------------------------------
# Erik Arjan Hendriks <hen...@la...>
# Copyright (C) 2000 Scyld Computing Corporation
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# $Id: node_up,v 1.17 2002/01/04 00:39:59 hendriks Exp $
# $Id: node_up,v 1.17.1 2002/08/05 L. M. Baker $
#---------------------------------------------------------------------
umask 022 # Default umask for this stuff.
cd /
# Argument sanity checking
if [ "$1" = "" ] ; then
echo "Usage: node_up <nodenumber>"
exit 1
fi
NODE=$1
CONFIG=/etc/beowulf/config
BINDIR=/usr/lib/beoboot/bin
#--- 1.17.1 ---
PATH=$BINDIR:/sbin:/usr/sbin:$PATH
# Standard location of statd database files
SMDIR=/var/lib/nfs
# Location of statd database files on Red Hat Linux
if [ -f /etc/redhat-release ] ; then
SMDIR=$SMDIR/statd
fi
#--- 1.17.1 ---
#--- 1.17.1 ---
# Usage: do_bpcp node [ options ] from [ ... ] to
do_bpcp() {
if [ -z "$1" ] ; then
return
fi
local NODE=$1
shift
local OPTS=
while [ "${1:0:1}" = "-" ] ; do
local OPTS="$OPTS $1"
shift
done
local NFILES=$(( $# - 1 ))
if [ $NFILES -lt 1 ] ; then
return 1
fi
local FILES=
for (( i = $NFILES ; i ; i-- )) ; do
local FILES="$FILES $1"
shift
done
echo "node_up: Copying$FILES to $NODE:$1."
eval bpcp $OPTS $FILES $NODE:/rootfs$1
}
#--- 1.17.1 ---
# Usage: beoconfig tag [config_file]
beoconfig() {
local FILE=$2
if [ -z "$FILE" ] ; then FILE=${CONFIG} ; fi
if [ ! -f ${FILE} ] ; then
echo "Warning: ${FILE} file not found." >&2
return
fi
# These sed bits:
# - strip comments
# - strip leading + trailing space
# - if line starts with $1, strip off $1 and print it.
sed -ne "s/#.*//" < ${FILE} \
-e "s/^[[:space:]]\+//;s/[[:space:]]\+\$//" \
-e "/^$1[[:space:]]/{s/^$1[[:space:]]\+//;p;}"
}
die() {
if [ -n "$1" ] ; then
echo 1>&2 "$1"
fi
if [ -n "$2" ] ; then
echo 1>&2 "Fatal error performing: $*"
fi
if [ -n "$MOUNTED" ] ; then
umount $INITRD_BUILD
rmdir $INITRD_BUILD
fi
exit 1
}
run_cmd() {
eval "$*" || die "" "$*"
}
# A message for the console on the remote end.
bpsh $NODE --stdout /dev/console \
echo -e "node_up: This is node $NODE.\nnode_up: boot log available in /var/log/beowulf/node.$NODE on the master."
#---------------------------------------------------------------------
# First things first... set the system clock
echo "node_up: Setting system clock."
run_cmd $BINDIR/bdate $NODE
# mapping of ram devices at this point.
# /dev/ram0 <- initrd goes here
#run_cmd bpsh $NODE mount -nt proc none /proc
# XXX We need a way to figure out what interface is up at this point
# so that we know which one to slap a netmask onto.
#--- 1.17.1 ---
#echo "node_up: TODO set interface netmask."
#--- 1.17.1 ---
# ... and kick on that loop back interface
echo "node_up: Configuring loopback interface."
run_cmd bpsh $NODE ifconfig lo 127.0.0.1 netmask 255.0.0.0
run_cmd bpsh $NODE route add -net 127.0.0.0 netmask 255.0.0.0 lo
#---------------------------------------------------------------------
# Kernel Modules
#
# We should probably pay attention to "insmod" lines in the config
# file here...
KVER=`bpsh $NODE uname -r` # Make note of the remote kernel version
for module in `$BINDIR/pcilookup $NODE`; do
modprobe --node $NODE $module
done
#---------------------------------------------------------------------
# File Systems
#
# We need a way for setup_fs to let us know where the root filesystem
# is mounted...
$BINDIR/setup_fs $NODE || exit 1
# Populate it ?
# Setup scratch and tmp space...
#--- 1.17.1 ---
#run_cmd bpsh $NODE mkdir -p /rootfs/{tmp,scratch}
#run_cmd bpsh $NODE chmod 1777 /rootfs/{tmp,scratch}
#--- 1.17.1 ---
bplib -l | bpsh $NODE bplib -a -
#$BINDIR/setup_libs $NODE /rootfs || exit 1
# Copy over device nodes from the front end.
#--- 1.17.1 ---
#echo "node_up: Populating /dev and /etc."
#run_cmd bpsh $NODE mkdir -p /rootfs/{dev,etc}
#--- 1.17.1 ---
echo "node_up: Copying over device nodes."
run_cmd bpsh $NODE mkdir -p /rootfs/dev
#find /dev -mount -type b -o -type c | \
# sed -e 's!^/!!' | tar cf - -T - | bpsh $NODE tar -C /rootfs -xf -
DEVLIST="console zero null"
tar -C /dev -cf - $DEVLIST | bpsh $NODE tar -C /rootfs/dev -xf -
[ "$?" = "0" ] || die "" "copying device nodes"
echo "node_up: Copying over time zone info."
run_cmd bpcp /etc/localtime $NODE:/rootfs/etc/localtime
#--- 1.17.1 ---
# Copy configuration files
beoconfig bpcp | (
while read line ; do
if ! do_bpcp $NODE $line ; then
echo 1>&2 "Failed to copy files."
exit 1
fi
done
) || die
# Supply a default /etc/nsswitch.conf, if needed
if ! bpsh -n $NODE ls /rootfs/etc/nsswitch.conf >/dev/null 2>&1 ; then
echo "node_up: Copy over default nsswitch info."
run_cmd cat << EOF | bpsh -n $NODE --stdout /rootfs/etc/nsswitch.conf cat
passwd: bproc
hosts: bproc
EOF
fi
#--- 1.17.1 ---
# nss_bproc is optional equipment so ignore errors....
#echo "node_up: Copying over bproc nss library."
#bpcp /lib/libnss_bproc.so.2 $NODE:/rootfs/lib
#---------------------------------------------------------------------
# Finish up...
#run_cmd bpsh $NODE umount -n /proc
run_cmd bpctl -S $NODE -r /rootfs
# This is a hack to make the dynamic linker work for things which are
# exec'ed remotely.
run_cmd bpsh -N $NODE /sbin/ldconfig -l /lib/ld-*
run_cmd bpsh -N $NODE hostname n$NODE
run_cmd $BINDIR/nodeinfo $NODE # Update node information DB
#--- 1.17.1 ---
# At this point, all file systems in $NODE:/etc/fstab have been mounted,
# except for network devices (host:export) without the "nolock" option.
# NFS devices without the "nolock" option require the RPC portmapper and
# status daemons. The status daemon requires read/write access to the
# $SMDIR/sm and $SMDIR/sm.bak directories, which must exist and be owned
# 700 by rpcuser (on Red Hat, see http://nfs.sourceforge.net, item 17).
# True if there are any NFS mounts in $NODE:/etc/fstab without the "nolock"
# option, i.e., that need the RPC portmapper and status daemon.
if [ `bpsh -n $NODE cat /etc/fstab | \
while read line ; do
if [ -n "$line" -a "${line:0:1}" != "#" ] ; then
echo "$line" | (
read device mountpt fstype options rest && \
echo "$fstype" | grep -q nfs && \
echo "$options" | grep -q -v nolock \
) && echo "$line"
fi
done | \
wc -l` -gt 0 ] ; then
# Create $SMDIR/sm and $SMDIR/sm.bak owned 700 by rpcuser (on Red Hat)
bpsh -n $NODE mkdir -m 700 -p $SMDIR/{sm,sm.bak}
if [ -f /etc/redhat-release ] ; then
bpsh -n $NODE chmod 700 $SMDIR
bpsh -n $NODE chown rpcuser $SMDIR
bpsh -n $NODE chgrp rpcuser $SMDIR
bpsh -n $NODE chown rpcuser $SMDIR/{sm,sm.bak}
bpsh -n $NODE chgrp rpcuser $SMDIR/{sm,sm.bak}
fi
# Start the RPC portmapper and status daemon
echo "node_up: Starting the RPC portmapper and status daemon."
bpsh -n $NODE initlog -c portmap
bpsh -n $NODE initlog -c rpc.statd
# Mount the network devices that were deferred earlier
echo "node_up: Completing deferred NFS mounts."
bpsh -n $NODE mount -a -t nfs
fi
#--- 1.17.1 ---
#--- A message for the log file and node's console.
echo "node_up: Node setup finished."
bpsh $NODE --stdout /dev/console echo "node_up: Node setup finished."
exit 0
---------- /usr/lib/beoboot/bin/setup_fs ----------
#!/bin/sh
#--- 1.4.1 ---
#
# /usr/lib/beoboot/bin/setup_fs
#
#--- 1.4.1 ---
#
# Erik Hendriks <hen...@la...>
#
# $Id: setup_fs,v 1.4 2001/11/30 17:52:40 hendriks Exp $
# $Id: setup_fs,v 1.4.1 2002/08/05 L. M. Baker $
#
# This bit of code is a first stab at understanding fstab for mount.
# It's a lot like mount dealing with its own fstab.
# Differences with just allowing mount to chew on an fstab:
# We can do fsck checks before attempting to mount.
# We can (re)create file systems before mounting.
# We can create mount points before mounting.
#
#--------------------------------------------------------------------------
# Generic functions to do operations on varUseful functions
#--------------------------------------------------------------------------
#--- 1.4.1 ---
# Usage: do_mkdir node { [ -s target ] name } ...
do_mkdir() {
if [ -z "$1" ] ; then
return
fi
local NODE=$1
shift
if [ -z "$1" ] ; then
return
fi
while [ -n "$1" ] ; do
if [ "$1" == "-s" ] ; then
shift
if [ -z "$1" -o -z "$2" ] ; then
return 1
fi
local target=`eval echo "$1"`
local name=`eval echo "$2"`
echo "setup_fs: ln -s $target $name"
if ! bpsh -n $NODE ln -s $target /rootfs$name ; then
return 1
fi
shift
else
if [ "$1" == "-m" ] ; then
shift
if [ -z "$1" -o -z "$2" ] ; then
return 1
fi
local mode=$1
local name=`eval echo "$2"`
echo "setup_fs: mkdir -m $mode -p $name"
if ! bpsh -n $NODE mkdir -m $mode -p /rootfs$name ; then
return 1
fi
shift
else
local name=`eval echo "$1"`
echo "setup_fs: mkdir -p $name"
if ! bpsh -n $NODE mkdir -p /rootfs$name ; then
return 1
fi
fi
fi
shift
done
}
# Usage: do_safefsck node device fstype
#--- 1.4.1 ---
do_safefsck() {
case $2 in
/dev/ram*)
echo "setup_fs: Hmmm...This appears to be a ramdisk. "
echo -n "setup_fs: I'm going to try checking the "
echo "filesystem (fsck) anyway."
echo -n "setup_fs: If it is a RAM disk the following will "
echo "fail harmlessly." ;;
esac
case $3 in
#--- 1.4.1 ---
ext*) bpsh -n $1 e2fsck -p $2 ; ret=$?
#--- 1.4.1 ---
if [ "$ret" = 1 ] ; then ret=0; fi ;;
swap) bpsh -n $1 chkswap $2 ; ret=$? ;;
*) ret=0;;
esac
[ "$ret" = 0 ]
}
do_fsck() {
echo "setup_fs: Checking $2 (type=$3)..."
case $2 in
/dev/ram*)
echo "setup_fs: Hmmm...This appears to be a ramdisk. "
echo -n "setup_fs: I'm going to try checking the "
echo "filesystem (fsck) anyway."
echo -n "setup_fs: If it is a RAM disk the following will "
echo "fail harmlessly." ;;
esac
case $3 in
#--- 1.4.1 ---
ext*) bpsh -n $1 e2fsck -y $2 ; ret=$?
#--- 1.4.1 ---
if [ "$ret" = 1 ] ; then ret=0; fi ;;
swap) bpsh -n $1 chkswap $2 ; ret=$? ;;
*) ret=0;;
esac
[ "$ret" = 0 ]
}
# Usage: do_mkfs node device fstype fssize
do_mkfs() {
echo "setup_fs: Creating $3 on $2..."
case $3 in
ext2) bpsh -n $1 mke2fs -q $2 $4 ; ret=$? ;;
#--- 1.4.1 ---
ext3) bpsh -n $1 mke2fs -q -j $2 $4 ; ret=$? ;;
#--- 1.4.1 ---
swap) bpsh -n $1 mkswap $2 $4 ; ret=$? ;;
*) ret=0;;
esac
[ "$ret" = 0 ]
}
# Usage: load_fs node fstype
load_fs () {
if [ -z "`bpsh -n $1 grep $2 /proc/filesystems`" ] ; then
modprobe --node $1 $2
fi
}
# Usage: do_mount node device mountpt fstype options
do_mount() {
#--- 1.4.1 ---
# Load file system module for all fstypes so they can be mounted later
if [ "$4" != "swap" ] ; then
load_fs $1 $4
fi
# Don't mount devices with the "noauto" option
if echo $5 | grep -q noauto ; then
return
fi
#--- 1.4.1 ---
echo "setup_fs: Mounting $2 on $3... (type=$4; options=$5)"
case $4 in
swap) bpsh -n $1 swapon $2 ;;
#--- 1.4.1 ---
# Defer mounts of network devices (host:export) without the "nolock" option
*) if [ -z "`echo $2 | grep :`" -o \
-n "`echo $5 | grep nolock`" ] ; then
if bpsh -n $1 mount -nt $4 -o $5 $2 $3 ; then
if [ "${mountpt:0:1}" == "/" ] ; then
echo "$device $mountpt $fstype $options" >>$MTABFILE
fi
fi
else
echo "setup_fs: Mount deferred until lock daemon running."
fi ;;
#--- 1.4.1 ---
esac
}
# Usage: beoconfig tag [config_file]
beoconfig() {
local FILE=$2
if [ -z "$FILE" ] ; then FILE=${CONFIG} ; fi
if [ ! -f ${FILE} ] ; then
echo "Warning: ${FILE} file not found." >&2
return
fi
# These sed bits:
# - strip comments
# - strip leading + trailing space
# - if line starts with $1, strip off $1 and print it.
sed -ne "s/#.*//" < ${FILE} \
-e "s/^[[:space:]]\+//;s/[[:space:]]\+\$//" \
-e "/^$1[[:space:]]/{s/^$1[[:space:]]\+//;p;}"
}
#--------------------------------------------------------------------------
# Argument sanity checking
if [ "$1" = "" ] ; then
echo "Usage: setup_fs <nodenumber>"
exit 1
fi
echo "setup_fs: Configuring node filesystems..."
NODE=$1
CONFIG=/etc/beowulf/config
#--- 1.4.1 ---
BINDIR=/usr/lib/beoboot/bin
PATH=$BINDIR:/sbin:/usr/sbin:$PATH
#--- 1.4.1 ---
MASTER=`bpstat -a master`
RAMDISK=/dev/ram3
FSCK=`beoconfig fsck`
MKFS=`beoconfig mkfs`
#--- 1.4.1 ---
MKDIR=`beoconfig mkdir`
#--- 1.4.1 ---
#--- 1.4.1 ---
# Select which FSTAB to use.
#if [ -r /etc/beowulf/fstab.$NODE ] ; then
# FSTAB=/etc/beowulf/fstab.$NODE
#else
# FSTAB=/etc/beowulf/fstab
#fi
#echo "setup_fs: Using $FSTAB"
#--- 1.4.1 ---
# XXX We need a way to pick up per-node commands!
# Control flags
#
#--- 1.4.1 ---
# FSCK =
#--- 1.4.1 ---
# 0 = Don't touch anything, just try to mount.
# 1 = Ok to fsck but don't do anything if it fails.
# 2 = fsck and do mkfs if it fails.
# 3 = skip fsck go straight to mkfs
#
#--- 1.4.1 ---
# Sanity check FSCK (default = 1)
#--- 1.4.1 ---
case $FSCK in
"never"|"safe"|"full") ;;
"") FSCK=safe ;;
*)
echo 1>&2 "Invalid value '$FSCK' for fsck tag in $CONFIG."
exit 1 ;;
esac
case $MKFS in
"never"|"if_needed"|"always") ;;
"") MKFS=if_needed ;;
*)
echo 1>&2 "Invalid value '$MKFS' for mkfs tag in $CONFIG."
exit 1 ;;
esac
#--- 1.4.1 ---
# Select which FSTAB to use.
FSTAB=/etc/beowulf/fstab.$NODE
if [ ! -r $FSTAB ] ; then
FSTAB=/etc/beowulf/fstab
fi
#--- 1.4.1 ---
if [ ! -f $FSTAB ] ; then
echo 1>&2 "setup_fs: $FSTAB (file system table) is missing."
exit 1
fi
#--- 1.4.1 ---
# Create default directories
if ! do_mkdir $NODE $MKDIR ; then
echo 1>&2 "Failed to create default directories."
exit 1
fi
#--- 1.4.1 ---
# Ok... This is one big nasty pipe line... Here's what this mess does:
# * Use sed to remove comments. (starting with #)
# * Run it all through eval to do variable substitutions.
# * Go through all the lines doing:
# + Ignore the empty lines
# + Remove trailing slashes from the mount points
# + Prepend a number that will allow us to sort the mount points.
# * Sort the mount points
#--- 1.4.1 ---
# * On each mount point (depending on the FSCK policy):
#--- 1.4.1 ---
# + fsck the file system
# + if bad, possibly recreate the file system.
# + mount the file system
#--- 1.4.1 ---
# * Create /etc/fstab for the new node.
#--- 1.4.1 ---
# * Create /etc/mtab for the new node.
MTABFILE=/tmp/.setup_fs.mtab.$$
if ! rm -f $MTABFILE ; then
echo 1>&2 "setup_fs: $MTABFILE already exists and can't remove."
exit 1
fi
touch $MTABFILE
#--- 1.4.1 ---
FSTABFILE=/tmp/.setup_fs.fstab.$$
if ! rm -f $FSTABFILE ; then
echo 1>&2 "setup_fs: $FSTABFILE already exists and can't remove."
exit 1
fi
touch $FSTABFILE
echo "setup_fs: Using $FSTAB."
cat $FSTAB | \
while read line ; do
if [ -z "$line" -o "${line:0:1}" == "#" ] ; then
echo "$line" >>$FSTABFILE
else
line=`eval echo "$line"`
echo "$line" >>$FSTABFILE
echo "$line"
fi
done | \
#--- 1.4.1 ---
while read device mountpt fstype options junk ; do
if [ -z "$options" ] ; then
#--- 1.4.1 ---
# if [ -n "$device" ] ; then
#--- 1.4.1 ---
echo 1>&2 "Ignoring incomplete line: $device $mountpt $fstype $options $junk"
#--- 1.4.1 ---
# fi
#--- 1.4.1 ---
continue
fi
# Sanitize mount point... (squeeze multiple slashes, remove
# any trailing slashes)
mountpt=`echo $mountpt | sed -e 's!/\+!/!g' -e 's!/\+$!!'`
slashct=`echo $mountpt | tr -cd / | wc -c`
if [ -z $mountpt ] ; then mountpt=/ ; fi
echo $slashct $device $mountpt $fstype $options
done | \
sort -n | \
(while read slashct device mountpt fstype options junk ; do
if [ -z "$options" ] ; then
#--- 1.4.1 ---
# if [ -n "$device" ] ; then
#--- 1.4.1 ---
echo 1>&2 "Ignoring incomplete line: $device $mountpt $fstype $options $junk"
#--- 1.4.1 ---
# fi
#--- 1.4.1 ---
continue
fi
# Get a file system size option if it's there...
fssize=`echo $options | sed -e 's/.*fs_size=\([0-9]\+\).*/\1/p;d'`
options=`echo $options | sed -e 's/fs_size=[0-9]\+//g'`
if [ -z "$options" ] ; then options=defaults; fi
# Everything gets a "/rootfs" prefix at this stage. Also we create the
# mount points as needed. This requires that people have their fstab
# in some reasonable order. (It might be hard for us to sort it....)
#--- 1.4.1 ---
# if echo $mountpt | grep -q '^/' ; then
# echo "$device $mountpt $fstype $options" >> $MTABFILE
# fi
#--- 1.4.1 ---
# see to it that the device node exists on the remote machine
#--- 1.4.1 ---
if [ "${device:0:4}" == "/dev" ] ; then
(cd / ; tar cf - $device) | bpsh -n $NODE tar xf -
#--- 1.4.1 ---
fi
mknewfs=0
if [ $MKFS = "always" ]; then
mknewfs=1
else
case $FSCK in
"never") ;; # No FSCK!
"safe") if ! do_safefsck $NODE $device $fstype ; then
echo 1>&2 "setup_fs: RAM disks fail FSCK, that's OK"
echo 1>&2 "setup_fs: FSCK failure. (OK for RAM disks)"
mknewfs=1
fi ;;
"full") if ! do_fsck $NODE $device $fstype ; then
echo 1>&2 "setup_fs: FSCK failure. (OK for RAM disks)"
mknewfs=1
fi ;;
esac
fi
if [ $MKFS != "never" -a "$mknewfs" = 1 ] ; then
if ! do_mkfs $NODE $device $fstype $fssize ; then
echo 1>&2 "Failed to create $fstype file system on $device."
exit 1
fi
fi
# See to it that the mount point exists before trying to mount.
#--- 1.4.1 ---
if [ "${mountpt:0:1}" == "/" ] ; then
if ! bpsh -n $NODE mkdir -p /rootfs$mountpt ; then
#--- 1.4.1 ---
echo 1>&2 "Failed to create mount point."
exit 1
fi
fi
#--- 1.4.1 ---
if ! do_mount $NODE $device /rootfs$mountpt $fstype $options ; then
#--- 1.4.1 ---
echo 1>&2 "Failed to mount $device on $mountpt."
exit 1
fi
done
#--- 1.4.1 ---
# Create fstab on the remote node...
if ! bpcp $FSTABFILE $NODE:/rootfs/etc/fstab ; then
echo 1>&2 "Failed to create /etc/fstab."
exit 1
fi
rm -f $FSTABFILE
#--- 1.4.1 ---
# Finally, create mtab on the remote node...
#--- 1.4.1 ---
# if ! bpsh -n $NODE mkdir -p /rootfs/etc ; then
# echo 1>&2 "Failed to create /etc."
# exit 1
# fi
#--- 1.4.1 ---
if ! bpcp $MTABFILE $NODE:/rootfs/etc/mtab ; then
echo 1>&2 "Failed to create /etc/mtab."
exit 1
fi
rm -f $MTABFILE
)
# Exit with status of this nutty pipeline.
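The ordering step in the pipeline above can be tried in isolation. The following is a minimal sketch, not part of setup_fs: the function name sort_by_depth is made up for illustration, and it assumes fstab-style lines (device, mount point, fstype, options) on standard input. It prepends the number of slashes in each mount point, sorts numerically, and strips the count again, so that a parent such as /var is handled before /var/tmp.

```shell
# Hypothetical helper (not in setup_fs): order fstab-style lines so that
# shallower mount points come first, using the same slash-count trick.
sort_by_depth() {
    while read device mountpt rest ; do
        # Count the slashes in the mount point. The unquoted expansion in
        # the echo below drops any padding that wc may print.
        slashct=`printf '%s' "$mountpt" | tr -cd / | wc -c`
        echo $slashct "$device" "$mountpt" "$rest"
    done | sort -n | cut -d' ' -f2-
}
```

Feeding it a cooked fstab (for example, sort_by_depth < /etc/beowulf/fstab after variable substitution) shows the order in which the mounts would be attempted.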
|
|
From: Erik A. H. <er...@he...> - 2002-07-30 15:36:59
|
On Sat, Jul 27, 2002 at 10:46:22PM -0300, Carlos Carvalho wrote:
> Folks,
>
> I've just discovered bproc and it looks very interesting. However I
> have a question that is crucial to our usage.
>
> At the moment we have a small cluster of identical machines. Our users
> have FORTRAN programs that right now are not parallelized and run on a
> single machine. If I understood the docs, bproc doesn't automatically
> move jobs between nodes.

That is correct. A process can only move itself.

> How then can one distribute the load among the
> nodes? The only way I see is that the users will have to launch their
> programs via bpsh, specifying the node where they want the program to
> run. However, how can one discover the load of each node? The only way
> I see is to run
>
> % bpsh -aps 'cat /proc/loadavg|cut -f1'
>
> or something similar. Is there a better way?
>
> It'd be really nice if bproc did load balancing :-)

I try to draw a pretty clear line between what BProc's job is and what
the scheduler's job is. As I see it, choosing which node to place a job
on is the scheduler's job and actually putting it there is BProc's job.
In other words, BProc doesn't make those decisions because I've decided
that scheduling is a separate problem.

- Erik

P.S. I'm trying to get a simple BProc-oriented scheduler we've written
out the door.
|
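Until a scheduler is available, the manual approach from the quoted question can be scripted. The sketch below is only an illustration: the function name pick_least_loaded is invented here, and it assumes its input is one "node load" pair per line, which you would have to produce yourself from the bpsh output (the exact prefix format printed by bpsh -p is not shown in this thread). Note also that /proc/loadavg is space-separated, so the first field needs cut -d' ' -f1 rather than plain cut -f1.

```shell
# Hypothetical helper: read "node load" pairs on stdin and print the node
# with the smallest load average (the first one wins on a tie).
pick_least_loaded() {
    awk 'NF >= 2 && (best == "" || $2 + 0 < min) { min = $2 + 0; best = $1 }
         END { if (best != "") print best }'
}
```

The node number it prints could then be passed to bpsh to start a program there. This is a one-shot choice made at launch time, not load balancing.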
|
From: Carlos C. <ca...@fi...> - 2002-07-28 01:46:39
|
Folks,

I've just discovered bproc and it looks very interesting. However I
have a question that is crucial to our usage.

At the moment we have a small cluster of identical machines. Our users
have FORTRAN programs that right now are not parallelized and run on a
single machine. If I understood the docs, bproc doesn't automatically
move jobs between nodes. How then can one distribute the load among the
nodes? The only way I see is that the users will have to launch their
programs via bpsh, specifying the node where they want the program to
run. However, how can one discover the load of each node? The only way
I see is to run

% bpsh -aps 'cat /proc/loadavg|cut -f1'

or something similar. Is there a better way?

It'd be really nice if bproc did load balancing :-)
|
|
From: Sadanand K. <sa...@ci...> - 2002-07-26 07:01:47
|
No, I am running bpslave on a different m/c than that of the one running
bpmaster.

Sadanand

On Thu, 25 Jul 2002, Wilton Wong wrote:
>
> Are you trying to run bpslave on the same machine as bpmaster and the rest of
> the bproc processes are running ?
>
> - Wilton
>
> ----[ Wilton William Wong ]---------------------------------------------
> 11060-166 Avenue    Ph : 01-780-456-9771   High Performance UNIX
> Edmonton, Alberta   FAX: 01-780-456-9772   and Linux Solutions
> T5X 1Y3, Canada     URL: http://www.harddata.com
> -------------------------------------------------------[ Hard Data Ltd. ]----
>
> -------------------------------------------------------
> This sf.net email is sponsored by: Jabber - The world's fastest growing
> real-time communications platform! Don't just IM. Build it in!
> http://www.jabber.com/osdn/xim
> _______________________________________________
> BProc-users mailing list
> BPr...@li...
> https://lists.sourceforge.net/lists/listinfo/bproc-users
>
|
|
From: Wilton W. <ww...@ha...> - 2002-07-26 01:29:52
|
Are you trying to run bpslave on the same machine as bpmaster and the rest of
the bproc processes are running ?

- Wilton

----[ Wilton William Wong ]---------------------------------------------
11060-166 Avenue    Ph : 01-780-456-9771   High Performance UNIX
Edmonton, Alberta   FAX: 01-780-456-9772   and Linux Solutions
T5X 1Y3, Canada     URL: http://www.harddata.com
-------------------------------------------------------[ Hard Data Ltd. ]----
|
|
From: Erik A. H. <er...@he...> - 2002-07-25 23:12:18
|
On Thu, Jul 25, 2002 at 12:38:04AM -0500, Sadanand Kota wrote:
> Hi,
> Erik - thanks for reply.
> I started the slave with -d option and it's not exiting, but
> the node status is still down. Also, where are the log files?
> /var/log/beowulf has no files.
>
> How do I check the "magic cookies"?

It's tough to look at those. Both daemons will barf out with an error
if there's a mismatch though. It sounds like something weird is going
on with your setup. At this point, I'd whip out strace and start
looking at what's actually going on with connection establishment.

- Erik

> On Tue, 23 Jul 2002, Erik Arjan Hendriks wrote:
>
> > On Tue, Jul 23, 2002 at 03:28:41AM -0500, Sadanand Kota wrote:
> > > Hi,
> > > I have installed bproc on 2 of my linux systems taking the RPM from
> > > Clustermatic.
> > > I am able to run bpmaster and bpslave successfully. But when I check the
> > > status of the machines using bpstat, It always gives status as down.
> > > If I try /etc/beowulf/node_up 0, the output is
> > >
> > > node_up: Setting system clock.
> > > error moving to node 0: Invalid argument
> > > Fatal error performing: /usr/lib/beoboot/bin/bdate 0
> > >
> > > (The same with /etc/beowulf/node_up 1)
> > >
> > > Any idea how to change node status to up?
> >
> > "down" means that the slave is not connected to the master. That
> > basically means that the slave either isn't connected with TCP or it
> > hasn't sent the right magic cookies.
> >
> > There's no way to change "down" to anything else manually. Once the
> > slave connects, the state will change to boot and the master daemon
> > will run /etc/beowulf/node_up. If that exits with status 0, the state
> > will change to up. Otherwise the state will change to error.
> > Manually setting the node state to "down" (with bpctl) will cause the
> > slave to be disconnected.
> >
> > I'd check to make sure bpslave is actually connecting to the master.
> > Try running bpslave with -d to make sure it's not just exiting with
> > some error. Also, check the system logs.
> >
> > - Erik

--
Erik Arjan Hendriks    Printed On 100 Percent Recycled Electrons
er...@he...            Contents may settle during shipment
|
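The state changes described in the quoted reply above can be summarized as a small table. The function below is only a reading aid written for this thread, not BProc code: the name next_state, the event names, and the argument order are all made up. It takes the current state and an event, and prints the state that the description says follows.

```shell
# Hypothetical summary of the node states described above (not BProc code).
# Usage: next_state current_state event [node_up_exit_status]
next_state() {
    case "$1:$2" in
        down:connect)  echo boot ;;   # slave connected to the master
        boot:node_up)  if [ "$3" = 0 ] ; then echo up ; else echo error ; fi ;;
        *:disconnect)  echo down ;;   # e.g. state set to "down" with bpctl
        *)             echo "$1" ;;   # anything else: no change
    esac
}
```

For a node stuck in "down", the first row is the one to look at: nothing else happens until the slave's connection to the master is established.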
|
From: Larry B. <ba...@us...> - 2002-07-25 18:33:53
|
I have modified the bproc/beoboot node startup scripts to automatically
start the NFS RPC portmapper and status daemon so that an NFS mount
without the "nolock" option succeeds. I also fixed a couple annoyances
(i.e., tar failures), and added support for ext3 fstypes.

Locking support is required for the MPI-2 parallel IO routines (I have
MPICH 1.2.4). I am not sure yet that it is completely working; while
things have improved, a couple of the MPICH IO test routines still fail.
I hope to track that down soon.

I'm going to try to add a bit more user control of the file system setup
done in setup_fs through the /etc/beowulf/config configuration file.
For example, I'd like to add entries that list the names of directories
that get automatically created, such as /proc, /etc, /tmp, and /scratch.
These are hard-coded now. Also, I'd like to make support for the RPC
portmapper and status daemon optional (e.g., always, auto, never).
Finally, I'd like to get the contents for /etc/nsswitch.conf either from
a file in /etc/beowulf, or from /etc/beowulf/config.

Below are the files I have modified/use:
/etc/exports The NFS file systems exported by the master
/etc/beowulf/fstab The file systems file for the nodes
/etc/beowulf/config The bproc/beoboot configuration file
/etc/beowulf/node_up The beoboot stub node startup script
/usr/lib/beoboot/bin/setup_fs The beoboot node file system setup script
After rebooting, this is what /var/log/beowulf/node.0 looks like:
node_up: Setting system clock.
node_up: TODO set interface netmask.
node_up: Configuring loopback interface.
setup_fs: Configuring node filesystems...
setup_fs: Using /etc/beowulf/fstab
setup_fs: Checking 192.168.50.209:/bin (type=nfs)...
setup_fs: Mounting 192.168.50.209:/bin on /rootfs/bin... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/home (type=nfs)...
setup_fs: Mounting 192.168.50.209:/home on /rootfs/home... (type=nfs; options=rw,rsize=8192,wsize=8192,noac)
setup_fs: Mount deferred until lock daemon running.
setup_fs: Checking 192.168.50.209:/opt (type=nfs)...
setup_fs: Mounting 192.168.50.209:/opt on /rootfs/opt... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/sbin (type=nfs)...
setup_fs: Mounting 192.168.50.209:/sbin on /rootfs/sbin... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/usr (type=nfs)...
setup_fs: Mounting 192.168.50.209:/usr on /rootfs/usr... (type=nfs; options=ro,nolock,rsize=8192)
setup_fs: Checking 192.168.50.209:/var/node.0 (type=nfs)...
setup_fs: Mounting 192.168.50.209:/var/node.0 on /rootfs/var... (type=nfs; options=rw,nolock,rsize=8192,wsize=8192)
setup_fs: Checking none (type=proc)...
setup_fs: Mounting none on /rootfs/proc... (type=proc; options=defaults)
setup_fs: Checking none (type=devpts)...
setup_fs: Mounting none on /rootfs/dev/pts... (type=devpts; options=gid=5,mode=620)
node_up: populating /dev and /etc
node_up: Copying over device nodes.
node_up: Copying over time zone info.
node_up: Copy over nsswitch info.
node_up: Node setup finished.
/etc/beowulf/node_up: Copy files into /etc for /etc/nsswitch.conf.
/etc/beowulf/node_up: Start the RPC portmapper and status daemon.
/etc/beowulf/node_up: Complete deferred network mounts.
/etc/beowulf/node_up: Soft link /tmp to /var/tmp.
/etc/beowulf/node_up: Soft link /scratch to /home/node.0.
Larry Baker
US Geological Survey
ba...@us...
#
# /etc/exports
#
# Read-only exports
#
/bin 192.168.50.209/255.255.255.224(ro)
/opt 192.168.50.209/255.255.255.224(ro)
/sbin 192.168.50.209/255.255.255.224(ro)
/usr 192.168.50.209/255.255.255.224(ro)
#
# Private read-write exports
#
/var/node.0 192.168.50.210(rw,no_root_squash)
/var/node.1 192.168.50.211(rw,no_root_squash)
#
# Shared read-write exports (MPICH 1.2.4, section 4.11.1: use "noac")
#
/home 130.118.45.45/255.255.252.0(rw) \
192.168.50.209/255.255.255.224(rw,no_root_squash)
#
# /etc/beowulf/fstab
#
# This file is the fstab for nodes.
# One difference is that we allow for shell variable expansions...
#
# Variables that will get substituted:
# MASTER = IP address of the master node. (good for doing NFS mounts)
# NODE = slave's node no.
# RAMDISK = device name (/dev/<ramdev>) of a device suitable for a root fs
#
# A cooked version (with variable substitution) of this file will be copied
# to /etc/fstab on the slave node.
#
# The root file system is a tmpfs provided by the boot scripts. You
# can mount something on / if you'd like but due to oddities in the file
# caching code it's not recommended right now.
# This is the default setup from beofdisk, once you setup your disks.
#/dev/hda2 swap swap defaults 0 0
#/dev/hda3 / ext2 defaults 0 0
# These should always be added
none /proc proc defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
# NFS (for example and default friendliness)
# Note: Mounts without the "nolock" option are deferred until the RPC portmapper
# and status daemons are running -- see the instructions in /etc/beowulf/node_up
#
# Read-only mount points
#
$MASTER:/bin /bin nfs ro,nolock,rsize=8192 0 0
$MASTER:/opt /opt nfs ro,nolock,rsize=8192 0 0
$MASTER:/sbin /sbin nfs ro,nolock,rsize=8192 0 0
$MASTER:/usr /usr nfs ro,nolock,rsize=8192 0 0
#
# Private read-write mount points
#
$MASTER:/var/node.$NODE /var nfs rw,nolock,rsize=8192,wsize=8192 0 0
#
# Shared read-write mount points (MPICH 1.2.4, section 4.11.1: use "noac")
#
$MASTER:/home /home nfs rw,rsize=8192,wsize=8192,noac 0 0
#
# /etc/beowulf/config
#
# Sample Beowulf Configuration file
#
# $Id: config,v 1.7 2002/03/12 20:54:58 hendriks Exp $
#
#
# Default cluster configuration (uses eth1, and 192.168.1.0/24)
# interface: internal cluster interface (the one connected to the nodes)
#
# iprange: range of IP addresses for nodes.
interface eth1 192.168.50.209 255.255.255.224
# Setup addresses in the cluster. The "nodes" line is REQUIRED here to specify
# cluster size. "iprange" and "ip" assign addresses to nodes. The "0" in
# iprange here tells it to start assigning at node zero.
nodes 2
iprange 0 192.168.50.210 192.168.50.211
# Default libraries (These are the libraries which will automagically be made
# available to the slaves.)
# No line continuation, multiple entries allowed
libraries /lib /usr/lib /usr/X11R6/lib
libraries /opt/intel/compiler60/ia32/lib /opt/intel/mkl/lib/32
# Default file system policies.
fsck full
mkfs if_needed
# Default location of boot images
bootfile /var/beowulf/boot.img
kernelimage /boot/vmlinuz-2.4.18-lanl.16
kernelcommandline apm=power-off
# Here we assign MAC addresses to nodes. Nodes can have multiple MAC
# addresses. Here the optional "0" zero argument states that the address
# should be assigned to node zero. Node lines following that will assign
# addresses to nodes sequentially
# D-Link DFE-500TX PCI card (DEC 21140-A chip)
#node 0 00:40:05:36:66:83
#node 00:40:05:40:60:e7
# Onboard RealTek RTL8100BL chip
node 0 00:40:63:c0:5e:08
node 00:40:63:c0:5f:b4
#!/bin/sh
#
# /etc/beowulf/node_up
#
# This shell script is called automatically by BProc to perform any
# steps necessary to bring up the nodes. This is just a stub script
# pointing to the real script
NODE=$1
MASTER=`bpstat -a master`
BINDIR=/usr/lib/beoboot/bin
PATH=/sbin:/usr/sbin:$PATH:$BINDIR
# Standard location of statd database files
#SMDIR=/var/lib/nfs
# Location of statd database files on Red Hat Linux
SMDIR=/var/lib/nfs/statd
$BINDIR/node_up $* || exit 1
# At this point, all file systems in $NODE:/etc/fstab have been mounted,
# except for network devices (host:export) without the "nolock" option.
# The following sections finish preparing the node for "mount -a", below.
# (Currently, only the RPC portmapper and status daemon are started, if
# necessary, for NFS file systems (fstype=nfs). Other fstypes may require
# similar preparation.)
# NFS devices without the "nolock" option require the RPC portmapper and
# status daemons. The status daemon requires read/write access to the
# /var/lib/nfs/statd/sm and .../sm.bak directories, which must exist and
# be owned 700 by rpcuser (see http://nfs.sourceforge.net, item 17).
# True if there are any NFS mounts in $NODE:/etc/fstab without the "nolock"
# option, i.e., that need the RPC portmapper and status daemon.
if [ `bpsh -n $NODE cat /etc/fstab | \
while read line ; do
if [ -n "${line}" -a "${line:0:1}" != "#" ] ; then
echo "${line}" | ( \
read device mountpt fstype options rest && \
echo ${fstype} | grep -q "nfs" && \
echo ${options} | grep -q -v "nolock" \
) && echo "${line}"
fi
done | \
wc -l` -gt 0 ] ; then
# Copy the files needed for the Name Service Switch (NSS) to /etc
# (needed by getpwnam(), etc., in #include <pwd.h>, called by rpc.statd)
echo "/etc/beowulf/node_up: Copy files into /etc for /etc/nsswitch.conf."
bpcp /etc/passwd $NODE:/etc
bpcp /etc/group $NODE:/etc
bpcp /etc/rpc $NODE:/etc
# Replace the NSS config file
cat << EOF | bpsh -n $NODE --stdout /etc/nsswitch.conf cat
#
# /etc/nsswitch.conf
#
hosts: bproc
passwd: bproc files
group: bproc files
rpc: files
EOF
# Create /var/lib/nfs/statd/sm and .../sm.bak owned 700 by rpcuser (Red Hat)
bpsh -n $NODE mkdir -p $SMDIR/sm
bpsh -n $NODE chmod 700 $SMDIR/sm
bpsh -n $NODE mkdir -p $SMDIR/sm.bak
bpsh -n $NODE chmod 700 $SMDIR/sm.bak
if echo $SMDIR | grep -q "/statd" ; then
bpsh -n $NODE chmod 700 $SMDIR
bpsh -n $NODE chown rpcuser $SMDIR
bpsh -n $NODE chgrp rpcuser $SMDIR
bpsh -n $NODE chown rpcuser $SMDIR/sm
bpsh -n $NODE chgrp rpcuser $SMDIR/sm
bpsh -n $NODE chown rpcuser $SMDIR/sm.bak
bpsh -n $NODE chgrp rpcuser $SMDIR/sm.bak
fi
# Start the RPC portmapper and status daemon
echo "/etc/beowulf/node_up: Start the RPC portmapper and status daemon."
bpsh -n $NODE initlog -c portmap
bpsh -n $NODE initlog -c rpc.statd
fi
# Mount the network devices that were deferred earlier
echo "/etc/beowulf/node_up: Complete deferred network mounts."
bpsh -n $NODE mount -a
##### Add commands here to complete the setup of the node #####
# Soft link /tmp to /var/tmp (NFS /var must be no_root_squash)
echo "/etc/beowulf/node_up: Soft link /tmp to /var/tmp."
bpsh -n $NODE rmdir --ignore-fail-on-non-empty /tmp
bpsh -n $NODE mkdir -p /var/tmp
bpsh -n $NODE ln -s /var/tmp /tmp
bpsh -n $NODE chmod 1777 /var/tmp
# Clean out /tmp every boot
bpsh -n $NODE /bin/rm -r -f /var/tmp/*
bpsh -n $NODE /bin/rm -r -f /var/tmp/.* 2>/dev/null
# Soft link /scratch to /home/node.$NODE (NFS /home must be no_root_squash)
echo "/etc/beowulf/node_up: Soft link /scratch to /home/node.$NODE."
bpsh -n $NODE rmdir --ignore-fail-on-non-empty /scratch
bpsh -n $NODE mkdir -p /home/node.$NODE
bpsh -n $NODE ln -s /home/node.$NODE /scratch
bpsh -n $NODE chmod 1777 /home/node.$NODE
exit 0
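The "does anything need the lock daemon?" test near the top of this script can be checked on the master without touching a node. The sketch below is a rewrite of that test under stated assumptions, not the code the script runs: the function name needs_lock_daemon is invented, and it reads fstab-style lines from standard input instead of calling bpsh. It succeeds if any non-comment line has an fstype containing "nfs" and no "nolock" option.

```shell
# Hypothetical helper: exit status 0 if any fstab-style line on stdin is an
# NFS mount without the "nolock" option, 1 otherwise.
needs_lock_daemon() {
    found=1
    while read device mountpt fstype options rest ; do
        # Skip blank lines and comments
        case "$device" in ""|"#"*) continue ;; esac
        # Only NFS file systems are of interest
        case "$fstype" in *nfs*) ;; *) continue ;; esac
        # Look for nolock as a whole comma-separated option
        case ",$options," in *,nolock,*) ;; *) found=0 ;; esac
    done
    return $found
}
```

On a running cluster the input would come from the node, e.g. the output of bpsh -n $NODE cat /etc/fstab piped into the function.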
#!/bin/sh
#
# /usr/lib/beoboot/bin/setup_fs
#
# Erik Hendriks <hen...@la...>
#
# $Id: setup_fs,v 1.4 2001/11/30 17:52:40 hendriks Exp $
#
# This bit of code is a first stab at understanding fstab for mount.
# It's a lot like mount dealing with its own fstab.
# Differences with just allowing mount to chew on an fstab:
# We can do fsck checks before attempting to mount.
# We can (re)create file systems before mounting.
# We can create mount points before mounting.
#
#--------------------------------------------------------------------------
# Useful functions
#--------------------------------------------------------------------------
# Usage: do_safefsck node device fstype
do_safefsck() {
case $2 in
/dev/ram*)
echo "setup_fs: Hmmm...This appears to be a ramdisk. "
echo -n "setup_fs: I'm going to try checking the "
echo "filesystem (fsck) anyway."
echo -n "setup_fs: If it is a RAM disk the following will "
echo "fail harmlessly." ;;
esac
case $3 in
ext*) bpsh -n $1 e2fsck -p $2 ; ret=$?
if [ "$ret" = 1 ] ; then ret=0; fi ;;
swap) bpsh -n $1 chkswap $2 ; ret=$? ;;
*) ret=0;;
esac
[ "$ret" = 0 ]
}
do_fsck() {
echo "setup_fs: Checking $2 (type=$3)..."
case $2 in
/dev/ram*)
echo "setup_fs: Hmmm...This appears to be a ramdisk. "
echo -n "setup_fs: I'm going to try checking the "
echo "filesystem (fsck) anyway."
echo -n "setup_fs: If it is a RAM disk the following will "
echo "fail harmlessly." ;;
esac
case $3 in
ext*) bpsh -n $1 e2fsck -y $2 ; ret=$?
if [ "$ret" = 1 ] ; then ret=0; fi ;;
swap) bpsh -n $1 chkswap $2 ; ret=$? ;;
*) ret=0;;
esac
[ "$ret" = 0 ]
}
# Usage: do_mkfs node device fstype fssize
do_mkfs() {
echo "setup_fs: Creating $3 on $2..."
case $3 in
ext2) bpsh -n $1 mke2fs -q $2 $4 ; ret=$? ;;
ext3) bpsh -n $1 mke2fs -q -j $2 $4 ; ret=$? ;;
swap) bpsh -n $1 mkswap $2 $4 ; ret=$? ;;
*) ret=0;;
esac
[ "$ret" = 0 ]
}
# Usage: load_fs node fstype
load_fs () {
if [ -z "`bpsh -n $1 grep $2 /proc/filesystems`" ] ; then
modprobe --node $1 $2
fi
}
# Usage: do_mount node device mountpt fstype options
do_mount() {
# Load file system module for all fstypes so they can be mounted later
if [ "$4" != "swap" ] ; then
load_fs $1 $4
fi
# Don't mount devices with the "noauto" option
if [ -n "`echo $5 | grep noauto`" ] ; then
return
fi
echo "setup_fs: Mounting $2 on $3... (type=$4; options=$5)"
case $4 in
swap) bpsh -n $1 swapon $2 ;;
# Defer mounts of network devices (host:export) without the "nolock" option
*) if [ -z "`echo $2 | grep :`" -o \
-n "`echo $5 | grep nolock`" ] ; then
if bpsh -n $1 mount -nt $4 -o $5 $2 $3 ; then
if [ "${mountpt:0:1}" == "/" ] ; then
echo "$device $mountpt $fstype $options" >> $MTABFILE
fi
fi
else
echo "setup_fs: Mount deferred until lock daemon running."
fi ;;
esac
}
# Usage: beoconfig tag [config_file]
beoconfig() {
local FILE=$2
if [ -z "$FILE" ] ; then FILE=${CONFIG} ; fi
if [ ! -f ${FILE} ] ; then
echo "Warning: ${FILE} file not found." >&2
return
fi
# These sed bits:
# - strip comments
# - strip leading + trailing space
# - if line starts with $1, strip off $1 and print it.
sed -ne "s/#.*//" < ${FILE} \
-e "s/^[[:space:]]\+//;s/[[:space:]]\+\$//" \
-e "/^$1[[:space:]]/{s/^$1[[:space:]]\+//;p;}"
}
#--------------------------------------------------------------------------
# Argument sanity checking
if [ "$1" = "" ] ; then
echo "Usage: setup_fs <nodenumber>"
exit 1
fi
echo "setup_fs: Configuring node filesystems..."
NODE=$1
PATH=/sbin:/usr/sbin:$PATH:/usr/lib/beoboot/bin
CONFIG=/etc/beowulf/config
MASTER=`bpstat -a master`
RAMDISK=/dev/ram3
FSCK=`beoconfig fsck`
MKFS=`beoconfig mkfs`
# Select which FSTAB to use.
FSTAB=/etc/beowulf/fstab.$NODE
if [ ! -r $FSTAB ] ; then
FSTAB=/etc/beowulf/fstab
fi
echo "setup_fs: Using $FSTAB"
# XXX We need a way to pick up per-node commands!
# Control flags
#
# FSCK =
# 0 = Don't touch anything, just try to mount.
# 1 = Ok to fsck but don't do anything if it fails.
# 2 = fsck and do mkfs if it fails.
# 3 = skip fsck go straight to mkfs
#
# Sanity check FSCK (default = 1)
case $FSCK in
"never"|"safe"|"full") ;;
"") FSCK=safe ;;
*)
echo 1>&2 "Invalid value '$FSCK' for fsck tag in $CONFIG."
exit 1 ;;
esac
case $MKFS in
"never"|"if_needed"|"always") ;;
"") MKFS=if_needed ;;
*)
echo 1>&2 "Invalid value '$MKFS' for mkfs tag in $CONFIG."
exit 1 ;;
esac
if [ ! -f $FSTAB ] ; then
echo 1>&2 "setup_fs: $FSTAB (file system table) is missing."
exit 1
fi
# Ok... This is one big nasty pipe line... Here's what this mess does:
# * Use sed to remove comments. (starting with #)
# * Run it all through eval to do variable substitutions.
# * Go through all the lines doing:
# + Ignore the empty lines
# + Remove trailing slashes from the mount points
# + Prepend a number that will allow us to sort the mount points.
# * Sort the mount points
# * On each mount point (depending on the FSCK policy):
# + fsck the file system
# + if bad, possibly recreate the file system.
# + mount the file system (defer network mounts w/o the "nolock" option)
# * Create /etc/fstab for the new node.
# * Create /etc/mtab for the new node.
MTABFILE=/tmp/.setup_fs.mtab.$$
if ! rm -f $MTABFILE ; then
echo 1>&2 "setup_fs: $MTABFILE already exists and can't remove."
exit 1
fi
touch $MTABFILE
FSTABFILE=/tmp/.setup_fs.fstab.$$
if ! rm -f $FSTABFILE ; then
echo 1>&2 "setup_fs: $FSTABFILE already exists and can't remove."
exit 1
fi
touch $FSTABFILE
cat $FSTAB | \
while read line ; do
if [ -z "$line" -o "${line:0:1}" = "#" ] ; then
echo $line >>$FSTABFILE
else
line=`eval echo "$line"`
echo $line >>$FSTABFILE
echo $line
fi
done | \
while read device mountpt fstype options junk ; do
if [ -z "$options" ] ; then
echo 1>&2 "Ignoring incomplete line: $device $mountpt $fstype $options $junk"
continue
fi
# Sanitize mount point... (squeeze multiple slashes, remove
# any trailing slashes)
mountpt=`echo $mountpt | sed -e 's!/\+!/!g' -e 's!/\+$!!'`
slashct=`echo $mountpt | tr -cd / | wc -c`
if [ -z $mountpt ] ; then mountpt=/ ; fi
echo $slashct $device $mountpt $fstype $options
done | \
sort -n | \
(while read slashct device mountpt fstype options junk ; do
if [ -z "$options" ] ; then
if [ -n "$device" ] ; then
echo 1>&2 "Ignoring incomplete line: $device $mountpt $fstype $options =
$junk"
fi
continue
fi
# Get a file system size option if it's there...
fssize=`echo $options | sed -e 's/.*fs_size=\([0-9]\+\).*/\1/p;d'`
options=`echo $options | sed -e 's/fs_size=[0-9]\+//g'`
if [ -z "$options" ] ; then options=defaults; fi
# Everything gets a "/rootfs" prefix at this stage. Also we create the
# mount points as needed. This requires that people have their fstab
# in some reasonable order. (It might be hard for us to sort it....)
# see to it that the device node exists on the remote machine
if [ "${device:0:4}" == "/dev" ] ; then
(cd / ; tar cf - $device) | bpsh -n $NODE tar xf -
fi
mknewfs=0
if [ $MKFS = "always" ]; then
mknewfs=1
else
case $FSCK in
"never") ;; # No FSCK!
"safe") if ! do_safefsck $NODE $device $fstype ; then
echo 1>&2 "setup_fs: RAM disks fail FSCK, that's OK"
echo 1>&2 "setup_fs: FSCK failure. (OK for RAM disks)"
mknewfs=1
fi ;;
"full") if ! do_fsck $NODE $device $fstype ; then
echo 1>&2 "setup_fs: FSCK failure. (OK for RAM disks)"
mknewfs=1
fi ;;
esac
fi
if [ $MKFS != "never" -a "$mknewfs" = 1 ] ; then
if ! do_mkfs $NODE $device $fstype $fssize ; then
echo 1>&2 "Failed to create $fstype file system on $device."
exit 1
fi
fi
# See to it that the mount point exists before trying to mount.
if echo $mountpt | grep -q '^/' ; then
if ! bpsh -n $NODE mkdir -p /rootfs$mountpt ; then
echo 1>&2 "Failed to create $mountpt."
exit 1
fi
fi
if ! do_mount $NODE $device /rootfs$mountpt $fstype $options ; then
echo 1>&2 "Failed to mount $device on $mountpt."
exit 1
fi
done
# Create fstab on the remote node...
if ! bpsh -n $NODE mkdir -p /rootfs/etc ; then
echo 1>&2 "Failed to create /etc."
exit 1
fi
if ! bpcp $FSTABFILE $NODE:/rootfs/etc/fstab ; then
echo 1>&2 "Failed to create /etc/fstab."
exit 1
fi
rm -f $FSTABFILE
# Finally, create mtab on the remote node...
if ! bpcp $MTABFILE $NODE:/rootfs/etc/mtab ; then
echo 1>&2 "Failed to create /etc/mtab."
exit 1
fi
rm -f $MTABFILE
)
# Exit with status of this nutty pipeline.
|
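The mount-point ordering used in the script above (squeeze slashes, strip trailing slashes, count the remaining slashes, sort numerically) can be tried in isolation. The following is only a sketch of that one step: the function name order_mounts is made up for illustration, it reads bare mount points rather than full fstab lines, and it uses POSIX sed expressions in place of the GNU `\+` form.

```shell
#!/bin/sh
# Hypothetical helper, not part of setup_fs: print mount points so that
# parents come before children, using the slash-count trick from the script.
order_mounts() {
    while read -r mountpt; do
        # Squeeze repeated slashes, then strip any trailing slashes.
        mountpt=$(printf '%s\n' "$mountpt" | sed -e 's!//*!/!g' -e 's!/*$!!')
        # Count slashes before restoring "/" so the root sorts first (depth 0).
        depth=$(( $(printf '%s' "$mountpt" | tr -cd / | wc -c) ))
        if [ -z "$mountpt" ]; then mountpt=/; fi
        echo "$depth $mountpt"
    done | sort -n | while read -r depth mountpt; do
        echo "$mountpt"
    done
}
```

For example, `printf '/usr/local//\n/\n/usr/\n' | order_mounts` lists the root first, then /usr, then /usr/local, which is the order in which they have to be mounted.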
|
From: Sadanand K. <sa...@ci...> - 2002-07-25 05:38:08
|
Hi, Erik - thanks for the reply.
I started the slave with the -d option and it's not exiting, but the node status is still down.
Also, where are the log files? /var/log/beowulf has no files.
How do I check the "magic cookies"?

Sadanand

On Tue, 23 Jul 2002, Erik Arjan Hendriks wrote:
> On Tue, Jul 23, 2002 at 03:28:41AM -0500, Sadanand Kota wrote:
> > Hi,
> > I have installed bproc on 2 of my linux systems taking the RPM from
> > Clustermatic.
> > I am able to run bpmaster and bpslave successfully. But when I check the
> > status of the machines using bpstat, it always gives status as down.
> > If I try /etc/beowulf/node_up 0, the output is
> >
> > node_up: Setting system clock.
> > error moving to node 0: Invalid argument
> > Fatal error performing: /usr/lib/beoboot/bin/bdate 0
> >
> > (The same with /etc/beowulf/node_up 1)
> >
> > Any idea how to change node status to up?
>
> "down" means that the slave is not connected to the master. That
> basically means that the slave either isn't connected with TCP or it
> hasn't sent the right magic cookies.
>
> There's no way to change "down" to anything else manually. Once the
> slave connects, the state will change to boot and the master daemon
> will run /etc/beowulf/node_up. If that exits with status 0, the state
> will change to up. Otherwise the state will change to error.
> Manually setting the node state to "down" (with bpctl) will cause the
> slave to be disconnected.
>
> I'd check to make sure bpslave is actually connecting to the master.
> Try running bpslave with -d to make sure it's not just exiting with
> some error. Also, check the system logs.
>
> - Erik
>
|
|
From: Erik A. H. <er...@he...> - 2002-07-23 16:13:20
|
On Tue, Jul 23, 2002 at 03:28:41AM -0500, Sadanand Kota wrote:
> Hi,
> I have installed bproc on 2 of my linux systems taking the RPM from
> Clustermatic.
> I am able to run bpmaster and bpslave successfully. But when I check the
> status of the machines using bpstat, it always gives status as down.
> If I try /etc/beowulf/node_up 0, the output is
>
> node_up: Setting system clock.
> error moving to node 0: Invalid argument
> Fatal error performing: /usr/lib/beoboot/bin/bdate 0
>
> (The same with /etc/beowulf/node_up 1)
>
> Any idea how to change node status to up?

"down" means that the slave is not connected to the master. That
basically means that the slave either isn't connected with TCP or it
hasn't sent the right magic cookies.

There's no way to change "down" to anything else manually. Once the
slave connects, the state will change to boot and the master daemon
will run /etc/beowulf/node_up. If that exits with status 0, the state
will change to up. Otherwise the state will change to error.
Manually setting the node state to "down" (with bpctl) will cause the
slave to be disconnected.

I'd check to make sure bpslave is actually connecting to the master.
Try running bpslave with -d to make sure it's not just exiting with
some error. Also, check the system logs.

- Erik
|
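The last state transition described in the message above (node_up exits 0 and the node goes to up, anything else and it goes to error) can be written down as a tiny shell sketch. The function name state_after_node_up is hypothetical and is not a BProc tool; it only encodes the rule as stated and does not talk to the master daemon.

```shell
#!/bin/sh
# Hypothetical helper (not part of BProc): map the exit status of
# /etc/beowulf/node_up to the node state the master would set next.
state_after_node_up() {
    if [ "$1" -eq 0 ]; then
        echo up       # node_up succeeded
    else
        echo error    # any non-zero exit status
    fi
}
```

Under that reading, a node that never leaves "down" has not reached this step at all, so the place to look is the connection between bpslave and the master rather than the node_up script.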
|
From: Sadanand K. <sa...@ci...> - 2002-07-23 08:28:45
|
Hi,
I have installed bproc on 2 of my linux systems taking the RPM from
Clustermatic.
I am able to run bpmaster and bpslave successfully. But when I check the
status of the machines using bpstat, it always gives status as down.
If I try /etc/beowulf/node_up 0, the output is

node_up: Setting system clock.
error moving to node 0: Invalid argument
Fatal error performing: /usr/lib/beoboot/bin/bdate 0

(The same with /etc/beowulf/node_up 1)

Any idea how to change node status to up?

Sadanand
|
|
From: Wilton W. <ww...@ha...> - 2002-07-16 22:38:10
|
I'm currently having the same difficulty here with NFS locking. I can tell
you what the problem is and I can give you a "cheap and dirty" solution.

The problem with NFS locking is not with lockd, it's with rpc.statd (lockd is
usually automagically started by the kernel anyway). The current rpc.statd (at
least in my nfs-utils package) tries to drop privileges and run itself as
"rpcuser" or "nobody", but unless you have ldap/nis running on your nodes,
getpwuid(rpcuser) and getpwuid(nobody) will return 0, and rpc.statd will
silently exit, failing to setuid().

Quickest fix: use files in nsswitch.conf instead of bproc and copy over the
passwd/group files from the master node to the cluster node.

Quick fix: use nis/nis+/ldap on the master node and configure nsswitch.conf
etc. on the cluster node accordingly.

Hack fix: remove the drop-privs patches from the nfs-utils package; not a bad
idea since we are supposedly running on a "secure" network anyway.

Best fix: add functionality to beonss.

- Wilton

PS. Any other suggestions are welcome ;)

On Tue, 16 Jul 2002, Larry Baker wrote:
> Thank you for your help.
>
> I am neither a Unix nor a Linux expert, so I don't know how to determine
> which features were compiled into the kernel. NFS works fine, so I assume
> the nfs module is there. It is just NFS locking that does not work. I am
> using the Clustermatic bproc kernel. Is there a list of modules that are
> built into the kernel somewhere? After I find that, where is the list of
> modules that must be added (using modprobe) to get full NFS locking support?
> I can hack the Clustermatic setup_fs script from there.
>
> Larry Baker
>
> on 7/16/02 2:09 AM, Wilton Wong at ww...@ha... wrote:
>
> >
> > Have you inserted the lockd/nfs/sunrpc modules on the node? I.e.,
> > "modprobe -N 0 nfs", then run portmap, then try mounting without the
> > "nolock" option?
> >
> > "lockdsvc: Function not implemented" seems to indicate that the lockd module
> > wasn't loaded or NFS file locking was not compiled into the kernel.
> >
> > - Wilton
> >
> > ----[ Wilton William Wong ]---------------------------------------------
> > 11060-166 Avenue      Ph : 01-780-456-9771    High Performance UNIX
> > Edmonton, Alberta     FAX: 01-780-456-9772    and Linux Solutions
> > T5X 1Y3, Canada       URL: http://www.harddata.com
> > -------------------------------------------------------[ Hard Data Ltd. ]----
> >
>
> --
>

----[ Wilton William Wong ]---------------------------------------------
11060-166 Avenue      Ph : 01-780-456-9771    High Performance UNIX
Edmonton, Alberta     FAX: 01-780-456-9772    and Linux Solutions
T5X 1Y3, Canada       URL: http://www.harddata.com
-------------------------------------------------------[ Hard Data Ltd. ]----
|
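Since rpc.statd is said above to exit silently when the account lookup fails, it may help to confirm that the passwd file copied to the node really contains the accounts it wants. A minimal sketch, assuming a standard colon-separated passwd file; the function name missing_accounts is invented for this example, and it inspects only the file itself, not what NSS on the node would actually return.

```shell
#!/bin/sh
# Hypothetical check: print each requested account name that does not
# appear as the first field of a passwd-format file.
missing_accounts() {
    passwd_file=$1
    shift
    for name in "$@"; do
        # awk exits 0 only if some line has $name as its first field.
        if ! awk -F: -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$passwd_file"; then
            echo "$name"
        fi
    done
}
```

Run against the master's copy before pushing it out, e.g. `missing_accounts /etc/passwd rpcuser nobody`; no output means both names are present in the file.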
|
From: Erik A. H. <er...@he...> - 2002-07-16 14:47:13
|
On Tue, Jul 16, 2002 at 01:37:25AM -0500, Sadanand Kota wrote:
> Also, if I try to run bpmaster, it says
> './bpmaster: BPROC_SYS_VERSION: Function not implemented'.

That means the BProc module isn't loaded.

> My /etc/bproc.conf is as follows
> bind manager 2223
> node 127.0.0.1 127.0.0.2

This looks like some ANCIENT config file syntax.

> There's no specific reason why I am trying a custom compiled kernel. My
> partner (sitting right next to me now) is trying the Clustermatic binaries
> (i.e. he downloaded all RPMs from clustermatic.org). He also has a problem as
> follows -
> On running the command /etc/rc.d/init.d/beowulf start it says
> Configuring network interface (eth0):Error: No netmask given for interface
>
> In /etc/beowulf/config file, netmask for eth0 is properly defined as
> follows
> interface eth0 255.255.255.0.

This is incorrect. The syntax is:

interface eth0 IPaddress Netmask

> Also, after installation of clustermatic RPMS, upon booting the machine
> says,
> etho - unknown hosts.
> This problem was not there before installation of clustermatic RPM.
>
> Any idea for either (custom kernel or clustermatic RPM) are highly
> appreciated.

Another random note - you'll probably have to say depmod -a to get
the dependencies rebuilt and the modules loaded after installing the
clustermatic RPMs.

- Erik

--
Erik Arjan Hendriks       Printed On 100 Percent Recycled Electrons
er...@he...               Contents may settle during shipment
|
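To make the syntax concrete: a complete line would look something like "interface eth0 192.168.1.1 255.255.255.0", where the address is purely illustrative. The sketch below flags interface lines that are missing a field. It only counts whitespace-separated fields, which is an assumption about how the line is laid out rather than a description of how the beowulf scripts parse it, and check_interface_lines is a made-up name.

```shell
#!/bin/sh
# Hypothetical sanity check: report "interface" lines that have fewer than
# the four fields implied by "interface <name> <IPaddress> <Netmask>".
check_interface_lines() {
    awk '$1 == "interface" && NF < 4 { print "incomplete: " $0 }' "$1"
}
```

Something like `check_interface_lines /etc/beowulf/config` would then print the offending line for the three-field form quoted above and stay quiet for a four-field line.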
|
From: Erik A. H. <er...@he...> - 2002-07-16 14:30:01
|
On Tue, Jul 16, 2002 at 01:04:16AM -0600, Wilton Wong wrote:
> > With Mac address, how do I set a single machine as both master and
> > slave (for testing purposes)?
>
> I don't believe that this is possible.

It is possible for a machine to be a master and slave for itself at
the same time. It's a pretty confusing arrangement though since
processes will show up several times in the process tree.

You can't use the beoboot stuff (MAC addresses and so on) to do this
though. You will have to run the slave daemon (bpslave) directly. I
usually set this up on the loop-back interface. Tell the front end it
is 127.0.0.1. Then give the slave nodes IPs in the range 127.0.0.2+.
Then you can start like this: "bpslave -s 127.0.0.2 127.0.0.1 2223".

NOTE: Before doing this, you will want to comment out everything in
the node setup script (/etc/beowulf/node_up) to avoid having the
master try to set itself up.

- Erik
|
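For more than one loop-back slave, the same command just repeats with the next address. The dry-run sketch below prints the commands instead of running them. The master address, the starting slave address and port 2223 are taken from the message above; the helper name print_loopback_slaves is hypothetical, and whether several slaves on one host behave well is not something this sketch checks.

```shell
#!/bin/sh
# Dry run: print (do not execute) one bpslave command per loop-back slave.
# Slave N gets 127.0.0.(N+2); the master is assumed to be 127.0.0.1:2223.
print_loopback_slaves() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "bpslave -s 127.0.0.$((i + 2)) 127.0.0.1 2223"
        i=$((i + 1))
    done
}
```

`print_loopback_slaves 2` prints the command for 127.0.0.2 and then for 127.0.0.3; the output can be reviewed and run by hand once node_up has been commented out as noted.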
|
From: Erik A. H. <er...@he...> - 2002-07-16 14:26:41
|
On Tue, Jul 16, 2002 at 02:56:21AM -0500, Sadanand Kota wrote:
> Our scheduling technique involves selection of existing process pids
> based on certain criteria and moving (migrating) that process to a specified
> node (also identified by our scheduling algorithm). Do you think this is
> possible using BPROC? Hope you understand my question. Please let me know
> if there is any confusion. (As I understand, bproc_execmove cannot be
> used in this situation.)

There is no third party migration in BProc. In other words, a process
can not force another process to migrate to another node. A process
can migrate itself at any time. Keep in mind that migration is NOT
transparent - you will lose open files, etc. when you migrate.

- Erik
|
|
From: Janez P. <jan...@fe...> - 2002-07-16 09:24:36
|
Wilton Wong wrote:
> > With Mac address, how do I set a single machine as both master and
> > slave (for testing purposes)?
>
> I don't believe that this is possible.

I did not get the original question; however, I am successfully running a
bproc cluster with 6 machines, every one configured as master AND slave at
the same time. No problems in bproc_moving whatsoever; both daemons coexist
and perform their tasks simultaneously without conflicts. I admit that the
process list is a mess. The cluster is used for teaching and demo purposes,
so every machine can rfork processes to the other 5 nodes. No crashes so far.

Janez.
|
|
From: Wilton W. <ww...@ha...> - 2002-07-16 09:09:51
|
Have you inserted the lockd/nfs/sunrpc modules on the node? I.e.,
"modprobe -N 0 nfs", then run portmap, then try mounting without the
"nolock" option?

"lockdsvc: Function not implemented" seems to indicate that the lockd module
wasn't loaded or NFS file locking was not compiled into the kernel.

- Wilton

----[ Wilton William Wong ]---------------------------------------------
11060-166 Avenue      Ph : 01-780-456-9771    High Performance UNIX
Edmonton, Alberta     FAX: 01-780-456-9772    and Linux Solutions
T5X 1Y3, Canada       URL: http://www.harddata.com
-------------------------------------------------------[ Hard Data Ltd. ]----
|
|
From: Wilton W. <ww...@ha...> - 2002-07-16 08:08:04
|
On Tue, 16 Jul 2002, Sadanand Kota wrote:
> Our scheduling technique involves selection of existing process pids
> based on certain criteria and moving (migrating) that process to a specified
> node (also identified by our scheduling algorithm). Do you think this is
> possible using BPROC? Hope you understand my question. Please let me know
> if there is any confusion. (As I understand, bproc_execmove cannot be
> used in this situation.)

As far as I understand, BProc is just for a centrally managed pid space, and
it does not have provisions for shared memory/shared user space, so I don't
think this would be possible with the current BProc. Best ask Erik about
this (er...@he...).

I believe if your library has hooks to somehow save the program state and
move it to a different node, you can use bproc to manage it. This would be
tough.

- Wilton

----[ Wilton William Wong ]---------------------------------------------
11060-166 Avenue      Ph : 01-780-456-9771    High Performance UNIX
Edmonton, Alberta     FAX: 01-780-456-9772    and Linux Solutions
T5X 1Y3, Canada       URL: http://www.harddata.com
-------------------------------------------------------[ Hard Data Ltd. ]----
|