From: Reza S. <sh...@en...> - 2005-01-19 17:34:02
Hello again,

Just thought I would keep you up to date with our experiments on getting MATLAB running with BProc on a Clustermatic cluster. The sysadmin here finally figured out that the node_up and nfs.init scripts were failing because they had Windows (CRLF) line endings rather than Unix (LF) endings, which was not immediately visible, e.g. in vi. So after getting the MATLAB directory mounted on the slave nodes, MATLAB is working, more or less.

The main problem we are having now is that after running 'bpsh 0 matlab' (actually 'bpsh 0 matlab2', where matlab2 is a modified script), the terminal hangs once the script has run: neither the MATLAB prompt nor the shell prompt comes back, and a Ctrl-C is required to get it to show up. It seems the master node doesn't know that the slave node has finished its job. Even with an 'exit' at the end of the MATLAB script, it behaves the same way. If anybody has any idea of a workaround for this, that would be really great.

Thanks everybody for all your help!

- Reza

Reza Shahidi wrote:
> Hi,
>
> I think this is not a mount problem. Even if I comment out the
> entire nfs.init file, the nodes still hang on booting. The boot
> process must be getting stuck in the node_up script. It is too bad I
> am unable to find any useful log messages. If anybody can think of
> what could be happening, please let me know. Thanks.
>
> Happy New Year,
>
> Reza
>
> Steven James wrote:
>
>> Greetings,
>>
>> NFS mounts can hang if the server isn't running lockd and the mount
>> options don't include nolock.
>>
>> G'day,
>> sjames
>>
>> On Fri, 31 Dec 2004, Reza Shahidi wrote:
>>
>>> Hello,
>>>
>>> I tried the script you sent below, but now the nodes get stuck with
>>> a status of "boot" when Clustermatic is restarted. I can't bpsh to
>>> the nodes or anything.
>>> On the screen of node 0, the boot sequence gets stuck at
>>> "bpslave-0: setting node number to 0", and stays that way unless
>>> restarted. This does not happen when the regular node_up/nfs.init
>>> scripts are used, but of course in that case I am still not able to
>>> get the NFS mount working either. Any more ideas?
>>>
>>> Thanks,
>>>
>>> Reza
>>>
>>> Daniel Gruner wrote:
>>>
>>>> Reza,
>>>>
>>>> For some reason, in Clustermatic 5, doing NFS mounts according to
>>>> the "manual" (which is what you tried, and which used to work in
>>>> Clustermatic 4) doesn't work anymore. We've had to do some hacks
>>>> to make it work. In short, do NOT try the NFS mounts in
>>>> /etc/clustermatic/fstab. What you have to do is run a script from
>>>> the /etc/clustermatic/node_up script, which does all the necessary
>>>> setup on the nodes.
>>>>
>>>> I am attaching here my /etc/clustermatic/node_up, and another file
>>>> called nfs.init, which also goes in /etc/clustermatic. This scheme
>>>> works well for us, and it should work for you as well. You will
>>>> need to modify the nfs.init script to mount your particular
>>>> filesystem(s).
>>>>
>>>> Regards,
>>>> Daniel
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> #!/bin/sh
>>>> #
>>>> # This shell script is called automatically by BProc to perform any
>>>> # steps necessary to bring up the nodes. This is just a stub script
>>>> # pointing to the program that does the real work.
>>>> #
>>>> # $Id: node_up.stub,v 1.3 2003/11/12 23:30:59 mkdist Exp $
>>>>
>>>> # All changes up to the "############" line by
>>>> # Michal Jaegermann, mi...@ha...
>>>>
>>>> seterror () {
>>>>     bpctl -S $1 -s error
>>>>     exit 1
>>>> }
>>>>
>>>> if [ -x /usr/lib64/beoboot/bin/node_up ] ; then
>>>>     /usr/lib64/beoboot/bin/node_up $* || seterror $*
>>>> else
>>>>     /usr/lib/beoboot/bin/node_up $* || seterror $*
>>>> fi
>>>> # We are "sourcing" these scripts, so variable assignments remain
>>>> # in effect here; pass a node number as an argument if you want to
>>>> # _run_ them from a shell, and wrap them in a loop for multiple nodes.
>>>> #
>>>> # lm_sensors - 'bpsh 3 sensors' will produce sensor information for node 3
>>>> # . /etc/clustermatic/sensors.init
>>>> # If we use PathScale libraries, we have to make them available on nodes:
>>>> # . /etc/clustermatic/pathscale.init
>>>> # Similarly for the Intel compiler:
>>>> # . /etc/clustermatic/intel.init
>>>> # Turn the next line on for NFS support on the nodes:
>>>> . /etc/clustermatic/nfs.init
>>>>
>>>> exit
>>>>
>>>> ############
>>>>
>>>> # Below is the original script - now NOT executed, due to the 'exit' above.
>>>>
>>>> if [ -x /usr/lib64/beoboot/bin/node_up ] ; then
>>>>     exec /usr/lib64/beoboot/bin/node_up $*
>>>> else
>>>>     exec /usr/lib/beoboot/bin/node_up $*
>>>> fi
>>>>
>>>> # If we reach this point there's an error.
>>>> bpctl -S $* -s error
>>>> exit 1
>>>>
>>>> # If you want to put more setup stuff here, make sure to replace the
>>>> # "exec" above with the following:
>>>> # /usr/lib/beoboot/bin/node_up $* || exit 1
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> #!/bin/sh
>>>> #
>>>> # A sample of how to get NFS modules onto a node.
>>>> # Make sure that /etc/modules.conf.dist for a node does not
>>>> # define any 'install' actions for these.
>>>> #
>>>> # Michal Jaegermann, 2004/Aug/19, mi...@ha...
>>>> #
>>>>
>>>> node=$1
>>>> # Get the list of modules, and copy them to the node.
>>>> mod=nfs
>>>> modules=$( grep $mod.ko /lib/modules/$(uname -r)/modules.dep )
>>>> modules=${modules/:/}
>>>> # Reverse the order so that dependencies come first.
>>>> modules=$(
>>>>     for m in $modules ; do
>>>>         echo $m
>>>>     done | tac )
>>>> ( cd /
>>>>   for m in $modules ; do
>>>>       echo $m
>>>>   done
>>>> ) | ( cd / ; cpio -o -c --quiet ) | bpsh $node cpio -imd --quiet
>>>> bpsh $node depmod -a
>>>> # Fix the permissions after cpio.
>>>> bpsh $node chmod -R a+rX /lib
>>>> # Load the modules.
>>>> for m in $modules ; do
>>>>     m=$(basename $m .ko)
>>>>     m=${m/_/-}
>>>>     case $m in
>>>>     sunrpc)
>>>>         bpsh $node modprobe -i sunrpc
>>>>         bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs
>>>>         bpsh $node mount | grep -q rpc_pipefs || \
>>>>             bpsh $node mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
>>>>         ;;
>>>>     *)  bpsh $node modprobe -i $m
>>>>     esac
>>>> done
>>>> # These are for the benefit of rpc.statd.
>>>> bpsh $node mkdir -p /var/lib/nfs/statd/
>>>> bpsh $node mkdir -p /var/run
>>>> bpsh $node portmap
>>>> bpsh $node rpc.statd
>>>> bpsh $node mkdir /home
>>>> bpsh $node mount -t nfs -o nfsvers=3,rw,noac master:/home /home
>>>> bpsh $node mkdir /usr/local
>>>> bpsh $node mount -t nfs -o nfsvers=3,rw,noac master:/usr/local /usr/local
>>>
>>> -------------------------------------------------------
>>> The SF.Net email is sponsored by: Beat the post-holiday blues
>>> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
>>> It's fun and FREE -- well, almost.... http://www.thinkgeek.com/sfshirt
>>> _______________________________________________
>>> BProc-users mailing list
>>> BPr...@li...
>>> https://lists.sourceforge.net/lists/listinfo/bproc-users
>>
>> by Linux Labs International, Inc.
>> Steven James, CTO
>>
>> 55 Marietta Street
>> Suite 1830
>> Atlanta, GA 30303
>> 866 824 9737 support
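On the bpsh hang itself, one thing worth trying - a sketch only, since I can't test it on a cluster here - is to make sure nothing on the node is left holding the remote stdin/stdout open after the script finishes. The -nodisplay and -r switches are standard MATLAB options; the script path below is a made-up example:

```shell
# Run MATLAB in batch mode on node 0 with stdin closed and output logged,
# so no I/O channel stays open waiting on the terminal when the job ends.
# (/tmp/myscript.m is a placeholder path.)
bpsh 0 matlab2 -nodisplay -r "run('/tmp/myscript.m'); exit" \
    < /dev/null > /tmp/matlab2.log 2>&1
echo "exit status: $?"
```

If that still hangs, backgrounding the whole command with a trailing `&` at least returns the shell prompt immediately while the job runs.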
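Since the CRLF problem bit once already, here is a quick way to detect and fix it on other scripts. The filename is just an example; dos2unix, where installed, does the same job as the sed command:

```shell
# Create a sample script with Windows (CRLF) line endings.
printf 'echo hello\r\n' > /tmp/example.init

# 'file' reports the CRLF terminators, and grep can count lines with a CR.
file /tmp/example.init
grep -c $'\r' /tmp/example.init

# Strip the carriage returns in place (GNU sed).
sed -i 's/\r$//' /tmp/example.init
```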
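The comment in Daniel's node_up about running the sourced scripts by hand can be made concrete: since nfs.init reads its node number from $1, it can be re-run from a shell on the master for each node without rebooting. The node count of four here is an assumption; adjust to your cluster:

```shell
# Re-run the NFS setup on nodes 0-3; nfs.init takes the node number as $1.
for n in 0 1 2 3 ; do
    sh /etc/clustermatic/nfs.init $n
done
```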
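Finally, Steven's lockd point maps directly onto Daniel's nfs.init: if the master isn't running lockd, adding nolock to the client mount options should avoid the lock-related hang. This reuses the same export names as the script:

```shell
# Same mount as in nfs.init, with client-side NFS locking disabled.
bpsh $node mount -t nfs -o nfsvers=3,rw,noac,nolock master:/home /home
```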