From: Samuelson, Frank <FW...@CD...> - 2005-01-12 17:33:03
|
I have a problem when trying to run #!/usr/bin... scripts via bpsh on Clustermatic 5: /proc/self/fd/3: Permission denied. Last month someone else had a similar problem: http://sourceforge.net/mailarchive/forum.php?thread_id=6200756&forum_id=1908 In that thread Erik explains the workings of the shell hack, so I understand why perl on my remote nodes complains: 'Can't open perl script "/proc/self/fd/3"', but it doesn't explain why regular users are denied permission to fd 3. I have no problem running these scripts as root, so it really does appear to be a permission problem. What can I change so regular users can read /proc/self/fd/3 on all the nodes? Thanks for any help! -Frank |
From: Dale H. <ro...@ma...> - 2005-01-11 18:27:17
|
Hey, any bproc patches for 2.6.10? Or is it easy enough to roll your own? I'm noticing some rejects and fuzz, so I'm a little leery of diving into it. -- Dale Harris ro...@ma... /.-) |
From: Maurice H. <ma...@ha...> - 2005-01-11 06:32:51
|
Hi Dale. Also useful would be for someone with a bproc cluster to try the new Myricom beta "MX-2G" drivers. If there is anyone on the list who is interested in doing this please contact me off list for a signout of these. Thanks all. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:ma...@ha... Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 This email, message, and content, should be considered confidential, and is the copyrighted property of Hard Data Ltd., unless stated otherwise. |
From: Dale H. <ro...@ma...> - 2005-01-10 23:52:02
|
On Mon, Jan 10, 2005 at 01:33:01PM -0800, Dale Harris elucidated: > Any newer beonss source out there than 1.0.12? Doesn't seem to be on > the bproc sourceforge site. > D'oh! Sorry for not removing the old thread references from my post. Dale |
From: Dale H. <ro...@ma...> - 2005-01-10 21:37:29
|
Also just curious whether anyone has done any comparisons between gm_route and Myricom's GM mapper. gm_route 1.0 did seem to resolve some stability problems I was seeing before, but since Myricom has apparently been improving things, I'm now wondering whether gm_route 1.0 isn't a little old. If there happens to be newer source, I'd like to know where it is. -- Dale Harris ro...@ma... /.-) |
From: Dale H. <ro...@ma...> - 2005-01-10 21:33:13
|
Any newer beonss source out there than 1.0.12? Doesn't seem to be on the bproc sourceforge site. -- Dale Harris ro...@ma... /.-) |
From: Dale H. <ro...@ma...> - 2005-01-10 21:21:26
|
Hey, just curious if anyone has tried any of the mpirun-1.5.3 patches with the newer Myrinet GM mpich-1.2.6..13b? Which patches are going to be needed? I'm wondering if there aren't going to have to be some major changes. -- Dale Harris ro...@ma... /.-) |
From: Rene S. <rs...@tu...> - 2005-01-05 14:11:37
|
Thanks, worked great; all nodes are up and happy. Rene > Quick answer: > > get mkelfImage from ftp.lnxi.com/pub/mkelfImage/ > > then use something like the attached script to build your EBI (elf > bootable image), which is the native format that etherboot uses. > > Josh |
From: Joshua A. <lu...@ln...> - 2005-01-05 12:50:42
|
On Tue, Jan 04, 2005 at 10:30:40PM -0600, Rene Salmon wrote: > Hi list, > > I am very new to Linux Bios and etherboot. All our nodes have been > regular nodes with a pc bios booting via PXE so far. We just got some new > nodes that have linux bios and boot via etherboot and I am not sure how to > generate the images to boot these linux bios nodes. > > for the PXE PC Bios nodes I just generated the images with this command > > >beoboot -2 -i -o /tftboot/foo > > and things work fine. > > This of cource does not work for etherboot. Etherboot complains with > "error: not valid image" . > > Can I use beoboot the generate the image that etherboot wants? If yes what > options should I use? > > Can someone point me in the right direction? maybe to some docs or > howtos. Quick answer: get mkelfImage from ftp.lnxi.com/pub/mkelfImage/ then use something like the attached script to build your EBI (elf bootable image), which is the native format that etherboot uses. Josh |
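For reference, a minimal sketch of the kind of wrapper script Josh describes (the actual attachment is not preserved in this archive). The paths, file names, and the exact mkelfImage option spellings below are assumptions; check mkelfImage --help for the flags your version accepts.

```sh
#!/bin/sh
# Build an ELF bootable image (EBI) for etherboot from a beoboot phase-2 image.
# Assumption: beoboot wrote out a kernel and a matching initrd; adjust the
# names below to whatever your beoboot actually produces.
KERNEL=/tftpboot/node.kernel      # phase-2 kernel from "beoboot -2 -i ..."
INITRD=/tftpboot/node.initrd      # matching initial ramdisk (name assumed)
OUTPUT=/tftpboot/node.ebi         # ELF image for etherboot/LinuxBIOS to load

# mkelfImage (ftp.lnxi.com/pub/mkelfImage/) wraps kernel + initrd into one ELF
# image. The option names are assumptions -- some versions use --ramdisk
# instead of --initrd and --append instead of --command-line.
mkelfImage --kernel="$KERNEL" \
           --initrd="$INITRD" \
           --command-line="console=ttyS0,115200" \
           --output="$OUTPUT"
```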
From: Rene S. <rs...@tu...> - 2005-01-05 04:30:42
|
Hi list, I am very new to LinuxBIOS and etherboot. All our nodes have been regular nodes with a PC BIOS booting via PXE so far. We just got some new nodes that have LinuxBIOS and boot via etherboot, and I am not sure how to generate the images to boot these LinuxBIOS nodes. For the PXE PC BIOS nodes I just generated the images with this command >beoboot -2 -i -o /tftboot/foo and things work fine. This of course does not work for etherboot. Etherboot complains with "error: not valid image". Can I use beoboot to generate the image that etherboot wants? If yes, what options should I use? Can someone point me in the right direction? Maybe to some docs or howtos. Thank you Rene |
From: John B. <j0n...@ya...> - 2005-01-02 13:45:15
|
Hi reza !! I too had the same problem... the node completes every step till displayin -> this is node xx** node setup successfully... i guess some problem due to loadin the libraries (the lib might be missin in path or something) or unable to load the module that u had specified in config.boot / node_up. did u try checkin the logs at /var/log/clustermatic/node.xx ?? if all seems not workin, get down to the rpms in clustermatic 5 and perform a reinstall.. that shud do it fine ! regardin mountin the nfs thing... well Michal Jaegermann had sent an updated version of the script than wot u r usin with some minor fixes. try usin that.. i carried out the steps manually like copyin the modules to the node (to the exact path), i guess there were three modules sunrpc, lockd and nfs. load sunrpc and do the mkdir and mount as specified. the rest u can load usually with modprobe. copy the other daemons if needed. Regards, Jon --- Reza Shahidi <sh...@en...> wrote: > Hi, > > I think this is not a mount problem. Even if I > comment out the > entire nfs.init file, the nodes still hang on > booting. The boot process > must be getting stuck in the node_up script. It is > too bad I am unable > to find any useful log messages. If anybody can > think of what could be > happening, please let me know. Thanks. > > Happy New Year, > > Reza > > Steven James wrote: > > >Greetings, > > > >NFS mounts can hang up if the server isn't running > lockd and the mount > >options don't include nolock. > > > >G'day, > >sjames > > > > > > > >On Fri, 31 Dec 2004, Reza Shahidi wrote: > > > > > > > >>Hello, > >> > >> I tried the script you sent below, but now the > nodes get stuck with > >>a status of boot when Clustermatic is restarted. > I can't bpsh to the > >>nodes or anything. On the screen of node 0, the > boot sequence gets > >>stuck at bpslave-0: setting node number to 0, and > stays that way if not > >>restarted. This does not happen when the regular > node_up/nfs.init > >>scripts are used, but of course, I am still not > able to get the NFS > >>mount working in this case either. Any more > ideas? > >> > >>Thanks, > >> > >>Reza > >> > >>Daniel Gruner wrote: > >> > >> > >> > >>>Reza, > >>> > >>>For some reason, in Clustermatic 5, trying to do > NFS mounts according > >>>to the "manual" (which is what you tried, and > used to work in > >>>Clustermatic 4), doesn't work anymore. We've had > to do some hacks in order > >>>to make it work. In short, do NOT try the NFS > mounts in > >>>/etc/clustermatic/fstab. What you have to do is > run a script from the > >>>/etc/clustermatic/node_up script, which will do > all the necessary stuff on > >>>the nodes. > >>> > >>>I am attaching here my /etc/clustermatic/node_up, > and another file called > >>>nfs.init which is also put in /etc/clustermatic. > This scheme works > >>>well for us, and it should work for you as well. > You will need to modify > >>>the nfs.init script to mount your particular > filesystem(s). > >>> > >>>Regards, > >>>Daniel > >>> > >>> > >>> > >>>------------------------------------------------------------------------ > >>> > >>>#!/bin/sh > >>># > >>># This shell script is called automatically by > BProc to perform any > >>># steps necessary to bring up the nodes. This is > just a stub script > >>># pointing to the program that does the real > work. > >>># > >>># $Id: node_up.stub,v 1.3 2003/11/12 23:30:59 > mkdist Exp $ > >>> > >>># All changes up to "############" line by > >>># Michal Jaegermann, mi...@ha... 
> >>> > >>>seterror () { > >>> bpctl -S $1 -s error > >>> exit 1 > >>>} > >>> > >>>if [ -x /usr/lib64/beoboot/bin/node_up ] ; then > >>> /usr/lib64/beoboot/bin/node_up $* || seterror > $* > >>>else > >>> /usr/lib/beoboot/bin/node_up $* || seterror $* > >>>fi > >>># we are "sourcing" these script so variable > assignments > >>># remain like in here; pass a node number as an > argument > >>># if you want to _run_ them from a shell and wrap > in a loop > >>># for multiple nodes > >>># > >>># lm_sensors - 'bpsh 3 sensors' will produce > sensors information for node 3 > >>># . /etc/clustermatic/sensors.init > >>># if we use pathscale libraries we have to make > them available on nodes > >>># . /etc/clustermatic/pathscale.init > >>># similarly for Intel compiler > >>># . /etc/clustermatic/intel.init > >>># Turn the next line on for NFS support on nodes > >>>. /etc/clustermatic/nfs.init > >>> > >>>exit > >>> > >>>############ > >>> > >>># below the original script - now NOT executing > due to 'exit' above > >>> > >>>if [ -x /usr/lib64/beoboot/bin/node_up ] ; then > >>> exec /usr/lib64/beoboot/bin/node_up $* > >>>else > >>> exec /usr/lib/beoboot/bin/node_up $* > >>>fi > >>> > >>># If we reach this point there's an error. > >>>bpctl -S $* -s error > >>>exit 1 > >>> > >>># If you want to put more setup stuff here, make > sure do replace the > >>># "exec" above with the following: > >>># /usr/lib/beoboot/bin/node_up $* || exit 1 > >>> > >>> > >>>------------------------------------------------------------------------ > >>> > >>>#!/bin/sh > >>># > >>># A sample how to get NFS modules on a node. > >>># Make sure that /etc/modules.conf.dist for a > node does not > >>># define any 'install' actions for these > >>># > >>># Michal Jaegermann, 2004/Aug/19, > mi...@ha... > >>># > >>> > >>>node=$1 > >>># get the list of modules, and copy them to the > node > >>>mod=nfs > >>>modules=$( grep $mod.ko /lib/modules/$(uname > -r)/modules.dep) > >>>modules=${modules/:/} > >>>modules=$( > >>>for m in $modules ; do > >>> echo $m > >>>done | tac ) > >>>( cd / > >>> for m in $modules ; do > >>> echo $m > >>> done > >>>) | ( cd / ; cpio -o -c --quiet ) | bpsh $node > cpio -imd --quiet > >>>bpsh $node depmod -a > >>># fix the permissions after cpio > >>>bpsh $node chmod -R a+rX /lib > >>># load the modules > >>>for m in $modules ; do > >>> m=$(basename $m .ko) > >>> m=${m/_/-} > >>> case $m in > >>> sunrpc) > >>> bpsh $node modprobe -i sunrpc > >>> bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs > >>> bpsh $node mount | grep -q rpc_pipefs || \ > >>> bpsh $node mount -t rpc_pipefs sunrpc > /var/lib/nfs/rpc_pipefs > === message truncated === ________________________________________________________________________ Yahoo! India Matrimony: Find your life partner online Go to: http://yahoo.shaadi.com/india-matrimony |
From: Reza S. <sh...@en...> - 2005-01-01 02:26:32
|
Hi, I think this is not a mount problem. Even if I comment out the entire nfs.init file, the nodes still hang on booting. The boot process must be getting stuck in the node_up script. It is too bad I am unable to find any useful log messages. If anybody can think of what could be happening, please let me know. Thanks. Happy New Year, Reza Steven James wrote: >Greetings, > >NFS mounts can hang up if the server isn't running lockd and the mount >options don't include nolock. > >G'day, >sjames > > > >On Fri, 31 Dec 2004, Reza Shahidi wrote: > > > >>Hello, >> >> I tried the script you sent below, but now the nodes get stuck with >>a status of boot when Clustermatic is restarted. I can't bpsh to the >>nodes or anything. On the screen of node 0, the boot sequence gets >>stuck at bpslave-0: setting node number to 0, and stays that way if not >>restarted. This does not happen when the regular node_up/nfs.init >>scripts are used, but of course, I am still not able to get the NFS >>mount working in this case either. Any more ideas? >> >>Thanks, >> >>Reza >> >>Daniel Gruner wrote: >> >> >> >>>Reza, >>> >>>For some reason, in Clustermatic 5, trying to do NFS mounts according >>>to the "manual" (which is what you tried, and used to work in >>>Clustermatic 4), doesn't work anymore. We've had to do some hacks in order >>>to make it work. In short, do NOT try the NFS mounts in >>>/etc/clustermatic/fstab. What you have to do is run a script from the >>>/etc/clustermatic/node_up script, which will do all the necessary stuff on >>>the nodes. >>> >>>I am attaching here my /etc/clustermatic/node_up, and another file called >>>nfs.init which is also put in /etc/clustermatic. This scheme works >>>well for us, and it should work for you as well. You will need to modify >>>the nfs.init script to mount your particular filesystem(s). >>> >>>Regards, >>>Daniel >>> >>> >>> >>>------------------------------------------------------------------------ >>> >>>#!/bin/sh >>># >>># This shell script is called automatically by BProc to perform any >>># steps necessary to bring up the nodes. This is just a stub script >>># pointing to the program that does the real work. >>># >>># $Id: node_up.stub,v 1.3 2003/11/12 23:30:59 mkdist Exp $ >>> >>># All changes up to "############" line by >>># Michal Jaegermann, mi...@ha... >>> >>>seterror () { >>> bpctl -S $1 -s error >>> exit 1 >>>} >>> >>>if [ -x /usr/lib64/beoboot/bin/node_up ] ; then >>> /usr/lib64/beoboot/bin/node_up $* || seterror $* >>>else >>> /usr/lib/beoboot/bin/node_up $* || seterror $* >>>fi >>># we are "sourcing" these script so variable assignments >>># remain like in here; pass a node number as an argument >>># if you want to _run_ them from a shell and wrap in a loop >>># for multiple nodes >>># >>># lm_sensors - 'bpsh 3 sensors' will produce sensors information for node 3 >>># . /etc/clustermatic/sensors.init >>># if we use pathscale libraries we have to make them available on nodes >>># . /etc/clustermatic/pathscale.init >>># similarly for Intel compiler >>># . /etc/clustermatic/intel.init >>># Turn the next line on for NFS support on nodes >>>. /etc/clustermatic/nfs.init >>> >>>exit >>> >>>############ >>> >>># below the original script - now NOT executing due to 'exit' above >>> >>>if [ -x /usr/lib64/beoboot/bin/node_up ] ; then >>> exec /usr/lib64/beoboot/bin/node_up $* >>>else >>> exec /usr/lib/beoboot/bin/node_up $* >>>fi >>> >>># If we reach this point there's an error. 
>>>bpctl -S $* -s error >>>exit 1 >>> >>># If you want to put more setup stuff here, make sure do replace the >>># "exec" above with the following: >>># /usr/lib/beoboot/bin/node_up $* || exit 1 >>> >>> >>>------------------------------------------------------------------------ >>> >>>#!/bin/sh >>># >>># A sample how to get NFS modules on a node. >>># Make sure that /etc/modules.conf.dist for a node does not >>># define any 'install' actions for these >>># >>># Michal Jaegermann, 2004/Aug/19, mi...@ha... >>># >>> >>>node=$1 >>># get the list of modules, and copy them to the node >>>mod=nfs >>>modules=$( grep $mod.ko /lib/modules/$(uname -r)/modules.dep) >>>modules=${modules/:/} >>>modules=$( >>>for m in $modules ; do >>> echo $m >>>done | tac ) >>>( cd / >>> for m in $modules ; do >>> echo $m >>> done >>>) | ( cd / ; cpio -o -c --quiet ) | bpsh $node cpio -imd --quiet >>>bpsh $node depmod -a >>># fix the permissions after cpio >>>bpsh $node chmod -R a+rX /lib >>># load the modules >>>for m in $modules ; do >>> m=$(basename $m .ko) >>> m=${m/_/-} >>> case $m in >>> sunrpc) >>> bpsh $node modprobe -i sunrpc >>> bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs >>> bpsh $node mount | grep -q rpc_pipefs || \ >>> bpsh $node mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs >>> ;; >>> *) bpsh $node modprobe -i $m >>> esac >>>done >>># these are for a benfit of rpc.statd >>>bpsh $node mkdir -p /var/lib/nfs/statd/ >>>bpsh $node mkdir -p /var/run >>>bpsh $node portmap >>>bpsh $node rpc.statd >>>bpsh $node mkdir /home >>>bpsh $node mount -t nfs -o nfsvers=3,rw,noac master:/home /home >>>bpsh $node mkdir /usr/local >>>bpsh $node mount -t nfs -o nfsvers=3,rw,noac master:/usr/local /usr/local >>> >>> >>> >>> >> >>------------------------------------------------------- >>The SF.Net email is sponsored by: Beat the post-holiday blues >>Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. >>It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt >>_______________________________________________ >>BProc-users mailing list >>BPr...@li... >>https://lists.sourceforge.net/lists/listinfo/bproc-users >> >> >> > >||||| |||| ||||||||||||| ||| >by Linux Labs International, Inc. > Steven James, CTO > >55 Marietta Street >Suite 1830 >Atlanta, Ga 30303 >866 824 9737 support > > > |
From: Reza S. <sh...@en...> - 2005-01-01 00:13:11
|
Hello, I tried the script you sent below, but now the nodes get stuck with a status of boot when Clustermatic is restarted. I can't bpsh to the nodes or anything. On the screen of node 0, the boot sequence gets stuck at bpslave-0: setting node number to 0, and stays that way if not restarted. This does not happen when the regular node_up/nfs.init scripts are used, but of course, I am still not able to get the NFS mount working in this case either. Any more ideas? Thanks, Reza Daniel Gruner wrote: >Reza, > >For some reason, in Clustermatic 5, trying to do NFS mounts according >to the "manual" (which is what you tried, and used to work in >Clustermatic 4), doesn't work anymore. We've had to do some hacks in order >to make it work. In short, do NOT try the NFS mounts in >/etc/clustermatic/fstab. What you have to do is run a script from the >/etc/clustermatic/node_up script, which will do all the necessary stuff on >the nodes. > >I am attaching here my /etc/clustermatic/node_up, and another file called >nfs.init which is also put in /etc/clustermatic. This scheme works >well for us, and it should work for you as well. You will need to modify >the nfs.init script to mount your particular filesystem(s). > >Regards, >Daniel > > > >------------------------------------------------------------------------ > >#!/bin/sh ># ># This shell script is called automatically by BProc to perform any ># steps necessary to bring up the nodes. This is just a stub script ># pointing to the program that does the real work. ># ># $Id: node_up.stub,v 1.3 2003/11/12 23:30:59 mkdist Exp $ > ># All changes up to "############" line by ># Michal Jaegermann, mi...@ha... > >seterror () { > bpctl -S $1 -s error > exit 1 >} > >if [ -x /usr/lib64/beoboot/bin/node_up ] ; then > /usr/lib64/beoboot/bin/node_up $* || seterror $* >else > /usr/lib/beoboot/bin/node_up $* || seterror $* >fi ># we are "sourcing" these script so variable assignments ># remain like in here; pass a node number as an argument ># if you want to _run_ them from a shell and wrap in a loop ># for multiple nodes ># ># lm_sensors - 'bpsh 3 sensors' will produce sensors information for node 3 ># . /etc/clustermatic/sensors.init ># if we use pathscale libraries we have to make them available on nodes ># . /etc/clustermatic/pathscale.init ># similarly for Intel compiler ># . /etc/clustermatic/intel.init ># Turn the next line on for NFS support on nodes >. /etc/clustermatic/nfs.init > >exit > >############ > ># below the original script - now NOT executing due to 'exit' above > >if [ -x /usr/lib64/beoboot/bin/node_up ] ; then > exec /usr/lib64/beoboot/bin/node_up $* >else > exec /usr/lib/beoboot/bin/node_up $* >fi > ># If we reach this point there's an error. >bpctl -S $* -s error >exit 1 > ># If you want to put more setup stuff here, make sure do replace the ># "exec" above with the following: ># /usr/lib/beoboot/bin/node_up $* || exit 1 > > >------------------------------------------------------------------------ > >#!/bin/sh ># ># A sample how to get NFS modules on a node. ># Make sure that /etc/modules.conf.dist for a node does not ># define any 'install' actions for these ># ># Michal Jaegermann, 2004/Aug/19, mi...@ha... 
># > >node=$1 ># get the list of modules, and copy them to the node >mod=nfs >modules=$( grep $mod.ko /lib/modules/$(uname -r)/modules.dep) >modules=${modules/:/} >modules=$( >for m in $modules ; do > echo $m >done | tac ) >( cd / > for m in $modules ; do > echo $m > done >) | ( cd / ; cpio -o -c --quiet ) | bpsh $node cpio -imd --quiet >bpsh $node depmod -a ># fix the permissions after cpio >bpsh $node chmod -R a+rX /lib ># load the modules >for m in $modules ; do > m=$(basename $m .ko) > m=${m/_/-} > case $m in > sunrpc) > bpsh $node modprobe -i sunrpc > bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs > bpsh $node mount | grep -q rpc_pipefs || \ > bpsh $node mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs > ;; > *) bpsh $node modprobe -i $m > esac >done ># these are for a benfit of rpc.statd >bpsh $node mkdir -p /var/lib/nfs/statd/ >bpsh $node mkdir -p /var/run >bpsh $node portmap >bpsh $node rpc.statd >bpsh $node mkdir /home >bpsh $node mount -t nfs -o nfsvers=3,rw,noac master:/home /home >bpsh $node mkdir /usr/local >bpsh $node mount -t nfs -o nfsvers=3,rw,noac master:/usr/local /usr/local > > |
From: Rene S. <rs...@tu...> - 2004-12-30 22:24:48
|
Hi list, I am trying to get pvfs2 working with bproc/Clustermatic 5 on an x86_64 cluster; here is the problem/setup. I have pvfs2 working and installed on the master bproc node, and I can mount and use the pvfs2 file system just fine there. To get the pvfs2 file system mounted and working on the master node, all I do is this:

1) >modprobe pvfs2
2) >/usr/local/sbin/pvfs2-client -p /usr/local/sbin/pvfs2-client-core
3) >mount -t pvfs2 tcp://10.0.0.1:3334/pvfs2-fs /scratch-cluster

To get these three things going on the slave bproc nodes, I did this:

1) I added this line to the node_up.conf file, and it loads the pvfs2 kernel modules on all slave nodes just fine:

kmod pvfs2

Here is the list of modules on a slave node:

>bpsh 0 lsmod
Module    Size    Used by
pvfs2     50376   1
tg3       84996   0
bproc     124948  1
vmadump   22568   1 bproc

2) I made the directories and copied the pvfs executables to the slave nodes:

node=$1
bpsh $node mkdir /sbin
bpcp /usr/local/sbin/pvfs2-client $node:/sbin/
bpcp /usr/local/sbin/pvfs2-client-core $node:/sbin/
bpsh $node mkdir /scratch-cluster

Then I can start the pvfs2-client daemons:

bpsh 0 /usr/local/sbin/pvfs2-client -p /sbin/pvfs2-client-core

What this does, I think, is start the pvfs2-client daemon on the master node and then migrate it; once migrated, the pvfs2-client daemon spawns the local pvfs2-client-core, which resides in the /sbin/ directory on the slave node. Here is what ps on the master node shows:

ps -ef | grep pv
root 12180     1 0 Dec29 ?     00:00:00 /usr/local/sbin/pvfs2-client -p /usr/local/sbin/pvfs2-client-core
root 17965     1 0 09:45 ?     00:00:00 [pvfs2-client]
root 17966 17965 0 09:45 ?     00:00:00 [pvfs2-client-co]
root 18077 17585 0 10:14 pts/0 00:00:00 grep pv

So the pvfs2 clients appear to be running on the slave.

3) Here is where it fails: I can't mount the pvfs file system. On the master I get this message:

>bpsh 0 mount -t pvfs2 tcp://10.0.0.1:3334/pvfs2-fs /scratch-cluster
>mount: Bad file descriptor

On the slave console I get this:

>pvfs2_get_sb: mount request failed with -9

Maybe I need to start both pvfs2-client daemons in the slave node's private process space and not migrate them as above. Anyone have an idea on how to do this? Any clues? Thank you for any help on this. Rene |
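One thing that may be worth checking here (a guess, not a confirmed fix): pvfs2-client-core normally talks to the pvfs2 kernel module through a character device, commonly /dev/pvfs2-req, and a freshly booted bproc slave has a nearly empty /dev. Below is a hedged sketch of how one could create that device node from node_up; the device name and the /proc/devices lookup are assumptions about this pvfs2 version.

```sh
#!/bin/sh
# Hypothetical check: make sure /dev/pvfs2-req exists on the slave node before
# starting pvfs2-client there. Device name and dynamic-major lookup are assumptions.
node=$1

# pvfs2 registers a char device with a dynamically assigned major number;
# read it from the node's /proc/devices after the module has been loaded.
major=$(bpsh $node awk '$2 == "pvfs2-req" {print $1}' /proc/devices)

if [ -n "$major" ]; then
    # create the device node on the slave if it is missing
    bpsh $node ls /dev/pvfs2-req >/dev/null 2>&1 || \
        bpsh $node mknod /dev/pvfs2-req c $major 0
else
    echo "pvfs2-req not found in /proc/devices on node $node" >&2
fi
```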
From: Daniel G. <dg...@cp...> - 2004-12-30 07:01:11
|
On Wed, Dec 29, 2004 at 09:48:50PM -0330, Reza Shahidi wrote: > Hello, > > ha...@no... wrote: > > > In short, you cannot access any files from filesystems which are not mounted on your > > slave nodes? Check /etc/beowulf/fstab . Another option is to move files between master > > and slave using bcp. > > > > Vaclav Hanzl > > I am trying to get an NFS mount with the MATLAB directory from > the master node to the slave nodes with little success. I am > following the steps in the Clustermatic documentation. This is what I > do and what I get back: > > There must be something very trivial that I am missing. Any help > with getting the NFS mount working properly is very much appreciated. > Please respond to this account since the server for the account I used > in my previous posts is down. > > Thanks, > > Reza > > Reza, For some reason, in Clustermatic 5, trying to do NFS mounts according to the "manual" (which is what you tried, and used to work in Clustermatic 4), doesn't work anymore. We've had to do some hacks in order to make it work. In short, do NOT try the NFS mounts in /etc/clustermatic/fstab. What you have to do is run a script from the /etc/clustermatic/node_up script, which will do all the necessary stuff on the nodes. I am attaching here my /etc/clustermatic/node_up, and another file called nfs.init which is also put in /etc/clustermatic. This scheme works well for us, and it should work for you as well. You will need to modify the nfs.init script to mount your particular filesystem(s). Regards, Daniel -- Dr. Daniel Gruner dg...@ch... Dept. of Chemistry dan...@ut... University of Toronto phone: (416)-978-8689 80 St. George Street fax: (416)-978-5325 Toronto, ON M5S 3H6, Canada finger for PGP public key |
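For readers skimming the thread: the nfs.init scheme Daniel refers to (his full node_up and nfs.init are quoted in the follow-up messages above) boils down to copying the NFS-related modules to the node, loading them, starting the RPC daemons, and then doing the mounts from node_up rather than from /etc/clustermatic/fstab. A condensed sketch, with an illustrative mount:

```sh
#!/bin/sh
# Condensed version of the nfs.init idea: run from node_up with the node
# number as $1. The full script (by Michal Jaegermann) handles module
# copying more carefully; this only shows the shape of it.
node=$1

# load the NFS stack on the node (the modules must already be present there)
bpsh $node modprobe -i sunrpc
bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs
bpsh $node mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
bpsh $node modprobe -i lockd
bpsh $node modprobe -i nfs

# rpc.statd needs these directories, then start the RPC daemons
bpsh $node mkdir -p /var/lib/nfs/statd /var/run
bpsh $node portmap
bpsh $node rpc.statd

# finally the actual NFS mount (example filesystem; adjust to taste)
bpsh $node mkdir -p /home
bpsh $node mount -t nfs -o nfsvers=3,rw,noac master:/home /home
```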
From: Reza S. <rsh...@gm...> - 2004-12-30 01:19:00
|
Hello, ha...@no... wrote: > In short, you cannot access any files from filesystems which are not mounted on your > slave nodes? Check /etc/beowulf/fstab . Another option is to move files between master > and slave using bcp. > > Vaclav Hanzl I am trying to get an NFS mount with the MATLAB directory from the master node to the slave nodes with little success. I am following the steps in the Clustermatic documentation. This is what I do and what I get back: controller:/ # chkconfig --list nfs nfs 0:off 1:off 2:off 3:off 4:off 5:off 6:off controller:/ # chkconfig nfs on controller:/ # chkconfig --list nfs nfs 0:off 1:off 2:off 3:on 4:off 5:on 6:off controller:/ # /etc/init.d/nfs start Importing Net File System (NFS) unused Then I add the following line to /etc/exports: /usr/local/matlab6p5 192.168.0.0/24(rw,sync,no_root_squash) and type exportfs -a. Then in /etc/clustermatic/fstab I add: MASTER:/usr/local/matlab6p5 /usr/local/matlab6p5 nfs nolock 0 0 Finally, I reboot Clustermatic by doing: controller:/ # /etc/init.d/clustermatic stop Stopping Clustermatic... Shutting down beoserv: done Shutting down bpmaster: done Unmounting bpfs: done Clearing library lists: done Removing modules: done controller:/ # /etc/init.d/clustermatic start Starting Clustermatic... Loading modules: done Setting up libraries: done Mounting bpfs: done Starting bpmaster: done Starting beoserv: done After rebooting Clustermatic, the nodes don't go back up because of some unknown error: shahidi@controller:/usr/local/matlab6p5> bpstat -l Node Address Status Mode User Group 0 0.0.0.0 down ---------- root root 1 192.168.0.11 error ---x------ root root 2 192.168.0.12 error ---x------ root root 3 192.168.0.13 error ---x------ root root Before, I seem to remember node 0 also returning an error status, but now it doesn't even boot. This is not a major issue however, since I can get it to work with a manual reboot. The main concern is the error I am getting on the other nodes. The log for node 1, for example, is as follows: nodeup : Running node_up client for node 1. nodeup : Loading configuration from: /etc/clustermatic/node_up.conf nodeup : Loaded kmod: Kernel Module Loader nodeup : Loaded miscfiles: Generic file transfer module. nodeup : Loaded ifconfig: nodeup : Loaded setupfs: File system mounting module. nodeup : Loaded miscfiles: Generic file transfer module. nodeup : Loaded miscfiles: Generic file transfer module. nodeup : Loaded miscfiles: Generic file transfer module. nodeup : Loaded vmadlib: Library transfer module. 
nodeup : Loaded nodeinfo: Node information gathering module nodeup : Running premove functions nodeup : Calling premove for kmod nodeup : Plugin kmod returned status 0 (ok) nodeup : Calling premove for miscfiles nodeup : Plugin miscfiles returned status 0 (ok) nodeup : No premove function for ifconfig nodeup : Calling premove for setupfs setupfs : mount flags for /proc are 0 / setupfs : mount flags for /sys are 0 / setupfs : mount flags for /bpfs are 0 / setupfs : mount flags for /usr/local/matlab6p5 are 0 / nolock setupfs : Successfully loaded fstab from /etc/clustermatic/fstab nodeup : Plugin setupfs returned status 0 (ok) nodeup : Calling premove for miscfiles nodeup : Plugin miscfiles returned status 0 (ok) nodeup : Calling premove for miscfiles miscfiles : loaded /etc/localtime (size=1017;id=0,0;mode=100644) miscfiles : loaded /etc/ld.so.cache (size=81781;id=0,0;mode=100644) nodeup : Plugin miscfiles returned status 0 (ok) nodeup : Calling premove for miscfiles miscfiles : loaded /etc/clustermatic/nsswitch.conf (size=27;id=0,0;mode=100644) nodeup : Plugin miscfiles returned status 0 (ok) nodeup : Calling premove for vmadlib vmadlib : library list contains 123 libraries vmadlib : loaded /lib/libattr.so.1.1.0 (size=14350;id=0,0;mode=100644) vmadlib : loaded /lib/libselinux.so.1 (size=61336;id=0,0;mode=100755) vmadlib : loaded /lib/libacl.so.1.1.0 (size=31632;id=0,0;mode=100644) vmadlib : loaded /lib/libresolv.so.2 (size=74342;id=0,0;mode=100755) vmadlib : loaded /lib/tls/libpthread.so.0 (size=88272;id=0,0;mode=100755) vmadlib : loaded /lib/ld-2.3.3.so (size=103516;id=0,0;mode=100755) vmadlib : loaded /lib/tls/libc.so.6 (size=1348627;id=0,0;mode=100755) vmadlib : loaded /lib/tls/librtkaio.so.1 (size=35844;id=0,0;mode=100755) vmadlib : loaded /lib/tls/libm.so.6 (size=170563;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/XFree.so (size=175132;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/PLog.so (size=64708;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXxf86vm.so.1.0 (size=20309;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXxf86rush.so.1.0 (size=9755;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXvMC.so.1.0 (size=11449;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXtst.so.6.1 (size=22643;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXt.so.6.0 (size=376701;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXTrap.so.6.4 (size=37013;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXss.so.1.0 (size=10626;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXRes.so.1.0 (size=8153;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXrender.so.1.2.2 (size=34126;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXrandr.so.2.0 (size=13265;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXp.so.6.2 (size=34160;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXpm.so.4.11 (size=68662;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXmuu.so.1.0 (size=13015;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXmu.so.6.2 (size=102621;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXm.so.3.0.1 (size=2389516;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libxkbui.so.1.0 (size=13720;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libxkbfile.so.1.0 (size=148813;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXi.so.6.0 (size=35108;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXinerama.so.1.0 (size=9430;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXft.so.2.1.1 
(size=84580;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXft.so.1.1 (size=62460;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXfont.so.1.5 (size=496935;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXfontcache.so.1.2 (size=7614;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXext.so.6.4 (size=67082;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXcursor.so.1.0.2 (size=39854;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXaw.so.7.0 (size=419880;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXaw.so.6.1 (size=298466;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXaw3d.so.7.0 (size=329322;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libXaw3d.so.6.1 (size=329322;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libX11.so.6.2 (size=1113569;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libwraster.so.3.0.0 (size=85064;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libVncExt.so.2.0 (size=9376;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libUil.so.3.0.1 (size=361572;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libSM.so.6.0 (size=37492;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libpsres.so.1.0 (size=23517;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libMrm.so.3.0.1 (size=132156;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libICE.so.6.3 (size=98545;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libI810XvMC.so.1.0 (size=73548;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libGLw.so.1.0 (size=26354;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libFS.so.6.0 (size=52125;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libfontenc.so.1.0 (size=27983;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libdpstk.so.1.0 (size=35456;id=0,0;mode=100755) vmadlib : loaded /usr/X11R6/lib/libdps.so.1.0 (size=354949;id=0,0;mode=100755) vmadlib : loaded /usr/local/matlab6p5/sys/os/glnx86/libXm.so.2.1 (size=1965965;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/sys/os/glnx86/libtermcap.so (size=10448;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/sys/os/glnx86/libstdc++-libc6.2-2.so.3 (size=271880;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/sys/os/glnx86/libstdc++-libc6.1-2.so.3 (size=269592;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/sys/os/glnx86/libiberty.so (size=51260;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/sys/os/glnx86/libgmp.so (size=241653;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libut.so (size=254276;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/librxtxSerial.so (size=39824;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libnativejmi.so (size=9316;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libnativejava.so (size=14312;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmx.so (size=201668;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwumfpack.so (size=441560;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwuix.so (size=1110660;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwudd.so (size=774452;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwudd_mi.so (size=549032;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwsl_solver.so (size=181168;id=106,100;mode=100755) vmadlib : loaded 
/usr/local/matlab6p5/bin/glnx86/libmwsimulink.so (size=5558060;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwservices.so (size=81436;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwrsim_engine.so (size=412276;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwrefblas.so (size=145596;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwnumerics.so (size=413356;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwm_pcodeio.so (size=71776;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwm_pcodegen.so (size=84036;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwmpath.so (size=79316;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwm_parser.so (size=1774440;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwmlib.so (size=488808;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwm_ir.so (size=557968;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwm_interpreter.so (size=1055968;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwm_dispatcher.so (size=18568;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwmcl.so (size=20944;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwlapack.so (size=103944;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwjmi.so (size=260844;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwir_interp.so (size=78996;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwhg.so (size=1363088;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwhardcopy.so (size=180916;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwgui.so (size=502356;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwglee.so (size=31264;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwfftw.so (size=355252;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwdastudio.so (size=107444;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwcompiler.so (size=646908;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwcg_ir.so (size=517732;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwbuiltins.so (size=193764;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmwarpack.so (size=131772;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmex.so (size=29764;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmat.so (size=37332;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libmatlb.so (size=727904;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/libfixedpoint.so (size=187064;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/lapack.so (size=1716896;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/glren.so (size=140300;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/atlas_PPro.so (size=1260068;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/atlas_PII.so (size=1359408;id=106,100;mode=100755) vmadlib : loaded 
/usr/local/matlab6p5/bin/glnx86/atlas_PIII.so (size=1547716;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/atlas_P4.so (size=1370136;id=106,100;mode=100755) vmadlib : loaded /usr/local/matlab6p5/bin/glnx86/atlas_Athlon.so (size=2046624;id=106,100;mode=100755) vmadlib : loaded /lib/libgcc_s.so.1 (size=40171;id=0,0;mode=100755) vmadlib : loaded /lib/libutil.so.1 (size=10797;id=0,0;mode=100755) vmadlib : loaded /lib/libdl.so.2 (size=13647;id=0,0;mode=100755) vmadlib : loaded /lib/libncurses.so.5.4 (size=316410;id=0,0;mode=100755) vmadlib : loaded /lib/libhistory.so.4.3 (size=24828;id=0,0;mode=100755) vmadlib : loaded /lib/libreadline.so.4.3 (size=176736;id=0,0;mode=100755) vmadlib : loaded /lib/libnss_bproc.so.2 (size=44922;id=0,0;mode=100755) vmadlib : loaded /usr/lib/libbproc.so.4.0.0 (size=40697;id=0,0;mode=100755) vmadlib : loaded /usr/lib/libstdc++.so.5.0.5 (size=965717;id=0,0;mode=100755) vmadlib : loaded /usr/lib/libstdc++.so.2.9.0 (size=352298;id=0,0;mode=100755) vmadlib : loaded /usr/lib/libstdc++.so.2.8.0 (size=325815;id=0,0;mode=100755) vmadlib : loaded /usr/lib/libstdc++.so.2.7.2.8 (size=245395;id=0,0;mode=100755) vmadlib : loaded /usr/lib/libstdc++-libc6.1-1.so.2 (size=358513;id=0,0;mode=100755) vmadlib : loaded /usr/lib/libstdc++-3-libc6.2-2-2.10.0.so (size=379414;id=0,0;mode=100755) vmadlib : loaded /usr/lib/libstdc++-3-libc6.1-2-2.10.0.so (size=358408;id=0,0;mode=100755) nodeup : Plugin vmadlib returned status 0 (ok) nodeup : No premove function for nodeinfo nodeup : Starting 1 child processes. nodeup : Finished creating child processes. nodeup : Running postmove functions nodeup : Calling postmove for kmod nodeup : Plugin kmod returned status 0 (ok) nodeup : Calling postmove for miscfiles nodeup : Plugin miscfiles returned status 0 (ok) nodeup : Calling postmove for ifconfig ifconfig : ifconfig lo 127.0.0.1 255.0.0.0 nodeup : Plugin ifconfig returned status 0 (ok) nodeup : Calling postmove for setupfs setupfs : Doing mount: dev=none mntpt=/proc type=proc opts= nodeup : Child process for node 1 returned error 1 nodeup : Node setup returned status 1 I am not sure what the child process in the initialization sequence is, but this seems to be causing the error. Also, for node 0, the log is almost the same, except for some reason, 3 children process are created, which give the following messages: nodeup : Child process for node 1 returned error 1 nodeup : Child process for node 3 returned error 1 nodeup : Child process for node 0 returned error 1 There must be something very trivial that I am missing. Any help with getting the NFS mount working properly is very much appreciated. Please respond to this account since the server for the account I used in my previous posts is down. Thanks, Reza |
From: Luke P. <lop...@wi...> - 2004-12-27 23:32:52
|
oM stability really depends on who you ask. If you ask me, it's highly hardware dependent. The biggest problem is interrupt-happy ethernet controllers that constantly send nodes into catastrophic livelocks. Intel e100 and e1000 are some particularly bad ones. That problem gets much better by running the adapters in NAPI (polling) mode, but I haven't tested that enough to know if I trust it. The second biggest problem with oM is the refusal of the developers to admit it has any "little problems." ^H ^H ^H ^H ^H ^H ^H ^H ^H ^H ^H I don't see how these projects could be effectively intermingled. For one thing, oM is far more complicated. Also oM has a much wider userbase, mostly because it's more user friendly and cool (note that coolness != usefulness). So I would highly doubt that bproc could influence any oM design decisions. -Luke ha...@no... wrote: >(encouraged by Luke's presence I dare to ask questions I held back so far) > >Are there any souls versed in both bproc and openMosix and able to tell me: > >- is current openMosix stability anywhere close to bproc? > >- oM would like to get into the mainline kernel one day; isn't there >a possibility for us to kindly influence their current design >decisions in such a way that they try to create some kernel subsystem >also usable for bproc (and maybe checkpointing)? I mean, are there any >common interests they could push forward, resulting in even smaller >bproc patch in the future ;-) ? Anybody cares? > >Vaclav Hanzl > > > >------------------------------------------------------- >SF email is sponsored by - The IT Product Guide >Read honest & candid reviews on hundreds of IT Products from real users. >Discover which products truly live up to the hype. Start reading now. >http://productguide.itmanagersjournal.com/ >_______________________________________________ >BProc-users mailing list >BPr...@li... >https://lists.sourceforge.net/lists/listinfo/bproc-users > > |
From: <ha...@no...> - 2004-12-27 20:15:13
|
(encouraged by Luke's presence I dare to ask questions I held back so far) Are there any souls versed in both bproc and openMosix and able to tell me: - is current openMosix stability anywhere close to bproc? - oM would like to get into the mainline kernel one day; isn't there a possibility for us to kindly influence their current design decisions in such a way that they try to create some kernel subsystem also usable for bproc (and maybe checkpointing)? I mean, are there any common interests they could push forward, resulting in even smaller bproc patch in the future ;-) ? Anybody cares? Vaclav Hanzl |
From: Luke P. <lop...@wi...> - 2004-12-27 19:59:19
|
>Well, in fact I wanted to say the same thing :-) In yet another words, >I mean this: Either those syscalls (which should have been avoided in >Matlab source in the first place) do nearly nothing, in which case oM > > My experience with matlab for *nix leads me to believe that they probably just munge something together based on a completed windows version. So your first guess is pretty likely. And I'm going to keep my mouth shut about oM. Mathworks seems at least aware that people are trying- and failing- to get matlab to run in cluster environments. One can only hope the next version will bring improvements. -Luke |
From: <ha...@no...> - 2004-12-27 19:47:32
|
Hi Luke, > > "... However, matlab processes will not do any useful > > work when migrated. My assumption is that matlab relies heavily on > > system calls." > > > > http://howto.ipng.be/openMosixWiki/index.php/More%20on%20Matlab > > > >so unless they hit some particular inefficiency in openMosix, this may > >be the case with bproc as well. > > > > > > I wrote that. The reason that matlab doesn't work with oM is that > syscalls must be performed on the master node. bproc doesn't have this > limitation. However, I can attest to the fact that matlab-bproc is > still a giant pain in the rear. Well, in fact I wanted to say the same thing :-) In yet another words, I mean this: Either those syscalls (which should have been avoided in Matlab source in the first place) do nearly nothing, in which case oM could likely do a better job using some clever optimizing and caching strategies and avoid network round-trips, or those syscalls really do something at least slightly important, and in this case something may break when bproc migrates Matlab and the said giant pain results. In short: If it makes problems with oM, it can make problems with bproc as well (surprise! :) Vaclav |
From: Luke P. <lop...@wi...> - 2004-12-27 14:53:04
|
Hey Vaclav, > "Any version of matlab can be made to migrate ... starting with -nojvm > (disabling java). However, matlab processes will not do any useful > work when migrated. My assumption is that matlab relies heavily on > system calls." > > http://howto.ipng.be/openMosixWiki/index.php/More%20on%20Matlab > >so unless they hit some particular inefficiency in openMosix, this may >be the case with bproc as well. > > I wrote that. The reason that matlab doesn't work with oM is that syscalls must be performed on the master node. bproc doesn't have this limitation. However, I can attest to the fact that matlab-bproc is still a giant pain in the rear. -Luke |
From: <ha...@no...> - 2004-12-26 18:35:45
|
> 1. I don't know if there is an easy way to save variables in MATLAB to disk. > 2. I can't access toolbox functions. > 3. Documentation is unavailable. In short, you cannot access any files from filesystems which are not mounted on your slave nodes? Check /etc/beowulf/fstab. Another option is to move files between master and slave using bpcp. Vaclav Hanzl |
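As an illustration of the second option, bpcp copies files between the master and a node with an rcp-like syntax, using node numbers in place of host names; the paths, job name, and node number below are only placeholders.

```sh
# Copy an input file out to node 3, run a job there, and fetch the result back.
# File names, paths, and the node number are placeholders.
bpcp /home/user/input.mat 3:/tmp/input.mat
bpsh 3 /usr/local/bin/myjob /tmp/input.mat /tmp/output.mat
bpcp 3:/tmp/output.mat /home/user/output.mat
```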
From: Reza S. <sh...@en...> - 2004-12-26 03:06:12
|
Hello, ha...@no... wrote: >>>??? MATLAB was unable to open a pseudo-tty: No such file or directory >>> >>> >>... >> >> >>>check with your system administrator and confirm that your pty subsystem >>>is properly configured. >>> >>> > >Usual pty subsystem in fact does not exist with bproc. > >These things (strange connections between terminals and signal >delivery) are one of the darker sides of UNIX and bproc ignores them >by design (I guess it still does). > >They are also rather optional so sometimes you can just ignore few >warnings and things still work. > > > I will assume that having a pseudo-tty is not vital. However, with MATLAB spawned using bpsh on a child node, I still have three things that are not working: 1. I don't know if there is an easy way to save variables in MATLAB to disk. 2. I can't access toolbox functions. 3. Documentation is unavailable. I am guessing that fixing some or all of these problems would require a more intimate knowledge of the inner workings of MATLAB, but if anybody has any possible solutions or stabs at solutions, they would really be appreciated. Thanks! Reza |
From: <ha...@no...> - 2004-12-25 20:35:52
|
>> ??? MATLAB was unable to open a pseudo-tty: No such file or directory >... >> check with your system administrator and confirm that your pty subsystem >> is properly configured. Usual pty subsystem in fact does not exist with bproc. These things (strange connections between terminals and signal delivery) are one of the darker sides of UNIX and bproc ignores them by design (I guess it still does). They are also rather optional so sometimes you can just ignore few warnings and things still work. (Long time ago I spent few weeks exploring this and decided that this would not be easy to get right, once you try some easy fixes you open the door to huge empty spaces. Details can be found in archives of this list.) Vaclav Hanzl |