From: <er...@he...> - 2004-11-05 17:22:48

On Thu, Oct 28, 2004 at 05:23:29PM -0400, Daniel Gruner wrote:
> Hi
>
> We use Fedora 2, with BProc-patched kernel 2.6.7, BProc 4.0.6pre. Works fine!

I just threw Clustermatic 5 up on the web site (http://www.clustermatic.org).
I tried to make it a little more distribution-neutral this time. The x86_64
stuff has been tested on SuSE 9.1 this time. I think it should be possible
to throw it on Fedora 2 easily.

- Erik

From: Daniel G. <dg...@ti...> - 2004-10-28 21:23:39

Hi

We use Fedora 2, with BProc-patched kernel 2.6.7, BProc 4.0.6pre. Works fine!

Daniel

On Thu, Oct 28, 2004 at 04:07:02PM -0500, Rene Salmon wrote:
> Hello,
>
> We have a small x86_64 cluster and we would like to test drive BProc,
> Supermon, etc.: "Clustermatic" in general.
>
> Clustermatic seems to want SuSE 9.0 for x86_64. Doing a quick search I was
> only able to find SuSE 9.1 online, but maybe there is an archived version
> of 9.0 somewhere?
>
> I was hoping to get a hint or maybe some advice on what distro to try
> that will work fairly well with BProc and friends compiled from source.
>
> Fedora 2, SuSE 9.1, Mandrake, etc.?
>
> Thanks for any advice/comments.
>
> Rene

--
Dr. Daniel Gruner            dg...@ti...
Dept. of Chemistry           dan...@ut...
University of Toronto        phone: (416)-978-8689
80 St. George Street         fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada  finger for PGP public key

From: Rene S. <rs...@tu...> - 2004-10-28 21:07:23

Hello,

We have a small x86_64 cluster and we would like to test drive BProc,
Supermon, etc.: "Clustermatic" in general.

Clustermatic seems to want SuSE 9.0 for x86_64. Doing a quick search I was
only able to find SuSE 9.1 online, but maybe there is an archived version
of 9.0 somewhere?

I was hoping to get a hint or maybe some advice on what distro to try that
will work fairly well with BProc and friends compiled from source.

Fedora 2, SuSE 9.1, Mandrake, etc.?

Thanks for any advice/comments.

Rene

--
Rene Salmon
Tulane University
Center for Computational Science
Richardson Building 310
New Orleans, LA 70118
http://www.ccs.tulane.edu
Tel 504-862-8393
Fax 504-862-8392

From: Ted S. <tsa...@cr...> - 2004-10-28 17:52:38

I'll work on setting up bjs later; I have a more serious problem now.
Sometimes, not always, a small job like pi3 kills nodes. In the example
below, nodes 0, 1 and 4 were already killed in the same way:

#> mpirun -d --p4 -np 8 --nper 2 /u/ted/mpi_examples/64/pi3 .
Bogus number of cpus on node 2: 0
Bogus number of cpus on node 3: 0
Bogus number of cpus on node 5: 0
Bogus number of cpus on node 6: 0
Bogus number of cpus on node 7: 0
Bogus number of cpus on node 8: 0
Bogus number of cpus on node 9: 0
Bogus number of cpus on node 10: 0
Bogus number of cpus on node 11: 0
Bogus number of cpus on node 12: 0
Bogus number of cpus on node 13: 0
Bogus number of cpus on node 14: 0
listen: 192.168.0.101 42017
6: Slave node died
6: Slave node died
Not all processes started, aborting.

On a node console there is something like this (different for different
nodes):

Pid: 197, comm: init Not tainted 2.6.7
RIP: ...
Call Trace: <...> {bproc: do_recv_proc_stub+496}

or, on another node, {bproc: masq_add_proc+496}. There is nothing in the
nodes' log files.

What does 'Bogus number of cpus on node' mean? Help, please?

Thanks,
Ted

Daniel Gruner wrote:
>Ted,
>
>bjs changes the permissions on the nodes, so that only root will be
>able to submit without going through bjs.
>
>I have only done a bit of bjs stuff, and then only for single-processor
>stuff. Typically it involves using scripts that themselves use bpsh
>to submit the jobs to the nodes. In bpsh you can redirect the input,
>output and stderr using the -I, -O and -E directives.
>
>If your nodes mount the master node (via nfs), then you can redirect
>the output from your job directly to the user's home directory.
>Alternatively, if you have local disk on the nodes then you can
>run the job and create the output file(s) on the local disk, and
>later bpcp the files to the master.
>
>I am not sure how this would work for mpi codes that run under bjs.
>In fact, I have never tried to run mpi codes in bjs...
>I guess I will have to find out sooner or later anyway, as I am planning
>to set up my newest cluster with bjs for users' jobs.
>
>Keep me posted...
>Daniel

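A quick way to cross-check the CPU count each slave node actually reports
is to query /proc/cpuinfo through bpsh. This is a hypothetical diagnostic,
not a known fix for the error above; the 'bpsh all' form is the same one
used elsewhere in this thread ('bpsh all ./pi3'):

    # Hypothetical diagnostic: print how many CPUs each node reports.
    # "bpsh all" runs the command on every up node.
    bpsh all grep -c '^processor' /proc/cpuinfo
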
From: Daniel G. <dg...@ti...> - 2004-10-28 15:11:36

Ted,

bjs changes the permissions on the nodes, so that only root will be able
to submit without going through bjs.

I have only done a bit of bjs stuff, and then only for single-processor
stuff. Typically it involves using scripts that themselves use bpsh to
submit the jobs to the nodes. In bpsh you can redirect the input, output
and stderr using the -I, -O and -E directives.

If your nodes mount the master node (via nfs), then you can redirect the
output from your job directly to the user's home directory. Alternatively,
if you have local disk on the nodes then you can run the job and create
the output file(s) on the local disk, and later bpcp the files to the
master.

I am not sure how this would work for mpi codes that run under bjs. In
fact, I have never tried to run mpi codes in bjs... I guess I will have
to find out sooner or later anyway, as I am planning to set up my newest
cluster with bjs for users' jobs.

Keep me posted...
Daniel

On Thu, Oct 28, 2004 at 10:07:10AM -0400, Ted Sariyski wrote:
> Thanks Daniel,
> With /usr/bin/mpirun I am able to submit jobs, but I had to change the
> permissions on the nodes to xxx. Is that the normal mode of
> permissions if users are supposed to submit jobs only through bjs? I
> still do not get the output file when the job is submitted with bjs.
> Any idea where to look?
> Thanks a lot,
> Ted

--
Dr. Daniel Gruner            dg...@ti...
Dept. of Chemistry           dan...@ut...
University of Toronto        phone: (416)-978-8689
80 St. George Street         fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada  finger for PGP public key

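A minimal sketch of the redirection idea Daniel describes, assuming bpsh's
-I/-O/-E flags take file arguments as his description suggests; the node
number, user name and paths are illustrative, not from the thread:

    #!/bin/sh
    # Run a job on one node and send its stdout/stderr to files in the
    # user's NFS-mounted home directory (node and paths are illustrative).
    NODE=0
    bpsh -O /home/ted/job.out -E /home/ted/job.err $NODE /home/ted/bin/myjob
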
From: Ted S. <tsa...@cr...> - 2004-10-28 14:10:58

Thanks Daniel,
With /usr/bin/mpirun I am able to submit jobs, but I had to change the
permissions on the nodes to xxx. Is that the normal mode of permissions if
users are supposed to submit jobs only through bjs? I still do not get the
output file when the job is submitted with bjs. Any idea where to look?
Thanks a lot,
Ted

Daniel Gruner wrote:
>Hi Ted,
>
>On Thu, Oct 28, 2004 at 08:54:14AM -0400, Ted Sariyski wrote:
>>[...]
>
>Try using /usr/bin/mpirun, rather than whatever is coming up first in
>your path (likely the mpirun from the examples directory). The
>/usr/bin/mpirun is from cmtools, and it is the only one that works
>(at least for me). It will happily take the --p4 argument.
>
>Daniel

From: Daniel G. <dg...@ti...> - 2004-10-28 13:22:21

Hi Ted,

On Thu, Oct 28, 2004 at 08:54:14AM -0400, Ted Sariyski wrote:
> Hi,
> I thought that I had patched mpich with BProc, but the Makefile had
> RSHCOMMAND set to /bin/rsh. I rebuilt mpich again. I'll provide more
> details of how I did it because, although the build went seamlessly, I
> still get errors. I use the mpich source and the patches from CM4.
>
> Patching finished without errors:
> #> patch -p1 < ../mpich-1.2.5..10-p4-bproc.patch
> patching file mpid/ch_p4/p4/lib/p4_sock_conn.c
> patching file mpid/ch_p4/p4/lib/p4_sock_sr.c
> patching file mpid/ch_p4/p4/lib/p4_utils.c
> patching file mpid/ch_p4/p4priv.c
> #> patch -p1 < ../mpich-1.2.5..10-totalview.patch
> patching file src/env/initutil.c
>
> The Makefile was generated by:
> export CC=gcc
> export FC=pgf77
> export F90=pgf90
> export F77=pgf77
> export RSHCOMMAND=bproc
> ./configure --with-device=ch_p4 \
>     --prefix=/usr/local/mpic-p4_1.2.5.2 \
>     --enable-debug \
>     -optcc="-O3" \
>     -c++=g++ \
>     --enable-f90 --enable-f77 \
>     --enable-romio --with-file-system=nfs | tee configure.log
> Compilation finished without errors.
>
> Now mpirun returns:
> #> mpirun -np 2 --p4 /u/ted/mpi_examples/64/pi3
> Unrecognized argument --p4 ignored.
> p0_6984: p4_error: Path to program is invalid while starting
> /u/ted/mpi_examples/64/pi3 with bproc on xtreme101: -1
> p4_error: latest msg from perror: No such file or directory

Try using /usr/bin/mpirun, rather than whatever is coming up first in your
path (likely the mpirun from the examples directory). The /usr/bin/mpirun
is from cmtools, and it is the only one that works (at least for me). It
will happily take the --p4 argument.

Daniel

--
Dr. Daniel Gruner            dg...@ti...
Dept. of Chemistry           dan...@ut...
University of Toronto        phone: (416)-978-8689
80 St. George Street         fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada  finger for PGP public key

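Daniel's advice amounts to pinning the launcher's path. A hedged
illustration, reusing the example paths from the thread:

    # Show every mpirun on the current PATH, then call the cmtools one by
    # absolute path so a same-named wrapper earlier in PATH cannot shadow it.
    type -a mpirun
    /usr/bin/mpirun --p4 -np 2 /u/ted/mpi_examples/64/pi3
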
From: Ted S. <tsa...@cr...> - 2004-10-28 12:58:10

Hi,
I thought that I had patched mpich with BProc, but the Makefile had
RSHCOMMAND set to /bin/rsh. I rebuilt mpich again. I'll provide more
details of how I did it because, although the build went seamlessly, I
still get errors. I use the mpich source and the patches from CM4.

Patching finished without errors:

#> patch -p1 < ../mpich-1.2.5..10-p4-bproc.patch
patching file mpid/ch_p4/p4/lib/p4_sock_conn.c
patching file mpid/ch_p4/p4/lib/p4_sock_sr.c
patching file mpid/ch_p4/p4/lib/p4_utils.c
patching file mpid/ch_p4/p4priv.c
#> patch -p1 < ../mpich-1.2.5..10-totalview.patch
patching file src/env/initutil.c

The Makefile was generated by:

export CC=gcc
export FC=pgf77
export F90=pgf90
export F77=pgf77
export RSHCOMMAND=bproc
./configure --with-device=ch_p4 \
    --prefix=/usr/local/mpic-p4_1.2.5.2 \
    --enable-debug \
    -optcc="-O3" \
    -c++=g++ \
    --enable-f90 --enable-f77 \
    --enable-romio --with-file-system=nfs | tee configure.log

Compilation finished without errors.

Now mpirun returns:

#> mpirun -np 2 --p4 /u/ted/mpi_examples/64/pi3
Unrecognized argument --p4 ignored.
p0_6984: p4_error: Path to program is invalid while starting
/u/ted/mpi_examples/64/pi3 with bproc on xtreme101: -1
p4_error: latest msg from perror: No such file or directory

With bjssub it doesn't complain, but I do not get any output. Both bjsstat
and bpstat return what is expected:

#> bjssub -p default -n 2 1000 -O /u/ted/mpi_examples/64/my.out mpirun --p4 -np 4 /u/ted/mpi_examples/64/pi3 .
JOBID=40
#> bjsstat
Pool: default  Nodes (total/up/free): 15/15/13
ID   User      Command                                               Requirements
40 R tsariysk  1000 -O /u/ted/mpi_examples/64/my.out mpirun --p4 -np 4  nodes=2 secs=1
#> bpstat
Node(s) Status  Mode        User      Group
0-1     up      ---x------  tsariysk  users
2-14    up      ---x------  root      root

I checked that all nodes can execute pi3 ('bpsh all ./pi3' works fine). I
also checked that /u, where the executable is and where I expect the
output, is mounted on all nodes:

#> bpsh 0 mount
rootfs on / type rootfs (rw)
none on /proc type proc (rw,nodiratime)
none on /bpfs type bpfs (rw)
192.168.0.200:/public/home on /u type nfs (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200)

Why do I get 'Path to program is invalid while starting
/u/ted/mpi_examples/64/pi3 with bproc on xtreme101: -1' and 'Unrecognized
argument --p4' when I execute mpirun, but don't get these errors with
bjssub? What am I doing wrong?

Thanks,
Ted

er...@he... wrote:
On Tue, Oct 26, 2004 at 01:24:26PM -0400, Ted Sariyski wrote:
> Well, the problem was that I have a record for the head node in my
> .rhosts file while the other user hasn't. I don't remember when and
> why I put the head node in the .rhosts file. What surprises me is
> that bproc looks at this file at all ... or do I have a misconfigured
> cluster?
> Thanks, Ted

It sounds like you're not using BProc for startup at all. BProc
definitely doesn't use .rhosts for anything. Is your MPI patched to
use BProc?

- Erik

From: Ted S. <tsa...@cr...> - 2004-10-27 16:25:20

I thought that I had patched mpich with BProc, but I rebuilt it again
(details of how I built it are attached). Now mpirun returns:

#> mpirun -np 2 --p4 /u/ted/mpi_examples/64/pi3
Unrecognized argument --p4 ignored.
p0_6984: p4_error: Path to program is invalid while starting
/u/ted/mpi_examples/64/pi3 with bproc on xtreme101: -1
p4_error: latest msg from perror: No such file or directory

With bjssub it looks to work, but I never get any output, so I cannot tell:

#> bjssub -p default -n 2 1000 -O /u/ted/mpi_examples/64/my.out mpirun --p4 -np 4 /u/ted/mpi_examples/64/pi3 .
JOBID=40
#> bjsstat
Pool: default  Nodes (total/up/free): 15/15/13
ID   User      Command                                               Requirements
40 R tsariysk  1000 -O /u/ted/mpi_examples/64/my.out mpirun --p4 -np 4  nodes=2 secs=1
#> bpstat
Node(s) Status  Mode        User      Group
0-1     up      ---x------  tsariysk  users
2-14    up      ---x------  root      root

I am able to execute pi3 on all nodes ('bpsh all ./pi3' works fine) and /u
is mounted on all nodes:

#> bpsh 0 mount
rootfs on / type rootfs (rw)
none on /proc type proc (rw,nodiratime)
none on /bpfs type bpfs (rw)
192.168.0.200:/public/home on /u type nfs (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200)

Why do I get 'Path to program is invalid while starting
/u/ted/mpi_examples/64/pi3 with bproc on xtreme101: -1'? What am I doing
wrong?

Thanks,
Ted

P.S. Here is how I built mpich (Makefile and the log files from configure
and make are attached).

[root@xtreme101 mpich-1.2.5.2]# patch -p1 < ../mpich-1.2.5..10-p4-bproc.patch
patching file mpid/ch_p4/p4/lib/p4_sock_conn.c
patching file mpid/ch_p4/p4/lib/p4_sock_sr.c
patching file mpid/ch_p4/p4/lib/p4_utils.c
patching file mpid/ch_p4/p4priv.c
[root@xtreme101 mpich-1.2.5.2]# patch -p1 < ../mpich-1.2.5..10-totalview.patch
patching file src/env/initutil.c

export CC=gcc
export FC=pgf77
export F90=pgf90
export F77=pgf77
export RSHCOMMAND=bproc
./configure --with-device=ch_p4 \
    --prefix=/usr/local/mpic-p4_1.2.5.2 \
    --enable-debug \
    -optcc="-O3" \
    -c++=g++ \
    --enable-f90 --enable-f77 \
    --enable-romio --with-file-system=nfs | tee configure.log
make >& make.log

er...@he... wrote:
>On Tue, Oct 26, 2004 at 01:24:26PM -0400, Ted Sariyski wrote:
>>Well, the problem was that I have a record for the head node in my
>>.rhosts file while the other user hasn't. I don't remember when and why
>>I put the head node in the .rhosts file. What surprises me is that
>>bproc looks at this file at all ... or do I have a misconfigured cluster?
>>Thanks, Ted
>
>It sounds like you're not using BProc for startup at all. BProc
>definitely doesn't use .rhosts for anything. Is your MPI patched to
>use BProc?
>
>- Erik

From: <ha...@no...> - 2004-10-26 19:01:10

> > There is a persistent local-disk caching support for NFS client as a
> > patch against 2.6.9-rc4-mm1 (see links below).
> >
> > Anybody has an estimate how hard it would be to reconcile this with
> > the bproc patch?
>
> Without having looked at the patch, it should be pretty easy. File
> system stuff very rarely bumps into BProc since BProc doesn't touch
> the fs code at all.
>
> So far I haven't heard of a file system that can't work with BProc.

Thanks for confirming what I hoped for. My real concern, however, is
-rc4-mm1. I guess that patching bproc directly over 2.6.9-rc4-mm1 would be
a fairly insane attempt, as 2.6.9-rc4-mm1 is 741 patches away from 2.6.9:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc4/2.6.9-rc4-mm1/patch-series

So I guess the viable way would be to first find the minimum of those 741
patches that leads to a working cachefs (a daunting task, but there is
common interest in this) and then add bproc. I will check on the
linux-cachefs list what the prospects are of creating a patch of the
cachefs-NFS things against a vanilla kernel.

Regards
Vaclav Hanzl

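The ordering Vaclav describes would look roughly like this. The patch file
names are illustrative, and whether the BProc patch still applies cleanly
on top of the -mm tree is exactly the open question:

    cd linux-2.6.9-rc4
    patch -p1 < ../2.6.9-rc4-mm1.patch   # the -mm tree carrying the NFS caching work
    patch -p1 < ../bproc-4.x.patch       # then attempt the BProc patch on top
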
From: <er...@he...> - 2004-10-26 17:38:05

On Thu, Oct 21, 2004 at 10:24:51PM +0200, ha...@no... wrote:
> There is a persistent local-disk caching support for NFS client as a
> patch against 2.6.9-rc4-mm1 (see links below).
>
> Anybody has an estimate how hard it would be to reconcile this with
> the bproc patch?

Without having looked at the patch, it should be pretty easy. File system
stuff very rarely bumps into BProc since BProc doesn't touch the fs code
at all.

So far I haven't heard of a file system that can't work with BProc.

- Erik

From: <er...@he...> - 2004-10-26 17:35:10

On Thu, Oct 21, 2004 at 09:12:59PM -0400, Daniel Gruner wrote:
> Further to my previous note, the same problem occurs when I use the
> gcc/g++/g77 compiler suite.

FWIW, I haven't tried anything past mpich 1.2.4. I'm pretty sure our mpich
startup patches for P4 are against that version. Things still seem to move
around a bit inside p4, so I've seen problems on newer versions of mpich.

- Erik

From: <er...@he...> - 2004-10-26 17:32:55

On Tue, Oct 26, 2004 at 01:24:26PM -0400, Ted Sariyski wrote:
> Well, the problem was that I have a record for the head node in my
> .rhosts file while the other user hasn't. I don't remember when and why
> I put the head node in the .rhosts file. What surprises me is that
> bproc looks at this file at all ... or do I have a misconfigured
> cluster?
> Thanks, Ted

It sounds like you're not using BProc for startup at all. BProc definitely
doesn't use .rhosts for anything. Is your MPI patched to use BProc?

- Erik

From: Ted S. <tsa...@cr...> - 2004-10-26 17:28:13

Well, the problem was that I have a record for the head node in my .rhosts
file while the other user hasn't. I don't remember when and why I put the
head node in the .rhosts file. What surprises me is that bproc looks at
this file at all ... or do I have a misconfigured cluster?

Thanks, Ted

Ted Sariyski wrote:
> Hi,
> I have a permission problem. I have two (unprivileged) users in bjs.conf:
>     users tsariysk cmenchini
>
> The first user has no problems executing jobs, but the second user
> gets 'permission denied' when running the same job:
>
> Permission denied.
> p0_12983: p4_error: Child process exited while making connection to
> remote process on xtreme101: 0
> p0_12983: (18.031250) net_send: could not write to fd=4, errno = 32
>
> I see that when the job is submitted, five nodes are allocated for the
> job. Before the submission the status is:
>
> 0-4     up ---x------ cmenchin users
> 5-14    up ---x------ root     root
>
> and after the job exits they are returned to the pool:
>
> 0-14    up ---x------ root     root
>
> How should I track this problem?
> Thanks, Ted
>
> P.S. Attached is bjs.conf. Is there another file that controls user
> permissions?
>
> *********** bjs.conf *****************
> spooldir /var/spool/bjs
> policypath /usr/lib64/bjs:/usr/lib/bjs
> socketpath /tmp/.bjs
> acctlog /tmp/acct.log
>
> pool default
>     policy filler
>     nodes 0-14
>     users tsariysk cmenchini

From: Daniel G. <dg...@ti...> - 2004-10-26 14:44:53

Hi

Has anyone patched mpich 1.2.6 for use in a BProc system? Specifically, I
am looking for BProc 4.0.6 compatibility.

Thanks,
Daniel

--
Dr. Daniel Gruner            dg...@ti...
Dept. of Chemistry           dan...@ut...
University of Toronto        phone: (416)-978-8689
80 St. George Street         fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada  finger for PGP public key

From: Ted S. <tsa...@cr...> - 2004-10-25 17:33:17

Hi,
I have a permission problem. I have two (unprivileged) users in bjs.conf:

    users tsariysk cmenchini

The first user has no problems executing jobs, but the second user gets
'permission denied' when running the same job:

Permission denied.
p0_12983: p4_error: Child process exited while making connection to
remote process on xtreme101: 0
p0_12983: (18.031250) net_send: could not write to fd=4, errno = 32

I see that when the job is submitted, five nodes are allocated for the
job. Before the submission the status is:

0-4     up ---x------ cmenchin users
5-14    up ---x------ root     root

and after the job exits they are returned to the pool:

0-14    up ---x------ root     root

How should I track this problem?
Thanks, Ted

P.S. Attached is bjs.conf. Is there another file that controls user
permissions?

*********** bjs.conf *****************
spooldir /var/spool/bjs
policypath /usr/lib64/bjs:/usr/lib/bjs
socketpath /tmp/.bjs
acctlog /tmp/acct.log

pool default
    policy filler
    nodes 0-14
    users tsariysk cmenchini

From: Daniel G. <dg...@ti...> - 2004-10-22 01:13:24

Further to my previous note, the same problem occurs when I use the
gcc/g++/g77 compiler suite.

Daniel

On Thu, Oct 21, 2004 at 08:50:25PM -0400, Daniel Gruner wrote:
> Hi
>
> I am trying to compile and use mpich 1.2.5.2 on an athlon cluster, with
> bproc 4.0.6pre, and kernel 2.6.7.
> [...]

--
Dr. Daniel Gruner            dg...@ti...
Dept. of Chemistry           dan...@ut...
University of Toronto        phone: (416)-978-8689
80 St. George Street         fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada  finger for PGP public key

From: Daniel G. <dg...@ti...> - 2004-10-22 00:50:53

Hi

I am trying to compile and use mpich 1.2.5.2 on an Athlon cluster, with
bproc 4.0.6pre and kernel 2.6.7. The same mpich, bproc and kernel packages
work fine on an Opteron cluster with PathScale compilers, but on my Athlon
cluster with Intel compilers they don't. The compilation works fine, but
running multi-process jobs does not. For example, running the hello++
program in the mpich examples with a single process works fine:

[root@abi examples]# /usr/bin/mpirun -P -np 1 -d -- ./hello
listen: 192.168.101.1 33514
Hello World! I am 0 of 1

But when I try to run more than one process it bombs (irrespective of
whether it tries to run on one node or several):

[root@abi examples]# /usr/bin/mpirun -P -np 2 -d -- ./hello
listen: 192.168.101.1 33511
xm_8955: (0.003877) net_recv failed for fd = 6
xm_8955: p4_error: net_recv read, errno = : 104
rm_l_0_8957: (0.004684) net_send: could not write to fd=6, errno = 9
rm_l_0_8957: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
iofwd: Child process exited abnormally.

I am running from /home, which is mounted on all nodes. I am doing exactly
the same on the Opteron cluster...

Any experience with mpich + Intel compilers + bproc 4.0.6 anywhere?

Thanks,
Daniel

--
Dr. Daniel Gruner            dg...@ti...
Dept. of Chemistry           dan...@ut...
University of Toronto        phone: (416)-978-8689
80 St. George Street         fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada  finger for PGP public key

From: <ha...@no...> - 2004-10-21 20:06:52

There is persistent local-disk caching support for the NFS client as a
patch against 2.6.9-rc4-mm1 (see links below).

Does anybody have an estimate of how hard it would be to reconcile this
with the bproc patch? I would love to have both in the slave node kernel.

Thanks
Vaclav Hanzl

Links:

http://www.redhat.com/archives/linux-cachefs/2004-October/msg00027.html
  - 2.6.9-rc4-mm1 patch that will enable NFS (even NFS4) to do persistent
    file caching on the local hard disk

http://www.redhat.com/archives/linux-cachefs/2004-October/msg00004.html
  - older message explaining what is going on

http://www.redhat.com/archives/linux-cachefs/2004-October/msg00019.html
  - about ways to get this into the mainline kernel

http://www.redhat.com/mailman/listinfo/linux-cachefs
  - list archives and subscription page

From: Greg W. <gw...@la...> - 2004-10-20 20:39:54

This is just telling you that a child of mpirun exited due to a signal. If
one of the children exits, or if mpirun can't start all the processes for
some reason, then mpirun will kill the remaining children and you may see
this message.

Greg

On Oct 20, 2004, at 2:18 PM, Dale Harris wrote:
> Hi, I have MPI processes where I get this error:
>
> iofwd: Child process exited abnormally
>
> I'm trying to track it down... I don't know if this is specifically an
> MPI problem, or bproc. I am running bproc 3.2.6.
>
> --
> Dale Harris
> ro...@ma...
> /.-)

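The behaviour Greg describes follows the usual launcher pattern, sketched
below purely as an illustration. This is not mpirun's actual code, the
rank programs are hypothetical, and 'wait -n' needs bash 4.3 or later:

    #!/bin/bash
    # Start the children; as soon as any one of them exits, kill the rest.
    ./rank0 & pids="$!"
    ./rank1 & pids="$pids $!"
    wait -n                                  # returns when the first child exits
    for p in $pids; do kill "$p" 2>/dev/null; done
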
From: Dale H. <ro...@ma...> - 2004-10-20 20:18:21

Hi, I have MPI processes where I get this error:

iofwd: Child process exited abnormally

I'm trying to track it down... I don't know if this is specifically an MPI
problem, or bproc. I am running bproc 3.2.6.

--
Dale Harris
ro...@ma...
/.-)

From: Steven J. <py...@li...> - 2004-10-19 17:28:00

Greetings,

That is true in many cases. I prefer to just load lockd.

G'day,
sjames

On Tue, 19 Oct 2004, Michal Jaegermann wrote:
> On Tue, Oct 19, 2004 at 04:05:58PM +0000, Steven James wrote:
> > Often when I see slow NFS mounts, it's because the server isn't running
> > lockd (or hasn't loaded the lockd module) and the client doesn't mount
> > with -onolock
>
> If you are not writing to such a file system then mounting 'nolock' is
> fine; otherwise it may not be such a great idea, depending on
> circumstances, but then you had better have some form of 'statd'
> running.
>
> Michal

by Linux Labs International, Inc.
Steven James, CTO
55 Marietta Street, Suite 1830
Atlanta, Ga 30303
866 824 9737 support

From: Michal J. <mi...@ha...> - 2004-10-19 17:17:22

On Tue, Oct 19, 2004 at 04:05:58PM +0000, Steven James wrote:
> Often when I see slow NFS mounts, it's because the server isn't running
> lockd (or hasn't loaded the lockd module) and the client doesn't mount
> with -onolock

If you are not writing to such a file system then mounting 'nolock' is
fine; otherwise it may not be such a great idea, depending on
circumstances, but then you had better have some form of 'statd' running.

Michal

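In mount(8) terms, the trade-off Michal describes looks like this; the
server address and export paths are reused from elsewhere in the thread
purely as illustrations:

    # Read-mostly export: skipping NFS locking entirely is safe.
    mount -t nfs -o nolock 192.168.0.200:/public/code /code

    # Read-write export: keep locking, which wants lockd on the server and
    # some form of statd on the client for lock recovery.
    mount -t nfs 192.168.0.200:/public/home /u
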
From: Michal J. <mi...@ha...> - 2004-10-19 17:11:14

On Tue, Oct 19, 2004 at 12:01:05PM -0400, Ted Sariyski wrote:
> I'm not sure that I understand how to use:
> # If you want to put more setup stuff here, make sure do replace the
> # "exec" above with the following:
> # /usr/lib/beoboot/bin/node_up $* || exit 1

This is basic shell, but... (while in bash, type 'help exec'). 'exec'
_replaces_ the currently running process with something new, so it cannot
return to its caller: if it succeeded, that caller does not exist anymore.
Try a shell script with these two lines in it:

exec echo "these are effects of exec"
echo "if we managed to land here then something is really wrong"

If you skip 'exec' then clearly both lines will run. Therefore, if your
'node_up' script wants to continue executing other things after
/usr/lib/beoboot/bin/node_up, it cannot use 'exec' but should instead call
.../bin/node_up, check the return code, and proceed accordingly. So you
might rewrite it, say, this way:

if /usr/lib/beoboot/bin/node_up $* ; then
    :  # any other code to finish your node setup
else
    # oh, oh; we are in trouble; do whatever is required by
    # error handling here and follow up with ...
    exit 1
fi
# if this is the end of this script then we automatically exit
# with the status of the last executed command (or you may put
# an explicit 'exit' there for clarity)

There are other ways to express the same thing. It is mostly a question
of style.

> What am I doing wrong?

It is not clear. Probably you are not running what you think you are
running. 'bpsh $some_node ps ax' may give you clues.

Michal

From: Ted S. <tsa...@cr...> - 2004-10-19 16:42:40

It was -onolock. Thanks a lot.
Ted

Steven James wrote:
>Greetings,
>
>Often when I see slow NFS mounts, it's because the server isn't running
>lockd (or hasn't loaded the lockd module) and the client doesn't mount
>with -onolock
>
>G'day,
>sjames
>
>On Tue, 19 Oct 2004, Ted Sariyski wrote:
>
>>I need some more help. The script Michal wrote for NFS node support
>>ends with:
>>
>># bpsh $node rpc.statd
>>
>>There is no rpc.statd in my distribution, but there is rpc.rstatd (I use
>>SuSE Server 9), so I changed it correspondingly. Then I added
>>
>>/etc/clustermatic/nfs_node.conf $*
>>
>>at the end of node_up (nfs_node.conf is Michal's script, attached).
>>
>>It mounts, but it is slow. I'm not sure that I understand how to use:
>># If you want to put more setup stuff here, make sure do replace the
>># "exec" above with the following:
>># /usr/lib/beoboot/bin/node_up $* || exit 1
>>
>>What am I doing wrong?
>>Thanks,
>>Ted
>>
>> #!/bin/bash -x
>> #
>> # A sample how to get NFS modules on a node.
>> # Make sure that /etc/modules.conf.dist for a node does not
>> # define any "install" actions for these
>> #
>> # Michal Jaegermann, 2004/Aug/19, michal@ha...
>> #
>> # 2004/Oct/15, michal@ha...
>> #  - start portmap and rpc.statd on nodes
>> #  - fix "case m" typo and do not use "-N" option to bpsh
>>
>> node=$1
>> mod=nfs
>> modules=$( grep $mod.ko /lib/modules/$(uname -r)/modules.dep)
>> modules=${modules/:/}
>> modules=$(
>>     for m in $modules ; do
>>         echo $m
>>     done | tac )
>> ( cd /
>>   for m in $modules ; do
>>       echo $m
>>   done
>> ) | ( cd / ; cpio -o -c --quiet ) | bpsh $node cpio -imd --quiet
>> bpsh $node depmod -a
>> for m in $modules ; do
>>     m=$(basename $m .ko)
>>     m=${m/_/-}
>>     case $m in
>>     sunrpc)
>>         bpsh $node modprobe -i sunrpc
>>         bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs
>>         bpsh $node mount | grep -q rpc_pipefs || \
>>             bpsh $node mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
>>         ;;
>>     *)  bpsh $node modprobe -i $m
>>     esac
>> done
>> # these are for the benefit of rpc.statd
>> bpsh $node mkdir -p /var/lib/nfs/statd/
>> bpsh $node mkdir -p /var/run
>> bpsh $node portmap
>> bpsh $node rpc.rstatd
>>
>>#mount -t nfs MASTER:/public/home /u -o nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr
>>#mount -t nfs MASTER:/scratch /scratch1 -o nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr
>>#mount -t nfs MASTER:/public/code /code -o nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr
>>
>>Daniel Gruner wrote:
>>
>>>Ted,
>>>
>>>See the posting from Michal Jaegermann on Oct 16. You need to run
>>>both portmap and rpc.statd on the nodes, and then mounting and
>>>unmounting work fine.
>>>
>>>Daniel
>>>
>>>On Mon, Oct 18, 2004 at 11:01:59AM -0400, Ted Sariyski wrote:
>>>
>>>>Finally I was able to build a customized version of clustermatic with
>>>>kernel 2.6.7 for AMD64. All nodes use a Tyan B2882 Transport GX28
>>>>mainboard, the head node has two SATA hard disks running in RAID1
>>>>mode, and I use PXE to boot the diskless nodes (only 16 nodes).
>>>>
>>>>I have a couple of questions concerning mounting remote file systems;
>>>>it takes really long. Besides, some nodes come up fast while for
>>>>others it takes 5-10 minutes. For example, node1 boots in 2-3 minutes
>>>>while node0 issued errors on the console (there are no records in the
>>>>log file):
>>>>
>>>>mmap failed: /lib64/ld-2.3.3.so
>>>>vmadump: mmap failed: /lib64/ld-2.3.3.so
>>>>portmap: server localhost not responding, time out
>>>>RPC: failed to contact portmap
>>>>Lockd_up: no pid, 2 users??
>>>>
>>>>before it somehow came up:
>>>>
>>>>[root@xtreme101 root]# bpsh 0 mount
>>>>rootfs on / type rootfs (rw)
>>>>none on /proc type proc (rw,nodiratime)
>>>>none on /bpfs type bpfs (rw)
>>>>192.168.0.101:/home on /home type nfs (rw,v3,rsize=32768,wsize=32768,hard,udp,nolock,addr=192.168.0.101)
>>>>192.168.0.200:/public/home on /u type nfs (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200)
>>>>192.168.0.200:/scratch on /scratch1 type nfs (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200)
>>>>192.168.0.200:/public/code on /code type nfs (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200)
>>>>
>>>>Currently I work with only three nodes, and I believe it's not a PXE
>>>>issue. What is the meaning of the mmap and portmap errors issued by
>>>>node0? Is it normal for mount to take so long, or am I missing
>>>>something in the config?
>>>>
>>>>Thanks,
>>>>Ted

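Putting Michal's node_up comment from the quoted script into practice, a
wrapper that chains the stock node_up and the NFS setup might look like
the sketch below; the wrapper itself is hypothetical, while the two paths
follow the thread:

    #!/bin/sh
    # Run the stock node_up first; only if it succeeds, run the NFS setup.
    # Plain calls (not "exec") so control returns here between the steps.
    /usr/lib/beoboot/bin/node_up "$@" || exit 1
    /etc/clustermatic/nfs_node.conf "$@" || exit 1
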