From: Daniel G. <dg...@ti...> - 2004-06-13 23:43:29
Brian,

Success!!! (well, at least apparently). The thing lamboots fine, starts the lamd on all the nodes, and seems to run mpi jobs, so until further testing it looks like we are in business!

I agree with you about BProc's lack of documentation, but I still like the system. It is the only cluster software that makes the cluster seem like a single system image. I have been running different versions of it for over 2 years, and I insist on it for all my clusters. It may be time to hack on the BProc guys for more documentation (Erik...).

Anyway, thanks for your work on LAM, and let me know if I can be of more help.

Regards,

Daniel

On Sun, Jun 13, 2004 at 04:01:11PM -0700, Brian Barrett wrote:
> On Jun 13, 2004, at 3:32 PM, Daniel Gruner wrote:
>
> > Ok, here it goes again. The master node is still somehow screwed up,
> > according to lamboot... Here is the output:
> >
> > <snip>
> >
> > n-1<8638> ssi:boot:bproc: n-1 nodestatus failed (-1)
> > n-1<8638> ssi:boot:bproc: n-1 node status down, failure
> > n-1<8638> ssi:boot:bproc: n0 node status: up
>
> Well, on the good side, we detect the master node correctly now :).
>
> You know, life would be so much better if BProc documented things like
> that. I added some code to make sure that NODE_MASTER is never used
> for the parameter to bproc_nodestatus or the like. Hopefully, that
> will make life better all around. If you could svn up and let me know
> how it goes, I'd appreciate it. Only file changed is
> share/ssi/boot/bproc/src/ssi_boot_bproc.c, so you should be able to svn
> up and run make (without the autogen or configure stuff).
>
> Thanks!
>
> Brian
>
> --
> Brian Barrett
> LAM/MPI developer and all around nice guy
> Have a LAM/MPI day: http://www.lam-mpi.org/

--
Dr. Daniel Gruner                  dg...@ti...
Dept. of Chemistry                 dan...@ut...
University of Toronto              phone: (416)-978-8689
80 St. George Street               fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada        finger for PGP public key