From: Daniel G. <dg...@ti...> - 2004-06-10 00:08:46
On Wed, Jun 09, 2004 at 03:23:32PM -0700, Brian Barrett wrote:
> On Jun 9, 2004, at 3:15 PM, Daniel Gruner wrote:
>
> > I managed to build the latest version, including patches, on my bproc 4
> > cluster. It is made of Alpha machines, and I configured lam to use the
> > "fort" compiler (the Compaq fortran compiler). It build without any
> > apparent issues, and installed fine. Now, when I try to lamboot, it
> > dumps core!
>
> Yeah, I reused a variable when I shouldn't have (stupid Brian). Thanks
> to Kevin Russell for pointing this out to me. I've attached v2 of the
> patch and a diff from v1 to v2, which should clean up the segfault.
> This was on a BProc-4 cluster, correct?
>

Yes, it is a BProc-4 cluster.

I rebuilt lam with your latest patch, and while it doesn't dump core
anymore, it still doesn't lamboot. The error message is:

racaille:dgruner{124}> lamboot bhost

LAM 7.1a1svn/MPI 2 C++/ROMIO/bproc - Indiana University

-----------------------------------------------------------------------------
LAM has detected that one or more nodes specified in the boot schema
are not available, and has aborted. Bproc is reporting that the
following nodes are not up:

        192.168.101.1
        192.168.101.100
        192.168.101.101
        192.168.101.102
        192.168.101.103
        192.168.101.104
        192.168.101.105
-----------------------------------------------------------------------------

I have tried with node names in the bhost file, as well as with numeric
ip addresses, and got the same message. The nodes are in fact up, and
the cluster works properly.

???

Daniel

--
Dr. Daniel Gruner                  dg...@ti...
Dept. of Chemistry                 dan...@ut...
University of Toronto              phone: (416)-978-8689
80 St. George Street               fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada        finger for PGP public key