From: Brian B. <brb...@la...> - 2005-01-16 05:33:00
On Jan 14, 2005, at 7:18 PM, Dale Harris wrote:

> I'm having a problem successfully running lamboot, from lam 7.1.1, on a
> bproc system running version 4.0p8. What I see from lamboot is:
>
> lamboot hosts
>
> LAM 7.1.1/MPI 2 C++/bproc - Indiana University
>
> lamd kernel: problem with socket(): Address family not supported by protocol
> ...

This is coming from a call to create a Unix domain socket:

    if ((sd_kernel = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
        lampanic("lamd kernel: problem with socket()");

I'm not really sure how that could be failing with the given error message. I'm guessing that it's a symptom of the real problem. I know that's not really helpful, but there really isn't any reason that call to socket() should fail.

> I was able to do a little strace of this, and see errors like:
>
> getxattr("/bpfs/-1", "bproc.addr", 0xbfffeff4, 16) = 16
> socket(PF_FILE, SOCK_STREAM, 0) = 3
> connect(3, {sa_family=AF_FILE, path="/var/run/.nscd_socket"}, 110) = -1 ENOENT (No such file or directory)
> close(3) = 0
>
> But that doesn't make much sense to me, looks like it trying to resolve
> a name, perhaps.

I assume this is a symptom, not the cause. Can you tell when that error message occurs? Perhaps there is something wrong with the BProc cluster that is causing your errors. Do other applications run properly on the compute nodes? Also, what happens if you try to boot with no hostfile (so it just tries to start on the BProc head node)?

Brian

--
Brian Barrett
LAM/MPI developer and all around nice guy
Have an LAM/MPI day: http://www.lam-mpi.org/