From: Brian B. <brb...@la...> - 2004-07-21 15:58:50
On Jul 20, 2004, at 9:19 AM, Daniel Gruner wrote:

> On Tue, Jul 20, 2004 at 09:03:01AM -0500, Brian Barrett wrote:
>> On Jul 20, 2004, at 8:08 AM, Thomas Eckert wrote:
>>
>>> this thread seems to have slipped off the bproc-list -- most likely I
>>> replied to the wrong message -- so here is a forward of my reply :(
>>>
>>> I'm interested in the bproc3<->lam-7.0.x results: have you tried
>>> bproc3 with the latest stable lam (7.0.x) and it did not work, or are
>>> you focusing on bproc4 now anyway for other reasons (want to use
>>> 2.6 kernels, ...)?
>>
>> Luke was using BProc 4, which LAM 7.0.x does not support (LAM 7.1,
>> which just went into beta, supports what is currently in the BProc 4
>> API. Hopefully, that means it will support BProc 4 when it goes
>> stable).
>>
>> If you have any problems using LAM 7.0.x with BProc 3, please let us
>> (the LAM developers) know. There has been some fairly extensive
>> testing, so I would be surprised if there were problems in that area.
>
> I have recently installed LAM 7.0.2 on a CM3 (BProc 3) cluster. It
> mostly works, but there are a few disturbing glitches:
>
> - I cannot seem to run 2 MPI jobs as the same user simultaneously (on
> different sets of nodes, of course), since when I do the second
> invocation of mpiexec (or its equivalent lamboot/run/lamhalt) it kills
> the first lamd on the master node. It does seem to work for different
> users, though.

This is expected behavior. The design of LAM is that you start the RTE
(the daemons) on the nodes you will use for all MPI applications, then
run your application (or applications) inside that universe. If you
need two separate universes, you can use the LAM_MPI_SESSION_SUFFIX
environment variable to keep the daemons from clobbering each other
(a minimal sketch is at the end of this message). See the lamboot(1)
man page and the LAM/MPI User Document (available in PDF form on the
web page) for more information.

> - Just doing lamboot followed by lamhalt (whether or not some MPI job
> is run) produces a core dump (I guess it is from lamhalt). Always.
> Running mpiexec does it too.

Yeah, that's disturbing. Is there a core file left around? If so, can
you run gdb on the core file and send me a stack trace? (A sketch of
the gdb invocation is also at the end of this message.) Thanks!

Brian
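
P.S. Here is a minimal sketch of running two universes side by side as
the same user with LAM_MPI_SESSION_SUFFIX. The hostfile names, process
count, and application name are just placeholders for your setup:

    # Shell 1: first universe, booted from one set of nodes
    export LAM_MPI_SESSION_SUFFIX=job1
    lamboot hostfile-a
    mpirun -np 4 ./my_app
    lamhalt

    # Shell 2: second universe, same user, different nodes
    export LAM_MPI_SESSION_SUFFIX=job2
    lamboot hostfile-b
    mpirun -np 4 ./my_app
    lamhalt

The suffix keeps each set of lamds (and their session directories)
separate, so the second lamboot no longer kills the first lamd.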
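
P.P.S. For the core dump, something along these lines should produce
the backtrace I'm after (the install path is just a guess for wherever
LAM lives on your system, and I'm assuming the core came from lamhalt):

    gdb /usr/local/lam/bin/lamhalt core
    (gdb) bt        # prints the stack trace
    (gdb) quit

If the core turns out to be from a different binary, substitute that
binary for lamhalt in the gdb command line.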