From: Alexander L. <Ale...@IG...> - 2005-01-18 10:00:00
Hello all,

I have just installed a small testbed for our cluster, consisting of only 2 computers. I installed Clustermatic 5 on top of Fedora Core 3. Booting and bpsh'ing work fine, but I have trouble getting MPI programs to run with LAM/MPI. I found some related postings in the archives, but no clues on how to solve the problem.

The issue affects both LAM/MPI 7.1.1-2 and the latest SVN snapshot (7.2b1r10023). I compiled both from scratch, using gcc/g77 and gcc/nagware.

I can lamboot without any problems using the bproc SSI boot module, and tping reports that it can find all computers (master and 1 node). I then try to start one of the examples contained in the LAM/MPI distribution, e.g. the pi one. As soon as I run "mpirun n0-1 PATH_TO_LAM/example/fpi", I get the following message on the node's console:

"bproc: WARNING: bproc/move.c: 1886: send_recv_process needs to be reworked to be consistent with the rest of the move code"

And on the master, mpirun reports:

"-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------"

Has anybody experienced similar problems, or does anyone have a tip on how to verify that my setup is basically OK? Any help would be much appreciated.

Thanks in advance.

Alex