From: <er...@he...> - 2003-04-16 17:27:46
On Mon, Apr 07, 2003 at 04:51:04PM -0700, rwillis wrote:
> Hi,
>
> I made a simple test program for MPI to test it on my cluster. The program
> simply sends a message from every non-zero node to node 0, and node 0 then
> prints it. I compile it (using mpicc) and invoke it by typing
>
> <path>/mpirun -d -G -p 2 ./hello
>
> The program does not run, and I get this back:
>
> rank 1 pid=6681 exited with signal 13
> [0] Error: inconsistency in collected data!
> rank 0 pid=6680 exited with signal 13
>
> I get the same thing when trying to run Netpipe (make mpi). I don't know
> where the inconsistency error is coming from; I have not found it in any
> source.
>
> Any ideas?
> Has anyone seen this before?
> Is exiting with signal 13 bad or good?

It's definitely bad. Most likely you're having a problem with the GM IDs,
etc., in the nodeinfo file: the GM IDs stored there probably don't match
what's on the nodes. The bit of code you're having trouble with is:

    if ((gmpi.port_ids[MPID_MyWorldRank] != port_id) ||
        (gmpi.board_ids[MPID_MyWorldRank] != board_id) ||
        (gmpi.node_ids[MPID_MyWorldRank] != gmpi.my_node_id)) {
      fprintf (stderr, "[%d] Error: inconsistency in collected data !\n",
               MPID_MyWorldRank);
      gmpi_abort (0);
    }

> BTW, I put some debugging into mpirun to look at the parameters going into
> bproc_vexecmove_io(), and a NULL is being passed into the function for the
> program name when I use 'hello' instead of './hello' as an argument to
> mpirun. I don't know if this is a bug or not, but I thought I would report
> it anyway.

Sort of. It's on the list of things to fix.

- Erik
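For illustration, the consistency check quoted in the reply can be pulled out into a standalone function. This is a minimal sketch, not the GM MPICH source: struct rank_info and check_collected_data() are hypothetical names standing in for the gmpi.port_ids, gmpi.board_ids and gmpi.node_ids arrays, and instead of calling gmpi_abort(0) the sketch returns 0 and prints which field disagrees, which may help when comparing nodeinfo entries against what the nodes actually report.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one rank's entry in the collected table.
   The real code keeps these values in separate arrays:
   gmpi.port_ids, gmpi.board_ids and gmpi.node_ids. */
struct rank_info {
    unsigned int port_id;
    unsigned int board_id;
    unsigned int node_id;
};

/* Compare this rank's collected entry with the IDs the process actually
   observed locally. Returns 1 when all three agree, 0 otherwise.
   Unlike the real code (which calls gmpi_abort(0)), this sketch only
   reports each mismatching field and then the summary error line. */
static int check_collected_data(const struct rank_info *collected, int rank,
                                unsigned int port_id, unsigned int board_id,
                                unsigned int my_node_id)
{
    assert(collected != NULL && rank >= 0);
    const struct rank_info *entry = &collected[rank];
    int ok = 1;

    if (entry->port_id != port_id) {
        fprintf(stderr, "[%d] port id: collected %u, local %u\n",
                rank, entry->port_id, port_id);
        ok = 0;
    }
    if (entry->board_id != board_id) {
        fprintf(stderr, "[%d] board id: collected %u, local %u\n",
                rank, entry->board_id, board_id);
        ok = 0;
    }
    if (entry->node_id != my_node_id) {
        fprintf(stderr, "[%d] node id: collected %u, local %u\n",
                rank, entry->node_id, my_node_id);
        ok = 0;
    }

    if (!ok)
        fprintf(stderr, "[%d] Error: inconsistency in collected data !\n",
                rank);
    return ok;
}
```

Each rank checks only its own entry, just as the original indexes every array with MPID_MyWorldRank, so a single stale node ID in the table is enough to fail that rank.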