From: <er...@he...> - 2002-12-13 22:25:58
On Thu, Dec 12, 2002 at 05:19:20PM -0500, Jack Neely wrote:
> > > * Second one: MPI.
> > > I can't run an MPI program on 2 or more nodes. I get the following messages:
> > >
> > > [root@nereapc examples]# mpirun -P -np 3 -d ./cpi
> > > listen: n-1 33276
> > > 0 0 31728 192.168.1.100 32773
> > > 1 1 31729 192.168.1.101 32773
> > > 2 2 31730 192.168.1.102 32773
> > > Process 0 on n0
> > > rank 2 pid=31730 exited with signal 2
> > > rank 1 pid=31729 exited with signal 2
> > > xm_31728: (0.007068) net_recv failed for fd = 6
> > > xm_31728: p4_error: net_recv read, errno = : 104
> > > rank 0 pid=31728 exited with signal 13
> > >
> > > Also, when I try to run 'cpi' without mpirun, I get:
> > >
> > > [root@nereapc examples]# ./cpi
> > > p0_31794: p4_error: init_p4_brdcst_info: my master indx bad: -1
> > > p4_error: latest msg from perror: No such file or directory
> >
> > Yeah, the p4 hack is horribly broken. It's somewhere on the list of
> > things to fix.
> >
> > - Erik
>
> Is there a status update on this? *Offers to test*

I spent some more time on this the other day on another system and it
actually worked for me, so I don't know what's going on. I intend to take
another look at it, but I'm 100% occupied with the cluster we're building
here right now. That one is all MPICH-GM....

- Erik