From: Rick K. <rk...@nc...> - 2004-08-24 00:34:17
More info on psrun and MPICH/ch_p4... has anyone had success with this?

Rick

---------- Forwarded message ----------
Date: Mon, 23 Aug 2004 19:31:01 -0500 (CDT)
From: Rick Kufrin <rk...@os...>
To: "Venkatesh, Vangal" <van...@in...>
Subject: RE: Psrun with argonne mpich

Vangal,

Yes, I see that I can reproduce the same behavior here, using a ch_p4
version of MPICH. I'm guessing it's due to the way that the commands are
constructed and eventually launched on the remote nodes, but unfortunately
I don't know an easy way to invoke mpirun or mpirun.ch_p4 to get around it.
We typically use a different MPICH (MPICH-GM) here, so I was probably
confusing it with that.

One thing you can do is to change your source code to include a call to
the library that psrun uses... not sure if you're willing to do that. If
you are, then what you'd have to do is include a call to the following
routine, probably right after MPI_Init is best:

   #include <pshwpc.h>
   int ps_hwpc_psrun(void);                 (C)

or

   include 'fperfsuite.h'
   subroutine psf_hwpc_psrun(ierr)          (Fortran)

Successful returns from either of these two routines are PS_SUCCESS (or 0).

That should be the only change to your source, but you would also have to
arrange for the -I option at compile time and the -L option at link time.
The libraries you'd want to include are -lpshwpc and possibly -lperfsuite
and -lpapi, depending on your system/environment.

With the library linked in, you can forget about psrun and just run the
executable as normal. All the environment variables that psrun recognizes
should work as usual.

If you try this and still have problems, let me know (if it works for you,
that would be good to know too!). In the meantime, I'll see if I can figure
out an alternate way to use psrun rather than having to go through changing
source code, and I'll let you know if something comes up.

Rick

On Mon, 23 Aug 2004, Venkatesh, Vangal wrote:

> Rick,
> I am getting the following error message:
> [vvenkat1@spd206-6 run_dir]$ ../../mpich-ia32e/mpich-1.2.5.2/bin/mpirun
> -np 2 /usr/local/bin/psrun /home/spd/vvenkat1/pop1.4/run_dir/pop2
> spd206-6: No such file or directory
> p0_10443: p4_error: Child process exited while making connection to
> remote process on spd206-6: 0
>
> Somehow it thinks spd206-6 is a file (it is a machine in my
> machinefile). I also tried putting in the -machinefile option.
>
> Vangal
>
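
For anyone trying the source-level workaround described in the forwarded message, here is a minimal sketch of what the C change might look like. The header pshwpc.h, the routine ps_hwpc_psrun() and the PS_SUCCESS return value come from the message above; the helper name init_mpi_with_psrun, the HAVE_MPI_PERFSUITE macro and the stand-in definitions are assumptions made only so the sketch also compiles on a machine without MPICH or PerfSuite.

```c
/* Sketch only. Build with -DHAVE_MPI_PERFSUITE on a system that has MPICH
 * and PerfSuite installed. Without that (hypothetical) macro, the stand-ins
 * below are used; they are NOT part of either library and measure nothing. */
#include <stdio.h>

#ifdef HAVE_MPI_PERFSUITE
#include <mpi.h>
#include <pshwpc.h>
#else
#define MPI_SUCCESS 0
#define PS_SUCCESS 0
static int MPI_Init(int *argc, char ***argv)
{
    (void)argc;
    (void)argv;
    return MPI_SUCCESS;
}
static int ps_hwpc_psrun(void)
{
    return PS_SUCCESS;
}
#endif

/* Hypothetical helper: initialize MPI, then make the call that psrun would
 * otherwise make on the program's behalf.
 * Returns 0 on success, 1 if MPI_Init failed, 2 if ps_hwpc_psrun failed. */
int init_mpi_with_psrun(int *argc, char ***argv)
{
    if (MPI_Init(argc, argv) != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Init failed\n");
        return 1;
    }
    /* Placed right after MPI_Init, as suggested in the message. */
    if (ps_hwpc_psrun() != PS_SUCCESS) {
        fprintf(stderr, "ps_hwpc_psrun failed\n");
        return 2;
    }
    return 0;
}
```

In a real program this helper would be called at the top of main() in place of the bare MPI_Init call, with MPI_Finalize left where it already is. The build would then need the -I and -L options pointing at the local PerfSuite installation plus -lpshwpc (and possibly -lperfsuite and -lpapi), and the executable would be launched with mpirun directly, without psrun on the command line.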