From: Ashley P. <as...@qu...> - 2007-03-12 21:01:27
On Mon, 2007-03-12 at 20:07 +0530, Subhankar Ghosh wrote:
> Ashley,
>
> Thanks for your reply.
>
> > Are you using mpd (the mpich2 default) as your job launcher or some
> > other program? If you are using mpd is it installed as yourself or
> > root, I know it's possible to do either. Also, can you run the program
> > normally and have a look at the process list when it's running, what
> > is the ppid of the child process(c) and can you trace this back to
> > the parent(p) or does it trace back to the mpd ring?
>
> Yes, I'm using mpd as the job launcher. Probably for this reason ppid of
> process 'c' is mpd, ie it's tracing back to the mpd ring rather than the
> parent process 'p'.

Looking at the source for MPICH2, MPI_Comm_spawn() opens a socket directly to the mpd ring and sends it the command to spawn, which explains why valgrind isn't following the child process: the child is started by mpd, not by the parent that valgrind is tracing.

There are three ways I can think of to fix this, none of which is ideal.

The first, which is what I was thinking of on Friday, is hooking into MPICH2 itself and making it "valgrind aware", so that if it detected it was running under valgrind it would query valgrind for the argv[0] and command-line options needed and automatically prepend them to the spawned command. This doesn't seem possible in reality, as valgrind does not export this information to the client.

The second option would be to modify the mpd source to take a --cmd-prefix=<cmd> option, so you could launch your job as

  $ mpiexec -l -n 1 --cmd-prefix="valgrind --trace-children=yes --log-file=vgnd.log" p <cmdline option to p>

to have both the job and the spawned command launched under valgrind. This is probably the "best" option, although you would need to persuade ANL of the benefit of it; possibly not that hard if you send them a patch, however. An alternative to this would be for the MPI library itself to catch the command-line option and insert it automagically, as in the first option above.

Thirdly, you could modify your own code and insert the valgrind command yourself, using #include <valgrind.h> and the RUNNING_ON_VALGRIND macro to detect whether the parent is itself running under valgrind.

Ashley,
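
A rough sketch of the third option, in case it helps. This is untested against MPICH2 and makes several assumptions: build_spawn_argv() and spawn_child() are hypothetical helper names, the valgrind options are simply copied from the mpiexec example, the log file name is a placeholder, and "valgrind" is assumed to be on the PATH of whichever node mpd starts the child on. Note that the argv passed to MPI_Comm_spawn() does not include the command name, so the real child program becomes an ordinary argument to valgrind.

```c
#include <stdio.h>
#include <stdlib.h>

/* Options given to the child's valgrind. Copied from the mpiexec example;
 * the log file name is a placeholder, kept distinct from the parent's log. */
static const char *VG_OPTS[] = { "--trace-children=yes",
                                 "--log-file=vgnd-child.log" };
#define N_VG_OPTS (sizeof VG_OPTS / sizeof VG_OPTS[0])

/* Hypothetical helper: build the command and argv for MPI_Comm_spawn().
 * If under_valgrind is non-zero the result is
 *     valgrind <VG_OPTS> <child> <child_args...>
 * otherwise just
 *     <child> <child_args...>
 * child_args may be NULL. The returned array is malloc'd and NULL-terminated;
 * the caller frees the array only (the strings are borrowed). */
char **build_spawn_argv(int under_valgrind, const char *child,
                        char **child_args, const char **command)
{
    size_t n_child = 0, n = 0, i;
    char **argv;

    while (child_args && child_args[n_child])
        n_child++;

    /* Worst case: valgrind options + child name + child args + NULL. */
    argv = malloc((N_VG_OPTS + 1 + n_child + 1) * sizeof *argv);
    if (!argv)
        return NULL;

    if (under_valgrind) {
        *command = "valgrind";
        for (i = 0; i < N_VG_OPTS; i++)
            argv[n++] = (char *)VG_OPTS[i];
        argv[n++] = (char *)child;   /* the real program is now an argument */
    } else {
        *command = child;
    }
    for (i = 0; i < n_child; i++)
        argv[n++] = child_args[i];
    argv[n] = NULL;
    return argv;
}

#ifdef WITH_MPI   /* build with your MPI compiler wrapper and -DWITH_MPI */
#include <mpi.h>
#include <valgrind/valgrind.h>   /* or <valgrind.h>, depending on include path */

/* Hypothetical wrapper: spawn one copy of child, under valgrind if the
 * calling process is itself running under valgrind. */
int spawn_child(const char *child, char **child_args, MPI_Comm *intercomm)
{
    const char *command;
    char **argv = build_spawn_argv(RUNNING_ON_VALGRIND, child, child_args,
                                   &command);
    int rc;

    if (!argv)
        return MPI_ERR_NO_MEM;
    rc = MPI_Comm_spawn((char *)command, argv, 1, MPI_INFO_NULL, 0,
                        MPI_COMM_SELF, intercomm, MPI_ERRCODES_IGNORE);
    free(argv);
    return rc;
}
#endif
```

The argv-building part is kept separate from the MPI call so it can be checked without an MPI installation. The obvious downside is that the valgrind options are hard-coded in the application rather than inherited from the parent's command line.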