|
From: Subhankar G. <sg...@ca...> - 2007-03-09 13:04:55
|
Dear All,

I'm trying to detect memory errors in a child process (c) in my MPICH2
application. This child process is generated by calling
MPI::Comm_spawn() from the parent (p). The command line I'm using:

$ mpiexec -l -n 1 valgrind --trace-children=yes --log-file=vgnd.log p <cmdline option to p>

But valgrind creates no log file for child process 'c'.

Am I missing anything? I'm using the latest valgrind release, 3.2.3.

Thanks,
Subhankar
|
From: Ashley P. <as...@qu...> - 2007-03-09 13:29:56
|
On Fri, 2007-03-09 at 18:45 +0530, Subhankar Ghosh wrote:
> [...]
> But no logfile is created for child process 'c' by valgrind.
>
> Am I missing anything?

Subhankar,

It's an interesting problem. What you have done at least looks right,
but there are a number of caveats, and the devil is in the detail with
MPI implementations. Comm_spawn() will probably launch mpiexec to start
the new processes, and as mpiexec is likely a suid program, valgrind
won't trace it :(

Are you using mpd (the MPICH2 default) as your job launcher or some
other program? If you are using mpd, is it installed as yourself or as
root? I know it's possible to do either. Also, can you run the program
normally and look at the process list while it's running: what is the
ppid of the child process (c), and can you trace it back to the parent
(p), or does it trace back to the mpd ring?

I suspect that getting this to work might require modifications to
MPICH2 itself, although they should be fairly minor. If I'm right about
the problem I expect I can knock up a patch in an hour or two, although
I'm very busy at the moment.

Ashley,
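Ashley's ppid question can also be answered from inside the child itself. The following is a minimal, Linux-only sketch, assuming /proc is mounted. The helper name process_name is hypothetical. It reads /proc/<pid>/stat, whose second field is the process's command name in parentheses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): copy the command name of process
 * 'pid' into 'out' by parsing /proc/<pid>/stat, whose second field is
 * "(comm)". Returns 0 on success, -1 if the process cannot be read. */
int process_name(pid_t pid, char *out, size_t outlen)
{
    char path[64], buf[512];
    char *open_paren, *close_paren;
    size_t n;
    FILE *f;

    assert(out != NULL && outlen > 0);
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The name may itself contain spaces or ')', so pair the first '('
     * with the last ')'. */
    open_paren = strchr(buf, '(');
    close_paren = strrchr(buf, ')');
    if (open_paren == NULL || close_paren == NULL || close_paren <= open_paren)
        return -1;

    n = (size_t)(close_paren - open_paren - 1);
    if (n >= outlen)
        n = outlen - 1;          /* truncate to fit the caller's buffer */
    memcpy(out, open_paren + 1, n);
    out[n] = '\0';
    return 0;
}
```

If c calls process_name(getppid(), name, sizeof name) and prints the result, you can see which process actually started the child. From the shell, `ps -o pid,ppid,comm` shows the same information.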
|
From: Julian S. <js...@ac...> - 2007-03-09 13:47:20
|
On Friday 09 March 2007 13:29, Ashley Pittman wrote:
> [...]
> > Am I missing anything?

I don't think I can add anything to Ashley's analysis, except to say
that 3.2.3 does work with MPICH2.

J
|
From: Subhankar G. <sg...@ca...> - 2007-03-12 14:27:50
|
Ashley,

Thanks for your reply.

> Are you using mpd (the mpich2 default) as your job launcher or some
> other program? If you are using mpd is it installed as yourself or
> root, I know it's possible to do either. Also, can you run the program
> normally and have a look at the process list when it's running, what
> is the ppid of the child process(c) and can you trace this back to
> the parent(p) or does it trace back to the mpd ring?

Yes, I'm using mpd as the job launcher. Probably for this reason, the
ppid of process 'c' is mpd, i.e. it traces back to the mpd ring rather
than to the parent process 'p'.

BTW, is there any other free tool I can use to detect memory errors in
child process 'c'?

Regards,
Subhankar
|
From: Ashley P. <as...@qu...> - 2007-03-12 21:01:27
|
On Mon, 2007-03-12 at 20:07 +0530, Subhankar Ghosh wrote:
> [...]
> Yes, I'm using mpd as the job launcher. Probably for this reason ppid of
> process 'c' is mpd, ie it's tracing back to the mpd ring rather than the
> parent process 'p'.

Looking at the MPICH2 source, Comm_spawn() opens a socket directly to
the mpd ring and sends it the command to spawn. That explains why
valgrind isn't following the child process.

I can think of three ways to fix this, and none of them is ideal.

(1) On Friday I was thinking of hooking into MPICH2 itself and making
it "valgrind aware". If it detected that it was running under valgrind,
it would query valgrind for the argv[0] and command line options needed
and automatically prepend them to the spawned command. In practice this
doesn't seem possible, because valgrind does not export this
information to the client.

(2) Modify the mpd source to take a --cmd-prefix=<cmd> option, so you
could launch your job as

$ mpiexec -l -n 1 --cmd-prefix="valgrind --trace-children=yes --log-file=vgnd.log" p <cmdline option to p>

and have both the job and the spawned command launched under valgrind.
This is probably the "best" option, although you would need to persuade
ANL of its benefit. That may not be hard if you send them a patch.
Alternatively, the MPI library itself could catch the command line
option and insert it automagically, as in (1) above.

(3) Modify your own code and insert the valgrind command yourself,
using #include <valgrind/valgrind.h> and the RUNNING_ON_VALGRIND macro.

Ashley,
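Note that --cmd-prefix does not exist in mpd. It is only a proposal. The core of such an option is an argument-rewriting step: split the prefix string into words and place them before the real command's argv. The sketch below shows that step and makes two assumptions. It splits on whitespace only, with no shell quoting, and the names build_prefixed_argv and free_argv are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated argv built by build_prefixed_argv(). */
void free_argv(char **argv)
{
    size_t i;

    if (argv == NULL)
        return;
    for (i = 0; argv[i] != NULL; i++)
        free(argv[i]);
    free(argv);
}

/* Hypothetical helper for the proposed --cmd-prefix option: split
 * 'prefix' on spaces and tabs (no shell quoting) and return the
 * NULL-terminated argv  prefix-words..., cmd_argv[0], cmd_argv[1], ...
 * Returns NULL on allocation failure. */
char **build_prefixed_argv(const char *prefix, char *const cmd_argv[])
{
    char *copy, *tok, *save = NULL;
    char **out;
    size_t ncmd = 0, i = 0, k;

    assert(prefix != NULL && cmd_argv != NULL);
    while (cmd_argv[ncmd] != NULL)
        ncmd++;

    /* strlen(prefix) is a safe upper bound on the number of words;
     * calloc zero-fills, so 'out' is always NULL-terminated. */
    out = calloc(strlen(prefix) + ncmd + 1, sizeof *out);
    copy = strdup(prefix);
    if (out == NULL || copy == NULL)
        goto fail;

    for (tok = strtok_r(copy, " \t", &save); tok != NULL;
         tok = strtok_r(NULL, " \t", &save))
        if ((out[i++] = strdup(tok)) == NULL)
            goto fail;
    for (k = 0; k < ncmd; k++)
        if ((out[i++] = strdup(cmd_argv[k])) == NULL)
            goto fail;

    free(copy);
    return out;

fail:
    free_argv(out);   /* frees any strings copied so far */
    free(copy);
    return NULL;
}
```

A launcher would then pass the result to execvp(argv[0], argv). Because the split uses whitespace only, an option value that contains spaces (for example, a log path) would be broken into separate words. A real patch would need to handle that case.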
|
From: Subhankar G. <sg...@ca...> - 2007-03-14 12:11:43
|
Ashley,

In option #3, do you mean that the parent process code can be modified
so that it explicitly passes the valgrind command to MPI::Comm_spawn()?
I have not used the RUNNING_ON_VALGRIND macro before.

- Subhankar
|
From: Ashley P. <as...@qu...> - 2007-03-14 13:08:08
|
Subhankar,
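At least in theory it's fairly easy. The downside is that you may end up
embedding the valgrind command line in your application. Assuming your
child command is "c" and takes no arguments, you'd need something like
this:
#include <mpi.h>
#include <valgrind/valgrind.h>
/* Spawn "c", wrapped in valgrind if we are running under valgrind.
 * Root 0 and MPI_INFO_NULL are example values. */
void my_spawn(MPI_Comm comm, int size, MPI_Comm *intercomm) {
    if (RUNNING_ON_VALGRIND) {   /* a macro, not a function call */
        char *args[] = { "-q", "--log-file=vgnd.log", "c", NULL };
        MPI_Comm_spawn("valgrind", args, size, MPI_INFO_NULL, 0, comm,
                       intercomm, MPI_ERRCODES_IGNORE);
    } else {
        MPI_Comm_spawn("c", MPI_ARGV_NULL, size, MPI_INFO_NULL, 0, comm,
                       intercomm, MPI_ERRCODES_IGNORE);
    }
}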
As I said, option #1 from my earlier mail would be the best. However,
it would require a way for the client to query the host (valgrind) for
the correct set of command line parameters to use.
Ashley,
|