|
From: Joshua R. T. <jt...@an...> - 2010-04-28 01:57:32
|
Hi all,
I am trying to use Valgrind to debug an MPI application, but things don't seem to work. I understand that Valgrind explicitly implements wrapper functions for MPI calls and uses the PMPI interface. To the best of my understanding, when called in the context of MPI, valgrind should somehow check MPI calls, avoid giving "garbage" output from the underlying mpi libraries, and suppress the printing of a separate output for each node. It currently does none of those things. Can someone help?? Feel free to redistribute this message as needed.
Sincerely,
~Josh
Some things that may help:
--Configure script says:
$ cat ~/valgrind/config_out | grep MPI
checking primary target for usable MPI2-compliant C compiler and
mpi.h... yes, mpicc
checking secondary target for usable MPI2-compliant C compiler and
mpi.h... yes, mpicc
--Invocation on compute nodes under mvapich2:
export \
LD_PRELOAD=/home/jtepper/valgrind/lib/valgrind/libmpiwrap-amd64-linux.so
${MPI_BIN}/mpdboot -f mpd.hosts -n $NUMHOST -r ssh
${MPI_BIN}/mpiexec -genv I_MPI_DEVICE rdssm:OpenIB-cma \
-machinefile $PBS_NODEFILE -n $NUMPROC valgrind $EXE $PARAM
${MPI_BIN}/mpdallexit
--mvapich2 relies on icc on our system.
--As reflected by the "-genv I_MPI_..." and the fact that we use MVAPICH2, our nodes are connected with InfiniBand.
--These commands are executed in a directory on a filesystem that is shared by all nodes.
|
|
From: Julian S. <js...@ac...> - 2010-04-28 06:21:00
|
On Wednesday 28 April 2010, Joshua R. Tepper wrote:
> Hi all,
> I am trying to use Valgrind to debug an MPI application, but things don't
> seem to work. I understand that Valgrind explicitly implements wrapper
> functions for MPI calls and uses the PMPI interface. To the best of my
> understanding, when called in the context of MPI, valgrind should somehow
> check MPI calls,

yes

> avoid giving "garbage" output from the underlying mpi
> libraries,

no (but see below)

> and suppress the printing of a separate output for each node.

no (how did you infer that?)

Re the garbage, debugging MPI apps is problematic because the NIC
I/O and control buffers are mapped directly into memory, and memcheck
doesn't have a way to detect state changes in them.

One option is to use the --ignore-ranges= flag, if you can figure out
what the relevant NIC addresses are. Another is to tell your MPI stack
to not map cards into memory, but just to use TCP/IP via normal
syscalls to communicate. That is (or at least, used to be) possible
with OpenMPI with the mpirun args --mca btl tcp,self, for example.

If you are using OpenMPI you might want to ask the OpenMPI devs
for advice. They are pretty Memcheck-aware, afaik.

J |
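[Editorial note: the two workarounds above might look like the following. This is a hedged sketch, not a tested recipe; the application name, rank count, and especially the address range are made-up placeholders you would substitute for your own setup.]

```shell
# Sketch 1: restrict OpenMPI to the TCP and self transports, so the
# NIC's buffers are never mapped directly into the process and memcheck
# sees all communication go through ordinary syscalls.
mpirun --mca btl tcp,self -n 4 valgrind --tool=memcheck ./my_mpi_app

# Sketch 2: tell memcheck to ignore an address range belonging to
# NIC-mapped memory. The range below is a placeholder; the real one
# would have to be dug out of /proc/<pid>/maps or the driver docs.
mpirun -n 4 \
    valgrind --ignore-ranges=0x7f0000000000-0x7f0000100000 ./my_mpi_app
```

Sketch 1 trades InfiniBand performance for clean memcheck output, which is usually an acceptable trade while debugging.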
|
From: Dave G. <go...@mc...> - 2010-04-28 14:00:45
|
On Apr 28, 2010, at 1:36 AM, Julian Seward wrote:
> On Wednesday 28 April 2010, Joshua R. Tepper wrote:
>> Hi all,
>> I am trying to use Valgrind to debug an MPI application, but things
>> don't seem to work. I understand that Valgrind explicitly implements
>> wrapper functions for MPI calls and uses the PMPI interface. To the
>> best of my understanding, when called in the context of MPI, valgrind
>> should somehow check MPI calls,
> yes
>> avoid giving "garbage" output from the underlying mpi libraries,
> no (but see below)
>> and suppress the printing of a separate output for each node.
> no (how did you infer that?)

I'm not entirely sure what you are expecting in terms of output, but
you might try Ashley Pittman's vg_xmlmerge.pl script. I've never used
it, but I believe that it merges valgrind output and removes
duplications for a parallel job.

http://www.mail-archive.com/val...@li.../msg01162.html

> Re the garbage, debugging MPI apps is problematic because the NIC
> I/O and control buffers are mapped directly into memory, and memcheck
> doesn't have a way to detect state changes in them.
>
> One option is to use the --ignore-ranges= flag, if you can figure out
> what the relevant NIC addresses are. Another is to tell your MPI stack
> to not map cards into memory, but just to use TCP/IP via normal
> syscalls to communicate. That is (or at least, used to be) possible
> with OpenMPI with the mpirun args --mca btl tcp,self, for example.
>
> If you are using OpenMPI you might want to ask the OpenMPI devs
> for advice. They are pretty Memcheck-aware, afaik.

You should be able to build a TCP version of MVAPICH2 by passing
"--with-device=ch3:sock" to configure. While you are doing that, you
should probably also include "--enable-g=dbg,meminit" to avoid some
messages about passing uninitialized buffers to certain syscalls.

You may also want to post your message to mva...@cs... to see if the
OSU folks have any specific suggestions when using IB. At the very
least it might be a gentle reminder for them to make MVAPICH2 play
nicely with Valgrind in the future (if it doesn't right now).

-Dave |
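[Editorial note: a build along the lines Dave suggests might look like the following sketch. The source directory and install prefix are placeholders; only the two configure flags come from the message above.]

```shell
# Build a TCP-only (ch3:sock) MVAPICH2 for use under valgrind.
# --enable-g=dbg,meminit adds debug info and initializes internal
# buffers, cutting down on spurious "uninitialised value" reports.
cd mvapich2-src
./configure --prefix=$HOME/mvapich2-tcp \
    --with-device=ch3:sock \
    --enable-g=dbg,meminit
make && make install
```

Keeping this debug build under a separate prefix lets you switch back to the InfiniBand build for production runs.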
|
From: Joshua R. T. <jt...@an...> - 2010-04-29 22:14:15
|
David, Julian,

Thank you both for the insight. This helps.

Julian: I had inferred that there would be some mechanism by which each
node didn't produce its own output from a conversation with a friend,
not from any official documentation. Since, in an MPI environment,
there are no guarantees about how nodes are connected, it seemed that
the only way to accomplish this would be for valgrind to issue calls to
the MPI libraries itself, so I had thought the claim was a bit odd.
Finding the addresses of the network buffers or forcing MPI to use
"traditional" syscalls for TCP/IP are both options we will look into in
the future. For now, however, the most important thing for me is to
find a way to organize the massive amount of output from many nodes
(which currently exceeds 60k lines).

David: Thanks for the script (and to Ms. Pittman as well). This looks
very helpful. The only possible issue, I think, will be guaranteeing
that each process has a different pid, which I don't imagine holds for
a job spread over many nodes. Still, I think such collisions will prove
uncommon.

Sincerely,
~Josh

Dave Goodell wrote:
> On Apr 28, 2010, at 1:36 AM, Julian Seward wrote:
>> On Wednesday 28 April 2010, Joshua R. Tepper wrote:
>>> Hi all,
>>> I am trying to use Valgrind to debug an MPI application, but things
>>> don't seem to work. I understand that Valgrind explicitly implements
>>> wrapper functions for MPI calls and uses the PMPI interface. To the
>>> best of my understanding, when called in the context of MPI, valgrind
>>> should somehow check MPI calls,
>> yes
>>> avoid giving "garbage" output from the underlying mpi libraries,
>> no (but see below)
>>> and suppress the printing of a separate output for each node.
>> no (how did you infer that?)
>
> I'm not entirely sure what you are expecting in terms of output, but
> you might try Ashley Pittman's vg_xmlmerge.pl script. I've never used
> it, but I believe that it merges valgrind output and removes
> duplications for a parallel job.
>
> http://www.mail-archive.com/val...@li.../msg01162.html
>
>> Re the garbage, debugging MPI apps is problematic because the NIC
>> I/O and control buffers are mapped directly into memory, and memcheck
>> doesn't have a way to detect state changes in them.
>>
>> One option is to use the --ignore-ranges= flag, if you can figure out
>> what the relevant NIC addresses are. Another is to tell your MPI stack
>> to not map cards into memory, but just to use TCP/IP via normal
>> syscalls to communicate. That is (or at least, used to be) possible
>> with OpenMPI with the mpirun args --mca btl tcp,self, for example.
>>
>> If you are using OpenMPI you might want to ask the OpenMPI devs
>> for advice. They are pretty Memcheck-aware, afaik.
>
> You should be able to build a TCP version of MVAPICH2 by passing
> "--with-device=ch3:sock" to configure. While you are doing that, you
> should probably also include "--enable-g=dbg,meminit" to avoid some
> messages about passing uninitialized buffers to certain syscalls.
>
> You may also want to post your message to mva...@cs... to see if the
> OSU folks have any specific suggestions when using IB. At the very
> least it might be a gentle reminder for them to make MVAPICH2 play
> nicely with Valgrind in the future (if it doesn't right now).
>
> -Dave |
|
From: Dave G. <go...@mc...> - 2010-04-29 22:47:46
|
On Apr 29, 2010, at 5:10 PM, Joshua R. Tepper wrote:
> David:
> Thanks for the script (and to Ms. Pittman as well).
(FYI, that's Mr. Pittman; Ashley is male)
> This looks very
> helpful. The only possible issue, I think, will be to guarantee that
> each process has a different pid, which I don't imagine is guaranteed
> for a job spread over many nodes. I think such collisions will prove
> uncommon.
Since you know that you are running on a particular MPI implementation
(MVAPICH2) with a particular process manager (mpd) you can play a
trick to get your valgrind output into separate log files:
mpiexec -genv [...] -n $NUMPROC valgrind --xml=yes \
    --xml-file='vg_log.%q{PMI_RANK}.xml' $EXE $PARAM
which will create "vg_log.0.xml", "vg_log.1.xml", ...,
"vg_log.$((NUMPROC-1)).xml". Then merge them with:
./vg_xmlmerge.pl vg_log.*.xml > merged_vg_log
-Dave
|