|
From: Martin K. <m.k...@gm...> - 2012-05-05 16:14:24
|
On 05.05.2012 16:38, Philippe Waroquiers wrote:
> On Sat, 2012-05-05 at 14:25 +0200, Martin Kalany wrote:
>> Hello,
>>
>> I'm trying to use valgrind to debug an mpich2 program. Unfortunately, I get the following error:
>>
>> libmpi.so.0: cannot open shared object file: No such file or directory
>>
>> I found out that libmpich.so.1.0 should be linked to instead (see libmpiwrap.c). Valgrind documentation states that "The MPI functions to be wrapped are assumed to be in an ELF shared object with soname matching libmpi.so*. This is known to be correct at least for Open MPI and Quadrics MPI, and can easily be changed if required."
>>
>> How do I change that?
> Is the 'cannot open' error only there when running under Valgrind ?
> The Z encoding used in libmpiwrap.c is a pattern which matches
> one or the other library:
> #define I_WRAP_FNNAME_U(_name) \
> I_WRAP_SONAME_FNNAME_ZU(libmpiZaZdsoZa,_name)
>
> i.e. it is libmpi*.so*.
>
> So, I guess your problem is not the Valgrind wrapping.
> Maybe a problem related to the dynamic loader ?
>
> Philippe
>
>Is the 'cannot open' error only there when running under Valgrind ?
Yes. When I use mpirun, it's fine.
What I find strange is that valgrind apparently tries to load
libmpi.so, although it should load libmpich.so.1.0.
>Maybe a problem related to the dynamic loader ?
I'm rather new to MPI so I'm not sure about this.
Martin
|
|
From: Philippe W. <phi...@sk...> - 2012-05-05 16:59:07
|
On Sat, 2012-05-05 at 18:14 +0200, Martin Kalany wrote:
> >Is the 'cannot open' error only there when running under Valgrind ?
> Yes. When I use mpirun, it's fine.
>
> What I find strange is that valgrind apparently tries to load
> libmpi.so, although it should load libmpich.so.1.0
>
> >Maybe a problem related to the dynamic loader ?
> I'm rather new to MPI so I'm not sure about this.
Valgrind is not supposed to change which shared libs are used:
the dynamic loader is executed by Valgrind and should behave
the same (and so load the same shared libs as a native run).
I know nothing about MPI and so might not understand
what you are doing.
But I believe mpirun is a shell script
which might be needed to set up some required env variables.
So, to be sure, mpirun should also be used when running under Valgrind,
e.g.
valgrind --trace-children=yes mpirun ....
(or is this what you are doing already ?)
Philippe
|
|
From: Martin K. <m.k...@gm...> - 2012-05-05 22:24:54
|
On 05.05.2012 18:59, Philippe Waroquiers wrote:
> On Sat, 2012-05-05 at 18:14 +0200, Martin Kalany wrote:
>>> Is the 'cannot open' error only there when running under Valgrind ?
>> Yes. When I use mpirun, it's fine.
>>
>> What I find strange is that valgrind apparently tries to load
>> libmpi.so, although it should load libmpich.so.1.0
>>
>>> Maybe a problem related to the dynamic loader ?
>> I'm rather new to MPI so I'm not sure about this.
> Valgrind is not supposed to change which shared libs are used:
> the dynamic loader is executed by Valgrind and should behave
> the same (and so load the same shared libs as a native run).
>
> I know nothing about MPI and so might not understand
> what you are doing.
> But I believe mpirun is a shell script
> which might be needed to set up some required env variables.
>
> So, to be sure, mpirun should be used also when using Valgrind
> e.g.
> valgrind --trace-children=yes mpirun ....
>
> (or is this what you are doing already ?)
>
>
> Philippe
I'm already doing that, but in a slightly different way:
LD_PRELOAD=~/valgrind/valgrind-3.7.0/mpi/libmpiwrap-x86-linux.so \
mpirun -np 2 valgrind ./foo
(This is suggested in the mpi section of the valgrind documentation).
This way, mpirun will launch two processes; each process starts valgrind
which will in turn execute the actual program.
I did exactly as the documentation says; I think the main issue is this:
>Valgrind documentation states that "The MPI functions to be wrapped are assumed to be in an ELF shared object with soname matching libmpi.so*. This is known to be correct at least for Open MPI and Quadrics MPI, and can easily be changed if required."
How and where do I change that?
Martin
|
|
From: Philippe W. <phi...@sk...> - 2012-05-05 22:43:51
|
On Sun, 2012-05-06 at 00:24 +0200, Martin Kalany wrote:
> I'm already doing that, but in a slightly different way:
> LD_PRELOAD=~/valgrind/valgrind-3.7.0/mpi/libmpiwrap-x86-linux.so \
> mpirun -np 2 valgrind ./foo
>
> (This is suggested in the mpi section of the valgrind documentation).
> This way, mpirun will launch two processes; each process starts valgrind
> which will in turn execute the actual program.
>
> I did exactly as the documentation does; I think the main issue is this:
>
> >Valgrind documentation states that "The MPI functions to be wrapped are assumed to be in an ELF shared object with soname matching libmpi.so*. This is known to be correct at least for Open MPI and Quadrics MPI, and can easily be changed if required."
> How and where do I change that?
From what I can see, to change this, you must edit the "Z encoding"
of the mpi wrappers.
I think this is the macro in the file libmpiwrap.c:
#define I_WRAP_FNNAME_U(_name) \
I_WRAP_SONAME_FNNAME_ZU(libmpiZaZdsoZa,_name)
However, the current Z encoding is the following pattern:
"libmpi*.so*"
so that it will wrap the mpi functions from all sonames matching
the above pattern.
In particular, it will wrap the functions in a soname:
libmpich.so.1.0
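To make the matching concrete, here is a small shell sketch (not part of Valgrind or libmpiwrap.c). It decodes only the two Z-escapes that occur in this macro, Za for '*' and Zd for '.', and then tests a soname against the resulting glob. The helper names zdecode and soname_matches are made up for illustration, and the real Z-encoding has more escapes than these two.

```shell
#!/bin/sh
# Decode the two Z-escapes that appear in libmpiZaZdsoZa:
#   Za -> *     Zd -> .
# Hypothetical helper: the full Z-encoding has further escapes
# that are deliberately not handled here.
zdecode() {
    printf '%s\n' "$1" | sed -e 's/Za/*/g' -e 's/Zd/./g'
}

# Succeed if soname $2 matches the glob decoded from Z-encoded pattern $1.
soname_matches() {
    pattern=$(zdecode "$1")
    case "$2" in
        $pattern) return 0 ;;   # unquoted on purpose: treated as a glob
        *)        return 1 ;;
    esac
}

zdecode libmpiZaZdsoZa
soname_matches libmpiZaZdsoZa libmpich.so.1.0 && echo "libmpich.so.1.0 matches"
soname_matches libmpiZaZdsoZa libmpi.so.0     && echo "libmpi.so.0 matches"
```

Both the MPICH2 and the Open MPI sonames match the decoded pattern, which is why no edit to the macro should be needed for libmpich.so.1.0.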
From what I can see, the problem is not in the wrapping but
rather that for one reason or another, the dynamic loader under
Valgrind does not find the relevant lib.
Maybe Valgrind args -v -v -v -d -d -d --trace-redir=yes could give a
hint ?
Sorry for not being able to help more
Philippe
|
|
From: Philippe W. <phi...@sk...> - 2012-05-05 22:54:27
|
On Sun, 2012-05-06 at 00:24 +0200, Martin Kalany wrote:
> >Valgrind documentation states that "The MPI functions to be wrapped are assumed to be in an ELF shared object with soname matching libmpi.so*. This is known to be correct at least for Open MPI and Quadrics MPI, and can easily be changed if required."
Note that the documentation is slightly out of date, as the code
contains the pattern libmpi*.so*
(so as to match, among others, libmpich.so.1.0).
Philippe
|
|
From: Martin K. <m.k...@gm...> - 2012-05-07 19:15:04
|
On 06.05.2012 00:54, Philippe Waroquiers wrote:
> On Sun, 2012-05-06 at 00:24 +0200, Martin Kalany wrote:
>>> Valgrind documentation states that "The MPI functions to be wrapped are assumed to be in an ELF shared object with soname matching libmpi.so*. This is known to be correct at least for Open MPI and Quadrics MPI, and can easily be changed if required."
> Note that the documentation is slightly out of date, as the code
> contains the pattern libmpi*.so*
> (so as to match, among others, libmpich.so.1.0).
>
> Philippe
Thanks a lot for your help, Philippe! You finally led me in the right
direction:
I found a workaround for the problem: I simply renamed libmpich.so to
libmpi.so and the error was gone.
Nevertheless, valgrind doesn't print anything similar to
"valgrind MPI wrappers 31901: Active for pid 31901
valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options"
as stated in the documentation. How do I know whether or not the mpi
wrappers now work?
Martin
|
|
From: Dave G. <go...@mc...> - 2012-05-07 19:34:58
|
On May 7, 2012, at 2:15 PM CDT, Martin Kalany wrote:
> On 06.05.2012 00:54, Philippe Waroquiers wrote:
>> On Sun, 2012-05-06 at 00:24 +0200, Martin Kalany wrote:
>>>> Valgrind documentation states that "The MPI functions to be wrapped are assumed to be in an ELF shared object with soname matching libmpi.so*. This is known to be correct at least for Open MPI and Quadrics MPI, and can easily be changed if required."
>> Note that the documentation is slightly out of date, as the code
>> contains the pattern libmpi*.so*
>> (so as to match, among others, libmpich.so.1.0).
>>
>> Philippe
> Thanks a lot for your help, Philippe! You finally led me in the right
> direction:
> I found a workaround for the problem: I simply renamed libmpich.so to
> libmpi.so and the error was gone.
That sounds like it will cause other problems. Do your applications still run correctly after the rename?
> Nevertheless, valgrind doesn't print anything similar to
> "valgrind MPI wrappers 31901: Active for pid 31901
> valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options"
> as stated in the documentation. How do I know whether or not the mpi
> wrappers now work?
I'm guessing that they aren't working. I've hit this (not the "libmpi.so" name issue) in the past when linking statically instead of dynamically.
How are you compiling and linking your code? With "mpicc"?
What does "ldd YOUR_BINARY_HERE" give you? You should see lines that look like this in the output:
----8<----
libmpich.so.6 => /sandbox/goodell/mpich2-installed/lib/libmpich.so.6 (0x00007fb786ffa000)
libopa.so.1 => /sandbox/goodell/mpich2-installed/lib/libopa.so.1 (0x00007fb786df8000)
libmpl.so.1 => /sandbox/goodell/mpich2-installed/lib/libmpl.so.1 (0x00007fb786bf1000)
----8<----
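As a rough sketch of how to pick the relevant lines out of a longer ldd listing, the helper below reads ldd output on stdin and prints only the sonames that look like an MPI library. The name mpi_sonames and the regular expression are assumptions made for illustration, and the sample paths are shortened placeholders.

```shell
#!/bin/sh
# Read ldd output on stdin and print the soname (first field) of every
# line whose soname looks like libmpi*.so*. The regex is an assumption
# chosen to mirror the wrapper's soname pattern.
mpi_sonames() {
    awk '$1 ~ /^libmpi.*\.so/ { print $1 }'
}

# Sample input in ldd's usual layout (placeholder paths):
mpi_sonames <<'EOF'
    libmpich.so.6 => /path/to/lib/libmpich.so.6 (0x00007fb786ffa000)
    libopa.so.1 => /path/to/lib/libopa.so.1 (0x00007fb786df8000)
    libmpl.so.1 => /path/to/lib/libmpl.so.1 (0x00007fb786bf1000)
EOF

# Typical use on a real binary (YOUR_BINARY_HERE is a placeholder):
#   ldd YOUR_BINARY_HERE | mpi_sonames
```

An empty result for a real binary would suggest that no MPI shared library is linked in, which is what static linking would look like.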
-Dave
|
|
From: Martin K. <m.k...@gm...> - 2012-05-07 19:44:52
|
On 07.05.2012 21:34, Dave Goodell wrote:
> On May 7, 2012, at 2:15 PM CDT, Martin Kalany wrote:
>
>> On 06.05.2012 00:54, Philippe Waroquiers wrote:
>>> On Sun, 2012-05-06 at 00:24 +0200, Martin Kalany wrote:
>>>>> Valgrind documentation states that "The MPI functions to be wrapped are assumed to be in an ELF shared object with soname matching libmpi.so*. This is known to be correct at least for Open MPI and Quadrics MPI, and can easily be changed if required."
>>> Note that the documentation is slightly out of date, as the code
>>> contains the pattern libmpi*.so*
>>> (so as to match, among others, libmpich.so.1.0).
>>>
>>> Philippe
>> Thanks a lot for your help, Philippe! You finally led me in the right
>> direction:
>> I found a workaround for the problem: I simply renamed libmpich.so to
>> libmpi.so and the error was gone.
> That sounds like it will cause other problems. Do your applications still run correctly after the rename?
To be more precise: I did a copy+rename, so the original still exists.
>
>> Nevertheless, valgrind doesn't print anything similar to
>> "valgrind MPI wrappers 31901: Active for pid 31901
>> valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options"
>> as stated in the documentation. How do I know whether or not the mpi
>> wrappers now work?
> I'm guessing that they aren't working. I've hit this (not the "libmpi.so" name issue) in the past when linking statically instead of dynamically.
>
> How are you compiling and linking your code? With "mpicc"?
Yes, I'm using mpicc to compile and link.
>
> What does "ldd YOUR_BINARY_HERE" give you? You should see lines that look like this in the output:
>
> ----8<----
> libmpich.so.6 => /sandbox/goodell/mpich2-installed/lib/libmpich.so.6 (0x00007fb786ffa000)
> libopa.so.1 => /sandbox/goodell/mpich2-installed/lib/libopa.so.1 (0x00007fb786df8000)
> libmpl.so.1 => /sandbox/goodell/mpich2-installed/lib/libmpl.so.1 (0x00007fb786bf1000)
> ----8<----
>
> -Dave
>
ldd gives me libmpi.so.0, but no mpich-related .so files.
And I guess that's the problem, right? I already reinstalled valgrind
using ./configure --with-mpicc=path/to/mpich/ (as you suggested on
stackoverflow. That question there is from me, if you haven't noticed yet).
Thanks for your help!
Martin
|
|
From: Dave G. <go...@mc...> - 2012-05-08 02:27:20
|
On May 7, 2012, at 2:45 PM CDT, Martin Kalany wrote:
> On 07.05.2012 21:34, Dave Goodell wrote:
>>
>> What does "ldd YOUR_BINARY_HERE" give you? You should see lines that look like this in the output:
>>
>> ----8<----
>> libmpich.so.6 => /sandbox/goodell/mpich2-installed/lib/libmpich.so.6 (0x00007fb786ffa000)
>> libopa.so.1 => /sandbox/goodell/mpich2-installed/lib/libopa.so.1 (0x00007fb786df8000)
>> libmpl.so.1 => /sandbox/goodell/mpich2-installed/lib/libmpl.so.1 (0x00007fb786bf1000)
>> ----8<----
>
> ldd gives me libmpi.so.0, but no mpich-related .so files
>
> And I guess that's the problem, right? I already reinstalled valgrind
> using ./configure --with-mpicc=path/to/mpich/
What is the ldd output when run on an executable supposedly built with an unmodified MPICH2 installation?
My guess is that you've got MPICH2 and Open MPI (or some other MPI implementation with a "libmpi.so") installed on the same machine and you're doing one or more of:
1) using the wrong mpicc
2) incorrectly setting LD_LIBRARY_PATH
If the ldd from an unmodified MPICH2 doesn't show proper linking against libmpich.so, then try using the absolute path to "mpicc" when building your application.
> (as you suggested on
> stackoverflow. That question there is from me, if you haven't noticed yet).
Yes, I noticed. I originally saw it there, but decided that it would be better to avoid duplicate work and save some of Philippe's time by switching to this thread once I saw it here.
-Dave
|
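The two suspects named above (the wrong mpicc, or a misleading LD_LIBRARY_PATH) can be inspected with a few shell commands. This is only a sketch: the helper name path_entries is invented for illustration, and which wrapper flag applies depends on the MPI implementation (MPICH's mpicc accepts -show, Open MPI's accepts -showme).

```shell
#!/bin/sh
# Print each entry of a colon-separated search path on its own line,
# in search order. Empty entries are dropped. Hypothetical helper.
path_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -v '^$' || true
}

# 1) Which mpicc is found first in PATH?
command -v mpicc || echo "mpicc not found in PATH"

# 2) In which order will the dynamic loader search LD_LIBRARY_PATH?
path_entries "${LD_LIBRARY_PATH:-}"

# 3) What would the wrapper link against? Not run here because the
#    flag depends on the implementation:
#      mpicc -show      (MPICH)
#      mpicc -showme    (Open MPI)
```

If step 1 reports an mpicc from a different installation than the one whose mpirun is being used, building with the absolute path to the intended mpicc is the fix suggested above.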
|
From: Philippe W. <phi...@sk...> - 2012-05-07 20:54:37
|
On Mon, 2012-05-07 at 21:15 +0200, Martin Kalany wrote:
> Nevertheless, valgrind doesn't print anything similar to
> "valgrind MPI wrappers 31901: Active for pid 31901
> valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options"
> as stated in the documentation. How do I know whether or not the mpi
> wrappers now work?
If you add the option --trace-redir=yes to your Valgrind args,
Valgrind will trace all the actions related to redirection/wrapping:
* it will trace the creation of the redir specifications
(e.g. when loading the libmpiwrap which is part of Valgrind)
* it will trace the resulting "active" redirections or wrappings.
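One way to use this in practice is to send each process's Valgrind output to its own log file and then look for MPI symbols in it. The sketch below assumes that the redirection trace lines contain the name of the wrapped function (for example something containing MPI_); it makes no assumption about the rest of the line format. The helper name count_mpi_lines and the log file names are made up for illustration.

```shell
#!/bin/sh
# Count the lines in log file $1 that mention an MPI symbol.
# Assumption: trace lines for MPI wrappers contain the substring "MPI_".
# Always prints a number, including 0 when nothing matches.
count_mpi_lines() {
    grep -c 'MPI_' "$1" || true
}

# Example run, reusing the invocation from earlier in the thread
# (the wrapper path and ./foo are placeholders; %p expands to the pid):
#   LD_PRELOAD=/path/to/libmpiwrap-x86-linux.so \
#   mpirun -np 2 valgrind --trace-redir=yes --log-file=vg.%p.log ./foo
#   for f in vg.*.log; do
#       echo "$f: $(count_mpi_lines "$f") lines mention MPI symbols"
#   done
```

A count of 0 in every log would be a hint that no MPI wrapping became active for that process.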
As for the original problem: I understand it is because
Valgrind was configured with a different MPI than the one you are using,
and that created a mix-up in the libs. Is that the explanation ?
Philippe
|
|
From: Dave G. <go...@mc...> - 2012-05-08 02:14:47
|
On May 7, 2012, at 3:54 PM CDT, Philippe Waroquiers wrote:
> As for the original problem: I understand it is because
> Valgrind was configured with a different MPI than the one you are using,
> and that created a mix-up in the libs. Is that the explanation ?
I don't think this has been confirmed, but something along these lines seems like the most likely explanation to me. I just tried the Valgrind 3.7.0 MPI wrapper with MPICH2 on Linux and it worked just fine without any library renaming.
-Dave
|