From: Will S. <wil...@vn...> - 2016-06-29 22:21:18
Hi,

I wanted to try to knock down a few of the lingering valgrind
regression test bugs. I've chased the hgtls failure around in a few
circles, and at this point I suspect something is going wrong somewhere
in the valgrind part of the test for power on ppc64(BE). A bit of a
brain dump follows of what I've figured out. At the moment I'm looking
for insights into what would be happening here, or a pointer for where I
should be looking or what I should be looking for. :-)

With a newly built gdb(*) the behavior on a ppc64LE system appears OK.
So this would seem to be something possibly power (BE) specific.

(*) - Specifically a gdb that contains "Fix PR/18564 - regression in
showing __thread so extern variable", from Sep 2015.

With a very recent (06/29) version of gdb on ppc64(BE):

$ perl tests/vg_regtest --keep-unfiltered gdbserver_tests/hgtls

output for a ppc64 (power7) box contains:
$ (ppc64) grep equal hgtls.stdoutB.out.unfiltered.out
test race tls_ip 0x100200b0 ip 0x100200b0 equal 1
test local tls_ip 0xffffffffffffffff ip 0x537f8e0 equal 0
test global tls_ip 0x3 ip 0x5b7f8e4 equal 0
test static_extern tls_ip 0x7 ip 0x637f8e8 equal 0
test so_extern tls_ip 0xffffffffffffffff ip 0x6b7f8f0 equal 0
test so_local tls_ip 0xffffffffffffffff ip 0x737f8ec equal 0
test so_global tls_ip 0x3 ip 0x7b7f8e4 equal 0

Under debug (printfs added to gdb) the final values failing the compare
are coming out of the gdb/target.c:target_translate_tls_address() call.
The lm_addr value there looks reasonable, the addr value definitely is
not. I admittedly have no idea if lm_addr here is actually correct.

DBG: lm_addr: 4051eb8
DBG: addr: ffffffffffffffff
test local tls_ip 0xffffffffffffffff ip 0x577f8e0 equal 0
...
DBG: lm_addr: 4051eb8
DBG: addr: 3
test global tls_ip 0x3 ip 0x5b7f8e4 equal 0

But... When running the same test directly under gdb/gdbserver (no
valgrind/vgdb involvement) it works OK, in that the gdb breakpoints
trigger, the compares occur, and all report "equal 1" for the tests.
So I'm not sure my chasing this through the gdb code is the right
direction..

Thoughts?

Thanks,
-Will
|
From: Philippe W. <phi...@sk...> - 2016-06-29 23:00:21
On Wed, 2016-06-29 at 16:30 -0500, Will Schmidt wrote:
> Hi,
> I wanted to try to knock down a few of the lingering valgrind
> regression test bugs. I've chased the hgtls failure around in a few
> circles, and at this point I suspect something is going wrong somewhere
> in the valgrind part of the test for power on ppc64(BE). A bit of a
> brain dump follows of what I've figured out. At the moment I'm looking
> for insights into what would be happening here, or a pointer for where I
> should be looking or what I should be looking for. :-)
>
> With a newly built gdb(*) the behavior on a ppc64LE system appears OK.
> So this would seem to be something possibly power (BE) specific.
>
> (*) - Specifically a gdb that contains "Fix PR/18564 - regression in
> showing __thread so extern variable", from Sep 2015.
Yes, a regression was introduced in gdb at some point,
making hgtls fail on all platforms.
This gdb (and hgtls) regression was fixed by this change in gdb.
>
> With a very recent (06/29) version of gdb on ppc64(BE):
>
> $ perl tests/vg_regtest --keep-unfiltered gdbserver_tests/hgtls
>
> output for a ppc64 (power7) box contains:
> $ (ppc64) grep equal hgtls.stdoutB.out.unfiltered.out
> test race tls_ip 0x100200b0 ip 0x100200b0 equal 1
> test local tls_ip 0xffffffffffffffff ip 0x537f8e0 equal 0
> test global tls_ip 0x3 ip 0x5b7f8e4 equal 0
> test static_extern tls_ip 0x7 ip 0x637f8e8 equal 0
> test so_extern tls_ip 0xffffffffffffffff ip 0x6b7f8f0 equal 0
> test so_local tls_ip 0xffffffffffffffff ip 0x737f8ec equal 0
> test so_global tls_ip 0x3 ip 0x7b7f8e4 equal 0
All thread local addresses are wrong in the above.
The only correct address is &race, which is not thread local.
So this points to a basic problem in how valgrind gdbserver processes
the qGetTLSAddr packet.
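For orientation, here is a minimal sketch of what the request side of that packet carries. In the gdb remote protocol the query has the form qGetTLSAddr:thread-id,offset,lm with hex-encoded fields. The struct and function names below are hypothetical and are not taken from valgrind or gdb, and the sketch assumes a plain hex thread-id (not the multiprocess "p<pid>.<tid>" form).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three fields of a qGetTLSAddr query. */
typedef struct {
    unsigned long long thread_id; /* thread the variable belongs to */
    unsigned long long offset;    /* offset of the variable in its TLS block */
    unsigned long long lm;        /* load module (link_map address on glibc) */
} tls_query;

/* Parse "qGetTLSAddr:<thread-id>,<offset>,<lm>" (all fields hex).
   Returns 0 on success, -1 if the packet does not match.
   Assumption: thread-id is a plain hex number. */
int parse_qgettlsaddr(const char *pkt, tls_query *q)
{
    static const char prefix[] = "qGetTLSAddr:";
    size_t n = sizeof prefix - 1;
    int consumed = 0;

    if (strncmp(pkt, prefix, n) != 0)
        return -1;
    if (sscanf(pkt + n, "%llx,%llx,%llx%n",
               &q->thread_id, &q->offset, &q->lm, &consumed) != 3)
        return -1;
    /* Reject trailing garbage after the third field. */
    if (pkt[n + consumed] != '\0')
        return -1;
    return 0;
}
```

Logging these three fields on the valgrind side and comparing them with the lm_addr printed by gdb would show whether the request is already wrong or only the reply is.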
>
> Under debug (printfs added to gdb) the final values failing the compare
> are coming out of the gdb/target.c:target_translate_tls_address() call.
> The lm_addr value there looks reasonable, the addr value definitely is
> not. I admittedly have no idea if lm_addr here is actually correct.
>
> DBG: lm_addr: 4051eb8
> DBG: addr: ffffffffffffffff
> test local tls_ip 0xffffffffffffffff ip 0x577f8e0 equal 0
> ...
> DBG: lm_addr: 4051eb8
> DBG: addr: 3
> test global tls_ip 0x3 ip 0x5b7f8e4 equal 0
>
> But... When running the same test directly under gdb/gdbserver (no
> valgrind/vgdb involvement) it works OK, in that the gdb breakpoints
> trigger, the compares occur, and all report "equal 1" for the tests.
> So I'm not sure my chasing this through the gdb code is the right
> direction..
>
> Thoughts?
tls handling in valgrind gdbserver is based on various 'hacks':
* in platform specific files (e.g. valgrind-low-ppc64.c), you have
a function target_get_dtv
* then a glibc/platform specific offset has to be 'guessed'.
This is done by the auxprog/getoff.c program,
which is launched by the function valgrind_get_tls_addr.
See in getoff.c how to verify that the ugly hack of getoff.c
works ok, by using gdb as indicated in the comment following
#ifdef HAVE_DLINFO_RTLD_DI_TLS_MODID
* valgrind_get_tls_addr will then use the dtv and lm to compute the
address of the thread local variable.
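The last step can be sketched roughly as below. This is a simplified model of the dtv lookup described in Drepper's TLS paper, not the actual valgrind_get_tls_addr code: the type dtv_slot, the function sketch_tls_addr and the SKETCH_UNALLOCATED sentinel are all hypothetical, and the sketch assumes module ids start at 1 and that no platform specific bias is applied to the dtv entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one dtv slot: for a given thread, slot N holds
   the base address of the TLS block of the module with id N. */
typedef struct {
    uintptr_t block_base;
} dtv_slot;

/* Assumed marker for "TLS block not allocated yet" in this sketch. */
#define SKETCH_UNALLOCATED ((uintptr_t)-1)

/* Compute the address of a thread local variable from the thread's dtv,
   the module id of the load module, and the variable's offset.
   Returns 0 and stores the address in *out, or -1 if it cannot be
   resolved. Assumption: slot 0 is not a module slot. */
int sketch_tls_addr(const dtv_slot *dtv, size_t dtv_len,
                    size_t modid, uintptr_t offset, uintptr_t *out)
{
    assert(dtv != NULL && out != NULL);

    if (modid == 0 || modid >= dtv_len)
        return -1;                      /* module id out of range */
    if (dtv[modid].block_base == SKETCH_UNALLOCATED)
        return -1;                      /* block not set up for this thread */

    *out = dtv[modid].block_base + offset;
    return 0;
}
```

In this model, a wrong module id or a wrong dtv pointer yields a value unrelated to the real variable address, so checking those two inputs separately is a reasonable first step.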
Maybe you could enable valgrind debug tracing (-d -d -d)
and compare the ppc64 BE and LE traces?
Or maybe also compare the gdb+gdbserver traces for lm/lm_addr etc.
with what valgrind shows?
It has been quite some time since I implemented this tls functionality,
so I do not remember much. The paper by Ulrich Drepper describing thread
local storage on linux gave very good information (though I think some
details here and there might be obsolete).
Sorry for not being able to give more precise indications;
I hope the pointers above give you a hint.
Philippe