|
From: Florian K. <br...@ac...> - 2011-10-06 13:15:06
|
Here is an update on the regression tests. If you're on cc I'm
asking for your help.
Not included in the table are those failing testcases for which
we have immediate fixes pending.
I use these abbreviations for distributions:
U10 = Ubuntu 10.10
F.. = Fedora ..
S11 = SLES 11
R4 = RHEL 4
2.6.37 = Rich Coe's run. I don't know what distribution he's using.
The x86 results are from running on my thinkpad.
x86 x86_64 s390x
--- ------------------------- --------
memcheck
origin5-bz2 F15 F14 2.6.37
overlap F15
linux/stack_switch F14 F13 F11 2.6.37
long_namespace_xml F11
linux/timerfd-syscall F S11
manuel3 R4
partial_load_ok R4
varinfo6 R4
helgrind
hg05_race2 F15
pth_barrier3 F13
tc18_semabuse F S11 R4
tc20_verifywrap F S11 R4
tc09_bad_unlock R4
tc14_laog_dinphils R4
tc23_bogus_condwait U10
drd
tc04_free_lock F S11 R4
tc09_bad_unlock F S11 R4
tc23_bogus_condwait U10
gdbserver_tests
mcbreak F14
mcclean_after_fork F14
mcinfcallWSRU F14
mcleak F14
mcmain_pic F14
mcvabits F14
mssnapshot F14 2.6.37
nlpasssigalrm F14
nlsigvgdb F14
Here are some comments about the non-s390x specific testcases:
memcheck / overlap
Julian suspects it's related to the changes in handling memcpy,
memmove that went in a few weeks ago.
My plan is to filter this out, unless somebody has a better suggestion.
memcheck / origin5-bz2
There are different answers about the origin of an uninitialized
value. On some systems it's said to come from dynamically allocated
memory whereas it seems it ought to come from a client request.
memcheck / linux/stack_switch
Could be related to the system wrapper for the clone call or to
our less than ideal handling of stack switches.
memcheck / long_namespace_xml
Looks like a real bug to me.
helgrind / hg05_race2
Related to DWARF reading. Julian thinks this will be difficult to fix
in the current dwarf3 framework.
helgrind / pth_barrier3
There are extra error messages. Needs investigation.
helgrind / tc23_bogus_condwait
drd / tc23_bogus_condwait
Most likely harmless. A different error is issued on x86 perhaps related
to 32 bit vs 64 bit.
helgrind / tc08_hbl2
helgrind / annotate_hbefore
Intermittent failures and hangs.
It is suspected that these are due to memory contention issues.
May require inserting memory fences in the testcase in strategic places
to get deterministic behaviour.
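As a rough illustration of what "inserting memory fences" could look like, here is a minimal sketch. It is not taken from tc08_hbl2 or annotate_hbefore; it assumes a C11 compiler with <stdatomic.h>, and the names payload, ready, publish and try_consume are hypothetical. The writer fences between storing the data and raising the flag, and the reader fences between seeing the flag and reading the data.

```c
#include <stdatomic.h>

/* Hypothetical shared state, not taken from the Valgrind testcases. */
static int payload;        /* plain data handed from one thread to another */
static atomic_int ready;   /* flag saying payload is valid; starts at 0 */

/* Writer side: store the data, fence, then raise the flag. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: returns 1 and fills *out once the flag is seen, else 0. */
int try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

Whether this is the right fix for the two testcases would still need to be checked against what they actually do.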
Here is where I'm asking for help:
Tom, could you check in an exp file for this test:
none
shell F15
I would do it myself but I do not know what shell it is. So I cannot
pick a meaningful name for the exp file.
On Fedora 14 the gdbserver tests are failing. Perhaps this is as easy
to fix as
yum --disablerepo='*' --enablerepo='*-debuginfo' install .....
?
Julian said he will check in a 2nd set of exp files for these:
none:
amd64/bug132918 F11 F9
amd64/fxtract F11 F9
amd64/sse4-64 F11 F9
x86/fxtract F11 F9
ARM results would be good, too.
Rich: Could you update your nightly build machinery such that it
uses the latest version of the nightly script? That would
show us what kind of system you are running.
Maynard: Could you run a ppc regtest, tar up the diffs, and send them
to me? I'd like to see how we're doing there WRT filtering backtrace
noise.
Thanks,
Florian
|
|
From: Tom H. <to...@co...> - 2011-10-06 15:09:36
|
On 06/10/11 14:14, Florian Krohm wrote:
> Tom, could you check in an exp file for this test:
>
> none
> shell F15
>
> I would do it myself but I do not know what shell it is. So I cannot
> pick a meaningful name for the exp file.
It's bash - the only difference is the case of one letter in the error
message compared to whatever version of bash was used to generate our
output.
> On Fedora 14 the gdbserver tests are failing. Perhaps this is as easy
> to fix as
> yum --disablerepo='*' --enablerepo='*-debuginfo' install .....
> ?
This is really a side effect of me running the test under mock, which
means the RPM database has been generated by F15 and the F14 gdb in the
mock chroot can't understand it.
Installing the debuginfo packages for glibc does work around it because gdb
then finds the debug info and doesn't try to consult the RPM database.
I've made that change, but it's possible that having the glibc debuginfo
present might cause changes in other tests.
Tom
--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: Maynard J. <may...@us...> - 2011-10-06 19:16:07
Attachments:
vg-svn_10.06.2011_POWER7-sles11SP1_test-diffs.tar.gz
|
Florian Krohm wrote:
> Here is an update on the regression tests. If you're on cc I'm
> asking for your help.
>
[snip]
> Maynard: can you run a ppc regtest and tar up the diffs and send them
> to me. I'd like to see how we're doing there WRT filtering backtrace
> noise.
The regression tests on POWER7/SLES 11 SP1 (using Oct 6, 2011 SVN) actually compare quite favorably to Valgrind 3.6.1 on a POWER5 (POWER7 support did not yet exist in the 3.6.1 timeframe). For reference, here are the results:
----------------------------------
gdbserver_tests/mcinfcallRU (stderr)
gdbserver_tests/mcinfcallWSRU (stderr)
gdbserver_tests/mcinfcallWSRU (stderrB)
gdbserver_tests/mcsignopass (stderr)
gdbserver_tests/mcsignopass (stdoutB)
gdbserver_tests/mcsigpass (stderr)
gdbserver_tests/mcsigpass (stdoutB)
gdbserver_tests/nlcontrolc (stdoutB)
gdbserver_tests/nlpasssigalrm (stdoutB)
memcheck/tests/varinfo6 (stderr)
memcheck/tests/wrap8 (stdout)
memcheck/tests/wrap8 (stderr)
callgrind/tests/simwork-both (stdout)
callgrind/tests/simwork-both (stderr)
callgrind/tests/simwork-branch (stdout)
callgrind/tests/simwork-branch (stderr)
massif/tests/big-alloc (post)
none/tests/faultstatus (stderr)
none/tests/ppc32/jm-fp (stdout)
none/tests/ppc32/jm-vmx (stdout)
none/tests/ppc32/testVMX (stdout)
none/tests/ppc64/jm-fp (stdout)
none/tests/ppc64/jm-vmx (stdout)
helgrind/tests/hg05_race2 (stderr)
helgrind/tests/tc18_semabuse (stderr)
helgrind/tests/tc20_verifywrap (stderr)
drd/tests/tc04_free_lock (stderr)
drd/tests/tc09_bad_unlock (stderr)
----------------------------------
Unfortunately, POWER5/SLES 10 SP3 (current SVN) compared much worse to Valgrind 3.6.1 -- lots of callgrind and helgrind failures. By contrast, results from an April 15 snapshot of SVN were almost identical to the 3.6.1 results (with just two extra callgrind testcases failing). Also, some gdbserver tests usually hang the regtest on this processor; thus, I have to remove the gdbserver/*vgtest files in order to get 'make regtest' to complete. I have seen occasional hangs of gdbserver tests on POWER7, too, but not often.
While this drastic regression for POWER5 does concern me (and I have it on my to-do list to investigate), this is a pretty old processor that's not even supported on current distros like SLES 11 and RHEL 6. For now, my higher priority will be to clean up the testsuite failures for POWER7.
I've attached a tar file containing the *.diff files from the test directories from my POWER7 testsuite run.
By the way, there hasn't been a ppc32 nightly build since July 27. Anyone know why?
Thanks,
-Maynard
>
> Thanks,
> Florian
>
|
|
From: Bart V. A. <bva...@ac...> - 2011-10-06 19:25:00
|
On Thu, Oct 6, 2011 at 9:06 PM, Maynard Johnson <may...@us...> wrote:
> By the way, there hasn't been a ppc32 nightly build since July 27. Anyone
> know why?
>
PPC systems that are up 24h/24h are not that easy to find though. Have you
already considered setting up a nightly PPC build yourself?
A previous discussion of this subject can be found here:
http://old.nabble.com/Re%3A-ppc-nightly-build-td32256186.html.
Bart.
|
|
From: Maynard J. <may...@us...> - 2011-10-12 12:12:27
|
Bart Van Assche wrote:
> On Thu, Oct 6, 2011 at 9:06 PM, Maynard Johnson <may...@us...> wrote:
>
>> By the way, there hasn't been a ppc32 nightly build since July 27. Anyone
>> know why?
>>
>
> PPC systems that are up 24h/24h are not that easy to find though. Have you
> already considered setting up a nightly PPC build yourself ?
I'm working on this issue. I'll let you all know soon what can be done.
-Maynard
>
> A previous discussion of this subject can be found here:
> http://old.nabble.com/Re%3A-ppc-nightly-build-td32256186.html.
>
> Bart.
>
|
|
From: Philippe W. <phi...@sk...> - 2011-10-08 14:50:07
|
> The regression tests on POWER7/SLES 11 SP1 (using Oct 6, 2011 SVN) actually compare quite favorably to Valgrind 3.6.1 on a POWER5
> (POWER7 support did not yet exist in 3.6.1 timeframe). For reference, here are the results:
The attached diff files also contain differences for mcbreak, but mcbreak is not in the list below?
> gdbserver_tests/mcinfcallRU (stderr)
> gdbserver_tests/mcinfcallWSRU (stderr)
> gdbserver_tests/mcinfcallWSRU (stderrB)
These 3 failures plus the mcbreak one seem linked to problems doing an inferior function call
(see below).
> gdbserver_tests/mcsignopass (stderr)
> gdbserver_tests/mcsignopass (stdoutB)
> gdbserver_tests/mcsigpass (stderr)
> gdbserver_tests/mcsigpass (stdoutB)
The above tests are based on none/tests/faultstatus which also fails.
I think that if faultstatus is fixed, these should also work.
> gdbserver_tests/nlcontrolc (stdoutB)
> gdbserver_tests/nlpasssigalrm (stdoutB)
These two are failing as the filter script does not properly remove or transform
the power7 specific line:
from /lib64/power7/libc.so.6
and
0x........ in .__GI_kill () from /lib64/power7/libc.so.6
The (already horrible) gdbserver_tests/filter_gdb should be further enhanced to filter or transform these.
...
> Unfortunately, POWER5/SLES 10 SP3 (current SVN) compared much worse to Valgrind 3.6.1 -- lots of callgrind and helgrind failures.
> Whereas results from an April 15 snapshot of SVN was almost identical to the 3.6.1 results (with just two extra callgrind
> testcases failing).
> Also, some gdbserver tests usually hang the regtest on this processor; thus, I have to remove the gdbserver/*vgtest files in order
> to get 'make regtest' to complete. I have seen occasional hangs of gdbserver tests on POWER7, too, but not often.
Hanging tests are often caused by inferior function calls not working properly.
A possible technique to investigate this is to compare the behaviour of the "standard" gdbserver
with the valgrind gdbserver (see gdbserver_tests/README_DEVELOPERS).
Philippe
|
|
From: Maynard J. <may...@us...> - 2011-10-10 13:16:08
|
Philippe Waroquiers wrote:
>> The regression tests on POWER7/SLES 11 SP1 (using Oct 6, 2011 SVN) actually compare quite favorably to Valgrind 3.6.1 on a POWER5
>> (POWER7 support did not yet exist in 3.6.1 timeframe). For reference, here are the results:
>
> The attached diff files also contains differences for mcbreak, but mcbreak is not in the below list ???
Right . . . copy/paste error.
>> gdbserver_tests/mcinfcallRU (stderr)
>> gdbserver_tests/mcinfcallWSRU (stderr)
>> gdbserver_tests/mcinfcallWSRU (stderrB)
> These 3 failures+the mcbreak seems linked to problems to do an inferior function call.
> (see below).
>
>> gdbserver_tests/mcsignopass (stderr)
>> gdbserver_tests/mcsignopass (stdoutB)
>> gdbserver_tests/mcsigpass (stderr)
>> gdbserver_tests/mcsigpass (stdoutB)
> The above tests are based on none/tests/faultstatus which also fails.
> I think that if faultstatus is fixed, these should also work.
OK, I'll look at faultstatus.
>
>> gdbserver_tests/nlcontrolc (stdoutB)
>> gdbserver_tests/nlpasssigalrm (stdoutB)
> These two are failing as the filter script does not properly remove or transform
> the power7 specific line:
> from /lib64/power7/libc.so.6
> and
> 0x........ in .__GI_kill () from /lib64/power7/libc.so.6
> The (already horrible) gdbserver_tests/filter_gdb should be further enhanced to filter or transform these.
Should I open a bug report and attach a patch for this?
> ...
>> Unfortunately, POWER5/SLES 10 SP3 (current SVN) compared much worse to Valgrind 3.6.1 -- lots of callgrind and helgrind failures.
>> Whereas results from an April 15 snapshot of SVN was almost identical to the 3.6.1 results (with just two extra callgrind
>> testcases failing).
>> Also, some gdbserver tests usually hang the regtest on this processor; thus, I have to remove the gdbserver/*vgtest files in order
>> to get 'make regtest' to complete. I have seen occasional hangs of gdbserver tests on POWER7, too, but not often.
>
> Hanging tests are often caused by inferior function calls not working properly. A possible technique to investigate this
> is to compare the behaviour of the "standard" gdbserver with the valgrind gdbserver
> (see gdbserver_tests/README_DEVELOPERS).
I'll try to find some time to look at this later in the week. Thanks for the help!
-Maynard
>
> Philippe
>
|
|
From: Maynard J. <may...@us...> - 2011-10-17 21:03:59
|
On 10/08/2011 9:50 AM, Philippe Waroquiers wrote:
>> The regression tests on POWER7/SLES 11 SP1 (using Oct 6, 2011 SVN) actually
>> compare quite favorably to Valgrind 3.6.1 on a POWER5 (POWER7 support did not
>> yet exist in 3.6.1 timeframe). For reference, here are the results:
[snip]
>
>> gdbserver_tests/nlcontrolc (stdoutB)
>> gdbserver_tests/nlpasssigalrm (stdoutB)
> These two are failing as the filter script does not properly remove or transform
> the power7 specific line:
> from /lib64/power7/libc.so.6
> and
> 0x........ in .__GI_kill () from /lib64/power7/libc.so.6
> The (already horrible) gdbserver_tests/filter_gdb should be further enhanced to
> filter or transform these.
I opened a bug report for this (https://bugs.kde.org/show_bug.cgi?id=284305) and attached a patch to filter_gdb that addresses these two problems. Can you please review it to see if my changes might break other architectures? Thanks.
-Maynard
[snip]
> Philippe
>
|
|
From: Maynard J. <may...@us...> - 2011-10-18 15:12:28
|
Philippe Waroquiers wrote:
>> The regression tests on POWER7/SLES 11 SP1 (using Oct 6, 2011 SVN) actually compare quite favorably to Valgrind 3.6.1 on a POWER5
>> (POWER7 support did not yet exist in 3.6.1 timeframe). For reference, here are the results:
>
> The attached diff files also contains differences for mcbreak, but mcbreak is not in the below list ???
>> gdbserver_tests/mcinfcallRU (stderr)
>> gdbserver_tests/mcinfcallWSRU (stderr)
>> gdbserver_tests/mcinfcallWSRU (stderrB)
> These 3 failures+the mcbreak seems linked to problems to do an inferior function call.
Philippe,
The gdbserver tests actually run better on my POWER7/Fedora 16 box. With the patches applied for which I've recently opened bug reports (faultstatus bug #283709 and filter_gdb bug #284305), I only get one gdbserver test failure -- mcmain_pic. I went back to my SLES 11 SP1/POWER7 box where I was seeing the above gdbserver_tests failures and tried using the debugging tips in gdbserver_tests/README_DEVELOPERS to debug the mcbreak problem, but I couldn't get the breakpoints to set correctly. In both cases when trying to set breakpoints in t.c, I get the message:
No symbol table is loaded. Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n])
I tried both 'y' and 'n', but I get the same results either way. The valgrind process continues up to the point where I see the message "Petaouchnok sleep nr 15 out of 15 sleeping 5 seconds" and the gdb process just says "continuing". Both processes are then stuck there, and I have to quit gdb and Ctrl-C valgrind.
By the way, I was able to successfully run mcsigpass manually (one session for valgrind and one session for gdb) using mcsigpass.stdinB.gdb as input to the gdb session, so I *think* I'm running the two processes correctly in manual mode. Any idea what I'm doing wrong? I'm not versed in using gdbserver, so it's very possible I'm missing something.
Thanks.
-Maynard
>
[snip]
> Philippe
>
|
|
From: Philippe W. <phi...@sk...> - 2011-10-11 18:59:42
|
>>> gdbserver_tests/mcsignopass (stderr)
>>> gdbserver_tests/mcsignopass (stdoutB)
>>> gdbserver_tests/mcsigpass (stderr)
>>> gdbserver_tests/mcsigpass (stdoutB)
>> The above tests are based on none/tests/faultstatus which also fails.
>> I think that if faultstatus is fixed, these should also work.
>
> I opened https://bugs.kde.org/show_bug.cgi?id=283709 to address the faultstatus problem.
> This fix also allows mcsignopass and mcsigpass to pass on my ppc64/SLES 11 SP system.
That is good news.
Regarding the other problems discussed in the other mail:
* For the filtering of the power7 specific lines (cf. other mail), the best approach is also to attach a patch.
* For the remaining failing tests: if this is due to inferior function calls not working,
a comparison with a power5 and/or with the "standard" gdbserver might help to understand what is wrong.
I no longer have access to a ppc system, so I can't investigate much.
Philippe
|
|
From: Philippe W. <phi...@sk...> - 2011-10-22 07:04:40
|
> By the way, I was able to successfully run mcsigpass manually (one session for valgrind and one session for gdb) using
> mcsigpass.stdinB.gdb
> as input to the gdb session, so I *think* I'm running the two processes correctly in manual mode. Any idea what I'm doing wrong?
> I'm not versed in using gdbserver, so it's very possible I'm missing something.
IIUC, these tests are ok on fedora16/Power7 and not ok on suse11/power7.
One thing you can try is to run the same commands as in the stdinB.gdb file, but with the standard gdbserver
that is part of the gdb distribution, and see if this also fails to set the breakpoints.
I can take a look at the problem if you run the following commands on both platforms,
and send the gdbserver_tests/*.out files.
rm gdbserver_tests/*.out
perl tests/vg_regtest --keep-unfiltered gdbserver_tests/mcinfcallRU.vgtest
.... send the gdbserver_tests/*.out
export EXTRA_REGTEST_OPTS="-v -v -d -d -d"
.... redo the above
Thanks
|
|
From: Maynard J. <may...@us...> - 2011-10-27 19:16:48
|
On 10/22/2011 2:04 AM, Philippe Waroquiers wrote:
>
>> By the way, I was able to successfully run mcsigpass manually (one session for
>> valgrind and one session for gdb) using mcsigpass.stdinB.gdb
>> as input to the gdb session, so I *think* I'm running the two processes
>> correctly in manual mode. Any idea what I'm doing wrong?
>> I'm not versed in using gdbserver, so it's very possible I'm missing something.
>
> IIUC, these tests are ok on fedora16/Power7 and not ok on suse11/power7.
Yes.
> One thing you can try is to do the same commands as in the stdinB.gdb but with
> the standard gdbserver
> part of the gdb distribution, and see if this also fails to put the breakpoints.
I can set breakpoints OK when using gdbserver manually, outside of valgrind.
>
> I can take a look at the problem if you run the following commands on both
> platforms,
Well, the Fedora16/ppc64 box I had access to was freshly installed with F16 alpha yesterday. I pulled valgrind from svn and built it and tried to run the gdbserver tests. I don't know what's changed (since the gdbserver tests had run quite well on Fedora the last I tried), but I had major problems. The testcases either hung or failed the pre-req because gdb.step does not exist. sigh....
-Maynard
> and send the gdbserver_tests/*.out files.
>
> rm gdbserver_tests/*.out
> perl tests/vg_regtest --keep-unfiltered gdbserver_tests/mcinfcallRU.vgtest
> .... send the gdbserver_tests/*.out
>
> export EXTRA_REGTEST_OPTS="-v -v -d -d -d"
> .... redo the above
>
> Thanks
>
>
|
|
From: Philippe W. <phi...@sk...> - 2011-10-27 22:24:36
|
> Well, the Fedora16/ppc64 box I had access to was freshly installed with F16
> alpha yesterday. I pulled valgrind from svn and built it and tried to run the
> gdbserver tests. I don't know what's changed (since the gdbserver tests had run
> quite well on Fedora the last I tried), but I had major problems. The testcases
> either hung or failed the pre-req because gdb.step does not exist. sigh....
I suspect you might be encountering two different problems.
First the gdb.step that does not exist:
--------------------------------------
'make regtest' calls 'gdbserver_tests/make_local_links /path/to/gdb'
make_local_links tries to extract the gdb version from the first line of the gdb --version output.
Then depending on the gdb version, the gdb.step file is created to indicate that gdb properly
implements 'step command', otherwise, gdb.step is removed.
gdb.step should always be created for gdb version >= 7.0, except on ARM where gdb version must be >= 7.1.
So, if there is no gdb.step on fedora16/ppc64, it might be that the way make_local_links extracts the version
does not work properly with the gdb on fedora16/ppc.
Can you send me the result of gdb --version?
Second the hanging tests:
-------------------------
Even if gdb.step is not created, that should just disable some tests. It should not create hanging tests.
What I have often seen on ppc systems is that the standard visible gdb is a 32-bit gdb, which cannot debug
the 64-bit executable. This can then cause the test to block.
I just regained access to the gcc38 ppc gcc farm system.
This is what I have on this system:
philippe@ps3gccfarm:~/valgrind/trunk_untouched$ /usr/bin/gdb gdbserver_tests/t
GNU gdb (GDB) 7.0.1-debian
.....
"/home/philippe/valgrind/trunk_untouched/gdbserver_tests/t": not in executable format: File format not recognized
(gdb) quit
With a gdb I compiled myself in 64 bits:
philippe@ps3gccfarm:~/valgrind/trunk_untouched$ /home/philippe/gdb/gdb-7.3.1/install/bin/gdb gdbserver_tests/t
GNU gdb (GDB) 7.3.1
...
Reading symbols from /home/philippe/valgrind/trunk_untouched/gdbserver_tests/t...done.
(gdb)
I got another failure reason on this gcc38 system:
The vgdb relay application takes a lock on a file to ensure it is the only one doing a relay
with the Valgrind process using this file.
For a reason that is not yet clear, the lock cannot be taken on gcc38 when the file is accessed through nfs
(if the file is local, the lock can be taken). I will do some more testing with nfs tomorrow.
But if this is your problem, the symptom will be vgdb writing the following error msg (and then failing):
syscall failed: No locks available
cannot acquire lock.
The failure of vgdb will block the test (but the error msg above should be visible
in one of the *.out test output files).
If the blocking is not due to one of the above causes, then the output files of the tests
(filtered and unfiltered, cf. previous mail) might allow me to investigate.
Hope this helps
Philippe
|
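The gdb.step rule described in the message above (gdb >= 7.0, or >= 7.1 on ARM) can be written down as a small check. This is only a sketch of the decision, not the actual make_local_links script. The function names are hypothetical, and the parsing assumes that the version is the first "N.M" number on the first line of the gdb --version output, which may not hold for every distribution's banner.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: find the first "N.M" number in a version banner.
   Returns 1 and fills *major / *minor on success, 0 otherwise. */
int parse_gdb_version(const char *line, int *major, int *minor)
{
    for (const char *p = line; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            if (sscanf(p, "%d.%d", major, minor) == 2)
                return 1;
            /* Not of the form N.M: skip the rest of this digit run. */
            while (isdigit((unsigned char)p[1]))
                p++;
        }
    }
    return 0;
}

/* Rule from the mail: 'step' is trusted for gdb >= 7.0,
   except on ARM where gdb must be >= 7.1. */
int gdb_step_supported(const char *version_line, int is_arm)
{
    int major, minor;
    if (!parse_gdb_version(version_line, &major, &minor))
        return 0;
    int min_minor = is_arm ? 1 : 0;
    return major > 7 || (major == 7 && minor >= min_minor);
}
```

With the two banners quoted above, both "GNU gdb (GDB) 7.0.1-debian" and "GNU gdb (GDB) 7.3.1" pass the non-ARM check. A banner this parser cannot read is treated as "no step support", which is the same symptom as a missing gdb.step.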
|
From: Philippe W. <phi...@sk...> - 2011-10-28 22:05:45
|
> When I ran the full 'make regtest' on my Fedora16/POWER7 system, then only one gdbserver test failed -- and no hangs.
> The one failing testcase was mcmain_pic, just like my earlier run. The *.out files you asked for are attached.
It looks like the main_pic executable is crashing because an error is detected by the dynamic loader:
...
--7332:1:gdbsrv getpkt ("Hc0"); [no ack]
--7332:1:gdbsrv putpkt ("$E01#a6"); [no ack]
--7332:1:gdbsrv getpkt ("c"); [no ack]
--7332:1:gdbsrv set_desired_inferior use_general 0 found 0x0 tid 1 lwpid 7332
--7332:1:gdbsrv resume_info thread -1 leave_stopped 0 step 0 sig 0 stepping 0
--7332:1:gdbsrv stop pc is 0x800BFE75A0
--7332:1:transtab allocate sector 0
--7332-- TT/TC: initialise sector 0
--7332:1:mallocfr newSuperblock at 0x405030000 (pszB 65504) owner VALGRIND/ttaux
--7332-- REDIR: 0x800c003fb0 (strlen) redirected to 0x3807c1a0 (vgPlain_ppc64_linux_REDIR_FOR_strlen)
Inconsistency detected by ld.so: rtld.c: 1278: dl_main: Assertion `_rtld_local._dl_rtld_map.l_libname' failed!
--7332:1:syswrap- thread_wrapper(tid=1): exit
--7332:1:syswrap- run_a_thread_NORETURN(tid=1): post-thread_wrapper
...
The 'Inconsistency' line seems not related to the gdbserver as a few lines before, we see that gdbserver
has received a "c" packet (continue packet from gdb) and has properly resumed the executable (as we see
some transtab related trace and a strlen redirection being done).
=> is the main_pic executable running properly outside of Valgrind ?
i.e. gdbserver_tests/main_pic
=> is it running properly under Valgrind, but without gdbserver ?
i.e. ./vg-in-place gdbserver_tests/main_pic ?
Philippe
|
|
From: Philippe W. <phi...@sk...> - 2011-10-28 23:01:49
|
> Back on the SLES 11 SP1/POWER7 system, a fresh pull of upstream valgrind ran quite well this time -- no hangs in the gdbserver
> tests,
> and I still get the same three testcases failing: mcbreak, mcinfcallRU, and mcinfcallWSRU. Again, the *.out files you asked for
> are attached.
In the 'no extra options' file, we see that mcbreak fails when gdb invokes a function call.
A break is encountered in t.c at line 112.
The test executes a few "steps" and a "next"
and then invokes a call to whoami("first").
We see in mcbreak.stderr.out.unfiltered.out that the process crashes due to the following error:
==63162== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==63162== Bad permissions for mapped region at address 0x10012030
==63162== at 0x10000BB0: whoami (t.c:28)
==63162== by 0x10001183: main (t.c:112)
The two other tests are also failing with a similar 'Bad permissions'. The address however slightly changes.
If we examine mcbreak.stderr.out.unfiltered.out in more detail (searching for signal 11),
we see the following some lines above:
--63758:1:gdbsrv stop_pc 0x10001194 changed to be resume_pc 0x401F4D0: malloc (dl-minimal.c:96)
gdbsrv indicates that gdb has asked to change the pc at which the process was stopped (somewhere around t.c:112)
to be another pc (the pc of malloc). This is because gdb must allocate memory in order to execute the whoami("first") :
it first has to allocate memory by "pushing" a call to malloc.
gdb then has to copy "first" into this memory, and it can then push a call to whoami.
The usual technique with which gdb pushes a call is:
It puts a break instruction somewhere (gdb has some logic depending on the OS and/or processor to find a "reasonable" place
where such a break can be put).
It then pushes the address where this break has been put as a return address (the technique to put the return address
depends on the ABI. IIRC, on power, this is via a register)
It puts the needed arguments
and then invokes the function to call (e.g. malloc, or whoami, or ...) by changing the pc to be the pc of this function.
When this function returns, it returns to the break instruction. The break causes gdb to regain control.
gdb then knows the function call is finished. It retrieves the return value (e.g. what malloc has returned)
and can continue.
From what I can see, gdb uses the address 0x10012030 to put the break.
It then calls malloc. But this malloc call already fails (gives a signal 11). It looks like gdb believes the call has worked
properly, as it then pushes a call to whoami.
I do not know why the call to malloc fails, or why the subsequent call to whoami similarly fails.
What could be done to investigate is to make a simpler test of invoking a function call.
For example (not compiled):
#include <stdio.h>
int fun_set_me = 0;
/* Called by hand from gdb with: call fun(1234) */
void fun (int i)
{
    fun_set_me = i;
}
int main (void)
{
    printf ("hello world\n");
    /* Put a break on the next line; when it is hit, call fun(1234) from gdb. */
    printf ("fun_set_me %d\n", fun_set_me);
    return 0;
}
Execute the above test on a f16/power7 valgrind with -v -v -v -d -d -d and gdb
Do the same on suse.
The difference in the trace between the two runs might explain what goes wrong on f16.
You might also compare the behaviour of the standard f16 gdbserver to the Valgrind gdbserver:
The standard gdbserver has two options --debug and --remote-debug that will show the trace.
This might also allow comparing a working function call invocation to the failing one under valgrind.
The gdbserver protocol is relatively simple : the query packets and reply packets are explained in
annex D of the gdb user manual. Note however that there will not be a one-to-one mapping between
the packet exchanges. E.g. gdb might send "Z" packets to insert "hardware breakpoints" to the
Valgrind gdbserver (as valgrind gdbserver pretends it can implement hw watchpoints), and the standard
gdbserver might indicate it only accepts breakpoints via changing the instruction to a trap.
But it should be possible to recognise the "push a call sequence".
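When reading such packet traces by hand, it helps to know the framing: a packet is sent as "$", the payload, "#", and a two-digit hex checksum, where the checksum is the sum of the payload bytes modulo 256. The sketch below checks that framing. The function names are hypothetical, and it ignores the protocol's escaping and run-length encoding, so it only suits simple packets such as the "$E01#a6" reply visible in the mcmain_pic trace earlier in this thread.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Checksum of a packet payload: sum of its bytes modulo 256. */
unsigned int rsp_checksum(const char *data, size_t len)
{
    unsigned int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)data[i];
    return sum & 0xffu;
}

/* Returns 1 if packet has the form "$<payload>#<two hex digits>" and the
   digits match the payload checksum, 0 otherwise. */
int rsp_packet_ok(const char *packet)
{
    if (packet[0] != '$')
        return 0;
    const char *payload = packet + 1;
    const char *hash = strchr(payload, '#');
    if (hash == NULL || strlen(hash + 1) != 2)
        return 0;
    if (!isxdigit((unsigned char)hash[1]) || !isxdigit((unsigned char)hash[2]))
        return 0;
    unsigned long claimed = strtoul(hash + 1, NULL, 16);
    return claimed == rsp_checksum(payload, (size_t)(hash - payload));
}
```

For the error reply "E01" the bytes are 0x45 + 0x30 + 0x31 = 0xa6, which matches the trace.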
Hope the above helps ...
Philippe
|
|
From: Maynard J. <may...@us...> - 2011-10-31 15:34:01
Attachments:
gdbserv_test_debugging.tar.gz
|
On 10/28/2011 6:01 PM, Philippe Waroquiers wrote:
>
>> Back on the SLES 11 SP1/POWER7 system, a fresh pull of upstream valgrind ran
>> quite well this time -- no hangs in the gdbserver tests,
>> and I still get the same three testcases failing: mcbreak, mcinfcallRU, and
>> mcinfcallWSRU. Again, the *.out files you asked for are attached.
>
> In the 'no extra options' file, we see that mcbreak fails when gdb invokes a
> function call.
> A break is encountered in t.c at line 112.
> The test executes a few "steps" and a "next"
> and then invokes a call to whoami("first").
> We see in mcbreak.stderr.out.unfiltered.out that the process crashes due to the
> following error:
> ==63162== Process terminating with default action of signal 11 (SIGSEGV):
> dumping core
> ==63162== Bad permissions for mapped region at address 0x10012030
> ==63162== at 0x10000BB0: whoami (t.c:28)
> ==63162== by 0x10001183: main (t.c:112)
> The two other tests are also failing with a similar 'Bad permissions'. The
> address however slightly changes.
>
>
> If we examine more in details mcbreak.stderr.out.unfiltered.out (searching for
> signal 11),
> we see some lines above the following:
> --63758:1:gdbsrv stop_pc 0x10001194 changed to be resume_pc 0x401F4D0: malloc
> (dl-minimal.c:96)
>
> gdbsrv indicates that gdb has asked to change the pc at which the process was
> stopped (somewhere around t.c:112)
> to be another pc (the pc of malloc). This is because gdb must allocate memory in
> order to execute the whoami("first") :
> it first has to allocate memory by "pushing" a call to malloc.
> gdb then has to copy "first" into this memory, and it can then push a call to
> whoami.
>
> The usual technique with which gdb pushes a call is:
> It puts a break instruction somewhere (gdb has some logic depending on the OS
> and/or processor to find a "reasonable" place
> where such a break can be put).
> It then pushes the address where this break has been put as a return address
> (the technique to put the return address
> depends on the ABI. IIRC, on power, this is via a register)
> It puts the needed arguments
> and then invokes the function to call (e.g. malloc, or whoami, or ...) by
> changing the pc to be the pc of this function.
> When this function returns, it returns to the break instruction. The break
> causes gdb to regain control.
> gdb then knows the function call is finished. It recuperates the return value
> (e.g. what malloc has returned)
> and can continue.
>
> From what I can see, gdb uses the address 0x10012030 to put the break.
> It then calls malloc. But this malloc call already fails (gives a signal 11).
> It looks like gdb believes the call has worked
> properly, as it then pushes a call to whoami.
>
> I do not know why the call to malloc fails, or why the subsequent call to
> whoami similarly fails.
>
> What could be done to investigate is to make a simpler test of invoking a
> function call.
> For example (not compiled):
> #include <stdio.h>
> int fun_set_me = 0;
> void fun (int i)
> {
> fun_set_me = i;
> }
> main()
> {
> printf ("hello world\n");
> printf ("fun_set_me %d\n", fun_set_me); <<< put a break here, then when
> encountered, call fun(1234) from gdb
> }
>
> Execute the above test on a f16/power7 valgrind with -v -v -v -d -d -d and gdb
> Do the same on suse.
> The difference in the trace between the two runs might explain what goes wrong
> on f16.
Actually, these tests run OK on F16, but fail on SLES 11 SP1. When running on
F16, the stdout and stderr output (both redirected to one file) show the "hello
world" and the "fun_set_me 1234" print outs. But I don't see either of these
when I run the same thing on SLES 11 SP1. I don't really know what to look for
to see where things go bad. I've attached both files in a tar file. Thanks for
your help!
-Maynard
>
> You might also compare the behaviour of the standard f16 gdbserver to the
> Valgrind gdbserver:
> The standard gdbserver has two options --debug and --remote-debug that will show
> the trace.
> This might also allow to compare a working function call invocation to the
> failing one under valgrind.
> The gdbserver protocol is relatively simple : the query packets and reply
> packets are explained in
> annex D of the gdb user manual. Note however that there will not be a one to one
> mapping between
> the packets exchange. E.g. gdb might send "Z" packets to insert "hardware
> breakpoints" to the
> Valgrind gdbserver (as valgrind gdbserver pretends it can implement hw
> watchpoint), and the standard
> gdbserver might indicate it only accepts breakpoints via changing the instruction
> to a trap.
> But it should be possible to recognise the "push a call sequence".
>
> Hope the above helps ...
>
> Philippe
>
|
|
From: Philippe W. <phi...@sk...> - 2011-10-31 19:06:25
|
> Actually, these tests run OK on F16, but fail on SLES 11 SP1. When running on
> F16, the stdout and stderr output (both redirected to one file) show the "hello
> world" and the "fun_set_me 1234" print outs. But I don't see either of these
> when I run the same thing on SLES 11 SP1. I don't really know what to look for
> to see where things go bad. I've attached both files in a tar file. Thanks for
> your help!

It looks like the inferior function call fails on S11 because gdb puts a break in
a non-executable mapping of vgdb-test.

On F16, to execute the function call, gdb puts a break at address 0x10000498,
which is _start in vgdb-test.
It then puts this address in the lr register, change the program counter to be
the address of fun.
And then everything works fine.
--9441:1:aspacem ( 1) /home/mpj/vgdb-test
...
--9441:1:aspacem 6: file 0010000000-001000ffff 65536 r-xT- d=0xfd01 i=406682 o=0 (1)
--9441:1:aspacem 7: file 0010010000-001001ffff 65536 rw--- d=0xfd01 i=406682 o=0 (1)

I tried the same test on the gcc farm PS3 ppc64/debian with gdb 7.3.1.
It works similarly to the F16: it puts a break in _start (in the r-x mapped
vgdb-test).

On S11, gdb puts a break at address 0x10011010, which is also in vgdb-test.
But it is in the "rw" mapping of vgdb-test, not the r-x mapping:
--53993:1:aspacem ( 1) /home/mpj/temp/vgdb-test
...
--53993:1:aspacem 4: file 0010000000-001000ffff 65536 r-x-- d=0xfd00 i=24314763 o=0 (1)
--53993:1:aspacem 5: file 0010010000-001001ffff 65536 rw--- d=0xfd00 i=24314763 o=0 (1)

I guess that the sig 11 is generated by Valgrind as it refuses to execute
instruction in a non executable mapping.

From what I can see by reading gdb code, gdb derives the _start address purely
from the executable/debugging information, without querying gdbserver.
So, it might be a bug in gdb, rather than a bug in the gdbserver.
Which version of gdb are you using on S11 ?
It might be worth trying with a recent gdb (e.g. the same as what I tried on the
PS3 : gdb 7.3.1).

If it still fails, the best will be to compare the behaviour between the gdb
7.3.1 gdbserver and the Valgrind gdbserver on S11 :
* do the "break + fun call" with gdb 7.3.1 + Valgrind, using -v -v -v -d -d -d
  to have full trace.
* do the same with the gdb 7.3.1 gdbserver+trace, something like:
    gdbserver --debug --remote-debug :1234 vgdb-test
  in another window:
    gdb vgdb-test
    target remote :1234
    ... same as with Valgrind : break then continue then p fun(1234)

This should allow to confirm my reading of the gdb code regarding getting the
_start address.

Let's hope this is a gdb bug, otherwise finding why gdb + Valgrind gdbserver
can't agree on the address of _start will not be a piece of cake.

Philippe

NB: what about the main_pic on F16. Is this more clear ?
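
The mapping check described in this message can be automated. What follows is a minimal Python sketch, not part of Valgrind or gdb. It assumes the aspacem segment lines keep the layout seen in the two excerpts (index, kind, hex start-end, size, permission flags), and the names parse_segments and is_executable are hypothetical helpers introduced only for illustration.

```python
import re

# Segment lines copied from the two traces quoted in this thread.
F16_TRACE = """\
--9441:1:aspacem 6: file 0010000000-001000ffff 65536 r-xT- d=0xfd01 i=406682 o=0 (1)
--9441:1:aspacem 7: file 0010010000-001001ffff 65536 rw--- d=0xfd01 i=406682 o=0 (1)
"""
S11_TRACE = """\
--53993:1:aspacem 4: file 0010000000-001000ffff 65536 r-x-- d=0xfd00 i=24314763 o=0 (1)
--53993:1:aspacem 5: file 0010010000-001001ffff 65536 rw--- d=0xfd00 i=24314763 o=0 (1)
"""

# Assumed line layout: "aspacem <index>: <kind> <start>-<end> <size> <perms> ..."
SEGMENT_RE = re.compile(
    r"aspacem\s+\d+:\s+\w+\s+"
    r"(?P<start>[0-9a-fA-F]+)-(?P<end>[0-9a-fA-F]+)\s+\S+\s+(?P<perms>\S+)"
)


def parse_segments(trace_text):
    """Return (start, end, perms) tuples for every segment line found."""
    segments = []
    for line in trace_text.splitlines():
        m = SEGMENT_RE.search(line)
        if m:
            segments.append((int(m["start"], 16), int(m["end"], 16), m["perms"]))
    return segments


def is_executable(segments, address):
    """True/False if the address lies in a listed segment, None if in none."""
    for start, end, perms in segments:
        if start <= address <= end:
            # The first three flag characters are read as r, w, x.
            return "x" in perms[:3]
    return None


if __name__ == "__main__":
    print(is_executable(parse_segments(F16_TRACE), 0x10000498))
    print(is_executable(parse_segments(S11_TRACE), 0x10011010))
```

Feeding the full -d -d -d trace of a failing run plus the breakpoint address into these helpers gives a quick yes/no on whether the break landed in an executable mapping.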
|
From: Maynard J. <may...@us...> - 2011-10-31 20:01:04
|
On 10/31/2011 2:06 PM, Philippe Waroquiers wrote:
>
>> Actually, these tests run OK on F16, but fail on SLES 11 SP1. When running on
>> F16, the stdout and stderr output (both redirected to one file) show the
>> "hello world" and the "fun_set_me 1234" print outs. But I don't see either of
>> these when I run the same thing on SLES 11 SP1. I don't really know what to
>> look for to see where things go bad. I've attached both files in a tar file.
>> Thanks for your help!
>
> It looks like the inferior function call fails on S11 because gdb puts a break in
> a non-executable mapping of vgdb-test.
>
> On F16, to execute the function call, gdb puts a break at address 0x10000498,
> which is _start in vgdb-test.
> It then puts this address in the lr register, change the program counter to be
> the address of fun.
> And then everything works fine.
> --9441:1:aspacem ( 1) /home/mpj/vgdb-test
> ...
> --9441:1:aspacem 6: file 0010000000-001000ffff 65536 r-xT- d=0xfd01 i=406682 o=0
> (1)
> --9441:1:aspacem 7: file 0010010000-001001ffff 65536 rw--- d=0xfd01 i=406682 o=0
> (1)
>
> I tried the same test on the gcc farm PS3 ppc64/debian with gdb 7.3.1.
> It works similarly to the F16: it puts a break in _start (in the r-x mapped
> vgdb-test).
>
> On S11, gdb puts a break at address 0x10011010, which is also in vgdb-test.
> But it is in the "rw" mapping of vgdb-test, not the r-x mapping:
> --53993:1:aspacem ( 1) /home/mpj/temp/vgdb-test
> ...
> --53993:1:aspacem 4: file 0010000000-001000ffff 65536 r-x-- d=0xfd00 i=24314763
> o=0 (1)
> --53993:1:aspacem 5: file 0010010000-001001ffff 65536 rw--- d=0xfd00 i=24314763
> o=0 (1)
>
> I guess that the sig 11 is generated by Valgrind as it refuses to execute
> instruction in a non executable mapping.
>
> From what I can see by reading gdb code, gdb derives the _start address purely
> from the executable/debugging
> information, without querying gdbserver.
> So, it might be a bug in gdb, rather than a bug in the gdbserver.
> Which version of gdb are you using on S11 ?
> It might be worth trying with a recent gdb (e.g. the same as what I tried on the
> PS3 : gdb 7.3.1).
The version of stock gdb for SLES 11 SP1 is 7.0. I built valgrind and ran the
testcases using a newer toolchain, including a 7.3 gdb, and got different
(better?) results. Of the three failing testcases I mentioned earlier, one of
them passes, and two fail differently, but appear to be very minor failure
symptoms -- stderrB.diff has the following extra line:
+'import site' failed; use -v for traceback
Unfortunately, the gdbserver testsuite did not run to completion, since it hung
on the mcinvokeWS testcase. I just can't spend any more time on this now, but I
really appreciate your help in digging into these issues.
Thanks.
-Maynard
>
> If it still fails, the best will be to compare the behaviour between the gdb
> 7.3.1 gdbserver and
> the Valgrind gdbserver on S11 :
> * do the "break + fun call" with gdb 7.3.1 + Valgrind, using -v -v -v -d -d -d
> to have full trace.
> * do the same with the gdb 7.3.1 gdbserver+trace, something like:
> gdbserver --debug --remote-debug :1234 vgdb-test
> in another window:
> gdb vgdb-test
> target remote :1234
> ... same as with Valgrind : break then continue then p fun(1234)
>
> This should allow to confirm my reading of the gdb code regarding getting the
> _start address.
>
> Let's hope this is a gdb bug, otherwise finding why gdb + Valgrind gdbserver
> can't agree on the address
> of _start will not be a piece of cake.
> Philippe
>
> NB: what about the main_pic on F16. Is this more clear ?
>
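
The trace comparison suggested above (a working run against a failing one) is easier to read once the process id is masked, since it differs in every run. Here is a minimal Python sketch using the standard difflib module. It assumes the "--<pid>:" and "==<pid>==" line prefixes seen in Valgrind output; normalize and diff_traces are hypothetical names.

```python
import difflib
import re

# Leading "--9441:" or "==9441==" prefix; the number is the process id.
PID_RE = re.compile(r"^(--|==)\d+(:|==)")


def normalize(line):
    """Replace the process id in the line prefix with the literal PID."""
    return PID_RE.sub(r"\1PID\2", line.rstrip())


def diff_traces(good_lines, bad_lines):
    """Return unified-diff lines between two traces after masking pids."""
    good = [normalize(line) for line in good_lines]
    bad = [normalize(line) for line in bad_lines]
    return list(difflib.unified_diff(good, bad, "good", "bad", lineterm=""))


if __name__ == "__main__":
    import sys

    # Usage: python difftrace.py working.log failing.log
    with open(sys.argv[1]) as f_good, open(sys.argv[2]) as f_bad:
        for out in diff_traces(f_good.readlines(), f_bad.readlines()):
            print(out)
```

Fields such as the device and inode numbers (d=..., i=...) also differ between machines and are not masked here, so some noise remains when comparing traces from two different systems.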
|
|
From: Julian S. <js...@ac...> - 2011-10-07 12:55:05
|
> Julian said he will check in a 2nd set of exp files for these:
>
> none:
> amd64/bug132918 F11 F9
> amd64/fxtract F11 F9
> amd64/sse4-64 F11 F9
> x86/fxtract F11 F9

Done, r12119.

> ARM result would be good, too.

Working on it.

J
|
From: Julian S. <js...@ac...> - 2011-10-07 13:47:27
|
It's great to see the regtests getting fixed up.
Relatedly .. if you add/remove/rename svn-tracked files
(in this case, the various expected-output files), please be sure
to update EXTRA_DIST in the associated Makefile.am. If that
doesn't happen then either
(a) "make dist" (building the tarball) fails, or
(b) make dist works, but running regression tests from a
tarball fails, because files are missing, or
(c) running regression tests from a tarball falsely causes some tests
on some platforms to fail when the same from-svn-tree would succeed,
because the necessary special-case .exp files are missing.
(c) is particularly scary, because it's invisible most of the time.
I just tried running 'make dist', and it fails with
make[2]: *** No rule to make target `badjump.stderr.exp-s390x', needed by
`distdir'. Stop.
Looking back over the commits of the past few days, I see the following
which added/removed/renamed files, but have no Makefile.am changes:
12107 12103 12098 12097 12092 12091 12079 12077
I'll try to get the EXTRA_DISTs back in sync over the next few days.
J
On Thursday, October 06, 2011, Florian Krohm wrote:
> Here is an update on the regression tests. If you're on cc I'm
> asking for your help.
>
> Not included in the table are those failing testcases for which
> we have immediate fixes pending.
>
> I use these abbreviations for distributions:
> U10 = Ubuntu 10.10
> F.. = Fedora ..
> S11 = SLES 11
> R4 = RHEL 4
> 2.6.37 = Rich Coe's run. I don't know what distribution he's using.
>
> The x86 results are from running on my thinkpad.
>
>
> x86 x86_64 s390x
> --- ------------------------- --------
> memcheck
> origin5-bz2 F15 F14 2.6.37
> overlap F15
> linux/stack_switch F14 F13 F11 2.6.37
> long_namespace_xml F11
> linux/timerfd-syscall F S11
> manuel3 R4
> partial_load_ok R4
> varinfo6 R4
>
> helgrind
> hg05_race2 F15
> pth_barrier3 F13
> tc18_semabuse F S11 R4
> tc20_verifywrap F S11 R4
> tc09_bad_unlock R4
> tc14_laog_dinphils R4
> tc23_bogus_condwait U10
>
> drd
> tc04_free_lock F S11 R4
> tc09_bad_unlock F S11 R4
> tc23_bogus_condwait U10
>
> gdbserver_tests
> mcbreak F14
> mcclean_after_fork F14
> mcinfcallWSRU F14
> mcleak F14
> mcmain_pic F14
> mcvabits F14
> mssnapshot F14 2.6.37
> nlpasssigalrm F14
> nlsigvgdb F14
>
>
> Here are some comments about the non-s390x specific testcases:
>
> memcheck / overlap
> Julian suspects it's related to the changes in handling memcpy,
> memmove that went in a few weeks ago.
> My plan is to filter this out, unless somebody has a better suggestion.
>
> memcheck / origin5-bz2
> There are different answers about the origin of an uninitialized
> value. On some systems it's said to come from dynamically allocated
> memory whereas it seems it ought to come from a client request.
>
> memcheck / linux / stackswitch
> Could be related to the system wrapper for the clone call or to
> our less than ideal handling of stack switches.
>
> memcheck / long_namespace_xml
> Looks like a real bug to me.
>
> helgrind / hg05_race2
> Related to DWARF reading. Julian thinks this will be difficult to fix
> in the current dwarf3 framework
>
> helgrind / pth_barrier3
> There are extra error messages. Needs investigation
>
> helgrind / tc23_bogus_condwait
> drd / tc23_bogus_condwait
> Most likely harmless. A different error is issued on x86 perhaps related
> to 32 bit vs 64 bit.
>
> helgrind / tc08_hbl2
> helgrind / annotate_hbefore
> Intermittent failures and hangs.
> It is suspected that these are due to memory contention issues.
> May require inserting memory fences in the testcase in strategic places
> to get deterministic behaviour.
>
>
> Here is where I'm asking for help:
>
> Tom, could you check in an exp file for this test:
>
> none
> shell F15
>
> I would do it myself but I do not know what shell it is. So I cannot
> pick a meaningful name for the exp file.
>
> On Fedora 14 the gdbserver tests are failing. Perhaps this is as easy
> to fix as
> yum --disablerepo='*' --enablerepo='*-debuginfo' install .....
> ?
>
> Julian said he will check in a 2nd set of exp files for these:
>
> none:
> amd64/bug132918 F11 F9
> amd64/fxtract F11 F9
> amd64/sse4-64 F11 F9
> x86/fxtract F11 F9
>
> ARM result would be good, too.
>
>
> Rich: Could you update your nightly build machinery such that it
> uses the latest version of the nightly script? That would
> show us what kind of system you are running.
>
> Maynard: can you run a ppc regtest and tar up the diffs and send them
> to me. I'd like to see how we're doing there WRT filtering backtrace
> noise.
>
> Thanks,
> Florian
|
|
From: Florian K. <br...@ac...> - 2011-10-08 17:10:27
|
On 10/07/2011 09:46 AM, Julian Seward wrote:
>
> It's great to see the regtests getting fixed up.
>
> Relatedly .. if you add/remove/rename svn-tracked files
> (in this case, the various expected-output files), please be sure
> to update EXTRA_DIST in the associated Makefile.am. If that
> doesn't happen then either
>
> (a) "make dist" (building the tarball) fails, or
> (b) make dist works, but running regression tests from a
>     tarball fails, because files are missing, or
> (c) running regression tests from a tarball falsely causes some tests
>     on some platforms to fail when the same from-svn-tree would succeed,
>     because the necessary special-case .exp files are missing.
>
> (c) is particularly scary, because it's invisible most of the time.
>
> I just tried running 'make dist', and it fails with
> make[2]: *** No rule to make target `badjump.stderr.exp-s390x', needed by
> `distdir'. Stop.
>
> Looking back over the commits of the past few days, I see the following
> which added/removed/renamed files, but have no Makefile.am changes:
>
> 12107 12103 12098 12097 12092 12091 12079 12077
>

I plead guilty. Sorry for that. I keep forgetting about make dist.
I'll try to be more careful.

Florian
|
From: Philippe W. <phi...@sk...> - 2011-10-08 22:22:31
|
> I plead guilty. Sorry for that. I keep forgetting about make dist.
> I'll try to be more careful.

I also forgot to add such files in the past.
So, I wrote a script which checks that all *.exp or *.vgtest or *.gdb
(and similar) are present in the Makefile.am.

Now, I just need to not forget to run the script :).

Maybe this check could be part of make regtest ?
(then nobody can forget anymore)

Philippe
|
From: Bart V. A. <bva...@ac...> - 2011-10-09 08:26:55
|
On Sun, Oct 9, 2011 at 12:22 AM, Philippe Waroquiers <phi...@sk...> wrote:
> I also forgot to add such files in the past.
> So, I wrote a script which checks that all *.exp or *.vgtest or *.gdb
> (and similar) are present in the Makefile.am.

Something like step (4) in drd/Testing.txt ? Note: a second test, namely
whether all files specified in EXTRA_DIST do really exist is also necessary.
See e.g. step (5) in the same file.

Bart.
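
The two checks discussed here (test files missing from EXTRA_DIST, and EXTRA_DIST entries that no longer exist on disk) could look roughly like the Python sketch below. This is not Philippe's script and not the procedure in drd/Testing.txt, neither of which is shown in this thread. The file patterns, the function names, and the simplified Makefile.am parsing (only literal names on EXTRA_DIST lines, with $(...) references skipped) are all assumptions made for illustration.

```python
import fnmatch
import os
import re

# Assumed set of test-related file patterns; adjust to the real tree.
PATTERNS = ("*.exp", "*.exp-*", "*.vgtest", "*.gdb")


def extra_dist_entries(makefile_text):
    """Collect literal file names assigned to EXTRA_DIST (= or +=)."""
    # Join backslash-continued lines into one logical line.
    joined = re.sub(r"\\\r?\n", " ", makefile_text)
    entries = set()
    for line in joined.splitlines():
        m = re.match(r"\s*EXTRA_DIST\s*\+?=(.*)", line)
        if m:
            # Variable references such as $(FOO) are skipped, not expanded.
            entries.update(w for w in m.group(1).split() if not w.startswith("$"))
    return entries


def check_dir(directory):
    """Return (files not listed in EXTRA_DIST, listed files not on disk)."""
    with open(os.path.join(directory, "Makefile.am")) as f:
        listed = extra_dist_entries(f.read())
    on_disk = {
        name
        for name in os.listdir(directory)
        if any(fnmatch.fnmatch(name, pattern) for pattern in PATTERNS)
    }
    unlisted = sorted(on_disk - listed)
    stale = sorted(
        entry for entry in listed
        if not os.path.exists(os.path.join(directory, entry))
    )
    return unlisted, stale


if __name__ == "__main__":
    import sys

    unlisted, stale = check_dir(sys.argv[1])
    for name in unlisted:
        print("not in EXTRA_DIST:", name)
    for name in stale:
        print("listed but missing:", name)
    sys.exit(1 if unlisted or stale else 0)
```

Exiting non-zero on any finding would let such a check be hooked into a make target so that it fails loudly instead of depending on someone remembering to run it.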
|
From: Florian K. <br...@ac...> - 2011-10-08 22:59:13
|
On 10/08/2011 06:22 PM, Philippe Waroquiers wrote:
> So, I wrote a script which checks that all *.exp or *.vgtest or *.gdb
> (and similar) are present in the Makefile.am.
>
> Now, I just need to not forget to run the script :).
>
> Maybe this check could be part of make regtest ?
> (then nobody can forget anymore)
>

I for one would welcome the integration of your script in make regtest.
Does it work both ways and remind me if a no longer existent file
is still listed in Makefile.am ? Which typically will happen when
renaming a file.

Florian
|
From: Julian S. <js...@ac...> - 2011-10-10 21:02:28
|
On Sunday, October 09, 2011, Florian Krohm wrote:
> On 10/08/2011 06:22 PM, Philippe Waroquiers wrote:
> > So, I wrote a script which checks that all *.exp or *.vgtest or *.gdb
> > (and similar) are present in the Makefile.am.
> >
> > Now, I just need to not forget to run the script :).
> >
> > Maybe this check could be part of make regtest ?
> > (then nobody can forget anymore)
>
> I for one would welcome the integration of your script in make regtest.

Yes, me too. Or into make dist. Philippe, can you show the script?

J