|
From: Robert W. <rj...@du...> - 2004-09-03 02:36:21
|
Hi all,

Do we have a good idea why there are regression test failures on certain
platforms?

Regards,
Robert.
--
Robert Walsh
Amalgamated Durables, Inc. - "We don't make the things you buy."
Email: rj...@du...
|
|
From: Robert W. <rj...@du...> - 2005-05-24 00:40:41
|
I've been looking at the x86_64 failures on an FC3 box. The nice thing
is that they're down to only a handful now. Here's what I've seen so
far:
1. memcheck/tests/brk:
I can make this failure go away with this patch:
--- brk.c (revision 3790)
+++ brk.c (working copy)
@@ -26,7 +26,7 @@
// vals[9] = EOL;
vals[8] = EOL;
- for (i = 0; (void*)0xffffffff != vals[i]; i++) {
+ for (i = 0; EOL != vals[i]; i++) {
res = (void*)syscall(__NR_brk, vals[i]);
}
This is a simple 8-byte -1 versus 4-byte -1 problem.
2. memcheck/tests/error_count:
Strangely enough, Valgrind doesn't seem to be getting the count of
leaked memory correct - it always returns 0. I've played around with
this a bit and I think I know what is happening. The code in
mac_shared.c that handles this user request is doing this:
UWord** argp = (UWord**)arg;
// MAC_(bytes_leaked) et al were set by the last leak check (or zero
// if no prior leak checks performed).
*argp[1] = MAC_(bytes_leaked) + MAC_(bytes_indirect);
*argp[2] = MAC_(bytes_dubious);
*argp[3] = MAC_(bytes_reachable);
*argp[4] = MAC_(bytes_suppressed);
When I put VG_(printf)'s after each one of these assigns to print out
the value of '*argp[1]', it printed out the correct number 3 times and
after the last line, had magically reset this to zero.
It turns out that UWord is 8 bytes on a 64-bit machine, whereas it's 4
bytes on a 32-bit machine. But the test program uses ints as its
values to store these in. The "n_suppressed" location happens to be 4
bytes before the "n_reachable" on the stack, so the last 8-byte write
craps on it. I think the correct solution here is to change the argp from a
UWord ** to a UInt **:
--- mac_shared.c (revision 3790)
+++ mac_shared.c (working copy)
@@ -926,7 +926,7 @@
switch (arg[0]) {
case VG_USERREQ__COUNT_LEAKS: { /* count leaked bytes */
- UWord** argp = (UWord**)arg;
+ UInt** argp = (UInt**)arg;
// MAC_(bytes_leaked) et al were set by the last leak check (or zero
// if no prior leak checks performed).
*argp[1] = MAC_(bytes_leaked) + MAC_(bytes_indirect);
When I do this, the problem goes away.
I'll look at the other failures later.
Regards,
Robert.
|
|
From: Nicholas N. <nj...@cs...> - 2005-05-24 03:16:35
|
On Mon, 23 May 2005, Robert Walsh wrote:
> 1. memcheck/tests/brk:
>
> I can make this failure go away with this patch:
>
> --- brk.c (revision 3790)
> +++ brk.c (working copy)
> @@ -26,7 +26,7 @@
> // vals[9] = EOL;
> vals[8] = EOL;
>
> - for (i = 0; (void*)0xffffffff != vals[i]; i++) {
> + for (i = 0; EOL != vals[i]; i++) {
> res = (void*)syscall(__NR_brk, vals[i]);
> }
>
>
> This is a simple 8-byte -1 versus 4-byte -1 problem.
Looks good.
> 2. memcheck/tests/error_count:
>
> It turns out that UWord is 8 bytes on a 64-bit machine, whereas it's 4
> bytes on a 32-bit machine. But the test program uses ints as it's
> values to store these in. The "n_suppressed" location happens to be 4
> bytes before the "n_reachable" on the stack, the last 8-byte write craps
> on it. I think the correct solution here is to change the argp from a
> UWord ** to a UInt **:
>
> --- mac_shared.c (revision 3790)
> +++ mac_shared.c (working copy)
> @@ -926,7 +926,7 @@
>
> switch (arg[0]) {
> case VG_USERREQ__COUNT_LEAKS: { /* count leaked bytes */
> - UWord** argp = (UWord**)arg;
> + UInt** argp = (UInt**)arg;
> // MAC_(bytes_leaked) et al were set by the last leak check (or zero
> // if no prior leak checks performed).
> *argp[1] = MAC_(bytes_leaked) + MAC_(bytes_indirect);
>
> When I do this, the problem goes away.
Or the test could pass in 8-byte integers. That would be more consistent
with the other requests -- generally they take word-sized arguments.
N
|
|
From: Robert W. <rj...@du...> - 2005-05-24 04:14:41
|
> Or the test could pass in 8-byte integers. That would be more consistent
> with the other requests -- generally they take word-sized arguments.

Right.

Sigh. I appear to have forgotten my svn password. If someone could
apply the appropriate patches, I'd be grateful... :-)

Regards,
Robert.
--
Robert Walsh
Amalgamated Durables, Inc. - "We don't make the things you buy."
Email: rj...@du...
|
|
From: Dirk M. <dm...@gm...> - 2005-05-24 14:41:51
|
On Tuesday 24 May 2005 06:14, Robert Walsh wrote:
> Sigh. I appear to have forgotten my svn password. If someone could
> apply the appropriate patches, I'd be grateful... :-)

submit a new one crypt'ed to sys...@kd...
|
|
From: Tom H. <to...@co...> - 2005-05-24 07:22:53
|
In message <111...@ph...>
Robert Walsh <rj...@du...> wrote:
> Sigh. I appear to have forgotten my svn password. If someone could
> apply the appropriate patches, I'd be grateful... :-)
Done.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Robert W. <rj...@du...> - 2005-05-24 16:33:59
|
> > Sigh. I appear to have forgotten my svn password. If someone could
> > apply the appropriate patches, I'd be grateful... :-)
>
> Done.

Thanks, Tom.

Regards,
Robert.
--
Robert Walsh
Amalgamated Durables, Inc. - "We don't make the things you buy."
Email: rj...@du...
|
|
From: Florian K. <br...@ac...> - 2011-10-01 20:21:22
|
While working through issues in the s390 port I've been looking
at our regressions. The number of failing testcases for a given
nightly run is not particularly high. But it isn't zero and it
hasn't been zero for a very long time. At least for a year,
quite possibly longer. So regression tests get little attention.
Which kind of defeats their purpose.
Here is the status quo taken from the nightly build 2 nights ago.
In the table below the 1st column lists the failing testcase.
Then there are 3 "columns" for the various architectures: x86,
x86_64 and s390x. As we don't have nightly builds on ppc and darwin
those don't show up.
I use these abbreviations for distributions:
U10 = Ubuntu 10.10
F.. = Fedora ..
S11 = SLES 11
R4 = RHEL 4
2.6.37 = Rich Coe's run. I don't know what distribution he's using.
The x86 results are from running on my thinkpad.
x86 x86_64 s390x
--- ------------------------- --------
memcheck
err_disable3 U10 F15 F14 F13 F11 F9 2.6.37 F S11 R4
err_disable4 U10 F15 F14 F13 F11 F9 2.6.37 F S11 R4
origin5-bz2 U10 F15 F14 2.6.37
overlap F15
linux/stack_switch F14 F13 F11 2.6.37
long_namespace_xml F11
linux/timerfd-syscall F S11
manuel3 R4
partial_load_ok R4
varinfo6 R4
none
shell F15
amd64/bug132918 F11 F9
amd64/fxtract F11 F9
amd64/sse4-64 F11 F9
x86/fxtract F11 F9
helgrind
hg05_race2 F15
tc06_two_races_xml U10 F15 F14 F13 F11 F9 2.6.37 F S11 R4
pth_barrier3 F13 R4
tc18_semabuse F S11 R4
tc20_verifywrap F S11 R4
pth_barrier2 R4
tc09_bad_unlock R4
tc14_laog_dinphils R4
tc22_exit_w_lock U10
tc23_bogus_condwait U10
drd
tc04_free_lock F S11 R4
tc09_bad_unlock F S11 R4
tc23_bogus_condwait U10
exp-sgcheck
bad_percentify F15 F14 2.6.37
stackerr U10
gdbserver_tests
mcbreak F14 S11
mcclean_after_fork F14 S11
mcinfcallWSRU F14
mcleak F14 S11
mcmain_pic F14 S11
mcvabits F14 S11
mssnapshot F14 2.6.37 S11
nlpasssigalrm F14 S11
nlsigvgdb F14 S11
I've investigated some testcases (other than the s390x specific ones)
and there are a few that can be fixed easily:
Low hanging fruit
-----------------
none: shell
- needs new .exp file for that particular shell
memcheck: err_disable3
- backtrace noise
- use new filter (adapt helgrind/tests/filter_helgrind)
memcheck: overlap
- backtrace noise
- use new filter (adapt helgrind/tests/filter_helgrind)
memcheck: linux/stack_switch
- was fixed in r12033 for s390x
- perhaps that patch is applicable for others
exp-sgcheck: bad_percentify
- wrong line number
- most likely a GCC issue
- hypothesis: rewriting the testcase to use an explicit
assignment and eliminating the initialization should help
helgrind: tc06_two_races_xml
- backtrace noise
- I have a patch forthcoming...
none: amd64/bug132918
none: amd64/fxtract
none: amd64/sse4-64
none: x86/fxtract
- Not sure it's a low hanging fruit, but.....
- The testcases fail in the same way: nan vs -nan
- Hypothesis: fixing one will fix the others.
gdbserver_tests
- Not sure it's a low hanging fruit, but.....
- The testcases fail in the same way (although differently for F14
and S11)
- Hypothesis: fixing one will fix the others.
Real bugs
---------
memcheck: err_disable4
- backtrace noise
- errors are suppressed that shouldn't
- happens across the board (except on Fedora13)
Mysteries
---------
Some testcases fail intermittently although "nothing has changed"...
I've observed this for helgrind's hg05_race2 and tc14_laog_dinphils.
When run 5 times in a row the latter fails 3 times and passes twice.
Increasing the sleep duration to 2 seconds makes the testcase run
into a deadlock (sometimes). Not sure what to do about it...
If somebody has insight on this... I'm all ears.
I will adapt the filter_helgrind script for memcheck which then should
fix any backtrace related regressions there.
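For reference, the normalization such a filter performs can be sketched like this (hypothetical patterns; the real filter_helgrind uses more, and different, rules):

```shell
# Normalize the volatile parts of a valgrind backtrace frame:
# code addresses and source line numbers become "..." placeholders,
# so the output is stable across builds and distributions.
filter_backtrace() {
    sed -e 's/0x[0-9A-Fa-f][0-9A-Fa-f]*/0x......../g' \
        -e 's/\([A-Za-z0-9_.][A-Za-z0-9_.]*:\)[0-9][0-9]*/\1.../'
}

# Example: a raw frame in, a stable frame out.
printf 'at 0x4005F4: main (foo.c:10)\n' | filter_backtrace
```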
Florian
|
|
From: Philippe W. <phi...@sk...> - 2011-10-02 07:57:54
|
> gdbserver_tests
> - Not sure it's a low hanging fruit, but.....
> - The testcases fail in the same way (although differently for F14
>   and S11)
> - Hypothesis: fixing one will fix the others.

For the F14 failures, one possible line of investigation is to use the
standard gdbserver on one of the gdbserver executables, put a
breakpoint at the same place where the valgrind gdbserver stops the
executable, continue, and see if the same errors are shown. If so, the
problem is linked to the platform setup. Probably worth trying the
'yum' commands as indicated.

For the S11 failure: I saw similar failures on other platforms, and
installing the debug info fixed these. So this is the first thing to
try. Same for the 2.6.37 failure.

Philippe
|
|
From: Florian K. <br...@ac...> - 2011-10-05 02:45:28
|
On 10/02/2011 03:57 AM, Philippe Waroquiers wrote:
>> gdbserver_tests
>> - Not sure it's a low hanging fruit, but.....
>> - The testcases fail in the same way (although differently for F14
>> and S11)
>> - Hypothesis: fixing one will fix the others.
>
>
> For the S11 failure: I saw similar failures on other platforms,
> and installing the debug info fixed these. So, this is the first
> thing to try.
Right. IIRC Christian said that he did install the debug info but the
tests are still failing. Philippe, was there anything else that needed
to be done other than installing them? Perhaps a problem with a search path?
Florian
|
|
From: Julian S. <js...@ac...> - 2011-10-03 14:23:07
|
Excellent work, first of all.

> quite possibly longer. So regression tests get little attention.
> Which kind of defeats their purpose.

Yes, exactly right. It would be great to move towards a scenario where
the tests were robust enough that they could reliably get zero
failures on important platforms. IMO the biggest single problem is
backtrace noise; perhaps your Helgrind filtering scheme is the way to
go?

> Here is the status quo taken from the nightly build 2 nights ago.

That's a useful table; unfortunately assembled by hand (I assume) so
it isn't easily automated. (yes?)

Need to get an arm-linux regtester running too.

> memcheck
> err_disable3 U10 F15 F14 F13 F11 F9 2.6.37 F S11 R4
> err_disable4 U10 F15 F14 F13 F11 F9 2.6.37 F S11 R4

I'll chase these. disable3 doesn't fail on U 10.04 (64-bit), but it
looks pretty broken from the point of view of this summary.

> origin5-bz2 U10 F15 F14 2.6.37

Two possible problems here: backtrace noise, but also there's
something deeper wrong -- one of the origins is, on some platforms,
listed as coming from a wrong (or at least different) place.

> overlap F15

Might be due to the memcpy vs memmove nightmare we struggled through a
couple of months back; I don't know.

> linux/stack_switch F14 F13 F11 2.6.37

Our stack-switch handling is rubbish; not sure if this test is ever
going to work sanely.

---

I'll look at disable3 and disable4 and these:

> none: amd64/bug132918
> none: amd64/fxtract
> none: amd64/sse4-64
> none: x86/fxtract
> - Not sure it's a low hanging fruit, but.....
> - The testcases fail in the same way: nan vs -nan
> - Hypothesis: fixing one will fix the others.

I think there was a change in the way glibc prints negative NaNs, at
some point in the past, and that's (possibly) the cause of these.

> Mysteries
> ---------
> Some testcase fail intermittendly although "nothing has changed"...
> I've observed this for helgrind's hg05_race2 and tc14_laog_dinphils.
> When run 5 times in a row the latter fails 3 times and passes twice.
> Increasing the sleep duration to 2 seconds, makes the testcase run
> into a deadlock (sometimes). Note sure what to do about it...

tc14 is the Dining Philosophers with a braindead fork-allocation
algorithm and so is expected to deadlock from time to time :-) One
solution is to have yet another thread, which waits (eg) 3 seconds and
then nukes the other threads, so as to get past any deadlock they may
have encountered. (as in rescue_me in tc23_bogus_condwait)

Or .. simpler .. just have the main thread nuke the child threads
rather than pthread_joining them. Not sure how to do this, maybe with
pthread_cancel.

re hg05, what is the nature of the intermittent failure? It always
works OK for me.

> I will adapt the filter_helgrind script for memcheck which then should
> fix any backtrace related regressions there.

Cool.

Overall, are you happy to continue driving this along? I am happy to
fix/investigate failures if given directions about what to look at in
what order.

J
|
|
From: Florian K. <br...@ac...> - 2011-10-03 16:54:16
|
On 10/03/2011 10:22 AM, Julian Seward wrote:
> IMO the biggest single problem is backtrace noise; perhaps your
> Helgrind filtering scheme is the way to go?

Yes. It has worked well there and I'm adapting it for memcheck. So
don't bother with fixing backtrace noise related stuff there. The
patch should be ready by tonight before the x86_64 builds kick off.

> That's a useful table; unfortunately assembled by hand (I assume)
> so it isn't easily automated. (yes?)

Yes. By hand. I didn't bother automating it because soonish we should
have runs with zero failures :) Or so I hope.

>> linux/stack_switch F14 F13 F11 2.6.37
>
> Our stack-switch handling is rubbish; not sure if this test is
> ever going to work sanely.

The failures on Fedora look exactly like those we had on s390. There
the problem was in the syscall wrapper. Might be worth looking at how
it was fixed there.

> I'll look at disable3 and disable4 and these:
>> none: amd64/bug132918
>> none: amd64/fxtract
>> none: amd64/sse4-64
>> none: x86/fxtract

Great!

>> - Not sure it's a low hanging fruit, but.....
>> - The testcases fail in the same way: nan vs -nan
>> - Hypothesis: fixing one will fix the others.
>
> I think there was a change in the way glibc prints negative NaNs,
> at some point in the past, and that's (possibly) the cause of these.

Sounds plausible. Perhaps the testcase can be rewritten to avoid it?
Not sure.

> tc14 is the Dining Philosophers with a braindead fork-allocation
> algorithm and so is expected to deadlock from time to time :-) One
> solution is to have yet another thread, which waits (eg) 3 seconds
> and then nukes the other threads, so as to get past any deadlock
> they may have encountered. (as in rescue_me in tc23_bogus_condwait)
>
> Or .. simpler .. just have the main thread nuke the child threads
> rather than pthread_joining them. Not sure how to do this, maybe
> with pthread_cancel.

OK. I'll leave that as the last testcase to fix :)

> re hg05, what is the nature of the intermittent failure? It always
> works OK for me.

It works for me, too. I noticed it in the Fedora 15 nightly run.
It passed in the 09/30 build and failed the following night like so:

=================================================
./valgrind-new/helgrind/tests/hg05_race2.stderr.diff
=================================================
--- hg05_race2.stderr.exp 2011-10-02 03:16:05.865487001 +0100
+++ hg05_race2.stderr.out 2011-10-02 03:24:46.251181409 +0100
@@ -1,4 +1,6 @@

+warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
+warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
---Thread-Announcement------------------------------------------

Thread #x was created
@@ -32,6 +34,8 @@
Location 0x........ is 0 bytes inside foo.poot[5].plop[11],
declared at hg05_race2.c:24, in frame #x of thread x

+warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
+warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
----------------------------------------------------------------

Looking at svn log... what went in on those days seems unrelated.

What would be really good is to modify the nightly scripts so they
show what valgrind revision is actually tested and also some detail
about the environment. See the z900 nightly build for an example.

> Overall, are you happy to continue driving this along? I am happy
> to fix/investigate failures if given directions about what to look
> at in what order.

Yes, I will move this along. Here are my next steps:

1) reduce backtrace noise for memcheck (adapt filter_helgrind)
2) rename .exp files where the testcase is actually failing, e.g.
   memcheck/badjump. I'll use exp-kfail to indicate the failure.
3) origin5-bz2 (some functions need the noinline attribute; the lack
   thereof is the cause of the noise)
4) Get rid of platform-specific .exp files where possible.
5) Fix s390x specific things

I'll send failing testcases that identify functional problems your
way.

Florian
|
|
From: Julian S. <js...@ac...> - 2011-10-03 18:32:28
|
> >> none: amd64/bug132918
> >> none: amd64/fxtract
> >> none: amd64/sse4-64
> >> none: x86/fxtract

Yeah, these differ because glibc-2.8 prints both positive and negative
nans as "nan", whereas more recent glibc prints "-nan" in the negative
case. IOW the test programs themselves behave differently. Sigh.

I don't fancy rewriting the zillions of printf statements in them to
do some kind of abs() that applies only to nans, in order to get
consistent output. Plus that would reduce the testing usefulness of
the programs.

Maybe have two expected output files, one for old glibc, one for
newer? Lame, I know.

J
|
|
From: Julian S. <js...@ac...> - 2011-10-03 17:30:51
|
> > I'll look at disable3 and disable4 and these:

disable3 is backtrace noise only, I think. I tried to fix it in r12087
but wasn't completely successful; but I guess you have that under
control now, or soon will.

disable4 didn't run to completion on 32 bit targets. It should do now
(post r12086).

> > re hg05, what is the nature of the intermittent failure? It always
> > works OK for me.
>
> It works for me, too. I noticed it in the Fedora 15 nightly run.
> It passed in the 09/30 build and failed the following night like so:
>
> =================================================
> ./valgrind-new/helgrind/tests/hg05_race2.stderr.diff
> =================================================
> --- hg05_race2.stderr.exp 2011-10-02 03:16:05.865487001 +0100
> +++ hg05_race2.stderr.out 2011-10-02 03:24:46.251181409 +0100
> @@ -1,4 +1,6 @@
>
> +warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
> +warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........

Oh, right. So this is the Dwarf3 var info reader complaining that it
doesn't understand something created by the gcc 4.6.x in F15.

> Looking at svn log... what went in on those days seems unrelated.
>
> What would be really good is to modify the nightly scripts so they show
> what valgrind revision is actually tested and also some detail about the
> environment. See the z900 nightly build for an example.

Yes, I noticed the s390x runs have much better info at the top.
What needs to happen to make all the runs have the same info?
Is it something that can be done by editing the scripts in
trunk/nightly/ ?

> I'll send failing testcases that identify functional problems your way.

Cool.

J
|
|
From: Florian K. <br...@ac...> - 2011-10-03 20:21:16
|
On 10/03/2011 01:30 PM, Julian Seward wrote:
> disable4 didn't run to completion on 32 bit targets. It should
> do now (post r12086).

Thanks. It also passes on my old s390x box.

>> =================================================
>> ./valgrind-new/helgrind/tests/hg05_race2.stderr.diff
>> =================================================
>> --- hg05_race2.stderr.exp 2011-10-02 03:16:05.865487001 +0100
>> +++ hg05_race2.stderr.out 2011-10-02 03:24:46.251181409 +0100
>> @@ -1,4 +1,6 @@
>>
>> +warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
>> +warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
>
> Oh, right. So this is the Dwarf3 var info reader complaining that it
> doesn't understand something created by the gcc 4.6.x in F15.

Yes. What is curious is that the test passed on 09/30 and failed on
10/01. Assuming the compiler and libraries did not change on that
system and svn has no changes in that area, it should pass.

>> What would be really good is to modify the nightly scripts so they show
>> what valgrind revision is actually tested and also some detail about the
>> environment. See the z900 nightly build for an example.
>
> Yes, I noticed the s390x runs have much better info at the top.
> What needs to happen to make all the runs have the same info?
> Is it something that can be done by editing the scripts in
> trunk/nightly/ ?

Integration into bin/nightly shouldn't be a problem. Currently, this
is done in nightly/conf/z900.sendmail. Would these queries work on
Darwin:

glibc_version="`/lib/libc.so.* | head -1`"
uname_stuff="`uname -mrs`"
vendor_stuff="`cat /etc/issue.net | head -1`"

Florian
|
|
From: Julian S. <js...@ac...> - 2011-10-05 07:51:52
|
> >> +warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
> >> +warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
> >
> > Oh, right. So this is the Dwarf3 var info reader complaining that it
> > doesn't understand something created by the gcc 4.6.x in F15.
>
> Yes. What is curious is that the test passed on 09/30 and failed on
> 10/01?

No idea.

> Integration into bin/nightly shouldn't be a problem.
> Currently, this is done in nightly/conf/z900.sendmail
> Would these queries work on Darwin:
>
> glibc_version="`/lib/libc.so.* | head -1`"
> uname_stuff="`uname -mrs`"
> vendor_stuff="`cat /etc/issue.net | head -1`"

No chance :-( libc isn't at /lib/libc* (it's in different directories
on OSX 10.6 and 10.7, for maximum confusion value) and it's certainly
not runnable directly:

$ /usr/lib/system/libsystem_c.dylib
-bash: /usr/lib/system/libsystem_c.dylib: cannot execute binary file

uname -mrs is OK, producing:

Darwin 11.1.0 i386

and there's an /etc directory, but not /etc/issue*.

J
|
|
From: Julian S. <js...@ac...> - 2011-10-05 11:07:23
|
> helgrind
> hg05_race2 F15

This fails on F15 because the dwarf var-loc reader complains thusly:

--18932-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93

0x93 is DW_OP_piece, which I think the chances of implementing are
minimal, in the current dwarf3 framework at least. Is it possible to
filter this line out? The test does produce the right results
otherwise, since that bit of dwarf isn't necessary to produce the test
outputs, I suppose. So .. can we just filter the line out, for now?

Also wrt Helgrind, on Sandy Bridge I get hangs on annotate_hbefore
and tc08_hbl2. These un-hang when I (eg) scroll around in emacs on
the same machine. I assume therefore they are memory coherence issues.

Philippe, didn't you try out some memory fences in these or similar
tests? I think we need to put some fences in these two to make them
run reliably.

J
|
|
From: Philippe W. <phi...@sk...> - 2011-10-05 19:21:32
|
> Also wrt Helgrind, on Sandy Bridge I get hangs on annotate_hbefore
> and tc08_hbl2. These un-hang when I (eg) scroll around in emacs on
> the same machine. I assume therefore they are memory coherence issues.
>
> Philippe, didn't you try out some memory fences in these or similar
> tests? I think we need to put some fences in these two to make them
> run reliably.

Florian did put a fence in the s390x version. I understood that it
helped but did not solve the problem reliably:
https://bugs.kde.org/show_bug.cgi?id=268623

I also tried to put a fence on amd64, but with no improvement in the
test. (As I have close to zero knowledge of this subject, I might have
done a wrong fence; but as the problem persists on both s390x and
amd64, there is maybe a need for a fence -- though probably not one
sufficient to avoid the hang.)

Philippe
|
|
From: Philippe W. <phi...@sk...> - 2011-10-05 19:13:58
|
> Right. IIRC Christian said that he did install the debug info but the
> tests are still failing. Philippe, was there anything else that needed
> to be done other than installing them? Perhaps a problem with a search path?

I do not remember having to do anything else than install the debug
info.

Are the same messages produced by gdb when debugging the process
directly and/or when using the gdbserver part of the gdb distribution?

Philippe
|
|
From: Christian B. <bor...@de...> - 2011-10-05 20:53:15
|
On 05/10/11 21:13, Philippe Waroquiers wrote:
>> Right. IIRC Christian said that he did install the debug info but the
>> tests are still failing. Philippe, was there anything else that needed
>> to be done other than installing them? Perhaps a problem with a search path?
> I do not remember having to do anything else than install the debug info.
>
> Are the same messages produced by gdb when debugging directly
> the process and/or when using the gdbserver part of the gdb distribution ?
>
> Philippe
>
Seems that the debug info on SLES11 is missing some parts of ld.so. The
errors now look like:
[...]
+1 rtld.c: No such file or directory.
[...]
This patch fixes the problem:
--- gdbserver_tests/filter_vgdb (revision 12106)
+++ gdbserver_tests/filter_vgdb (working copy)
@@ -14,4 +14,7 @@
sed -e 's/\(relaying data between gdb and process \)[0-9][0-9]*/\1..../' \
-e 's/\(sending command .* to pid \)[0-9][0-9]*/\1..../' \
-e '/Cannot access memory at address 0x......../d' \
- -e '/^[1-9][0-9]* \.\.\/sysdeps\/powerpc\/powerpc32\/dl-start\.S: No such file or directory\./d'
+ -e '/^[1-9][0-9]* \.\.\/sysdeps\/powerpc\/powerpc32\/dl-start\.S: No such file or directory\./d' |
+
+# filter some debuginfo problems with ld.so and SLES11
+sed -e '/^1 rtld.c: No such file or directory\./d'
|
|
From: Christian B. <bor...@de...> - 2011-10-06 15:12:14
|
> --- gdbserver_tests/filter_vgdb (revision 12106)
> +++ gdbserver_tests/filter_vgdb (working copy)
> @@ -14,4 +14,7 @@
>  sed -e 's/\(relaying data between gdb and process \)[0-9][0-9]*/\1..../' \
>      -e 's/\(sending command .* to pid \)[0-9][0-9]*/\1..../' \
>      -e '/Cannot access memory at address 0x......../d' \
> -    -e '/^[1-9][0-9]* \.\.\/sysdeps\/powerpc\/powerpc32\/dl-start\.S: No such file or directory\./d'
> +    -e '/^[1-9][0-9]* \.\.\/sysdeps\/powerpc\/powerpc32\/dl-start\.S: No such file or directory\./d' |
> +
> +# filter some debuginfo problems with ld.so and SLES11
> +sed -e '/^1 rtld.c: No such file or directory\./d'

Any chance to apply that, to see if that works tonight?

Christian
|
|
From: Florian K. <br...@ac...> - 2011-10-08 12:47:34
|
On 10/06/2011 11:11 AM, Christian Borntraeger wrote:
>> +# filter some debuginfo problems with ld.so and SLES11
>> +sed -e '/^1 rtld.c: No such file or directory\./d'
>
> Any chance to apply that, to see if that works tonight?

Done in r12123. Sorry for the delay. I'm travelling and my thinkpad
died....

Florian
|
|
From: Tom H. <th...@cy...> - 2004-09-03 06:15:22
|
In message <109...@dr...>
Robert Walsh <rj...@du...> wrote:
> Do we have a good idea why there are regression test failures on certain
> platforms?
There's no single cause, equally none of the outstanding failures
is very easy to fix - the problems are generally in the tests rather
than in valgrind.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|