|
From: Philippe W. <phi...@sk...> - 2016-09-24 12:04:45
|
Mark/Ivo,

I am (now?) seeing random failures of helgrind|drd/tests/bar_bad*
(also now seeing failures in nightly builds).

I have encountered such failures on amd64/debian 8, and on ppc64/gcc110.

I think (not sure) this was working some days/weeks ago.

Any idea?

Philippe |
|
From: Mark W. <mj...@re...> - 2016-09-24 20:12:27
|
On Sat, Sep 24, 2016 at 02:04:34PM +0200, Philippe Waroquiers wrote:
> Mark/Ivo,
>
> I am (now?) seeing random failures of helgrind|drd/tests/bar_bad*
> (also now seeing failures in nightly builds).
>
> I have encountered such failures on amd64/debian 8, and on ppc64/gcc110.
>
> I think (not sure) this was working some days/weeks ago.
>
> Any idea?

No, not really. I believe this test never really was deterministically good. But the recent changes to it definitely made it worse.

The change committed makes the testcase no longer hang against newer glibc pthread_barrier implementations by adding a timer that triggers after a short delay, canceling the barrier if it looks like the test is hanging. Unfortunately this seems to have made the testcase much more nondeterministic than before. It succeeds more often than it fails for me (both on old and new glibc), but it definitely does fail randomly.

See also the (still open) bug report:
https://bugs.kde.org/show_bug.cgi?id=358213 |
|
From: Petar J. <mip...@gm...> - 2016-11-10 17:17:17
|
On Sat, Sep 24, 2016 at 2:04 PM, Philippe Waroquiers <phi...@sk...> wrote:
> Mark/Ivo,
>
> I am (now?) seeing random failures of helgrind|drd/tests/bar_bad*
> (also now seeing failures in nightly builds).
>
> I have encountered such failures on amd64/debian 8, and on ppc64/gcc110.
>

Hi Philippe,

Can you check if the following patch helps with those failures?

https://bugsfiles.kde.org/attachment.cgi?id=102144

Regards,
Petar |
|
From: Philippe W. <phi...@sk...> - 2016-11-11 10:04:48
|
On Thu, 2016-11-10 at 18:17 +0100, Petar Jovanovic wrote:
> On Sat, Sep 24, 2016 at 2:04 PM, Philippe Waroquiers
> <phi...@sk...> wrote:
> > Mark/Ivo,
> >
> > I am (now?) seeing random failures of helgrind|drd/tests/bar_bad*
> > (also now seeing failures in nightly builds).
> >
> > I have encountered such failures on amd64/debian 8, and on ppc64/gcc110.
> >
> Hi Philippe,
>
> Can you check if the following patch helps with those failures?
>
> https://bugsfiles.kde.org/attachment.cgi?id=102144
>

Hello Petar,

I tried, but this does not (fully?) solve the problem. Based on a few trials, I think the effect is:
* helgrind test seems to succeed (more?) systematically, but takes a lot
  longer (around one minute, burning a lot of CPU, against about 10
  seconds unpatched).
* drd tests seem to fail (more?) systematically.

Thanks for looking at this. It looks like this test is a nightmare (Mark already spent significant time trying to make it deterministic and work on all/most distros).

Philippe |
|
From: Petar J. <mip...@gm...> - 2016-11-15 18:30:20
|
Hi Philippe,

> * helgrind test seems to succeed (more?) systematically, but takes a lot
> longer (around one minute, burning a lot of CPU, against about 10
> seconds unpatched).

This is a bit unusual. I have seen awfully long executions with the current trunk (i.e. without any patches) on my multicore x86 server (Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz).

$ time perl tests/vg_regtest helgrind/tests/bar_bad.vgtest
bar_bad: valgrind -q ./bar_bad

== 1 test, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==

real 19m57.592s
user 20m5.572s
sys 0m1.680s

The previous patch was wrong, at least with the current DRD behaviour.
Would the following one work for you?

https://bugsfiles.kde.org/attachment.cgi?id=102247

bar_bad fails consistently on some systems and it would be good if we could find some solution for it.

Regards,
Petar |
|
From: Philippe W. <phi...@sk...> - 2016-11-17 22:09:28
|
On Tue, 2016-11-15 at 19:30 +0100, Petar Jovanovic wrote:
> Hi Philippe,
>
> > * helgrind test seems to succeed (more?) systematically, but takes a
> lot
> > longer (around one minute, burning a lot of CPU, against about 10
> > seconds unpatched).
>
> This is a bit unusual. I have seen awfully long executions with the
> current
> trunk (i.e. without any patches) on my multicore x86 server (
> Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz).
>
> $ time perl tests/vg_regtest helgrind/tests/bar_bad.vgtest
> bar_bad: valgrind -q ./bar_bad
>
> == 1 test, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0
> stdoutB failures, 0 post failures ==
>
>
> real 19m57.592s
> user 20m5.572s
> sys 0m1.680s

I think these (sometimes) very long execution times are due to the default (unfair) way the valgrind scheduler works, which might cause starvation.

Adding the option --fair-sched=try for the 3 */tests/bar_bad*vgtest should ensure a reasonable bounded time.

Philippe |
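The scheduler effect described above can be checked by hand before touching the .vgtest files. The following is only a rough sketch: the helper name `compare_sched` and the `VALGRIND` override variable are made up for illustration, and it assumes a valgrind binary on PATH and a built test program whose path is passed as the first argument. It runs the program once with `--fair-sched=no` (the documented default) and once with `--fair-sched=try`, printing wall-clock seconds for each.

```shell
#!/bin/sh
# Sketch only: compare_sched and $VALGRIND are hypothetical names, not part
# of the Valgrind test suite. --fair-sched and -q are real valgrind options.
compare_sched() {
    prog=$1
    vg=${VALGRIND:-valgrind}
    for opt in --fair-sched=no --fair-sched=try; do
        echo "== $opt"
        if command -v "$vg" >/dev/null 2>&1 && [ -x "$prog" ]; then
            start=$(date +%s)
            # Report a non-zero exit status instead of aborting the comparison.
            "$vg" -q "$opt" "$prog" || echo "exit status: $?"
            end=$(date +%s)
            echo "elapsed: $((end - start))s"
        else
            echo "skipped: $vg or $prog not available"
        fi
    done
}
```

From the helgrind/tests build directory this could be invoked as `compare_sched ./bar_bad`. A single run proves little given how variable the timings reported in this thread are, so several repetitions would be needed before drawing conclusions.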
|
From: Philippe W. <phi...@sk...> - 2016-11-17 21:28:28
|
On Tue, 2016-11-15 at 19:30 +0100, Petar Jovanovic wrote:
> Would the following one work for you?
>
> https://bugsfiles.kde.org/attachment.cgi?id=102247
>
> bar_bad fails consistently on some systems and it would be good if we
> could
> find some solution for it.

Yes, that would be really good.

And your last patch looks to be on the good way.
I have run:
perl tests/vg_regtest --loop-till-fail helgrind/tests/bar_bad*vgtest drd/tests/bar_bad*vgtest

and it has done now already 60 tests without failing or blocking.

So, seems nice ...

Philippe |
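For readers unfamiliar with the idea behind `--loop-till-fail`, the general technique is to repeat a command until its first failure. The sketch below is a generic shell version with an upper bound on the number of runs. It is not how vg_regtest implements the option, and the helper name `run_until_fail` is invented for illustration.

```shell
#!/bin/sh
# Sketch only: run_until_fail is a hypothetical helper, not part of vg_regtest.
# Usage: run_until_fail MAX CMD [ARGS...]
# Runs CMD up to MAX times and stops at the first non-zero exit status.
run_until_fail() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        if ! "$@"; then
            echo "failed on run $i"
            return 1
        fi
    done
    echo "passed $i runs"
}
```

For example, `run_until_fail 60 perl tests/vg_regtest helgrind/tests/bar_bad.vgtest` would bound the experiment at 60 runs, assuming vg_regtest returns a non-zero exit status when a test fails.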
|
From: Petar J. <mip...@gm...> - 2016-11-21 19:19:32
|
On Thu, Nov 17, 2016 at 10:30 PM, Philippe Waroquiers <phi...@sk...> wrote:
> On Tue, 2016-11-15 at 19:30 +0100, Petar Jovanovic wrote:
>> Would the following one work for you?
>>
>> https://bugsfiles.kde.org/attachment.cgi?id=102247
>>
>> bar_bad fails consistently on some systems and it would be good if we
>> could
>> find some solution for it.
> Yes, that would be really good.
>
> And your last patch looks to be on the good way.
> I have run:
> perl tests/vg_regtest --loop-till-fail helgrind/tests/bar_bad*vgtest drd/tests/bar_bad*vgtest
>
> and it has done now already 60 tests without failing or blocking.
>
> So, seems nice ...
>

Mark, should we change the test this way?

Regards,
Petar |
|
From: Philippe W. <phi...@sk...> - 2016-11-21 21:38:44
|
On Mon, 2016-11-21 at 20:19 +0100, Petar Jovanovic wrote:
> On Thu, Nov 17, 2016 at 10:30 PM, Philippe Waroquiers
> <phi...@sk...> wrote:
> > On Tue, 2016-11-15 at 19:30 +0100, Petar Jovanovic wrote:
> >> Would the following one work for you?
> >>
> >> https://bugsfiles.kde.org/attachment.cgi?id=102247
> >>
> >> bar_bad fails consistently on some systems and it would be good if we
> >> could
> >> find some solution for it.
> > Yes, that would be really good.
> >
> > And your last patch looks to be on the good way.
> > I have run:
> > perl tests/vg_regtest --loop-till-fail helgrind/tests/bar_bad*vgtest drd/tests/bar_bad*vgtest
> >
> > and it has done now already 60 tests without failing or blocking.
> >
> > So, seems nice ...
> >
>
> Mark, should we change the test this way?

Assuming Mark also thinks this is a good patch, I would just add --fair-sched=try, as fair sched avoids the very variable and sometimes very long run time for these tests.

I have not yet investigated where/why we need the fair scheduler for these tests to have a bounded, reproducible cpu time, but I have added this to my list of things to do.

I also would like to re-measure the performance impact of the fair scheduler on (more) recent kernels, as the unfair scheduler is really causing strange behaviours. If the performance impact is less significant than a few years ago, we might decide to set --fair-sched=try as the default value for the next release.

Philippe |
|
From: Mark W. <mj...@re...> - 2016-11-22 13:21:01
|
On Mon, 2016-11-21 at 20:19 +0100, Petar Jovanovic wrote:
> On Thu, Nov 17, 2016 at 10:30 PM, Philippe Waroquiers
> <phi...@sk...> wrote:
> > On Tue, 2016-11-15 at 19:30 +0100, Petar Jovanovic wrote:
> >> Would the following one work for you?
> >>
> >> https://bugsfiles.kde.org/attachment.cgi?id=102247
> >>
> >> bar_bad fails consistently on some systems and it would be good if we
> >> could
> >> find some solution for it.
> > Yes, that would be really good.
> >
> > And your last patch looks to be on the good way.
> > I have run:
> > perl tests/vg_regtest --loop-till-fail helgrind/tests/bar_bad*vgtest drd/tests/bar_bad*vgtest
> >
> > and it has done now already 60 tests without failing or blocking.
> >
> > So, seems nice ...
>
> Mark, should we change the test this way?

Yes. The patch makes both the helgrind and drd bar_bad testcase reliably PASS for me on x86_64 against glibc version 2.17 and 2.23.

Thanks,
Mark |
|
From: Petar J. <mip...@gm...> - 2016-11-23 17:43:59
|
On Tue, Nov 22, 2016 at 2:20 PM, Mark Wielaard <mj...@re...> wrote:
> Yes. The patch makes both the helgrind and drd bar_bad testcase reliably
> PASS for me on x86_64 against glibc version 2.17 and 2.23.
>

Committed in r16154.

Regards,
Petar |
|
From: Mark W. <mj...@re...> - 2016-11-22 13:26:50
|
On Mon, 2016-11-21 at 22:40 +0100, Philippe Waroquiers wrote:
> On Mon, 2016-11-21 at 20:19 +0100, Petar Jovanovic wrote:
> > On Thu, Nov 17, 2016 at 10:30 PM, Philippe Waroquiers
> > <phi...@sk...> wrote:
> > > On Tue, 2016-11-15 at 19:30 +0100, Petar Jovanovic wrote:
> > >> Would the following one work for you?
> > >>
> > >> https://bugsfiles.kde.org/attachment.cgi?id=102247
> > >>
> > >> bar_bad fails consistently on some systems and it would be good if we
> > >> could
> > >> find some solution for it.
> > > Yes, that would be really good.
> > >
> > > And your last patch looks to be on the good way.
> > > I have run:
> > > perl tests/vg_regtest --loop-till-fail helgrind/tests/bar_bad*vgtest drd/tests/bar_bad*vgtest
> > >
> > > and it has done now already 60 tests without failing or blocking.
> > >
> > > So, seems nice ...
> > >
> >
> > Mark, should we change the test this way?
> Assuming Mark also thinks this is a good patch, I would just add
> --fair-sched=try
> as fair sched avoids the very variable and sometimes very long
> run time for these tests.

Adding --fair-sched=try to vgopts for these three tests doesn't seem to change anything for me. But I haven't seen long delays even without --fair-sched=try. If it helps in other cases then please do add it.

Cheers,
Mark |
|
From: Petar J. <mip...@gm...> - 2016-11-23 17:41:45
|
On Tue, Nov 22, 2016 at 2:26 PM, Mark Wielaard <mj...@re...> wrote:
>
> Adding --fair-sched=try to vgopts for these three tests doesn't seem to
> change anything for me. But I haven't seen long delays even without
> --fair-sched=try. If it helps in other cases then please do add it.
>

I can confirm what Philippe says: adding --fair-sched=try avoids the long executions.

Regards,
Petar |