From: Nicholas N. <n.n...@gm...> - 2023-03-25 00:25:33

On Fri, 24 Mar 2023 at 22:25, Mark Wielaard <ma...@kl...> wrote:
>
> We aren't (yet?) using all of them (and some of them would mean moving
> over bugzilla and the mailinglist, which might be controversial). But
> I'll at least add the buildbot CI testers to the website (and we should
> at least make use of the try-branches) this weekend.

Great! I'd be happy to try this out. Though I guess I'd need to do a
no-change try run before testing a real change, to give a baseline of
expected test failures, right?

Nick

From: Mark W. <ma...@kl...> - 2023-04-01 22:04:26

Hi,

On Sat, Mar 25, 2023 at 11:25:12AM +1100, Nicholas Nethercote wrote:
> On Fri, 24 Mar 2023 at 22:25, Mark Wielaard <ma...@kl...> wrote:
> > We aren't (yet?) using all of them (and some of them would mean moving
> > over bugzilla and the mailinglist, which might be controversial). But
> > I'll at least add the buildbot CI testers to the website (and we should
> > at least make use of the try-branches) this weekend.
>
> Great! I'd be happy to try this out. Though I guess I'd need to do a
> no-change try run before testing a real change, to give a baseline of
> expected test failures, right?

I set up the user try branches on sourceware (just for the setups that
have passing aux-tests): debian-arm64, debian-armhf, debian-i386,
debian-ppc64, fedora-ppc64le, fedora-s390x, ibm-power9,
opensuseleap-x86_64. And I added instructions on how to use them to
README_DEVELOPERS:

  Every developer with commit access can use try branches. Code
  committed to a try branch will be built by the buildbot at
  builder.sourceware.org:
  https://builder.sourceware.org/buildbot/#/builders?tags=valgrind-try

  If you want to try a commit you can push to a specially named try
  branch (users/<your-user-name>/try-<topic>) as follows:

    git checkout -b frob
    ...hack, hack, hack...
    OK, looks good to submit
    git commit -a -m "Awesome hack"
    git push origin frob:users/username/try-frob

  When all builders have built your patch, the buildbot will send you
  (or actually the patch author) an email telling you whether any
  builds failed, with references to all the logs. You can also find
  the logs and the builds here:
  https://builder.sourceware.org/buildbot/#/builders?tags=valgrind-try

  Afterwards you can delete the branch again:

    git push origin :users/username/try-frob

From: Paul F. <pj...@wa...> - 2023-03-25 20:31:57

On 25-03-23 01:23, Nicholas Nethercote wrote:
> One way to do it is to divide the tests into "must pass on CI" and "the
> rest". I suspect there are plenty of tests that work on all platforms,
> which would give a lot of useful coverage from the start. Over time you
> can hopefully move tests from the first category to the second.
>
> The other way to do it is to divide the tests into "run on CI" and
> "don't run on CI", i.e. exceptions, which does require a mechanism for
> specifying those exceptions. In practice I think this works out much the
> same as the first approach, because a test that consistently fails on
> one platform isn't much use. (In fact, it can have negative value if its
> presence masks new failures in other tests.)
>
> One consequence of all this is that the CI platforms become gospel. E.g.
> if a test passes on CI but fails locally, that's good enough. This is
> fine in practice, assuming the CI platforms are reasonable choices.
>
> Flaky tests can be a problem. For rare failures you can always just
> trigger another CI run. For regular failures you should either fix the
> test or disable it.

Our problems are different to most company testing systems that I've
used. Typical examples of flakiness are threading nondeterminism with
floating point, use of pointers as keys for ordered collections, etc.
In a corporate environment I'm used to using standardized build and
test machines, all running the same OS and using the same compiler and
generally on similar hardware.

We do have a bit of thread non-determinism. Our build and test kit is
pretty much a random bunch of bits and bobs. Since a large number of
our tests deliberately execute UB, it's hard to have a set of
deterministic and reliable reference results. Things often change with
compiler or OS upgrades.

If we do go for CI (and I'm in favour of it) then I also think that we
need to have some sort of tiering for platforms.

At the moment we have glibc Linux amd64/PPC/s390 and FreeBSD amd64,
which are all fairly close to clean - fewer than 5 failures. After
that it goes downhill fairly rapidly. Linux aarch64 has 17 errors,
mostly related to identifying variables in error messages. Last time I
tried Solaris 11.3 there were 20 or so failures, but there are many
more on Illumos and Solaris 11.4. Alpine Linux (musl based) is a mess
and macOS is still a basket case (counting on you Louis!).

So I would say:

Tier 1 - as "officially" supported as we can manage:
  glibc Linux amd64/PPC/s390 and FreeBSD amd64

Tier 2 - best effort support:
  glibc Linux aarch64 and FreeBSD x86

Tier 3 - practically unsupported, try to get them to build for releases:
  all the rest

It's too early to tell how Loongson and riscv64 would fit in if/when
they get merged.

A+
Paul

From: Austin E. <aus...@gm...> - 2023-03-26 08:07:10

On Sat, Mar 25, 2023 at 3:32 PM Paul Floyd <pj...@wa...> wrote:
>
> On 25-03-23 01:23, Nicholas Nethercote wrote:
>
> > One way to do it is to divide the tests into "must pass on CI" and "the
> > rest". I suspect there are plenty of tests that work on all platforms,
> > which would give a lot of useful coverage from the start. Over time you
> > can hopefully move tests from the first category to the second.
> >
> > The other way to do it is to divide the tests into "run on CI" and
> > "don't run on CI", i.e. exceptions, which does require a mechanism for
> > specifying those exceptions. In practice I think this works out much the
> > same as the first approach, because a test that consistently fails on
> > one platform isn't much use. (In fact, it can have negative value if its
> > presence masks new failures in other tests.)
> >
> > One consequence of all this is that the CI platforms become gospel. E.g.
> > if a test passes on CI but fails locally, that's good enough. This is
> > fine in practice, assuming the CI platforms are reasonable choices.
> >
> > Flaky tests can be a problem. For rare failures you can always just
> > trigger another CI run. For regular failures you should either fix the
> > test or disable it.
>
> Our problems are different to most company testing systems that I've
> used. Typical examples of flakiness are threading nondeterminism with
> floating point, use of pointers as keys for ordered collections, etc.
> In a corporate environment I'm used to using standardized build and
> test machines, all running the same OS and using the same compiler and
> generally on similar hardware.
>
> We do have a bit of thread non-determinism. Our build and test kit is
> pretty much a random bunch of bits and bobs. Since a large number of
> our tests deliberately execute UB, it's hard to have a set of
> deterministic and reliable reference results. Things often change with
> compiler or OS upgrades.
>
> If we do go for CI (and I'm in favour of it) then I also think that we
> need to have some sort of tiering for platforms.
>
> At the moment we have glibc Linux amd64/PPC/s390 and FreeBSD amd64,
> which are all fairly close to clean - fewer than 5 failures. After
> that it goes downhill fairly rapidly. Linux aarch64 has 17 errors,
> mostly related to identifying variables in error messages. Last time I
> tried Solaris 11.3 there were 20 or so failures, but there are many
> more on Illumos and Solaris 11.4. Alpine Linux (musl based) is a mess
> and macOS is still a basket case (counting on you Louis!).
>
> So I would say:
>
> Tier 1 - as "officially" supported as we can manage:
>   glibc Linux amd64/PPC/s390 and FreeBSD amd64
>
> Tier 2 - best effort support:
>   glibc Linux aarch64 and FreeBSD x86
>
> Tier 3 - practically unsupported, try to get them to build for releases:
>   all the rest
>
> It's too early to tell how Loongson and riscv64 would fit in if/when
> they get merged.
>
> A+
> Paul

For what it's worth, Wine deals with similar issues:

* tests that are expected to work on Windows, but not Wine
* tests that work on some Windows versions, but not others (fwiw,
  generally, newer behavior is accepted)
* flaky tests
* tests that are expected to work on win32, but not win64

To work around that, there are various macros that are used to test for
the appropriate conditions (and a generic todo_wine_if() that can be
used to check for non-generic cases). See:
https://gitlab.winehq.org/wine/wine/-/blob/master/include/wine/test.h

Valgrind could do something similar: add macros for various
platforms/"feature levels", and consider the test failed if it fails on
those platforms. For platforms where it's expected to fail, ignore the
failure, but exit with non-zero status if the test succeeds.

--
-Austin
GPG: 267B CC1F 053F 0749 (expires 2024/02/17)
