From: Austin E. <aus...@gm...> - 2023-03-26 08:07:10
On Sat, Mar 25, 2023 at 3:32 PM Paul Floyd <pj...@wa...> wrote:
>
> On 25-03-23 01:23, Nicholas Nethercote wrote:
>
> > One way to do it is to divide the tests into "must pass on CI" and
> > "the rest". I suspect there are plenty of tests that work on all
> > platforms, which would give a lot of useful coverage from the start.
> > Over time you can hopefully move tests from the second category to
> > the first.
> >
> > The other way to do it is to divide the tests into "run on CI" and
> > "don't run on CI", i.e. exceptions, which does require a mechanism
> > for specifying those exceptions. In practice I think this works out
> > much the same as the first approach, because a test that consistently
> > fails on one platform isn't much use. (In fact, it can have negative
> > value if its presence masks new failures in other tests.)
> >
> > One consequence of all this is that the CI platforms become gospel.
> > E.g. if a test passes on CI but fails locally, that's good enough.
> > This is fine in practice, assuming the CI platforms are reasonable
> > choices.
> >
> > Flaky tests can be a problem. For rare failures you can always just
> > trigger another CI run. For regular failures you should either fix
> > the test or disable it.
>
> Our problems are different to most company testing systems that I've
> used. Typical examples of flakiness are threading nondeterminism with
> floating point, use of pointers as keys for ordered collections, etc.
> In a corporate environment I'm used to using standardized build and
> test machines, all running the same OS and using the same compiler,
> and generally on similar hardware.
>
> We do have a bit of thread non-determinism. Our build and test kit is
> pretty much a random bunch of bits and bobs. Since a large number of
> our tests deliberately execute UB, it's hard to have a set of
> deterministic and reliable reference results. Things often change with
> compiler or OS upgrades.
> If we do go for CI (and I'm in favour of it) then I also think that
> we need to have some sort of tiering for platforms.
>
> At the moment we have glibc Linux amd64/PPC/s390 and FreeBSD amd64,
> which are all fairly close to clean - fewer than 5 failures. After
> that it goes downhill fairly rapidly. Linux aarch64 has 17 errors,
> mostly related to identifying variables in error messages. Last time
> I tried Solaris 11.3 there were 20 or so failures, but there are many
> more on Illumos and Solaris 11.4. Alpine Linux (musl based) is a mess
> and macOS is still a basket case (counting on you Louis!).
>
> So I would say:
>
> Tier 1 - as "officially" supported as we can manage
>   glibc Linux amd64/PPC/s390 and FreeBSD amd64
>
> Tier 2 - best effort support
>   glibc Linux aarch64 and FreeBSD x86
>
> Tier 3 - practically unsupported, try to get them to build for releases
>   all the rest
>
> It's too early to tell how Loongson and riscv64 would fit in if/when
> they get merged.
>
> A+
> Paul

For what it's worth, Wine deals with similar issues:

* tests that are expected to work on Windows, but not Wine
* tests that work on some Windows versions, but not others (fwiw,
  generally, newer behavior is accepted)
* flaky tests
* tests that are expected to work on win32, but not win64

To work around that, there are various macros that are used to test for
the appropriate conditions (and a generic todo_wine_if() that can be
used to check for non-generic cases). See:
https://gitlab.winehq.org/wine/wine/-/blob/master/include/wine/test.h

Valgrind could do similar: add macros for the various
platforms/"feature levels", and consider the test failed if it fails on
those platforms. For platforms where it's expected to fail, ignore the
failure, but exit with non-zero status if the test unexpectedly
succeeds.

--
-Austin
GPG: 267B CC1F 053F 0749 (expires 2024/02/17)