From: Paul F. <pj...@wa...> - 2023-03-25 20:31:57
On 25-03-23 01:23, Nicholas Nethercote wrote:
> One way to do it is to divide the tests into "must pass on CI" and "the
> rest". I suspect there are plenty of tests that work on all platforms,
> which would give a lot of useful coverage from the start. Over time you
> can hopefully move tests from the first category to the second.
>
> The other way to do it is to divide the tests into "run on CI" and
> "don't run on CI", i.e. exceptions, which does require a mechanism for
> specifying those exceptions. In practice I think this works out much the
> same as the first approach, because a test that consistently fails on
> one platform isn't much use. (In fact, it can have negative value if its
> presence masks new failures in other tests.)
>
> One consequence of all this is that the CI platforms become gospel. E.g.
> if a test passes on CI but fails locally, that's good enough. This is
> fine in practice, assuming the CI platforms are reasonable choices.
>
> Flaky tests can be a problem. For rare failures you can always just
> trigger another CI run. For regular failures you should either fix the
> test or disable it.

Our problems are different from those of most company testing systems that I've used. Typical examples of flakiness there are threading nondeterminism with floating point and use of pointers as keys for ordered collections. In a corporate environment I'm used to standardized build and test machines, all running the same OS, using the same compiler and generally on similar hardware.

We do have a bit of thread nondeterminism, but our build and test kit is pretty much a random bunch of bits and bobs. Since a large number of our tests deliberately execute UB, it's hard to have a set of deterministic and reliable reference results. Things often change with compiler or OS upgrades.

If we do go for CI (and I'm in favour of it), then I also think that we need to have some sort of tiering for platforms.
At the moment we have glibc Linux amd64/PPC/s390 and FreeBSD amd64 that are both fairly close to clean - fewer than 5 failures. After that it goes downhill fairly rapidly. Linux aarch64 has 17 errors, mostly related to identifying variables in error messages. Last time I tried Solaris 11.3 there were 20 or so failures, but there are many more on Illumos and Solaris 11.4. Alpine Linux (musl based) is a mess and macOS is still a basket case (counting on you Louis!).

So I would say

Tier 1 - as "officially" supported as we can manage
    glibc Linux amd64/PPC/s390 and FreeBSD amd64
Tier 2 - best effort support
    glibc Linux aarch64 and FreeBSD x86
Tier 3 - practically unsupported, try to get them to build for releases
    all the rest

It's too early to tell how Loongson and riscv64 would fit in if/when they get merged.

A+
Paul
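As a rough illustration only, here is one way the tiers proposed above, combined with the per-platform exception lists mentioned in the quoted message, might be encoded in a CI gating script. The platform identifier strings, the contents of `KNOWN_FAILURES` (including the test name), and the exact gating policy are all hypothetical assumptions, not an existing Valgrind CI configuration. The assumed policy is that Tier 1 blocks on build failures and on unexpected test failures, Tier 2 blocks only on build failures, and Tier 3 never blocks an individual commit.

```python
from __future__ import annotations

from enum import Enum


class Tier(Enum):
    OFFICIAL = 1     # Tier 1: as "officially" supported as we can manage
    BEST_EFFORT = 2  # Tier 2: best effort support
    BUILD_ONLY = 3   # Tier 3: practically unsupported, try to build for releases


# Tier membership follows the proposal; the identifier strings are hypothetical.
PLATFORM_TIERS = {
    "linux-glibc-amd64": Tier.OFFICIAL,
    "linux-glibc-ppc": Tier.OFFICIAL,
    "linux-glibc-s390": Tier.OFFICIAL,
    "freebsd-amd64": Tier.OFFICIAL,
    "linux-glibc-aarch64": Tier.BEST_EFFORT,
    "freebsd-x86": Tier.BEST_EFFORT,
}

# Hypothetical per-platform exception lists: failures that are expected on
# that platform and therefore must not block a CI run.
KNOWN_FAILURES: dict[str, set[str]] = {
    "freebsd-amd64": {"hypothetical_flaky_test"},
}


def tier_of(platform: str) -> Tier:
    """Any platform not listed explicitly falls into Tier 3 ("all the rest")."""
    return PLATFORM_TIERS.get(platform, Tier.BUILD_ONLY)


def blocking_failures(platform: str, build_ok: bool, failed_tests: list[str]) -> list[str]:
    """Return the reasons this CI run should block; an empty list means pass."""
    tier = tier_of(platform)
    if tier is Tier.BUILD_ONLY:
        return []  # assumed: Tier 3 only gets release-time build checks
    if not build_ok:
        return ["build failed"]
    if tier is Tier.BEST_EFFORT:
        return []  # assumed: Tier 2 test failures are reported, not gating
    # Tier 1: block on any failure that is not in the platform's exception list.
    expected = KNOWN_FAILURES.get(platform, set())
    return sorted(set(failed_tests) - expected)
```

With this assumed policy, a listed flaky test on FreeBSD amd64 does not block, a new failure on glibc Linux amd64 does, and failures on unlisted platforms such as Alpine or macOS never block a commit. Keeping the exception list per platform rather than global avoids one platform's known failures masking new failures on another.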