Re: [Valgrind-developers] Zero fail tests (Was: Intent to rewrite `cg_annotate` in Python)


On 25-03-23 01:23, Nicholas Nethercote wrote:

> One way to do it is to divide the tests into "must pass on CI" and "the 
> rest". I suspect there are plenty of tests that work on all platforms, 
> which would give a lot of useful coverage from the start. Over time you 
> can hopefully move tests from the first category to the second.
> 
> The other way to do it is to divide the tests into "run on CI" and 
> "don't run on CI", i.e. exceptions, which does require a mechanism for 
> specifying those exceptions. In practice I think this works out much the 
> same as the first approach, because a test that consistently fails on 
> one platform isn't much use. (In fact, it can have negative value if its 
> presence masks new failures in other tests.)
> 
> One consequence of all this is that the CI platforms become gospel. E.g. 
> if a test passes on CI but fails locally, that's good enough. This is 
> fine in practice, assuming the CI platforms are reasonable choices.
> 
> Flaky tests can be a problem. For rare failures you can always just 
> trigger another CI run. For regular failures you should either fix the 
> test or disable it.

Our problems are different to most company testing systems that I've 
used. Typical examples of flakiness are threading nondeterminism with 
floating point and use of pointers as keys for ordered collections. In a 
corporate environment I'm used to using standardized build and test 
machines, all running the same OS and using the same compiler and 
generally on similar hardware.

We do have a bit of thread non-determinism. Our build and test kit is 
pretty much a random bunch of bits and bobs. Since a large number of our 
tests are deliberately executing UB it's hard to have a set of 
deterministic and reliable reference results. Things often change with 
compiler or OS upgrades.

If we do go for CI (and I'm in favour of it) then I also think that we 
need to have some sort of tiering for platforms.

At the moment we have glibc Linux amd64/PPC/s390 and FreeBSD amd64 that 
are all fairly close to clean - fewer than 5 failures. After that it 
goes downhill fairly rapidly. Linux aarch64 has 17 errors, mostly related 
to identifying variables in error messages. Last time I tried Solaris 
11.3 there were 20 or so failures, but there are many more on Illumos 
and Solaris 11.4. Alpine Linux (musl based) is a mess and macOS is still 
a basket case (counting on you Louis!).

So I would say:
Tier 1 - as "officially" supported as we can manage
glibc Linux amd64/PPC/s390 and FreeBSD amd64

Tier 2 - best effort support
glibc Linux aarch64 and FreeBSD x86

Tier 3 - practically unsupported, try to get them to build for releases
all the rest
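
As a rough sketch of how tiers like these might drive a CI verdict 
(Python, since that is where this thread started). Everything here is 
invented for illustration: the platform keys, the known-failures set 
standing in for the "exceptions" mechanism, and the verdict strings. 
It is not a description of any existing Valgrind tooling.

```python
# Hypothetical tier table; keys are made-up labels for the platforms above.
TIERS = {
    "linux-glibc-amd64": 1,
    "linux-glibc-ppc": 1,
    "linux-glibc-s390": 1,
    "freebsd-amd64": 1,
    "linux-glibc-aarch64": 2,
    "freebsd-x86": 2,
}
DEFAULT_TIER = 3  # "all the rest"


def tier_of(platform):
    """Return the support tier for a platform label (3 if unlisted)."""
    return TIERS.get(platform, DEFAULT_TIER)


def ci_verdict(platform, failed, known_failures=frozenset()):
    """Decide what a CI run means for one platform.

    failed         -- names of tests that failed in this run
    known_failures -- tests excluded on this platform (the exceptions list)
    """
    new_failures = set(failed) - set(known_failures)
    if not new_failures:
        return "pass"
    tier = tier_of(platform)
    if tier == 1:
        return "fail"    # new failures block the change
    if tier == 2:
        return "warn"    # reported, but not blocking
    return "ignore"      # tier 3: only the build is expected to work
```

For example, ci_verdict("linux-glibc-aarch64", ["some/test"]) gives 
"warn", while the same failure on a Tier 1 platform gives "fail" 
unless the test is in that platform's known-failures set.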

It's too early to tell how Loongson and riscv64 would fit in if/when 
they get merged.

A+
Paul