From: Mark W. <ma...@kl...> - 2023-03-24 11:52:56
Hi Nick,

On Thu, 2023-03-23 at 21:51 +1100, Nicholas Nethercote wrote:
> I threw a lot of ideas out in my earlier email, but this is the most
> important one. Graydon Hoare expressed this years ago as:
>
> > The Not Rocket Science Rule Of Software Engineering:
> > automatically maintain a repository of code that always passes all
> > the tests
>
> This requires that all the tests pass before merging a change. Having
> worked on projects that follow this and projects that don't, I say
> with confidence that it's a good idea. If we could get that happening
> for Valgrind, that alone would be a huge improvement over the status
> quo. I looked at the sites you linked, but couldn't work out much
> about how they work.
>
> W.r.t. failing tests, this would give great incentive to fix
> currently failing tests (or disable them if they cannot be made
> reliable) and to keep them passing.

I completely agree with this sentiment. But how do you get there? And
how do you cross the psychological barrier? I mean that it feels like
cheating to just disable failing or flaky tests. They might fail on
some, but not all, setups. Or they might just be flaky depending on the
CPU model (I think I saw some failures with an AMD Ryzen processor
that succeeded on an Intel Xeon processor).

What should our policy be to get to zero fail? Does that mean a test
should always pass on any arch/setup? Or do we make exceptions for
tests that fail on some setups? Do we keep an "exception list" based
on...? What do we do with the "removed" (or excepted) tests? Do those
turn into high-priority bugs instead? What about new ports? They often
start with a bunch of failing tests.

E.g. for x86_64 we have memcheck/tests/overlap, which fails on newer
glibc with certain processors where glibc might use an ifunc to point
both memcpy and memmove to the same function, which confuses our
intercept code.
https://bugs.kde.org/show_bug.cgi?id=402833
It works fine on some (older or non-x86_64) setups though.
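
To make the "exception list" question a bit more concrete, here is a
purely illustrative sketch in Python. Nothing like this exists in the
Valgrind tree as far as this mail goes; the names (KnownFailure,
classify, gate_ok) are hypothetical, and keying exceptions on the
architecture alone is an assumption, since the overlap failure also
depends on the glibc version and the processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownFailure:
    """One entry of a hypothetical exception list."""
    test: str  # test path, e.g. "memcheck/tests/overlap"
    arch: str  # architecture on which the failure is tolerated (assumed key)
    bug: str   # tracking bug, so an exception never goes unrecorded


# Hypothetical list; the single entry is the example from this mail.
KNOWN_FAILURES = [
    KnownFailure("memcheck/tests/overlap", "x86_64",
                 "https://bugs.kde.org/show_bug.cgi?id=402833"),
]


def classify(test, passed, arch, known=KNOWN_FAILURES):
    """Classify one test result against the exception list."""
    excepted = any(k.test == test and k.arch == arch for k in known)
    if passed:
        # A pass of an excepted test is reported, but is not an error.
        return "unexpected-pass" if excepted else "pass"
    return "expected-fail" if excepted else "fail"


def gate_ok(results, arch, known=KNOWN_FAILURES):
    """Return True if no test failed outside the exception list.

    `results` maps test path -> bool (True means the test passed).
    """
    return all(classify(t, ok, arch, known) != "fail"
               for t, ok in results.items())
```

With such a list, a failing memcheck/tests/overlap would not block a
merge on x86_64 but would still block one on any other architecture.
It leaves open exactly the questions above: who maintains the list,
and whether each entry has to be backed by a high-priority bug.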

I would love to get to "zero fail", but what should we do to get there,
and what should our policy be to keep it, given that we don't fully
control our environment and some (new) failures simply come from
upgrading glibc, the compiler, or even the CPU?

Cheers,

Mark