From: William H. N. <wil...@ai...> - 2005-07-11 15:51:42
On Mon, Jul 11, 2005 at 11:55:27AM +0100, Christophe Rhodes wrote:
> William Harold Newman <wil...@ai...> writes:
>
> > Incidentally, committers, y'all are probably well aware of this, but
> > I'll repeat anyway: when the build (or tests, in this case) is broken,
> > it can cause quite a lot of friction distributed over the project. I,
> > for example, have been sitting on my detabification patch for about a
> > week, waiting for a version of SBCL which passes its tests so that I
> > can merge my patch and have a reasonable expectation that as long as I
> > didn't break something, the result will pass tests itself.
>
> I believe you should be free to go now :-) If you're quick, you can
> get in there before anyone else breaks the tests again...

Thank you; soon we should see whether my global detabification scripts
might cause mysterious breakage deep in #+SB-SHOW ./src/assembly/sparc/
or thereabouts.

> > Opinions, other ideas, etc.?
>
> I think I would start off by attempting a technical solution, because
> although this is a social problem I don't think it's the kind that
> can't be nudged away by a technical rearrangement of the indolence
> landscape. So to start with I think I would suggest at least
> attempting to do distributed autobuilding and testing: we already have
> a start to this, with Brian Mastenbrook's scripts; if the collation of
> results could be arranged so that planet.sbcl.org (and probably some
> friendly mailer-daemon) can summarize the extant problems in
> easily-digestible form, then maybe people with a spare half-hour would
> be more likely to stop the nuisance reminder e-mails...

Any usefully multiplatform automatic test rig seems like an outcome
much to be desired. However, I was and remain a little pessimistic
because of the difficulty of getting to that outcome. As far as I can
see, there's a physical prerequisite of reliable machines with
reliable net connections, up 24/7 (or, for 75% partial credit, at
least up for some predictable time window every day), whose owners
don't mind being hit unpredictably with multiple arbitrarily-buggy
SBCL builds and tests.

The size of this difficulty may be temporary; last year I finally got
my nice little $6/month website where I can run CGI stuff to my
heart's content, and for all I know, by 2007 Moore's Law might upgrade
this to $6/month dedicated Linux/FreeBSD boxes (maybe called
"matchboxes" instead of just "boxes") suitable for SBCL tests. But as
long as we're stuck here with 2005 physical technology, the physical
prerequisites seem like enough of a problem that we might not get such
a test rig without social technology so advanced as to be
indistinguishable from magic -- hardware purchase and sysadmin support
donated by some happy large commercial SBCL user, or some such thing.

Vaguely-rt-style tests don't seem to be as good an outcome as a
multiplatform test rig, but I'm more optimistic about them existing in
the near term. I'm more optimistic primarily because we could get
there without new physical resources (and stay there without new
ongoing sysadminish physical maintenance headaches), but also
secondarily because we can move in that direction incrementally,
without any daunting first steps. There is a nonzero cost to setting
up soft-failure mechanisms; part of the reason that I initially set up
the test suite to die on a single failure is that it's dead simple to
do it that way, while rt-style management of all the different test
styles (.pure.lisp, .test.sh, etc.) would require more detail work.
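To make the shape of that detail work concrete, here is a rough sketch
in Common Lisp of what a minimal soft-failure counter/reporter might
look like. (All of the names here -- WITH-TEST, REPORT-TEST-RESULTS,
*TEST-FAILURES* -- are hypothetical illustrations, not anything from
the actual tests/ directory or from rt.)

;;; Hypothetical sketch of a soft-failure test harness: count
;;; failures and keep going rather than dying at the first one.

(defvar *test-count* 0
  "Number of tests attempted so far in this run.")

(defvar *test-failures* '()
  "Names of the tests which have failed so far in this run.")

(defmacro with-test ((name) &body body)
  "Run BODY as the test named NAME. Instead of dying on the first
error, record the failure and keep going."
  `(progn
     (incf *test-count*)
     (handler-case (progn ,@body)
       (error (condition)
         (push ,name *test-failures*)
         (format *error-output* "~&test ~A failed: ~A~%"
                 ,name condition)))))

(defun report-test-results ()
  "Print a summary of the run, returning 0 if everything passed and
1 otherwise, suitable for use as a shell exit status."
  (format t "~&~D test~:P run, ~D failure~:P~%"
          *test-count* (length *test-failures*))
  (dolist (name (reverse *test-failures*))
    (format t "  failed: ~A~%" name))
  (if *test-failures* 1 0))

So an individual test would look something like

(with-test ("trivial arithmetic sanity check")
  (assert (= 4 (+ 2 2))))

and whatever top-level script drives the tests could call
REPORT-TEST-RESULTS at the end and turn its result into the script's
exit status, so that .pure.lisp files and .test.sh scripts and so on
could all feed the same counter.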
But I doubt that most people here would find it all that challenging
to implement a usable failure counter/reporter with a suitably general
interface, so it doesn't seem like a terribly big first step. Then,
even without doing a big-bang conversion of all the tests to rt style,
once the failure-counting mechanism is hanging on the outside,
individual tests or individual families of tests could be converted
opportunistically. In particular, even if something like the failed
foreign.test.sh test hasn't been converted to rt-style soft failure
before a hangup like this occurs, it could be straightforward to
convert it to soft-failure style once the problem arises, in order to
resume doing (hopefully-)platform-independent maintenance while the
foreign.test.sh bug is being chased.

> If tests/ is to be adjusted to make it more fault-tolerant, I would
> suggest identifying some core platform-people combinations such that
> on those platforms the failure would still be a "hard" one: the
> rationale is to prevent the fault-tolerance from encouraging a
> (social, again) tolerance of faults. This doesn't have to be a
> terribly formal arrangement: nothing much more than the current
> statement
> "All tests should pass on x86/Linux, x86/FreeBSD4, and ppc/Darwin"
> except of course that in an ideal world the tests would pass without
> some of them being conditionalized out...

Yes, that particular policy sounds reasonable. And my enthusiasm for a
general "don't tolerate bugs" policy/culture (or perhaps "increasing
the number of features is not an adequate justification for increasing
the number of failures in existing features") remains high -- high
enough that if a switch to a soft-failure test suite seemed to be
encouraging bad commits, I'd probably start to think that we should
switch back. However, I hope that even a soft-failure test suite won't
tempt anyone to commit changes which increase the number of failures
on his own machine, and so for something like 90% of the maintenance
on the system, it seems as though the cultural effect should be small.
Then, as long as >80% of maintenance remains socially fault-intolerant,
I doubt there's room for a subculture of sloppiness to grow in what's
left over. (Even if we had any seeds for such in our existing healthily
paranoid, detail-obsessed, intolerant baseline. :-)

-- 
William Harold Newman <wil...@ai...>
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C
"Tweak alpha so it sends SIGBUS for unaligned access, and does NOT do
a fixup. This encourages people to fix their code."
  - http://www.OpenBSD.org/plus29.html