From: Christophe R. <cs...@ca...> - 2006-01-16 14:09:42
Nathan Froyd <fr...@cs...> writes:

> On Sun, Jan 15, 2006 at 06:29:02PM +0000, Christophe Rhodes wrote:
>> It appears that this change has an effect on the SBCL regression
>> tests, at least on x86/linux. Empirically, on x86/linux, versions
>> 0.9.8.35 and later suffer failure in mop-[347].impure-cload.lisp when
>> run as part of the complete regression test suite
>>   sh ./run-tests.sh
>> They do _not_ fail if run in isolation
>>   sh ./run-tests.sh mop-3.impure-cload.lisp
>> or even as part of a large group of tests
>>   sh ./run-tests.sh clos* mop*
>>
>> To add to the confusion, it would appear that sbcl on neither
>> sparc/sunos5.8 nor x86-64/linux suffers from this problem.
>> Additionally, of course, I'm baffled as to how this commit could cause
>> anything like this symptom at all.
>
> FWIW, this problem does not happen on ppc/osx. (I didn't actually do a
> full build to test this patch; I just built the contribs. Silly me for
> assuming the effects of the patch would be confined to the contrib
> directory.)

Heh.

> I am baffled, too. Perhaps one of the "pure" tests is, in fact, not?
> But then why would it matter only on x86/linux? Is it an artifact of
> the particular system?

We've been doing a fair amount of *boggling* on IRC this (GMT)
morning. It turns out that indeed one of the "pure" tests is not
pure: interface.pure.lisp defines three (look carefully!) classes.

However, this doesn't in fact explain anything: why should the
introduction of some classes matter? Well, welcome back to
discriminating-function world. The discriminating functions for
accessors take various different flavours: for this current
discussion, we shall restrict our attention to two-class and index,
but there are more details on <http://www.sbcl.org/sbcl-internals>.

The original generic function metacircle problem was seen when
instantiating the second subclass of standard-generic-function; it
occurred because the generic function accessor which extracts various
pieces of information from a generic function is needed in the code
path which updates an accessor's discriminating function; since it is
itself an accessor, hilarity ensues.

The current problem is similar yet subtly different. The accessor
causing the metacircularity this time is (SETF SB-PCL::GF-DFUN-STATE);
since very little can be expected to work once there is a
metacircularity, the best way to find this out is to wander through
the stack in the debugger and do things like
  (type-of (sb-debug:arg 0))
  (slot-value (sb-debug:arg 0) 'name)
until something bites. However, (SETF SB-PCL::GF-DFUN-STATE) is not
called in all cases when the discriminating function is updated, only
in some of them: specifically, whenever a cache needs to be adjusted,
rather than simply added to.

The order of events is something like this. When the third generic
function class is instantiated, the dfun-state of this generic
function needs to be set; this causes a cache miss on
(SETF SB-PCL::GF-DFUN-STATE) (with argument of class SECOND-SUB-GF),
which was previously in a TWO-CLASS dfun state, having seen the other
two classes before. Since the indexes of the dfun-state slot are all
the same, this converts to a ONE-INDEX dfun with a cache initially
populated with just the second-sub-gf entry. However,
(SETF SB-PCL::GF-DFUN-STATE) very quickly gets called on a
STANDARD-GENERIC-FUNCTION, and under certain conditions (I /think/
just when both the classes STANDARD-GENERIC-FUNCTION and SECOND-SUB-GF
have a CLOS-HASH-0 slot which is 0 mod 4) the cache will be recomputed
before the new entry can be written. This recomputation necessitates
a call to (SETF SB-PCL::GF-DFUN-STATE), and the metacircle is born.

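For anyone wanting to poke at this, the shape of the trigger can be
sketched roughly as below. This is only an illustration of "instantiate
a second subclass of standard-generic-function", not the contents of
the mop-*.impure-cload.lisp tests; FIRST-SUB-GF, FIRST-GF and SECOND-GF
are made-up names, and whether the metacircle actually appears depends
on the hash-slot conditions described above, so on most images this
will simply load cleanly.

```lisp
;; Sketch only, assuming SBCL (MOP symbols are exported from SB-MOP).
;; FIRST-SUB-GF, FIRST-GF and SECOND-GF are hypothetical names.
(defclass first-sub-gf (standard-generic-function) ()
  (:metaclass sb-mop:funcallable-standard-class))

(defclass second-sub-gf (standard-generic-function) ()
  (:metaclass sb-mop:funcallable-standard-class))

;; Instantiating FIRST-SUB-GF: together with STANDARD-GENERIC-FUNCTION
;; itself, the generic function accessors have now seen two classes.
(defgeneric first-gf (x)
  (:generic-function-class first-sub-gf))

;; Instantiating SECOND-SUB-GF introduces a third class of generic
;; function, which is the step at which the dfun state has to change.
(defgeneric second-gf (x)
  (:generic-function-class second-sub-gf))
```

If both forms load, (class-of #'second-gf) should be the class
SECOND-SUB-GF.
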
So why did this happen just on x86, and indeed why did it happen only
after loading :sb-posix, and what about the three classes in
interface.pure.lisp? Well, as I indicated before, this depends
critically on the clos-hash-0 slot both of standard-generic-function
and of the second subclass. It turned out that the addition of one
more class (ALIEN-PASSWD) into sb-posix was enough that running the
test suite in order generated a second subclass of
standard-generic-function with the right hash slot; meanwhile, since
there are apparently (slightly) different numbers of classes in the
base images on the different platforms, the hash slot 0 of
standard-generic-function was different on my sparc and x86.

After all this, the fix is relatively simple; I'll commit something
soon. (I'm not sure that it's terribly easy to test for this,
unfortunately: it does depend on rather too many factors.
Introduction of further tests would be welcomed.)

> If nobody has any ideas by the end of the day, I will revert this patch.
> The Win32 merges can then proceed and the patch can be tried again,
> perhaps with a slightly less intrusive form. (SB-POSIX-INTERNAL can
> be retained; SB-POSIX's :USE list can be modified; and everybody goes
> home happy.)

Thanks; this shouldn't be necessary.

Cheers,

Christophe