I've now gotten the system to build and sort of run on both Linux and
OpenBSD by turning off PURIFY. In the build process, after dumping the
system at the end of warm init, there's sometimes an error, but it
seems to occur only after the system is already written.
The result doesn't pass the regression tests, though -- it gets a
little ways, but then starts to just spin for many CPU minutes.
I've started a flaky1 branch to store the current state in CVS. This
branch is basically just intended for my own use, so that I can do
things like "cvs diff" back to it, and so that I can experiment with
working with CVS branches. But in this case, since it's been a while
since I've been able to make the system stable enough to check in to
the main development branch, it's also the only way that anyone with a
morbid interest can check what I'm doing.
Very likely flakyxxx branches, or some similar device, will be a
recurring theme in the CVS tree in the future, because there have been
other times that I wanted to do a CVS checkin but held off because the
system was still too broken to inflict on anyone else. I'm also
tentatively thinking of making a stable_0_6 branch for the dying
embers of 0.6.x once the conflagration that is 0.7.x starts, but I'll
burn that bridge when I come to it.
Incidentally, if anyone thinks I'm doing something less than optimal
in the CVS admin, you're probably right, so please speak right up,
since I'm pretty unfamiliar with any but the really basic features.
On Mon, 14 May 2001 15:48:29 -0700, Michael Vanier wrote:
> Have you seen this?
No, I hadn't. Thank you. I had a good, if somewhat crazed, laugh.
On Tue, May 15, 2001 at 09:12:45AM +0100, Daniel Barlow wrote:
> William Harold Newman <william.newman@...> writes:
> > I fixed -- I think -- problems with current_dynamic_space being
> > uninitialized (but still used..) when GENCGC is defined. But for some
> > time I've been unable to fix another pair of problems, first that the
> > system sometimes fails a GC assertion because an alloc_region isn't
> > reset when it's expected to be reset, and second that if that doesn't
> > happen, the system fails with a SIGINT when it tries to load a core
> > file that it's written.
> Ew. Is this a problem that can be reproduced in the snapshot
> (i.e. was it my fault?)
The "sometimes fails a GC assertion" problem doesn't seem to be all
that reproducible even in my build. I'm guessing that it may depend
on the exact size of the src/runtime/runtime executable. In the
version in flaky1, it happens on OpenBSD:
* [undoing binding stack... Argh! alloc_region not reset in gc_alloc_new_region()
start_addr=0x48000000, free_pointer=0x49becaa0, end_addr=0x48000000
fatal error encountered in SBCL runtime system
[saving current Lisp image into output/sbcl.core:
writing 1688(0x698) bytes from the read-only(3) space at 0x10000000
writing 1424(0x590) bytes from the static(2) space at 0x28000000
writing 106274816(0x655a000) bytes from the dynamic(1) space at 0x48000000
(where the funny order is probably because this output was
collected with "2>&1 | tee make.tmp", so that stderr and stdout get
a bit mixed up) but it doesn't happen on Linux. A few builds ago,
though, something similar (another "alloc_region not reset" assertion
failure) happened on Linux too, and I've basically just added assertions
and refactored code since then, nothing which should've truly fixed it.
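To make the failing condition concrete, here is a minimal sketch of the kind of check that could produce the message in the log above. It is not the code from gencgc.c: the struct, the function names, and the exact invariant (a reset region has all three addresses equal) are assumptions made for illustration. Only the three field names and their values come from the log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of an allocation region. The field
 * names are taken from the log line; the real structure in gencgc.c
 * is not reproduced here. */
struct region_sketch {
    uintptr_t start_addr;
    uintptr_t free_pointer;
    uintptr_t end_addr;
};

/* Assumed invariant: a region that has been reset is empty, so its
 * free pointer and its start address both coincide with its end. */
static bool region_is_reset(const struct region_sketch *r)
{
    return r->free_pointer == r->end_addr
        && r->start_addr == r->end_addr;
}

/* How far the free pointer sits beyond the start of the region. */
static uintptr_t region_bytes_past_start(const struct region_sketch *r)
{
    return r->free_pointer - r->start_addr;
}
```

Under that assumed invariant, the values in the log (start_addr and end_addr both 0x48000000, free_pointer 0x49becaa0) describe a region whose bounds were reset while its free pointer still points 0x1becaa0 bytes past the start.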
I suspect that at least some of the current problem is partly "your
fault", perhaps because of changes you made in the way that
current_region_free_pointer is used. But the behavior of
current_region_free_pointer and SymbolValue(ALLOCATION_POINTER) in
gencgc.c seems more bizarre than you might reasonably have expected,
so much so that interacting with it while trying to unscrew the
all-the-world's-an-x86 simplifications I made when first setting up
SBCL is asking for trouble. And I can certainly understand testing the
system, seeing that it works, and thinking it's OK.
I do think causing the GENCGC code to use current_dynamic_space
without initializing it was a bad idea, though:
+ current_dynamic_space = DYNAMIC_0_SPACE_START;
and then all the stuff which used to use DYNAMIC_SPACE_START using
current_dynamic_space instead. But then it did work somehow in
sbcl-0.6.12.7, so maybe I'm confused; and anyway, again, it's
basically pretty reasonable to test the system, see that it works, and
send it in. In general I really like specific tests accompanying
patches, but unfortunately it's pretty hard to test this kind of GC
stuff much more than "I built it, and did some things which put some
stress on the GC, and it still worked". So I don't know what more I
can ask for, other than you never making any mistakes. :-|
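A minimal sketch of why the missing initialization matters, assuming a file-scope variable with no initializer. The names ending in _sketch, the integer type, and the helper functions are hypothetical; the start address is borrowed from the "dynamic(1) space at 0x48000000" line in the build output, not from the runtime headers.

```c
#include <stdint.h>

/* Assumption: value taken from the build log, standing in for
 * DYNAMIC_0_SPACE_START. */
#define DYNAMIC_0_SPACE_START_SKETCH ((uintptr_t)0x48000000u)

/* A file-scope object with no initializer starts out as zero in C,
 * so until something assigns to it, it does not point at the
 * dynamic space at all. */
static uintptr_t current_dynamic_space_sketch;

static int dynamic_space_is_initialized(void)
{
    return current_dynamic_space_sketch != 0;
}

/* Counterpart of the one-line assignment in the patch. */
static void init_dynamic_space_sketch(void)
{
    current_dynamic_space_sketch = DYNAMIC_0_SPACE_START_SKETCH;
}

/* Hypothetical helper: compute the offset of addr within the current
 * dynamic space. Returns 1 on success, 0 if the space is not yet
 * initialized or addr lies below it. */
static int offset_in_dynamic_space(uintptr_t addr, uintptr_t *offset_out)
{
    if (!dynamic_space_is_initialized()
        || addr < current_dynamic_space_sketch)
        return 0;
    *offset_out = addr - current_dynamic_space_sketch;
    return 1;
}
```

Without the guard in the helper, any offset computed before initialization would be measured from address zero rather than from the start of the dynamic space, and nothing would fail until much later.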
> If so and if you have a reasonable way of reproducing it, I can have a
> look for anything I might have done that could be causing it.
If you (Dan, or anyone else, especially Alpha users) want to take a
look at the flaky1 branch, you can:
$ cvs $whatever_magic_you_use_to_get_to_sourceforge checkout -rflaky1 sbcl
I think. I hope I'll be able to fix the x86 problems soon, so with luck
it won't be worth trying to debug it from your end. However, I would
be interested in knowing whether it still builds on an Alpha, since I
don't have any good way to test whether I've made a mistake which
affects the non-GENCGC code.
Also, I'm accumulating a list of cleanups that I'd like to make in the
GC code once things are more stable again, and I'll want to get your
(Dan, or any other Alpha hackers) opinion of them, to try to make sure
that I don't mess up the non-GENCGC world too badly.
William Harold Newman <william.newman@...>
"To foil the maintenance programmer, you have to understand how he
thinks." -- <http://mindprod.com/unmain.html>
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C