From: Christophe R. <cs...@ca...> - 2002-07-25 16:41:45
Attachments:
sbcl.sequence.diff
Hi,

Attached is a patch that (mostly) implements correct ANSI behaviour for calls of the form

  (CONCATENATE '(STRING 7) "foo" "bar")

which should signal an error in safe code. The "mostly" in the previous sentence refers to the fact that there are some residual incorrectly-conditionalized transforms (for example, for concatenate on simple-strings, like the call above).

Now. Here's where the fun starts.

I have built sbcl with the attached patch on PPC. It passes the tests it usually passes, and furthermore rebuilt itself without trouble, in addition to passing a test suite on the series of functions under question (MAP, MERGE, CONCATENATE, COERCE and MAKE-SEQUENCE). I was quite chuffed by this; there are some slightly dubious modifications to one or two functions[*] to avoid the type checks early in cold-init, but apart from that it causes a /lot/ of bogosity in src/code/seq.lisp to go away.

So, I was getting ready to merge this into CVS, when a little voice said "test it some more". So I decided to try to build it on my x86/Linux laptop. All went swimmingly; it got through genesis; got through cold-init, built PCL and dumped a core. Whereupon I decided to run the tests; and my shiny new sbcl binary hung on startup, never getting into lisp land.

I repeat: it works fine on PPC; it gets through cold-init and compiles PCL fine, and the resulting binary (the _same binary_ that got through cold-init, though not the same core), on invocation, goes into call_into_lisp; single-stepping in gdb shows:

  0x08054e49 in Ldone ()
  (gdb)
  0x00000000 in ?? ()
  (gdb) disassemble Ldone
  0x8054e49 <Ldone+8>:    call   *0xffffffff(%eax)

Any ideas? Comments welcome on the Lisp parts of the patch, too...

Cheers,

Christophe

[*] OK:
  in class.lisp, declared INHERITS-LIST to be a LIST so that the MAP can be open-coded;
  in primordial-extensions.lisp, declared everything to be simple-strings so that the CONCATENATE could be transformed away;
  in seq.lisp, MAKE-SEQUENCE-LIKE has to ask whether the type system is initialized yet;
  in show.lisp, CANNOT-SHOW now prints out two lines instead of one.
These are the only compromises.
-- 
Jesus College, Cambridge, CB5 8BL +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/
(defun pling-dollar (str schar arg) (first (last +)))
(make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)
From: Raymond T. <to...@rt...> - 2002-07-29 15:22:14
>>>>> "Christophe" == Christophe Rhodes <cs...@ca...> writes:

Christophe> So, I was getting ready to merge this into CVS, when a little voice said
Christophe> "test it some more". So I decided to try to build it on my x86/Linux
Christophe> laptop. All went swimmingly; it got through genesis; got through
Christophe> cold-init, built PCL and dumped a core. Whereupon I decided to run the
Christophe> tests; and my shiny new sbcl binary hung on startup, never getting into
Christophe> lisp land.

Don't know if this helps, but I did notice that in one of my attempts to make this all work on CMUCL, there were a few calls to make-sequence-like. However, the given sequence was a vector with a fill pointer and the length was less than the actual length of the vector. With these checks enabled, this causes a failure, of course.

Don't remember what I did to fix this, if any.

Did you notice any speed impacts? I would think make-sequence is pretty heavy-weight compared to make-sequence-of-type.

Ray
From: Christophe R. <cs...@ca...> - 2002-08-05 18:54:15
On Mon, Jul 29, 2002 at 11:22:04AM -0400, Raymond Toy wrote:
> >>>>> "Christophe" == Christophe Rhodes <cs...@ca...> writes:
>
> > So, I was getting ready to merge this into CVS, when a little voice said
> > "test it some more". So I decided to try to build it on my x86/Linux
> > laptop. All went swimmingly; it got through genesis; got through
> > cold-init, built PCL and dumped a core. Whereupon I decided to run the
> > tests; and my shiny new sbcl binary hung on startup, never getting into
> > lisp land.
>
> Don't know if this helps, but I did notice that in one of my attempts
> to make this all work on CMUCL, there were a few calls to
> make-sequence-like. However, the given sequence was a vector with a
> fill pointer and the length was less than the actual length of the
> vector. With these checks enabled, this causes a failure, of course.
>
> Don't remember what I did to fix this, if any.

I don't think it's this; I've got code in make-sequence-like to remove the length information from the type of the sequence passed to it.

It's not a straightforward failure, either, sadly; I've now tested my patch on more systems, and (with the uncommenting of the CONCATENATE transform for strings to compile away calls that are too difficult early in cold-init, *sigh*) it works on sparc, alpha and ppc; it fails on x86. This leads me to believe that, in fact, something is going wrong either in gencgc or in purify. (Or, possibly, both; see below).

Let me elaborate a bit on the symptoms on the x86 (Linux, but I imagine these to be the same on *BSD... maybe someone motivated could check?).

Compiling with :SB-SHOW, what happens is that cold-init and so on go swimmingly, pcl is compiled, SAVE-LISP-AND-DIE happens, and things appear to be fine. But running the resultant core (with "src/runtime/sbcl --core output/sbcl.core", a pattern which is hardwired into my fingers now) yields infinite SIGSEGVs; a little investigation with gdb reveals that, at the crucial point in call_into_lisp

  call *CLOSURE_FUN_OFFSET(%eax)

examination of the memory at $eax-1 shows a uniform sea of 0x0000000s, so jumping to *($eax-1) obviously leads to death.

Compiling without :SB-SHOW, however, leads to different broken behaviour; just after the end of first gc in cold-init, we take a path through gencgc_handle_wp_violation(), whereupon I get a SIGSEGV in vfprintf from the FSHOW statement at the head of that function.

I'm completely, utterly baffled.

> Did you notice any speed impacts? I would think make-sequence is
> pretty heavy-weight compared to make-sequence-of-type.

I don't think it's likely to be on the critical path of many things -- and I'm prepared to justify a small cost on the basis that MAKE-FOO is going to be consing anyway, which speed-conscious inner-loops probably shouldn't be doing.

But this talk of justification of small costs is a bit irrelevant if we can't figure out what is going on on the x86...

Cheers,

Christophe
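For readers following the addressing here: the earlier disassembly shows the call as `call *0xffffffff(%eax)`, and on 32-bit x86 a displacement of 0xffffffff wraps around to -1, which is why the word that matters lives at $eax-1. The following is a minimal C sketch of that arithmetic only. The function names are hypothetical and are not taken from the SBCL runtime, and the sketch makes no claim about how CLOSURE_FUN_OFFSET is actually defined there.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an x86 operand of the form disp32(%reg).
 * 32-bit address arithmetic wraps modulo 2^32, so a displacement of
 * 0xffffffff behaves exactly like -1. */
static uint32_t effective_address(uint32_t reg, uint32_t disp32)
{
    return reg + disp32; /* unsigned overflow is well-defined wraparound */
}

/* Hypothetical helper: the address that `call *0xffffffff(%eax)` reads
 * its jump target from, given the value held in %eax. */
static uint32_t closure_fun_slot(uint32_t eax)
{
    uint32_t slot = effective_address(eax, 0xffffffffu);
    assert(slot == (uint32_t)(eax - 1u)); /* same thing as eax - 1 */
    return slot;
}
```

If the word stored at that slot is zero, as reported above, the indirect call transfers control to address 0x0, which matches the `0x00000000 in ?? ()` frame seen in gdb.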
From: Daniel B. <da...@te...> - 2002-08-05 19:16:48
Christophe Rhodes <cs...@ca...> writes:

> Compiling without :SB-SHOW, however, leads to different broken
> behaviour; just after the end of first gc in cold-init, we take a path
> through gencgc_handle_wp_violation(), whereupon I get a SIGSEGV in
> vfprintf from the FSHOW statement at the head of that function.

_Without_ :sb-show? Presumably you defined QSHOW_SIGNALS by hand?

You'll be using the alternate signal stack at that point, as this was called from the SIGSEGV handler. Has it overflowed? info reg esp and compare with ALTERNATE_SIGNAL_STACK_START (probably 0x58000000) and SIGSTKSZ. printf seems to be fairly deeply nested inside glibc, so this may be a possibility.

If you do need to increase the signal stack size, there are - I regret to admit - two places it needs changing.

  interrupt.c: sigstack.ss_size = SIGSTKSZ;
  validate.c: ensure_space( (lispobj *) ALTERNATE_SIGNAL_STACK_START,SIGSTKSZ)

(We don't actually need to use a fixed address for this any more anyway: the call in validate could easily be lost in favour of some mallocing in undoably_install_low_level_interrupt_handler. I may fix this later this month, but please feel free to preempt me if it uncomplicates your problem)

-dan

-- 
http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources
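The malloc-based alternative mentioned above can be illustrated with plain POSIX calls. This is a rough sketch under stated assumptions, not the SBCL runtime's code: the helper name install_malloced_sigstack is hypothetical, the size is chosen by the caller, and only the standard sigaltstack(2) interface is used.

```c
#define _XOPEN_SOURCE 700 /* expose sigaltstack() under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not SBCL code: allocate an alternate signal stack
 * with malloc instead of mapping it at a fixed address, then register it
 * with sigaltstack(2). Returns 0 on success, -1 on failure. */
static int install_malloced_sigstack(size_t size)
{
    stack_t ss;

    assert(size > 0);
    ss.ss_sp = malloc(size);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        /* e.g. ENOMEM when size is below MINSIGSTKSZ */
        free(ss.ss_sp);
        return -1;
    }
    return 0;
}
```

A caller could request something like `install_malloced_sigstack(16 * (size_t)SIGSTKSZ)`. A handler only runs on this stack if it was installed through sigaction with SA_ONSTACK set in sa_flags. With this approach the size lives in one place, which is the simplification the message hints at.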
From: Christophe R. <cs...@ca...> - 2002-08-06 08:44:33
On Mon, Aug 05, 2002 at 08:16:37PM +0100, Daniel Barlow wrote:
> Christophe Rhodes <cs...@ca...> writes:
>
> > Compiling without :SB-SHOW, however, leads to different broken
> > behaviour; just after the end of first gc in cold-init, we take a path
> > through gencgc_handle_wp_violation(), whereupon I get a SIGSEGV in
> > vfprintf from the FSHOW statement at the head of that function.
>
> _Without_ :sb-show? Presumably you defined QSHOW_SIGNALS by hand?

Yes.

> You'll be using the alternate signal stack at that point, as this was
> called from the SIGSEGV handler. Has it overflowed? info reg esp and
> compare with ALTERNATE_SIGNAL_STACK_START (probably 0x58000000) and
> SIGSTKSZ. printf seems to be fairly deeply nested inside glibc, so
> this may be a possibility

Something weird appears to be happening. Which way is the stack meant to go on x86? The fault occurs on the instruction vfprintf+18505:

  0x4008fd29 <vfprintf+18497>:    mov    %esp,%ebp
  0x4008fd2b <vfprintf+18499>:    sub    $0x20dc,%esp
  0x4008fd31 <vfprintf+18505>:    push   %edi
  0x4008fd32 <vfprintf+18506>:    push   %esi
  0x4008fd33 <vfprintf+18507>:    push   %ebx
  0x4008fd34 <vfprintf+18508>:    call   0x4008fd39 <vfprintf+18513>

so it certainly looks like a stack exhaustion error of some kind; what worries me slightly are the values of esp and ebp at this point:

  esp: 0x57fff500
  ebp: 0x580015dc

(yes, the difference between these is 0x20dc, but worse is that there appears to be one on each side of the ALTERNATE_SIGNAL_STACK_START... OK, yes, I understand: this _is_ a stack overflow; the stack grows _downward_ from ALTERNATE_SIGNAL_STACK_START+SIGSTKSZ, so we need lots more space. Right. Rebuilding...)

Cheers,

Christophe
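The register values above can be checked mechanically. The sketch below redoes that arithmetic in C, assuming the alternate stack starts at 0x58000000 (the "probably" value suggested earlier in the thread); the function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the frame between the frame pointer and the stack pointer.
 * On a downward-growing stack, ebp sits at or above esp. */
static uint32_t frame_size(uint32_t ebp, uint32_t esp)
{
    assert(ebp >= esp);
    return ebp - esp;
}

/* A downward-growing stack occupying [start, start + size) has overflowed
 * once the stack pointer drops below start. */
static int below_stack_base(uint32_t sp, uint32_t start)
{
    return sp < start;
}

/* Bytes by which sp has run past the low end of the stack (0 if none). */
static uint32_t overflow_bytes(uint32_t sp, uint32_t start)
{
    return below_stack_base(sp, start) ? start - sp : 0u;
}
```

With the reported values, frame_size(0x580015dc, 0x57fff500) is 0x20dc, matching the `sub $0x20dc,%esp` instruction, and esp lies 0xb00 bytes below the assumed start while ebp is still inside the stack. That is consistent with the conclusion that vfprintf's frame alone pushed esp past the low end of the alternate stack, so the following push faulted.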
From: Christophe R. <cs...@ca...> - 2002-08-06 09:26:10
On Tue, Aug 06, 2002 at 09:44:07AM +0100, Christophe Rhodes wrote:
> On Mon, Aug 05, 2002 at 08:16:37PM +0100, Daniel Barlow wrote:
> (yes, the difference between these is 0x20dc, but worse is that there
> appears to be one on each side of the ALTERNATE_SIGNAL_STACK_START...
> OK, yes, I understand: this _is_ a stack overflow; the stack grows
> _downward_ from ALTERNATE_SIGNAL_STACK_START+SIGSTKSZ, so we need lots
> more space. Right. Rebuilding...)

Having rebuilt with a new size of 16*SIGSTKSZ, I get the same symptoms on x86 as the previous build -- compilation of PCL goes to completion, and the address of the initial function contains a sea of 0x0000000, so call_into_lisp attempts to call 0x0.

Suspicion continues to linger on gencgc/purify, particularly since the SAVE-LISP-AND-DIE stage prints:

  [saving current Lisp image into /home/csr21/sourceforge-cvs/sbcl/output/sbcl.core:
  writing 15398232 bytes from the read-only space at 0x01000000
  writing 3882392 bytes from the static space at 0x05000000
  writing 864256 bytes from the dynamic space at 0x09000000
  done]

which is an abnormally large amount in the dynamic space.

Hey ho,

Christophe