From: Gábor M. <me...@re...> - 2009-03-20 08:23:44
On Friday 20 March 2009, Sidney Markowitz wrote:
> Cyrus Harmon wrote, On 20/3/09 6:17 PM:
> > sidney,
> >
> > sorry for jumping into this thread so late, but I see that this is
> > 1.0.26.9. a few questions:
>
> That was the latest, but I used git-bisect to identify the rev where
> it first appeared as 1.0.25.44
>
> > 1. did you build this yourself?
>
> Yes, using sh clean.sh and then sh make.sh under sbcl 1.0.25.12
>
> > 2. do prebuilt binaries work for you?
>
> I don't see a prebuilt binary for Intel Mac OS X newer than 1.0.23.
>
> > 3. can you build and run other versions?
>
> Any version before 1.0.25.44
>
> > 4. what is in local-target-features.lisp-expr?
>
> (:x86 :unix :mach-o :bsd :darwin :mach-exception-handler :sb-lutex
> :restore-fs-segment-register-from-tls :gencgc
> :stack-grows-downward-not-upward :c-stack-is-control-stack
> :compare-and-swap-vops :unwind-to-frame-and-call-vop
> :raw-instance-init-vops :stack-allocatable-closures :alien-callbacks
> :cycle-counter :linkage-table :os-provides-dlopen :os-provides-dladdr
> :os-provides-putwc :os-provides-blksize-t :os-provides-suseconds-t)
>
> -- sidney

Now that I have an x86-darwin installation to test on, here is what I
found so far. The hang is due to some signals not being delivered. It's
evident on a unithread build when using QSHOW, QSHOW_SIGNALS, but _not_
QSHOW_SAFE (or QSHOW_SIGNAL_SAFE on more recent versions).

I removed the consing part of the test so that with QSHOW stderr is not
flooded with gc messages:

(let ((*x0* nil) (*x1* nil) (*x2* nil) (*x3* nil) (*x4* nil))
  (declare (special *x0* *x1* *x2* *x3* *x4*))
  (loop repeat 10 do
        (loop repeat 10 do
              (catch 'again
                (sb-ext:schedule-timer
                 (sb-ext:make-timer
                  (lambda ()
                    (format *trace-output* "throwing~%")
                    (sb-impl::with-interrupts)
                    (throw 'again nil)))
                 0)
                (loop))
              (when (not (and (null *x0*) (null *x1*) (null *x2*)
                              (null *x3*) (null *x4*)))
                (format t "~S ~S ~S ~S ~S~%" *x0* *x1* *x2* *x3* *x4*)
                (assert nil)))
        (princ '*)
        (force-output))
  (terpri))

When this form is evaluated, this is the output I get (the ;;; comments
are narration):

;;; 14 is sigalrm
/maybe_defer_handler(8ea0,14): not deferred
/entering interrupt_handle_now(14, info, context)
/calling Lisp-level handler
Memory fault at: 0x10079de4, PC: 0x100a47dd
heap WP violation? fault_addr=10079de4, page_index=121
Memory fault at: 0x102c0150, PC: 0x1031110f
heap WP violation? fault_addr=102c0150, page_index=704
Memory fault at: 0x10163c14, PC: 0x10311119
heap WP violation? fault_addr=10163c14, page_index=355
Memory fault at: 0x102c1008, PC: 0x1031112f
heap WP violation? fault_addr=102c1008, page_index=705
Memory fault at: 0x102a7f78, PC: 0x10311150
heap WP violation? fault_addr=102a7f78, page_index=679
;;; 13 is sigpipe. interrupt-thread enqueues the timer's function
;;; into thread-interruptions and raises sigpipe, which is blocked
;;; together with all deferrable signals.
/kill_safely: 0, 13
Signal 13 pending
/returning from interrupt_handle_now(14, info, context)
;;; Having returned from the signal handler, deferrables are
;;; unblocked, sigpipe is delivered.
/maybe_defer_handler(8ea0,13): not deferred
/entering interrupt_handle_now(13, info, context)
/calling Lisp-level handler
throwing
;;; This is the second timer's SIGALRM, so far so good:
/maybe_defer_handler(8ea0,14): not deferred
/entering interrupt_handle_now(14, info, context)
/calling Lisp-level handler
/kill_safely: 0, 13
/returning from interrupt_handle_now(14, info, context)
;;; We did the same, sigmask should be the same but sigpipe
;;; is not delivered ...
;;; I'm waiting here, but nothing happens until I press ^C:
^C
;;; 2 is SIGINT
/maybe_defer_handler(8ea0,2): not deferred
/entering interrupt_handle_now(2, info, context)
/calling Lisp-level handler
;;; it's handled by interrupting the thread with #'break,
;;; that's why another sigpipe is raised
/kill_safely: 0, 13
Signal 13 pending
/returning from interrupt_handle_now(2, info, context)
;;; the sigpipe arrived
/maybe_defer_handler(8ea0,13): not deferred
/entering interrupt_handle_now(13, info, context)
;;; run-interruption is called:
/calling Lisp-level handler
;;; the first interruption from thread-interruptions will be called
;;; but before that we signal sigpipe again because there is
;;; another interruption: #'break
/kill_safely: 0, 13
throwing
;;; the interruption runs in a without-interrupts so #'break's
;;; sigpipe is deferred:
/store_signal_data_for_later: signal: 13
/maybe_defer_handler(8ea0,13): deferred (RACE=0)
;;; and upon exiting without-interrupts it's handled:
/<trap pending interrupt>
/[arch_skip_inst resuming at 100555bc]
/entering interrupt_handle_pending
/running deferred handler 0x8ea0
/entering interrupt_handle_now(13, info, context)
/calling Lisp-level handler
Memory fault at: 0x1008bba4, PC: 0x1043939c
heap WP violation? fault_addr=1008bba4, page_index=139
Memory fault at: 0x10085f7c, PC: 0x104393c2
heap WP violation? fault_addr=10085f7c, page_index=133
Memory fault at: 0x100c8e64, PC: 0x104380ee
heap WP violation? fault_addr=100c8e64, page_index=200
Memory fault at: 0x101b1d3c, PC: 0x1043810f
heap WP violation? fault_addr=101b1d3c, page_index=433
Memory fault at: 0x102e0000, PC: 0x1031110f
heap WP violation? fault_addr=102e0000, page_index=736
Memory fault at: 0x1018972c, PC: 0x10311119
heap WP violation? fault_addr=1018972c, page_index=393
Memory fault at: 0x102e3008, PC: 0x1031112f
heap WP violation? fault_addr=102e3008, page_index=739
Memory fault at: 0x102e2008, PC: 0x10311150
heap WP violation? fault_addr=102e2008, page_index=738
Memory fault at: 0x1042d2c4, PC: 0x10d993ea
heap WP violation? fault_addr=1042d2c4, page_index=1069
Memory fault at: 0x100be544, PC: 0x10da02af
heap WP violation? fault_addr=100be544, page_index=190
Memory fault at: 0x1149cc08, PC: 0x10d64756
heap WP violation? fault_addr=1149cc08, page_index=5276

debugger invoked on a SB-SYS:INTERACTIVE-INTERRUPT:
  Interactive interrupt at #x118BAD73.
Memory fault at: 0x10060034, PC: 0x104385b2
heap WP violation? fault_addr=10060034, page_index=96

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [CONTINUE] Return from SB-UNIX:SIGINT.
  1: [ABORT ] Exit debugger, returning to top level.

Memory fault at: 0x100e3aa8, PC: 0x10416e2c
heap WP violation? fault_addr=100e3aa8, page_index=227
Memory fault at: 0x100cba50, PC: 0x10416e2c
heap WP violation? fault_addr=100cba50, page_index=203
Memory fault at: 0x101b2064, PC: 0x1030b152
heap WP violation? fault_addr=101b2064, page_index=434
Memory fault at: 0x10ef7ad0, PC: 0x10d64756
heap WP violation? fault_addr=10ef7ad0, page_index=3831
Memory fault at: 0x10417c58, PC: 0x10bde0f2
heap WP violation? fault_addr=10417c58, page_index=1047
Memory fault at: 0x102bbad8, PC: 0x10ebcd17
heap WP violation? fault_addr=102bbad8, page_index=699
Memory fault at: 0x1042c9bc, PC: 0x10d993ea
heap WP violation? fault_addr=1042c9bc, page_index=1068
Memory fault at: 0x115aaa80, PC: 0x10d64756
heap WP violation? fault_addr=115aaa80, page_index=5546
Memory fault at: 0x111a2560, PC: 0x10bde0f2
heap WP violation? fault_addr=111a2560, page_index=4514
Memory fault at: 0x10b513d8, PC: 0x10ebcd17
heap WP violation? fault_addr=10b513d8, page_index=2897
Memory fault at: 0x1005e79c, PC: 0x101994c6
heap WP violation? fault_addr=1005e79c, page_index=94
((FLET #:CLEANUP-FUN-[INVOKE-INTERRUPTION]58))[:CLEANUP]
0]

All in all, a sigpipe signal is lost, and it shouldn't be, because the
previous one had already been handled. To verify this I printed the
pending signals at the end of interrupt_handle_now with this:

static void print_pending(void)
{
    sigset_t sigset;
    int i;

    sigpending(&sigset);
    for (i = 1; i < NSIG; i++) {
        if (sigismember(&sigset, i))
            fprintf(stderr, "Signal %d pending\n", i);
    }
}

You can see in the transcript that after some of the kill_safely calls
for signal 13, signal 13 is not pending ...