From: Gábor M. <me...@re...> - 2009-03-20 08:57:56
On Friday 20 March 2009, Gábor Melis wrote:
> On Friday 20 March 2009, Sidney Markowitz wrote:
> > Cyrus Harmon wrote, On 20/3/09 6:17 PM:
> > > sidney,
> > >
> > > sorry for jumping into this thread so late, but I see that this
> > > is 1.0.26.9. a few questions:
> >
> > That was the latest, but I used git-bisect to identify the rev
> > where it first appeared as 1.0.25.44
> >
> > > 1. did you build this yourself?
> >
> > Yes, using sh clean.sh and then sh make.sh under sbcl 1.0.25.12
> >
> > > 2. do prebuilt binaries work for you?
> >
> > I don't see a prebuilt binary for Intel Mac OS X newer than 1.0.23.
> >
> > > 3. can you build and run other versions?
> >
> > Any version before 1.0.25.44
> >
> > > 4. what is in local-target-features.lisp-expr?
> >
> > (:x86 :unix :mach-o :bsd :darwin :mach-exception-handler :sb-lutex
> > :restore-fs-segment-register-from-tls :gencgc
> > :stack-grows-downward-not-upward :c-stack-is-control-stack
> > :compare-and-swap-vops :unwind-to-frame-and-call-vop
> > :raw-instance-init-vops :stack-allocatable-closures
> > :alien-callbacks :cycle-counter :linkage-table :os-provides-dlopen
> > :os-provides-dladdr :os-provides-putwc :os-provides-blksize-t
> > :os-provides-suseconds-t)
> >
> > -- sidney
>
> Now that I have an x86-darwin installation to test on, here is what I
> found so far.
>
> The hang is due to some signals not being delivered. It's evident on
> a unithread build when using QSHOW, QSHOW_SIGNALS, but _not_
> QSHOW_SAFE (or QSHOW_SIGNAL_SAFE on more recent versions).
> I removed the consing part of the test so that with QSHOW stderr is
> not flooded with gc messages:
>
>
> (let ((*x0* nil) (*x1* nil) (*x2* nil) (*x3* nil) (*x4* nil))
>   (declare (special *x0* *x1* *x2* *x3* *x4*))
>   (loop repeat 10 do
>     (loop repeat 10 do
>       (catch 'again
>         (sb-ext:schedule-timer (sb-ext:make-timer
>                                 (lambda ()
>                                   (format *trace-output*
>                                           "throwing~%")
>                                   (sb-impl::with-interrupts)
>                                   (throw 'again nil)))
>                                0)
>         (loop))
>       (when (not (and (null *x0*) (null *x1*) (null *x2*)
>                       (null *x3*)
>                       (null *x4*)))
>         (format t "~S ~S ~S ~S ~S~%" *x0* *x1* *x2* *x3*
>                 *x4*)
>         (assert nil)))
>     (princ '*)
>     (force-output))
>   (terpri))
>
>
> When this form is evaluated this is the output I get (comments are
> narration):
>
> ;;; 14 is sigalrm
> /maybe_defer_handler(8ea0,14): not deferred
> /entering interrupt_handle_now(14, info, context)
> /calling Lisp-level handler
> Memory fault at: 0x10079de4, PC: 0x100a47dd
> heap WP violation? fault_addr=10079de4, page_index=121
> Memory fault at: 0x102c0150, PC: 0x1031110f
> heap WP violation? fault_addr=102c0150, page_index=704
> Memory fault at: 0x10163c14, PC: 0x10311119
> heap WP violation? fault_addr=10163c14, page_index=355
> Memory fault at: 0x102c1008, PC: 0x1031112f
> heap WP violation? fault_addr=102c1008, page_index=705
> Memory fault at: 0x102a7f78, PC: 0x10311150
> heap WP violation? fault_addr=102a7f78, page_index=679
> ;;; 13 is sigpipe. interrupt-thread enqueues the timer's function
> ;;; into thread-interruptions and raises sigpipe, which is blocked
> ;;; together with all deferrable signals.
> /kill_safely: 0, 13
> Signal 13 pending
> /returning from interrupt_handle_now(14, info, context)
> ;;; Having returned from the signal handler, deferrables are
> ;;; unblocked, sigpipe is delivered.
> /maybe_defer_handler(8ea0,13): not deferred
> /entering interrupt_handle_now(13, info, context)
> /calling Lisp-level handler
> throwing
> ;;; This is the second timer's SIGALRM, so far so good:
> /maybe_defer_handler(8ea0,14): not deferred
> /entering interrupt_handle_now(14, info, context)
> /calling Lisp-level handler
> /kill_safely: 0, 13
> /returning from interrupt_handle_now(14, info, context)
> ;;; We did the same, sigmask should be the same but sigpipe
> ;;; is not delivered ...
>
> ;;; I'm waiting here, but nothing happens until I press ^C:
>
>
> ^C
> ;;; 2 is SIGINT
> /maybe_defer_handler(8ea0,2): not deferred
> /entering interrupt_handle_now(2, info, context)
> /calling Lisp-level handler
> ;;; it's handled by interrupting the thread with #'break
> ;;; that's why another sigpipe is raised
> /kill_safely: 0, 13
> Signal 13 pending
> /returning from interrupt_handle_now(2, info, context)
> ;;; the sigpipe arrived
> /maybe_defer_handler(8ea0,13): not deferred
> /entering interrupt_handle_now(13, info, context)
> ;;; run-interruption is called:
> /calling Lisp-level handler
> ;;; the first interruption from thread-interruptions will be called
> ;;; but before that we signal sigpipe again because there is
> ;;; another interruption: #'break
> /kill_safely: 0, 13
> throwing
> ;;; the interruption runs in a without-interrupts so #'break's
> ;;; sigpipe is deferred:
> /store_signal_data_for_later: signal: 13
> /maybe_defer_handler(8ea0,13): deferred (RACE=0)
> ;;; and upon exiting without-interrupts it's handled:
> /<trap pending interrupt>
> /[arch_skip_inst resuming at 100555bc]
> /entering interrupt_handle_pending
> /running deferred handler 0x8ea0
> /entering interrupt_handle_now(13, info, context)
> /calling Lisp-level handler
> Memory fault at: 0x1008bba4, PC: 0x1043939c
> heap WP violation? fault_addr=1008bba4, page_index=139
> Memory fault at: 0x10085f7c, PC: 0x104393c2
> heap WP violation? fault_addr=10085f7c, page_index=133
> Memory fault at: 0x100c8e64, PC: 0x104380ee
> heap WP violation? fault_addr=100c8e64, page_index=200
> Memory fault at: 0x101b1d3c, PC: 0x1043810f
> heap WP violation? fault_addr=101b1d3c, page_index=433
>
> Memory fault at: 0x102e0000, PC: 0x1031110f
> heap WP violation? fault_addr=102e0000, page_index=736
> Memory fault at: 0x1018972c, PC: 0x10311119
> heap WP violation? fault_addr=1018972c, page_index=393
> Memory fault at: 0x102e3008, PC: 0x1031112f
> heap WP violation? fault_addr=102e3008, page_index=739
> Memory fault at: 0x102e2008, PC: 0x10311150
> heap WP violation? fault_addr=102e2008, page_index=738
> Memory fault at: 0x1042d2c4, PC: 0x10d993ea
> heap WP violation? fault_addr=1042d2c4, page_index=1069
> Memory fault at: 0x100be544, PC: 0x10da02af
> heap WP violation? fault_addr=100be544, page_index=190
> Memory fault at: 0x1149cc08, PC: 0x10d64756
> heap WP violation? fault_addr=1149cc08, page_index=5276
> debugger invoked on a SB-SYS:INTERACTIVE-INTERRUPT:
> Interactive interrupt at #x118BAD73.
> Memory fault at: 0x10060034, PC: 0x104385b2
> heap WP violation? fault_addr=10060034, page_index=96
>
> Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.
>
> restarts (invokable by number or by possibly-abbreviated name):
> 0: [CONTINUE] Return from SB-UNIX:SIGINT.
> 1: [ABORT ] Exit debugger, returning to top level.
> Memory fault at: 0x100e3aa8, PC: 0x10416e2c
> heap WP violation? fault_addr=100e3aa8, page_index=227
> Memory fault at: 0x100cba50, PC: 0x10416e2c
> heap WP violation? fault_addr=100cba50, page_index=203
> Memory fault at: 0x101b2064, PC: 0x1030b152
> heap WP violation? fault_addr=101b2064, page_index=434
>
> Memory fault at: 0x10ef7ad0, PC: 0x10d64756
> heap WP violation? fault_addr=10ef7ad0, page_index=3831
> Memory fault at: 0x10417c58, PC: 0x10bde0f2
> heap WP violation? fault_addr=10417c58, page_index=1047
> Memory fault at: 0x102bbad8, PC: 0x10ebcd17
> heap WP violation? fault_addr=102bbad8, page_index=699
> Memory fault at: 0x1042c9bc, PC: 0x10d993ea
> heap WP violation? fault_addr=1042c9bc, page_index=1068
> Memory fault at: 0x115aaa80, PC: 0x10d64756
> heap WP violation? fault_addr=115aaa80, page_index=5546
> Memory fault at: 0x111a2560, PC: 0x10bde0f2
> heap WP violation? fault_addr=111a2560, page_index=4514
> Memory fault at: 0x10b513d8, PC: 0x10ebcd17
> heap WP violation? fault_addr=10b513d8, page_index=2897
> Memory fault at: 0x1005e79c, PC: 0x101994c6
> heap WP violation? fault_addr=1005e79c, page_index=94
> ((FLET #:CLEANUP-FUN-[INVOKE-INTERRUPTION]58))[:CLEANUP]
> 0]
>
>
>
> All in all a sigpipe signal is lost and it shouldn't be because the
> previous one is handled. To verify this I printed pending signals at
> the end of interrupt_handle_now with this:
>
> static void
> print_pending(void)
> {
>     sigset_t sigset;
>     int i;
>     sigpending(&sigset);
>     for(i = 1; i < NSIG; i++) {
>         if (sigismember(&sigset, i))
>             fprintf(stderr, "Signal %d pending\n", i);
>     }
> }
>
> You can see in the transcript that after some kill 13's signal 13 is
> not pending ...

FWIW, I find that signals are not lost with this change:

Index: src/runtime/thread.c
===================================================================
RCS file: /cvsroot/sbcl/sbcl/src/runtime/thread.c,v
retrieving revision 1.99
diff -u -p -r1.99 thread.c
--- src/runtime/thread.c 17 Mar 2009 14:05:45 -0000 1.99
+++ src/runtime/thread.c 20 Mar 2009 08:49:21 -0000
@@ -721,7 +721,12 @@ kill_safely(os_thread_t os_thread, int s
     int status;
     if (os_thread != 0)
         lose("kill_safely: who do you want to kill? %d?\n", os_thread);
-    status = raise(signal);
+    {
+        sigset_t oldset;
+        thread_sigmask(SIG_BLOCK, &blockable_sigset, &oldset);
+        status = raise(signal);
+        thread_sigmask(SIG_SETMASK,&oldset,0);
+    }
     if (status == 0) {
         return 0;
     } else {

The idea comes from QSHOW_SIGNAL_SAFE, which plays the same game, only
around fprintf. Of course, this should not be necessary at all.
With this change all signal and timer tests pass on unithread. The
multithreaded build dies in gc as before.
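
For anyone who wants to poke at the block/raise/restore pattern outside the SBCL runtime, here is a minimal standalone sketch. It is only an illustration under stated assumptions, not the runtime's code: `install_counter`, `raise_blocked` and `handled` are hypothetical names, plain `sigprocmask` stands in for the runtime's `thread_sigmask`, and only the one signal is blocked rather than all of `blockable_sigset`. It also checks `sigpending` in between, in the spirit of `print_pending` above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical counter: how many times the handler has run. */
volatile sig_atomic_t handled = 0;

static void count_signal(int signo)
{
    (void)signo;
    handled++;
}

/* Install the counting handler for signo. Returns 0 on success. */
int install_counter(int signo)
{
    struct sigaction sa;
    sa.sa_handler = count_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Block signo, raise it, look at the pending set, then restore the
 * previous mask. Returns 1 if signo was pending while blocked, 0 if it
 * was not, -1 on error. If signo was unblocked in the previous mask,
 * it is delivered when the mask is restored. */
int raise_blocked(int signo)
{
    sigset_t block, oldset, pending;
    int was_pending;

    assert(signo > 0);
    sigemptyset(&block);
    sigaddset(&block, signo);

    if (sigprocmask(SIG_BLOCK, &block, &oldset) != 0)
        return -1;

    if (raise(signo) != 0 || sigpending(&pending) != 0) {
        sigprocmask(SIG_SETMASK, &oldset, NULL);
        return -1;
    }
    was_pending = sigismember(&pending, signo);

    /* Restoring the old mask is where the handler gets to run. */
    sigprocmask(SIG_SETMASK, &oldset, NULL);
    return was_pending;
}
```

In a single-threaded process, calling `install_counter(SIGUSR1)` and then `raise_blocked(SIGUSR1)` should report the signal as pending and bump `handled` once the mask is restored. A threaded program would need `pthread_sigmask` instead of `sigprocmask`.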