From: Nikodemus S. <nsi...@it...> - 2007-03-23 00:30:27
|
I have a proof of concept tree where GC happens outside the SIG_MEMORY_FAULT handler (working on x86-64 Linux), which is at least good enough to pass all tests... Here's what happens:

1. The signal handler calls
   arrange_return_to_signal_tramp(sigsegv_return_handler, signo, info, context);

2. This is like arrange_return_to_lisp_function, but instead of going to lisp we end up returning from the signal handler to call the function signal_tramp with (1) the real handler function, and (2 & 3) malloc'ed copies of siginfo and context.

3. signal_tramp blocks the signal we are handling in the current thread, saves copies of siginfo and context on the stack, and frees the malloced memory. It then calls the real handler with the signo, the stack-allocated siginfo and context, and the original signal mask. IF the real handler returns, the original signal mask is restored by signal_tramp; IF the real handler unwinds, then it is its responsibility to reset the sigmask.

So we have basically functional signal handlers outside the kernel, and are no longer restricted by silly POSIX rules about which functions are signal safe -- which we broke all the time. The only thing we cannot do is directly frob the context and return to it.

Todo: The initial malloced copies are just a quick hack: I am planning to actually copy the siginfo and context directly to the stack, so (1) we don't need to worry about whether malloc is signal safe, and (2) we don't need to worry about leaking memory due to asynch unwinds.

Todo: This is also still missing a bit of interrupt protection: we still have to worry about asynch unwinds that catch us after we have established the new signal mask.

How does this sound? Am I forgetting something obvious? Does this approach make anyone more uneasy than running whatnot inside signal handlers?

Cheers,

-- Nikodemus |
From: Cyrus H. <ch...@bo...> - 2007-03-23 00:57:58
|
Nikodemus,

This sounds good. You may have looked at it, but x86-darwin-os.c contains a similar approach for mach exception handling. One thing that we did there is that we re-trap when we return, which gives us the opportunity to frob the signal context (equivalent) at that point if we want.

Cyrus |
From: <me...@re...> - 2007-03-23 08:50:00
|
On Friday 23 March 2007 01:30, Nikodemus Siivola wrote:
> 1. Signal handler calls
>    arrange_return_to_signal_tramp(sigsegv_return_handler,
>                                   signo, info, context);
>
> 2. This is like arrange_return_to_lisp_function, but instead of going
> to lisp we end up returning from the signal handler to call the
> function signal_tramp with (1) the real handler function, and (2 & 3)
> malloc'ed copies of siginfo and context.
>
> 3. signal_tramp blocks the signal we are handling in current thread,

Is there a window of time right before this when the signal is not blocked?

> saves copies of siginfo and context on stack and frees the malloced
> memory. It then calls the real handler with the signo, stack
> allocated siginfo and context, and the original signal mask. [...]
>
> So, we have basically functional signal handlers outside the kernel,
> and are no longer restricted by silly POSIX rules about which
> functions are signal safe -- which we broke all the time. The only
> thing we cannot do is directly frob the context and return to it.

'Async signal safe' may actually be a slightly misleading misnomer. I touched on this in "Some thread & signal safety issues". Functions that are not async signal safe don't care whether they are being reentered directly from the signal handler or from a trampoline that the signal handler arranged to be called at the very same point where the interrupt happened. I think this approach in general (i.e. considering all signals, not just GC) has the same issues as the original, which doesn't really fail with any frequency approaching reproducibility either.

Now, for GC the situation is somewhat different. We know that GC is triggered synchronously by consing. Hence, we can be sure that no async-signal-unsafe C code is running at that time in the thread where the GC is triggered and where we are going to run the GC code. In my world that means we are safe.

In short, our synchronous signals are nothing to worry about because the restrictions do not apply to them, so as far as I know GC is fine as it is, and true async signals (think SIGALRM, SIGINT) are not helped by the trampoline.

I've found this page about MPS that seems to share this view: http://www.ravenstream.com/project/mps/master/design/protli/

"
.threads.async: POSIX (and hence Linux) imposes some restrictions on signal handler functions (see design.mps.pthreadext.anal.signal.safety). Basically the rules say the behaviour of almost all POSIX functions inside a signal handler is undefined, except for a handful of functions which are known to be "async-signal safe". However, if it's known that the signal didn't happen inside a POSIX function, then it is safe to call arbitrary POSIX functions inside a handler.

.threads.async.protection: If the signal handler is invoked because of an MPS access, then we know the access must have been caused by client code (because the client is not allowed to permit access to protectable memory to arbitrary foreign code [need a reference for this]). In these circumstances, it's OK to call arbitrary POSIX functions inside the handler.

.threads.async.other: If the signal handler is invoked for some other reason (i.e. one we are not prepared to handle) then there is less we can say about what might have caused the SEGV. In general it is not safe to call arbitrary POSIX functions inside the handler in this case.
" |
From: Nikodemus S. <nik...@ra...> - 2007-03-26 09:52:59
|
There was some talk about this a few days back on #lisp. I'll recap what I recall were the major points (which are probably mixed up with my own conclusions):

* Gabor probably hits the nail on the head when he explains what the POSIX signal safety requirement really means: our handlers for semi-synchronous signals should be safe even if they don't follow the letter of POSIX.

* Similarly, our handlers for asynch signals are not going to be safe even with the kinds of tricks we play with arrange_return_to_lisp_function.

* To make asynch signal handlers safe in multithreaded builds they need to request handling from another thread, probably using realtime semaphores (which are signal safe).

* To make asynch signal handlers safe in unithreaded builds we apparently need to mask the asynch signals pretty much everywhere, and listen for them only at safe points.

* Neither of these approaches will make async unwind issues go away. Asynch unwinds will never be really safe.

* We can, however, gain a synchronous timeout ability by making various blocking functions have not just a :TIMEOUT parameter, but by making them also respect a global *DEADLINE*. I hesitate to say anything about the properties of such synchronous timeouts, though.

Corrections and comments hoped for,

-- Nikodemus |
From: Brian M. <br...@ma...> - 2007-03-26 12:05:14
|
Nikodemus Siivola wrote:
> * We can, however, gain a synchronous timeout ability by making various
>   blocking functions have not just a :TIMEOUT parameter, but by making
>   them also respect a global *DEADLINE*. I hesitate to say anything
>   about properties of such synchronous timeouts, though.

Hi Nikodemus,

What is your concern here about the properties of such timeouts? Regardless of whether async unwinds can be made safe, I think what you describe is good global policy. For applications which call a number of blocking APIs but are unconcerned with entering an infinite loop in Lisp code, this is all the timeout machinery which is necessary. It would probably be a good idea to expose the API used here so that FFI users can respect these timeouts as well. I've a few thoughts here if others are interested.

Thanks,
-- Brian Mastenbrook
br...@ma...
http://brian.mastenbrook.net/ |
From: Nikodemus S. <nik...@ra...> - 2007-03-26 13:13:24
|
Brian Mastenbrook wrote:
> What is your concern here about the properties of such timeouts?

Just my ability to get the details of stuff like this wrong the first time. I would like to say that they are well-behaved and can be unwound from safely, but I have no proof either way right now.

> Regardless of whether async unwinds can be made safe, I think what you
> describe is good global policy. For applications which call a number of
> blocking APIs but are unconcerned with entering an infinite loop in Lisp
> code, this is all the timeout machinery which is necessary. It would
> probably be a good idea to expose the API used here so that FFI users
> can respect these timeouts as well. I've a few thoughts here if others
> are interested.

I am!

Cheers,
-- Nikodemus |
From: Brian M. <br...@ma...> - 2007-03-27 12:42:00
|
Nikodemus Siivola wrote:
> Just my ability to get details of stuff like this wrong the first time.
> I would like to say that they are well-behaved and can be unwound from
> safely, but I have no proof either way right now.

These synchronous timeouts should all be triggered after return into Lisp, when the code calling the foreign function checks the return value and determines that it returned due to an expired timeout. Thus the unwind will be safe, as it will be triggered from Lisp code and not from a signal handler running in the middle of an allocation (or any other "bad" case). However...

>> I've a few thoughts here if others are interested.
>
> I am!

... they don't necessarily have to unwind. TIMEOUT should be signaled as a condition, and a restart made available to continue execution. Consider an application working with SIP messaging over UDP: it must explicitly retry certain transactions if no response is received, but in the meantime it may be off trying to contact another host. In this case, the response to the timeout should be to retry the message send and return to whatever else may be processing.

*DEADLINE* is probably too simple as well. Applications like the one I mentioned above will need to have a set of timeouts active and trigger different responses depending on which timeout is expiring. Also, the timeout response will usually not need to be taken (as usually the remote host will be there), and so the application should be given some way of canceling a timeout. Put this way, a deadline is more a function computed from which timeouts are currently active than a single global value. This function can be recomputed when timeouts are added to and removed from the set of active timeouts, so the code surrounding each blocking foreign call will still be relatively cheap.

I think the interface which is needed here is:

* A function to return the current deadline based on the current timer queue,
* A function to trigger the appropriate timeout actions when the deadline has passed,
* A function to register a timeout, which is given an instance of a condition class to be signaled when the timeout expires,
* A function to cancel a timeout, which is given the condition class instance to find in the queue and cancel.

There are still a few rough edges here that need to be ironed out: for instance, when a blocking call returns successfully but the deadline has passed, do we still invoke timeout handlers? If a timeout handler chooses to unwind, but other timeouts in the queue are ready to expire, when do they fire? It would probably be interesting to have SB-EVAL check the timeout periodically as well.

-- Brian Mastenbrook
br...@ma...
http://brian.mastenbrook.net/ |
From: James Y K. <fo...@fu...> - 2007-03-27 22:58:13
|
On Mar 27, 2007, at 8:42 AM, Brian Mastenbrook wrote:
> I think the interface which is needed here is:
>
> * A function to return the current deadline based on the current
>   timer queue,
> * A function to trigger appropriate timeout actions when the deadline
>   has passed,
> * A function to register a timeout, which is given an instance of a
>   condition class to be signaled when the timeout expires
> * A function to cancel a timeout, which is given the condition class
>   instance to find in the queue and cancel

Uggg. This sounds really overdesigned to me. What I'd like is simple: timer support in the serve-event loop. I think that should cleanly cover your UDP SIP server case as well.

Other than that, a timeout argument to various blocking functions is useful, but I don't really see that they need such a sophisticated support system to go along with them. One can always keep a global variable *deadline* in user code and pass the appropriate timeout to each function as you call it, no? An implicit failure-inducing global like that just seems dangerous.

James |
From: Nikodemus S. <nik...@ra...> - 2007-03-28 12:36:46
|
James Y Knight wrote:
> Uggg. This sounds really overdesigned to me. What I'd like is simple:
> timer support in the serve-event loop. I think that should cleanly
> cover your UDP SIP server case as well.

I confess I haven't thought about how this ties in with SERVE-EVENT yet. My first thought is that streams have timeouts of their own, which would then cause the event-loop to signal a timeout. ...but considering how it is a recursive event-loop, that makes me feel quite ill at ease. Need to think and see what I can implement.

> Other than that, a timeout argument to various blocking functions is
> useful, but I don't really see that they need such a sophisticated
> support system to go along with them. One can always keep a global
> variable *deadline* in user code and pass the appropriate timeout to
> each function as you call it, no? An implicit failure-inducing global
> like that just seems dangerous.

I think I am pretty square in the middle between you two. The interface work I've done so far seems to speak strongly in favor of per-object default timeouts, per-call-site explicit timeouts, and a single global *DEADLINE*. I tried an interface similar to the one proposed by Brian, but it turned out that writing a function exposing a timeout parameter to the caller while also respecting the global deadline got quite hairy very quickly, and there would have been a whole lot of possible bignum computations going on.
Here's a sketch of what I have in mind:

;;; Primary timeout user interface
(defmacro with-timeout (seconds &body forms)
  `(let* ((*timeout* ,seconds)
          (*deadline* (if *deadline*
                          (min (+ (now) *timeout*) *deadline*)
                          (+ (now) *timeout*))))
     ,@forms))

;;; Primary timeout function writer interface
(defun ensure-timeout (&optional default)
  (let ((default (if default (coerce default 'double-float) 0.0d0)))
    (if *deadline*
        (let ((timeout (- *deadline* (now))))
          (unless (plusp timeout)
            (with-simple-restart
                (continue "Extend the deadline by ~A seconds." *timeout*)
              (error 'deadline :seconds *timeout*))
            (setf timeout *timeout*
                  *deadline* (+ (now) timeout)))
          (if (= default 0.0d0)
              timeout
              (min timeout default)))
        default)))

;;; A random timeout-respecting function.
(defun foo (thing &optional (timeout (ensure-timeout (thing-timeout thing))))
  (loop
    (let ((result (foreign-foo timeout)))
      (if (= +foreign-timeout+ result)
          (with-simple-restart (continue "Continue for ~A seconds more." timeout)
            (error 'simple-timeout
                   :format-control "Timeout while doing FOO."
                   :seconds timeout))
          (return result)))))

This doesn't do nearly as much as Brian's proposal, but is simple to use for the common case. If you do have cases where you need to distinguish between different levels of timeout (time to die, time to abort the connection, etc.), then I suggest that you may be better off with timers. (Which would then fire in a thread of their own in multithreaded builds, and during safe-points in unithreaded builds.)

Cheers,

-- Nikodemus |
From: James Y K. <fo...@fu...> - 2007-03-28 15:09:48
|
On Mar 28, 2007, at 8:36 AM, Nikodemus Siivola wrote:
> I confess I haven't thought about how this ties in with SERVE-EVENT
> yet. My first thought is that streams have timeouts of their own,
> which would then cause the event-loop to signal timeout. ...but
> considering how it is a recursive event-loop that makes me feel quite
> ill at ease. Need to think and see what I can implement.

I just meant timers, handled by the event loop. Something like (add-timer-event function secs) => handle; (remove-timer-event handle). If one triggers, this counts as an event occurring. Timer support is pretty much standard for every other event loop implementation I know of. (I think various patches to implement this for serve-event have been proposed in the past, but I've not examined them in detail.)

I don't know what you mean by streams having timeouts that would cause serve-event to time out; that doesn't make any sense to me.

James |