From: Tobias C. R. <tc...@fr...> - 2010-03-05 09:18:39
|
Say a deadline handler is executing (via HANDLER-BIND) within CONDITION-WAIT waiting on a mutex M and a condition-variable C. Should it be allowed that you can wake up C from within that deadline handler? In code: (let ((mutex (sb-thread:make-mutex)) (cvar (sb-thread:make-waitqueue))) (handler-bind ((sb-sys:deadline-timeout #'(lambda (c) (sb-thread:with-recursive-lock (mutex) (sb-thread:condition-broadcast cvar)) (sb-sys:defer-deadline 10.0 c)))) (sb-sys:with-deadline (:seconds 0.1) (sb-thread:with-mutex (mutex) (sb-thread:condition-wait cvar mutex))))) Should it return, or hang? At the moment that happens to work (i.e. return) on Linux with SBCL's Futex based implementation. There's still a lost-wakeup bug in that implementation and fixing that bug in the trivial wait would disallow the above from working which is how I became aware of the issue. I talked about this with Gabor at Vienna, and he said he'd be reluctant to make that official behaviour because it might tie hands in future, and he wasn't sure whether one could get such a behaviour on non-Futex based implementations. Now because there seems to be people who want to revisit the non-Futex based implementation for adding deadlines, can you please research whether the above could be made work? Background: I was about to use deadline handlers to periodically check for messages and react upon them, propagate messages, change state etc. So I'd really like to do in a deadline handler pretty much anything I can do elsewhere. I mean I cannot know whether I'm right now waiting within a CONDITION-WAIT for C when I just happen to have to wake up all waiters on C due to a message I received. If this won't become official behaviour, I have to try to accomodate my midlevel synchronization tools to work around the issue. Hopefully there will be a way. What things you know are _not_ safe to do in deadline handlers? -T. |
From: Nikodemus S. <nik...@ra...> - 2010-03-05 12:55:17
|
On 5 March 2010 11:18, Tobias C. Rittweiler <tc...@fr...> wrote: > Say a deadline handler is executing (via HANDLER-BIND) within > CONDITION-WAIT waiting on a mutex M and a condition-variable C. > > Should it be allowed that you can wake up C from within that deadline > handler? I would note that the case as posted is broken: spurious wakeups are allowed, so theoretically it could return immediately, without even visiting the deadline handler. Condition variables are meaningful only when they mediate waits for data of some kind. Restating the case like this seems more useful to me. (let ((mutex (sb-thread:make-mutex)) (cvar (sb-thread:make-waitqueue)) (data nil)) ;; This thread will produce value for us. (run-writer-thread mutex cvar (lambda (x) (setf data x))) (handler-bind ((sb-sys:deadline-timeout #'(lambda (c) (sb-thread:with-recursive-lock (mutex) (setf data (append data (list :deadline))) (sb-thread:condition-broadcast cvar)) (sb-sys:defer-deadline 10.0 c)))) (sb-sys:with-deadline (:seconds 0.1) (let (got) (sb-thread:with-mutex (mutex) (loop until data do (sb-thread:condition-wait cvar mutex) (setf got (pop data))) got))) I suspect the right thing is to consider deadlines during condition-wait as wakeups. If there is no data to be had, a correctly written CONDITION-WAIT loop will re-enter the wait, and the deadline will be signalled by DECODE-TIMEOUT. The handler will defer the deadline, which will cause another extension -- etc. I _think_ this should be doable in most imaginable condition-wait implementations. > At the moment that happens to work (i.e. return) on Linux with SBCL's > Futex based implementation. There's still a lost-wakeup bug in that > implementation and fixing that bug in the trivial wait would disallow I need more coffeee. What's the lost-wakeup bug? Cheers, -- Nikodemus |
From: Tobias C. R. <tc...@fr...> - 2010-03-05 14:13:36
|
Nikodemus Siivola <nik...@ra...> writes: > On 5 March 2010 11:18, Tobias C. Rittweiler <tc...@fr...> wrote: > > > Say a deadline handler is executing (via HANDLER-BIND) within > > CONDITION-WAIT waiting on a mutex M and a condition-variable C. > > > > Should it be allowed that you can wake up C from within that deadline > > handler? > > I would note that the case as posted is broken: spurious wakeups are > allowed, so theoretically it could return immediately, without even > visiting the deadline handler. Condition variables are meaningful only > when they mediate waits for data of some kind. Of course. > I suspect the right thing is to consider deadlines during > condition-wait as wakeups. Happy to hear that! > > At the moment that happens to work (i.e. return) on Linux with SBCL's > > Futex based implementation. There's still a lost-wakeup bug in that > > implementation and fixing that bug in the trivial wait would disallow > > I need more coffeee. What's the lost-wakeup bug? See Feb 05, and Feb 13, subject contains "lost wakeup". The bit about stack-allocation in the posting on Feb 13 was already dismissed as wrong by Gabor in Vienna. -T. |
From: Nikodemus S. <nik...@ra...> - 2010-03-05 14:51:32
Attachments:
0001-CONDITION-WAIT-fixes.patch
|
On 5 March 2010 16:13, Tobias C. Rittweiler <tc...@fr...> wrote: > See Feb 05, and Feb 13, subject contains "lost wakeup". > > The bit about stack-allocation in the posting on Feb 13 was already > dismissed as wrong by Gabor in Vienna. The fix Gabor posted is IMO obviously correct, and doing that should not interfere with treating futex-wait timeouting as a spurious wakeup. Tentative patch attached -- untested, but unless I'm missing something this should both take care of the lost wakeup and allow deadline handlers to do their stuff. Cheers, -- Nikodemus |
From: Nikodemus S. <nik...@ra...> - 2010-03-12 12:29:51
|
2010/3/12 Gábor Melis <me...@re...>: >>> Addresses... maybe. We have THREAD-OS-THREAD, which should (I think) >>> be aligned, so we could use: >>> >>> (logior os-thread-address 0) ; wait >>> (logior os-thread-address 1) ; notify >>> >>> That should be enough. Can someone check my thinking? > > os_thread_t is pthread_t, on Linux it happens to be an address, but in > general it may be anything. Furthermore, pthread ids may be reused > (and on Linux they are, and pretty quickly IIRC) which may be a > problem. Hm, but in principle a per-thread ID + 1 bit for wait/notify is enough, right? ...but recycling, ouch. It seems to me that would be a problem with any ID but a freshly consed one. :/ Cheers, -- Nikodemus |
From: Nikodemus S. <nik...@ra...> - 2010-03-15 18:11:37
Attachments:
0001-lost-wakeups-in-CONDITION-WAIT.patch
|
I'm slowly despairing, but I would offer the attached patch as at least-as-correct as the current code. Cheers, -- Nikodemus |
From: Tobias C. R. <tc...@fr...> - 2010-03-19 16:10:51
|
Nikodemus Siivola <nik...@ra...> writes: > I'm slowly despairing, but I would offer the attached patch as at > least-as-correct as the current code. Here's another scenario: - Thread A calls CONDITION-WAIT on waitqueue CVAR w/ mutex MTX - CVAR's token is set to A. - Thread A wakes up from FUTEX-WAIT with ETIMEOUT or EINTR. - Thread A executes cleanup form (ALLOW-WITH-INTERRUPTS (GET-MUTEX)) - MTX is contended, so a deadline is signaled in (GET-MUTEX) - Thread A is preempted; - Thread B is being executed. - We assume it was Thread B holding on MTX. - Thread B invokes CONDITION-NOTIFY on CVAR. - CVAR's token is set to B. - Thread B releases MTX. - We get back to Thread A - Inside the now executing deadline handler in Thread A, we call CONDITION-WAIT _again_ on CVAR & MTX. - That CONDITION-WAIT will set CVAR's token to A again. Here it results that the CONDITION-NOTIFY performed by thread B will neither affect the CONDITION-WAIT in the deadline handler, nor the original CONDITION-WAIT of Thread A. I.e. it is lost. -T. |
From: Tobias C. R. <tc...@fr...> - 2010-03-23 21:39:44
|
Nikodemus Siivola <nik...@ra...> writes: > I'm slowly despairing, but I would offer the attached patch as at > least-as-correct as the current code. Here's a test case that is very likely to hang on current SBCL but that will not hang with your patch applied. Hopefully with that test case, the fix can finally go in after the freeze. -T. ;;;; Explanation: ;;; ;;; We have three threads A, B and C where A,B will be waiting on ;;; CVAR, and C will wake them up in a way that will make B lose the ;;; wakeup. ;;; ;;; - A,B,C synchronize with each other. In particular, C waits ;;; until A and B are in the following states: ;;; ;;; A is in a deadline handler due to the DECODE-TIMEOUT in ;;; the beginning of CONDITION-WAIT's loop. As a result, it's ;;; A that holds on the mutex. ;;; ;;; B is in a deadline handler due to the DECODE-TIMEOUT in ;;; the GET-MUTEX in the UWP cleanup clause of CONDITION-WAIT. ;;; ;;; - A,B,C wake up from mutual waiting: ;;; ;;; C will sleep now a bit so A can continue from the deadline ;;; to release the mutex. ;;; ;;; A does so. It's now in FUTEX-WAIT. ;;; ;;; B will sleep now, too, considerably longer than C. It'll ;;; sleep over C's next step: ;;; ;;; - C grabs the mutex, and calls CONDITION-BROADCAST. It sets ;;; CVAR's token to the wakeup token. ;;; ;;; A will wake up, and finish. ;;; ;;; B is still sleeping. ;;; ;;; - C calls CONDITION-WAIT. Consequently, it sets CVAR's token ;;; back to the wait token. The call to CONDITION-WAIT is timed, ;;; (the purpose is just to set the token), so C will finish ;;; shortly after. ;;; ;;; - B awakes from the sleep. Continues from the deadline handler ;;; back into CONDITION-WAIT -- remember we were in the UWP ;;; cleanup clause -- where we'll enter a new iteration of the ;;; loop which means we'll come FUTEX-WAIT which compares with the ;;; token and will now see the wait token ;;; ;;; In case the wait token is shared between all CONDITION-WAIT ;;; calls, it follows that the wakeup is lost, and B will wait ;;; forever. ;;; (defun lost-wakeup-bug () (let ((AB-ready (sb-thread:make-semaphore)) (C-ready (sb-thread:make-semaphore))) (macrolet ((wait-for-C () `(progn (sb-thread:signal-semaphore AB-ready) (sb-thread:wait-on-semaphore C-ready))) (wait-for-AB () `(progn (sb-thread:wait-on-semaphore AB-ready) (sb-thread:wait-on-semaphore AB-ready) (sb-thread:signal-semaphore C-ready 2))) (spawn (name &body body) `(sb-thread:make-thread #'(lambda () ,@body) :name (string ',name)))) (let ((mutex (sb-thread:make-mutex)) (cvar (sb-thread:make-waitqueue))) (mapc #'sb-thread:join-thread (list (spawn A (handler-bind ((sb-sys:deadline-timeout #'(lambda (c) ;; A will be holding MUTEX here. (wait-for-C) ;; Now get back into ;; CONDITION-WAIT immediately to ;; release MUTEX. (sb-sys:defer-deadline 10.0 c)))) (sb-sys:with-deadline (:seconds 0.1) (sb-thread:with-mutex (mutex) (sb-thread:condition-wait cvar mutex))))) (spawn B (handler-bind ((sb-sys:deadline-timeout #'(lambda (c) ;; B won't hold MUTEX; we'll come ;; here from the UWP cleanup ;; clause in CONDITION-WAIT. (wait-for-C) ;; Wait longer than C, i.e. sleep ;; over its notification. If ;; we've erroneously losed that ;; wakeup, B will hang. (sleep 0.5) (sb-sys:defer-deadline 10.0 c)))) (sb-sys:with-deadline (:seconds 0.2) (sb-thread:with-mutex (mutex) (sb-thread:condition-wait cvar mutex))))) (spawn C (wait-for-AB) ;; After all are ready, sleep a bit to let A release ;; the mutex so we can get it. (sleep 0.2) (sb-thread:with-mutex (mutex) ;; Set the wakeup token. (sb-thread:condition-broadcast cvar) ;; Set the wait token. (handler-case (sb-sys:with-deadline (:seconds 0.1) (sb-thread:condition-wait cvar mutex)) (sb-sys:deadline-timeout ())))))) :no-hang)))) |
From: Tobias C. R. <tc...@fr...> - 2010-03-05 15:16:40
|
Nikodemus Siivola <nik...@ra...> writes: > On 5 March 2010 16:13, Tobias C. Rittweiler <tc...@fr...> wrote: > >> See Feb 05, and Feb 13, subject contains "lost wakeup". >> >> The bit about stack-allocation in the posting on Feb 13 was already >> dismissed as wrong by Gabor in Vienna. > > The fix Gabor posted is IMO obviously correct, and doing that should > not interfere with treating futex-wait timeouting as a spurious > wakeup. > > Tentative patch attached -- untested, but unless I'm missing something > this should both take care of the lost wakeup and allow deadline > handlers to do their stuff. No, it'll make my/your test case fail. It'll also make a test-case hang in the test suite I recently introduced. Here is the code again (let ((mutex (sb-thread:make-mutex)) (cvar (sb-thread:make-waitqueue))) (handler-bind ((sb-sys:deadline-timeout #'(lambda (c) (sb-thread:with-recursive-lock (mutex) (sb-thread:condition-broadcast cvar)) (sb-sys:defer-deadline 10.0 c)))) (sb-sys:with-deadline (:seconds 0.1) (sb-thread:with-mutex (mutex) (sb-thread:condition-wait cvar mutex))))) With your patch, CONDITION-WAIT will set WAITQUEUE-DATA of CVAR to thread X. We go into deadline handler, and execute CONDITION-BROADCAST which will set WAITQUEUE-DATA of CVAR to the _same_ value, thread X. We return from the deadline handler back into CONDITION-WAIT where the consequent FUTEX-WAIT will not see any update because WAITQUEUE-DATA of CVAR is still pointing to thread X. For that to work, either CONDITION-NOTIFY or CONDITION-WAIT has to set WAITQUEUE-DATA to a guaranteed unique datum. As there will probably be more threads executing CONDITION-WAIT, I suggest to cons up that unique datum in CONDITION-NOTIFY as a micro-optimization. -T. |
From: Nikodemus S. <nik...@ra...> - 2010-03-05 15:36:42
|
On 5 March 2010 17:16, Tobias C. Rittweiler <tc...@fr...> wrote: > which will set WAITQUEUE-DATA of CVAR to the _same_ value, thread X. We Right, doh. > For that to work, either CONDITION-NOTIFY or CONDITION-WAIT has to set > WAITQUEUE-DATA to a guaranteed unique datum. As there will probably be > more threads executing CONDITION-WAIT, I suggest to cons up that unique > datum in CONDITION-NOTIFY as a micro-optimization. Yuck, but yeah. (Difference between zero and some consing is big.) Anyone up to proving that a word-sized modular counter is guaranteed to be enough? (Or proving that it isn't, for that matter...) Cheers, -- Nikodemus |
From: <dhe...@te...> - 2010-03-11 17:56:57
|
>> For that to work, either CONDITION-NOTIFY or CONDITION-WAIT has to set >> WAITQUEUE-DATA to a guaranteed unique datum. As there will probably be >> more threads executing CONDITION-WAIT, I suggest to cons up that unique >> datum in CONDITION-NOTIFY as a micro-optimization. > > Anyone up to proving that a word-sized modular counter is guaranteed > to be enough? (Or proving that it isn't, for that matter...) Counters can always wrap, and they cost an atomic increment. Is there a memory address you could use instead? - Daniel |
From: Nikodemus S. <nik...@ra...> - 2010-03-12 09:51:53
|
On 11 March 2010 19:56, <dhe...@te...> wrote: > Is there a memory address you could use instead? Addresses... maybe. We have THREAD-OS-THREAD, which should (I think) be aligned, so we could use: (logior os-thread-address 0) ; wait (logior os-thread-address 1) ; notify That should be enough. Can someone check my thinking? Cheers, -- Nikodemus |
From: Nikodemus S. <nik...@ra...> - 2010-03-12 10:13:32
Attachments:
0001-condition-variable-hacking.patch
|
On 12 March 2010 11:51, Nikodemus Siivola <nik...@ra...> wrote: > On 11 March 2010 19:56, <dhe...@te...> wrote: > >> Is there a memory address you could use instead? > > Addresses... maybe. We have THREAD-OS-THREAD, which should (I think) > be aligned, so we could use: > > (logior os-thread-address 0) ; wait > (logior os-thread-address 1) ; notify > > That should be enough. Can someone check my thinking? Attached a quick, untested patch. Cheers, -- Nikodemus |
From: Gábor M. <me...@re...> - 2010-03-12 12:45:55
|
On Fri, Mar 12, 2010 at 11:13 AM, Nikodemus Siivola <nik...@ra...> wrote: > On 12 March 2010 11:51, Nikodemus Siivola <nik...@ra...> wrote: >> On 11 March 2010 19:56, <dhe...@te...> wrote: >> >>> Is there a memory address you could use instead? >> >> Addresses... maybe. We have THREAD-OS-THREAD, which should (I think) >> be aligned, so we could use: >> >> (logior os-thread-address 0) ; wait >> (logior os-thread-address 1) ; notify >> >> That should be enough. Can someone check my thinking? os_thread_t is pthread_t, on Linux it happens to be an address, but in general it may be anything. Furthermore, pthread ids may be reused (and on Linux they are, and pretty quickly IIRC) which may be a problem. > > Attached a quick, untested patch. > > Cheers, > > -- Nikodemus > > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > Sbcl-devel mailing list > Sbc...@li... > https://lists.sourceforge.net/lists/listinfo/sbcl-devel > > |