#403 Access Violation on processing destroyed PTimer

Milestone: Development_Branch
Status: closed-fixed
Owner: nobody
Labels: PTLib (141)
Priority: 5
Updated: 2013-04-18
Created: 2012-12-25
Private: No

The housekeeper thread can process a PTimer that has already been destroyed:

1. Some object creates a PTimer (as a subobject) with state == Stopped.

2. Some thread starts the PTimer, which queues a START request.

3. Some thread begins destroying the PTimer while its state is still Stopped.
(case 1: the PTimer can be completely destroyed here, so the next steps result in an Access Violation)

4. The housekeeper detects the START request and adds the PTimer to the expired timers list.
(case 2: the PTimer can be completely destroyed here, so the next steps result in an Access Violation)

5. The housekeeper sets the PTimer state to Running.
(case 3: the PTimer can be completely destroyed here, so the next steps result in an Access Violation)

6. The housekeeper processes the PTimer.
(case 4: the PTimer can be completely destroyed here, while it is being processed, which results in an Access Violation)

All four cases result in an Access Violation.
Step 4 may also occur between steps 2 and 3.
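For illustration only, here is a minimal single-process sketch of one way to make a queued START request harmless once its timer is gone. It is not PTLib code and not the approach taken by any patch mentioned in this ticket. TimerList, Timer, Handle, Register, Unregister, QueueStart, Process and ReplayDestroyBeforeProcessing are hypothetical names. The assumption is that the housekeeper queue stores integer handles rather than raw pointers, and that the timer destructor removes its handle under the same mutex the housekeeper uses.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <unordered_map>

// Hypothetical model, not PTLib: requests refer to timers by handle,
// so a request for a destroyed timer can be detected and dropped.
enum class State { Stopped, Running };

class TimerList {
public:
    using Handle = unsigned;

    Handle Register() {
        std::lock_guard<std::mutex> lock(m_mutex);
        Handle h = ++m_lastHandle;
        m_states[h] = State::Stopped;
        return h;
    }

    // Called from the timer destructor. Once this returns, the
    // housekeeper can no longer reach the timer.
    void Unregister(Handle h) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_states.erase(h);
    }

    void QueueStart(Handle h) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_startRequests.push_back(h);
    }

    // One housekeeper pass. Returns how many timers were started.
    std::size_t Process() {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::size_t started = 0;
        while (!m_startRequests.empty()) {
            Handle h = m_startRequests.front();
            m_startRequests.pop_front();
            auto it = m_states.find(h);
            if (it == m_states.end())
                continue;  // timer was destroyed after queuing: drop request
            it->second = State::Running;
            ++started;
        }
        return started;
    }

    bool IsRunning(Handle h) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_states.find(h);
        return it != m_states.end() && it->second == State::Running;
    }

private:
    std::mutex m_mutex;
    Handle m_lastHandle = 0;
    std::unordered_map<Handle, State> m_states;
    std::deque<Handle> m_startRequests;
};

class Timer {
public:
    explicit Timer(TimerList& list) : m_list(list), m_handle(list.Register()) {}
    ~Timer() { m_list.Unregister(m_handle); }
    Timer(const Timer&) = delete;
    Timer& operator=(const Timer&) = delete;

    void Start() { m_list.QueueStart(m_handle); }
    TimerList::Handle GetHandle() const { return m_handle; }

private:
    TimerList& m_list;
    TimerList::Handle m_handle;
};

// Replays steps 1 to 4 of the report on a single thread.
std::size_t ReplayDestroyBeforeProcessing() {
    TimerList list;
    {
        Timer timer(list);  // step 1: created, state == Stopped
        timer.Start();      // step 2: START request queued
    }                       // step 3: destroyed with START still queued
    return list.Process();  // step 4: housekeeper finds no live timer
}
```

In this model ReplayDestroyBeforeProcessing() starts no timer, because the stale handle is skipped instead of being dereferenced. The sketch deliberately leaves out user callbacks: running them while the list mutex is held would keep destruction from overlapping processing (case 4), but it is also the kind of locking change that can introduce deadlocks in calling code.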

Discussion

  • Patch #3598455 fixes this.

     
  • Also, in the current timer implementation it is possible to queue a START request in PTimer::StartRunning but not queue a STOP request in PTimer::Stop:

    1. A timer is created with state == Stopped.
    2. Some thread calls PTimer::StartRunning, which queues a START request.
    3. Some thread calls PTimer::Stop, which does not queue a STOP request because the PTimer state is still Stopped.
    4. The housekeeper thread detects the START request and starts processing the timer, even though Stop was called after StartRunning.
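    The sequence above can be replayed with a small single-threaded model. This is a sketch under stated assumptions, not PTLib's implementation: ModelTimer, Request, Replay and the alwaysQueueStop flag are hypothetical, and only the method names StartRunning and Stop are taken from the ticket. With the flag off, Stop looks only at the current state and so overlooks the pending START; with it on, Stop always queues a STOP request.

```cpp
#include <cassert>
#include <deque>

// Hypothetical model of the request queue, not PTLib code.
enum class State { Stopped, Running };
enum class Request { Start, Stop };

struct ModelTimer {
    State state = State::Stopped;
    std::deque<Request> queue;  // requests waiting for the housekeeper
    bool alwaysQueueStop;       // false reproduces the reported behaviour

    explicit ModelTimer(bool alwaysQueueStop_)
        : alwaysQueueStop(alwaysQueueStop_) {}

    void StartRunning() { queue.push_back(Request::Start); }

    void Stop() {
        // Reported behaviour: only the current state is consulted, so a
        // START that is queued but not yet processed is not cancelled.
        if (alwaysQueueStop || state != State::Stopped)
            queue.push_back(Request::Stop);
    }

    // One housekeeper pass: apply the queued requests in order.
    void Process() {
        while (!queue.empty()) {
            Request r = queue.front();
            queue.pop_front();
            state = (r == Request::Start) ? State::Running : State::Stopped;
        }
    }
};

// Replays steps 1 to 4 and returns the final timer state.
State Replay(bool alwaysQueueStop) {
    ModelTimer timer(alwaysQueueStop);  // step 1: state == Stopped
    timer.StartRunning();               // step 2: START queued
    timer.Stop();                       // step 3: STOP queued only if flag set
    timer.Process();                    // step 4: housekeeper pass
    return timer.state;
}
```

    In this model Replay(false) ends with the timer Running although Stop was the last call made, while Replay(true) ends with it Stopped. Always queuing the request is only one possible remedy and is shown here purely to make the ordering problem visible.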

     
  • The previous comment also applies to PTimer::Pause/PTimer::Resume: from the Stopped state we can never pause a timer and never resume one.

     
  • I've reverted the timer changes to the implementation in rev. 27502, with a few modifications that (hopefully) fix the PTimer issues.

     
  • Here is an updated patch.

     
  • Updated patch (for rev. 28838)

     
  • Fixed by patch #3598686, in Eridani and later.

     
    • status: open --> closed-fixed
     
  • Backed out the patch after getting reports of deadlocks.

     
    • status: closed-fixed --> open
     
  • I can't be sure, but these deadlocks may be caused not by PTimer itself but by the way it is used, so they could appear with the current implementation too.

     
    • status: open --> pending
     
  • Unfortunately, it was pretty blatant. No deadlocks; added your patch, deadlocks; removed your patch, deadlocks gone. No other changes were made by anyone, and this was with production code that has been working for quite some time.

    While I agree that the problem is almost certainly outside the scope of the PTimer code, that is irrelevant. Some internal behaviour changed enough to cause a problem. I cannot put the patch in. We need to find another solution.

     
  • We have a new PTimer implementation (with tests for it); see patch #224.

     
    • status: pending --> closed-fixed
     
  • Fixed in patches #224 & #229