
#101 Automatic NOTIFY uses stale handle

open
None
5
2008-11-12
2008-11-12
Pekka Pessi
No

The nua handle gets recycled and reference counting fails if a NOTIFY is generated by the stack after the handle is destroyed.

--

Due to a slightly non-standard SIP flow in an attended transfer implementation I encountered, I've found a way to crash the Sofia stack. It's a race condition, and is difficult to reproduce, but I can usually get it to happen after about an hour of execution on our test system here. I am willing to run some tests of fixes on our set-up here, if need be.

I have the transferer and transferee running on the same host, sharing the same Sofia nua stack instance. The transfer destination is running on another host. The transferee's handle for its dialog with the transferer is getting deallocated (ref count reaches zero) while it is still on the handle list. The deallocation memsets the entire handle to 0xaa. The next time nua_stack_timer runs, it calls nh_call_pending for the deallocated handle, which dereferences a pointer found inside the handle, and BOOM! I get a bus error due to an unaligned memory address access (it's trying to load a 32-bit word from 0xaaaaaaaa + 0x20).
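The address arithmetic behind that bus error can be checked with a short sketch. It assumes a 32-bit pointer field inside the handle and the 0x20 offset quoted above; the function names are made up for illustration and are not part of the nua API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value a 32-bit pointer field holds after the handle is filled with 0xaa. */
uint32_t poisoned_pointer(void)
{
    uint32_t p;
    memset(&p, 0xaa, sizeof p);
    return p;                    /* every byte is 0xaa, so byte order is irrelevant */
}

/* Address of a load at `offset` bytes past `base`. */
uint32_t load_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* A 32-bit word load is naturally aligned only on a multiple of 4. */
int is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}

/* Hypothetical self-check of the numbers quoted in the report. */
void check_reported_fault(void)
{
    uint32_t addr = load_address(poisoned_pointer(), 0x20u);
    assert(addr == 0xaaaaaacau);
    assert(!is_word_aligned(addr));   /* hence SIGBUS on strict-alignment CPUs */
}
```

On a CPU that tolerates unaligned loads the same bug would more likely show up as a segmentation fault or as silent corruption.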

Further investigation shows that the stack is getting into this situation as follows:

1. Transferer has sent INVITE to transfer destination and received a 100 Trying and 180 Ringing, but transfer destination hasn't answered the call.
2. Transferer sends REFER to transferee, immediately followed by a BYE (This is the non-"standard" part. The transferer should really be waiting to send the BYE when it gets the NOTIFY [200 OK]).
3. Transferee receives the REFER and sends an INVITE to the transfer destination (starts dialog D3).
4. Transferee receives BYE indications, etc., with the last being a "Terminated" from Sofia for D1, which the transferee responds to by calling nua_handle_destroy.
5. Transferee's nua "protocol thread" processes an r_destroy signal for D1's handle, thereby removing that handle from the handle list.
6. Transfer destination sends 100 Trying and 180 Ringing to transferee for D3.
7. Transferee's nua "protocol thread" processes an r_notify signal in nua_stack_signal. At line 549, in nua_stack.c, the handle is seen to not be on the handle list, and is added back onto the list.
8. 180 Ringing arrives for D3 from the transfer destination just as the transferee hangs up (calls nua_bye for) D3, because the automated test system ended the call.
9. Transferee receives a bunch of indications from nua for D3 (as a result of receiving the 487 Request Terminated), the last of which is a "Terminated" indication. The transferee responds to the "Terminated" indication by calling nua_handle_destroy for D3's handle.
10. Transferee's nua "protocol thread" processes an r_notify signal for D1's handle. The handle is unref'd, and the ref count reaches zero. The handle is deallocated, but it's still on the handle list.

Of course, the next time the timer expires and runs nua_stack_timer, the handle list is traversed, and my thread catches the SIGBUS to hell. ;)
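Steps 4, 5, 7 and 10 can be replayed on a toy model to show how a freed handle ends up on the list. This is only a sketch: the struct, the function names, and especially who owns which reference (one for the application, one for the implicit REFER subscription) are my assumptions for illustration, not taken from the nua source.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy handle. Field names are hypothetical, not the real nua layout. */
struct toy_handle {
    int  refs;     /* reference count */
    bool on_list;  /* linked into the stack's handle list */
    bool freed;    /* set where the real stack would poison and free */
};

/* Drop one reference. Note that on_list is not touched. */
void toy_unref(struct toy_handle *h)
{
    assert(!h->freed && h->refs > 0);
    if (--h->refs == 0)
        h->freed = true;
}

/* Steps 4-5: the application destroys the handle; r_destroy unlinks it. */
void toy_destroy(struct toy_handle *h)
{
    h->on_list = false;
    toy_unref(h);                /* the application's reference */
}

/* Step 7: a signal finds the handle unlinked and links it again,
 * without taking a reference on behalf of the list (assumed). */
void toy_signal_relink(struct toy_handle *h)
{
    if (!h->on_list)
        h->on_list = true;
}

/* What the timer would trip over: a freed handle still on the list. */
bool toy_is_dangling(const struct toy_handle *h)
{
    return h->freed && h->on_list;
}

/* Replays the sequence and reports whether a dangling entry remains. */
bool toy_replay(void)
{
    struct toy_handle d1 = { 2, true, false }; /* app + subscription (assumed) */
    toy_destroy(&d1);            /* refs 1, unlinked */
    toy_signal_relink(&d1);      /* refs 1, linked again */
    toy_unref(&d1);              /* step 10: refs 0, freed, still linked */
    return toy_is_dangling(&d1);
}
```

In this model toy_replay() returns true, which corresponds to the state the timer walks into.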

It seems like the NOTIFY code needs to remove the handle from the handle list when the subscription terminates, but I don't really know what I'm talking about here. Maybe there's some other reason why the handle needs to remain on the handle list, in which case, there must be a missing call to nua_handle_ref somewhere.
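The two possibilities mentioned above look roughly like this on a toy handle. Again a sketch under assumptions: the struct and all function names are hypothetical, and I don't know which of the two (if either) fits the real nua code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy handle. Field names are hypothetical, not the real nua layout. */
struct toy_handle {
    int  refs;     /* reference count */
    bool on_list;  /* linked into the stack's handle list */
    bool freed;    /* set where the real stack would poison and free */
};

/* Invariant both options try to keep: nothing freed stays on the list. */
bool toy_invariant_ok(const struct toy_handle *h)
{
    return !(h->freed && h->on_list);
}

/* Option A: dropping the last reference also unlinks the handle. */
void toy_unref_unlinking(struct toy_handle *h)
{
    assert(!h->freed && h->refs > 0);
    if (--h->refs == 0) {
        h->on_list = false;
        h->freed = true;
    }
}

/* Plain unref, used by option B. */
void toy_unref(struct toy_handle *h)
{
    assert(!h->freed && h->refs > 0);
    if (--h->refs == 0)
        h->freed = true;
}

/* Option B: the list owns a reference for as long as the handle is linked. */
void toy_link_with_ref(struct toy_handle *h)
{
    if (!h->on_list) {
        h->on_list = true;
        h->refs++;
    }
}

void toy_unlink_with_ref(struct toy_handle *h)
{
    if (h->on_list) {
        h->on_list = false;
        toy_unref(h);            /* release the list's reference last */
    }
}
```

With option B the handle outlives the terminated subscription until something unlinks it, so it only makes sense if there is a reason for the handle to stay on the list.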

I have a workaround that I'm testing right now. I've changed the transferee so that it waits until D3 has either become active or terminated before it calls nua_handle_destroy for the terminated dialog D1. This seems to be working, but I'll run it for a few days to be sure.
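In outline, the workaround is a small piece of application-side bookkeeping. The sketch below is not my actual code: the struct, the state names, and the callbacks are invented for illustration, and the destroy callback stands in for whatever wrapper ends up calling nua_handle_destroy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dialog_state { DIALOG_PENDING, DIALOG_ACTIVE, DIALOG_TERMINATED };

struct deferred_destroy {
    void *handle;                   /* D1's handle, opaque here; NULL once destroyed */
    bool  wanted;                   /* D1 has terminated and should be destroyed */
    enum dialog_state d3;           /* last known state of D3 */
    void (*destroy)(void *handle);  /* e.g. a wrapper that calls nua_handle_destroy */
};

/* D3 is settled once it is either active or terminated. */
static bool d3_settled(const struct deferred_destroy *dd)
{
    return dd->d3 == DIALOG_ACTIVE || dd->d3 == DIALOG_TERMINATED;
}

/* Destroy D1's handle at most once, and only after D3 has settled. */
static void maybe_destroy(struct deferred_destroy *dd)
{
    if (dd->wanted && dd->handle != NULL && d3_settled(dd)) {
        dd->destroy(dd->handle);
        dd->handle = NULL;
    }
}

/* Call on the "Terminated" indication for D1. */
void on_d1_terminated(struct deferred_destroy *dd)
{
    dd->wanted = true;
    maybe_destroy(dd);
}

/* Call whenever D3 changes state. */
void on_d3_state(struct deferred_destroy *dd, enum dialog_state state)
{
    dd->d3 = state;
    maybe_destroy(dd);
}

/* Stand-in destroy callback that only counts calls, for trying the logic out. */
int destroy_calls;
void counting_destroy(void *handle)
{
    (void)handle;
    destroy_calls++;
}
```

This only narrows the window from the application side; it does not fix the reference counting in the stack itself.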

Cheers.

--Jen
