
#498 Maybe race condition: 200 OK for INVITE relayed after CANCEL

ver 1.3.x
open
modules (357)
3
2008-08-06
2008-08-03
No

I have an SBC that sends calls to Kamailio, whose registrar module resolves multiple contacts for the called AOR, causing two call branches to be generated and sent toward the contacts, A and B. A picks up first, then B, but both 200 OKs arrive back to back in the packet capture, before any other SIP message in either direction.

A and B pick up (200 OK) almost simultaneously. As a result, Kamailio appears to pass back the second 200 OK to the SBC along with the first, even after sending a CANCEL to the B contact and receiving a 200 OK for that dialog.

Here is the precise sequence of events:

1. SBC sends call to Kamailio proxy.

2. Proxy does registrar dip and resolves two contacts - A and B.

3. Proxy bifurcates the call into two branches 'branch A' (to A) and 'branch B' (to B). Rewrites RURI, relays INVITE.

4. A answers with 200 OK.

5. B answers with 200 OK.

6. Proxy passes back 200 OK to SBC for A. Then for B.

7. SBC issues in-dialog end-to-end ACK for that 200 OK; proxy decides to forward it only to A as per the ONREPLY-ROUTE. No replies are forwarded to B. It is here that I think things go wrong.

8. B keeps sending 200 OKs and getting no ACKs for them, and eventually gives up and kills the session.

So, it looks like not all replies are being statefully relayed to both branches.

Additionally, it looks like the following is happening:

- At step #6 above, the 200 OK passed to the SBC is for A only.

- The proxy elects to CANCEL the other branch to B between #6 and #7.

- After sending the CANCEL, the proxy decides to pass back the original 200 OK for the INVITE (with SDP) for B back to the SBC as well.

- After that, B replies with a 200 OK for the CANCEL issued by the proxy. Why does it reply with a 200 OK? Simply because it is after the INVITE was already OK'd? Is that per the RFC? I thought a call leg could not be CANCEL'd at this stage at all and requires a BYE?

- SBC ACKs the 200 OK (for INVITE) from A, and proxy relays to A.

- Meanwhile, B keeps sending 200 OKs for the INVITE (AFTER a CANCEL on that branch!) and the proxy keeps relaying them back to the SBC, which replies with ACKs. But these ACKs keep getting forwarded back to A, not B, presumably because from the proxy's POV the B leg is now CANCEL'd and OK'd (in the penultimate step).

And here is the time-indexed sequence of events from the packet capture:

- Packet 9, time index 7.953711: 200 OK arrives from A.
- Packet 10, time index 7.954636: 200 OK arrives from B.
- Packet 11, time index 7.969227: Proxy passes 200 OK from A back to SBC.
- Packet 12, time index 7.969268: Proxy originates CANCEL for branch B.
- Packet 13, time index 7.970279: Proxy passes 200 OK from B back to SBC.
- Packet 14, time index 7.971508: 200 OK for CANCEL request arrives from B. [1]
- Packet 15, time index 8.018730: SBC originates ACK for branch to A.
- Packet 16, time index 8.018895: Proxy passes ACK for branch A to A.
- Packet 17, time index 8.153957: B retransmits 200 OK for INVITE.
- Packet 18, time index 8.155309: Proxy forwards 200 OK from B to SBC.
- Packet 19, time index 8.155853: SBC sends ACK again to A's contact. This is really strange because the Contact address in Packet 18 is for B.
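
The ordering problem in this capture can be checked mechanically. The following is a minimal Python sketch, not part of the original report: it transcribes packets 9 through 19 as listed above and reports every 200 OK for the INVITE that the proxy relayed on a branch after it had already originated a CANCEL for that branch. The tuple layout, the message labels, and the helper name `relayed_after_cancel` are assumptions made for illustration.

```python
# Packets 9-19 transcribed from the capture listed above.
# Tuple layout (illustrative assumption):
#   (packet, time, sender, receiver, message, branch)
CAPTURE = [
    (9, 7.953711, "A", "proxy", "200 INVITE", "A"),
    (10, 7.954636, "B", "proxy", "200 INVITE", "B"),
    (11, 7.969227, "proxy", "SBC", "200 INVITE", "A"),
    (12, 7.969268, "proxy", "B", "CANCEL", "B"),
    (13, 7.970279, "proxy", "SBC", "200 INVITE", "B"),
    (14, 7.971508, "B", "proxy", "200 CANCEL", "B"),
    (15, 8.018730, "SBC", "proxy", "ACK", "A"),
    (16, 8.018895, "proxy", "A", "ACK", "A"),
    (17, 8.153957, "B", "proxy", "200 INVITE", "B"),
    (18, 8.155309, "proxy", "SBC", "200 INVITE", "B"),
    (19, 8.155853, "SBC", "proxy", "ACK", "A"),
]


def relayed_after_cancel(capture):
    """Return the packet numbers of 200 OKs (for INVITE) that the proxy
    relayed on a branch it had already sent a CANCEL for."""
    cancelled = set()
    flagged = []
    for packet, _time, sender, _receiver, message, branch in capture:
        if sender != "proxy":
            continue
        if message == "CANCEL":
            cancelled.add(branch)
        elif message == "200 INVITE" and branch in cancelled:
            flagged.append(packet)
    return flagged


if __name__ == "__main__":
    print(relayed_after_cancel(CAPTURE))
```

For this capture the sketch flags packets 13 and 18, the two relays of B's 200 OK that follow the CANCEL in packet 12.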

According to Juha Heinanen, this may be a race condition because:

"9 and 10 arrive to proxy very close to each other, which may result in a race condition bug causing proxy to send packet 13, which it should not do."

---

I am using OpenSER 1.3.2. The sender UA is a Nextone SBC, and the two registrants are both Asterisk 1.4.21.2. This problem occurs in the same way every single time, irrespective of any temporal variations. It may be possible to reproduce by concurrently registering two Asterisk instances against the Kamailio registrar for one AOR and sending a call to them; also, append_branches is turned on for the registrar.

Discussion

  • Daniel-Constantin Mierla

    • priority: 5 --> 3
    • assigned_to: nobody --> miconda
     
  • Daniel-Constantin Mierla

    Logged In: YES
    user_id=1246013
    Originator: NO

    It is a well-known race situation with 200 OK for INVITE and CANCEL. RFC 3261 is not clear about how to deal with it, and I don't know whether other documents have updated it since. I will search in the next few days (quite busy now) for discussions related to the same topic, so we can continue this thread.

    Lowering the priority now so that it is not counted among the release blockers.

     
  • Klaus Darilion

    Klaus Darilion - 2008-08-06

    Logged In: YES
    user_id=1318360
    Originator: NO

    I think Kamailio can do nothing in such a scenario. The CANCEL to B will be ignored as the call is already answered (the response code of the CANCEL reply is rather irrelevant).

    Only the caller can handle such situations. That means the SBC should detect that the second 200 OK has a different To tag. Thus, there are two dialogs created at the SBC: one with A and one with B. The SBC then has to decide what to do. Usually it should terminate the second dialog, i.e. send an ACK to both A and B, and then send a BYE to B.
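
The caller-side handling described here can be sketched roughly as follows. This is a minimal Python illustration, not SBC or Kamailio code: the class names `Response2xx` and `Caller`, the action tuples, and the example addresses are all hypothetical. It assumes a dialog is identified by Call-ID, From tag and To tag, and that each ACK or BYE is addressed to the Contact of the 2xx it belongs to.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Response2xx:
    call_id: str
    from_tag: str
    to_tag: str
    contact: str  # remote target taken from the Contact header of this 2xx


@dataclass
class Caller:
    kept: Optional[tuple] = None              # dialog the caller keeps
    known: set = field(default_factory=set)   # every dialog seen so far

    def on_2xx(self, resp):
        """Return the requests to send in reaction to one 2xx for the INVITE."""
        dialog_id = (resp.call_id, resp.from_tag, resp.to_tag)
        # Every 2xx, including retransmissions, is ACKed toward its own Contact.
        actions = [("ACK", resp.contact, resp.to_tag)]
        if self.kept is None:
            self.kept = dialog_id             # first answer wins
        elif dialog_id != self.kept and dialog_id not in self.known:
            # A second dialog created by forking: ACK it, then end it.
            actions.append(("BYE", resp.contact, resp.to_tag))
        self.known.add(dialog_id)
        return actions


if __name__ == "__main__":
    caller = Caller()
    ok_a = Response2xx("call-1", "sbc-tag", "tag-a", "sip:a@192.0.2.10")
    ok_b = Response2xx("call-1", "sbc-tag", "tag-b", "sip:b@192.0.2.20")
    for resp in (ok_a, ok_b, ok_b):
        print(caller.on_2xx(resp))
```

In this sketch a retransmitted 200 OK from B is ACKed again toward B's Contact rather than A's, and the BYE is sent only once.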

    IMO this can be closed.

     
  • Alex Balashov - Evariste System

    Logged In: YES
    user_id=2167036
    Originator: YES

    I must disagree here:

    - Packet 11, time index 7.969227: Proxy passes 200 OK from A back to SBC.
    - Packet 12, time index 7.969268: Proxy originates CANCEL for branch B.
    - Packet 13, time index 7.970279: Proxy passes 200 OK from B back to SBC.

    Kamailio should not be passing back that second 200 OK *after* it has CANCEL'd that same branch.

    I suspect the problem may be that the thread charged with passing back the 200 OKs from the branches back to the SBC and the thread issuing the CANCEL are different threads and unaware of what the other has done at this precise moment. I think that is the issue that needs to be fixed via some form of locking / state machining.

    Is this a naive, uninformed perspective?

     
  • Nobody/Anonymous

    Logged In: NO

    > Kamailio should not be passing back that second 200 OK *after* it has
    > CANCEL'd that same branch.

    That is wrong. The proxy MUST forward the 200 OK as the dialog is already established at B and a proxy can not terminate a dialog. The SBC MUST deal with this race condition.

    klaus

     
  • Nobody/Anonymous

    Logged In: NO

    I just wanted to verify whether the standard takes the same view:

    16.7 Response Processing (Proxy)

    10. Generate CANCELs

    If the forwarded response was a final response, the proxy MUST
    generate a CANCEL request for all pending client transactions
    associated with this response context. A proxy SHOULD also
    generate a CANCEL request for all pending client transactions
    associated with this response context when it receives a 6xx
    response. A pending client transaction is one that has
    received a provisional response, but no final response (it is
    in the proceeding state) and has not had an associated CANCEL
    generated for it. Generating CANCEL requests is described in
    Section 9.1.

    The requirement to CANCEL pending client transactions upon
    forwarding a final response does not guarantee that an endpoint
    will not receive multiple 200 (OK) responses to an INVITE. 200
    (OK) responses on more than one branch may be generated before
    the CANCEL requests can be sent and processed. Further, it is
    reasonable to expect that a future extension may override this
    requirement to issue CANCEL requests.
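
The definition of a "pending client transaction" in the quoted text can be expressed as a small predicate. The following Python sketch is only an illustration of that definition; the `ClientTransaction` class, its field names, and the branch states used in the example are assumptions and do not reflect how the tm module stores transactions.

```python
from dataclasses import dataclass


@dataclass
class ClientTransaction:
    branch: str
    got_provisional: bool = False
    got_final: bool = False
    cancel_sent: bool = False


def is_pending(tx):
    """Pending as defined in the quoted step 10: a provisional response was
    received, no final response yet, and no CANCEL generated so far."""
    return tx.got_provisional and not tx.got_final and not tx.cancel_sent


def branches_to_cancel(transactions):
    """Branches that need a CANCEL once a final response has been forwarded."""
    return [tx.branch for tx in transactions if is_pending(tx)]


if __name__ == "__main__":
    a = ClientTransaction("A", got_provisional=True, got_final=True)
    b_ringing = ClientTransaction("B", got_provisional=True)
    b_answered = ClientTransaction("B", got_provisional=True, got_final=True)
    print(branches_to_cancel([a, b_ringing]))
    print(branches_to_cancel([a, b_answered]))
```

Under this reading, a branch whose 200 OK has already arrived is no longer pending, so no CANCEL would be generated for it.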

    IMO this is not 100% spelled out, but it says the "endpoint" receives multiple 200 OKs. Further, since a proxy by definition cannot send a BYE, from a logical point of view it has to forward the second 200 OK to the caller, and the caller has to take care of terminating the second dialog.

    Maybe the dialog module could be used to fake a BYE in this scenario as well, but I strongly suggest fixing the problem where it should be fixed -> in the SBC - although this is probably the most complicated option ;-)

    klaus

     
  • Anatoly Pidruchny

    Logged In: YES
    user_id=1759384
    Originator: NO

    Please excuse my barging in, but I just want to support Alex in the opinion that it is OpenSER that has to take care of this. It is OpenSER, not the SBC, that forked the call into two branches. The SBC has no idea that the call was forked. IMO, when OpenSER receives the second 200 OK from branch B, it should not relay it to the SBC; it should send an ACK immediately followed by a BYE to B, and later silently drop the 200 OK for that BYE. The RFC is not clear as to what to do in this case. And where is the definition of the proxy saying that it cannot send a BYE?

    /Anatoly.

     
  • Alex Balashov - Evariste System

    Logged In: YES
    user_id=2167036
    Originator: YES

    Thank you, Anatoly.

    While I agree that the SBC should handle this situation, the fact is, OpenSER receives OK from branch A, receives OK from branch B, sends OK from branch A, sends a CANCEL to branch B, then relays the OK from branch B. In that order. Perhaps I am not understanding something about the specified behaviour of a proxy, but I think that if it has taken it upon itself to branch the call, it should be responsible for managing the outcome.

    I understand that in principle, proxies are meant to properly relay whatever they receive. But in this case, branching is performed by the proxy, and the SBC is not aware of it. Furthermore, it is obvious that the proxy is empowered to terminate one of the branches in response to a certain set of conditions, so given that the situation above plays out in the temporal order that it does, why would we say that it has no power to intervene? The OKs are in-branch responses, not responses just to the original dialog initiated by the SBC.

    It seems to me that the problem here is that one hand of OpenSER is not aware of what the other hand is doing, or it would not pass a 200 OK that it has *previously* received *after* terminating a branch.

     
  • Alex Hermann

    Alex Hermann - 2008-08-07

    Logged In: YES
    user_id=1212856
    Originator: NO

    Please quote the parts of the relevant RFCs that support your statements.

    The INVITE/CANCEL/200 OK race condition is well known, and the way to handle it is to acknowledge the dialog with an ACK and terminate any unwanted dialog with a BYE.

    https://lists.cs.columbia.edu/pipermail/sip-implementors/2004-March/006217.html

    > It seems to me that the problem here is that one hand of OpenSER is not
    > aware of what the other hand is doing, or it would not pass a 200 OK that
    > it has *previously* received *after* terminating a branch.

    You do mean the 200 OK for the INVITE, do you? If you mean the 200 OK for the CANCEL, then you have a point: the proxy should absorb them.

    The 200 OK for the INVITE MUST be passed to the UAC. The UAC should end every dialog it is not interested in by sending a BYE. If your UAC fails to do so, the UAC is broken, not the proxy.

    RFC 3261 page 110, paragraph 2:
    After a final response has been sent on the server transaction,
    the following responses MUST be forwarded immediately:

    - Any 2xx response to an INVITE request

    RFC 3261 section 13.2.2.4:
    Multiple 2xx responses may arrive at the UAC for a single INVITE
    request due to a forking proxy.
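
The forwarding rule quoted above can be written as a single decision function. This is a hedged Python sketch: the function name and parameters are hypothetical, `method` stands for the method of the request the response answers, and the branch for the period before a final response has been sent follows the same RFC 3261 section 16.7 step rather than text quoted in this thread.

```python
def forward_immediately(status, method, final_already_sent):
    """Decide whether a stateful proxy forwards a response right away."""
    is_2xx = 200 <= status < 300
    if final_already_sent:
        # After a final response went upstream, only 2xx responses to an
        # INVITE are still forwarded immediately.
        return is_2xx and method == "INVITE"
    # Before that: any provisional response other than 100 Trying, and any 2xx.
    is_provisional = 100 < status < 200
    return is_2xx or is_provisional


if __name__ == "__main__":
    # B's 200 OK arriving after A's 200 OK was already relayed to the SBC:
    print(forward_immediately(200, "INVITE", final_already_sent=True))
    # A late 486 from another branch is not forwarded:
    print(forward_immediately(486, "INVITE", final_already_sent=True))
```

Read this way, relaying B's 200 OK for the INVITE does not depend on whether a CANCEL was sent on that branch in the meantime.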

     
  • Nobody/Anonymous

    Logged In: NO

    Hi Alex (axlh)!

    Thanks for the pointers to the relevant paragraphs in the RFC.

    As it is clear now that this is not a bug, we should either close it or move it to the feature request tracker (as Kamailio with the dialog module can already be used as some kind of mixture between proxy and B2BUA).

    regards
    klaus

     
  • Alex Balashov - Evariste System

    Logged In: YES
    user_id=2167036
    Originator: YES

    > You do mean the 200 OK for the INVITE, do you?

    Yes, I do.

    > The 200 OK for the INVITE MUST be passed to the UAC. The UAC should end
    > every dialog it is not interested in by sending a BYE. If your UAC fails to
    > do so, the UAC is broken, not the proxy.

    How is the UAC supposed to know that, when, from its point of view, there is only one call branch?

    > RFC 3261 section 13.2.2.4:
    > Multiple 2xx responses may arrive at the UAC for a single INVITE
    > request due to a forking proxy.

    Well, I guess that pretty much settles it.

    My issue was with the idea that the proxy relays a 200 OK for an INVITE *after* it has CANCEL'd the branch to which that OK corresponds. If it relayed it before, that is understandable, but that is not what is going on.

     
  • Daniel-Constantin Mierla

    Logged In: YES
    user_id=1246013
    Originator: NO

    I think we should try to fix this specific case (do not send a CANCEL if the transaction has already been answered) after the release, and then backport the fix. But the race still remains at the SIP level, and the UAC shall be able to discard a CANCEL if it has already answered the call. Think about:

    A ======== P1 ========= B

    A, B are UAC
    P1 is stateful proxy

    The 200 OK is issued by B at more or less the same time as P1 sends the CANCEL. B must be able to handle this situation, either by replying negatively to the CANCEL or, maybe, by sending a 200 OK for the CANCEL followed by a BYE. I haven't followed the RFCs for the UAC side closely enough lately to know whether there is now a recommendation for solving the race.
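
For reference, RFC 3261 section 9.2 appears to cover how the party receiving the CANCEL behaves in this race. The following Python sketch is one reading of that section, with a hypothetical function name and return convention; it is not taken from this thread or from any implementation.

```python
def handle_cancel(invite_tx_exists, final_response_sent):
    """Return (status sent for the CANCEL, status sent for the INVITE or None).

    Sketch of RFC 3261 section 9.2 as understood here:
    - no matching INVITE transaction: answer the CANCEL with 481
    - INVITE already answered with a final response: the CANCEL has no
      effect on the INVITE, but is itself answered with 200
    - INVITE not yet answered: 200 for the CANCEL, 487 for the INVITE
    """
    if not invite_tx_exists:
        return (481, None)
    if final_response_sent:
        return (200, None)
    return (200, 487)


if __name__ == "__main__":
    # B had already sent its 200 OK for the INVITE when the CANCEL arrived.
    print(handle_cancel(invite_tx_exists=True, final_response_sent=True))
```

Under this reading, a 200 OK for the CANCEL only acknowledges the CANCEL request itself; a call that has already been answered still has to be ended with a BYE.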

    As a stateful proxy we cannot send a BYE; perhaps in the future by using the dialog module, as Klaus said.

     
