
#466 Problem handling 300 reply with multiple choices

open
None
5
2008-08-06
2008-06-03
No

We are running an instance of OpenSER 1.3.0 in our network and at times are seeing weird things happen to the destination sets of calls.
I have not been able to reproduce this problem in a lab. However, it happens often enough in our network to cause concern.

Since I am not able to provide a specific call scenario, I’ll do my best to document the main scenarios where I’ve seen this happen.
For all scenarios we set the following two TM timers (in seconds):
fr_timer = 2
fr_inv_timer = 120
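The following simplified Python model makes the two timers concrete by showing how a single INVITE branch ends under these settings. It rests on an assumption that has not been checked against the OpenSER 1.3 source: fr_timer bounds the wait for the first reply of any kind, and once a provisional arrives the branch instead waits up to fr_inv_timer for a final reply. The function name and outcome strings are hypothetical.

```python
from typing import Optional

FR_TIMER = 2.0        # seconds to wait for any reply on a branch (fr_timer)
FR_INV_TIMER = 120.0  # seconds to wait for a final reply after a provisional (fr_inv_timer)


def branch_outcome(provisional_at: Optional[float], final_at: Optional[float]) -> str:
    """Simplified (assumed) model of how one INVITE branch terminates.

    Times are seconds since the INVITE was sent on this branch;
    None means that reply never arrived.
    """
    heard_in_time = provisional_at is not None and provisional_at <= FR_TIMER
    if not heard_in_time:
        # No provisional within fr_timer: only a final reply that beats the
        # timer avoids the locally generated 408.
        if final_at is not None and final_at <= FR_TIMER:
            return "final reply"
        return "local 408 (fr_timer)"
    # A provisional arrived in time, so the model restarts the wait with fr_inv_timer.
    if final_at is not None and final_at <= provisional_at + FR_INV_TIMER:
        return "final reply"
    return "local 408 (fr_inv_timer)"


print(branch_outcome(None, None))  # silent next hop, as in both scenarios below
```

In both scenarios below, a silent next hop means the branch ends with the locally generated 408 after 2 seconds, and that 408 is what drives the failure routes.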

Scenario 1:
OpenSER receives an INVITE and pushes it to a redirect server. The redirect server returns a 300 Multiple Choices with a single Contact header containing anywhere from 1 to 7 contacts, weighted by q value. Each contact’s q value is unique and lies between 0 and 1.0, down to the thousandths place.
Example:

SIP/2.0 300 Multiple choices
Via: SIP/2.0/UDP 1.1.1.1;branch=z9hG4bK1a1c.279e4a91.0,SIP/2.0/UDP 2.2.2.2:5060;branch=z9hG4bK62810cb4;rport=5060
From: "DOC" <sip:+18668123456@3.3.3.3>;tag=as7e6accf1
To: <sip:+18885551212@1.1.1.1>
Call-ID: 6de77314108c97eb78835cea7b453597@3.3.3.3
CSeq: 102 INVITE
Contact: <sip:+18885551212@1.1.1.2:5060;transport=udp>;q=0.972;rate=0.0087,<sip:+18885551212@1.1.1.3:5060;transport=udp>;q=0.932;rate=0.0214,<sip:+18885551212@1.1.1.4:5060;transport=udp>;q=0.931;rate=0.0217,<sip:+18885551212@1.1.1.5:5060;transport=udp>;q=0.928;rate=0.0227,<sip:+18885551212@1.1.1.6:5060;transport=udp>;q=0.01;rate=0.0087,<sip:+18885551212@1.1.1.7:5060;transport=udp>;q=0.0090;rate=0.0227
Content-Length: 0
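The analysis depends on the q values, so a small helper like the one below can list the contacts and q values from a captured 300. It also flags any q value that falls outside the RFC 3261 qvalue grammar, which allows at most three decimal places. This is a hedged Python sketch: parse_contacts and its simplified regular expressions are hypothetical and are not part of OpenSER. Run on the Contact value above, it flags q=0.0090 on the last contact. Whether OpenSER 1.3 rejects, truncates, or accepts such a value is not established here.

```python
import re
from typing import List, Optional, Tuple

# RFC 3261 qvalue grammar: "0" [ "." 0*3DIGIT ] / "1" [ "." 0*3("0") ]
QVALUE = re.compile(r"^(0(\.\d{0,3})?|1(\.0{0,3})?)$")
# Simplified: one <URI> followed by its ;params, up to the next comma.
# Does not handle display names or quoted strings containing commas.
CONTACT = re.compile(r"<([^>]+)>((?:;[^,<]*)?)")


def parse_contacts(value: str) -> List[Tuple[str, Optional[str], bool]]:
    """Return (uri, raw q string or None, q matches RFC 3261 grammar) per contact."""
    contacts = []
    for uri, params in CONTACT.findall(value):
        q = None
        for param in params.split(";"):
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = val.strip()
        contacts.append((uri, q, q is not None and QVALUE.match(q) is not None))
    return contacts


print(parse_contacts("<sip:a@1.1.1.2>;q=0.972,<sip:b@1.1.1.7>;q=0.0090"))
```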

We catch this 300 in a failure route with the following code:
if (t_check_status("300") || t_check_status("302"))
{
    get_redirects("*");
    serialize_branches(0);
    next_branches();
    t_on_branch("1");
    t_on_failure("2");
    t_relay();
    return;
}
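The following Python model shows how the serialized destination set is meant to be consumed. As I read the OpenSER documentation, serialize_branches() orders the branches by q value and keeps them in AVPs, and each next_branches() call loads the highest remaining q class into the destination set. The sketch below assumes that behaviour. The class and method names mirror the script functions but are hypothetical stand-ins, not OpenSER code. In this model, the example 300 (all q values distinct) yields six serial steps with one contact each. A parallel fork to several contacts would therefore need either shared q values or a destination set that already held other entries.

```python
from itertools import groupby
from typing import List, Tuple


def serialize(contacts: List[Tuple[str, float]]) -> List[List[str]]:
    """Group contacts by q value, highest q first.

    Each group is one serial step; contacts sharing a q value end up in the
    same group and would be forked in parallel.
    """
    ordered = sorted(contacts, key=lambda c: c[1], reverse=True)
    return [[uri for uri, _ in group] for _, group in groupby(ordered, key=lambda c: c[1])]


class SerialBranches:
    """Hypothetical stand-in for the serialized branches OpenSER keeps in AVPs."""

    def __init__(self, contacts: List[Tuple[str, float]]) -> None:
        self.pending = serialize(contacts)
        self.dset: List[str] = []  # destination set used by the next t_relay()

    def next_branches(self) -> bool:
        """Load the next highest-q group; return False once nothing is left."""
        if not self.pending:
            return False
        self.dset = self.pending.pop(0)
        return True


sb = SerialBranches([("1.1.1.2", 0.972), ("1.1.1.6", 0.01), ("1.1.1.3", 0.932)])
while sb.next_branches():
    print(sb.dset)
```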

branch_route[1] does nothing more than build a Remote-Party-ID (RPID) header and a P-Asserted-Identity (PAI) header and insert them into the outgoing INVITE.

Under normal circumstances the call is then sent to the contact with the highest q value. If we do not receive a 100 Trying or any other provisional response within 2 seconds (fr_timer), OpenSER generates a 408 internally, and we catch it in failure_route[2] (armed above).

failure_route[2]
{
    if (t_check_status("503") ||
        t_check_status("480") ||
        t_check_status("502") ||
        t_check_status("504") ||
        t_check_status("403") ||
        t_check_status("404") ||
        t_check_status("484") ||
        t_check_status("488") ||
        t_check_status("500") ||
        t_check_status("400") ||
        t_check_status("606") ||
        t_check_status("408"))
    {
        if(!next_branches())
        {
            if((t_check_status("408")) && (t_local_replied("last")))
            {
                xlog( "L_ERR", "Failure Route2: Status: Local $T_reply_code Ran out of branches for callid $ci" );
            }
            else
            {
                xlog( "L_ERR", "Failure Route2: Status: Network $T_reply_code Ran out of branches for callid $ci" );
            }
            return;
        }
        else
        {
            t_on_branch("1");
            t_on_failure("2");
        }
        route(2); # simple t_relay
        return;
    }
}
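t_check_status() matches the reply code against a regular expression. The twelve-way OR in failure_route[2] (and failure_route[3]) could therefore likely be collapsed into a single call such as t_check_status("^(40[0348]|48[048]|50[0234]|606)$"). The Python sketch below only checks that this candidate pattern selects exactly the listed codes. It assumes that OpenSER's POSIX extended regex handles the pattern the same way Python's re does, which has not been verified here.

```python
import re

# The codes listed in failure_route[2] / failure_route[3].
LISTED = {400, 403, 404, 408, 480, 484, 488, 500, 502, 503, 504, 606}
# Candidate single pattern for t_check_status() (assumption: same semantics in POSIX ERE).
PATTERN = re.compile(r"^(40[0348]|48[048]|50[0234]|606)$")


def pattern_matches(code: int) -> bool:
    return PATTERN.match(str(code)) is not None


# Every reply code in 100-699 that the pattern selects.
selected = {code for code in range(100, 700) if pattern_matches(code)}
print(selected == LISTED)
```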

From the failure route we call next_branches() and, if it succeeds, t_relay() the INVITE. Otherwise, we push the 408 out.

In one of the “weird” scenarios, I’ve seen OpenSER fork the INVITE in parallel to all of the contacts in the 300 instead of trying them as separate serial branches (this shouldn’t happen).
In another “weird” scenario, I’ve seen the next_branches() call in the failure route fail, so OpenSER pushes the 408 out instead. I’m able to pull a trace for these calls and can see there are multiple contacts in the 300, so I know there should still be additional branches...

Scenario 2:
OpenSER receives an INVITE, calls rewritehostport("1.1.1.1:5060"), arms failure_route[3] and relays the message with t_relay().

failure_route[3]
{
    if (t_check_status("503") ||
        t_check_status("480") ||
        t_check_status("502") ||
        t_check_status("504") ||
        t_check_status("403") ||
        t_check_status("404") ||
        t_check_status("484") ||
        t_check_status("488") ||
        t_check_status("500") ||
        t_check_status("400") ||
        t_check_status("606") ||
        t_check_status("408"))
    {
        if(!next_branches())
        {
            if((t_check_status("408")) && (t_local_replied("last")))
            {
                xlog( "L_ERR", "Failure Route3: Status: Local $T_reply_code Ran out of branches for callid $ci" );

                # One last try for this call....
                rewritehostport("3.4.5.6:5060");
                append_branch();
                t_on_failure("0");
                t_on_branch("0");
                route(2); # simple t_relay
                return;

            }
            else
            {
                xlog( "L_ERR", "Failure Route3: Status: Network $T_reply_code Ran out of branches for callid $ci" );
            }
            return;
        }
        else
        {
            t_on_branch("1");
            t_on_failure("2");
        }
        route(2); # simple t_relay
        return;
    }
}

OpenSER does not receive any provisional response within 2 seconds and therefore enters failure_route[3] when the fr_timer expires. This failure route calls next_branches(), which should fail. We then check whether the 408 was generated internally and, if so, build one last branch to try the call a final time. However, sometimes this comes up with additional branches, and the INVITE is forked in parallel to multiple IPs. In this scenario I know there are no additional branches, so I’m not sure how this is happening either.

Most of these problems seem to be related to a 300 with multiple contacts in the Contact header combined with the fr_timer expiring. I realize that I haven’t given much to go on here, but I’m willing to help in any way I can, even adding some additional debug output to the OpenSER core and the TM module and running it for a while with live traffic.

Could you describe, or point me to documentation on, the following:
1. What sort of information is stored in the fr_timer structure? (A pointer referencing ???) What files and functions should I be looking at in the TM module?
2. How are the destination sets stored in the above scenarios? It looks to me like OpenSER sometimes stores the destination set as an array, but some of the documentation mentions that serialize_branches stores it in AVPs. Again, can you point me to the files and functions that would help in analyzing this code?

Thanks

Scott

Discussion

  • Daniel-Constantin Mierla

    Logged In: YES
    user_id=1246013
    Originator: NO

    Renamed to better express the problem, as I haven't seen any relation to memory corruption. Please correct me if I am wrong.

    Your scenario involves the uac_redirect module as well. I am going to ask for your assistance in troubleshooting this issue after the new release. I need to replicate it in my testbed to be able to debug it.

     
  • Daniel-Constantin Mierla

    • assigned_to: nobody --> miconda
    • summary: Openser Memory Corruption --> Problem handling 300 reply with multiple choices
     
  • Nobody/Anonymous

    I found that not checking for a cancelled transaction in the failure route causes a lot of this. The failure route adds the branch, but the branch route generates a 500 Internal Server Error. The next thing you know, the contacts for the failed call show up as parallel branches in the next call that hits the failure route.
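The toy Python model below illustrates the effect described in this comment. Following the comment and the array mentioned in question 2, it assumes that the destination set is process-global state that is cleared only when a relay actually goes out. Worker and failure_route are hypothetical names, not OpenSER code.

```python
from typing import List


class Worker:
    """Toy model of one OpenSER process whose destination set outlives a call."""

    def __init__(self) -> None:
        self.dset: List[str] = []  # process-global destination set in this model

    def failure_route(self, new_branches: List[str], branch_route_ok: bool) -> List[str]:
        """Append branches and relay; return the targets the INVITE is forked to."""
        self.dset.extend(new_branches)
        if not branch_route_ok:
            # e.g. the transaction was cancelled and the branch route produced a 500:
            # nothing is relayed and, in this model, the set is never cleared.
            return []
        targets, self.dset = self.dset, []
        return targets


worker = Worker()
worker.failure_route(["1.1.1.3", "1.1.1.4"], branch_route_ok=False)  # call A, cancelled
print(worker.failure_route(["3.4.5.6"], branch_route_ok=True))       # call B forks to all three
```

In the real script, the corresponding guard would be a cancelled-transaction check at the top of the failure route, for example with the TM module's t_was_cancelled() if your OpenSER version provides it. That check stops branches from being added for a transaction that is already cancelled.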

     
