#646 Offline RTP Proxy causes system freeze

1.10.x
open
modules (454)
7
2013-08-28
2013-04-23
Digipigeon
No

I have a setup running multiple RTP Proxy instances. When several of them fail, CPU utilisation approaches 100%. This does not usually happen with 1 or 2 failures, but when 4 or more fail (this can even be 4 instances on an 8 core/instance machine), the issues arise.

This event has been severe enough to make the system stop responding for short periods. During those periods I have observed intermittent gaps in the logging and substantial packet drops (due to high CPU usage).

Once RTP Proxy instances that have failed come back online, CPU utilisation drops right down and everything functions as expected.

Discussion

<< < 1 2 3 (Page 3 of 3)
  • Digipigeon

    Digipigeon - 2013-06-03

    Hi Razvan,

    Has this captured the bit that you require, or are there any other commands that I can run for you?

    Kind Regards

    Jonathan

     
  • Razvan Crainea

    Razvan Crainea - 2013-06-05

    Hi, Jonathan!

    Unfortunately the problem you have here is not simple at all. Basically the idea is that most of your SIP workers are stuck waiting for responses from the rtpproxy servers that are down. And they also acquire the transaction lock. Therefore any reply within that transaction can't be processed (because the previous process was not finished), and the process busy-waits (that's why you see the spikes).
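One way to check for the busy-wait described above is to list each OpenSIPS worker with its CPU share and scheduler state. The following is a minimal sketch, assuming a Linux host with procps `ps` and `pidof`; the helper names `pids_csv` and `show_workers` are made up for illustration and are not part of OpenSIPS.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not shipped with OpenSIPS).

# Join the PIDs given as arguments into the comma-separated form ps -p expects.
pids_csv() {
    echo "$*" | tr -s ' ' ','
}

# Print PID, %CPU, process state and wait channel for every opensips process.
# Workers spinning on a lock would typically show state R with high %CPU,
# while workers blocked waiting on a socket would typically show state S.
show_workers() {
    pids=$(pidof opensips) || { echo "no opensips process found" >&2; return 1; }
    ps -o pid,pcpu,stat,wchan,args -p "$(pids_csv $pids)"
}
```

Calling `show_workers` a few times during an incident should give a rough picture of how many workers are spinning and how many are blocked.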

    I saw that you are using the 'rtpproxy_autobridge' feature of the RTPProxy module. Is this mandatory for your platform? If not, can you try to remove it for a while?

    Also, you said that when you experience this, the proxy does not recover at all, right? Here's a script that also adds the PID of the process to the output:

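    OPENSIPS=$(command -v opensips)   # path to the opensips binary; set it by hand if it is not in PATH
    for PID in $(pidof opensips); do
      echo "$PID"
      gdb "$OPENSIPS" "$PID" -batch --eval-command="bt full"
    done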
    

    I'd like you to run it two times, with a few seconds between them (around 10 seconds). What I want to find out is if OpenSIPS is really stuck (and where) or it is still able to process traffic.
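The two-run procedure requested above can be scripted so that both snapshots end up in files that are easy to compare. This is only a sketch: the function names, the output directory and the `OPENSIPS_BIN` default are assumptions, not anything provided by OpenSIPS.

```shell
#!/bin/sh
# Hypothetical wrapper around the gdb loop above; all names are illustrative.

OPENSIPS_BIN="${OPENSIPS_BIN:-/usr/sbin/opensips}"  # assumed install path
OUT_DIR="${OUT_DIR:-/tmp/opensips-bt}"              # assumed output directory

# Write "bt full" for every running opensips process into one file and
# print that file's path. $1 is a label used in the file name.
snapshot() {
    mkdir -p "$OUT_DIR" || return 1
    out="$OUT_DIR/bt-$1.txt"
    : > "$out"
    for pid in $(pidof opensips); do
        echo "=== PID $pid ===" >> "$out"
        gdb "$OPENSIPS_BIN" "$pid" -batch --eval-command="bt full" >> "$out" 2>&1
    done
    echo "$out"
}

# Take two snapshots $1 seconds apart (default 10) and show what changed.
# Processes whose backtraces are identical in both files are candidates
# for being stuck.
snapshot_twice() {
    first=$(snapshot 1) || return 1
    sleep "${1:-10}"
    second=$(snapshot 2) || return 1
    diff "$first" "$second"
}
```

Running `snapshot_twice` as root during an incident leaves `bt-1.txt` and `bt-2.txt` in the output directory; an empty diff would suggest that no process moved between the two runs.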

    Also, the output of the following command would be useful:

    opensipsctl ps
    

    Finally, can you tell me how many rtpproxy servers you're using and how many of them are down?

    Best regards,
    Răzvan

     
    Last edit: Razvan Crainea 2013-06-05
  • Digipigeon

    Digipigeon - 2013-06-27

    Hi Razvan,

    The problem has just recurred for me. However, there is sensitive information in these dumps. Can you please provide a direct address that I can send them to?

    Kind Regards Jonathan

     
  • Igor Potjevlesch

    Hi all,

    I think that I am experiencing the same issue.
    Have you identified the problem during your private discussions?

    Kind Regards,

    Igor

     
  • Digipigeon

    Digipigeon - 2013-08-28

    Hello Igor,

    Unfortunately this problem still persists. It is due to some very outdated code inside the RTP module. I have gone through it and lots of gdb traces, and it seems to be a collection of a lot of things inside not working correctly.

    • Locks held while waiting for replies.
    • Multiple retries of previously disabled RTP servers.

    I have pretty much given up on this being fixed, so I have made some script modifications that basically remove some features from the RTP Proxy module; it doesn't do all the fancy retries now. I have also deployed 7 RTP servers (more than we really need), and this has improved my stability. However, I can still cause this crash by increasing the load on single servers.

    http://www.kamailio.org/wiki/devel/rtpproxy-ng - I am hoping that this will get ported over to OpenSIPS; then we will have something a lot better to use.

    Best of luck.

    Kind Regards Jonathan

     
  • Igor Potjevlesch

    Hello Jonathan,

    Thank you for your reply. So, as I understand it, you can reproduce the problem by increasing the number of concurrent calls on a single RTP Proxy server?
    Just so I know, how many calls does it take to cause the crash?

    Kind Regards, Igor

     
  • Digipigeon

    Digipigeon - 2013-08-28

    I can reproduce it if I set our dispatcher to go to a single server. The problem is worsened by using a single RTP server as well.

    I can reproduce this with as few as 300 calls on a 2x quad-core 2 GHz box.

    Kind Regards Jonathan