#646 Offline RTP Proxy causes system freeze

Milestone: 1.10.x
Status: open
Labels: modules (454)
Priority: 7
Updated: 2013-08-28
Created: 2013-04-23
Creator: Digipigeon
Private: No

I have a setup running multiple RTP Proxy instances. When several of them fail, CPU utilisation approaches 100%. This does not usually happen with 1 or 2 failures, but when 4 or more fail (this can even be 4 instances on an 8 core/instance machine), the issues arise.

This event has been severe enough to make the system stop responding for short periods. I have observed intermittent periods with no logging information and substantial packet drops (due to high CPU usage).

Once RTP Proxy instances that have failed come back online, CPU utilisation drops right down and everything functions as expected.

Discussion

<< < 1 2 3 > >> (Page 2 of 3)
  • Digipigeon

    Digipigeon - 2013-05-31

    When locked up, if I run an strace against any of the main processes I get

    write(6, "\10\232\4\33\276\177\0\0", 8) = -1 EAGAIN (Resource temporarily unavailable)

    constantly being repeated.

    I have been told that this may be fd #6. If I run lsof on this process I get

    opensips 26254 opensips 6w FIFO 0,8 0t0 8497654 pipe

    Since first observing FIFO, I removed mi_fifo completely just in case it was related, however the problem still persists.

    Not sure if this information will help any.

    Kind Regards Jonathan

     
  • Razvan Crainea

    Razvan Crainea - 2013-05-31

    Hi, Jonathan!

    So basically this happens only when multiple rtpproxy servers are down, right? Is it more like a spike in the CPU, rather than a permanent deadlock?
    I think the problem is that having that many proxies disabled, OpenSIPS tries to re-establish the connection with all of them, blocking the signalling.
    Have you tried setting "rtpproxy_disable_tout" to a higher value (like 3600 seconds)? If my assumptions are correct, the spikes will appear more rarely (every hour for a 3600-second disable timeout). Also, do you have the "rtpproxy_timeout" parameter set? To what value? Have you tried setting it to a lower value, let's say "0.3"?

    Best regards,
    Răzvan
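
For reference, the two parameters named in this post are set with modparam in the OpenSIPS configuration file. The fragment below is only a sketch using the values suggested above. It assumes the module is loaded under the name rtpproxy and that, as in the 1.x module documentation, rtpproxy_disable_tout takes an integer (seconds) and rtpproxy_timeout takes a string. Check both against the documentation for the version in use.

```
# Assumed syntax; values are the ones suggested in the post above.
# Keep a failed RTP proxy disabled for an hour before retrying it.
modparam("rtpproxy", "rtpproxy_disable_tout", 3600)
# Wait at most 0.3 seconds for a reply from an RTP proxy.
modparam("rtpproxy", "rtpproxy_timeout", "0.3")
```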

     
  • Digipigeon

    Digipigeon - 2013-05-31

    Hi Răzvan,

    Yes that's correct, all cores sit at 100% indefinitely unless I stop it.

    Unfortunately I have not been able to establish any strong correlation with the RTP Proxy parameters. Changes to rtpproxy_disable_tout and rtpproxy_timeout don't seem to produce any distinctive results. (I question whether the RTP Proxy module is the sole cause of the issue.)

    With rtpproxy_disable_tout=10 and rtpproxy_timeout=0.05 I experienced almost no lockups for 14 days. Then yesterday and today, without any config change, the problem has been happening every 30 minutes. I have reset the RTP Proxy but the problem persists. I am wondering if there is some external trigger, possibly induced by incoming packets.

    I have tried bypassing the RTP Proxy for some of my traffic, but I am not able to do this with all traffic for privacy reasons. When traffic passes without engaging the RTP Proxy, the problem appears to trigger less often.

    Kind Regards Jonathan

     
  • Digipigeon

    Digipigeon - 2013-05-31

    I have set up the servers to point to a single RTP Proxy with <1ms latency, and I have experienced a lockup of all 32 threads.

    With strace, each thread is returning

    write(6, "\20Xp\236=\177\0\0", 8) = -1 EAGAIN (Resource temporarily unavailable)
    or
    write(6, "\270\340\332\236=\177\0\0", 8) = -1 EAGAIN (Resource temporarily unavailable)
    or the same call with different data inside.

    However RTP Ticks is 0.

    I think this issue might not be related to RTP Proxy after all.

    Kind Regards Jonathan

     
  • Razvan Crainea

    Razvan Crainea - 2013-05-31

    Can you detail whose file descriptor 6 is?

     
  • Digipigeon

    Digipigeon - 2013-05-31

    I did

    lsof -p [opensips_id]

    and I got

    opensips 26254 opensips 6w FIFO 0,8 0t0 8497654 pipe

    Please let me know if there are any other commands that you would like me to run.
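
The lsof output above only says that fd 6 is the write end of an anonymous pipe. As a rough sketch of how one might find out which other processes hold the same pipe, the helpers below read the Linux /proc filesystem. The function names fd_target and pipe_peers are made up for illustration, and the approach assumes Linux and enough privileges to read /proc/PID/fd for the processes of interest.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Linux /proc and read access to the target processes.

# fd_target PID FD
# Print what the given file descriptor points to (a path, "pipe:[inode]", ...).
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# pipe_peers INODE
# Print every /proc/PID/fd/N entry that refers to the pipe with this inode.
pipe_peers() {
    local inode="$1" link
    for link in /proc/[0-9]*/fd/*; do
        # Entries we cannot read are skipped silently.
        if [ "$(readlink "$link" 2>/dev/null)" = "pipe:[$inode]" ]; then
            echo "$link"
        fi
    done
}
```

With the values from the lsof line above, `fd_target 26254 6` would be expected to report a pipe, and `pipe_peers 8497654` (the NODE column) would list the processes on either end of it.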

     
  • Razvan Crainea

    Razvan Crainea - 2013-05-31

    When blocked, please try to fetch the backtrace of all processes. You should use a bash script similar to this one (with $OPENSIPS set to the path of the opensips binary):

    for PID in $(pidof opensips); do gdb $OPENSIPS $PID -batch --eval-command="bt full"; done
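
A slightly expanded sketch of the one-liner above, which writes one backtrace file per process so the output can be compressed and shared afterwards. The function name dump_backtraces, the default binary path /usr/sbin/opensips and the output file naming are assumptions for illustration, not part of the original suggestion. The gdb invocation is the one from the post.

```shell
#!/usr/bin/env bash
# Assumed install path of the opensips binary; override via the environment.
OPENSIPS_BIN="${OPENSIPS_BIN:-/usr/sbin/opensips}"

# dump_backtraces OUT_DIR
# Attach gdb to every running opensips process and save a full backtrace
# of each one to OUT_DIR/bt.<pid>.txt.
dump_backtraces() {
    local out_dir="$1" pid
    mkdir -p "$out_dir" || return 1
    for pid in $(pidof opensips); do
        gdb "$OPENSIPS_BIN" "$pid" -batch --eval-command="bt full" \
            > "$out_dir/bt.$pid.txt" 2>&1
    done
}
```

It would typically be run as root while the lockup is in progress, for example `dump_backtraces /tmp/opensips-bt`.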

     
  • Digipigeon

    Digipigeon - 2013-05-31

    Hi Razvan,

    After looking at the gdb output with my limited knowledge, I have identified that rabbitmq was the culprit this time around.

    I still believe that there is an issue with RTP Proxy as this issue started happening before I implemented RabbitMQ. I will wait until I can see a lockup which I believe is caused by RTP Proxy, then I will run the same script again and update.

    Kind Regards Jonathan

     
  • Digipigeon

    Digipigeon - 2013-05-31

    Hi Razvan,

    Thanks to that little bit of bash script, I was able to capture the following

    http://www.digipigeon.com/debug_gdb2.gz

    I have had a quick look at it, and it appears to have caught the original issue that I was getting, triggered in relation to the RTP Proxy engagement.

    Similar issues happened, 100% utilisation, etc.

    Please take a look.

    Kind Regards Jonathan

     