Menu

How to fix a memory leak on Private Memory?

2012-01-06
2013-05-09
  • Fernando Soares

    Fernando Soares - 2012-01-06

    Hello all,

    We're having memory leak issue while doing Load Test rounds on OpenSIPs 1.7.

    For this Load Test our OpenSIPS config runs simple SIP call scenarios, having TCP connections established for leg "A" and UDP connections for leg "B".

    After good amount of reading and testing we did the following adjustments below to investigate if the issue was a matter of  "small pool of memor_y" or "_memory leak": (ref : http://www.opensips.org/Resources/DocsTsMem)

    1 - Increased both pkmem and shmem from 2MB to 4MB and 32MB to 64MB, respectively. From syslog:

    /usr/local/sbin/opensips: INFO:core:main: using 64 Mb shared memory
    /usr/local/sbin/opensips: INFO:core:main: using 4 Mb private memory per process

    2 - Also changed the number of children processes which led the start of different numbers of TCP and UDP processes. Below a default setup for childen=4 displayed:

    ~# opensipsctl fifo ps
    Process::  ID=0 PID=5144 Type=attendant
    Process::  ID=1 PID=5145 Type=MI FIFO
    Process::  ID=2 PID=5146 Type=SIP receiver udp:192.168.62.39:5060
    Process::  ID=3 PID=5147 Type=SIP receiver udp:192.168.62.39:5060
    Process::  ID=4 PID=5148 Type=SIP receiver udp:192.168.62.39:5060
    Process::  ID=5 PID=5149 Type=SIP receiver udp:192.168.62.39:5060

    Process::  ID=6 PID=5150 Type=time_keeper
    Process::  ID=7 PID=5151 Type=timer
    Process::  ID=8 PID=5152 Type=TCP receiver
    Process::  ID=9 PID=5153 Type=TCP receiver
    Process::  ID=10 PID=5154 Type=TCP receiver
    Process::  ID=11 PID=5155 Type=TCP receiver
    Process::  ID=12 PID=5157 Type=TCP main

    After exaclty 1200s running 30cps of load and OpenSIPs throwing lot of "Not enough free memory" errors in Syslog we can see those 4MB totally allocated for the 4 UDP processed and not released:

    ~# opensipsctl fifo get_statistics 'pkmem:' | grep '\-used'
    pkmem:0-used_size = 369752
    pkmem:1-used_size = 393672
    pkmem:2-used_size = 4188704 
    pkmem:3-used_size = 4188264 
    pkmem:4-used_size = 4188448 
    pkmem:5-used_size = 4188464 
    pkmem:6-used_size = 367984
    pkmem:7-used_size = 378992
    pkmem:8-used_size = 444264
    pkmem:9-used_size = 444296
    pkmem:10-used_size = 444296
    pkmem:11-used_size = 444296
    pkmem:12-used_size = 427960

    ~# opensipsctl fifo get_statistics 'shmem:'
    shmem:total_size = 67108864
    shmem:used_size = 3186608 
    shmem:real_used_size = 3441776
    shmem:max_used_size = 11358984
    shmem:free_size = 63667088
    shmem:fragments = 8923

    All the setup test changes and results took us to the conclusion that the root cause is the memory leak on the Private Memory because after a certain time, exaclty at 1200s and 30cps of traffic the amound the pkmem was totally consumed BY THE UDP PROCESSES and not released back.

    In another hand the Shared Memory (shmem) was properbly allocated and released.
    This is a very accurated and constant beahvior.

    Well… After all this information, we would like some guidance or tips to undertand why the UDP processes are causing this memory leak on private memory and possibly how to fix that issue.

    We appreciate any help. Thanks in advance.

    Fernando

     
  • Bogdan-Andrei Iancu

    Hi Fernando,

    Could you actually post here the "not enough mem" error you get ? just to see what kind of mem is about.

    Regards,
    Bogdan

     
  • Fernando Soares

    Fernando Soares - 2012-01-09

    Hi Bogdan, thanks for your contact.

    Below the logs on the memory issue.

    Jan  5 12:50:10 tor-sc-be-1 /usr/local/sbin/opensips: WARNING:core:fm_malloc: Not enough free memory, will atempt defragmenation
    Jan  5 12:50:10 tor-sc-be-1 /usr/local/sbin/opensips: ERROR:core:fixup_pvar: no pkg memory left
    Jan  5 12:50:10 tor-sc-be-1 /usr/local/sbin/opensips: ERROR:rtpproxy:rtpproxy_answer3_f: Error in fixup_pvar
    Jan  5 12:50:10 tor-sc-be-1 /usr/local/sbin/opensips: ERROR:core:eval_elem: no more pkg memory
    Jan  5 12:50:10 tor-sc-be-1 /usr/local/sbin/opensips: ERROR:core:do_assign: no value in right expression
    Jan  5 12:50:10 tor-sc-be-1 /usr/local/sbin/opensips: ERROR:core:do_assign: error at line: 811

    In addition, decided to attach for you a graph we did showing the comsumption of shared VS private memory during the Load Test I was talking about in the last thread. You may clearly see the periods we started and stopped the Load Testing due the private memory leak that never returns to its original state.

    http://tinypic.com/r/2agpp3/5

    Rgds,

    Fernando

     
  • Bogdan-Andrei Iancu

    Fernando,

    Could you follow the indication from http://www.opensips.org/Resources/DocsTsMem and get a memory dump ? wait like 5 minutes between the end of the load test and triggering the dump - this will show us what is sticking into mem.

    Regards,
    Bogdan

     
  • Fernando Soares

    Fernando Soares - 2012-01-09

    Bogdan,

    Below the dump output taken after running the Load Test having 4 children processes, 4MB for pkg memory and 64MB for shared memory over a traffic of 30cps. In other words, the same Load Testing we are doing.

    One note is that after I enabled the dump sounds the SIPServer processing got worse i.e the pkg memory was still getting consumed but slower than not having the dump enabled.

    Anyways, here are the logs so let us know if this is enough or if you need any additional data.

    Tks again,

    Fernando

    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  used= 357392, used+overhead=980000, free=3214304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  used= 2378320, used+overhead=3018688, free=1175616
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  max used (+overhead)= 3024944
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: dumping all alloc'ed. fragments:
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:       0. N  address=0x7e3580 frag=0x7e3570 size=128 used=1
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:       1. N  address=0x7e3620 frag=0x7e3610 size=128 used=1
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:       2. N  address=0x7e36c0 frag=0x7e36b0 size=48 used=1
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:       3. N  address=0x7e3710 frag=0x7e3700 size=32 used=1
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: INFO:core:sig_usr: signal 15 received
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  used= 306096, used+overhead=428672, free=3765632
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  used= 2324832, used+overhead=2949296, free=1245008
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  max used (+overhead)= 2956048
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  used= 357392, used+overhead=979936, free=3214368
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  max used (+overhead)= 985760
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: dumping all alloc'ed. fragments:
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  used= 284208, used+overhead=402848, free=3791456
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  used= 357392, used+overhead=979424, free=3214880
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  max used (+overhead)= 437504
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: dumping all alloc'ed. fragments:
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:       4. N  address=0x7e3750 frag=0x7e3740 size=48 used=1
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  used= 2357024, used+overhead=2990640, free=1203664
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:08 tor-sc-be-1 /usr/local/sbin/opensips:  used= 294688, used+overhead=414160, free=3780144
    ….
    …….
    ………..

    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: -------------------
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (pkg):
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7d8d40):
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 4194304
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:  used= 282288, used+overhead=401216, free=3793088
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:  max used (+overhead)= 404736

    ….
    …….
    ………..
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: dumping free list stats :
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: hash=   1. fragments no.:     6, unused:     0#012#011#011 bucket size:        16 -        16 (first        16)
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: hash=   2. fragments no.:    24, unused:     0#012#011#011 bucket size:        32 -        32 (first        32)
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: hash=   3. fragments no.:     1, unused:     0#012#011#011 bucket size:        48 -        48 (first        48)
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: hash=   5. fragments no.:    12, unused:     0#012#011#011 bucket size:        80 -        80 (first        80)
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: hash=   7. fragments no.:     1, unused:     0#012#011#011 bucket size:       112 -       112 (first       112)
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: hash=   8. fragments no.:     2, unused:     0#012#011#011 bucket size:       128 -       128 (first       128)
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: hash=  80. fragments no.:     1, unused:     0#012#011#011 bucket size:      1280 -      1280 (first      1280)
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: hash= 1032. fragments no.:     1, unused:     0#012#011#011 bucket size:   2097152 -   4194304 (first   3789568)
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: -------------------
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: Memory status (shm):
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: qm_status (0x7f2586b01000):
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:  heap size= 67108864
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:  used= 1488, used+overhead=524864, free=66584000
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:  max used (+overhead)= 15334304
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: dumping all alloc'ed. fragments:
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:       0. N  address=0x7f2586b0b840 frag=0x7f2586b0b830 size=16 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:       7. N  address=0x7f2586b0f970 frag=0x7f2586b0f960 size=16 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:       9. N  address=0x7f2586b0f9d0 frag=0x7f2586b0f9c0 size=16 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:      10. N  address=0x7f2586b0fa00 frag=0x7f2586b0f9f0 size=16 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:      14. N  address=0x7f2586b102f0 frag=0x7f2586b102e0 size=48 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:      46. N  address=0x7f2586b10d50 frag=0x7f2586b10d40 size=400 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:      77. N  address=0x7f2586d92100 frag=0x7f2586d920f0 size=16 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:      78. N  address=0x7f2586d92130 frag=0x7f2586d92120 size=16 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:     107. N  address=0x7f2586d92970 frag=0x7f2586d92960 size=16 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:     127. N  address=0x7f2586db5640 frag=0x7f2586db5630 size=624 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:     128. N  address=0x7f2586db58d0 frag=0x7f2586db58c0 size=112 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:     131. N  address=0x7f2586db59f0 frag=0x7f2586db59e0 size=64 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:     132. N  address=0x7f2586db5a50 frag=0x7f2586db5a40 size=16 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:     285. N  address=0x7f2586db8670 frag=0x7f2586db8660 size=64 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:     286. N  address=0x7f2586db86d0 frag=0x7f2586db86c0 size=16 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips:     287. N  address=0x7f2586db8700 frag=0x7f2586db86f0 size=32 used=1
    Jan  9 18:55:09 tor-sc-be-1 /usr/local/sbin/opensips: -------------------

     
  • Bogdan-Andrei Iancu

    Fernando,

    Looking at the dump, I do not thing you properly enabled the memory debugging - see the example from http://www.opensips.org/Resources/DocsTsMem; when changing the Makdefile.def for enabling the DBG_QM_MALLOC flag, be sure you do not have a mixture of commented and active lines (this is not allowed) - all commented lines must be at the end.

    Also, to minimize the logs at runtime, set in your config dile:
              memlog=6   # we do not want runtime logs from mem debugger
              memdump=1  # we want to get only mem dumps

    Regards,
    Bogdan

     
  • Fernando Soares

    Fernando Soares - 2012-01-10

    Hello Bodgan,

    Thanks for your tips. Just noticed the "memdump=1" was not inside the example from http://www.opensips.org/Resources/DocsTsMem. Maybe it is a good idea to include that.

    Well… I run the tests again and here are the full dumps. Let's see if you can find out any memory leak. Hopefully!

    I uploaded the dump in the following URL. Please let me know any problems while downloading it.

    http://www.4shared.com/zip/HVtj210n/syslogdump1001.html

    Rgds,

    Fernando

     
  • Bogdan-Andrei Iancu

    Thanks Fernando,

    The guilty function seams to be "fixup_pvar" - there are more than 15K of instances per process. This can happen only if you either user perl module, either fixup_pvar is badly used in some module.

    As the function is used in tens of modules, in order to locate the guilty call, could you privately send me your script ?

    Thanks,
    Bogdan

     
  • Fernando Soares

    Fernando Soares - 2012-01-10

    Hey Bogdan,

    After taking a look at the code after your thoughts on the "fixup_pvar" sounds that I have made good progress in terms of Load Testing performance.

    In the module dedicated to my RPTProxy, I was not correctly implementing the function to free the "fixed" parameters.  I detached below what I was not doing/implementing fine in my function based on the interface below:

    struct cmd_export_ {
    char* name;             /* null terminated command name */
    cmd_function function;  /* pointer to the corresponding function */
    int param_no;           /* number of parameters used by the function */
    fixup_function fixup;   /* pointer to the function called to "fix" the    parameters */
    free_fixup_function free_fixup; /* pointer to the function called to free the  "fixed" parameters */
    int flags;              /* Function flags */
    };

    So, after applying the properly "free" I could realize that the huge amount of  "fixup_pvar" we had before in the dump (15K per process) consuming all private memory  was noticed in a very minor rate. Sounds that the fix made the private memory be allocated/deallocated correctly this time.

    I will continue my testing here to complete evaluating if the fix is valid. Let's see…

    I appreciated all your help.

    Rgds,

    Fernando

     
  • Bogdan-Andrei Iancu

    Fernando,

    do you have a module of yours (I mean a private module) ? or this change is in a public module ?

    Nevertheless, the fixup functions are called only once, when opensips starts, so normally they cannot generate leaks. Only if you have a strange function call that triggers the fixup function each time the function is executed in script.

    Regards,
    Bogdan

     
  • Fernando Soares

    Fernando Soares - 2012-01-17

    Hey Bogdan,

    Sorry for the delay.  Yes, it was a private module in a prototype function we created and this was the root cause for the memory leak.

    Rgds,

    Fernando