#157 signo=Could not find the frame base for "sig_alarm_abort"

1.4.x
closed-out-of-date
core (110)
2
2015-04-06
2009-07-08
Thomas V
No

opensips 1.4.5-notls, FreeBSD 7.0 (i386). opensips core dumps after a few hours with:

#0 0x28268017 in kill () from /lib/libc.so.7
#1 0x28267f76 in raise () from /lib/libc.so.7
#2 0x28266b8a in abort () from /lib/libc.so.7
#3 0x08067ee3 in sig_alarm_abort (signo=Could not find the frame base for "sig_alarm_abort".
) at main.c:423
#4 <signal handler called>
#5 fm_status (qm=0x287fe000) at mem/f_malloc.c:512
#6 0x080679ed in cleanup (show_status=1) at main.c:356
#7 0x08068795 in handle_sigs () at main.c:524
#8 0x0806c969 in main (argc=3, argv=0xbfbfec58) at main.c:866

any idea?

Discussion

    • assigned_to: nobody --> bogdan_iancu
     
  • Hi Thomas,

    the core is actually showing an unsuccessful shutdown (see the cleanup() on frame 6). The final coredump was generated because the shutdown took longer than 20 seconds (and there is an ALARM signal to prevent blocking during the shutdown).
    So the question is: in logs, do you have any reports related to the cause for the initial shutdown ?

    Regards,
    Bogdan

     
  • Thomas V
    Thomas V
    2009-07-10

    Hi Bogdan

    I'm not sure but it looks like a memory leak. I upgraded the system to FreeBSD 7.2 and added an additional 1 GB memory (total 2gb). After 48h opensips reserved at least 1gb (1gb is for postgresql). We do not allow to use swap on the system. If the system has less than 50mb left, opensips crashes

    Log:

    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8408]: INFO:core:handle_sigs: child process 8421 exited by a signal 11
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8408]: INFO:core:handle_sigs: core was generated
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8408]: INFO:core:handle_sigs: terminating due to SIGCHLD
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8439]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8435]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8434]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8433]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8432]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8427]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8426]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8425]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8424]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8423]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8420]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8420]: CRITICAL:core:fm_status: different free frag. count: 15!=37 for hash 5
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8419]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8431]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8430]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8429]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8428]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8450]: INFO:core:sig_usr: signal 15 received
    Jul 10 08:59:59 sip02 /usr/local/sbin/opensips[8408]: CRITICAL:core:sig_alarm_abort: BUG - shutdown timeout triggered, dying...

    coredump:

    (gdb) where
    #0 0x2826de67 in kill () from /lib/libc.so.7
    #1 0x2826ddc6 in raise () from /lib/libc.so.7
    #2 0x2826c9da in abort () from /lib/libc.so.7
    #3 0x08067ee3 in sig_alarm_abort (signo=14) at main.c:422
    #4 <signal handler called>
    #5 0x080daf8b in fm_status (qm=0x28807000) at mem/f_malloc.c:530
    #6 0x080679ed in cleanup (show_status=1) at fastlock.h:181
    #7 0x08068795 in handle_sigs () at main.c:523
    #8 0x0806c969 in main (argc=3, argv=0xbfbfec7c) at main.c:860
    (gdb) frame 6
    #6 0x080679ed in cleanup (show_status=1) at fastlock.h:181
    181 fastlock.h: No such file or directory.
    in fastlock.h

     
  • Thomas V
    Thomas V
    2009-07-10

    It crashes also with more Ram. Looks like memory is not the case. I do not have more information.

    With an older opensips 1.4.2-notls the system always crashed with:
    SIP/2.0 180 Ringing\r\nVia: SIP/2.0/UDP 212.xxx.xxx.xxx;branch=z9hG4bK0622.99552112.0\r\nVia: SIP/2.0/UDP
    85.xxx.xxx.xxx4:5060;received=85.xxx.xxx.xxx;rport=5060;branch=z9hG4bKc3ba5c8f50bc782082391762e9a2fa9e\r\nF"..., len=503,
    rcv_info=0xbfbfeb14

    A ngrep when 1.4.2 was running showed:
    U 85.xxx.xxx.xxx:5060 -> 212.xxx.xxx.xxx:5060
    INVITE sip:xxxxxxxx@212.xxx.xxx.xxx SIP/2.0.
    Via: SIP/2.0/UDP 85.xxx.xxx.xxx4:5060;rport;branch=z9hG4bK3759d84a0bea3edc47d08b9ec1cd2979.
    To: <sip:xxxxxxx@212.xxx.xxx.xxx>.
    From: <sip:xxxxxx5@212.xxx.xxx.xxx>;tag=AI60E7E8F4CC7D6BD2.
    Call-ID: AI1E214F5EDC57C05C@192.168.1.240.
    CSeq: 1 INVITE.
    Max-Forwards: 70.
    Contact: <sip:xxxxxxx@85.xxx.xxx.xxx;line=AIDF22AB43A35036D9>.
    Accept: application/sdp.
    Allow: ACK,BYE,CANCEL,INVITE,NOTIFY,OPTIONS,REFER.
    P-Preferred-Identity: <sip:xxxxxxxxx@212.xxx.xxx.xxx5>.
    Privacy: none.
    User-Agent: Aastra Intelligate.
    Content-Type: application/sdp.
    Content-Length: 248.
    .

    .

    .

    .

    .

    .

    .

    .

    .

    #
    I 212.xxx.xxx.xxx -> 85.xxx.xxx.xxx 3:3
    ...E(..k...;.
    Uv...e.}~ ....;l
    #

    We installed Revision: 5049. After that we had the sig_alarm_abort issue. So we updated to 1.4.5-notls. Nothing changed, opensips still crashes.

     
  • Hi Thomas,

    Somewhere in the code, a SEG FAULT happens:

    Jul 10 08:58:59 sip02 /usr/local/sbin/opensips[8408]:
    INFO:core:handle_sigs: child process 8421 exited by a signal 11

    This triggers a shutdown, but this fails and a second crash happens - the new core file is overwriting the original one, hiding the real cause of the crash.....

    To avoid this, I attached here a small patch that will skip the shutdown part, so that it will avoid the overwrite of the core. Please patch your code and post here the BT of the real core file

     
  • skip shutdown cleanup patch

     
    Attachments
  • Thomas V
    Thomas V
    2009-07-14

    We are sure, that the Aastra Intelligate of this customer is the cause of the crash.

    coredump with patch:
    #0 free_to (tb=0x81e54b8) at parser/parse_to.c:75
    75 foo = tp->next;
    [New Thread 0x28316040 (LWP 100124)]
    [New LWP 100159]
    (gdb) bt
    #0 free_to (tb=0x81e54b8) at parser/parse_to.c:75
    #1 0x080dcba7 in clean_hdr_field (hf=0x289e5b5c) at parser/hf.c:186
    #2 0x286b6e54 in run_trans_callbacks (type=128, trans=0x28a15718,
    req=0x289e4f98, rpl=0x81e1440, code=183) at sip_msg.h:49
    #3 0x286c8548 in relay_reply (t=0x28a15718, p_msg=0x81e1440, branch=0,
    msg_status=183, cancel_bitmap=0xbfbfea10) at t_reply.c:1158
    #4 0x286c8f78 in reply_received (p_msg=0x81e1440) at t_reply.c:1402
    #5 0x08064eff in forward_reply (msg=0x81e1440) at forward.c:507
    #6 0x08093ee8 in receive_msg (
    buf=0x817eb40 "SIP/2.0 183 Session Progress\r\nVia: SIP/2.0/UDP 212.XXX.XXX.XXX;branch=z9hG4bK53b5.12b29f42.0\r\nVia: SIP/2.0/UDP 85.XXX.XXX.XXX:5060;received=85.XXX.XXX.XXX;rport=5060;branch=z9hG4bK43901f80b58da41b0f9fa749d2"..., len=895,
    rcv_info=0xbfbfeb14) at receive.c:203
    #7 0x080d4bf8 in udp_rcv_loop () at udp_server.c:449
    #8 0x0806c54c in main (argc=3, argv=0xbfbfec50) at main.c:782

    (gdb) frame 7
    #7 0x080d4bf8 in udp_rcv_loop () at udp_server.c:449
    449 receive_msg(buf, len, &ri);
    (gdb) frame 8
    #8 0x0806c54c in main (argc=3, argv=0xbfbfec50) at main.c:782
    782 udp_rcv_loop();

     
  • This looks like something familiar to me - an older bug in TM - are you sure you use the latest 1.4 SVN version which should include the fix?

    Regards,
    Bogdan

     
  • Thomas V
    Thomas V
    2009-07-14

    You mean this? http://opensips.svn.sourceforge.net/viewvc/opensips?view=rev&revision=5050

    We tried revision 5050 patch with 1.4.2 but it didn't fix the issue. Thats why we updated to 1.4.5 but it still has the same issue.

    opensips -V
    version: opensips 1.4.5-notls (i386/freebsd)
    flags: STATS: Off, USE_IPV6, USE_TCP, DISABLE_NAGLE, USE_MCAST, SHM_MEM, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
    ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
    poll method support: poll, select, kqueue.
    svnrevision: unknown
    @(#) $Id: main.c 5456 2009-03-16 18:27:55Z bogdan_iancu $
    main.c compiled on 12:20:06 Jul 13 2009 with gcc 4.2.1

     
  • yes, that one :)

    Maybe you should consider upgrading to 1.5...more or less 1.4 is not maintain and debugged any more...
    Is this an option to you?

     
  • Well is this issue fixed in 1.5? If it is, then it's an option :)

    If not, then it makes no sense. We have a few thousand users and more or less complicated dial plans. i have no clue how backward compatible old 1.4 config files are to 1.5. No changes of fixing this issue in 1.4?

     
  • There were several fixes in that area, but I cannot 100% say that this was fixed in 1.5. The positive fact is that nobody reported such crashes on 1.5.

    To give you an idea about the difference (scripting point of view) from 1.4 to 1.5 (for migration), take a look at http://www.opensips.org/Resources/DocsMigration14to15

     
  • Thomas V
    Thomas V
    2009-07-16

    We give it a try in our labs. Since we use pgsql and not mysql this can be a bit tricky. I will report back with 1.5 if the issue will be still triggerd by Aastra Intelligate with a re-invite, done with hold, described in rfc2543.

     
  • OK, let me know on the outcome.

     
    • priority: 5 --> 2
     
  • reduced the priority until further updates on this

     
    • status: open --> closed-out-of-date