#44 core dump parsing messages

trunk
closed-fixed
modules (454)
9
2009-01-05
2008-11-05
Richard Revels
No

opensips segfault occuring multiple times daily. Core files available on request.

Core was generated by `/usr/local/opensips/sbin/opensips -f /usr/local/opensips/etc/opensips/opensips.'.
Program terminated with signal 11, Segmentation fault.
#0 free_to (tb=0x820a9d8) at parser/parse_to.c:75
75 foo = tp->next;
(gdb) bt
#0 free_to (tb=0x820a9d8) at parser/parse_to.c:75
#1 0x080de8e0 in clean_hdr_field (hf=0x9864de0c) at parser/hf.c:186
#2 0x002e91b7 in run_trans_callbacks (type=2, trans=0x986bd438, req=0x9864d3d0, rpl=0x820b6f0, code=300) at sip_msg.h:49
#3 0x002f353c in t_reply_matching (p_msg=0x820b6f0, p_branch=0xbfd11764) at t_lookup.c:840
#4 0x002f39dc in t_check (p_msg=0x820b6f0, param_branch=0xbfd11764) at t_lookup.c:911
#5 0x00304136 in reply_received (p_msg=0x820b6f0) at t_reply.c:1288
#6 0x08064e4a in forward_reply (msg=0x820b6f0) at forward.c:507
#7 0x080951b6 in receive_msg (
buf=0x817a120 "SIP/2.0 300 Multiple choices\r\nVia: SIP/2.0/UDP XXX.XX.XXX.XXX;branch=z9hG4bKe2a7.40d37b23.0,SIP/2.0/UDP XX.XXX.XX.XX:5060;received=XX.XXX.XX.XX;branch=z9hG4bK4abb0821;rport=5060\r\nFrom: \"FOO\" <sip:+1"..., len=893, rcv_info=0xbfd11894) at receive.c:203
#8 0x080d7b37 in udp_rcv_loop () at udp_server.c:449
#9 0x0806d5ce in main (argc=9, argv=0xbfd11a84) at main.c:780

Discussion

  • Turned debug up to 6 for a while. Here is a partial log output showing the process id startup and first few lines from the last time opensips core dumped.

    [root@sips-proxy-01 opensips]# grep 14: /var/log/opensips.log | grep 17235
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=usrloc
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=tm
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:tm:child_init_callid: callid: '7bf8b0b7-17235@216.82.225.235'
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=registrar
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=xlog
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:xlog:child_init: init_child [-4] pid [17235]
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=acc
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=avpops
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=permissions
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=dispatcher
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:dispatcher:child_init: #-4 / pid <17235>
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=mi_fifo
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=db_flatstore
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=dialog
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=auth_db
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:db_do_init: connection 0x83def70 not found in pool
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:db_postgres:db_postgres_new_connection: db_id = 0x83def70
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:db_postgres:db_postgres_new_connection: 0x83def90=pkg_malloc(36)
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:db_postgres:db_postgres_new_connection: opening connection: postgres://xxxx:xxxx@localhost:5432/openser13
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:db_postgres:db_postgres_new_connection: PQsetdbLogin(0xa0fd2a8)
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:init_mod_child: type=PROC_TCP_MAIN, rank=-4, module=snmpstats
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 6, 1, 0x819f2a0), fd_no=0
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 7, 4, 0x982dbcec), fd_no=1
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 8, 4, 0x982dbd78), fd_no=2
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 9, 4, 0x982dbe04), fd_no=3
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 10, 4, 0x982dbe90), fd_no=4
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 11, 4, 0x982dbf1c), fd_no=5
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 12, 4, 0x982dbfa8), fd_no=6
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 13, 4, 0x982dc034), fd_no=7
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 16, 4, 0x982dc0c0), fd_no=8
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 18, 4, 0x982dc14c), fd_no=9
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 20, 4, 0x982dc1d8), fd_no=10
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 22, 4, 0x982dc264), fd_no=11
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 14, 3, 0x82083f0), fd_no=12
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 15, 3, 0x8208404), fd_no=13
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 17, 3, 0x8208418), fd_no=14
    Nov 6 14:45:33 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_add: io_watch_add(0x8169da0, 19, 3, 0x820842c), fd_no=15
    Nov 6 14:48:46 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: CRITICAL:core:receive_fd: EOF on 9
    Nov 6 14:48:46 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:handle_ser_child: dead child 3, pid 17224 (shutting down?)
    Nov 6 14:48:46 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: DBG:core:io_watch_del: io_watch_del (0x8169da0, 9, -1, 0x0) fd_no=16 called
    Nov 6 14:48:46 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: INFO:core:sig_usr: signal 15 received
    Nov 6 14:48:46 sips-proxy-01 /usr/local/opensips/sbin/opensips[17235]: Memory status (pkg):

     
  • Richard Revels
    Richard Revels
    2008-11-11

    This seems consistent in 50 to 60 percent of the core files. Looks like there are two scenarios causing the crashes. One when replies arrive and the other on initial invites. This is from a reply arriving and the sequence of the function calls in the backtrace match the backtrace used to open the bug report.

    (gdb) print tb
    $1 = (struct to_body *) 0x820a9d8
    (gdb) print *tb
    $2 = {error = 808333871, body = {s = 0x5044552f <Address 0x5044552f out of bounds>, len = 775370272}, uri = {s = 0x2e323531 <Address 0x2e323531 out of bounds>, len = 775500850},
    display = {s = 0x353a3033 <Address 0x353a3033 out of bounds>, len = 993015344}, tag_value = {s = 0x6e617262 <Address 0x6e617262 out of bounds>, len = 2050844771},
    parsed_uri = {user = {s = 0x34476839 <Address 0x34476839 out of bounds>, len = 842222434}, passwd = {s = 0x34353865 <Address 0x34353865 out of bounds>, len = 1916483894},
    host = {s = 0x74726f70 <Address 0x74726f70 out of bounds>, len = 909129021}, port = {s = 0x460a0d30 <Address 0x460a0d30 out of bounds>, len = 980250482}, params = {
    s = 0x73612220 <Address 0x73612220 out of bounds>, len = 1769104756}, headers = {s = 0x20226b73 <Address 0x20226b73 out of bounds>, len = 1885958972}, port_no = 24890,
    proto = 29811, type = 1936290405, transport = {s = 0x3736406b <Address 0x3736406b out of bounds>, len = 842346798}, ttl = {s = 0x3934322e <Address 0x3934322e out of bounds>,
    len = 1043346222}, user_param = {s = 0x6761743b <Address 0x6761743b out of bounds>, len = 896753981}, maddr = {s = 0x62323339 <Address 0x62323339 out of bounds>,
    len = 221591651}, method = {s = 0x3a6f540a <Address 0x3a6f540a out of bounds>, len = 1769159712}, lr = {s = 0x31323a70 <Address 0x31323a70 out of bounds>,
    len = 842542646}, r2 = {s = 0x3532322e <Address 0x3532322e out of bounds>, len = 892547630}, transport_val = {s = 0x61743b3e <Address 0x61743b3e out of bounds>,
    len = 808533351}, ttl_val = {s = 0x61306665 <Address 0x61306665 out of bounds>, len = 1680959076}, user_param_val = {s = 0x64633732 <Address 0x64633732 out of bounds>,
    len = 1664495718}, maddr_val = {s = 0x38656461 <Address 0x38656461 out of bounds>, len = 879112754}, method_val = {s = 0x36343939 <Address 0x36343939 out of bounds>,
    len = 942564405}, lr_val = {s = 0xd313864 <Address 0xd313864 out of bounds>, len = 1818313482}, r2_val = {s = 0x44492d6c <Address 0x44492d6c out of bounds>,
    len = 1697914938}}, param_lst = 0x30396334, last_param = 0x39336434}

     
  • Richard Revels
    Richard Revels
    2008-11-12

    Initial indications are that when I compile with

    #define PKG_MEM_POOL_SIZE 1024*4068

    I don't get the core dumps.

    When I compile with

    #define PKG_MEM_POOL_SIZE 1024*9216

    I get lots of them. I am running at about 1 call per second which is the same load the system was at when it was dumping on a regular basis. The other change is that I am running with command line option -m 256 vs -m 512 but I suspect the pkg memory change is the significant one.

    -rw------- 1 root root 547639296 Nov 9 15:03 core.32127
    -rw------- 1 root root 547602432 Nov 9 15:14 core.5279
    -rw------- 1 root root 547745792 Nov 9 21:00 core.7677
    -rw------- 1 root root 547614720 Nov 9 22:59 core.22886
    -rw------- 1 root root 547745792 Nov 10 19:36 core.18281
    -rw------- 1 root root 547655680 Nov 11 06:18 core.15740
    -rw------- 1 root root 547749888 Nov 11 19:16 core.4572
    -rw------- 1 root root 547745792 Nov 11 21:28 core.22289
    -rw------- 1 root root 547614720 Nov 11 21:50 core.21091
    -rw------- 1 root root 547614720 Nov 11 22:21 core.25760
    -rw------- 1 root root 547614720 Nov 11 22:33 core.310
    -rw------- 1 root root 547610624 Nov 11 23:07 core.3837
    -rw------- 1 root root 547614720 Nov 12 01:05 core.10964

     
    • labels: 1134769 --> modules
    • assigned_to: nobody --> bogdan_iancu
    • priority: 5 --> 9
    • milestone: --> trunk
     
  • Hi Richard,

    I'm investigating the initial crash you reported (on reply time). First question - which of the following modules are you using: nat_traversal, UAC, siptrace, acc, osp..

    The crash happens on the TMCB_RESPONSE_IN callback (in TM) and only the above modules are using this callback.

    Regards,
    Bogdan

     
  • Richard Revels
    Richard Revels
    2008-11-13

    Yeah, that whole pkg memory thing didn't stop the crashes after all.

    I am using acc. I am also running with the account early media flag off because I was crashing with it on. When I turned it off this started popping up and I wanted to deal with one problem at a time, so left it off.

     
  • OK- acc module...
    I guess you have some extra_accounting parameters....could you list them?

    Bogdan

     
  • Richard Revels
    Richard Revels
    2008-12-02

    The core dumps appear to occur when accounting is on and message retransmissions occur. I have quite a few core files and a log file that was generated with debug set to level 6 in the config if that would help narrow down the source of the problem.

     
  • Hi Richard,

    Thanks for all your support in debugging this - finally I found the bug and apply a fix on SVN - the fix in incorporated in 1.4.3 release.
    Please report.

    Thanks and regards,
    Bogdan

     
    • status: open --> open-fixed
     
    • status: open-fixed --> closed-fixed
     
  • Richard Revels
    Richard Revels
    2008-12-13

    • status: closed-fixed --> open-fixed
     
  • Richard Revels
    Richard Revels
    2008-12-13

    Been running for several days with no problem. Thank you very much for the effort that went into fixing this bug.

     
    • status: open-fixed --> closed-fixed