#2365 subagent crashes in handle_subagent_set_response

agentx
open
nobody
agent (1103)
3
2012-12-12
2012-05-17
No

Hi

There is a single subagent (single-threaded).
When the master agent times out repeatedly, the subagent crashes in handle_subagent_set_response because 'retsess' is corrupted.

Backtrace:
#0 handle_subagent_set_response (op=1, session=0x0, reqid=2146889144, pdu=0x30b655e0, magic=0x30ab4d70)
at net-snmp/src/net-snmp-5.1.4/agent/mibgroup/agentx/subagent.c:567
#1 0x30751c6c in snmpv3_make_report () from /usr/lib/libnetsnmp.so.8
#2 0x307533cc in _sess_read () from /usr/lib/libnetsnmp.so.8
#3 0x30753de0 in snmp_sess_read () from /usr/lib/libnetsnmp.so.8
#4 0x30753e68 in snmp_read () from /usr/lib/libnetsnmp.so.8
#5 0x3058233c in agent_check_and_process () from /usr/lib/libnetsnmpagent.so.8
#6 0x0044b480 in main ()

Registers:
sr       lo       hi       bad      cause    pc
00005c13 00000001 00000004 1026a0bd 10800008 30614ac4

According to the cause register, it is an unaligned access. Sometimes retsess is also seen to be NULL.
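The register dump appears to come from a MIPS target (it lists sr, lo, hi, the bad virtual address and cause). Assuming the standard MIPS32 Cause register layout, the exception type is the 5-bit ExcCode field in bits 6..2: code 4 (AdEL) is an address error on load, which includes misaligned loads, while code 2 (TLBL) is a TLB exception on load. This small sketch extracts that field and checks the faulting address for word alignment, so the dump can be read directly rather than inferred.

```c
#include <stdint.h>

/*
 * Assumes the standard MIPS32 Cause register layout:
 * ExcCode is the 5-bit field in bits 6..2.
 */
unsigned mips_exc_code(uint32_t cause)
{
    return (unsigned)((cause >> 2) & 0x1fu);
}

/* Short names for the exception codes relevant to a bad data access. */
const char *mips_exc_name(unsigned code)
{
    switch (code) {
    case 2:  return "TLBL";  /* TLB exception on load or instruction fetch */
    case 3:  return "TLBS";  /* TLB exception on store */
    case 4:  return "AdEL";  /* address error on load, e.g. misaligned */
    case 5:  return "AdES";  /* address error on store */
    default: return "other";
    }
}

/* Nonzero if addr is aligned for a 32-bit word access. */
int is_word_aligned(uint32_t addr)
{
    return (addr & 0x3u) == 0;
}
```

Under that layout, the posted cause value 0x10800008 decodes to ExcCode 2, and the bad address 0x1026a0bd is not word-aligned; both facts are worth keeping in mind when interpreting the fault.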

Kindly help me fix this corruption.
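One common way a callback's context pointer ends up corrupted or NULL is that the pending request holds a raw pointer to per-session state, and that state is freed when the session closes (for example, after the master gives up on the subagent) before the response is processed. This report does not establish that this is what happens to retsess. The sketch below only illustrates the defensive pattern of passing an identifier and re-resolving it in the callback; every name in it (session_open, session_close, session_lookup, handle_set_response) is hypothetical and not a Net-SNMP API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-session state standing in for what retsess points to. */
struct session_state {
    int id;
    unsigned long responses_sent;
};

#define MAX_SESSIONS 16

/* Hypothetical registry of live sessions, indexed by session id. */
static struct session_state *live_sessions[MAX_SESSIONS];

/* Open a session; returns 0 on success, -1 if the id is invalid or in use. */
int session_open(int id)
{
    struct session_state *s;

    if (id < 0 || id >= MAX_SESSIONS || live_sessions[id] != NULL)
        return -1;
    s = calloc(1, sizeof(*s));
    if (s == NULL)
        return -1;
    s->id = id;
    live_sessions[id] = s;
    return 0;
}

/* Close a session, e.g. because the master timed out and disconnected. */
void session_close(int id)
{
    assert(id >= 0 && id < MAX_SESSIONS);
    free(live_sessions[id]);
    live_sessions[id] = NULL;   /* later lookups now fail cleanly */
}

/* Resolve an id to live state; NULL if the session no longer exists. */
struct session_state *session_lookup(int id)
{
    if (id < 0 || id >= MAX_SESSIONS)
        return NULL;
    return live_sessions[id];
}

/*
 * Response callback. The pending request carries only the session id, so a
 * response that arrives after the session was closed is dropped instead of
 * dereferencing freed memory. Returns 1 if handled, 0 if dropped.
 */
int handle_set_response(int session_id)
{
    struct session_state *s = session_lookup(session_id);

    if (s == NULL)
        return 0;
    s->responses_sent++;
    return 1;
}
```

The alternative design is to cancel or detach every pending request when a session closes; the id lookup shown here is simply the easier one to retrofit when the close path and the callback live in different modules.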

Discussion

  • (gdb) bt
    #0 handle_subagent_set_response (op=1, session=0x251, reqid=1, pdu=0x310013b0, magic=0x31001d78)
    at net-snmp-5.1.4/agent/mibgroup/agentx/subagent.c:569
    #1 0x30751c6c in _sess_process_packet (sessp=0x31000860, sp=0x31002fa8, isp=0x310011d0, transport=0x0, opaque=0x0, olength=822087912,
    packetptr=0x310106f0 "\001\t", length=1) at net-snmp-5.1.4/snmplib/snmp_api.c:5061
    #2 0x307533cc in _sess_read (sessp=0x31000860, fdset=0x310106f0)
    at net-snmp-5.1.4/snmplib/snmp_api.c:5553
    #3 0x30753de0 in snmp_sess_read (sessp=0x31000860, fdset=0x251)
    at net-snmp-5.1.4/snmplib/snmp_api.c:5572
    #4 0x30753e68 in snmp_read (fdset=0x7f9c6bc0) at net-snmp-5.1.4/snmplib/snmp_api.c:5200
    #5 0x3058233c in agent_check_and_process () from /usr/lib/libnetsnmpagent.so.8
    #6 0x0044b480 in main ()
    (gdb) fr
    #2 0x307533cc in _sess_read (sessp=0x31000860, fdset=0x310106f0)
    at net-snmp-5.1.4/snmplib/snmp_api.c:5553
    5553 in net-snmp-5.1.4/snmplib/snmp_api.c
    (gdb) info locals
    nslp = (struct session_list *) 0x0
    new_transport = (netsnmp_transport *) 0x0
    data_sock = 2140957632
    sp = (netsnmp_session *) 0x31002fa8
    isp = (struct snmp_internal_session *) 0x310011d0
    transport = (netsnmp_transport *) 0x31000578
    pdulen = 2140957632
    rxbuf_len = 65536
    rxbuf = (u_char *) 0x310106f0 "\001\t"
    length = 822216432
    olength = 0
    rc = 0
    opaque = (void *) 0x0
    __FUNCTION__ = "_sess_read"
    (gdb) p transport->sock
    $2 = 16
    (gdb) fr 1
    #1 0x30751c6c in _sess_process_packet (sessp=0x31000860, sp=0x31002fa8, isp=0x310011d0, transport=0x0, opaque=0x0, olength=822087912,
    packetptr=0x310106f0 "\001\t", length=1) at net-snmp-5.1.4/snmplib/snmp_api.c:5061
    5061 in net-snmp-5.1.4/snmplib/snmp_api.c
    (gdb) p transport->sock
    Cannot access memory at address 0x18
    (gdb) p transport->sock
    Quit
    (gdb) fr 1
    #1 0x30751c6c in _sess_process_packet (sessp=0x31000860, sp=0x31002fa8, isp=0x310011d0, transport=0x0, opaque=0x0, olength=822087912,
    packetptr=0x310106f0 "\001\t", length=1) at net-snmp-5.1.4/snmplib/snmp_api.c:5061
    5061 in net-snmp-5.1.4/snmplib/snmp_api.c
    (gdb) info locals
    callback = 0x3085d194 <sem_timedwait+2728>
    magic = (void *) 0x30643000
    pdu = (netsnmp_pdu *) 0x310010e8
    rp = (netsnmp_request_list *) 0x310007d8
    orp = (netsnmp_request_list *) 0x0
    sptr = (struct snmp_secmod_def *) 0x30643000
    ret = 0
    handled = 1
    __FUNCTION__ = "_sess_process_packet"
    (gdb) fr 3
    #3 0x30753de0 in snmp_sess_read (sessp=0x31000860, fdset=0x251)
    at net-snmp-5.1.4/snmplib/snmp_api.c:5572
    5572 in net-snmp-5.1.4/snmplib/snmp_api.c
    (gdb) infor locals
    Undefined command: "infor". Try "help".
    (gdb) info locals
    pss = (netsnmp_session *) 0x30643000
    rc = 0
    (gdb) fr 4
    #4 0x30753e68 in snmp_read (fdset=0x7f9c6bc0) at net-snmp-5.1.4/snmplib/snmp_api.c:5200
    5200 in net-snmp-5.1.4/snmplib/snmp_api.c
    (gdb) infor locals
    Undefined command: "infor". Try "help".
    (gdb) info locals
    slp = (struct session_list *) 0x31000860
    (gdb)

     
  • Thomas Anders
    2012-05-21

    The Net-SNMP version you're using is quite old. The 5.0.x and 5.1.x branches have already been declared end-of-life and won't receive any further development or bug fixes (except for major security issues).

    Please retest with current code (see http://www.net-snmp.org/dev/schedule.html and http://www.net-snmp.org/download.html) and report back.

     
  • Bill Fenner
    2012-05-21

    I have no useful debugging information, but we have occasionally seen this too, including under 5.7.1. Our internal bug title is nearly the same:

    "AgentX subagent dies with SIGSEGV in handle_subagent_set_response"

    and the incomplete analysis is exactly the same:

    -------------------- Comment 7 --------------------
    By: <<>>@aristanetworks.com 2011-12-19 17:14:34

    I traced the SEGV location to line 664 in net-snmp-5.7.1/agent/mibgroup/agentx/subagent.c. It looks
    like retsess is corrupted. I ran <<our test>> 1500 times with Snmp under valgrind to catch the
    corruption, but I didn't hit it.

     
  • Hi Fenner,
    I can provide any information required for debugging. This is an occasional issue, but I can reproduce it with a few steps.

    Set the timeout to 2 and retries to 1. Add a 10-second sleep to a particular callback in the subagent (for example, user creation), then create and delete users continuously. The crash should be reproducible this way.
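    These steps rely on the subagent's handler taking longer than the master is willing to wait. As a back-of-the-envelope check, assume (not verified against snmp_api.c) that the master waits a fixed timeout per attempt, with no backoff, over one initial try plus the configured number of retries. The helper names below are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helpers, not Net-SNMP code. Assumes the master waits a fixed
 * timeout per attempt, with no backoff, over 1 + retries attempts.
 */
int master_wait_seconds(int timeout_s, int retries)
{
    assert(timeout_s >= 0 && retries >= 0);
    return timeout_s * (retries + 1);
}

/* Nonzero if a handler taking handler_delay_s replies after the master gave up. */
int reply_outlives_master(int timeout_s, int retries, int handler_delay_s)
{
    return handler_delay_s > master_wait_seconds(timeout_s, retries);
}
```

    Under that assumption, timeout 2 and retries 1 give a 4-second window, so a 10-second sleep means the subagent's reply always arrives after the master has given up on the request.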

    Any help with debugging would be much appreciated.

    Thanks!

     
  • Nathan Kitchen
    2012-06-14

    sudheendrasp,

    I haven't been able to reproduce the bug using your suggested steps. I have an object in a test MIB whose handler sleeps for 10 seconds, and I set agentXTimeout to 2 and agentXRetries to 1 in snmpd.conf. When I send a request, all I see is that the AgentX master disconnects the subagent; I don't see any of the symptoms of retsess corruption previously posted here. I see the same for Get requests with the sleep in the Get handler or for Set requests with the sleep in the Set handler.

    Do you have any more details that could help us to reproduce this bug more consistently?

     
  • Cif
    2012-12-12

    I have the same problem with version 5.4.3.

    (gdb) back
    #0 handle_subagent_set_response (op=<value optimized out>, session=<value optimized out>,
    reqid=<value optimized out>, pdu=0x2c80d60, magic=0x2c956b0)
    at mibgroup/agentx/subagent.c:612
    #1 0x00007ff203326c52 in _sess_process_packet (sessp=<value optimized out>, sp=0x1184990,
    isp=0x1184920, transport=<value optimized out>, opaque=<value optimized out>,
    olength=<value optimized out>, packetptr=0x2c84180 "\230..\003\362\177", length=1)
    at snmp_api.c:5263
    #2 0x00007ff203327a05 in _sess_read (sessp=0x11848f0, fdset=<value optimized out>)
    at snmp_api.c:5764
    #3 0x00007ff2033282d9 in snmp_sess_read (sessp=0x7fff076e3c80, fdset=0x0) at snmp_api.c:5783
    #4 0x00007ff20332832b in snmp_read (fdset=0x7fff076e4b40) at snmp_api.c:5400
    #5 0x00007ff203b32c8f in agent_check_and_process (block=0) at snmp_agent.c:606
    #6 0x000000000040452a in main (argc=1, argv=0x7fff076e4d08) at netsnmptmp.4684.c:219

    I understand this is no longer a supported version, but if the problem also affects newer versions, I will port the patch back to this one.