#2411 snmpd crashes/hangs when AgentX subagent times-out

Milestone: agentx
Status: closed
Owner: nobody
Labels: agent (1105)
Priority: 7
Updated: 2019-06-06
Created: 2012-09-05
Creator: Ken Farnen
Private: No

snmpd is acting as master agent, with an AgentX subagent registered to handle a MIB and processing GETNEXT requests. When the subagent is under heavy load (and so responds slowly), requests start to pile up in the queue, replies from the subagent arrive too late (per the log messages), and eventually the subagent is timed out. When the timeout occurs there is a high probability of either a crash (segfault) or a hang (100% CPU utilisation, tight loop in the snmpd code), depending on the version of snmpd under test. This also happens when the subagent dies unexpectedly with outstanding transactions unserviced.
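To make the pile-up concrete, here is a toy timing model in Python. It is not net-snmp code: it assumes requests arrive at a fixed interval and that a single subagent services them one at a time with a fixed service time. The function name and all parameters are illustrative assumptions. It only shows how queueing delay grows until replies miss the master agent's timeout once the subagent is slower than the arrival rate.

```python
def late_replies(n_requests, arrival_interval, service_time, timeout):
    """Return the indices of requests whose reply arrives after `timeout`.

    Toy model: request k arrives at k * arrival_interval seconds; the
    subagent handles one request at a time, each taking `service_time`.
    """
    late = []
    subagent_free_at = 0.0
    for k in range(n_requests):
        arrived = k * arrival_interval
        # A request cannot start until the subagent finishes the previous one.
        started = max(arrived, subagent_free_at)
        finished = started + service_time
        subagent_free_at = finished
        if finished - arrived > timeout:
            late.append(k)
    return late


# Responsive subagent: no reply misses the (assumed) 1 second timeout.
print(late_replies(10, arrival_interval=0.5, service_time=0.1, timeout=1.0))
# Slow subagent: the backlog grows, so later replies arrive too late.
print(late_replies(10, arrival_interval=0.5, service_time=0.75, timeout=1.0))
```

In the second call each request waits behind a growing backlog, which corresponds to the "replies arrive too late" phase described above; what snmpd then does when it times the subagent out is not modelled here.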

Tested with net-snmp-5.7.1 (segfaults), net-snmp-5.7.1 plus the "subagent-free-cache" patch (basically patch 1633670) (infinite loop), and current trunk (infinite loop).

Our systems are Linux 2.6 based, Montavista CGL V4 and V5 on x86 and x86-64 platforms. glibc 2.3.3.

Attached is a stripped-down test subagent that exercises the bug (by forcing a long delay between servicing the AgentX requests), together with a script that throws traffic at snmpd and will make it crash quite quickly. These assume the default snmpd/agentx config, with a 1-second timeout, though our testing indicates it will crash eventually with longer timeouts, especially in the situation where a subagent crashes.

Transactions are based on those we've seen in the field, and are GETNEXT requests for multiple OIDs, all from the MIB provided by the subagent, but with some OIDs numbered such that the response is in the adjacent MIB (i.e. the GETNEXT is walking off the end of the subagent MIB). This kind of transaction appears to exercise the bug very effectively.
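As an illustration of "walking off the end", the sketch below models GETNEXT as a lexicographic successor lookup over a sorted list of OIDs. The OIDs (including the enterprise number 99999), the two subtree prefixes, and the helper names are all made up for this example; the real registrations in the attached test subagent will differ.

```python
from bisect import bisect_right

# Hypothetical OIDs, stored as integer tuples so they compare lexicographically.
SUBAGENT_PREFIX = (1, 3, 6, 1, 4, 1, 99999, 1)  # made-up subagent subtree
ADJACENT_PREFIX = (1, 3, 6, 1, 4, 1, 99999, 2)  # made-up neighbouring subtree

INSTANCES = sorted([
    SUBAGENT_PREFIX + (1, 0),
    SUBAGENT_PREFIX + (2, 0),
    SUBAGENT_PREFIX + (3, 0),
    ADJACENT_PREFIX + (1, 0),
])


def getnext(oid):
    """Return the first instance strictly after `oid`, or None at end of MIB."""
    i = bisect_right(INSTANCES, oid)
    return INSTANCES[i] if i < len(INSTANCES) else None


def owner(oid):
    """Name the subtree that holds `oid` in this two-subtree toy model."""
    if oid[:len(SUBAGENT_PREFIX)] == SUBAGENT_PREFIX:
        return "subagent"
    return "adjacent"


# One multi-varbind GETNEXT: the first two OIDs are answered from inside
# the subagent's MIB, the last one walks off its end into the adjacent subtree.
request = [SUBAGENT_PREFIX, SUBAGENT_PREFIX + (1, 0), SUBAGENT_PREFIX + (3, 0)]
for oid in request:
    nxt = getnext(oid)
    print(oid, "->", nxt, "answered from", owner(nxt))
```

A request like this mixes varbinds that the subagent can answer with one whose answer lies outside its subtree, which is the shape of traffic described above.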

More details on the debugging we've done so far are on the net-snmp-coders list.

Also attached is a core dump from 5.7.1 segfault, and a log extract from 5.7.1 looping.

Discussion

  • Ken Farnen

    Ken Farnen - 2012-09-05

    Example subagent code

     
  • Ken Farnen

    Ken Farnen - 2012-09-05

    Script to send queries to demonstrate crash

     
  • Ken Farnen

    Ken Farnen - 2012-09-05

    Log output from 5.7.1 unpatched

     
  • Ken Farnen

    Ken Farnen - 2012-09-05

    Core file from 5.7.1 unpatched

     
  • Ken Farnen

    Ken Farnen - 2012-09-05

    Log file from 5.7.1 (patched) - 25766 repeated lines removed from end...

     
  • Ken Farnen

    Ken Farnen - 2012-09-05

    Log file from 5.8-dev hanging - many duplicate lines from end removed....

     
  • Jiri Cervenka

    Jiri Cervenka - 2012-10-26

    I have submitted patch 3580458 which fixes looping (and another crash situation) for me. It would be great if you could let me know whether the patch passes your test cases. Thanks.

     
  • Martin East

    Martin East - 2014-07-22

    Thanks for the useful posting, and test harness.

    Here is some info on a fix for Red Hat Enterprise Linux v5.x, which uses net-snmp version 5.3.2.2.

    The bug is captured in Red Hat Bugzilla 1038007 (https://bugzilla.redhat.com/show_bug.cgi?id=1038007). There is also a CVE reference for this: CVE-2012-6151.

    Using the harness, I ran a stress test on net-snmp-*-5.3.2.2-20.el5 under RHEL5.9 (2.6.18-348.3.1.el5), and found it reproduced reliably after ~1 hour of running snmp-crashme.sh and looped agentofdeath subagent runs. The core backtrace was:

    Core was generated by `/usr/sbin/snmpd -LS0-7d -Lf /var/log/snmpd.log -p /var/run/snmpd.pid'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x00002b7b03010ab9 in netsnmp_add_varbind_to_cache () from /usr/lib64/libnetsnmpagent.so.10
    (gdb) bt
    #0  0x00002b7b03010ab9 in netsnmp_add_varbind_to_cache () from /usr/lib64/libnetsnmpagent.so.10
    #1  0x00002b7b0301105c in netsnmp_reassign_requests () from /usr/lib64/libnetsnmpagent.so.10
    #2  0x00002b7b030110e8 in handle_getnext_loop () from /usr/lib64/libnetsnmpagent.so.10
    #3  0x00002b7b03012bb9 in check_delayed_request () from /usr/lib64/libnetsnmpagent.so.10
    #4  0x00002b7b03012c76 in netsnmp_check_outstanding_agent_requests () from /usr/lib64/libnetsnmpagent.so.10
    #5  0x00002b7b03012f27 in netsnmp_remove_delegated_requests_for_session () from /usr/lib64/libnetsnmpagent.so.10
    #6  0x00002b7b030388ce in close_agentx_session () from /usr/lib64/libnetsnmpagent.so.10
    #7  0x00002b7b0301f064 in agentx_got_response () from /usr/lib64/libnetsnmpagent.so.10
    #8  0x00002b7b036acf28 in snmp_sess_timeout () from /usr/lib64/libnetsnmp.so.10
    #9  0x00002b7b036ad088 in snmp_timeout () from /usr/lib64/libnetsnmp.so.10
    #10 0x00002b7b027e50d5 in main ()
    

    Then I upgraded to net-snmp-*-5.3.2.2-22.el5, and ran the same test for 24 hours. No crashes occurred.

    So the net-snmp-*-5.3.2.2-22.el5 set of packages seems to fix this bug, although I have not analysed the source.

    Red Hat RHSA reference: https://rhn.redhat.com/errata/RHSA-2014-0322.html

     
  • Junling Zheng

    Junling Zheng - 2015-10-30

    Does 5.7.3 have this vulnerability, CVE-2012-6151?

     
  • Alagu

    Alagu - 2016-04-11

    Hi Experts,

    This is just a query. I have also opened a new ticket, but since I need a response as soon as possible I am also posting it here on the "CVE-2012-6151" thread.

    https://sourceforge.net/p/net-snmp/patches/1323/

    We are currently using CentOS 4.8 with the net-snmp-5.4.1 RPMs installed. We recently noticed the issue described in “CVE-2012-6151” in one of our customers' environments. Hence we need a 5.4 version of net-snmp with the patch for “CVE-2012-6151 - net-snmp: snmpd crashes/hangs when AgentX subagent times-out” (net-snmp-5.5-agentx-disconnect-crash.patch). We tried downloading the source RPMs and tar.gz files from the links below, but we did not find this particular patch integrated in them.

    Will this patch be integrated into the net-snmp-5.4 series? Please shed some light on this (any useful link to download the patch for this version of net-snmp would help).

    https://sourceforge.net/projects/net-snmp/files/net-snmp/5.4.4/
    https://sourceforge.net/projects/net-snmp/files/net-snmp/5.4.5-pre-releases/

    Hence we tried installing net-snmp-5.5, which has the patch, but we were not successful since it is not compatible with CentOS 4.8. We also tried installing the net-snmp-5.3 RPMs that have the fix (net-snmp-5.3.2.2-22.el5_10). Though that succeeded, we have a slight concern about moving to a lower version of net-snmp.
    So we would like to clarify a couple of questions:
    1. Was the patch for “CVE-2012-6151” integrated into any net-snmp-5.4 version? If yes, we kindly request that you share the link to the source RPM.
    2. Are there any major functional differences between net-snmp-5.4.1 and net-snmp-5.3.2.2-22 that may affect our move to the lower version of net-snmp?

    Thanks in advance for your replies.

    If this is not the right place to ask, kindly redirect us to the appropriate forum. Any help on the above two queries will really help us.

     
  • Anders Wallin

    Anders Wallin - 2018-09-03

    Patch integrated in 5.8, 793d596838ff7cb48a73b675d62897c56c9e62df

    Bug should be closed!

     
  • Bart Van Assche

    Bart Van Assche - 2018-09-03
    • status: open --> closed
     
  • Arjun

    Arjun - 2018-09-26

    Hi Team,
    Could you please help me reproduce this issue?
    I have tried the snmp-crashme.sh script but was not able to reproduce it.
    I have the backtraces below.


    Core was generated by `/usr/sbin/snmpd -f -Lsd -M+/sw/unicorn/snmp/mibs -Dtrap -Dusm -Dinit_mibs -I-sy'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x00007f984a328217 in netsnmp_oid_find_prefix () from /usr/lib64/libnetsnmp.so.30
    #0  0x00007f984a328217 in netsnmp_oid_find_prefix () from /usr/lib64/libnetsnmp.so.30
    No symbol table info available.
    #1  0x00007f984aa783ae in netsnmp_add_varbind_to_cache () from /usr/lib64/libnetsnmpagent.so.30
    No symbol table info available.
    #2  0x00007f984aa78a3c in netsnmp_reassign_requests () from /usr/lib64/libnetsnmpagent.so.30
    No symbol table info available.
    #3  0x00007f984aa78ac8 in handle_getnext_loop () from /usr/lib64/libnetsnmpagent.so.30
    No symbol table info available.
    #4  0x00007f984aa7bce2 in check_delayed_request () from /usr/lib64/libnetsnmpagent.so.30
    No symbol table info available.
    #5  0x00007f984aa7bed2 in netsnmp_check_outstanding_agent_requests () from /usr/lib64/libnetsnmpagent.so.30
    No symbol table info available.
    #6  0x0000000000404805 in main ()
    No symbol table info available.

     

    Last edit: Arjun 2018-09-26
  • Arjun

    Arjun - 2018-09-26

    Core was generated by `/usr/sbin/snmpd -f -Lsd -M+/sw/unicorn/snmp/mibs -Dtrap -Dusm -Dinit_mibs -I-sy'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x00007f4e7398d399 in netsnmp_add_varbind_to_cache () from /usr/lib64/libnetsnmpagent.so.30
    #0  0x00007f4e7398d399 in netsnmp_add_varbind_to_cache () from /usr/lib64/libnetsnmpagent.so.30
    No symbol table info available.
    #1  0x00007f4e7398da3c in netsnmp_reassign_requests () from /usr/lib64/libnetsnmpagent.so.30
    No symbol table info available.
    #2  0x00007f4e7398dac8 in handle_getnext_loop () from /usr/lib64/libnetsnmpagent.so.30
    No symbol table info available.
    #3  0x00007f4e73990ce2 in check_delayed_request () from /usr/lib64/libnetsnmpagent.so.30
    No symbol table info available.
    #4  0x00007f4e73990ed2 in netsnmp_check_outstanding_agent_requests () from /usr/lib64/libnetsnmpagent.so.30
    No symbol table info available.
    #5  0x0000000000404805 in main ()
    No symbol table info available.

     
  • Kiran Kumar Pamula

    Dear Experts,

    Our customer reported this issue:

    https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvi02645/?rfs=iqvred

    The symptoms and backtrace look very similar to the issue discussed in this thread. We are using net-snmp 5.7.3, which we think is vulnerable to this problem, and we believe the customer is hitting the same issue.

    As upgrading to 5.8 takes a considerable amount of time and bandwidth, we plan to use the patch specified in this thread and provide the fix to the customer, who is significantly affected by this issue.

    However, we are finding it challenging to validate the fix (unit test), as we are not able to recreate the problem locally. This is where we need your help.

    We ran the script below for 3 days to replicate the problem on the affected version of the code, but we did not hit the issue.

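    #!/usr/bin/perl
    use strict;
    use warnings;

    # Walk the whole tree repeatedly, appending the output to testiso.txt.
    my $i = 0;
    while ($i <= 100000000000) {
        system("snmpwalk -v2c -c r0c4s localhost iso >> testiso.txt");
        print "The value of i is $i\n";
        $i++;
        sleep(3);
    }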

    Please let us know if you can suggest a way to replicate the issue in-house, so that we can provide the fix to the customer with confidence in the upcoming release.

    Any pointers on recreating the issue would be of great help to us.

    Thanks,
    Kiran

     
  • Anders Wallin

    Anders Wallin - 2019-06-06

    Please try with the latest release, or port the patches related to this
    problem to your version.

    Regards
    Anders

     

    Last edit: Bart Van Assche 2019-06-06
