From: SourceForge.net <no...@so...> - 2012-09-10 15:05:25
Bugs item #3565004, was opened at 2012-09-05 06:47
Message generated for change (Settings changed) made by kenfagilent
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112694&aid=3565004&group_id=12694

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: agent
Group: agentx
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: Ken Farnen (kenfagilent)
Assigned to: Nobody/Anonymous (nobody)
Summary: snmpd crashes/hangs when AgentX subagent times-out

Initial Comment:
Setup: snmpd acting as master agent, with an AgentX subagent that registers to handle a MIB and processes GETNEXT requests.

When the subagent is under heavy load (and so responds slowly), requests start to pile up in the queue, replies from the subagent arrive too late (per log messages), and eventually the subagent is timed out. When the timeout occurs there is a high probability of either a crash (segfault) or a hang (100% CPU utilisation, tight loop in the snmpd code), depending on the version of snmpd under test. This also happens when the subagent dies unexpectedly with outstanding transactions unserviced.

Tested with:
- net-snmp-5.7.1 (segfaults)
- net-snmp-5.7.1 plus "subagent-free-cache" patch (basically patch 1633670) (infinite loop)
- current trunk (infinite loop)

Our systems are Linux 2.6 based, Montavista CGL V4 and V5 on x86 and x86-64 platforms, with glibc 2.3.3.

Attached is a stripped-down test subagent that exercises the bug (by forcing a long delay between servicing the AgentX requests), together with a script that throws traffic at snmpd and makes it crash quite quickly. These assume the default snmpd/AgentX config, with a 1 second timeout, though our testing indicates it will crash eventually with longer timeouts, especially in the situation where a subagent crashes.
Transactions are based on those we've seen in the field. They are GETNEXT requests for multiple OIDs, all from the MIB provided by the subagent, but with some OIDs numbered such that the response is in the adjacent MIB (i.e. the GETNEXT is walking off the end of the subagent MIB). This kind of transaction appears to exercise the bug very effectively.

Some more details on the debugging we've done so far are in the net-snmp-coders list.

Also attached is a core dump from the 5.7.1 segfault, and a log extract from 5.7.1 looping.

----------------------------------------------------------------------
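
For readers unfamiliar with the "walking off the end" case, here is a minimal sketch of the GETNEXT ordering involved. It is only an illustration under stated assumptions: the OIDs and the helper names (`get_next`, `classify`) are hypothetical and do not come from the attached test subagent or script, and the code models lexicographic OID ordering only, not AgentX or any net-snmp internals.

```python
from bisect import bisect_right

# Hypothetical subtree registered by the subagent (illustrative only).
SUBAGENT_SUBTREE = (1, 3, 6, 1, 4, 1, 99999, 1)

# Hypothetical instances known to the master agent, in lexicographic order.
OIDS = sorted([
    (1, 3, 6, 1, 4, 1, 99999, 1, 1, 0),  # served by the subagent
    (1, 3, 6, 1, 4, 1, 99999, 1, 2, 0),  # served by the subagent (last one)
    (1, 3, 6, 1, 4, 1, 99999, 2, 1, 0),  # belongs to the adjacent MIB
])


def get_next(oid):
    """Return the first known OID strictly greater than `oid`, or None."""
    # Python compares tuples lexicographically, which matches OID ordering.
    i = bisect_right(OIDS, oid)
    return OIDS[i] if i < len(OIDS) else None


def in_subtree(oid, root):
    """True if `oid` lies under the subtree rooted at `root`."""
    return oid[:len(root)] == root


def classify(requested):
    """Say where the GETNEXT answer for `requested` would come from."""
    nxt = get_next(requested)
    if nxt is None:
        return "endOfMibView"
    return "subagent" if in_subtree(nxt, SUBAGENT_SUBTREE) else "adjacent"


if __name__ == "__main__":
    for oid in [
        (1, 3, 6, 1, 4, 1, 99999, 1, 1, 0),
        (1, 3, 6, 1, 4, 1, 99999, 1, 2, 0),
    ]:
        print(".".join(map(str, oid)), "->", classify(oid))
```

The second request names an OID inside the subagent's MIB, but its successor lies in the adjacent MIB. That is the kind of varbind the report describes mixing into multi-OID GETNEXT requests.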