#1237 snmpd crashes/hangs when subagent disconnects during GetNext


This is my attempt to extend patch 1633670. This patch is created
against 5.7.2 plus patch 1633670.

Problems addressed:

1) Inifinite loop in delegated requests removal

In my opionion, the for-loop iteration on asp->requests in
netsnmp_remove_delegated_requests_for_session() is incorrect, as it
may end up flagging only a subset of asp's requests (depending on how
many different subtrees (and hence treecaches) this asp's requests
OIDs fall in).

The loop situation occurs when the asp->requests[0] request has
already been answered (and is not part of any treecache), whereas some
other requests on this asp still have the delegated flag on
(reproducible by GetNext requests which contain both a variable
pointing to subtree belonging to the closing session, and another one
pointing to a range preceding that very subtree (but not fulfilled by
any preceding handler).

I think this for-loop should walk all asp->vbcount items on the
asp->requests array instead.

2) Crashes on dangling subtree pointers after AgentX Close PDU

The same crash patterns as those described in patch 1633670 appear
when a session gets closed via an explicit Close PDU from a subagent.
So, I think we need to go through the remove delegated requests loop
also when a valid sessid has been passed to close_agentx_session().

Adding this change caused double frees on delegated caches in
agentx_got_response() ('response is too late') -- apparently caused by
responses to retries, and also by callback from snmp_sess_close(). I
think this can be fixed by returning 1 from agentx_got_response() on
this occasion, thus letting _sess_process_packet() delete the request
from the session pending requests list.

Tested with 5.7.2 w/ patch 1633670, on x86_64 Linux, RHEL 5.5.

I tested the patch also with the subagent and traffic generating
script attached to bug 3565004 -- and it worked for me.


  • Ken Farnen

    Ken Farnen - 2012-10-29

    Thanks for the patch!

    Having problems applying it to 5.7.2 though, as the patch fails when applied to the released 5.7.2 tarball after applying the final (v3) 1633670 patch.

    Failed chunk is:

    more master.c.rej
    *** 219,228 ****
    if (!cache) {
    DEBUGMSGTL(("agentx/master", "response too late on session %8p\n",
    - /* response is too late, free the cache */
    if (magic)
    netsnmp_free_delegated_cache((netsnmp_delegated_cache*) magic);
    - return 0;
    requests = cache->requests;

    --- 219,231 ----
    if (!cache) {
    DEBUGMSGTL(("agentx/master", "response too late on session %8p\n",
    + /*
    + * Response is too late, free the cache and return 1
    + * so that the session pending request list item can be deleted
    + */
    if (magic)
    netsnmp_free_delegated_cache((netsnmp_delegated_cache*) magic);
    + return 1;
    requests = cache->requests;

    Fails because:

          if \(magic\)
              netsnmp\_free\_delegated\_cache\(\(netsnmp\_delegated\_cache\*\) magic\);

    isn't present, am I missing a patch to be applied to 5.7.2? Or am I working with the wrong version of the original patch?

    Meanwhile, I've manually applied the change and I'm testing with that - has not crashed yet...

  • Jiri Cervenka

    Jiri Cervenka - 2012-10-29

    Tha patch is meant to be applied against the final version of the patch 1633670 committed on trunk,
    i.e. the 43cd9db4 commit 'CHANGES: PATCH 1633670: fixed snmpd crashing when an AgentX subagent disconnect in the middle of processing of a request.'.

    The final version contains two additional delegated cache memory leaks fixes.

    I have added the final version to this tracker artifact as 'patch-1633670-commit.patch'.

    Sorry for confusion.

  • Ken Farnen

    Ken Farnen - 2012-10-29

    Thanks for that, I'll re-test with the updated 1633670 and 5.7.2 stock - testing with my hacked together version stayed up for about 2 hours, but then started hanging up for long periods and chewing up memory.

    Much better than anything that we achieved, but not quite there yet - I'll tell you how testing went after it has had an overnight soak.

  • Ken Farnen

    Ken Farnen - 2012-11-01

    OK, with the right version of the 1633670 patch, that works a lot better!

    Passes my tests (flat out traffic generation into both our production application and the test subagent at the same time) and also a looped SNMP-SET test (which was the original problem that got us started down the 1633670 patch road).

    With the inevitable large queues of pending transactions generated by these tests, the SET test does cause the snmpd to pause a bit, but it never falls over, and always comes back and starts responding, with SETs still operational. It also seems to pass valgrind pretty much.

    We're going to deploy the patched 5.7.2 into automated test, and create a formal patch of our system for further testing next week.

    Overall though, this looks like a winner.

    Thanks very much, you've succeeded where we had failed miserably!


  • Jiri Cervenka

    Jiri Cervenka - 2012-11-06

    draft of an alternative patch, against plain 5.7.2

  • Jiri Cervenka

    Jiri Cervenka - 2012-11-06

    While running some additional tests I have hit another inifinite loop
    around delegated requests removal (this one triggered by requests
    containing OIDs delegated to multiple different subagents).

    This got me thinking whether the iterate-next-loop approach taken by
    patch 1633670 is really necessary (it seems to me that -- apart from
    the infinite loop risk -- there is also the effect of pausing new
    request processing by favouring next loop iteration for a subset of

    I am currently testing an alternative approach (uploaded as
    alt-cancel-next-walk.patch). The basic idea is to a) mark the whole
    asp explicitly as canceled (we cannot rely on the 'delegated' flag
    alone), and b) break out of the next walk when cancel is in progress
    (the asp is destined to fail with genError anyway).

    While this patch has so far passed my crash/leak tests, I consider it
    to be of preliminary status (I am aware of several open questions,
    e.g. I don't like the idea of adding a new variable to
    netsnmp_agent_session structure).

    The patch is created against plain 5.7.2.

    Any feedback gladly welcome.

  • Jiri Cervenka

    Jiri Cervenka - 2012-11-07

    Uploaded alt-cancel-next-walk-v2.patch.

    We don't want to start handling queued requests (from
    netsnmp_agent_queued_list) when on the close session path. Decoupled
    the delegated list check into a new function.

  • Jan Safranek

    Jan Safranek - 2013-04-23

    The patch looks fine to me, or at least to my valgrind :), but adding new item to struct netsnmp_agent_session_s is IMHO ABI breaker. It can be submitted to git master, but not to stable branches.

    There must be either separate list of cancelled sessions or other existing struct netsnmp_agent_session_s item must be misused to mark the cancelled ones.

  • Jan Safranek

    Jan Safranek - 2013-06-25

    Jiri, you mention some 'additional tests' and 'my crash/leak tests', could you please elaborate on what these tests do and/or even share the tests here?

  • Jiri Cervenka

    Jiri Cervenka - 2013-06-27

    Attaching tools-p1237.tar.gz -- tools I have used to reproduce/test the problem.

    $ cat README
    Subagent derived from 'agentofdeath.tar' attached to bug#2411.
    ( agentx socket: "/tmp/snmp-test/var/agentx/master";
    ./runsub.sh pause-len sub-oid-1 sub-oid-2 )

    Subagent with SetReq support derived from OpenHPI subagent tutorial
    @ http://openhpi.sourceforge.net/subagent-manual/x649.html .
    ( agentx socket: "/tmp/snmp-test/var/agentx/master";
    ./runsub.sh pause-len)

    GetNext, GetBulk, Get traffic mix generators, see e.g. crash1-n-v2c-bulkmix.sh.

    SetReq traffic mix example.

    Some GDB macros printing relevant lists and structures (e.g. "padl"
    (agent_delegated_list), "ptp" (netsnmp_subtree), "preq" (netsnmp_request_info),
    "pasp" (netsnmp_agent_session), "ps" (netsnmp_session), "ppdu" (netsnmp_pdu) etc.)

    An easy way to reproduce the original problem is to start an instance of the 'pausing'
    suba agent (./runsub.sh 2 4 1) and generate some GetNext traffic (e.g. crash1-n-v2c-bulkmix.sh).
    Adding more subagents responsible for adjacent subtrees increase the likelihood
    of hitting the problem (e.g. 2 suba's with the same generator reproduce the snmpd loop
    in patch 1633670).

    My basic setup for soak tests was to run snmpd with 2-3 instances of suba's attached
    to adjacent subtrees, plus an instance of subwgt; and generate walk and set traffic
    (with some additional script repeatedly shutting down subagents (to test agentx
    initiated Close)). I repeated the tests with snmpd being run under valgrind
    (valgrind --tool=memcheck --leak-check=full).

    (alt-cancel-next-walk-v2.patch against 5.7.2, RHEL 5.5, x86_64).

  • Jan Safranek

    Jan Safranek - 2013-07-02

    Thanks a lot for the scripts. I pushed the patch into git master, commit 793d596838ff7cb48a73b675d62897c56c9e62df, as it breaks ABI.

  • Bill Fenner

    Bill Fenner - 2013-07-02

    Did the patch to master.c get missed in 793d596838ff7cb48a73b675d62897c56c9e62df?

  • Jan Safranek

    Jan Safranek - 2013-07-03

    Master.c changes are already there, IMO.

  • Junling Zheng

    Junling Zheng - 2015-10-30

    Hi, Jan:
    I can't find this fixing code in 5.7.3.
    Does 5.7.3 release version have this vulnerability?

    Last edit: Junling Zheng 2015-10-30
  • Jan Safranek

    Jan Safranek - 2015-10-30

    The patch was applied to master only, as it breaks API a bit.

  • Joan Landry

    Joan Landry - 2017-02-08

    I am trying to run agentx in 5.7.3 and having no luck - first I get the crash - after I apply the patch I get a result of generic error to any subagent get request.

    Any ideas what I need to do to get agentx to work?

  • Joan Landry

    Joan Landry - 2017-02-08

    also in my setup the subagent does not fail to respond - as I have verified that the handler is being called and the response sent.

    Last edit: Joan Landry 2017-02-08

Log in to post a comment.