#698 Deadlock in AgentX master+subagent under trap load

open
nobody
None
6
2016-06-19
2006-11-17
No

I'm using net-snmp 5.3.1 on Sun Solaris 10.

I have:
- one AgentX master agent
- one subagent connected to the AgentX master

I wrote a sample in which my subagent tries to send 10,000 traps in one burst. After sending 280 traps, the subagent and the AgentX master are deadlocked and no longer respond. If I kill the subagent, snmpd runs correctly again.

Here is the stack of the subagent during the deadlock:
fc8c068c send (8, 1725f8, a8, 0)
fe6370d8 _sess_async_send (780f8, 161e78, 0, 0, 0, 1725f8) + 820
fe6373ac snmp_sess_async_send (780f8, 161e78, 0, 0, 780f8, 161e78) + 4c
ff245420 send_trap_to_sess (161e78, 16e140, 9697a2d5, d, ff27f3b0, 78388) + 144
ff244f10 netsnmp_send_traps (16e140, c, 0, 167570, 16ff40, a) + 4fc
ff244f98 send_enterprise_trap_vars (ffffffff, ffffffff, ff2808d8, a, 166310, 0) + 1c

Here is the stack of the AgentX master during the deadlock:
fedc068c send (a, 294758, b0, 0)
ff1370d8 _sess_async_send (c00, ca078, 0, 0, 0, 294758) + 820
ff1373ac snmp_sess_async_send (133698, ca078, 0, 0, 133698, 10400) + 4c
ff24ff50 handle_master_agentx_packet (e5498, ca078, 10800, 185028, 144, ff26f3b0) + 4d8
ff137f60 _sess_process_packet (133698, 12f440, 93698, 13baa8, 1, 185028) + 820
ff138b38 _sess_read (133698, 20800, 2a0, 7fffffff, 1ea8c0, 133698) + a04
ff1393ec snmp_sess_read (133698, ff1a2bbc, 0, ff1a59e8, 43ca4, 6aae8) + 1c
ff138104 snmp_read (ffbff248, 133698, ff1a5698, ff1a2bbc, 120, 0) + 34
000146e0 receive (c, 25934, 1, ff270870, ff27519c, ff27510c) + 810
00013ae0 main (10000, 400, ffbff464, 26ca0, 25ee8, 0) + 1530
00011b28 _start (0, 0, 0, 0, 0, 0) + 108

Discussion

  • Thomas Anders

    Thomas Anders - 2006-11-17

    Logged In: YES
    user_id=848638
    Originator: NO

    Have you tried 5.4.rc2? I guess it'd suffer from the same problem, but it'd be good to double-check.

     
  • Jan Safranek

    Jan Safranek - 2007-12-03

    Logged In: YES
    user_id=1784238
    Originator: NO

    I have the same problem with net-snmp-5.4.1. When I call send_easy_trap() in a loop, snmpd gets stuck. Here is my theory of how it happens:

    subagent*:
    for (i=0; i<1000; i++)
        send_easy_trap(SNMP_TRAP_ENTERPRISESPECIFIC, i);

    send_easy_trap() writes directly to the socket but never reads the response; responses are handled in the main loop of the subagent.

    snmpd:
    (some inner loop)
    recv() from /var/snmp/master
    process the trap
    send() response to /var/snmp/master

    Because the subagent does not read the responses, the write buffer on the snmpd side fills up and snmpd's send() blocks. snmpd then stops reading from /var/snmp/master (because it is stuck in send()), so the subagent's send() fills its own buffer and blocks too -> deadlock.
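
The buffer-filling step in this theory can be reproduced in isolation. The following is a minimal sketch, not net-snmp code: it assumes a Unix-domain stream socket pair standing in for the subagent-to-master connection, and the helper names `make_pair` and `fill_until_full` are hypothetical. The sending end is made non-blocking so that the point where a blocking send() would hang shows up as EAGAIN instead.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a connected Unix stream socket pair and
 * make sv[0] non-blocking. Returns 0 on success, -1 on failure. */
static int make_pair(int sv[2])
{
    int flags;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: keep sending msg_len-byte messages on a
 * non-blocking fd while nobody reads the other end. Returns the number
 * of bytes the kernel accepted before the buffer was full, or -1 on an
 * unexpected error. */
static long fill_until_full(int fd, size_t msg_len)
{
    char msg[256];
    long total = 0;

    assert(msg_len > 0 && msg_len <= sizeof(msg));
    memset(msg, 'T', sizeof(msg));
    for (;;) {
        ssize_t n = send(fd, msg, msg_len, 0);
        if (n >= 0) {
            total += (long)n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;   /* a blocking send() would hang at this point */
        return -1;
    }
}
```

With a blocking socket, the call that returns EAGAIN here is the one that would never return while the peer is itself stuck in send(). How many bytes are accepted first depends on the platform's socket buffer sizes.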

    To fix this bug, snmpd must check whether the socket is writable and, if it is not, add the outgoing AgentX message to a queue. Or is there an easy fix on the subagent side (something like send_easy_trap_and_read_incoming_messages())?

    *) Sending traps in such a loop is of course artificial; it's just an example of how to reproduce the problem quickly. A real application does something more useful, but in some cases it ends up making a lot of send_easy_trap() calls without going through the main loop of the subagent.
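
The "check whether the socket is writable" idea can be sketched with plain POSIX calls. This is only an illustration under stated assumptions, not net-snmp code: `socket_is_writable` and `send_or_defer` are hypothetical names, and MSG_DONTWAIT is assumed to be available (it is common but not required by POSIX).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: zero-timeout poll for writability.
 * Returns 1 if fd can take more data now, 0 if not, -1 on error. */
static int socket_is_writable(int fd)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    do {
        rc = poll(&pfd, 1, 0);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;
    return (pfd.revents & POLLOUT) ? 1 : 0;
}

/* Hypothetical helper: send only if it cannot block.
 * Returns the number of bytes sent, 0 if the caller should queue the
 * message and retry later, or -1 on error. */
static ssize_t send_or_defer(int fd, const void *buf, size_t len)
{
    ssize_t n;
    int writable = socket_is_writable(fd);

    if (writable < 0)
        return -1;
    if (writable == 0)
        return 0;
    /* MSG_DONTWAIT guards against blocking when only part of len fits */
    do {
        n = send(fd, buf, len, MSG_DONTWAIT);
    } while (n < 0 && errno == EINTR);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}
```

A return value smaller than len means a partial send on a stream socket; the caller would have to keep the unsent remainder together with any deferred messages.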

     
  • Jan Safranek

    Jan Safranek - 2008-06-27

    Logged In: YES
    user_id=1784238
    Originator: NO

    Attaching an experimental patch. It implements asynchronous send in snmp sessions, so that send() should never block and the AgentX master does not end up in a deadlock. Outgoing messages are stored in a queue until the corresponding outgoing socket becomes writable. There are a few new config options to set the queue size and the send timeout (i.e. how long an outgoing message can wait in the queue).
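
The queueing behaviour described above can be illustrated roughly as follows. This is a sketch under assumptions and is not taken from the attached patch: the type and function names (`out_queue`, `outq_push`, `outq_flush`), the fixed capacity and the message size limit are all hypothetical, and MSG_DONTWAIT is assumed to be available.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

#define OUTQ_CAPACITY 8     /* hypothetical queue size limit */
#define OUTQ_MSG_MAX  512   /* hypothetical largest message */

struct out_msg {
    char   data[OUTQ_MSG_MAX];
    size_t len;
    size_t off;             /* bytes already sent (partial sends) */
    time_t deadline;        /* drop the message if unsent after this */
};

struct out_queue {          /* ring buffer; zero-initialise before use */
    struct out_msg slots[OUTQ_CAPACITY];
    size_t head;
    size_t count;
};

/* Queue a copy of buf. Returns 0, or -1 if the message does not fit
 * (queue full or message too large), in which case it is discarded. */
static int outq_push(struct out_queue *q, const void *buf, size_t len,
                     time_t now, time_t timeout)
{
    struct out_msg *m;

    assert(q != NULL && buf != NULL);
    if (q->count == OUTQ_CAPACITY || len == 0 || len > OUTQ_MSG_MAX)
        return -1;
    m = &q->slots[(q->head + q->count) % OUTQ_CAPACITY];
    memcpy(m->data, buf, len);
    m->len = len;
    m->off = 0;
    m->deadline = now + timeout;
    q->count++;
    return 0;
}

/* Call when fd is reported writable. Sends queued messages without
 * blocking. Returns the number of messages fully sent, -1 on error. */
static int outq_flush(struct out_queue *q, int fd, time_t now)
{
    int sent = 0;

    assert(q != NULL);
    while (q->count > 0) {
        struct out_msg *m = &q->slots[q->head];
        ssize_t n;

        /* Expire only untouched messages, so a half-sent message is
         * never cut off in the middle of the stream. */
        if (m->off == 0 && now > m->deadline) {
            q->head = (q->head + 1) % OUTQ_CAPACITY;
            q->count--;
            continue;
        }
        n = send(fd, m->data + m->off, m->len - m->off, MSG_DONTWAIT);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;      /* socket full again: wait for next writable */
            return -1;
        }
        m->off += (size_t)n;
        if (m->off == m->len) {
            q->head = (q->head + 1) % OUTQ_CAPACITY;
            q->count--;
            sent++;
        }
    }
    return sent;
}
```

In this sketch a full queue simply rejects the new message, and an expired message is dropped silently; a real implementation would have to decide whether to report either case to the caller.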

    Only sessions explicitly marked with the ASYNC flag are affected; all other sessions work with a blocking send() as before. Currently only the snmp agent sets the flag, and only when it is explicitly allowed in the agent config file.

    On the other hand, the current snmpd implementation simply ignores send errors, i.e. when the queue of outgoing messages becomes full, all messages that do not fit are discarded. I can change it to release the session (or AgentX session) instead, if you find that more appropriate.

    The patch introduces new snmp API calls, snmp_select_info_ex() and snmp_write(), which *must* be used when an application uses any session with the ASYNC flag. snmpd was updated accordingly; I can implement the async send in other tools as well (namely net-snmp-config and snmptrapd). I do not see much sense in updating the client tools.

    File Added: net-snmp-async-write.patch

     
