I'm using net-snmp 5.3.1 on Sun Solaris 10.
I have:
- one AgentX master agent
- one subagent connected to the AgentX master
I wrote a test program in which my subagent sends 10,000 traps in a tight loop. After 280 traps, the subagent and the master agent deadlock and neither responds anymore. If I kill the subagent, snmpd runs correctly again.
Here is the stack of the subagent during the deadlock:
fc8c068c send (8, 1725f8, a8, 0)
fe6370d8 _sess_async_send (780f8, 161e78, 0, 0, 0, 1725f8) + 820
fe6373ac snmp_sess_async_send (780f8, 161e78, 0, 0, 780f8, 161e78) + 4c
ff245420 send_trap_to_sess (161e78, 16e140, 9697a2d5, d, ff27f3b0, 78388) + 144
ff244f10 netsnmp_send_traps (16e140, c, 0, 167570, 16ff40, a) + 4fc
ff244f98 send_enterprise_trap_vars (ffffffff, ffffffff, ff2808d8, a, 166310, 0) + 1c
Here is the stack of the master agentx during the deadlock:
fedc068c send (a, 294758, b0, 0)
ff1370d8 _sess_async_send (c00, ca078, 0, 0, 0, 294758) + 820
ff1373ac snmp_sess_async_send (133698, ca078, 0, 0, 133698, 10400) + 4c
ff24ff50 handle_master_agentx_packet (e5498, ca078, 10800, 185028, 144, ff26f3b0) + 4d8
ff137f60 _sess_process_packet (133698, 12f440, 93698, 13baa8, 1, 185028) + 820
ff138b38 _sess_read (133698, 20800, 2a0, 7fffffff, 1ea8c0, 133698) + a04
ff1393ec snmp_sess_read (133698, ff1a2bbc, 0, ff1a59e8, 43ca4, 6aae8) + 1c
ff138104 snmp_read (ffbff248, 133698, ff1a5698, ff1a2bbc, 120, 0) + 34
000146e0 receive (c, 25934, 1, ff270870, ff27519c, ff27510c) + 810
00013ae0 main (10000, 400, ffbff464, 26ca0, 25ee8, 0) + 1530
00011b28 _start (0, 0, 0, 0, 0, 0) + 108
user_id=848638
Originator: NO
Have you tried 5.4.rc2? I guess it'd suffer the same problem, but it'd be good to double-check.
user_id=1784238
Originator: NO
I have the same problem with net-snmp-5.4.1. When I call send_easy_trap() in a loop, snmpd gets stuck. Here is my theory of how it happens:
subagent*:
    int i;
    for (i = 0; i < 1000; i++)
        send_easy_trap(SNMP_TRAP_ENTERPRISESPECIFIC, i);
send_easy_trap() writes directly to the socket but never reads the response; responses are only handled in the subagent's main loop.
snmpd:
    (some inner loop)
        recv() from /var/snmp/master
        process the trap
        send() response to /var/snmp/master
Because the subagent does not read the responses, the send buffer on the snmpd side fills up and snmpd's send() blocks. While that send() is blocked, snmpd stops reading from /var/snmp/master, so the subagent's send() fills its buffer and blocks too -> deadlock.
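The buffer-filling step of this theory can be observed without net-snmp. The following is an illustration under stated assumptions, not net-snmp code: a POSIX socketpair() stands in for the /var/snmp/master connection, the peer never reads (like the subagent that only reads responses in its main loop), and fill_until_blocked() is a hypothetical helper. The 168-byte message size is taken from the length argument (0xa8) of send() in the subagent's stack trace. With O_NONBLOCK set, send() reports EAGAIN at exactly the point where a blocking send() would hang; the count depends on the OS's socket buffer sizes, so it will not necessarily match the 280 traps reported above.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: write msg_len-byte messages to fd until the kernel
 * refuses more. fd is switched to O_NONBLOCK so that the point where a
 * blocking send() would hang shows up as EAGAIN instead. Returns the
 * number of send() calls that succeeded, or -1 on any other error. */
static long fill_until_blocked(int fd, size_t msg_len)
{
    char msg[256];
    long sent = 0;
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    if (msg_len == 0)
        return -1;                /* zero-length sends would never fill anything */
    if (msg_len > sizeof msg)
        msg_len = sizeof msg;
    memset(msg, 'x', msg_len);    /* dummy payload standing in for a trap PDU */

    for (;;) {
        ssize_t n = send(fd, msg, msg_len, 0);
        if (n < 0)
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? sent : -1;
        sent++;                   /* nobody reads, so the buffer only grows */
    }
}
```

Calling it on one end of a fresh socketpair() returns a finite count. Had the socket been left blocking, the next send() at that point would wait forever; in the deadlock, both processes are in that state at once, each waiting for the other to read.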
To fix this bug, snmpd must check whether the socket is writable and, if it is not, add the outgoing AgentX message to a queue. Or is there an easy fix on the subagent side (something like send_easy_trap_and_read_incoming_messages())?
*) Sending traps in such a loop is of course artificial; it's just a quick way to reproduce the problem. The real application does something more useful, but in some cases it ends up making many send_easy_trap() calls without going through the subagent's main loop.
user_id=1784238
Originator: NO
Attaching an experimental patch. It implements asynchronous send in SNMP sessions: send() should never block, so the AgentX master no longer ends up in a deadlock. Outgoing messages are stored in a queue until the corresponding outgoing socket becomes writable. There are a few new config options to set the queue size and the send timeout (i.e. how long an outgoing message may wait in the queue).
Only sessions explicitly marked with the ASYNC flag are affected; all other sessions use blocking send() as before. Currently only the SNMP agent sets the flag, and only when it is explicitly enabled in the agent configuration file.
On the other hand, the current snmpd implementation simply ignores send errors, i.e. when the outgoing message queue is full, every message that does not fit is discarded. I can change it to release the session (or the AgentX session) instead, if you find that more appropriate.
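A bounded outgoing queue with a send timeout and the discard-when-full behaviour described above might be sketched as follows. This is not the code from net-snmp-async-write.patch, which is not reproduced in this thread: struct outq, OUTQ_MAX and the outq_* functions are hypothetical, and the queue size and timeout are plain parameters standing in for the patch's config options. One design point worth noting: a message that has been partially written is never expired, because dropping its tail would corrupt the AgentX byte stream.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

#define OUTQ_MAX 64                    /* hypothetical hard cap on slots */

struct outq_msg {
    unsigned char *data;               /* private copy of the encoded PDU */
    size_t len;                        /* total length */
    size_t off;                        /* bytes already written */
    time_t queued_at;                  /* enqueue time, for the send timeout */
};

struct outq {
    struct outq_msg items[OUTQ_MAX];   /* ring buffer */
    size_t head, count;
    size_t max_len;                    /* configured queue size */
    time_t timeout;                    /* configured send timeout, seconds */
};

/* The queue relies on send() never blocking. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

static void outq_init(struct outq *q, size_t max_len, time_t timeout)
{
    memset(q, 0, sizeof *q);
    q->max_len = max_len < OUTQ_MAX ? max_len : OUTQ_MAX;
    q->timeout = timeout;
}

/* Queue a copy of msg. Returns 0 on success, or -1 if the queue is full
 * or memory runs out; in that case the message is simply discarded. */
static int outq_push(struct outq *q, const void *msg, size_t len, time_t now)
{
    struct outq_msg *m;
    unsigned char *copy;

    if (q->count >= q->max_len || (copy = malloc(len)) == NULL)
        return -1;
    memcpy(copy, msg, len);
    m = &q->items[(q->head + q->count) % OUTQ_MAX];
    m->data = copy;
    m->len = len;
    m->off = 0;
    m->queued_at = now;
    q->count++;
    return 0;
}

static void outq_pop(struct outq *q)
{
    free(q->items[q->head].data);
    q->items[q->head].data = NULL;
    q->head = (q->head + 1) % OUTQ_MAX;
    q->count--;
}

/* Call when fd is writable. Expired messages that were never started are
 * dropped; the rest are written until send() would block. Returns the
 * number of expired messages dropped, or -1 on a real send error. */
static int outq_flush(struct outq *q, int fd, time_t now)
{
    int dropped = 0;

    while (q->count > 0) {
        struct outq_msg *m = &q->items[q->head];
        ssize_t n;

        if (m->off == 0 && now - m->queued_at > q->timeout) {
            outq_pop(q);               /* waited too long: discard */
            dropped++;
            continue;
        }
        n = send(fd, m->data + m->off, m->len - m->off, 0);
        if (n < 0)
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? dropped : -1;
        m->off += (size_t)n;
        if (m->off == m->len)
            outq_pop(q);               /* fully written */
    }
    return dropped;
}
```

Releasing the session on a full queue, the alternative offered above, would replace the "return -1" path of outq_push() with a call that tears down the connection.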
The patch introduces two new SNMP API calls, snmp_select_info_ex() and snmp_write(), which *must* be used whenever an application uses any session with the ASYNC flag. Snmpd has been updated accordingly; I can also implement the asynchronous send in other tools (namely net-snmp-config and snmptrapd). I don't see much point in updating the client tools.
File Added: net-snmp-async-write.patch