From: SourceForge.net <no...@so...> - 2009-05-31 01:30:36
|
Bugs item #1507771, was opened at 2006-06-17 10:54 Message generated for change (Comment added) made by nicomatsu You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=112694&aid=1507771&group_id=12694 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: apps Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jochen (jochen2) Assigned to: Nobody/Anonymous (nobody) Summary: snmpget using SNMPv3 may hang Initial Comment: I have an agent aparently suffering from an time overflow bug. However, this causes snmpget to hang (and any application using snmpget to poll a device like cacti or nagios to fail). Command to query the device: snmpget -O fntev -u nagios -l authNoPriv -a SHA -A <secret> -v 3 -t 1 -r 3 <device>:161 .1.3.6.1.2.1.1.3.0 snmpget tries to get the value 5 times and then hangs. Tested with 5.2.1 and 5.2.2. According to ethereal this is what is happening: No. Time Source Destination Protocol Info 1 0.000000 <src> <device> SNMP GET 2 0.001783 <device> <src> SNMP REPORT 1.3.6.1.6.3.15.1.1.4.0 3 0.026461 <src> <device> SNMP GET 1.3.6.1.2.1.1.3.0 4 0.027940 <device> <src> SNMP REPORT 1.3.6.1.6.3.15.1.1.2.0 5 0.039909 <src> <device> SNMP GET 1.3.6.1.2.1.1.3.0 6 0.041247 <device> <src> SNMP REPORT 1.3.6.1.6.3.15.1.1.2.0 7 0.058454 <src> <device> SNMP GET 1.3.6.1.2.1.1.3.0 8 0.059949 <device> <src> SNMP REPORT 1.3.6.1.6.3.15.1.1.2.0 9 0.069243 <src> <device> SNMP GET 1.3.6.1.2.1.1.3.0 10 0.070610 <device> <src> SNMP REPORT 1.3.6.1.6.3.15.1.1.2.0 11 0.081977 <src> <device> SNMP GET 1.3.6.1.2.1.1.3.0 12 0.083309 <device> <src> SNMP REPORT 1.3.6.1.6.3.15.1.1.2.0 Simple Network Management Protocol Version: 3 (3) Message Global Header Message Global Header Length: 16 Message ID: 1824129936 Message Max Size: 1472 Flags: 0x01 .... .0.. = Reportable: Not set .... ..0. = Encrypted: Not set .... ...1 = Authenticated: Set Message Security Model: USM Message Security Parameters Message Security Parameters Length: 50 Authoritative Engine ID: 800007E5803973832F0216F13B 1... .... = Engine ID Conformance: RFC3411 (SNMPv3) Engine Enterprise ID: U.C. Davis, ECE Dept. Tom (2021) Engine ID Format: Reserved/Enterprise-specific (128): UCD-SNMP Random Engine ID Data: 3973832F Engine ID Data: Creation Time: Nov 13, 2001 13:45:54 Engine Boots: 45 Engine Time: 21525072 User Name: nagios Authentication Parameter: DFC172CB679BA88F5DACA64F Privacy Parameter: Context Engine ID: 800007E5803973832F0216F13B 1... .... = Engine ID Conformance: RFC3411 (SNMPv3) Engine Enterprise ID: U.C. Davis, ECE Dept. Tom (2021) Engine ID Format: Reserved/Enterprise-specific (128): UCD-SNMP Random Engine ID Data: 3973832F Engine ID Data: Creation Time: Nov 13, 2001 13:45:54 Context Name: PDU type: REPORT (8) Request Id: 0x6495b4f7 Error Status: NO ERROR (0) Error Index: 0 Object identifier 1: 1.3.6.1.6.3.15.1.1.2.0 Value: COUNTER: 760 (0x2f8) ---------------------------------------------------------------------- Comment By: nicomatsu (nicomatsu) Date: 2009-05-30 22:30 Message: Hi, I have the same problem, the client version is 5.3.2.2 (the latest available to Centos 5.3) , the agent is a Nokia Firewall 560 The problem is that it enters a blocked state as stated by the debug line sess_select: blocking:no session requests or alarms. Now, I asked myself, why would I need a blocked command line? Is that feature good? Besides, if the average user doesn't turn on the debug mode, he will never find out what is blocking it. I think that in a data polling environment, blocks are never good, a timeout should always be implemented, especially if the applicaction states so . The crazy thing is that when I checked the change log I found this (5.2 Change Log) Fixes: - AgentX memory leak on sets - SNMPv3 not-in-time window after 248 days - Agent hang in HOST-RESOURCES MIB - double free on duplicate registration - OIDs with IP Addresses as indexes now print prettier -'Lazy' installation of headers (only install if newer) - allow whitespace in rocommunity/rwcommunity - many miscellaneous bug reports But I couldn't find specifically what the bug fix "SNMPv3 not-in-time window after 248 days" means, is the agent overflow? is the client overflow? is the error handling involved?. I was not able to find the bug id in the bug tracker. My point is, don't tell me it is going to block, just give an error code and exit. I looked for the code, and the bug is in snmp_api.c, where the function snmp_sess_select_info finds out that there is no work to do and that it will block. I gave it a quick look and the resolution would need a lot of coding, not just a couple of lines, because I think the chosen philpsophy, I quote the code comment, is wrong. A timeout is a timeout, whether I am waiting for some file desciptor to be ready or I have none file descriptors and there is nothing to wait (in that case timeout is 0 and must be implemented that way) * The value of block indicates how the timeout value is interpreted. * If block is true on input, the timeout value will be treated as undefined, * but it must be available for setting in snmp_select_info. On return, * block is set to true if the value returned for timeout is undefined; * when block is set to false, timeout may be used as a parmeter to 'select'. Here is the debug output, just in case. # snmpget -t 1 -r 1 -Cf -Dsess,verbose,session -Oe -a MD5 -u XXXX -l XXXX -A XXXX -X YYYYY -x DES A.B.C.D .1.3.6.1.2.1.2.2.1.8.7 No log handling enabled - turning on stderr logging registered debug token sess, 1 registered debug token verbose, 1 registered debug token session, 1 verbose:socket:buffer: Original client send buffer is 126976 verbose:socket:buffer: New client send buffer size is smaller than original! verbose:socket:buffer: Original client receive buffer is 126976 verbose:socket:buffer: New client receive buffer size is smaller than original! sess_select: for single session: 3 (to in 1243731794.326034 sec) verbose:sess_select: timer due in 0.999840 sec verbose:sess_select: setting timer to 0.999840 sec, clear block (was 1) sess_process_packet: session 0x5b10260 fd 3 pkt 0x5b1ee60 length 118 sess_select: for all sessions: 3 (to in 1243731794.327490 sec) verbose:sess_select: timer due in 0.999841 sec verbose:sess_select: setting timer to 0.999841 sec, clear block (was 1) sess_process_packet: session 0x5b10260 fd 3 pkt 0x5b1ee60 length 136 sess_select: for all sessions: 3 (to in 1243731794.328564 sec) verbose:sess_select: timer due in 0.999842 sec verbose:sess_select: setting timer to 0.999842 sec, clear block (was 1) sess_process_packet: session 0x5b10260 fd 3 pkt 0x5b1ee60 length 136 sess_select: for all sessions: 3 (to in 1243731794.329553 sec) verbose:sess_select: timer due in 0.999845 sec verbose:sess_select: setting timer to 0.999845 sec, clear block (was 1) sess_process_packet: session 0x5b10260 fd 3 pkt 0x5b1ee60 length 136 sess_select: for all sessions: 3 sess_select: blocking:no session requests or alarms. Has this been fixed in new versions? the change log fixes it's not very specific. I can't try every version because I have no compiling tools on that platform, and as I said I used the last version from the CentOS repository. ---------------------------------------------------------------------- Comment By: Jochen (jochen2) Date: 2006-06-19 17:26 Message: Logged In: YES user_id=85147 5.3.1.rc1 still has the bug. Last few lines of debug: trace: snmp_pdu_parse(): snmp_api.c, 4269: dumpv_recv: Command REPORT trace: snmp_pdu_parse(): snmp_api.c, 4350: dumph_recv: request_id dumpx_recv: 02 04 77 5B 7A 8C dumpv_recv: Integer: 2002483852 (0x775B7A8C) trace: snmp_pdu_parse(): snmp_api.c, 4361: dumph_recv: error status dumpx_recv: 02 01 00 dumpv_recv: Integer: 0 (0x00) trace: snmp_pdu_parse(): snmp_api.c, 4372: dumph_recv: error index dumpx_recv: 02 01 00 dumpv_recv: Integer: 0 (0x00) trace: snmp_pdu_parse(): snmp_api.c, 4390: dumph_recv: VarBindList trace: snmp_pdu_parse(): snmp_api.c, 4420: dumph_recv: VarBind trace: snmp_parse_var_op(): snmp.c, 166: dumph_recv: Name dumpx_recv: 06 0A 2B 06 01 06 03 0F 01 01 02 00 dumpv_recv: ObjID: .1.3.6.1.6.3.15.1.1.2.0 trace: snmp_pdu_parse(): snmp_api.c, 4429: dumph_recv: Value dumpx_recv: 41 02 03 98 dumpv_recv: UInteger: 920 (0x398) trace: _snmp_parse(): snmp_api.c, 4180: snmp_parse: Parsed SNMPv3 message (secName:nagios, secLevel:authNoPriv): trace: snmpv3_get_report_type(): snmp_api.c, 4061: report: Report type: -36 trace: snmp_sess_select_info(): snmp_api.c, 5849: sess_select: for all sessions: 3 sess_select: blocking:no session requests or alarms. ---------------------------------------------------------------------- Comment By: Jochen (jochen2) Date: 2006-06-19 13:13 Message: Logged In: YES user_id=85147 I will retest with 5.3.1.rc1 tomorrow. ---------------------------------------------------------------------- Comment By: Jochen (jochen2) Date: 2006-06-19 13:11 Message: Logged In: YES user_id=85147 I retested this with 5.3.0.1 and snmpget still hangs. The agent itself is probably an old 5.1 with the time overflow bug. My report is not about the wrong reports but about the hang in snmpget. ---------------------------------------------------------------------- Comment By: Thomas Anders (tanders) Date: 2006-06-19 11:29 Message: Logged In: YES user_id=848638 Testing against 5.3.1.rc1 (or later) would be preferred. ---------------------------------------------------------------------- Comment By: Jochen (jochen2) Date: 2006-06-19 11:16 Message: Logged In: YES user_id=85147 This was tested with 5.2.1 and 5.2.2. I'll try to test again with 5.3.0.1. Thanks, Jochen ---------------------------------------------------------------------- Comment By: Thomas Anders (tanders) Date: 2006-06-18 17:30 Message: Logged In: YES user_id=848638 What net-snmp version have you tested this with? Sound like the same issue I've been running into two year ago: http://groups.google.com/group/mailing.unix.net-snmp-coders/browse_frm/thread/8dccf6b8f8d672be/85ab0d83e9223b13 (issue #2 in this thread). ---------------------------------------------------------------------- Comment By: Jochen (jochen2) Date: 2006-06-17 12:16 Message: Logged In: YES user_id=85147 The reason for the hand seems to be this part in snmp_api.c: /* * trigger immediate retry on recoverable Reports * * (notInTimeWindow), incr_retries == TRUE to prevent * * inifinite resend */ if (rp->retries <= sp->retries) { snmp_resend_request(slp, rp, TRUE); break; } Here i would expect an "else return -1" or so. On the other hand, an incoming REPORT clears the flag for an expected reponse: pdu->flags &= (~UCD_MSG_FLAG_EXPECT_RESPONSE); So there is no work left and libsnmp finally hangs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=112694&aid=1507771&group_id=12694 |