From: SourceForge.net <no...@so...> - 2007-08-21 02:20:04
Bugs item #1345296, was opened at 2005-11-01 13:22
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112694&aid=1345296&group_id=12694

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: library
Group: linux
>Status: Closed
Resolution: None
Priority: 2
Private: No
Submitted By: Justin McNutt (mcnuttj)
Assigned to: Nobody/Anonymous (nobody)
Summary: SNMP queries lock up in recvfrom() function

Initial Comment:
I am using RedHat Enterprise Linux 3 Update 6. I built the package net-snmp-5.2-1.fc2 from the SRPM file some time ago.

Lately, we have been seeing quite a few lockups when using either the command-line utilities or the SNMP.pm module. An strace on any locked-up process shows that it has hung on the recvfrom() function.

This is - unfortunately - easy to reproduce at the moment, so I have the strace output and a packet capture from a query that locked up this afternoon (querying the ARP cache on a router). I have uploaded these files.

----------------------------------------------------------------------

>Comment By: SourceForge Robot (sf-robot)
Date: 2007-08-20 19:20

Message:
Logged In: YES
user_id=1312539
Originator: NO

This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 180 days (the time period specified by the administrator of this Tracker).

----------------------------------------------------------------------

Comment By: Justin McNutt (mcnuttj)
Date: 2006-01-05 10:52

Message:
Logged In: YES
user_id=1248072

Yes, the call to the recvfrom() function never returns. I am doing all of my programming in Perl, but I will suggest this circumstance to RedHat via the case I have open and see what they have to say.

----------------------------------------------------------------------

Comment By: Dave Shield (dts12)
Date: 2006-01-05 02:38

Message:
Logged In: YES
user_id=88893

When you say "it locks up in the recvfrom() function", what exactly are you seeing? Does the call to this function never return, or what?

Note that recvfrom() is a system library call, not a Net-SNMP provided routine. So if there are any problems with it, we're reliant on RedHat to fix them.

My suspicion is that the recv call is possibly blocking until it receives a (valid) packet, having been led to believe by 'select' that there was one waiting. If the network driver discards the mangled packet after having signalled it using select, but before passing it back via recv, then this might indeed have the effect of locking up within the recvfrom call.

It might be worth playing about with non-blocking sockets and/or socket timeouts. Alternatively, a simple test application, along the lines of:

    while (1) {
        select;
        recvfrom;
    }

might be able to reproduce the problem.

----------------------------------------------------------------------

Comment By: Justin McNutt (mcnuttj)
Date: 2005-12-12 09:43

Message:
Logged In: YES
user_id=1248072

I have not heard from either the maintainers of Net-SNMP or RedHat on this issue. Have we proven FOR CERTAIN that it is or is not a kernel bug?

After all, the function that locks up is the recvfrom() function, which is defined in the Net-SNMP sources. Also, no applications OTHER THAN those linked against the Net-SNMP libraries suffer this problem, which would suggest that the bug is in Net-SNMP.

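A minimal sketch of the non-blocking approach Dave Shield suggests earlier in the thread, written in Python rather than C or Perl for brevity. The helper name `recv_with_timeout` and its parameters are hypothetical and not part of Net-SNMP. It waits in select() with a timeout and then calls recvfrom() on a non-blocking socket, so a datagram that the kernel drops between the two calls (the Linux select(2) man page mentions a bad checksum as one such case) should show up as an ordinary timeout instead of a hang.

```python
import select
import socket


def recv_with_timeout(sock, timeout=2.0, bufsize=65535):
    """Wait up to `timeout` seconds for one datagram.

    Returns (data, addr) on success, or None if nothing usable arrived.
    Hypothetical helper for illustration only.
    """
    # With a non-blocking socket, recvfrom() can never block indefinitely.
    sock.setblocking(False)

    # select() returns early if the socket looks readable, else after timeout.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # nothing arrived in time: an ordinary timeout

    try:
        return sock.recvfrom(bufsize)
    except BlockingIOError:
        # select() reported data, but no datagram was actually delivered
        # (for example, it was discarded after failing checksum validation).
        return None
```

A caller would treat a `None` result like any other lost reply: resend the request, or give up after its usual retry count.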
----------------------------------------------------------------------

Comment By: Justin McNutt (mcnuttj)
Date: 2005-11-09 08:39

Message:
Logged In: YES
user_id=1248072

For reference, here is the bug opened with RedHat:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172777

----------------------------------------------------------------------

Comment By: Justin McNutt (mcnuttj)
Date: 2005-11-09 08:34

Message:
Logged In: YES
user_id=1248072

I have opened a Bugzilla report with RedHat regarding this issue. However, I find it interesting that it is the recvfrom() function that locks up, and so far as I can tell, the only recvfrom() function in the libraries snmpwalk links to is in /usr/lib/libnetsnmp.so.5.

Also, is it necessarily so that the OS does L4 checksum verification? Or is the application layer still given the packet and still responsible for checking and/or changing behavior based on whether or not the checksum is bad? E.g. is the kernel supposed to DROP the packet, report that it's bad, or stand out of the way and just hand the packet up to the higher layers?

----------------------------------------------------------------------

Comment By: Robert Story (rstory)
Date: 2005-11-05 15:44

Message:
Logged In: YES
user_id=76148

If the UDP checksum is bad, I'd expect the OS to return some sort of error. I'm not sure there is a lot we can do about this, and suggest you submit a bug to RedHat or the kernel folks. Let us know what they say.

----------------------------------------------------------------------

Comment By: Justin McNutt (mcnuttj)
Date: 2005-11-02 06:09

Message:
Logged In: YES
user_id=1248072

Okay, confirmed. Every time an SNMP query using snmpwalk or "GETNEXT" calls via SNMP.pm locks up, it does so in the recvfrom() function, and a packet capture shows a UDP packet with a bad checksum (and sometimes a mangled SNMP payload).

Is there anything that can be done about this in the net-snmp library?
If the UDP checksum is bad or the SNMP packet is malformed, can it discard the packet and retry the query (or let the most recent query time out normally)?

----------------------------------------------------------------------

Comment By: Justin McNutt (mcnuttj)
Date: 2005-11-02 05:54

Message:
Logged In: YES
user_id=1248072

Ah ha! I took a closer look at that capture myself and discovered something interesting. I don't know yet if this happens in every case, but it appears that the UDP checksum on that last received packet is BAD (cause is likely to be a PIX firewall in the middle mangling packets).

I'll try to reproduce this a couple of times to be sure.

----------------------------------------------------------------------
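To confirm by hand that a captured datagram really carries a bad checksum, the value can be recomputed from the capture. The sketch below assumes IPv4 and that the raw UDP segment (8-byte header plus payload) has already been extracted from the packet capture; the function names are hypothetical. It follows the RFC 768 rule: a one's-complement sum over a pseudo-header (source address, destination address, a zero byte, protocol number 17, UDP length) followed by the segment itself.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum of `data` (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_checksum_ok(src_ip: str, dst_ip: str, udp_segment: bytes) -> bool:
    """Check the checksum of a raw IPv4 UDP segment (header + payload)."""
    if udp_segment[6:8] == b"\x00\x00":
        # Over IPv4, a zero checksum field means the sender did not compute one.
        return True
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, len(udp_segment))  # zero, proto UDP, length
    )
    # Summing over data that includes a correct checksum field yields 0.
    return internet_checksum(pseudo_header + udp_segment) == 0
```

This only inspects a capture after the fact. A datagram that fails this check would normally be dropped by the receiving kernel and never reach the application.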