net-snmp 5.1.1 on hpux - binary distribution
Distribution file:
http://prdownloads.sourceforge.net/net-snmp-5.1.1-HP-UX_B.11.00_9000_712.tar.gz
HP-UX 11.0, 9000/800
The snmpd works fine for two or three queries and returns
everything correctly (CPU/memory/interfaces/disks); however,
it then starts using 100% of the CPU, the log (in -d mode)
stops at the HOST-RESOURCES-MIB object
hrStorageAllocationFailures.2 (or .3), and the agent never
answers again. We had to kill the process and restart it, and
it loops again after another request.
We are using Orion 7.2 (SolarWinds) to monitor the HP-UX box.
Logged In: NO
We see the same problem on Solaris 8 after adding the volume
in SolarWinds v7.2.2, during the first poll.
truss shows the following endless loop:
close(12) = 0
open("/etc/mnttab", O_RDONLY) = 12
fstat64(12, 0xFFBFE520) = 0
fstat64(12, 0xFFBFE3C8) = 0
ioctl(12, TCGETA, 0xFFBFE4AC) Err#22 EINVAL
read(12, " / d e v / d s k / c 0 t".., 512) = 434
statvfs("/", 0xFFBFEAA0) = 0
llseek(12, 0xFFFFFFFFFFFFFEA9, SEEK_CUR) = 91
close(12) = 0
open("/etc/mnttab", O_RDONLY) = 12
fstat64(12, 0xFFBFE520) = 0
fstat64(12, 0xFFBFE3C8) = 0
ioctl(12, TCGETA, 0xFFBFE4AC) Err#22 EINVAL
read(12, " / d e v / d s k / c 0 t".., 512) = 434
statvfs("/", 0xFFBFEAA0) = 0
llseek(12, 0xFFFFFFFFFFFFFEA9, SEEK_CUR) = 91
close(12) = 0
open("/etc/mnttab", O_RDONLY) = 12
fstat64(12, 0xFFBFE520) = 0
fstat64(12, 0xFFBFE3C8) = 0
ioctl(12, TCGETA, 0xFFBFE4AC) Err#22 EINVAL
read(12, " / d e v / d s k / c 0 t".., 512) = 434
statvfs("/", 0xFFBFEAA0) = 0
llseek(12, 0xFFFFFFFFFFFFFEA9, SEEK_CUR) = 91
close(12) = 0
open("/etc/mnttab", O_RDONLY) = 12
fstat64(12, 0xFFBFE520) = 0
fstat64(12, 0xFFBFE3C8) = 0
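Each open()/fstat64()/read()/statvfs()/close() cycle above is one complete
(re)scan of the mount table for a single varbind. As a rough illustration of
what the truss output corresponds to, here is a minimal sketch of one such
scan, assuming the Solaris <sys/mnttab.h> getmntent() interface; it is not
the actual hr_filesys/hr_storage code, only the shape of the per-request
pass that the agent keeps repeating:

#include <stdio.h>
#include <sys/mnttab.h>      /* Solaris mount table: MNTTAB, struct mnttab, getmntent() */
#include <sys/statvfs.h>

int main(void)
{
    FILE *fp = fopen(MNTTAB, "r");       /* MNTTAB == "/etc/mnttab" */
    struct mnttab mt;
    struct statvfs vfs;

    if (fp == NULL)
        return 1;

    /* One pass: read each mount entry and stat the mounted filesystem.
     * The agent performs a pass like this per storage varbind; if the
     * caller keeps retrying an index that the pass never satisfies,
     * the table is reopened and rescanned forever -- the open()/close()
     * loop visible in the truss output. */
    while (getmntent(fp, &mt) == 0) {    /* 0 = entry read, -1 = EOF */
        if (statvfs(mt.mnt_mountp, &vfs) == 0)
            printf("%s on %s: %lu blocks free\n",
                   mt.mnt_special, mt.mnt_mountp,
                   (unsigned long)vfs.f_bfree);
    }

    fclose(fp);
    return 0;
}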
Logged In: NO
We have SolarWinds 7.2 as well, with the same problem on
Solaris.
The issue is most likely within the
agent/mibgroup/host/hr_storage.c and hr_filesys.c functions,
around var_hrstore(). The sequence of calls is:
var_hrstore
header_hrstore
header_hrstoreEntry
header_hrfilesys
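For readers who do not know that code path, the shape of the failure can be
shown with a deliberately simplified, hypothetical loop (the function names
below are mine, not the net-snmp sources): if the "get storage stats" step
fails for the requested index and the code jumps back to a retry label
without advancing the index or giving up, the agent spins exactly as in the
traces in this report:

#include <stddef.h>

/* Hypothetical stand-in for the "get storage stats" step; pretend that
 * index 2 always fails, like hrStorageAllocationFailures.2 above. */
static void *get_storage_stats(int index)
{
    return (index == 2) ? NULL : (void *)1;
}

/* Simplified paraphrase of a header_hrstoreEntry()-style search loop.
 * NOT the net-snmp code -- it only shows how a retry label that never
 * advances turns one failed lookup into an infinite loop. */
static int find_storage_index(int wanted)
{
    int index = wanted;

try_next:
    if (get_storage_stats(index) == NULL) {
        /* BUG: the same index is retried forever.  A correct version
         * must either advance the index or stop once the table is
         * exhausted and report "no such instance" to the caller. */
        goto try_next;
    }
    return index;
}

int main(void)
{
    /* find_storage_index(1) returns at once; find_storage_index(2)
     * spins at 100% CPU, which is what the agent does. */
    return find_storage_index(1) == 1 ? 0 : 1;
}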
Logged In: NO
Same for Solaris 9 with net-snmp 5.1.2.
After polling for hrStorageAllocationFailures.2, the agent gets
into an endless loop; the dump shows:
9450 trace: snmp_comstr_parse(): snmp_auth.c, 131:
9451 dumph_recv: SNMP version
9452 dumpx_recv: 02 01 01
9453 dumpv_recv: Integer: 1 (0x01)
9454 trace: snmp_comstr_parse(): snmp_auth.c, 143:
9455 dumph_recv: community string
9456 dumpx_recv: 04 0A 73 65 63 72 65 74 72 65 61 64
9457 dumpv_recv: String: fakedhere
9458 trace: _snmp_parse(): snmp_api.c, 3968:
9459 dumph_recv: PDU
9460 trace: snmp_pdu_parse(): snmp_api.c, 4179:
9461 dumph_recv: request_id
9462 dumpx_recv: 02 04 5E AE 80 59
9463 dumpv_recv: Integer: 1588494425 (0x5EAE8059)
9464 trace: snmp_pdu_parse(): snmp_api.c, 4190:
9465 dumph_recv: error status
9466 dumpx_recv: 02 01 00
9467 dumpv_recv: Integer: 0 (0x00)
9468 trace: snmp_pdu_parse(): snmp_api.c, 4201:
9469 dumph_recv: error index
9470 dumpx_recv: 02 01 00
9471 dumpv_recv: Integer: 0 (0x00)
9472 trace: snmp_pdu_parse(): snmp_api.c, 4219:
9473 dumph_recv: VarBindList
9474 trace: snmp_pdu_parse(): snmp_api.c, 4249:
9475 dumph_recv: VarBind
9476 trace: snmp_parse_var_op(): snmp.c, 166:
9477 dumph_recv: Name
9478 dumpx_recv: 06 0B 2B 06 01 02 01 19 02 03 01 07 02
9479 dumpv_recv: ObjID: HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
9480 trace: snmp_pdu_parse(): snmp_api.c, 4258:
9481 dumph_recv: Value
9482 trace: init_agent_snmp_session(): snmp_agent.c, 1119:
9483 snmp_agent: agent_sesion 0010c520 created
9484 trace: snmp_call_callbacks(): callback.c, 176:
9485 callback: START calling callbacks for maj=1 min=5
9486 trace: snmp_call_callbacks(): callback.c, 184:
9487 callback: calling a callback for maj=1 min=5
9488 trace: vacm_in_view(): mibII/vacm_vars.c, 744:
9489 mibII/vacm_vars: vacm_in_view: ver=1, community=fakedhere
9490 trace: netsnmp_udp_getSecName(): snmpUDPDomain.c, 745:
9491 netsnmp_udp_getSecName: resolve <"fakedhere", 0x7f000001>
9492 trace: netsnmp_udp_getSecName(): snmpUDPDomain.c, 750:
9493 netsnmp_udp_getSecName: compare <"fakedhere", 0x00000000/0x00000000>... SUCCESS
9494 trace: netsnmp_subtree_find_first(): agent_registry.c, 156:
9495 subtree: looking for subtree for context: ""
9496 trace: netsnmp_subtree_find_first(): agent_registry.c, 160:
9497 subtree: found one for: ""
9498 trace: vacm_in_view(): mibII/vacm_vars.c, 851:
9499 mibII/vacm_vars: vacm_in_view: sn=anonymousSecName000, gn=anonymousGroupName000, Done checking setup
9500 trace: snmp_call_callbacks(): callback.c, 196:
9501 callback: END calling callbacks for maj=1 min=5 (1 called)
9502 trace: netsnmp_subtree_find_first(): agent_registry.c, 156:
9503 subtree: looking for subtree for context: ""
9504 trace: netsnmp_subtree_find_first(): agent_registry.c, 160:
9505 subtree: found one for: ""
9506 trace: snmp_call_callbacks(): callback.c, 176:
9507 callback: START calling callbacks for maj=1 min=0
9508 trace: snmp_call_callbacks(): callback.c, 184:
9509 callback: calling a callback for maj=1 min=0
9510 trace: vacm_in_view(): mibII/vacm_vars.c, 744:
9511 mibII/vacm_vars: vacm_in_view: ver=1, community=fakedhere
9512 trace: netsnmp_udp_getSecName(): snmpUDPDomain.c, 745:
9513 netsnmp_udp_getSecName: resolve <"fakedhere", 0x7f000001>
9514 trace: netsnmp_udp_getSecName(): snmpUDPDomain.c, 750:
9515 netsnmp_udp_getSecName: compare <"fakedhere", 0x00000000/0x00000000>... SUCCESS
9516 trace: netsnmp_subtree_find_first(): agent_registry.c, 156:
9517 subtree: looking for subtree for context: ""
9518 trace: netsnmp_subtree_find_first(): agent_registry.c, 160:
9519 subtree: found one for: ""
9520 trace: vacm_in_view(): mibII/vacm_vars.c, 851:
9521 mibII/vacm_vars: vacm_in_view: sn=anonymousSecName000, gn=anonymousGroupName000, vn=anonymousView000trace: vacm_getViewEntry(): vacm.c, 370:
9522 vacm:getView: , found
9523 , vt=1
9524 trace: snmp_call_callbacks(): callback.c, 196:
9525 callback: END calling callbacks for maj=1 min=0 (1 called)
9526 trace: netsnmp_add_varbind_to_cache(): snmp_agent.c, 1677:
9527 snmp_agent: add_vb_to_cache( 10c520, 1, HOST-RESOURCES-MIB::hrStorageAllocationFailures.2, e7580)
9528 trace: netsnmp_add_varbind_to_cache(): snmp_agent.c, 1742:
9529 snmp_agent: tp->start HOST-RESOURCES-MIB::hrStorageAllocationFailures, tp->end HOST-RESOURCES-MIB::hrStorageEntry.8,
9530 trace: netsnmp_call_handlers(): agent_handler.c, 443:
9531 handler:calling: main handler bulk_to_next
9532 trace: netsnmp_call_handler(): agent_handler.c, 381:
9533 handler:calling: calling handler bulk_to_next for mode GET
9534 trace: netsnmp_call_handler(): agent_handler.c, 381:
9535 handler:calling: calling handler old_api for mode GET
9536 trace: header_hrstoreEntry(): host/hr_storage.c, 383:
9537 host/hr_storage: var_hrstoreEntry: HOST-RESOURCES-MIB::hrStorageAllocationFailures.2 1
9538 (index 2 ....HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
9539 trace: header_hrstoreEntry(): host/hr_storage.c, 433:
9540 host/hr_storage: ... get storage stats HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
9541 trace: header_hrstoreEntry(): host/hr_storage.c, 383:
9542 host/hr_storage: var_hrstoreEntry: HOST-RESOURCES-MIB::hrStorageAllocationFailures.2 1
9543 (index 2 ....HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
9544 trace: header_hrstoreEntry(): host/hr_storage.c, 433:
9545 host/hr_storage: ... get storage stats HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
9546 trace: header_hrstoreEntry(): host/hr_storage.c, 383:
9547 host/hr_storage: var_hrstoreEntry: HOST-RESOURCES-MIB::hrStorageAllocationFailures.2 1
9548 (index 2 ....HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
9549 trace: header_hrstoreEntry(): host/hr_storage.c, 433:
9550 host/hr_storage: ... get storage stats HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
... and then the endless loop starts.
Logged In: YES
user_id=76148
Adding info from dup bug 1009420.
Date: 2004-08-15 14:29
Sender: nobody
Logged In: NO
More information on the problem:
5.0.9 does not suffer from this problem. A solution was found here:
http://groups.google.com/groups?q=solarwinds+net-snmp&hl=en&lr=&ie=UTF-8&selm=cemu5q%24pgj%241%40FreeBSD.csie.NCTU.edu.tw&rnum=1
-----------
I discovered, at the very least, a situation which causes snmpd to
spin, repeatedly opening /etc/mtab. I did a bit of digging into the
issue - I have a tcpdump capture with full packet length plus the
output from strace. I am not very familiar with gdb - if someone can
give me the commands to use, I would be happy to provide the results.
I have been able to track it down to the following OID get.
( The text below is from a ticket I opened in my ticket system to
track this issue. The frame reference is the frame # in the pcap
file, which I will provide if anyone wants to see it. )
| fails at snmp get: 1.3.6.1.2.1.25.2.3.1.7
|
| see frames 229-230 for the last success -> beginning of the failure
|
| strace data from running snmpd:
|
| root@freakazoid:~# strace -fp 24179
|
| open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 12
| close(12) = 0
| open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 12
| close(12) = 0
| open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 12
| close(12) = 0
| open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 12
| close(12) = 0
| open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 12
| ) = 0
|
| root@freakazoid:~# kill -9 24179
|
|
| packet capture attached.
|
| ----
|
| top lines from a previously run snmpd ( the one that started the debug process )
|
| USER     PID %CPU %MEM  VSZ  RSS TTY STAT START TIME COMMAND
| root    1161 75.8  0.6 7216 3428 ?   R    18:35 3:44 /usr/sbin/snmpd -P /var/run/snmpd.pid
|
hrStorageAllocationFailures OBJECT-TYPE
    SYNTAX     Counter32
    MAX-ACCESS read-only
    STATUS     current
    DESCRIPTION
        "The number of requests for storage represented by
        this entry that could not be honored due to not enough
        storage.  It should be noted that as this object has a
        SYNTAX of Counter32, that it does not have a defined
        initial value.  However, it is recommended that this
        object be initialized to zero, even though management
        stations must not depend on such an initialization."
    -- 1.3.6.1.2.1.25.2.3.1.7 --
    ::= { hrStorageEntry 7 }
So it looks like this is dying at something in the storage entry
section, per the doc, rfc2790.txt. Time to look up this RFC - is this
a bug in this particular version of net-snmp, or is it something
about the SolarWinds get that causes it to break?
----- END of ticket comment -----
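To tie the ticket's failing GET (1.3.6.1.2.1.25.2.3.1.7) back to the
definition quoted above: that OID is the hrStorageAllocationFailures column,
{ hrStorageEntry 7 }, and the instances polled in the earlier dumps simply
append the hrStorageIndex. A small C illustration (the array names are mine):

/* hrStorageEntry is 1.3.6.1.2.1.25.2.3.1, so the
 * hrStorageAllocationFailures column ({ hrStorageEntry 7 }) is: */
static const unsigned long hrStorageAllocationFailures_column[] =
    { 1, 3, 6, 1, 2, 1, 25, 2, 3, 1, 7 };

/* The instance seen in the debug dumps appends hrStorageIndex 2:
 * HOST-RESOURCES-MIB::hrStorageAllocationFailures.2 */
static const unsigned long hrStorageAllocationFailures_2[] =
    { 1, 3, 6, 1, 2, 1, 25, 2, 3, 1, 7, 2 };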
I initially discovered this issue working with net-snmp 5.1.1. When I
tested 5.1.2 on my system, the exact same behavior was observed. If I
run net-snmp 5.0.9 the system runs perfectly ( exact same snmpd.conf
config ... )
What I find most interesting is this:
1) If I walk the system using the net-snmp snmpwalk, everything works
just fine - nothing out of the ordinary.
2) If the system is walked by a product named SolarWinds, the system
goes into the /etc/mtab loop.
If there is any additional information I can provide, I can reproduce
this issue at will. Let me know if you want to see anything.
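In case it helps anyone reproduce this without SolarWinds, the GET shown in
the earlier dump can also be sent from a few lines of C against the standard
net-snmp client library (this is ordinary client API usage, not code from
the agent; host and community are placeholders supplied on the command
line):

#include <stdio.h>
#include <string.h>
#include <net-snmp/net-snmp-config.h>
#include <net-snmp/net-snmp-includes.h>

/* Send one SNMPv1 GET for HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
 * (1.3.6.1.2.1.25.2.3.1.7.2), the varbind seen in the traces above.
 * Usage: ./repro <host> <community> */
int main(int argc, char **argv)
{
    netsnmp_session session, *ss;
    netsnmp_pdu *pdu, *response = NULL;
    oid name[MAX_OID_LEN];
    size_t name_len = MAX_OID_LEN;

    if (argc != 3)
        return 1;

    init_snmp("hrstorage-repro");
    snmp_sess_init(&session);
    session.peername      = argv[1];
    session.version       = SNMP_VERSION_1;
    session.community     = (u_char *)argv[2];
    session.community_len = strlen(argv[2]);

    if ((ss = snmp_open(&session)) == NULL)
        return 1;

    pdu = snmp_pdu_create(SNMP_MSG_GET);
    read_objid(".1.3.6.1.2.1.25.2.3.1.7.2", name, &name_len);
    snmp_add_null_var(pdu, name, name_len);

    if (snmp_synch_response(ss, pdu, &response) == STAT_SUCCESS &&
        response && response->errstat == SNMP_ERR_NOERROR) {
        print_variable(response->variables->name,
                       response->variables->name_length,
                       response->variables);
    } else {
        /* Against a looping 5.1.1/5.1.2 agent this times out, because
         * the agent never answers once it starts spinning. */
        fprintf(stderr, "no response / error from %s\n", argv[1]);
    }

    if (response)
        snmp_free_pdu(response);
    snmp_close(ss);
    return 0;
}

With the net-snmp development headers installed, it should build with
something like: cc repro.c -o repro `net-snmp-config --cflags --libs`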
==============================================
Go back to net-snmp-5.0.9 and it works.
Logged In: YES
user_id=76148
Fix is in CVS for 5.2 and 5.1.3. Until then, here is a quickie patch:
diff -r5.5.2.9 hr_storage.c
491c491
< try_next:
---
Logged In: YES
user_id=76148
Updated the title to help me find this bug faster next time.