
#1129 snmpd loops forever in host resources (hrStorage)

Group: hpux
Status: closed
Assigned: nobody
Category: agent (1105)
Priority: 7
Updated: 2012-11-08
Created: 2004-07-06
Creator: Marcio
Private: No

net-snmp 5.1.1 on hpux - binary distribution
Distribution file:
http://prdownloads.sourceforge.net/net-snmp/net-snmp-5.1.1-HP-UX_B.11.00_9000_712.tar.gz

HP UX 11.0 U9000/800

The snmpd works fine for 2-3 queries and returns everything correctly
(cpu/mem/interfaces/disks); however, it then starts using 100% of the CPU,
the log file (in -d mode) stops at HOST-RESOURCES-MIB
hrStorageAllocationFailures.2 (or .3), and the agent never answers again.
We had to kill the process and restart it, and it loops again after the
next request.

We are using Orion 7.2 (SolarWinds) to monitor the HP-UX host.
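
For reference, a request of this kind can be reproduced without Orion. Below
is a minimal sketch using the standard net-snmp client library that issues a
plain SNMPv1 GET for hrStorageAllocationFailures.2 (OID
.1.3.6.1.2.1.25.2.3.1.7.2), which is where the agent stops answering. The host
name and community string are placeholders, not values from this report.

/* repro sketch (placeholder host/community); link against libnetsnmp */
#include <net-snmp/net-snmp-config.h>
#include <net-snmp/net-snmp-includes.h>
#include <string.h>

int main(void)
{
    netsnmp_session session, *ss;
    netsnmp_pdu *pdu, *response = NULL;
    oid anOID[MAX_OID_LEN];
    size_t anOID_len = MAX_OID_LEN;

    init_snmp("hrstorage-repro");
    snmp_sess_init(&session);
    session.peername = strdup("hpux-host");               /* placeholder target    */
    session.version = SNMP_VERSION_1;
    session.community = (u_char *) "public";              /* placeholder community */
    session.community_len = strlen((char *) session.community);

    ss = snmp_open(&session);
    if (!ss) { snmp_perror("snmp_open"); return 1; }

    pdu = snmp_pdu_create(SNMP_MSG_GET);                   /* plain exact GET */
    read_objid(".1.3.6.1.2.1.25.2.3.1.7.2", anOID, &anOID_len);
    snmp_add_null_var(pdu, anOID, anOID_len);

    /* against an affected agent this request never gets an answer */
    if (snmp_synch_response(ss, pdu, &response) == STAT_SUCCESS && response)
        print_variable(response->variables->name,
                       response->variables->name_length, response->variables);
    else
        snmp_sess_perror("snmpget", ss);

    if (response)
        snmp_free_pdu(response);
    snmp_close(ss);
    return 0;
}

Against an affected agent the snmp_synch_response() call simply times out
while snmpd spins at 100% CPU, matching the behaviour described above.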

Discussion

  • Nobody/Anonymous

    Logged In: NO

    We see the same problem on Solaris 8 after adding the volume in
    SolarWinds v7.2.2, during the first poll.
    Truss shows the following endless loop:

    close(12) = 0
    open("/etc/mnttab", O_RDONLY) = 12
    fstat64(12, 0xFFBFE520) = 0
    fstat64(12, 0xFFBFE3C8) = 0
    ioctl(12, TCGETA, 0xFFBFE4AC) Err#22 EINVAL
    read(12, " / d e v / d s k / c 0 t".., 512) = 434
    statvfs("/", 0xFFBFEAA0) = 0
    llseek(12, 0xFFFFFFFFFFFFFEA9, SEEK_CUR) = 91
    close(12) = 0
    open("/etc/mnttab", O_RDONLY) = 12
    fstat64(12, 0xFFBFE520) = 0
    fstat64(12, 0xFFBFE3C8) = 0
    ioctl(12, TCGETA, 0xFFBFE4AC) Err#22 EINVAL
    read(12, " / d e v / d s k / c 0 t".., 512) = 434
    statvfs("/", 0xFFBFEAA0) = 0
    llseek(12, 0xFFFFFFFFFFFFFEA9, SEEK_CUR) = 91
    close(12) = 0
    open("/etc/mnttab", O_RDONLY) = 12
    fstat64(12, 0xFFBFE520) = 0
    fstat64(12, 0xFFBFE3C8) = 0
    ioctl(12, TCGETA, 0xFFBFE4AC) Err#22 EINVAL
    read(12, " / d e v / d s k / c 0 t".., 512) = 434
    statvfs("/", 0xFFBFEAA0) = 0
    llseek(12, 0xFFFFFFFFFFFFFEA9, SEEK_CUR) = 91
    close(12) = 0
    open("/etc/mnttab", O_RDONLY) = 12
    fstat64(12, 0xFFBFE520) = 0
    fstat64(12, 0xFFBFE3C8) = 0

     
  • Nobody/Anonymous

    Logged In: NO

    We have Solarwinds 7.2 as well with the same problem on
    Solaris.

    The issue is most likely within agent/mibgroup/host/hr_storage.c and
    hr_filesys.c, in the functions around var_hrstore().

    The sequence of calls is:
    var_hrstore
    header_hrstore
    header_hrstoreEntry
    header_hrfilesys
    Init_HR_FileSys (which performs the open of /etc/mnttab)
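
    For illustration only, a simplified, self-contained C sketch of that
    failure mode follows - it is NOT the real hr_storage.c code, and the
    helper names (get_storage_stats, lookup_row, the mnttab stand-ins) are
    made up. It only shows how retrying the same row after every failed
    stats lookup re-runs the whole chain down to the /etc/mnttab open, which
    is exactly the open()/read()/close() loop visible in the truss output
    above:

    #include <stdio.h>

    /* stand-ins for Init_HR_FileSys()/cleanup: the real init opens /etc/mnttab */
    static void init_filesys_scan(void) { puts("open(\"/etc/mnttab\")"); }
    static void end_filesys_scan(void)  { puts("close(12)"); }

    /* hypothetical helper: returns -1 when stats for `row` cannot be obtained */
    static long get_storage_stats(int row)
    {
        init_filesys_scan();
        /* ... walk the mount entries looking for `row` ... */
        end_filesys_scan();
        return (row == 2) ? -1L : 0L;   /* pretend row 2 never yields stats */
    }

    static long lookup_row(int row, int exact)
    {
        int passes = 0;
        long stats;
    try_next:
        if (++passes > 5) {             /* cap for the demo; the agent had no cap */
            puts("(still looping - this is the 100% CPU spin)");
            return -1L;
        }
        stats = get_storage_stats(row);
        if (stats < 0) {
            if (!exact)
                row++;                  /* a GETNEXT-style lookup can move on  */
            goto try_next;              /* an exact GET retries row 2 forever  */
        }
        return stats;
    }

    int main(void)
    {
        lookup_row(2, 0);               /* non-exact: advances past the bad row */
        lookup_row(2, 1);               /* exact: hits the capped spin above    */
        return 0;
    }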
    
     
  • Nobody/Anonymous

    Logged In: NO

    Same on Solaris 9 with net-snmp 5.1.2.
    After polling for hrStorageAllocationFailures.2 the agent gets into an
    endless loop; the dump shows:
    9450 trace: snmp_comstr_parse(): snmp_auth.c, 131:
    9451 dumph_recv: SNMP version
    9452 dumpx_recv: 02 01 01
    9453 dumpv_recv: Integer: 1 (0x01)
    9454 trace: snmp_comstr_parse(): snmp_auth.c, 143:
    9455 dumph_recv: community string
    9456 dumpx_recv: 04 0A 73 65 63 72 65 74 72 65 61 64
    9457 dumpv_recv: String: fakedhere
    9458 trace: _snmp_parse(): snmp_api.c, 3968:
    9459 dumph_recv: PDU
    9460 trace: snmp_pdu_parse(): snmp_api.c, 4179:
    9461 dumph_recv: request_id
    9462 dumpx_recv: 02 04 5E AE 80 59
    9463 dumpv_recv: Integer: 1588494425 (0x5EAE8059)
    9464 trace: snmp_pdu_parse(): snmp_api.c, 4190:
    9465 dumph_recv: error status
    9466 dumpx_recv: 02 01 00
    9467 dumpv_recv: Integer: 0 (0x00)
    9468 trace: snmp_pdu_parse(): snmp_api.c, 4201:
    9469 dumph_recv: error index
    9470 dumpx_recv: 02 01 00
    9471 dumpv_recv: Integer: 0 (0x00)
    9472 trace: snmp_pdu_parse(): snmp_api.c, 4219:
    9473 dumph_recv: VarBindList
    9474 trace: snmp_pdu_parse(): snmp_api.c, 4249:
    9475 dumph_recv: VarBind
    9476 trace: snmp_parse_var_op(): snmp.c, 166:
    9477 dumph_recv: Name
    9478 dumpx_recv: 06 0B 2B 06 01 02 01 19 02 03 01 07 02
    9479 dumpv_recv: ObjID: HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
    9480 trace: snmp_pdu_parse(): snmp_api.c, 4258:
    9481 dumph_recv: Value
    9482 trace: init_agent_snmp_session(): snmp_agent.c, 1119:
    9483 snmp_agent: agent_sesion 0010c520 created
    9484 trace: snmp_call_callbacks(): callback.c, 176:
    9485 callback: START calling callbacks for maj=1 min=5
    9486 trace: snmp_call_callbacks(): callback.c, 184:
    9487 callback: calling a callback for maj=1 min=5
    9488 trace: vacm_in_view(): mibII/vacm_vars.c, 744:
    9489 mibII/vacm_vars: vacm_in_view: ver=1, community=fakedhere
    9490 trace: netsnmp_udp_getSecName(): snmpUDPDomain.c, 745:
    9491 netsnmp_udp_getSecName: resolve <"fakedhere", 0x7f000001>
    9492 trace: netsnmp_udp_getSecName(): snmpUDPDomain.c, 750:
    9493 netsnmp_udp_getSecName: compare <"fakedhere", 0x00000000/0x00000000>... SUCCESS
    9494 trace: netsnmp_subtree_find_first(): agent_registry.c, 156:
    9495 subtree: looking for subtree for context: ""
    9496 trace: netsnmp_subtree_find_first(): agent_registry.c, 160:
    9497 subtree: found one for: ""
    9498 trace: vacm_in_view(): mibII/vacm_vars.c, 851:
    9499 mibII/vacm_vars: vacm_in_view: sn=anonymousSecName000, gn=anonymousGroupName000, Done checking setup
    9500 trace: snmp_call_callbacks(): callback.c, 196:
    9501 callback: END calling callbacks for maj=1 min=5 (1 called)
    9502 trace: netsnmp_subtree_find_first(): agent_registry.c, 156:
    9503 subtree: looking for subtree for context: ""
    9504 trace: netsnmp_subtree_find_first(): agent_registry.c, 160:
    9505 subtree: found one for: ""
    9506 trace: snmp_call_callbacks(): callback.c, 176:
    9507 callback: START calling callbacks for maj=1 min=0
    9508 trace: snmp_call_callbacks(): callback.c, 184:
    9509 callback: calling a callback for maj=1 min=0
    9510 trace: vacm_in_view(): mibII/vacm_vars.c, 744:
    9511 mibII/vacm_vars: vacm_in_view: ver=1, community=fakedhere
    9512 trace: netsnmp_udp_getSecName(): snmpUDPDomain.c, 745:
    9513 netsnmp_udp_getSecName: resolve <"fakedhere", 0x7f000001>
    9514 trace: netsnmp_udp_getSecName(): snmpUDPDomain.c, 750:
    9515 netsnmp_udp_getSecName: compare <"fakedhere", 0x00000000/0x00000000>... SUCCESS
    9516 trace: netsnmp_subtree_find_first(): agent_registry.c, 156:
    9517 subtree: looking for subtree for context: ""
    9518 trace: netsnmp_subtree_find_first(): agent_registry.c, 160:
    9519 subtree: found one for: ""
    9520 trace: vacm_in_view(): mibII/vacm_vars.c, 851:
    9521 mibII/vacm_vars: vacm_in_view: sn=anonymousSecName000, gn=anonymousGroupName000, vn=anonymousView000trace: vacm_getViewEntry(): vacm.c, 370:
    9522 vacm:getView: , found
    9523 , vt=1
    9524 trace: snmp_call_callbacks(): callback.c, 196:
    9525 callback: END calling callbacks for maj=1 min=0 (1 called)
    9526 trace: netsnmp_add_varbind_to_cache(): snmp_agent.c, 1677:
    9527 snmp_agent: add_vb_to_cache( 10c520, 1, HOST-RESOURCES-MIB::hrStorageAllocationFailures.2, e7580)
    9528 trace: netsnmp_add_varbind_to_cache(): snmp_agent.c, 1742:
    9529 snmp_agent: tp->start HOST-RESOURCES-MIB::hrStorageAllocationFailures, tp->end HOST-RESOURCES-MIB::hrStorageEntry.8,
    9530 trace: netsnmp_call_handlers(): agent_handler.c, 443:
    9531 handler:calling: main handler bulk_to_next
    9532 trace: netsnmp_call_handler(): agent_handler.c, 381:
    9533 handler:calling: calling handler bulk_to_next for mode GET
    9534 trace: netsnmp_call_handler(): agent_handler.c, 381:
    9535 handler:calling: calling handler old_api for mode GET
    9536 trace: header_hrstoreEntry(): host/hr_storage.c, 383:
    9537 host/hr_storage: var_hrstoreEntry: HOST-RESOURCES-MIB::hrStorageAllocationFailures.2 1
    9538 (index 2 ....HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
    9539 trace: header_hrstoreEntry(): host/hr_storage.c, 433:
    9540 host/hr_storage: ... get storage stats HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
    9541 trace: header_hrstoreEntry(): host/hr_storage.c, 383:
    9542 host/hr_storage: var_hrstoreEntry: HOST-RESOURCES-MIB::hrStorageAllocationFailures.2 1
    9543 (index 2 ....HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
    9544 trace: header_hrstoreEntry(): host/hr_storage.c, 433:
    9545 host/hr_storage: ... get storage stats HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
    9546 trace: header_hrstoreEntry(): host/hr_storage.c, 383:
    9547 host/hr_storage: var_hrstoreEntry: HOST-RESOURCES-MIB::hrStorageAllocationFailures.2 1
    9548 (index 2 ....HOST-RESOURCES-MIB::hrStorageAllocationFailures.2
    9549 trace: header_hrstoreEntry(): host/hr_storage.c, 433:
    9550 host/hr_storage: ... get storage stats HOST-RESOURCES-MIB::hrStorageAllocationFailures.2

    ... and then the endless loop starts.

     
  • Robert Story

    Robert Story - 2004-10-06

    Logged In: YES
    user_id=76148

    Adding info from dup bug 1009420.

    Date: 2004-08-15 14:29
    Sender: nobody
    Logged In: NO

    More information on the problem:

    5.0.9 does not suffer this problem. Solution found here:
    http://groups.google.com/groups?q=solarwinds+net-snmp&hl=en&lr=&ie=UTF-8&selm=cemu5q%24pgj%241%40FreeBSD.csie.NCTU.edu.tw&rnum=1

    -----------

    I discovered at the very least, a situation which causes the snmpd to run
    into a spin opening up /etc/mtab. I did a bit of discovery on the issue -
    I have tcpdump with full packet length + output from strace. I am not
    very familiar with gdb - if someone can give me the commands to use I
    would be happy to provide the results.
    I have been able to track it down to the following OID get:

    ( the below is from a ticket I opened in my ticket system to work with
    this issue. The frame reference is the frame # in the pcap file - which I
    will provide if anyone wants to see it )

    | fails at snmp get: 1.3.6.1.2.1.25.2.3.1.7
    |
    | see frame 229-230 for the last success->failure begin
    |
    | strace data from running snmpd:
    |
    | root@freakazoid:~# strace -fp 24179
    |
    | open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 12
    | close(12) = 0
    | open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 12
    | close(12) = 0
    | open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 12
    | close(12) = 0
    | open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 12
    | close(12) = 0
    | open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 12
    | ) = 0
    |
    | root@freakazoid:~# kill -9 24179
    |
    |
    | packet capture attached.
    |
    | ----
    |
    | top lines from a previously run snmpd ( the one that started the debug process )
    |
    | USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    | root 1161 75.8 0.6 7216 3428 ? R 18:35 3:44 /usr/sbin/snmpd -P /var/run/snmpd.pid
    |

    hrStorageAllocationFailures OBJECT-TYPE
        SYNTAX     Counter32
        MAX-ACCESS read-only
        STATUS     current
        DESCRIPTION
            "The number of requests for storage represented by
            this entry that could not be honored due to not enough
            storage.  It should be noted that as this object has a
            SYNTAX of Counter32, that it does not have a defined
            initial value.  However, it is recommended that this
            object be initialized to zero, even though management
            stations must not depend on such an initialization."
        -- 1.3.6.1.2.1.25.2.3.1.7 -- ::= { hrStorageEntry 7 }

    So it looks like this is dying at something in the storage entry
    section, per the doc, rfc2790.txt.

    Time to look up this RFC - is this a bug in this particular version of
    net-snmp, or is it something the SolarWinds get is causing to break?
    - ---- END of ticket comment -----
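
    As a side note (not part of the original ticket): the numeric OID quoted
    above, 1.3.6.1.2.1.25.2.3.1.7, can be translated back to its
    HOST-RESOURCES-MIB name with the net-snmp library itself, which is a
    quick way to confirm the hrStorageAllocationFailures mapping. A small
    sketch, assuming the MIB files are installed where the library can find
    them:

    #include <net-snmp/net-snmp-config.h>
    #include <net-snmp/net-snmp-includes.h>
    #include <stdio.h>

    int main(void)
    {
        oid objid[MAX_OID_LEN];
        size_t objid_len = MAX_OID_LEN;
        char buf[256];

        init_snmp("oid-lookup");        /* also loads the installed MIB modules */

        /* the numeric OID from the quoted ticket */
        if (read_objid(".1.3.6.1.2.1.25.2.3.1.7", objid, &objid_len)) {
            snprint_objid(buf, sizeof(buf), objid, objid_len);
            /* prints HOST-RESOURCES-MIB::hrStorageAllocationFailures
             * when the HOST-RESOURCES-MIB module is loaded */
            printf("%s\n", buf);
        }
        return 0;
    }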

    I initially discovered this issue working with net-snmp 5.1.1. When I
    tested 5.1.2 on my system, the exact same behavior was observed. If I
    run net-snmp 5.0.9 the system runs perfectly ( exact same snmpd.conf
    config ... )

    What I find most interesting is this:

    1) If I walk the system using the net-snmp snmpwalk, everything works
    just fine - nothing out of the ordinary.
    2) If the system is walked by a product named Solarwinds, the system
    goes into the mtab loop.

    If there is any additional information I could provide, I can reproduce
    this issue at will. Let me know if you want to see anything.
    ==============================================

    Go back to net-snmp-5.0.9 and it works.

     
  • Robert Story

    Robert Story - 2004-10-06

    Logged In: YES
    user_id=76148

    Fix in cvs for 5.2 and 5.1.3. Until then, here is a quickie
    patch:

    diff -r5.5.2.9 hr_storage.c
    491c491
    < try_next:
    ---
    > really_try_next:
    783a784,789
    >
    >   try_next:
    >     if (!exact)
    >         goto really_try_next;
    >
    >     return NULL;
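
    To spell out what the patch does (my reading of it, not an official
    write-up): before the fix, "goto try_next" jumped straight back to the
    old try_next label at line 491, so an exact GET for a row whose storage
    stats could not be read never terminated. The fix renames that retry
    point to really_try_next and adds a new try_next label that only retries
    for non-exact (GETNEXT-style) requests; an exact GET now returns NULL so
    the request fails cleanly instead of spinning. A simplified,
    self-contained sketch of the patched control flow, with hypothetical
    helper names (row_exists, storage_stats_available):

    #include <stdio.h>

    static int row_exists(int row)              { return row <= 3; }  /* stub */
    static int storage_stats_available(int row) { return row != 2; }  /* stub */

    static const char *header_hrstoreEntry_sketch(int row, int exact)
    {
    really_try_next:                    /* was "try_next:" before the patch   */
        if (!row_exists(row))
            return NULL;                /* ran off the end of the table       */
        if (!storage_stats_available(row))
            goto try_next;              /* stats for this row not obtainable  */
        return "stats";                 /* normal case                         */

    try_next:                           /* label added by the patch            */
        if (!exact) {
            row++;                      /* GETNEXT: advance and retry          */
            goto really_try_next;
        }
        return NULL;                    /* exact GET: report nothing, no spin  */
    }

    int main(void)
    {
        printf("exact GET on row 2 -> %s\n",
               header_hrstoreEntry_sketch(2, 1) ? "value" : "NULL");
        printf("GETNEXT from row 2 -> %s\n",
               header_hrstoreEntry_sketch(2, 0) ? "value" : "NULL");
        return 0;
    }

    This would also fit the earlier observation that snmpwalk (GETNEXT)
    never triggered the loop while the SolarWinds poller's requests did.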
    
     
  • Robert Story

    Robert Story - 2005-04-25

    Logged In: YES
    user_id=76148

    Updated the title to help me find this bug faster next time.

     
