#1683 5.3.1 Error while sending INFORMS

linux
open
nobody
agent (1105)
8
2012-11-08
2006-10-09
ebp1968
No

I've found that when the first inform is not
acknowledged by the trap-receiving system (after 5
retries), the agent no longer sends any informs to it.
After that, "snmpd: send_trap: USM unknown security
name (no such user exists)" appears in the log
every time a new inform needs to be sent.
After digging into the code, I've found that the
agent only tries to discover the peer engineID once per
'session':
send_v2trap -> send_trap_vars ->
send_enterprise_trap_vars -> netsnmp_send_traps ->
send_trap_to_sess -> snmp_async_send ->
snmp_sess_async_send -> _sess_async_send, which calls
snmpv3_engineID_probe. This function sets
SNMP_FLAGS_DONT_PROBE in session->flags to prevent
recursion, but if the remote engine doesn't answer,
this flag remains set. Later, when a new inform
needs to be sent, the agent doesn't try to obtain the
engineID again.

I've observed this behaviour using the net-snmp 5.3.0.1
and 5.3.1 libraries in a subagent module
through agentx (SNMPv3) running on Linux (2.4.20).

To correct this I've made the following modification
(clear the SNMP_FLAGS_DONT_PROBE flag if we get no
answer) in function snmpv3_engineID_probe in
snmplib/snmp_api.c:

    if (slp->session->securityEngineIDLen == 0) {
        DEBUGMSGTL(("snmp_api",
                    "unable to determine remote engine ID\n"));
        session->flags &= ~SNMP_FLAGS_DONT_PROBE;
        return 0;
    }

But this modification has two effects:
1) The engineID is requested for every inform while we have no
answer from the receiver (GOOD).
2) The informs are repeated many times when the
receiver acknowledges them (BAD!!!).
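
The flag behaviour described above can be illustrated with a small stand-alone model. This is only a sketch under assumptions: `demo_session`, `demo_probe` and `DEMO_FLAGS_DONT_PROBE` are hypothetical names, not the net-snmp definitions, and the real probe does network I/O that is replaced here by a `peer_answers` argument. It shows why a failed first probe blocks all later ones unless the flag is cleared on failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SNMP_FLAGS_DONT_PROBE; the value is arbitrary. */
#define DEMO_FLAGS_DONT_PROBE 0x100UL

/* Hypothetical, heavily reduced session: only what the model needs. */
struct demo_session {
    unsigned long flags;
    size_t        engine_id_len;  /* 0 means the peer engine ID is unknown */
};

/*
 * Simplified model of an engine ID probe.
 * peer_answers:     nonzero if the receiver replies to the probe.
 * clear_on_failure: nonzero to apply the proposed fix (clear the flag
 *                   when no engine ID was learned).
 * Returns 1 if the engine ID is known afterwards, 0 otherwise.
 */
static int demo_probe(struct demo_session *s, int peer_answers,
                      int clear_on_failure)
{
    assert(s != NULL);

    /* Probing is suppressed while the flag is set. */
    if (s->flags & DEMO_FLAGS_DONT_PROBE)
        return s->engine_id_len != 0;

    s->flags |= DEMO_FLAGS_DONT_PROBE;   /* guard against recursion */

    if (peer_answers)
        s->engine_id_len = 12;           /* arbitrary nonzero length */

    if (s->engine_id_len == 0) {
        if (clear_on_failure)
            s->flags &= ~DEMO_FLAGS_DONT_PROBE;  /* allow a later retry */
        return 0;
    }
    return 1;
}
```

With `clear_on_failure` set to 0, a session whose first probe fails never probes again, even once the receiver is reachable; with it set to 1, the next call retries. The model does not try to reproduce the repeated informs mentioned in point 2.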

Discussion

  • jordi-66

    jordi-66 - 2006-11-23

    Logged In: YES
    user_id=1627701
    Originator: NO

    I observed the same effect running version 5.4.rc3.
    I also noticed that this happens not only when the first inform is unacknowledged: if any inform in a sequence is not acknowledged, the agent stops sending all pending informs.
    The function snmpv3_engineID_probe() you mention above is the same in both versions.

     
  • Michael Kirkham

    Michael Kirkham - 2007-01-05

    Logged In: YES
    user_id=498198
    Originator: NO

    How are you configuring the inform target/session? Are you using trapsess in snmpd.conf and specifying passwords and no keys and engine IDs? I was checking out inform retry behavior for a client, and happened across this report. Can't comment on jordi-66's report, but here's what I think I see happening with trapsess. e.g.,

    trapsess -Ci -v 3 -u <name> -a MD5 -A <password> -l authNoPriv <ip>:<port>

    What happens at startup is snmpd parses the configuration file. It sees the trapsess command and tries to set up the target, notification, and user tables. usmUserTable is indexed in part by Engine ID. Since it can't get the Engine ID, it can't create the usmUserTable entry, and session creation fails. Something like this shows up in the debug logs:

    /usr/local/net-snmp/share/snmp/snmpd.conf: line 39: Error: snmpd: failed to parse this line or the remote trap receiver is down. Possible cause:
    snmpd: snmpd_parse_config_trapsess(): Timeout

    Because it failed, the session and/or relevant tables aren't set up as needed, and net-snmp only relies on those tables for deciding where to send notifications (for instance, when it sends the coldStart notification at startup, post-configuration). It won't re-try the session setup because that only happens during snmpd.conf file parsing. You can cause it to reload snmpd.conf by sending SIGHUP, and it will try again with the discovery and table setup.

    So if I start snmpd without the target app (happens to be snmptrapd on another machine) running, it tries discovery and fails and doesn't send it the coldStart inform. If I then stop snmpd, it doesn't even bother trying the nsNotifyShutdown inform. If, before stopping snmpd, I start up snmptrapd and send snmpd a SIGHUP, then it /does/ send the nsNotifyShutdown inform upon termination.

    If, instead of using trapsess, I use createUser (with -e to specify Engine ID), targetAddr, targetParams, etc. to populate the tables, the necessary entries are created. Without snmptrapd running, I can start snmpd, and it will try (and fail) to do discovery, and not send the coldStart inform. But because the tables were set up, I can stop snmpd and it will go ahead and attempt discovery to send the nsNotifyShutdown inform (and succeed if I started snmptrapd before killing snmpd, even without sending it a SIGHUP).

    This seems like it could be the situation you're having, as far as things go for the first inform not being acknowledged. The situation jordi-66 describes could be different.

     
  • Michael Kirkham

    Michael Kirkham - 2007-01-06

    Logged In: YES
    user_id=498198
    Originator: NO

    I modified receive() in snmpd.c (the main select() loop) to time out the select after 5 seconds and send an inform, so that I would receive notifications continuously every 5 seconds. I disconnected the machine where snmptrapd was running from the ethernet for ~ 10 seconds so that the inform would time out. I reconnected the ethernet, and it resumed normally. So that doesn't seem to be an issue.
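
    The kind of change described above, a select() call that times out so that periodic work can run, can be sketched as follows. This is not the actual modification to receive() in snmpd.c; `demo_loop` is a hypothetical helper that only counts timeouts at the point where the modified agent would send an inform.

    ```c
    #include <assert.h>
    #include <sys/select.h>
    #include <sys/time.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: run `iterations` rounds of select() on fd with a
     * timeout of timeout_ms milliseconds per round.
     * Returns the number of rounds that timed out, or -1 on select() error.
     */
    static int demo_loop(int fd, long timeout_ms, int iterations)
    {
        int timeouts = 0;

        assert(fd >= 0 && fd < FD_SETSIZE);

        for (int i = 0; i < iterations; i++) {
            fd_set readfds;
            struct timeval tv;
            char buf[64];

            /* select() may modify both arguments, so rebuild them each round. */
            FD_ZERO(&readfds);
            FD_SET(fd, &readfds);
            tv.tv_sec = timeout_ms / 1000;
            tv.tv_usec = (timeout_ms % 1000) * 1000;

            int n = select(fd + 1, &readfds, NULL, NULL, &tv);
            if (n < 0)
                return -1;
            if (n == 0) {
                /* Timeout: this is where a periodic inform would be sent. */
                timeouts++;
                continue;
            }
            /* Data is ready: consume it; stop on EOF or read error. */
            if (read(fd, buf, sizeof buf) <= 0)
                break;
        }
        return timeouts;
    }
    ```

    Calling `demo_loop(fd, 5000, n)` would correspond to the 5 second interval used in the test above.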

    However, I instead terminated snmptrapd, waited a while for informs to be sent and time out, and then started it up again. From that point on, snmptrapd didn't report any informs being received. Looking in the snmpd log I see it erroring due to being out of time sync:

    trace: trace: _snmp_parse(): snmp_api.c, 4091:
    _snmp_parse(): snmp_api.c, 4091:
    snmp_parse: snmp_parse: Parsed SNMPv3 message (secName:****, secLevel:authNoPriv): USM not in time window

    Unfortunately I tried this a few times and it seems like I got it to do it one more time but in several other attempts I couldn't. So I tried to force the issue: started snmptrapd and snmpd, waited for informs to be exchanged, stopped snmptrapd, moved my system clock forward about 10 minutes, and started snmptrapd again. That got them out of sync, and snmpd failed to re-sync. This looks like possibly a time (re-)synchronization bug with SNMPv3 informs, at least with the somewhat old version I'm testing with (may have already been fixed, may not).

    This seems to be the closest I can get to reproducing jordi-66's symptoms with my version. The informs are still being sent, but the (later) snmptrapd I'm testing with is rejecting them with a NotInTimeWindow report.

     
  • Michael Kirkham

    Michael Kirkham - 2007-01-06

    Logged In: YES
    user_id=498198
    Originator: NO

    Hm, nevermind. The timeliness issue was observer error on my part. I was a bit too hasty in my terminating and restarting snmptrapd and probably did not allow it to write out the boots/time before restarting. Testing again several times and giving it time to finish before restarting, it works fine, and I see (in the snmpd log) the report received and inform resent (and received successfully by the restarted snmptrapd).
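
    For context on why the stored boots/time values matter, the timeliness check an authoritative engine performs under RFC 3414 can be sketched roughly as below. This is a simplified illustration, not net-snmp code: `demo_in_time_window` and the `DEMO_` constants are hypothetical names, and only the 150 second window and the boots comparison are taken from the RFC.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define DEMO_TIME_WINDOW_SECS 150          /* window size from RFC 3414 */
    #define DEMO_MAX_BOOTS        2147483647   /* boots counter is latched here */

    /*
     * Simplified timeliness check on the authoritative side (for informs,
     * that is the receiver). Returns 1 if the message is inside the time
     * window, 0 if it would be answered with a notInTimeWindow report.
     */
    static int demo_in_time_window(int32_t local_boots, int32_t local_time,
                                   int32_t msg_boots, int32_t msg_time)
    {
        assert(local_boots >= 0 && local_time >= 0);

        if (local_boots == DEMO_MAX_BOOTS)
            return 0;                 /* engine must be reconfigured first */
        if (msg_boots != local_boots)
            return 0;                 /* sender has a stale or wrong boots value */

        int64_t diff = (int64_t)msg_time - (int64_t)local_time;
        if (diff < 0)
            diff = -diff;
        return diff <= DEMO_TIME_WINDOW_SECS;
    }
    ```

    If the receiver restarts without having saved its boots counter, the sender's cached boots value no longer matches and the second check fails until the sender re-synchronizes from the report.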

     
  • jordi-66

    jordi-66 - 2007-01-08

    Logged In: YES
    user_id=1627701
    Originator: NO

    Thanks for your accurate response, muonics.
    I've retested the issue running version 4.5 and it works fine (except if the coldStart inform is lost, as ebp1968 reported).
    I must have made a mistake in my test environment,
    so I retract my last post.
    Sorry for the inconvenience.

     
  • jordi-66

    jordi-66 - 2007-01-08

    Logged In: YES
    user_id=1627701
    Originator: NO

    I followed the steps pointed out by muonics and verified that the system recovers after sending SIGHUP once the inform receiver is running again.
    Unfortunately my real system is embedded and no administrative actions can be taken, so it must be robust enough to recover by itself.
    Therefore I need the solution you propose of setting up and populating the user tables. But I don't understand how you use the createUser token instead of trapsess (without trapsess snmpd v3 doesn't send informs, isn't it?). I also can't find the targetAddr and targetParams tokens you mention anywhere.
    Could you please be a bit more explicit about how to configure snmpd?
    The snmpd.conf that I currently use is:

    master agentx
    agentuser root
    agentxsocket /tmp/agentx
    agentXPerms 775 775 cgi-usr users
    leave_pidfile yes
    sysServices 72
    agentaddress udp:161
    createUser snmpadmin MD5 snmpadmin
    group v3_rw_grp usm snmpadmin
    view vall included .1
    view vnone excluded .1
    access v3_rw_grp "" any auth exact vall vall vall
    trapsess -C i -v 3 -u snmpadmin -a MD5 -A snmpadmin -l authNoPriv 172.16.2.8:162

    thanks.

     
  • Michael Kirkham

    Michael Kirkham - 2007-01-08

    Logged In: YES
    user_id=498198
    Originator: NO

    Looks like trapsess also accepts -e for specifying the authoritative Engine ID (which is that of the target for SNMPv3 informs), based on the snmpcmd manpage, so probably you can get by with that.

    without trapsess snmpd v3 doesn't send informs, isn't it?

    The trapsess command essentially does the same thing as populating usmUserTable, snmpTargetAddrTable, snmpTargetParamsTable and snmpNotifyTable individually/directly. One of the columns in snmpNotifyTable specifies the type of notifications to send (traps or informs). Things break down when that initial response never arrives: without it, snmpd can't get the right Engine ID for the usmUserTable row or create localized keys for the user. If you were to populate these tables directly via SNMP, you would be specifying all of the necessary information ahead of time; the same applies if you supply the Engine ID to trapsess or createUser, so that it need not rely on discovery succeeding at startup.

     
  • Erez Makavy

    Erez Makavy - 2007-06-27

    Logged In: YES
    user_id=1409986
    Originator: NO

    Hi,

    I've encountered exactly the same issue as ebp1968.
    I'm running net-snmp-5.3.1 on Linux, PPC.
    I'm using the snmpTargetMIB mechanism for configuring trap-managers.
    When the master agent is started and the trap-manager is down, the first inform triggers a probe which gets no response, and from that point on no probes are sent (thus no informs are sent to that trap-manager).

    I also found the same fix as ebp1968 suggested, and it works.
    I think it's a good fix, simple and correct.

    I'm glad I don't have to submit a new BUG, and hope the fix will appear in the next release.

    Why is this BUG not assigned to anyone?

    Thanks,
    Erez.

     
