man xymond in 4.3.10 says:
The flap-count setting should be at least (N/300)-1, e.g. if you set
flap-seconds to 3600 (1 hour), then flap-count should be at least
(3600/300)-1, i.e. 11.
How is this calculated? Does it assume that a status can change at most once
every 300 seconds (5 minutes)? If so, that assumption is incorrect. Even with
the default network test interval of 5 minutes, a failed test is re-tested a
minute later. I run network
tests every minute, and the re-test every 30 seconds. Some of my client
reports are sent every minute (and others at longer intervals), and the same
for my custom tests. So, can I use a much lower flap-seconds? If my test
flaps 11 times (including recoveries) in 5 minutes, and then recovers, I
don't want it persisting in alarm state for another 25 minutes, showering us
with more alerts...
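
To make the arithmetic concrete, here is a minimal Python sketch. It assumes
my reading of the man page is right, i.e. that the 300 in (N/300)-1 is the
shortest interval at which a status can change. The function names and the
interval parameter are hypothetical and are not anything xymond provides.

```python
def min_flap_count(flap_seconds, interval_seconds=300):
    """Man-page rule of thumb (N/300)-1, with the 300 made a parameter.

    Treating 300 as a tunable interval is an assumption, not documented
    xymond behaviour.
    """
    return flap_seconds // interval_seconds - 1


def flap_seconds_for(flap_count, interval_seconds=300):
    """Inverse of the rule above: the window that matches a given count."""
    return (flap_count + 1) * interval_seconds


print(min_flap_count(3600))        # 11, the man page's own example
print(min_flap_count(3600, 60))    # 59 if a status could change every minute
print(flap_seconds_for(11, 30))    # 360 seconds with 30-second re-tests
```

If that reading holds, keeping flap-count at 11 with 30-second re-tests would
correspond to a flap-seconds of about 6 minutes rather than an hour.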
This issue reminds me of a suggestion I believe I made before
(http://lists.xymon.com/archive/2011-December/033328.html), which is that
the flap-count and especially flap-seconds settings should be configurable on a
per-test basis, since some tests run more frequently than others, at least
on my system.
While I'm on this, I will also repeat my suggestion that the e-mail alerts
should say when a status is flapping. The web page shows this, but the
e-mail does not. The only way you can tell from the e-mail alone is that
the colours in the subject and the body of the e-mail do not match each
other.
In fact, there was much more in my previous e-mail, with perhaps a better
idea on how alerts should be sent out when a status is flapping:
This approach would also avoid sending out too many alerts as it would not
send out an alert when the status was not in alert state, and so we would
not be sending out more alerts after the final recovery.