Re: [Mon-devel] Mon's Acknowledge system flawed?
From: David N. <vit...@cm...> - 2007-03-08 05:03:50
On 3/7/07, Augie Schwer <aug...@gm...> wrote:
> It seems that mon's ack system is flawed at least if it is used how
> the examples show multiple hosts in a watch group.
>
> Ack'ing an alert acks the service in the group, not the host in the
> group that is alerting, so for example if you have a host group for
> your web servers and you watch http; if http alerts on one of the
> hosts and you ack it, the rest of your web servers could go down and
> you would never know about it because the other host's http alerts
> would be suppressed.
>
> Is this expected behavior? Am I wrong to think that this is a flaw?

You are correct that the old mon 0.99.2 code exhibits this behavior.
The more recent code in CVS has a configurable feature that causes mon
to remove the ack state from a service if the summary component of the
failure message changes. In most common usage the summary is the list
of hosts that are failing, so additional hosts failing would remove an
ack.

There has also been some discussion in the past of adding true
per-host status tracking to Mon, but that proposal has never been
followed through on. (IIRC, we got bogged down in discussion of how we
would need to add structure to the data communicated between mon and
the monitor/alert scripts, and how to maintain backwards compatibility
with existing scripts.)

> I know the mon project is pretty much not maintained anymore, so if I
> don't get any response back I won't be surprised, but I thought I
> would float this question out there and see if I get any responses.

While that's an understandable conclusion based on the lack of a stable
release in approximately forever, there has been a lot of work since
the last release. The lack of a (declared) stable release has been in
part because of a lack of feedback on the development versions. In
fact I posted a release candidate for mon 1.2.0 back in September
(http://www.managedandmonitored.net/mon/) but I have received almost
no feedback on this version. In many cases I assume the mon users just
haven't had the opportunity to replace known-working systems or set up
parallel monitoring infrastructure.

-David
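
For readers who want to see the ack behavior described above in concrete terms, here is a minimal Python sketch. It is not mon's code (mon is written in Perl), and the class, method, and option names are all made up for illustration. It only models the idea that an ack covers the whole service, and that the newer behavior drops the ack when the failure summary (typically the list of failing hosts) changes.

```python
# Hypothetical model of the ack logic discussed in this thread.
# None of these names come from mon itself; they are illustrative only.

class ServiceState:
    def __init__(self, clear_ack_on_summary_change=True):
        # Assumed stand-in for the "configurable feature" in the CVS code.
        self.clear_ack_on_summary_change = clear_ack_on_summary_change
        self.acked = False
        self.last_summary = None

    def ack(self):
        """Acknowledge the current failure for the whole service."""
        self.acked = True

    def record_failure(self, summary):
        """Record a failed check; return True if an alert should be sent."""
        summary_changed = summary != self.last_summary
        if self.acked and self.clear_ack_on_summary_change and summary_changed:
            # A different set of failing hosts invalidates the old ack.
            self.acked = False
        self.last_summary = summary
        return not self.acked


def demo(clear_ack_on_summary_change):
    """Replay the scenario from the thread and collect alert decisions."""
    http = ServiceState(clear_ack_on_summary_change)
    results = [http.record_failure("web1")]          # first failure
    http.ack()                                       # operator acks it
    results.append(http.record_failure("web1"))      # same host still down
    results.append(http.record_failure("web1 web2")) # a second host fails
    return results
```

With `demo(False)` (the 0.99.2-style behavior) the second host's failure stays suppressed, which is the problem Augie describes. With `demo(True)` the changed summary clears the ack and the third check alerts again.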