From: Jamie C. <jca...@we...> - 2006-05-18 21:59:57
On 18/May/2006 16:07 Joaquim Homrighausen wrote ..
>
> I have a "master webmin"-server, that does little else apart from being
> just that. It runs quite a few monitors and collects data from other webmin
> servers and webmin monitors.
>
> On this server, I monitor a server we call "teacher". Quite frequently,
> but not according to any pattern I've been able to find, one or two of
> the monitors will report that "webmin has gone down", and then three minutes
> later, it comes back up.
>
> The exact messages are:
>
> "Monitor on teacher.domain.com for 'Postfix, teacher' has detected that
> Webmin is down at Thu May 18 18:21:01 2006"
>
> and then
>
> "Monitor on teacher.domain.com for 'Postfix, teacher' has detected that
> the service has gone back up at Thu May 18 18:24:01 2006"
>
> The monitors are configured as such:
>
> Postfix
> - failures before reporting 2
> - run on host teacher.domain.com
> - check on schedule? yes, and report on status changes
> Apache
> - failures before reporting 2
> - run on host teacher.domain.com
> - check on schedule? yes, and report on status changes
>
> I have at least 12 more monitors configured almost identically, none of
> them exhibit this behavior.
>
> The scheduled monitoring is set to check every 3 minutes with offset 0
> Send email when "when a service changes status"
>
> For your records, the services that are being monitored, postfix and apache,
> have never been down.
>
> Good thing I shave my head, or I'd be pulling my hair out right about now..

So you have multiple remote monitors for the teacher server, but only one
is failing like this? If so, is there any firewall between the two machines
that could be blocking ports 10000-10100?

- Jamie
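
One rough way to test the firewall question from the master server is to attempt TCP connections across the port range mentioned above and see how each attempt fails. The sketch below is only an illustration, not part of Webmin: the function names `probe_port` and `probe_range` are made up here, and the 2-second timeout is an arbitrary choice. The host name and the 10000-10100 range are taken from the thread. As a rule of thumb (an assumption, since firewalls can be configured either way), a "refused" result means the host answered but nothing is listening on that port, while a "timeout" is more typical of packets being silently dropped along the way.

```python
import socket


def probe_port(host, port, timeout=2.0):
    """Try one TCP connection; return 'open', 'refused', 'timeout' or 'error: ...'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The host replied with a reset: reachable, but no listener on this port.
        return "refused"
    except socket.timeout:
        # No reply within the timeout: often a sign of filtered traffic.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error: %s" % exc
    finally:
        sock.close()


def probe_range(host, first=10000, last=10100, timeout=2.0):
    """Probe every port from first to last inclusive; return {port: result}."""
    return {port: probe_port(host, port, timeout)
            for port in range(first, last + 1)}
```

Calling `probe_range("teacher.domain.com")` on the master server and looking for "timeout" entries would give a first hint. Because the reported failures are intermittent, a single clean run does not rule out a firewall, so it may be worth repeating the probe around the times the monitors report Webmin as down.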