From: Joaquim H. <jo...@we...> - 2010-06-30 10:00:56
|
We have a number of servers all running Ubuntu 8.10 under VMware ESXi; nothing else on the servers is misbehaving or working in unexpected manners .. except for Webmin. In particular, the Webmin monitoring. It keeps reporting timeouts, monitors down, monitors up, etc. etc.

Yet when we actually check the services it complains about, they are never down.

We do monitoring via a "central" Webmin server that issues the various warnings/e-mails, it seems to have serious connection problems. But they are sporadic and erratic at best.

We have checked the internal and external firewalls, we have checked everything we can think of. And since it does work most of the time or "half" of the time ..

Right now, the only solution that I can think of would be that each server has its own monitoring and own e-mail alerts .. but that gets a bit messy.

-joho |
From: Jamie C. <jca...@we...> - 2010-06-30 20:39:29
|
On 30/Jun/2010 03:00 Joaquim Homrighausen <jo...@we...> wrote ..
>
> We have a number of servers all running Ubuntu 8.10 under VMware ESXi;
> nothing else on the servers is misbehaving or working in unexpected
> manners .. except for Webmin. In particular, the Webmin monitoring. It
> keeps reporting timeouts, monitors down, monitors up, etc. etc.
>
> Yet when we actually check the services it complains about, they are
> never down.
>
> We do monitoring via a "central" Webmin server that issues the various
> warnings/e-mails, it seems to have serious connection problems. But they
> are sporadic and erratic at best.
>
> We have checked the internal and external firewalls, we have checked
> everything we can think of. And since it does work most of the time or
> "half" of the time ..
>
> Right now, the only solution that I can think of would be that each
> server has its own monitoring and own e-mail alerts .. but that gets a
> bit messy.

Hi Joaquim,

Could the issue perhaps be networking problems between the monitoring Webmin system and the hosts being monitored? That could lead to false reports of outages.

Also, you might want to increase the number of failures before alerting for each monitor from 1 to 2 or 3. This will prevent false alarms due to short transient failures.

- Jamie |
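The "failures before alerting" idea can be sketched roughly as follows. This is only an illustration of the general technique in Python, not Webmin's own code; the names `MonitorState`, `failures_before_alert` and `alerts_for` are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class MonitorState:
    """Tracks consecutive failed checks for one monitored service."""

    # Hypothetical setting: 1 alerts on the first failure, 2 or 3 rides out blips.
    failures_before_alert: int = 3
    consecutive_failures: int = 0

    def record(self, check_ok: bool) -> bool:
        """Record one check result; return True when an alert should be sent."""
        if check_ok:
            # Any successful check resets the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Alert exactly once, at the moment the streak reaches the threshold.
        return self.consecutive_failures == self.failures_before_alert


def alerts_for(results, threshold):
    """Return, for each check result in order, whether it would trigger an alert."""
    state = MonitorState(failures_before_alert=threshold)
    return [state.record(ok) for ok in results]
```

With a threshold of 1, a single failed check between two good ones raises an alert; with a threshold of 3, only a run of three failures in a row does.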
From: Joaquim H. <jo...@we...> - 2010-07-01 08:12:45
|
We realized that we had a (virtual) SuSE 9.0 system being monitored from the central server. We, unfortunately, need that server to be in place since we need a binary distribution of MySQL 4.0.x (and it's hard to compile 4.0.x from source on a modern Linux system). It turns out that Webmin could not use SSL when talking to it because I had not noticed that the SSL (Perl) module that Webmin needs wasn't installed.

I had already removed the monitors for that server from the central Webmin and placed them directly on the SuSE 9.0 server when we realized this.

But as soon as this server was removed from the central monitoring, things began to act normally. So I guess a question could be how this could affect all the other monitors .. could it be that due to timeout reasons for this particular server, the ones "following" in the list were delayed so much that a timeout occurred, and thus generated a warning?

We found a heap of dead/zombie rpc processes too ...

-joho

On 06/30/2010 10:39 PM, Jamie Cameron wrote:
> Could the issue perhaps be networking problems between the monitoring
> Webmin system and the hosts being monitored? That could lead to false
> reports of outages.
>
> Also, you might want to increase the number of failures before alerting
> for each monitor from 1 to 2 or 3. This will prevent false alarms due to
> short transient failures.
>
> - Jamie
|
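The hypothesis above (one slow host delaying the checks "following" it in the list) can be shown with a little arithmetic. The sketch below assumes checks run strictly one after another and that a whole run has some overall deadline; both are assumptions made for illustration, not a description of how Webmin actually schedules its monitors, and the function names are hypothetical.

```python
def start_offsets(durations):
    """Seconds after the run begins at which each sequential check starts."""
    offsets, elapsed = [], 0.0
    for duration in durations:
        offsets.append(elapsed)
        elapsed += duration
    return offsets


def late_checks(durations, deadline):
    """Indexes of checks that finish later than `deadline` seconds into the run."""
    late, elapsed = [], 0.0
    for index, duration in enumerate(durations):
        elapsed += duration
        if elapsed > deadline:
            late.append(index)
    return late
```

For example, with durations of 1, 60, 1 and 1 seconds and a 30 second deadline, the second check (the one that hangs for 60 seconds) and both checks after it finish late, even though those later checks only take a second each.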
From: Jamie C. <jca...@we...> - 2010-07-01 17:19:09
|
Was Webmin perhaps trying to connect in non-SSL mode to an SSL server? That could take a long time to time out .. but shouldn't affect other tests from what I can see in the code.

On 01/Jul/2010 01:12 Joaquim Homrighausen <jo...@we...> wrote ..
>
> We realized that we had a (virtual) SuSE 9.0 system being monitored from
> the central server. We, unfortunately, need that server to be in place
> since we need a binary distribution of MySQL 4.0.x (and it's hard to
> compile 4.0.x from source on a modern Linux system). It turns out that
> Webmin could not use SSL when talking to it because I had not seen that
> the SSL (perl) module that Webmin needs wasn't installed.
>
> I had already removed the monitors for that server from the central
> Webmin and placed them directly on the SuSE 9.0 server when we realized
> this.
>
> But as soon as this server was removed from the central monitoring,
> things began to act normally. So I guess a question could be how this
> could affect all the other monitors .. could it be that due to timeout
> reasons for this particular server, the ones "following" in the list,
> were delayed so much that a timeout occurred, and thus generated a warning?
>
> We found a heap of dead/zombie rpc processes too ...
>
>
> -joho
>
>
> On 06/30/2010 10:39 PM, Jamie Cameron wrote:
> > Could the issue perhaps be networking problems between the monitoring
> > Webmin system and the hosts being monitored? That could lead to false
> > reports of outages.
> >
> > Also, you might want to increase the number of failures before alerting
> > for each monitor from 1 to 2 or 3. This will prevent false alarms due to
> > short transient failures.
> >
> > - Jamie
|
From: Joaquim H. <jo...@we...> - 2010-07-01 22:40:23
|
I can't recall exactly if that was the case, but it seems likely, or vice versa (?). Regardless, it did affect the other monitors' stability. Things have definitely quieted down after we removed it from the central monitoring. -joho |