From: micah m. <mi...@gm...> - 2005-05-21 00:24:02
|
Hello,

I'm looking at using munin-limits to alert on various problems with hosts. Looking at apache first, the existing plugins are:

apache_accesses - I can't think of an alert that would make sense here, since this just counts how many accesses your apache gets.

apache_processes (busy and idle) - an alert when all your processes are busy might be useful for knowing when you need to tune your apache.

apache_volume - perhaps an alert if volume gets really large?

It appears that the apache plugins have no warning/critical configuration options, and even if they did they would be of limited use. What I really want in terms of apache monitoring is the ability to get an alert when apache is *not* available. This could be achieved using the process monitor (although this doesn't have a warning/critical built in either), but there are times when apache is running but not responding, so in addition to a process monitor I would also want to test apache's responsiveness. In a typical monitoring/alerting environment this would be accomplished by attempting to connect and grab a URL; if this fails, generate an alert.

Is the solution to write a plugin that measures how long apache takes to respond on a host, and set [field].warning to a number of seconds and [field].critical to a higher number? Or is there already something that I have overlooked?

Thanks,
micah |
From: Patrick v. d. H. <pa...@wu...> - 2005-05-21 11:23:02
|
micah milano wrote:
[...]
> apache_accesses - I can't think of an alert that would make sense here
> since this just gathers how many accesses your apache gets.

Munin primarily does statistics, and after running it for some time you will know the usual average and deviation. If you leave that range, an alert might be reasonable. E.g. if your apache gets 10 hits each second and munin gives you 0 hits in the last sample, there might be a problem. Perhaps your network connection is down so you can't receive any requests any more? So personally I do see reasonable alerts based on the apache statistics.

[...]
> use. What I really want in terms of apache monitoring is the ability
> to get an alert when apache is *not* available. This could be achieved

In that case the existing apache plugins should report "NaN", indicating that they could not access the apache server to gather statistics. I don't think you can raise an alert if a plugin fails to gather data, but you should verify whether a "value.critical NaN" works....

> with using the process monitor (although this doesn't have a
> warning/critical built-in either), but there are times when apache is

Why should it? There is no need for a plugin to define warning/critical levels, just do it in your configuration.

[...]
> Is the solution to this to write a plugin that monitors how long
> apache responds on a host and set a [field].warning to a number of
> seconds, and [field].critical to a higher number? Or is there already
> something that I have overlooked?

You certainly missed nagios, BigBrother, monit etc., just to name a few. IMHO availability of services is not really the kind of data munin was expected to monitor. Of course you can write a munin plugin to verify that apache is running and return 1 if it is or 0 if it is not. Then you can do graphs, alerts, etc. on that plugin. How the plugin verifies that apache is running is up to you.
You can have your plugin report the time apache takes to answer your request. That would certainly be much more useful, but be careful about the timeout values you choose: they have to be shorter than munin's timeout for your plugin. What will your plugin report if there is no connection? NaN? Well, you already get that from the existing apache plugins if they fail to connect, so you gain little...

--
CU, Patrick. |
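A minimal sketch of the response-time plugin discussed in this message, written in Python. It assumes the usual Munin plugin convention: called with `config` it prints graph and field definitions (including `.warning` and `.critical`), otherwise it prints the current value, with `U` standing for "unknown" when no measurement could be taken. The URL, the field name `response`, the thresholds and the 5-second timeout are all hypothetical placeholders; the timeout in particular has to be chosen below whatever timeout your Munin installation applies to plugins.

```python
#!/usr/bin/env python3
"""Hypothetical Munin plugin: time a single HTTP request to apache."""
import http.client
import sys
import time
import urllib.request

URL = "http://127.0.0.1/"  # assumption: the URL to probe
TIMEOUT = 5.0              # assumption: must stay below Munin's plugin timeout
WARNING = 2                # assumption: seconds before a warning
CRITICAL = 4               # assumption: seconds before a critical alert


def config_lines():
    """Lines printed when the plugin is called with the 'config' argument."""
    return [
        "graph_title Apache response time",
        "graph_vlabel seconds",
        "response.label response time",
        f"response.warning {WARNING}",
        f"response.critical {CRITICAL}",
    ]


def measure(url=URL, timeout=TIMEOUT,
            opener=urllib.request.urlopen, clock=time.monotonic):
    """Return the seconds needed to fetch url, or None if the fetch failed."""
    start = clock()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts and HTTP error statuses.
        return None
    return clock() - start


def value_line(elapsed):
    """Format the measurement; 'U' marks the value as unknown."""
    if elapsed is None:
        return "response.value U"
    return f"response.value {elapsed:.3f}"


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        print("\n".join(config_lines()))
    else:
        print(value_line(measure()))


if __name__ == "__main__":
    main(sys.argv)
```

As noted above, a failed connection yields only an unknown value here, which is no more than the existing apache plugins already give you; whether an alert can be raised on an unknown value is something to verify on your own installation.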
From: Micah A. <mi...@ri...> - 2005-05-22 17:51:52
|
Patrick von der Hagen wrote on Saturday, 21 May 2005:
> micah milano wrote:
> [...]
> >use. What I really want in terms of apache monitoring is the ability
> >to get an alert when apache is *not* available. This could be achieved
> In that case the existing apache-plugins should report "NaN", indicating
> that they could not access the apache-server to gather statistics. I
> don't think you can raise an alert if a plugin fails to gather data, but
> you should verify whether a "value.critical NaN" works....

I'll give this a shot.

> >with using the process monitor (although this doesn't have a
> >warning/critical built-in either), but there are times when apache is
> Why should it? There is no need for a plugin to define warning/critical
> levels, just do it in your configuration.

Many plugins have warning/critical levels coded into them because they are sane levels, which can then be overridden by configuration options. So the only reason to add these to a plugin is to be nice to others using the plugin.

> [...]
> >Is the solution to this to write a plugin that monitors how long
> >apache responds on a host and set a [field].warning to a number of
> >seconds, and [field].critical to a higher number? Or is there already
> >something that I have overlooked?
> You certainly missed nagios, BigBrother, monit etc., just to name a few.
> IMHO availability of services is not really that kind of data munin was
> expected to monitor.

No, I didn't miss them. I've spent quite a bit of time with nagios, bigbrother, bigsister, monit, mon, pong, spong, hobbit monitor, argus, and others. For various reasons I did not want to use any of them. Some of those reasons: the code has undergone heavy bitrot and is not maintained; the code requires a maddening amount of configuration (nagios) just to monitor simple things; the documentation does not correspond to reality, so getting simple ping monitors set up requires asking on mailing lists and reading code.
Finally, the most important thing I realized was that the most viable of these solutions do the same thing munin does, from the other direction. Nagios does monitoring, and hey! it also does graphs. Munin does graphs, and hey! it does monitoring. I don't like the idea of having two sets of software, each requiring its own maintenance and configuration, each regularly polling all my systems asking how much disk space is used, etc., and then going back and churning out relatively similar graphs of varying quality. Depending on the statistics gathering you are doing, it can be a rather intensive process for munin to do this every 5 minutes; doing it from two separate vectors for roughly the same purpose seems silly to me. It's like running mrtg, munin, and cricket all at the same time and wondering why some of your systems have load spikes every 5 minutes.

This is, IMHO, why munin has its nagios plugin functionality. It makes sense that munin gathers this information and then passes the alerts on to something like nagios. There is a reason why munin-limits exists, along with the options for alerting. While it may not be as extensive and mature as nagios, I'm not a huge fan of that project: I have gone through the motions of setting it up a couple of times and collapsed under the sheer weight of configuration files I had to mess with. That said, I've heard nagios 2.0 is better, and I may look into it, but I will still want only one system to poll, and I frankly prefer munin due to the simplicity of its plugins.

Micah |
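The point made earlier in this message, that a plugin can ship sane default levels which configuration may then override, could be sketched roughly as follows in Python. The sketch assumes the plugin receives its overrides as environment variables (Munin can pass `env.NAME value` settings from a plugin's configuration into its environment); the variable names `warning` and `critical`, the field name `used` and the default numbers are hypothetical placeholders, not taken from any particular plugin.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: built-in default limits that configuration can override."""
import os

DEFAULT_WARNING = "92"   # assumption: placeholder default, e.g. percent used
DEFAULT_CRITICAL = "98"  # assumption: placeholder default


def limit_lines(field, environ=os.environ):
    """Return the .warning/.critical config lines for one field.

    Values found in the environment win over the built-in defaults.
    """
    warning = environ.get("warning", DEFAULT_WARNING)
    critical = environ.get("critical", DEFAULT_CRITICAL)
    return [
        f"{field}.warning {warning}",
        f"{field}.critical {critical}",
    ]


if __name__ == "__main__":
    # Print the limits as they would appear in the plugin's 'config' output.
    print("\n".join(limit_lines("used")))
```

A plugin written this way alerts sensibly out of the box, while anyone whose hosts need different levels changes only configuration and never the plugin code.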
From: Patrick v. d. H. <pa...@wu...> - 2005-05-22 23:16:01
|
Micah Anderson wrote:
> Patrick von der Hagen wrote on Saturday, 21 May 2005:
>
>>micah milano wrote:
[...]
>>>with using the process monitor (although this doesn't have a
>>>warning/critical built-in either), but there are times when apache is
>>
>>Why should it? There is no need for a plugin to define warning/critical
>>levels, just do it in your configuration.
>
> Many plugins have warning/critical levels coded into them because they
> are sane levels, which then can be overwritten by configuration
> options. So, the only need in adding these to a plugin are to be nice
> to others using the plugin.

I wanted to express that you can define limits for plugins that don't define limits themselves. Of course it can be reasonable for many plugins to define default limits, but usually those are plugins reporting percentages or similar values like 'disk almost full'. The number of http requests is specific to a given server, so default limits can't really be established.

[not using nagios and the like]
> Finally the most important thing I realized was that the most viable
> of these these solutions do the same thing munin does, from the other
> direction. Nagios does monitoring, and hey! it also does graphs. Munin
> does graphs, and hey! it does monitoring. I don't like the idea of

But their primary focus is on monitoring different things. I'll explain below.

> having two sets of software, each requiring their own maintenance and
> configuration, each regularly polling all my systems asking how much
> disk space is used, etc. and then going back and churning out
> relatively similar graphs of varying quality. Depending on the
> statistics gathering you are doing, this can be a rather intensive
> process for munin to do it every 5 minutes, doing it from two separate
> vectors for roughly the same purpose seems silly to me. Its like
> running mrtg, munin, and cricket all at the same time and wondering
> why some of your systems have load spikes every 5 minutes.
I agree with you very, very much. Certainly Nagios and munin are closely related and there are areas where they overlap. For example, both can verify that a given host still has disk space available.

However, I see differences. You want to check that a given service is running and answering requests. Often that service is not running on localhost, but on a remote host. That's exactly the kind of (active) check Nagios is very good at and which munin (currently) just doesn't handle. Usually such tests don't really need statistics and graphs; you just have to know whether the service is running or not. Since there are no statistics involved, munin doesn't really care about it.....

However, there are other tests where Nagios has problems. Those are the tests that don't just verify whether a service is somehow available, but instead require some kind of measurement to decide whether a test succeeds or fails. For example, "did the system process any mail in the last 5 minutes?" Munin is an excellent tool to gather such data and raise alerts if limits are exceeded.

I just tried to describe two separate kinds of tests network monitoring has to perform, one where Nagios wins (service availability over network connections) and one where munin is IMHO performing much better (gathering local data and alerting where certain statistical limits grow out of hand). It's a somewhat fuzzy distinction, not sure whether you can follow me here...

Basically, I see areas where munin is better and other areas where Nagios wins. munin-limits enables them to cooperate, so you can get the best of both applications. Nagios active checks are my first choice to verify that my smtp daemon is running, while munin is the first choice to verify that the mail queue is OK. The check you want to perform in your initial question, whether Apache is running or not, is IMHO one of those tests Nagios is good at. Still, you can define a munin plugin returning 0 if your apache is running and 1 otherwise.
Send an alert if that plugin reports 1.

> This is, IMHO, why munin has its nagios plugin functionality. It
> makes sense that munin gather this information, and then generate the

Service availability is not the kind of information munin is good at gathering.

--
CU, Patrick. |
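The 0/1 plugin suggested in this message might look like the following Python sketch. It reports 0 when something accepts TCP connections on the apache port and 1 otherwise, and declares a critical range of `0:0` so that a value of 1 falls outside it. The host, port, timeout and the field name `down` are hypothetical, and the `min:max` form of the limit is an assumption that should be checked against the documentation of your Munin version.

```python
#!/usr/bin/env python3
"""Hypothetical Munin plugin: 0 if apache accepts connections, 1 otherwise."""
import socket
import sys

HOST = "127.0.0.1"  # assumption: host running apache
PORT = 80           # assumption: port apache listens on
TIMEOUT = 3.0       # assumption: keep this below Munin's plugin timeout


def is_listening(host=HOST, port=PORT, timeout=TIMEOUT,
                 connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return False
    conn.close()
    return True


def report(up):
    """Format the value line: 0 means running, 1 means down."""
    return f"down.value {0 if up else 1}"


def config_lines():
    """Lines printed when the plugin is called with the 'config' argument."""
    return [
        "graph_title Apache availability",
        "graph_vlabel 1 = down",
        "down.label apache down",
        "down.critical 0:0",  # assumption: any value outside 0..0 is critical
    ]


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        print("\n".join(config_lines()))
    else:
        print(report(is_listening()))


if __name__ == "__main__":
    main(sys.argv)
```

A plain TCP connect only shows that something is listening on the port. It does not catch the "running but not responding" case from the original question; for that, the check would have to issue a real HTTP request.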