It would be a VERY nice feature to have.
We have been using Pandora here for some time now, and this "services"
feature is just the thing to have: since we monitor applications and
servers, it is sometimes more important to ensure that a combination of
monitors is running fine than that each monitor is fine by itself.
I was talking to a friend here, and he suggested an extra possibility:
adding something like a schedule to change the weight distribution over
time. For example, at the end of the month it would be very important to
have very low latency to the accounting database to ensure that everyone
gets their checks, but the rest of the month we could accept the database
running a little slow. Also, during this high-use period it would be very
important that the accounting application is running all day (24x7), but
during the rest of the month we could just ignore whether it's running or
not.
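
To make the schedule idea concrete, here is a minimal sketch in Python. It is
not anything Pandora provides: the function names, the 3-day "end of month"
window and the example weights are all made up for illustration.

```python
import calendar
from datetime import date


def is_month_end(day, window_days=3):
    """True if `day` falls within the last `window_days` days of its month.

    The 3-day default is an arbitrary assumption for this sketch.
    """
    last_day = calendar.monthrange(day.year, day.month)[1]
    return day.day > last_day - window_days


def scheduled_weight(day, normal_weight, month_end_weight, window_days=3):
    """Pick the weight of a module depending on the date (hypothetical helper)."""
    if is_month_end(day, window_days):
        return month_end_weight
    return normal_weight


# Accounting application: ignored (weight 0) most of the month,
# heavily weighted (3, an invented value) during the closing days.
mid_month = scheduled_weight(date(2009, 12, 11), normal_weight=0, month_end_weight=3)
closing = scheduled_weight(date(2009, 12, 30), normal_weight=0, month_end_weight=3)
```

A weight of 0 is how "just ignore it" is modelled here, since a module with
weight 0 never adds anything to the service sum.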
Just a couple of ideas ;)
On Fri, Dec 11, 2009 at 9:08 AM, Sancho Lerena <slerena@...> wrote:
> Hi everybody,
> Yesterday I had a nice conversation with some people from a really big
> company here in Spain about a feature they miss in Pandora: "service"
> monitoring. For them, a service is a "complex" definition of several
> things that could go wrong, where some things are more important than
> others, and which needs some "margin" to avoid alerting on every small or
> short-lived problem. They introduced me to the concept of "weights", which
> classifies problems using a "sum" of weights with a limit that fires the
> trigger. This is a concept above the technical stuff; it is a first
> approach to business monitoring.
> The idea is this:
> A service will be defined as a list of agent modules (which could be from
> different agents, of course), each with a numeric value (weight) that will
> be multiplied by 0.5 if the module is in warning status, and by 1 if it is
> in critical status. The service's total weight will be the SUM of these
> values over all the modules in that service. A service will be "WARNING"
> if the SUM is above "min. warning" for that service, and "CRITICAL" if the
> SUM is above "min. critical". Both thresholds will be parameters of the
> service definition.
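
A rough sketch of the calculation described above, in Python. The names
(`Element`, `service_sum`, `service_status`) and the sample modules are
hypothetical, not Pandora code. Two assumptions are baked in: a module in
normal status contributes 0, and a threshold counts as reached when the sum
is equal to it (the text says "above", so swap `>=` for `>` if a strict
comparison is intended).

```python
from dataclasses import dataclass

# Multipliers from the proposal: warning counts half, critical counts in full.
# Assumption: a module in normal ("ok") status contributes nothing.
STATUS_FACTOR = {"ok": 0.0, "warning": 0.5, "critical": 1.0}


@dataclass
class Element:
    name: str
    weight: float
    status: str  # "ok", "warning" or "critical"


def service_sum(elements):
    """Sum of weight * status multiplier over all elements of the service."""
    return sum(e.weight * STATUS_FACTOR[e.status] for e in elements)


def service_status(elements, min_warning, min_critical):
    """Compare the weighted sum against the two service thresholds."""
    total = service_sum(elements)
    # Assumption: thresholds are inclusive (see the note above).
    if total >= min_critical:
        return "critical"
    if total >= min_warning:
        return "warning"
    return "ok"


# Invented modules, only to exercise the functions.
elements = [
    Element("module A", 1, "critical"),  # contributes 1 * 1   = 1
    Element("module B", 2, "warning"),   # contributes 2 * 0.5 = 1
    Element("module C", 1, "ok"),        # contributes 0
]
total = service_sum(elements)
status = service_status(elements, min_warning=2, min_critical=4)
```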
> A service could also have "child" services, which work like any other
> module: the parent just watches the child's status and applies a weight
> to it.
> An external processor (a function implemented in the data server) will
> calculate the service status, doing the sum operations and storing the
> result in the database every XX minutes. The user could then view it from
> the console.
> Services could also be included in the visual map, like any other icon,
> and could have an alert template linked to them in order to notify when a
> service is going down or having problems.
> A service will also give a % of compliance for the whole service
> definition, based on the current weights of the modules that are not
> within valid margins.
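
The compliance figure is not defined precisely above, so the following is
only one possible reading, sketched in Python: 100% minus the share of the
service's total weight that is currently "consumed" by modules in warning or
critical. The formula and the name `compliance_percent` are assumptions, not
something taken from the proposal.

```python
def compliance_percent(modules):
    """Assumed formula: 100 * (1 - degraded weight / total weight).

    `modules` is a list of (weight, status) pairs, with status being
    "ok", "warning" or "critical".
    """
    factor = {"ok": 0.0, "warning": 0.5, "critical": 1.0}
    total = sum(weight for weight, _ in modules)
    if total == 0:
        # An empty (or all-zero-weight) service is treated as fully compliant.
        return 100.0
    degraded = sum(weight * factor[status] for weight, status in modules)
    return 100.0 * (1.0 - degraded / total)


# Invented example: total weight 4, one module of weight 1 in critical.
compliance = compliance_percent([(2, "ok"), (1, "critical"), (1, "ok")])
```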
> "Online Customer Service" is defined as:
> - "Web farm service"; a child service, with a weight of 2
> - Web browser avg. latency, weight 1.
> - External mail service, weight 1
> - FTP Service, weight 1
> - External router check, weight 1.
> - Backend latency, weight 1.
> - Database connection overload, weight 1.
> General parameters for "Online customer service":
> * Min. Warning: 2.5
> * Min. Critical: 4
> * Interval: 10 min.
> * History: yes
> * Alerts: Yes. Send an email.
> This means that if the FTP service is down and latency in the backend is
> pretty high, the service will be "OK", but if database connections are
> also in warning, the service indicator will go to "warning".
> "Web farm" is another service (configured as a child of "Online customer
> service"), and has the following configuration:
> - Server 1, weight 1
> - Server 2, weight 1
> - Server 3, weight 1
> - Server 4, weight 2
> - External latency, weight 1
> - Network balancer, weight 5
> - Router A, weight 3
> - Router B, weight 3
> * Min. Warning: 2
> * Min. Critical: 4
> * Interval: 5 min
> * History: Yes.
> * Alerts: No
> When the "Web farm service" module weights sum to more than 2, this
> service will be in "warning", and when they sum to more than 4, it will be
> "critical". This status could then be used in "Online customer service" to
> raise a weight of 2x0.5=1 for the "Web farm service" subcomponent if it is
> in warning status, or 2x1=2 if it is critical.
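
The two example services can be put together in a small Python sketch to
check the arithmetic. The `Service` class and `status_of` helper are
hypothetical, and module states are passed in as a plain dict (anything not
listed is taken to be "ok"). One assumption matters here: thresholds are
treated as inclusive (`>=`), because the "Online customer service" example
only reaches "warning" at exactly 2.5 if backend latency is counted as
critical.

```python
FACTOR = {"ok": 0.0, "warning": 0.5, "critical": 1.0}


def status_of(item, module_status):
    """Status of a child service (computed) or of a plain module (looked up)."""
    if isinstance(item, Service):
        return item.status(module_status)
    return module_status.get(item, "ok")  # unlisted modules count as "ok"


class Service:
    def __init__(self, name, min_warning, min_critical, elements):
        self.name = name
        self.min_warning = min_warning
        self.min_critical = min_critical
        self.elements = elements  # list of (module name or Service, weight)

    def weighted_sum(self, module_status):
        return sum(weight * FACTOR[status_of(item, module_status)]
                   for item, weight in self.elements)

    def status(self, module_status):
        total = self.weighted_sum(module_status)
        # Assumption: inclusive thresholds, see the note above.
        if total >= self.min_critical:
            return "critical"
        if total >= self.min_warning:
            return "warning"
        return "ok"


web_farm = Service("Web farm service", 2, 4, [
    ("Server 1", 1), ("Server 2", 1), ("Server 3", 1), ("Server 4", 2),
    ("External latency", 1), ("Network balancer", 5),
    ("Router A", 3), ("Router B", 3),
])
online = Service("Online customer service", 2.5, 4, [
    (web_farm, 2), ("Web browser avg. latency", 1),
    ("External mail service", 1), ("FTP Service", 1),
    ("External router check", 1), ("Backend latency", 1),
    ("Database connection overload", 1),
])

# FTP down and backend latency critical: 1 + 1 = 2, below 2.5.
case_ok = online.status({"FTP Service": "critical",
                         "Backend latency": "critical"})
# Database connections in warning on top of that: 2 + 0.5 = 2.5.
case_warning = online.status({"FTP Service": "critical",
                              "Backend latency": "critical",
                              "Database connection overload": "warning"})
# Web farm in warning (2 + 0.5 = 2.5) adds 2 x 0.5 = 1 to the parent.
farm_warning = {"Server 4": "critical", "Server 1": "warning"}
parent_sum = online.weighted_sum(farm_warning)
```

Keeping the child as just another weighted element means the parent never
needs to know what is inside the web farm, only its resulting status.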
> This could be an important feature for Pandora FMS 3.1 development. I
> would like to know if anyone here has an opinion about this.
> Thanks for your comments.
> Regards,
> Sancho Lerena <slerena@...>
> Pandora-develop mailing list