Re: [Nodebrain-users] Nagios integration and counting objects
From: Trettevik, Ed A <ed....@bo...> - 2013-12-02 21:08:46
Hi Dr Marco,
The Tree module was designed for use in classification and event data enrichment (adding attributes based on known attributes), and does not currently provide a mechanism for extracting the number of elements with a given value, although that could be added as an enhancement.
As an alternative you can manage counts in a separate variable or tree. Let’s say you wanted to maintain a count of the number of servers with a given resource/application down. When you assert the resource down or up, you can manage the count as part of the assertion.
assert servicedown("srv1","apache"),serversdown("apache")=serversdown("apache")+1;
assert ?servicedown("srv1","apache"),serversdown("apache")=serversdown("apache")-1;
However, this requires that you only make the assertion when changing state. If the assertion is performed as a rule action, you can add a test to the rule condition to only make the assertion on a state change.
Here’s an example using this approach.
#rules
define service node;
service. define host cell;
service. define app cell;
service. define state cell "up";
define servicedown node tree; # host,app down
define serversdown node tree:notfound=0; # number of servers down per app – start new entries at zero
service. define down on(state="down" and ?servicedown(host,app)) servicedown(host,app),serversdown(app)=serversdown(app)+1;
service. define up on(state="up" and servicedown(host,app)) ?servicedown(host,app),serversdown(app)=serversdown(app)-1;
define apacheDownLimit on(serversdown("apache")>2):-echo "just so you know I know there are three apache servers down"
# assertions
service. assert host="srv1",app="apache",state="down";
service. assert host="srv2",app="apache",state="down";
service. assert host="srv3",app="apache",state="down";
service. assert host="srv4",app="apache",state="down";
service. assert host="srv3",app="apache",state="up";
service. assert host="srv2",app="apache",state="up";
service. assert host="srv2",app="apache",state="down";
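To make the state-change guard concrete, here is a small model of the same bookkeeping in plain Python. It is only a sketch of the logic, not NodeBrain code: the class name ServiceCounter and its report method are made up for illustration, and the model assumes a report counts only when it changes a host's state, which is what the conditions on the down and up rules above are meant to enforce.

```python
class ServiceCounter:
    """Plain-Python model of the servicedown/serversdown trees (hypothetical names)."""

    def __init__(self):
        self.down = set()        # (host, app) pairs currently marked down
        self.servers_down = {}   # app -> number of hosts currently down

    def report(self, host, app, state):
        """Apply one state report; return True only if the count changed."""
        key = (host, app)
        if state == "down" and key not in self.down:
            self.down.add(key)
            self.servers_down[app] = self.servers_down.get(app, 0) + 1
            return True
        if state == "up" and key in self.down:
            self.down.remove(key)
            self.servers_down[app] -= 1
            return True
        return False             # repeated report: leave the count alone


# Same sequence as the assertions above.
reports = [
    ("srv1", "down"), ("srv2", "down"), ("srv3", "down"), ("srv4", "down"),
    ("srv3", "up"), ("srv2", "up"), ("srv2", "down"),
]

counter = ServiceCounter()
history = []
for host, state in reports:
    counter.report(host, "apache", state)
    history.append(counter.servers_down["apache"])

print(history)  # [1, 2, 3, 4, 3, 2, 3]
```

Without the membership checks in report, a repeated "down" for the same host would inflate the count, which is the problem the rule conditions guard against.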
You might also consider using a Cache node instead of a Tree node. A Cache node is like a Tree node in some respects, but entries don’t have values. Instead, the nodes of a Cache have counters upon which you can set thresholds. Without knowing your requirements better, I can’t say if this is a good approach for your use case.
# rules
define DownServiceServer node cache:(app[3],host);
DownServiceServer. define r1 if(app._kidState):$ -echo "just so you know I know there are three ${app} servers down"
# assertions
DownServiceServer. assert ("apache","srv1");
DownServiceServer. assert ("apache","srv2");
DownServiceServer. assert ("apache","srv1");
DownServiceServer. assert ("apache","srv3");
DownServiceServer. assert ("apache","srv4");
You may find that you need different rules for each resource/application. This is where the notion of rule compilers can come in handy. A rule compiler is a script you write, if necessary, and is based on an abstract model of how you want to monitor similar resources. You specify, in a configuration file of your design, a list of resources and monitoring parameters within your model. Your rule compiler then generates the appropriate NodeBrain rules using your configuration file as input. Using this approach, you might have a node for each resource.
# configuration file
apache,3
foobar,2
# rules generated by hypothetical rule compiler
define 'apache' node;
'apache'. define DownServer node cache:([3]:host);
'apache'.DownServer. define r1 if(_kidState):$ -echo "just so you know I know there are three apache servers down"
define 'foobar' node;
'foobar'. define DownServer node cache:([2]:host);
'foobar'.DownServer. define r1 if(_kidState):$ -echo "just so you know I know there are two foobar servers down"
# assertions
'apache'.DownServer. assert ("srv1");
'apache'.DownServer. assert ("srv2");
'apache'.DownServer. assert ("srv3");
'foobar'.DownServer. assert ("srv1");
'foobar'.DownServer. assert ("srv2");
The Caboodle NodeBrain Kit provides a framework for managing rules as XML documents from which compilers generate the actual NodeBrain rules. However, you may find you can get along just fine with a simpler approach of your own design, or you may develop something much more effective.
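As an illustration of the simpler do-it-yourself route, a rule compiler for the two-column configuration file shown earlier could be sketched in Python as follows. This is an assumption-laden sketch rather than anything shipped with NodeBrain or Caboodle: the function name compile_rules and the NUMBER_WORDS table are invented here, and the generated text simply copies the rule shapes from the hand-written example above.

```python
# Hypothetical rule compiler: turns "app,limit" lines into NodeBrain rule text.
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}


def compile_rules(config_text):
    """Return NodeBrain rule text for each "app,limit" line in config_text."""
    rules = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        app, limit_text = (field.strip() for field in line.split(",", 1))
        limit = int(limit_text)
        count = NUMBER_WORDS.get(limit, str(limit))  # fall back to digits
        rules.append(f"define '{app}' node;")
        rules.append(f"'{app}'. define DownServer node cache:([{limit}]:host);")
        rules.append(
            f"'{app}'.DownServer. define r1 if(_kidState):$ -echo "
            f"\"just so you know I know there are {count} {app} servers down\""
        )
    return "\n".join(rules)


CONFIG = """# configuration file
apache,3
foobar,2
"""

print(compile_rules(CONFIG))
```

The output could be written to a rule file and loaded by NodeBrain in the usual way. Adding a monitored application then means adding one line to the configuration file and rerunning the script.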
Seems I’ve strayed a bit from your question, to which the answer might have been simply “no”. ☺ But hopefully this helps in some way.
From: Marco Musso [mailto:ma...@mu...]
Sent: Monday, November 25, 2013 3:29 AM
To: nod...@li...
Subject: [Nodebrain-users] Nagios integration and counting objects
Hi fellow nodebrain users!
I'd like to submit a solution for my problem that you can probably improve...
Let's suppose we define a tree node to store the status of a resource (say, apache) on some servers (the total number of servers is dynamic and unknown):
define servers node tree;
and then populate the tree (via some servant scripts):
servers. assert ("srv1","apache")=0; # or alert servers("srv1","apache")=0
servers. assert ("srv2","apache")=0;
servers. assert ("srv3","apache")=0;
servers. assert ("srv4","apache")=0;
we'll get:
show servers
"srv1"
"apache"=0
"srv2"
"apache"=0
"srv3"
"apache"=0
"srv4"
"apache"=0
The goal is to count the number of servers with apache != 0 (i.e. resource not available).
The first thing I tried was:
define broken node tree; # a tree that contains servers without running apache
define r1 on(!servers(x,"apache")=0): broken(x); # very much like the tutorial (paragraph 6.3)
which should trigger on assert x="srv1", check the status, and eventually define "srv1"=1, like this:
servers. assert ("srv1","apache")=1;
Rule local.r1 fired (@.local.broken(@.local.x)=1)
show broken
broken = ! == node tree
"srv1"=1
To clear the state when apache is available again I can define another rule:
define r2 on(servers(x,"apache")=0) ?broken(x); # or broken(x)=0
servers. assert ("srv1","apache")=0;
Rule local.r2 fired (@.local.broken(@.local.x)=?)
show broken
broken = ! == node tree
This works as long as x has the value of a server (i.e. to trigger those rules I have to assert x=). To me this doesn't sound like an elegant solution (and probably I should have used IF/ALERT instead of ON/ASSERT).
Then there is the problem that I want to know how many servers are broken and call an adapter.
How can I count the cardinality of a tree (or the number of elements with a given property/value directly on the servers tree)?
With those questions in mind I started to think that the method I'm following is probably not the best (also because it too closely resembles standard programming logic): is there a better way?
TIA
— Dr. Marco Musso