Re: [Nodebrain-users] Nagios integration and counting objects
From: Trettevik, Ed A <ed....@bo...> - 2013-12-02 21:08:46
Hi Dr Marco,

The Tree module was designed for use in classification and event data enrichment (adding attributes based on known attributes), and does not currently provide a mechanism for extracting the number of elements with a given value, although that could be added as an enhancement. As an alternative, you can manage counts in a separate variable or tree. Let’s say you wanted to maintain a count of the number of servers with a given resource/application down. When you assert the resource down or up, you can manage the count as part of the assertion.

    assert servicedown("srv1","apache"),serversdown("apache")=serversdown("apache")+1;
    assert ?servicedown("srv1","apache"),serversdown("apache")=serversdown("apache")-1;

However, this requires that you only make the assertion when changing state; otherwise a repeated "down" assertion would increment the count twice for the same server. If the assertion is performed as a rule action, you can add a test to the rule condition to only make the assertion on a state change. Here’s an example using this approach.

    # rules
    define service node;
    service. define host cell;
    service. define app cell;
    service. define state cell "up";
    define servicedown node tree; # host,app down
    define serversdown node tree:notfound=0; # number of servers down per app; start new entries at zero
    service. define down on(state="down" and ?servicedown(host,app)) servicedown(host,app),serversdown(app)=serversdown(app)+1;
    service. define up on(state="up" and servicedown(host,app)) ?servicedown(host,app),serversdown(app)=serversdown(app)-1;
    define apacheDownLimit on(serversdown("apache")>2):-echo "just so you know I know there are three apache servers down"

    # assertions
    service. assert host="srv1",app="apache",state="down";
    service. assert host="srv2",app="apache",state="down";
    service. assert host="srv3",app="apache",state="down";
    service. assert host="srv4",app="apache",state="down";
    service. assert host="srv3",app="apache",state="up";
    service. assert host="srv2",app="apache",state="up";
    service. assert host="srv2",app="apache",state="down";

You might also consider using a Cache node instead of a Tree node. A Cache node is like a Tree node in some respects, but entries don’t have values. Instead, the nodes of a Cache have counters upon which you can set thresholds. Without knowing your requirements better, I can’t say if this is a good approach for your use case.

    # rules
    define DownServiceServer node cache:(app[3],host);
    DownServiceServer. define r1 if(app._kidState):$ -echo "just so you know I know there are three ${app} servers down"

    # assertions
    DownServiceServer. assert ("apache","srv1");
    DownServiceServer. assert ("apache","srv2");
    DownServiceServer. assert ("apache","srv1");
    DownServiceServer. assert ("apache","srv3");
    DownServiceServer. assert ("apache","srv4");

You may find that you need different rules for each resource/application. This is where the notion of rule compilers can come in handy. A rule compiler is a script you write, if necessary, and is based on an abstract model of how you want to monitor similar resources. You specify, in a configuration file of your design, a list of resources and monitoring parameters within your model. Your rule compiler then generates the appropriate NodeBrain rules using your configuration file as input. Using this approach, you might have a node for each resource.

    # configuration file
    apache,3
    foobar,2

    # rules generated by hypothetical rule compiler
    define 'apache' node;
    'apache'. define DownServer node cache:([3]:host);
    'apache'.DownServer. define r1 if(_kidState):$ -echo "just so you know I know there are three apache servers down"
    define 'foobar' node;
    'foobar'. define DownServer node cache:([2]:host);
    'foobar'.DownServer. define r1 if(_kidState):$ -echo "just so you know I know there are two foobar servers down"

    # assertions
    'apache'.DownServer. assert ("srv1");
    'apache'.DownServer. assert ("srv2");
    'apache'.DownServer. assert ("srv3");
    'foobar'.DownServer. assert ("srv1");
    'foobar'.DownServer. assert ("srv2");

The Caboodle NodeBrain Kit provides a framework for managing rules as XML documents from which compilers generate the actual NodeBrain rules. However, you may find you can get along just fine with a simpler approach of your own design, or you may develop something much more effective.

Seems I’ve strayed a bit from your question, to which the answer might have been simply “no”. ☺ But hopefully this helps in some way.

From: Marco Musso [mailto:ma...@mu...]
Sent: Monday, November 25, 2013 3:29 AM
To: nod...@li...
Subject: [Nodebrain-users] Nagios integration and counting objects

Hi fellow nodebrain users!

I'd like to submit a solution for my problem that you can probably improve...

Let's suppose we define a tree node to store the status of a resource (let's say apache) on some servers (the total number of servers is dynamic and unknown):

    define servers node tree;

and then populate the tree (via some servant scripts):

    servers. assert ("srv1","apache")=0; # or alert servers("srv1","apache")=0
    servers. assert ("srv2","apache")=0;
    servers. assert ("srv3","apache")=0;
    servers. assert ("srv4","apache")=0;

we'll get:

    show servers
    "srv1" "apache"=0
    "srv2" "apache"=0
    "srv3" "apache"=0
    "srv4" "apache"=0

The goal is to count the number of servers with apache != 0 (i.e. resource not available). The first thing I tried was:

    define broken node tree; # a tree that contains servers without running apache
    define r1 on(!servers(x,"apache")=0): broken(x); # very much like the tutorial (paragraph 6.3)

which should trigger on assert x="srv1", check the status and eventually define "srv1"=1, like this:

    servers. assert ("srv1","apache")=1;
    Rule local.r1 fired (@.local.broken(@.local.x)=1)
    show broken
    broken = ! == node tree
    "srv1"=1

To clear the state when apache is available again I can define another rule:

    define r2 on(servers(x,"apache")=0) ?broken(x); # or broken(x)=0
    servers. assert ("srv1","apache")=0;
    Rule local.r2 fired (@.local.broken(@.local.x)=?)
    show broken
    broken = ! == node tree

This works as long as x has the value of a server (i.e. to trigger those rules I have to assert x=). To me this doesn't sound like an elegant solution (and probably I should have used IF/ALERT instead of ON/ASSERT).

Then there is the problem that I want to know how many servers are broken and call an adapter. How can I count the cardinality of a tree (or the number of elements with a given property/value directly on the servers tree)?

With those questions in mind I started to think that probably the method I'm following is not the best (also because it too closely resembles standard programming logic): is there a better way?

TIA

--
Dr. Marco Musso
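
The rule compiler idea from the reply (a script that reads a configuration file of resources and thresholds and emits NodeBrain rules) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name compile_rules is hypothetical, the configuration format is the "resource,threshold" layout shown in the reply, and the rule templates are copied from the reply's generated rules rather than checked against a NodeBrain interpreter. For simplicity the echo message uses the numeric threshold instead of a spelled-out number.

```python
import csv
import io


def compile_rules(config_text):
    """Turn 'resource,threshold' lines into NodeBrain rule text.

    The rule templates are copied from the generated rules in the reply;
    this hypothetical helper only fills in the resource name and threshold.
    """
    lines = []
    for row in csv.reader(io.StringIO(config_text)):
        # Skip blank lines and '#' comment lines in the configuration file.
        if not row or row[0].lstrip().startswith("#"):
            continue
        name, threshold = row[0].strip(), int(row[1])
        lines.append(f"define '{name}' node;")
        lines.append(f"'{name}'. define DownServer node cache:([{threshold}]:host);")
        lines.append(
            f"'{name}'.DownServer. define r1 if(_kidState):$ -echo "
            f'"just so you know I know there are {threshold} {name} servers down"'
        )
    return "\n".join(lines)


# Sample input in the format used in the reply.
CONFIG = """\
# configuration file
apache,3
foobar,2
"""

if __name__ == "__main__":
    print(compile_rules(CONFIG))
```

Keeping the thresholds in a small configuration file means that adding a resource is a one-line change, and the generated rules can be regenerated and reloaded without editing them by hand.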