nodebrain-users Mailing List for NodeBrain
Rule Engine for State and Event Monitoring
Brought to you by: trettevik
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2003 | | | 5 | | | 1 | 1 | 4 | | | | |
| 2004 | | | | | 1 | | | | | | | |
| 2006 | | | | | 2 | | | | | | | |
| 2013 | | | | | | | | | | | 1 | 2 |
| 2014 | | | | | | | | | | | | 1 |
From: Ed T. <ea...@no...> - 2014-12-15 02:15:55
|
NodeBrain Users,

Version 0.9.03 has been released and is available for download at SourceForge. See http://nodebrain.org for more info.

Ed
From: <mar...@km...> - 2013-12-10 09:15:32
|
On 02 Dec 2013, at 21:51, Trettevik, Ed A <ed....@bo...> wrote:

> Hi Dr Marco,

Hi Ed :)

> However, this requires that you only make the assertion when changing state. If the assertion is performed as a rule action, you can add a test to the rule condition to only make the assertion on a state change. Here's an example using this approach.

Thanks for pointing out this solution; in fact, I used it! I adapted the solution for our environment by creating some "nodes" in the tree to reflect the business service the servers are delivering, along with every KPI that we must check. (In fact, I spent the last few days writing the servants and adding a SQL backend to store some meaningful data; I read somewhere about a JournalKit that may do that.)

> You might also consider using a Cache node instead of a Tree node. A Cache node is like a Tree node in some respects, but entries don't have values. Instead, the nodes of a Cache have counters upon which you can set thresholds. Without knowing your requirements better, I can't say if this is a good approach for your use case.

I definitely will, because I don't want a too-reactive NodeBrain instance: the concept is indeed similar to the SOFT/HARD states in Nagios, but of course the cache is way more powerful… I hope to get to that soon.

> You may find that you need different rules for each resource/application. This is where the notion of rule compilers can come in handy. A rule compiler is a script you write, if necessary, and is based on

This is another good hint...

> The Caboodle NodeBrain Kit provides a framework for managing rules as XML documents from which compilers generate the actual NodeBrain rules. However, you may find you can get along just fine with a simpler approach of your own design, or you may develop something much more effective.
>
> Seems I've strayed a bit from your question, to which the answer might have been simply "no". ☺ But hopefully this helps in some way.

Always helpful, thanks!

bye
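Ed's rule-compiler suggestion, quoted above and given in full with an example configuration in his 2013-12-02 message below, can be sketched in a few lines. The Python below is a hypothetical compiler, not part of NodeBrain or the Caboodle kit: it assumes a configuration file of `resource,threshold` lines like Ed's `apache,3` and `foobar,2`, and emits rules in the same shape as his hand-written output. The function names and the small number-to-word table are illustrative assumptions.

```python
# Hypothetical rule compiler (not part of NodeBrain): reads "resource,threshold"
# lines and emits rules shaped like Ed's hand-written example.
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
                6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}


def parse_config(text):
    """Return (resource, threshold) pairs, skipping blank lines and # comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, threshold = (field.strip() for field in line.split(","))
        entries.append((name, int(threshold)))
    return entries


def compile_rules(entries):
    """Generate one node, one counting cache and one threshold rule per resource."""
    lines = []
    for name, threshold in entries:
        count = NUMBER_WORDS.get(threshold, str(threshold))
        lines.append(f"define '{name}' node;")
        lines.append(f"'{name}'. define DownServer node cache:([{threshold}]:host);")
        lines.append(
            f"'{name}'.DownServer. define r1 if(_kidState):$ -echo "
            f"\"just so you know I know there are {count} {name} servers down\""
        )
    return "\n".join(lines)


if __name__ == "__main__":
    config = "# configuration file\napache,3\nfoobar,2\n"
    print(compile_rules(parse_config(config)))
```

Keeping the monitoring model in the compiler means adding a resource is a one-line configuration change, while the generated rules stay as simple as the hand-written ones.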
From: Trettevik, Ed A <ed....@bo...> - 2013-12-02 21:08:46
|
Hi Dr Marco,

The Tree module was designed for use in classification and event data enrichment (adding attributes based on known attributes), and does not currently provide a mechanism for extracting the number of elements with a given value, although that could be added as an enhancement.

As an alternative, you can manage counts in a separate variable or tree. Let's say you wanted to maintain a count of the number of servers with a given resource/application down. When you assert the resource down or up, you can manage the count as part of the assertion.

    assert servicedown("srv1","apache"),serversdown("apache")=serversdown("apache")+1;
    assert ?servicedown("srv1","apache"),serversdown("apache")=serversdown("apache")-1;

However, this requires that you only make the assertion when changing state. If the assertion is performed as a rule action, you can add a test to the rule condition to only make the assertion on a state change. Here's an example using this approach.

    # rules
    define service node;
    service. define host cell;
    service. define app cell;
    service. define state cell "up";
    define servicedown node tree;             # host,app down
    define serversdown node tree:notfound=0;  # number of servers down per app - start new entries at zero
    service. define down on(state="down" and ?servicedown(host,app)) servicedown(host,app),serversdown(app)=serversdown(app)+1;
    service. define up on(state="up" and servicedown(host,app)) ?servicedown(host,app),serversdown(app)=serversdown(app)-1;
    define apacheDownLimit on(serversdown("apache")>2):-echo "just so you know I know there are three apache servers down"

    # assertions
    service. assert host="srv1",app="apache",state="down";
    service. assert host="srv2",app="apache",state="down";
    service. assert host="srv3",app="apache",state="down";
    service. assert host="srv4",app="apache",state="down";
    service. assert host="srv3",app="apache",state="up";
    service. assert host="srv2",app="apache",state="up";
    service. assert host="srv2",app="apache",state="down";

You might also consider using a Cache node instead of a Tree node. A Cache node is like a Tree node in some respects, but entries don't have values. Instead, the nodes of a Cache have counters upon which you can set thresholds. Without knowing your requirements better, I can't say if this is a good approach for your use case.

    # rules
    define DownServiceServer node cache:(app[3],host);
    DownServiceServer. define r1 if(app._kidState):$ -echo "just so you know I know there are three ${app} servers down"

    # assertions
    DownServiceServer. assert ("apache","srv1");
    DownServiceServer. assert ("apache","srv2");
    DownServiceServer. assert ("apache","srv1");
    DownServiceServer. assert ("apache","srv3");
    DownServiceServer. assert ("apache","srv4");

You may find that you need different rules for each resource/application. This is where the notion of rule compilers can come in handy. A rule compiler is a script you write, if necessary, and is based on an abstract model of how you want to monitor similar resources. You specify, in a configuration file of your design, a list of resources and monitoring parameters within your model. Your rule compiler then generates the appropriate NodeBrain rules using your configuration file as input. Using this approach, you might have a node for each resource.

    # configuration file
    apache,3
    foobar,2

    # rules generated by hypothetical rule compiler
    define 'apache' node;
    'apache'. define DownServer node cache:([3]:host);
    'apache'.DownServer. define r1 if(_kidState):$ -echo "just so you know I know there are three apache servers down"
    define 'foobar' node;
    'foobar'. define DownServer node cache:([2]:host);
    'foobar'.DownServer. define r1 if(_kidState):$ -echo "just so you know I know there are two foobar servers down"

    # assertions
    'apache'.DownServer. assert ("srv1");
    'apache'.DownServer. assert ("srv2");
    'apache'.DownServer. assert ("srv3");
    'foobar'.DownServer. assert ("srv1");
    'foobar'.DownServer. assert ("srv2");

The Caboodle NodeBrain Kit provides a framework for managing rules as XML documents from which compilers generate the actual NodeBrain rules. However, you may find you can get along just fine with a simpler approach of your own design, or you may develop something much more effective.

Seems I've strayed a bit from your question, to which the answer might have been simply "no". ☺ But hopefully this helps in some way.

From: Marco Musso [mailto:ma...@mu...]
Sent: Monday, November 25, 2013 3:29 AM
To: nod...@li...
Subject: [Nodebrain-users] Nagios integration and counting objects
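Ed's transition-guarded counting (the `down`, `up` and `apacheDownLimit` rules above) can be checked against his assertion sequence with a small model. This Python sketch is not NodeBrain: a set stands in for the `servicedown` tree, a `defaultdict(int)` for `serversdown` with `notfound=0`, and the `apacheDownLimit` ON rule is modeled as firing only when `serversdown("apache") > 2` changes from false to true. The class and method names are hypothetical.

```python
from collections import defaultdict


class ServiceMonitor:
    """Model of Ed's servicedown/serversdown rules (illustrative, not NodeBrain)."""

    def __init__(self, limit_app="apache", limit=2):
        self.servicedown = set()             # (host, app) pairs currently down
        self.serversdown = defaultdict(int)  # per-app count; new entries start at 0
        self.limit_app = limit_app
        self.limit = limit
        self.limit_was_true = False          # ON rules fire only on a false->true change
        self.alerts = []

    def assert_state(self, host, app, state):
        key = (host, app)
        # "down" rule: only when the pair is not already marked down
        if state == "down" and key not in self.servicedown:
            self.servicedown.add(key)
            self.serversdown[app] += 1
        # "up" rule: only when the pair is currently marked down
        elif state == "up" and key in self.servicedown:
            self.servicedown.remove(key)
            self.serversdown[app] -= 1
        self._check_limit()

    def _check_limit(self):
        count = self.serversdown[self.limit_app]
        now_true = count > self.limit
        if now_true and not self.limit_was_true:
            self.alerts.append(f"{count} {self.limit_app} servers down")
        self.limit_was_true = now_true


if __name__ == "__main__":
    m = ServiceMonitor()
    for host, state in [("srv1", "down"), ("srv2", "down"), ("srv3", "down"),
                        ("srv4", "down"), ("srv3", "up"), ("srv2", "up"),
                        ("srv2", "down")]:
        m.assert_state(host, "apache", state)
    print(m.serversdown["apache"], m.alerts)
```

Replaying Ed's seven assertions leaves the apache count at 3, and the limit rule fires twice: once when srv3 goes down, and again when srv2 goes back down after the count had dropped to 2. Repeating a "down" assertion for a server that is already down leaves the count unchanged, which is why Ed's rule conditions test `servicedown(host,app)` before adjusting the counter.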
From: Marco M. <ma...@mu...> - 2013-11-25 11:56:53
|
Hi fellow nodebrain users!

I'd like to submit a solution for my problem that you can probably improve...

Let's suppose we define a tree node to store the status of a resource (let's say apache) on some servers (the total number of servers is dynamic and unknown):

    define servers node tree;

and then populate the tree (via some servant scripts):

    servers. assert ("srv1","apache")=0;  # or alert servers("srv1","apache")=0
    servers. assert ("srv2","apache")=0;
    servers. assert ("srv3","apache")=0;
    servers. assert ("srv4","apache")=0;

we'll get:

    show servers
    "srv1" "apache"=0
    "srv2" "apache"=0
    "srv3" "apache"=0
    "srv1" "apache"=0

The goal is to count the number of servers with apache != 0 (i.e. resource not available). The first thing I tried was:

    define broken node tree;  # a tree that contains servers without running apache
    define r1 on(!servers(x,"apache")=0): broken(x);  # very much like the tutorial (paragraph 6.3)

which should trigger on assert x="srv1", check the status and eventually define "srv1"=1, like this:

    servers. assert ("srv1","apache")=1;
    Rule local.r1 fired (@.local.broken(@.local.x)=1)
    show broken
    broken = ! == node tree
    "srv1"=1

To clear the state when apache is available again I can define another rule:

    define r2 on(servers(x,"apache")=0) ?broken(x);  # or broken(x)=0
    servers. assert ("srv1","apache")=0;
    Rule local.r2 fired (@.local.broken(@.local.x)=?)
    show broken
    broken = ! == node tree

This works as long as x has the value of a server (i.e. to trigger those rules I have to assert x=). To me this doesn't seem like an elegant solution (and probably I should have used IF/ALERT instead of ON/ASSERT). Then there is the problem that I want to know how many servers are broken and call an adapter. How can I count the cardinality of a tree (or the number of elements with a given property/value directly on the servers tree)?

With those questions in mind I started to think that the method I'm following is probably not the best (also because it resembles standard programming logic too closely): is there a better way?

TIA,
Dr. Marco Musso
From: Trettevik, Ed A <ed....@bo...> - 2006-05-10 17:15:10
|
Hi Luc,

An identity is just a name with an associated key. When you define the same identity to two or more NodeBrain processes that communicate (peers), the identity must be defined with the same key, although they are not required to both use the private form of the key---one may use the public form of the identity portrayed by the other. You not only "can" use the same identity on different nodes, you "must" define the same identity on different nodes if you want to communicate between them.

You can choose how you want to map identities to machines, people, accounts on machines, and NodeBrain processes on machines. Here are two very different possibilities. Consider a single application of NodeBrain including multiple NodeBrain processes (agents and clients) on multiple machines running under multiple machine accounts.

1) Generate one identity and copy it into the read-protected $HOME/.nb/private.nb of every machine account for the application that executes NodeBrain on every machine, and have every NodeBrain process portray this single identity. This is the easiest to manage, and may be appropriate in some applications of NodeBrain. However, this creates a "fully trusted" relationship between the accounts on all the machines that in many cases will not provide the appropriate level of security. You must consult the security policies in your environment and your own judgment.

2) Generate a unique identity for every NodeBrain agent within a set that you want to communicate. Each identity's private key is defined only on the machine where the process runs, and only in the $HOME/.nb/private.nb file of the machine account (user) that runs the process. Also generate a unique identity for any other machine accounts that will execute NodeBrain as a client to these agents. For every peer (agent and client) that you want to be able to communicate with a given agent process, store the public form of the agent's identity declaration in $HOME/.nb/private.nb and use the RANK command to give them the appropriate level of authority.

In the second case you will want to come up with a naming convention for your unique identity names. For client accounts you can use a combination of host name and user name. If the user name is "charlie" and the host name is "snoopy", you might use SnoopyCharlie as the identity name. For an agent identity you may want to combine the process name with the host name. For process "appmon" on machine "goofy", you could use "GoofyAppmon" as the identity name. The brain declaration would use "GoofyAppmon@goofy".

If you have several agents running under the same machine account (user) and there will be no variation in access granted to them, then you can use the same identity for all of the agents. In this case, you can use an application name in place of the process names to reduce the number of identities. Say the application is WeatherMon and it has 5 NodeBrain agent processes running under the "weather" account on machine "rainy". An identity name of RainyWeatherMon could be used for all 5 agents on rainy. If the machine account "weather" is an account set up for the WeatherMon application, then you could name the agent identities like our client identity SnoopyCharlie example above---RainyWeather.

If the WeatherMon application runs on three machines (rainy, sunny, and cloudy) and you are willing to establish a fully trusted relationship between these machines for the weather account, then you may want to just use weather as the identity on all three machines. Here we have relaxed to option (1) above for the agents but still follow option (2) for the clients. The brain declarations would be "weather@rainy", "weather@sunny" and "weather@cloudy", and the identity weather would have the same declaration in /home/weather/.nb/private.nb on all three machines (assuming the home directory for the weather account is /home/weather).

So you see, you have a lot of choices and it is admittedly complicated.

The NodeBrain protocol NBP is not designed to participate in a public key infrastructure. A NodeBrain identity's public key is only public within a community of administrators. NBP public and private keys are both managed as secret keys on a controlled set of machines and accounts. NodeBrain, or at least NBP, is not intended for dynamic peer-to-peer applications.

I should point out that NBP was designed before I became an SSH and SSL user. In some future release of NodeBrain I expect to include additional authentication options based on open-source SSH and SSL packages to reduce the learning curve for people familiar with those protocols and simplify key management for larger applications. It is also relatively easy for developers to write a skill module (plug-in) to implement any peer-to-peer protocol for NodeBrain that is desired as an alternative to NBP.

I expect you will find the NodeBrain identity scheme relatively simple once you get a simple answer to your questions. :) You can say that myid (the identity you referenced in the document) is an identity for a process, a person, a machine account (user), or a machine, depending on where you declare it with a private key, whether or not more than one process portrays it, and whether or not machine accounts are shared by multiple people. So you can be absolutely right if your configuration matches your concept of what myid identifies.

Let me know if you have more questions.

Ed Trettevik

-----Original Message-----
From: Luc Stepniewski [mailto:luc...@ad...]
Sent: Wednesday, May 10, 2006 3:32 AM
To: nod...@li...
Subject: [Nodebrain-users] Can I have the same userid on different nodes(machines)?
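The naming convention Ed describes for option (2) is mechanical enough to script when many hosts are involved. The Python below is an illustrative sketch with hypothetical function names: it only builds name strings the way Ed's SnoopyCharlie, GoofyAppmon and RainyWeatherMon examples do, plus the `identity@host` form that appears in his brain declarations. It does not generate keys or edit `$HOME/.nb/private.nb`.

```python
def _cap(part):
    """Upper-case the first letter only, so 'WeatherMon' keeps its inner capital."""
    return part[:1].upper() + part[1:]


def identity_name(host, name):
    """Host name followed by a user, process, or application name."""
    return _cap(host) + _cap(name)


def brain_address(identity, host):
    """The identity@host form used in Ed's brain declaration examples."""
    return f"{identity}@{host}"


if __name__ == "__main__":
    print(identity_name("snoopy", "charlie"))     # client identity: SnoopyCharlie
    agent = identity_name("goofy", "appmon")      # agent identity: GoofyAppmon
    print(brain_address(agent, "goofy"))          # GoofyAppmon@goofy
    print(identity_name("rainy", "WeatherMon"))   # shared by all 5 agents: RainyWeatherMon
```

When the relaxed, fully trusted arrangement is chosen instead, Ed's example drops the host prefix and uses the account name alone, giving `brain_address("weather", host)` values of weather@rainy, weather@sunny and weather@cloudy.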
From: Luc S. <luc...@ad...> - 2006-05-10 10:31:47
|
Hello,

In the NodeBrain User Guide (0.6.4), chapter 1.4 (User Account Configuration), I find that the naming of identities is a bit misleading.

If I follow the examples in chapter 1.4, I create a "user account" (you name it 'myid') on a machine (using identify), then I create a brain, which is correctly identified by myid@localhost (notice that the hostname is correctly present).

But later in the same chapter, you specify that if I want to communicate with a remote host, I have to define its identity in the same file. That declaration is exactly the same as for a local user declaration. It is not specified that it is remote (except for the second parameter, which is 0). So that means that I can't have a user named foobar on two distinct machines.

My conclusion is that what you call myid should be viewed more as 'the unique machine name', and not as a username on one of the machines. Am I right?

Luc Stepniewski
From: <ben...@id...> - 2004-05-22 12:54:59
|
Dear Open Source developer,

I am doing a research project on "Fun and Software Development" in which I kindly invite you to participate. You will find the online survey at http://fasd.ethz.ch/qsf/. The questionnaire consists of 53 questions and takes about 15 minutes to complete.

With the FASD project (Fun and Software Development) we want to define the motivational significance of fun when software developers decide to engage in Open Source projects. What is special about our research project is that a similar survey is planned with software developers in commercial firms. This procedure allows the immediate comparison between the involved individuals and the conditions of production of these two development models. Thus we hope to obtain substantial new insights into the phenomenon of Open Source development.

With many thanks for your participation,
Benno Luthiger

PS: The results of the survey will be published at http://www.isu.unizh.ch/fuehrung/blprojects/FASD/. We have set up the mailing list fa...@we... for this study. Please see http://fasd.ethz.ch/qsf/mailinglist_en.html for registration to this mailing list.

Benno Luthiger
Swiss Federal Institute of Technology Zurich
8092 Zurich
Mail: benno.luthiger(at)id.ethz.ch
From: <gh...@rl...> - 2003-08-07 22:45:50
|
OK, I just am not getting it; NodeBrain does not respond the way I would expect from reading the manual. Here I set a rule, then I alert a number of times. When the rule condition becomes true, the rule fires. Then the rule fires for each alert after that, even though none of those alerts assigns any of the variables used in the rule!!! What's up?

-Gilbert.

Log below:

    N o d e B r a i n 0.5.4 2003/07/22
    NBP 0.0.1
    Copyright (C) 1998-2003 The Boeing Company
    GNU General Public License
    ----------------------------------------------------------------
    /usr/bin/nb

    Date       Time     Message
    ---------- -------- --------------------------------------------
    2003/08/07 05:17:24 NB000I NodeBrain nb rct[6106]
    2003/08/07 05:17:24 NB000I Daemon log is /test/tmp/monitoring/policy_es.log
    2003/08/07 05:17:24 NB000I FIFO Listener fifoinput enabled on /test/tmp/monitoring/policy_es_fifo
    2003/08/07 05:17:24 NB000T FIFO /test/tmp/monitoring/policy_es_fifo
    > define P_2 if (M_21_Warning >= M_21_numnodes):- echo "Policy fired" >> /tmp/policy_test.txt
    > alert M_20_OK=1,M_20_Warning=0,M_20_numnodes=1,M_20_state=0;
    > alert M_10_OK=1,M_10_Warning=0,M_10_numnodes=1,M_10_state=0;
    > alert M_13_OK=0,M_13_Warning=0,M_13_numnodes=1,M_13_state=2;
    > alert M_18_OK=1,M_18_Warning=0,M_18_numnodes=1,M_18_state=0;
    > alert M_21_OK=0,M_21_Warning=1,M_21_numnodes=1,M_21_state=1;
    2003/08/07 05:17:27 NB000I Rule "P_2" fired
    > - echo "Policy fired" >> /tmp/policy_test.txt
    > alert M_24_OK=1,M_24_Warning=0,M_24_numnodes=1,M_24_state=0;
    2003/08/07 05:17:27 NB000I Rule "P_2" fired
    > - echo "Policy fired" >> /tmp/policy_test.txt
    > alert M_25_OK=1,M_25_Warning=0,M_25_numnodes=1,M_25_state=0;
    2003/08/07 05:17:27 NB000I Rule "P_2" fired
    > - echo "Policy fired" >> /tmp/policy_test.txt
    > alert M_19_OK=6,M_19_Warning=0,M_19_numnodes=38,M_19_state=2;
    2003/08/07 05:17:29 NB000I Rule "P_2" fired
    > - echo "Policy fired" >> /tmp/policy_test.txt
    > alert M_22_OK=0,M_22_Warning=0,M_22_numnodes=1,M_22_state=2;
    2003/08/07 05:17:31 NB000I Rule "P_2" fired
    > - echo "Policy fired" >> /tmp/policy_test.txt

Gilbert Hyatt
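Gilbert's log is consistent with the ON/IF distinction Ed explains in his 2003-08-06 reply below: an IF rule is evaluated each time its context is alerted and fires whenever its condition is true at that moment, while an ON rule fires only when its condition changes to true. P_2 is an IF rule in the same (root) context that every `alert` targets, so once M_21_Warning >= M_21_numnodes holds, every later alert fires it again. The Python below is a simplified model, not NodeBrain: the `Context` class and the `P_2_on` comparison rule are hypothetical, and a condition that references a never-assigned cell is treated as not true.

```python
class Context:
    """Simplified model of one NodeBrain context with IF and ON rules (not NodeBrain)."""

    def __init__(self):
        self.cells = {}
        self.if_rules = {}     # rule name -> condition(cells)
        self.on_rules = {}     # rule name -> condition(cells)
        self.on_state = {}     # last truth value seen by each ON rule
        self.fired = []        # rule names in firing order

    def define_if(self, name, condition):
        self.if_rules[name] = condition

    def define_on(self, name, condition):
        self.on_rules[name] = condition
        self.on_state[name] = False

    def _is_true(self, condition):
        try:
            return bool(condition(self.cells))
        except KeyError:       # a referenced cell has never been assigned (unknown)
            return False

    def alert(self, **values):
        self.cells.update(values)
        # ON rules: fire only on a change from not-true to true
        for name, condition in self.on_rules.items():
            now_true = self._is_true(condition)
            if now_true and not self.on_state[name]:
                self.fired.append(name)
            self.on_state[name] = now_true
        # IF rules: fire on every alert of this context while the condition is true
        for name, condition in self.if_rules.items():
            if self._is_true(condition):
                self.fired.append(name)


def p2_condition(cells):
    # Gilbert's rule condition: M_21_Warning >= M_21_numnodes
    return cells["M_21_Warning"] >= cells["M_21_numnodes"]


GILBERT_ALERTS = [
    {"M_20_OK": 1, "M_20_Warning": 0, "M_20_numnodes": 1, "M_20_state": 0},
    {"M_10_OK": 1, "M_10_Warning": 0, "M_10_numnodes": 1, "M_10_state": 0},
    {"M_13_OK": 0, "M_13_Warning": 0, "M_13_numnodes": 1, "M_13_state": 2},
    {"M_18_OK": 1, "M_18_Warning": 0, "M_18_numnodes": 1, "M_18_state": 0},
    {"M_21_OK": 0, "M_21_Warning": 1, "M_21_numnodes": 1, "M_21_state": 1},
    {"M_24_OK": 1, "M_24_Warning": 0, "M_24_numnodes": 1, "M_24_state": 0},
    {"M_25_OK": 1, "M_25_Warning": 0, "M_25_numnodes": 1, "M_25_state": 0},
    {"M_19_OK": 6, "M_19_Warning": 0, "M_19_numnodes": 38, "M_19_state": 2},
    {"M_22_OK": 0, "M_22_Warning": 0, "M_22_numnodes": 1, "M_22_state": 2},
]


def replay():
    root = Context()
    root.define_if("P_2", p2_condition)      # as Gilbert defined it
    root.define_on("P_2_on", p2_condition)   # hypothetical ON variant for comparison
    for values in GILBERT_ALERTS:
        root.alert(**values)
    return root.fired


if __name__ == "__main__":
    print(replay())
```

The IF rule fires five times, once for the M_21 alert and once for each of the four alerts after it, matching the five "fired" lines in the log. The ON variant fires once and would fire again only after the condition went false and then true again.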
From: Trettevik, Ed A <ed....@bo...> - 2003-08-06 20:33:11
|
Looks like it is working as I would expect. You have created a nice = illustration of the difference between ON and IF rules with respect to = the ALERT command, and it looks like you fully understand that = difference. In your second alert, all 3 ON rules fired, but only 1 IF = rule fired. Two IF rules didn't respond as you expected. The IF rule = that fired is the one defined in the context you alerted. The 2 defined = in the parent context did not fire because that context was not alerted. = In your third alert, no ON rule fired, as you expected, because the = conditions did not transition to a true state---they were already there. = Again, because only 1 IF rule was alerted, only 1 fired. You would = have to alert the parent context to trigger the other two rules. =20 I'm not sure what will work best in your case, but here are some ideas = to consider when organizing rules for multiple nodes. 1) When alerting a context devoted to a single node, that context can = alert other contexts. In the following example, the M_21 context alerts = the root context ("@") every time M_21 is alerted. It could be = conditional---just replace (1) with a condition.=20 M_21 define P_root if(1):@ alert node=3D"21"; 2) When alerting a high level context, that context can alert node = specific contexts. I'm not sure if numnodes was intended as a node = specific variable, but I'm making the assumption that both Warning and = numnodes may be used as both root variables and node specific variables. define P_node if(1):$ m_$${node} alert = Warning=3D$${Warning},numnodes=3D$${numnodes}; alert node=3D"21",Warning=3D1,numnodes=3D1; 3) A cache (table) can often be used to avoid node specific rules. The = following example monitors for 5, 10 and 30 nodes having the same = problem within a 2 hour period. You can add and subtract problem and = node names to your environment without changing these rules. 
define cProblemNode context cache(~(2h),problem[5,10,30],node); cProblemNode define P1 if(problem._kidState):$ - echo "$${problem} on = $${problem._kids} nodes" >> /tmp/policy.txt cProblemNode assert ("Degraded","web1"); cProblemNode assert ("Degraded","web2"); cProblemNode assert ("Degraded","db1"); If you want different caching intervals or thresholds for different sets = of problems, just create multiple caches and assert specific types of = problems to the right cache. define cProblemNodeA context cache(~(4h),problem[100,200,1000],node); cProblemNode assert ("Degraded","web1"); cProblemNodeA assert ("TrivialSomething","web1"); 4) Node specific conditions can be represented by cache tables. define NodeRequired context cache(node); define r1 if(NodeRequired(node)):$ - echo "$${problem} on $${node}" >> = /tmp/policy.txt =09 NodeRequired assert ("21"); alert node=3D"21",problem=3D"DroppedTransaction"; 5) If you have a set of rules that you want to repeat and maintain for = multiple entities, SOURCE a rule file using symbolic substitution. source nodeRules.nb node=3D"abc"; source nodeRules.nb node=3D"xyz"; The sourced file uses %{} for symbolic substitution of = parameters---different than symbolic substitution using context = variables where the notation is ${}. define %{node} context; %{node} define r1 on(%{node}.a=3D1 and %{node}.b=3D2); %{node} define r2 on(%{node}.c=3D1 and %{node}.d<5); If your node names include special characters (other than "'") you may = use quoted names for your contexts. define '%{node}' context; '%{node}' define r1 on('%{node}'.a=3D1 and '%{node}'.b=3D2); '%{node}' define r2 on('%{node}'.c=3D1 and '%{node}'.d<5);=09 source nodeRules.nb node=3D"humpty-dumpty.mothergoose.com"; 'humpty-dumpty.mothergoose.com' assert a=3D1,b=3D2,c=3D1,d=3D0; Don't know if this last example addresses your syntax concern. If not, = please describe the problem in more detail and I'll give it another = shot. -Ed -----Original Message----- From: gh...@rl... 
[mailto:gh...@rl...]
Sent: Tuesday, August 05, 2003 12:25 PM
To: nod...@li...
Subject: Is this a bug, or am I not getting it....

Below is a block of code, and the outcome. Tell me if you see something wrong. I made up 6 cases where all the rules should fire, but not all of them do.

On the second alert only 4 out of 6 fire.
On the third alert only 1 out of 3 fire.

I am looking at this because I want to make a context per node, and have rules that include multiple nodes, so multiple contexts. The easiest syntax would be P_1a/b. P_2a/b work, but are too limited.

-Gilbert.

------------------- CODE ----------------------------
define M_21 context;

#P_1
define P_1a on(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1a" >> /tmp/policy.txt
define P_1b if(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1b" >> /tmp/policy.txt

#P_2
M_21 define P_2a on(Warning>=numnodes):- echo "Policy fired 2a" >> /tmp/policy.txt
M_21 define P_2b if(Warning>=numnodes):- echo "Policy fired 2b" >> /tmp/policy.txt

#P_3
alert T_3==(M_21.Warning>=M_21.numnodes);
define P_3a on(T_3):- echo "Policy fired 3a" >> /tmp/policy.txt
define P_3b if(T_3):- echo "Policy fired 3b" >> /tmp/policy.txt

M_21 alert Warning=0,numnodes=1;
M_21 alert Warning=1,numnodes=1;
M_21 alert Warning=1,numnodes=1;

show -cells;
show P_1a;
show P_1b;
show M_21.P_2a;
show M_21.P_2b;
show P_3a;
show P_3b;

-------------- Output ------------------------
--- 4 out of 6
2003/08/05 01:45:43 NB000I Rule "@.P_3a" fired
> - echo "Policy fired 3a" >> /tmp/policy.txt
2003/08/05 01:45:43 NB000I Rule "P_2a" fired
> - echo "Policy fired 2a" >> /tmp/policy.txt
2003/08/05 01:45:43 NB000I Rule "@.P_1a" fired
> - echo "Policy fired 1a" >> /tmp/policy.txt
2003/08/05 01:45:43 NB000I Rule "P_2b" fired
> - echo "Policy fired 2b" >> /tmp/policy.txt

--- 1 out of 3
2003/08/05 01:59:34 NB000I Rule "P_2b" fired
> - echo "Policy fired 2b" >> /tmp/policy.txt

T_3 = 1 == (M_21.Warning>=M_21.numnodes)
M_21.numnodes = 1
M_21.Warning = 1
M_21.OK = 0

@> P_1a on(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1a" >> /tmp/policy.txt
@> P_1b if(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1b" >> /tmp/policy.txt
@> M_21.P_2a on(M_21.Warning>=M_21.numnodes):- echo "Policy fired 2a" >> /tmp/policy.txt
@> M_21.P_2b if(M_21.Warning>=M_21.numnodes):- echo "Policy fired 2b" >> /tmp/policy.txt
@> P_3a on(T_3):- echo "Policy fired 3a" >> /tmp/policy.txt
@> P_3b if(T_3):- echo "Policy fired 3b" >> /tmp/policy.txt |
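To make the threshold behavior of cProblemNode easier to picture, here is a small Python model of it. It is a simplified sketch, not NodeBrain's implementation: it assumes that each (problem, node) entry expires after the cache interval, and that a report is produced whenever the number of distinct nodes reporting a problem reaches one of the listed thresholds (5, 10 and 30 in Ed's example). The class ThresholdCache and its methods are hypothetical names.

```python
from collections import defaultdict


class ThresholdCache:
    """Hypothetical, simplified model of a cache like
    cache(~(2h),problem[5,10,30],node): entries expire after `interval`
    seconds, and an alert is produced when the number of distinct nodes
    reporting a problem reaches one of the thresholds."""

    def __init__(self, interval, thresholds):
        self.interval = interval
        self.thresholds = set(thresholds)
        # problem -> {node: time of the most recent assertion}
        self.seen = defaultdict(dict)

    def _expire(self, now):
        # Drop entries older than the cache interval.
        for nodes in self.seen.values():
            stale = [n for n, t in nodes.items() if now - t >= self.interval]
            for node in stale:
                del nodes[node]

    def assert_(self, problem, node, now):
        """Record (problem, node) at time `now` (seconds). Return an alert
        string when the distinct-node count hits a threshold, else None."""
        self._expire(now)
        nodes = self.seen[problem]
        is_new = node not in nodes
        nodes[node] = now  # a repeat assertion only refreshes the timestamp
        if is_new and len(nodes) in self.thresholds:
            return f"{problem} on {len(nodes)} nodes"
        return None


cache = ThresholdCache(interval=2 * 3600, thresholds=[5, 10, 30])
alerts = [cache.assert_("Degraded", f"web{i}", now=i) for i in range(1, 6)]
print(alerts)  # the fifth distinct node crosses the first threshold
```

Re-asserting a node that is already cached only resets its expiry time and does not count again, which is the point of caching by node rather than counting raw events.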
From: <gh...@rl...> - 2003-08-06 19:06:11
|
> Below is a block of code, and the outcome. Tell me if you see something
> wrong. I made up 6 cases where all the rules should fire, but not all of
> them do.
>
> On the second alert only 4 out of 6 fire.
> On the third alert only 1 out of 3 fire.
>
> I am looking at this because I want to make a context per node, and have
> rules that include multiple nodes, so multiple contexts. The easiest
> syntax would be P_1a/b. P_2a/b work, but are too limited.
>
> -Gilbert.
>
> ------------------- CODE ----------------------------
> define M_21 context;
>
> #P_1
> define P_1a on(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1a" >> /tmp/policy.txt
> define P_1b if(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1b" >> /tmp/policy.txt
>
> #P_2
> M_21 define P_2a on(Warning>=numnodes):- echo "Policy fired 2a" >> /tmp/policy.txt
> M_21 define P_2b if(Warning>=numnodes):- echo "Policy fired 2b" >> /tmp/policy.txt
>
> #P_3
> alert T_3==(M_21.Warning>=M_21.numnodes);
> define P_3a on(T_3):- echo "Policy fired 3a" >> /tmp/policy.txt
> define P_3b if(T_3):- echo "Policy fired 3b" >> /tmp/policy.txt
>
> M_21 alert Warning=0,numnodes=1;
> M_21 alert Warning=1,numnodes=1;
> M_21 alert Warning=1,numnodes=1;
>
> show -cells;
> show P_1a;
> show P_1b;
> show M_21.P_2a;
> show M_21.P_2b;
> show P_3a;
> show P_3b;
>
> -------------- Output ------------------------
> --- 4 out of 6
> 2003/08/05 01:45:43 NB000I Rule "@.P_3a" fired
> > - echo "Policy fired 3a" >> /tmp/policy.txt
>
> 2003/08/05 01:45:43 NB000I Rule "P_2a" fired
> > - echo "Policy fired 2a" >> /tmp/policy.txt
>
> 2003/08/05 01:45:43 NB000I Rule "@.P_1a" fired
> > - echo "Policy fired 1a" >> /tmp/policy.txt
>
> 2003/08/05 01:45:43 NB000I Rule "P_2b" fired
> > - echo "Policy fired 2b" >> /tmp/policy.txt
>
> --- 1 out of 3
> 2003/08/05 01:59:34 NB000I Rule "P_2b" fired
> > - echo "Policy fired 2b" >> /tmp/policy.txt
>
> T_3 = 1 == (M_21.Warning>=M_21.numnodes)
> M_21.numnodes = 1
> M_21.Warning = 1
> M_21.OK = 0
>
> @> P_1a on(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1a" >> /tmp/policy.txt
> @> P_1b if(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1b" >> /tmp/policy.txt
>
> @> M_21.P_2a on(M_21.Warning>=M_21.numnodes):- echo "Policy fired 2a" >> /tmp/policy.txt
> @> M_21.P_2b if(M_21.Warning>=M_21.numnodes):- echo "Policy fired 2b" >> /tmp/policy.txt
>
> @> P_3a on(T_3):- echo "Policy fired 3a" >> /tmp/policy.txt
> @> P_3b if(T_3):- echo "Policy fired 3b" >> /tmp/policy.txt
|
From: <gh...@rl...> - 2003-08-05 19:25:43
|
Below is a block of code, and the outcome. Tell me if you see something wrong. I made up 6 cases where all the rules should fire, but not all of them do.

On the second alert only 4 out of 6 fire.
On the third alert only 1 out of 3 fire.

I am looking at this because I want to make a context per node, and have rules that include multiple nodes, so multiple contexts. The easiest syntax would be P_1a/b. P_2a/b work, but are too limited.

-Gilbert.

------------------- CODE ----------------------------
define M_21 context;

#P_1
define P_1a on(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1a" >> /tmp/policy.txt
define P_1b if(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1b" >> /tmp/policy.txt

#P_2
M_21 define P_2a on(Warning>=numnodes):- echo "Policy fired 2a" >> /tmp/policy.txt
M_21 define P_2b if(Warning>=numnodes):- echo "Policy fired 2b" >> /tmp/policy.txt

#P_3
alert T_3==(M_21.Warning>=M_21.numnodes);
define P_3a on(T_3):- echo "Policy fired 3a" >> /tmp/policy.txt
define P_3b if(T_3):- echo "Policy fired 3b" >> /tmp/policy.txt

M_21 alert Warning=0,numnodes=1;
M_21 alert Warning=1,numnodes=1;
M_21 alert Warning=1,numnodes=1;

show -cells;
show P_1a;
show P_1b;
show M_21.P_2a;
show M_21.P_2b;
show P_3a;
show P_3b;

-------------- Output ------------------------
--- 4 out of 6
2003/08/05 01:45:43 NB000I Rule "@.P_3a" fired
> - echo "Policy fired 3a" >> /tmp/policy.txt
2003/08/05 01:45:43 NB000I Rule "P_2a" fired
> - echo "Policy fired 2a" >> /tmp/policy.txt
2003/08/05 01:45:43 NB000I Rule "@.P_1a" fired
> - echo "Policy fired 1a" >> /tmp/policy.txt
2003/08/05 01:45:43 NB000I Rule "P_2b" fired
> - echo "Policy fired 2b" >> /tmp/policy.txt

--- 1 out of 3
2003/08/05 01:59:34 NB000I Rule "P_2b" fired
> - echo "Policy fired 2b" >> /tmp/policy.txt

T_3 = 1 == (M_21.Warning>=M_21.numnodes)
M_21.numnodes = 1
M_21.Warning = 1
M_21.OK = 0

@> P_1a on(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1a" >> /tmp/policy.txt
@> P_1b if(M_21.Warning>=M_21.numnodes):- echo "Policy fired 1b" >> /tmp/policy.txt
@> M_21.P_2a on(M_21.Warning>=M_21.numnodes):- echo "Policy fired 2a" >> /tmp/policy.txt
@> M_21.P_2b if(M_21.Warning>=M_21.numnodes):- echo "Policy fired 2b" >> /tmp/policy.txt
@> P_3a on(T_3):- echo "Policy fired 3a" >> /tmp/policy.txt
@> P_3b if(T_3):- echo "Policy fired 3b" >> /tmp/policy.txt |
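The log above is consistent with a reading of the two rule types in which an "on" rule fires when its condition changes to true, no matter which context received the alert, while an "if" rule is checked only when an alert arrives in the context where that rule is defined. The Python sketch below replays Gilbert's three alerts under that assumption. It is a toy model for reasoning about the output, not NodeBrain's evaluation code, and the names RULES, condition and replay are made up for illustration.

```python
# Toy model (an assumption, not NodeBrain source): "on" rules fire on a
# false-to-true transition of their condition; "if" rules fire on every
# alert sent to the context they are defined in, when the condition is true.
RULES = [
    ("P_1a", "on", "@"),
    ("P_1b", "if", "@"),
    ("P_2a", "on", "M_21"),
    ("P_2b", "if", "M_21"),
    ("P_3a", "on", "@"),
    ("P_3b", "if", "@"),
]


def condition(cells):
    """All six rules test Warning >= numnodes (T_3 mirrors it); unknown is false."""
    warning, numnodes = cells.get("Warning"), cells.get("numnodes")
    return warning is not None and numnodes is not None and warning >= numnodes


def replay(alerts):
    cells = {}
    was_true = {name: False for name, _, _ in RULES}
    fired_per_alert = []
    for context, assignments in alerts:
        cells.update(assignments)
        true_now = condition(cells)
        fired = []
        for name, kind, rule_context in RULES:
            if kind == "on":
                if true_now and not was_true[name]:
                    fired.append(name)
                was_true[name] = true_now
            elif rule_context == context and true_now:
                fired.append(name)
        fired_per_alert.append(fired)
    return fired_per_alert


result = replay([
    ("M_21", {"Warning": 0, "numnodes": 1}),
    ("M_21", {"Warning": 1, "numnodes": 1}),
    ("M_21", {"Warning": 1, "numnodes": 1}),
])
for i, fired in enumerate(result, start=1):
    print(f"alert {i}: {fired}")
```

In this model the second alert fires P_1a, P_2a, P_2b and P_3a, and the third fires only P_2b, which lines up with the counts in the log. P_1b and P_3b never fire here because no alert is ever sent to the top-level context.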
From: Trettevik, Ed A <ed....@bo...> - 2003-07-07 17:59:45
|
Hi Gilbert,

Sorry my response is not timely; I've been away on vacation. I probably don't understand your application properly yet, but I'll try to provide a couple of examples. You can correct me if my examples don't fit your problem.

Suppose you have three services that you want to monitor: s1, s2, and s3. Let's say s3 depends on s1 and s2, while s2 depends on s1. We'll say s1 is independent. If I understand correctly, you only want to report s3 down if s1 and s2 are up. Similarly, you only want to report s2 down if s1 is up.

Let's assume you have some simple method of representing your requirements. For example, you might represent your requirements as follows in a file you name service.cfg.

s1="Service 1"
s2="Service 2":s1
s3="Service 3":s1,s2

Now you want to convert this into NodeBrain, preferably with a script or program you construct.

Example 1:

In this example, we keep the NodeBrain code simple by assuming you have a command called "checkService" that can check the status of several monitored services and report the status to your NodeBrain agent with a single assertion.

Suppose you use this command as follows. (Alternatively you could reference service.cfg.)

checkService s1 s2 s3

If your NodeBrain agent is defined as "mynb" in your private.nb file, you could assert the current states as follows, where a value of 1 represents UP and a value of 0 represents DOWN.

nb ":>mynb service assert s1=1,s2=0,s3=0"

Based on these states, I'm thinking you would like to be notified that s2 is down, but not that s3 is down, because its dependencies are not satisfied.

This could be accomplished by the following NodeBrain rules to be included in your agent configuration file, perhaps sourced from a separate file called service.nb.

define service context;
# Check status of services every 5 minutes
service define schedule on(~(5m)):=checkService s1 s2 s3
# Alarm when service is down with dependencies up.
service define r1 on(s1=0):=alarm "Service 1 is down"
service define r2 on(s2=0 and s1):=alarm "Service 2 is down"
service define r3 on(s3=0 and s1 and s2):=alarm "Service 3 is down"

Notice the relationship of the conditions for rules r1, r2, and r3 to the hypothetical configuration file above. You must provide a host command "alarm" to perform the required notification, changing the name and syntax as desired.

Example 2:

In this example, we'll complicate the NodeBrain code a bit---no big deal if you are generating it with a script or program. This time we assume you have a script or program called checkService that checks and reports the status of a single service, and that you only want to execute it for a given service when the dependencies are satisfied.

To check the status of s1 the following command might be used.

checkService s1

To report the status of s1 as UP the following nb command would be used by checkService.

nb ":>mynb service s1.up=1"

For this example we'll use a context containing a separate context for each monitored service. The context for a given service contains a set of rules and defined cells.
# context containing a set of contexts---one for each monitored service
define service context;

# Service 1 context
service define s1 context;
# dependencies for s1 - always satisfied - see dependencies for s2 and s3 below
service.s1 define dep cell 1;
# schedule to check service status
service.s1 define sched cell (dep and ~(5m));
# status is no longer known (see note [1] below)
service.s1 define rExpire on(sched):assert up=?;
# check service status
service.s1 define rCheck on(sched):=checkService s1
# note when service is down
service.s1 define down cell (dep and not up);
# respond when service goes down (see note [2] below)
service.s1 define rAlarm on(down ^ not down):=alarm "Service 1 is down";

service define s2 context;
service.s2 define dep cell s1.up;
service.s2 define sched cell (dep and ~(5m));
service.s2 define rExpire on(sched):assert up=?;
service.s2 define rCheck on(sched):=checkService s2
service.s2 define down cell (dep and not up);
service.s2 define rAlarm on(down ^ not down):=alarm "Service 2 is down";

service define s3 context;
service.s3 define dep cell s1.up and s2.up;
service.s3 define sched cell (dep and ~(5m));
service.s3 define rExpire on(sched):assert up=?;
service.s3 define rCheck on(sched):=checkService s3
service.s3 define down cell (dep and not up);
service.s3 define rAlarm on(down ^ not down):=alarm "Service 3 is down";

[1] The way we've scheduled the checkService commands, they can all run concurrently and we don't know in what order they will report status back to NodeBrain. By setting the status to ? ("Unknown") at the same time we spawn the checkService command, we ensure that no response is taken until the status of a service and all its dependencies is current.

[2] We introduced a variable called "down" to use as a basis for alarming. This variable is true only when the status of a service is known to be DOWN and the status of all dependencies is known to be UP.
In our alarm rule we use the "down" variable with the "flip flop" operator ("^") to make sure it only toggles on known conditions. Since the value of "up" for a given service and all dependencies will take on the value of ? ("Unknown"), the value of "down" will also take on the unknown value. But only known conditions will toggle the "flip flop" condition. So the rule will only fire when the known status transitions to DOWN (with dependencies known to be UP) and the previous known condition was UP.

Agent Configuration Example:

In either case above, let's say your rules are stored as service.nb. Your agent configuration file might look like this.

#!/usr/bin/nb
set log="/home/myuser/log/myagent.log";
set out="/home/myuser/out";
define l1 listener protocol="NBP",port=12345; # you pick the port
source /home/myuser/service.nb;

Your private.nb file might look like this. (The identity string will be different.)

declare myid identity 3.3575658473647a8b34.3434578934738473.0; # generate with the identity command
portray myid;
declare mynb brain myid@localhost:12345;

Hopefully this will get you started. Let me know if I've misunderstood your application.

Ed Trettevik

-----Original Message-----
From: gh...@rl... [mailto:gh...@rl...]
Sent: Thursday, June 26, 2003 4:14 PM
To: nod...@li...
Subject: [Nodebrain-users] Help! Examples wanted.

Hello, I just finished reading the User Manual and NodeBrain looks great; could you help me with a few examples? I would like to monitor items with a dependency tree, where I would not report items when one of the things they depend on is down. NodeBrain is very different from the programming I am used to, and does not come naturally. The group I work for would like me to put together the PROS/CONS of using it for a meeting tomorrow, and I do not feel I have enough knowledge yet to do it justice. An example or three would really help.
I want to schedule monitors if the dependencies are not down, and report back status on the items. Thanks for all your work, Gilbert. |
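Ed suggests generating the NodeBrain rules from service.cfg with a script. As an illustration, here is a minimal Python sketch of such a generator for Example 1. It assumes exactly the service.cfg line format shown in Ed's reply (name="label", optionally followed by :dep1,dep2) and emits the same context, schedule and alarm rules as Example 1. The function name generate_rules is hypothetical, and the checkService and alarm commands are still yours to provide.

```python
import re

# name="Label" optionally followed by :dep1,dep2 (format assumed from service.cfg)
CFG_LINE = re.compile(r'^(\w+)="([^"]*)"(?::([\w,]+))?$')


def generate_rules(cfg_text):
    """Translate service.cfg text into the Example 1 NodeBrain rules."""
    services = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = CFG_LINE.match(line)
        if match is None:
            raise ValueError(f"cannot parse service.cfg line: {raw!r}")
        name, label, deps = match.groups()
        services.append((name, label, deps.split(",") if deps else []))

    names = " ".join(name for name, _, _ in services)
    rules = [
        "define service context;",
        "# Check status of services every 5 minutes",
        f"service define schedule on(~(5m)):=checkService {names}",
        "# Alarm when service is down with dependencies up.",
    ]
    for number, (name, label, deps) in enumerate(services, start=1):
        # A service is reported down only when all of its dependencies are up.
        condition = " and ".join([f"{name}=0"] + deps)
        rules.append(f'service define r{number} on({condition}):=alarm "{label} is down"')
    return "\n".join(rules)


SERVICE_CFG = '''
s1="Service 1"
s2="Service 2":s1
s3="Service 3":s1,s2
'''

print(generate_rules(SERVICE_CFG))
```

Because the output is plain NodeBrain text, it could be written to service.nb and sourced from the agent configuration file in the way Ed's agent configuration example shows.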
From: <gh...@rl...> - 2003-06-26 23:14:20
|
Hello, I just finished reading the User Manual and NodeBrain looks great; could you help me with a few examples? I would like to monitor items with a dependency tree, where I would not report items when one of the things they depend on is down. NodeBrain is very different from the programming I am used to, and does not come naturally. The group I work for would like me to put together the PROS/CONS of using it for a meeting tomorrow, and I do not feel I have enough knowledge yet to do it justice. An example or three would really help. I want to schedule monitors if the dependencies are not down, and report back status on the items. Thanks for all your work, Gilbert. |
From: <li...@ho...> - 2003-03-20 23:33:20
|
ok.. trying to move this to the list ;-)

Trettevik, Ed A wrote:
> Hi Ian,
>
> 1) Loadable module interface: In general I like the idea of providing a loadable module interface, but I may be getting twisted around a bit in understanding your idea. Do you want to load NodeBrain or have NodeBrain load your code? Do you want to interact with a cell or be a cell? Somehow I have a feeling you're going to say "both to both". :) And that's probably the right answer.
>
> You may want to take a look at nbmain.c in the patch release I just put out there. It provides a simple C API. It isn't what you want, but may give you a chance to experiment some.

I was after a way to "be" the cell; I never thought of nbmain as being a rules engine.. would be nice. The other thing would be to have an agent API to communicate with a running brain, so I wouldn't have to spawn a process every time.

> Hint:
> define r1 on(~=a):$ +hey ian a=$${a}
> assert a=1;
> assert a=2;
> assert a="abc";
>
> Oops, now I need to check to see if the prefix operator ~= is documented. It responds to any change.
>
> 2) Anomaly detection: I have actually been giving this one a bit of thought. I just scanned the link you provided, so I'm not sure, but it looks like something I have been planning on including. An exponentially weighted moving average is extremely efficient to compute in a real-time monitor, and a similar technique can be applied to calculate a moving deviation. The control limits can then be expressed in deviations from the moving average independent of magnitude, and bias between past and recent history can be adjusted with a selectable factor. This is standard stuff I've studied a bit at http://www.itl.nist.gov/div898/handbook/index.htm and want to apply to NodeBrain. But I need to give more thought to the actual implementation. It would not be difficult to generate time series by hour or minute of day within day of week.
> This would incorporate the typical human-schedule based variations, except holidays. However, the real problem in many environments is more complex than that. I'm thinking we might need to split out separate time series based on other variables that are not scheduled. For example, in a load balanced environment, the number of servers running may have a major impact on the expected load on the other servers. If the number of servers was a factor in splitting out separate time series, the software could learn "normal" even under a "two server down" condition. Provided we alarm on server downs, we could avoid generating alarms on every other server taking on a larger load. The software could recognize it as a normal load with two servers down. I need to play with it a bit to see if this can be done without making life overly complex for the rule coder. Do you think this is something you would apply?

Thanks for the link.. I'll have a look at it. We're just a 'simple' web site in a lot of respects; I'm sure a lot of other data centers would require pretty complex setups.

> Ed

regards
Ian

> -----Original Message-----
> From: Ian Holsman [mailto:Ian...@cn...]
> Sent: Thursday, March 20, 2003 12:23 PM
> To: Trettevik, Ed A
> Subject: [Fwd: [Nodebrain-announce] NodeBrain 0.5.2 Released - Numskull Patch]
>
> hey ed,
> two more things...
>
> 1. I was thinking maybe a loadable module interface.
> the module would define some cells, so that instead of
> firing off a perl script I could just query the cell (which
> would know how to figure out the value, and could fire event
> when the cell changes)... eg.. a cpu.sys% or dbms.num_users
>
> 2. One of the major problems I have is that we get very seasonal
> traffic. ..so at 3AM 'normal' might be 100, but at 12pm that would
> be an error condition. now.. I understand the nodebrain can be setup
> to handle this.
> but I was wondering if there could be some forecasting
> built in (see http://cricket.sourceforge.net/aberrant/) so that nodebrain
> could 'know' what normal was.
>
> oh.. I found another tool you might be interested in, PCP (http://oss.sgi.com/projects/pcp/)
> it is more of a monitoring tool, but it contains a rule-based mechanism for alerting
>
> eg..
> some_host (
> ($SWAP.free $HOSTS / $SWAP.length $HOSTS) * 100 < 50 &&
> ($SWAP.free $HOSTS / $SWAP.length $HOSTS) * 100 >= 25
> ) -> print 10 min "swap more than half-full: " "%h: %v% free " &
> shell 10 min "rsh -n guest@%h /sbin/ps -eo
> 'ruser=UID,pid=PID,ppid=PPID,pcpu=%CPU,sz=9999999SZ,rss=RSS,stime=STIME,time=TIME,args=CMD' | sort
> +4 -nr | sed -e 's/9999999SZ / SZ:/' | /usr/sbin/Mail -s '%h swap more than half-full (%v%
> free)' $MINDER &";
>
> but it is pretty complex to setup properly
>
> regards
> Ian.
>
> -------- Original Message --------
> Subject: [Nodebrain-announce] NodeBrain 0.5.2 Released - Numskull Patch
> Date: Thu, 20 Mar 2003 09:55:59 -0800
> From: Trettevik, Ed A <ed....@bo...>
> Organization: Holsman.NET
> Newsgroups: other
>
> Patch release 0.5.2 for version 0.5 (Numskull) is now available for download. This release corrects
> some minor differences between the code and documentation. In addition, the source now includes a
> makefile and supports compilation on Mac OS X (Darwin). A simple prototype C API was included as
> part of the minor restructuring needed to support the makefile.
|
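Ed's outline of anomaly detection (an exponentially weighted moving average, a moving deviation computed the same way, and control limits expressed in deviations) can be sketched in a few lines of Python. This illustrates the standard technique under choices of my own, not a NodeBrain feature: alpha plays the role of the selectable factor weighting recent against past history, k is the control-limit width in deviations, the deviation is an exponentially weighted mean absolute deviation, and the warm-up count is an added assumption so the limits settle before anything is flagged. Seasonal baselines, such as the hour-of-day within day-of-week series Ed mentions, could keep one such monitor per time bucket.

```python
class EwmaMonitor:
    """Exponentially weighted moving average with a moving deviation.

    alpha  -- weight of the newest sample (0 < alpha <= 1); larger values
              favour recent history over past history.
    k      -- control-limit width, in moving deviations.
    warmup -- samples to absorb before any value is flagged.
    """

    def __init__(self, alpha=0.1, k=3.0, warmup=20):
        self.alpha, self.k, self.warmup = alpha, k, warmup
        self.mean = None
        self.dev = 0.0
        self.count = 0

    def update(self, x):
        """Return True if x lies outside mean +/- k*dev, then fold x into the averages."""
        self.count += 1
        if self.mean is None:
            self.mean = float(x)
            return False
        error = x - self.mean
        anomalous = self.count > self.warmup and abs(error) > self.k * self.dev
        self.mean += self.alpha * error
        self.dev += self.alpha * (abs(error) - self.dev)
        return anomalous


monitor = EwmaMonitor(alpha=0.1, k=3.0, warmup=20)
normal = [monitor.update(100 if i % 2 == 0 else 102) for i in range(50)]
spike = monitor.update(200)
print(any(normal), spike)
```

Each update costs a few arithmetic operations and two stored numbers, which is what makes the approach attractive for a real-time monitor. Whether an anomalous sample should still be folded into the baseline is a design choice; this sketch folds it in.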
From: Trettevik, Ed A <ed....@bo...> - 2003-03-19 16:29:06
|
Hi Ian,

I don't seem to be receiving mail sent to these lists---guess I need to figure that out.

On Encryption:

NodeBrain uses AES (Rijndael) for data encryption, so it isn't my own encryption. http://csrc.nist.gov/CryptoToolkit/aes/rijndael/

For peer authentication, NodeBrain uses the RSA public/private key encryption algorithm, again not my own. However, the authentication protocol is unique to NodeBrain, and I think it is appropriate to question the decision to create a new authentication protocol. I agree that both SSH and SSL would provide a good foundation for NodeBrain communication. I'm open to moving in that direction by adding one or both as an option and then perhaps dropping some existing code. If someone has worked with SSH or SSL code before and wants to help out, that would be great. If not, I'll get around to it eventually.

If you are concerned about NodeBrain protocol (NBP), or prevented from using it in your environment, you can run NodeBrain without using an NBP listener, or you can bind the listener to the localhost interface to avoid remote access. You can then code your NodeBrain rules to execute ssh or scp commands to communicate over the network.

define in listener type="NBP",interface="127.0.0.1",port=12345; # no remote access
-or-
define in listener type="NBQ",brain="brainname",schedule==~(30s); # no socket connections

define r1 on(a=1 and b=2):-scp mytransactions mya...@my...:. [use sshd]
-or-
define r1 on(a=1 and b=2):-myscript "my transaction" [use whatever]

On SEC:

No, I was not aware of SEC, and I appreciate your bringing it to my attention. I took a quick scan at the link you provided. I don't know enough yet to provide a proper comparison, but I'll comment anyway. :) I think SEC and NodeBrain address the same problem space, event monitoring and correlation, but with very different approaches to rule syntax.
I can't speak without bias on this subject, but it appears at first glance that SEC has more variety in rule structure, a sign that rule coding is done at a higher level. A rule type in SEC seems to identify a "logic template", with specific types of parameters. NodeBrain rule syntax is more general (if I'm understanding SEC correctly), allowing/requiring users to define their own "logic templates" using source files and symbolic substitution. The following NodeBrain code is valid syntax for a new rule type that I'm just making up right now as an example.

define myfilesys context; # Sample file system utilization monitor
source myfilesysrule.nb filesys="/var",warnPercent=80,criticalPercent=90,interval="30m";
source myfilesysrule.nb filesys="/opt";

With NodeBrain as is, one would have to define this new type of rule by placing something like the following in the file called myfilesysrule.nb:

default warnPercent=75,criticalPercent=92,interval="2h";
myfilesys define '%{filesys}' context;
myfilesys.'%{filesys}' define percentUsed cell;
myfilesys.'%{filesys}' define r1 on(~(%{interval})):-myfilesyschecker.pl %{filesys}
myfilesys.'%{filesys}' define r2 on(percentUsed>=%{warnPercent}):-myalarm.pl "%{filesys} ..."
myfilesys.'%{filesys}' define r3 on(percentUsed>=%{criticalPercent}):-myalarm.pl "%{filesys} ..."

After symbolic substitution, the source commands expand as follows.
> define myfilesys context;
> source myfilesysrule.nb filesys="/var",warnPercent=80,criticalPercent=90,interval="30m";
> default warnPercent=75,criticalPercent=92,interval="2h";
> myfilesys define '/var' context;
> myfilesys.'/var' define percentUsed cell;
> myfilesys.'/var' define r1 on(~(30m)):-myfilesyschecker.pl /var
> myfilesys.'/var' define r2 on(percentUsed>=80):-myalarm.pl "/var ..."
> myfilesys.'/var' define r3 on(percentUsed>=90):-myalarm.pl "/var ..."
2003/03/19 08:05:05 NB000I Rule file "myfilesysrule.nb" included. size=426
> source myfilesysrule.nb filesys="/opt"
> default warnPercent=75,criticalPercent=92,interval="2h";
> myfilesys define '/opt' context;
> myfilesys.'/opt' define percentUsed cell;
> myfilesys.'/opt' define r1 on(~(2h)):-myfilesyschecker.pl /opt
> myfilesys.'/opt' define r2 on(percentUsed>=75):-myalarm.pl "/opt ..."
> myfilesys.'/opt' define r3 on(percentUsed>=92):-myalarm.pl "/opt ..."
2003/03/19 08:05:05 NB000I Rule file "myfilesysrule.nb" included. size=426

Once you build a file like myfilesysrule.nb, you can think of the source command as a higher level rule type of your creation, based on three NodeBrain rules (r1, r2, and r3). It seems like SEC has predefined rule types. I don't see anything wrong with that if SEC provides all the rule types you need. There may also be a way of extending the rule types that I didn't see.

It also looks like SEC is more focused on log file monitoring than NodeBrain is. I would not advocate replacing working SEC rules with NodeBrain rules for log file monitoring. But I can imagine someone using SEC as an event source for NodeBrain event correlation. In that way, they would not compete, but complement each other.

On the right way to use NodeBrain?

Yes, your intended use is exactly the way I first applied NodeBrain. I created a Unix System Monitor Kit with NodeBrain and a set of Perl scripts.
You can schedule your Perl scripts with cron, or NodeBrain. I use NodeBrain when I want to schedule on conditions more than time, or when I want to keep the scheduling rule with the response rules as shown in the myfilesysrule.nb file above. In this example, I would call the script myfilesyschecker.pl a "probe". It must find or compute information and report it back to the agent as follows.

# Perl code to report a value to a NodeBrain agent (myagent)
system("nb \":>myagent assert myfilesys.'/var'.percentUsed=75;\"");

I have Perl scripts that I call "alarm adapters" for sending email notification, text pages, snmp traps, and alerts into a couple of different commercial event management systems. I also have Perl scripts called "configuration adapters" that generate NodeBrain rules from configuration files unique to the kit. Hopefully you will find NodeBrain useful for building a similar kit for your own environment. Eventually it would be nice if other projects adopted NodeBrain as a rule engine, and shared rule sets and related scripts.

I'll try to find time to post sample rules on the web site. I can expand on the example above to show how to avoid multiple alarms using a reset threshold and the flip-flop operator. And I expect more examples using event caches would be helpful.

Thanks for your comments and questions, and let me know if you run into problems with NodeBrain.

Ed Trettevik <ea...@no...> |
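The way "source" combines a "default" line with %{} symbolic substitution to expand myfilesysrule.nb can be mimicked in a few lines of Python, which may help when writing the kind of "configuration adapter" Ed describes. This is only a sketch: it assumes that values passed on the source command override those named on the default line, and that every %{name} must end up with a value. It does not reproduce NodeBrain's actual parser, and source_template, TEMPLATE and DEFAULTS are invented names.

```python
import re

# %{name} placeholders, as used in myfilesysrule.nb
PLACEHOLDER = re.compile(r"%\{(\w+)\}")


def source_template(template, defaults, **params):
    """Expand %{name} placeholders; explicit params override defaults
    (an assumption modelled on the expansion shown in the log)."""
    values = {**defaults, **params}

    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for %{{{name}}}")
        return str(values[name])

    return PLACEHOLDER.sub(lookup, template)


# Body of myfilesysrule.nb without its "default" line.
TEMPLATE = """\
myfilesys define '%{filesys}' context;
myfilesys.'%{filesys}' define percentUsed cell;
myfilesys.'%{filesys}' define r1 on(~(%{interval})):-myfilesyschecker.pl %{filesys}
myfilesys.'%{filesys}' define r2 on(percentUsed>=%{warnPercent}):-myalarm.pl "%{filesys} ..."
myfilesys.'%{filesys}' define r3 on(percentUsed>=%{criticalPercent}):-myalarm.pl "%{filesys} ..."
"""
DEFAULTS = {"warnPercent": 75, "criticalPercent": 92, "interval": "2h"}

var_rules = source_template(TEMPLATE, DEFAULTS, filesys="/var",
                            warnPercent=80, criticalPercent=90, interval="30m")
opt_rules = source_template(TEMPLATE, DEFAULTS, filesys="/opt")
print(var_rules)
print(opt_rules)
```

For /opt, the generated lines take warnPercent=75, criticalPercent=92 and interval="2h" from the defaults, matching the expansion in Ed's log; for /var the values given on the source command win.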
From: <li...@ho...> - 2003-03-15 17:41:03
|
I just joined this list, and am still working through the user guide. As Benoit mentioned, doing your own encryption worried me as well; not that there is anything wrong with it, but it makes for a discussion point with the security people and admins, as they are nervous people. What I would like to see is NodeBrain make use of SSH if possible. There are implementations on most platforms, people have keys already, and NodeBrain could remove most of the protocol work (or even use something as simple as 'stunnel', which does the SSL for you). Also, some sample rule-bases would be good. I was also wondering if you have seen SEC (http://www.estpak.ee/~risto/sec/) and how it compares to NodeBrain. What I intend to use NodeBrain for is to integrate with our existing machine performance monitoring (a Perl script looking at /proc, basically) and have that script raise the 'events'. Is this the right way of using NodeBrain? Regards Ian |
From: Trettevik, Ed A <ed....@bo...> - 2003-03-10 23:34:37
|
Thank you for the comments. A project request to include a makefile has been entered. With respect to SSL, it may be a better way to go, especially if major deficiencies are discovered in the NodeBrain encryption code. The current design was selected before deciding to go open source with NodeBrain. At that time, I wanted to have full control and independence. I didn't want to have something larger than needed and possibly more difficult to port to new platforms, or something that might go through revisions that NodeBrain would have to chase. I assumed any approach other than writing it myself would introduce these problems, which may not be correct for SSL implementations. And, after all, NodeBrain is dependent on a C compiler and libraries, so what's another library? I will reconsider this issue if there are security deficiencies in the current method or if it makes NodeBrain less attractive for use by others. Thanks for bringing up this question.

Ed Trettevik <ea...@no...>

-----Original Message-----
From: Benoit DOLEZ [mailto:bd...@an...]
Sent: Monday, March 10, 2003 1:08 AM
To: nod...@li...
Cc: Trettevik, Ed A
Subject: RE: project interest

Hi, Thanks for your mail. Your examples will help me in building rules... About the line count, it was a bad example. I have many sorts of data sources: - syslog files - virus log files - host monitoring (delay, up/down, ...) - ... And I have to centralize these data for many servers. For the moment, I use echelog, but I have to run my own scripts to split measurements and put them in an rrdtool db, split syslog lines to retrieve the number of emails per day, look at rejected/accepted/dropped lines in firewall netfilter logs, and more... For now, I don't know how to write correlation rules with these data. With a friend, we have defined a language that has many points in common with yours. We think it is not a good idea to rebuild a project that already exists. So I prefer to test and give you new ideas / patches for your project. I did that for the echelog project and I think it works fine. I do not want to run an undetermined number of processes on the log server, and Perl is very heavy on memory and CPU. I have read all of your doc (very good work), but why are you using your own encryption? Why don't you use SSL with certificates to identify hosts? I propose to build a Makefile. All your source files are loaded in nb.c. Is it in your todo list? Benoit -- Benoit DOLEZ GSM: +33 6 21 05 91 69 mailto:bd...@an... |
From: Benoit D. <bd...@an...> - 2003-03-10 09:09:17
|
Hi, Thanks for your mail. Your examples will help me in building rules... About the line count, it was a bad example. I have many sorts of data sources: - syslog files - virus log files - host monitoring (delay, up/down, ...) - ... And I have to centralize these data for many servers. For the moment, I use echelog, but I have to run my own scripts to split measurements and put them in an rrdtool db, split syslog lines to retrieve the number of emails per day, look at rejected/accepted/dropped lines in firewall netfilter logs, and more... For now, I don't know how to write correlation rules with these data. With a friend, we have defined a language that has many points in common with yours. We think it is not a good idea to rebuild a project that already exists. So I prefer to test and give you new ideas / patches for your project. I did that for the echelog project and I think it works fine. I do not want to run an undetermined number of processes on the log server, and Perl is very heavy on memory and CPU. I have read all of your doc (very good work), but why are you using your own encryption? Why don't you use SSL with certificates to identify hosts? I propose to build a Makefile. All your source files are loaded in nb.c. Is it in your todo list? Benoit -- Benoit DOLEZ GSM: +33 6 21 05 91 69 mailto:bd...@an... |