nodebrain-announce Mailing List for NodeBrain
Rule Engine for State and Event Monitoring
Brought to you by:
trettevik
From: Ed T. <ea...@no...> - 2014-12-15 00:58:20
NodeBrainers,

Release 0.9.03 is available for download at SourceForge. See http://nodebrain.org for more information. A Demo link has been added to the project home page. It takes you to a small demonstration site, http://demo.nodebrain.org, where you can experiment with online changes to the demonstration scenarios. The site currently prevents external interaction between the rule engine, the host system, and remote information sources, so it is only a way to study rule engine behavior with a small set of rules and transactions submitted via a web form.

The goal of the next release is to add support for an "un-clocked play mode," where one or more time-stamped transaction files drive a simulation, and a transaction is simply a NodeBrain command (e.g., assert or alert). The rule engine's internal clock will ignore the system clock and step as quickly as possible, in pretend one-second increments, from the time of a given transaction to the time of the next transaction. The idea is to process events faster than real time while making the same decisions that would have been made in real time. This mode will be used for regression testing the rule engine, and for experimentation with and validation of rules using a well-understood and repeatable event stream. It can also be used by the demonstration site. A closely related "clocked play mode" will convert the input time stamps to time intervals between transactions and operate in real time based on the system clock. This mode will run much slower, but will enable simulations that involve interaction with other components operating in real time.

This is just a plan based on a concept today. What is learned while attempting to implement it may cause adjustments to the plan.

Ed
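[Editor's sketch] The un-clocked play mode described above can be illustrated with a minimal Python sketch. The transaction format and the `apply` callback are invented for illustration; a real engine would also evaluate time-based rules on each pretend one-second tick.

```python
from datetime import datetime, timedelta

def unclocked_replay(transactions, apply):
    """Step a simulated clock through time-stamped transactions in
    pretend one-second increments, ignoring the system clock."""
    clock = None
    for when, command in transactions:
        if clock is None:
            clock = when                   # start the clock at the first transaction
        while clock < when:                # advance as fast as possible to the next one
            clock += timedelta(seconds=1)  # time-based rules would be evaluated here
        apply(clock, command)              # hand the command to the rule engine
    return clock

# Feed two hypothetical transactions to a stand-in "rule engine"
log = []
end = unclocked_replay(
    [(datetime(2014, 12, 15, 0, 0, 0), 'assert lines=5;'),
     (datetime(2014, 12, 15, 0, 0, 10), 'alert server="S1";')],
    lambda t, cmd: log.append((t, cmd)))
```

The same loop becomes the "clocked play mode" if the inner `while` sleeps one real second per pretend second instead of spinning.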
From: Trettevik, Ed A <ed....@bo...> - 2003-07-26 22:05:41
Release 0.5.4: Patch release 0.5.4 is available for download. This release fixes a significant defect in the handling of server sockets relative to long-running spawned child processes. If you are running NodeBrain as an agent (Unix daemon or Windows service), this update is recommended.

A couple of less important defects have been fixed in the handling of scheduled deletion of caching table rows, conditions responding to caching table row expiration, and time conditions.

Code has been included for a plugin module C API that is unfinished and undocumented, but sufficient to illustrate the future direction. (Our plugin module API will be the theme of release 0.6.)

Documentation has been included for using NodeBrain over an SSH tunnel (http://www.nodebrain.org/tipSshTunnel.html).

A Personal Note: I must confess that I'm having difficulty finding time away from my regular job and family to keep NodeBrain moving forward. To address this problem, I've scheduled vacation from my regular job for the next two weeks. The first week will be spent with my family, during which I may have difficulty responding to email. The second week will be spent roughing out a 0.6 release. If you have feature requests, please enter them so they can be considered. From what I've heard from some of you, perhaps the greatest need at this point is a set of working rules and agent configurations to help people get started. I'll consider that a requirement of 0.6, in addition to the C API for plugins, which will clearly require examples as well.

Ed Trettevik
ea...@no...
From: Trettevik, Ed A <ed....@bo...> - 2003-03-20 18:38:36
Patch release 0.5.2 for version 0.5 (Numskull) is now available for download. This release corrects some minor differences between the code and documentation. In addition, the source now includes a makefile and supports compilation on Mac OS X (Darwin). A simple prototype C API was included as part of the minor restructuring needed to support the makefile.
From: Trettevik, Ed A <ed....@bo...> - 2003-03-10 06:27:21
Hi,

It seems the nod...@li... list was not set up properly---I'm still learning how to admin a project on SourceForge. I did not receive a copy of your note via mail, but stumbled onto it in the archive. I've created a new list, nod...@li..., that you may use in the future.

From: Benoit DOLEZ <bdolez@an...>
Subject: Project interest
Date: 2003-03-06 15:16

> Hi,
>
> Your project is very interesting. We are looking for something like
> that for our usage. The documentation was not synchronized with the
> source, e.g., the listener declaration: 'type' wasn't recognized, we
> had to use 'protocol', and protocol 'LOG' doesn't work.
>
> Could someone send me a sample of reading and analyzing a file? For
> example, checking the number of lines per day.
>
> Benoit

In response to your question, NodeBrain doesn't directly address your example problem; that is, NodeBrain will not efficiently count the lines in a file. (I'll describe an inefficient direct method later.) Depending on what you want to do with the line counts, NodeBrain may, or may not, be useful for monitoring them. Let's say you wanted to be notified if a particular log exceeded 1,000,000 lines in one day. Without NodeBrain, using your favorite scripting language, you could write a cron job to issue a "wc -l filename" on daily archives, or scan a file that is not archived daily, counting the lines for the previous day. Your cron job could notify you via email. If you have no special requirements beyond that, NodeBrain would only complicate the situation.

However, there may be situations where you want to correlate the line count with other information before deciding notification is necessary. In that case, NodeBrain may be helpful. So I'll give an example, realizing this may not match your requirement.
Agent Script:

#!/usr/bin/nb
set log="/myap/myagent.log";  # this is your NodeBrain agent's log
portray default;              # don't use default except to experiment (insecure)
# the following listener only accepts connections from the local machine
define ear listener type="NBQ",interface="127.0.0.1",port=49001;
source /myap/logmon.nb;       # include monitor for log lines
# source other monitors here ...

Monitor Rules: (/myap/logmon.nb)

# daily monitor of log file size
define logmon context;  # context to monitor size of single log
# To do multiple logs, repeat these rules replacing logmon with logmon.'filename'
# schedule a probe (note the Perl script performs a very specific and simple task)
logmon define r1 on(~(hour(3))):-/myap/logmon.pl
# set a threshold and response (note again a Perl script performs a specific task)
logmon define r2 on(lines>1000000):-/myap/alert.pl "log exceeded 1000000 lines"

Probe Script: (/myap/logmon.pl)

#!/usr/bin/perl
$size=`wc -l /myap/myapp.log`;
if($size=~/\s*(\d*)\s/){$size=$1;}
else{$size="?";}
# Send the line count to my NodeBrain agent
system("/usr/bin/nb \":declare myagent brain default\@localhost:49001;\" \":>myagent assert logmon.lines=$size;\"");

We would normally declare the brain in our $HOME/.nodebrain/private.nb file so it would not be necessary to include the declare in the system() call. And we would use a secure identity instead of default.

The notification script, /myap/alert.pl, would do whatever you want. You need to change the last rule in the monitor to conform to the syntax for your notification script.

I should emphasize here that NodeBrain is not a procedural scripting language and is not a reasonable alternative to your favorite scripting language for solving most problems. Clearly this example is more complicated than just testing for the threshold in the Perl script, scheduling your script with cron, and leaving NodeBrain out of it.
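[Editor's sketch] For readers more comfortable with Python than Perl, the probe's logic can be sketched as follows. The paths, the agent name `myagent`, and the helper names are carried over from the hypothetical example above, not from any NodeBrain API.

```python
import re
import subprocess

def parse_wc_output(out):
    """Pull the line count out of `wc -l` output; '?' if it cannot be parsed."""
    m = re.match(r"\s*(\d+)\s", out)
    return m.group(1) if m else "?"

def probe_line_count(logfile):
    """Run `wc -l` on the log file and parse the count."""
    out = subprocess.run(["wc", "-l", logfile],
                         capture_output=True, text=True).stdout
    return parse_wc_output(out)

def report_command(size):
    """The NodeBrain command the probe would pass to /usr/bin/nb."""
    return f':>myagent assert logmon.lines={size};'
```

As in the Perl version, the command string would be handed to /usr/bin/nb along with a brain declaration (or the declaration would live in $HOME/.nodebrain/private.nb).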
But if we change the problem a bit, NodeBrain may be quite helpful. Suppose we are monitoring log size on 50 servers and we want to be notified if any one exceeds 1,000,000 lines, AND when 5 or more exceed 700,000 lines. In that case, each server would report the line count to a NodeBrain agent on a central server and the new condition would be implemented there.

If you elected to only run a NodeBrain agent on the central server and use cron on the remote servers, you could modify the Perl script slightly to replace the localhost address with the central server name and include the remote server name in the variable identifier.

system("/usr/bin/nb \":declare master brain default\@centralservername:49001;\" \":>master logmon.'SERVER' assert lines=$size;\"");

Now the central server would have rules to monitor the log size on all 50 remote servers. To monitor for 5, 10, and 20 servers exceeding 700,000 lines, we could add a cache.

define cLog7Server context cache({5,10,20}:server);
cLog7Server define r1 if(_rowState):-alert.pl "$${_rows} servers have logs exceeding 700,000 lines"

We might assert server names to this cache by including the following rules for each of the 50 servers.

logmon.'SERVER' define r3 on(lines>700000):cLog7Server assert ("SERVER");   # Is High
logmon.'SERVER' define r4 on(lines<700000):cLog7Server assert !("SERVER");  # Isn't High

Now let's clean this up a bit so we don't have to maintain 50 copies of these same rules. We can do better than that. Let's have the remote servers ALERT the central server instead of asserting a value to a specific variable for each host name. In the logmon.pl script we would make this change.

Replace:
>master logmon.'SERVER' assert lines=$size;
With:
>master logmon alert server="SERVER",lines=$size;

Now we can reduce the 50 sets of rules down to a single set on our central server.
The complete rule set for monitoring the logs on the central server is shown here.

define cLog7Server context cache({5,10,20}:server);
cLog7Server define r1 if(_rowState):-alert.pl "$${_rows} servers have logs exceeding 700,000 lines"
define logmon context;
logmon define server cell;  # Name of remote server [Not required, but helps to document.]
logmon define lines cell;   # Number of lines in log file [Not required, but helps to document.]
logmon define r0 if(lines>1000000):$ -/myap/alert.pl "$${server} log at $${lines} lines"
logmon define r1 if(lines>700000):cLog7Server assert (logmon.server);
logmon define r2 if(lines<700000):cLog7Server assert !(logmon.server);

We also have the option of running a NodeBrain agent on each remote server and replicating the rules. We could go back to having the script report line counts to the local agent, and then let the local agent report to the central server only when a threshold is exceeded. The command prefix ">master" would move from the Perl script to the rule action as shown below.

define logmon context;
logmon define server cell;  # Name of remote server
logmon define lines cell;   # Number of lines in log file
logmon define r0 if(lines>1000000):$ >master -/myap/alert.pl "$${server} log at $${lines} lines"
logmon define r1 if(lines>700000):>master cLog7Server assert (logmon.server);
logmon define r2 if(lines<700000):>master cLog7Server assert !(logmon.server);

In addition to distributing the monitoring task, this configuration would also enable the master agent to take corrective action via the remote agents. I should point out that there is no master/slave concept in NodeBrain; the agents are peers. However, there can be "management" server and "managed" server relationships in the rules we write.

Perhaps from this discussion, you notice that NodeBrain is not designed as a monitor of anything more specific than state and events.
That means, unless somebody else develops rules and scripts for your specific problem, you will need to write them yourself. For Unix system health monitoring, I have constructed a set of Perl scripts that, combined with NodeBrain, actually do something. :) My hope is that others find NodeBrain useful for constructing their own monitoring applications and share them with the rest of us.

Now, the LOG listener. You are correct, the document is out of sync with the code in this area. I'll release an update soon to correct this and other problems. I have been using NodeBrain's "pipe" command for monitoring log files myself, but the LOG listener will replace it. For this reason I don't want to give an example using "define file" and "pipe". Instead I'll give an example using a LOG listener, which is now working in 0.5.1, which I'll release soon.

NodeBrain is capable of tail'ing a log file and looking for regular expression matches. But you need to develop the rules to specify what to look for and how to respond. And again, you can do this easily with your favorite scripting language (I'm happy with Perl for this type of problem). So we would only be motivated to use NodeBrain if we want to correlate information from multiple sources and perhaps multiple servers. Even then we may have a better tool for monitoring a given log file. We can always send alarms from another tool into NodeBrain for correlation.

Having said that, let's look at an example using NodeBrain (0.5.1) without help from our favorite scripting language. We'll use a NodeBrain translator and some correlation rules. Let's say our requirement is to alarm on user login failures when a given user fails login on a given system more than 5 times in 3 minutes without ultimate success within 10 minutes. (I see some deficiencies in the documentation here---will update.)
# Cache to support our 10 minute delay for success
define cFailedLoginWait context cache(!~(10m):server,user);
# Rule to establish response to row expiration
cFailedLoginWait define r1 if(_action="expire"):$ -alert.pl "5 failed logins by $${user} on $${server}"
# Cache to support our 5 in 3 minutes requirement
define cFailedLogin context cache(~(3m):server,user(5));
# Rule to establish response to our threshold condition (must be on one line even if it wraps here)
cFailedLogin define r1 if(user._hitState and not cFailedLoginWait(server,user)):$ cFailedLoginWait assert ("$${server}","$${user}");

These rules solve part of the problem, but we still need a way to send events to the cache. Independent of how we detect the events, we need to do something like this.

User U1 failed login on server S1:
cFailedLogin assert ("S1","U1");       # assert server and user to failed login cache

User U1 successfully logged in on server S1:
cFailedLogin assert !("S1","U1");      # remove server and user from failed login cache
cFailedLoginWait assert !("S1","U1");  # remove server and user from 10 min wait cache

Now we need a way to detect the actual events so we can report them to NodeBrain in this way. It could (and probably should) be done with your choice of scripting languages, but I promised we'd do it with NodeBrain here. So let's define a LOG listener, assuming the information we need is written to a log we'll call login.log.

define logmatch translator /myap/logmatch.nbx;
define logwatch listener type="LOG",file="login.log",schedule==~(20s),translator="logmatch";

Let's assume the entries in this log identify failed and successful logins as follows.

... user USERNAME failed login to SERVER ...
... user USERNAME successful login to SERVER ...

Now we can write our NodeBrain translator, /myap/logmatch.nbx. We use extended regular expressions to match on lines in the log file and emit NodeBrain commands based on matched conditions.
This requires familiarity with regular expressions, NodeBrain translator syntax, and NodeBrain command syntax.

# Example watching for failed and successful logins
(user ([^ ]*) failed login to ([^ ]*)){
: cFailedLogin assert ("$[2]","$[1]");  # emit NodeBrain command
}
(user ([^ ]*) successful login to ([^ ]*)){
: cFailedLogin assert !("$[2]","$[1]");
: cFailedLoginWait assert !("$[2]","$[1]");
}

From this, you may have figured out that it is possible to have NodeBrain monitor the number of lines written to a log file over some sliding interval and report when thresholds are reached. I'll give an example here, but I would not recommend this solution for high volume logs. We'll translate every line that appears in the log into an "event" by making an assertion to a NodeBrain event cache.

# Translator - match on anything and just assert the name of the log for every line
(.*){
: cLog assert ("LOGNAME");
}

# Rules - Alarm on 100, 300, and 1000 lines within a 4 hour period.
# If it drops to 50 in 4 hours, we consider it back to normal, so we reset to enable
# the cache to alarm again on the next episode of abnormal volume.
define cLog context cache(~(4h):log(^50,100,300,1000));
cLog define r1 if(log._hitState):-alert.pl "$${log._hits} lines added to $${log} in $${_interval}"

Here's an alternate method that would alarm on a single threshold in fixed (not sliding) intervals.

# Translator - set a cell named "count" to the current value of a cell named "lines" for every line
(.*){
: assert count={lines};  # add 1 to lines - see rules below
}

# Rules - funny way to count
assert lines==count+1,count=-1;
define r1 on(lines>100):-alert.pl "100 lines added to log in 4 hours"
define r2 on(~(4h)):assert count=-1;  # reset lines to zero every 4 hours

Our method of counting in this example may require an illustration. The following was pasted from an interactive session.
It illustrates how the value of lines is modified by changing the value of count. Because we define lines to be a function of count, the value of lines changes every time count changes.

@> assert lines==count+1,count=-1;
@> show -cells
lines = 0 == (count+1)
count = -1
@> assert count={lines};
@> show -cells
lines = 1 == (count+1)
count = 0
@> assert count={lines};
@> show -cells
lines = 2 == (count+1)
count = 1
@>

Again, I don't recommend this approach for simply counting lines in a high volume log file (more than 1 per second) because it is not the most efficient way to do it. It would be better to export this problem to a procedural scripting language, and use NodeBrain to monitor the results and correlate them with other information if there is such a requirement. It might, however, be appropriate to use NodeBrain on a high volume log if we are looking for specific strings and correlating events derived from the matching conditions.

Hopefully this addresses your question. I'm working on getting 0.5.1 released on SourceForge to resolve the identified defects. If your primary interest is in scanning log files with NodeBrain, it would be best to wait for the update. If you can obtain the counts with a script, and can solve a correlation requirement as described previously, then the 0.5.0 release should work as well.

Thanks for your interest.

Ed Trettevik <ea...@no...>
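[Editor's sketch] The two-cache failed-login correlation in the message above (5 failures within a 3-minute sliding window start a 10-minute wait; an alarm fires only if no success arrives before the wait expires) can be mimicked procedurally. The class and method names here are invented for illustration; NodeBrain's caches express the same logic declaratively.

```python
from collections import defaultdict, deque

class FailedLoginMonitor:
    """Procedural stand-in for the cFailedLogin / cFailedLoginWait caches."""
    def __init__(self, alarm, window=180, threshold=5, wait=600):
        self.alarm = alarm                  # callback taking a (server, user) key
        self.window = window                # sliding window in seconds (3 minutes)
        self.threshold = threshold          # failures needed to start the wait
        self.wait = wait                    # grace period in seconds (10 minutes)
        self.failures = defaultdict(deque)  # (server, user) -> failure times
        self.waiting = {}                   # (server, user) -> wait deadline

    def fail(self, server, user, now):
        key = (server, user)
        q = self.failures[key]
        q.append(now)
        while q and now - q[0] > self.window:  # slide the 3-minute window
            q.popleft()
        if len(q) >= self.threshold and key not in self.waiting:
            self.waiting[key] = now + self.wait  # start the 10-minute wait

    def success(self, server, user, now):
        # a successful login cancels both the window and the wait
        self.failures.pop((server, user), None)
        self.waiting.pop((server, user), None)

    def tick(self, now):
        # alarm for any wait that expired without an intervening success
        for key, deadline in list(self.waiting.items()):
            if now >= deadline:
                del self.waiting[key]
                self.alarm(key)
```

Feeding `fail`/`success` calls from a log translator and calling `tick` periodically reproduces the row-expiration behavior that cFailedLoginWait's `_action="expire"` rule reacts to.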
From: Benoit D. <bd...@an...> - 2003-03-06 23:16:53
Hi,

Your project is very interesting. We are looking for something like that for our usage. The documentation was not synchronized with the source, e.g., the listener declaration: 'type' wasn't recognized, we had to use 'protocol', and protocol 'LOG' doesn't work.

Could someone send me a sample of reading and analyzing a file? For example, checking the number of lines per day.

Benoit

--
Benoit DOLEZ
GSM: +33 6 21 05 91 69
mailto:bd...@an...