[Nodebrain-announce] RE: Project interest
From: Trettevik, Ed A <ed....@bo...> - 2003-03-10 06:27:21
Hi,
It seems the nod...@li... list was not set up properly---I'm still learning how to admin a project on SourceForge. I did not receive a copy of your note via mail, but stumbled onto it in the archive. I've created a new list, nod...@li..., that you may use in the future.
From: Benoit DOLEZ <bdolez@an...>
Subject: Project interest
Date: 2003-03-06 15:16
Hi,

Your project is very interesting. We are looking for something like that for our usage.

The document was not synchronized with the source. For example, in the listener declaration 'type' wasn't recognized; we might have to use 'protocol' instead. The protocol 'LOG' doesn't work.

Could someone send me a sample of reading and analysing a file? For example, checking the number of lines per day.

Benoit
In response to your question, NodeBrain doesn't directly address your example problem; that is, NodeBrain will not efficiently count the lines in a file. (I'll describe an inefficient direct method later.) Depending on what you want to do with the line counts, NodeBrain may or may not be useful for monitoring them. Let's say you wanted to be notified if a particular log exceeded 1,000,000 lines in one day. Without NodeBrain, using your favorite scripting language, you could write a cron job to issue a "wc -l filename" on daily archives, or scan a file that is not archived daily and count the lines for the previous day. Your cron job could notify you via email. If you have no special requirements beyond that, NodeBrain would only complicate the situation.
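Just to make that concrete, here is a minimal sketch of that cron-job approach. The archive path, threshold, mail command, and recipient address are purely illustrative.
Cron Job Script: (example, not part of NodeBrain)
#!/usr/bin/perl
# Count yesterday's lines and mail a notice if the threshold is exceeded.
use strict;
use warnings;

my $log       = '/myap/myapp.log.yesterday';   # assumed daily archive name
my $threshold = 1000000;

my ($lines) = `wc -l $log` =~ /^\s*(\d+)/;
$lines = 0 unless defined $lines;

if ($lines > $threshold) {
    open(my $mail, '|-', 'mail -s "log exceeded 1000000 lines" admin@example.com')
        or die "mail: $!";
    print $mail "$log contains $lines lines\n";
    close($mail);
}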
However, there may be situations where you want to correlate the line count with other information before deciding notification is necessary. In that case, NodeBrain may be helpful. So I'll give an example, realizing this may not match your requirement.
Agent Script:
#!/usr/bin/nb
set log="/myap/myagent.log"; # this is your NodeBrain agent's log
portray default;             # don't use default except to experiment (insecure)
# the following listener only accepts connections from the local machine
define ear listener type="NBQ",interface="127.0.0.1",port=49001;
source /myap/logmon.nb; # include monitor for log lines
# source other monitors here ...
Monitor Rules: (/myap/logmon.nb)
# daily monitor of log file size
define logmon context; # context to monitor size of single log
# To do multiple logs, repeat these rules replacing logmon with logmon.'filename'
# schedule a probe (note the Perl script performs a very specific and simple task)
logmon define r1 on(~(hour(3))):-/myap/logmon.pl
# set a threshold and response (note again a Perl script performs a specific task)
logmon define r2 on(lines>1000000):-/myap/alert.pl "log exceeded 1000000 lines"
Probe Script: (/myap/logmon.pl)
#!/usr/bin/perl
$size=`wc -l /myap/myapp.log`;
if($size=~/\s*(\d+)\s/){$size=$1;}
else{$size="?";}
# Send the line count to my NodeBrain agent
system("/usr/bin/nb \":declare myagent brain default\@localhost:49001;\""
     . " \":>myagent assert logmon.lines=$size;\"");
We would normally declare the brain in our $HOME/.nodebrain/private.nb file so it would not be necessary to include the declare in the system() call. And we would use a secure identity instead of default.
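For example, if that declare statement ("declare myagent brain default@localhost:49001;") lives in $HOME/.nodebrain/private.nb, the probe's call reduces to the assertion alone (a sketch only; setting up a secure identity is not shown here):
system("/usr/bin/nb \":>myagent assert logmon.lines=$size;\"");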
The notification script, /myap/alert.pl, would do whatever you want. You need to change the last rule in the monitor to conform to the syntax for your notification script.
I should emphasize here that NodeBrain is not a procedural scripting language and is not a reasonable alternative to your favorite scripting language for solving most problems. Clearly this example is more complicated than just testing for the threshold in the Perl script, scheduling your script with cron, and leaving NodeBrain out of it. But if we change the problem a bit, NodeBrain may be quite helpful. Suppose we are monitoring log size on 50 servers and we want to be notified if any one exceeds 1,000,000 lines, AND when 5 or more exceed 700,000 lines. In that case, each server would report the line count to a NodeBrain agent on a central server and the new condition would be implemented there.
If you elected to only run a NodeBrain agent on the central server and use cron on the remote servers, you could modify the Perl script slightly to replace the localhost address with the central server name and include the remote server name in the variable identifier.
system("/usr/bin/nb \":declare master brain default\@centralservername:49001;\""
     . " \":>master logmon.'SERVER' assert lines=$size;\"");
Now the central server would have rules to monitor the log size on all 50 remote servers. To monitor for 5, 10, and 20 servers exceeding 700,000 lines we could add a cache.
define cLog7Server context cache({5,10,20}:server);
cLog7Server define r1 if(_rowState):-alert.pl "$${_rows} servers have logs exceeding 700,000 lines"
We might assert server names to this cache by including the following rules for each of the 50 servers.
logmon.'SERVER' define r3 on(lines>700000):cLog7Server assert ("SERVER");  # Is High
logmon.'SERVER' define r4 on(lines<700000):cLog7Server assert !("SERVER"); # Isn't High
Now let's clean this up a bit so we don't have to maintain 50 copies of these same rules. We can do better than that. Let's have the remote servers ALERT the central server instead of asserting a value to a specific variable for each host name. In the logmon.pl script we would make this change.
Replace:  >master logmon.'SERVER' assert lines=$size;
With:     >master logmon alert server="SERVER",lines=$size;
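As a sketch, the modified system() call in logmon.pl might look like this, using Perl's core Sys::Hostname module to supply the 'SERVER' placeholder:
use Sys::Hostname;
my $server = hostname();   # name of this remote server
system("/usr/bin/nb \":declare master brain default\@centralservername:49001;\""
     . " \":>master logmon alert server=\\\"$server\\\",lines=$size;\"");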
Now we can reduce the 50 sets of rules down to a single set on our central server. The complete rule set for monitoring the logs on the central server is shown here.
define cLog7Server context cache({5,10,20}:server);
cLog7Server define r1 if(_rowState):-alert.pl "$${_rows} servers have logs exceeding 700,000 lines"
define logmon context;
logmon define server cell;  # Name of remote server          [Not required but helps to document.]
logmon define lines cell;   # Number of lines in log file    [Not required but helps to document.]
logmon define r0 if(lines>1000000):$ -/myap/alert.pl "$${server} log at $${lines} lines"
logmon define r1 if(lines>700000):cLog7Server assert(logmon.server);
logmon define r2 if(lines<700000):cLog7Server assert !(logmon.server);
We also have the option of running a NodeBrain agent on each remote server and replicating the rules. We could go back to having the script report line counts to the local agent and then let the local agent only report to the central server when a threshold is exceeded. The command prefix ">master" would move from the Perl script to the rule action as shown below.
define logmon context;
logmon define server cell;  # Name of remote server
logmon define lines cell;   # Number of lines in log file
logmon define r0 if(lines>1000000):$ >master -/myap/alert.pl "$${server} log at $${lines} lines"
logmon define r1 if(lines>700000):>master cLog7Server assert(logmon.server);
logmon define r2 if(lines<700000):>master cLog7Server assert !(logmon.server);
In addition to distributing the monitoring task, this configuration would also enable the master agent to take corrective action via the remote agents. I should point out that there is no master/slave concept in NodeBrain; the agents are peers. However, there can be "management" server and "managed" server relationships in the rules we write.
Perhaps from this discussion you notice that NodeBrain is not designed as a monitor of anything more specific than state and events. That means, unless somebody else develops rules and scripts for your specific problem, you will need to write them yourself. For Unix system health monitoring, I have constructed a set of Perl scripts that, combined with NodeBrain, actually do something. :) My hope is that others find NodeBrain useful for constructing their own monitoring applications and share them with the rest of us.
Now, the LOG listener. You are correct, the document is out of sync with the code in this area. I'll release an update soon to correct this and other problems. I have been using NodeBrain's "pipe" command for monitoring log files myself, but the LOG listener will replace it. For this reason I don't want to give an example using "define file" and "pipe". Instead I'll give an example using a LOG listener, which works in 0.5.1, a release I'll make available soon.
NodeBrain is capable of tail'ing a log file and looking for regular expression matches. But you need to develop the rules to specify what to look for and how to respond. And again, you can do this easily with your favorite scripting language (I'm happy with Perl for this type of problem). So we would only be motivated to use NodeBrain if we want to correlate information from multiple sources and perhaps multiple servers. Even then we may have a better tool for monitoring a given log file. We can always send alarms from another tool into NodeBrain for correlation.
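For illustration, here is a minimal Perl sketch of that scripting approach: it tails a log, watches for a pattern, and forwards assertions to a local NodeBrain agent using the same command form as the probe script above. The log path, the /failed login/ pattern, and the logmon.failures cell are hypothetical, and the brain is assumed to be declared in private.nb.
Tail Script: (example only)
#!/usr/bin/perl
# Tail a log with plain Perl and forward matches to a NodeBrain agent.
use strict;
use warnings;

open(my $log, '<', '/myap/login.log') or die "open: $!";
seek($log, 0, 2);                    # start at end of file, like tail -f
my $failures = 0;
while (1) {
    while (my $line = <$log>) {
        next unless $line =~ /failed login/;
        $failures++;
        system("/usr/bin/nb \":>myagent assert logmon.failures=$failures;\"");
    }
    sleep 5;                         # wait for new lines
    seek($log, 0, 1);                # clear EOF so the next read sees appended data
}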
Having said that, let's look at an example using NodeBrain (0.5.1) without help from our favorite scripting language. We'll use a NodeBrain translator and some correlation rules. Let's say our requirement is to alarm on user login failures when a given user fails login on a given system more than 5 times in 3 minutes without ultimate success within 10 minutes. (I see some deficiencies in the documentation here---will update.)
# Cache to support our 10 minute delay for success
define cFailedLoginWait context cache(!~(10m):server,user);
# Rule to establish response to row expiration
cFailedLoginWait define r1 if(_action="expire"):$ -alert.pl "5 failed logins by $${user} on $${server}"
# Cache to support our 5 in 3 minute requirement
define cFailedLogin context cache(~(3m):server,user(5));
# Rule to establish response to our threshold condition (must be on one line even if it wraps here)
cFailedLogin define r1 if(user._hitState and not cFailedLoginWait(server,user)):$ cFailedLoginWait assert ("$${server}","$${user}");
These rules solve part of the problem, but we still need a way to send events to the cache. Independent of how we detect the events, we need to do something like this.
User U1 failed login on server S1:
cFailedLogin assert ("S1","U1");       # assert server and user to failed login cache
User U1 successfully logged in on server S1:
cFailedLogin assert !("S1","U1");      # remove server and user from failed login cache
cFailedLoginWait assert !("S1","U1");  # remove server and user from 10 min wait cache
Now we need a way to detect the actual events so we can report them to NodeBrain in this way. It could (and probably should) be done with your choice of scripting languages, but I promised we'll do it with NodeBrain here. So let's define a LOG listener, assuming the information we need is written to a log we'll call login.log.
define logmatch translator /myap/logmatch.nbx;
define logwatch listener type="LOG",file="login.log",schedule==~(20s),translator="logmatch";
Let's assume the entries in this log identify failed and successful logins as follows.
... user USERNAME failed login to SERVER ...
... user USERNAME successful login to SERVER ...
Now we can write our NodeBrain translator, /myap/logmatch.nbx. We use extended regular expressions to match on lines in the log file and emit NodeBrain commands based on matched conditions. This requires familiarity with regular expressions, NodeBrain translator syntax, and NodeBrain command syntax.
# Example watching for failed and successful logins
(user ([^ ]*) failed login to ([^ ]*)){
: cFailedLogin assert ("$[2]","$[1]"); # emit NodeBrain command
}
(user ([^ ]*) successful login to ([^ ]*)){
: cFailedLogin assert !("$[2]","$[1]");
: cFailedLoginWait assert !("$[2]","$[1]");
}
From this, you may have figured out that it is possible to have NodeBrain monitor the number of lines written to a log file over some sliding interval and report when thresholds are reached. I'll give an example here, but I would not recommend this solution for high volume logs. We'll translate every line that appears in the log into an "event" by making an assertion to a NodeBrain event cache.
# Translator - match on anything and just assert the name of the log for every line.
(.*){
: cLog assert ("LOGNAME");
}
# Rules - Alarm on 100, 300, and 1000 lines within a 4 hour period.
# If it drops to 50 in 4 hours, we consider it back to normal, so we reset to enable
# the cache to alarm again on the next episode of abnormal volume
define cLog context cache(~(4h):log(^50,100,300,1000));
cLog define r1 if(log._hitState):-alert.pl "$${log._hits} lines added to $${log} in $${_interval}"
Here's an alternate method that would alarm on a single threshold in fixed (not sliding) intervals.
# Translator - set a cell named "count" to the current value of a cell named "lines" for every line.
(.*){
: assert count={lines}; # add 1 to lines - see rules below
}
# Rules - funny way to count
assert lines==count+1,count=-1;
define r1 on(lines>100):-alert.pl "100 lines added to log in 4 hours"
define r2 on(~(4h)):assert count=-1; # reset lines to zero every 4 hours
Our method of counting in this example may require an illustration. The following was pasted from an interactive session. It illustrates how the value of lines is modified by changing the value of count. Because we define lines to be a function of count, the value of lines changes every time count changes.
@> assert lines==count+1,count=-1;
@> show -cells
lines = 0 == (count+1)
count = -1
@> assert count={lines};
@> show -cells
lines = 1 == (count+1)
count = 0
@> assert count={lines};
@> show -cells
lines = 2 == (count+1)
count = 1
@>
Again, I don't recommend this approach for simply counting lines in a high volume log file (more than 1 per second) because it is not the most efficient way to do it. It would be better to export this problem to a procedural scripting language, and use NodeBrain to monitor the results and correlate them with other information if there is such a requirement. It might, however, be appropriate to use NodeBrain on a high volume log if we are looking for specific strings and correlating events derived from the matching conditions.
Hopefully this addresses your question. I'm working on getting 0.5.1 released on SourceForge to resolve the identified defects. If your primary interest is in scanning log files with NodeBrain, it would be best to wait for the update. If you can obtain the counts with a script, and can solve a correlation requirement as described previously, then the 0.5.0 release should work as well.
Thanks for your interest.
Ed Trettevik <ea...@no...>