Re: [Nfsen-discuss] profile not properly unlocking?
From: Peter H. <ha...@sw...> - 2006-07-28 09:08:56
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Ivan,

- -------- Original Message --------
From: "Ivan A. Beveridge" <iv...@li...>
To: nfsen-discuss ML <nfs...@li...>
Subject: [Nfsen-discuss] profile not properly unlocking?
Date: Thu Jul 27 2006 11:50:39 GMT+0200 (CEST)

> Hi Peter,
>
> I couldn't think of a suitably descriptive subject :(
>
> I am having a problem whereby a profile locks and, even after unlocking
> it, the data does not get parsed by nfsen.
>

These errors:

> Jul 27 04:55:32 cyan nfsen[10456]: ERROR Update RRD time:
> '200607270450', db: 'switch01', profile: 'live': ERROR while updating
> RRD DB switch01.rrd: error mmapping file
> /srv/nfsen/profiles/live/switch01.rrd
> Jul 27 04:55:32 cyan nfsen[10456]: ERROR Update RRD time:
> '200607270450', db: 'switch02', profile: 'live': ERROR while updating
> RRD DB switch02.rrd: error mmapping file
> /srv/nfsen/profiles/live/switch02.rrd
>
> <SNIP>
>
> Jul 27 04:55:32 cyan nfsen[10456]: ERROR Update RRD time:
> '200607270450', db: 'switch28', profile: 'live': ERROR while updating
> RRD DB switch28.rrd: error mmapping file
> /srv/nfsen/profiles/live/switch28.rrd

and ..

> Jul 27 04:55:32 cyan nfsen[10456]: Error GenGraph: Profile: live,
> traffic-day: malloc fetch data area at /srv/nfsen/libexec/NfSenRRD.pm
> line 239.

are all memory related. RRD seems to have hit a memory error for whatever
reason. This is in line with the error message from grsec you mention
below. However, a segfault is not very polite ..

Whenever nfsend encounters an unpredictable error that it cannot deal
with, it automatically puts that profile on hold (locked) to prevent any
damage to that profile. The backlog is processed once the errors are
fixed. In your case the backlog is fairly big: at 10:16 your nfsend
processed slot 04:50, hours behind. It will take some time to work
through this backlog, but it should be processed correctly unless nfsend
encounters another error.

Check the system memory with vmstat. As other processes seem to work
fine (nftrack), it could be a physical memory error ...

- Peter

> Jul 27 04:55:37 cyan nfsen[12828]: nftrack exited with value 0
> Jul 27 04:55:37 cyan nfsen[12828]: /usr/local/bin/nftrack -d
> /srv/data/nfsen/ports-db -S -p -w /srv/data/nfsen/ports-db/portstat24.txt
> Jul 27 04:55:37 cyan nfsen[12828]: nftrack exited with value 0
> Jul 27 04:55:37 cyan nfsen[12828]: PortTracker run: Done.
> Jul 27 05:26:03 cyan nfsen[17957]: connection on UNIX socket
> Jul 27 05:26:03 cyan nfsen[17957]: comm server started: 32317
> Jul 27 05:26:03 cyan nfsen[17957]: comm child 32317 terminated
> Jul 27 05:26:15 cyan nfsen[17957]: connection on UNIX socket
> Jul 27 05:26:15 cyan nfsen[17957]: comm server started: 20489
> Jul 27 05:26:15 cyan nfsen[17957]: comm child 20489 terminated
> ==============================
>
> The final few lines repeat. The other thing to note is the "missing
> time" between the port-tracker run and the subsequent log entries.
>
> I unlocked the profile (it was only the live profile that was locked
> this time), however the nfcapd files didn't get parsed. Reloading nfsen
> (nfsen reload) does not clear the problem .. but doing an "nfsen stop &&
> nfsen start" fixes the problem .. the scheduler notices the unparsed
> logfiles and schedules the parsing:
>
> ==============================
> Jul 27 10:16:51 cyan nfsen[11914]: Starting /srv/nfsen/bin/nfsen.
> Jul 27 10:16:51 cyan nfsen[2268]: Startup. Version: snapshot-20060412
> $Id: nfsend 55 2006-04-12 08:35:59Z peter $
> Jul 27 10:16:51 cyan nfsen[5741]: Launcher started: [26741]
> Jul 27 10:16:51 cyan nfsen[11914]: Terminating /srv/nfsen/bin/nfsen.
> Jul 27 10:16:51 cyan nfsen[27053]: Comm server started: [27053]
> Jul 27 10:16:51 cyan nfsen[5741]: nfsend: [5741]
> Jul 27 10:16:51 cyan nfsen[5741]: Run periodic at Thu Jul 27 10:15:00 2006
> Jul 27 10:16:51 cyan nfsen[26741]: Frontend module 'PortTracker.php' found
> Jul 27 10:16:51 cyan nfsen[5741]: Update profile live
> Jul 27 10:16:51 cyan nfsen[26741]: PortTracker BEGIN
> Jul 27 10:16:51 cyan nfsen[26741]: Loading plugin 'PortTracker': Success
> Jul 27 10:16:51 cyan nfsen[26741]: PortTracker: Init
> Jul 27 10:16:51 cyan nfsen[26741]: Initializing plugin 'PortTracker':
> Success
> Jul 27 10:16:51 cyan nfsen[26741]: ModList: live - PortTracker
> Jul 27 10:16:52 cyan nfsen[5741]: nfsend: exit child[19941]
> Jul 27 10:16:52 cyan nfsen[5741]: nfsend: exit child[31434]
> Jul 27 10:16:52 cyan nfsen[5741]: nfsend: exit child[6586]
> Jul 27 10:16:52 cyan nfsen[5741]: nfsend: exit child[26908]
> Jul 27 10:16:53 cyan nfsen[5741]: nfsend: exit child[17002]
> Jul 27 10:16:53 cyan nfsen[5741]: nfsend: exit child[19267]
> Jul 27 10:16:53 cyan nfsen[5741]: nfsend: exit child[29482]
> Jul 27 10:16:53 cyan nfsen[5741]: nfsend: exit child[4429]
> Jul 27 10:16:54 cyan nfsen[5741]: nfsend: exit child[5155]
> Jul 27 10:16:54 cyan nfsen[5741]: nfsend: exit child[27034]
> Jul 27 10:16:54 cyan nfsen[26741]: Launcher Cycle: received: live,
> 200607270450
> Jul 27 10:16:54 cyan nfsen[26741]: Launcher Cycle: Time: 200607270450,
> Profile: live, Module: PortTracker,
> Jul 27 10:16:54 cyan nfsen[26741]: PortTracker run: Profile: live, Time:
> 200607270450
> Jul 27 10:16:54 cyan nfsen[26741]: /usr/local/bin/nftrack -M
> /srv/data/nfsen/profiles/live/switch01:switch02:switch03:switch08:switch10:switch17:switch19:switch20:switch26:switch28
> -r nfcapd.200607270450 -d /srv/data/nfsen/ports-db -A -t 200607270450
> -s -p -w /srv/data/nfsen/ports-db/portstat.txt
> Jul 27 10:16:54 cyan nfsen[5741]: Signal launcher: live:200607270450
> Jul 27 10:16:56 cyan nfsen[5741]: nfsend: exit child[31871]
> Jul 27 10:16:57 cyan nfsen[5741]: nfsend: exit child[31870]
> Jul 27 10:16:57 cyan nfsen[5741]: nfsend: exit child[19788]
> Jul 27 10:16:57 cyan nfsen[5741]: nfsend: exit child[10781]
> Jul 27 10:16:57 cyan nfsen[5741]: nfsend: exit child[16382]
> Jul 27 10:16:58 cyan nfsen[5741]: nfsend: exit child[31518]
> Jul 27 10:16:58 cyan nfsen[5741]: nfsend: exit child[3112]
> Jul 27 10:16:58 cyan nfsen[5741]: nfsend: exit child[13208]
> Jul 27 10:16:58 cyan nfsen[5741]: nfsend: exit child[14701]
> Jul 27 10:16:58 cyan nfsen[5741]: nfsend: exit child[31109]
> Jul 27 10:16:58 cyan nfsen[5741]: Signal launcher: live:200607270455
> Jul 27 10:16:58 cyan nfsen[5741]: nfsend: exit child[1296]
> <SNIP>
> ==============================
>
> It continues with the above pattern of lines (signal launcher, then a
> number of "nfsend: exit child") until it has caught up with the backlog.
>
> I'm not sure why the nfsend scheduler doesn't pick up the problem, or a
> "reload". I prefer not to do a stop/start because it interrupts the data
> collection (it stops all sfcapd processes and then starts them all again).
>
> I believe this may be related to RAM (the only "new" thing I've done in
> the past few weeks is create a tmpfs partition which is using ~2GB of
> the 4GB RAM), but I'd have thought the kernel would free up some buffer
> space if required:
>
> ==============================
> cyan log # free
>                       total       used       free     shared    buffers     cached
> Mem:                3363480    3248756     114724          0      47376    3018772
> -/+ buffers/cache:              182608    3180872
> Swap:               3145720     604268    2541452
> ==============================
>
> Ah ...
> I've just found this in kernel.log (19/07/2006 @ 11:00 was last
> problem) [having written all the rest of the email]:
>
> ==============================
> Jul 19 11:00:37 cyan grsec: From 195.66.232.38: signal 11 sent to
> /srv/nfsen/bin/nfsend[nfsend:430] uid/euid:0/210 gid/egid:81/81, parent
> /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0
> Jul 20 19:55:32 cyan grsec: From 195.66.232.38: signal 11 sent to
> /srv/nfsen/bin/nfsend[nfsend:3018] uid/euid:0/210 gid/egid:81/81, parent
> /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0
> Jul 27 04:55:33 cyan grsec: From 195.66.232.38: signal 11 sent to
> /srv/nfsen/bin/nfsend[nfsend:10456] uid/euid:0/210 gid/egid:81/81,
> parent /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0
> ==============================
>
> As this segfault could be due to duff RAM, I'll try to schedule downtime
> for a RAM check .. but can you think of anything else offhand?
>
> Cheers
>
>
> Ivan

_______________________________________________
Nfsen-discuss mailing list
Nfs...@li...
https://lists.sourceforge.net/lists/listinfo/nfsen-discuss

- --
_______ SWITCH - The Swiss Education and Research Network ______
Peter Haag, Security Engineer, Member of SWITCH CERT
PGP fingerprint: D9 31 D5 83 03 95 68 BA FB 84 CA 94 AB FC 5D D7
SWITCH, Limmatquai 138, CH-8001 Zurich, Switzerland
E-mail: pet...@sw... Web: http://www.switch.ch/security
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)

iQCVAwUBRMnUHf5AbZRALNr/AQL/WgP/Ugda8VLOscP/nbbkPQbmcjhv7Iexa7nK
xkg99rdCM+DgHotMtdIVe7+sGlmUHtKtzeKrqYdqNVflac+4cxVXg2cq7ulzl04v
9V2EEFmD1AEJHgJIYlQ4ElAbmhZAH/8DyVBrw7TxR9IY31h2P1qZFzJHMkSbmKpy
ntJQVIfYm3o=
=9lM/
-----END PGP SIGNATURE-----
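
For anyone who wants to sanity-check the numbers in this thread, here is a rough sketch in Python. It is not part of nfsen. The function names are made up for illustration, and the 5-minute slot length is an assumption inferred from the 200607270450 -> 200607270455 step in the log above. It estimates how many slots nfsend was behind when the 10:15 periodic run started, and reproduces the "-/+ buffers/cache" line of the quoted `free` output.

```python
from datetime import datetime, timedelta

# Assumption: one slot every 5 minutes, as the 0450 -> 0455 step in the log suggests.
SLOT = timedelta(minutes=5)
FMT = "%Y%m%d%H%M"  # slot timestamp format as it appears in the log, e.g. 200607270450


def backlog_slots(oldest_pending: str, current: str) -> int:
    """Whole slots between the oldest unprocessed slot and the current run (hypothetical helper)."""
    delta = datetime.strptime(current, FMT) - datetime.strptime(oldest_pending, FMT)
    return delta // SLOT


def used_without_cache(used_kb: int, buffers_kb: int, cached_kb: int) -> int:
    """'used' column of the -/+ buffers/cache line: memory used excluding buffers and cache."""
    return used_kb - buffers_kb - cached_kb


def free_with_cache(free_kb: int, buffers_kb: int, cached_kb: int) -> int:
    """'free' column of the -/+ buffers/cache line: free memory plus buffers and cache."""
    return free_kb + buffers_kb + cached_kb


if __name__ == "__main__":
    # Slot 04:50 was being processed while the periodic run was for 10:15.
    print(backlog_slots("200607270450", "200607271015"))   # 65
    # Values taken from the quoted `free` output (in kB).
    print(used_without_cache(3248756, 47376, 3018772))     # 182608
    print(free_with_cache(114724, 47376, 3018772))          # 3180872
```

The two memory figures match the "-/+ buffers/cache" line Ivan posted, so nearly all of the "used" memory is buffers and cache. Note that on Linux, tmpfs pages are generally counted under "cached", so with a ~2GB tmpfs in use not all of that figure is necessarily reclaimable without swapping.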