From: Thomas E. <tho...@sl...> - 2005-01-13 19:59:58
|
Hi

An issue with perfparsed using all the CPU when configured for "Method 4" was mentioned in a post by Tim Wuyts at the end of November. No resolution was ever posted as far as I can tell.

Now I am coming across the same problem. I did try Yves' suggestion to replace the last argument in the select() call with NULL, but it had no effect.

I have far fewer service checks than Tim, only about 120.
The host is a 3GHz P4 with 1GB of RAM running RHEL3u4.
Nagios is version 2.0b1, plugins v1.4-beta1 and perfparse is 0.104.8.

I ran perfparsed with strace and it appears to run fine for a few rounds of events. Then an error message starts to appear in the message buffer: "Command xxx unknown. Type 'help' for help." It then quickly gets into a self-perpetuating loop, trying to deal with the error message by putting out more of them.

When writing into a file instead of a pipe, I never see these messages.

Below is a short extract of the strace output once it got into trouble; it gets very large very quickly. If more of the strace output would help, I can post a larger section.

thanks,

Thomas

--- strace ../bin/perfparsed ---
...
select(5, [4], NULL, NULL, NULL) = 1 (in [4])
rt_sigprocmask(SIG_BLOCK, [INT PIPE TERM], [], 8) = 0
read(4, "and \'Comma", 10) = 10
read(4, "nd\' unknow", 10) = 10
read(4, "n.\nType \'h", 10) = 10
write(4, "Command \'", 9) = 9
write(4, "Command", 7) = 7
write(4, "\' unknown.\nType \'help\' for help."..., 33) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
utime("/usr/local/nagios/var/storage_modules.status", NULL) = 0
time(NULL) = 1105566327
time(NULL) = 1105566327
waitpid(-1, NULL, WNOHANG) = -1 ECHILD (No child processes)
select(5, [4], NULL, NULL, NULL) = 1 (in [4])
rt_sigprocmask(SIG_BLOCK, [INT PIPE TERM], [], 8) = 0
read(4, "elp\' for h", 10) = 10
read(4, "elp.\nComma", 10) = 10
write(4, "Command \'", 9) = 9
write(4, "Type", 4) = 4
write(4, "\' unknown.\nType \'help\' for help."..., 33) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
utime("/usr/local/nagios/var/storage_modules.status", NULL) = 0
time(NULL) = 1105566327
time(NULL) = 1105566327
waitpid(-1, NULL, WNOHANG) = -1 ECHILD (No child processes)
select(5, [4], NULL, NULL, NULL) = 1 (in [4])
rt_sigprocmask(SIG_BLOCK, [INT PIPE TERM], [], 8) = 0
read(4, "nd \'TypeCo", 10) = 10
read(4, "mmand \'Com", 10) = 10
read(4, "mand\' unkn", 10) = 10
read(4, "own.\nType ", 10) = 10
write(4, "Command \'", 9) = 9
write(4, "Command", 7) = 7
write(4, "\' unknown.\nType \'help\' for help."..., 33) = 33
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
utime("/usr/local/nagios/var/storage_modules.status", NULL) = 0
time(NULL) = 1105566327
time(NULL) = 1105566327
waitpid(-1, NULL, WNOHANG) = -1 ECHILD (No child processes)
select(5, [4], NULL, NULL, NULL) = 1 (in [4])
rt_sigprocmask(SIG_BLOCK, [INT PIPE TERM], [], 8) = 0
read(4, "\'help\' for", 10) = 10
read(4, " help.\nCom", 10) = 10
write(4, "Command \'", 9) = 9
write(4, "Type", 4) = 4
write(4, "\' unknown.\nType \'help\' for help."..., 33) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
utime("/usr/local/nagios/var/storage_modules.status", NULL) = 0
time(NULL) = 1105566327
time(NULL) = 1105566327
waitpid(-1, NULL, WNOHANG) = -1 ECHILD (No child processes)
select(5, [4], NULL, NULL, NULL) = 1 (in [4])
...
|
From: Tim W. <tim...@gm...> - 2005-01-14 09:26:39
Attachments:
perf.strace.gz
|
This problem does not seem related to the one I was experiencing a while back (that one definitely got fixed, and the strace output looks totally different).

However, I have the same problem on my production machine with 0.104.8. It did not occur on my test platform. (aargh not good!)

To Yves:
If I understand correctly, at one point (strace line 13516) perfparse WRITES to the pipe 'Command OK unknown. Type help ....'.
Of course it picks this back up at the input, and we're off for an infinite loop where it keeps saying it doesn't understand what it has written itself. Could it be that it loses track of where messages end and a new message begins?

Tim

nagios@plato:/EUnet/nagios/bin$ ./perfparsed --show_config
Perfparsed [options]

# File where Perfparse logs messages
# Error_Log = "string"
Error_Log = "/EUnet/nagios/var/log/perfparse_error"

# Rotate Perfparse log files
# Error_Log_Rotate = "Y/N"
Error_Log_Rotate = "Yes"

# Keep N days of error log. Compress recent logs and remove too old ones
# Error_Log_Keep_N_Days = "value"
Error_Log_Keep_N_Days = "7"

# When perfparse cannot parse a line, it drops it to that file
# Drop_File = "string"
Drop_File = "/tmp/perfparse.drop"

#
# Drop_File_Rotate = "Y/N"
Drop_File_Rotate = "Yes"

# Keep N days of drop file log. Compress recent logs and remove too old ones
# Drop_File_Keep_N_Days = "value"
Drop_File_Keep_N_Days = "7"

# Port for perfparsed server
Put 0 or "" to disable the server
# Server_Port = "value"
Server_Port = "0"

# Log source from nagios (or other tools) that perfparse will scan
Authorized values: a file name, '-' for stdin, '|' for a fifo and '>' for a host:port socket
For sockets, a command 'history' will be sent before retreiving the data
# Service_Log = "string"
Service_Log = "|/EUnet/nagios/var/perfparse.pipe"

# Save the read position in the nagios log file ?
If yes, perfparse will start from that position instead of from the beginning
# Service_Log_Save_Position = "Y/N"
Service_Log_Save_Position = "yes"

# Path for files containing the read position for nagios log files
# Service_Log_Position_Mark_Path = "string"
Service_Log_Position_Mark_Path = ""

# Lock file for perfparsed
# Daemon_Lock = "string"
Daemon_Lock = "/EUnet/nagios/var/perfparsed.lock"

# Run perfparsed as a daemon
# Daemonize = "Y/N"
Daemonize = "no"

# Perform some periodic cleanup every day
# Periodic_Cleanup = "Y/N"
Periodic_Cleanup = "yes"

# Lock file for perfparsed periodic cleanup process
# Periodic_Cleanup_Lock = "string"
Periodic_Cleanup_Lock = "/EUnet/nagios/var/perfparsed_periodic_cleanup.lock"

# Perform some periodic cleanup every day at HHMM
# Periodic_Cleanup_Hour = "value"
Periodic_Cleanup_Hour = "0230"

# Dummy hostname if gethostname() does not work
# Dummy_Hostname = "string"
Dummy_Hostname = "dummy"

# Don't store raw data
# No_Raw_Data = "Y/N"
No_Raw_Data = "no"

# Don't store bin data
# No_Bin_Data = "Y/N"
No_Bin_Data = "no"

# Path where storage modules are
# Storage_Modules_Dir = "string"
Storage_Modules_Dir = "/EUnet/nagios/lib"

# Modules to load (Coma separated values)
# Storage_Modules_Load = "string"
Storage_Modules_Load = "mysql"

# File to contain Storage Modules Status
# Storage_Modules_Status_File = "string"
Storage_Modules_Status_File = "/EUnet/nagios/var/storage_modules.status"

# Storage Module : mysql
# ==============================

# Database user
# DB_User = "string"
DB_User = "nagios"

# Database password
# DB_Pass = "string"
DB_Pass = "password"

# Database name
# DB_Name = "string"
DB_Name = "nagios"

# Database hostname
# DB_Host = "string"
DB_Host = "127.0.0.1"

---------- Forwarded message ----------
From: Thomas Eriksson <tho...@sl...>
Date: Thu, 13 Jan 2005 11:59:51 -0800
Subject: [Perfparse-users] CPU usage again
To: per...@li...
[...]

-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
Perfparse-users mailing list
Per...@li...
https://lists.sourceforge.net/lists/listinfo/perfparse-users
|
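The feedback Tim describes (the trace shows both read() and write() on descriptor 4) can be reproduced outside perfparse. What follows is only a minimal C sketch, not perfparse code. It assumes Linux, where a FIFO can be opened with O_RDWR (POSIX leaves that case undefined), and it assumes the daemon holds its pipe that way, which the thread does not confirm. The helper name fifo_echo and its parameters are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a FIFO at `path`, open it read-write,
 * write `msg` into it and read from the very same descriptor.
 * Returns the number of bytes read back, or -1 on error.
 * Assumption: Linux semantics for O_RDWR on a FIFO.
 */
ssize_t fifo_echo(const char *path, const char *msg, char *out, size_t cap)
{
    ssize_t n = -1;
    size_t len = strlen(msg);
    int fd;

    if (cap == 0 || mkfifo(path, 0600) != 0)
        return -1;

    fd = open(path, O_RDWR | O_NONBLOCK);
    if (fd >= 0) {
        /* Whatever is written here is queued for the reader side... */
        if (write(fd, msg, len) == (ssize_t)len) {
            /* ...and the reader side is this same descriptor. */
            n = read(fd, out, cap - 1);
            if (n >= 0)
                out[n] = '\0';
        }
        close(fd);
    }
    unlink(path);   /* remove the FIFO again */
    return n;
}
```

Under these assumptions, a process that answers an unparsable line by writing an error message to the descriptor it reads from will receive that message as its next input, which matches the pattern in the trace.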
From: Yves <yme...@pe...> - 2005-01-14 10:03:19
|
The bug Tim mentioned back then was fixed in 0.104.1 or 0.104.2 (it appears twice in the ChangeLog).
This is something new, and I found one bug (I think there are 2 here). I will post a patch soon.

More in my next mail.
Yves

--
- Homepage - http://ymettier.free.fr - http://www.logicacmg.com -
- GPG key - http://ymettier.free.fr/gpg.txt -
- Maitretarot - http://www.nongnu.org/maitretarot/ -
- Perfparse - http://perfparse.sf.net/ -
|
From: Yves M. <yme...@li...> - 2005-01-14 10:22:44
Attachments:
perfparsed.c.diff.gz
|
> To Yves:
> If I understand correctly, at one point (strace line 13516) perfparse
> WRITES to the pipe 'Command OK unknown. Type help ....'.
> Of course it picks this back up at the input, and we're off for an
> infinite loop where it keeps saying it doesn't understand what it has
> written itself. Could it be that it loses track of where messages end
> and a new message begins?

A little explanation...
There are 2 bugs.
- 1st bug: you have messages with a wrong syntax, and perfparse cannot parse them. No idea why yet.

- 2nd bug: such messages should go to the "drop file" (/tmp/perfparse.drop with a date, in Tim's config). But they don't: they are recognized as commands for the server, but not as a known command. (Telnet to port 1976 of your server and type 'help' if you want to know more - perfparse definitely needs some better documentation :)
I attach a patch that is supposed to redirect bad lines to /tmp/perfparse.drop. With this, you will be able to see what the problem is with the lines, and whether this is a nagios bug or a perfparse bug.

Last thing: I develop with nagios-1.2 and with the pipe command. I cannot test the new nagios-2.0 feature of writing directly to a pipe myself.

> nagios@plato:/EUnet/nagios/bin$ ./perfparsed --show_config

> Drop_File = "/tmp/perfparse.drop"
> Drop_File_Rotate = "Yes"

> # Log source from nagios (or other tools) that perfparse will scan
> Authorized values: a file name, '-' for stdin, '|' for a fifo and '>'
> for a host:port socket
> For sockets, a command 'history' will be sent before retreiving the data

If you redirect "./perfparsed --show_config" to a config file, note that you should add a '#' at the beginning of the previous lines. This is another bug I just found and fixed for the next release. No side effects as long as you don't redirect this to a file without adding the '#' chars :)

> # Service_Log = "string"
> Service_Log = "|/EUnet/nagios/var/perfparse.pipe"

> --- strace ../bin/perfparsed ---
> ...
> select(5, [4], NULL, NULL, NULL) = 1 (in [4])
> rt_sigprocmask(SIG_BLOCK, [INT PIPE TERM], [], 8) = 0
> read(4, "and \'Comma", 10) = 10
> read(4, "nd\' unknow", 10) = 10
> read(4, "n.\nType \'h", 10) = 10
> write(4, "Command \'", 9) = 9
> write(4, "Command", 7) = 7

This should not appear again with the patch.

> write(4, "\' unknown.\nType \'help\' for help."..., 33) = -1 EAGAIN
> (Resource temporarily unavailable)

perfparse tries to write to the pipe. With the patch, it should not happen any more.

Well, for now perfparse will not work any better, but the bad lines go to the /tmp/perfparse.drop file. Please check it and tell me what is wrong with those lines.

Also try to check (with the 'cat' command) what nagios writes to the pipe!

Yves
|
From: Tim W. <tim...@gm...> - 2005-01-14 11:30:30
|
Yves,

the only things showing up in the drop file are parts (the ends) of valid lines, for example:

0000; 0.000000 0000; .99) OK 0.021 500000; OK .99) OK 0) OK 0000; r OK 000; .00 OK 0;5683 2047 1 ) OK 0;5683 4] OK 0; 8;0;7243

I did a cat < perfparse.pipe, but I don't see any lines that don't follow the format.

Tim
|
From: Yves <yme...@pe...> - 2005-01-14 14:49:00
|
> I did a diff between version 104.1 (the last working version on my
> production environment) and 104.8, and found that in log_reader.c,
> function log_reader you changed (104.8 first)
> 502c501
> <         while(0 < (r=read(log_fd[i]->fd,tmp,10))) {
> ---
> >         while(10 == (r=read(log_fd[i]->fd,tmp,10))) {
>
>                 tmp[r] = '\0';
>                 log_fd[i]->file_pos +=r;
>                 log_fd[i]->buffer = g_string_append(log_fd[i]->buffer,tmp);
>                 if(strchr(tmp,'\n')) break;
>                 if((LOG_FD_CLIENT_SOCKET != log_fd[i]->type) && (r!= 10)) break;
>         }
>         if((r>0) && (r<10)) {
>                 tmp[r] = '\0';
>                 log_fd[i]->file_pos +=r;
>                 log_fd[i]->buffer = g_string_append(log_fd[i]->buffer,tmp);
>         }
>
> As a result, the last part of the line gets added twice to the buffer,
> and this results in invalid lines. That last 'if' block should not be
> there, I think.

I agree with you.
Could you test without that 2nd "if" block?
If it works, consider this the fix.
I made that modification because of a bug with the perfparsed server.

I will then release perfparse-0.104.8ym3 on my web site, but except for Tim, who worked on that version, it's better to upgrade from 0.104.x to 0.104.9 than from 0.104.x to 0.104.8ym3 and then from 0.104.8ym3 to 0.104.9.

> So it's not a problem of wrong input from Nagios.

I thought so, but with nagios 2.0 in beta, you cannot be sure that it is not a bug in Nagios. For example, with Nagios-2.0a1, sending a kill signal to restart it would crash it.

Thanks for the feedback and the tips.
Yves
|
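The double append Tim describes can be checked outside perfparse. Below is a minimal C sketch under stated assumptions, not the actual log_reader.c: read_chunks, append_chunk and the repeat_tail flag are hypothetical names, a plain char buffer stands in for the GString, and the socket-type check from the quoted code is left out.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 10   /* same chunk size as the read() calls in the thread */

/* Append n bytes of chunk to out if they fit; return the new length. */
static size_t append_chunk(char *out, size_t len, size_t cap,
                           const char *chunk, size_t n)
{
    if (len + n >= cap)
        return len;              /* no room: leave the buffer untouched */
    memcpy(out + len, chunk, n);
    out[len + n] = '\0';
    return len + n;
}

/*
 * Read from fd in CHUNK-byte pieces until a newline or a short read.
 * repeat_tail == 0: every chunk is appended once, inside the loop.
 * repeat_tail != 0: emulates the extra post-loop "if" block, which
 *                   appends a short final chunk a second time.
 */
size_t read_chunks(int fd, char *out, size_t cap, int repeat_tail)
{
    char tmp[CHUNK + 1];
    size_t len = 0;
    ssize_t r = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';

    while (len + CHUNK < cap && (r = read(fd, tmp, CHUNK)) > 0) {
        tmp[r] = '\0';
        len = append_chunk(out, len, cap, tmp, (size_t)r);
        if (strchr(tmp, '\n'))
            break;               /* end of a line reached */
        if (r != CHUNK)
            break;               /* short read: nothing more for now */
    }

    /* With a "0 < r" loop condition the short chunk is already in the
       buffer at this point, so appending it here duplicates it. */
    if (repeat_tail && r > 0 && r < CHUNK)
        len = append_chunk(out, len, cap, tmp, (size_t)r);

    return len;
}
```

Feeding the 17-byte line "OK 0.021 500000;\n" through a pipe gives the line unchanged with repeat_tail set to 0. With repeat_tail set to 1 the buffer ends with an extra "00000;\n", a stray line end of the kind Tim reported from the drop file.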
From: Thomas E. <tho...@sl...> - 2005-01-14 20:50:00
|
Thanks guys!

While I was having my beauty sleep you solved the problem. I commented out the if block and it works fine.

thanks again,

Thomas
|
From: Tim W. <tim...@gm...> - 2005-01-14 14:54:26
|
Works!

Tim.
|
From: Yves <yme...@pe...> - 2005-01-14 15:04:04
|
> Works!

OK, thanks a lot to you for the feedback and help :)
Yves
|