From: Tim W. <tim...@gm...> - 2005-01-14 09:26:39
This problem does not seem related to the one I was experiencing a while back (that one definitely got fixed, and the strace output looks totally different). However, I have the same problem on my production machine with 0.104.8. It did not occur on my test platform. (aargh, not good!)

To Yves: If I understand correctly, at one point (strace line 13516) perfparse WRITES to the pipe 'Command OK unknown. Type help ....'. Of course it picks this back up at the input, and we're off for an infinite loop where it keeps saying it doesn't understand what it has written itself. Could it be that it loses track of where messages end and a new message begins?

Tim

nagios@plato:/EUnet/nagios/bin$ ./perfparsed --show_config
Perfparsed [options]

# File where Perfparse logs messages
# Error_Log = "string"
Error_Log = "/EUnet/nagios/var/log/perfparse_error"

# Rotate Perfparse log files
# Error_Log_Rotate = "Y/N"
Error_Log_Rotate = "Yes"

# Keep N days of error log. Compress recent logs and remove too old ones
# Error_Log_Keep_N_Days = "value"
Error_Log_Keep_N_Days = "7"

# When perfparse cannot parse a line, it drops it to that file
# Drop_File = "string"
Drop_File = "/tmp/perfparse.drop"

#
# Drop_File_Rotate = "Y/N"
Drop_File_Rotate = "Yes"

# Keep N days of drop file log. Compress recent logs and remove too old ones
# Drop_File_Keep_N_Days = "value"
Drop_File_Keep_N_Days = "7"

# Port for perfparsed server Put 0 or "" to disable the server
# Server_Port = "value"
Server_Port = "0"

# Log source from nagios (or other tools) that perfparse will scan Authorized values: a file name, '-' for stdin, '|' for a fifo and '>' for a host:port socket For sockets, a command 'history' will be sent before retreiving the data
# Service_Log = "string"
Service_Log = "|/EUnet/nagios/var/perfparse.pipe"

# Save the read position in the nagios log file ? If yes, perfparse will start from that position instead of from the beginning
# Service_Log_Save_Position = "Y/N"
Service_Log_Save_Position = "yes"

# Path for files containing the read position for nagios log files
# Service_Log_Position_Mark_Path = "string"
Service_Log_Position_Mark_Path = ""

# Lock file for perfparsed
# Daemon_Lock = "string"
Daemon_Lock = "/EUnet/nagios/var/perfparsed.lock"

# Run perfparsed as a daemon
# Daemonize = "Y/N"
Daemonize = "no"

# Perform some periodic cleanup every day
# Periodic_Cleanup = "Y/N"
Periodic_Cleanup = "yes"

# Lock file for perfparsed periodic cleanup process
# Periodic_Cleanup_Lock = "string"
Periodic_Cleanup_Lock = "/EUnet/nagios/var/perfparsed_periodic_cleanup.lock"

# Perform some periodic cleanup every day at HHMM
# Periodic_Cleanup_Hour = "value"
Periodic_Cleanup_Hour = "0230"

# Dummy hostname if gethostname() does not work
# Dummy_Hostname = "string"
Dummy_Hostname = "dummy"

# Don't store raw data
# No_Raw_Data = "Y/N"
No_Raw_Data = "no"

# Don't store bin data
# No_Bin_Data = "Y/N"
No_Bin_Data = "no"

# Path where storage modules are
# Storage_Modules_Dir = "string"
Storage_Modules_Dir = "/EUnet/nagios/lib"

# Modules to load (Coma separated values)
# Storage_Modules_Load = "string"
Storage_Modules_Load = "mysql"

# File to contain Storage Modules Status
# Storage_Modules_Status_File = "string"
Storage_Modules_Status_File = "/EUnet/nagios/var/storage_modules.status"

# Storage Module : mysql
# ==============================

# Database user
# DB_User = "string"
DB_User = "nagios"

# Database password
# DB_Pass = "string"
DB_Pass = "password"

# Database name
# DB_Name = "string"
DB_Name = "nagios"

# Database hostname
# DB_Host = "string"
DB_Host = "127.0.0.1"

---------- Forwarded message ----------
From: Thomas Eriksson <tho...@sl...>
Date: Thu, 13 Jan 2005 11:59:51 -0800
Subject: [Perfparse-users] CPU usage again
To: per...@li...
Hi

An issue with perfparsed using all the CPU when configured for "Method 4" was mentioned in a post by Tim Wuyts at the end of November. No resolution was ever posted as far as I can tell. Now I am coming across the same problem. I did try Yves' suggestion to replace the last argument in the select() call with NULL, but it had no effect.

I have far fewer service checks than Tim, only about 120. The host is a 3GHz P4 with 1GB of RAM running RHEL3u4. Nagios is version 2.0b1, plugins v1.4-beta1, and perfparse is 0.104.8.

I ran perfparsed with strace and it appears to run fine for a few rounds of events. Then an error message starts to appear in the message buffer: "Command xxx unknown. Type 'help' for help." It then quickly gets into a self-perpetuating loop, trying to deal with the error message by putting out more of them. When writing into a file instead of a pipe, I never see these messages.

Below is a short extract of the strace output once it got into trouble; it gets very large very quickly. If more of the strace can be of any help I can post a larger section.

thanks,
Thomas

--- strace ../bin/perfparsed ---
...
select(5, [4], NULL, NULL, NULL) = 1 (in [4])
rt_sigprocmask(SIG_BLOCK, [INT PIPE TERM], [], 8) = 0
read(4, "and \'Comma", 10) = 10
read(4, "nd\' unknow", 10) = 10
read(4, "n.\nType \'h", 10) = 10
write(4, "Command \'", 9) = 9
write(4, "Command", 7) = 7
write(4, "\' unknown.\nType \'help\' for help."..., 33) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
utime("/usr/local/nagios/var/storage_modules.status", NULL) = 0
time(NULL) = 1105566327
time(NULL) = 1105566327
waitpid(-1, NULL, WNOHANG) = -1 ECHILD (No child processes)
select(5, [4], NULL, NULL, NULL) = 1 (in [4])
rt_sigprocmask(SIG_BLOCK, [INT PIPE TERM], [], 8) = 0
read(4, "elp\' for h", 10) = 10
read(4, "elp.\nComma", 10) = 10
write(4, "Command \'", 9) = 9
write(4, "Type", 4) = 4
write(4, "\' unknown.\nType \'help\' for help."..., 33) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
utime("/usr/local/nagios/var/storage_modules.status", NULL) = 0
time(NULL) = 1105566327
time(NULL) = 1105566327
waitpid(-1, NULL, WNOHANG) = -1 ECHILD (No child processes)
select(5, [4], NULL, NULL, NULL) = 1 (in [4])
rt_sigprocmask(SIG_BLOCK, [INT PIPE TERM], [], 8) = 0
read(4, "nd \'TypeCo", 10) = 10
read(4, "mmand \'Com", 10) = 10
read(4, "mand\' unkn", 10) = 10
read(4, "own.\nType ", 10) = 10
write(4, "Command \'", 9) = 9
write(4, "Command", 7) = 7
write(4, "\' unknown.\nType \'help\' for help."..., 33) = 33
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
utime("/usr/local/nagios/var/storage_modules.status", NULL) = 0
time(NULL) = 1105566327
time(NULL) = 1105566327
waitpid(-1, NULL, WNOHANG) = -1 ECHILD (No child processes)
select(5, [4], NULL, NULL, NULL) = 1 (in [4])
rt_sigprocmask(SIG_BLOCK, [INT PIPE TERM], [], 8) = 0
read(4, "\'help\' for", 10) = 10
read(4, " help.\nCom", 10) = 10
write(4, "Command \'", 9) = 9
write(4, "Type", 4) = 4
write(4, "\' unknown.\nType \'help\' for help."..., 33) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
utime("/usr/local/nagios/var/storage_modules.status", NULL) = 0
time(NULL) = 1105566327
time(NULL) = 1105566327
waitpid(-1, NULL, WNOHANG) = -1 ECHILD (No child processes)
select(5, [4], NULL, NULL, NULL) = 1 (in [4])
...
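
The strace shows perfparsed reading from and writing to the same descriptor (fd 4), so each "unknown command" reply lands back in its own input. The following is a minimal sketch of that feedback loop in Python, not perfparse code. The function name `run`, the `echo_replies` flag and the `max_lines` cap are hypothetical and exist only for illustration. It assumes that every complete line read is treated as a command, and that the first word of the line is echoed back in the reply, which is what the writes of "Command" and "Type" in the trace suggest.

```python
import os

def run(echo_replies: bool, max_lines: int = 100) -> int:
    """Feed one bogus line into a pipe and count how many lines get handled.

    With echo_replies=True, every unknown line triggers a two-line reply
    written back into the same pipe, so the input never runs dry and the
    count hits max_lines. With echo_replies=False the pipe drains at once.
    """
    r, w = os.pipe()
    os.set_blocking(r, False)          # an empty pipe raises BlockingIOError
    os.write(w, b"bogus\n")            # a single line that is not a known command
    pending = b""
    handled = 0
    try:
        while handled < max_lines:
            try:
                chunk = os.read(r, 10)  # 10-byte reads, as seen in the strace
            except BlockingIOError:
                break                   # nothing left to read
            pending += chunk
            # Frame the input on newlines; keep any partial line in `pending`.
            while b"\n" in pending and handled < max_lines:
                line, pending = pending.split(b"\n", 1)
                handled += 1
                if echo_replies:
                    word = line.split(b" ", 1)[0]
                    os.write(w, b"Command '" + word +
                                b"' unknown.\nType 'help' for help.\n")
    finally:
        os.close(r)
        os.close(w)
    return handled

if __name__ == "__main__":
    print("replies suppressed:", run(echo_replies=False))
    print("replies echoed:    ", run(echo_replies=True))
```

In this sketch each handled line produces two new lines, so the backlog only grows. If that matches what perfparsed does, one possible fix would be to send the "unknown command" reply only when the source is a socket and stay silent for a FIFO. In the trace, the 33-byte tail of the reply often fails with EAGAIN after the first two pieces were written, which may explain fused fragments such as "TypeCo" in the next read.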