From: Richard D. <de...@ya...> - 2005-07-19 13:22:11
I have been running Nagios 1.2 on an older host and decided to move the existing installation to a new, more powerful server (load was 5+ on the older system). The other changes were:

1. RHES3 => RHES4
2. Nagios plugins 1.3.x to 1.4
3. Enabled MySQL 4.1.x

The initial migration of Nagios 1.2 went as planned, and everything in Nagios worked as expected. I then added perfparse, using the older default cron job method (perfparse.sh) to get started and learn how it worked. Initially everything was working and MySQL was being updated, so after a few days of QA I put the new system into full production.

Everything started out okay, but the next morning I found that Nagios itself was hung. I restarted Nagios and monitored it, and it hung again after several hours. The time before a hang varied from a few hours to over 12, but whenever Nagios hung, all of its processes were stuck in a write state (checked with strace). I needed killall to remove the hung Nagios processes.

I repeated this a few more times to verify, then backed out perfparse: I commented out its section in nagios.cfg, disabled performance data by setting process_performance_data=0, and commented out the cron job, which had been running every 10 minutes. I rebooted the system to be sure, and Nagios has been up and processing (load now 0.5) without any hangs since.

The question is: has anyone had this experience, and what can I do about it? I still want to capture the data with perfparse and will use an alternative method as suggested. Again, any advice on setting up perfparse to avoid this hanging would be appreciated. If you need more data, I can get that too.

Thanks,
Richard