From: Buchan M. <bg...@st...> - 2009-01-05 14:42:16
On Monday 29 December 2008 13:46:30 Buchan Milne wrote:
> Re-sending my mail from 2008-12-08 that didn't make it to the list ....
>
> On Monday 13 October 2008 14:35:32 Thomas Kähn wrote:
> > Hi Buchan,
> >
> > On Fri, Oct 10, 2008 at 11:13:58AM +0200, Buchan Milne wrote:
> > > On Thursday 09 October 2008 22:02:04 Stewart, Tom L. wrote:
> > > > I used to have issues running devmon on a memory constrained Solaris
> > > > box. I have now moved it to a RH ES 5.1 system. Since then I have not
> > > > had an issue for 62 days straight (knock on wood) and the RH system
> > > > has plenty of memory.
> > >
> > > I have 3 boxes running devmon in production. 2 are RHEL4, one is
> > > RHEL5(.2). The RHEL5 box hasn't seen devmon purple in 68 days, one of
> > > the RHEL4 boxes has gone purple 7 times since Feb (one stretch of 144
> > > days of green), the other has gone purple 9 times since Feb (one
> > > stretch of 88 days of green).
> > >
> > > Now that I do actually have some time to spend on devmon I will try and
> > > track it down (but we really needed the weathermap feature, and we have
> > > no other use for cacti, all the trending we had on it was much easier
> > > to implement on devmon).
> >
> > this sounds good. As I said before - if you need someone to test
> > patches please contact me.
>
> I've now made devmon's --debug mode continue to send status messages to the
> BB/Hobbit server, so it is now possible to run devmon in debug mode (e.g.
> under screen or similar), until it hits the problem. The last few lines of
> output when devmon stops reporting may be useful.
>
> We're still only seeing this about once a week (on RHEL4 - no occurrences
> in 127 days on RHEL5), so if others are seeing it more often, please try
> this patch, and then run devmon in a screen session with '-vvvvv --debug'
>
> http://devmon.svn.sourceforge.net/viewvc/devmon/trunk/modules/dm_msg.pm?view=patch&r1=93&r2=92&pathrev=93

I've been running with this patch on the two affected RHEL4 boxes, and I
have noticed that devmon was killed at some stage by a logrotate script
that was calling '/etc/init.d/devmon restart'. On the box running RHEL5,
the script was calling '/etc/init.d/devmon condrestart', which is not a
supported argument (which would explain why that box was not affected, if
this was indeed the problem).

In the meantime, I've made devmon close and re-open the log file on HUP, so
log rotation should work better without having to try and restart devmon.

I'm also improving the init script (for Red Hat-style systems) to stop
devmon more reliably (in the case where the pid file is not in /var/run
but in a subdirectory, which is necessary when devmon is running as
non-root).

Effectively, I think this is a bug affecting anyone using my devmon
packages for Mandriva or RHEL/CentOS ... which should be fixed soon.

I am not sure if this logrotate script is responsible for all the
purpleness on the two RHEL4 boxes I was testing on, so I will continue
testing for a few days while I look at the last few issues for 0.3.1.

However, if anyone who is affected a lot by this issue can check whether
this might be an explanation for their problems, that would be great.

Regards,
Buchan
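
For anyone unfamiliar with the "close and re-open the log file on HUP" approach mentioned above, here is a minimal sketch of the general technique in Python. It is not devmon's actual code (devmon is written in Perl); the `ReopenableLog` class, the `demo` function and the file names are hypothetical, and it assumes a POSIX system where SIGHUP is available. The rename stands in for what a log rotation tool would do before signalling the daemon.

```python
import os
import signal
import tempfile


class ReopenableLog:
    """Append-only log file that can be closed and re-opened on demand."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a")

    def write(self, message):
        self._fh.write(message + "\n")
        self._fh.flush()

    def reopen(self, signum=None, frame=None):
        # The (signum, frame) parameters match what signal.signal()
        # passes to a handler, so this method can be installed directly.
        self._fh.close()
        self._fh = open(self.path, "a")

    def close(self):
        self._fh.close()


def demo():
    """Simulate one rotation and return (rotated, current) file contents."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "example.log")  # hypothetical log name

    log = ReopenableLog(path)
    signal.signal(signal.SIGHUP, log.reopen)

    log.write("before rotation")

    # Stand-in for the rotation step: the old file is renamed away while
    # the process still holds it open.
    os.rename(path, path + ".1")

    # Stand-in for the rotation script sending HUP instead of restarting.
    os.kill(os.getpid(), signal.SIGHUP)

    log.write("after rotation")
    log.close()

    with open(path + ".1") as fh:
        rotated = fh.read()
    with open(path) as fh:
        current = fh.read()
    return rotated, current


if __name__ == "__main__":
    rotated, current = demo()
    print("rotated file:", rotated.strip())
    print("current file:", current.strip())
```

Without the re-open step, the process would keep writing to the renamed file, which is why rotation scripts often fall back to a full restart. Handling HUP avoids that restart and the risk of the daemon not coming back up.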