Thread: [Shinken-devel] Old Notification being sent every minute
From: Michael G. <m.g...@in...> - 2012-09-20 13:54:39
Hi,

I receive a (RECOVERY) notification from Shinken about every minute, but the notification event happened two days ago (and has not happened since). Even the date and time in the notification are the same.

Around the same time, every minute, I get the following message in the reactionner debug log:

    Traceback (most recent call last):
      File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
        self.run()
      File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/shinken/shinken/worker.py", line 238, in work
        self.launch_new_checks()
      File "/usr/local/shinken/shinken/worker.py", line 156, in launch_new_checks
        r = chk.execute()
      File "/usr/local/shinken/shinken/action.py", line 115, in execute
        return self.execute__()  ## OS specific part
      File "/usr/local/shinken/shinken/action.py", line 284, in execute__
        preexec_fn=os.setsid)
      File "/usr/lib/python2.6/subprocess.py", line 623, in __init__
        errread, errwrite)
      File "/usr/lib/python2.6/subprocess.py", line 1141, in _execute_child
        raise child_exception
    TypeError: execve() arg 2 must contain only strings
    ...ued:0 TotalReturnWait:0)
    [1348147321] Warning : [reactionner-1] The worker 18 goes down unexpectly!
    [1348147321] Info : [reactionner-1] Allocating new fork Worker: 19

I run Shinken from Git:
commit: dd9694985da348517c7dbabaf72441cff8445aa8
from: Tue Sep 18 22:55:50 2012 -0300

Any ideas?

Thanks,
Michael
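A minimal sketch, assuming the crash comes from the argument list handed to subprocess: on POSIX, execve() needs every argument as a string (a byte string under Python 2), so an element such as None, or possibly a unicode argument that cannot be encoded under the process locale, is one plausible trigger. The thread does not confirm which. The helper `find_bad_args` is hypothetical (not part of Shinken) and is written for Python 3; it lists the offending positions instead of letting the worker die.

```python
def find_bad_args(args, encoding="utf-8"):
    """Return (index, reason) pairs for argv elements execve() would reject.

    Hypothetical helper, not Shinken code: bytes pass as-is, text must be
    encodable with `encoding`, and anything else (None, int, ...) is flagged.
    """
    problems = []
    for i, arg in enumerate(args):
        if isinstance(arg, bytes):
            continue  # already a raw byte string
        if not isinstance(arg, str):
            problems.append((i, "not a string: %s" % type(arg).__name__))
            continue
        try:
            arg.encode(encoding)  # would this survive conversion to bytes?
        except UnicodeEncodeError:
            problems.append((i, "not encodable as %s" % encoding))
    return problems
```

For example, `find_bad_args(["/usr/bin/printf", "%s", None])` returns `[(2, "not a string: NoneType")]`. Calling it on the argument list just before the `Popen` call and logging any non-empty result would point at the argument that breaks the worker.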
From: nap <nap...@gm...> - 2012-09-20 14:25:53
On Thu, Sep 20, 2012 at 3:28 PM, Michael Grundmann <m.g...@in...> wrote:
> Hi
>
> I receive a (RECOVERY) notification from shinken about every minute.
> But the Notification event was two days ago (and has not happend since).
> Even the Date and Time in the notification is the same.
>
> Around the same time every minute i get the following message in the reactionner debug log:
> [...]
> I run Shinken from GIT
> commit: dd9694985da348517c7dbabaf72441cff8445aa8
> from: Tue Sep 18 22:55:50 2012 -0300
>
> any Ideas?

Hi,

What are your locale settings on this server?
Do you have the scheduler.log entry for this notification?

Thanks,
Jean
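The locale a daemon inherits from an init script can differ from what an interactive shell reports. One way to check is to print what a Python process actually sees when it is started the same way as the Shinken daemons (an assumption: same user, same init environment). The sketch uses only the standard `os`, `locale` and `sys` modules; `encoding_report` is a hypothetical helper name.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the locale/encoding settings this Python process sees."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        # Encoding derived from the current locale, without calling setlocale().
        "preferred": locale.getpreferredencoding(False),
        # Encoding Python uses when converting file names and paths to bytes.
        "filesystem": sys.getfilesystemencoding(),
    }


for key, value in encoding_report().items():
    print("%-10s %s" % (key, value))
```

If `preferred` or `filesystem` comes back as an ASCII encoding even though the shell says en_US.UTF-8, the daemon is running under a different locale than expected.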
From: Michael G. <m.g...@in...> - 2012-09-20 14:40:03
> What is yours locales settings on this server?

en_US.UTF-8

> Do you got the scheduler.log entry for this notification?

This message is from schedulerd.log, and it also appears every minute:

    2012-09-20 16:35:27,590 [1348151727] Warning : 12 actions never came back for the satellite 'reactionner-1'. I'm reenable them for polling
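One possible reading of the repeating notification, not confirmed anywhere in this thread: the scheduler hands actions to the reactionner, a worker dies before the results are reported, and after a timeout the scheduler re-enables the "lost" actions, so any action that did run gets run again on every cycle. The toy model below sketches that loop. The `Action` class, its status strings, the "bad" action and the 60-second timeout are all invented for illustration and do not mirror Shinken's internals.

```python
class Action(object):
    """Toy action; the status names are hypothetical, not Shinken's."""

    def __init__(self, name):
        self.name = name
        self.status = "scheduled"
        self.sent_at = 0.0


def hand_out(actions, now):
    """Scheduler side: give every scheduled action to the satellite."""
    batch = [a for a in actions if a.status == "scheduled"]
    for a in batch:
        a.status, a.sent_at = "inpoller", now
    return batch


def reenable_lost(actions, now, timeout):
    """Scheduler side: re-queue actions whose results never came back."""
    lost = [a for a in actions
            if a.status == "inpoller" and now - a.sent_at > timeout]
    for a in lost:
        a.status = "scheduled"
    return lost


def worker_run(batch):
    """Satellite side: run actions in order; a 'bad' one kills the worker
    before any result is reported (the assumption behind this model)."""
    executed = []
    for a in batch:
        if a.name == "bad":
            return executed  # crash: the scheduler hears nothing back
        executed.append(a.name)  # e.g. the mail has already gone out
    for a in batch:
        a.status = "done"  # results reported, nothing left to re-enable
    return executed


def count_sends(names, cycles, timeout=60):
    """Simulate `cycles` scheduler loops; count how often 'recovery-mail' runs."""
    actions = [Action(n) for n in names]
    now, sends = 0.0, 0
    for _ in range(cycles):
        sends += worker_run(hand_out(actions, now)).count("recovery-mail")
        now += timeout + 1
        reenable_lost(actions, now, timeout)
    return sends
```

In this model `count_sends(["recovery-mail", "bad"], 3)` returns 3, while `count_sends(["recovery-mail"], 3)` returns 1: the repeat comes entirely from the crash plus the re-enable step.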
From: nap <nap...@gm...> - 2012-09-28 11:21:41
On Thu, Sep 20, 2012 at 4:39 PM, Michael Grundmann <m.g...@in...> wrote:
>> What is yours locales settings on this server?
>
> en_US.UTF-8
>
>> Do you got the scheduler.log entry for this notification?
>
> this message is from the schedulerd.log
> it also appears every minute
>
> 2012-09-20 16:35:27,590 [1348151727] Warning : 12 actions never came back
> for the satellite 'reactionner-1'. I'm reenable them for polling

Can you look at the reactionner log? If there is nothing there, try launching it in debug mode.

Jean
From: Michael G. <m.g...@in...> - 2012-10-02 07:36:19
> Can you look at the reactionner log? If there is nothing, try to
> launch it in debug mode.

Unfortunately, I cannot reproduce this error anymore. After an (unrelated) downtime of the monitoring host, the problem did not appear again. I will, however, let the reactionner run in debug mode for a while.

Michael
From: Michael G. <m.g...@in...> - 2012-10-18 09:39:41
Another configuration error had prevented notifications from being sent at all. After I fixed it, the duplicate notifications began to show up again.

From reactionner-debug.log:

    [1350553044] Debug : ========================
    [1350553044] Debug : [0][scheduler-1][fork] Stats: Workers:10 (Queued:0 TotalReturnWait:0)
    [1350553044] Debug : [0][scheduler-1][fork] Stats: Workers:11 (Queued:0 TotalReturnWait:0)
    [1350553044] Debug : Wait ratio: 1.012525
    [1350553044] Debug : Ask actions to 0, got 0
    [1350553044] Debug : Loop turn
    [1350553045] Debug : ========================
    [1350553045] Debug : [0][scheduler-1][fork] Stats: Workers:10 (Queued:0 TotalReturnWait:0)
    [1350553045] Debug : [0][scheduler-1][fork] Stats: Workers:11 (Queued:0 TotalReturnWait:0)
    [1350553045] Debug : Wait ratio: 1.012315
    [1350553045] Debug : Ask actions to 0, got 93
    [1350553045] Debug : Loop turn
    [1350553046] Debug : ========================
    [1350553046] Warning : [reactionner-1] The worker 11 goes down unexpectly!
    [1350553046] Debug : [0][scheduler-1][fork] Stats: Workers:10 (Queued:47 TotalReturnWait:0)
    [1350553046] Debug : I decide to up wait ratio
    [1350553046] Debug : Wait ratio: 1.030353
    [1350553046] Info : [reactionner-1] Allocating new fork Worker: 12
    Process Process-12:
    Traceback (most recent call last):
      File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
        self.run()
      File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/shinken/shinken/worker.py", line 238, in work
        self.launch_new_checks()
      File "/usr/local/shinken/shinken/worker.py", line 156, in launch_new_checks
        r = chk.execute()
      File "/usr/local/shinken/shinken/action.py", line 115, in execute
        return self.execute__()  ## OS specific part
      File "/usr/local/shinken/shinken/action.py", line 284, in execute__
        preexec_fn=os.setsid)
      File "/usr/lib/python2.6/subprocess.py", line 623, in __init__
        errread, errwrite)
      File "/usr/lib/python2.6/subprocess.py", line 1141, in _execute_child
        raise child_exception
    TypeError: execve() arg 2 must contain only strings

Any more ideas?

Michael

>>> On 02.10.2012 at 09:36, in message <506...@gr...>, "Michael Grundmann" <m.g...@in...> wrote:
> > Can you look at the reactionner log? If there is nothing, try to
> > launch it in debug mode.
> >
>
> Unfortunately i cannot reproduce this error anymore.
> After an (unrelated) downtime of the monitoring host the problem did not
> appear again.
> I will however let the reactionner run in debug mode for a while.
>
> Michael
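To check whether each crash follows a non-empty batch of actions, as it does in the debug excerpt in this message, a small scanner over reactionner-debug.log can pull out just those two kinds of lines. This is a sketch: the regular expressions are copied from the log lines quoted in this thread (including the log's own spelling "unexpectly"), other Shinken versions may word them differently, and `crash_events` is a hypothetical helper name.

```python
import re

# Patterns copied from the reactionner debug log lines quoted in this thread.
WORKER_DOWN = re.compile(
    r"^\[(\d+)\] Warning : \[([^\]]+)\] The worker (\d+) goes down unexpectly!")
GOT_ACTIONS = re.compile(r"^\[(\d+)\] Debug : Ask actions to \d+, got (\d+)")


def crash_events(lines):
    """Return (timestamp, event, value) tuples for non-empty action fetches
    and worker crashes, in log order."""
    events = []
    for line in lines:
        m = GOT_ACTIONS.match(line)
        if m and int(m.group(2)) > 0:
            events.append((int(m.group(1)), "fetched", int(m.group(2))))
            continue
        m = WORKER_DOWN.match(line)
        if m:
            events.append((int(m.group(1)), "worker_down", int(m.group(3))))
    return events
```

Passing an open file object for your reactionner-debug.log works as well as a list of strings, since the function only iterates over lines. On the excerpt above it yields `(1350553045, "fetched", 93)` followed by `(1350553046, "worker_down", 11)`.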
From: Michael G. <m.g...@in...> - 2012-10-23 11:24:54
Should I open a bug report for this issue?
From: Francois M. <fm...@ac...> - 2012-10-27 18:25:53
Hello Michael,

Yes, you should open a ticket and link back to this thread.

I would certainly like to help you, but I am really not familiar with retention and how notifications are handled.

Your best bet is to find a way to dump the in-memory objects to see whether the data is corrupted. After that, it is a matter of going through the logic of the scheduler and reactionner to see how notifications are handled and to find the logic flaw or bug.

You can add debug messages. Try using a level like warning so that you can concentrate on what you added, and switch to debug when needed.

Cheers,
Francois

On 12-10-23 7:24 AM, Michael Grundmann wrote:
> should i open a bugreport for this issue?
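Francois's two suggestions (dump in-memory objects, log temporary messages at warning level) can be combined as in the sketch below. It uses Python's standard `logging` and `pprint` modules rather than Shinken's own logger, and the `trace_state` helper and the logger name are hypothetical.

```python
import logging
import pprint

# Show WARNING and above only, so the temporary trace messages stand out.
logging.basicConfig(level=logging.WARNING,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("notification-trace")  # hypothetical logger name


def trace_state(label, obj):
    """Log a readable dump of obj's attributes at WARNING level and return it."""
    try:
        attrs = vars(obj)
    except TypeError:  # objects that use __slots__ have no __dict__
        attrs = {"repr": repr(obj)}
    state = pprint.pformat(attrs, width=100)
    log.warning("%s (%s):\n%s", label, type(obj).__name__, state)
    return state
```

Calling something like `trace_state("before execute", action)` at the points where a notification is created, handed to the reactionner and returned would show whether its attributes change or get corrupted along the way. Once the flow is understood, switching `log.warning` to `log.debug` hides the output again.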
From: Michael G. <m.g...@in...> - 2012-10-24 13:24:07
To prevent the notifications from flooding my inbox, I set the notify-by commands to /bin/true. I expected this to disable all notifications. It did not (I restarted all Shinken services).

Is there a way to flush these old notifications out of the reactionner?

Thanks,
Michael
From: Francois M. <fm...@ac...> - 2012-11-06 01:50:42
Hi Michael,

Try uncommenting lines 271 and 272 in shinken/action.py. The output will help in tracking down what is wrong.

In your debug log, worker 11 crashed with no traceback, and worker 12 crashed with the traceback.

I am just moving this along: I opened an issue in git, and it can be tracked from there.

Cheers,
X

On 12-10-24 9:23 AM, Michael Grundmann wrote:
> To prevent the notifications from flooding my inbox i set the notify-by commands to /bin/true.
> I expected that this should disable all notifications.
> It did not (i restarted all shinken services).
> Is there a way to flush out these old notifications from the reactionner?
>
> thanks
>
> Michael