[Nagios-users] nagios freeze while a long time

Hi,

We run Nagios 2.9 on several servers, with NDO 1.4b5 and perf2rrd (Nagios
writes performance data to a pipe file and perf2rrd processes it into RRD
files). The servers run RHEL4 with packages from dag.wieers.com.

We have 80 hosts and 420 services on this server.

We see some huge gaps in our graphs. perf2rrd works fine; my first
investigation turned up these messages in the nagios.log file:
[1193178252] ndomod: Error writing to data sink!  Some output may get lost...
[1193178268] ndomod: Successfully reconnected to data sink!  0 items lost, 240 queued items to flush.
[1193178269] ndomod: Successfully flushed 240 queued items to data sink.
[1193187298] Warning: A system time change of 8729 seconds (forwards in time) has been detected.  Compensating...
[1193190553] Warning: A system time change of 3255 seconds (forwards in time) has been detected.  Compensating...
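
The bracketed numbers in nagios.log are Unix epoch timestamps, so they are
hard to compare with the graphs by eye. A minimal sketch for converting them
to readable UTC times follows; the function name parse_nagios_line and the
regular expression are illustrative assumptions, not part of Nagios.

```python
import re
from datetime import datetime, timezone

# Assumed line shape: "[<epoch seconds>] <message>", as in the excerpt above.
LOG_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")

def parse_nagios_line(line):
    """Return (UTC datetime, message) for one nagios.log line, or None."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return ts, m.group(2)

sample = [
    "[1193178252] ndomod: Error writing to data sink!  Some output may get lost...",
    "[1193187298] Warning: A system time change of 8729 seconds (forwards in time) has been detected.  Compensating...",
    "[1193190553] Warning: A system time change of 3255 seconds (forwards in time) has been detected.  Compensating...",
]

for line in sample:
    ts, msg = parse_nagios_line(line)
    print(ts.strftime("%Y-%m-%d %H:%M:%S UTC"), "|", msg[:45])
```

The times are printed in UTC; add the server's own offset to line them up
with the local times shown in the debug output.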

We have recompiled Nagios with debug options:
--enable-DEBUG2 shows warning messages
--enable-DEBUG3 shows scheduled events

We don't use DEBUG0 because it generates too much information and the
log file grows too fast.

So, I found these messages in the debug output around the last gap:

*** Event Check Loop ***
        Current time: Wed Oct 24 00:29:29 2007
        Next High Priority Event Time: Wed Oct 24 00:29:30 2007
        Next Low Priority Event Time:  Wed Oct 24 00:29:29 2007
Current/Max Outstanding Service Checks: 19/65
*** Event Details ***
        Event time: Wed Oct 24 00:29:29 2007
        Event type: 0 (service check)
                Service Description: LOAD_AVERAGE@LOADAVERAGE
                Associated Host:     SGBD1
        Checking service 'LOAD_AVERAGE@LOADAVERAGE' on host 'SGBD1'...

*** Event Check Loop ***
        Current time: Wed Oct 24 00:29:29 2007
        Next High Priority Event Time: Wed Oct 24 00:29:30 2007
        Next Low Priority Event Time:  Wed Oct 24 00:29:29 2007
Current/Max Outstanding Service Checks: 20/65
*** Event Details ***
        Event time: Wed Oct 24 00:29:29 2007
        Event type: 0 (service check)
                Service Description: LOAD_AVERAGE@LOADAVERAGE
                Associated Host:     INTEG
        Checking service 'LOAD_AVERAGE@LOADAVERAGE' on host 'INTEG'...
Warning: A system time change of 8729 seconds (forwards in time) has been
detected.  Compensating...

*** Event Check Loop ***
        Current time: Wed Oct 24 02:54:58 2007
        Next High Priority Event Time: Wed Oct 24 02:54:59 2007
        Next Low Priority Event Time:  Wed Oct 24 02:54:58 2007
Current/Max Outstanding Service Checks: 21/65
*** Event Details ***
        Event time: Wed Oct 24 02:54:58 2007
        Event type: 0 (service check)
                Service Description: MONITOR_TELNET_SUIVI_PS
                Associated Host:     PREPROD1
        Checking service 'MONITOR_TELNET_SUIVI_PS' on host 'PREPROD1'...
Warning: A system time change of 3255 seconds (forwards in time) has been
detected.  Compensating...

*** Event Check Loop ***
        Current time: Wed Oct 24 03:49:13 2007
        Next High Priority Event Time: Wed Oct 24 03:49:14 2007
        Next Low Priority Event Time:  Wed Oct 24 03:49:13 2007
Current/Max Outstanding Service Checks: 22/65
*** Event Details ***
        Event time: Wed Oct 24 03:49:13 2007
        Event type: 0 (service check)
                Service Description: MONITOR_TELNET_SUIVI_PS
                Associated Host:     BIDS15
        Checking service 'MONITOR_TELNET_SUIVI_PS' on host 'BIDS15'...

We can see the jumps
    00:29:29 to 02:54:58
and 02:54:58 to 03:49:13

with no activity in Nagios in between! I don't understand this!
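
As a cross-check, the two jumps can be compared with the "system time change"
warnings. The sketch below takes the "Current time" values from the debug
output above and computes the difference in seconds; the helper name
gap_seconds is only illustrative.

```python
from datetime import datetime

# Format of the "Current time:" lines in the debug output.
FMT = "%a %b %d %H:%M:%S %Y"

def gap_seconds(before, after):
    """Seconds elapsed between two 'Current time' strings."""
    t0 = datetime.strptime(before, FMT)
    t1 = datetime.strptime(after, FMT)
    return int((t1 - t0).total_seconds())

first = gap_seconds("Wed Oct 24 00:29:29 2007", "Wed Oct 24 02:54:58 2007")
second = gap_seconds("Wed Oct 24 02:54:58 2007", "Wed Oct 24 03:49:13 2007")

print(first)   # 8729
print(second)  # 3255
```

Both values match the 8729 and 3255 seconds in the warnings exactly, so each
reported "time change" is the same length as a period in which the event loop
logged nothing.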

Can you give me some help to make this Nagios server more stable? I
don't know how to reproduce this bug. At the time a gap was occurring, the
server time was correct.

On this server we get more than one gap per day!

Best regards,
Olivier
