#8 devmon goes purple

open
nobody
Devmon (2)
5
2009-11-13
2009-11-13
Anonymous
No

Regarding devmon version 0.3.1-beta1
- Devmon stops working at irregular intervals.
- The dm column in Xymon "goes purple" for lack of new information.
- The devmon processes remain running.
- Running "sudo killall devmon" (on Ubuntu 8.10) is enough to return to a "green" state.
- There are no errors or new entries in the devmon.log file after communication stops.
- Sending a SIGTERM to the devmon master process stops all the other processes, so it appears to respond to signals as it should.

Other users report working around the problem either with a cron job that kills the processes, or by having Xymon run a SCRIPT on a "purple" alert to restart the service.
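A minimal sketch of the cron workaround is shown below. It rests on an assumption that is not confirmed in this ticket: that devmon writes to its log at least once per poll cycle, so a log file that has not changed for longer than a chosen threshold means devmon has hung (the report notes that nothing new reaches devmon.log once it stops). The log path, the 600-second threshold, the start command and the "--run" argument are all hypothetical; adjust them for your installation.

```bash
#!/bin/bash
# Hypothetical cron watchdog for a hung devmon (an illustration, not part of devmon).
# ASSUMPTIONS: devmon writes to LOG at least once per poll cycle, and
# START_CMD starts devmon on this system. Adjust both before use.

LOG=${DEVMON_LOG:-/var/log/devmon.log}               # assumed log location
START_CMD=${DEVMON_START:-/usr/local/devmon/devmon}  # assumed start command
MAX_AGE=600                                          # seconds without a log write

# is_stale FILE MAX_AGE: succeed if FILE is missing or was last modified
# more than MAX_AGE seconds ago.
is_stale() {
    local file=$1 max_age=$2 now mtime
    [ -e "$file" ] || return 0
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")    # GNU stat: modification time in epoch seconds
    [ $((now - mtime)) -gt "$max_age" ]
}

# Same recovery as the manual workaround: kill every devmon process, then restart.
restart_devmon() {
    killall devmon
    sleep 5
    $START_CMD
}

# Act only when called with --run (e.g. from cron), so the functions can be
# sourced and tested without side effects.
if [ "${1-}" = "--run" ] && is_stale "$LOG" "$MAX_AGE"; then
    restart_devmon
fi
```

Installed as, say, /usr/local/sbin/devmon-watchdog (a hypothetical path), it could be run every five minutes from root's crontab with "*/5 * * * * /usr/local/sbin/devmon-watchdog --run". Checking for staleness first, rather than killing devmon unconditionally, avoids restarting a healthy instance.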

(comments compiled from thorsten.erdmann and gthomas on hobbit/xymon mailing list)

Discussion

  • Buchan Milne

    Buchan Milne - 2010-03-10

    I have now committed some changes I have been working on that should, hopefully:

    1) Log more information in case of any socket communication errors
    2) Provide information on fork behaviour in the dm test for the polling host
    3) Terminate forks that seem to have stalled (detected by sending data to idle forks to check that they are alive), as a workaround that should remove the need for scripts that restart devmon
    4) Add timeouts to all socket communication (using "alarm" and "eval"), hopefully fixing the original problem
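    Items 3 and 4 together amount to a round-trip health check with a deadline: write something to an idle child, wait a bounded time for the reply, and kill the child if none arrives. Devmon does this in Perl with "alarm" and "eval"; the bash sketch below only illustrates the same idea, using a coprocess for the child and "read -t" for the deadline. The PING/PONG exchange and all function names are hypothetical and are not devmon's actual fork protocol.

```bash
#!/bin/bash
# Illustration only, not devmon code. Requires bash 4+ for coproc.

# Hypothetical healthy worker: answers every "PING" line with "PONG".
healthy_worker() {
    while read -r line; do
        [ "$line" = PING ] && echo PONG
    done
}

# Hypothetical stalled worker: keeps reading input but never answers.
stalled_worker() {
    while read -r line; do
        :
    done
}

# probe_worker LIMIT: ping the current coprocess WORKER and wait at most
# LIMIT seconds for "PONG". If no valid reply arrives, kill the worker
# (so a fresh one can be started) and return 1.
probe_worker() {
    local limit=$1 pid=$WORKER_PID reply
    echo PING >&"${WORKER[1]}"
    if read -r -t "$limit" reply <&"${WORKER[0]}" && [ "$reply" = PONG ]; then
        return 0
    fi
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    return 1
}
```

    For example, "coproc WORKER { healthy_worker; }" followed by "probe_worker 1" succeeds, while the same probe against "stalled_worker" times out after one second and terminates it. The timeout sits on the reader side, so a child that is alive but stuck is treated the same as one that has died.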

    If you can reproduce this issue with any regularity, I would appreciate it if you could run the version currently in Subversion (rev 180 or later), preferably in debug mode with high (5) verbosity, e.g.:

    ./devmon --debug -vvvvv -d /var/lib/devmon/hosts.db -c /etc/devmon.cfg

    If the problem persists, watch whether the dm test for the polling device goes yellow or red, or whether polled devices go purple, and provide the log contents for at least the last two poll cycles before any of these changes (or the log entries showing the relevant problems).

     
  • Buchan Milne

    Buchan Milne - 2011-01-25

    I have now been able to test the current svn version in an environment that experiences this problem. The changes in svn did not fix it.

    In this case, it seems that devmon hung while sending the actual status messages to Hobbit.

    I'm working on adding timeouts here as well.
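    For scripts that push status messages to Hobbit/Xymon themselves, a comparable guard can be had at the shell level from GNU coreutils "timeout", which kills a command that runs past its limit and exits with status 124. The wrapper below is a hypothetical sketch; devmon's own fix belongs in its Perl socket code, not in a wrapper like this, and the 30-second limit in the usage note is an arbitrary assumption.

```bash
#!/bin/bash
# Hypothetical wrapper: give up on a status send that takes too long.
# Requires GNU coreutils `timeout`, which exits with 124 when the limit is hit.

# send_status LIMIT CMD [ARGS...]: run CMD with a LIMIT-second deadline and
# report a timeout on stderr. Returns CMD's exit status (124 on timeout).
send_status() {
    local limit=$1 rc
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "status send timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

    With the Xymon client utility installed, a call could look like: send_status 30 xymon 127.0.0.1 "status myhost.dm green OK". A hung send then costs at most 30 seconds instead of stalling the caller indefinitely.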

     
