Feature request: handle PID file creation race condition
Brought to you by: meskes, paulcrawford
I've found that many daemons have a race condition in how they create their PID files: the PID file is written by the daemon's background process after the foreground process has exited. That means there can be a window after a daemon's initscript has finished, but before its PID file has been created.
Normally this window would be expected to be fairly small, but its length depends on system load.
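To make the window concrete, here is a minimal C sketch of the fork-then-write pattern described above. It is a hypothetical demo, not code from any real daemon: `start_racy_daemon()` forks, the parent returns straight away as an initscript would, and the child writes its PID file only after a simulated start-up delay. The function name, the `startup_ms` parameter and the demo path are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo of the PID file race.
 * Parent ("foreground process"): returns immediately, like an initscript.
 * Child ("background daemon"): does start-up work lasting startup_ms
 * milliseconds, and only then writes its PID file.
 * Returns the child's PID to the parent, or -1 if fork() failed. */
static pid_t start_racy_daemon(const char *pidfile, long startup_ms)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Stand-in for start-up work that takes longer under load. */
        struct timespec work = { startup_ms / 1000, (startup_ms % 1000) * 1000000L };
        nanosleep(&work, NULL);

        FILE *f = fopen(pidfile, "w");
        if (f != NULL) {
            fprintf(f, "%ld\n", (long)getpid());
            fclose(f);
        }
        _exit(0); /* demo only: a real daemon would keep running here */
    }
    return pid; /* parent: the "initscript" is already finished */
}
```

Calling `start_racy_daemon("/tmp/racy_demo.pid", 300)` and then checking `access("/tmp/racy_demo.pid", F_OK)` straight away will normally find no file, even though the parent has already returned; after `waitpid()` on the child, the file holds the child's PID. On a busy system, the real-world equivalent of `startup_ms` simply grows.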
It would be good if watchdog could allow for this situation by having a parameter for the maximum time to wait for PID files to become "valid" at start-up, rather than immediately restarting just because a daemon hasn't yet written its PID file.
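One way the requested parameter could behave is sketched below in C, under loose assumptions: a PID file counts as "valid" if it contains a positive PID of a process that exists (checked with `kill(pid, 0)`, which sends no signal), and `wait_for_pidfile()` polls every 100 ms until that holds or `max_wait_s` seconds pass. The helper names, the polling interval and the validity rule are hypothetical and are not taken from watchdog's source.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read a PID from the file.
 * Returns -1 if the file is missing, empty or malformed. */
static pid_t read_pidfile(const char *path)
{
    long pid = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1 || pid <= 0)
        pid = -1;
    fclose(f);
    return (pid_t)pid;
}

/* Hypothetical validity rule: the file names a process that exists.
 * kill(pid, 0) performs only the existence/permission check;
 * EPERM means the process exists but we may not signal it. */
static int pidfile_valid(const char *path)
{
    pid_t pid = read_pidfile(path);
    if (pid <= 0)
        return 0;
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Poll until the PID file is valid, or until max_wait_s whole seconds
 * have elapsed on the monotonic clock. Returns 1 if valid, 0 on timeout. */
static int wait_for_pidfile(const char *path, unsigned max_wait_s)
{
    const struct timespec poll_interval = { 0, 100L * 1000 * 1000 }; /* 100 ms */
    struct timespec start, now;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (pidfile_valid(path))
            return 1;
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (now.tv_sec - start.tv_sec >= (time_t)max_wait_s)
            return 0;
        nanosleep(&poll_interval, NULL);
    }
}
```

At start-up, a monitor built this way would call something like `wait_for_pidfile("/var/run/example.pid", 30)` (a hypothetical path and limit) once per configured PID file and treat only a `0` return as a failure; after that, the ordinary periodic check would apply.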
Hi Craig, this sort of feature is on the "to do" list.
Currently there is the softboot flag, which modifies how errors are handled, but it's a basic approach with no sense of time-scale. What I have implemented elsewhere is an error time-out, so you can configure a maximum time for which (most) error conditions may persist before the watchdog takes action. My original reason was tested files that go missing briefly during log rotation, but it ought to deal with your PID race problem as well. Some errors are not held back, such as failure of the watchdog daemon to fork or running out of file handles, as the system is pretty much dead by that point anyway.
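The error time-out idea can be sketched as a small state machine per check. The C below is a hypothetical illustration, not the watchdog's actual code: `error_timer_update()` remembers when a run of failures began, forgives failures that clear within `timeout_s` seconds, and lets errors flagged `fatal` (standing in for cases such as a failed fork) bypass the timer. All names here are assumptions.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-check state for an error time-out. */
struct error_timer {
    bool   failing;    /* is this check currently in a run of failures? */
    time_t first_fail; /* when the current run of failures began */
};

enum action { ACT_NONE, ACT_WAIT, ACT_REBOOT };

/* Feed in the result of one check made at time `now`.
 * ok:    the check passed, so any pending failure is forgotten.
 * fatal: the error is never held back (e.g. the daemon cannot fork).
 * Otherwise the failure is tolerated until it has lasted timeout_s
 * seconds or more, at which point ACT_REBOOT is returned. */
static enum action error_timer_update(struct error_timer *t, bool ok, bool fatal,
                                      time_t now, time_t timeout_s)
{
    if (ok) {
        t->failing = false; /* condition cleared: reset the timer */
        return ACT_NONE;
    }
    if (fatal)
        return ACT_REBOOT;
    if (!t->failing) {      /* first failure in this run: start timing */
        t->failing = true;
        t->first_fail = now;
    }
    return (now - t->first_fail >= timeout_s) ? ACT_REBOOT : ACT_WAIT;
}
```

With `timeout_s = 60`, a log file that vanishes for a second during rotation produces one `ACT_WAIT` and then `ACT_NONE` once it reappears, while a PID file that never becomes valid yields `ACT_REBOOT` once the failure has lasted 60 seconds.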
Regards, Paul
Last edit: Paul Crawford 2016-01-13
Hi Craig, please try the latest Git code to see if the retry timer works for you.
It defaults to 60 seconds before proceeding to reboot, but if you want a more rapid response to real faults you might reduce that to 5-10 seconds (provided the PID race window is shorter than that, of course!)
Regards, Paul
The new behaviour should have fixed this, and there has been no feedback to say otherwise, so closing.