#6 File stat functionality is unresistant against system clock changes

Group: Unstable (example)
Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2019-03-01
Created: 2013-06-07
Private: No

When the system clock is adjusted forward, the watchdog will trigger because the timestamp of the observed file may still lie in the old time, and the file age is calculated only from absolute timestamps.

I have written an extension for the check_file_stat() function which solves this problem by also considering the relative system time, which is immune to clock adjustments. Works very well. See attached patch.

Tested under Debian only.

Best greetings!
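
The idea described above can be sketched in C as follows. This is a minimal illustration and not the attached patch: the names `file_watch`, `monotonic_seconds`, `watch_update` and `check_file_age` are hypothetical, and the sketch assumes that "relative system time" means `CLOCK_MONOTONIC`. It measures how long ago the file's mtime was last seen to change, using only the monotonic clock, so a step of the wall clock cannot make the file look old.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical per-file state; not taken from the watchdog sources. */
struct file_watch {
    bool   seen;        /* true once the file has been observed at least once */
    time_t last_mtime;  /* mtime recorded at the last observed change */
    time_t changed_at;  /* monotonic seconds when that change was noticed */
};

/* Seconds on CLOCK_MONOTONIC, which does not jump when the wall clock is set. */
static time_t monotonic_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return (time_t)-1;
    return ts.tv_sec;
}

/*
 * Feed one observation (the file's current mtime and the current monotonic
 * time) into the state. Returns the seconds elapsed since the mtime was last
 * seen to change, measured purely on the monotonic clock.
 */
static time_t watch_update(struct file_watch *w, time_t mtime, time_t now_mono)
{
    if (!w->seen || mtime != w->last_mtime) {
        w->seen = true;
        w->last_mtime = mtime;
        w->changed_at = now_mono;
    }
    return now_mono - w->changed_at;
}

/* Returns 0 if the file is fresh enough, 1 if stale, -1 if it cannot be checked. */
static int check_file_age(struct file_watch *w, const char *path, time_t max_age)
{
    struct stat st;
    time_t now = monotonic_seconds();

    if (now == (time_t)-1 || stat(path, &st) != 0)
        return -1;
    return watch_update(w, st.st_mtime, now) > max_age ? 1 : 0;
}
```

One consequence of this simplified design is that a file is treated as fresh on its first observation, however old its mtime is, because the absolute timestamp is never compared with the wall clock. On older glibc versions `clock_gettime` also needs the program to be linked with `-lrt`.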

Related

Patches: #6

Discussion

  • Christian Hammers

    Patch

     
  • Paul Crawford

    Paul Crawford - 2013-06-07

    Dear Christian,
    That is one way of dealing with the problem, though in my experience if you suffer a sizeable jump in "clock time" (e.g. NTP syncing after a boot with corrupted CMOS clock) you may have other problems where a reboot is needed!

    Another situation that can cause time-related problems is if the watchdog computer is monitoring a file on a NAS, as normally the modification time will be based on the NAS' sense of time. Running NTP on all machines is really the way to go.

    My own experimental "V6.00" version of the watchdog introduced a re-try timer so you get a bit of time to recover. If the file is normally updated quickly, that is enough; if it is the syslog file, the logged message about it being out of date is itself the update it needs.
    Regards, Paul

     
  • Paul Crawford

    Paul Crawford - 2013-06-08

    Just to add - yet another time-stamp problem is on a cold boot where files are very old but the watchdog is up before everything is fully running (for example, if you rely on automatic log-in and a user account to run GUI based software that is to be monitored).

    My approach of a re-try of (by default) 1 minute provides some 'grace period' for this, though you might also be advised to put something in the watchdog start script to check/touch the tested files on cold boot if you don't use some sort of re-try, or use more complex delayed testing by means of a script, etc.
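
A re-try timer of the kind described above could look roughly like the following C sketch. It is not the code from the experimental "V6.00" version: `retry_state`, `retry_expired` and `RETRY_TIMEOUT_DEFAULT` are hypothetical names, and the only detail taken from the discussion is the default of one minute. The caller passes in the current time, so the same logic works with either the wall clock or a monotonic clock.

```c
#include <stdbool.h>
#include <time.h>

#define RETRY_TIMEOUT_DEFAULT 60  /* seconds; the 1-minute default mentioned above */

/* Hypothetical per-object state for one monitored file, PID, etc. */
struct retry_state {
    bool   failing;        /* true while consecutive checks keep failing */
    time_t first_failure;  /* time of the first failure in the current run */
};

/*
 * Record the result of one check. Returns true only when failures have
 * persisted for longer than `timeout` seconds, i.e. when the grace period
 * is used up and the error should be treated as real.
 */
static bool retry_expired(struct retry_state *s, bool check_ok,
                          time_t now, time_t timeout)
{
    if (check_ok) {
        s->failing = false;  /* any success resets the timer */
        return false;
    }
    if (!s->failing) {
        s->failing = true;
        s->first_failure = now;
    }
    return (now - s->first_failure) > timeout;
}
```

A file that is stale at cold boot therefore gets up to `timeout` seconds to be updated before the failure counts, and a single successful check starts the grace period afresh.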

     
  • Christian Hammers

    Paul,

    thanks for your feedback. In my embedded application it is totally normal that the system has an outdated time while booting because the RTC is only buffered by a relatively small gold-cap. The correct time is always set via NTP or by a remote system later. I have not encountered any problems with that, so finally having the correct time is more a reason to party than to reboot.

    I am using the file stat functionality to detect if the user application has hung up. The latter creates and periodically updates a file which is observed by the watchdog daemon. The file does not yet exist when the daemon starts.
    My feeling is that using a re-try to avoid a watchdog reboot is a workaround for not doing it correctly at the first check :)

    Nevertheless I will have a look at your project!

    Best regards,

    Christian
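
For illustration, the application side of the setup described in this post (a file that the application creates and periodically updates) might be sketched in C like this. The function name `touch_heartbeat` is hypothetical and the file path is up to the caller; the post does not say how the real application updates its file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create the heartbeat file if it does not exist yet and set its access and
 * modification times to the current time. Returns 0 on success, -1 on error
 * with errno set.
 */
static int touch_heartbeat(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    int rc = futimens(fd, NULL);  /* NULL: use the current time for both stamps */
    int saved_errno = errno;      /* close() may overwrite errno */

    close(fd);
    errno = saved_errno;
    return rc;
}
```

The application would call this from its main loop at an interval well below the age limit configured in the watchdog. The mtime written here comes from the wall clock, which is why a clock step between two updates matters to a check based on absolute timestamps.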

     
  • Paul Crawford

    Paul Crawford - 2013-06-11

    Hi Christian,
    The retry-timer was added mostly to deal with files that go missing momentarily during log rotation, daemon PIDs changing when restarted, etc. However, it turned out to be easier to add the retry timer to all 'objects' that are processed in a similar manner and to drop the "softboot" flag as being no longer needed for filtering such issues (e.g. before you could test for a file being too old and reboot, but the default was not to reboot if the file was missing - not always the best action).

    As it uses 'real time' to compare when errors were last found it would probably not work well with a big clock jump, though I may change to using a monotonic timer for that test along your lines.

    The current code for "V6.00" can be found here if you are interested: http://www.sat.dundee.ac.uk/~psc/watchdog/Linux-Watchdog.html

    Most systems will not have such a big time jump, as they use a battery-backed RTC. Such a jump can also upset some mounting operations: I have seen complaints when a file system's last mount time is way in the future, etc.

    Regards, Paul

     
  • Michael Meskes

    Michael Meskes - 2013-06-11

    Sorry, but the URL is not correct. While it correctly shows Paul's sources, it is not, nor will it ever be, the official URL for watchdog. The source code for the upcoming 6.0 version can be found in its git archive here on sourceforge and nowhere else. While Paul deserves credit for a lot of useful changes that have made/will make their way into said git, he does not speak for the project. I have to say that I like the idea of the patch, although I haven't found time to really look into it.

     
  • Paul Crawford

    Paul Crawford - 2013-06-11

    Dear Michael,
    My apologies for not making it clear that I was referring to the experimental version (hence the quotation marks around "V6.00") and not anything official. That is hopefully clear to anyone visiting my web page.
    Regards, Paul

     
  • Paul Crawford

    Paul Crawford - 2013-06-12

    Dear Christian,
    I found that your patch's approach of adding "LDFLAGS += -lrt" to the debian/rules file did not work for me as it still was complaining about clock_gettime at the linking stage. However, I found that adding the line "LIBS = -lrt" to the src/Makefile.am file did work.

    I was using Ubuntu 12.04 64-bit, using "autoreconf -i" then "./configure" to create a new Makefile then the usual "make clean" then "make" in the src directory.

    I don't claim to understand the whole autoconf/automake process, so it is possible I did something quite dumb.
    Regards, Paul

     
  • Christian Hammers

    You are right, Paul. Thanks for pointing me to that. I have updated the patch accordingly.

    Best regards,

    Christian

     
  • Paul Crawford

    Paul Crawford - 2013-06-13

    Thanks Christian. If you have the time & motivation could you test the latest version of my code with your system & time-jump problem to see how it copes? Any feedback is appreciated so eventually the stable/useful changes in that can be merged back in to the official version.
    Regards, Paul

     
  • Paul Crawford

    Paul Crawford - 2016-01-15

    Hi Christian, not sure if you are still following this project, but if so you might want to try the latest GIT copy as that has the re-try timer which ought to provide a work-around to the clock jump problem that this patch was intended to address (though might need some tweaking). Any feedback/testing is appreciated!
    Regards, Paul

     

    Last edit: Paul Crawford 2016-01-15
    • Christian Hammers

      Hello Paul,

      thanks for that info. I am not really following the project but we still use it in my last patched state so it may be interesting to get that GIT copy. I will come back to you if I find something suspicious...

      Best regards,

      Christian

      From: Paul Crawford [mailto:paulcrawford@users.sf.net]
      Sent: Friday, 15 January 2016 09:22
      To: [watchdog:patches]
      Subject: [watchdog:patches] #6 File stat functionality is unresistant against system clock changes


      [patches:#6] http://sourceforge.net/p/watchdog/patches/6/ File stat functionality is unresistant against system clock changes

      Status: open
      Group: Unstable (example)
      Created: Fri Jun 07, 2013 08:43 AM UTC by Christian Hammers
      Last Updated: Thu Jun 13, 2013 10:15 PM UTC
      Owner: nobody


  • Andrey Mazo

    Andrey Mazo - 2019-02-26

    Here is another implementation of the same idea:
    https://sourceforge.net/p/watchdog/code/merge-requests/3/
    Basically, the same thing but a bit simpler, without maintaining the correction value.

     
  • Paul Crawford

    Paul Crawford - 2019-02-27

    Should be considered closed with Andrey Mazo's patch having been applied.

     
  • Paul Crawford

    Paul Crawford - 2019-03-01
    • status: open --> closed
     
  • Paul Crawford

    Paul Crawford - 2019-03-01

    Closing now that Andrey Mazo's patch is working OK.

     