From: John Byrne <john.byrne@hp...> - 2003-11-22 04:15:21
I got to play around with ntp on a RH9 cluster today and ran into a
bunch of problems. We want to run ntp on all the nodes of the cluster
to keep the time in synch for all the nodes. (We actually care more
about them being in synch than the time being correct, but the latter
is nice, too.)
There are two configurations of interest:
1.) All nodes have network access to the ntp servers.
This configuration is simple to deal with. You need to make the
/etc/ntp directory a CDSL, but only so that ntpd can create the drift
file properly. (It writes a drift.TEMP and renames it; otherwise we
could CDSL just the file.) The other files in the directory can
actually be shared. It pretty much "just works".
2.) Only a limited number of nodes have access to the ntp server.
For simplicity, assume a non-failover cluster in which the node with
access is the master node (#1).
What I tried to do was make the ntpd on the master node the server for
the cluster: it synchronizes with the outside clock, and all the other
nodes synchronize with it over the ICS interfaces. However, problems
occur because the ntpd on the master node does not become a valid
server to other nodes for a period of several minutes after it starts.
Ignoring a few subtleties, when /etc/init.d/ntpd runs, it checks for
the file /etc/ntp/step-tickers and, if it exists, calls the ntpdate
command to set the date from the list of servers there. This sets the
time on the machine to the time on the best of the servers it can find.
If it doesn't find a valid server, it prints FAILED on the console.
(We don't want that "FAILED".) After this, ntpd is started and takes
care of keeping the time in synch.
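The start-up sequence just described can be sketched roughly as follows.
This is not the Red Hat init script, only an illustration of the logic;
the function names read_tickers and start_time_service are made up for
the example, and the sketch assumes ntpdate is skipped whenever the
step-tickers file lists no servers.

```shell
#!/bin/sh
# Illustrative sketch only; not the actual /etc/init.d/ntpd.

# Print the servers listed in a step-tickers file on one line,
# skipping blank lines and '#' comments. Prints nothing if the
# file is missing or lists no servers.
read_tickers() {
    [ -r "$1" ] || return 0
    awk '{ sub(/#.*/, "")
           for (i = 1; i <= NF; i++) { printf "%s%s", sep, $i; sep = " " } }
         END { if (sep) print "" }' "$1"
}

# Step the clock from the listed servers (if any), then start ntpd.
start_time_service() {
    tickers=$(read_tickers "${1:-/etc/ntp/step-tickers}")
    if [ -n "$tickers" ]; then
        # -b steps the clock instead of slewing it; -s logs to syslog.
        # $tickers is left unquoted on purpose so each server is one argument.
        ntpdate -s -b $tickers || echo "FAILED"
    fi
    ntpd
}
```

Calling start_time_service with no argument uses /etc/ntp/step-tickers;
with an empty file, ntpdate is never run and nothing can print FAILED.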
If I start the ntp service, I have found no way to make the initial
time set by ntpdate work on non-master nodes, because the master's
ntpd is not yet ready to be synchronized with. But unless that initial
set is done, ntpd is not guaranteed to work properly: if the time on
the machine is too far from that of the server, it gives up.
Without modifying the ntp package, it seems to me the best I can do is:
1.) The directory /etc/ntp is a CDSL.
2.) Modify /etc/rc.d/rc.sysinit.nodeup to set the time on the node
coming up via the date command, e.g. date --set="`onnode -p 1 date`".
This takes care of the initial set problem.
3.) /etc/ntp/step-tickers has to be different for master and non-master
nodes. For nodes that can talk to the outside ntp server, the
step-tickers file will contain the list of servers. For nodes that
cannot, the step-tickers file must be empty; this avoids the FAILED
message as service starts on the node.
4.) /etc/ntp/drift will be unique to each node.
5.) /etc/ntp/ntpservers and /etc/ntp/keys should be global, so symlinks
in each node's /etc/ntp directory should point to the global directory.
6.) /etc/ntp.conf, which also specifies the servers (along with a bunch
of options), can be shared between the nodes if the file specifies both
the external and internal servers. The non-master nodes will ignore the
servers they cannot reach. (They may try to reach them once each
minute, though.) Tools such as ntpq will have their output cluttered
with the unreachable servers.
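Steps 2 and 3 of the list could look something like the sketch below.
Only the date/onnode command comes from the description above;
MASTER_NODE, set_time_from_master, and write_step_tickers are names
invented for the example, and the server names passed in are whatever
the site actually uses.

```shell
#!/bin/sh
# Illustrative sketch only; the names below are hypothetical.
MASTER_NODE=1

# Step 2: copy the master's clock to this node before ntpd starts.
set_time_from_master() {
    date --set="$(onnode -p "$MASTER_NODE" date)"
}

# Step 3: write the step-tickers file for one node.
#   $1 = "yes" if the node can reach the outside servers, else "no"
#   $2 = file to write
#   remaining arguments = outside server names
write_step_tickers() {
    reachable=$1
    file=$2
    shift 2
    if [ "$reachable" = "yes" ] && [ "$#" -gt 0 ]; then
        printf '%s\n' "$@" > "$file"
    else
        : > "$file"    # empty file, so there is nothing for ntpdate to fail on
    fi
}
```

On a node without outside access, write_step_tickers no
/etc/ntp/step-tickers leaves an empty file; on the master, the same call
with "yes" and the server list writes one server per line.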
The problem with this is that we end up with two classes of nodes and
we have to make sure their files are maintained in the proper state.
The addnode and chnode scripts would probably have to be modified to do
so. The redhat-config-time GUI would also need to be modified to deal
with all this, since it writes /etc/ntp.conf, /etc/ntp/ntpservers, and
/etc/ntp/step-tickers.
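One way to keep the two classes of nodes honest would be a small check
that could be run from whatever script ends up maintaining the files.
The sketch below is only an illustration: check_step_tickers is a
made-up name, and it looks only at whether the file is empty, not at
whether its contents are valid server names.

```shell
#!/bin/sh
# Illustrative sketch only; check_step_tickers is a hypothetical helper.

# Check that a node's step-tickers file matches the node's class.
#   $1 = "yes" if the node can reach the outside servers, else "no"
#   $2 = path to that node's step-tickers file
# Returns 0 when the file is consistent with the class, non-zero otherwise.
check_step_tickers() {
    if [ "$1" = "yes" ]; then
        [ -s "$2" ]                     # must exist and be non-empty
    else
        [ -f "$2" ] && [ ! -s "$2" ]    # must exist and be empty
    fi
}
```

For example, check_step_tickers no /etc/ntp/step-tickers || echo
"step-tickers should be empty on this node" would flag a non-master
node whose file had been overwritten with a server list.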
Does anyone have any better ideas?