Towards a resilient NTP configuration in NTP4

NTP 4 introduces some interesting new features that few people seem to know about; they are sparsely documented and difficult to set up correctly, but they can help keep machines synchronized in the event of a total external network failure (even if you don’t have a reference time source).

Now, some reference time sources aren’t expensive (others are), but sometimes you care more about node-to-node synchronization than you do about absolute time accuracy. One example might be a large computational cluster where, if the network is disconnected from the Internet for a while, or if the primary time source is down, you don’t want the individual nodes to drift apart.

A combination of the new manycast and orphan modes can be a boon for this kind of network and create a self-organizing NTP server tree as a fallback state.

Here’s a sample NTP configuration:

# Large cluster config
manycastserver 227.221.9.75
manycastclient 227.221.9.75 key 687
server 10.255.4.150 key 687 iburst prefer
# set stratum to 7 when in orphan mode
# ignore time from anything below stratum 3
# winnow the surviving time sources down to at most 7
tos floor 3 orphan 7 minclock 7 minsane 1 cohort 1

driftfile /var/lib/ntp/drift
logfile /var/log/ntpd.log
keys /etc/ntp.keys
trustedkey 687
requestkey 687
controlkey 687

restrict default notrust
restrict 227.221.9.75 nomodify
restrict 127.0.0.1

First, we set up a multicast address at 227.221.9.75 and a shared key so that all of the servers can authenticate each other. (I could also have used the new NTP public-key authentication (Autokey), but didn’t feel like hassling with key generation and distribution.) Every machine is both a client and a server on this multicast address: machines broadcast requests and machines answer. If you have a lot of machines, you’ll see many, many peer entries at first; most will stale out after a little while.
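
The keys file at /etc/ntp.keys is just a table of key numbers, key types, and shared secrets that gets copied to every machine. A minimal sketch of what it might contain (the secret here is made up; only key number 687 comes from the config above):

# /etc/ntp.keys -- same file on every machine
# keyid  type  key   (type M = an MD5 key given as an ASCII string)
687      M     SomeSharedSecret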

The server line sets up your typical NTP client/server arrangement. In this case, I have an NTP server that sits one stratum above a server with a GPS reference clock and also synchronizes with some of the NTP pool servers on the public Internet. It uses the same key for authentication and acts as the NTP server for a network of 1054 client machines. The iburst keyword makes the client exchange an initial volley of packets with the server, which cuts initial synchronization from many minutes down to a few seconds, and the prefer keyword says that if this server is available, prefer it above all others.
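
The upstream server’s own configuration isn’t shown here, but it would look something like the sketch below. The GPS box’s hostname and the pool entries are hypothetical; only the key number and the keys file path are taken from the config above.

# Hypothetical config for the upstream server (10.255.4.150)
server gps1.nyc.des iburst prefer     # stratum-1 machine with the GPS reference clock (name made up)
server 0.pool.ntp.org iburst          # public pool servers as a sanity check
server 1.pool.ntp.org iburst
keys /etc/ntp.keys
trustedkey 687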

Now we get to the interesting line. tos is short for “type of service” and adjusts a number of internal NTP parameters. The tos options are documented at http://www.ee.udel.edu/~mills/ntp/html/manyopt.html.

floor 3: Discard replies from any server at a stratum below 3. We want this to become a self-organizing network only when our stratum-3 server is unavailable, so if any other server somehow started answering on our manycast address with a lower stratum, we want to ignore it.

orphan 7: When no outside source of synchronization is available, go into orphan mode and set my stratum to 7. All servers at stratum 7 then hold an election; the winner of the election becomes the new server and all the other servers synchronize to it so that they all keep the same relative time. When the main (non-manycast) server is available again, orphan mode is disabled and the clients synchronize with it again.

minclock 7: Continue eliminating servers via the clustering algorithm until no more than 7 remain.

minsane 1: I don’t think I really need this since 1 is the default, and I don’t remember why I put it in there, but it doesn’t hurt anything. I think I had it at 4 at one point and ran into some issues when in orphan mode.

One final tweak: our postinstall script has a perl command that sets the orphan level to a random number between 5 and 7, so when we enter orphan mode we actually have three separate self-organizing stratums. The lowest stratum elects a leader, and the higher stratums automatically pick the best time source(s) from the lower-stratum servers using the usual NTP clock selection algorithms. Why? To speed up the election and synchronization process by breaking the stratums down into more manageable chunks.
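
The postinstall itself isn’t reproduced here, but a one-liner along these lines does the job (a sketch, assuming ntp.conf lives at /etc/ntp.conf and ships with the orphan 7 line shown above):

perl -pi -e 's/\borphan \d+/"orphan " . (5 + int rand 3)/e' /etc/ntp.conf

rand 3 yields a value in [0,3), int truncates it to 0, 1, or 2, and the orphan level lands on 5, 6, or 7.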

So, what does the peer status (ntpq -p) look like on a running machine?


     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 desrad1.nyc.des 10.255.4.150     4 u   59 1024   177   0.108   -0.170   0.345
 desrad6.nyc.des 10.255.4.150     4 u   59 1024   177   0.119   -0.167   0.358
 desrad3.nyc.des 10.255.4.150     4 u   59  512    77   0.095    0.190   0.096
 drdsa07.nyc.des 10.255.4.150     4 u   59 1024    77   0.123   -0.110   0.351
 desrad11.nyc.de 10.255.4.150     4 u   59 1024    37   0.087   -0.067   0.028
 drdsa01.nyc.des 10.255.4.150     4 u   59 1024    37   0.120    0.034   0.013
 drdsa05.nyc.des 10.255.4.150     4 u   56 1024    37   0.113   -0.064   0.054
 desrad7.nyc.des 10.255.4.150     4 u   29 1024    37   0.099   -0.086   0.007
 drdsa08.nyc.des 10.255.4.150     4 u   59  512    17   0.110   -0.583   0.003
 desrad8.nyc.des 10.255.4.150     4 u   59  512    17   0.078    0.227   0.020
 drdzf095.nyc.de 10.255.4.150     4 u   59 1024     7   0.083    0.072   0.006
 desrad2.nyc.des 10.255.4.150     4 u   59 1024     7   0.100   -0.871   0.012
 drdsa04.nyc.des 10.255.4.150     4 u   59 1024     7   0.103   -0.086   0.004
 drdsa00.nyc.des 10.255.4.150     4 u   59 1024     7   0.092   -1.037   0.013
 desrad10.nyc.de 10.255.4.150     4 u   59 1024     7   0.111    0.299   0.016
 drdsa02.nyc.des 10.255.4.150     4 u   59 1024     7   0.080   -0.995   0.019
 desrad4.nyc.des 10.255.4.150     4 u   17   64     3   0.090    0.084   0.009
 desrad5.nyc.des 10.255.4.150     4 u   45   64     1   0.092    0.610   0.006
 desrad9.nyc.des 10.255.4.150     4 u    1   64     1   0.112    0.112   0.005
 227.221.9.75    .ACST.          16 u    -   64     0   0.000    0.000   0.001
*ntpmastr.nyc.de 10.249.1.1       3 u   59 1024   377   0.093   -0.653   0.258

When ntpmastr goes away, the servers take some minutes, but they eventually elect a leader and settle into a usable hierarchy. They may drift from the absolute notion of NTP time, but they remain synchronized with each other, which is more important to us.