Amazon Time Sync is now available over the internet as a public NTP service

amluto · on Nov 11, 2022

> In the event of a leap second, the Amazon Time Sync service automatically handles this for you by smoothing out the addition, or removal, of the leap second with a 24-hour linear smear from noon to noon UTC.

No, wrong!

Setting your system clock to use smeared time is legitimate. Doing this across a large fleet is even reasonable. But this is a policy decision, and it is not UTC. If a fleet wants shaded time, it should be explicit in the protocol or current configuration.

Smearing time by having a server that claims to be NTP doing the smearing is simply a lie, and will confuse clients who expect the server to tell the actual time. Even chrony’s leapsecmode=slew option expects the upstream server to report the leap second.

Last time there was a leap second, some smeared servers got into the NTP pool, and the result was a mess.

the_mitsuhiko · on Nov 11, 2022

These servers are not in the NTP pool so that concern does not apply. I'm pretty sure most cloud providers use smearing at the NTP level.

donavanm · on Nov 11, 2022

Yes. Everyones been doing smearing (of slightly different flavors) since 2014ish when a kernel bug caused a bunch of havoc across linux and cisco boxes. It was a Bad Time.

For everyone that gets leap seconds right theres 10 who muck it up and have incorrect telemtry at best or data loss and crashes at worst. And this happens every time. So folks like GP who want or need to be Technically Correct are free to do so. But the smeared time from the big providers saves tons of hassle for regular developers and customers.

Disclaimer: principal at AWS, opinions are my own. I have no inside knowledge of this AWS NTP release.

amluto · on Nov 11, 2022

The Bad Time was bad. But it was also bad when, a while later, at least one smearing server got into the public NTP pool, causing regrettable loss of time sync. (How do I know this? One of my servers caught the contagion and disagreed with my other servers by about a second. This can happen regardless of whether I’m trying to use smeared time myself.)

I find it quite regrettable that, 8 years later, large fleets and their unwitting customers use various, not necessarily compatible smears, and that they kludge it in via NTP. IMO a much better solution would be to make all this explicit:

Define UTCS as smeared UTC. Get everyone to agree at to exactly how it is smeared. Publish an RFC or similar. (Getting general purpose software to be aware of UTC vs UTCS is not necessarily required.)

Get all the major software vendors to implement UTCS by default, chrony-style.

Have the major cloud NTP servers continue to report UTC. Make the major NTP software implementations understand this all well enough so a server can run UTCS on its system clock but still report UTC to its own NTP clients.

Beltalowda · on Nov 11, 2022

If we're going to change time standards then we might as well just get rid of our current system of leap seconds altogether; I have never been able to find a good practical reason for them and the current UTC system doesn't even solve long-term time-keeping issues very well. All things considered it doesn't really strike me a problem that perception of time drifts by a few hours over a period of thousands of years, but if we really want to account for that then we can do it with a regular system – the slowing down of the earth's rotation is not completely regular in the short term, but it is in the long term.

But realistically, just don't use leap seconds and check again in a thousand years or so. Who even knows what the world will look like then.

wongarsu · on Nov 11, 2022

You're basically describing TAI (International atomic time). UTC is just TAI with leap seconds applied, as I understand it mostly for political reasons (it was easier to get everyone to use atomic time if it's at most a 0.9 seconds off from the observational method used before).

https://en.m.wikipedia.org/wiki/International_Atomic_Time

naniwaduni · on Nov 11, 2022

We also have GPS time, which is TAI minus 19 seconds, which is probably the most "popular" monotonic time standard we have on the back of, well, GPS. Lots of choices!

Beltalowda · on Nov 11, 2022

Pretty much, yes, or a new standard which is regular that solves the unpredictability problem of leap seconds, as well as the "time actually doesn't work like almost everyone on the planet thinks it does"-problem.

amluto · on Nov 11, 2022

Changing the time standard would be a huge international discussion involving many stakeholders, but that’s not at all what I’m suggesting.

I’m suggesting a new standard specifically for computers, not for commerce. The industry would agree on a smearing of time and anyone who wants to can use it at their discretion. And if the standard was at all sensible, it would say that one SHOULD NOT use it in any protocol intended to synchronize time between computers unless that protocol explicitly indicates whether the time is smeared.

The status quo is embarrassingly bad for interoperability. It’s a kludge implemented in a hurry to avoid a repeat of some bugs being triggered. The industry can and should do better.

ignoramous · on Nov 11, 2022

> ...saves tons of hassle for regular developers and customers.

True. I mean, regulars would even be okay even with roughtime: https://roughtime.googlesource.com/

While BigTech needs even more accurate sources: https://research.google/pubs/pub49716/

brookst · on Nov 11, 2022

> Bad Time

Nice.

cesarb · on Nov 11, 2022

This might be fixed by NTPv5; the proposed draft RFC I looked at (https://datatracker.ietf.org/doc/draft-mlichvar-ntp-ntpv5/05...) has a flag to specify whether or not the server is using leap smear.

toomuchtodo · on Nov 11, 2022

Can the server provide both a smeared and non smeared response simultaneously, with the client electing which correction to apply? This would make the process more deterministic, versus the uncertainty of what time you’re being provided actually is.

amluto · on Nov 12, 2022

Oof, I feel like I should send a comment. This is bad:

1. There is no definition of what the leap smeared timescale means. This may prevent a client from usefully synchronizing with multiple servers using that timescale.

2. There’s enough information in the messages to compute leap smeared time. (The leap second indicator along with UTC is sufficient.). Chrony does this.

As a result, it seems like it would never actually make sense for servers to synchronize to each other in smeared mode.

eternityforest · on Nov 11, 2022

I wish UTC would just standardize smears.

Computers should use perfect unsmeared clocks, translation for humans can just smear with a polynomial predictor that updates every few years. If you miss an update you get gracefully degraded accuracy, no jumps anywhere m

vbezhenar · on Nov 11, 2022

NTP protocol contains leap indicator. So I'd expect that chrony would not do anything if AWS NTP server would do smoothing out and not set that leap indicator.

amluto · on Nov 11, 2022

Chrony would inherit AWS’s skew, which may or may not match chrony’s skew. (But maybe incorrectly because chrony does not expect the length of a second to suddenly change by about 11 ppm.)

Chrony configured to not skew will obviously skew anyway.

recuter · on Nov 11, 2022

  This means your devices and infrastructure outside of AWS, such as IOT devices and on-premises infrastructure, can synchronize to the same highly available time sources that were previously accessible only from within our data centers.

If nothing else they can use it for all the Alexas and Kindles they sell and what not. There's no reason for them not to offer it as a public service, not necessarily nefarious. There's also no real reason for anybody else to use it as far as I can tell.

stevewatson301 · on Nov 11, 2022

What I'd have liked to see is a contribution to the pool.ntp.org project rather than providing this service at their their own domain.

the_mitsuhiko · on Nov 11, 2022

Pretty sure they would not be allowed to join the pool because they use leap second smearing.

mkraft · on Nov 11, 2022

What’s the motivation for this? Telemetry they can monetize?

donavanm · on Nov 11, 2022

Id expect its something that customers have asked for and theres ~0 marginal cost to provide it. I can see two aspects for customers; its simplicity & consistency with edge or on prem nodes using the same time source as your cloud infra. Theres a degree of trust and consistency for those same customers to get time from AWS instead of a random pool.ntp provider.

Disclaimer: principal at AWS but the above are my sole conjecture/thoughts. I have no involvement or inside knowledge of this service release.

xiwenc · on Nov 11, 2022

So if i started with AWS, and then later decide to add infra from other cloud providers. i must somehow remind myself to use AWS ntp servers. If not be ready for surprises?

8organicbits · on Nov 11, 2022

Better default NTP config for Amazon software that usually runs inside AWS but sometimes runs outside? Amazon Corretto, Amazon Linux, etc?

This helps Amazon claim that those "run best on AWS", as NTP would be in the same datacenter (I think?) otherwise it looks like they use a regional endpoint[1].

1. https://globaldnschecker.com/#/A/time.aws.com

Kwpolska · on Nov 11, 2022

Being slightly closer to a NTP server doesn’t at all warrant saying a service runs best on AWS. Accurate time isn’t going to improve your Linux or Java performance in any way.

dementis · on Nov 11, 2022

Accurate time is critical for ensuring Data accuracy and most apps utilize the system time. There are also a lot of Database functions that require accurate time. If you want to do DB clustering then a good reference time is required and Oracle even went so far as to make a service called "Cluster Time Synchronization Service (CTSS)" that specifically validates the system time configuration. Although why they still only support ntp and not chrony is something I would love to know.

What is Data Accuracy, Why it Matters and How Companies Can Ensure They Have Accurate Data.[2020] https://dataladder.com/what-is-data-accuracy/

cheschire · on Nov 11, 2022

Or taking away telemetry from other providers. If no other devices use this, Amazon devices would.

moepstar · on Nov 11, 2022

...and to add, what would be the motivation to use this (and thus feeding them telemetry data).

dementis · on Nov 11, 2022

It looks to be targeted at large organizations that have very strict ITIL based change control and they have to go through a CAB level approval process for every single change. I've worked in one of those environments before and yeah, the CAB would question the need to have two different configurations to effectively handle the same job.

The concept of ITIL is one of those "good on paper" type things but at some places they think its only a starting point and it's like death by meeting.

robertlagrant · on Nov 11, 2022

I don't think it's good on paper any more. Change control should happen in source control with lots of automation keyed off it. Meetings should be designed around that primary principle.

gtirloni · on Nov 11, 2022

What telemetry data goes into a NTP request?

kseistrup · on Nov 11, 2022

No IPv6?

dementis · on Nov 11, 2022

Here is a bit of a comparison with these three NTP servers in a /etc/chrony.conf

  pool time.aws.com iburst
  pool 1.amazon.pool.ntp.org iburst
  pool time.nist.gov iburst

  # chronyc sourcestats 
  Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
  ==============================================================================
  ec2-54-86-189-11.compute>   4   3     7  -1128.596  60538.254    -17ms  9310us
  ec2-18-212-174-188.compu>   4   3     7  -3475.306  40696.289    -32ms  7334us
  ec2-54-224-100-122.compu>   4   3     7  -2748.130  71563.352    -26ms    11ms
  ec2-3-80-167-198.compute>   4   3     7   +827.953  78440.828  +1398us    12ms
  38.229.57.9                 4   4     7   +744.719  67529.711   +117us  9384us
  time.cloudflare.com         4   3     7   +957.450  44361.117  +3899us  6424us
  3dhomejoe.com               4   3     9   -958.576  43000.809    -14ms  7052us
  shed.galexander.org         4   4     7  -4034.244  62138.637    -36ms    10ms
  time-a-b.nist.gov           4   4     7  +1475.911  50544.457  +9056us  5773us

  # chronyc sources
  MS Name/IP address         Stratum Poll Reach LastRx Last sample
  ===============================================================================
  ^- ec2-54-86-189-11.compute>     4   6    17    19  -5776us[-5776us] +/-   76ms
  ^- ec2-18-212-174-188.compu>     4   6    17    20    -22ms[  -22ms] +/-   57ms
  ^- ec2-54-224-100-122.compu>     4   6    17    21  -1057us[-1057us] +/-   77ms
  ^- ec2-3-80-167-198.compute>     4   6    17    21  +2093us[+2093us] +/-   57ms
  ^- 38.229.57.9                   2   6    17    21   -769us[ -769us] +/-  185ms
  ^\* time.cloudflare.com           3   6    17    22  -2043us[+3354us] +/-   34ms
  ^+ 3dhomejoe.com                 2   6    33    19    -23ms[  -23ms] +/-  102ms
  ^- shed.galexander.org           3   6    17    20    -32ms[  -32ms] +/-  114ms
  ^- time-a-b.nist.gov             1   6    17    21    +10ms[  +10ms] +/-   55ms

And for comparison

  pool pool.ntp.org iburst
  pool time.nist.gov iburst

  # chronyc sourcestats 
  Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
  ==============================================================================
  time.cloudflare.com         4   3     6  +1363.306  27212.342  +8424us  3911us
  time100.stupi.se            4   4     8  -1051.634  11864.409    -12ms  2419us
  ntp2.flashdance.cx          4   3     7   +183.591  78497.898  -1031us  9048us
  time.cloudflare.com         4   4     7   +566.874  56659.176  +1998us  8584us
  time-e-b.nist.gov           4   3     6     -5.327   7264.481   +710us   988us

  # chronyc sources
  MS Name/IP address         Stratum Poll Reach LastRx Last sample
  ===============================================================================
  ^+ time.cloudflare.com           3   6    17    11  -1590us[-2743us] +/-   66ms
  ^- time100.stupi.se              1   6    17    10  -5597us[-5597us] +/-  116ms
  ^\* ntp2.flashdance.cx            2   6    17    10    -31ms[  -32ms] +/-  168ms
  ^- time.cloudflare.com           3   6    17    10    -17ms[  -17ms] +/-   98ms
  ^+ time-e-b.nist.gov             1   6    17    11   +504us[ -649us] +/-   56ms

I'll stick with a config that queries Stratum 1 time server vs only Stratum 4 servers.

\\EDIT: Formatting

infinet · on Nov 11, 2022

> a global fleet of redundant satellite-connected and atomic reference clocks

What does it mean? Is Amazon operating atomic clock or using GPS?

nicolinox · on Nov 11, 2022

https://developers.google.com/time

dementis · on Nov 11, 2022

If you don't mind dealing with leap-smeared time then time.google.com is certainly better than the AWS or Amazon NTP server pools just based on the fact that Google's NTP server pool provide Stratum 1.

Uptrenda · on Nov 11, 2022

Anyone know if there's a more accurate alternative to NTP that doesn't require specialized hardware?

zie · on Nov 11, 2022

If you need more accuracy than NTP by itself can give you, then you have specialized needs and you need to understand those needs very well, as no general solution is likely to work for you. that said, there is a protocol for it, PTP: https://en.wikipedia.org/wiki/Precision_Time_Protocol

mlyle · on Nov 11, 2022

PTP and NTP are about the same without hardware support.

With hardware support, and without any thought or tuning-- you probably get your machines synchronized 2 orders of magnitude closer. (From sub-millisecond to a few microseconds). And, of course, PTP can do much better than this.

zie · on Nov 12, 2022

I agree with everything you said. Most problem domains probably have no need for more than what NTP can deliver. Regardless, if NTP alone can't solve your time problems, then you need to really understand what your actual problem is, as there is no guarantee that PTP can solve it either.

mlyle · on Nov 12, 2022

> Regardless, if NTP alone can't solve your time problems, then you need to really understand what your actual problem is, as there is no guarantee that PTP can solve it either.

Yah... on the other hand, there's a whole lot of applications where precision is a nice-to-have but not a necessity--- e.g. better ordering events between multiple machines without vector clocks, etc, when we're troubleshooting.

Jamie9912 · on Nov 11, 2022

Can't ping time.aws.com. When I mtr it, it's over 60 milliseconds away, I think i'll stick to time.cloudflare.com which pings in 5 milliseconds

rswail · on Nov 11, 2022

How is ping latency a measure of time accuracy?

sgtnoodle · on Nov 11, 2022

It's a bound on inaccuracy from assumptions made about communication delays? I believe a protocol like NTP assumes delays are symmetric, and uses half the round trip time. There's no guarantee that delays are symmetric or repeatable, though.

jwsteigerwalt · on Nov 11, 2022

This is really starting to hint of a James Bond/Superman plot. The billionaire villain gently takes control of the worlds timekeeping systems then covertly or overtly started making changes for their own benefit…