So they state:

> One could argue that we don’t really need PTP for that. NTP will do just fine. Well, we thought that too. But experiments we ran comparing our state-of-the-art NTP implementation and an early version of PTP showed a roughly 100x performance difference:

While I'm not necessarily against more accuracy/precision, what problems specifically are they experiencing? They do mention some use cases, of course:

> There are several additional use cases, including event tracing, cache invalidation, privacy violation detection improvements, latency compensation in the metaverse, and simultaneous execution in AI, many of which will greatly reduce hardware capacity requirements. This will keep us busy for years ahead.

But given that NTP (either ntpd or chrony) tends to give me an estimated error of around (tens of) 1e-6 seconds, and PTP can get down to 1e-9 seconds, I'm not sure how many data centre applications need that level of accuracy.

> We believe PTP will become the standard for keeping time in computer networks in the coming decades.

Given the special hardware needed for the grandmaster clock to get down to nanosecond time scales, I'm doubtful this will be used in most data centres or most corporate networks. Adm. Grace Hopper elegantly illustrates 'how long' a nanosecond is:

* https://www.youtube.com/watch?v=9eyFDBPk4Yw

How many things need to worry about the latency of a signal travelling ~300 mm?




I would also love to see an explanation of “why do we need this much accuracy?” that actually goes through the derivation of how much accuracy you need.

Some of the justification for Google’s TrueTime is found in the Spanner docs:

https://cloud.google.com/spanner/docs/true-time-external-con...

Basically, you want to be able to do a “snapshot read” of the database rather than acquiring a lock (for reasons which should be apparent). The snapshot read is based on a monotonic clock. You can get much better performance out of your monotonic clock if all of your machines have very accurate clocks. When you write to the database, you can add a timestamp to the operation, but you may have to introduce a delay to account for the worst-case error in the clock you used to generate the timestamp.

More accurate timestamps -> less delay. From my understanding, less delay -> servers have more capacity -> buy fewer servers -> save millions of dollars -> use savings to pay for salaries of people who figured out how to make super precise timestamps and still come out ahead.
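
As a rough illustration of commit-wait (a sketch, not Spanner's actual code; the clock API and error bound here are invented):

    #include <stdint.h>
    #include <time.h>

    // Pretend TrueTime: the true time is guaranteed to lie within
    // +/- ERR_NS of CLOCK_REALTIME. Spanner derives this bound from
    // its time infrastructure; here it is just an assumed constant.
    #define ERR_NS 1000000LL  // 1 ms of assumed uncertainty

    static int64_t now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    // Commit-wait: pick the latest possible "now" as the commit
    // timestamp, then block until even the earliest possible "now"
    // has passed it. The forced wait is ~2x the clock error bound,
    // so a smaller bound directly means less delay per write.
    static int64_t commit_wait(void) {
        int64_t commit_ts = now_ns() + ERR_NS;  // interval.latest
        while (now_ns() - ERR_NS < commit_ts)   // interval.earliest < ts?
            ;                                   // spin; sketch only
        return commit_ts;
    }

Shrinking the error bound from, say, milliseconds to microseconds shrinks that forced delay by the same factor, which is where the capacity savings come from.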

This kind of engineering effort makes sense at companies like Google and Meta because these companies spend such a large amount of money on computer resources to begin with.


Meta uses some variations on Hybrid Logical Clocks, which are very similar to TrueTime, so yes, this does apply. Besides performance, they very much want to avoid consistency issues that could result in a security breach: e.g., if I block Alan and then post "Alan is a dookie head", you don't want some node seeing the second event first. Really, the bigger concern is that someone spots this as a potential vulnerability and scripts something against it.
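
For reference, the HLC update rules are tiny; a sketch (names mine, following the Kulkarni et al. paper):

    #include <stdint.h>

    // Hybrid Logical Clock: l tracks the largest physical time seen,
    // c is a logical counter that breaks ties when l can't advance.
    struct hlc { int64_t l; int64_t c; };

    static int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }

    // Local event or message send, given physical clock reading pt.
    void hlc_local(struct hlc *h, int64_t pt) {
        int64_t old = h->l;
        h->l = max64(old, pt);
        h->c = (h->l == old) ? h->c + 1 : 0;
    }

    // Message receive: also fold in the sender's (l, c).
    void hlc_recv(struct hlc *h, struct hlc m, int64_t pt) {
        int64_t old = h->l;
        h->l = max64(max64(old, m.l), pt);
        if (h->l == old && h->l == m.l) h->c = max64(h->c, m.c) + 1;
        else if (h->l == old)           h->c = h->c + 1;
        else if (h->l == m.l)           h->c = m.c + 1;
        else                            h->c = 0;
    }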


This is something like my third attempt to read the Spanner paper - I get how it helps ordering of transactions, but I am confused whether it is used in making transactions atomic across machines?


> I get how it helps ordering of transactions, but I am confused whether it is used in making transactions atomic across machines?

AIUI, you cannot quite think of it like a regular database where a particular row has a particular value, which would necessitate only one writer doing (atomic) updates at a time.

Rather, it is an MVCC-like database and a bit like an append-only log: as many writers as needed can write, and there are 'multiple values' for each row. The "actual" value of the row is the one with the highest transaction ID / timestamp. So updates can happen without (atomic) locking by just adding to the value(s) that already exist.

When reading, applications generally just get served the value with the highest timestamp, and since time is synchronized to such a tiny interval, it is a reasonably sure bet that the value with the highest timestamp is from the most recent transaction.
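
In sketch form (invented structs, not Spanner's actual storage layer), a snapshot read at a timestamp just walks the row's version chain, newest first:

    #include <stddef.h>
    #include <stdint.h>

    // One entry in a row's version chain, newest first.
    struct version {
        int64_t ts;              // commit timestamp
        const char *value;
        struct version *next;    // next-older version
    };

    // Snapshot read: return the newest version whose timestamp is at
    // or below the read timestamp. Writers never block readers; they
    // just prepend new versions with higher timestamps.
    const char *read_at(const struct version *head, int64_t read_ts) {
        for (const struct version *v = head; v; v = v->next)
            if (v->ts <= read_ts)
                return v->value;
        return NULL;  // row did not exist yet at read_ts
    }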

This is similar in concept to a vector clock (see also Lamport):

* https://en.wikipedia.org/wiki/Vector_clock

But instead of logical clocks with 'imaginary time', 'real time' is used down to the sub-microsecond level.
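
For comparison, the vector clock rules look like this (a sketch with a fixed process count):

    #include <stdint.h>

    #define N 4  // number of processes, fixed for the sketch

    // Local event (or send): a process ticks its own slot.
    void vc_tick(int64_t vc[N], int self) {
        vc[self]++;
    }

    // Receive: take the element-wise max with the message's clock,
    // then tick. Event A happened-before B iff A's clock is <= B's
    // in every slot and differs in at least one.
    void vc_recv(int64_t vc[N], const int64_t msg[N], int self) {
        for (int j = 0; j < N; j++)
            if (msg[j] > vc[j])
                vc[j] = msg[j];
        vc[self]++;
    }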


Disclaimer: I work in finance, and have for 15+ years.

> Given the special hardware needed for the grandmaster clock to get down to nanosecond time scales, I'm doubtful this will be used in most data centres or most corporate networks.

The "special hardware" is often just a gps antenna and a PCI card though. In fact, many tier 1 datacenters actually provide a "service" where they'll either cross connect you directly to a PPS feed from a tier 0 grandmaster time service or plug your server into a gps antenna up on the roof. It isn't really that exotic. For financial application, especially trading ones, syncing a LAN timesync to a handful of nanoseconds is doable and optimal.

It is just a matter of time before non-finance fields see reasons why better time sync is useful. Precision Time Protocol, aka IEEE 1588, was released in 2002, and IEEE 1588 version 2 was released in 2008. This isn't exactly a new thing.

With the right hardware and a tier 0 time source, modern NTP on modern hardware with modern networks can keep a LAN in sync sub-second. However, as a protocol, NTP only guarantees 1 second accuracy.


> However, as a protocol, NTP only guarantees 1 second accuracy.

The "one second" number is not inherent to the protocol, but comes from a survey from 1999:

> A recent survey[2] suggests that 90% of the NTP servers have network delays below 100ms, and about 99% are synchronized within one second to the synchronization peer.

* http://www.ntp.org/ntpfaq/NTP-s-algo.htm#Q-ACCURATE-CLOCK
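
(For context on why network delay matters: NTP estimates offset and round-trip delay from the four timestamps of each exchange, assuming the path is symmetric; any asymmetry in the path goes straight into the offset error. A sketch:

    #include <stdint.h>

    // Classic NTP estimates from one request/response exchange.
    // t1: client send, t2: server recv, t3: server send, t4: client recv.
    void ntp_estimate(int64_t t1, int64_t t2, int64_t t3, int64_t t4,
                      int64_t *offset, int64_t *delay) {
        *offset = ((t2 - t1) + (t3 - t4)) / 2;  // clock offset estimate
        *delay  = (t4 - t1) - (t3 - t2);        // round-trip network delay
    }

So low, stable network delay bounds how good the offset estimate can be.)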

A 2005 survey found (AFAICT, see Figure 1) that north of 99% — more like 99.5% — of servers had offsets of less than 100 ms, and that 90% had offsets of less than 10 ms:

* PDF: https://web.archive.org/web/20081221080840/http://www.ntpsur...

The output of "chronyc tracking" from a randomly-selected system I help run:

    Reference ID    : […]
    Stratum         : 3
    Ref time (UTC)  : Wed Nov 23 13:35:21 2022
    System time     : 0.000002993 seconds fast of NTP time
    Last offset     : +0.000003275 seconds
    RMS offset      : 0.000008091 seconds
    Frequency       : 13.191 ppm slow
    Residual freq   : +0.001 ppm
    Skew            : 0.017 ppm
    Root delay      : 0.001615164 seconds
    Root dispersion : 0.000048552 seconds
    Update interval : 65.3 seconds
    Leap status     : Normal
So chrony's algorithm thinks that the system time is currently off by 0.000 002 993 seconds (3e-6), and on average it is off by 0.000 008 091 seconds (8e-6).

All I had to do to achieve this was have some infra nodes (that do other things as well) point to *.pool.ntp.org, and then have my cattle and pets point their NTP software to those infra nodes. No special hardware, no special software, no configuring of network switches and routers: just a few lines of Ansible and I get (estimated) microsecond (1e-6) error levels.
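
For the curious, the whole setup boils down to a few lines of chrony.conf (hostnames and subnet are hypothetical). On the infra nodes:

    # infra nodes: sync to the public pool
    pool pool.ntp.org iburst
    # let the rest of the fleet query us
    allow 10.0.0.0/8

And on everything else:

    # cattle and pets: sync to the infra nodes
    server infra1.example.com iburst
    server infra2.example.com iburst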


Disclaimer: I've been involved in supporting the NTP Public Services Project since 2003.

I assure you, with the right hardware and paying attention to your latencies, NTP can get you down below one millisecond accuracy. Poul-Henning Kamp was doing nanosecond-level accuracy with NTP back in the mid-aughts, but then he had rewritten the NTP server code, the NTP client code, and the kernel on the server.

As an NTP service provider, what you really want to keep an eye on is the Clock Error Bound, which gives you a worst-case estimate of how bad the time you serve to your customers could be. On the client side, you mainly care about the accuracy you're actually getting.


Yes, I've seen it get down to a few milliseconds of sync on the right hardware (boundary clocks on the switches, a stratum-1 timeserver with PPS, etc.), but the protocol only guarantees 1 second of sync. Am I incorrect in that assertion?


> But given that NTP (either ntpd or chrony) tends to give me an estimated error of around (tens of) 1e-6 seconds

Is that a hard bound or an average? If it's an average, then what are the limits of the bounds, both in magnitude and duration?

> and PTP can get down to 1e-9 seconds

We use it for audio, and the reason it works well there is that there is no exponential backoff with your peers; even small timing slips are enough to notice. 1 ms of latency is far too much for our application; we typically aim for 0.25 ms, and we're only running 96 kHz. If we lose PTP sync, we notice within a few minutes.

Another advantage of PTP is it can operate as a broadcast and, as the article notes, switches can be PTP aware and help update the timing as the broadcast flows through the network. Conveniently, PTP also allows for multiple timing domains and masters to co-exist on the same network.

It's also an absurdly simple protocol; you can build a receiver for it in about 200 lines of C code. I've actually become quite taken with it since it was forced into our space about 10 years ago.
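
To give a flavour of that simplicity: every PTP message starts with the same fixed 34-byte, big-endian common header. A sketch of parsing it (field layout per IEEE 1588-2008; error handling mostly omitted):

    #include <stdint.h>
    #include <string.h>

    // IEEE 1588-2008 common header: the first 34 bytes of every PTP
    // message (Sync, Follow_Up, Announce, ...), all big-endian.
    struct ptp_header {
        uint8_t  message_type;      // low nibble of byte 0
        uint8_t  version;           // low nibble of byte 1
        uint16_t message_length;
        uint8_t  domain_number;
        uint16_t flags;
        int64_t  correction;        // nanoseconds << 16
        uint8_t  clock_identity[8]; // sourcePortIdentity, part 1
        uint16_t port_number;       // sourcePortIdentity, part 2
        uint16_t sequence_id;
    };

    static uint16_t be16(const uint8_t *p) {
        return (uint16_t)(p[0] << 8 | p[1]);
    }

    int ptp_parse(const uint8_t *buf, size_t len, struct ptp_header *h) {
        if (len < 34) return -1;
        h->message_type   = buf[0] & 0x0f;
        h->version        = buf[1] & 0x0f;
        h->message_length = be16(buf + 2);
        h->domain_number  = buf[4];
        h->flags          = be16(buf + 6);
        uint64_t corr = 0;
        for (int i = 0; i < 8; i++)
            corr = (corr << 8) | buf[8 + i];
        h->correction     = (int64_t)corr;
        memcpy(h->clock_identity, buf + 20, 8);
        h->port_number    = be16(buf + 28);
        h->sequence_id    = be16(buf + 30);
        return 0;
    }

Event messages like Sync and Delay_Req arrive on UDP port 319, general messages on 320, and the actual timestamping happens in the NIC or kernel rather than in code like this.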


> Another advantage of PTP is it can operate as a broadcast

NTP can work via broadcast and multicast.


I wouldn't call it absurdly simple.


The way I read it: a larger "Window Of Uncertainty" impacts performance. Having a smaller WOU by using PTP gave them a 100x perf increase in their experiments with the commit-wait solution for "ensuring consistency guarantee."

I'm completely ignorant on the topic in general though... so I'm probably missing something. =)


I have never seen NTP in a datacenter of reasonable size get below an error of 1e-4. PTP without custom hardware can easily get 3 orders of magnitude better.


PTP is harder to do within a datacenter. You either need hardware support in every switch and network interface, or you're doing software timestamping, at which point you might as well be using NTP. And PTP doesn't support DNS, only IPv4, IPv6, or Layer 2 MAC addresses.

PTP also requires some intelligence with regards to configuring your Ordinary Clocks, your Transparent Clocks, and your Boundary Clocks. And you have to have these configured on every device in the network path.

PTP does have a unicast mode as well as multicast, which can help eliminate unknowable one-way latencies.

It's a pain.

Check the documentation at https://linuxptp.nwtime.org/documentation/ and especially https://linuxptp.nwtime.org/documentation/ptp4l/
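
For a sense of the moving parts, a minimal linuxptp setup on a PTP-capable NIC is two daemons (the interface name is just an example):

    # discipline the NIC's hardware clock (PHC) to the PTP master
    ptp4l -i eth0 -m

    # then steer the system clock from the NIC's hardware clock
    phc2sys -s eth0 -c CLOCK_REALTIME -w -m

Hardware timestamping is the default; pass -S to ptp4l for software timestamps, at which point, as noted above, you've lost most of the advantage over NTP.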


> How many things need to worry about the latency of a signal travelling ~300 mm?

Arguably every program? The slowest part of modern programs is memory access. L1 cache memory access is ~1 nanosecond and RAM is ~50 nanoseconds.

Is 49 nanoseconds a lot? No, if you do it once. Yes, if every line of code pays the price.


> But given that NTP (either ntpd or chrony) tends to give me an estimated error of around (tens of) 1e-6 seconds, and PTP can get down to 1e-9 seconds, I'm not sure how many data centre applications need that level of accuracy.

I know that in trading, auditing trades / order books requires extremely accurate timing, and they typically deploy GPS hardware to get the required level of accuracy. As GPS is accurate to around 30 ns, going from 1e-6 to 1e-9 (1 ns) is exactly the kind of improvement needed to no longer need GPS hardware.


A good Stratum-1 GNSS appliance from a company like Meinberg or Microchip will include a rubidium or cesium reference clock that is then disciplined by GPS and can get you down to sub-nanosecond accuracy.


> […] improvement needed to not need GPS hardware anymore.

You're simply trading NTP hardware for PTP hardware (grandmaster clocks). There is no way to get to 1e-9 scales without hardware support.


These days, all the vendors I know of are shipping hardware that does both. So, it's not just an NTP server, it's also a PTP server. Maybe you don't use one or the other part of that functionality, or maybe they are licensed separately, but they are there.


Most professional network cards have PTP support, a grandmaster is cheap enough, and your colo provider may even offer PTP as a service.


It seems like this might mean that software errors caused by race conditions can be reduced by making the timing windows smaller. As this is a complex area (it might not appear so, but it is), the pragmatic solution could be to reduce the windows rather than fix the issue (maybe an FB engineer can speak OTR).


First, 300mm is not the real measure in practice for the common use case. PTP is often used to distribute GPS time for things that need it but don't have direct satellite access, and also so you don't have to have direct satellite access everywhere.

For that use case, 1 ns of inaccuracy is about 10 ft all told (i.e., accounting for all the inaccuracy it generates).

It can be less these days, especially if not just literally using GPS (i.e., a phone with other forms of reckoning, using more than just GPS satellites, etc.). You can get closer to the 1 ns = 1 ft type of inaccuracy.

But if you are a cell tower trying to beamform or something, you really want to be within a few ns, and without PTP that requires direct satellite access or some other sync mechanism.

Second, I'm not sure what you mean by special. Expense is dictated mostly by holdover and not protocol. It is true some folks gouge heavily on PTP add-ons (Orolia, I'm looking at you), but you can ignore them if you want. Linux can do fine PTP over most commodity 10G cards because they have HW support for it. 1G cards are more hit or miss.

For dedicated devices: here's a reasonable grandmaster that will keep time to GPS(/etc.) with a disciplined OCXO, and easily gets within 40 ns of GPS and a much higher-end reference clock I have: https://timemachinescorp.com/product/gps-ntpptp-network-time...

It's usually within 10 ns; 40 ns is just the max error ever in the past 3 years.

Doing PTP, the machines stay within a few ns of this master.

If you need better, yes, it can get a bit expensive, but honestly, there are really good OCXOs out there now with very low phase noise that can stay disciplined against GPS more accurately.

Now, if you need real holdover for PTP, yes, you will probably have to go with rubidium, but even that is not as expensive as it was.

Also, higher-end DOCXOs have nearly the same performance these days, and are better in the presence of any temperature variation.

As for me, I was playing with synchronizing the real-time motion of fast-moving machines that are a few hundred feet apart, for various reasons. For this sort of application, 100 µs is a lot of lag.

I would agree this is a pretty uncommon use case, and I could have achieved it through other means; this was more playing around.

AFAIK, the main use of accurate time at this level is cell towers/etc, which have good reasons to want it.

I believe there are also some synchronization applications that need extreme accuracy (synchronized sound wave generation, etc.) but have no direct access to satellite signals (i.e., underwater arrays).


That's the one I was thinking about getting for my home lab. I'm also looking at: https://www.meinbergglobal.com/english/products/synchronizat...

I already have an ancient Meinberg Stratum-1 somewhere that I should pull out of storage and send back to Heiko so that they can put it in a museum. These days, for proper datacenter use, I'd go for something like this one: https://www.meinbergglobal.com/english/products/modular-2u-s...


I would be really curious to drill down into one of these problems. Is super precise really the solution?


Probably high-frequency hedge funds, because being first matters a lot. That 300 mm is the difference between winning and losing; it's pretty binary up there.



