How Precision Time Protocol is being deployed at Meta

throw0101a · on Nov 21, 2022

So they state:

> One could argue that we don’t really need PTP for that. NTP will do just fine. Well, we thought that too. But experiments we ran comparing our state-of-the-art NTP implementation and an early version of PTP showed a roughly 100x performance difference:

While I'm not necessarily against more accuracy/precision, what problems specifically are experiencing? They do mention some use cases of course:

> There are several additional use cases, including event tracing, cache invalidation, privacy violation detection improvements, latency compensation in the metaverse, and simultaneous execution in AI, many of which will greatly reduce hardware capacity requirements. This will keep us busy for years ahead.

But given that NTP (either ntpd or chrony) tends to give me an estimated error of around (tens of) 1e-6 seconds, and PTP can get down to 1e-9 seconds, I'm not sure how many data centre applications need that level of accuracy.

> We believe PTP will become the standard for keeping time in computer networks in the coming decades.

Given the special hardware needed for the grand master clock to get down to nanosecond time scales, I'm doubtful this will be used in most data centres of most corporate networks. Adm. Grace Hopper elegantly illustrates 'how long' a nanosecond is:

* https://www.youtube.com/watch?v=9eyFDBPk4Yw

How many things need to worry the latency of signal travelling ~300mm?

klodolph · on Nov 21, 2022

I would also love to see an explanation of “why do we need this much accuracy?” that actually goes through the derivation of how much accuracy you need.

Some of the justification for Google’s TrueTime is found in the Spanner docs:

https://cloud.google.com/spanner/docs/true-time-external-con...

Basically, you want to be able to do a “snapshot read” of the database rather than acquiring a lock (for reasons which should be apparent). The snapshot read is based on a monotonic clock. You can get much better performance out of your monotonic clock if all of your machines have very accurate clocks. When you write to the database, you can add a timestamp to the operation, but you may have to introduce a delay to account for the worst-case error in the clock you used to generate the timestamp.

More accurate timestamps -> less delay. From my understanding, less delay -> servers have more capacity -> buy fewer servers -> save millions of dollars -> use savings to pay for salaries of people who figured out how to make super precise timestamps and still come out ahead.

This kind of engineering effort makes sense at companies like Google and Meta because these companies spend such a large amount of money on computer resources to begin with.

jasonwatkinspdx · on Nov 21, 2022

Meta uses some variations on Hybrid Logical Clocks, which are very similar to TrueTime, so yes this does apply. Besides performance they very much want to avoid consistency issues that could result in a security breach, eg, if I block Alan and then post "Alan is a dookie head" you don't want some node seeing the second event first. Well really the bigger concern is someone spots this as a potential vulnerability point and scripts something.

lifeisstillgood · on Nov 22, 2022

This is something like my third attempt to read the spanner paper - I get how it helps ordering of transactions but I am confused if it is used in making transactions atomic across machines ?

throw0101a · on Nov 22, 2022

> I get how it helps ordering of transactions but I am confused if it is used in making transactions atomic across machines ?

AIUI, you cannot quiet think of it like a regular database where a particular row has a particular value which would necessitate only one writer doing (atomic) updates at a time.

Rather it is a MVCC-like database and a bit like an append-only log: as many writers as needed can write and there are 'multiple values' for each row. The "actual" value of the row is the one with the highest transaction ID / timestamp. So updates can happening without (atomic) locking by just adding to the value(s) that already exist.

When reading, applications just generally get served the value with the highest-value timestamp, and since time is synchronized to such a tiny interval, it is a reasonably sure bet that the highest value is the most recent transaction.

This is similar in concept to a vector clock (see also Lamport):

* https://en.wikipedia.org/wiki/Vector_clock

But instead of logical clocks with 'imaginary time', 'real time' is used down to the sub-microsecond level.

SEJeff · on Nov 22, 2022

Disclaimer: I work in finance, and have for 15+ years.

> Given the special hardware needed for the grand master clock to get down to nanosecond time scales, I'm doubtful this will be used in most data centres of most corporate networks.

The "special hardware" is often just a gps antenna and a PCI card though. In fact, many tier 1 datacenters actually provide a "service" where they'll either cross connect you directly to a PPS feed from a tier 0 grandmaster time service or plug your server into a gps antenna up on the roof. It isn't really that exotic. For financial application, especially trading ones, syncing a LAN timesync to a handful of nanoseconds is doable and optimal.

It is just a matter of time before non-finance sees reasons that better timesync is useful. Precision Time Protocol aka IEEE 1588 was released in 2002 and IEEE 1588 version 2 was released in 2008. This isn't exactly a new thing.

With the right hardware and a tier 0 timesource, modern ntp on modern hardware with modern networks can keep a LAN in sync subsecond. However, as a protocol, NTP only guarantees 1 second accuracy.

throw0101a · on Nov 23, 2022

> However, as a protocol, NTP only guarantees 1 second accuracy.

The "one second" number is not inherent to the protocol, but comes from a survey from 1999:

> A recent survey[2] suggests that 90% of the NTP servers have network delays below 100ms, and about 99% are synchronized within one second to the synchronization peer.

* http://www.ntp.org/ntpfaq/NTP-s-algo.htm#Q-ACCURATE-CLOCK

A 2005 survey found (AFAICT, see Figure 1) that north of 99% — more like 99.5% — of servers had offsets less than 100ms, and that 90% have offsets less than 10ms:

* PDF: https://web.archive.org/web/20081221080840/http://www.ntpsur...

The output of "chronyc tracking" from a randomly-selected system I help run:

    Reference ID    : […]
    Stratum         : 3
    Ref time (UTC)  : Wed Nov 23 13:35:21 2022
    System time     : 0.000002993 seconds fast of NTP time
    Last offset     : +0.000003275 seconds
    RMS offset      : 0.000008091 seconds
    Frequency       : 13.191 ppm slow
    Residual freq   : +0.001 ppm
    Skew            : 0.017 ppm
    Root delay      : 0.001615164 seconds
    Root dispersion : 0.000048552 seconds
    Update interval : 65.3 seconds
    Leap status     : Normal

So chrony's algorithm thinks that the system time is currently off by 0.000 002 993 seconds (3e-6), and on average it is off by 0.000 008 091 seconds (8e-6).

All I had to do to achieve this was have some infra nodes (that do other things as well) point to *.pool.ntp.org, and then have my cattle and pets point their NTP software to those infra nodes. No special hardware, no special software, no configuring of network switches and routers: just a few lines of Ansible and I get (estimated) microsecond (1e-6) error levels.

bradknowles · on Nov 23, 2022

Disclaimer: I've been involved in supporting the NTP Public Services Project since 2003.

I assure you, with the right hardware and paying attention to your latencies, NTP can get you down below one millisecond accuracy. Poul Henning Kamp was doing nanosecond level accuracy with NTP back in the mid-aughts, but then he had rewritten the NTP server code, the NTP client code, and the kernel on the server.

As an NTP service provider, what you really want to keep an eye on is the Clock Error Bound that gives you the worst case estimate for how bad the time is that you could be serving to your customers. For the client side, you mainly care about just the accuracy you're actually getting.

SEJeff · on Nov 23, 2022

Yes, I've seen it get down to a few milliseconds of sync on the right hardware (boundary clock on the switches, stratum 0 timeserver with pps, etc), but the protocol only guarantees 1 second of sync. Am I incorrect in that assertion?

akira2501 · on Nov 22, 2022

> But given that NTP (either ntpd or chrony) tends to give me an estimated error of around (tens of) 1e-6 seconds

Is that a hard bound or an average? If it's an average, then what are the limits of the bounds, both in magnitude and duration?

> and PTP can get down to 1e-9 seconds

We use it for audio, and the reason it works well there is because there is no exponential backoff with your peers, allowing for even small timing slips to become enough to notice. 1ms of latency is far too much for our application, we typically aim for 0.25ms.. and we're only running 96kHz. If we lose PTP sync, we notice within a few minutes.

Another advantage of PTP is it can operate as a broadcast and, as the article notes, switches can be PTP aware and help update the timing as the broadcast flows through the network. Conveniently, PTP also allows for multiple timing domains and masters to co-exist on the same network.

It's also an absurdly simple protocol, you can build a receiver for it in about 200 lines of C code. I've actually become quite taken with it since it was forced into our space about 10 years ago.

throw0101a · on Nov 22, 2022

> Another advantage of PTP is it can operate as a broadcast

NTP can work via broadcast and multicast.

hoseja · on Nov 22, 2022

I wouldn't call it absurdly simple.

yAak · on Nov 21, 2022

The way I read it: a larger "Window Of Uncertainty" impacts performance. Having a smaller WOU by using PTP gave them a 100x perf increase in their experiments with the commit-wait solution for "ensuring consistency guarantee."

I'm completely ignorant on the topic in general though... so I'm probably missing something. =)

pclmulqdq · on Nov 21, 2022

I have never seen NTP in a datacenter of reasonable size get below an error of 1e-4. PTP without custom hardware can easily get 3 orders of magnitude better.

bradknowles · on Nov 23, 2022

PTP is harder to do within a datacenter. You either need hardware support in every switch and network interface, or you're doing software timestamping at which point you might as well be using NTP. And PTP doesn't support DNS, only IPv4 or IPv6 or Layer 2 MAC addresses.

PTP also requires some intelligence with regards to configuring your Ordinary Clocks, your Transparent Clocks, and your Border Clocks. And you have to have these configured on every device in the network path.

PTP does have a unicast mode as well as multicast, which can help eliminate unknowable one-way latencies.

It's a pain.

Check the documentation at https://linuxptp.nwtime.org/documentation/ and especially https://linuxptp.nwtime.org/documentation/ptp4l/

forrestthewoods · on Nov 21, 2022

> How many things need to worry the latency of signal travelling ~300mm?

Arguably every program? The slowest part of modern programs is memory access. L1 cache memory access is ~1 nanosecond and RAM is ~50 nanoseconds.

Is 49 nanoseconds a lot? No, if you do it once. Yes, if every line of code pays the price.

stingraycharles · on Nov 22, 2022

> But given that NTP (either ntpd or chrony) tends to give me an estimated error of around (tens of) 1e-6 seconds, and PTP can get down to 1e-9 seconds, I'm not sure how many data centre applications need that level of accuracy.

I know that in trading, auditing trades / order books requires extremely accurate timing, and they typically deploy GPS hardware to get the required level of accuracy. As GPS is accurate to a level of 30ns, 1e-6 to 1e-9 (1ns) is exactly the kind of improvement needed to not need GPS hardware anymore.

bradknowles · on Nov 23, 2022

A good Stratum-1 GNSS from a company like Meinberg or Microchip will include a Rubidium or Cesium reference clock that is then disciplined by GPS and can get you down to sub-nanosecond level accuracy.

throw0101a · on Nov 22, 2022

> […] improvement needed to not need GPS hardware anymore.

You're simply trading NTP hardware for PTP hardware (grandmaster clocks). There is no way to get to 1e-9 scales without hardware support.

bradknowles · on Nov 23, 2022

These days, all the vendors I know of are shipping hardware that does both. So, it's not just an NTP server, it's also a PTP server. Maybe you don't use one or the other part of that functionality, or maybe they are licensed separately, but they are there.

kierank · on Nov 22, 2022

Most professional network cards have PTP support and a grandmaster is cheap enough or your colo provider will provide PTP as a service.

zh3 · on Nov 21, 2022

It seems a little like it might mean that software errors caused by race conditions can be reduced by making the timing windows smaller. As this is a complex area (it might not appear so, but it is) the pragmatic solution could be to reduce the windows rather than fix the issue (maybe an FB engineer can speak OTR).

DannyBee · on Nov 22, 2022

First, 300mm is not the real measure in practice for the common use case. PTP is often used to distribute GPS time for things that need it but don't have direct satellite access, and also so you don't have to have direct satellite access everywhere.

For that use case, 1ns of inaccuracy is about 10ft all told (IE accounting for all the inaccuracy it generates).

It can be less these days, especially if not just literally using GPS (IE a phone with other forms of reckoning, using more than just GPS satellites, etc). You can get closer to the 1ns = 1ft type inaccuracy.

But if you are a cell tower trying to beamform or something, you really want to be within a few ns, and without PTP that requires direct satellite access or somet other sync mechanism.

Second, I'm not sure what you mean by special. Expense is dictated mostly by holdover and not protocol. It is true some folks gouge heavily on PTP add-ons (orolia, i'm looking at you), but you can ignore them if you want. Linux can do fine PTP over most commodity 10G cards because they have HW support for it. 1G cards are more hit or miss.

For dedicated devices: Here's a reasonable grandmaster that will keep time to GPS(/etc) with a disciplined OCXO, and easily gets within 40ns of GPS and a much higher end reference clock i have. https://timemachinescorp.com/product/gps-ntpptp-network-time...

It's usually within 10ns. 40ns is just the max error ever in the past 3 years.

Doing PTP, the machines stay within a few NS of this master.

If you need better, yes it can get a bit expensive, but honestly, there are really good OCXO out there now with very low phase noise that can more accurately stay disciplined against GPS.

Now, if you need real holdover for PTP (IE , yes you will probably have to go with rubidium, but even that is not as expensive as it was.

Also, higher end DOCXO have nearly the same performance these days, and are better in the presence of any temperature variation.

As for me, i was playing with synchronizing real-time motion of fast moving machines that are a few hundred feet apart for various reasons. For this sort of application, 100us is a lot of lag.

I would agree this is a pretty uncommon use case, and I could have achieved it through other means, this was more playing around.

AFAIK, the main use of accurate time at this level is cell towers/etc, which have good reasons to want it.

I believe there are also some synchronization applications that have need of severe accuracy (synchronous sound wave generation/etc) but no direct access to satellite signal (IE underwater arrays).

bradknowles · on Nov 23, 2022

That's the one I was thinking about getting for my home lab. I'm also looking at: https://www.meinbergglobal.com/english/products/synchronizat...

I already have an ancient Meinberg Stratum-1 somewhere that I should pull out of storage and send back to Heiko so that they can put it in a museum. These days, for proper datacenter use, I'd go for something like this one: https://www.meinbergglobal.com/english/products/modular-2u-s...

BaconPackets · on Nov 21, 2022

I would be really curious to drill down into one of these problems. Is super precise really the solution?

bergenty · on Nov 22, 2022

Probably high frequency hedge funds because being first matters a lot. That 300mm is the difference between winning and losing, it’s pretty binary up there.

RockRobotRock · on Nov 21, 2022

About a year ago, Linus from LTT collaborated with the Facebook team to make a video about the subject: https://youtu.be/JK3eTGkX6qY

Very cool!

qwertox · on Nov 21, 2022

Thanks for sharing! It's really relevant and a nice overview of the problem.

kazinator · on Nov 21, 2022

It is not accurate to say that PTP is a predecessor to NTP.

You need both.

PTP synchronizes clocks to a ridiculous precision, like down to nanoseconds. To do that, it uses support in the ethernet hardware. Hardware adds precision stamps to the PTP-related ethernet frames, so that the time calculations are free of jitters induced by the higher network stacks.

A cool thing is that PTP-aware routers/switches can actually rewrite the timestamps in PTP packets to subtract their own propagation delay accurately.

Something in the network has to use NTP, if it is important for the devices to have the correct calendar date and time; you don't use PTP over the Internet to get time from a remote server.

The real-time clock being synchronized isn't even the system clock in the first place, but a high resolution RTC that is in the network adapters and switches.

As an additional utility on top of the PTP stack, there is the possibility of synchronizing the host system's clock with the help of PTP.

PTP is used for things like generating precise signals in certain applications. E.g. Some devices that have PTP support can be programmed to generate periodic pulses which are clocked by the sychronized real-time clock. With PTP, two such signals from separate devices on the network can align very precisely. Such an application doesn't care about the time of day, just that the signals are synced.

willis936 · on Nov 21, 2022

Just because high precision tasks don't care about time of day as much, PTP still uses TAI timestamps. The only time of day information missing is leap seconds, and those were a mistake that is being corrected. You don't need NTP for time of day when PTP is used.

zie · on Nov 21, 2022

The point being you need a source of truth for PTP to figure out what the current date and time are. You won't be pushing PTP over the public internet, that's ridiculous. You can use a GPS/sat sync or atomic clock or NTP, doesn't matter, but something has to tell PTP what time it is. If you don't, then you will be subject to quartz crystal drift. It may or may not matter to your application(s), but for most people that care about time, they want the TAI timestamps to be at least reasonably accurate.

willis936 · on Nov 22, 2022

Is that really useful though? As soon as you need to synchronize your dataset beyond the local PTP network NTP-based timestamping becomes useless. You need a local GPS-based time reference backing PTP.

bradknowles · on Nov 23, 2022

Most clients won't speak PTP or know how to deal with it.

But NTP is something that many can deal with.

So, use PTP as a refclock for your NTP service, and then let the clients deal with an interface they already know and understand.

Those clients that really care can deal with the additional complexities of PTP, if you want to let them.

zie · on Nov 23, 2022

It totally depends on your use case, and what you actually need in reality.

In general, once you need/want anything past what NTP can deliver for you, you really, really need to understand your use case and then design your time services to meet your specific needs.

bradknowles · on Nov 23, 2022

PTP uses the time stamps that are configured. By default it might use TAI, but that's not the only option.

Feel free to look at the documentation I linked above.

milleramp · on Nov 21, 2022

Companies have been using PTP to synchronize (frame sync) networked cameras for many years. It is much better than wiring up a separate ttl or differential signals. It is amazing the accuracy you can get with this protocol.

gh02t · on Nov 22, 2022

I've also seen seismologists use it to sync geophones and other instruments. Synchronization is very important for doing time sensitive measurements for things like localization of earthquakes.

epgui · on Nov 21, 2022

This is incredible. If I knew I'd be working on things like this (as opposed to optimizing ad revenue or "user engagement" by A/B testing button copy), I'd possibly consider working at Meta.

thatfunkymunki · on Nov 21, 2022

If this is the only thing stopping you, you should apply and tell the recruiter you would like to work in infra. People that don't want to work on product stuff don't get "roped into it", I know people who have worked at Meta/FB for years and have never touched user-facing code or ads stuff.

capableweb · on Nov 21, 2022

> Meta/FB for years and have never touched user-facing code or ads stuff

Does that really matter in the end? If you're against writing code for ads/optimizing for quick dopamine hits, is it really so different to be writing the code that shows the ads vs the code that sets up the infrastructure to serve the ads? Or even infrastructure to support the developers who write the code for the ads?

If you don't want to support the ad-driven internet, don't work for companies who's entire business model is extracting as much data from users as possible in order to show the "right" ads.

epgui · on Nov 21, 2022

Yeah that's exactly my concern!

I want to contribute to building cool things, and building a really cool component becomes much less inspiring if it's only used to steer nuclear missiles. (dramatic but not unreasonable example)

elcomet · on Nov 21, 2022

You could apply to work on the research infrastructure (for instance for AI)

capableweb · on Nov 21, 2022

And instead of writing infrastructure for the developers who write code for ads, you now write AI software/do AI research for a company in order to attract enough talent so they can hire developers to write infrastructure for the developers who write code for ads?

Or you know, just don't work for companies who tend to collect as much user data as they can?

elcomet · on Nov 22, 2022

But at least you're contributing to science

jtsiskin · on Nov 21, 2022

At Meta you choose what team you want to work on during a 3 month “bootcamp” period, very different from Google, Amazon, etc. There’s enough teams solving hard distributed system problems that you’re guaranteed to never need to think about button copy

rawrfml · on Nov 22, 2022

Hm, the number of teams that work on these distributed systems is smaller than I think you believe exists at Meta, though I suppose it depends on the type of system being built. Even the internal post on this topic links to a note from last year that talks about the 3 teams in the core systems area that needs to work directly with PTP.

The vast majority of people will probably end up working on product-related team compared to infrastructure-related teams.

loeg · on Nov 21, 2022

You can also be directly hired onto a team, skipping bootcamp (some roles).

distrill · on Nov 22, 2022

you skip team selection, not bootcamp. every engineer and EM go through bootcamp

loeg · on Nov 22, 2022

Oh, sure.

bagels · on Nov 22, 2022

In good times, yes, there are more roles than bootcampers. In bad times (now) there are more bootcampers than roles...

scheme271 · on Nov 21, 2022

There's other places that are working on this although they probably don't pay as much as Meta. CERN and a few other european physics projects having been using this to create the white rabbit project over the last decade or so ( https://en.wikipedia.org/wiki/White_Rabbit_Project ).

bradknowles · on Nov 23, 2022

We're doing some interesting things with time sync at AWS, and we are doing so at scale for millions of customers around the world, unlike the small scale stuff that Meta has announced here. Most recently, our team recently announced the Amazon NTP Pools service at time.aws.com.

We've got more cool stuff we are working on.

If we can get past this hiring freeze problem, we still have a couple of slots we need to fill. If you know a lot about time sync and the NTP and PTP protocols, and you have a passion for building some of the biggest impact systems on the planet, let me know.

orangepurple · on Nov 22, 2022

This is the same as joining the Sinaloa cartel as a transportation and logistics expert specializing in detection evasion research and development. So many exciting and hard problems to focus on!

CapmCrackaWaka · on Nov 21, 2022

Why do so many tech companies seem to be releasing “secret sauce” for free lately? I see a lot of posts lately detailing how inner production systems work at large companies, and while I’m grateful, I’m curious why the higher ups think it’s worthwhile to release this information.

jtsiskin · on Nov 21, 2022

Facebook in particular is very open about its data center, see https://www.opencompute.org/

See “Commoditize Your Complement” - https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/ , https://www.gwern.net/Complement

bombcar · on Nov 21, 2022

It's an effective marketing campaign for "you should work here" - VERY effective.

And lots of this stuff is NOT secret sauce, it's basic business building-blocks that they need. It's not the advertising formulas.

I'm sure FAANG is very VERY happy that they can just run Linux everywhere and don't have to pay Sun or Microsoft a massive per-CPU fee for everything they do.

NBJack · on Nov 21, 2022

I think you'd normally be right, but Meta just laid off 10k+ people, and is currently in a company wide hiring freeze until at least Q1 of 2023. Much of the rest of FAANG is either doing one or both as well.

In Meta's case, they fired a lot of "boot campers" as well, some only a few days into their job and before they had a team. Some returning interns even had their offers rescinded.

Not to sound cynical, but this article feels more like "let us release this to show impact and keep our jobs." Or damage control for their engineering hiring image.

jauer · on Nov 22, 2022

It's likely that this article was in the works well before the layoffs. Depending on field it can take a ridiculously long time to get a tech blog post approved at Meta.

bombcar · on Nov 21, 2022

"Positive press" is just preemptive damage control.

And anyone knows that a "company wide hiring freeze" is surprisingly thawable in cases where it needs to be.

pcthrowaway · on Nov 21, 2022

People are still getting hired at Facebook/Meta, they're probably just targeting specific people or hiring in cheaper countries now

madsmith · on Nov 21, 2022

1) it makes the engineers happy to be able to speak about the interesting projects they are working on and give back to the wider community.

2) it serves as a venue for attracting other talented candidates who are likewise minded on working on technical problems.

3) when I was employed of FB it was a relatively flat hierarchy, which is to say there weren’t that many higher ups to convince that this should be done.

foobiekr · on Nov 21, 2022

PTP isn’t secret sauce. Routers I worked on were doing PTP in like 2009. 1588 was standardized in 2002.

jimmySixDOF · on Nov 22, 2022

There was the TICTOC workgroup, PWE3 for psuedowires on MPLS and whole labyrinth of IEC standards not to mention closed ring optical network clock syncs from SONNET/SDH. I suppose its secret sauce for social media software.

SpaceManNabs · on Nov 21, 2022

> 1588 was standardized in 2002.

offtopic but this sentence is beautiful to me.

throw0101a · on Nov 21, 2022

> Why do so many tech companies seem to be releasing “secret sauce” for free lately?

Is it really their "special sauce" though? Do these types of releases actually give away how these companies make money?

In this particular case, telling the world how to get to nanosecond levels of timekeeping doesn't really help any competitors take away Metabook's revenues or profits.

benlivengood · on Nov 21, 2022

How FB and Google make money isn't secret either. They have a lot of people looking at their web pages and apps, they record a lot of prior interactions with the ads these people see, they train giant models in near-realtime to predict ad quality, serve the ads that maximize expected profits in an ad auction, and record the subsequent clicks, views, and attributed conversions to convince advertisers to keep spending.

ElevenLathe · on Nov 21, 2022

OTOH if there turns out to be some other way of making money on nanosecond timekeeping, then keeping it secret will help them get into that market before competitors can.

eastbound · on Nov 21, 2022

Unless they’re playing 4-D chess and the goal is to make competitors lose time implementing CAP theorem databases, nanosecond precision and React, while they’re pulling ahead.

xboxnolifes · on Nov 21, 2022

This is just spending "potential money" for engineering talent advertisements.

maximinus_thrax · on Nov 21, 2022

There is no 'secret sauce' here. The real 'secret sauce' stays secret. The goal of articles such as this one is mainly recruiting, of the 'look at what we're doing here, you should totally want to work here' variety, and it is very effective.

neilv · on Nov 21, 2022

Maybe it's the usual reasons that we see companies releasing stuff, not only secret sauce?

I think usually it's for company PR for various purposes (counteract bad press, attract new hires, etc.).

Sometimes to generate a bigger hiring pool that knows the stuff you're releasing. (And the open source story about crowdsourcing contributions, which sometimes might be worth the costs.)

I've also seen it around partnerships and customer collaborations and competition. Including to "commoditize your complement", or to kill one thing with what they'd rather use. (And, in industry/tech standards, corporate representatives often have motivation to try to bias the standard to their employer.)

In some cases, it's for individual employees' careers. Think how academic and some R&D jobs want research publications, or how some companies want people who do "talks".

Sometimes also for getting code/docs public, so employees can still use it when they leave.

EwanToo · on Nov 21, 2022

The question for 2023 is "How many companies will be investing in this without a clear revenue stream?"

It's quite likely we're entering a period where the current baseline performance of core infrastructure will be considered "good enough" and companies won't employ people to work on these general improvements.

bsder · on Nov 21, 2022

Resume building for people to get their next job?

Lots of these companies opened up the ability for internal engineers to write tech blogs for the purposes of recruiting since the FAANGs were all in competition with one another. Presumably, the VPs haven't gotten around to closing the pipeline yet.

m463 · on Nov 21, 2022

I suspect performance metrics

Many higher-level engineering positions have performance reviews with metrics that include some sort of industry expertise or external collaboration.

nappy-doo · on Nov 22, 2022

I worked at FB for a couple of years, in a group that sat near data storage. I used to argue with those folks that without atomic clocks, FB should just shut its doors because there was no way to have reasonable limits on consistency.

While I was hyperbolic in my pronunciation, I wasn't wrong in my data. It's been more than 5 years since those conversations. Glad someone got it done. Took 'em far too long.

dis-sys · on Nov 22, 2022

Nice tech blog, really love it.

From my understanding, basically this new iteration is better then their previous generation chrony based one because the uncertainty in the One Way Delay calculation is largely removed by having those Transparent Clocks capable of reporting their queuing delays. Basically the asymmetry of the delays are gone?

Personally, I don't think Chrony should be considered as a NTP implementation as clearly it can utilize those PTP hardware timestamps as well and those hardware timestamps are the "secret sauce" of PTP's high accuracy. With those PTP enabled switches, together with the fact that Chrony can already send NTP packages as PTP packages, surely Chrony can leverage such new capabilities as well with some reasonable updates.

zzixp · on Nov 21, 2022

I've been seeing a lot more of these Meta engineering articles on HN - I'm a fan! If you're writing them, keep up the great work!

cheriot · on Nov 21, 2022

Agreed. I wonder if the motivation is competing for resources with Meta paying more attention to costs now.

shreezus · on Nov 21, 2022

Yes! I had no idea Meta actually invested in innovation/engineering like this. I always assumed most of their work was just in ad tech & "metaverse".

pcthrowaway · on Nov 21, 2022

I mean, they've created a whole bunch of the tech required to support their scale (React, Jest, PyTorch)

coolestguy · on Nov 21, 2022

That's why they are promoting themselves so much on HN now - to encourage talent to come to Meta

jeffbee · on Nov 21, 2022

Great article, but I have to quibble about the casual ease with which they wave away the complexity of accessing these timestamps. The Linux APIs for accessing them is totally absurd. See for example the gRPC code that associates hardware (possibly) timestamps with messages, for tracing and other reasons. You have to re-arm the timestamp option before every send. And the whole concept of a timestamp for an ethernet frame maps poorly to stream sockets.

https://github.com/grpc/grpc/blob/master/src/core/lib/iomgr/...

gpderetta · on Nov 21, 2022

I don't see the code rearming it at every send nor it matches my experience. The code sets the sock option lazily, and only once . The option is needed to configure the driver and possibly the card. Then the timestamp option need to be passed to the send/recvmsg call to phisically retieve the send/recv timestamp.

Far for being absurd it seems a very straightforward extension of BSD socket API.

amluto · on Nov 21, 2022

I’m wondering whether the extremely careful GNSS part is really needed. A microsecond of offset between two servers in the same datacenter could easily matter, but I suspect that, if an entire datacenter were off by a microsecond, everything would be fine — communicating from that datacenter to anywhere else will take well over a microsecond, so an offset of this type would be a bit like the datacenter wiggling around in space a bit.

On a different note, there’s an Intel feature called the Always Running Timer. In theory one ought to be able to directly determine the TSC <-> NIC clock offset using the ART. I’m not sure anyone has gotten this to work, though.

walrus01 · on Nov 21, 2022

Having rooftop mounted GNSS receive antennas for GPS+GLONASS is extremely common in telecom and ISP infrastructure applications. It's sort of a belt and suspenders approach to obtaining time from low stratum NTP sources and also having a local GNSS timing source to reference from.

Or for use in a case where the network has a total absence of connectivity to any internet-based NTP sources (maybe because your management network doesn't talk to the internet at all, for many good reasons), and in the event of loss of transport connectivity to your own somewhere-in-the-region NTP servers in your management system, you want to be absolutely sure the system clocks on your local radio transport equipment, DWDM equipment, metro ethernet stuff are all precise.

Using receive-only GPS data as a master time reference source is effectively saying "we think that getting the time as it's set by the USAF/Space Force people at Schriever AFB, who run the GPS system, should be treated as our master point of reference and all other clocks should derive from that". It's not a bad policy, as such things go, because GPS maintaining extremely accurate time is of great interest to the US federal government as a whole.

Even a fairly small LTE cellular rooftop site, monopole or tower often has a similar receiver. It doesn't add a lot of cost.

amluto · on Nov 22, 2022

Meta seems to have gone with the expensive option. I think it’s this:

https://www.mouser.com/ProductDetail/HUBER%2bSUHNER/Direct-G...

Admittedly, at their scale, this is peanuts. But I wouldn’t buy one of these for a scrappy startup :). SparkFun will sell a perfectly serviceable kit for a few hundred dollars.

(If you are worried about lightning, then GPSoF looks like cheap insurance.)

bradknowles · on Nov 23, 2022

Their Calnex Sentinel equipment they use for measuring their signal is probably a lot more expensive than that (see https://calnexsol.com/en/product-detail/1033-sentinel).

You wouldn't want to build your entire monitoring system on top of that and be forced to deploy those at hundreds of datacenters around the world.

error503 · on Nov 22, 2022

Accurate delay compensation is necessary to enable redundancy. IF you need multiple GrandMasters at different locations in the facility, using independent RFoF systems and antennas, they will have different GNSS delays, and that difference will propagate down into uncertainty on the hosts. There are other ways to eliminate this uncertainty than to characterize the full delay, but if the racks are on opposite ends of a giant data centre, that might be as or more difficult than just going through those motions.

If you just have one GM, then sure, the delay means you will have a larger fixed offset from TAI/UTC, but that won't be consequential, and you'll still get the benefits of a tightly synchronized monotonic clock. Until that GM fails, and it all goes haywire.

bradknowles · on Nov 23, 2022

It's a hard problem to solve. You end up doing something like the NIST TMAS service (see https://www.nist.gov/programs-projects/time-measurement-and-...) using differential common view measurements to create a "Multi-Source Common-View Disciplined Clock".

nimish · on Nov 22, 2022

Given Meta's resources I'm surprised they didn't go whole-hog with the high accuracy profile (derived from CERN White Rabbit) Why not set up SyncE while you're at it? Not like the hardware cost is any issue and Meta can hire hardware engineers.

Might as well get ps level precision with sub-ns accuracy

dis-sys · on Nov 22, 2022

A quick question related to this blog -

they mentioned that most modern NICs have hardware timestamp capabilities, that is consistent with what I saw. The question is what is the availability of those PTP ready & enabled routers/switches capable of acting as a Transparent Clock? I have two concerns, most of existing routers/switches are not PTP ready, even for those ones with PTP capabilities, its PTP related features may not be enabled in production. Any experience or numbers to share? thanks!

wmf · on Nov 22, 2022

All recent data center switches I've seen advertise PTP; maybe it doesn't work when you turn it on though. Meta makes their own switches so they can enable and debug the software.

vanshg · on Nov 21, 2022

> It’s absolutely essential to ensure an open sky and install a solid stationary antenna

What effect, if any, might a major earthquake have on the time calculation?

matheweis · on Nov 22, 2022

That’s a very interesting question.

1ns of error equates to ~30cm of error (from speed of light), so:

You might get a couple of ns of error from the initial s and p waves. (Assuming on the order of 50cm displacement)

If you’re on the fault and there is a major displacement (say order of 1-2 meters) there might be several more ns of error.

Practically, this seems to suggest that by the time that displacement from an earthquake would cause a significant loss of clock accuracy (say > 10ns), you most likely have bigger problems to worry about..

waynesonfire · on Nov 21, 2022

I remember Google doing this many years ago, does anyone recall the papers that describes distributed storage systems that benefit from highly accurate clocks? Was it spanner or the next generation version of it? I can't remember. I recall finding it fascinating but out or reach since I think they were putting hardware based atomic clocks on their systems, sounded super proprietary.

anonymousDan · on Nov 21, 2022

Yeah, Spanner and the Truetime API. Would be interesting to know whether there are many differences between the implementations (or indeed the abstractions offered). I always wonder whether there are rare occasions where such systems fail (e.g. the bounds reported don't actually capture the true time due to some bug), and what the resulting implications are for systems built on top of them.

bombcar · on Nov 21, 2022

I assume the reply to the write includes this new super accurate timestamp? So the read server knows it's behind a bit as the read request includes it also?

Melatonic · on Nov 22, 2022

I have to say I am not a fan of Facebook in general (their primary business that is) but their engineering is always impressive. And equally impressive is that they do everything in house.

An interesting movie might be a dystopian future where Facebook itself falls out of use but some type of rogue group takes over the Facebook datacenter infrastructure and uses it for their own purposes.

s1mon · on Nov 21, 2022

It's interesting that Meta is doing such serious low level R&D like this, but it's funny because they can't even keep their notifications in sync with their web site. I get a notification (from the mobile app, I think) that Joe has a birthday today, I go to the web page to wish them a happy birthday, and despite reloading, the web site still shows the birthdays from the previous day. Then I have to dig around more to find Joe's page to post a HBD.

So as user, this level of time synchronization seems to be many orders of magnitude of over kill. I'm sure at some level it is actually important, but it seems like the database sync or whatever is going on with notifications is woefully lagging.