> The main reason we can not use system clocks is that system clocks across servers are not guaranteed to be synchronized.
Sentences like this are why I will never regret moving my infrastructure to bare metal. My clocks are synchronized down to several nanoseconds, with leap-second handling and all kinds of shiny things. It literally took a day to set up, plus a blessing from an ISP in the same datacenter to use their clock sources (GPS + PTP). All the other servers are synchronized to that one via Chrony.
Even if they are "down to several nanoseconds", a slight clock drift can be the difference between corrupted data and not, and when running at scale it's only a matter of time before you start running into race conditions.
For a small web app, fine, but if you're running enterprise-level software processing billions of DB transactions per day, clocks just don't cut it.
That’s why you buy NICs with hardware timestamp support and enable PTP. You can detect clock drift within a few packets.
Race conditions are mitigated not by clocks but by other logic. The clock work was just something done after the frustration of reading distributed logs and seeing them out of order. Logs are basically never out of order anymore, and there is sanity.
And the really big players do much more epic clock management, and keep things tightly in sync across the globe. See the recent PTP work that fb and others are doing in the open. Google and others have their own internal implementations.
In the cloud, you have very little control over the clocks. If you have bare metal, there's almost always the ability to configure (at least) GPS time sync for a server or two. If you get to the point where you have entire datacenters, there's no excuse NOT to invest in good clocks -- at that scale it's likely a requirement.
"Not bare metal" does not imply "the cloud". You might be part of a company that doesn't give teams access to raw servers, you might be paying for VMs with dedicated resources from a local data center, or any number of other situations.
Not everyone has the luxury of being able to procure and install hardware and/or run an antenna to someplace with GPS reception.
I’d call that a “private cloud” and I think that’s the actual term for it. You can run a private cloud on bare metal, and if you are, then you can (as in physically able to) have control over things. If you have network cards with timestamp support, you can use PTP, or any number of things. If your org doesn’t support that, that doesn’t mean it isn’t a possibility, it just means you need to find someone else to ask.
No, but you can detect skew in just a few packets and decide if you want to drain the node (the offset math is sketched below). If a node continues to have issues, put that thing on eBay and get another. Or send the motherboard back to the manufacturer if it's new enough.
Node clocks can be plenty reliable, but like any other hardware, sometimes they get defects.
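For what it's worth, the skew estimate itself is just the textbook NTP/PTP offset math. A minimal Python sketch, where the timestamps, threshold, and function name are illustrative rather than any vendor's API:

    # Four timestamps per round trip bound a peer's clock offset:
    # t0: request sent (local clock)    t1: request received (peer clock)
    # t2: reply sent (peer clock)       t3: reply received (local clock)
    def estimate_offset(t0, t1, t2, t3):
        """Return (offset, round_trip_delay) in seconds, assuming a symmetric path."""
        offset = ((t1 - t0) + (t2 - t3)) / 2
        delay = (t3 - t0) - (t2 - t1)
        return offset, delay

    # Hypothetical policy: drain the node if estimated skew exceeds 100 microseconds.
    offset, delay = estimate_offset(10.000000, 10.000150, 10.000160, 10.000100)
    if abs(offset) > 100e-6:
        print(f"skew {offset * 1e6:.0f} us over budget; drain the node")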
- the A->B link is under DDoS or whatever and delivers packets with 10s latency
- the A->C link is faulty and has 50% packet loss
- the A->{D,E,F} links are perfectly healthy
node B has one view of A's skew which is pretty bad, node C has a different view which is also pretty bad for different reasons, and nodes D, E, and F have a totally different view which is basically perfect
you literally cannot "detect skew" in a way that's reliable and actionable
issues are not a function of the node, they're a function of everything between the node and the observer, and are different for each observer
even if clocks were perfectly accurate, there is no such thing as a single consistent time across a distributed system. two events arriving at two nodes at precisely the same moment require some amount of time to be communicated to other nodes in the system; that time is bounded by the speed of light, and the "light cone" defines a physical limit to the propagation of information
I think you’re missing the forest for the trees a bit. In the code, order mostly doesn’t matter and where it does matter there is a monotonic clock decoupled from physical time, using an epoch framework (https://tli2.github.io/assets/pdf/epochs.pdf).
The clock sync is just to keep human-readable logs in order for debugging. It’s ok if it is sometimes out of order, though in practice, it never is.
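A toy single-process illustration of that decoupling (the epoch framework in the linked paper is far more involved; the names here are made up):

    import itertools
    import time

    # Ordering comes from a monotonic sequence number, not the wall clock;
    # the wall-clock timestamp is attached only to keep the log human-readable.
    _seq = itertools.count()

    def log(msg):
        order = next(_seq)    # authoritative ordering
        wall = time.time()    # display only; free to drift
        print(f"[{order}] {wall:.6f} {msg}")

    log("begin transaction")
    log("commit")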
The reason services like AWS or Azure are distributed is so that your resources are not concentrated, which helps with fault tolerance. If your one datacenter goes down, your whole service goes down. Also, as was mentioned in other comments, asynchrony applies within a single datacenter too.
That’s what insurance is for. Far less expensive than maintaining fault tolerance at the current scale. If there were a fire or something (like Google’s recent explosion in France from an overflowing toilet), we’d lose at least several minutes of data, and be able to boot up in a cloud with degraded capabilities within ten minutes or so. Not too worried about it.
I never understood why "The main reason we can not use system clocks is that system clocks across servers are not guaranteed to be synchronized." is considered true even with working NTP synchronization.
A few milliseconds of difference can mean all the difference in the world at high enough throughput (and a few milliseconds is about the best you can get with NTP). When you can control the network cards and time sources, you can get within a few nanoseconds across an entire datacenter, with monitoring to drain a node if clock skew gets too high.
this only applies if you carry over physical time from one machine to another, assuming perfect physical synchronization of time.
if you stick to a single source of truth - only one machine's time is used as a source of truth - then the problem disappears.
for example instead of using java/your-language's time() function (which could be out of sync across different app nodes), just use the database's internal CURRENT_TIMESTAMP() when writing to the db (see the sketch at the end of this comment).
another alternative is to compare timestamps at coarse precision, up to 1 minute/hour, if you carry time over from one machine to another. that way you have a buffer of time for different machines to synchronize clocks over NTP
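a minimal sketch of the CURRENT_TIMESTAMP approach, using SQLite purely as a stand-in for whatever RDBMS you run (table and column names are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, created_at TEXT)")

    # CURRENT_TIMESTAMP is evaluated by the database, so every app node gets
    # timestamps from the same clock no matter how skewed its own clock is.
    conn.execute(
        "INSERT INTO events (payload, created_at) VALUES (?, CURRENT_TIMESTAMP)",
        ("order placed",),
    )
    conn.commit()
    print(conn.execute("SELECT * FROM events").fetchone())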
You can rephrase this question in terms of causality: why does it matter whether we know that some process happens before some other process, at some defined(?) level of precision?
There are ways around this, but they are restrictive or come at the cost of increased latency. Sometimes those are acceptable trade-offs and sometimes they are not.
the root of the problem is using clocks from different hosts (which could be out of sync) and carrying that time over from one machine to another - essentially assuming clocks across different machines are perfectly synchronized 100% of the time.
if you use a single source of truth for clocks (the simplest example is using the RDBMS's current_timestamp() instead of your programming language's time() function), the problem disappears
You can build systems that do not require physical clock synchronization, but using physical clocks often leads to simpler code and a major performance advantage.
That's why Google built TrueTime, which provides a physical-time guarantee of [min_real_timestamp, max_real_timestamp] for each timestamp instant. You can easily determine the ordering of two events by comparing the bounds of their timestamps, as long as the bounds do not overlap. To achieve that, Google tries to keep the bounds as small as possible, using the most accurate clocks they can find: atomic and GPS clocks.
Can someone explain why, in an interview context, someone with the technical ability to understand, assess, write, and communicate a deep set of domain-specific knowledge like this might still be asked to do some in-person leetcode tests? How does on-the-fly regurgitation of recursive algorithms sometimes mean more than being able to demonstrate such depth of knowledge?
If you can't differentiate buzzwords from signals you shouldn't be interviewing.
Plus, how do you expect to get a good signal from a leetcode whiteboard interview for someone who spends most of their time designing systems and only writes code when it's faster and less frustrating to pair program than to explain what needs to get implemented?
To clarify, I don't have a good answer: I still participate in leetcode-style interviews (though system design is another component), and although I sing the song and can't come up with anything better, I don't think it's the best way to go.
Or what about having a repository proving these very concepts with almost 2000 commits?
In my experience, things like publications, online code repositories, and facts are a little more than irrelevant, but not much more, because people don't know how to independently evaluate these things. Worse, attempting to evaluate them only exposes the insecurity of people not qualified to be there in the first place.
Far more important are NPM download counts and GitHub stars. Popularity is an external validation any idiot can understand. But popularity is also something you can bullshit, so just play it safe, treat everyone as a junior developer, and leetcode them out of consideration.
It's easy to fake projects and commits. And when they aren't faked, you can't expect every candidate to have them, because producing them is so much work. And if you bypass part of your usual interview process because a candidate has a project, you're no longer assessing all candidates equally.
I've interviewed people for roles as low-level C++/kernel programmers who did not know what hexadecimal was. Having a quick "What's 0x2A in decimal? Feel free to use paper & pen."[1] question can be a significant time-saver / direct people to more appropriate roles, if any.
[1] Starting to do math with non-base-10 numbers was already a pass, regardless of the number you reached; you'd normally use a computer for that. But it really isn't too hard to do in your head directly, for anyone who's dealt with binary data.
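For reference, the worked arithmetic, with a quick Python sanity check (purely illustrative):

    # 0x2A by hand: the '2' is worth 2 * 16, the 'A' is 10.
    assert int("2A", 16) == 2 * 16 + 10 == 42
    print(0x2A)  # 42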
And intentionally excluding candidates is the whole point of designing an interview process; it's lazy to throw up your hands and declare that they're all biased.
On that point, can anyone recommend good reading or info sources about technical interviewing methods? I recently had an interview where I was just asked to recall Linux commands off the top of my head, as if for a certification exam, and it made me wonder what the point was, and whether there are better ways.
Because puzzle solving is an IQ proxy and IQ is correlated with job performance? But really, just do an IQ test then. Maybe because interviewers are bad at distinguishing GPT-style BSing from actual knowledge and need a baseline test.
The correlation between IQ and job performance is typically weak[0] (weaker than the correlation between conscientiousness+agreeableness on job performance in some studies[1]) with a more modest correlation for "high complexity jobs".
Interesting excerpt from [0]:
> Finally, it seems that even the weak IQ-job performance correlations usually reported in the United States and Europe are not universal. For example, Byington and Felps (2010) found that IQ correlations with job performance are “substantially weaker” in other parts of the world, including China and the Middle East, where performances in school and work are more attributed to motivation and effort than cognitive ability.
Sorry for not replying earlier, but I'm really grateful to you for providing those links. I hadn't known that the IQ-job performance correlation had been challenged, and it means I need to adjust my priors when looking at candidates.
> In a distributed system, it is not possible in practice to synchronize time across entities (typically thought of as processes) within the system; hence, the entities can use the concept of a logical clock based on the events through which they communicate.
Also, to be constructive: the Kindle version was published 21 June and is thus available right now. Sadly, the Kindle preview does not include the table of contents.
I know the author of the blog and the book personally. The author has built small storage engines himself and grokked the code bases of Cassandra and other DBs to understand the patterns in code, not just as theoretical concepts. The blog has code excerpts as well. A highly recommended read for hands-on folks.
Commenters here are missing the point: the intent is to take otherwise isolated systems with properties that are very difficult to control, such as varying amounts of clock skew and arbitrary process pauses due to GC cycles or CPU consumption, and build a system on top that allows for the storage of mutable state. An example would be a cluster of etcd or dqlite instances (which Kubernetes in multi-master setups also uses, BTW), or at a larger scale, something like DynamoDB.
It’s one of the more easily approached resources on the design of distributed systems, and a good read.
> you definitely cannot rely on clocks to determine (reliable) ordering across machines
You can, if you consider "unknown/possibly concurrent" a valid outcome too: for any two events A and B, TrueTime can definitively answer whether A is before B, A is after B, or A is possibly concurrent with B.
Agreed, and the timestamp comparisons you can make when the windows do not overlap are the basis of the ordering guarantees (across machines, even globally).
ordering is a logical property which can be informed by physical timestamps, but those timestamps aren't accurately described as "the basis" of that ordering
I don't see how it isn't based on time. They use GPS and atomic clocks to establish the time, establish an uncertainty window, and in Spanner's case will have transactions wait out that uncertainty to guarantee an ordering (globally).
Look up the case where two transactions affecting the same record share a timestamp (or have overlapping error ranges). A non-timestamp tiebreaker determines the order between the two overlapping commits. It is not unlike the pre-agreement on conflict-resolution mechanisms for CRDTs.
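An illustrative sketch of such a pre-agreed tiebreaker (not Spanner's actual mechanism; the node ids below are made up):

    # Each commit carries (timestamp, node_id). Tuple comparison uses the
    # timestamp first and falls back to the pre-agreed node id on a tie,
    # so every replica derives the same order for overlapping commits.
    commits = [(12.5, "node-b"), (12.5, "node-a")]
    print(sorted(commits))  # node-a deterministically wins the tie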
The context of my first comment included "timestamps within the same window".