Hacker News
Reading Lamport, again (janestreet.com)
109 points by wglb on June 26, 2014 | hide | past | favorite | 13 comments



Jane Street is one of the few companies betting heavily on OCaml; it makes sense to go all functional in the high-frequency trading world, I guess. That industry has purportedly been mostly C/C++, probably not the best toolset for keeping error rates low in a complex environment. Fascinating to read how far beyond language decisions they need to think to shield themselves against outages (and competitors?).

I did a little bit of experimentation with OCaml by playing with bitcoin exchange APIs during the China boom some months ago, but unfortunately I didn't push far enough to be using Jane Street's open source libraries. It was still very interesting and IMHO totally worth it, if only for the functional programming practice.

Relevant video on how OCaml gets used at Jane Street:

https://www.youtube.com/watch?v=hKcOkWzj0_s


I've noticed that recent Google infrastructure puts medium-to-strict requirements on time servers. Does that mean the idea of causality and vector clocks has been abandoned, partly or wholly, in favor of a greater reliance on absolute synchronicity?


I recently reread the Lamport paper discussed in the article, and the Google Spanner paper, to draw comparisons about the relationship between time, consistency and throughput from those two perspectives. The Lamport paper does dedicate its last real section to physical clocks which need to sync with one another, and which must advance at close to the same rate in order to maintain consistency -- but he doesn't constrain them to be close to any "absolute" correct time, which is what the Spanner / TrueTime system heavily relies on.

My tentative conclusion was that the giant difference between 1978 when the Lamport clocks paper was published, and 2012, is access to GPS. The fact that we get to take advantage of a DoD-built fleet of dozens of satellites that are constantly broadcasting extremely accurate times is pretty extravagant. Google can build systems today that rely on clocks in data-centers thousands of miles apart both being very close to "absolute" time, with explicit, conservative bounds on their error, in part through heavy use of GPS clocks. But for most of us that don't have an elaborate system of fancy clocks in our data-centers, Lamport's approach of having machines update one another's clocks (be they logical or physical) as they exchange messages is still a lot more relevant.
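Lamport's message-driven clock update can be sketched in a few lines. This is a minimal illustration of the idea from the paper, with class and method names of my own choosing:

```python
# Minimal sketch of Lamport's logical clocks: each process keeps a
# counter, increments it on every local event, stamps outgoing messages,
# and fast-forwards past any larger timestamp it receives.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event occurred."""
        self.time += 1
        return self.time

    def send(self):
        """Stamp an outgoing message."""
        return self.tick()

    def receive(self, msg_time):
        """Advance past the sender's timestamp, then count the receive event."""
        self.time = max(self.time, msg_time)
        return self.tick()

p, q = LamportClock(), LamportClock()
a = p.tick()          # event A on process P
t = p.send()          # P sends a message to Q
b = q.receive(t)      # Q receives it; Q's clock jumps past P's
assert a < b          # causally ordered events get ordered timestamps
```

The same max-then-increment rule is what the paper generalizes to physical clocks: receiving a message is what pulls a slow clock forward.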


I wonder if this means that an evil entity could cause massive data loss by broadcasting fake GPS clock information, which would mess around with the decisions that these databases make regarding consistency. That would be terrifying.


The Spanner paper doesn't discuss this in detail, but they have atomic clocks in addition to GPS clocks in large part because GPS clocks can fail or be exploited in this way and atomic clocks cannot.


Spanner represents time as an interval. It uses high-accuracy atomic clocks to narrow the uncertainty window. That actually demonstrates that Lamport's insight is the more fundamental one: even with such hardware, you can still run into situations where the system has no idea what the order of events is.
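The interval idea can be sketched like this. This is a toy illustration, not Google's TrueTime API; the names and numbers are made up:

```python
# Toy sketch of interval-valued time: an event's true timestamp is only
# known to lie in [earliest, latest]. Two events are definitely ordered
# only when their intervals do not overlap; otherwise the order is
# undecidable from the clocks alone.
from collections import namedtuple

Interval = namedtuple("Interval", ["earliest", "latest"])

def definitely_before(a, b):
    """True iff every possible timestamp of a precedes every one of b."""
    return a.latest < b.earliest

a = Interval(100, 107)   # +/- a few ms of clock uncertainty
b = Interval(103, 110)   # overlaps a: neither order can be established
c = Interval(120, 127)

assert not definitely_before(a, b) and not definitely_before(b, a)
assert definitely_before(a, c)
```

The overlapping case is exactly where a system must fall back on causal information (or wait out the uncertainty) rather than trusting the hardware clocks.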


I don't think that's true. The ideas of causality, logical clocks and vector clocks have not been abandoned in any sense. In fact, there seems to be significant new interest in eventually consistent systems over the last few years, and interest in vector clocks and highly available (in the CAP 'AP' sense) systems is still growing. The excitement about research into CRDTs is a good illustration of this.
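The vector-clock partial order mentioned above is simple enough to sketch directly. A hedged illustration, using dicts from process ids to counters:

```python
# Sketch of vector-clock comparison: V1 happened-before V2 iff V1 <= V2
# pointwise and V1 != V2; if neither dominates the other, the events are
# concurrent -- which is how 'AP' systems detect conflicting updates.
def happened_before(v1, v2):
    keys = set(v1) | set(v2)
    le = all(v1.get(k, 0) <= v2.get(k, 0) for k in keys)
    return le and v1 != v2

def concurrent(v1, v2):
    return not happened_before(v1, v2) and not happened_before(v2, v1)

assert happened_before({"p": 1}, {"p": 2, "q": 1})
assert concurrent({"p": 2, "q": 1}, {"p": 1, "q": 3})
```

The concurrent case is the one Lamport clocks cannot express: a single counter always yields a total order, while vector clocks preserve the "neither happened first" answer.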

More fundamentally, though, Spanner doesn't involve absolute synchronicity in any real sense. Instead, at least as described in their paper, the focus is on using the special clocks to improve some system properties by bounding absolute clock skew. The ideas of causally related operations, and the orderings between them, are still important even in that environment.


"It's not terribly surprising that if causality is baked into your specification, then you're going to have to use causality as part of your implementation." I don't understand this statement. Causality is the basis of the ordering specification, so it's correct to describe it as "baked into your specification". But what does "use causality as part of your implementation" mean? The paper does not use causality to implement causality; it uses causality to implement a mutex. Is there any contradiction here?


I don't see causality as defined in this context as needing any extra "motivation". It's pretty clear that if an agent tries to do A and then B, your system probably should not first execute B and then A. Imagine if the instructions in one of your program threads were not guaranteed to be executed in order. You would never be able to write code that does anything meaningful that way.


> Imagine if the instructions in one of your program threads were not guaranteed to be executed in order.

Modern CPUs do not process instructions in the same order they are stored in memory. Out-of-order CPUs rely on the implicit order created by the data dependencies between instructions and their operands.

As in the Lamport paper, the order in which the instructions are performed is meant to preserve the order in which the effects are produced, rather than the order in which the commands are given.


The causal ordering defined in the paper is stronger than that. It says that if process P performs event A and later sends a message to process Q, then any event that occurs in Q after Q receives the message causally follows A. I don't think the blog post was meant to imply that the order of events within a single process does not need to be respected, though it doesn't specifically distinguish the single process case.
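That stronger relation is just reachability over program-order edges plus message edges. A small illustration with made-up event names:

```python
# Sketch of the 'happened-before' relation from the paper: program order
# within each process, plus a send->receive edge across processes,
# transitively closed.
edges = {
    ("A", "sendP"),        # A precedes the send in P's program order
    ("sendP", "recvQ"),    # the message edge from P to Q
    ("recvQ", "B"),        # B happens in Q after the receive
}

def happened_before(x, y, edges):
    """True iff y is reachable from x through the edge set."""
    frontier, seen = {x}, set()
    while frontier:
        cur = frontier.pop()
        seen.add(cur)
        for (u, v) in edges:
            if u == cur and v not in seen:
                frontier.add(v)
    return y in seen and y != x

# Any event in Q after the receive causally follows A:
assert happened_before("A", "B", edges)
```

Events with no path between them in either direction are concurrent, which is exactly the case the blog post's single-process reading glosses over.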


Interesting stuff. I'm working on a system that we think will use version vectors. There are some other ideas like Interval Tree Clocks out there, but they look more difficult to translate into something that can reside in an SQL database. For that matter, it's been a tough challenge trying to translate some of these ideas into something that can work in practice, over HTTP, be sort of transactional in its behavior, and be able to work with single records at a time, rather than trying to transfer everything at once.
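One way to make version vectors SQL-friendly is to keep each vector in a single text column and merge by pointwise max on sync. This is my own ad-hoc scheme, not a standard, and the column/replica names are hypothetical:

```python
# Sketch: store a version vector as JSON in one text column, and merge
# two replicas' vectors by taking the pointwise max of the counters.
import json

def merge(v1, v2):
    return {k: max(v1.get(k, 0), v2.get(k, 0)) for k in set(v1) | set(v2)}

row_a = json.dumps({"node1": 3, "node2": 1})   # value fetched over HTTP from replica A
row_b = json.dumps({"node1": 2, "node2": 4})   # value from replica B

merged = merge(json.loads(row_a), json.loads(row_b))
assert merged == {"node1": 3, "node2": 4}      # fits back into the text column
```

The merge only resolves the vector itself; when neither vector dominates the other, the records' payloads are in conflict and still need an application-level resolution step.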


I think the author misunderstands. The spec is for a mutex, and it's written in an informal language, where the implementation fulfills a more formally expressed spec.



