Lamport's Bakery algorithm, demonstrated in Python (github.com/dicklesworthstone)
106 points by eigenvalue 10 months ago | 30 comments



I was watching the very good oral history of Leslie Lamport from the Computer History Museum and was struck by how he said that the Bakery Algorithm was the thing he was most proud of, despite Paxos being much more elaborate and, in my mind, important. He also described the Bakery Algorithm as being the only thing that he "discovered" rather than "invented"; he later clarified that this was because it "came out of nowhere," with no precedent, to his knowledge, in any previous ideas of his own or anyone else's.

Although it seems pretty simple on the surface, it came as a surprise to many people who had been working hard on the mutual exclusion problem in the early 1970s, with one of Lamport's colleagues (Anatol Holt) describing it as "impossible" upon first hearing it. Not only did it completely solve the mutual exclusion problem, but it managed to do so without requiring atomicity of reads or writes, and it was fully described and proved in just 3 pages. This was something that both Dijkstra and Knuth were unable to do despite each writing at least one paper on the subject prior to Lamport's solution in 1974.

Anyway, I thought it might be fun to demonstrate the principles of how it works in Python. In some ways, Python is uniquely unsuited for this because it has a Global Interpreter Lock (GIL), the main purpose of which is precisely to avoid the kinds of mutual exclusion issues that the Bakery Algorithm solves! Still, we can "fake it" by using the multiprocessing library and by introducing random delays at various points in the code.
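
To make that concrete, here is a minimal sketch of the idea (my own condensation for this comment, not the repo's actual code). It uses multiprocessing arrays created with lock=False so that no hidden mutex is involved, plus random sleeps to widen the race windows. One caveat: CPython won't actually tear these word-sized reads and writes, so this demonstrates the logic of the algorithm rather than its tolerance of torn reads:

    import multiprocessing as mp
    import random
    import time

    N = 4  # number of worker processes (arbitrary for the demo)

    def lock(pid, choosing, number):
        # Doorway: announce we're choosing, then take max+1 as our ticket.
        choosing[pid] = 1
        time.sleep(random.uniform(0, 0.001))  # widen the race window
        number[pid] = 1 + max(number)
        choosing[pid] = 0
        for j in range(N):
            while choosing[j]:
                pass  # j is mid-doorway; wait until its ticket settles
            while number[j] != 0 and (number[j], j) < (number[pid], pid):
                pass  # j holds a smaller (ticket, pid) pair, so j goes first

    def unlock(pid, number):
        number[pid] = 0

    def worker(pid, choosing, number, counter):
        for _ in range(200):
            lock(pid, choosing, number)
            v = counter.value                      # critical section:
            time.sleep(random.uniform(0, 0.0002))  # an unprotected
            counter.value = v + 1                  # read-modify-write
            unlock(pid, number)

    if __name__ == "__main__":
        choosing = mp.Array("i", N, lock=False)  # lock=False: raw shared memory
        number = mp.Array("i", N, lock=False)
        counter = mp.Value("i", 0, lock=False)
        procs = [mp.Process(target=worker, args=(i, choosing, number, counter))
                 for i in range(N)]
        for p in procs: p.start()
        for p in procs: p.join()
        print(counter.value)  # expect N * 200 if mutual exclusion held

The (number[j], j) < (number[pid], pid) tuple comparison is the tie-break: equal tickets are resolved in favor of the lower process index.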

Here is the oral history video btw:

https://www.youtube.com/watch?v=SXt3-iZpQQc


Thanks for sharing! That video is part 1, and here is part 2: https://www.youtube.com/watch?v=uK9yGNuGWKE

And here are the transcripts: https://archive.computerhistory.org/resources/access/text/20... (Part 1, 2016-08-12) and https://archive.computerhistory.org/resources/access/text/20... (Part 2, 2016-11-11)

(All four available from https://www.computerhistory.org/collections/oralhistories/?s...)

The papers by Dijkstra and Knuth that you mention are, I think:

• (1965 Dijkstra) "Solution of a problem in concurrent programming control" (in CACM)

• (1966 Knuth) "Additional comments on a problem in concurrent programming control" (Letter to CACM, reprinted with an addendum as Chapter 13 of his Selected Papers on Design of Algorithms)

[From the addendum and comments on the latter, I don't get the impression that it had an error (and I don't get that impression from Lamport's comments in the transcripts either). He cites Gary L. Peterson's "Myths About the Mutual Exclusion Problem" (1981, https://zoo.cs.yale.edu/classes/cs323/doc/Peterson.pdf) as a later improvement though. But I may have misunderstood, as I have not read any of these papers/letters or even understood the question :)]


Thanks for adding those references. I realized that Lamport also wrote about this stuff on his website listing all his publications, which is a good read:

"This paper describes the bakery algorithm for implementing mutual exclusion. I have invented many concurrent algorithms. I feel that I did not invent the bakery algorithm, I discovered it. Like all shared-memory synchronization algorithms, the bakery algorithm requires that one process be able to read a word of memory while another process is writing it. (Each memory location is written by only one process, so concurrent writing never occurs.) Unlike any previous algorithm, and almost all subsequent algorithms, the bakery algorithm works regardless of what value is obtained by a read that overlaps a write. If the write changes the value from 0 to 1, a concurrent read could obtain the value 7456 (assuming that 7456 is a value that could be in the memory location). The algorithm still works. I didn't try to devise an algorithm with this property. I discovered that the bakery algorithm had this property after writing a proof of its correctness and noticing that the proof did not depend on what value is returned by a read that overlaps a write. I don't know how many people realize how remarkable this algorithm is. Perhaps the person who realized it better than anyone is Anatol Holt, a former colleague at Massachusetts Computer Associates. When I showed him the algorithm and its proof and pointed out its amazing property, he was shocked. He refused to believe it could be true. He could find nothing wrong with my proof, but he was certain there must be a flaw. He left that night determined to find it. I don't know when he finally reconciled himself to the algorithm's correctness.

Several books have included emasculated versions of the algorithm in which reading and writing are atomic operations, and called those versions "the bakery algorithm". I find that deplorable. There's nothing wrong with publishing a simplified version, as long as it's called a simplified version.

What is significant about the bakery algorithm is that it implements mutual exclusion without relying on any lower-level mutual exclusion. Assuming that reads and writes of a memory location are atomic actions, as previous mutual exclusion algorithms had done, is tantamount to assuming mutually exclusive access to the location. So a mutual exclusion algorithm that assumes atomic reads and writes is assuming lower-level mutual exclusion. Such an algorithm cannot really be said to solve the mutual exclusion problem. Before the bakery algorithm, people believed that the mutual exclusion problem was unsolvable--that you could implement mutual exclusion only by using lower-level mutual exclusion. Brinch Hansen said exactly this in a 1972 paper. Many people apparently still believe it. (See [91].)

The paper itself does not state that it is a "true" mutual exclusion algorithm. This suggests that I didn't realize the full significance of the algorithm until later, but I don't remember.

For a couple of years after my discovery of the bakery algorithm, everything I learned about concurrency came from studying it. Papers like [25], [33], and [70] were direct results of that study. The bakery algorithm was also where I introduced the idea of variables belonging to a process--that is, variables that could be read by multiple processes, but written by only a single process. I was aware from the beginning that such algorithms had simple distributed implementations, where the variable resides at the owning process, and other processes read it by sending messages to the owner. Thus, the bakery algorithm marked the beginning of my study of distributed algorithms.

The paper contains one small but significant error. In a footnote, it claims that we can consider reads and writes of a single bit to be atomic. It argues that a read overlapping a write must get one of the two possible values; if it gets the old value, we can consider the read to have preceded the write, otherwise to have followed it. It was only later, with the work eventually described in [70], that I realized the fallacy in this reasoning."

Link:

https://lamport.azurewebsites.net/pubs/pubs.html#monitor1


> Even if this read and write are not atomic, the system still works

This statement is not really true for modern computers, especially on ARM and other architectures that are more sensitive to memory ordering than x86 CPUs.

The reason is the lack of memory barriers.

The CPU and the compiler are free to reorder loads and stores. Memory barriers are inserted in between to enforce specific ordering.

You need an acquire memory barrier after acquiring a lock to make sure that any loads and stores inside the critical section do not get reordered to before the lock was grabbed.

So while the algorithm may work for acquiring a critical section without atomics, it does not enforce that the stuff inside the critical section stays inside the critical section within the same thread.

In fact, most normal loads and stores on ARM are "atomic", but to achieve consistency in concurrent programming they are followed or preceded by an appropriate memory barrier instruction, e.g. in std::atomic_load or _store.


Actually, acquire/release barriers or operations are not enough. I'm pretty sure you also need a #StoreLoad barrier, for example between Step 1 and Step 2 of the lock algorithm, to prevent the loads of other threads' state in step 2 from being reordered before the stores to the thread's own state in step 1.

Generally #StoreLoad is required when you need bidirectional synchronization between two or more threads.

Re atomics, I think the algorithm still requires word-sized operations to be atomic, but I think that was considered a given. What the algorithm doesn't require is atomic loads/stores (or more complex RMW operations) across more than one word.


Could you implement it using AtomicUsize (using only relaxed operations) in Rust? With AtomicUsize representing the words. You aren't allowed to put pointers in those.


No, it cannot use relaxed memory ordering.

Looking at disassembly of relaxed atomic operations, they are just normal loads and stores with no memory barriers or special instructions. That is not enough to make this algorithm guard a critical section.


If AtomicUsize is sequentially consistent, then yes.


It can be (but that's not Relaxed, that's SeqCst), though that can cost a lot in lost optimizations and potentially in hardware overhead on some architectures.


Sorry I missed you mentioning relaxed. I'm pretty sure that relaxed atomic access is not sufficient.


Forget about the last bit re atomics; apparently the algorithm is indeed resistant to torn writes.


I tried to think about how this might go wrong. The part that I can see going wrong is the 'choosing' array: an optimizer may decide to combine the two writes and just write 'false' immediately. I feel like that breaks a lot of reasonable guarantees, but let's run with it.

So basically the problem is that two processes might pick the same number, see the other process with an (outdated) lower number and choosing=false, and go ahead into the critical section.

So you need a guarantee that the write to the choosing array lands before the number is picked. You can simplify things a bit by setting 'number' to a high sentinel instead of using two arrays (see the sketch below), but still.
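
For what it's worth, here is one way to read that simplification (my own sketch, under the same shared-array assumptions as the demo upthread, not anything from the repo). A reserved sentinel value plays the role of the choosing flag, and crucially the wait loop waits on the sentinel rather than treating it as a very large ticket:

    SENTINEL = 10**9  # reserved "still choosing" marker; assumes real tickets stay far below it

    def lock(pid, number, n):
        number[pid] = SENTINEL  # announce intent before scanning, like choosing[pid] = True
        number[pid] = 1 + max((v for v in number if v != SENTINEL), default=0)
        for j in range(n):
            while number[j] == SENTINEL:
                pass  # j is mid-doorway; wait for its real ticket, as with choosing[j]
            while number[j] != 0 and (number[j], j) < (number[pid], pid):
                pass

    def unlock(pid, number):
        number[pid] = 0

If you instead merely skip a process whose slot looks huge, you can pass someone who is still mid-doorway and may yet commit a ticket that ties or beats yours, which I believe is exactly the race the choosing flag exists to close.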


Even if the algorithm "works", it does not guard the critical section correctly. The code that is supposed to go between lock() and unlock() may be executed before the lock is acquired or after it is released, because the CPU is allowed to reorder anything as it wishes in the absence of memory barriers.

Other replies here go further and suggest that the implementation of the algorithm is also broken without barriers, which is probably true as well.


>You need an acquire memory barrier after acquiring a lock to make sure that any loads and stores inside the critical section do not get reordered to before the lock was grabbed.

What do you mean? That loads and stores inside a critical section could, in fact, be reordered by the CPU and be executed before grabbing a lock?


Exactly. A load from or store to memory protected by a mutex could be executed before the store that sets the mutex as locked. Unless there is an appropriate kind of memory barrier in between.


So even the Lamport bakery algorithm requires memory barriers, because modern processors can reorder instructions? Does this mean that modern CPU architectures necessarily require hardware-supported atomic instructions for proper concurrency?

Also, is there any evidence as to how common this memory re-ordering is?


> So even the Lamport bakery algorithm requires memory barriers, because modern processors can reorder instructions?

Yes, that is correct. Memory ordering needs to be enforced with barriers for correctness.

> Does this mean that modern CPU architectures necessarily require hardware-supported atomic instructions for proper concurrency?

A non-tearing word-sized write and appropriate memory barriers are enough. Atomic read-modify-write instructions like compare-and-swap are useful but not strictly necessary. I think it would be possible to implement Lamport's Bakery algorithm from this article correctly with a lot of barriers, but its performance would be awful.

> Also, is there any evidence as to how common this memory re-ordering is?

It is very common (almost all loads and stores get grouped by the CPU) but its effect is so small that it rarely manifests in practice. A reordering would only move a load or store by a few dozen nanoseconds in wall clock time.

I have some practical experience from stress-testing high-contention, memory-ordering-sensitive code. It would often take minutes of CPU time to make a bug manifest. That's somewhere between one-in-a-million and one-in-a-billion odds of an unfortunate memory ordering ruining your day.


One additional clarification is worth making: WHY the algorithm is robust to non-atomic reads and writes.

Basically, if one process is reading a value while another process is writing it, the reader may get not just an old value but a complete garbage value from the partially completed write. Even then the algorithm still works, and doesn't require any kind of "retry" logic, for this reason:

Recall that when a process wants to enter the critical section, it first looks at the numbers (or tickets) assigned to other processes to determine its own number. The process chooses a number that is one higher than the highest number it observes.

Now, if a process reads a garbage value due to a concurrent write, the damage is contained: the garbage can only change which ticket the process picks for itself. A garbage value lower than the valid numbers is simply absorbed by the max; one that is higher just makes the process take a larger ticket and queue up further back (see the small example below). Either way, its decision about what number to take remains sound.

The key is that the algorithm does not depend on every read being perfect; the process bases its decision on whatever values it manages to read, and taking a larger-than-necessary ticket is always safe.

The process then enters a waiting phase where it checks whether it is its turn to enter the critical section, based on its number and the others'. A garbage value there doesn't break things either: thanks to the choosing check, a read of another process's number can only overlap a write while that process is taking a fresh ticket, and any fresh ticket it commits will be larger than this process's already-published number, so the relative ordering still holds.

Despite the possibility of reading invalid values, the bakery algorithm maintains safety (no two processes are in the critical section simultaneously) and liveness (every process eventually gets its turn), because its core logic of ordering by ticket number and waiting for one's turn remains intact.
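
To make the "queue up further back" case concrete, here is a toy example (hypothetical values, echoing the 7456 example from Lamport's note quoted upthread):

    def next_ticket(observed):
        # Doorway step: one more than the largest ticket this process managed to read.
        return 1 + max(observed)

    print(next_ticket([3, 0, 2]))     # -> 4: the normal case
    print(next_ticket([3, 7456, 2]))  # -> 7457: a garbage read just queues us further back

An inflated ticket is always safe because it only lowers your priority; ties and undersized tickets are what the choosing flag and the (ticket, process ID) tie-break resolve.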


But stochastically, it’s possible you’re given a legal value that violates mutual exclusion, no?

I'm very fuzzy on how the busy-wait loop that checks all sibling threads' ticket numbers can distinguish this scenario. You could be TID 1 and obtain ticket X even though TID 2 got there first and entered the critical section, because at the time TID 2 checked, you hadn't even been assigned a ticket yet… I'm missing something subtle about the algorithm.


The concern is that if a process reads a partially updated ticket number from another process, it might end up with a ticket number that falsely represents its position in the queue. The algorithm is designed to handle such cases:

If the read value is too low (because of a partial update), the process will wait longer than necessary, which doesn’t violate mutual exclusion but may affect fairness temporarily.

If the read value is too high, it doesn’t allow the reading process to jump ahead in the queue unfairly.

The key point is that even if a process reads a partially updated value, it either waits longer than necessary or gets a number that still respects the ordering, because the algorithm uses relative ordering (comparing ticket numbers, with process IDs breaking ties) rather than absolute values. As for the specific scenario above: if TID 2 already held a ticket when TID 1 scanned, TID 1's max+1 step hands it a strictly larger ticket, so it waits behind TID 2; genuinely overlapping doorways are what the choosing flag and the tie-break are there to resolve.


Yeah, something wasn't sitting well with me about this reasoning, and Wikipedia says this:

> Lamport's bakery algorithm assumes a sequential consistency memory model. Few, if any, languages or multi-core processors implement such a memory model. Therefore, correct implementation of the algorithm typically requires inserting fences to inhibit reordering

So yeah, you do need fences, because a multiprocessor superscalar CPU is going to reorder the accesses to the entering and ticket variables and break the exclusion rules. But then, if you have fences, don't you have the ability to implement a simpler mutual exclusion mechanism by definition?


> The Bakery Algorithm's unique feature is its ability to ensure mutual exclusion in concurrent programming without requiring atomic reads and writes.

This is the key, and somewhat surprising. Had someone told me a critical section was possible without hardware support (atomic reads and writes), my knee-jerk reaction would have been "not possible".


If I understand this correctly, the algorithm depends on the underlying platform to ensure correct ordering. Two processes may each get the same bakery ticket number. The conflict is resolved by the process with the lower process id being allowed to go first. This process ID is in effect a ticket number issued by the underlying hardware. If the processes were running in a distributed environment, there would not be a common underlying platform to resolve this conflict.


In that context you could simply pick another thing as the tie-breaker, like whichever process has the lower IP address expressed as an integer (i.e., treat its four octets as a single 32-bit number).
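
In Python that could be as simple as (a sketch, with a made-up documentation address):

    import ipaddress

    # The address's four octets interpreted as one 32-bit integer.
    tiebreak = int(ipaddress.ip_address("192.0.2.7"))  # -> 3221225991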


Ah, yes, that would work.

But then there is reliance on an underlying platform to resolve the conflict by providing a globally unique identifier. Instead of process ID, there is the IP address mechanism.

So it's turtles all the way down, if I understand correctly.


Not really. Each process could, at the beginning, generate a random nonce and broadcast it and thereafter use that as its tiebreaker number. The point is to just have a consistent way to break ties that is convenient to implement in whatever domain you’re working in. And in a networked domain, without a valid IP address, the worker isn’t going to be doing much in the process anyway since it wouldn’t be able to communicate at all.


But it's not in a distributed environment, it is a shared-memory environment. That is the whole point. If it were a distributed environment, the processes would not have to worry about smashing each other's memory, because they would not share memory in the first place.


Most shared-memory environments are distributed, at least to some extent. The process ID is not an issue; you can do the equivalent in a distributed environment just fine. But the algorithm depends on strict memory ordering, and that's much harder to provide than atomic writes. Most shared-memory systems don't offer it.


Leslie Lamport gives a short oral history of the Bakery Algorithm in Episode 3 of "Algorithms at Work": https://www.audible.com/pd/Episode-3-Concurrency-How-to-Coor...


On the topic, I would advise reading about the boulangerie algorithm, an improved version of the original solution:

https://lamport.azurewebsites.net/tla/boulangerie.html



