Recursive (Re-Entrant) Locks (2013) (stephencleary.com)
63 points by archagon on Feb 18, 2023 | 32 comments



Posix recursive locks were invented as a dare:

https://groups.google.com/g/comp.programming.threads/c/tcrTK...


Plaintext link now that Google Groups disables “view source” for any reason and no reason: http://al.howardknight.net/?ID=167673665600 (mid:uIpie.5556$1F4.5083@news.cpqcorp.net).


Thanks for linking this, one of the most informative things I've read in a while


"We should have called locks "bottlenecks" "


The refactoring to C_UnderLock is straightforward, but the name alone gives away the problem: you have moved the burden to the developer rather than guaranteeing it at runtime. A private method with a screaming comment above it helps somewhat, of course, but if there is one thing I have learned as a developer, it is that given a couple of decades, any foot will be shot.

lock(something) when recursive will do one of two things: either acquire the lock or simply validate that it's held.

Is there a different pattern that simply allows validating in C_UnderLock that the lock is in fact held?

so in A and B you do lock(_mutex) but in C you do assert_lock(_mutex) which either panics or continues? What would that look like?


> so in A and B you do lock(_mutex) but in C you do assert_lock(_mutex) which either panics or continues? What would that look like?

Make "lock being held" into a resource, which can then be passed down to callees to assert (at compile-time) that a lock is being held. Even better if the lock itself is associated with (and lends on locking) values, then just having access to the protected resource indicates that you hold the lock.

Obviously without language support for ownership and borrowing this has limited safety as you can stash a lock token and reuse it later, despite not holding the lock. Then again, even in Rust if you really wanted to fuck someone's day you could probably `transmute` garbage into a lock token.
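A minimal Rust sketch of the token idea (the names here are invented for illustration): the guard returned by lock() doubles as the proof that the lock is held, and a callee that needs the lock simply takes the guard as a parameter, so "called without the lock" becomes a compile error rather than a runtime bug.

```rust
use std::sync::{Mutex, MutexGuard};

struct Account { balance: i64 }

// The guard is the capability: this function cannot be called unless
// the caller actually holds the lock on the Account.
fn deposit_under_lock(state: &mut MutexGuard<Account>, amount: i64) {
    state.balance += amount;
}

fn deposit(account: &Mutex<Account>, amount: i64) -> i64 {
    let mut guard = account.lock().unwrap(); // acquired once, at the boundary
    deposit_under_lock(&mut guard, amount);
    guard.balance
} // guard drops here; lock released

fn main() {
    let account = Mutex::new(Account { balance: 0 });
    assert_eq!(deposit(&account, 50), 50);
    println!("balance: {}", account.lock().unwrap().balance);
}
```

As the comment above notes, without borrow checking a token like this could be stashed and reused; in Rust the guard's lifetime prevents that.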


In C++ this can be done by having functions that need to be called with the lock held, but won't take the lock themselves, take the lock guard as a parameter.

There are also patterns like the synchronized wrapper, where the lock is bolted on transparently and all the functions in a class can call each other without worrying about the lock.


You can do it with the type system, although in C# it is a bit clunky.

For example let's say you have a class Resource that needs to be locked and a method Foo(Resource) that needs to operate on the resource under a lock.

You could change Foo(Resource) to Foo(LockedResource). LockedResource could be a class that is constructed from a lock object and a resource, and either takes the lock or validates that the lock is taken in the constructor.

Then by making the object disposable the release can be handled as well.

This way the type system handles the validation.

This is kinda like a budget version of having the type system support locking and concurrency like Rust does.


In F# this would be even easier. (I have written capability-based code before, e.g. "this part is only allowed read access to the DB".) When I tried something similar in C# I almost despaired.


Disposable (unlike a Rust borrow) doesn’t give much in terms of guarantees though. You aren’t forced to dispose at method boundaries. But perhaps in C# you could use some locking token and make it one of the types that can’t leave the stack?


> so in A and B you do lock(_mutex) but in C you do assert_lock(_mutex) which either panics or continues? What would that look like?

What's wrong with doing exactly what you said? If somebody ends up misusing C_UnderLock, their program will crash, and they can go fix their bug.


I’m wondering if there is such a primitive like assert_lock() or similar, which will let me do that.



Looks like

if (!Monitor.IsEntered(_x)) throw new InvalidOperationException();

will do the trick in C#.
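For languages without such a primitive, an assert-style check can be hand-rolled. A toy Rust sketch (CheckedMutex and all other names are invented): pair the mutex with a thread-local list of locks the current thread holds, so callees can assert ownership.

```rust
use std::cell::RefCell;
use std::sync::{Mutex, MutexGuard};

thread_local! {
    // Addresses of the CheckedMutexes currently held by this thread.
    static HELD: RefCell<Vec<usize>> = RefCell::new(Vec::new());
}

struct CheckedMutex<T> { inner: Mutex<T> }

struct CheckedGuard<'a, T> { _guard: MutexGuard<'a, T>, key: usize }

impl<T> CheckedMutex<T> {
    fn new(value: T) -> Self { Self { inner: Mutex::new(value) } }

    fn lock(&self) -> CheckedGuard<'_, T> {
        let guard = self.inner.lock().unwrap();
        let key = self as *const Self as usize;
        HELD.with(|h| h.borrow_mut().push(key));
        CheckedGuard { _guard: guard, key }
    }

    fn holds(&self) -> bool {
        let key = self as *const Self as usize;
        HELD.with(|h| h.borrow().contains(&key))
    }

    // The assert_lock() asked about above: panic if the current
    // thread does not hold this particular mutex.
    fn assert_held(&self) {
        assert!(self.holds(), "mutex not held by current thread");
    }
}

impl<T> Drop for CheckedGuard<'_, T> {
    fn drop(&mut self) {
        HELD.with(|h| {
            let mut held = h.borrow_mut();
            if let Some(pos) = held.iter().position(|&k| k == self.key) {
                held.remove(pos);
            }
        });
    }
}

fn main() {
    let m = CheckedMutex::new(0);
    assert!(!m.holds());
    let guard = m.lock();
    m.assert_held(); // fine: this thread holds the lock
    drop(guard);
    assert!(!m.holds());
}
```

A toy only: it keys on the mutex's address, omits Deref to the protected data, and adds a thread-local lookup to every lock operation.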


The big problem I have with locks is that under high use they slow parallel code down MORE than if you ran it serially. This is because of "false sharing": when one core grabs the lock, it invalidates the cache line in every other core that will grab the lock, and they have to fetch the lock from main memory, which is very slow. I use compare-and-swap in those cases, and I've converted almost every lock I've used to compare-and-swap. I'd like to explore "restartable sequences" for parallel data structures (https://tinyurl.com/vzh3523y), but AFAIK this is only available on Linux and we develop on macOS and Linux.
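A minimal sketch of the compare-and-swap retry loop being described (a generic counter, not the poster's actual code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Lock-free increment: retry with compare_exchange instead of blocking
// on a mutex. On failure, compare_exchange returns the value another
// thread stored, so we retry from there.
fn add(counter: &AtomicU64, delta: u64) {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange(
            current,
            current + delta,
            Ordering::SeqCst,
            Ordering::Relaxed,
        ) {
            Ok(_) => return,
            Err(actual) => current = actual, // someone beat us; retry
        }
    }
}

fn main() {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || for _ in 0..1000 { add(&c, 1) })
        })
        .collect();
    for h in handles { h.join().unwrap(); }
    assert_eq!(counter.load(Ordering::SeqCst), 4000);
}
```

Note the caveat raised in the reply below still applies: the cache line holding the atomic bounces between cores either way.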


What you describe (multiple cores sharing the same lock) sounds like true sharing, not false sharing. False sharing is cache-line contention when cores are using separate structures (i.e. not actually shared) which happen to land on the same cache line.

Compare and swap still typically requires the cache line to bounce around between cores, so if that’s the primary cost of locks for you it seems like compare and swap doesn’t really fix the problem of frequent synchronization between cores.


Ah - thank you. "True sharing" makes way more sense to describe the situation. It's the general problem with locks I've seen again and again - maybe CAS doesn't improve it, but I thought it did. It is challenging to profile this stuff carefully. I've been disappointed many times by my attempts at multithreaded programming when locks are involved. I've implemented a multithreaded Common Lisp programming environment (https://github.com/clasp-developers/clasp.git) that interoperates with C++. Two locks that cause me massive headaches are (1) in the unwinder, when we do a lot of stack unwinding in multiple threads, and (2) in the memory manager, when we do a lot of memory allocation in multiple threads. Parallel code runs multiple times slower than serial code. In another situation, where we do parallel non-linear optimization without allocating memory, we can get 20 CPUs working in parallel with no problems.


If false sharing is truly the bottleneck in your code, can't you just make sure the lock reserves the entire cache line? Each core still needs to fetch from L3 cache to grab the lock itself, but that's roughly the same cost as a compare-and-swap.


The concurrent queue code in Java has/had an array the locks were placed in. Now the problem with putting locks in arrays instead of scattering them across the heap is that you get false sharing immediately. So the array is 8 or 16 times as big as it needs to be to get one lock per cache line.


The @Contended annotation will pad the memory layout of fields to fill a cache line and avoid false sharing.
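The same padding can be expressed in Rust with an alignment attribute (a sketch; 64 bytes is a common but not universal cache-line size):

```rust
use std::mem;
use std::sync::atomic::{AtomicU64, Ordering};

// One counter per 64-byte cache line, the same idea as @Contended:
// the alignment forces size_of to round up to a full line.
#[repr(align(64))]
struct Padded(AtomicU64);

fn main() {
    assert_eq!(mem::align_of::<Padded>(), 64);
    assert_eq!(mem::size_of::<Padded>(), 64); // 8 data bytes + 56 padding

    // Adjacent slots in a collection now live on distinct cache lines.
    let slots: Vec<Padded> = (0..4).map(|_| Padded(AtomicU64::new(0))).collect();
    slots[1].0.fetch_add(1, Ordering::Relaxed);
    let gap = &slots[1] as *const _ as usize - &slots[0] as *const _ as usize;
    assert_eq!(gap, 64);
    println!("stride: {gap} bytes");
}
```

This is exactly the 8x-to-16x size blow-up described above: each 8-byte counter occupies a whole line.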


Alright, case study: I want to write a function 'F', that needs to initialise some state by calling 'I' the first time 'F' is called. 'I' itself may call 'F' however; and this needs to be handled specially in 'F' on the second call so that 'I' doesn't get called a second time. Until the first invocation of 'I' has returned (i.e., state has been initialised), no one else but the thread of the first call to 'F' should be allowed to call 'F'. Only the first call to 'F' must call 'I', not any other call. How do I solve this without using recursive mutex?

This may sound like some nightmare architecture, but I ended up in this situation when writing a simple thread-safe malloc override using LD_PRELOAD.


Assuming that 'I' can receive a payload, and that it doesn't need to call 'F' directly, you could write a function 'G' that takes a pointer to the locked state of 'F'. Then, 'F' can lock a non-recursive mutex and call 'G', 'G' can pass that pointer to 'I', and 'I' can pass the pointer back to 'G'. (If 'I' cannot receive a payload, you could also write a variant where 'G' simply assumes that the state is locked, and accesses it directly.) That's basically what this article is suggesting.

Otherwise, the only thing I could think of is a thread-local boolean to tell 'F' that it is being called from 'I'. Which is, as I understand it, basically what a recursive mutex does, by remembering the thread that locked it. But it's still technically a different construct. The tricky part of the problem under this condition is that only the recursive call of 'F' from 'I' can proceed, so somehow that call must be distinguished from an identical call to 'F' from a different thread.

Overall, things are much trickier, especially in multithreaded code, when you can't directly thread state through your functions. If there's one point I'll give in favor of the FP philosophy, it's that.
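The thread-local-boolean variant can be sketched like this in Rust (f and init are stand-ins for the poster's F and I; the values are arbitrary):

```rust
use std::cell::Cell;
use std::sync::Mutex;

thread_local! {
    // Set while this thread is inside init(), so the re-entrant call
    // into f() can be recognised and let through without re-locking.
    static IN_INIT: Cell<bool> = Cell::new(false);
}

static STATE: Mutex<Option<u64>> = Mutex::new(None);

// Stand-in for 'I': needs to call back into f() during initialisation.
fn init() -> u64 {
    f() + 41
}

// Stand-in for 'F'.
fn f() -> u64 {
    if IN_INIT.with(|c| c.get()) {
        // Recursive call from init(): the outer f() already holds the lock,
        // so take the special-cased early path instead of re-locking.
        return 1;
    }
    let mut state = STATE.lock().unwrap(); // other threads wait here during init
    if state.is_none() {
        IN_INIT.with(|c| c.set(true));
        let v = init();
        IN_INIT.with(|c| c.set(false));
        *state = Some(v);
    }
    state.unwrap()
}

fn main() {
    assert_eq!(f(), 42); // first call runs init() exactly once
    assert_eq!(f(), 42); // already initialised
}
```

As noted above, this is essentially what a recursive mutex does internally (remember which thread holds the lock), just made explicit and restricted to the one call path that needs it.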


Have both F and I call a private F_unlocked.

I assume the issue is you can't (easily) change I (for example F is malloc). Then any solution in practice would look like a recursive mutex.

That's alright; recursive mutexes are bad, but one legitimate use case is retrofitting thread safety on top of a non-thread-safe interface (infamously, Xlib was made thread-safe by just bolting on a global recursive mutex).


Isn't it better for a method to worry about whether it needs a lock and be coded appropriately (since the writer thinks it through when writing it), rather than relying on whoever maintains the code to think through the use-cases and call the appropriate non-recursive version of a child method depending on whether the caller has already taken the lock? In an ideal world the maintainer would carefully consider every line of code; but in the real world, where the maintainer is enhancing functionality or fixing issues, it's too much to expect that diligence, and better to err on the side of safety at the cost of recursively taking locks. Recursively taking locks doesn't cause real problems except when other locks are taken in between, which can deadlock against other parts of the code that take those same "other" locks without holding the recursive lock.


Here's a situation I ran into recently:

* ClassA conforms to a protocol that allows it to return its description as a string

* The protocol includes a regular `description` function as well as an optional `debugDescription` function

* An object conforming to this protocol can be passed to a global logging function, which then calls `description` or `debugDescription` depending on whether the debugger is attached or not

* ClassA's state is protected by a lock

* ClassA's description implementation goes through its state and converts it to a string representation

* Accordingly, ClassA's description implementation needs to also be locked

* The global logging function can be called from within ClassA's state lock, and can take the current instance of ClassA as a parameter

The recommended approach of making `description` call `description_underLock` doesn't work, because the logging function decides whether to invoke `description` or `debugDescription` depending on the context. In other words, if the logging function is called from inside the state lock with `self` as an argument, then we'll get a deadlock if we don't manually invoke `description_underLock`; but if we directly give the logging function the string from `description_underLock` or `debugDescription_underLock` instead of `self`, we now have to replicate the internal logic of the logging function in our layer. Alternatively, we can impose conditions such as "never log `self` from within the state lock," but that feels arbitrary and restrictive.

I think recursive locks make sense in this case, and probably other cases where protocols and/or indirection is involved. We just have to make sure that our description implementation is dead simple (mostly a "leaf node" function) and doesn't have side effects.


Reentrant locks fail hard if you ever have to lock more than one object. Backing off and dropping all the locks is... fraught.


> The non-recursive approach would first refactor A into a new private method C that is clearly documented (often with a naming convention as well) to assume that it is called while under lock. Then both A and B call C while holding the lock.

I'm sorry but that's literally advocating for spaghetti code. If the author really wrote a lot of real world code, it doesn't look like it was clean or well structured.


The blog post explains why it's the opposite: not using recursive locks kills two birds with one stone, because you know which invariants are enforced at locking boundaries. Functions that take the lock know exactly which invariants hold on entry; those that are called with the lock already taken cannot assume them.

It's the opposite of spaghetti code.
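In Rust the article's refactoring is even enforced by the compiler. A sketch (names invented): the under-lock helper lives on the inner state, which is only reachable through the guard, so it can neither be called without the lock nor re-lock.

```rust
use std::sync::Mutex;

struct CounterInner { value: u64 }

pub struct Counter { inner: Mutex<CounterInner> }

impl CounterInner {
    // "C": runs under the lock by construction. It cannot lock, because
    // the caller already holds the Mutex that owns `self`.
    fn bump_under_lock(&mut self) { self.value += 1; }
}

impl Counter {
    pub fn new() -> Self {
        Counter { inner: Mutex::new(CounterInner { value: 0 }) }
    }

    // "A" and "B": the public locking boundary. Invariants are checked
    // and restored here, exactly once per call.
    pub fn a(&self) -> u64 {
        let mut inner = self.inner.lock().unwrap();
        inner.bump_under_lock();
        inner.value
    }

    pub fn b(&self) -> u64 {
        let mut inner = self.inner.lock().unwrap();
        inner.bump_under_lock();
        inner.bump_under_lock();
        inner.value
    }
}

fn main() {
    let c = Counter::new();
    assert_eq!(c.a(), 1);
    assert_eq!(c.b(), 3);
}
```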


Yeah it seems pretty disappointing to be advocating for invariants protected only by thoughts and prayers (i.e. comments, naming convention) in a statically typed[1] language like C#.

This bit me in Rust when I was goofing off / exploring with one of the AoC '22 puzzles back in December. One of Rust's selling points is supposedly "fearless concurrency", but it turns out deadlocking is trivial: did you know a mutex is not re-entrant in Rust? Just acquire the lock again in a nested function and blam, at runtime when you reach that codepath you'll learn you have a deadlock.

I'll probably never stop trying new approaches, and I love exploring languages for how they can open your eyes; but so far Clojure's approach to this problem (by that I mean immutable everything by default plus atoms; I'm not talking about the richer STM stuff, which I haven't played with yet) is peak concurrency handling in the context of writing business applications.

[1] As I wrote this, the 'dynamic' type jumped into my mind; would its presence mean it's technically wrong to say C# is statically typed? It's not "only" statically typed, I guess, is what I'm uncomfortable with.


How is spaghetti code to refactor some common aspect into a reusable method with clearly defined boundaries?


Indeed. I worked on a non-trivial project (~300kLOC C++) which was heavily multi-threaded and relied on recursive locks. It had lots of issues with not taking locks at the right time and so on.

I took some time to refactor it all to not use recursive locks, instead having a clean boundary between internal (non-locking) methods and external methods which did the locking and called the internal ones.

Not only did our locking issues go away immediately, it led to a much simpler developer experience, since there was never any question of whether a new function needed to lock or not.

I've not written that heavy multi-threaded code since, but my takeaway from the experience was that requiring recursive locks was a sign of poor design.


I forgot to mention performance increased a fair bit as well, since a lot less locking was done.

Instead of one "public" function call resulting in locking the same lock recursively 4-5 times as the "public" function called other functions and so on, a single locking operation was done at the boundary.



