Hacker News
Haiku's (Kernel) Condition Variables API: Design and Implementation (haiku-os.org)
121 points by waddlesplash on April 24, 2023 | hide | past | favorite | 16 comments



So what I don't understand about this design is how one uses it. In the classic condvar application one atomically releases the lock and waits, because otherwise the other thread may deliver the signal before the wait happens, leading to a deadlock.

Here it seems there is no atomicity. How does this get used?


You are correct that deadlocks can be caused by signals occurring before the wait starts, and thus some sort of mechanism to ensure this does not happen is needed, but I explained as much in the article. The point of this API is that the atomic lock-switch is not restricted to just locks; and further, in some situations, no lock-switch is needed at all.

The former is simple enough, and directly equivalent to what FreeBSD's API allows you to do (and what Haiku's API provides as "convenience methods", as the article notes), and is "atomic" -- it just pushes the atomicity up a level and lets the programmer control it more directly:

    ConditionVariableEntry entry;
    gSomeConditionVariable->Add(&entry);
    mutex_unlock(&someLock);
    /* (I could unlock more locks here, if needed, I'm not limited to 1) */
    entry.Wait();
    mutex_lock(&someLock);
The latter case is the more interesting and unique one, and the article references one place it is actually used in practice (team/process creation), though it doesn't give a pseudocode example, so let me try to give one here:

    ConditionVariableEntry entry;
    someLongRunningOperation.conditionVariable->Add(&entry);
    someLongRunningOperation.start();
    /* (I can do whatever I want here, no need to Wait immediately) */
    entry.Wait();
Since this "long-running operation" is not even started until after the local Entry has been Add'ed to the Variable, there's no possible way for this operation to complete and signal before we have started 'waiting' (because, even if the Wait() call is at the end, it's the Add() call that counts.)


> Since this "long-running operation" is not even started until after the local Entry has been Add'ed to the Variable, there's no possible way for this operation to complete and signal before we have started 'waiting'

What you've got there is a "Happens before" constraint. Your "no possible way" is assuming Sequential Consistency, but I assure you that your CPU does not in fact provide Sequentially Consistent ordering of individual CPU operations across multiple cores.

You need some way to reliably Order the events. It's possible that the vaguely named "atomic" operations in Haiku provide you with adequate ordering, but who knows since they don't specify.


> Your "no possible way" is assuming Sequential Consistency,

I am assuming events cannot finish before they are started, yes! I am pretty sure that even the (in)famous DEC Alpha could not possibly have time-traveling results.

Once again: there is a distinction between the API itself, and the API's implementation. Once the "Add" function returns, the "Entry" is now waiting on the "Variable". How the "Add" function ensures that is entirely abstracted away from the consumers of the API.

The implementation of "Add", at least, is synchronized using a very standard spinlock, just like similar operations are for mutexes, semaphores, etc., not just on Haiku but on all the BSDs and Linux. We don't even need to think about CPU operation ordering for that; the spinlock takes care of it for us.

I am pretty sure Linux (and every other operating system, ever) would be in all kinds of trouble if you could read stale values from memory protected by spinlocks, so I don't know why you are casting doubt on these things.

> It's possible that the vaguely named "atomic" operations in Haiku provide you with adequate ordering

Hey, you already wrote a comment elsewhere in this thread about this point, and I replied to your comment with a bunch of information proving definitively: they do, in fact, provide that ordering. But in the cases I described in the parent comment here, it does not matter, because here we are talking about API consumption, not implementation.


> I am assuming events cannot finish before they are started, yes! I am pretty sure that even the (in)famous DEC Alpha could not possibly have time-traveling results.

You seem to think this is a joke, but it isn't. Obviously from the CPU's point of view there is no "time travel", but that's cold comfort for users. The Alpha doesn't promise that there's any coherent ordering at all unless you've imposed one, which the sequentially consistent atomics Haiku is apparently asking for do impose.

To get Sequential Consistency you're paying a very high price on such weakly ordered architectures. On x86 (and x86-64) the relative price is smaller, because everybody is paying for Acquire-release basically all the time, but it's still substantial on larger modern systems.

This price isn't unique to Haiku, but the choice to pay it (almost) everywhere is, at least in terms of operating systems people would be using today.


> The Alpha doesn't promise that there's any coherent ordering at all unless you've imposed one

Yes. But why did you bring this up in this thread about API usage? It's the implementation's problem to make this work out. "Add()" should be the equivalent of a full memory barrier (at least for the condition variable's memory) no matter how that happens internally.

> This price isn't unique to Haiku, but the choice to pay it (almost) everywhere is, at least in terms of operating systems people would be using today.

Haiku is, in many ways, poorly optimized when compared on such details with Linux or FreeBSD, all the developers know this fact, and we make no secret of it. If this was your entire point in the first place, why not just say so?

By the way, as far as I can tell, OpenBSD's kernel atomics (sys/atomic.h) do not have different versions for different memory orderings; in fact they use the older-style GCC builtins and not the C++11-style ones, so they are also using sequential consistency everywhere they use atomics. Is OpenBSD not a "modern operating system people would be using today"?


I guess, ultimately, if the argument is "we don't care about performance or capability" then, you know, fine; although if you don't care, it's weird to do two rewrites focused on performance and write a blog post to highlight the work.

AIUI OpenBSD doesn't lean exclusively on those primitives, you might notice for example it has futexes these days. On the other hand I also don't know anybody who runs OpenBSD.


FWIW your response comes across to me as quite rude and patronizing. I assume you didn't mean this, but you've capitalized terms as if you have some sort of true definition for them, and you've dragged this conversation towards the specific complaint you had in another comment (that Haiku might use sequential consistency when it doesn't need to, although it's not even clear you've done the appropriate work to verify this before popping off). Maybe consider this for next time you comment? If you need an example that demonstrates curiosity rather than smugness, look at the first comment in this thread.


The capitalization reflects terms of art in the C++ 11 memory model. So, not my "true definition" but the one provided by the language used, in this case C++. This matters because Haiku is written in languages which use this model (or, in some cases, languages which don't specify any model but in practice conform to the C++ 11 model).

If you're Linus Torvalds you can insist compiler vendors adjust things as you prefer, to some extent; thus the Linux memory model isn't quite the C++ 11 memory model, despite the fact that GCC is used to compile Linux and GCC notionally conforms to C11 (and thus has the C++ 11 memory model): much of what Linux does is not conforming to the ISO document. Haiku can't expect the same benefit of the doubt.

The question of whether Haiku needs sequential consistency where it has it is vexed. Hyrum's law applies. The least scary approach might be to follow C++ and provide sequential consistency by default with an opt-out, then introduce use of the opt-out carefully.


I'll respond to your comment anyways but I do want to remind you that I think it's more a reflection of you talking about what you want to talk about rather than being particularly relevant to this thread.

Operating systems are often written in C or C++. Both of these share a formal memory model to provide a set of useful semantics for well-formed programs. For most code the choice to adhere to this model is the right one. In fact, in many cases it can be appropriate to pick more "heavyweight" constructs despite the fact that they can be a bit slower, because they are easier to reason about. LLVM's libc++ used sequential consistency for its shared_ptr implementation for quite a while until it was updated to use a more efficient set of primitives.

On the flip side, sometimes it is not appropriate to use this memory model. Linux has its own because it predates the C(++)11 memory model. Other good reasons to use your own model can be if the standard one doesn't efficiently map to what you are trying to do on the hardware you're on, or if the operations you need to perform are not encoded in the standard. These kinds of things are actually quite common in operating systems, which is why most of them do not strictly conform: C11 has no concept of "this region of code runs with interrupts disabled" or "I need a full serializing barrier for device memory". Haiku chooses to do its own thing, which may or may not be appropriate for its use case. Coming in and immediately claiming that whatever it's doing must be bad is inappropriate and lacks context.


Huh? Happens-before subsumes in-thread (program) order. Any events caused by the long-running operation must happen after the addition of the entry to the condvar.


Ah, so the Add starts the period during which notifications will be caught, ahead of the Wait call.


I don't even know what this is, but if it allows another operating system to co-exist alongside the regular three:

Cool! Great work.


BeOS, which is the foundation for most of this work, is from the 1990s. But in the 1990s C++ didn't yet have a clearly defined Memory Model for the purposes of concurrent algorithms, and BeOS doesn't try to come up with one.

In the subsequent 25+ years, C++ developed a Memory Model which distinguishes ordering for these atomic operations. Because BeOS pre-dates that, it doesn't document an ordering, let alone provide a mechanism to pick one. Unfortunately different concurrent algorithms have different ordering requirements, if what you're given is looser the algorithms may malfunction, if it's stricter the performance may be hampered.

I reckon that before trying to claim you've innovated here, it might be a good sense check to compare against the baseline. What exactly are the Haiku atomic operations, in terms of the C++ 11 Memory Model? If you were Linux (or to some extent Windows) you could trail-blaze here, because you were innovating before 2011, you were inventing the model; but this is 2023, so Haiku is off in the jungle on its own and everybody else has a map now. Figure out where you are on that map first.


Haiku uses the System V ABI (mostly). So, we're doing the same things Linux and the BSDs are, simply by using GCC or Clang without any special tuning.

> I reckon that before trying to claim you've innovated here it might be a good sense check to compare baseline.

The baseline is "what are other operating systems' kernel- and userland-level condition variables APIs?" And none of the ones I looked at had anything like what Haiku has here, they all have something which is the more classical "lock-switched condvars" just like POSIX has.

The API itself does not depend on memory-ordering semantics any more than a "mutex_lock()" API does. The implementation will be somewhat contingent on them, of course, but those are two separate matters.

> What exactly are the Haiku atomic operations, in terms of the C++ 11 Memory Model?

The atomic_*() functions are (on most architectures, x86 included) implemented using GCC/Clang's __atomic_* functions, with various __ATOMIC_* orderings chosen as appropriate. You can see them defined in the system header here: https://github.com/haiku/haiku/blob/master/headers/os/suppor...

> because you're innovating before 2011, you're inventing the model

No, not really? GCC has had atomic builtins since at least 4.1.0 in 2006. The documentation (https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins...) says: "In most cases, these builtins are considered a full barrier. That is, no memory operand will be moved across the operation, either forward or backward." -- which is basically equivalent to today's __ATOMIC_SEQ_CST.

> so Haiku is off in the jungle on its own and everybody else has a map now, figure out where you are on that map first.

We already did that years ago. The atomic_*() functions linked above in SupportDefs.h have been implemented using the C++11-standard GCC builtins since 2014, and the older __sync_* builtins for years before that.

Anyway, the algorithm described in this article, even if Haiku's atomic functions were not 1:1 with C++11-standard definitions (which they are, as noted above), is clearly portable to other OS kernels. So I am not sure what basis your comment has, regardless.


> The atomic_*() functions are (on most architectures, x86 included) implemented using GCC/Clang's __atomic_* functions, with various __ATOMIC_* orderings chosen as appropriate.

There is no one-size-fits-all choice here, so "as appropriate" is almost definitionally a mistake. It turns out that "as appropriate" for Haiku means Sequential Consistency except on simple load and store, which have Acquire-release semantics instead for some (unexplained in my brief exploration) reason.

Still that does answer the question of why these structures seem to work for Haiku despite the lack of what you'd ordinarily expect in terms of ordering guarantees, it's eating a Fence for each Entry and a Fence for each mutex operation. It's a steeplechase!



