
> It is incompatible with non-blocking I/O since your process will be stopped if it tries to access part of the file that is not mapped

Yeah, but the same problem occurs in normal memory when the OS has swapped out the page.

So perhaps non-blocking I/O (and cooperative multitasking) is the problem here.




> Yeah, but the same problem occurs in normal memory when the OS has swapped out the page.

I'd argue that swapping is an orthogonal problem which can be solved in a number of ways: disable swap at the OS level, mlock() in the application, maybe others.

mmap is really a bad API for IO — it hides synchronous IO and doesn't produce useful error statuses at access.

> So perhaps non-blocking I/O (and cooperative multitasking) is the problem here.

I'm not sure how non-blocking IO is "the problem." It's something Windows has had forever, and unix-y platforms have wanted for quite a long time. (Long history of poll, epoll, kqueue, aio, and now io_uring.)


> it hides synchronous IO and doesn't produce useful error statuses at access.

You can trap IO errors if necessary. E.g. an IO error on a mapped page raises SIGBUS, which you can handle just like a segfault raises SIGSEGV.

> I'm not sure how non-blocking IO is "the problem."

The point is that non-blocking IO wants to abstract away the hardware, but the abstraction is leaky. Most programs which use non-blocking IO actually want to implement multitasking without relying on threads. But that turns out to be the wrong approach.


> The point is that non-blocking IO wants to abstract away the hardware, but the abstraction is leaky.

Why do you say it doesn't match hardware? Basically all hardware is asynchronous — submit a request, get a completion interrupt, completion context has some success or failure status. Non-blocking IO is fundamentally a good fit for hardware. It's blocking IO that is a poor abstraction for hardware.

> Most programs which use non-blocking IO actually want to implement multitasking without relying on threads. But that turns out to be the wrong approach.

Why is that the wrong approach? Approximately every high-performance httpd for the last decade or two has used a multitasking, non-blocking network IO model rather than thread-per-request. The overhead of threads is just very high. They would like to use the same model for non-network IO, but Unix and unix-alikes have historically not exposed non-blocking disk IO to applications. io_uring is a step towards a unified non-blocking IO interface for applications, and also very similar to how the operating system interacts with most high-performance devices (i.e., a bunch of queues).


> Why do you say it doesn't match hardware?

Because the CPU itself can block, in this case on a memory access. Most (all?) async software assumes the CPU can't block. A modern CPU is pipelined, and parts of the pipeline can simply stall while waiting for, e.g., memory to return. If you want to handle that cleanly, you have to go through the OS: while waiting for your memory page to be loaded, the OS can run another thread, which it can't do in the async case because there isn't another thread to run.


A CPU stall on L3 miss (100ns?) is orders of magnitude shorter than the kinds of blocking IO we don't want to wait on (10s-100s of µs even for empty-queue NVMe; slower for everything else).

The OS can't run another thread while fulfilling an mmap page fault because it has to actually do the IO to fill the page while taking that trap. And in the async scenario, CPUs and high speed devices can do clever things like snoop DMAs directly into L3 cache, avoiding your L3 miss scenario as well.

The comparison between L3 miss and mmap faults is apples and oranges.



