Further grist for the mill about the effectiveness of seccomp-style filtering for multitenant Docker, since it's unlikely anyone was filtering out `io_uring_setup`.
People can do whatever they want with seccomp-bpf, obviously, but is it really that uncommon to use it for whitelisting? As for kernel vulnerabilities being a weakness of sandboxing in general: if anyone still doesn't understand that by now, it must be willful, and I don't know if they can be helped.
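To make the whitelist-vs-blacklist distinction concrete, here is a rough sketch in Python using the JSON shape of Docker seccomp profiles (`defaultAction` plus a `syscalls` rule list). Both profiles are hypothetical and far too small for a real workload, and `action_for` is a toy evaluator I made up for illustration (first matching rule wins, otherwise the default), not how the kernel or libseccomp actually evaluates filters.

```python
import json

# The three io_uring syscalls (added in Linux 5.1).
IO_URING_SYSCALLS = ["io_uring_setup", "io_uring_enter", "io_uring_register"]

# Hypothetical allowlist: deny by default, permit only what is named.
# A list like this written before io_uring existed blocks it automatically.
ALLOWLIST_PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "close", "exit_group"],
         "action": "SCMP_ACT_ALLOW"},
    ],
}

# Hypothetical denylist: allow by default, block only what is named.
# Anything the author did not think of (e.g. io_uring) stays reachable.
DENYLIST_PROFILE = {
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [
        {"names": ["kexec_load", "init_module"],
         "action": "SCMP_ACT_ERRNO"},
    ],
}


def action_for(profile, syscall):
    """Toy model (assumption): first rule naming the syscall wins."""
    for rule in profile["syscalls"]:
        if syscall in rule["names"]:
            return rule["action"]
    return profile["defaultAction"]


def exposed(profile, syscalls=IO_URING_SYSCALLS):
    """Return the syscalls from `syscalls` that the profile would allow."""
    return [s for s in syscalls if action_for(profile, s) == "SCMP_ACT_ALLOW"]


def deny_io_uring(profile):
    """Return a copy of the profile with an explicit io_uring deny rule first."""
    patched = json.loads(json.dumps(profile))  # cheap deep copy
    patched["syscalls"].insert(
        0, {"names": list(IO_URING_SYSCALLS), "action": "SCMP_ACT_ERRNO"}
    )
    return patched


if __name__ == "__main__":
    print("allowlist exposes:", exposed(ALLOWLIST_PROFILE))
    print("denylist exposes:", exposed(DENYLIST_PROFILE))
    print("patched denylist exposes:", exposed(deny_io_uring(DENYLIST_PROFILE)))
```

The point of the sketch is only the default: with deny-by-default, a new syscall is unreachable until someone adds it, whereas with allow-by-default someone has to notice it and block it.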
No matter how you mask off attack surface for the kernel, you're not super likely to want to disable io_uring, is the point I'm making. It's easy to find recent threads here with people sticking up for shared-kernel multitenant isolation.
(Be forewarned that I'm talking my book a bit here, since we have a commercial thingy built on multitenant VMM isolation.)
BTW, while we're on the topic: what do you think about having a heavy host kernel with a guest VMM attached to the network via a hardened Firecracker and a dedicated network interface? Would you feel it's 'better' than shared kernel/OS + namespaces? Or is it 'smallest hardened root hypervisor or no go'? Not sure I'm making sense...
The heavyweight host (which is the normal state of affairs) is a problematic attack surface; moving the workload into a hardened VMM on top of it improves security regardless.