Sure, they're not looking for headlines, but I wouldn't call it "honest". A chart with a linear scale that shows the full range from 0 to max (no truncated baseline that makes small differences look huge) would be far more honest in my opinion.
Skimming this I did not really understand the programming model difference from glommio or tokio-uring. Aside from striving to be cross-platform, how is this significantly different?
It’s frequently used to describe the problem of calling an async function from a sync one or vice versa. There are other applications of the term though (e.g. pure vs impure functions).
I think it just happens to be that we like assigning colors to differentiate things in the mathematical side of CS.
The associated video has quite a cool intro [0]; after that, however, it’s fairly slow and lacks detail, apart from the perf and cross-platform characteristics (which are cool).
If you're doing sequential I/O, you will hardly be able to beat the standard library - it is synchronous and goes directly to the operating system, which does read-ahead. 40+ years of OS design have gone into this. io_uring can't beat it either; it's just too ubiquitous a use case not to be hyper-optimized.
If you do any non-sequential processing, let's say a database, then you can see the standard library show its weaknesses - feeding the operating system multiple requests at once becomes more efficient than waiting for each request to finish, as the synchronous std API forces you to.
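To make the difference concrete, here is a rough sketch using only std (Unix-only, since it relies on `FileExt`). The function names `read_serial` and `read_concurrent` are made up for illustration, and one thread per request is just a stand-in for a real submission queue: a completion backend would keep the same requests in flight without the threads.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::thread;

/// One request at a time: each read must finish before the next is issued.
pub fn read_serial(file: &File, offsets: &[u64], len: usize) -> io::Result<Vec<Vec<u8>>> {
    offsets
        .iter()
        .map(|&off| {
            let mut buf = vec![0u8; len];
            file.read_exact_at(&mut buf, off)?;
            Ok(buf)
        })
        .collect()
}

/// All requests are handed to the OS before any result is awaited.
/// Assumption: a thread per request stands in for a submission queue.
pub fn read_concurrent(file: &File, offsets: &[u64], len: usize) -> io::Result<Vec<Vec<u8>>> {
    thread::scope(|s| {
        // Issue every read first, so the OS sees them all at once.
        let handles: Vec<_> = offsets
            .iter()
            .map(|&off| {
                s.spawn(move || {
                    let mut buf = vec![0u8; len];
                    file.read_exact_at(&mut buf, off)?;
                    Ok::<_, io::Error>(buf)
                })
            })
            .collect();
        // Only then collect the results, in the order they were requested.
        handles
            .into_iter()
            .map(|h| h.join().expect("reader thread panicked"))
            .collect()
    })
}
```

Both functions return the same data; the only difference is how many requests the OS can see and schedule at the same time.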
I've always wondered why completion i/o systems require you to hand them a buffer rather than a buffer pool. Handing them a buffer means that you are keeping a huge amount of memory allocated when you have a huge number of connections.
For TCP, you'd generally want to have a buffer pool and issue one repeating request that takes buffers out from that pool. This is what io_uring allows you to do.
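As a rough illustration of the idea (this is not io_uring's actual interface; `BufferPool` and `complete_recv` are hypothetical names), a pool only hands out a buffer at the moment data arrives, so memory scales with the data in flight rather than with the number of open connections:

```rust
/// A fixed set of equally sized buffers shared by all connections.
/// Hypothetical sketch, not the API of io_uring or of any crate.
pub struct BufferPool {
    free: Vec<Vec<u8>>,
    buf_size: usize,
}

impl BufferPool {
    pub fn new(count: usize, buf_size: usize) -> Self {
        let free = (0..count).map(|_| vec![0u8; buf_size]).collect();
        Self { free, buf_size }
    }

    /// Taken by the backend only when data has actually arrived.
    pub fn take(&mut self) -> Option<Vec<u8>> {
        self.free.pop()
    }

    /// Returned by the application once it is done with the data.
    pub fn give_back(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        buf.resize(self.buf_size, 0);
        self.free.push(buf);
    }

    pub fn available(&self) -> usize {
        self.free.len()
    }
}

/// Simulates one completion: copies `incoming` into a pooled buffer and
/// returns the buffer plus the number of bytes written. Returns `None`
/// when the pool is exhausted.
pub fn complete_recv(pool: &mut BufferPool, incoming: &[u8]) -> Option<(Vec<u8>, usize)> {
    let mut buf = pool.take()?;
    let n = incoming.len().min(buf.len());
    buf[..n].copy_from_slice(&incoming[..n]);
    Some((buf, n))
}
```

With this shape, ten thousand idle connections cost nothing in buffers; only connections that currently have unprocessed data hold one.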
The difficulty lies in the API's ability to expose that buffer pool nicely. I am looking for a better way to model this, but I have not found a clear solution that does not make assumptions about the backend in use.
For now, however, with careful design you can skip the heap-allocated buffers and supply a stack reference. So long as you have assurances the stack won't move (Rust's pinning guarantee), you should be able to pass that reference to the I/O system without problems. I'm not saying this results in zero copies being made - the backend usually has strict alignment requirements and the like, but hey, we at least get rid of one allocation!
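A minimal sketch of what I mean, with made-up names (`StackBuf`, `fake_read`, `read_on_stack`); `fake_read` is only a stand-in for a backend, which in reality would receive the pointer and length and fill the buffer on completion. Note that a plain `[u8; N]` is `Unpin`, so the wrapper needs `PhantomPinned` for pinning to mean anything:

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// Stack-allocated I/O buffer that is `!Unpin`, so once pinned it cannot be
/// moved by safe code. Hypothetical type, for illustration only.
pub struct StackBuf<const N: usize> {
    data: [u8; N],
    _pin: PhantomPinned,
}

impl<const N: usize> StackBuf<N> {
    pub fn new() -> Self {
        Self { data: [0; N], _pin: PhantomPinned }
    }

    /// Borrow the bytes without moving the buffer itself.
    pub fn bytes_mut(self: Pin<&mut Self>) -> &mut [u8] {
        // SAFETY: only the bytes are exposed; `Self` is never moved out.
        unsafe { &mut self.get_unchecked_mut().data[..] }
    }
}

/// Stand-in for a backend read (assumption: a real backend would be given
/// the address and length and would write into the buffer on completion).
pub fn fake_read(dst: &mut [u8], src: &[u8]) -> usize {
    let n = src.len().min(dst.len());
    dst[..n].copy_from_slice(&src[..n]);
    n
}

/// Reads into a pinned 16-byte stack buffer with no heap allocation.
/// Returns the byte count and whether the buffer address stayed the same.
pub fn read_on_stack(src: &[u8]) -> (usize, bool) {
    let mut buf = pin!(StackBuf::<16>::new());
    let before = buf.as_mut().bytes_mut().as_ptr();
    let n = fake_read(buf.as_mut().bytes_mut(), src);
    let after = buf.as_mut().bytes_mut().as_ptr();
    (n, before == after)
}
```

Pinning only covers the "won't move" part. With a real completion backend the buffer also has to stay alive until the operation completes or is cancelled, which this sketch does not model.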