The idea of avoiding context switches by using memory regions to communicate with the kernel reminds me of FlexSC[0], which was a way to have "async" system calls and "flexible scheduling" of them.
I've never totally followed why AIO is its own set of interfaces instead of a generic async system call mechanism based on something like this. Write a bunch of requests to a page, each in the form of the registers that would make up the syscall, then wait for one to complete using a futex or something (where futex(2) remains an actual system call).
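To make that concrete, here is a rough Linux-only sketch in C of what such a page might look like. Everything in it is my own assumption, not FlexSC's or any kernel's actual layout: the names `sys_slot`, `sys_page`, `slot_submit`, `slot_wait` and `drain_page` are made up, and `drain_page` is an ordinary user-space function standing in for the kernel side, which would really run on another core.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

enum { SLOT_FREE = 0, SLOT_SUBMITTED = 1, SLOT_DONE = 2 };

/* Hypothetical layout: one slot holds the "registers" of one system call. */
struct sys_slot {
    _Atomic uint32_t status; /* SLOT_* state; doubles as the futex word */
    long nr;                 /* syscall number */
    long args[6];            /* up to six register arguments */
    long ret;                /* result, or -errno on failure */
};

#define NSLOTS 8
struct sys_page {
    struct sys_slot slots[NSLOTS];
};

/* futex(2) has no libc wrapper, so go through syscall(2). */
static long futex_op(_Atomic uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Fill in the registers, then publish the request by flipping the status. */
static void slot_submit(struct sys_slot *s, long nr, const long args[6])
{
    assert(atomic_load(&s->status) == SLOT_FREE);
    s->nr = nr;
    for (int i = 0; i < 6; i++)
        s->args[i] = args[i];
    atomic_store_explicit(&s->status, SLOT_SUBMITTED, memory_order_release);
}

/* Sleep on the slot's futex word until it is marked done, then free it. */
static long slot_wait(struct sys_slot *s)
{
    uint32_t st;
    while ((st = atomic_load_explicit(&s->status, memory_order_acquire)) != SLOT_DONE)
        futex_op(&s->status, FUTEX_WAIT, st); /* returns at once if status changed */
    long ret = s->ret;
    atomic_store_explicit(&s->status, SLOT_FREE, memory_order_release);
    return ret;
}

/* Stand-in for the kernel side: run every submitted slot, wake its waiters.
 * Returns how many requests were completed in this pass. */
static int drain_page(struct sys_page *p)
{
    int completed = 0;
    for (int i = 0; i < NSLOTS; i++) {
        struct sys_slot *s = &p->slots[i];
        if (atomic_load_explicit(&s->status, memory_order_acquire) != SLOT_SUBMITTED)
            continue;
        long r = syscall(s->nr, s->args[0], s->args[1], s->args[2],
                         s->args[3], s->args[4], s->args[5]);
        s->ret = (r == -1) ? -errno : r; /* mimic the raw kernel convention */
        atomic_store_explicit(&s->status, SLOT_DONE, memory_order_release);
        futex_op(&s->status, FUTEX_WAKE, INT_MAX);
        completed++;
    }
    return completed;
}
```

The submitter would call `slot_submit` on a few slots and later `slot_wait` on whichever result it needs, while something else (here, a thread looping over `drain_page`; in the real thing, the kernel) completes them. The only actual system call on the submitting side is the futex wait, and even that is skipped when the result is already there.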
[0]: https://www.usenix.org/legacy/events/osdi10/tech/full_papers...