What do you think are the next steps for a next-generation event loop?
I've been experimenting with barriers/phasers, LMAX Disruptors, and my own lock-free algorithms.
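For reference, the indexing trick at the core of a Disruptor-style ring buffer fits in a few lines. This is a single-threaded illustration under stated assumptions: one producer, one consumer, a power-of-two capacity, and Python standing in for a language with real atomics. `SequencedRingBuffer` and its methods are hypothetical names, not the LMAX Disruptor API.

```python
class SequencedRingBuffer:
    """Single-producer/single-consumer ring addressed by ever-increasing sequences.

    Items must not be None (None signals "empty" from try_consume).
    """

    def __init__(self, capacity: int) -> None:
        # A power-of-two capacity lets a sequence map to a slot with a bit mask.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1
        self._next_write = 0  # producer cursor: next sequence to publish
        self._next_read = 0   # consumer cursor: next sequence to consume

    def try_publish(self, item) -> bool:
        # Full when the producer is a whole lap ahead of the consumer.
        if self._next_write - self._next_read == len(self._slots):
            return False
        self._slots[self._next_write & self._mask] = item
        # Bumping the cursor after filling the slot is the "publish" step;
        # with real threads this store needs release ordering (acquire on read).
        self._next_write += 1
        return True

    def try_consume(self):
        # Empty when the consumer has caught up with the producer.
        if self._next_read == self._next_write:
            return None
        item = self._slots[self._next_read & self._mask]
        self._slots[self._next_read & self._mask] = None  # drop the reference
        self._next_read += 1
        return item

    def drain(self) -> list:
        # Batch read of everything published so far, in sequence order.
        items = []
        while (item := self.try_consume()) is not None:
            items.append(item)
        return items


ring = SequencedRingBuffer(4)
accepted = [ring.try_publish(i) for i in range(5)]
print(accepted)      # [True, True, True, True, False]
print(ring.drain())  # [0, 1, 2, 3]
```

Because the cursors only ever increase, "full" and "empty" are plain subtractions and there is no wrap-around flag to keep consistent between threads. In C++ or Rust the two cursors would be atomics, each written by only one side.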
I think it will be some form of multithreaded structured concurrency built on coroutines and io_uring.
I've been experimenting with decoupling sends and receives so they run independently and in parallel on multiple io_urings ("split parallel io"). That way, incoming traffic is processed separately from the stream that generates data to send: generating sends is not blocked by receive parsing, and vice versa.
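The decoupling itself can be modelled without io_uring. In this sketch, two plain threads stand in for two rings, in-memory byte chunks stand in for recv completions, and a `queue.Queue` is the only link between the sides. All function names are hypothetical. The receive thread parses newline-delimited requests and queues replies. The send thread keeps producing its own stream and only merges in replies that are already waiting.

```python
import queue
import threading


def receive_loop(incoming: list[bytes], to_sender: queue.Queue) -> None:
    # Receive side: reassemble newline-delimited requests from raw chunks
    # and hand a reply to the send side. It never waits for a send.
    buffer = b""
    for chunk in incoming:  # stand-in for recv completions
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            to_sender.put(b"ACK " + line)
    to_sender.put(None)  # sentinel: the receive side is finished


def send_loop(from_receiver: queue.Queue, outbox: list[bytes], ticks: int) -> None:
    # Send side: produces its own stream ("TICK n") and merges in replies
    # only when they are already available, so parsing never stalls it.
    tick, receiver_done = 0, False
    while tick < ticks or not receiver_done:
        if tick < ticks:
            outbox.append(b"TICK %d" % tick)  # stand-in for a send submission
            tick += 1
            try:
                reply = from_receiver.get_nowait()
            except queue.Empty:
                continue
        else:
            reply = from_receiver.get()  # nothing left to generate: wait
        if reply is None:
            receiver_done = True
        else:
            outbox.append(reply)


def split_parallel_io(incoming: list[bytes], ticks: int) -> list[bytes]:
    channel: queue.Queue = queue.Queue()  # the only link between the two sides
    outbox: list[bytes] = []
    rx = threading.Thread(target=receive_loop, args=(incoming, channel))
    tx = threading.Thread(target=send_loop, args=(channel, outbox, ticks))
    rx.start()
    tx.start()
    rx.join()
    tx.join()
    return outbox


print(split_parallel_io([b"GET a\nGE", b"T b\n"], ticks=3))
```

The interleaving of `TICK` and `ACK` messages varies from run to run, but each stream stays in order. A native version would likely replace `queue.Queue` with a lock-free SPSC queue and pin each side to its own core.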
From section 5.1.5, "Summary of Benchmarking Results" (page 44):
> Of the three different applications and frameworks, DPDK performs best in all aspects concerning throughput, packet loss, packet rate, and latency. The fastest throughput of DPDK was measured at about 25 Gbit/s and the highest packet rate was measured at about 9 million [packets per second]. The packet loss for DPDK stays under 10% most of the time, but for packet sizes 64 bytes and 128 bytes, and for transmission rates of 32% and over, the packet loss reaches a maximum of 60%. Latency stays at around 12 μs for all sizes and transmission rates under 32% and reaches a maximum latency of 1 ms for packets of size 1518 bytes with transmission rates of 64% and above.
> Based on these results, it was determined that DPDK can optimally handle transmission rates up to around 64 bytes, above rate 64% performance increases are non-existent while packet loss and latency increase.
> io_uring had a maximum throughput of 5.0 Gbit/s and was achieved at a transmission rate of 16% or higher when the packet size was 1518 bytes. The packet loss was significant, especially for transmission rates over 16%, and when packet size was below 1280 bytes. Generally, the packet loss decreased when packet sizes increased for all different transmission rates. The packet rate reached a maximum of approximately 460,000 packets per second. For higher transmission rates and for larger packet sizes, the packet rate decreased. This reached a minimum of around 40,000 packets per second for a transmission rate of 1%. The latency of io_uring is highest at size 1518 and transmission rate 100% with a latency of around 1.3 ms. For lower transmission rates under 64%, the latency decreases when packet size increase, reaching a minimum of around 20 to 30 μs.
> The results of running io_uring at different transmission rates show that io_uring reaches its best performance on our system at around transmission rate 16%. Above rate 16% there are no improvements in performance and latency and packet loss increase.
OK, 25 Gbit/s vs 5 Gbit/s seems like a huge difference, especially since io_uring also had higher packet loss.
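The throughput gap actually understates the per-packet gap. Here is a quick calculation using only the numbers quoted above. It assumes the reported throughput counts frame bytes only, without the Ethernet preamble and inter-frame gap. `packet_rate` is a hypothetical helper.

```python
def packet_rate(throughput_bps: float, frame_bytes: int, overhead_bytes: int = 0) -> float:
    """Packets per second implied by a throughput and a fixed frame size.

    overhead_bytes models per-frame wire overhead that the throughput figure
    does not count (20 for Ethernet preamble + inter-frame gap); the default
    of 0 assumes the throughput counts frame bytes only.
    """
    return throughput_bps / ((frame_bytes + overhead_bytes) * 8)


# Figures from the quoted summary.
dpdk_gbps, uring_gbps = 25.0, 5.0
dpdk_max_pps, uring_max_pps = 9_000_000, 460_000

throughput_ratio = dpdk_gbps / uring_gbps         # 5.0
packet_rate_ratio = dpdk_max_pps / uring_max_pps  # about 19.6

# io_uring's 5.0 Gbit/s was reported at 1518-byte packets.
uring_pps_at_1518 = packet_rate(uring_gbps * 1e9, 1518)  # about 411,726 pps

print(f"{throughput_ratio:.1f}x throughput, {packet_rate_ratio:.1f}x packet rate")
# 5.0x throughput, 19.6x packet rate
print(f"io_uring at 1518 B: {uring_pps_at_1518:,.0f} pps")
# io_uring at 1518 B: 411,726 pps
```

So DPDK moved about 5× the bits but peaked at roughly 20× the packets per second. io_uring's 5.0 Gbit/s at 1518 bytes already implies about 412,000 packets per second, close to its reported ~460,000 pps ceiling. That pattern is consistent with per-packet cost, rather than raw bandwidth, being io_uring's limit in this setup.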
I'm also interested in Seastar and reactors.
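For comparison with io_uring's completion model, the classic reactor pattern fits in a few lines. One thread waits on a readiness selector (epoll/kqueue/select via Python's `selectors.DefaultSelector`) and dispatches each ready socket to its registered callback. Seastar runs one reactor per core. This is only a generic single-reactor sketch, not Seastar's API: `Reactor` and `demo` are hypothetical names, and a socket pair stands in for real network traffic.

```python
import selectors
import socket


class Reactor:
    """Single-threaded, readiness-based reactor: one selector, one callback per socket."""

    def __init__(self) -> None:
        self._sel = selectors.DefaultSelector()  # epoll/kqueue/select, per platform

    def register(self, sock: socket.socket, on_readable) -> None:
        sock.setblocking(False)
        self._sel.register(sock, selectors.EVENT_READ, data=on_readable)

    def unregister(self, sock: socket.socket) -> None:
        self._sel.unregister(sock)

    def run_once(self, timeout=None) -> int:
        # Demultiplex: block until registered sockets are readable, then
        # dispatch each ready socket to its handler. Returns events handled.
        events = self._sel.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)
        return len(events)


def demo() -> list[bytes]:
    received: list[bytes] = []
    reactor = Reactor()
    client, server = socket.socketpair()  # stand-in for a real connection

    def on_readable(sock: socket.socket) -> None:
        data = sock.recv(4096)
        if data:
            received.append(data)
        else:  # empty read: the peer closed the connection
            reactor.unregister(sock)
            sock.close()

    reactor.register(server, on_readable)
    client.sendall(b"hello")
    reactor.run_once(timeout=1.0)  # dispatches the b"hello" read
    client.close()
    reactor.run_once(timeout=1.0)  # dispatches the EOF and cleans up
    return received


print(demo())  # [b'hello']
```

The key difference from io_uring: a reactor is told that a socket is ready and then issues the read itself, whereas io_uring reports that an already-submitted read has completed, which is closer to the proactor pattern.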