What Stefan (the author) calls my vision, I call my arrogance. :)
Some bits of my plan, especially with respect to how the async runtime in QEMU is designed, were good. However, I had vastly underestimated the complexity of one step: the various disk images form a graph that can change on the fly, for example when you make a live snapshot of a VM disk. The method I had thought of for handling changes to the graph would have added a lot of technical debt. Fortunately it was NACKed by the maintainer Kevin Wolf and replaced with something better.

For more details on both my plan and what was actually done, see https://kvm-forum.qemu.org/2023/Multiqueue_in_the_block_laye... (slides, also linked from the post) or https://youtu.be/Ubped0PgvZI?si=IsckfZ7uDNYJNp_y (video).
The important thing, at every step, was being committed to getting it done. Some of the intermediate steps were better in terms of bugs fixed, but worse in terms of complexity because you had to juggle both the "good" locks and the legacy global locks.
This is really everybody else's work. All I did was some mentoring of Emanuele, who is also the first presenter in the linked video.
We didn't use formal methods across the board, but some small parts of the async runtime were validated with Spin. The Promela sources are in the QEMU source repository.
In the end, the locking changes are relatively low-tech. The verification part is where the magic happens, and TSA plus my call graph analysis tool vrc are enough for that.
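For the curious, here is roughly what TSA looks like. This is a minimal standalone sketch using the raw Clang attributes, not code from the QEMU tree (QEMU wraps the same attributes in its own macros, and all names below are made up for the example):

    /* Build with: clang -Wthread-safety -c tsa-sketch.c */
    #include <pthread.h>

    /* Declare the mutex wrapper as a "capability" that the analyzer tracks. */
    struct __attribute__((capability("mutex"))) tsa_mutex {
        pthread_mutex_t m;
    };

    /* Annotated declarations: callers of these functions acquire/release the
     * capability.  The definitions opt out of the analysis, because the
     * analyzer cannot see inside pthread_mutex_lock(). */
    static void tsa_lock(struct tsa_mutex *mu)
        __attribute__((acquire_capability(*mu)));
    static void tsa_unlock(struct tsa_mutex *mu)
        __attribute__((release_capability(*mu)));

    __attribute__((no_thread_safety_analysis))
    static void tsa_lock(struct tsa_mutex *mu)
    {
        pthread_mutex_lock(&mu->m);
    }

    __attribute__((no_thread_safety_analysis))
    static void tsa_unlock(struct tsa_mutex *mu)
    {
        pthread_mutex_unlock(&mu->m);
    }

    struct counter {
        struct tsa_mutex lock;
        /* Touching 'value' without holding 'lock' is a compile-time warning. */
        int value __attribute__((guarded_by(lock)));
    };

    /* Callers must already hold c->lock; the analyzer checks every call site. */
    static void counter_inc_locked(struct counter *c)
        __attribute__((requires_capability(c->lock)));

    static void counter_inc_locked(struct counter *c)
    {
        c->value++;
    }

    void counter_inc(struct counter *c)
    {
        tsa_lock(&c->lock);
        counter_inc_locked(c);
        tsa_unlock(&c->lock);
        /* Calling counter_inc_locked() outside the lock, or forgetting
         * tsa_unlock(), is reported by -Wthread-safety at build time. */
    }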
Rather than adding more fine-grained locking, would there have been another way? In particular, I wonder if the problem domain of QEMU could have benefited from a thread-per-core architecture. Do guest OSes try to pin high-TPS devices to individual cores these days, and if so, could that provide a natural way to shard the I/O workload?
I didn't go into the various ways in which the AioContext lock was replaced in the article. You're right, sometimes new fine-grained locks weren't necessary.
When there is really only one thread accessing some data, locking isn't needed. That's what was done for the SCSI emulation layer, where request processing happens in only one thread. Here is a new function that was introduced to schedule work on the thread that runs SCSI emulation (a rare operation that is not performance-critical, which lets the rest of the code avoid locks):
https://gitlab.com/qemu-project/qemu/-/blob/master/hw/scsi/s...
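Just to illustrate the pattern (this is a made-up sketch, not the actual helper behind that link): work gets bounced to the thread that owns the state with a one-shot bottom half on that thread's AioContext, something like:

    /* Illustrative only.  The DemoDevice type and function names are
     * invented for this example; only the aio_bh_schedule_oneshot() API
     * is real QEMU code. */
    #include "qemu/osdep.h"
    #include "block/aio.h"

    typedef struct DemoDevice DemoDevice;   /* stand-in for an emulated device */

    typedef struct ResetRequest {
        DemoDevice *dev;
    } ResetRequest;

    /* Runs in the device's "home" AioContext, so it may touch per-device
     * state without taking any lock. */
    static void demo_reset_bh(void *opaque)
    {
        ResetRequest *req = opaque;

        /* ... walk request lists, update counters, etc. on req->dev ... */

        g_free(req);
    }

    /* May be called from any thread.  The hot I/O path never sees a lock;
     * only this rare operation pays for the thread hop. */
    void demo_device_schedule_reset(DemoDevice *dev, AioContext *home_ctx)
    {
        ResetRequest *req = g_new0(ResetRequest, 1);

        req->dev = dev;
        /* aio_bh_schedule_oneshot() is thread-safe; the callback runs exactly
         * once, in the thread that polls home_ctx. */
        aio_bh_schedule_oneshot(home_ctx, demo_reset_bh, req);
    }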
QEMU's IOThreads let the user configure the threads and get something similar to a thread-per-core architecture. But if one thread becomes a bottleneck, then some form of thread synchronization is needed again, even with thread per core. Some problems can be parallelized, and those work well with thread per core.
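For example (image paths and IDs are placeholders), a command line along these lines gives each virtio-blk device its own IOThread, so their event loops run on separate host threads:

    qemu-system-x86_64 ... \
        -object iothread,id=io1 \
        -object iothread,id=io2 \
        -blockdev driver=qcow2,node-name=disk1,file.driver=file,file.filename=disk1.qcow2 \
        -blockdev driver=qcow2,node-name=disk2,file.driver=file,file.filename=disk2.qcow2 \
        -device virtio-blk-pci,drive=disk1,iothread=io1 \
        -device virtio-blk-pci,drive=disk2,iothread=io2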