We don't just paste pixels on the screen; we composite them. This involves alpha blending, applying an orthographic projection to map the scene onto the output, and more. Even in two dimensions, OpenGL is very useful for this. On systems without suitable hardware, you should be able to use OpenGL in software (via Mesa), and it will still be very fast for this kind of workload.
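To make "composite" a bit more concrete, here is a minimal per-pixel sketch of the alpha-blending part. It assumes straight (non-premultiplied) alpha, an opaque background, and surfaces stacked bottom to top; the function names are hypothetical. A real compositor does this on the GPU (or in Mesa's software rasterizer) for every pixel at once rather than in a Python loop.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]


def blend_over(src: RGBA, dst: RGB) -> RGB:
    """Porter-Duff "over": straight-alpha source onto an opaque destination."""
    r, g, b, a = src
    return (
        r * a + dst[0] * (1.0 - a),
        g * a + dst[1] * (1.0 - a),
        b * a + dst[2] * (1.0 - a),
    )


def composite_pixel(background: RGB, layers: Sequence[RGBA]) -> RGB:
    """Blend surface layers back-to-front (bottom layer first) over the background."""
    out = background
    for layer in layers:
        out = blend_over(layer, out)
    return out
```

For example, a half-transparent red window over a blue desktop, `composite_pixel((0.0, 0.0, 1.0), [(1.0, 0.0, 0.0, 0.5)])`, yields `(0.5, 0.0, 0.5)`.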
Ah, thanks for that clarification; I'm pretty ignorant about graphics. Good to know that Mesa will be fast in this case. I tried running full-blown GNOME on an ARM board (Wandboard) without proper driver support for the GPU, and GNOME Shell used ~40% of a CPU at idle. I'm guessing that something like Sway, even though it uses OpenGL, is less demanding.
Even with software rendering, a wlroots compositor should use essentially 0% CPU when idle thanks to damage tracking (i.e., only redrawing the parts of the screen that have changed).
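A rough sketch of the damage-tracking idea: clients report which rectangles changed, and the compositor redraws only those on the next frame. With no new damage, the frame is skipped entirely, which is why an idle desktop costs nothing. The class and method names here are hypothetical and are not wlroots' actual API; real compositors also merge overlapping regions and only schedule frames when damage arrives, rather than checking every time.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def clip(self, other: "Rect") -> Optional["Rect"]:
        """Intersection with another rect, or None if they don't overlap."""
        x0, y0 = max(self.x, other.x), max(self.y, other.y)
        x1 = min(self.x + self.w, other.x + other.w)
        y1 = min(self.y + self.h, other.y + other.h)
        if x1 <= x0 or y1 <= y0:
            return None
        return Rect(x0, y0, x1 - x0, y1 - y0)


class DamageTracker:
    """Hypothetical per-output damage tracker (illustrative only)."""

    def __init__(self, output: Rect) -> None:
        self.output = output
        self.pending: List[Rect] = []

    def add_damage(self, rect: Rect) -> None:
        # Ignore damage that falls entirely outside the output.
        clipped = rect.clip(self.output)
        if clipped is not None:
            self.pending.append(clipped)

    def render_frame(self, redraw: Callable[[Rect], None]) -> int:
        """Redraw only damaged regions; return how many were redrawn."""
        if not self.pending:
            return 0  # nothing changed: skip rendering entirely
        damaged, self.pending = self.pending, []
        for rect in damaged:
            redraw(rect)
        return len(damaged)
```

On an idle desktop, `add_damage` is never called, so `render_frame` returns immediately without touching a single pixel.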