
Right, the client->compositor buffer is shared.

But unless you are running a full-screen app, at some point the compositor will have to take all those pixels from each window's backing buffer and blit them into the framebuffer, moving millions of pixels every frame.

This is something that really needs video acceleration. It works today, but it would not have been feasible in the past, when some people still had to rely on VESA card interfaces.
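
To put a rough number on "millions of pixels every frame", here is a back-of-the-envelope sketch. The resolution, pixel format and refresh rate are assumptions chosen for illustration (1920x1080, 32-bit pixels, 60 Hz), and it models the worst case where the whole screen is recomposed on every frame.

```python
def blit_bandwidth_bytes_per_sec(width, height, bytes_per_pixel=4, fps=60):
    """Estimate memory traffic for copying one full frame per refresh.

    Each pixel is read once from a window buffer and written once to the
    framebuffer, hence the factor of 2.
    """
    return width * height * bytes_per_pixel * fps * 2


# Assumed display: 1920x1080, 32-bit pixels, 60 Hz.
full_hd = blit_bandwidth_bytes_per_sec(1920, 1080)
print(f"{full_hd / 1e6:.0f} MB/s")  # prints "995 MB/s"
```

That is about 2 million pixels per frame and close to 1 GB/s of memory traffic under these assumptions.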




I would assume that there is a proper damage protocol in place that would move "millions of pixels" only when there is an actual change.

The only way to avoid that is to give applications direct access to the front buffer and use a stencil buffer or some form of mask to prevent applications from drawing over each other.
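
A minimal sketch of what damage tracking buys you, assuming the client reports changed regions as (x, y, width, height) rectangles that lie inside the buffer. The function name and the list-of-rows buffer representation are made up for illustration; this is not any real compositor's API.

```python
def blit_damage(src, dst, damage_rects):
    """Copy only the damaged rectangles from src to dst.

    src and dst are lists of rows of equal size. Each rect is
    (x, y, width, height) and is assumed to be within bounds.
    Returns the number of pixels copied.
    """
    copied = 0
    for x, y, w, h in damage_rects:
        for row in range(y, y + h):
            dst[row][x:x + w] = src[row][x:x + w]
            copied += w
    return copied


# A 4x4 window where only a 2x2 region changed.
window = [[1] * 4 for _ in range(4)]
screen = [[0] * 4 for _ in range(4)]
pixels_moved = blit_damage(window, screen, [(1, 1, 2, 2)])  # 4 of 16 pixels
```

With an empty damage list nothing is copied at all, which is the idle-desktop case.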
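
The mask idea can be sketched like this, assuming a per-pixel ownership map that the window system maintains. All names here are hypothetical, and real hardware would do the check in the stencil test rather than in software.

```python
def draw_pixel(front, owner_mask, window_id, x, y, color):
    """Write straight into the shared front buffer, but only where the
    ownership mask says this window is the visible one at (x, y).

    Returns True if the pixel was written, False if it was masked out.
    """
    if owner_mask[y][x] != window_id:
        return False
    front[y][x] = color
    return True


# Two windows sharing a 2x2 screen: window 1 owns the left column,
# window 2 owns the right column.
front = [[0, 0], [0, 0]]
owner_mask = [[1, 2], [1, 2]]
ok = draw_pixel(front, owner_mask, 1, 0, 0, 255)       # allowed
blocked = draw_pixel(front, owner_mask, 1, 1, 0, 255)  # belongs to window 2
```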


Modern graphics hardware can composite multiple buffers at scan-out time (which does not consume extra memory bandwidth), but it's not clear to me whether anyone besides perhaps DWM uses that.


Multiple means about 4 (that's the number for Skylake and Kaby Lake generation Intel), and one of them is the cursor plane. There might also be miscellaneous limitations regarding overlaps, so it might not be generally usable. Android in the earlier years used it for the notification area, as it basically split the screen vertically when the notification shade was moved.

Additionally, these buffers can be scaled at scan-out time. So what it is used for is emulating lower resolutions for XRandR clients under XWayland (Wayland doesn't allow arbitrary apps to switch resolutions).

Under macOS, scaling at scan-out time is used for fractional scaling of the entire framebuffer without using the GPU.
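
To illustrate why a small plane count limits this, here is a toy sketch of the decision a compositor faces. It assumes 4 planes with 1 reserved for the cursor, as in the numbers above, and ignores the overlap and format restrictions entirely. The function and its policy (topmost surfaces get planes, the rest are GPU-composited into one plane) are made up for illustration, not taken from any real compositor.

```python
def assign_planes(surfaces, total_planes=4, cursor_planes=1):
    """Split surfaces (ordered bottom to top) into those scanned out
    directly from their own plane and those that must be composited
    by the GPU into a single shared plane.

    Returns (direct, composited).
    """
    usable = total_planes - cursor_planes
    if len(surfaces) <= usable:
        return list(surfaces), []
    # Too many surfaces: one plane holds the GPU-composited result,
    # and the remaining planes go to the topmost surfaces.
    direct_count = usable - 1
    split = len(surfaces) - direct_count
    return list(surfaces[split:]), list(surfaces[:split])


few = assign_planes(["desktop", "terminal", "video"])
many = assign_planes(["desktop", "a", "b", "c", "video"])
```

With three surfaces everything fits on its own plane; with five, only the top two do and the other three are rendered into one buffer first.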


I was under the impression that it was mostly the mobile GPUs that supported blending a large number of planes at scan-out time. I've written software for random ARM SoCs that had a dozen or so planes whose ordering and bounds you had to program. The first was typically the default framebuffer, another was the cursor, two were the outputs of the hardware video decoders, and the rest were up to the application developer to use.

The big desktop GPUs seem to only have the standard framebuffer, a cursor plane, and a small number (<= 2) of overlay planes. It seems that the general consensus is that they tend to have such a ridiculous amount of horsepower that rendering everything into an output buffer and displaying that won't even kick the GPU out of its idle power state.

That being said, I had a few hours of fun hacking glxgears and glmark2 to render into the cursor plane on Wayland.



