The big difference is Wayland does not do direct access to hardware -- it instead gives access to a backing buffer which is then copied via hardware acceleration. This does make the protocol more complicated, but also ensures that the programs are absolutely contained by their clipping region.
The backing buffers are not so much copied as shared.
A client application allocates backing surfaces (wl-egl does this for you for simple cases) and renders into them. When you call swap buffers, it sends a handle to the client's backing surface to the server via IPC.
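A minimal sketch of that handoff over the low-level wl_shm path (the names `shm` and `surface` are assumed to have been bound from the registry already; the EGL path does the same thing with GPU-side dma-buf handles instead of shared memory):

    /* Sketch: hand the compositor a reference to a CPU-side backing buffer.
     * Assumes `shm` (struct wl_shm *) and `surface` (struct wl_surface *)
     * were already bound from the registry; error handling omitted. */
    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <unistd.h>
    #include <wayland-client.h>

    static struct wl_buffer *make_buffer(struct wl_shm *shm, int width, int height)
    {
        int stride = width * 4;                  /* XRGB8888 */
        int size = stride * height;
        int fd = memfd_create("backing", 0);     /* anonymous shared memory */
        ftruncate(fd, size);
        void *pixels = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        /* ... render into `pixels` here ... */
        (void)pixels;

        /* The compositor maps the same fd: the pixels are shared, not copied. */
        struct wl_shm_pool *pool = wl_shm_create_pool(shm, fd, size);
        struct wl_buffer *buf = wl_shm_pool_create_buffer(
            pool, 0, width, height, stride, WL_SHM_FORMAT_XRGB8888);
        wl_shm_pool_destroy(pool);
        close(fd);
        return buf;
    }

    /* "Swap buffers" boils down to telling the compositor which buffer to show. */
    static void present(struct wl_surface *surface, struct wl_buffer *buf, int w, int h)
    {
        wl_surface_attach(surface, buf, 0, 0);
        wl_surface_damage(surface, 0, 0, w, h);
        wl_surface_commit(surface);
    }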
Honestly that's one of the most amazing things about wayland, or more specifically the improvements in the Linux graphics stack over the last decade. Offscreen rendering is now implicitly supported, along with being able to share references to buffers on hardware devices between processes, rather than something like ANGLE, where the opengl commands themselves were sent between processes.
But unless you are running a full screen app, at some point the compositor will have to take all those pixels from the window's backing buffer and blit them into the framebuffer, moving millions of pixels every frame.
This is something that really needs video acceleration. It works today, but it would not have been feasible in the past, when some people still had to rely on VESA card interfaces.
I would assume that there is a proper damage protocol in place that would move millions of pixels only when there is an actual change.
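There is: a Wayland client flags only the region that actually changed before committing, so an idle window costs essentially nothing. A tiny sketch (`surface`, `buf` and the dirty rectangle are placeholders):

    /* Mark just the rectangle that changed; the compositor can recomposite
     * only that area, or skip the window entirely if nothing was damaged. */
    wl_surface_attach(surface, buf, 0, 0);
    wl_surface_damage(surface, dirty_x, dirty_y, dirty_w, dirty_h);
    wl_surface_commit(surface);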
The only way to avoid that is to give applications direct access to the front buffer and use a stencil buffer or some form of masks to prevent applications from drawing over each other.
Modern graphics hardware can composite multiple buffers at scan-out time (which does not consume extra memory bandwidth), but it's not clear to me whether anyone besides perhaps DWM uses that.
Multiple means about 4 (that's the number for Skylake & Kaby Lake gen Intel), and one of them is the cursor plane. There may also be miscellaneous limitations regarding overlaps, so it might not be generally usable. Android in its earlier years used it for the notification area, as it basically split the screen vertically when the notification shade was moved.
Additionally, these buffers can be scaled at scan-out time. One thing this is used for is emulating lower resolutions for XRandR clients under XWayland (Wayland doesn't let random apps switch the resolution).
Under macOS, scaling at scan-out time is used for fractional scaling of the entire framebuffer without using the GPU.
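On Linux this plane machinery is exposed through KMS. A rough sketch using the legacy libdrm call (real compositors use the atomic API; `fd`, `plane_id`, `crtc_id` and `fb_id` are assumed to come from the usual drmModeGetResources / drmModeGetPlaneResources discovery):

    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Put a client buffer (already imported as KMS framebuffer `fb_id`) on a
     * hardware overlay plane.  Source coordinates are 16.16 fixed point; a
     * source rectangle that differs from the CRTC rectangle asks the display
     * engine to scale at scan-out time, with no GPU pass and no extra copy. */
    static int show_on_overlay(int fd, uint32_t plane_id, uint32_t crtc_id,
                               uint32_t fb_id,
                               uint32_t src_w, uint32_t src_h,    /* buffer size */
                               int32_t crtc_x, int32_t crtc_y,
                               uint32_t crtc_w, uint32_t crtc_h)  /* on-screen rect */
    {
        return drmModeSetPlane(fd, plane_id, crtc_id, fb_id, 0,
                               crtc_x, crtc_y, crtc_w, crtc_h,
                               0, 0, src_w << 16, src_h << 16);
    }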
I was under the impression that it was mostly the mobile GPUs that supported blending a large number of planes at scanout time. I've written software for random ARM SoCs where there were a dozen planes or so that you had to program the ordering and bounds of. The first was typically the default framebuffer, another was the cursor, two were the outputs of the hardware video decoders, and the rest were up to the application developer to use.
The big desktop GPUs seem to only have the standard framebuffer, a cursor plane, and a small number (<= 2) of overlay planes. It seems that the general consensus is that they tend to have such a ridiculous amount of horsepower that rendering everything into an output buffer and displaying that won't even kick the GPU out of its idle power state.
That being said, I had a few hours of fun hacking glxgears and glmark2 to render into the cursor plane on Wayland.
It only comes 15 years after win32 and macOS, which have both allowed sharing graphics card buffers across processes for a loooong time (check out Spout and Syphon for one of the main uses of this tech).
Exactly. This is why I was so happy about it. It was a pattern I was familiar with on macOS (IOSurface) and I'd known Windows had it.
I'd noticed at the time that Chrome used ANGLE (a framework to present an OpenGL ES interface on top of a different graphics framework at runtime, with the option of going interprocess) so that the tab sandbox processes could draw into the GPU sandbox process. I was curious why (at least on macOS) they weren't rendering into an IOSurface and sharing the handle with the other processes. (The answer being: use the lowest common denominator of features across the platforms you want to support.)
By comparison, the state of 3D acceleration was sad on Linux. You couldn't touch your fancy GPU without an X server running. Even if wayland itself fails as a successor to Xorg, the work that was done to separate the graphics drivers, 3D stack, and buffer management from being a monolithic plugin to Xorg was well worth the time investment.
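The Linux analogue of sharing an IOSurface handle today is passing a dma-buf file descriptor between processes. A rough sketch with GBM (the device node path is an assumption, and the SCM_RIGHTS plumbing to actually send the fd over a Unix socket is left out):

    #include <fcntl.h>
    #include <gbm.h>

    /* Allocate a GPU buffer and export it as a dma-buf fd.  The fd is an
     * ordinary file descriptor, so another process can receive it over a
     * Unix socket (SCM_RIGHTS) and import it, e.g. via
     * EGL_EXT_image_dma_buf_import -- only a reference to the buffer crosses
     * the IPC boundary, never the rendering commands themselves. */
    static int export_gpu_buffer(int width, int height)
    {
        int drm_fd = open("/dev/dri/renderD128", O_RDWR); /* render node; path varies */
        struct gbm_device *dev = gbm_create_device(drm_fd);
        struct gbm_bo *bo = gbm_bo_create(dev, width, height,
                                          GBM_FORMAT_XRGB8888,
                                          GBM_BO_USE_RENDERING);
        return gbm_bo_get_fd(bo); /* dma-buf handle, shareable across processes */
    }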
I strongly disagree with some points made in the document:
> Window borders: Do them in the application library!
Client side decorations have the huge disadvantage that you can't easily close unresponsive applications. To do so, the compositor would need to somehow detect whether an application has crashed, which means nothing less than literally solving the halting problem.
> in particular, all the X facilities for supporting window managers I would just flush.
In retrospect this is a huge mistake. Some niche window managers today have a more active community than the whole Wayland ecosystem. In fact the most popular haskell program is a window manager. So something must be right about enabling the modularity that makes window managers possible.
> Client side decorations have the huge disadvantage that you can't easily close unresponsive applications. To do so, the compositor would need to somehow detect whether an application has crashed, which means nothing less than literally solving the halting problem.
You can close unresponsive applications whether they repaint their decorations or not; besides, chances are that when they don't repaint their decorations, they won't process your WM_DELETE_WINDOW anyway.
What you can't do without CSD is synchronized painting of decorations and window content. With CSD, you can have a pixel-perfect window; without it, you have to coordinate between several processes, and chances are you won't manage it all in 16.6 ms (for 60 Hz).
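For what it's worth, xdg-shell already bakes the timeout heuristic into the protocol: the compositor pings, a live client pongs, and a hung one never answers, which is all the "detection" that's needed in practice. A client-side sketch (assuming `wm_base` was bound from the registry):

    /* xdg-shell liveness check: a client whose event loop is stuck never
     * answers the ping, so the compositor can mark it unresponsive after a
     * timeout -- no halting-problem oracle required. */
    #include "xdg-shell-client-protocol.h"   /* generated by wayland-scanner */

    static void handle_ping(void *data, struct xdg_wm_base *wm_base, uint32_t serial)
    {
        xdg_wm_base_pong(wm_base, serial);
    }

    static const struct xdg_wm_base_listener wm_base_listener = {
        .ping = handle_ping,
    };

    /* after binding the global:
     *     xdg_wm_base_add_listener(wm_base, &wm_base_listener, NULL);      */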
> Some niche window managers today have a more active community than the whole Wayland ecosystem.
Sorry, no. Wayland is quite big now; just some niche DEs ignore it, due to lack of manpower, not on principle.
> In fact the most popular haskell program is a window manager.
If I had to guess, I would say that the most popular haskell program is a batch document converter. It is used outside X11 too. Not that haskell usage is a relevant popularity indicator.
> So something must be right about enabling the modularity that makes window managers possible.
There are wayland compositors aiming exactly for that. See Sway.
I still have nightmares battling the win32 windowing system. Look at all the windows when resizing: how they stutter, streak, flip. That's the broken windowing system. At its core, it's a classic race condition: the window size, controlled by the OS, is updated, and the software needs to repaint it; if it doesn't do so before the next compositor redraw, yikes.
Sadly, nobody seems to care, and Microsoft "fixed" it in UWP.
The win32 window system gives you a lot of options, and it's sometimes hard to figure out what combination of callbacks/flags you need.
Windows shouldn't be stuttering or streaking if you do it right, which applications like Windows Explorer and Chrome do correctly. Applications like Skype and Discord seem to paint the whole window using some widget library, so they have redraw issues while resizing the window.
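One common piece of that combination, sketched below (not the whole recipe): suppress the background erase so each frame is painted exactly once, and repaint synchronously on resize instead of waiting for the next queued WM_PAINT.

    #include <windows.h>

    /* Fragment of a window procedure aimed at flicker-free resizing. */
    LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        switch (msg) {
        case WM_ERASEBKGND:
            return 1;                        /* skip GDI's clear-to-brush pass */
        case WM_SIZE:
            InvalidateRect(hwnd, NULL, FALSE);
            UpdateWindow(hwnd);              /* repaint now, with the new size */
            return 0;
        case WM_PAINT: {
            PAINTSTRUCT ps;
            HDC hdc = BeginPaint(hwnd, &ps);
            /* ... draw the whole client area here, ideally double-buffered ... */
            (void)hdc;
            EndPaint(hwnd, &ps);
            return 0;
        }
        }
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }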
Uhm... what Windows are you using? On Win 10, Chrome does "lag" if you resize the window. It's very noticeable, but people seem to accept it.
Windows Explorer is almost perfect, but even that one sometimes draws the old-sized content if you resize the window.
Sublime seems perfect, Visual Studio stutters quite badly.
I use a Vulkan swapchain. To get a swapchain image, you have to say which window it is for and must use the current size; then you can do all the drawing and present. If the compositor decides to update in between, it will show the old swapchain image with the new window size.
It’s complicated and has a learning curve, but not terribly hard overall.
The main reason for these complications is backward compatibility. Windows 3.x only supported a single CPU core and had cooperative multitasking. Windows 95 had preemptive multitasking, but still only supported a single core.
> it's a classic race condition
It’s not a race condition, because windows have thread affinity. All these messages, WM_SIZE, WM_PAINT and the rest of them, are serialized by the OS into a single queue consumed by the thread that owns the window.
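That queue is what the classic message pump drains; everything for the thread's windows, resize included, goes through this one loop:

    #include <windows.h>

    /* The owning thread's message pump: WM_SIZE, WM_PAINT, input and the rest
     * arrive one at a time and are handed to the window procedure. */
    static void run_message_loop(void)
    {
        MSG msg;
        while (GetMessage(&msg, NULL, 0, 0) > 0) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);   /* calls the WndProc for msg.hwnd */
        }
    }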
I did not mean a thread race condition; classic was the wrong word. I meant a race condition between acquiring a swapchain image (for the current window and size) and finishing drawing to it. There is no way to stop the compositor from updating the window in between. The correct way of doing it would be to change the window size when presenting the new swapchain image.
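The usual mitigation doesn't remove that race, it just reacts to it: treat VK_ERROR_OUT_OF_DATE_KHR (and often VK_SUBOPTIMAL_KHR) from acquire/present as "the window changed under us" and rebuild the swapchain at the new size. A sketch; recreate_swapchain() is a placeholder for your own teardown/rebuild:

    #include <vulkan/vulkan.h>

    extern void recreate_swapchain(void);   /* re-query surface extent, rebuild */

    /* Returns 1 if a usable image was acquired, 0 if the frame should be skipped. */
    static int acquire_frame(VkDevice dev, VkSwapchainKHR swapchain,
                             VkSemaphore sem, uint32_t *image_index)
    {
        VkResult r = vkAcquireNextImageKHR(dev, swapchain, UINT64_MAX,
                                           sem, VK_NULL_HANDLE, image_index);
        if (r == VK_ERROR_OUT_OF_DATE_KHR) {
            /* The window was resized between frames; the old images no longer
             * match the surface, so rebuild and skip this frame. */
            recreate_swapchain();
            return 0;
        }
        /* VK_SUBOPTIMAL_KHR still allows presenting; many apps rebuild here too.
         * vkQueuePresentKHR can return the same codes and is handled likewise. */
        return (r == VK_SUCCESS || r == VK_SUBOPTIMAL_KHR);
    }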
That's because win32 was developed before graphics cards in PCs were a thing. They made a design decision to make the application own redrawing the window if it got overdrawn.
I see macOS windows freeze up all the time with the beach ball.
Current GUI frameworks are based on a leaky abstraction in that there is a single drawing thread, and if that thread hangs, the UI hangs. (E.g., since all the classes and objects are using the same resource to get work done, any of them can stop any others from doing anything at all, despite private/protected/encapsulation etc.)