But that gets back to the basic issue here - the answer to "how should Zoom take screenshots if someone is using a Wayland compositor?" turns out to be "Well, you needed to implement a separate server that re-imagines video as streams, like PulseAudio. Then they have to interface with that. Of course, Wayland doesn't support that (it explicitly makes it hard, in fact), so someone needs to design a protocol extension [0] first to make all this work".
That is a very complex answer. On X, the answer is "grab the relevant window". I can see why Zoom gave the whole mess a pass until the community sorts itself out. I'm doing the same thing, although it looks like (checks) 15 years of patience has been rewarded and Wayland isn't too much of a backslide from X servers.
Unfortunately, automating I/O still isn't all that clear. I'd want clicking automated before I move over; it is probably possible, but figuring out how is a messy situation.
[0] Technically not a Wayland extension, they bypassed the entire system. So now "Wayland compositor" tells us nothing whatsoever about whether the system can take screenshots, and we have to look up compositor-specific support for screencapture. The situation is ... technically better than X, but still silly. They could have avoided years of wasted time by just acknowledging that people want to intercept data as part of the protocol. The situation is a mess and it is silly that we can't have generic Wayland screenshot applications.
The thing is, just grabbing the root window never worked 100% either. Apart from the permission problem, there were others. First, people noticed that the mouse cursor was missing, so screen-grabbing apps had to handle it as an extra step (get the cursor shape and its position, then superimpose it on the pixmap from the previous step). Then, if you happened to use Xv, the content was missing too; you got the chroma key instead. This use case was never handled while Xv was in use, and the problem only went away when Xv became obsolete.
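To make the cursor step concrete, here is a minimal sketch of the superimposing part only. It assumes the screenshot has already been read into rows of (r, g, b) tuples and the cursor into rows of (r, g, b, a) tuples; `overlay_cursor`, `pos` and `hotspot` are names made up for illustration, not any X library's API.

```python
def overlay_cursor(frame, cursor, pos, hotspot=(0, 0)):
    """Alpha-blend an RGBA cursor image onto an RGB frame.

    frame:   list of rows, each a list of (r, g, b) tuples
    cursor:  list of rows, each a list of (r, g, b, a) tuples
    pos:     (x, y) pointer position in frame coordinates
    hotspot: (x, y) offset of the "tip" inside the cursor image
    Returns a new frame; the input frame is left untouched.
    """
    out = [list(row) for row in frame]
    left = pos[0] - hotspot[0]
    top = pos[1] - hotspot[1]
    for cy, cursor_row in enumerate(cursor):
        for cx, (r, g, b, a) in enumerate(cursor_row):
            x, y = left + cx, top + cy
            # Clip: the cursor may hang over the edge of the screen.
            if not (0 <= y < len(out) and 0 <= x < len(out[y])):
                continue
            br, bg, bb = out[y][x]
            out[y][x] = (
                (r * a + br * (255 - a)) // 255,
                (g * a + bg * (255 - a)) // 255,
                (b * a + bb * (255 - a)) // 255,
            )
    return out
```

Even this toy version needs clipping and hotspot handling, which is the sort of detail every screen grabber had to get right on its own.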
The point is, you either implemented a fairly complex way to get a screenshot yourself, or you used someone else's implementation as a library. For a screencast, you discover a whole new here-are-lions land, since you are going to read out VRAM for every frame, with the corresponding "speed" and CPU load.
So yes, a separate server handles that and more for you. Compositors can finally use overlays (that's how they emulate different scales or resolutions for clients like games while keeping the display at its native resolution). PipeWire/GStreamer implement hardware-accelerated, zero-copy screencasting for you, so you don't have to do it yourself; just call the right portal.
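For a sense of what "call the right portal" looks like, here is a rough sketch for the simpler screenshot case, assuming xdg-desktop-portal and the `gdbus` tool are installed. The bus name, object path and method are the ones from the portal's Screenshot interface; the helper names are made up, and the empty options dict is just the simplest request.

```python
import subprocess

PORTAL_BUS = "org.freedesktop.portal.Desktop"
PORTAL_PATH = "/org/freedesktop/portal/desktop"
SCREENSHOT_METHOD = "org.freedesktop.portal.Screenshot.Screenshot"


def screenshot_request_command(parent_window=""):
    """Build the gdbus command line for a portal screenshot request."""
    return [
        "gdbus", "call", "--session",
        "--dest", PORTAL_BUS,
        "--object-path", PORTAL_PATH,
        "--method", SCREENSHOT_METHOD,
        parent_window,  # window identifier; empty string means "none"
        "{}",           # options (a{sv}); left empty in this sketch
    ]


def request_screenshot():
    """Send the request; needs a running desktop session with a portal."""
    result = subprocess.run(
        screenshot_request_command(),
        capture_output=True, text=True, check=True,
    )
    # The reply is only a request handle (an object path). The image URI
    # arrives later in a Response signal on that handle, which this
    # sketch does not wait for.
    return result.stdout.strip()
```

A real client would subscribe to the Response signal before making the call, and screencasting goes through the separate ScreenCast interface plus a PipeWire stream, so this is just the shape of the thing.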
Yes, apps have to change. But they already do change, at least on other platforms. Apple makes far more changes to its system, and do you see anyone waiting several releases before they update? They changed their arch for christ's sake, and except for the long tail, everyone updated within a few months. Why is it so difficult on the Linux side?
Maybe the reason is not that it is difficult to update, but that Linux users accept crap and Apple users don't. Do not accept crap from the proprietary vendors, and they will step up their effort. It's that simple.