It's restricted to constant-QP rate control, and I haven't managed to get it to produce a playable file yet. Maybe I'm holding it wrong. But anyway, it's exciting to see bits of this land.
I feel this, and not even in a situation that had complex transparency needs. I went down a path that needed the surrounding background perfectly matched to the video background, and it ended up being easier to render the video to a canvas, read the rendered pixel value, and update the surrounding background to match. This was still necessary even knowing what the video background color should be, given browser differences in rendering the video's color. Prompted a fun conversation internally, though[1].
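For anyone curious, the trick looks roughly like this (a sketch; the selector, the corner-sampling choice, and the event are my assumptions, not the commenter's actual code):

// sample the color the browser actually rendered, then match the page to it
const video = document.querySelector<HTMLVideoElement>("#bg-video")!; // hypothetical id
const canvas = document.createElement("canvas");
canvas.width = canvas.height = 1; // a single pixel is enough: sample a corner
const ctx = canvas.getContext("2d", { willReadFrequently: true })!;

function syncBackground(): void {
  // draw only the video's top-left corner into the 1x1 canvas
  ctx.drawImage(video, 0, 0, 1, 1, 0, 0, 1, 1);
  const [r, g, b] = ctx.getImageData(0, 0, 1, 1).data;
  video.parentElement!.style.backgroundColor = `rgb(${r}, ${g}, ${b})`;
}

video.addEventListener("playing", syncBackground);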
Lord have mercy. I work with colorspaces. They are the bane of the existence of anyone trying to do real photo or video work. Your article is terrifying.
How do we bring this to the attention of the people working on these decoders and renderer pipelines?
I feel like I should re-run your experiments with photos now, especially taking into account the unusual colorspaces used there, like XYB:
I recently downloaded an "avif" thinking that I was downloading a gif. A little annoyed, I started poking around at it with ffmpeg and discovered that the file contained two video streams. I extracted each with -c copy into its own mkv so I could play them separately.
The first video was the content I desired; the second was solid white, running for the same length of time as the first. I was honestly a little flummoxed about the white stream for about 15 seconds before it hit me: "this must be an alpha channel".
I don't think I have ever seen it in action, so I am extremely curious how well alpha channels can possibly turn out in lossy video formats, where the result of any given frame is a matter of interpretation. In lossless formats like GIF the borders of objects in any given frame are perfectly defined, but in lossy formats, especially ones using the discrete cosine transform, it is not clear cut where the object ends and the background begins.
GIF only has binary transparency. Period. Also, GIF is only lossless if you don't care about file size (using the multiple-0-duration-frames-with-different-palettes trick) or the material has a limited palette.
From my testing, VP9 videos with transparency are fine if you aren't stingy with bitrate, and in general if the source material isn't CGI, things will be crusty at the edges anyway (e.g. greenscreen with motion blur).
You get 256 colors per frame. Want more colors? Use more frames! There's some really impressive software/gifs at https://gif.ski/ if you want to see just how far it's possible to push this terrible format.
In theory, AVIF supports palette-based blocks, so it can express perfectly sharp edges and even transcode a GIF with perfect accuracy. In practice these modes are not used.
A blurry alpha channel just means soft edges. A naive encoder will cause some of the previously transparent background to bleed into visible pixels, usually producing darkened halos around the edges. This is a common problem with video game assets, and the fixes are the same: either bleed color around the edges (give transparent pixels the right RGB values), or use a premultiplied-alpha color space. AVIF (also in theory) supports marking images as having premultiplied alpha.
Video compression itself should only cause issues at the edges due to chroma subsampling, in which case bleeding chroma into transparent pixels would help. All other cases would be errors in the pipeline, such as incorrect gamma handling, a wrong supersampling implementation, or mixing up whether content is premultiplied or not.
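The "bleed color around the edges" fix is simple enough to sketch. A minimal single-pass version, purely my own illustration (real tools iterate so the bleed reaches further than one pixel):

// give fully transparent pixels the average RGB of their visible neighbors,
// so filtering/compression never samples a meaningless (usually black) color
function bleedEdges(img: ImageData): void {
  const { width: w, height: h, data } = img;
  const neighbors = [[-1, 0], [1, 0], [0, -1], [0, 1]];
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      const i = (y * w + x) * 4;
      if (data[i + 3] !== 0) continue; // only touch fully transparent pixels
      let r = 0, g = 0, b = 0, n = 0;
      for (const [dx, dy] of neighbors) {
        const nx = x + dx, ny = y + dy;
        if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
        const j = (ny * w + nx) * 4;
        if (data[j + 3] === 0) continue; // neighbor carries no color either
        r += data[j]; g += data[j + 1]; b += data[j + 2]; n++;
      }
      if (n > 0) {
        data[i] = r / n; data[i + 1] = g / n; data[i + 2] = b / n;
        // alpha stays 0: the pixel is still invisible, just no longer black
      }
    }
  }
}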
Also premultiplication is just a way to cache the multiplication operation:
// using a normal (straight-alpha) image
// note that everything after the plus depends entirely on the image,
// so it can be computed once and cached
out = current * (1.0 - img.a) + vec4(img.rgb * img.a, img.a);
// using a premultiplied image, the cached term is the image itself
out = current * (1.0 - img.a) + img;
Which might make sense if you're drawing the same bitmap very many times.
Wow, that’s fascinating. I have a tiny macOS app that makes GIFs that I use for screen recordings. It has occurred to me before now that it produces surprisingly good-quality images with no visible dithering; I wonder if it’s doing this.
(still feels very stupid to have to do it but until Google Docs lets you embed a silent auto playing mp4…)
By that definition a 24-bit image is "lossy" relative to 32-bit images, and 32-bit images are lossy relative to 64-bit images, ad infinitum, and lossless formats don't exist.
A gif is lossless in that it does not lose data. When you define a frame within the format's limitations, you will always receive that exact frame in return.
The conversion might have lost data but that's not the gif losing it. That's just what happens trying to fit a larger domain in a smaller domain.
The individual frames within a gif are bitmaps that do not lose detail across saves, extractions, rotations, etc. Each individual pixel is displayed exactly as originally defined.
Compare this to JPEG or MP4 where your individual pixel becomes part of a larger cosine field. There is no guarantee this pixel will exist and even if it does, its exact value is almost certainly changed. This is what "lossy" is.
> By that definition a 24-bit image is "lossy" relative to 32-bit images, and 32-bit images lossy relative to 64-bit images
Only assuming your base images have that many bits of information.
Most cameras max out at 16 bits per channel when shooting RAW, and even those mostly have noise in the lower bits.
I'm sure you can find an example of some supercooled telescope sensor that actually does capture 16 bits per channel, maybe even more.
In the real world your image source is usually either JPG (24 bits per pixel, already debayered and already lossy) or RAW (16 bits per pixel max, bayered).
In realistic scenes, dynamic range is more important than precision. That is, you can frequently find things that are many orders of magnitude brighter than each other, and representing that properly involves lots of bits, used well.
Yep, that second stream is the alpha channel. Lossy alpha channels have been used since VP8 (~15 years I think). They seem pretty well tested. You can also see examples in the article.
Where alpha is a requirement, this is very clever.
However, in many cases the requirement isn't actually transparency; the requirement is a video background seamless with the containing page.
With most pages white or off-white, this can be done at production time. Even responsive dark mode can be done in production if two clips are made.
We used this simpler technique for borderless animated video (e.g., a streaming video spokesperson walking across your e-commerce product page) 20 years ago.
The optical illusion of borderless transparency works so surprisingly well that it's unbelievable it isn't seen more widely.
That certainly works, but it's a pretty big limitation. When catering to dynamic viewports on the web you're really going to struggle to line everything up if you have absolutely anything in the background; providing an alpha channel really is a much easier alternative.
The canonical way to handle responsive videos is to embed multiple source elements within the video element, and then use the media attribute to respond to the user's viewport size and preferred color scheme: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/so...
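A sketch of what that looks like (the file names and mount point are made up):

// one <video> that swaps files by color scheme and viewport width;
// the browser picks the first <source> whose media query matches
const mount = document.querySelector("#hero")!; // hypothetical mount point
mount.innerHTML = `
  <video autoplay muted loop playsinline>
    <source src="/clip-dark.webm" type="video/webm" media="(prefers-color-scheme: dark)">
    <source src="/clip-wide.webm" type="video/webm" media="(min-width: 1200px)">
    <source src="/clip.webm" type="video/webm">
  </video>`;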
It gets worse, it's not just video rendering - Chrome (used to?) use a different color profile than all other browsers, making the same CSS values brighter than the other browsers. It was different enough that our designer could see something was wrong just glancing at my Firefox window while walking past behind me.
I can't find the original bug report (this was around 2018 and it was already pretty old, like 5+ years), but at the time they were refusing to fix it even though the choice of color profile went against the CSS spec.
Edit: Because it's Chromium, not Chrome. It came right up just by switching to that; the bug report was from 2010, and from the recent comments it looks like it's still an issue:
Hard agree, and you’d think white and black would work best, but then there’s the legacy black levels / white levels thing alluded to at the end of OP’s article...
It will take transparent MP4/MOV as input and output compatible versions for Safari (HEVC) and Chrome (WEBM). Then you simply use a tag that includes both, e.g.:
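Presumably something like this (a sketch with placeholder file names; the codecs string is the commonly used one and may need adjusting). Safari takes the HEVC-with-alpha file, Chrome falls through to the WebM:

const container = document.querySelector("#demo")!; // hypothetical mount point
container.innerHTML = `
  <video autoplay muted loop playsinline>
    <source src="/clip-hevc.mov" type='video/mp4; codecs="hvc1"'>
    <source src="/clip-vp9.webm" type="video/webm">
  </video>`;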
They do know about avconvert (I am the article author).
avconvert gives you almost no control over the output. You'll end up with a file orders of magnitude bigger than the alternatives covered in the article.
Compressor is the next best thing, as it gives you some control, but you still end up with a file multiples bigger than the ideal solution.
It has always both surprised and dismayed me that transparent video on the web is still so damn hard. It's nice to see someone managing to make it work and then sharing their solution.
> Flash allowed to create web experience that is extremely difficult to do with modern technologies.
Like what exactly? I played/viewed a lot of Flash content back in the day, but I don't remember anything out of the ordinary that simply isn't possible/is extremely difficult today, want to jog my memory?
There was a period of time after Flash was deprecated and web technologies weren't ready to replace it, but I think we're beyond that today, unless I'm missing something essential Flash could do.
I recall only once seeing Flash being used to do something "out of the ordinary". It was a video at YouTube (back when it used Flash for its video player), which in the middle of the video "escaped" its container and exploded over the whole page filling it with an animation (playing over the rest of the page elements), going back to normal once the animation finished. That probably required special support from YouTube's flash player, and would never be allowed with its current use of the browser's built-in video player.
Definitely possible today with a canvas covering the entire page but being transparent; probably exactly what they did with the <object> container for that trick too. Flash was constrained to its container, just like <canvas> is, and surely it wasn't actually "escaping" from that container; rather, the container was drawn over the entire window.
But, hard to know exactly how it was implemented without seeing the source, so I guess we'll never know.
From what I've heard elsewhere (I never used it myself), it was the editor for creating Flash content that nothing today compares to; it's the modern tools for creating content that are extremely difficult (to use) compared to the Flash tooling.
I agree that the broad idea of client-side rendering is capable of much more than we currently see widely deployed in web technologies. However, I think citing Flash itself as an example is problematic because, while it was really good at some things, it was also flawed in many ways like being single-threaded, proprietary and having security issues.
Instead of citing a particular technology (whether Flash, HTML5, WebGPU, etc.), which risks getting into the weeds of defending any shortcomings in one or the other, I'd rather propose that client-side rendering in general is still under-utilized and capable of so much more. I also think the under-appreciated elephant in the room is that Apple and Google have both been guilty of subtly manipulating web standards processes to nerf client-side applications. And they've been very clever about it: sometimes proposing potentially exciting enhancements but later sabotaging them, always with some plausibly deniable cover, whether security, backward compatibility, or accessibility. Other times they'll 'embrace and extend' a new proposal, and then keep extending and extending until it either becomes practically unimplementable, unperformant, or bloated, or just collapses in committee under its own gravity.
Bottom line: powerful, performant client-side applications securely delivered in real-time through the open web are bad for app store walled-garden models and businesses that rely on controlling centralized social media or gaming platforms. Advanced client-side technologies and standards aren't as good or widely deployed as they could be because powerful interests don't want them to be.
It's frustrating to see full vs limited range is still an issue in 2024. I was hoping it would have been figured out for web video by now, but we're still in a situation (at least on Windows) where you can end up having to go into the nvidia control panel and mess with settings to make video look correct in webpages... And then in the case where you manually split the alpha channel out, limited range could mess up the alpha channel. Painful.
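For anyone who ends up handling the pixels themselves, the limited-to-full expansion is just a linear remap. A sketch of the standard 8-bit math (the function itself is my own illustration):

// 8-bit limited ("studio") range stores luma in 16..235 instead of 0..255;
// an alpha plane smuggled through such a channel never reaches fully
// transparent or fully opaque until it is expanded back to full range
function limitedToFull(y: number): number {
  const full = ((y - 16) * 255) / 219; // 219 = 235 - 16
  return Math.min(255, Math.max(0, Math.round(full)));
}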
I wonder if SVG filters would let you do the "manual" approach without JavaScript. IIRC they're hardware-accelerated by default in Chrome, and they usually still work in browsers that disable WebGL for security/privacy reasons.
I had to use APNG a while back and it was surprisingly difficult to generate one that worked well. I forget if we kept it long term; it was janky as heck. I don't believe WebP was as widely supported at the time either, which might have been the easier option.
Unfortunately, animated WebP still has some outstanding decoder issues on iOS devices, frequently causing the video to stutter or play at a decreased framerate.
Animated WebP is based on VP8. However, the alpha channel is stored losslessly, which makes compression even worse than VP8.
VP8 is of course a generation behind VP9, so the results will be significantly worse than the VP9 example in the article, which is significantly worse than AV1.
Also, the demo crashes Safari for me (it plays progressively slower and slower).
Animated WebP is an absolute dumpster fire on mobile Safari (and thus all iOS browsers, which are all based on WebKit), with the animation often stuttering and dropping frames before it corrects itself.
I had to revert back to autoplaying no-audio MP4 files for this reason.
TFA did mention using animated AVIF, but not WebP for some reason. The issues still stand, though: no playback controls, no programmatic playback, no audio. For my use case, I was not able to get an animated WebP to just play once and stop.
Edit: also no desktop Safari support for transparent animated WebP.
x265 has added support for alpha very recently, but only using their standalone CLI. https://bitbucket.org/multicoreware/x265_git/commits/c8c9d22...