Video with Alpha Transparency on the Web (jakearchibald.com)
151 points by surprisetalk 4 months ago | 62 comments



> There's a feature request for the open source & cross-platform x265 codec to support transparency, but it doesn't seem to be going anywhere.

x265 has added support for alpha very recently, but only using their standalone CLI. https://bitbucket.org/multicoreware/x265_git/commits/c8c9d22...
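
Judging by the commit, the invocation looks roughly like this (the --alpha flag and the raw-YUV input details are my reading of the commit, so treat this as an assumption rather than a recipe):

    x265 --input input.yuv --input-res 1920x1080 --fps 30 --qp 22 --alpha -o output.hevc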


Ohhhh! I totally missed this. Thanks for the heads up! I wonder if it was triggered by this article, or a weird coincidence.


It's restricted to constant-QP rate control, and I haven't managed to get it to produce a playable file yet. Maybe I'm holding it wrong. But anyway, it's exciting seeing bits of this land.


I feel this, and not even in a situation with complex transparency needs. I went down a path that needed the surrounding background perfectly matched to the video background, and it ended up being easier to render the video to a canvas, read the rendered pixel value, and update the surrounding background to match. That was still necessary even knowing what the video background color should be, given browser differences in rendering the video's colors. It prompted a fun conversation internally, though[1].
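
A minimal sketch of that approach (assuming a single video element on the page, and that the top-left pixel is representative of the video's background):

    // draw one frame onto a 1x1 canvas; pixel (0,0) is the video's top-left pixel
    const video = document.querySelector('video');
    const canvas = document.createElement('canvas');
    canvas.width = canvas.height = 1;
    const ctx = canvas.getContext('2d', { willReadFrequently: true });
    video.requestVideoFrameCallback(() => {
      ctx.drawImage(video, 0, 0);
      const [r, g, b] = ctx.getImageData(0, 0, 1, 1).data;
      document.body.style.background = `rgb(${r}, ${g}, ${b})`;
    });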

[1] https://www.mux.com/blog/your-browser-and-my-browser-see-dif...


Lord have mercy. I work with colorspaces. They are the bane of the existence of anyone trying to do real photo or video work. Your article is terrifying.

How do we bring this to the attention of the people working on these decoders and renderer pipelines?

I feel like I should re-run your experiments with photos now, especially taking into account the unusual colorspaces used there, like XYB:

https://facelessuser.github.io/coloraide/colors/xyb/


This is a great article!


I recently downloaded an "avif" thinking that I was downloading a gif. A little annoyed, I started poking around at it with ffmpeg and discovered that the file contained two video streams. I extracted both with -c copy into separate mkv's so I could play them individually.
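
(For reference, that extraction can be done in one command; input.avif and the output names here are placeholders:)

    ffmpeg -i input.avif -map 0:v:0 -c copy video0.mkv -map 0:v:1 -c copy video1.mkv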

The first video was the content I desired, the second video was solid white but for the same length of time as the first video. I was honestly a little flummoxed about the white stream for about 15 seconds before it hit me "this must be an alpha channel".

I don't think I have ever seen it in action, so I am extremely curious how well alpha channels can possibly turn out in lossy video formats, where the result of any given frame is a matter of interpretation. In lossless formats like GIF, the borders of objects in any given frame are perfectly defined, but in lossy formats, especially ones using the discrete cosine transform, where the object ends and the background begins is not clear cut.


GIF only has binary transparency. Period. Also, GIF is only lossless if you don't care about file size (using the multiple-0-duration-frames-with-different-palettes trick) or if the material has a limited palette.

From my testing, VP9 videos with transparency are fine if you aren't stingy with bitrate, and in general, if the source material isn't CGI, things will be crusty at the edges anyway (e.g. greenscreen with motion blur).
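
For anyone trying this, a minimal ffmpeg starting point (the yuva420p pixel format is what carries the alpha plane; the bitrate is a number to tune, not a recommendation):

    ffmpeg -i input.mov -c:v libvpx-vp9 -pix_fmt yuva420p -b:v 2M output.webm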


> multiple 0 duration frames with different palettes trick

What's that?


You get 256 colors per frame. Want more colors? Use more frames! There's some really impressive software/gifs at https://gif.ski/ if you want to see just how far it's possible to push this terrible format.
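
(For example, assuming a directory of full-colour PNG frames:

    gifski -o anim.gif frame*.png

gifski handles the per-frame palette work automatically.)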


That's amazing.


In theory, AVIF supports palette-based blocks, so it can express perfectly sharp edges and even transcode a GIF with perfect accuracy. In practice these modes are not used.

A blurry alpha channel just means soft edges. A naive encoder will cause some of the previously transparent background to bleed into visible pixels, usually causing darkened halos around the edges. This is a common problem in video game assets, and the fixes are the same: either bleed color around the edges (make transparent pixels have the right RGB values), or use premultiplied alpha. AVIF (also in theory) supports marking images as having premultiplied alpha.


Video compression may only cause issues at the edges due to chroma subsampling, in which case bleeding chroma to transparent pixels would help. All other cases would be errors in the pipeline, such as incorrect gamma handling, wrong supersampling implementation, or mixing up whether content is premultiplied or not.

Also premultiplication is just a way to cache the multiplication operation:

    // using a straight (non-premultiplied) image
    // note that everything after the plus depends entirely on the image
    out = current * (1.0 - img.a) + vec4(img.rgb * img.a, img.a)
    // using a premultiplied image
    out = current * (1.0 - img.a) + img
Which might make sense if you're drawing the same bitmap very many times.


I didn't mention it in the article, but the web component supports premultiplied alpha https://github.com/jakearchibald/stacked-alpha-video?tab=rea...


GIF is not lossless. It reduces the palette to 256 colors, down from the normal 16.7 million colors of 8-bit video or images.


As a little known quirk of history:

    a statement that the GIF image file format is limited to 256 colors is simply false.
https://web.archive.org/web/20140908023322/http://phil.ipal....

How it works is pretty sneaky.


Wow, that’s fascinating. I have a tiny macOS app that makes GIFs, which I use for screen recordings. It has occurred to me before now that it produces surprisingly good quality images with no visible dithering; I wonder if it's doing this.

(still feels very stupid to have to do it but until Google Docs lets you embed a silent auto playing mp4…)


By that definition a 24-bit image is "lossy" compared to 32-bit images, and 32-bit images are lossy compared to 64-bit images, ad infinitum, and lossless formats don't exist.

A gif is lossless in that it does not lose data. When you define a frame within the format's limitations, you will always receive that exact frame in return.

The conversion might have lost data but that's not the gif losing it. That's just what happens trying to fit a larger domain in a smaller domain.

The individual frames within a gif are bitmaps that do not lose detail across saves, extractions, rotations, etc. Each individual pixel is displayed exactly as originally defined.

Compare this to JPEG or MP4 where your individual pixel becomes part of a larger cosine field. There is no guarantee this pixel will exist and even if it does, its exact value is almost certainly changed. This is what "lossy" is.


> By that definition 24-bit image is "lossy" to 32-bit images, and 32-bit images lossy to 64-bit images

Only assuming your base images have that many bits of information.

Most cameras max out at 16 bits per channel when shooting RAW, and even those tend to have mostly noise in the lower bits.

I'm sure you can find an example of some supercooled telescope sensor that actually does capture 16 bits per channel, maybe even more.

In the real world your image source is usually either JPG (24 bits per pixel, already debayered and already lossy) or RAW (16 bits per pixel max, bayered).


In realistic scenes, dynamic range is more important than precision. That is, you can frequently find things that are many orders of magnitude brighter than one another, and proper representation involves lots of bits, used properly.


Yep, that second stream is the alpha channel. Lossy alpha channels have been used since VP8 (~15 years ago, I think). They seem pretty well tested. You can also see examples in the article.


If you want a lossless format with animation, an alpha channel, and gigantic file sizes, there's always APNG.
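
ffmpeg can produce one directly if you want to see those file sizes for yourself (a sketch; -plays 0 should mean loop forever):

    ffmpeg -i input.mov -f apng -plays 0 output.apng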


Where alpha is a requirement, this is very clever.

However, in many cases the requirement isn't actually transparency; it's a video background that's seamless with the containing page.

With most pages white or off-white, this can be done at production. Even responsive dark mode can be done in production if two clips are made.

We used this simpler technique for borderless animated video (e.g., a streaming video spokesperson walking across your e-commerce product page) 20 years ago.

The optical illusion of borderless transparency works so surprisingly well that it's unbelievable it isn't seen more widely.


That certainly works, but it's a pretty big limitation. When catering to dynamic viewports on the web, you're really going to struggle to line everything up if you have absolutely anything in the background; providing an alpha channel really is a much easier alternative.


"Just" ensure your responsive design snaps to checkpoints where this registration matters, and prepare video assets accordingly.

One way to ensure this is layout using, ahem, tables. (Don't laugh, it works for the site you're reading right now...)

Not everything we've come up with in the last 20 years made life easier...


The canonical way to handle responsive videos is to embed multiple source elements within the video element, and then use the media attribute to respond to the user's viewport size and preferred color scheme: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/so...
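
A sketch of that markup (file names, the breakpoint, and the autoplay attributes are assumptions for illustration; the first matching source wins, so more specific sources go first):

    <video autoplay muted playsinline>
      <source src="hero-dark.webm" media="(prefers-color-scheme: dark)" type="video/webm">
      <source src="hero-wide.webm" media="(min-width: 800px)" type="video/webm">
      <source src="hero.webm" type="video/webm">
    </video>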


When using this technique, this comment and the accompanying article are a must-read https://news.ycombinator.com/item?id=41232070


It gets worse, it's not just video rendering - Chrome (used to?) use a different color profile than all other browsers, making the same CSS values brighter than the other browsers. It was different enough that our designer could see something was wrong just glancing at my Firefox window while walking past behind me.

I can't find the original bug report (this was around 2018 and it was already pretty old, like 5+ years), but at the time they were refusing to fix it even though the choice of color profile went against the CSS spec.

Edit: because it's Chromium, not Chrome. It came right up just by switching to that search term; the bug report was from 2010, and from the recent comments it looks like it's still an issue:

Migrated Chromium bugs? https://issues.chromium.org/issues/40401125

The original page I remember from back then, linked from above: https://bugs.chromium.org/p/chromium/issues/detail?id=44872&...


Hard agree, and you’d think white and black would work best, but then there’s the legacy black levels / white levels thing alluded to at the end of OP’s article...


Spent hours solving the same problem, landed on this tool: https://rotato.app/tools/converter

It will take transparent MP4/MOV as input and output compatible versions for Safari (HEVC) and Chrome (WebM). Then you simply use a tag that includes both, e.g.:

    <video playsinline preload="metadata">
      <source src="video.mp4" type="video/mp4; codecs=hvc1">
      <source src="video.webm" type="video/webm">
    </video>


It's worth reading the section of the article that details why this isn't a great solution: https://jakearchibald.com/2024/video-with-transparency/#vp9-...


It works fine in practice; the reason they're struggling is that they don't know Apple's avconvert command is the only real route to do it.

avconvert -p PresetHEVCHighestQualityWithAlpha -s ./input.mov -o ./output.mov

is the trick


They do know about avconvert (I am the article author).

avconvert gives you almost no control over the output. You'll end up with a file orders of magnitude bigger than the alternatives covered in the article.

Compressor is the next best thing, as it gives you some control, but you still end up with a file multiples bigger than the ideal solution.


It has always both surprised and dismayed me that transparent video on the web is still so damn hard. It's nice to see someone managing to make it work and then sharing their solution.


Unpopular opinion: Flash allowed creating web experiences that are extremely difficult to achieve with modern technologies.


> Flash allowed creating web experiences that are extremely difficult to achieve with modern technologies.

Like what, exactly? I played/viewed a lot of Flash content back in the day, but I don't remember anything out of the ordinary that simply isn't possible or is extremely difficult today. Want to jog my memory?

There was a period of time after Flash was deprecated and web technologies weren't ready to replace it, but I think we're beyond that today, unless I'm missing something essential Flash could do.


I recall only once seeing Flash being used to do something "out of the ordinary". It was a video at YouTube (back when it used Flash for its video player), which in the middle of the video "escaped" its container and exploded over the whole page filling it with an animation (playing over the rest of the page elements), going back to normal once the animation finished. That probably required special support from YouTube's flash player, and would never be allowed with its current use of the browser's built-in video player.


Definitely possible today with a canvas covering the entire page but being transparent; probably exactly what they did with the <object> container for that trick too. Flash was constrained to its container, just like <canvas> is, and surely it wasn't actually "escaping" from that container; the container was drawn over the entire window.

But, hard to know exactly how it was implemented without seeing the source, so I guess we'll never know.


From what I've heard elsewhere (never used it myself), it was the editor for creating Flash content that nothing today compares to; the modern tools for creating that kind of content are the thing that's extremely difficult (to use) compared to the Flash tooling.


I've heard that in the past too, but that sounds like it's more about Macromedia/Adobe Flash the authoring tool than about Flash the format. The claim was:

> Flash allowed creating web experiences that are extremely difficult to achieve with modern technologies

Which sounds like it's about the output, not the process for getting to the output.


I agree that the broad idea of client-side rendering is capable of much more than we currently see widely deployed in web technologies. However, I think citing Flash itself as an example is problematic because, while it was really good at some things, it was also flawed in many ways like being single-threaded, proprietary and having security issues.

Instead of citing a particular technology (whether Flash, HTML5, WebGPU, etc), which risks getting into the weeds of defending any shortcomings in one or the other, I'd rather propose that client-side rendering in general is still under-utilized and capable of so much more. I also think the under-appreciated elephant in the room is that Apple and Google have both been guilty of subtly manipulating web standards processes to nerf client-side applications. And they've been very clever about it. Sometimes proposing potentially exciting enhancements but later sabotaging them - always with some plausibly deniable cover, whether security, backward compatibility, accessibility, etc. Other times they'll 'embrace and extend' a new proposal - and then keep extending and extending until it either becomes practically unimplementable, unperformant, bloated or just collapses in committee under its own gravity.

Bottom line: powerful, performant client-side applications securely delivered in real-time through the open web are bad for app store walled-garden models and businesses that rely on controlling centralized social media or gaming platforms. Advanced client-side technologies and standards aren't as good or widely deployed as they could be because powerful interests don't want them to be.


I don't think your private files being accessed remotely is a very desirable experience.


It's frustrating to see that full vs. limited range is still an issue in 2024. I was hoping it would have been figured out for web video by now, but we're still in a situation (at least on Windows) where you can end up having to go into the Nvidia control panel and mess with settings to make video look correct in web pages... And then, in the case where you manually split the alpha channel out, limited range could mess up the alpha channel. Painful.


I wonder if SVG filters would let you do the "manual" approach without JavaScript. IIRC they're hardware-accelerated by default in Chrome, and they usually still work in browsers that disable WebGL for security/privacy reasons.
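
A rough sketch of what that filter could look like, assuming the article's stacked layout (colour in the top half, alpha stored as luminance in the bottom half; the 480px offset is a made-up frame height), applied to the video via CSS filter: url(#stacked-alpha):

    <filter id="stacked-alpha">
      <!-- shift the bottom (alpha) half up over the top (colour) half -->
      <feOffset in="SourceGraphic" dy="-480" result="shifted"/>
      <!-- convert the shifted half's luminance into an alpha mask -->
      <feColorMatrix in="shifted" type="matrix"
        values="0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0.2126 0.7152 0.0722 0 0"
        result="mask"/>
      <!-- keep the colour half's RGB, taking alpha from the mask -->
      <feComposite in="SourceGraphic" in2="mask" operator="in"/>
    </filter>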


The folks at Wix experimented with this (although I couldn't get their demo working) https://twitter.com/YDaniv/status/1820558358648435020

They say it works in Chromium, it's buggy in Firefox, and slow in Safari.



PNG is a lossless format, so your file sizes will be huge. Although it can build upon previous frames, it can only do so in a really simple GIF-like way (drawing over the top). For example, the bouncing ball example on Wikipedia is 62 kB https://en.wikipedia.org/wiki/APNG#/media/File:Animated_PNG_..., whereas the equivalent VP9 is 7 kB https://static-misc-3.glitch.me/alpha-video/.

Animated WebP is based on VP8. However, the alpha channel is stored losslessly, which makes compression even worse than VP8.

VP8 is of course a generation behind VP9, so the results will be significantly worse than the VP9 example in the article, which is significantly worse than AV1.

There are also playback issues in Safari.


With APNG the file size will get huge very quickly.

Webp is the same, only a little bit better.

In both cases you need all the frames as complete pictures after all.


I had to use APNG a while back and it was surprisingly difficult to generate one that worked well. I forget if we kept it long term; it was janky as heck. I don't believe WebP was as widely supported at the time either, which might have been easier.


Unfortunately, animated WebP still has some outstanding decoder issues on iOS devices, frequently causing the video to stutter or play at a decreased framerate.


> Here's a demo, but… don't get your hopes up

-- Continues to render perfectly fine and smooth with 60 fps on my Windows/Firefox/Thinkpad

Ah, the problems of good software running on high-powered machines. Closed, bug not reproducible!


FWIW, Firefox is the browser that gets this closest to OK.


Here you go - problem solved - maybe? This link shows animated transparent webp:

https://static.crowdwave.link/transparentanimatedwebp.html

Seems to work on Chrome, Firefox and Safari.

The video file is here: https://static.crowdwave.link/transparentanimatedwebp.webp

That file is the result of the code below.

This Python program will generate PNG frames and, if you have ffmpeg, convert them into a transparent animated WebP:

    from PIL import Image, ImageDraw
    import random
    import subprocess
    
    # Image parameters
    width, height = 640, 480
    frames = 1000
    shapes = 10
    
    def random_color():
        # Generate a random RGBA color with varying levels of transparency (alpha)
        return (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255), random.randint(50, 200))
    
    def random_position():
        return random.randint(0, width - 80), random.randint(0, height - 80)
    
    def random_size():
        return random.randint(20, 80)
    
    def create_gradient(width, height):
        base = Image.new('RGBA', (width, height))
        top = Image.new('RGBA', (width, height))
        for y in range(height):
            alpha = int((y / height) * 255)  # Alpha varies from 0 to 255
            for x in range(width):
                top.putpixel((x, y), (255, 255, 255, alpha))  # White gradient
        return Image.alpha_composite(base, top)
    
    # Create a series of images
    for i in range(frames):
        # Create an image with a transparent gradient background
        img = create_gradient(width, height)
        draw = ImageDraw.Draw(img)
    
        for _ in range(shapes):
            shape_type = random.choice(['rectangle', 'ellipse'])
            x1, y1 = random_position()
            x2, y2 = x1 + random_size(), y1 + random_size()
            color = random_color()
    
            if shape_type == 'rectangle':
                draw.rectangle([x1, y1, x2, y2], fill=color)
            elif shape_type == 'ellipse':
                draw.ellipse([x1, y1, x2, y2], fill=color)
    
        # Save each frame as a PNG file
        img.save(f'frame_{i:04d}.png')
    
    print("Frames created successfully.")
    
    # Create an animated WebP from the generated frames
    subprocess.run([
        'ffmpeg', '-y', '-i', 'frame_%04d.png', '-vf', 'fps=30,scale=320:-1:flags=lanczos',
        '-loop', '0', '-pix_fmt', 'yuva420p', 'output.webp'
    ])
    
    print("Animated WebP created successfully.")



Then put the output.webp and this index.html on your local disk somewhere and load the index.html in a browser.

Pull the slider to change the background so you can see it is transparent.

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>Animated WebP Display with Background Image</title>
        <style>
            body {
                display: flex;
                justify-content: center;
                align-items: center;
                height: 100vh;
                margin: 0;
                background-image: url('transparentanimatedwebp.jpg');
                background-size: cover;
                background-position: center;
                background-repeat: no-repeat;
            }
            img {
                max-width: 100%;
                height: auto;
            }
        </style>
    </head>
    <body>
    
        <img src="transparentanimatedwebp.webp" alt="Animated WebP">
    
    </body>
    </html>


Animated WebP is based on VP8. However, the alpha channel is stored losslessly, which makes compression even worse than VP8.

VP8 is of course a generation behind VP9, so the results will be significantly worse than the VP9 example in the article, which is significantly worse than AV1.

Also, the demo crashes Safari for me (it plays progressively slower and slower).


Animated WebP is an absolute dumpster fire on mobile Safari (and thus all iOS browsers, which are WebKit-derived), with the animation often stuttering and dropping frames before it corrects itself.

I had to revert to autoplaying no-audio MP4 files for this reason.


TFA did mention using animated AVIF, but not WebP for some reason. The issues still stand though: no playback controls, no programmatic playback, no audio. For my use case, I was not able to get an animated WebP to just play once and stop.

Edit: also no desktop Safari support for transparent animated WebP.


Can be done but requires a server.

>> Edit: also no desktop Safari support for transparent animated WebP.

Do you mean the link above that I posted? Works fine in my desktop Safari.

https://static.crowdwave.link/transparentanimatedwebp.html


> 8bit is pretty minimal when it comes to gradients, so this ~15% reduction can result in banding.

If you already have access to WebGL shaders, banding can probably be fixed in the shaders too.
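
For instance, the classic screen-space hash dither, adding less than one quantisation step of noise before output (a generic fragment-shader sketch, not anything from the article; 'color' is assumed to be the shader's final RGBA output):

    // break up banding by adding sub-LSB noise per pixel
    float noise = fract(sin(dot(gl_FragCoord.xy, vec2(12.9898, 78.233))) * 43758.5453);
    color.rgb += (noise - 0.5) / 255.0;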


Apple ProRes 4444 deserves a mention. It's used a lot in professional contexts like digital signage, event projections etc.

Edit: sorry, ProRes is of course not "on the web".


It's a good 'original' format to feed to ffmpeg.
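
For instance, to produce a ProRes 4444 master with the alpha intact (standard ffmpeg flags, but treat the exact values as a starting point):

    ffmpeg -i input.mov -c:v prores_ks -profile:v 4444 -pix_fmt yuva444p10le master.mov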


Phew, old times. I had to solve this with a crafty Flash-based solution back in the day.

(Un)surprised to know this is still an issue in 2024.


Yep, in Flash it was done with VP6.



