When I want compressed images, I prefer lossy codecs supported by GPUs. A nice thing about them is that the GPU decompresses them on demand, which saves not just storage but also VRAM bandwidth.
The author of the OP also develops Basis, which is designed to take advantage of GPU texture compression formats. GPU formats are efficient in terms of memory space and bandwidth usage, but they have a fixed bits-per-pixel so they're not the most efficient for storage and transmission. There's also a rift between which formats are supported on desktop and mobile GPUs. Basis layers on a second level of variable-bitrate compression which is designed to be efficient to transcode to various compressed GPU formats (for both desktop and mobile) on the fly.
Delivering images in a GPU-friendly manner on the web could be a massive memory saver (PNG/JPGs end up being 32 bits-per-pixel in memory, while GPU formats are usually 4bpp or 8bpp), but web standards haven't caught up yet. For now the only way to display these formats on the web is inside a WebGL context. It works, but it's a lot of machinery just to show a picture.
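To put rough numbers on the 32bpp versus 4bpp/8bpp point, here is a small sketch. The 2048x2048 image size is an assumption picked for illustration, and BC1 and BC7 are used as examples of 4bpp and 8bpp formats; mipmaps and alignment padding are ignored.

```python
def texture_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed for one image level at a fixed bits-per-pixel rate."""
    return width * height * bits_per_pixel // 8


# Hypothetical image size, chosen only for illustration.
width, height = 2048, 2048

rgba8 = texture_bytes(width, height, 32)  # decoded PNG/JPEG held as RGBA8
bpp8 = texture_bytes(width, height, 8)    # an 8bpp GPU format such as BC7
bpp4 = texture_bytes(width, height, 4)    # a 4bpp GPU format such as BC1

for label, size in [("RGBA8, 32bpp", rgba8), ("8bpp", bpp8), ("4bpp", bpp4)]:
    print(f"{label}: {size / 2**20:.0f} MiB")
```

Under those assumptions the same picture takes 16 MiB decoded, 4 MiB at 8bpp and 2 MiB at 4bpp, so the saving is 4x to 8x per image.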
GPUs can do LZ4 decompression. So you could still save ~50% PCIe bandwidth and a bunch of CPU time with these formats... but not VRAM, since (unlike BC7) it's still a raw bitmap in memory.
Apparently modern GPUs losslessly compress textures in memory by themselves, but the only info I can find is that this is proprietary AMD/Nvidia/Intel voodoo magic.
AFAIK lossless compression of textures is not an automatic thing for GPUs, however framebuffers - the things you render to - are losslessly compressed to reduce bandwidth usage for rendering and scan-out. You're correct that it's vendor secret sauce. People with access to game console SDKs know a bit more about how it works since you can see the compressed frame buffer contents and you have to manually call the right APIs in the SDK to decompress the frame buffer before you can read it back.
Historically, lossless compression has also been applied to the depth buffer when rendering, though it isn't done the same way as the lossless framebuffer compression since depth values are floating point with a limited range. Unlike color compression, you can find more information about how that stuff works online, since people optimize around it - try searching for "hierarchical Z".
Lossless framebuffer compression becomes especially important in cases where you are using MSAA or SSAA, where you might have a framebuffer containing 4 or more "samples" for every rendered pixel. If all of the samples are from the same polygon, it's quite possible that they will have the same value, so it's really profitable to compress them. Then at the end when you do your "resolve" to turn those 4+ MSAA/SSAA samples into a single pixel, that step becomes easier too.
Isn't that one called compression just for marketing purposes? AFAIK it's a caching technique aimed at lowering the latency cost of touching the Z-buffer. It actually takes more RAM than just the raw, non-duplicated data.
The full resolution depth data doesn't always exist, which is why it can reduce bandwidth usage. It probably depends on the architecture, I've seen cases where there is no full resolution depth buffer and you have to generate one via decompression.
> AFAIK lossless compression of textures is not an automatic thing for GPUs, however framebuffers - the things you render to - are losslessly compressed to reduce bandwidth usage for rendering and scan-out.
Textures can be auto losslessly compressed too, but note it's only useful for textures the CPU/GPU is writing to; you should use ASTC or whatever usual format most of the time.
Fascinating little feature for Apple to add to their GPUs - I wonder if it's tech they licensed from PowerVR or they invented it. I'm not aware of an equivalent in the Windows space or on game consoles but it would be cool to see one.
I don't agree with the argument that high-speed bandwidth is more available. The mean may be going up, but from what I've experienced and deployed, places that already have good internet are getting faster links while little progress is being made in areas with slow or no internet access.
Yeah. And where I live, available mobile data volume per month is still pretty limited in most plans, and it is increasing only slowly. Unlimited volume plans are very expensive.
The cost of decompression is something I've run into, regarding 3D data. Draco, which is a highly efficient compression system for 3D mesh data, is primarily used with browser-based WebGL / glTF model rendering, and it achieves excellent compression ratios for online transmission. However, the decompression cost at the viewer's end often means that you will wait longer to see your 3D model than if you had used a simpler format, especially over fast links.
Draco is just a bad algorithm. It's just an overcomplicated implementation of a paper from the 90s (Edgebreaker), basically compressing the index buffer but nothing else. It also reorders the index buffer to fit the topology, meaning your post-transform cache optimization is undone. And it's really slow, on the order of 1us/tri [0].
There are other papers and algorithms that are similar in size to Draco, but without the massive overhead, e.g. EXT_meshopt_compression. Even in 90s papers, there are far more competitive algorithms [1].
What if we would rather use that increase in disk space, bandwidth, and computation power to show 8K (high resolution) 3D (more than 2D data) 360 (different viewpoints / interactivity) videos (animated in time) instead of little 2D static images?
I think we'll care more about the increase in quality than about those small costs: disk space is on shared servers / cloud computing, bandwidth is getting better and better, and computation power is also increasing all the time, with new ways of doing computation such as GPUs, chiplets / SoCs / single-task chips, or simply more efficient CPUs.
I don't get it, you're complaining that an image codec is not a video codec? Well... duh.
2D images are used for their own sake, not as a compromise because you'd rather show a moving 3D object rendered at 8K but you can't. So obviously this is about the problem of compressing 2D images, it's off-topic to argue that it can't do video or whatever.
We went through that when flash first got popular. There were some good use-cases for sure, but pages that substituted video for images just ended up being far too busy while also being harder to maintain.
Then do that. It's not a video codec. The problem is computation power is increasing much more slowly than bandwidth, and is likely to continue to. 8K PNGs (by my napkin calculations) already take around 1 second of CPU time to decode. There are other solutions, and certainly better options today, but it's plausible that in the next 10-15 years, this will become the best option for fast-loading, high-resolution, 2D images. (Or texture data for 3D graphics. Or 360, I guess - not sure why that'd be different.)
Wouldn't fast decompression become very relevant for viewing lightfield data? If these media assets are indeed very big as you suggest, then doesn't it make sense to stream the relevant view data based on view orientation?
For something like that you'd need low latency and thus fast decompression.
In the article Richard shows us the compression stats for the png algorithm vs lz4, for the original lossless image. It would also be interesting to see how the png algorithm performs on the lossy, preconditioned versions of the image. Does it also get any benefit, or does lz4 pull ahead?
Yes, thank you for pointing this out. To be more precise about it:
Original PNG: 1310987 bytes
Lossy PNG: 1154386 bytes (90.4% of LZ4i)
Lossy LZ4i: 1276477 bytes
More Lossy PNG: 693278 bytes (84.7% of LZ4i)
More Lossy LZ4i: 818774 bytes
So it's notable that this lossy PNG technique has diminishing returns for LZ4i - it actually becomes more effective as a PNG preprocessor as you get more lossy, relative to LZ4i!
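The percentages can be rechecked directly from the byte counts quoted above. This sketch only redoes that arithmetic; the dictionary layout and the function name are mine, not from the article.

```python
# Byte counts as quoted in the comment above.
sizes = {
    "lossy": {"png": 1154386, "lz4i": 1276477},
    "more lossy": {"png": 693278, "lz4i": 818774},
}


def png_percent_of_lz4i(level: str) -> float:
    """PNG size as a percentage of the LZ4i size at the same lossy level."""
    entry = sizes[level]
    return 100 * entry["png"] / entry["lz4i"]


for level in sizes:
    print(f"{level}: PNG is {png_percent_of_lz4i(level):.1f}% of LZ4i")
```

The PNG file is about 90.4% of the LZ4i size at the first lossy level and about 84.7% at the more lossy level, which is the widening gap the comment describes.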
Still, there's a decent argument that PNG is no longer Pareto-optimal. There's stuff that gets close to the same compression ratio but decompresses much faster, and there's stuff that compresses much better and is also lossless, e.g. JPEG-XL lossless mode.
It's a cool idea to preprocess an image with minuscule changes that make it more suitable for compression with fast LZ codecs. However, I don't buy that this is the future.
Maybe I’m missing something, but isn’t that how JPEG works? It preprocesses the DCT blocks to throw away data (or add noise if you want to read it that way). By doing so, it lends itself to shorter encoding of the data (compression of a sort).
Yep, and jpg is pretty much the bar you have to beat for any lossy compression. It's fast and low power by today's standards, with excellent quality for the compression ratio it achieves. As someone said back in the day, it was alien technology from the future - or something like that.
The big difference is that this preprocessing doesn't require a reversal step. JPEG's DCT requires an IDCT (and chroma conversion etc.), which the client needs to understand and know about, while this merely introduces controlled noise that happens to please LZ compressors.
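A toy illustration of the "no reversal step" point, with loud caveats: this is not the article's method. It uses zlib from the Python standard library as a stand-in for LZ4, synthetic data instead of a real image, and a crude "zero the low 3 bits" step as the hypothetical preconditioner. It only shows that a lossy tweak on the encoder side can shrink the output of an unmodified LZ-style decoder.

```python
import random
import zlib

random.seed(0)  # fixed seed so repeated runs produce the same data

# Synthetic "pixels": a slow gradient plus a little noise in the low bits.
pixels = bytes(((i // 256) + random.randrange(8)) & 0xFF for i in range(65536))

# Hypothetical preconditioning step: zero the low 3 bits of every byte so
# that neighbouring bytes repeat more often. The error per byte is at most 7.
conditioned = bytes(p & 0xF8 for p in pixels)

raw_size = len(zlib.compress(pixels, 6))
conditioned_size = len(zlib.compress(conditioned, 6))

# The decoder side is unchanged: plain decompression yields the conditioned
# pixels directly, with no inverse transform to apply afterwards.
decoded = zlib.decompress(zlib.compress(conditioned, 6))
max_error = max(abs(a - b) for a, b in zip(pixels, decoded))

print(f"raw: {raw_size} bytes, conditioned: {conditioned_size} bytes")
print(f"max per-byte error: {max_error}")
```

A real encoder would choose its changes by weighing distortion against compressed size rather than blindly masking bits, but the decoder stays just as simple either way.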
Rich is an authority on the matter and I trust him that this will be a trend in the future.
However I greatly dislike this trend because I value my storage space. The less storage space a game wastes, the more games I can have installed. I would much rather like to have more games installed than have each game decode bitmaps more efficiently.
I also think it's a false dichotomy to say you can only choose between slow decoding methods with high bandwidth savings and fast decoding methods with low bandwidth savings. zstd has shown that you can have both fast decoding and high compression.
I don't understand why he would be using lz4 when he can have zstd. The speed advantage may look big on paper but nobody cares if your level loads in 3.2 vs 3.1 seconds.
In this day and age you can buy a 2TB M.2 SSD for $140. Even if you only gamed AAA that’s a ton of games. How often would you really make use of all of that space, to the extent where keeping extras on an external HDD would be too much of a hassle?
Perhaps my lived experience is too different. I typically keep a small handful of games that I play frequently, like Halo Infinite or Cyberpunk 2077. Both have huge memory footprints, but not much when HD space is measured in terabytes…
The problem is that if there is zero push-back, then game companies will just expand to meet the limits. Modern Warfare hit 256GB[0], literally larger than most SSDs at the time. The reason it was huge was supposedly because it shipped uncompressed audio to support a fraction of the user-base who couldn't support decompression fast enough.
I wonder how the average game install size trends vs. bytes per dollar of different storage types (HDD and NVMe separately). My gut reaction suggests that storage is scaling faster than game install needs, but then I remember back to how MW2019 took up some 200GB. The other muddy thing in this comparison is that gamers seem to play higher proportions of older games as time goes on; the AAA scene is definitely not what it used to be 5-15 years ago.
Timeless classics like KOTOR 1&2, Fallout New Vegas, Mass Effect etc take so little space I barely notice the storage.
The entire library of video games ever made before the year 2000 could fit onto a single SD card off the shelf from Best Buy for the cost of a mid-range restaurant dinner for 2.
Not so sure about the last line. For DOS games alone, eXoDOS v5 is ~500GB packed, ~600GB installed, ~1.2TB with every game unpacked. Not quite SD level, but a $50 dodgy Chinese 2TB SATA SSD, sure. Windows up to 2000 will probably be another couple of TB.
It comes close to that, but there are some confounding factors wherever you introduce analog elements - and Golden Age arcade games were doing all kinds of wacky analog things that necessitate high fidelity digital sampling. See "Journey" and its tape playback of rock band Journey, or "Dragons Lair" and every other laserdisc game. Classic pinballs are likewise presented digitally through a mishmash of ROM emulation and simulated physical hardware.
It's definitely the case, though, that games have essentially topped out on what they need to support a high fidelity experience in terms of "just" pushing more data down the pipe. While new AAA can push a little farther still, there is broad consensus that diminishing returns are here, and...well, the future is going to be in having AI do the details and make the optimizations, and the resulting size from that is most likely going to stay linear with the number of assets, rather than having each asset metastasize further detail.
I agree with this, and especially when you look at cost of storage over time which is to say the filesizes that we think are large today will be small by tomorrow’s standards and the drives that are expensive today will be almost trivially cheap by tomorrow’s standards.
I can't see this being used for games; for performance reasons those need to deliver textures in GPU-friendly formats like BC/ASTC, or a close analogue like Basis.
Lz4 decompresses with very little CPU load - a fraction of Zstd's load. When bottlenecked by CPU time rather than I/O it could make a large difference.
The computational cost of an endian swap rounds to zero, and file format implementers expect serialised ints to be big-endian by default. It's just the Principle of Least Astonishment at this point.
Yeah, computational cost is indeed pretty close to zero. However, mental overhead of developers, and technical complexity of the implementation, is not zero. These things cost development time, and most importantly they cause bugs.
A principle of least astonishment is being able to memcpy() a structure or mmap() a file, and get the correct numbers in the fields without swapping bytes all the time.
An implementation that memcpy()'s little-endian ints into a struct (without le32toh() or similar) would be subtly incorrect, breaking whenever somebody dares to run your code on a big-endian platform.
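The same distinction can be shown in a few lines of Python, where the `struct` module makes the byte order explicit. The four bytes below are a made-up header field, not taken from any real format. Reading with `=` (native order) plays the role of the bare `memcpy()` into a struct, while `<` is the portable equivalent of `le32toh()`.

```python
import struct
import sys

# Four bytes as they might sit in a file header (hypothetical field).
raw = bytes([0x78, 0x56, 0x34, 0x12])

# Explicit byte order: the result is the same on every host.
as_le = struct.unpack("<I", raw)[0]
as_be = struct.unpack(">I", raw)[0]

# Native byte order: this is what a bare memcpy() into a struct gives you,
# so the result depends on the machine running the code.
as_native = struct.unpack("=I", raw)[0]

print(hex(as_le))                     # 0x12345678
print(hex(as_be))                     # 0x78563412
print(sys.byteorder, hex(as_native))  # matches one of the two lines above
```

On a little-endian host the native read silently agrees with the little-endian one, which is why the bug only shows up once the code runs somewhere else.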
The "problem" is there either way, and I'd rather have it be noticeable immediately, than at an unknown point in the future.
Designing software for hypothetical future which is very unlikely to happen is a classic case of overengineering.
Little-endian platforms have won; very few people are targeting MIPS, POWER or SPARC processors these days. Technically some ARM CPUs are bi-endian, but practically all operating systems people run on them support only little-endian mode. GPUs are little-endian as well.
BE systems aren't a hypothetical future, they're here in the present, in production systems. Yes, they're not very common, but that only makes the edge-cases sharper.
The overengineering argument would make sense for a specific implementation, but not for the format as a whole, IMHO.
It’s not a database format, it’s an image format. They are typically used in conjunction with a graphical display capable of rendering these images.
I don’t think many of these legacy BE systems have such a thing. AFAIK most of these systems are headless: supercomputers with Power10, networking equipment, old SPARC servers, etc.
In other words, serialising an int into bytes is an operation that fundamentally requires you to care about endianness, if you want your files and/or implementation to be platform-interchangeable.
A format that tricks you into thinking you don't need to think about it is a suboptimal one, in my opinion.
The site is more readable if you remove the min-width CSS from the body and content -- it is set to 1500px, which is for content wider than an HD screen. Or at least set it to something like 800px.
I tend to have my browser at 50% width split with another window, and the text goes off the screen making it so I have to scroll left/right to read the text.
Without the min-width it is perfectly readable. The images could also benefit from having the width set to 100% so they scale with the page width instead of being clipped on smaller devices/windows.
But the sample images seem to have been originally quite heavily compressed with JPG or something similar, because zooming in gives very visible compression artifacts.
> Computing bottlenecks change over time. As disk space, disk bandwidth, and internet bandwidth increases, older image codecs that squeeze every last bit out of the resulting file become less valuable for many use cases.
Not a bad question, though a difficult concept to measure in reality.
We can estimate the emissions caused by crypto-currency as that is almost entirely CPU-bound by about the same amount (or an easy to estimate average) per unit. The costs of transmission and storage are going to be relatively minor.
For images the compression cost, while the process is similarly CPU-bound, is pretty small unless you are doing something like compressing with hundreds/thousands of algorithm/parameter combinations and picking the best result (perhaps by some heuristic more complex than final file size). Also, transmission and storage costs are difficult to estimate, because for any given image how do you guesstimate how many times it will be moved and copied?
Having said that, the article's statement that: “[as] bandwidth savings from overly lossy image codecs will become meaningless, [] the CPU/user time and battery or grid energy spent on complex decompression steps will be wasted.” implies that energy use is at least a coincidental consideration here. Image quality is by far the primary concern, though, and despite being subjective it is probably an easier metric to measure in this case, with compression time likely to rank ahead of energy use too.
Optimizing for a metric that benefits all of humanity is not stunting technological improvements. It’s improving in a direction that makes human-sense instead of some other kind of sense.
Also, renewable energy doesn’t mean infinite/limitless energy. It often means there are more limitations than non-renewable counterparts. Eg: storing energy is still not a solved problem without really toxic (environmentally and politically) materials.
Yeah, it seems to me (not an expert!) that energy will cost less and less over time, both in terms of how much you pay and how much CO2 is produced, thus CO2 production won't have much relation to image compression.
Isn't that more or less the same thing in the end?
Storage at this point is plentiful. Most people can live just fine with 1 TB of it, which is trivially found in both HDD and SSD forms. Using less saves you nothing, as the media is still there.
Bandwidth is also plentiful and available 24/7. It's also very bursty -- you only download your game once, while you may play it a lot. So saving 50% download time makes no difference energy-wise.
What we can control a lot is CPU power usage. We now have 16 core consumer CPUs that are designed to sleep when not in use, so difference in power usage at runtime can be dramatic.
So an algorithm that trades disk space for decoding efficiency is probably the best way to save power.
Okay, but that's a rounding error. Gigabit internet is widely available, I'm making preparations to upgrade to 10G.
So on my current connection, that's maybe 2-5 seconds of download time. Meanwhile, the router is on constantly anyway, so whether it's 200 MB or 400MB makes next to zero difference energy-wise.
What is actually a rounding error is the number of people with a connection like yours. This assumption that fast connections and computers are "widely available" sure makes our jobs easier but it's not true.
I myself live in western Europe and I can't do cloud backups because it would take me about two weeks of 24/7 uploading which then constantly interferes with every other internet connection I want to use and makes things like streaming videos, zoom sessions or online gaming impossible. Game updates usually mean I won't be gaming that evening. And that's already the better connection after switching providers.
The actual optimization here would be fewer updates and games releasing in a finished state.
I think you might be missing the point a bit. I'm answering the question "Why not optimize for CO2 emission instead?"
So, in what plausible scenario would a gamer's CO2 output (which is mostly power use) be significantly influenced by download time?
The vast majority of people spend a lot more time playing than downloading. If your connection is really bad, then you're probably going to buy physical media or play something smaller.
There's no plausible scenario I can think of in which a significant number of people sit there waiting for an hour for a game to download, on a daily basis, and where the amount of power consumption caused by that is significant when compared to what's consumed by actual game play.
Yeah, I get that giant updates might be annoying to you personally. But the discussion is about CO2.
Can I force my telco to upgrade to fiber because it is a waste of carbon for my computer to be switched on for hours downloading content over ADSL?
I have two ADSL connections that can download at 20 Mbits/s with a load balancer.
With my XBOX ONE I don't really have a choice of physical media. Even if I have physical media, I'd expect to download between 2-40 GB of patches or additional content. It is definitely a hassle but not prohibitive; generally I install a game before I expect to play it and keep up on downloads ahead of time.
For PC games it is similar, except that most of the games I play I get off Steam, and I don't even know if physical media is available at all.
At 20 Mbps, 40GB takes a bit over 4 hours to download. Supposing you have an idle power draw of 60W, that's 0.240 kWh. I think that's reasonable. Downloading is almost effortless for a modern computer, and I'll assume that you're not getting any more use of it, so the monitor turned itself off.
Now compare that to a CPU easily having a TDP of 100W, plus a GPU drawing 180W, plus the rest of the computer, plus a monitor, etc, you can easily reach 400W. That'll blow through the power you consumed to download in half an hour.
Also I don't think anyone releases 40GB patches on a daily basis, so if you game for say, 3 hours a day for a week, the costs of the download is already down to 3%.
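The napkin math in this comment can be written out as follows. All inputs are the comment's own figures (40GB, 20 Mbps, 60W idle, 400W while gaming, 3 hours a day for a week); the only assumption added here is treating GB as decimal gigabytes. The comment rounds the download to 4 hours, so the unrounded results come out slightly higher.

```python
# Inputs taken from the comment above.
download_gb = 40
link_mbps = 20
idle_watts = 60
gaming_watts = 400
hours_per_day = 3
days = 7

# Assumption: decimal units, 1 GB = 8000 megabits.
download_hours = download_gb * 8000 / link_mbps / 3600
download_kwh = idle_watts * download_hours / 1000

gaming_kwh = gaming_watts * hours_per_day * days / 1000

# Minutes of gaming that use as much energy as the whole download.
break_even_minutes = download_kwh / (gaming_watts / 1000) * 60

# Download energy relative to a week of gaming energy.
download_share = download_kwh / gaming_kwh

print(f"download: {download_hours:.1f} h, {download_kwh:.2f} kWh")
print(f"a week of gaming: {gaming_kwh:.1f} kWh")
print(f"break-even: {break_even_minutes:.0f} min of gaming")
print(f"download vs. gaming energy: {download_share:.1%}")
```

Without rounding, the download is about 4.4 hours and 0.27 kWh, which 40 minutes of gaming at 400W would match, and it is roughly 3% of the week's gaming energy, in line with the comment's estimate.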
Which again is what I was getting at: you should optimize the biggest sources of waste first. Anything that makes your actual gameplay more efficient and consume less CPU or GPU resources will decrease your consumption far more noticeably than almost any optimization you can do to downloading stuff.
Okay, say you're on a 10 Mbps connection. That update takes you what, 5 minutes? Yeah, that's an annoyance, but that's time for a coffee or a bathroom break. Most games don't update daily or anywhere close to it, so your download time is still going to be <1% of your play time for most people.
If your hardware/connection is really awful, yeah, maybe downloading the latest DOOM isn't a great experience, but luckily there's no lack of smaller things to play.
Either way, it's completely irrelevant carbon/energy-wise. Your power usage is going to be dominated by a game trying to render 3D graphics at 60 FPS, not by the tiny increase of the power draw of a router during an update download.
For one thing, there is the question of how to attribute carbon emissions for all the "middleboxes" that are drawing tiny amounts of power throughout the whole process but could add up to a lot.
The best way to optimize for CO2 is to ask the user to do it at the right time of day for their grid. Otherwise it's the same thing as using less energy which is ~ the same thing as being faster (assuming fixed resources).
But watch out for induced demand (the user turning the graphics settings up because you made the FPS better.)
Energy savings in mobile form factors makes sense to me, but I agree that worrying about emissions due to desktop/server workloads is laughable when we still haven't written native replacements for things like Teams and Discord. At scale, those 2 applications are probably worth more emissions than an actual Panamax-sized cargo vessel.
A super-awesome feature for the web is that it is progressive - loading a beautiful preview with just the first bytes loaded (and the bitstream can be truncated at any point in time - letting the client download only enough bits for the resolution of the viewer - meaning no need for the server to host different sizes of the same image!)
My idea is more fun. To give a simple example: One can serve the odd rows of pixels first and construct the image when half the data is received. We can slice it horizontally and vertically as often as we want. Say 2784x1856 can be served as 100 images 280x190 one after the other.
When rendering a web document requires only a small version it needs only request the part it wants. Crappy resolution is desirable on a slow connection, with limited ram or a small screen. If the next customer wants to zoom in on it on his 4K display he can download the entire thing.
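A minimal sketch of the slicing idea described above, assuming the image is a plain 2D list of pixel values. The function names are made up for illustration; no existing format is implied. Each pass takes every `step_y`-th row and every `step_x`-th column at a given offset, so any single pass is a low-resolution preview and all passes together rebuild the original exactly.

```python
def split_passes(image, step_y, step_x):
    """Split a 2D list into step_y * step_x sub-images, keyed by offset."""
    return {
        (oy, ox): [row[ox::step_x] for row in image[oy::step_y]]
        for oy in range(step_y)
        for ox in range(step_x)
    }


def merge_passes(passes, height, width, step_y, step_x):
    """Rebuild the full image by writing each sub-image back at its offset."""
    image = [[None] * width for _ in range(height)]
    for (oy, ox), sub in passes.items():
        for sy, row in enumerate(sub):
            for sx, value in enumerate(row):
                image[oy + sy * step_y][ox + sx * step_x] = value
    return image


# Tiny stand-in image: each "pixel" is just its index.
height, width = 6, 8
image = [[y * width + x for x in range(width)] for y in range(height)]

passes = split_passes(image, 2, 2)
preview = passes[(0, 0)]  # the first pass alone is a quarter-size preview
restored = merge_passes(passes, height, width, 2, 2)

print(len(preview), "x", len(preview[0]), "preview;", restored == image)
```

With a step of 10 in both directions, a 2784x1856 image would split into 100 sub-images of at most 279x186 pixels each, close to the figure given above.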
Grainy TV signals are similar too. Much better than the all-or-nothing "it works or it doesn't work".
Progressive JPEG can do at most 5 passes and is a lossy format.
It is designed to display fast and then improve the resolution. Perhaps there are implementations that can partially load or stop loading it, but that wasn't the goal, was it?
Imagine a thumbnail. You have some data there. One clicks on it and a larger version is shown, using the thumbnail instead of starting from scratch. Now one zooms in on the image; surely loading different images from scratch every time is not the right approach? Instead, we just don't serve the high-resolution version that we [obviously] do have.
This is wavelets aka JPEG2K. The problem is that 1. they don't compress well, because they look blurry which is unpleasant 2. they are inefficient to decode because of poor memory locality. So that's why most codecs use macroblocks instead.
IIRC Ogg Vorbis also originally supported this for audio but nobody used it.
The problem is that it destroys compression rates. PNG filtering and compression depends on being able to make predictions based on adjacent pixels; interlacing ruins that.
For color images on Windows, D3D 11.0 and newer versions require support of BC7 decompression: https://learn.microsoft.com/en-us/windows/win32/direct3d11/b...
On Linux, some embedded GPUs I targeted supported ASTC decompression: https://en.wikipedia.org/wiki/Adaptive_scalable_texture_comp...