Quite OK Image is now my favorite asset format (nullprogram.com)
194 points by nalgeon on Dec 18, 2022 | 135 comments



There are a lot of dimensions on which QOI could be improved, but most cases involve trade-offs, e.g. more complexity for better compression. The areas where QOI could have been strictly better, the dimensions on which it is not on the Pareto frontier, are more meaningful criticisms — missed opportunities.

This summer I spent a lot of time researching lossless compression formats as a streaming format for our 3D capture and reconstruction engine. QOI was on our list too.

My biggest criticism about QOI is that in many cases it doesn't compress at all, or even increases size. It works well on the demo images that come with the package but these are carefully selected. It also works well on high-res, high-quality images with lots of single-color background and/or smooth gradients. But give it grainy, noisy, smaller-format images and it completely breaks down.

We ended up sticking with PNG, which btw. has fast open source implementations too [1][2].

[1] https://libspng.org/ [2] https://github.com/richgel999/fpng


All of the following QOI derivatives improve upon its coding efficiency, AFAIK.

In QOIP, the opcodes themselves are stored in the format, plus LZ4 or zstd compression:

https://github.com/chocolate42/qoipond

QOIR adds: color profiles, EXIF metadata, premultiplied alpha, a lossy mode, tile support, and LZ4 compression. AFAIK it does this while beating the original QOI in terms of speed AND efficiency.

Lots of benchmarks in: https://github.com/nigeltao/qoir

QOIX is my very own version. It adds greyscale [+alpha] 8-bit image support and lossy 10-bit support (16-bit encoded as 10-bit, intended for elevation maps), so it's really 3 codecs, plus optional LZ4. https://github.com/AuburnSounds/gamut


While what you're saying is true - QOI performs badly with some types of images - I want to push back on the notion that the demo images have been "carefully selected" to tell a narrative. TFA alleges the same. This is absolutely not the case!

The benchmark suite[1] I compiled contains many different types of images. The "industry standard" sets like the Kodak and Tecnick Photos, texture packs and icons are complete. No images have been removed from these sets. Some of the other collections (photos and game screenshots from Wikipedia and those from pngimg.com) are a random sample. I did not remove any images just because they compressed badly.

Also, the fact that some images compress badly is a non-issue if the whole set of assets you need for your game/app has an acceptable compression ratio when combined. If that's not the case, by all means, choose another image format.

On average, for a lot of asset types, QOI usually performs quite ok.

[1] https://qoiformat.org/benchmark/


Fair enough. In our testing we couldn't get it to compression ratios lower than ~90% on the kind of images we tested (low quality webcam-type sensors, sub-VGA resolution), while it achieved very impressive compression ratios on the demo images, so I assumed a selection was made. Perhaps it just works really well for certain kinds of images and not for others.


Low-quality sensor photographs are all but impossible to compress because they're filled with compression's worst nightmare: random noise.

Lossless compression of random noise isn't possible, and lossy compression of it requires content-aware algorithms (e.g. AI) to get results.

Even if you don't readily see the random noise, it's there, subtly changing the hues of pixels just enough to be incompressible.


And the frustrating thing is that the “noise” is actually information; it tells you something about the capture device and encoding toolchain. It’s just an incredibly inefficient way to encode “this was captured on a cheap image sensor on a severely memory constrained device”


While you can't compress the random noise itself, it's often easy to still get decent compression of the whole image, depending on how serious the noise is.

Imagine an image of large blocks of constant colour in 8-bit colour depth that has 2 bits of random noise on every pixel. We indeed can't ever compress it to less than 2 bits per pixel, but we can still get down to that.
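
A quick way to see that floor (a rough sketch I put together, greyscale 1 byte/pixel for simplicity): estimate the zero-order entropy of the pixel values. For a constant background plus 2 bits of uniform noise it comes out at about 2 bits per pixel, which no lossless codec can beat.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Zero-order entropy in bits per byte of an 8-bit buffer. */
    static double entropy_bits(const unsigned char *buf, size_t n)
    {
        size_t hist[256] = {0};
        for (size_t i = 0; i < n; i++) hist[buf[i]]++;
        double h = 0.0;
        for (int v = 0; v < 256; v++) {
            if (!hist[v]) continue;
            double p = (double)hist[v] / (double)n;
            h -= p * log2(p);
        }
        return h;
    }

    int main(void)
    {
        enum { N = 1 << 20 };
        unsigned char *img = malloc(N);
        if (!img) return 1;
        /* Constant grey background with 2 bits of uniform noise per pixel. */
        for (size_t i = 0; i < N; i++) img[i] = 128 + (rand() & 3);
        printf("%.3f bits/pixel\n", entropy_bits(img, N)); /* prints ~2.000 */
        free(img);
        return 0;
    }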


They might be impossible to compress using RLE-based mechanisms, but the theoretical possibility is there. The noise range for the pixels is not 0-255, so the image has less entropy than a fully random one and thus should be compressible.

I am imagining something like a "base layer" with averaged brightness + arithmetic coding for differences might do it.


These sorts of images do compress quite ok though (pun intended) with PNG.


QOI will be pretty bad with exceptionally noisy images. That said, if you denoise first, you aren't degrading image quality.


> QOI usually performs quite ok

Nice :-)


I guess you are taking photographs. One should never try to store photographs with no loss. With a lossy format, like JPG or WebP, you can get 5x smaller file size, while giving up 0.000001% of the original information (which usually corresponds to specific values of noise, that are not important in practice).


Okay, the decoder is less than a kilobyte. But where is that an advantage over a 50KB PNG decoder, when the compression ratio suffers so much?


I came away from reading the article wondering what context I was missing. What use-case is the author targeting? It feels like I need to read more about this author to understand the article.


I think it's in the title: assets. The example case is replacing the BMP assets used by a game with QOI with a resulting 10x size reduction: https://github.com/skeeto/chess/commit/5c123b3

I think this functions as a test case. For a game as simple as a chess UI, PNG would probably be fine unless you're code-golfing on the final output binary size or refusing to use common dependencies. But for some programs (e.g. large video games), preloading all your assets is very common and decoding speed can be crucial. Maybe the assets could even be left in compressed form in memory in order to reduce the system requirements? I'm not sure if this is common or not.

Side note: video game assets are often enormous in size because developers refuse to implement even very basic compression because of the supposed performance impact. Getting the decompression speed up can result in tremendous reduction in disk usage because it allows doing compression without losing performance.


> video game assets are often enormous in size because developers refuse to implement even very basic compression because of the supposed performance impact

This claim needs some real world evidence to back it up (and usually it's not about a performance impact, but instead a perceived image quality impact).

IME a lot of care is taken with compressing asset data, and if there were a chance to reduce the asset size by another few percent without losing too much detail (that's the important part), it would be done. In the end, textures need to end up in memory as one of the GPU compatible hardware compressed texture formats (e.g. BCx) - which is important not only for reducing memory usage, but mainly for increasing texture sampling performance on the GPU (reduced memory bandwidth and better cache locality when reading texture data from GPU memory).

Those hardware-compressed texture formats rule out popular image formats like JPEG, PNG etc., which might have better overall compression but are hard to decode on the fly into one of the hardware texture formats (though there are now alternatives like https://github.com/BinomialLLC/basis_universal). Even a generic lossless compressor (even good old zip) on top of BCx is already a good start.
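
For reference, here's the BC1/DXT1 block layout from memory (a sketch; double-check against the format docs). Every 4x4 tile costs a fixed 8 bytes regardless of content, which is exactly why a generic lossless pass on top still finds plenty of redundancy in flat or repetitive regions:

    #include <stdint.h>

    /* BC1 (DXT1): every 4x4 pixel block is a fixed 8 bytes (4 bits/pixel). */
    typedef struct {
        uint16_t color0;   /* endpoint 0, RGB565 */
        uint16_t color1;   /* endpoint 1, RGB565 */
        uint32_t indices;  /* sixteen 2-bit selectors, one per pixel,
                              choosing color0, color1 or an interpolant */
    } bc1_block;

    /* A 1024x1024 BC1 texture is (1024/4) * (1024/4) * 8 bytes = 512 KiB,
       versus 4 MiB for raw RGBA8 -- and in flat regions many of those
       8-byte blocks repeat verbatim, which is what zip/zstd on top exploits. */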


> This claim needs some real world evidence to back it up (and usually it's not about a performance impact, but instead a perceived image quality impact).

We're talking lossless compression here, so image quality is not the issue.

Fortunately someone else has already done this research. There's a tool for Windows to control the compact.exe behavior for individual folders called CompactGUI: https://github.com/IridiumIO/CompactGUI

They maintain a database of compression results here: https://docs.google.com/spreadsheets/d/14CVXd6PTIYE9XlNpRsxJ...

Reductions in storage use of greater than 50% are so common that they're hardly even worth remarking on. My experience with compressing a bunch of games is that the biggest gains come from compressing bloated asset packs. Hard to know what else could be taking up more than 50% of the storage space in a particular game.


Very interesting list, thank you! From glancing over it, poor compression mostly seems to be an 'indie phenomenon' though (which is a bit surprising, because I would expect that Unity automatically takes care of asset compression, but maybe not).


It may have tools that let you, but an indie project has 500 other fires to put out, and filesize is rarely a bottleneck.


> But for some programs (e.g. large video games), preloading all your assets is very common and decoding speed can be crucial.

Those games typically don't need lossless image compression. There are much better (and faster) algorithms for lossy texture compression, many of which can even be decoded directly by the GPU. The S3TC family is one popular example.

One interesting commercial product in this space is Oodle Texture: http://www.radgametools.com/oodletexture.htm


Lossy causes some issues with normal maps, so QOI seems good for those cases where lossless is preferred to balance loading (and streamed loading) performance and space use, but yes, in general S3TC is good for sRGB color textures in PC games.

Using a lossy hardware compression (like S3TC) that's natively supported by the hardware is especially a good way to save on GPU RAM! Double the texture resolution for free, essentially.


It's true that lossless is often not the best approach, although it's worth pointing out that anyone can implement Quite OK while both the approaches you mention are not really friendly to open source.


S3TC is patent-free as of October 2017, and is implemented in open-source graphics drivers.


Are the best available compressors for the format open source as well? That would be news to me, although I admit I haven't looked into the format in quite some time.

More generally, my point about video games was just that you can frequently cut their size by 50-75% just by turning on file compression on the game directory. Developers are obviously missing some easy wins in this area - including huge wins with lossless compression alone.


basisu (Binomial's Basis Universal GPU Texture Codec) is pretty much state of the art here. Open source, offers excellent compression and very quickly decodes to pretty much all GPU native formats.


basisu is really impressive, but it’s not perfect for all use cases.

I tried using it in a WebGL game, but found that a) decoding was too slow, and b) the quality was too low.

PNG is much bigger, so you’d think it wouldn’t be good for a web game, but once an image has been downloaded and cached, the most important factor is the decode-and-upload time. On the web, PNG and JPEG handily beat basisu there.

In a different situation, basisu could have worked out for me. On native rather than web, maybe the decode time would have been fine. With more photographic rather than geometric assets, maybe the quality would have been fine.


The advantage of using a native hardware compression format is that you save on GPU RAM, which is especially useful for web which may display on mobile systems.


That’s true, but it’s a trade-off rather than a straight win. Hardware formats do have a few disadvantages:

1. Larger size on disk

2. Reduced quality at all MIP levels

3. Complexity of each platform wanting its own special format.

Basis fixes 1 (by adding an extra layer of compression) and 3 (by transcoding to almost any format at load time). But it doesn’t fix 2, and it adds another downside -- decompressing and transcoding is relatively slow.


PNG/JPEG are not natively understood by GPUs, though. GPUs have compressed formats they natively understand, and basisu can transcode to them quickly. You'll be burning a lot more GPU texture memory as you'll be asking the GPU to store raw image data instead of compressed image data.


Look at Rich Geldreich's bc7enc_rdo.


Use Basis Universal


> I'm not sure if this is common or not.

> Side note: video game assets are often enormous in size because developers refuse to implement even very basic compression

Be generous; you even said in your previous sentence you don't know if it's common or not, so how do you know developers refuse to implement basic compression? If it's that easy, I'm sure there are plenty of AAA game studios (including mine) that will hire you to just implement basic compression.

There are enormous tradeoffs to make that are platform dependent; read speeds from external media are _incredibly_ slow but from internal media may be fast. Some assets are stored on disk to mask loading/decompression. Lastly, assets are just enormous these days. A single 4k texture is just shy of 70MB before compression and a modern game is going to be made of a large number of these (likely hundreds of them), and many games are shipping with HDR - these are multiples of the size. That's before you get to audio, animation, 3d models, or anything else.

My current project's source assets are roughly 300GB, and our on-disk size is 3GB or so. My last project was 5TB of source assets and 60GB on disk. Of course we're using compression.


> Be generous; you even said in your previous sentence you don't know if it's common or not, so how do you know developers refuse to implement basic compression? If it's that easy, I'm sure there are plenty of AAA game studios (including mine) that will hire you to just implement basic compression.

The thing I said I didn't know whether it was common was keeping assets in compressed form in memory. I admit I don't know much of the specifics of how game rendering works. What I do know something about is the extremely poor compression applied to many video games in their on-disk form. I'm willing to grant that your studio may indeed be an exception to this, but the general principle isn't possible to deny. Just by enabling lossless Windows file compression on a game folder, you can frequently see the size of a game on disk drop by 50% or more, as I discuss in my comment here: https://news.ycombinator.com/item?id=34042164

Surely a lossless format designed specifically for encoding assets would be even more effective and fast than generic Windows file compression!


I was trying to explain this to my son the other day when he was complaining about the Xbox without the disc drive. I asked him how big of a download Fortnite or Call of Duty were - I think he said 60 gigs, and then I told him Blu-ray tops out at 50GB.

He understands why the Xbox doesn't have the disc drive anymore.


Complex games used to be delivered on multiple floppy disks, later multiple CDs. I don't see the capacity of a single Blu-ray disc being a limiting factor.


One of the big limiting factors for a game like Fortnite is that the assets change significantly over time. Textures, materials, animations, audio have all changed dramatically over the last few years. There's likely very little from the initial install that is still usable, and that's before you get to file load order and fragmentation.


I don't recall having to swap discs on PS1/2/3/4 at all.


In addition to other examples given, Metal Gear Solid 1 memorably has a disc-swap.

Games heavy on prerendered video frequently were multi-disc: Final Fantasy 7 is three discs, 8 is four, 9 is four... Xenogears is multiple discs, as is Chrono Cross. Lots of JRPGs, though not exclusively: Riven was on 5 (!) discs, again a case of a lot of prerendered content.

It was pretty rare on PS2 with DVDs, and as far as I know totally eliminated on PS3 with Blu-ray (Metal Gear Solid 4 has/had an "install swap" where it would copy over from the disc only a chapter at a time, but all coming from one disc).

Then the concept made a comeback by the tail end of the PS4's life, with several games having "play" and "install" discs, though these aren't quite the same experience, as you just use the install/data disc once then put it away. Fittingly one of these PS4 multi-disc games is the remake of Final Fantasy 7 (one that covers only a relatively small portion of the original, to boot).


Offhand, many Final Fantasy games shipped on multiple discs, as did Armored Core, Gran Turismo and Oddworld.

The Last of Us Part 2 and GTA 5 both needed multiple discs on PS4.


PS1 had dozens of games that shipped on multiple discs, PS2 had somewhat fewer but the ones that I remember off the top of my head are Xenosaga 2 and 3.


FFVII shipped on 3 discs. Most of it was cutscenes and you better believe they were compressed to hell.


It doesn't work as well for games like Fortnite. Matchmaking map playlists tied to specific discs? Sure, it could be done.


What performance penalty? Decoding is often faster than moving more bits from storage.


In the first paragraph they say:

> QOI has become my default choice for embedded image assets.

Embedded means they are working on hardware with low amounts of storage, computing speed and memory. If you are developing for embedded, you are typically developing for electronics products. Every byte you can shave off can ultimately increase the return on investment. Using a traditional image format on embedded might not be possible because you don't have the program memory to store a complex decoder. If your decoder is 10 times bigger than the assets it is meant to decode, maybe there is not much benefit in using it.

The simplest "decoder" you can go with would be just storing the pixel values in a bitmap and reading them straight out of that array. This however has the downside that you get no compression on the assets at all. If you have very simple color spaces (e.g. 1-bit), tiny resolutions and few assets, this might be an acceptable choice, but it gets worse as you add more assets, higher resolutions, more channels or more bits per channel.

That means, according to the author, there is a space between just storing bitmaps and just using a PNG decoder where there were no good go-to solutions before, and they found a good solution in QOI.
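
To make that middle ground concrete, here's a rough sketch of the embedded pattern, assuming the reference single-header qoi.h and its qoi_decode()/qoi_desc API (the asset name and array are made up for the example): the compressed asset lives in flash as a byte array and is expanded into RAM on demand, while the whole decoder sits in a few hundred bytes of .text.

    /* Sketch: decode a QOI asset embedded in flash into a RAM buffer.
       Assumes the reference single-header qoi.h (qoi_decode, qoi_desc). */
    #define QOI_IMPLEMENTATION
    #include "qoi.h"

    /* e.g. generated with: xxd -i sprite.qoi > sprite_qoi.h */
    extern const unsigned char sprite_qoi[];
    extern const unsigned int  sprite_qoi_len;

    void *load_sprite(unsigned int *w, unsigned int *h)
    {
        qoi_desc desc;
        /* Ask for 4 channels (RGBA) regardless of what the file stores. */
        void *pixels = qoi_decode(sprite_qoi, (int)sprite_qoi_len, &desc, 4);
        if (!pixels) return 0;
        *w = desc.width;
        *h = desc.height;
        return pixels; /* caller frees */
    }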


> That means, according to the author, there is a space between just storing bitmaps and just using a PNG decoder where there were no good go-to solutions before, and they found a good solution in QOI.

What space? I feel like that's an assumption you're making and it might be true or it might not be, but even if you're right that's very vague and I want more specificity.


QOI lies between uncompressed images and entropy-coded compressed images. It also happens to work heuristically quite well for such a simple spec. Maybe its defining feature is, foremost, its simplicity (and, as a consequence of that, its speed, secondarily).


I know where it lies on the compression spectrum, I'm asking what use case prefers it.


Wherever the codec has to be as small as possible and do as little as possible while not caring about the payload size


"As little as possible" sounds like RLE or not compressing at all to me.


Article mentions images embedded in the application, not computers embedded in another device.


It reminds me of that Steve Jobs insult thread: it's a technology looking for an audience.

The pitch as I can tell is that the specification is simple. So the guy who spends a lot of time writing code based on specifications likes it?

I guess that’s the niche audience. Maybe a better pitch would be “image format specifications should be more like QOI”.


Embedded.

We're currently fighting to gain 10-20kB in our binary.


Ah but will you trade that for 5MB in assets?


Eh, the assets can be streamed. The decoder resides in .text


Embedded can mean many different things, so it's still not clear what the challenge is..

But it sounds to me like you're trying to reduce the amount of code on the NVM of an MCU, where space is very limited, but you have slower off-chip (QSPI, SDC, ..) memory where you can store images?


Some people consider implementation simplicity as a virtue.

It's nice to be able to understand large parts of your stack at a source code level.

I'd like to see more minimal viable X.


A minimal viable Deflate decompressor (that you’d need for PNG) is not exactly complex[1], although slower than mainline zlib.

[1] https://github.com/madler/zlib/tree/master/contrib/puff


Yeah, but that doesn't make any sense at all. So don't listen to them. If even one photo saves 50kB from compression, you can easily see that this line of thinking is a mistake.


JavaScript developers tell me that disk space and network bandwidth are cheap these days. While they're mostly saying that to try and justify extremely bad architectural decisions (what's a couple thousand NPM dependencies between friends?), maybe we can apply that logic here. Maybe it's worth making every image file twice as big, if we can use a much simpler decoder implementation that's not a constant source of bugs and security vulnerabilities.


By using a bespoke decompressor? No thanks. Take a standard format and make sure all array bounds are checked. Data-oriented compression formats tend not to be very complicated and can be implemented with few instances of simple data structures.


I’m not saying I agree with them but I do sympathise with the argument.

Us old timers spent the first part of our careers understanding most, if not all, of the technology stack. It's actually really empowering having that level of insight. So I can see why people might still crave those days of simplicity.

It's not a decision I'd personally follow, but I do see the attraction in it.


It's worth understanding DEFLATE, and then you can have the same amount of insight into PNG.


I’m no expert in this field but from what I’ve learned about PNG and deflate, there is a little more to PNG than just deflate.


Yes but you can learn those parts just about trivially.


PNG and JPEG are still well within the realm of what an average dev can understand, at least with some nice videos on youtube that give visual examples and make the math less opaque.


It's a tradeoff. The decoder is really fast and can be used in embedded contexts (or ported to other languages simply). The encoder is also really fast, which can be useful for game textures, for example.


Given the context, simplicity and control. I assume part of the inefficiency can be reclaimed by compressing the assets separately, although the lack of filtering would hurt.

Encoding and decoding speed as well: because QOI is much simpler (simplistic, even), it decodes at 3x the speed of libpng (which I'd assume is more relevant to the OP than the 20x faster compression, though encode speed might be more relevant for other applications, but then I/O trade-offs rear their ugly heads).


The QOI benchmarks show it lagging only a little behind PNG in compression ratio, while performing much faster (by multiples) in both encoding and decoding. Doesn't seem like an awful trade-off. You'd expect a much larger, slower and more sophisticated encoder and decoder to net at least a 25-50% improvement in compression ratio, but it seems to net much less.

If you are artificially or truly constrained in either CPU usage, or maybe just static code size, it seems like it would have some appeal, even if niche.


> The QOI benchmarks show it lagging only a little behind PNG in compression ratio

The benchmarks are misleading. By size, they mostly consist of photographs and other true-color images, which neither compressor handles well. This has the effect of hiding QOI's lackluster performance on synthetic images which PNG does compress well.

In particular, QOI performs dramatically worse than PNG on synthetic images containing any sort of vertical pattern. As a worst-case example, an image consisting of a single line of noise repeated vertically will compress down to basically nothing in PNG, but QOI will fail to compress it at all.


Yes, but the point is that it's not really that far off. PNG will outperform in screenshots with a lot of empty space, so if you were dealing with that situation, obviously PNG is better. OTOH, QOI still performs relatively well: while the difference is multiples in some cases, to me it seems that those cases are mostly ones where PNG's compression ratio is obscenely good, and therefore QOI is actually still producing reasonably small image files, too.

I can see that QOI performs well for what you might use as texture maps for 3D rasterization, for example. Definitely seems to have some applicability to me.


> The QOI benchmarks show it lagging only a little behind PNG in compression ratio, while performing much faster (by multiples) in both encoding and decoding.

Presumably that’s with stock libpng, which uses zlib. I wonder if anyone tried patching it to use the substantially faster (on x86) libdeflate[1] instead? It doesn’t do streaming, but you shouldn’t really need that for PNG textures.

[1] https://github.com/ebiggers/libdeflate


if you throw lz4 or zstd on top of it, it actually can beat png


> if you throw lz4 or zstd on top of it, it actually can beat png

PNG is literally just "perform some simple filtering on the image data, then run that through zlib and wrap it with some metadata". It shouldn't come as any great surprise that you can outperform PNG with a newer stream compressor. You'd probably get even better results by using PNG's filters instead of QOI.
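
For anyone wondering what "PNG's filters" means concretely, here's a rough sketch of the per-scanline prediction step using filter type 4 (Paeth), the most involved of the five: each byte is replaced by its difference from a predictor, and those residuals are what get handed to the stream compressor.

    #include <stdlib.h>

    /* Paeth predictor from the PNG spec: pick whichever of left/up/upper-left
       is closest to left + up - upper_left. */
    static unsigned char paeth(unsigned char a, unsigned char b, unsigned char c)
    {
        int p = a + b - c;
        int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
        if (pa <= pb && pa <= pc) return a;
        if (pb <= pc) return b;
        return c;
    }

    /* Filter one scanline in place (bpp = bytes per pixel).
       prev is the previous *unfiltered* scanline, or NULL for the first row.
       Iterating backwards keeps the left neighbours unmodified as we go. */
    static void filter_paeth_row(unsigned char *row, const unsigned char *prev,
                                 size_t len, size_t bpp)
    {
        for (size_t i = len; i-- > 0; ) {
            unsigned char left = i >= bpp ? row[i - bpp] : 0;
            unsigned char up   = prev ? prev[i] : 0;
            unsigned char ul   = (prev && i >= bpp) ? prev[i - bpp] : 0;
            row[i] = (unsigned char)(row[i] - paeth(left, up, ul));
        }
    }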


LZ4[1] intentionally compresses worse than Deflate—where Deflate is LZ77 with Huffman in a bitstream, LZ4 is just LZ77 with a good choice of octet-aligned encoding. It’s “we’re saturating Gigabit Ethernet and have some compute left over” compression, not “we need to squeeze our backups as much as possible” compression. If filtering+LZ4 does better than Deflate, then you’ve chosen the filter well.

Curiously, for some kinds of serialized binary data varints or even zero-packing[2] can be a good enough filter[3].

[1] https://lz4.github.io/lz4/

[2] https://capnproto.org/encoding.html#packing

[3] https://news.ycombinator.com/item?id=24073752


> You'd probably get even better results by using PNG's filters instead of QOI

Did anybody ever try this? It would be quite interesting and does not seem too difficult.

The only thing I could find is https://github.com/catid/Zpng which does not use the normal PNG filtering.


So, if you throw another >100KB decoder on top of the 1KB decoder, you can beat png (haven’t verified). What’s the point.


I once ported an LZ4 decoder from 68k to my own toy CPU design - the lz4_depack routine weighs in at a whopping 72 bytes. https://github.com/robinsonb5/EightThirtyTwoDemos/blob/maste...


Also, I could see it with zstd (which is a lot more than 100k, though very fast indeed), but I'm a bit doubtful about lz4: it's generally worse than deflate (which png uses) at compression, and qoi lacks filtering.


QOI's QOI_OP_DIFF and QOI_OP_LUMA opcodes can act like horizontal filtering once the results are fed into a stream compressor. There's nothing to provide vertical filtering, though, which is a pretty huge limitation.

I wonder if you'd get better compression using a zigzag scan?
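
To make that concrete, here's roughly how those two opcodes encode a pixel as a small delta from the previous pixel in scan order (a sketch from my reading of the QOI spec; double-check the bit layout before reusing it). They only ever look at the previous pixel, never the row above, which is why there's no vertical prediction.

    #include <stdint.h>

    typedef struct { uint8_t r, g, b, a; } px;

    /* Try to emit QOI_OP_DIFF (1 byte) or QOI_OP_LUMA (2 bytes) for pixel p
       given the previous pixel prev. Returns bytes written, or 0 if the
       delta is too large and a literal opcode is needed instead.
       DIFF: tag 0b01 + three 2-bit channel deltas, bias 2 (range -2..1).
       LUMA: tag 0b10 + 6-bit dg (bias 32), then 4-bit dr-dg and db-dg (bias 8). */
    static int encode_delta(uint8_t *out, px prev, px p)
    {
        if (p.a != prev.a) return 0;              /* alpha must be unchanged */
        int8_t dr = (int8_t)(p.r - prev.r);       /* wrap-around differences */
        int8_t dg = (int8_t)(p.g - prev.g);
        int8_t db = (int8_t)(p.b - prev.b);

        if (dr >= -2 && dr <= 1 && dg >= -2 && dg <= 1 && db >= -2 && db <= 1) {
            out[0] = (uint8_t)(0x40 | ((dr + 2) << 4) | ((dg + 2) << 2) | (db + 2));
            return 1;                             /* QOI_OP_DIFF */
        }
        int dr_dg = dr - dg, db_dg = db - dg;
        if (dg >= -32 && dg <= 31 &&
            dr_dg >= -8 && dr_dg <= 7 && db_dg >= -8 && db_dg <= 7) {
            out[0] = (uint8_t)(0x80 | (dg + 32));
            out[1] = (uint8_t)(((dr_dg + 8) << 4) | (db_dg + 8));
            return 2;                             /* QOI_OP_LUMA */
        }
        return 0;
    }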


There was some work done to find better scan arrangements. Unfortunately, it's pretty hard to do so without storing the previous row in memory, which has a pretty large effect on speed.


lz4 is also pretty small. https://nigeltao.github.io/blog/2022/qoir.html is an implementation of QOI with some tweaks + lz4, and it's still pretty tiny.


golf


Golf of what sort? It's hard to beat RLE or RLE+palette for minimal code size. I'd expect either one of those to be a tenth the size of a QOI decoder.


These do well on synthetic images but really poorly with noise. QOI manages pretty good performance even with noise.


That doesn't really explain the code golf answer.


QOI is in the same ballpark as RLE simplicity-wise but performs much better in a larger number of common cases. That's very much "golf".


I don't know about that. My estimate would be that QOI is ten times as complex as RLE and ten times less complex than DEFLATE-style compression. It could (barely) share a ballpark with either one.


I'm developing a game engine + game, and for several months I've been using the QOI format. This format is amazing in my situation: I can load the textures much faster compared to PNG. Generally, it is 2-3 times faster even if the size on disk is 1.5x-2.5x larger. Note: I use an NVMe disk and I'm not sure my statement still holds on a less performant hard drive.


Why are you using PNGs as a texture format? Generally, you'll use BC textures for the most part, or barring that, your images will be compressed as part of your archive format with some sort of super-compression algorithm that compresses the pakfile block-by-block. Perhaps you're referring to your texture compression/conversion pipeline which ingests PNG from your DCC tool of choice?


Try Zstandard. It can decompress about four times as fast as zlib, so it should do very well on PNGs. Also give LZ4 a shot.


There's a run length for identical pixels but no run length for literal pixels.

Some images barely compress, which means there must be many runs of literal pixels. Even if those runs are only a few pixels long, not paying a byte of overhead for each pixel should improve size on these pictures. Maybe they just wanted it to win on throughput by having fewer branches.
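
Back-of-the-envelope, based on the spec's literal opcodes (my numbers, worth double-checking):

    /* A pixel that matches no other opcode is emitted as a literal with a
       one-byte tag:
         QOI_OP_RGB  : 0xFE + r,g,b   -> 4 bytes for 3 bytes of payload (+33%)
         QOI_OP_RGBA : 0xFF + r,g,b,a -> 5 bytes for 4 bytes of payload (+25%)
       There is no "N literal pixels follow" opcode, so that tag byte cannot
       be amortized across a run of incompressible pixels. */
    enum { QOI_LITERAL_RGB_BYTES = 1 + 3, QOI_LITERAL_RGBA_BYTES = 1 + 4 };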


Since when is being able to write the encoder and decoder in one sitting a metric for a good asset format?


"linear" is not a color space. Presumably they mean something like, sRGB but with linear gamma, but without understanding the design goals of this format it's hard to say what purpose the linear encoding serves.


IIRC it is a late addition and only added because of people wanting some kind of color space guarantee. I don't think it was ever accurately specified what they do mean---presumably both share the sRGB color gamut and only differ by transfer functions, but it's a frequent misconception [1] that transfer functions can be parametrized with a single value, gamma, so I guess that's where the "linear" came from (implying gamma = 1.0).

[1] The sRGB EOTF has a nominal gamma of 2.2 but it is actually a piecewise function with gamma varying from 1.0 to 2.4.


Compare how Netpbm declared[1] that colour images are BT.709 (HDTV) relatively late in its existence, and seemingly every other implementation ignores that and treats them as sRGB (same—or close enough—primaries, different transfer function).

[1] https://netpbm.sourceforge.net/doc/ppm.html


> I guess that's where the "linear" came from (implying gamma = 1.0).

"Linear" means the colors are stored in a state that means you can do linear transformations on them (basic math in other words) without loss. You can't do this with gamma corrected colors - for example as you say, the sRGB EOTF is a piecewise function so by definition it's not a linear transformation.

So yes, when talking about "linear" colors, they still come from some gamut like sRGB or a larger HDR gamut. The linear part means they are safe for use in a renderer's internals.
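
For reference, these are the standard sRGB transfer functions under discussion, sketched for normalized 0..1 values. Note the linear segment near black; that's why a single gamma exponent is only an approximation.

    #include <math.h>

    /* sRGB EOTF (decode): sRGB-encoded 0..1 -> linear 0..1. */
    static double srgb_to_linear(double c)
    {
        return c <= 0.04045 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
    }

    /* Inverse (encode): linear 0..1 -> sRGB-encoded 0..1. */
    static double linear_to_srgb(double c)
    {
        return c <= 0.0031308 ? c * 12.92 : 1.055 * pow(c, 1.0 / 2.4) - 0.055;
    }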


I've implemented color-managed workflows, so I should have suspected that of course sRGB would be like that; it has to be.


Yeah, if you have the slightest experience with color space you would know that's a misconception, but I've seen enough cases where the 2.2 gamma is uniformly applied throughout the brightness curve...


Pretty much all modern 3D rendering is done in this color space that supposedly doesn't exist.

Also, normal maps should be flagged as linear, since they should not get converted from the sRGB curve upon sampling.


Linear is a class of color spaces, so which one? It's like if I said some data is in "text format". Is it JSON? YAML? XML? The Unity docs[1] refer to linear vs gamma, not linear vs sRGB. Same with other resources I'm finding. They do call out sRGB as the standard non-linear space, which is consistent with what I'm saying: linear and gamma are classes of color spaces, so which linear space is it?

Normal maps aren't images for all that they have an intuitive visual representation. In particular they don't have colors therefore a color space has no meaning.

ETA: a better analogy is "plain text encoded": it's probably utf8, but it could be windows-1252, windows-1250, or who knows what else.

[1] https://docs.unity3d.com/Manual/LinearLighting.html


Rendering pipelines are in linear RGB, which has rgb(1,1,1) and rgb(0,0,0) the same as sRGB, but with 1.0 gamma.

When loading images into the GPU, you flag them as either sRGB or linear. When sampling from a shader, the GPU will convert sRGB to linear (this is a hardware feature, because many images are in sRGB space) and not make any changes to textures flagged as linear.

Rendering is done in linear RGB space in properly coded game engines, and the final output is generally also linear (with some post-processing magic to stay within boundaries). The final swap will convert it to sRGB or whatever your display profile is set to; this is handled by the display driver as well.
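
In OpenGL terms (a sketch, assuming a GL 3.x-class context and whatever loader provides your headers), the "flag" is just the internal format chosen at upload time:

    #include <glad/glad.h>   /* or whichever GL loader/header you use */

    /* Upload an 8-bit RGBA image and tell the GPU how to interpret it:
       sRGB-encoded color data is decoded to linear by the sampler,
       data flagged as linear (normal maps, masks, ...) is passed through. */
    static void upload_texture(const void *pixels, int w, int h, int is_srgb_color)
    {
        GLenum internal = is_srgb_color ? GL_SRGB8_ALPHA8 : GL_RGBA8;
        glTexImage2D(GL_TEXTURE_2D, 0, internal, w, h, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    }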


You're slightly confused I think. We operate and perform lighting in linear color spaces, but the choice of white point and primaries offers a few degrees of freedom. LDR renderers might opt to stay within linear sRGB primaries, but it's easy to clip against the gamut boundary if doing HDR lighting. A color transform is needed to go to Rec2020 or wider.


Possibly, I might be. (Hence, post processing magic, regarding white points and clipping.) Wasn't the final color space transform handled by the driver? I don't recall having to do any manual color transforms to present a linear RGB render target correctly on an sRGB display, at least. Haven't had the chance yet to experiment with high gamut displays, unfortunately, so I might be missing something.

Fortunately I mostly just deal with embedded graphics that do their technically incorrect blending math in unmanaged sRGB/display space. (:


Ok so, the final display gamut, initial source color gamut, and working gamut are three separate spaces. Effectively, the gamut defines where your primaries are, which are the positions of the unit RGB cube within a larger volume. These three working spaces can be all completely different, and the only thing to ensure is that you are consistent about transforming from one space to the next. It's possible to do operations that take you outside the final gamut, after which you clip or do some other operation. There are no hard and fast rules, although some approaches will certainly look much better depending on lighting conditions (color temperature) and so on.


Does that just mean that the color space is untagged / undefined? (That is in practice how I see a lot of games rendered - way oversaturated on my wide gamut screen.)


In practice, linear texture means "don't mess with the data when sampling", and sRGB texture means "please convert to linear when sampling" (this is a hardware feature).

On modern engines, rendering is done in linear. A lot of older engines render in unmanaged maybe-sRGB and look like potato, in which case tagging the textures does nothing.

In general, the final render output is either linear or "I don't care, probably sRGB". In the first case it's up to your display driver to do the final conversion to your display color space, so colors should look correct, but the game engine may have already limited colors (by some post-processing magic) to sRGB boundaries if it's not aware of your display range. Or it may have messed up if it can't make sense of the display range. In the other case, I'm not sure there's a solid definition; the driver will probably assume the color space is already good.


For normal maps, the correct color space is definitely "no color space", because the R/G/B channels are used to store a 3D vector and not an actual color.

For video game graphics, the ideal color space is "looks good on the median user's television once we apply our gamma curve", which is not actually a real color space. It's more like a statistical distribution of color spaces.
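
Concretely, sampling a normal map is just remapping bytes back to a vector (a sketch for the common 8-bit RGB case); there's no transfer function anywhere, which is why tagging it as sRGB would silently bend the vectors.

    typedef struct { float x, y, z; } vec3;

    /* Unpack a tangent-space normal stored as 8-bit RGB: [0,255] -> [-1,1].
       No sRGB decode must be applied to these channels. */
    static vec3 unpack_normal(unsigned char r, unsigned char g, unsigned char b)
    {
        vec3 n = { r / 255.0f * 2.0f - 1.0f,
                   g / 255.0f * 2.0f - 1.0f,
                   b / 255.0f * 2.0f - 1.0f };
        return n;
    }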


Normal maps, certainly. I was more confused by the parent's claim that this applied to "pretty much all modern 3D rendering".

With video game graphics I'm surprised that PC games aren't handled differently. To the extent that TVs aren't garbage (huge assumption) traditional SDR TVs mostly target sRGB. But PC monitors can be almost anything, there are a ton of wide gamut (e.g. P3) screens out there, and saturated games can be retina-searing on them.


It'd probably be reasonable to say the output of a game's 3D renderer is sRGB or something, and to call the eye-searing colors on your P3 monitor a bug.

But I think kaetemi's comment was talking about the intermediate calculations, which don't usually use sRGB. The "linear" color space used for those doesn't need to have a rigorous definition; even if there's no mathematically correct way to map a video-game-linear color to an sRGB color, the game company just hand-tweaks it until the game looks good.


The output is either linear RGB in modern pipelines or unspecified probably-sRGB for older pipelines (which usually have wrong lighting math).

It's the driver that does the linear RGB to display space conversion when the engine swaps the render buffer.

In the case of a non-specified probably-sRGB color space, I have no idea if it converts from sRGB to your display space, or if it just does nothing because it's not specified.


It sounds like gp's issue is only with games, i.e. everything else is rendering correctly. Which sounds to me like the video driver isn't applying the correct output profile (but the OS does so regular applications work).

Can you actually specify a color profile in the driver itself? I haven't had a color managed workflow in forever but I remember only ever being able to tell the OS.


Yeah, I believe the issue here is that games are not usually color managed because that would require tagging the rendered image with a space and then converting it, which would probably delay the render by at least a frame.


> Presumably they mean something like, sRGB but with linear gamma

Yes, it’s exactly that, because:

- almost all the time, the display is sRGB, so that’s your gamut;

- almost all the time, you want to do your lighting and compositing in a linear space.

- and sometimes your channels are normals or something, not colors at all, as others have noted.

So it’s linear in the sRGB 0..255 gamut.

It is a bit of a shame that most easy-to-use tools and workflows are limited to sRGB, so it’s really fiddly to support HDR displays and print.


> almost all the time, the display is sRGB, so that’s your gamut;

This is becoming less and less of a safe statement. On the one hand "yay HDR"!

On the other hand, we're in for many years of weird bugs. For example recently I've been working on an app that I want to look good in HDR and I tried to share screenshots. Everything looked good to me. People on the other side were complaining about weird colors. Several hours of investigation later I realized that Windows was switching color spaces in my screenshots. Pressing the print screen button gave me a subtly different result than what I was actually seeing and of course I didn't spot it when emailing off the screen shots.


I'm reading this on an OLED display with about 90% of the gamut of Rec.2020.

It's just a laptop, nothing special.

Restricting colour to sRGB is going to be the CGA/EGA in a VGA world of the future.


I hope you’re right!


My fav format is whatever is widely supported so I don't have to bash my head on the wall dealing with all kinds of conversions and other issues.

My least fav is when the creator of some system I have to deal with chose their superior fav obscure format that no one uses, and now I have to deal with it too.


Back in the (DOS) days, Targa/TGA was somewhat popular for asset usage. Simple format, plus RLE for a small size benefit.


Amazing how few people do some basic research - this isn't new or exciting at all. TGA is fine and has been around for decades.

It also has the added benefit that many applications still support it natively. I really don't get the appeal of inventing yet another custom file format that has so few objective benefits.


Unlike QOI, TGA only has RLE as a means of compression, and therefore has worse compression on many images than QOI. At the same time, TGA supports a much larger range of format variants (arbitrary color depths, color palettes, grayscale mode), making it much more complex to implement than QOI.


And what exactly stops a decoder from simply disregarding unnecessary format variants?

If you go as far as to implement your own decoder anyway, this argument is simply not applicable. Supporting only a subset of the format is perfectly acceptable - see TIFF, for example; not every application supports every possible feature.


You still need extra code to check that the format variant matches your expectations. I guess you could go very bare-bones and just assume that it’s in the right format and only check for buffer overflows, at the cost that errors will be harder to diagnose when the format is wrong. And you’d still have the overhead of the larger metadata structures of TGA.

The fact is, none of the existing formats is a perfect fit for the embedded use-case here. It makes sense to introduce a new, simple and straightforward format if it addresses certain typical use-cases better than existing formats do.
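
To give a feel for what that checking looks like (a sketch of the fixed 18-byte TGA header written from memory; verify field order against the spec), even the "ignore everything else" path still has to parse and validate this much before touching pixel data:

    #include <stdint.h>

    /* The fixed 18-byte TGA header (little-endian on disk, no padding here). */
    typedef struct {
        uint8_t  id_length;
        uint8_t  colormap_type;
        uint8_t  image_type;      /* 2 = uncompressed truecolor, 10 = RLE, ... */
        uint8_t  colormap_spec[5];
        uint16_t x_origin, y_origin;
        uint16_t width, height;
        uint8_t  bits_per_pixel;
        uint8_t  descriptor;      /* alpha depth + origin bits */
    } tga_header;

    /* Accept only the single variant we actually ship: 32-bit uncompressed. */
    static int tga_variant_ok(const tga_header *h)
    {
        return h->colormap_type == 0 && h->image_type == 2 &&
               h->bits_per_pixel == 32 && h->id_length == 0;
    }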


It's a cute format. Better than most RLE images, if I remember correctly, getting you a good way towards PNG.

So might not be best for every situation, but why not have that in your toolkit? Especially if being able to understand all of your stack is important to you.


> With error checking removed, my decoder is ~600 bytes of x86 object code — a great story for embedding alongside assets.

Soon in an IoT device near you. Ready to be exploited.


Is RGBA order really less common than ARGB?


Depends on your platform, there are some platforms where the compositor/hardware demand ARGB and some where they demand RGBA. As a result if your backbuffer/etc are not in that format you eat some overhead.

This isn't a big justification for it in a texture format, though. A channel swap isn't that expensive to do at load time and the driver can often do it for you during the upload depending on the API you're using. If you really want to load blazing fast your textures should be pre-compressed in hardware formats that you can mmap in, not stored in QOI or PNG.


Without sacrificing much simplicity, the channel swap can be made even more inexpensive with PSHUFB on x86 or VTBL on aarch64.
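
For example, on x86 with SSSE3 (a sketch; the NEON table-lookup version is analogous), a single shuffle swizzles four RGBA pixels to BGRA at a time:

    #include <stddef.h>
    #include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 (PSHUFB) */

    /* Swap R and B in an RGBA8 buffer, 4 pixels (16 bytes) per iteration.
       Assumes n_pixels is a multiple of 4 to keep the sketch short. */
    static void rgba_to_bgra(unsigned char *px, size_t n_pixels)
    {
        const __m128i mask = _mm_setr_epi8(2, 1, 0, 3,   6, 5, 4, 7,
                                           10, 9, 8, 11, 14, 13, 12, 15);
        for (size_t i = 0; i < n_pixels * 4; i += 16) {
            __m128i v = _mm_loadu_si128((__m128i *)(px + i));
            _mm_storeu_si128((__m128i *)(px + i), _mm_shuffle_epi8(v, mask));
        }
    }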


Sounds like a great way to lose your files in 10 years unless you re-implement the decoder after wading through link rot.


The decoder is so small that one could append its source code to most images and not make them much bigger in the process.


But what if the decoder rots?


You mean standalone C99 code becomes non-compilable? Absurd.


The author notes his decoder is 100 lines of libc-free C code: these files are safe!


Why wouldn't they just use TGA? Just as simple, more flexible if need be and understood even today by most image manipulation programs and 3d modelling software.


I spend some free time reverse-engineering obscure game formats, and my observation is that this is universally true of everything.


It’s amazing how unaware people are of lossless WebP. Compared to PNG, it decodes faster, compresses better, with a minor code size increase. When it comes to games though, BC7 all the way …



