Regarding the comments about the complexity of existing formats - I agree, with the exception of PNG. In terms of complexity, QOI is very similar to PNG's filtering phase, i.e. PNG with the main compression stage (DEFLATE, i.e. LZ77 plus Huffman coding) stripped out.
Also, I admire how the author combined a lot of common and cool tricks into QOI:
- Block tags form a sort of a prefix code, making it easy to differentiate block types with differing sizes
- Look-back array is used (common trick from the LZ family)
- Delta-coding is used when storing pixel diffs
- In the end, if nothing else works, full pixel value is written
Also, the fact that everything is byte-aligned is much appreciated.
This is a breath of fresh air, and it's really impressive what performance (both processing speed and size reduction) it gets with so little (very readable!) C.
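A rough sketch of how a decoder can dispatch on such variable-length prefix tags by testing bit masks (only the 3-bit b110 / QOI_DIFF16 tag mentioned further down this thread is taken from the format; the other tag values and names here are purely illustrative):

    /* Illustrative prefix-tag dispatch; only the b110 tag is from the thread below. */
    #include <stdint.h>

    enum chunk_kind { CHUNK_INDEX, CHUNK_DIFF16, CHUNK_FULL_PIXEL, CHUNK_OTHER };

    static enum chunk_kind classify(uint8_t first_byte) {
        if ((first_byte & 0xC0) == 0x00) return CHUNK_INDEX;      /* 2-bit tag 00: look-back index */
        if ((first_byte & 0xE0) == 0xC0) return CHUNK_DIFF16;     /* 3-bit tag 110: small delta    */
        if ((first_byte & 0xF0) == 0xF0) return CHUNK_FULL_PIXEL; /* 4-bit tag 1111: full pixel    */
        return CHUNK_OTHER;                                       /* remaining tags (runs, etc.)   */
    }

Because the tags form a prefix code, these checks are mutually exclusive no matter how many bits each tag uses, and every chunk still starts on a byte boundary.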
Traditional compression research usually prioritized compression ratio, so more images would fit on your floppy disks and would download faster over 28.8k modems. That usually means increased complexity.
PNG has its own story, as its #1 design goal was to avoid techniques that were patented at the time. So they combined a simple set of prepasses with zip/gzip compression. There were better techniques known at the time, but the idea was that combining simple, known, patent-free solutions was the safest way to achieve a free format.
The most important is probably the difference between lossy and lossless compression...
(The GP-mentioned Bink is lossy! Then he also for some reason uses a picture tending towards photorealistic, which does NOT play well with lossless!)
Yeah, there's a difference in techniques used for lossy vs lossless. But my point was that in compression research, runtime performance was much less important than compression performance; that was no different for lossy compression.
Speaking of runtime performance: if you look at LZMA, for instance, it's horrible on runtime performance, but it produces some of the best compression ratios of all of them.
lzma usually decompresses significantly faster than bzip2, which is what it mostly replaced in e.g. package distribution. It also decompresses fast enough that for slow enough connections download + decompression is faster than deflate. Compression is slow though, yes.
I know you don't like "design by committee" and overly complex formats. Also, the compression algorithm itself is somewhat independent of the "container format" that it's used with.
However, if I may make a suggestion: If you do end up rolling your own container format for any reason, always include a way to store the colour space of the image!
Treating RGB images as arrays of bytes without further tags is like treating char* as "text" without specifying the code page. It could be UTF8. Or Latin-1. Or something else. You don't know. Other people don't know. There's often no way to tell.
Similarly with colour spaces: sRGB, Display P3, and Adobe RGB are just close enough to be easily confused, but end up looking subtly wrong if mixed up.
I think the author's intending to use this only as a data packing format (ex. for app/game development) and not a general-purpose exchange format. If all you want to do is pack image/texture data for your game then I totally get why they assume RGBA8 storage and leave out metadata such as color space information.
Right - this thread of people asking for colorspaces, dpi data, arbitrary metadata collections… all are exhibiting precisely the ‘expert group’ committee mindset that this format is supposed to avoid.
If you want all that, there are formats galore that support it! The point is more: where are the formats for people who don’t need all that? Here is one. Please don’t try to ruin it.
You don't need to support a ton of complex stuff, but if an RGB image format doesn't, then it should just have a version marker and the spec can simply say "the RGB data represents sRGB colors".
Just like e.g. some compilers don’t support a lot of source file encodings but their non-support consists of clearly stating that a source file is UTF-8.
Note that this requirement is only for an image exchange format. A raw-er image format used internally in e.g. a game doesn't need it as badly.
But in an exchange format every user needs to know whether the image data is sRGB or Adobe. Because the reader didn’t encode the data.
Because this thing looks like it could fill the same need as ProRes while being simpler and open source. For the applications you are talking about, the format cannot be ruined, because it is an internal implementation detail, in the same way that the stb_* libraries can never be ruined. They are yours and everyone's forever.
The GP's advice about recording color space is spot on. Why design a new container when there are tons of off-the-shelf ones? A collection of files inside a ZIP is my go-to.
I know games tend to assume color spaces from context, and do other hacky things like reusing alpha for height maps or RGB for vector coordinates.
But this approach is a source of endless conflict with image processing tooling, because nobody except the game author knows what the channels mean. Game developers keep complaining about image processing tools doing anything other than bytes-in-bytes-out on that data (e.g. clearing "unused" alpha or treating RGB as human-perceptible colors), because it breaks the unstated assumptions.
I wonder if color profiles will be important to images used for AI. Right now, as far as I can tell, people generally ignore details like that but I wonder if someday people are going to say, "oh wait, we've been completely ignoring color profiles. We could have been using that" or "we didn't realize how ignoring that messed things up"
I'm in the computer vision space and have looked into the effect of color spaces and encodings.
People had basically exactly the epiphany you described. For example, there are several color models which try to be more lighting-invariant or (racially) color blind (this is a very similar problem, humans are all more or less "orange" hue-wise due to melanin's spectrum). TSL is one such space, there are others.
Quick question: from the context and how the Wikipedia article was written I assumed TSL was fairly new, but looking at the dates of the published papers it all seems to be late '90s / early 2000s. Is this still in use somehow? I've never heard of it before, even though I've dabbled a bit in image processing from time to time in my line of work.
Videogames don't need color spaces, but they have different compression formats on GPUs, like S3TC, BC7, or ASTC. These save not just storage but also VRAM bandwidth.
Videogames do need color spaces once you want to do any advanced enough lighting; it's just that the only ones you want are sRGB or linear (floating point), so it can often be assumed by context.
S3TC, BC7, and ASTC are all lossy compression formats, so they are really not comparable.
A lot of people don't want to play games that look bad - I'd assume this is going to be a particular issue when a game goes for photorealistic graphics, only to end up with people with orange skin colour on lots of monitors...
I work in a VFX adjacent space, so I have some pretty strong feelings about what data should be stored in the colorspace tags. The colorspace data should be an arbitrary string. I would also like to have "usage intent" tag for whether this is color or non-color data. You would think that the colorspace would tell you, but people do silly things like save specular maps as sRGB. For completeness I would want the following fields:
- colorspace - arbitrary string, the colorspace of the color data on disk.
- rgb_primaries - arbitrary string, indicates the physical meaning of the linear RGB values.
- transfer_function - arbitrary string, indicates the function to use to linearise the values from disk into the RGB primaries.
- intent - one of display, texture, data, or an arbitrary string. Display images get tone-mapped, textures do not, data only gets linearised, even if it has rgb_primaries attached, other values only have internal meaning to the concrete art pipeline.
For the first three, rgb_primaries and transfer_function take precedence over colorspace. The reason I want them is because people do mix and match. You could say that you can put the same data in some string format in the colorspace tag, but then everyone will invent their own slightly different way of specifying it and interoperability will suffer. Best to just force the values to be unstructured.
A lot of arbitrary strings, but that's how it is - there is really no standard list of colorspace names. These four fields can cover pretty much everything I have seen people do.
Though currently a lot of people just tag the filename with a colorspace suffix and call it a day. OpenColorIO even has a function to directly get a colorspace from a filename.
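For concreteness, a sketch of how those four tags could sit in a file as a plain struct (free-form NUL-terminated strings, field names mirroring the list above; the sizes and struct name are arbitrary):

    /* Sketch only: the four free-form tags proposed above. */
    struct color_metadata {
        char colorspace[64];        /* colorspace of the color data on disk, e.g. "ACEScg"   */
        char rgb_primaries[64];     /* physical meaning of the linear RGB values             */
        char transfer_function[64]; /* how to linearise the disk values into those primaries */
        char intent[64];            /* "display", "texture", "data", or pipeline-specific    */
    };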
I've never understood the need for color profiles on images themselves. On monitors and displays, sure, those vary, but why allow people to specify a color profile on an image, thus making images no longer a ground truth for pixel value? It seems like the main use case is dealing with images that already have color profiles. What is the point?
The point is to use bits effectively and not waste them, by accounting for what usually happens on most displays and in our eyes & brains. If we don’t account for it, then (for example) a 1-bit difference between two bright colors means something very different from a 1-bit difference between two dark colors.
So what is “ground truth” exactly? We need to know what units we’re talking about; is it lumens, or “tristimulus” values (what we want our eyes to see), or LCD monitor voltages, or something else? First we have to agree on what ground truth means.
A big reason for common formats like PNG and JPEG to need color profiles is because 8 bits per channel is not enough color resolution unless you store the colors non-linearly. 8 bits is barely enough (or some would say almost enough) to represent low-dynamic-range images on TVs and monitors even when you have the colors gamma encoded or use a perceptual color space.
The problem with storing “ground truth” values for pixels in 8-bits-per-channel formats is that many experts would probably say this implies a linear color space. And a linear color space encoded in 8 bits leaves noticeable gaps between colors, such that you can see color banding if you try to render a smooth gradient. You lose up to 2 bits of color precision in some regions, notably the dark colors.
So, in order to not have color banding, we correct the colors before storing them by trying to make the colors have even steps between any 1-bit change in the spectrum. The color profile is just a record of what was done to correct the colors, so someone else can un-correct them as needed.
You might think we could do away with color profiles for high-dynamic-range images, but the non-linearity of our vision and our displays means that linear encodings are still inefficient with their bits even if you have 32 bits per channel.
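To put rough numbers on the shadow-precision point, here is a small sketch using the standard sRGB transfer function (IEC 61966-2-1); it just counts how many of the 256 8-bit codes land in the darkest 10% of linear light under a linear encoding versus the sRGB encoding:

    #include <math.h>
    #include <stdio.h>

    /* sRGB decode: non-linear code value in [0,1] -> linear light in [0,1]. */
    static double srgb_decode(double v) {
        return v <= 0.04045 ? v / 12.92 : pow((v + 0.055) / 1.055, 2.4);
    }

    int main(void) {
        int linear_codes = 0, srgb_codes = 0;
        for (int c = 0; c < 256; c++) {
            if (c / 255.0 <= 0.1)              linear_codes++; /* code stored as linear light */
            if (srgb_decode(c / 255.0) <= 0.1) srgb_codes++;   /* code stored gamma-encoded   */
        }
        /* Roughly 26 vs ~90 codes cover the darkest 10% of linear light, i.e. the
           non-linear encoding spends over 3x more of its 256 steps where banding
           is most visible to us. */
        printf("linear: %d codes, sRGB: %d codes at or below 10%% linear light\n",
               linear_codes, srgb_codes);
    }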
Right, which raises the question of why on earth there would be more than one image color profile. The image color profile should just be implied and the same everywhere.
> We're stuck with images stored in the display profile of old CRTs by default, because that was the most practical option at the time.
The analogue in photography is photographs lose color clarity and fade as they age. Why should I care if this is the case in images as display technology evolves when this is already a problem with every physical medium?
> For correct editing operations, and correct display. Sadly, it's not pedantry, it's reality.
I've been designing images and graphics for the web and sometimes for print off and on as a regular part of my various job functions since ~2009 and I have yet to see a situation where a color profile on an image isn't a huge pain and something that needs to be stripped away to get seemingly correct results.
Back to my point, I still don't see the value of having a color profile applied to an image. Images exist in the ether as platonic ideals, and we try to approximate them with our various display technologies. Why complicate the matter by also having multiple "flavors" of the platonic ideal itself? When I say (255, 0, 0), I expect the display to show me its best approximation of red. When I say (255, 255, 255), I expect the display to show me something as close to white as possible (and at the brightest possible setting). When I say (0, 0, 0), I expect the display to show me something that looks as close to black as possible. It's up to the display technology to decide whether this means just turn off that pixel on the screen or disengage the backlight or do whatever, but at the end of the day it's just trying to approximate black.
This is complicated enough; why do I need an image that will look good on only one kind of display and will have its greens look pink on other displays? Isn't having a color profile for the display enough?
> When I say (255, 0, 0), I expect the display to show me its best approximation of red.
Which red? There isn't a single, true "red". And different materials produce different reds (my work's Sony BVM-X300's reddest red is going to look way different than your monitor's red). Not all displays use only RGB color primaries, too. For example, at Dolby's office in San Francisco they have a projector in their theater that uses 5 (IIRC) color primaries, not 3. 6-primary projectors exist. And other non-RGB displays.
> When I say (255, 255, 255), I expect the display to show me something as close to white as possible
Which white? D50? D55? D60? D65? D67? Something else? And yes, these different white points (and many others) are actually used in practice.
> (and at the brightest possible setting).
100 nits looks way, way different than 4,000 nits. Some monitors can do 10,000 nits.
> When I say (0, 0, 0), I expect the display to show me something that looks as close to black as possible. It's up to the display technology to decide whether this means just turn off that pixel on the screen or disengage the backlight or do whatever, but at the end of the day it's just trying to approximate black.
Which black? This might sound dumb, because we can agree that there is an objective "absolute black" (i.e. zero photons). But when an artist creates an image, the monitor they use has some version of black. If you don't account for that, the image may be distorted. Blacks can be crushed, for example.
An absolute color space exists. It's called XYZ. We could use it. Some image formats support it.
And the red in my left eye isn't the same as the red in my right eye. Yes, when the lighting conditions are just right, I see different hues out of each eye [1]. I have to wonder how one would (or should?) correct for that in an image file format.
I think the best we can do with file-level metadata is account for only objective metrics. Subjective differences are best handled outside of the file metadata. The user can tune their display settings to match their preferences. This allows for correcting for physiological differences and stylistic preferences. If all files had correct metadata (for objective metrics), it would be a lot easier for an end user to tune the system to their liking because their end results would be consistent with their preferences.
It took me a while to notice that I perceive color and light intensity differently in my two eyes. I think this is actually pretty natural (i.e., it happens commonly?). Either way, I can also see polarization (Haidinger's brush), which confused me a bunch when I was trying to explain what I saw and everybody else thought I was crazy.
Okay, fair enough, two out of three primaries are imaginary colors, but nobody said you have to use the whole color space when your processing pipeline is meant to be using 32-bit floats. Delivery formats might want to use a more real color space.
ACES (AP0) is an archive format for when you want to be absolutely sure you’re not clipping any colours. As a working space it’s terrible and AP1 should be preferred, either as ACEScg or ACEScc
AP1 is almost the same as rec.2020? I'm not sure what you mean by 'working space' but if preserving all colors was a primary or even secondary goal I definitely wouldn't cut myself down that far.
You're assigning a color profile to the image in your model - the color profile of the display. Hence no color conversion needed, as source and target match.
What "red" and "green" are has changed quite dramatically with different display technologies. A display designed to meet Rec.2020 can show colors that other displays literally cannot produce and the deviation between the primaries is so big that everything looks like garbage if you don't do a color space conversion. Take some sRGB content and display it on a DCI P3 display. Looks like shit. Humans look like crabs.
> On monitors and displays, sure, those vary, but why allow people to specify a color profile on an image
The sole reason why we have device profiles defining device color spaces is so that we can convert from the image's color space to the device's color space. If images don't have a profile assigned to them, you don't need a device profile.
So you either have both image and device profile, or neither. Just one doesn't make sense.
But if you were forced to always use the same color profile on images? Why allow more than one? Wouldn't that vastly simplify the problem, as then you just have to worry about the device profile, instead of worrying about variation on both ends?
I get what you mean, but... the color profile of the image is the profile of the display where the image can be displayed without adjustment. Think of it as the display profile of the guy that sent you the image. The magic math transforms the image to your display profile. That means it will look exactly the same on both displays. If they both have a correct display profile.
If your green is going pink, then either your profiles are wrong, or your software is broken. Maybe it really is pink, and it just looks green for you, because you're ignoring color profiles.
But the fact is, most software is broken, and you should store images with the sRGB profile.
And also, you can actually calibrate consumer hardware, so that you can scan a photo, and reprint it. And the scan, display, and print will look exactly the same. (It's not the case by default, because consumer printers do what you do, stretch or fit the color space to the printer's. The Vivid and Natural profiles in the driver, respectively. This is a good default for documents, not for professional photography.)
Right, so sRGB should really just be the only allowed profile for images. That's my whole argument. There should be some standard profile all images should use, and then displays should deal with the work of converting things to a suitable display profile. Allowing more than one image-level profile just makes things way more complex for no perceivable benefit.
>That means it will look exactly the same on both displays. If they both have a correct display profile
We can continue believing that Santa exists, or we can accept that effectively nobody has correct color profiles, and doesn't care either.
It's nice metadata you have there, would be a shame if I applied night mode to it at 6PM.
> also, you can actually calibrate consumer hardware
...with professional hardware that costs more than the one you have at hand.
Again, pretty much everyone will just tweak their monitor/printer settings until they get results that look OK.
>display and print will look exactly the same
Until you turn off the lights. Or until you turn on the lights (what color temperature is your light?). Or until the wind blows, moving that cloud over the sun, changing the light temperature.
Or — wait for it — actually, that's all, while you've been waiting the sun has set.
The point being, all we can shoot for is FF0000 being what would be "red" for most, 00FF00 being "green", and 0000FF being "blue" — and then accept the "fade" of the digital image from one screen to another as a fact of life.
So, learn how to stop worrying and love unspecified RGB colorspaces. Magenta only exists in your brain anyway [1]
Side note: did you ever think about how your recorded music isn't adjusted for the frequency response of the loudspeakers, and half the people will listen to it with the "BASS MEGABOOST" feature on anyway?
That's why the art of mastering exists. It's accepting the reality, and putting work into making it work with uncalibrated, crappy hardware — as well as hi-fi gear.
PS: have fun calibrating your vision to make sure that you don't see green where I see red
As you mention, our brain adapts pretty easily to varying lighting conditions in the real world, and that could also work on a screen[1], but the ambient context is what matters: if you look at an image “A” in your said “undefined color space” after having spent quite some time looking at sRGB images, then your image “A” would look absolutely horrible until your brain starts to adapt, like when you put sunglasses on or take them off, for instance. The big difference being: with sunglasses, we have no choice but to wait for our brain to adapt, but on a computer all the user would do is close the image.
[1]: even though, for some reason I don't know, it works much less well: if you try to take pictures with the wrong white balance setting, the picture will look like shit no matter how long you look at it.
Thankfully, "as bright as the sun" is waaaaaaay outside of what a monitor is capable of: SDR is (supposed to) top out at 10^2 cd/m^2, HDR10 at 10^3, Dolby Vision maxes out at 10^4 cd/m^2, but the midday sun is... ~10^9!
sRGB is quite literally defined as "the average 90s CRT in an office environment", i.e. full-white sRGB on a HDR display should be around 100 nits or so in those reference conditions (i.e. display set to "reasonable" brightness).
Do you have a recent iPhone? Try taking a picture with the sun in it: the white of it will appear white as well, but the phone will display it considerably brighter than the white UI surrounding it.
> This is complicated enough, why do I need an image that will look good on only one kind of display and will have its greens look pink on other displays.
That is precisely what attaching a profile to the image is supposed to avoid. It indicates what (255, 0, 0) in that image actually means. Then you convert that to the display profile to get the triplet that you must send to the display to actually get something that looks like it.
> Isn't having a color profile for the display enough?
> The analogue in photography is photographs lose color clarity and fade as they age. Why should I care if this is the case in images as display technology evolves when this is already a problem with every physical medium?
What does "255" mean? If you just take it to mean "max saturation that your display can output" then it's going to be perceived as a different color depending on the viewer's display. That is undesirable. And due to historical reasons (the capabilities of CRT monitors) the meaning of "max saturation" was de facto standardized to what we now call sRGB. If you want to encode a greater range of colors than sRGB then you need to be able to say that hey, this "255" is more saturated than what sRGB considers "255" and should not be displayed on sRGB without conversion.
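As a sketch of what "giving 255 an absolute meaning" looks like in practice: decode the 8-bit sRGB value to linear light, then map it through the standard sRGB/D65 matrix into CIE XYZ. A display with a different gamut has to make the reverse trip through its own matrix to show the same color. (The constants are the IEC 61966-2-1 ones; the function names are mine.)

    #include <math.h>
    #include <stdio.h>

    static double srgb_decode(double v) {
        return v <= 0.04045 ? v / 12.92 : pow((v + 0.055) / 1.055, 2.4);
    }

    /* 8-bit sRGB triplet -> CIE XYZ (D65 white point). */
    static void srgb8_to_xyz(int r8, int g8, int b8, double xyz[3]) {
        double r = srgb_decode(r8 / 255.0);
        double g = srgb_decode(g8 / 255.0);
        double b = srgb_decode(b8 / 255.0);
        xyz[0] = 0.4124 * r + 0.3576 * g + 0.1805 * b; /* X */
        xyz[1] = 0.2126 * r + 0.7152 * g + 0.0722 * b; /* Y (luminance) */
        xyz[2] = 0.0193 * r + 0.1192 * g + 0.9505 * b; /* Z */
    }

    int main(void) {
        double xyz[3];
        srgb8_to_xyz(255, 0, 0, xyz); /* "sRGB red" pinned to one absolute point */
        printf("X=%.4f Y=%.4f Z=%.4f\n", xyz[0], xyz[1], xyz[2]);
    }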
Because display technology can't reproduce the full set of colors we see, and moreover, we have a finite number of bits to encode the value of any pixel. A color space/profile is both a target for display manufacturers to produce and a way of standardizing the trade-off between bit depth and quantization error.
Unfortunately, that's not the case in pragmatic reality.
When I say #ff0000, do I mean "sRGB red", as a TV would display in pure red phosphors back in the 1970s, or do I mean "1000K" / 700nm red (daylight is 6500K), which can only be reproduced by specialized displays?
Most people from the time before Apple made Display P3 wide color displays standard in all their products — so, most HN commenters — believe that #ff0000 just means "only red, no green or blue" — but all web browsers currently are left to invent their own answer for what #ff0000 means, and they do not all invent the same answer. Yet.
So it is with images.
In the beginning times, people made sixteen-color images for NTSC monitors, and then other people learned you could display synthetic colors that don't exist by mangling the image bitstream to the monitor, and those sixteen colors were hard-coded but varied by like +/-5% per device due to acceptable manufacturing tolerances, so they were internally consistent anyways.
And so we end up, today, trying to solve color profiles for file formats that specify colors as RGB hex values — which are, by definition, restricted to sRGB and thus wildly insufficient. But if we plan too far, we get file formats that are totally ridiculous - TIFF comes to mind - that can represent anything under the sun, at the cost of having a 1:1 mapping of "pixels" to "bytes" or worse.
You may also find the history of the EXR format relevant here, as it supports a functionally infinite amount of dynamic range in images, and there are definitely good niche use cases for it — but pragmatically, it's ridiculously overkill for normal everyday purposes. You could argue that everyone should use EXR in Lab, but then someone would suggest Oklab (since it makes better perceptual compromises) or HSLuv (what a great name), or you could try storing raw CIE 1931 chromaticity coordinates, which has been deprecated by CIE 170-2 to address serious flaws in the 1931 system.
Tying this back to engineering, there are often very good reasons for designing a processor or a byte store to be RISC or CISC, big or little endian, and we simply cannot predict what will work most pragmatically for all cases universally ahead of time, any more than the metric system was invented early enough in time to prevent miles per hour from being a unit of measure.
A color space and a color profile are not the same. The point is knowing what a byte represents in a byte array. Is it red, hue, saturation, alpha, brightness, etc.?
No single color space is complete. Each has trade-offs. I would actually say that color profiles help get closer to ground truth, that is, rendering intent.
Isn’t that kind of like saying the real numbers are complete? Yes, it’s true, but we don’t have an infinite level of precision for each colour. So we have to, practically speaking, limit how many colours can be specified to some reasonable (and arbitrary) number, e.g. 10 million or 1 billion or more (in terms of the number of colours in a colour space), and their representation in digital bits.

Also, since I conflated the two a bit: it is often helpful to restrict the max range of a colour so that you can get more useful data out of the same precision. It’s kind of like creating a dictionary to compress data: you can express more useful values with less data by choosing to display a subset of colours and by picking where the white point is for your display. These optimizations used to mean more than they do today, perhaps, but it can still take a lot of effort to transfer 10-bit or 12-bit raw pixel data at 4K resolution, 120 times a second…

And it’s worth pointing out that while 8-bit may be all the precision you need for perceptually lossless HDR (with dithering), that kind of range isn’t good enough for colour editing work. It’s like starting a photo edit from the JPEG rather than the RAW file.
8-bit just objectively isn't enough for HDR. 8 bits is few enough that you can pretty easily see banding on a simple color gradient in SDR. Dithering helps a lot, but once you get to HDR, 10 bits is pretty much the minimum.
If you compare Rec.2020 with 10 bits per channel and sRGB at 9.5 bits per channel, both of them using the same peak brightness, you should see similar amounts of banding on both.
Increasing the maximum brightness can require a lot of extra bits to keep things smooth, but that's a separate thing from which colors are possible.
Ah, I see your point, but I still don't understand this argument, because in practice not only do we want extra brightness in colours around the usual white points, but we also need the extra potential physical brightness to extend the colour space itself in terms of "colours only" (not sure that this even makes sense?), because of the differing needs of different colours.
(I'd assume that this is especially relevant for backlit displays like the liquid crystal ones, since they have to use the same potential maximum source power for all of the channels.)
Gamut is affected by brightness but not that much.
Rec.709 has 10 and 12 bit modes too, and it needs them, even though it has the same gamut as sRGB.
The reason 10 bits isn't enough for Rec.2020 is because 10 bits was never enough to fully represent any gamut. Once you add enough bits to properly support a gamut, then extending it to match human vision is a tiny cost.
And to be extra clear, when I initially said "The difference between all visible colors and a more constrained color space", I was talking about taking a color space and expanding its gamut without changing anything else about it.
I guess when I said "no color space is complete" I meant "no color space is faithfully bijective to nominative human visual sensory response." Because such a thing is basically impossible (or insanely nonlinear and unwieldy).
Also these color systems all or almost all assume a "norminal" (to steal a word from Everyday Astronaut, normal+nominal) average human vision system. If your system isn't norminal, average, human, or even visual (visible light), then your color space is an incomplete map of the territory.
So for any given array of pixel channel, there is either an implicit assumption or explicit color space of what those values "mean".
So GP's question was basically "why do we need color profile?" and the short answer is "to know what images should look like".
In order to know what an image should look like, a "perfect" or "complete" representation, at minimum the real colors must be injective to the color space (every value has at most 1 color). You could argue that the color space needn't be injective (values that have no color), which is probably fine if your image is static and you never manipulate it. As soon as you manipulate values of your image in your color space, you run the risk of falling outside the color domain.
But really to unpack this whole thread, my essential point is that color spaces are about engineering tradeoffs. sRGB can't reproduce many colors. CIE 1931 represents all color but has imaginary colors.
Ergo, I contend there is no "complete" color space.
Yes, it has to be injective from the real world to the file format. It doesn't have to be bijective.
> As soon as you manipulate values of your image in your color space, you run the risk of falling outside the color domain.
If your manipulation could go outside the domain, then your manipulation is incorrect. Like, if I have a number from 0 to 99 that I store in a byte, it's fine that a byte is not bijective to those numbers, and if I end up with a byte representing 105, then I did something wrong that corrupted my byte.
If I'm doing incorrect math, a bijective color space doesn't fix the problem, it just guarantees that I get an incorrect color out the other end instead of possibly getting an incorrect color or possibly getting an error value.
Good luck representing that with only 3 channels, without using "bytes representing 105" (aka imaginary colors), and trying to stay perceptually uniform!
It gets even worse: the real color space is non-Euclidean - it stretches out, parallel lines drift further apart, etc.
> without using "bytes representing 105" (aka imaginary colors)
I think you misunderstood my point.
I'm saying it doesn't matter if certain byte sequences are invalid and don't correlate to real colors.
Imagine you're encoding two digit numbers into bytes. It's not bijective but that's fine, no correct two-digit math will give you an invalid byte.
Or imagine you're encoding numbers into bytes the way UTF-8 works. It's not bijective to all sequences of bytes but that's fine. No correct manipulation will give you a sequence that can't decode.
If you're trying to do incorrect math, you've already lost, and no color space will help. For example, overflowing your red channel in sRGB could turn your eggshell into a cyan.
> and trying to stay perceptually uniform
Where did that come from?
Zero encodings will be perceptually uniform, therefore being non-uniform isn't a disqualifier.
If you author an image on Display A and save its colours, then how does Display B know to render colours correctly that looks consistently with Display A?
You have two options:
1.) Tag Display A's colour space into the image, then convert colours to match Display B's colour space when rendering;
2.) Convert colour data from Display A's colour space into a generic colour space (like sRGB, Display P3, ACEScg, etc.) and embed that into the image, then convert colours to match Display B's colour space when rendering.
Since we are talking container formats, I would recommend something in the IFF/RIFF family (which includes "true" Four-byte-type IFFs like RIFF, WAV, WEBP, and variants like PNG, MOV, MP4). They are about as simple as you can get as a container format. It has its limitations but it's basically the simplest "boxed" format you can implement - Type-Length-Value.
I would absolutely recommend basing on some sort of container format (vs "raw" data structures), it makes it really easy to add functionality without breaking backwards compatibility.
Basically all container formats follow this idea of "collection of boxes" pattern. The IFF series are box-length based. Ogg uses capture patterns but no explicit box size (I think). The former means you have to know the box size ahead of time (or seek back and write it in), the latter is better for streaming, but harder to seek (without indexes).
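A minimal sketch of that Type-Length-Value idea, loosely RIFF-shaped (4-byte type, little-endian 4-byte length, payload); the chunk names here are made up for illustration and not from any existing spec:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Write one chunk: 4-byte type, 4-byte little-endian length, then the payload. */
    static void write_chunk(FILE *f, const char type[4], const void *data, uint32_t len) {
        uint8_t hdr[8];
        memcpy(hdr, type, 4);
        hdr[4] = len & 0xFF;         hdr[5] = (len >> 8) & 0xFF;
        hdr[6] = (len >> 16) & 0xFF; hdr[7] = (len >> 24) & 0xFF;
        fwrite(hdr, 1, 8, f);
        fwrite(data, 1, len, f);
    }

    int main(void) {
        FILE *f = fopen("image.bin", "wb");
        if (!f) return 1;
        const char *cs = "sRGB";
        uint8_t pixels[16] = {0};                          /* stand-in for compressed data  */
        write_chunk(f, "COLR", cs, (uint32_t)strlen(cs));  /* hypothetical metadata chunk   */
        write_chunk(f, "PIXD", pixels, sizeof pixels);     /* hypothetical image data chunk */
        fclose(f);
    }

A reader that hits a chunk type it doesn't recognise just seeks forward by the stored length, which is what makes adding functionality backwards-compatible.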
I don't think that's a good idea. Most images, like photos and screenshots, don't have physical dimensions. Some image tools like to put in bogus default DPI values that are guaranteed to be wrong, and it can be a bit of pain.
Reproducing images at the "right" size is a very niche requirement. Most images don't even have a right size, including all photographs, unless they either use an orthographic projection (they won't) or the object is completely flat and parallel to the focus plane (with rectilinear projection, which most normal camera lenses come close to). Having this as a standard attribute in a generic image format makes no sense.
You are complaining about a format - with the stated goal to be very simple - not supporting some scientific usecase. What...? Use a different format then. That's not what this is trying to be. In fact, it's the exact opposite of what it's trying to be.
If you want any interop you'd better standardize some keys, while allowing for extensions. Store those in key/length/data records and you have IFF/RIFF formats... used by the .avi and .wav formats.
This is something I've never understood. DPI only makes sense if you need to accurately represent some physical dimensions in an image. Most of the time, you do not. You would just map it onto screen pixels somehow, most probably dictated by the UI layout around it.
Color profiles however, yes, those are indeed important.
Well, nobody can control what will happen to a file's content either, if you want to be pedantic.
If you are worried about dumb users renaming it... I think most users know not to change things after a '.', so you could just put the colorspace code there. And then you could put another '.QOI' at the end of the filename.
If you want to keep cruft out of the file itself, then putting it in the filename works.
You're talking apples and kumquats. Changing the name of the file (metadata in and of itself) does not affect the file at all. Changing the contents of the file will have effects on the file. Some changes might not negatively affect the file (like adding metadata when none was present) if the format allows for it. You can know the file is no longer the same by running a hash on it. Name changes will not affect the hash.
Also, changing the name of a file is not, like, uncommon. If it is an image sequence, some people prefer an underscore before the frame number, some prefer a dot, and some even prefer a space. Some places have strict file naming policies, and file names will be modified to comply.
Anyone depending on crucial information being preserved in a filename is engaged in a Sisyphean effort of control.
The color space is not "crucial information". It is just metadata. It could be contained in the file, or in the filename, or outside the file in a separate metadata file.
> To keep things O(n) when encoding, there's only one lookup into this array. The lookup position is determined by a “hash” of the rgba value (really just r^g^b^a).
Is a bit imprecise.
The algorithm would still be O(n) even with a linear search through the "seen pixel array", as that array is bounded to 64 entries and is therefore a constant factor that gets eaten by the Big-O notation.
Because of this hash, the algorithm seems to do something other than what was first described, though.
Instead of "a running array of the 64 pixels it previously encountered", it's actually the previous 64 pixel _values_ encountered, which allows for a much bigger jumpback.
Because this array is computed deterministically during execution, it acts as a kind of compressed jumpback table, allowing approximate arbitrary jumpback with just 6 bits.
Quite impressive!
Edit:
I think there might be an interesting tweak that could increase the effectiveness of the hashing technique used.
If one were to walk and store the pixels in Z-curve order, the runtime would stay the same, but the locality of the color value pool might be increased.
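For reference, a sketch of the single-lookup table as described in the quoted text, assuming the (r^g^b^a) % 64 hash; the struct and function names are mine, not the library's:

    #include <stdint.h>
    #include <string.h>

    typedef struct { uint8_t r, g, b, a; } px_t;

    static px_t seen[64]; /* the "seen pixel values" table */

    /* Returns the slot index on a hit (encoder can emit a 6-bit index chunk),
       or -1 after storing the pixel (fall through to diff / full pixel). */
    static int lookup_or_store(px_t p) {
        int slot = (p.r ^ p.g ^ p.b ^ p.a) % 64;
        if (memcmp(&seen[slot], &p, sizeof p) == 0)
            return slot;   /* hit: pixel value already in the table           */
        seen[slot] = p;    /* miss: overwrite whatever hashed here previously */
        return -1;
    }

One hash and one comparison per pixel - which is why the whole thing stays O(n) regardless of how far back the matching value was first seen.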
> Because of this hash, the algorithm seems to do something other than what was first described, though. Instead of "a running array of the 64 pixels it previously encountered", it's actually the previous 64 pixel _values_ encountered, which allows for a much bigger jumpback.
It's somewhere in between. Because these values are hashed into single slots, assuming that hash codes are random, the probability that a value is still in the table after k intervening values is pow(63/64, k).
That means the table is unlikely to contain all 64 values, but it will also retain some older values. Since older values are less likely to be useful in the future the encoding performance is likely somewhat worse than if the algorithm used strictly the last 64 distinct values.
This means that the algorithm is a bit sensitive to hash collisions, which seem particularly likely for primary colors (i.e. with each of the components at value 0 or 255):
- Slot 0: black, ALL SHADES OF magenta (x ^ 0 ^ x = 0), yellow (etc.) and cyan.
- Slot 15: white (255 ^ 255 ^ 255 = 255), red (255 ^ 0 ^ 0 = 255), green (etc.), blue.
Except for black and white, these are unlikely to occur often in photographic images but could well occur in bitmaps with a handcrafted palette, or when the RGB values are converted from a system with a lower color depth. For example, the 16-color CGA palette maps to just 3 different slots.
In an image with very limited colors you would have long runs of the same color and it would compress well from just the run-length encoding alone.
Also, it's non-obvious that older values will be less useful. Going through the pixels in order like this doesn't preserve locality well, so older values may actually be closer to a given pixel than more recent ones.
I was about to post the same idea about a locality-preserving pixel order. I was thinking about Hilbert curves; I'd never heard of Z-curves. It would be nice to see if/how that affects the compression ratios. How surprising that a few simple principles can give quite a good image format.
After making the edit, I saw that someone made the same suggestion on twitter!
Hilbert curves have slightly better locality than Z-curves, but Z-curves are trivial to produce.
If you have an N-dimensional space over unsigned integers, then you can place each point on a one-dimensional Z-curve by interleaving their bits into a new integer.
So for any point (X, Y) in the X-Y coordinates of an image, the Z-curve coordinate is X_1,Y_1,X_2,Y_2...
This can be achieved with a bunch of bit shifts and masks (or a bunch of wires if you happen to use it in an FPGA ^^')
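A sketch of that shift-and-mask interleave for the 2D case (the usual Morton-code bit twiddling, assuming up to 16-bit x/y coordinates):

    #include <stdint.h>

    /* Spread the 16 bits of v so there is a zero bit between each of them. */
    static uint32_t spread_bits16(uint32_t v) {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    /* Z-curve (Morton) index of the pixel at (x, y). */
    static uint32_t z_order(uint16_t x, uint16_t y) {
        return spread_bits16(x) | (spread_bits16(y) << 1);
    }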
I added Z-order curve to a ray tracer many years ago. Rather than using nested loops over X and Y pixel coordinates I used a single loop and de-interleaved the count to get X and Y. My thought was that this would increase the cache hit rate due to the locality of the pixels. It worked and the speedup was on the order of 10 percent. Because images are rarely power of 2 or even the same in both dimensions I opted to make a list of 32x32 blocks and rendered the blocks in Z-order as well as the pixels within (which still helped vs just tiling).
Hilbert curves are not that much more complex. Maybe ten or so lines of code. Interleaving bits is already a pretty complicated operation if you don't have builtins for it.
I agree that de-interleaving is fairly complicated, but interleaving can be done fairly easily with a lookup from a table for each dimension and then combining them: xbits[x] | ybits[y] | channelbits[c].
The nice thing about this is that just by changing the tables, you can get things like z-curve, tiles, row-major, or column-major pixel orders (or even combinations), plus swizzled, interleaved, or planar layouts for the color channels.
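A sketch of that table-driven variant for the 2D case (256-entry tables, i.e. a 256x256 tile, just to keep it short; real code would size them to the image):

    #include <stdint.h>

    static uint32_t xbits[256], ybits[256];

    /* xbits[i] holds the bits of i spread to even positions, ybits[i] to odd ones. */
    static void init_tables(void) {
        for (uint32_t i = 0; i < 256; i++) {
            uint32_t s = 0;
            for (int b = 0; b < 8; b++)
                s |= ((i >> b) & 1u) << (2 * b);
            xbits[i] = s;
            ybits[i] = s << 1;
        }
    }

    /* Z-order index by lookup; refill the tables differently and the same line
       gives you row-major, tiled, or other traversal orders. */
    static uint32_t z_index(uint8_t x, uint8_t y) {
        return xbits[x] | ybits[y];
    }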
Based on some quick* testing using z-order instead of going row by row ends up taking 1-10% less space. Most images were on the lower end but I didn't run into any taking more space.
* Swapping to z-order requires about 5 lines of changes if you ignore all the little edge cases like non power-of-two image sizes. Images used in testing may or may not be representative of anything.
I wonder how much worse the compression would be if you just had the last 64 colors, but didn't update when using the run length byte codes. Seems like it would be close to the same, and the decoder wouldn't need to hash the pixels.
Out of curiosity, I modified the benchmark program to simply toss the pixel data into lz4/lz4hc for comparison (patch file: https://paste.debian.net/1220663/).
I'm actually quite impressed how the resulting size is a little bit smaller than lz4hc (actually not even that far away from libpng), while the encoding speed is quite close to lz4, despite seemingly not having as many hacks and tweaks under the hood to get there. So the encoder is IMO doing amazingly well actually.
However, the decoding speed of liblz4 is off by roughly an order of magnitude. But again, liblz4 probably benefits from decades of tweaking & tuning to get there. It would be interesting how much performance could be gotten out of the qoi decoder.
Here are the totals I get for the "textures" benchmark samples:
I believe (from quick code inspection) that the symmetry in encode/decode performance for QOI is because it has to generate the hash-table on decode too.
Normal fast encoders use some kind of hash table or search to find matches, but encode the offset of the match in the source rather than in the table. QOI is encoding the offset into the table, which gives much shorter offsets but means the decoder has to maintain the table too.
(The slow PPM compressors etc do maintain the same table in both encoder and decoder, and have symmetrical encode/decode performance too. See http://www.mattmahoney.net/dc/text.html)
It's not really in the spirit of QOI to add the complexity, but I'd imagine allowing the encoder to specify how big the hash table is would be only a small tweak, and encoding literal blocks instead of going pixel-by-pixel would better handle input that QOI can't compress.
I would be curious to know if planar helps or hinders. Perhaps even see what QOI makes of YUV etc. And I want to see 'heat maps' of encoded images, showing how cheap and expensive parts of them are, and which block type gets used. Yeah, going a bit beyond the spirit of QOI :D
And from looking at the code I'm a bit confused about how 3-channel input is handled for literals, because the alpha still looks like it gets encoded. I'd have to walk through the code to understand that a bit better. Is it going to deref off the end of the source RGB image? Etc.
Like the author said this is completely unoptimized. The natural next step in optimization might be to profile and then SIMD optimize the slow bits in compression and decompression. This would likely produce a significant speedup and may even bridge the gap with lz4.
The algorithm is extremely resistant to SIMD optimizations.
Every pixel uses a different encoding, 95% of the encodings rely on the value of the previous pixel, or the accumulated state of all previously processed pixels. The number of bytes per pixel and pixels per byte swing wildly.
SIMD optimization would basically require redesigning it from scratch.
I believe libpng default zlib compression level is 6. libpng may use CRC (or adler) checksums, which account for around 30% of the decode time. In my experience with zlib, the higher compression levels have faster decode times. My reasoning at the time was (early '00s) that higher compression tends to code longer runs of copying of the dictionary. More time in the copy loop rather than the input and decode loop.
> liblz4 probably benefits from decades of tweaking & tuning to get there
Decade, singular. LZ4 appeared around 2011. But the real reason it's faster is that it's not doing nearly as much work as QOI. LZ4 is just about as asymmetric as you can get.
QOI_DIFF16 {
u8 tag : 3; // b110
u8 dr : 5; // 5-bit red channel difference: -15..16
u8 dg : 4; // 4-bit green channel difference: -7.. 8
u8 db : 4; // 4-bit blue channel difference: -7.. 8
}
I'd think that the extra bit ought to be given to green, since that tracks most closely with luminance. That would make this somewhat analogous to the RGB565 layout for 16-bit pixels (https://en.wikipedia.org/wiki/High_color#16-bit_high_color).
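A sketch of what that reshuffled chunk could look like: same 3-bit b110 tag and 16-bit size, but the spare bit handed to green (tag:3, dr:4, dg:5, db:4). This is the suggestion above, not what the format actually specifies, and the bias values are just one plausible choice:

    #include <stdint.h>

    /* Pack a DIFF16-style chunk with the extra bit on green, RGB565-style.
       dr, db in -8..7 (4 bits, bias +8); dg in -16..15 (5 bits, bias +16). */
    static uint16_t pack_diff16_green(int dr, int dg, int db) {
        return (uint16_t)((0x6u << 13)                        /* tag b110 in the top 3 bits */
                        | (((uint16_t)(dr + 8)  & 0x0F) << 9)
                        | (((uint16_t)(dg + 16) & 0x1F) << 4)
                        |  ((uint16_t)(db + 8)  & 0x0F));
    }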
Though I wouldn't be myself if I didn't have few nitpicks ;)
- You are mixing lossy and lossless compression in your rant. And you are mixing containers with formats. The thing about lossy compression is that it is, by design, stupidly slow and complex in order to produce the smallest possible result with the best perceived quality. Compress once and trade time for both image and compression quality. Lossless is not that bad, although your simple solution is, well, simple :)
- I feel like the benchmark suite is lacking. For a better overview you should probably include libpng results at the max compression level and at the lowest compression level. Lossless modes of AVIF and WebP would be nice. (Also, you could throw in a similar project to yours like https://github.com/catid/Zpng) Not saying the benchmark is bad, but IMHO it doesn't paint the full picture. From a quick test I got significantly better compression with libpng, of course at the expense of time, but you didn't configure libpng for speed either. So we have some results, but they are not really representative IMHO.
- I understand it is a pet project, but don't forget about the importance of image metadata, like colorspace and so on.
- Also keep in mind that compression is easy at the beginning, but the tricky part is how to squeeze out more.
I wrote a small test bench for my own uses, as I wanted to compare with different codecs and see how "raw compression" would do, how a QOI+ZSTD combination would work, and so on.
Really cool encoding; it is a bit incredible that such simple rules can achieve such good results.
It is so interesting that if the image has few colors, method #2 alone (an index into a previously seen pixel) encodes the whole image into an indexed colour space (with the hash table becoming the palette).
I've already seen some comments recommending adding some vertical locality by tiling, Hilbert order, or Z-order. I have yet another vertical locality suggestion to try:
While encoding/decoding, keep a ring buffer the size of one full line of the image, storing each encoded/decoded pixel there. Then add a variation of method #3 (the difference to the previous pixel) which instead encodes the difference to the pixel above the current one (the last pixel in the ring buffer). Maybe, due to the limited "tag" space, it would not be worth it to implement it for DIFF8, but only for DIFF16 (taking one bit from red for the additional tag bit) and DIFF24 (I'm not sure which bit I would steal there).
Edit: Also, when thinking about video encoding in the future, you may want to keep a ring buffer fitting the full last frame of video. And then temporal locality is probably even more important, so you may want to change the tags to encode more commands relating to the pixel from the previous frame.
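A sketch of the one-line ring buffer part of that idea (the chunk type it would feed is hypothetical, and the names here are made up):

    #include <stdint.h>

    typedef struct { uint8_t r, g, b, a; } px_t;

    typedef struct {
        px_t *prev_line; /* ring buffer exactly one image line long */
        int   width;
        int   x;         /* current column */
    } line_ctx;

    /* prev_line[x] was written exactly `width` pixels ago, i.e. it is the
       pixel directly above the current one - the candidate value for a
       "diff to pixel above" chunk. */
    static px_t pixel_above(const line_ctx *c) {
        return c->prev_line[c->x];
    }

    static void advance(line_ctx *c, px_t current) {
        c->prev_line[c->x] = current;  /* becomes "above" for the next row */
        c->x = (c->x + 1) % c->width;
    }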
Having done my own experiments with indexed palettes, I suspect method #2 is responsible for most of the compression here. Most photographs only have half a million distinct colors, and that drops considerably as you focus in on an area. If you're willing to relax the single-pass requirement, partitioning the whole image by palette, similarly to how one encodes high-color images into multi-frame GIFs, can yield surprisingly good compression.
Nice idea. I think it is somewhat more in line with the keep-it-simple option, as I, for example, have no idea what a Hilbert curve for an arbitrarily sized rectangle looks like. And you still get compression in the x and y directions (though no run-length encoding in the vertical; perhaps a further addition).
I quite like this approach of defining a few compression methods, seeing which one works best, and applying that one. I used a similar strategy for time series compression, where I would take a chunk of data, apply a few different compression algorithms to it (RLE, delta RLE, etc.) and just keep whichever one was smallest. Of course, here it is done on a per-pixel basis, whereas I used arbitrarily sized chunks. The latter is trivial to parallelize. The same could be applied here, though: split the image in half and compress both parts independently.
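A sketch of that "try a few, keep the smallest" strategy at the chunk level (the function names are placeholders, not real APIs):

    #include <stddef.h>

    /* Each candidate encoder writes into `out` and returns the encoded size. */
    typedef size_t (*encoder_fn)(const unsigned char *in, size_t len,
                                 unsigned char *out, size_t out_cap);

    /* Returns the index of the method that produced the smallest output.
       The caller re-runs the winner (or keeps one buffer per method) and
       prefixes the chunk with this index so the decoder knows what to undo. */
    static int pick_smallest(const unsigned char *in, size_t len,
                             unsigned char *out, size_t out_cap,
                             encoder_fn methods[], int n_methods) {
        size_t best_size = (size_t)-1;
        int best = -1;
        for (int i = 0; i < n_methods; i++) {
            size_t sz = methods[i](in, len, out, out_cap);
            if (sz < best_size) { best_size = sz; best = i; }
        }
        return best;
    }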
These kinds of formats have been somewhat explored by camera makers for their raw formats. Since the computing resources in cameras can be quite limited, there are a few formats that try to occupy this space of simplicity while still using some tricks for compression. Samsung's various SRW encodings remind me most of this:
Lossy RAW is sometimes even simpler than that. E.g. Nikon's old compression was just remapping values in a log-like fashion, which reduced resolution in the brightest few stops[1] to save a few bits per sample.
[1] Perception is pretty logarithmic as usual, so for a 14-bit readout, half the values represent the brightest stop and a little bit, the other half holds the other 12+ stops.
There are a few formats that use a curve and fewer bits. They do become lossy, and doing dithering on decompression is useful to avoid banding.
The Nikon one you mention was only used very early on and is decoded by decode_8bit_wtable() in that file. It just looks up the 8-bit value in the table and then adds some randomness to prevent banding.
This is impressive. I was curious whether it was just better on simpler images with many duplicate pixels, but it's nice to see it working well with photography too.
When I download that PNG file it is 10785 KB. If I re-encode it using FLIF 0.4 (to my knowledge still the best off-the-shelf lossless image compression) it is 7368 KB. That's 57% of the size of qoi, quite a lot smaller. You pay for it in encode/decode time though.
FLIF has been superseded[1] by JPEG XL, which, in my very limited testing, performed better (speed and compression-wise; libjxl) than FLIF.
Interesting though that FLIF 0.4 was released 3 days ago. I haven't checked out that new release (and won't), but the previous one wasn't great code and was a pain to use (ended up using its CLI instead of the lib). We ended up going back to PNG because of interoperability and speed. I haven't looked at libjxl's API yet.
Edit: 8605KB in 11.5s with libjxl, so it's not actually better than FLIF 0.4 (7368KB in 23.9s)
Interestingly, I actually managed to shave 400 KB off of that PNG just by running it through some PNG optimizers. Not great savings, but there were still some to be had!
It can be made smaller for sure, but if I'm, say, working with 10,000 images in my app and decoding all the time, a 3-4x speedup in decode time can be a major saving for a ~10-40% increase in file size, with a lib I can just drop into my project without effort.
I have an honest question about the source code. It's been a long time since I coded anything meaningful in C, but I remember that header files did not have that much code in them, while here I see most of the code is in the header file. Why is this the case? Inline compilation? What are the advantages?
There's an important difference between STB-style 'single file libraries', and C++ 'stdlib-style' header-only-libraries: STB-style libraries put the implementation code into a separate section inside an "#ifdef IMPLEMENTATION" block, while most C++ header-only-libraries use inline code (and thus pay a higher compilation cost each time this header is included).
STB-style libraries skip the implementation already in the preprocessor and compile the implementation only once for the whole project.
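A minimal sketch of that layout with placeholder names (not any real library):

    /* mylib.h */
    #ifndef MYLIB_H
    #define MYLIB_H

    int mylib_do_thing(int x);   /* declarations only: cheap to include everywhere */

    #endif /* MYLIB_H */

    /* Implementation section: compiled only in the ONE translation unit that
       defines MYLIB_IMPLEMENTATION before including this header. */
    #ifdef MYLIB_IMPLEMENTATION
    int mylib_do_thing(int x) {
        return x * 2;
    }
    #endif

Every other file just does #include "mylib.h"; exactly one .c file in the project adds #define MYLIB_IMPLEMENTATION first.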
So what's the compelling reason to ship these as a single source file, instead of using the more logical and standard solution of a single header and a single source file?
Unlike C++ header-only libraries, C libraries like this still need their implementation included exactly once.
I understand the benefits of keeping the implementation in a single source file (simplifies integrating in existing build systems, interprocedural optimizations, etc.) but combining the source and header files too makes them awkward to use, with little benefit beyond being able to claim "our implementation is only 1 file!" which is about as impressive as saying "our implementation is only 1 line!" if you've just deleted all the whitespace.
> So what's the compelling reason to ship these as a single source file, instead of using the more logical and standard solution of a single header and a single source file?
C doesn't have a packaging format or dependency manager, so having to keep track of two files would be pretty cumbersome.
In general, STB-style headers simplify C/C++ build system shenanigans.
E.g. one benefit is that you can configure the implementation via preprocessor defines without the defines leaking into the build system. Just add the config defines in front of the implementation include.
It's also trivial to write "extension headers" which need to directly peek into the (otherwise private) implementation of the base library. Just include the extension header implementation after the base library implementation.
> It's also trivial to write "extension headers" which need to directly peek into the (otherwise private) implementation of the base library. Just include the extension header implementation after the base library implementation.
I'm not sure what this means. Can you give an example? Is this being used in the above library?
> Why not two files, one a header and one an implementation? The difference between 10 files and 9 files is not a big deal, but the difference between 2 files and 1 file is a big deal. You don't need to zip or tar the files up, you don't have to remember to attach two files, etc.
This is kind of nonsense to me. The difference between 2 files and 1 file is barely relevant, and doesn't weigh against the awkwardness of having no clear distinction between declarations and definitions, and having to pull in definitions by setting a magic preprocessor header.
> Any better than the two-file version:
> #define LIB_KNOB 123
> #include "lib.c"
See, now you have 3 files, the .h/.c file pair of the library, and your own implementation source file with the configuration which includes a .c file. How is this better than having two files?
> Too vague. How is it better, concretely?
I can only assume from that question that you haven't had the "pleasure" of working much with C/C++ build systems yet, at least when it comes to integrating 3rd party libraries.
Since the implementation of the base library is in the same compilation unit as the extension library, the extension library has direct access to the private implementation details of the base library without having to add a separate 'private but actually public API'.
> no clear distinction between declarations and definitions
STB-style libraries have this clear distinction, C++ style header-only libraries don't.
> See, now you have 3 files, the .h/.c file pair of the library, and your own implementation source file with the configuration which includes a .c file.
The third source file is what the library authors suggested; I'd just create a separate Makefile target for the implementation, which is then linked into the binary. (Yes, that's an extra target, but that costs very little and allows you to avoid unnecessary rebuilds.)
Why don't you give me a concrete counterexample where the single-header-setup makes things simpler? If it's so obviously beneficial it shouldn't be so hard to come up with a concrete example.
And you've saved a line (and removed the need for obscure preprocessor macros).
> ... without having to add a separate 'private but actually public API'.
Personally, I'd say this is exactly the right way to model this, but even if you disagree: it sounds like this simplifies things mainly for the library/extension authors, and not the library users, because if the extension author forward-declares the implementation details they depend on, there is no additional file to manage for the users either.
Boost or C++ stdlib headers are different than STB-style single-file-libraries though in that the implementation code is inline and needs to be parsed each time the header is included.
The original C++ templates used to be entirely contained in the headers. I assume that is no longer the case, but it has been a long time since I wrote C++.
When I was writing template SDKs, the headers were pretty much the entirety of the library.
Single header libraries are hugely more ergonomic and easy to use. Just get the file, #include it and you're done! Most C libraries on GitHub prefer this mode of distribution now
>> Single header libraries are hugely more ergonomic and easy to use. Just get the file, #include it and you're done! Most C libraries on GitHub prefer this mode of distribution now
I find that odd. How is that significantly better than dropping 2 files into a project and #including the header? A good build system should just compile all the C or C++ files in a given place, so nothing needs to be changed other than dropping the files in and #include where called from.
The difference will be painfully clear once you try to walk a junior developer through installing and linking new files to a large project in Visual Studio, vs "copy this .h file here and include it".
Whether or not a "good build system" should handle it, the fact that single-file libraries are much preferred these days should demonstrate most people don't have such a build system
So much of the C/C++ ecosystem revolves around knowing your build tools that, for most intents and purposes, it should honestly be considered an integral part of the languages. If I ever teach an introductory C/C++ course, I would dedicate at least 1/4 of my time to explaining:
* What a translation unit is, and how compiling a translation unit to an object file works
* What a linker is, and how the linker combines object files into a library/executable
* What make/autotools/cmake are, and how they work
* What FHS is, and how it standardizes the lib/bin/include paths that C/C++ build tools read from/install to
And so on. Any C/C++ course that goes beyond a “Hello world” program without explaining these concepts in detail does its students a disservice.
Yeah, in our class we had to figure this out mostly on our own, so I'm still at the level where I barely manage to make make work. I'll have to look into those other ones, thanks!
but also ask "why are the tables of contents prefixing the contents of all my books?"
it's taken ages, but with this model, C lib authoring has finally adopted something that was a core strength of Pascal: single-file modularity.
though in theory there's no difference between theory and practice, in practice, there is. And this kind of packaging makes a lot of sense for a lib like this.
1) Include it wherever it's used. The convention is to use the preprocessor[1] so it only appears once in the final binary
2) Most developers just want to compile to a single binary. Any developer who for some reason needs separately compiled objects should be able to quickly achieve that from a single-file library
Maybe you've been living under the same rock as me. I was under a similar impression until I decided to learn cmake last week.
I was blown away by its simplicity and power.
You can download almost any C/C++ library from GitHub then add two lines to your cmake file. It will then compile and link with your project.
Blown away.
Cmake can generate project files for Visual Studio, makefiles, and countless other formats. Better yet, you can open a cmake folder directly in Visual Studio. No need to generate a project :)
This comes as no surprise to anybody, I'm sure, but cmake is great.
It was tricky to find good information about it though. It has been around forever, but the nicer features were added recently it seems.
>I can almost picture the meeting of the Moving Picture Experts Group where some random suit demanded there to be a way to indicate a video stream is copyrighted. And thus, the copyright bit flag made its way into the standard and successfully STOPPED MOVIE PIRACY BEFORE IT EVEN BEGAN.
I rather wonder if things like this are used to encode a proof of intent.
If something is marked as copyrighted, it may be trivial to remove the mark, but at least you had to remove the mark, and that's evidence of intent rather than ignorance.
I'm not arguing in favour of it, I'm hypothesizing why it might be added.
Your toolchain should set the bit last thing before distribution, I'd expect.
If you're talking about a toolchain on the consumption side after distribution, well, consumers shouldn't be modifying content, is what I'd expect the position to be. No derivative works and all that.
I was wondering the same with the COPYRIGHT note at the top of all files in a code repo.
Clearly someone who has the code can easily remove the COPYRIGHT header, but then it will demonstrate they actively read it and decided to remove it.
Plus if you add the copyright header in many places it might be forgotten in some.
He might be referring to the hundreds of pages (and very complicated implementations, usually of encoders) video formats have.
People think of technology as a straight line moving forward. But it's perhaps better to imagine like species in nature. It's just everything getting bigger. The smallest ant or bacteria evolves to better adapt to its niche, as the largest elephant or whale will develop adaptations (sometimes getting physically larger, depending on the environmental changes). The rule is that many scales have a role or niche. I think technology is similar.
What I like about this approach is that it requires zero differential calculus. People forgot that Radix2 DCTs come from a century's worth of advanced math in the S domain. This approach is applied-computer-science based, I can get my head around that a lot better than a Laplace/Z transform.
But what shocks me is this: how did absolutely no-one apply such a naive RLE in the 30 years of internet image formats?
Is this a case of "man reinvents wheel again because why do research" or is this exceptional?
RLE on runs of deltas is in very many compression schemes; it's a common technique (probably so common that many authors don't deem it worth mentioning). Lossless video codecs in particular (huffyuv, lagarith, utcodec, etc.) have had this going back 20 years at least.
we applied RLE encoding for time-series and triac control systems 43 years ago after getting some guidance from a mainframe guy - but RLE is so practical that we even use it to give directions to people on the street, so it's probably so old that it's even found its way into the way our genes are encoded. :D
Just to add to this list, even MacPaint used RLE on a per-scan line basis. I recently wrote about decoding MacPaint files[0]. So, replying to the gp post, yes it is in a sense reinventing the wheel. But sometimes the wheel needs to get reinvented because you have a new vehicle that could use a different type. And I would say the original poster felt he needed part of the modern wheel without the tire, so he had to go back to earlier wheel designs in a modern context. Okay, I've stretched the analogy too far!
This is pretty neat. A lot like the filtering done in PNG, just without slapping zlib on the result. As far as SIMD goes, I agree with the author and don't see much potential, as instructions are variable-length and most instructions encode deltas. It's probably quite close to optimal by just letting the CPU figure it out on the micro-level.
If each line were encoded independently they could be en- and decoded in parallel (once you add a skip instruction to tell where the next line starts, this does preclude streaming of intra-line output though). The hit to compression ratio should be small as long as images are wide enough for the color buffer reset to not matter.
I suspect SIMD would help with the encoding. The lookup table is small enough to fit into 8 AVX2 registers, so instead of hashing, you could use direct lookup, which would improve compression ratio further (a little bit).
SIMD registers aren’t indexable (at least not on AMD64); the register needs to be known to the compiler.
Lanes within each register aren’t indexable either. The insert and extract instructions encode the lane index in the instruction itself. There are workarounds for this, like abusing vpshufb or vpermd, but with that amount of overhead I doubt SIMD will deliver any profit at all.
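To illustrate the vpermd workaround (the function name here is made up; this assumes AVX2 is available):

    #include <immintrin.h>
    #include <stdint.h>

    /* Pick a 32-bit lane of an AVX2 register by a runtime index.
       vpermd takes the index from a register, so it sidesteps the
       "lane index must be an immediate" limitation -- at the cost of
       extra instructions, as noted above. */
    static uint32_t lane_lookup(__m256i table, int idx)
    {
        __m256i vidx = _mm256_set1_epi32(idx);                    /* broadcast index */
        __m256i sel  = _mm256_permutevar8x32_epi32(table, vidx);  /* vpermd          */
        return (uint32_t)_mm256_cvtsi256_si32(sel);               /* take lane 0     */
    }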
There are some clever tricks that can be pulled with the latest instructions sets like AVX-512. The registers are huge and the instructions available are so varied that there are clever ways to use them in "off label" ways to implement lookup tables and bit-level parsers.
Funny meta comment: when I read "offering a 20x-50x speedup in compression and 3x-4x speedup in decompression" I caught myself reacting with disappointment to the "only" 3x-4x speed up. This is funny because 3x improvement is enormous; it's just small compared to 30x. There's a lesson somewhere in there about sharing stats together to achieve an emotional effect.
One thing I did recognize after my last trip into the rabbit hole - JPEG can be the fastest format by orders of magnitude if you are using an appropriate encoder. The project mentions 20x-50x speedups over PNG which are commendable, but once you throw LibJpegTurbo into the mix (it does use SIMD), you will find a substantially different equation on your hands. I can do 1080p 4:4:4 JPEG encodes in ~3ms on my workstation today.
I am not saying you should always use jpeg (some people like to keep all their original pixels around), but if you want to go really really fast you should consider it. The latency is so low with this approach that I have been looking at the feasibility of building a streaming gaming platform around it. Current status of that effort is somewhere between "holy shit it actually works" and "maybe?"
I am in awe of how the post neatly interleaves the why and the how. Having recently implemented an LZ4 decompressor (itself a rather reasonable and "lightweight" standard), I like QOI even better. Hoping to see the video codec the author mentions as a possibility!
This code is nice for RGB, but what about the YUV420 format? There should be an even more efficient way to store images in YUV420 (it takes half the bytes of RGB888), and most images (JPEGs, direct camera output on Android, etc.) are already in YUV420.
I wonder if the author has tried YUV420 already and failed for any reason ...
This is pretty neat! The compression rates and encode/decode rates are impressive for such a simple rule set.
> SIMD acceleration for QOI would also be cool but (from my very limited knowledge about some SIMD instructions on ARM), the format doesn't seem to be well suited for it. Maybe someone with a bit more experience can shed some light?
I don’t know about SIMD encoding, but for decoding at least, the core problem is to make some unit of the image (pixels, scanlines, blocks, etc) independently addressable (readable). The current format is complicated due to pixels of different sizes; there’s no way to know where data for pixel 1000 starts without reading all preceding 999 pixels. You could build a prefix-sum array of pixel indices, but that will fully undermine your compression.
One question might be how well does QOI work on independent scan lines or blocks? Is the compression still pretty good if the look-back window is limited in size and restarts every once in a while?
To make a GPU capable format, the very first thing you would probably need to do is separate the tag bits from all the other data, so a thread could identify the pixel type by direct lookup into the tag buffer. It then needs to know where to go to find the remaining data, so you’d probably need at least one byte of data that could be used to reliably locate the rest of the data. I don’t see an easy way to make this happen for separate pixels, but it might not be bad if the image is encoded in blocks or scanlines.
Yes, totally. Depends on your use case, but there certainly could be good reasons to keep images in compressed form even in GPU RAM. It’d be ideal if decode is both random-access and a predictable amount of work per pixel. This format checks the predictable-work box, but would need a little design to make it random access.
I wonder if the author tried making pixel diffs capable of using lookback. They'd be one byte longer but would possibly happen a lot more often than full pixel values. It's also reminiscent of the delta blocks from MPEG. If anything I would trade the 5-bit run for delta-from-lookup on a whim (the cases where the ratio between short runs vs. exact lookback is extreme seems like it might only happen for pinstripe patterns). I should probably implement the change and test it before posting...
The hard part here is figuring out which pixel to diff against. You would almost certainly have to look at all 64 pixels, which would be a lot slower. That said, you can probably get noticeably better compression from it.
I think O(log(M)) where M is the number of remembered pixels. Index them by their (R,G,B,A) quad and do a range lookup sequentially by pixel, since any valid pixels would be within the same range as earlier potential matches.
Why not separate the format for RGB vs RGBA based on a header? Since images will never have both RGB and RGBA pixels, you would be able to get an extra few percent encoding by using different (and conflicting) tags for the two types of images. For example you could have QOI_COLOR_RGB which would use a 5 byte tag and QOI_DIFF24_RGB could store 7 bit differences for 2 colors and 6 bit differences for the other color. For RGBA, you could do something similar for the two tags that are RGB specific.
Or even based on a #define if one didn't want to check a bit flag every time you need to encode a full pixel. Would not be difficult to have two versions of the library -- one for RGB, and one for RGBA.
Very cool.
Typo: "I barely understand how Huffman Coding and DTC works"
I guess you mean DCT? (discrete cosine transform)
One other thought:
To make it so that compression and decompression can be multithreaded, you might want to 'reset' the stream every N rows. (i.e. break any RLE runs, start from a colour literal). This would allow a bunch of threads to start at different places in the image, in parallel. There would be some small cost to compression ratio but might be worth it.
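A hypothetical sketch of what such an addition could look like (none of this is in QOI; the type and field names are made up):

    #include <stdint.h>

    /* Side table mapping each stripe of N rows to the byte offset of its
       first chunk. Each stripe starts with a full colour literal and no
       run crosses a stripe boundary, so stripes decode independently. */
    typedef struct {
        uint32_t rows_per_stripe;   /* N                                 */
        uint32_t stripe_count;      /* ceil(height / N)                  */
        uint32_t stripe_offset[];   /* byte offset of each stripe's data */
    } stripe_index_t;

A decoder could then hand each stripe to its own thread; the only compression loss is the forced literal and the broken runs at each stripe boundary.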
Before you start rolling this into a video codec, please consult the sources of UT Codec, which has been state of the art in this space for many years. Or, at least, use UT Codec as a direct basis for comparison, to see if you are beating it in terms of compression/decompression speed.
Modern lossless video codecs don't care that much about saving space, since you're burning gigabytes per minute anyway; the key is that the compression and decompression are as transparent as possible, to reduce the bandwidth to/from the storage medium. A good lossless codec can be used to scrub through and edit on a video editor's timeline, so decomp performance is what's most important.
Also, any such lossless video codec is going to need more than 8 bits per component; most real-world footage coming off of phones and cameras is 10-bit Rec.2020. If it is too difficult to position QOI for real-world sources, you can certainly keep it 8-bit and market it as an animation codec or similar; just keep it in mind.
Even better might be breaking it into tiles, so you get some vertical locality happening as well as horizontal.
It would be interesting to know which strategy is used for each pixel. Have you tried making maps of that with four different colours? Particularly interesting would be how much the cache is used and also how often it replaces a colour that it could have used later. Maybe there's a better colour hash. BTW you might want to change "(really just r^g^b^a)" to "(really just (r^g^b^a)%64)".
I really like the trick of caching previously-seen pixels by value rather than position. Very nifty!
I was thinking this would combine well with progressive rendering (i.e. store pixels in mipmap-style order) so the cache is warmed up with spread of pixels across the image rather than just scanline order. That doesn’t make it parallelizable in the way tiling does, though, hmm.
Another tweak that would probably help is having the file specify a rotation (or even a full mapping table) for each component of the hash, so a clever encoder can pick values that minimise collisions. (A fast encoder can just use all-zeroes to get the same behaviour as before.)
00xxxxxx - copy (x+1)-th last EXPLICITLY ENCODED pixel (i.e. ignoring repeats)
010xxxxx - repeat the last pixel x+1 times
011xxxxx xxxxxxxx - repeat the last pixel x+33 times
10rrggbb - copy the last pixel and adjust RGB by (r-1, g-1, b-1)
110rrrrr ggggbbbb - copy the last pixel and adjust RGB by (r-15, g-7, b-7)
1110rrrr rgggggbb bbbaaaaa - copy the last pixel and adjust RGBA by (r-15, g-15, b-15, a-15)
1111RGBA [rrrrrrrr] [gggggggg] [bbbbbbbb] [aaaaaaaa] - copy the last pixel and replace RGBA if each bit flag is set
So it is essentially the PNG filter limited to Sub, but with better delta coding and a larger history window. My guess is that it might not work well if the image has a strong vertical gradient (which would need the vertical context), but nevertheless it's pretty impressive that this simple coding is not all that inefficient.
You might be able to do better compression by traversing the pixels in Hilbert [1] order or Z [2] order, to take into account vertical neighbours as well as horizontal. It might cost time or space though, as you couldn't stream the pixels through it anymore - you'd need them all in memory.
You could still stream the pixels, you'd just have to stream them in Z order. This may seem pedantic, but this applies to any mismatch between the order in which you store pixels and the order in which the format stores pixels.
E.g. some file formats may be row-major, others column-major, some block-based (like JPEG) or a fancy interleaving scheme (like Adam7). Some might store the channels interleaved, others separate the channels. If any of these choices doesn't match the desired output format, it breaks streaming.
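For what it's worth, a minimal sketch of computing a Z-order (Morton) index from pixel coordinates, assuming coordinates fit in 16 bits (helper names are made up):

    #include <stdint.h>

    /* Spread the low 16 bits of v so they occupy the even bit positions. */
    static uint32_t part1by1(uint32_t v)
    {
        v &= 0x0000ffff;
        v = (v | (v << 8)) & 0x00ff00ff;
        v = (v | (v << 4)) & 0x0f0f0f0f;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    /* Interleave x and y bits: consecutive indices trace the Z-order curve. */
    static uint32_t morton_index(uint32_t x, uint32_t y)
    {
        return part1by1(x) | (part1by1(y) << 1);
    }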
The nice thing is if M >= B^2 (i.e., total memory is large enough to fit a square region of the image, where each row/column of the square fits a full block/page of memory), you can transform from row/column order to Hilbert/Z-order without needing to do more I/Os.
So you can't do such a conversion in a streaming fashion, but there is no need to load all data in memory either.
Yeah, those images explain it pretty well. There is one slight difference, though.
Because the Hilbert/Z-order curves are defined recursively, algorithms for converting between the curve and row/column order are "cache-oblivious", in that they don't need to take an explicit block size. You write it once, and it performs well on any system regardless the CPU cache size and/or page size.
I wonder if transforming the pixel data to/from a different order, before and after compression/decompression, would speed things up without introducing too much slowdown of its own?
Iteration order through arrays has a big impact on performance due to cache effects. I'd guess that you'd lose a lot of performance by a complicated iteration scheme.
So I implemented this change and tested it out. Turns out, in my simple test suite, this actually changes the size of images. Not drastically. Often just a 5-10 kb difference (on a 500-600kb file), but that's still more than I expected for changing r-15 to r-7.
Suggests the DIFF16 delta-color mode is responsible for quite a bit of the savings. Maybe it would be worth experimenting with the exact encoding.
One idea would be to start with a predicted color (calculate the delta of the previous two pixels, apply that, then specify a delta to that; see the sketch below). Another would be to encode the delta in YUV or some other color space, and then experiment with the best balance of bits between those channels.
Perhaps it would be better to steal bits from RUN16 instead; I somewhat doubt its usefulness.
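A minimal sketch of the linear-prediction idea, per channel (purely illustrative, not part of QOI):

    #include <stdint.h>

    /* Predict the next sample as prev + (prev - prev2), clamped to 0..255;
       the encoder would then store only the (hopefully small) residual. */
    static uint8_t predict(uint8_t prev, uint8_t prev2)
    {
        int p = 2 * (int)prev - (int)prev2;
        if (p < 0)   p = 0;
        if (p > 255) p = 255;
        return (uint8_t)p;
    }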
I see it as a variant of LZ specialised to images, so its performance should not be too surprising. Running an uncompressed bitmap through a general-purpose compression algorithm tends to yield similar results to PNG.
This is so neat. Reminds me of the recent LZ4 post, where manually hand-tweaking the encoding could still yield decent compression boosts after all these decades.
Is it just me, or is the compression algorithm not really less complex than the one in PNG? I will grant the author that if you don't need features like support for color spaces, different bit depths or interlacing you can skip the entire container format. But for reference this is the core part of PNG:
For each scanline write one byte for the chosen filter method, then $width pixels. There are five filter types. In type 0, write the pixels unmodified. In type 1, write the difference to the previous pixel. In type 2, write the difference to the above pixel. In type 3, write the difference to the average of the above and left pixel. In type 4, write the difference to the Paeth predictor (a simple function calculated from the above, left and left above pixel). Then compress everything with deflate.
Obviously that isn't performant by modern standards, not least because deflate sucks. But it's simple, can be implemented in a couple lines of C (if you pull in deflate as a dependency), and if you don't do exhaustive searches for the best compression it can certainly run in O(n)
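For reference, the Paeth predictor mentioned above really is that small. A sketch following the PNG spec's description (a = left, b = above, c = upper-left):

    #include <stdlib.h>

    /* Returns whichever of left (a), above (b), upper-left (c) is
       closest to the linear estimate a + b - c. */
    static int paeth(int a, int b, int c)
    {
        int p  = a + b - c;
        int pa = abs(p - a);
        int pb = abs(p - b);
        int pc = abs(p - c);
        if (pa <= pb && pa <= pc) return a;
        if (pb <= pc)             return b;
        return c;
    }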
The 64-pixel lookback makes it a bit similar to LZ-style compression like Deflate, but the latter is not O(n) because it can recognize multi-pixel sequences. QOI is just a slightly beefed up run-length encoding.
Are there compression algorithms that take longer than O(n)? My impression is that even two-pass encoding is O(n) with a large constant factor in front of it, but I’ve never actually thought this much about it.
Some LZ variants such as LZ77 and LZW are O(n), but LZH, for example, is O(n*logn) because of Huffman encoding. I agree that O(n) is not revolutionary, but the simplicity of this algorithm makes it faster to execute.
Did you try fuzzing your implementation (e.g. with AFL++ or libfuzzer)? Codecs and especially decoders written in C that handle untrusted binary data are quite worthwhile targets for fuzzing.
I immediately thought of a use-case for (a modified version of) this: raw compression in cameras.
Currently Sony has two options. Uncompressed raw that is huge, or lossy compressed raw that is half the size. They probably chose those options because throughput when shooting quickly matters a lot, so something like PNG would slow down shooting too much.
I imagine a blazingly fast halfway decent compression algorithm would be great. It might even speedup shooting by needing to write fewer bytes. But in general people taking star pictures will love having compression without it being lossy.
For reference an uncompressed raw file for 44MP is about 90MB. That starts filling up drive space rather fast.
I don't think this method fits camera raws particularly well. The real world is noisy, so it becomes very unlikely that you would ever hit the two most compressible methods of QOI, (1) pixel value repeat or (2) index into one of the previous 64 pixels. QOI works really well for 8-bit screenshots, with lots of pixel-values repetitions. No such luck in noisy 14-bit real-world images.
Also, this format was designed to be as simple as possible, and is byte-aligned, only handles 8-bit RGBA. You would need a different design for 12/14-bit RGB or RGBG (Bayer).
Lossless avoids all those issues around human perception. Also, if you're storing game assets, some of what you're storing as images is not really imagery. It may be normals or height fields, which lossy image compression messes up.
The code is interesting, in that it iterates over the output array, not the input. This helps prevent write buffer overflows. It has read buffer overflows, though. You can craft a compressed file which, when decompressed, will contain info from elsewhere in memory.
It checks for running out of input but, at line 460, will read three bytes beyond the end of the buffer for suitably constructed input. This is technically wrong, but hard to exploit.
Amusingly, if you run out of input too soon, the last pixel is replicated to fill the buffer. So you don't get random memory contents in the output, which would be more exploitable.
I still don't see it. Note that there is 4 bytes of padding at the end of the image data to simplify the bounds check in the decoder. Those 4 bytes are subtracted from the chunks_len that the pixel start bound check compares against. So you could craft an image where the last pixel runs into that padding (which the encoder won't produce since it always pads with the full 4 bytes) but not one where the pixel data is read from outside the image buffer beyond the padding.
So it looks like the algorithm diffs neighbouring pixels, assumes that the difference is likely to be small, and encodes the difference in 1, 2, 3 or 4 bytes. It does this without complex things like Huffman trees and therefore gets a major improvement in performance while approaching the compression ratio of zlib. That's interesting.
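As a rough illustration of the idea (this is not QOI's exact bit layout, just the general shape of a one-byte delta chunk):

    #include <stdint.h>

    /* Try to pack three small per-channel deltas into a single byte:
       a 2-bit tag followed by three 2-bit deltas, each biased by +2 so
       the representable range is -2..+1. Returns 0 if any delta is too
       large, in which case a wider chunk would be used instead. */
    static int try_pack_diff8(int dr, int dg, int db, uint8_t *out)
    {
        if (dr < -2 || dr > 1 || dg < -2 || dg > 1 || db < -2 || db > 1)
            return 0;
        *out = (uint8_t)(0x80 | ((dr + 2) << 4) | ((dg + 2) << 2) | (db + 2));
        return 1;
    }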
Super cool work! I was also mucking around with codecs last week. I got some surprisingly good results from delta encoding each channel separately and then zstandard compression. Dirt simple, crazy fast.
One trap to be aware of when profiling image codecs is not knowing the provenance of the test images. Many images will have gone through one or more quantization steps. This can lead to impressive compression ratios, but if you compare with the results on raw, unquantized imagery, it's less of a slam dunk.
That said, PNG is a really low bar to clear, both in terms of speed and compression ratio. Even really image-naive compression algorithms tend to be within a factor of 2 in resulting compressed size. You tend to get diminishing returns. 10% vs 20% is twice as efficient, but starting with a 10MB file, 1 vs 2 MB file sizes isn't a huge delta. (yes I know there are domains where this matters, don't @ me).
> Super cool work! I was also mucking around with codecs last week. I got some surprisingly good results from delta encoding each channel separately and then zstandard compression. Dirt simple, crazy fast.
My understanding is that this is one of the techniques that PNG uses (though not zstd) for its compression too, along with, I think, some different strides down the image and some other stuff, but I don't fully understand everything it tries. That's what optipng and friends play with for parameters to find the best settings for decomposing the image for compression.
If you look at what JPEG and similar codecs do, they essentially take a DC offset out of the image, because that difference then greatly reduces the range of values you have to encode, which means that you need fewer bits to do so. Combine that with range or arithmetic coding and you can get decent efficiency for very little work. Then with LZW, zstd, etc. you can get lots of other patterns taken out and compressed. Lossy codecs try to take out the stuff we won't see with our eyes, so that the set of values to be encoded needs fewer bits, and it'll also make more patterns appear that can be exploited too.
Yeah, my technique was equivalent to filter mode 0 (the only mode), filter type 1 (sub pixel to the left). Literally np.diff(array.ravel(), prepend=0) in numpy.
I think part of the reason png is so slow is that it empirically tests to find the best one.
> Compression is further improved by choosing filter types adaptively on a line-by-line basis.
I'd be curious to unpack a png and see the statistics of how much each kind is used.
This is a great project, but personally I feel that there are lots of uses for uncompressed images (ie. random access, ready to render straight away), and lots of uses for well compressed lossless, but little use for 'slightly compressed' images.
The normal argument for 'slightly compressed' images is that they can be decompressed with very few CPU cycles, but the reality is that with modern platforms lossless image en/decoding is done in hardware and therefore pretty much 'free', as long as you pick lossless h264, lossless webp or lossless hevc. All of those things will also give much smaller file sizes.
In the few places where hardware decode isn't a thing (eg. the memory-constrained 8 bit microcontroller decoding the boot screen for your IoT toaster), there usually isn't a need for losslessness.
> The basic ideas for video compression in MPEG are ingenious, even more so for 1993, but the resulting file format is an abomination.
can confirm.
Although the worst part is that the documentation is in engineering English, so it's quite hard to understand. MPEG2 is actually not as bad as you'd imagine. MPEG4 is.
Interesting I have the opposite view. From intuition I would think there is an unbounded set of simple compression algorithms. You can probably make much more effective simple compression algorithms by making basic assumptions about the type of image.
This is patently so. Taken to absurd proportions, you can compress a specific rendition of the Mona Lisa with just one bit. 1 means Mona, and 0 is undefined behaviour.
Taking this a step further: 0 means that all subsequent bits contain an arbitrary PNG. Now your algorithm encodes any image, and still does Mona Lisa in one bit.
Well, this is a combination of simple approaches which have all been used before, though put together in a nice way. And it's not benchmarked against modern formats like lossless webp or jpeg-xl.
I am not familiar with image encodings, but isn't the idea of complexity relative? For example, getting rid of Huffman encoding and byte-aligning everything may be less complex for computers, but when making energy and area efficient hardware, we may not care that much about some of these things?
By the way, this image format can match PNG in terms of size if combined with another form of lossless compression, so if we allow for that extra complexity, something like QOI + Zstandard could possibly compete with PNG in terms of both size and speed.
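A sketch of what that extra pass could look like with libzstd (the QOI-encoded input buffer is assumed to come from elsewhere; level 3 is an arbitrary choice):

    #include <zstd.h>
    #include <stdlib.h>

    /* Run an already-QOI-encoded buffer through Zstandard as a second
       lossless pass. Returns a malloc'd buffer and its size, or NULL. */
    static void *zstd_pass(const void *qoi_data, size_t qoi_size, size_t *out_size)
    {
        size_t cap = ZSTD_compressBound(qoi_size);
        void  *dst = malloc(cap);
        if (!dst) return NULL;
        size_t n = ZSTD_compress(dst, cap, qoi_data, qoi_size, 3);
        if (ZSTD_isError(n)) { free(dst); return NULL; }
        *out_size = n;
        return dst;
    }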
I like the small size of the decoder state - not much more than 256 bytes. I haven't found a way to decode GIF with less than about 16k of mutable state. PNG is similar though you can use the decoded image if it's available.
Just came here to say, good job. I was doing my own run-length encoding thing using the TGA format, but this is nicer. It's good enough that I am able to run a little screencast thing using it.
Very much this -- it will unfortunately have a horizontal bias if done naively. I've been exploring the space of doing lossy preprocessing to improve image compression, but the layered heuristics in PNG make this difficult. This format is simple enough that we have a number of approximation heuristics -- find the closest color in the last 64, use a smaller delta, or preserve one or two channel colors in the next pixel.
Notably, pixel similarity along a particular order. If I understood correctly, something like a vertical rainbow would compress significantly differently than the same picture rotated.
No, the encoder checks if the color at the hashed value really matches before storing the QOI_INDEX. There's probably better ways to hash it, but this just seemed "good enough".
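Roughly, the check looks like this (illustrative names, not the library's exact code):

    #include <stdint.h>

    typedef struct { uint8_t r, g, b, a; } px_t;

    /* The 64-entry cache is indexed by (r^g^b^a) % 64; an index chunk is
       only emitted if the cached pixel actually equals the current one,
       so hash collisions can never corrupt the output. */
    static int find_in_cache(const px_t cache[64], px_t p, int *index_out)
    {
        int idx = (p.r ^ p.g ^ p.b ^ p.a) % 64;
        if (cache[idx].r == p.r && cache[idx].g == p.g &&
            cache[idx].b == p.b && cache[idx].a == p.a) {
            *index_out = idx;
            return 1;
        }
        return 0;  /* miss or collision: fall back to diff/run/full pixel */
    }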
There are a lot of better-than-JPEG and better-than-PNG solutions out there.
Speed: If the benchmarks are correct, then this is exceptional, especially considering that the file size is only slightly larger than PNG.
I can think of a couple of ways they could reduce the file size by probably around 10%-20% without sacrificing speed, but this is good enough to get attention.
The biggest speed loss for PNG comes from zlib.
PNG without zlib is very fast -- probably comparable to QOI.
If QOI used bz2 or even zlib on top of the current encoding scheme, then it would be slower, but I bet it would also be smaller than PNG.
The biggest issue isn't making a better/faster picture format.
Rather, the biggest issues are adoption, metadata, and variable color support.
Until web browsers and smartphones natively support the format, it's only good for a proof-of-concept or for niche applications. (Don't get me wrong -- online games could benefit from something like this.)
Wide-spread consumer adoption will also need support for metadata. EXIF, IPTC, XMP, ICC Profiles, etc.
This is easy enough to add. I hope the author chooses something simple, like tag-length-data. The tag just identifies it as a meta block. The data should begin with a text string and "\n" that defines the type of data, followed by the binary data. The "length" includes the size of the data + string + 1 for "\n". Also, be aware that some metadata is larger than 65535 bytes, so the length should support large data blocks.
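A sketch of that suggested tag-length-data layout (entirely hypothetical, not part of QOI; the 'M' tag and 32-bit length are arbitrary picks):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Write one metadata block: a one-byte tag, a 32-bit big-endian length,
       then "<type>\n<payload>". The length covers type + '\n' + payload,
       so blocks larger than 65535 bytes are no problem. */
    static void write_meta_block(FILE *f, const char *type,
                                 const void *data, uint32_t data_len)
    {
        uint32_t len = (uint32_t)strlen(type) + 1 + data_len;
        uint8_t hdr[5] = {
            'M',
            (uint8_t)(len >> 24), (uint8_t)(len >> 16),
            (uint8_t)(len >> 8),  (uint8_t)(len)
        };
        fwrite(hdr, 1, sizeof hdr, f);
        fwrite(type, 1, strlen(type), f);
        fputc('\n', f);
        fwrite(data, 1, data_len, f);
    }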
The algorithm as currently defined only supports 8-bits per channel and 3 or 4 channels (RGB or RGBA).
For widespread adoption, it should support:
- 8-bit Grayscale (1 byte per pixel)
- 16-bit Grayscale (2 bytes per pixel)
- 8-bit Grayscale + Alpha (2 bytes per pixel)
- 16-bit Grayscale + Alpha (4 bytes per pixel)
- 8-bit and 16-bit RGB (24 and 48 bits per pixel)
- 8-bit and 16-bit RGBA (32 and 64 bits per pixel)
- 8-bit and 16-bit CMYK (for the printing industry)
- 8-bit YUV (for cameras; most also support 16-bit YUV, but it's rarely ever seen)
I don't think the algorithm will change for these, but the file format will need a header that defines the color space and color depth.
The source code is also problematic. For example, the .h (header) file contains functions. Not just function declarations, but actual function definitions. E.g.: https://github.com/phoboslab/qoi/blob/master/qoi.h
All functions (qoi_encode, qoi_decode, etc.) are completely included in this header file. It should be in a .c file (not .h) and compiled into a library (libqoi). Although C does permit you to put functions in header files, it is REALLY REALLY REALLY bad form and strongly discouraged. (This is a really bad programming practice. When I was in school, doing this sort of thing would be an immediate homework failure even if the code works.)
I came all the way back from my Thanksgiving slumber just to say; really?
I love this library, and have made a C++ friendly version of the same.
What you get for a few hundred lines of code far outweighs any of the shortcomings it might have. It's a great starting point. If someone wants to make a full on format, they'll probably be better off just using the new JPEG XL, rather than trying to shape this thing.
It's a good exploration of what can be done with run-length encoding.
I'm wondering how scalable this would be for hyperspectral images, with 25 channels for example. I'm guessing it's just repeating the processing for the RGB channels, but for 25?
Very much the kind of expert group design-by-committee type of requirement that this simple compression scheme was designed not to handle.
Since this scheme works primarily on exact cross-channel pixel matches, no. No it won’t work well for 25 channel images.
Which is a good thing. Most people are not trying to encode 25 channel hyperspectral data and if they are they can use their own format designed well for that purpose (or a committee-derived format that handles it adequately).
Or they can split the 25 channels across seven 4-channel images and encode them using this.
I appreciate the premise to develop this, and the science that comes with it.
I cannot tell if the author is writing in jest or seriously when he is rude to the historical work that was done before him and that he relies on. It comes across as hubris with no floor to hold him up.
Today's genius is often questioned and found to be a fool of tomorrow, and a bit of grace goes a long way.
Have you seen codecs? I've recently been exploring this space and there is not one corner of this universe that isn't a lurking Lovecraftian horror.
Un-compressed formats like BMP are sane I guess but most of those are wrapped in crufty, sparsely documented container formats. Ogg is decent but even it takes quite the deep dive to even begin wrangling it.