JPEG XL: How it started, how it’s going (cloudinary.com)
389 points by ksec on July 20, 2023 | 204 comments



Question: why do we see stable video and audio "container formats" like MKV that persist as encodings come and go (you might not be able to play a new .mkv file on an old player, but the expected answer is to upgrade your player to a new version, and universal support for pretty much any encoding is an inevitability on at least all software players), while every new image encoding seemingly necessitates its own new container format and file extension, and a minor format war to decide who will support it?

Is this because almost all AV decoders use libffmpeg or a fork thereof; where libffmpeg is basically an "uber-library" that supports all interesting AV formats and codecs; and therefore you can expect ~everything to get support for a new codec whenever libffmpeg includes it (rather than some programs just never ending up supporting the codec)?

If so — is there a reason that there isn't a libffmpeg-like uber-library for image formats+codecs?


The original entrant in this competition is TIFF, and—just as Matroska or QuickTime add indexing to raw MP3 or MPEG-TS—it does provide useful functionality over raw codec-stream non-formats like JPEG (properly JIF/JFIF/EXIF), in the form of striping or tiling and ready-made downscaled versions of the same image. But where unindexed video is essentially unworkable, an untiled image is in most cases OK, except for a couple of narrow application areas that need to deal with humongous amounts of pixel data.

So you’re absolutely going to see TIFF containers with JPEG or JPEG2000 tiles used for geospatial, medical, or hi-res scanned images, but given the sad state of open tooling for all of these, there’s little to no compatibility between their various subsets of the TIFF spec, especially across vendors, and more or less no FOSS beyond libtiff. (Not even viewers for larger-than-RAM images!) Some other people have used TIFF, but in places where there’s very little to be gained from compatibility (e.g. Canon’s CR2 raw images are TIFF-based, but nobody cares). LogLuv TIFF is a viable HDR format, but it’s in an awkward place between the hobby-renderer-friendly Radiance HDR, the Pixar-backed OpenEXR, and whatever consumer photo thing each of the major vendors is pushing this month; it also doesn’t have a bit-level spec so much as a couple of journal articles and some code in libtiff.

Why did this happen? Aside from the niche character of very large images, Adobe abandoned the TIFF spec fairly quickly after acquiring it as part of Aldus, but IIUC for the first decade or so of that neglect Adobe legal was nevertheless fairly proactive about shutting up anyone who used the trademarked name for an incompatible extension (like TIFF64—and nowadays if you need TIFF you likely have >2G of data). Admittedly TIFF is also an overly flexible mess, but then so are Matroska (thus the need for the WebM profile of it) and QuickTime/BMFF (thus 3GPP, MOV, MP4, ..., which are vaguely speaking all subsets of the same thing).

One way or another, TIFF is to some extent what you want, but it doesn’t get a lot of use these days. No browser support either, which is likely important. Maybe the HEIF container (yet another QuickTime/BMFF profile) is better from a technical standpoint, but the transitive closure of the relevant ISO specs likely comes at $10k or more. So it’s a bit sad all around.


I think TIFF has some unique features that make it more prone to certain security issues[1] compared to other formats, such as storing absolute file offsets instead of relative offsets. So I am not sure TIFF is a good container format, but many camera raws are TIFF-based for some reason.[2]

[1] https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=libtiff

[2] https://exiftool.org/#supported (search for "TIFF-based")


> I think TIFF has some unique features that make it more prone to certain security issues[] compared to other formats, such as storing absolute file offsets instead of relative offsets.

That’s an impressive number of CVEs for a fairly modest piece of code, although the sheer number of them dated ≥ 2022 baffles me—has a high-profile target started using libtiff recently, or has some hero set up a fuzzer? In any case libtiff is surprisingly nice to use but very old and not that carefully coded, so I’m not shocked.

I’m not sure about the absolute offsets, though. In which respect are those more error-prone? If I was coding a TIFF library in C against ISO or POSIX APIs—and without overflow-detecting arithmetic from GCC or C23—I’d probably prefer to deal with absolute offsets rather than relative ones, just to avoid an extra potentially-overflowing addition whenever I needed an absolute offset for some reason.
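For reference, the overflow-checked addition in question is only a couple of lines when the builtins are available (a minimal sketch, assuming GCC/Clang's __builtin_add_overflow; C23's ckd_add in <stdckdint.h> does the same job):

    #include <stdbool.h>
    #include <stdint.h>

    /* Resolve a relative offset to an absolute one without silently wrapping. */
    static bool resolve_offset(uint64_t base, uint64_t rel,
                               uint64_t file_size, uint64_t *out)
    {
        uint64_t abs_off;
        if (__builtin_add_overflow(base, rel, &abs_off))  /* wrapped: reject */
            return false;
        if (abs_off >= file_size)                         /* still range-check */
            return false;
        *out = abs_off;
        return true;
    }

Either way you end up range-checking against the file size, which is why the absolute-vs-relative distinction doesn't obviously change the attack surface.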

There are things I dislike about TIFF, including security-relevant ones. (Perhaps, for example, it’d be better to use a sequential format with some offsets on top, and not TIFF’s sea of offsets with hopefully some sequencing to them. Possibly ISO BMFF is in fact better here; I wouldn’t know, because—well—ISO.) But I don’t understand this particular charge.


Absolute file offsets demand a particular memory layout or some extra bookkeeping that could be avoided with relative offsets. If I were to write a JPEG parser, I could write a function to handle one particular segment and not have to worry about other segments, because relative offsets make parsing them independent; in TIFF, by contrast, I need to maintain a directory of things and make sure the offsets land in the right place.

I think parsing a file format with absolute offsets is similar to handling a program written entirely with GOTOs, compared to relative offsets, which are more like structured control flow.


If you’re interested in BMFF and don’t care to spend ISO prices, you can always go back to the original, Apple’s QuickTime File Format: https://developer.apple.com/standards/qtff-2001.pdf


> Admittedly TIFF is also an overly flexible mess, but then so are Matroska (thus the need for the WebM profile of it)

WebM went way too far when they stripped out support for subtitles. The engineers who made that decision should be ashamed.


As much as I’m fond of my collection of Matroska files with SSA/ASS subtitle tracks, I don’t think those are appropriate for the Web, what with all the font issues; and SRT is a nightmare of encodings. But apparently there’s a WebM-blessed way[1] of embedding WebVTT ( = SRT + UTF-8 − decimal commas) now? Which is of course different[2] from the more recent Matroska.org-blessed way[3], sigh.

[1] https://www.webmproject.org/docs/container/#webvtt-guideline...

[2] https://trac.ffmpeg.org/ticket/5641

[3] https://matroska.org/technical/codec_specs.html#s_textwebvtt


> I don’t think those are appropriate for the Web, what with all the font issues

Fun fact: several broadcast standards use Bitstream TrueDoc Portable Font Resource, which was supported for embedded web fonts way back in Netscape 4:

https://people.apache.org/~jim/NewArchitect/webrevu/1997/11_...

https://web.archive.org/web/20040407162455/http://www.bitstr...

“The PFR specification defines the Bitstream portable font resource (PFR), which is a compact, platform-independent format for representing high-quality, scalable outline fonts.

Many independent organizations responsible for setting digital TV standards have adopted the PFR font format as their standard font format, including:

— ATSC (Advanced Television Systems Committee)

— DAVIC (Digital Audio Visual Council)

— DVB (Digital Video Broadcasting)

— DTG (Digital TV Group)

— MHP (Multimedia Home Platform)

— ISO/IEC 16500-6:1999

— OCAP (OpenCable Application Platform)”


All text-based subtitles share the (non-)issue of encoding. Nothing wrong with SRT; it's UTF-8 in MKV anyway.


Video container formats do something useful: they let you package several streams together (audio, video, subtitles), and they take care of some important aspects of A/V streaming, letting the codec part focus on being a codec. They let you use existing audio codecs with a new video codec.

OTOH a still image container would do nothing useful. If an image is all that needs to be contained, there's no need for a wrapper.


> a still image container would do nothing useful

It would, at least, create a codec-neutral location and format for image metadata, with codec-neutral (and ideally extensible + vendor-namespaced) fields. EXIF is just a JPEG thing. There is a reason that TIFF is still to this day used in medical imaging — it allows embedding of standardized medical-namespace metadata fields.

Also, presuming the container format itself is extensible, it would allow the PNG approach to ancillary data embedding ("allow optional chunks with vendor-specific meanings, for data that can be useful to clients, but which image processors know they can safely strip without understanding, because 'is optional' is a syntactic part of the chunk name") to be used with arbitrary images — in a way where those chunks can even survive the image being transcoded! (If you're unaware, when you transcode a video file between video codecs using e.g. Handbrake, ancillary data like thumbnail and subtitle tracks will be ported as-is to the new file, as long as the new container format also supports those tracks.)
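For the curious, the PNG rule being referenced is literally one bit: bit 5 of the first byte of the 4-byte chunk type (i.e. a lowercase first letter, as in "tEXt") marks the chunk as ancillary. A minimal check, as a sketch:

    #include <stdbool.h>
    #include <stdint.h>

    /* Lowercase first letter => ancillary chunk, safe for a processor
       to strip without understanding it. */
    static bool png_chunk_is_ancillary(const uint8_t type[4])
    {
        return (type[0] & 0x20) != 0;
    }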

Also, speaking of subtitle tracks, here's something most people may have never considered: you know how video containers can embed "soft" subtitle tracks? Why shouldn't images embed "soft" subtitle tracks, in multiple languages? Why shouldn't you expect your OS screen-reader feature to be able to read you your accessibility-enabled comic books in your native language — and in the right order (an order that, for comic books, a simple OCR-driven text extraction could never figure out)?

(There are community image-curation services that allow images to be user-annotated with soft subtitles; but they do it by storing the subtitle data outside of the image file, in a database; sending the subtitle data separately as an XHR response after the image-display view loads; and then overlaying the soft-subtitle interaction-regions onto the image using client-side Javascript. Which makes sense in a world where users are able to freely edit the subtitles... but in a world where the subtitles are burned into the image at publication time by the author or publisher, it should be the browser [or other image viewer] doing this overlaying! Saving the image file should save the soft subtitles along with it! Just like when right-click-Save-ing a <video> element!)


> Why shouldn't images embed "soft" subtitle tracks

That would be a layered image format, like .psd (Photoshop).

It's an interesting idea, memes could become editable :)


GIF89a actually defines something like this https://www.w3.org/Graphics/GIF/spec-gif89a.txt

“The Plain Text Extension contains textual data and the parameters necessary to render that data as a graphic, in a simple form. The textual data will be encoded with the 7-bit printable ASCII characters. Text data are rendered using a grid of character cells defined by the parameters in the block fields. Each character is rendered in an individual cell. The textual data in this block is to be rendered as mono-spaced characters, one character per cell, with a best fitting font and size.”

“The Comment Extension contains textual information which is not part of the actual graphics in the GIF Data Stream. It is suitable for including comments about the graphics, credits, descriptions or any other type of non-control and non-graphic data.”

I hesitate to say GIF89a "supported" it since in practice approximately zero percent of software can use either extension. `gIFt` was dropped from the PNG spec for this reason: https://w3c.github.io/PNG-spec/extensions/Overview.html#DC.g...

If it had been well-supported we might have avoided the whole GIF pronunciation war. Load up http://cd.textfiles.com/arcadebbs/GIFS/BOB-89A.GIF in http://ata4.github.io/gifiddle/ and check out the last frame :)


While funny, I think parent meant more like alt text.


Correct. Not subtitles as a vector layer of the image, but rather subtitles as regions of the image annotated with textual gloss information — information which has no required presentation as part of the rendering of the image, but which the UA is free to use as it pleases in response to user configuration — by presenting the gloss on hover/tap like alt text, yes; or by reading the gloss aloud; or by search-indexing pages of a graphic novel by their textual glosses like how you can search an ePub by text, etc.

In the alt-text case specifically, you could allow for optional styling info so that the gloss can be laid out as a visual replacement for the original text that was on the page. But that's not really necessary, and might even be counterproductive to some use-cases (like when interpretation of the meaning of the text depends on details of typography/calligraphy that can't be conveyed by the gloss, and so the user needs to see the original text with the gloss side-by-side; or when the gloss is a translation and the original is written with poetic meter, such that the user wants the gloss for understanding the words but the original for appreciating the poesy of the work.)

Concrete use-cases:

• the "cleaner" and "layout" roles in the (digitally-distributed) manga localization process only continue to exist because soft-subbed images (as standalone documents) aren't a thing. Nobody who has any respect for art wants to be "destructively restoring" an artist's original work and vision just to translate some text within that work. They'd much rather be able to just hand you the original work, untouched, with some translation "sticky notes" on top that you can toggle on and off.

• in the case of webcomic images that have a textual "bonus joke" (e.g. XKCD, Dinosaur Comics), where this is currently implemented as alt/title-attribute text — this could be moved into the image itself as a whole-image annotation, such that the "bonus joke" would be archivally preserved alongside the image document.


Region annotation is used for some images on Wikimedia Commons and a lot of Manga pages on the booru sites[1]. It's really, really good for translations.

[1]: https://danbooru.donmai.us/posts/6510411?q=arknights


I think we could do that with SVG, no? SVG is a vector format of course, but can also have raster parts embedded.



That's a very basic view; take a look at the TIFF or DICOM specs. It can be useful to have multiple images, resolutions, channels, z or t dimensions, metadata, ... all in a single container, as it's all one "image".


captions / alt-text could also very reasonably be part of the image, as well as descriptions of regions and other metadata.

there are LOTS of uses for "image containers" that go beyond just pixels. heck, look at EXIF, which is extremely widespread - it's often stripped to save space on the web, but it's definitely useful and used.


Container formats for video often need to:

- contain multiple streams of synced video, audio, and subtitles

- contain alternate streams of audio

- contain chapter information

- contain metadata such as artist information

For web distribution of static images, you want almost none of those things, especially regarding alternate streams. You just want to download the one stream you want. Easiest way to do that is to just serve each stream as a separate file, and not mux different streams into a single container in the first place.

Also, I could be wrong on this part, but my understanding is that for web streaming video, you don't really want those mkv* features either. You typically serve individual and separate streams of video, audio, and text, sourced from separate files, and your player/browser syncs them. The alternative would be unnecessary demux on the server side, or the client unnecessarily downloads irrelevant streams.

The metadata is the only case where I see the potential benefit of a single container format.

* Not specific to mkv, other containers have them of course


Container formats increase size. Now for video that doesn't matter much because it doesn't move the needle. For images a container format could be a significant percentage of the total image size.


Yes, I focused mostly on the lack of benefit, but even for a single stream, size is another important cost.


> The alternative would be unnecessary demux on the server side, or the client unnecessarily downloads irrelevant streams.

HTTP supports partial downloads (range requests). A client can choose just not to download irrelevant audio. I think most common web platforms already work this way: when you open a video it is likely to be in .mp4 format, and you need to get the end of it to play it, so your browser gets that part first. I am not entirely sure.


I believe mp4 files can be repackaged to put the bookkeeping data at the front of the file, which makes them playable while doing a sequential download.


That metadata is usually put around the end of the file for compatibility reasons, but one can use ffmpeg's `-movflags faststart` option to move it to the beginning (very common in files that are meant to be served on the web).


> You typically serve individual and separate streams of video, audio, and text, sourced from separate files, and your player/browser syncs them.

That's one school of thought. Some of the biggest streaming providers simply serve a single muxed video+audio HLS stream based on bandwidth detection. Doesn't work very well for multi-language prerecorded content of course, but that's just one use case.


That's true, but my understanding is they serve a specific mux for a specific bandwidth profile, and serve it by just transmitting bytes, no demux required. I didn't mean to imply that wasn't a common option. I only meant to say I don't think a common option is to have a giant mux of all possible bandwidth profiles into one container file, that has to be demuxed at serve time.

My understanding is that YouTube supports both the "separate streams" and "specific mux per-bandwidth profile" methods, and picks one based on the codec support/preferences of the client.


Containers are just containers — you still need a decoder for their payload codec. This is the same for video and images. For video, containers are more important because you typically have several different codecs being used together (in particular video and audio) and the different bitstreams need to be interleaved.

The ISOBMFF format is used as a container for MP4, JPEG 2000, JPEG XL, HEIF, AVIF, etc.

And yes, there are ffmpeg-like "uber-libraries" for images: ImageMagick, GraphicsMagick, libvips, imlib2 and gdk-pixbuf are examples of those. They support basically all image formats, and applications based on one of these will 'automatically' get JPEG XL support.

Apple also has such an "uber-library" called CoreMedia, which means any application that uses this library will also get JPEG XL support automatically.


I'm guessing it's mostly down to tradition/momentum in how the formats were initially created and maintained.

Video has (most of the time, at least) at least two tracks that have to be synchronized, and most of the time it's one video track and one audio track. With that in mind, it makes sense to wrap those in a "container" and allow the video and audio to be different formats. You can also have multiple audio/video tracks in one file, but I digress.

With images it didn't make sense, at least in the beginning, to have a container, because you just have one image (or many, in the case of .gif).


We're starting to see a move towards this with HEIF / AVIF containers. However, in cases where "every bit must be saved", general-purpose containers like ISO-BMFF introduce some wastage that is unappealing.


> However, in cases where "every bit must be saved", general-purpose containers like ISO-BMFF introduce some wastage that is unappealing.

Sure, but I don't mean general-purpose multimedia containers (which put a lot of work into making multiple streams seekable with shared timing info). I mean bit-efficient, image-oriented, but image-encoding-neutral container formats.

There are at least two already-existing extensible image file formats that could be used for this: PNG and TIFF. In fact, TIFF was designed for this purpose — and even has several different encodings it supports!

But in practice, you don't see the people who create new image codecs these days thinking of themselves as creating image codecs — they think of themselves as creating vertically-integrated image formats-plus-codecs. You don't see the authors of these new image specifications thinking "maybe I should be neutral on container format for this codec, and instead just specify what the bitstream for the image data looks like and what metadata would need to be stored about said bitstream to decode it in the abstract; and leave containerizing it to someone else." Let alone do you ever see someone think "hey, maybe I should invent a codec... and then create multiple reference implementations for how it would be stored inside a TIFF container, a PNG container, an MKV container..."


But HEIC/AVIF did exactly that: defined an image format on top of a standard container (ISOBMFF/HEIF). JPEG-XL is the odd one out because it doesn't have a standardized HEIF encapsulation, but JPEG-XS and JPEG-XR, for example, are supported in HEIF.


JPEG XL uses the ISOBMFF container, with an option to skip the container completely and just use a raw codestream.

HEIF is also ISOBMFF-based but adds more mandatory stuff, so you end up with more header overhead, and it adds some functionality at the container level (like layers, or using one codestream for the color image and another for the alpha channel) that is useful for codecs that don't have that functionality at the codestream level — video codecs typically only support YUV, so if you want alpha you have to do it with one YUV frame and one YUV 4:0:0 frame, and use a container like HEIF to indicate that the second frame represents the alpha channel. So if you want to use a video codec like HEVC or AV1 for still images and have functionality like alpha channels, ICC profiles, or orientation, then you need such a container, since these codecs do not natively support those things.

But for JPEG XL this is not needed since JPEG XL already does have native support for all of these things — it was designed to be a still image codec after all. It's also more effective for compression to support these things at the codec level: e.g. in JPEG XL you can have an RGBA palette, which can be useful for lossless compression of certain images, while in HEIC/AVIF this is impossible since the RGB and A are in two different codestreams which are independent from one another and only combined at the container level.

It would be possible to define a JPEG XL payload for the HEIF container but it would not really bring anything except a few hundred bytes of extra header overhead and possibly some risk of patent infringement since the IP situation of HEIF is not super clear (Nokia claims it has relevant patents on it, and those are not expired yet).


> JPEG XL uses the ISOBMFF container, with an option to skip the container completely and just use a raw codestream

Hey, thanks for the clarification! I was basing my info on Wikipedia (my bad): the ISO BMFF page doesn't mention JXL at all, and even the JPEG XL page has only small print in the infobox saying that it's "based on" ISO BMFF, while the main article text doesn't mention that at all.

> But for JPEG XL this is not needed since JPEG XL already does have native support for all of these things — it was designed to be a still image codec after all

I suppose that is a bit the thing the grandparent comment was complaining about: a format not designed for general-purpose containers but rather as a standalone thing. I suppose it could be a fun thought experiment to imagine what JXL would look like if it were specifically designed to be used in HEIF.

Of course it is quite understandable that making a tailored, purpose-built format ends up better in many ways than trying to fit into some existing generic thing.

> It would be possible to define a JPEG XL payload for the HEIF container but it would not really bring anything except a few hundred bytes of extra header overhead and possibly some risk of patent infringement since the IP situation of HEIF is not super clear (Nokia claims it has relevant patents on it, and those are not expired yet).

I suppose JXL-in-HEIF would allow some image management tools to have a common code path for handling JXL and HEIC/AVIF files, grabbing metadata, etc., and possibly would not need any specific JXL support. But that is probably not a practical concern in reality.


And at the same time, we are likely going to use codec-specific extensions for all AOM video codecs (.av1, .av2) as well as for images (.webp2, not sure if .avif2 will ever exist but I guess so), even when the same container is used, as we did with .webm (which was a subset of .mkv)


> with universal support for pretty much any encoding being an inevitability on at least all software players

This is not a good assumption. MKV supports a loooot of things which many video players will not support at all.

And IIRC some browsers do not support MKV.



But do they all support all of ISOBMFF?


I think that's because video is a much more active and complex topic than still images.

We are still using image formats from the 90s, and their matching containers, and they are good enough, so there is not much push to go beyond that. There is no real incentive for making a more flexible format. By comparison, video is the biggest bandwidth hog and people care a lot.

And MKV supports video, multiple audio tracks, subtitles, ... all using different codecs made by different people (e.g. H.265+Opus or VP9+Vorbis, or any other combination). An image container usually only holds the image and a bit of metadata.


Videos can have their own "containers" too: for instance, in AV1 the stream is stored inside OBUs, which are wrapped in an external container (such as Matroska). If you really wanted to, you could (and can) put images into containers too; PNGs in a Matroska file are actually a pretty useful way of transferring PNG sequences.

You can also, with a simple mod on an older commit of ffmpeg (the commit that added animated JXL broke this method and I haven't gotten around to fixing it), mux JXL sequences into MKV by simply adding the JXL FourCC.


There is ImageMagick, which last I checked (a year ago?) didn't support KTX, but it did have DDS and a lot of other niche image formats.


Question: why don't we:

1. put a reference to the decoder into the header of the compressed file

2. download the decoder only when needed, and cache it if required

3. run the decoder inside a sandbox

4. allow multiple implementations, based on hardware, but at least one reference implementation that runs everywhere

Then we never need any new formats. The system will just support any format. When you're not online, make sure you cached the decoders for whatever files you installed on your system.
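As a purely hypothetical sketch of step 1 (none of these fields exist in any real format; the names are made up), the header could be as small as:

    #include <stdint.h>

    struct self_describing_header {
        char     magic[4];           /* hypothetical magic, e.g. "SDIM" */
        uint32_t header_size;        /* bytes, including this struct */
        char     decoder_uri[256];   /* where to fetch a sandboxed decoder (step 2) */
        uint8_t  decoder_sha256[32]; /* pins the exact decoder build: cache key + integrity check */
        uint64_t payload_size;       /* encoded bitstream follows immediately after */
    };

Pinning the decoder by hash gives the cache in step 2 a stable key and an integrity check.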


We used to have formats like this, and then the attacker points to his decoder/malware package.

Apart from that, of course the decoder has to be fast and thus native and interfacing with the OS, so the decoder is x86 code for today's version of Windows, until the company hosting it dies and the patented, copyrighted decoder disappears from the internet.


Formats like what? You're not skipping the "sandbox" part are you?

A decoder can be extremely isolated. It's much easier to sandbox a decoder than to sandbox javascript, for example.


If you assume you just say the magic word ‘sandbox’ and everything is safe, then yes, security is a solved problem. This is however a prime example of the saying that in theory, there is no difference between theory and practice but in practice there is.


You're treating sandboxing like it's all the same, but it's not. Multimedia decoders are one of the absolute easiest things to sandbox. Webassembly was designed for sandboxing and even that's very overcomplicated for what a decoder needs.

If we had standard headers, then reading metadata wouldn't be part of the decoder. The decoder would only need to take in bytes and output a bitmap, or take in bytes and output PCM audio. It doesn't need to be able to call any functions, or run any system calls, and the data it outputs can safely contain any bytes because nothing will interpret it.

It's like taking the very core of webassembly and then not attaching it to anything. The attack surface is astoundingly small.

You just need to give it a big array of memory and let it run arithmetic within that array, plus some control flow instructions. Easy to interpret, easy to safely JIT compile.
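A toy sketch of what "a big array plus a few opcodes" means in practice (the opcode set and encoding here are made up, not any real codec VM):

    #include <stddef.h>
    #include <stdint.h>

    enum { OP_HALT, OP_LOAD, OP_STORE, OP_ADD, OP_JNZ };

    typedef struct { uint8_t *mem; size_t len; } Sandbox;

    /* Every access into the sandbox memory is bounds-checked; the guest
       sees nothing but this one array. Returns 0 on halt, -1 on violation. */
    static int run(Sandbox *s, const uint8_t *prog, size_t n)
    {
        size_t pc = 0;
        uint8_t acc = 0;
        while (pc + 3 <= n) {                       /* opcode + 16-bit operand */
            uint8_t op = prog[pc];
            size_t  a  = ((size_t)prog[pc + 1] << 8) | prog[pc + 2];
            switch (op) {
            case OP_HALT:  return 0;
            case OP_LOAD:  if (a >= s->len) return -1; acc = s->mem[a];  break;
            case OP_STORE: if (a >= s->len) return -1; s->mem[a] = acc;  break;
            case OP_ADD:   if (a >= s->len) return -1; acc += s->mem[a]; break;
            case OP_JNZ:   if (acc) { pc = a; continue; }                break;
            default:       return -1;               /* unknown opcode */
            }
            pc += 3;
        }
        return 0;
    }

The host copies the encoded bytes into mem, runs the program, and copies the decoded pixels back out; there is no other interface for hostile input to reach.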


I suggest you try building it and see ‘just’ how easy it is.


I have made secure emulators for simple CPUs before. Seriously, you only need a handful of opcodes and they only need to operate on a big array. It's hard to do wrong!

The part of sandboxing that's hard is dealing with I/O, or giving useful tools to the sandboxed code, or implementing data structures for the sandboxed code. You don't need any of that for a multimedia decoder. You just let it manipulate its big block of bytes, and make sure you bounds check.

A Java VM exposes tens of thousands of functions to the code inside it. A barebones sandbox exposes zero. It just waits for the HLT opcode.

And when it gives you raw RGB data, or raw PCM data, there's no way to hide a triggerable malicious payload inside. If the code does something bad, the worst it can do is show you the wrong image.


Yes, and you can also download a million projects like this; the sandbox exists.

But my suggestion would be that you build a video codec out of it. Preferably one that has the properties the market demands: performance and energy efficiency.


Not a codec, the only thing customized for this would be the container format.

You'd use existing codecs, and the way you get good performance and energy efficiency on a video codec is by having a hardware implementation. Software decoding doesn't even come into that picture.

As far as practical software decoding outside of battery-powered video, can I just point at webassembly? Especially the upcoming version with vector instructions. You could use normal webassembly, or even an extra-restricted version. It gets pretty good performance, and when you remove its ability to talk to the outside world it goes from pretty good to extremely good security.


The container format would be decoded by a sandboxed codec that can be found by decoding the container?

WebAssembly codecs indeed exist, and they are impractical due to a lack of performance.


> The container format would be decoded by a sandboxed codec that can be found by decoding the container?

The container parser would not be dynamically downloaded, and may or may not be sandboxed.

We don't need a new container with almost every codec. We just need the new codec itself.

> WebAssembly codecs indeed exist, and they are impractical due to a lack of performance.

Mostly because they don't have vector instructions yet, I bet. But plenty of webassembly is within 50% of native, which is good enough for lots of things, which includes image decoding for sure.


So now container decoders have been magically vetted and secured so they don’t need the sandbox. Which is quite surprising considering most vulnerabilities in streams are in the container decoders and their multitude of hardly used features, but okay.

The challenge remains for you to actually provide the codec you describe. Which a few comments ago was trivial because it was a hardware codec anyway; now it’s just a bit of WebAssembly away. Well, that should be trivial because cross compilers to WebAssembly exist. So why don’t you just provide a few real world examples? You’re probably not the first to think of these ideas; there has to be a reason why it hasn’t been done yet.


> So now container decoders have been magically vetted and secured so they don’t need the sandbox.

Not "magically". But you only need one or two, and they don't need to be very fast, so you can put a lot of effort into making them secure.

But more importantly, browsers already have many container decoders. This is not an expansion in attack surface. The goal here is allowing a lot more codecs compared to current browsers without a significant increase in attack surface compared to current browsers. Pointing out flaws that already exist doesn't disqualify the idea.

> So why don’t you just provide a few real world examples? You’re probably not the first to think of these ideas; there has to be a reason why it hasn’t been done yet.

Image decoders in webassembly already exist. Did you even look? Including JXL!

Video decoding needs more support structure in the browser, and I already said some decoders need things that are being added to webassembly but aren't done yet. Even then, the first google result for "av1 webassembly" is a working decoder from five years ago.


This is what PrintNightmare did?

You no longer need "printer drivers"; they're supposed to be automatically downloaded, installed, and run in a sandbox. You never need any "new drivers". The system will support any printer.

Except the "sandbox" was pretty weak and full of holes.


> Except the "sandbox" was pretty weak and full of holes.

Nothing prevents you from installing only the trusted ones.

Second, software is getting so complicated that if we don't build secure sandboxes anyway then at some point people will be bitten by a supply chain attack.


You can’t cache things online anymore as that is a way to track people (generate a unique “decoder” per user and check if they have it cached or not)


The solution is to cache things per user, not system wide. Or per "container" like in Firefox containers.


Because that is complex, unnecessary, and dangerous?


See also: https://dennisforbes.ca/articles/jpegxl_just_won_the_image_w...

It loads JXL if your client supports it.

Recent builds of Chrome and Edge now support and display JXL on iOS 17. They have to use the Safari engine underneath, but previously they suppressed JXL, or maybe the shared engine did.


Afaik WebKit added support in iOS 17 so it’s just a transitive win


There's only Safari on iOS. EVERY browser on iOS is just a skin on top of Safari's WebKit.

See 2.5.6 here - https://developer.apple.com/app-store/review/guidelines/


Literally specifically said that. Yet Edge and Chrome on iOS actively suppressed JXL in prior builds (yes they can do that — actually calling it a skin is technically wrong as while the engine is Safari, they have a lot of flexibility), and very recently exposed it.


Wasn’t this supposed to change with iOS 17?


Apple is dropping the requirement for all browsers to use WebKit, but that doesn’t mean Mozilla/Google/Brave have started releasing them, especially with iOS 17 still being a beta.

https://9to5mac.com/2023/02/07/new-iphone-browsers/


The problem with trying to replace JPEG is that for most people it's "good enough". We already had "JPEG 2000", which would have been a step up in terms of performance, but it never saw any real adoption. Meanwhile, "JPEG XL" is at best an incremental improvement over "JPEG 2000" from the user's POV, which raises the question of why people would care about this one if they didn't care about the previous one.


The big reason is that JPEG XL is a seamless migration/lossless conversion from JPEG.

You get better compression and services can deliver multiple resolutions/qualities from the same stored image (reducing storage or compute costs), all transparent to the user.

So your average user will not care but your cloud and web service companies will. They are going to want to adopt this tech once there's widespread support so they can reduce operating costs.


On top of not being backward compatible, JPEG 2000 was significantly slower and required more RAM to decode, which at the time it was released was a much bigger deal than it is today. And for all of its technical improvements in some domains (transparency, large images without tiling, multiple color spaces), it was not substantially better at compressing images with high-contrast edges and high-frequency texture regions at low bitrates, because it just replaced JPEG's block artifacts with its own substantial smoothing and ringing artifacts.


JPEG 2000 had very bad implementations for a long time.

Second Life went with JPEG2000 for textures, and when they open sourced the client, they had to switch to an open source library that was dog slow. Going into a new area pretty much froze the client for several minutes until the textures finally got decoded.


JPEG2000 ran into a patent license trap. JPEG XL is explicitly royalty free.


But that doesn't address the point that JPEG XL is only marginally better and has a gigantic mountain to climb if it ever hopes to displace JPEG (and likely never will given the vast set of JPEG files that exist and will never be converted).


JPEG XL can losslessly transcode JPEG into a smaller format. JPEG 2000 (or WebP, or anything but Lepton[0]) didn't offer that. Besides, we have GIF and PNG for approximately the same space; GIF still isn't gone. Displacement isn't necessary for a new format to become useful.

[0] https://github.com/dropbox/lepton


Even if JPEG-XL could be considered only 'marginally' better than JPEG, and as good or even worse than AVIF/WebP in some specific contexts, it is unique in the fact that it can also do lossless, HDR, extremely high resolutions, complex color channels, generation loss protection, multiple layers, advanced authoring features, etc. It's not a format only meant for the web but for plenty of other use cases, such as science, medicine, art printing, etc.

And not only that, it's reasonably fast to encode on consumer hardware.

JPEG XL has the ambition to supplant all the image formats of the next 20+ years.


> JPEG XL is only marginally better

You will have to define what is marginally better. WebP is definitely only marginally better than JPEG. And JPEG XL easily gives a 30-40% BD-rate saving at the same quality at BPP 0.8 or above.


It's actually pretty remarkable how successful and long-lasting JPEG has been when you think about it. Relatively simple, elegant compression, but still quite sufficient.

(Having said that I do wish for JPEG XL to become a true successor)


JPEG was, and remains, "alien technology from the future" (Tim Terriberry)


The majority of users wouldn't even notice (save for possibly faster page loads). JPEG XL has mostly only benefits: it's backward compatible, can be converted losslessly to and from JPEG, has better compression and thus smaller sizes and less data to transfer/store, and it has nice licensing. JPEG2000 had nothing of that...


> JPEG2000 had nothing of that...

That’s not true. JPEG 2000 had substantially smaller file sizes and better progressive decoding – something like responsive images could have just been an attribute telling your browser how many bytes to request for a given resolution. It also had numerous technical benefits for certain types of images - one codec could handle bitonal images more efficiently than GIF, compress lossless images better than TIFF, effortlessly handle colorspaces and bit depths we’re just starting to use on the web, etc.

What doomed it was clumsy attempts to extract as much license revenue as possible. The companies behind it assumed adoption was inevitable, so everything was expensive - pay thousands for the spec, commercial codecs charged pretty high rates, etc. - and everyone was so busy faffing around with that that they forgot to work on things like interoperability or performance until the 2010s. Faced with paying money to deal with that, most people didn’t, and the market moved on with only a few exceptions like certain medical imaging or archival image applications. In the early 2000s the cost of storage and disk/network bandwidth meant you could maybe try to see the numbers as plausibly break-even, but over time that faded while the hassle of dealing with the format did not.


There is also JPEG-XR! Life is confusing


JPEG XR started off as a Microsoft format about 20 years ago, but no one trusted them to not sue if reimplemented. Microsoft supported XR, but almost nothing else did.


They gave up on it too. Internet Explorer was the last browser to support JPEG XR.


Image files generated by Zeiss microscopes use JPEG XR.


Yes, I wonder if they got a good deal from MS or why that is. Most of the other vendors went with JPEG 2000 (Aperio, Olympus, even Motic).


Yes, if it were just about compression ratio nobody would bother, but that's not its only feature.


Even if it could only encode JPG into a smaller format losslessly it would still be a major selling point that basically everyone in at least the e-commerce world would want. Think about how many hundreds of terabytes of JPG files eBay and Alibaba send per day, and cut that size in half for no quality loss.

Amazon alone sold over 375 million items this last Prime Day. Let’s say that was 200 million items loaded/day (ignoring the unpublished number of failed sales), with 9 images (the maximum from a cursory glance) at 2000x2000 for a 1:1 ratio and zoomability. For a 90% quality JPG at 24-bit color that’s 410 KB per image, so (410 KB × 9) × 200,000,000 ≈ 738 TB. Now imagine cutting that in half with no perceptible difference except faster loading for the end-user.

For end users the other options may be more desirable, but I would argue the importance is in the compression itself.


great writeup. i wish it had started with the intro of "wtf is JPEG XL" for those of us not as close to it. but the ending somewhat approximates it. i'm still left not knowing when to use webp, avif, or jxl, and mostly know that they are difficult files to work with because most websites' image file uploaders etc don't support them anyway, so i end up having to open up the file and take a screenshot of them to convert them to jpeg for upload.

so do we think Chrome will reverse their decision to drop support?


> so do we think Chrome will reverse their decision to drop support?

The argument was that there's no industry support (apparently this means: beyond words in an issue tracker); let's see how acceptance goes with Safari supporting it.

An uptick in JXL use sounds like a good-enough reason to re-add JXL support, this time not behind an experimental flag. Maybe Firefox even decides to provide it without a flag and in their regular user build.


We all know what the real argument is: NIH.


One of the three main authors of JXL works for Google. But in the Google Research Zurich office, so he might as well not exist to the Chrome team, I guess.


Last time I looked, all the top committers were from Google.


Ah yes, NIH from a contributor to the spec, makes complete sense.


I've seen people rage-quit because their work was standardized but with modifications they themselves didn't approve of.


[flagged]


This is definitely not the case, the team behind Pik is eager to get JPEG XL deployed, but Chrome is blocking it.


I'm not sure what we are misunderstanding about each other. I'm saying Google could very well be blocking its deployment in Chrome, if they are that petty and NIH, because it is not the implementation they originally came forward with.


The team that wrote the original Google proposal (PIK) joined forces with the team that wrote FUIF and together they created JXL, so no, those particular Googlers are not petty about this.

They're a distinct team from the Chrome team though.


While that's a possibility, the article made it seem like Google's PIK team and Cloudinary worked together on JPEG XL.


Same with AV1. It's not verbatim VP10, but incorporates concepts developed by Xiph.Org and Cisco (and whoever else).


What I don’t understand is: why still push for JPEG XL when WebP already has a lot of support and AVIF has a lot of momentum?


JPEG XL and AVIF have tradeoffs.

AVIF works extremely well at compressing images down to very small sizes with minimal losses in quality, but loses comparatively to JPEG XL when it comes to compression at higher quality. Also, I believe AVIF has an upper limit on canvas sizes (2^16 by 2^16 pixels, I think), whereas JPEG XL doesn't have that limitation.

Also, existing JPEGs can be losslessly migrated to JPEG XL, which is preferable to a lossy conversion to AVIF.

So it's preferable to have JPEG XL, WebP, and AVIF.

- WebP fills the PNG role while providing better lossless compression.

- AVIF fills the JPEG role for most of your standard web content.

- JPEG XL migrates old JPEG content to get most of the benefits of JPEG XL or AVIF without lossy conversion.

- JPEG XL fills your very-high-fidelity image role (currently filled by very large JPEGs or uncompressed TIFFs) while providing very good lossless and lossy compression options.


Possibly an underrated but potentially very useful unique feature of JXL is that it completely eliminates the need to use a third party thumbnail/image-scaling rendering site or workflow. If you need a full size JXL image rendered down to 25% size for one of your web views, you literally just truncate the bitstream at 1/4 the total (or whatever percentage of the total number of pixels of the full-size image you need, that's a trivial math calculation) and send just that.

That's tremendously simpler, both from an architectural and maintenance standpoint (for any site that deals with images), than what you would usually have to do, such as relying on either a third party host (and added cost, latency (without caching), and potential downtime/outage) or pushing it through the (very terrible and memory/cpu-wasteful codebase at this point) ImageMagick/GraphicsMagick library (and potentially managing that conversion as a background job which incurs additional maintenance overhead), or getting VIPS to actually successfully build in your CI/CD workflow (an issue I struggled with in the past while trying to get away from "ImageTragick").

You get to chuck ALL of that and simply hold onto the originals in your choice of stateful store (S3, DB, etc.), possibly caching it locally to the webserver, and just... compute the number of pixels you need given the requested dimensions (which is basically just: ((requested x)*(requested y))/((full-size x)*(full-size y)) percentage of the total binary size, capping at 100%), and bam, truncate.

Having built out multiple image-scaling (and caching, and sometimes third-party-hosted) workflows at this point, this is a very attractive feature, speaking as a developer.
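A rough sketch of the truncation arithmetic being described (the helper name is mine, and it assumes a progressively encoded codestream; a real server would presumably also want to cut on a progressive-pass boundary rather than at an arbitrary byte):

    #include <stddef.h>

    /* Fraction of the file to serve ~ fraction of the pixels requested. */
    static size_t bytes_to_serve(size_t total_bytes,
                                 unsigned full_w, unsigned full_h,
                                 unsigned want_w, unsigned want_h)
    {
        double frac = ((double)want_w * want_h) / ((double)full_w * full_h);
        if (frac >= 1.0)
            return total_bytes;                 /* cap at 100% */
        size_t n = (size_t)(frac * (double)total_bytes);
        return n > 0 ? n : 1;
    }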


That's just progressive decoding, though, and is only possible if you encoded the image correctly (which is optional). You can also do similar things with progressive jpeg, png, and webp, with jpeg being the most flexible.

The unique part AFAIK is that you can order the data blocks however you want, allowing progressive loading that prioritizes more important or higher detailed areas: https://opensource.googleblog.com/2021/09/using-saliency-in-...

(your thumbnails may or may not look terrible this way, as well. really better suited for progressive loading)


The thing with JPEG XL, though, is that its design is inherently progressive. Even when there is no reordering you will get an 8x downsampled image before everything else (and the format itself exploits the heck out of this fact for better compression).


Apart from limited resolution, probably the biggest problem with AVIF is that it doesn't support progressive decoding, which could effectively cancel out its smaller file size for many web applications. AVIF only shows once it is 100% finished. See:

https://www.youtube.com/watch?v=UphN1_7nP8U

This comparison video is admittedly a little unfair though, because AVIF would have easily 30% lower file size than JPEG XL on ordinary images with medium quality.


Hehe, I see we have been down the same route. Sad to say, but ImageMagick is awful at resource usage. VIPS can do 100x better in many specific cases, but is a little brittle. I do not find it that incredibly difficult to build, though.


This is fascinating, I had never heard of this aspect of JXL.


Or JPEG XL can take over all of it.

- JPEG XL can do lossless compression better than PNG, if I’m right.

- At low bit rates, JPEG XL isn’t that far from AVIF quality. You will only use those rates for less important stuff like “decorations” and previews anyway, so we can be less picky about the quality.

- For the main content, you will want high bit rates which is where JPEG XL excels.

- Legacy JPEG can be converted to JPEG XL for space savings at no quality loss.


Thank you both. Couldn't have said it better.

The use cases for WebP are limited, the actual advantage over a decent JPEG isn't that big, and unless you use a lot of lossless PNGs I would argue it should never have been pushed as the replacement for JPEG. To this day I still don't know why people are happy about WebP.

According to Google Chrome, 80% of images transferred have a BPP of 1.0 or above. The so-called "low bit rate" range is below BPP 0.5. The current JPEG XL encoder is still not optimised for low bitrates, and judging from the author's tweet I don't think they intend to do that any time soon. And I can understand why.


AVIF is even more limited in resolution than that, just 8.9 megapixels in baseline profile or 35 megapixels in the advanced profile.

If you have image-heavy workflows and care about storage and/or bandwidth then JPEG-XL pairs great with AVIF: JPEG-XL is great for originals and detail views due to its great performance at high quality settings and high resolution support, meanwhile AVIF excels at thumbnails where resolution doesn't matter and you need good performance at low quality settings.


JPEG XL Lossless: about 35% smaller than PNG (50% smaller for HDR). Source: https://jpegxl.info/ So with JPEG XL WebP may not serve any real purpose anymore.


My memory is hazy, but doesn’t JXL have better color or color profile support?


You can scroll down (on mobile) to see an overview image comparing technical features on https://jpegxl.info/. It doesn't mention color profiles (although I presume that just means they're all equal there), but jxl does support higher max bit depth per channel (32 vs 10 for AVIF) and more channels (4099 vs 10). So for raw sensor data, and intermediate formats for image processing, where information loss should be avoided, it should be a lot better.

I'm hoping it gets adopted as a better underlying technology for various RAW formats, and hopefully a better successor to the DNG format while we're at it (currently these are TIFF based). I'm not even a professional photographer, and my hard drive is still mostly occupied by RAW files.


Yeah, the points you mention are what I remember photographers really digging about JXL. Also, higher bit depth is a big deal for some pro photographers.


I actually studied photography (technically contemporary art, but photography was my main medium) but chose to not pursue a career in it. You are correct, bit depth matters. It is unlikely 32 bits will ever be needed for RAW files though.

Specifically, it matters for source files and intermediate files.

With RAW files from the camera, the higher the bit depth of the analog-to-digital conversion (ADC) step, the less posterization this introduces on the signal. Theoretically at least, you're still limited by the sensor's dynamic range, and there are other subtleties involved, like light perception being logarithmic instead of linear, but RAW encodings being linear[0][1]. But in simple terms: paired with a sensor with high dynamic range and good ADC, a higher bit depth results in less noise and higher dynamic range. Which allows one to recover more fine detail from shadows and highlights. Which makes the camera more forgiving in normally difficult lighting scenes (low light and/or high contrast). So a higher bit depth can aid in giving photographers creative freedom when shooting, and more flexibility in editing their photos without loss of fidelity.

So yes, it is an important cog in the machine that is the whole processing pipeline.

Having said that, as I mentioned our eyes perceive light logarithmically. The dynamic range of the human eye is... complicated to determine, because it adjusts so quickly. At night it may go up to 20 stops, during the day 14 stops is likely to be the typical range[2]. So it's probably not a coincidence that digital cameras have "stalled" at using 14 bits for their RAW files, typically: the photographer likely wouldn't be able to see more contrast in the lights and shadows before taking a photo anyway!
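To put a number on that (a back-of-envelope figure, assuming a linear encoding and a noise floor of roughly one least-significant bit):

    dynamic range ≈ log2(2^N / 1) = N stops

so a 14-bit linear ADC tops out at about 14 stops, which lines up with the daytime figure above.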

[0] https://www.dpreview.com/articles/4653441881/bit-depth-is-ab...

[1] No I don't understand why floating point ADCs aren't used either, seems like it would be a more sensible approach to me and they do exist: https://ieeexplore.ieee.org/abstract/document/776106

[2] https://clarkvision.com/imagedetail/eye-resolution.html


According to the article, WebP requires more CPU to decode. JPEG XL also supports lossless transcoding from JPEG, so it could be used for old image sets with no loss in image fidelity.

There are arguments for the new format, but the Chrome people seemed unwilling to maintain support for it when pick-up was non-existent (Firefox could have moved it out of their purgatory. Safari could have implemented it earlier. Edge could have enabled it by default. Sites could use polyfills to demonstrate that they want the desirable properties. And so on.)

To me, the situation was one of "If Chrome enables it, people will whine how Chrome forces file formats onto everybody, making the web platform harder to reimplement, a clear signal of domination. If they don't enable it, people will whine how Chrome doesn't push the format, a clear signal of domination", and they chose to use the variant of the lose-lose scenario that means less work down the road.


> There are arguments for the new format, but the Chrome people seemed unwilling to maintain support for it when pick-up was non-existent

Of course there is no pick-up when Chrome, with its massive market share, doesn't support it. Demanding pick-up before support makes no sense for an entity with such a large dominance.


- Polyfills (there _is_ polyfill-enabling code - maintained by Chrome devs.)

- Microsoft enabling the flag in Edge by default and telling people that websites can be 30% smaller/faster in Edge, automatically adding JXL conversion in their web frameworks

- Apple doing the same with Safari (what they're _now_ doing)

- Mozilla doing the same with Firefox (instead of hiding that feature in a developer-only build behind a flag)

None of that happened so far, only the mixed signal of "lead and we'll follow" and "you are too powerful, stop dominating us." in some issue tracker _after_ the code has been removed.


Why are you talking about Microsoft, Apple, and Mozilla, when Chrome has a larger market share than all of them?

> "you are too powerful, stop dominating us."

That's twisting things. The problem was that the argument of the Chrome team against JPEG XL was self-refuting. They were themselves the main cause of what they complained about.


Because Microsoft, Apple and Mozilla can still exert pressure: "Support this feature we enabled and benefit from 20% less traffic with users of our browsers" and "Use Edge/Safari/Firefox to browse the web faster (and with metered connections: cheaper)" still has an effect on Chrome's decision making.

Chrome had that code, hidden behind a flag. There wasn't any kind of activity. No questions "when will you put it in by default in Chrome?". No other Blink-based browser (Edge, Brave, Vivaldi, Opera) that could easily pick up the support by enabling that damn flag by default did so. Firefox hid JXL support even better than Chrome. No image sharing site that did the math and considered "200KB for a polyfill saves us and our users megabytes in traffic on each visit" and acted on that.

That doesn't look like anybody is interested in JXL support.

I'm bringing this up again and again because I dislike that notion of "Chrome is the market leader and we're powerless to do anything about it. Bad Google." It neither encourages the Chrome folks to do better nor anybody else to pick up the slack. It's 100% complaint, no matter what Chrome does.


That's why I hate Chrome's monopoly.


> so do we think Chrome will reverse their decision to drop support?

Nope.

Microsoft could probably push Google over the Edge. They have a lot of influence over Chrome with Edge/Windows defaults, business apps and such.


If they don't, they're going to look pretty stupid… Chrome leadership stated "we'll only support JXL if Safari do" to at least one large tech company that was unhappy with JXL being dropped by Chrome.

(and no I can’t tell you how I know this)


Microsoft also weirdly go out of their way to strip AVIF support from Edge


I noticed that.

It feels disturbingly tribal.


The only way to strip Chrome of its monopoly power is to make its decision-making stop mattering: switch all the images on your websites to JXL, let Chrome provide a bad experience, and then it's up to them whether they fix it.


Sadly, unless many webmasters do it, it'll likely feel like "your site is broken" instead of "Chrome is broken" to users.

Maybe with a banner like "You are using Chrome. You might have a degraded experience due to the lack of support for better image formats. Consider Firefox".


Provide alternative formats, making the browser autoselect. Or provide a JXL polyfill.

In either case: Have Chrome telemetry report home that "user could have 20% faster page load with JXL support".
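For what it's worth, here's a rough sketch of the detection side of such a polyfill, assuming a tiny probe file at /img/probe.jxl and a data-fallback attribute on the images (both made up for illustration, not anything shipping today):

    // Probe native JXL support by trying to decode a tiny known-good .jxl
    // served from our own origin (the path is hypothetical).
    async function supportsJxl(): Promise<boolean> {
      try {
        const blob = await fetch('/img/probe.jxl').then(r => r.blob());
        await createImageBitmap(blob); // rejects if the codec is unsupported
        return true;
      } catch {
        return false;
      }
    }

    // If JXL is unsupported, swap every <img data-fallback="..."> to its
    // fallback source (or hand the bytes to a WASM decoder -- not shown here).
    async function applyImageFallback(): Promise<void> {
      if (await supportsJxl()) return;
      document.querySelectorAll<HTMLImageElement>('img[data-fallback]').forEach(img => {
        img.src = img.dataset.fallback!; // e.g. the same image as .webp or .jpg
      });
    }

    applyImageFallback();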


JXL polyfill is definitely the way to go to force Chrome's hand.


It’s not going to force Google’s hand. Using a polyfill will slow page loads and add additional fragility.

Nothing changes for Chrome users, especially on sites using the <picture> element, where the first supported image format is used.


It will slow page loads for Chrome and other browsers not supporting it, but it will be blazing fast on Safari. You keep the file size savings and high image quality.

I’m sure YouTubers and tech sites will love to do Safari vs Chrome (and Co.) content to spread the message that Chrome is inferior.


Feels a little like a déjà vu from an older browser by a big company that had a monopoly.. IMO, the sad truth is that JXL simply won't be used, because it's not worth losing customers in exchange for a few kilobytes saved. Google has won: it has a monopoly on search, advertising, and the browser, and it de facto decides all standards.


Chrome did not settle things with its decision to not use JXL at this time.

(WASM) Polyfill and we're done.


> Feels a little like a déjà vu from an older browser by a big company that had a monopoly..

Any resemblance with previous events would be totally unintentional, of course :-)


This gives me a pang of nostalgia from back in the day when JPEG was new and you had to have an external application to see .jpg files until the browsers started adopting the standard. Then you had to decide if there were enough .jpg images on the sites you liked to warrant changing browsers!


I feel like JPEG XL's problem is branding. The name suggests it's like JPEG, but the file size will be bigger which isn't something I want.


Don't think that's the problem, but agree with what the name immediately suggests. It wouldn't have been very hard to come up with a name that implies "these files are better" instead of "these files are extra large."


XL is pretty well understood to mean extra large, no?


Yeah, that's the only thing I (and I would bet 99.999% of everyone else) have ever understood it to mean.


HF, like a Lancia.


Should've been JPEG SX.


Sounds like a budget car sold in SE Asia / Latin America in the base trim level.

"Can't decide between a Nissan March, Jpeg SX, or the Honda Jazz EX"


JPEG DX2/66?


I am not sure I believe the results from models like SSIMULACRA.

It might be that I am not encoding properly, but when I did trials with a small number of photos, with the goal of compressing pictures I took with my Sony α7ii at high quality, I came to the conclusion that WEBP was consistently better than JPEG but AVIF was not better than WEBP. I did think AVIF came out ahead at lower qualities, as you might use for a hero image on a blog.

Lately I've been thinking about publishing wide color gamut images to the web. This started out with my discovery that a (roughly) Adobe RGB monitor adds red when you ask for an sRGB green, because the sRGB green is yellower than the Adobe RGB green, and this is disastrous if you are making red-cyan stereograms.

Once I got this phenomenon under control, I got interested in publishing my flat photos in wide color gamut. I usually process in ProPhoto RGB, so the first part is straightforward. A lot of mobile devices are close to Display P3, and many TV sets and newer monitors approach Rec 2020, but I don't think they cover it that well, except for a crazy expensive monitor from Dolby.

Color space diagram here: https://en.wikipedia.org/wiki/Rec._2020#/media/File:CIE1931x...

Adobe RGB and Display P3 aren't much bigger than the sRGB space, so they still work OK with 8-bit color channels, but if you want to work in ProPhoto RGB or Rec 2020 you really need more bits. My mastering is done in 16 bits, but to publish, people usually use 10-bit or 12-bit formats, which has re-awakened my interest in AVIF and JPEG XL.

I'm not so sure it is worth it, though, because the space of colors that appear in natural scenes is only a bit bigger than sRGB

https://tftcentral.co.uk/articles/pointers_gamut

but much smaller than the space of colors that you could perceive in theory (like the green of a green laser pointer). Adobe RGB definitely covers the colors you can print with a CMYK process well, but people aren't screaming out for extreme colors, although I expect to increasingly be able to deliver them. So on one hand I am thinking of how to use those colors in a meaningful way, and on the other of the risk of screwing up my images with glitchy software.


Display P3, which is what most good but still cheap monitors support, is very noticeably bigger than sRGB, i.e. the red of Display P3 looks reasonably pure, while the red of sRGB is unacceptably washed out and yellowish.

Adobe RGB was conceived for printing better images and it is not useful on monitors because it does not correct the main defect of sRGB, which is the red.

Moreover, if I switch my Dell Display P3 monitor (U2720Q) from 30-bit color to 24-bit color, it becomes obviously worse.

So, at least in my experience, 10-bit per color component is always necessary for Display P3 in order to benefit from its improvements, and on monitors there is a very visible difference between Display P3 (or DCI P3) and sRGB.

There are a lot of red objects that you can see every day and which have a more saturated red than what can be reproduced by an sRGB monitor, e.g. clothes, flowers or even blood.

For distributing images or movies, I agree that the Rec. 2020 color space is the right choice, even if only a few people have laser projectors that can reproduce the entire Rec. 2020 color space.

The few with appropriate devices can reproduce the images as distributed, while for the others it is very simple to convert the color space. This is unlike the case when images are distributed in an obsolete color space like sRGB, or even Adobe RGB, where all those with better displays are forced to view an image of inferior quality.


Personally, I think these days you should ideally be able to just publish in Rec 2020 and let devices convert that to their native color space. I'd consider Adobe RGB a purely legacy thing that doesn't really have relevance these days. Display P3 makes sense if you are living in and targeting exclusively the Apple ecosystem, but not much otherwise. ProPhoto is good in itself, but I don't know if it really makes sense to have a separate processing (RGB) color space anymore when Rec 2020 is already so wide. Of course, if you have a working ProPhoto workflow, then I suppose it doesn't make sense to change it.


I agree with you, except that Display P3 is not exclusive to Apple.

A lot of monitors from most vendors support Display P3, even if it is usually, and slightly erroneously, called DCI P3.

Display P3 differs from the original DCI P3 specification by having the same white color and the same gamma as sRGB, which is convenient for the manufacturers because all such monitors can be switched between the sRGB mode (which is normally the default mode) and the Display P3 mode.

Nonetheless, even if today most people that have something better than junk sRGB displays have Display P3 monitors (many even without knowing this, because they have not attempted to change the default sRGB color space of their monitors), images or movies should be distributed as you say, using the Rec. 2020 color space, so that those with the best displays shall be able to see the best available quality of the image, while the others will be able to see an image with a quality as good as allowed by their displays.


I don't think it's fair to equate colors in natural scenes with the space of colors you find with diffuse reflection. There are tons of things (fireworks, light shows, the sky, your 1337 RGB LED setup, fluorescent art, etc.) people may want to take photos of that include emission, scattering, specular reflection, etc.

In practice that larger space of things you could perceive "in theory" is full of everyday phenomena, and very brilliant colors and HDR scenes (e.g. fireworks against a dark sky) tend to be something people particularly enjoy looking at/taking pictures of.


Fireworks look like a good subject, but specular reflections give me the creeps (e.g. they blow out the sensor of my camera and/or aren't well reproduced).


> I came to the conclusion that WEBP was consistently better than JPEG

This surprises me greatly if you're talking about image quality. I've always found WebP to be consistently worse than JPEG in quality.

I only use WebP for lossless images, because at least then being smaller than PNG is an advantage.


Eh... The Apple ecosystem is relatively isolated.

They adopted HEIF, and have not adopted AV1 video.


They also adopted HEIC, which is actually quite a dangerous thing for the open web to have supported by a browser, given how heavily patented the standard is.


They adopted it at the OS level years ago (when iPhones started saving it as an option), but I think it's only been in iOS 17 Safari this year (i.e. in beta) that Safari itself has started supporting it...

(At least Safari 16.5.2 on Ventura 13.4.1 won't open .heif / .heic files for me).


Yeah, that's what I was thinking of; I forgot the distinction (and the other HEIF variants).


> Eh... The Apple ecosystem is relatively isolated.

Sure, Apple shipped the first consumer computer that supported Display P3 in 2015 [1].

And while there are several other vendors including Google with devices that support Display P3, Apple’s 2 billion devices is not nothin’.

[1]: https://en.m.wikipedia.org/wiki/DCI-P3#History


I wish they would include the BPG format from Bellard, even though I don't know if that format is free of any encumbrances: https://bellard.org/bpg/

Note that JPEG XL is different from JPEG 2000 and JPEG XR.


BPG is based on HEVC which has patent/licensing baggage.


In a previous test by Cloudinary, BPG was surprisingly better than AV1 and JPEG XL in some/many categories, which leads me to believe VVC in BPG would have done even better.


I can't find what you're referring to, but I would be interested if you could share it.


I truly think JPEG XL would have done better with a better name.


It's a pretty stylish and parsimonious file extension, and even though long file extensions are well-supported (and they are), I think *.jpegxl should also be recognized. I have a feeling it will be de facto recognized eventually, if the format itself gets traction.


I like JXL better than Jpeg XL - if it is brought to the masses I can imagine "jexel" to replace "jaypeg"


The extension for the file format is .jxl and is pronounced "jixel" :)


It's hilarious to me that 25ish years after the death of DOS, we still define dot-three-letter file extensions in new standards.


Right?

Well, at least in the tiny part of the IT world I get to control, I always try to validate based on both the three-letter extension and any common or sensible expansion of it. So ".jpg" or ".jpeg", ".jxl" or ".jpegxl", etc. (And in most cases, I actually try to parse the binary itself, because you can't trust the extension much anyway.)
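For the parse-the-binary part, a rough sketch (TypeScript; the JXL signatures here are from memory, so double-check them against the spec before relying on this):

    // Sniff a few well-known image signatures instead of trusting the extension.
    function sniffImageType(bytes: Uint8Array): string | null {
      const startsWith = (sig: number[]) => sig.every((b, i) => bytes[i] === b);

      if (startsWith([0xff, 0xd8, 0xff])) return 'image/jpeg';      // JPEG/JFIF/EXIF
      if (startsWith([0x89, 0x50, 0x4e, 0x47])) return 'image/png'; // PNG
      if (startsWith([0xff, 0x0a])) return 'image/jxl';             // bare JXL codestream
      if (startsWith([0x00, 0x00, 0x00, 0x0c, 0x4a, 0x58, 0x4c, 0x20,
                      0x0d, 0x0a, 0x87, 0x0a])) return 'image/jxl'; // JXL ISO BMFF container
      return null; // unknown -- fall back to checking the extension(s)
    }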


"I thought we were free to choose our filename extensions to our liking?"

"Well, three characters is the bare minimum. If you feel that three characters is enough, then okay. But some people choose to have longer filename extensions, and we encourage that, okay? You do want to express yourself, don't you?"


Especially funny since JPEG images commonly have .jpeg as an extension!


If it ain't broke...


Ah, following in the footsteps of .png (which is pronounced "ping", according to the standard).


I actually think it referencing JPEG is smart as it's immediately recognizable even by regular users as "image" and positions the format as a successor.


JPEGX or JPEG2 would be more marketable than JPEG XL.


TBH I just don't see a difference.


Maybe a dumb question: if JPEG XL beats AVIF, and both are royalty-free, shouldn't the AV group create a new video format based on AV1 that uses JPEG XL for I-frames?

I mean, it feels like the same static image codec should be used in whatever free standard is being pushed for both video I-frames and images, since the problem is basically the same.


IIRC, JPEG XL beats AVIF on high-quality images, and AVIF is better at low quality. For typical video encoding, you don't care about perfection that much.


Specifically, the biggest users of video codecs are video streaming websites/services where bitrate, not quality, is king. Logically, their codecs of choice are optimized toward the low bitrate, "sorta acceptable, but I thought I bought a 4K TV?" quality corner.


Yeah, while in the case of images, the quality requirements are usually way higher.


Oh man. We’re still dealing with the .heic debacle, where you can’t use photos from your iPhone with many applications (like Gmail) unless you manually convert them to .jpg.

So crazy to me that Apple and Google fight over image formats like this.

I guess this is just the next round.


Can’t you just set your iPhone to use the most compatible format (forcing it to use JPEG/h.264) for its camera?


A short explanation of what JPEG XL is or does at the beginning of the article would have been nice. Saying:

"""Google PIK + Cloudinary FUIF = JPEG XL"""

before saying what it is was a little off-putting.


It's a section header in an article written as a story. It's normal for those to not be understood until you read the section. And the explanation begins with the first sentence of that section. I don't think this is a reasonable complaint.


But they didn't even explain what FUIF or PIK might be in that section or even the entire article!

Understanding that article required me to search for FUIF [1], PIK [2], and a brief explanation of what JPEG XL is trying to achieve.

I double down on my "complaint" - I'd call it constructive criticism - that the article was poorly written. It's actually quite a good story that their Free Universal Image Format (FUIF) has achieved what it has. That's a great acronym, especially for a world that thinks JPEG XL is a good acronym! Why not put it in the article?

To save anyone else time:

[1] https://github.com/cloudinary/fuif [2] https://github.com/google/pik


That's a fair criticism, though personally I feel like "it was a candidate to be the next JPEG" is enough information here, even if it's a bit barebones. There are several levels of detail you could go into about how the codec works, and this article decided to stay quite light.


I feel like Apple came to support JPEG XL too late; it will never take over like JPEG did, because Google dropped support for it in Chrome in favor of its own WebP and AVIF.


JPEG had a pretty slow start too, though. I remember getting a viewer program on some shareware CD-ROM in the early nineties that would take forever to decode a high-res image of an astronaut, and not understanding what this was useful for.


Oddly, Safari Tech Preview on Ventura advertises support for JXL but the images don't actually render. So the linked page has almost no images, just broken placeholders.


The image formats (at least newer ones) in Safari defer to OS support, so you'll need Sonoma to view JXL in Safari.


Then you'd think STP wouldn't advertise support for JXL. But it does, both to the site itself and in the release notes.


The images shown to me are .avif (Firefox and Chrome on Windows)


The header image I see is indeed an AVIF, but it depends on what your browser sends in the `Accept` header. Chrome sends image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8 but if you drop image/avif from there you get a WebP, and if you also drop image/webp from the header, you finally end up with a JPEG.

However, if you manually request that image with a custom `image/jxl` at the start of the `Accept:` header, you get a JPEG XL result. So GP is correct, but you won't see that behavior except on their PC (errr, Mac) -- unless you use Firefox and enable JPEG XL support in about:config, of course.
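If you want to reproduce that outside the browser dev tools, a quick sketch of the negotiation (TypeScript with the standard fetch API; the URL is a placeholder, not the article's actual image path):

    // Ask for the same image with different Accept headers and see what the
    // server decides to send back.
    const url = 'https://example.com/header-image'; // placeholder URL

    async function negotiatedType(accept: string): Promise<string | null> {
      const res = await fetch(url, { headers: { Accept: accept } });
      return res.headers.get('content-type');
    }

    // Advertise JXL first -- on this CDN it should come back as image/jxl.
    negotiatedType('image/jxl,image/avif,image/webp,image/*,*/*;q=0.8').then(console.log);
    // Drop jxl and avif -- it should fall back to webp, and then to jpeg.
    negotiatedType('image/webp,image/*,*/*;q=0.8').then(console.log);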


The STP request includes jxl.

image/webp,image/avif,image/jxl,image/heic,image/heic-sequence,video/*;q=0.8,image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5


Yeah, I was explaining how you could reproduce those results without being on the Safari preview.


How does mozjpeg compare to libjpeg-turbo? At what quality is JXL faster than mozjpeg/libjpeg-turbo?


> MozJPEG is a patch for libjpeg-turbo. Please send pull requests to libjpeg-turbo if the changes aren't specific to newly-added MozJPEG-only compression code.

https://github.com/mozilla/mozjpeg#mozilla-jpeg-encoder-proj...


Libjxl also includes a better JPEG encoder/decoder called jpegli. It can be used as a drop-in replacement for mozjpeg or libjpeg(-turbo). It gives ~25% more density at the high end and allows for 10+ bits (important for HDR use).


Apples and oranges. JPEG XL is a new codec entirely (though it allows lossless conversion from JPEG).


In general, the legacy hardware codec deployments are more important than what some ambitious software vendors think is "better". The primary inertia of media publishing markets is content that will deliver properly on all platforms with legacy compatibility.

Initially, a new software codec will grind the CPU and battery life like it's on a 20-year-old phone. Then it often becomes pipelined into premium GPUs for fringe users, and is finally mainstreamed by mobile publishers to save quality/bandwidth when the market is viable (i.e. above 80% of users).

If anyone thinks they can shortcut this process, or repeat a lock-down of the market with 1990s licensing models... then it will end badly for the project. There are decades of media content and free codecs keeping the distribution standards firmly anchored in compatibility mode. These popular choices become entrenched as old patents expire on "good-enough" popular formats.

Best of luck, =)


Doesn't seem too relevant for image codecs though, no? Decoding tens of still images on a CPU for a webpage that'll be used for minutes versus tens of delta images in a much more complicated video codec aren't quite comparable.

I don't think we have much deployment of WebP or AVIF hardware decoders yet the formats have widespread use and adoption.


WebP has problems (several unreported CVEs too), and is essentially another TGA disaster under the hood. Anyone smart keeps those little monsters in a box. =)

What I am seeing is that the high-water mark for WebP was 3 years ago... It is already dead as a format, but may find a niche use case for some users.

Consider built-in webcams that have hardware H.264 codecs, as the main CPU just has to stream the data with one core. Better battery life for both the sender and receiver.

Keep in mind a single web page may have hundreds of images, and MJPEG streams are still popular in machine-vision use cases. As most media/GPU hardware is now integrated into most modern browsers, the inertia of "good enough" will likely remain... and become permanent as patents expire. =)


You are thinking of video codecs.

Is hardware AVIF decoding done anywhere? The only example I can think of where this is done is HEIF on iOS devices, maybe.

Some cloud GPUs have JPEG decoding blocks for ingesting tons of images, but that's not really the same thing.


Actually, right now I'm thinking about whether cold pizza is still edible.

MJPEG streams are still popular in machine-vision use cases. And you can be fairly certain your webcam has that codec built in if the device was made in the past 15 years. Note most media/GPU hardware is now integrated into modern browsers. =)


How much better do you think a new codec needs to be to make it all the way to mainstream? 2x? 10x?


Better aesthetics and even an 18% reduction in file sizes refuse to move the sleepy elephant (H.265 is likely still at the fringe stage). Even a trivial codec licensing fee of $2.50 for Pi users was not very successful for many media formats (i.e. 10% of the retail price kills most of the market). However, H.264 support was important enough to wrap into the Pi 4 retail price, and even at 11M pcs a month there is still no stock available.

https://www.youtube.com/watch?v=ygU2bCx2Z3g


I would like to think that integrating reconfigurable logic into chips will help. But I have no idea if the economics make sense. And the ecosystem around managing that pretty much does not exist.


ASICs are very space/energy efficient compared to FPGAs.

The choice a chip maker has is to include popular legacy/free codecs like MP3, or pay some IP holder that won't even pick up a phone unless there is over $3M on the table. H.264 was easy by comparison, but hardly ideal. NVIDIA was a miracle when you consider what they likely had to endure. =)



