Interesting to see they're also applying Daala to still images. It's a very different set of constraints and optimizations than video.
My first instinct was to wonder how it compared to Bellard's recently featured BPG image format [1], which achieves impressive quality at low sizes. Turns out you can check this by selecting "HEVC" in Xiph's image comparison, since that's the core method employed by BPG.

[1] http://bellard.org/bpg/
> Interesting to see they're also applying Daala to still images. It's a very different set of constraints and optimizations than video.
Video codecs have to encode keyframes, which is roughly the same problem as still images. Many still image codecs are just the intra prediction modes of a video codec, repackaged. So it's really a subset[0] of the constraints and optimizations, not a totally different set.
It's also the case that we've recently spent a lot of time working on intra prediction issues (and related topics). One reason for that is that the lapping filters make intra prediction very difficult compared to traditional codec designs.
Demo2[1] explains much of this difficulty in the context of a solution that we tried and had to abandon. The paint demo[2] explains another solution we tried, which also failed, but which is now the basis of a post-filtering technique.
[0] Modulo things like color format support, metadata, etc.
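To make the lapping problem concrete, here's a toy 1-D sketch (my own simplification, not Daala's actual filters): a pre-filter mixes pixels across every block boundary before each block is transformed, so a block's coefficients already depend on pixels that belong to its neighbours, and those neighbours only reach their final values after the matching post-filter runs at decode time. Classic spatial intra prediction assumes fully reconstructed neighbouring pixels, which is exactly what lapping takes away.

    // Toy illustration of a lapped-transform pre-filter (TypeScript).
    // Not Daala's real filters -- just the structural point that block
    // boundaries get mixed before the per-block transform is applied.
    const BLOCK = 4;

    // An invertible rotation applied across a block boundary.
    function lapPair(a: number, b: number): [number, number] {
      const s = Math.SQRT1_2;
      return [s * (a + b), s * (a - b)];
    }

    function preFilter(pixels: number[]): number[] {
      const out = pixels.slice();
      for (let edge = BLOCK; edge < out.length; edge += BLOCK) {
        // Mix the last pixel of one block with the first pixel of the next.
        [out[edge - 1], out[edge]] = lapPair(out[edge - 1], out[edge]);
      }
      return out;
    }

    // Two flat blocks: [10,10,10,10] and [80,80,80,80].
    console.log(preFilter([10, 10, 10, 10, 80, 80, 80, 80]));
    // The samples nearest the boundary now carry information from both
    // blocks, so each block's transform input -- and therefore its
    // coefficients -- depends on pixels a traditional intra predictor
    // would want to read back in their final, post-filtered form.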
Back when BPG was announced, I mentioned here that the real headline was the idea of distributing a new image codec without requiring a browser change, and without requiring the format to be solidified.
So, it should be possible to do the same with Daala via Emscripten (that's basically how BPG works - it's an H.265 still-frame decoder compiled to JavaScript), and distribute Daala still images even though it's a work in progress. The best bit is that when Daala improves, you just re-generate your images and distribute a new JavaScript decoder.
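A rough sketch of what that shim could look like, with made-up names (`loadDaalaModule`, `decodeToRGBA`) standing in for whatever an Emscripten build would actually export:

    // Hypothetical in-page decoder shim: find images marked with a data
    // attribute, fetch the Daala-coded bytes, decode them with an
    // Emscripten-built module, and paint the result onto a <canvas> that
    // replaces the <img>. The module API below is a placeholder, not real.
    declare function loadDaalaModule(): Promise<{
      decodeToRGBA(bytes: Uint8Array): {
        width: number;
        height: number;
        data: Uint8ClampedArray;
      };
    }>;

    async function upgradeImages(): Promise<void> {
      const daala = await loadDaalaModule();
      const imgs = Array.from(
        document.querySelectorAll<HTMLImageElement>("img[data-daala-src]")
      );
      for (const img of imgs) {
        const resp = await fetch(img.dataset.daalaSrc!);
        const bytes = new Uint8Array(await resp.arrayBuffer());
        const { width, height, data } = daala.decodeToRGBA(bytes);

        const canvas = document.createElement("canvas");
        canvas.width = width;
        canvas.height = height;
        canvas.getContext("2d")!.putImageData(new ImageData(data, width, height), 0, 0);
        img.replaceWith(canvas);
      }
    }

    document.addEventListener("DOMContentLoaded", () => { void upgradeImages(); });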
Anyone up for this? Even if it's not as good as BPG (it's very slightly worse it seems), at least it's royalty-free.
And Daala is not fixed yet, so there's a good chance of further improvements a-coming. And it's likely to be relatively clean of patents and may enable a cross-everything royalty free implementation. But time will tell how MPEG LA will feel about that.
Edit: I'm also wondering, just in case any experts are around... with Daala's design seeming at first glance inherently more suitable for that kind of thing, lifting filters, wavelets and all, does that mean that we're more likely to see some hierarchical modulation/scalable video coding/bitrate peeling sooner rather than later?
Daala uses wavelets in a few areas, but is largely a lapped DCT based codec. It's not especially suited for producing scalable bitstreams, at least not much more than any other codec. There's little demand for it right now, so we don't plan to include it as a feature.
However, if you have some interesting application of it, feel free to let us know!
This isn't my direct area of expertise, so I apologize in advance if I am incorrect about anything.
Video streaming suffers from unstable connections and varied throughputs. Some home connections can support high-fidelity 1080p while others can just manage visibly degraded 480p.
Right now, either you have to store each resolution separately (which can be costly, esp. at 3x redundancy and esp. if your average video is only going to be watched a few hundred times e.g. YouTube), or you have to transcode on-the-fly, which is expensive for any sort of good compression rate.
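For a rough sense of the storage side of that trade-off, here's a back-of-the-envelope sketch (the bitrate ladder and the 3x replication figure are made up for illustration):

    // Storage for keeping a full bitrate ladder of a 10-minute video,
    // versus only the top rendition. Numbers are illustrative.
    const ladderKbps = [350, 750, 1500, 3000, 6000]; // roughly 240p .. 1080p
    const durationSec = 600;
    const replication = 3;

    const bytes = (kbps: number) => (kbps * 1000 / 8) * durationSec;
    const ladderTotal = ladderKbps.reduce((sum, r) => sum + bytes(r), 0) * replication;
    const topOnly = bytes(ladderKbps[ladderKbps.length - 1]) * replication;

    console.log(`full ladder: ${(ladderTotal / 1e9).toFixed(2)} GB`);        // ~2.6 GB
    console.log(`top rendition only: ${(topOnly / 1e9).toFixed(2)} GB`);     // ~1.35 GB
    // Roughly double the storage per video -- cheap for a hit, but a real
    // cost when the average upload is only ever watched a handful of times.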
As we move forward, P2P streaming is likely to be a lot more dominant. There are a few services already out there and a few more in development that have a lot of promise. P2P settings are even more sensitive to transcoding costs, as you'd rather have peers uploading in high parallelism, meaning each one would need to be transcoding.
I do believe that scalable bitstreams will be important, as they solve a lot of the problems.
Has bitrate peeling been used in production by anybody? I got the impression that the bitrate peeling in Vorbis is never used, and that streaming providers will just encode multiple rates for a video and switch between them. It does sound like a cool concept, though.
Vidyo uses an implementation of H.264 SVC, I believe. But in general it's not been widely adopted. SVC generally has worse compression than normal H.264, so for streaming it makes more sense just to store multiple encodes at different qualities.
Yes, SVC's sweet spot is more geared towards real-time than pre-cacheable streams.
SVC stands for "scalable video coding", and generally operates along at least one of two dimensions: temporal or spatial.
Temporal SVC is pretty much what we have with H.264 AVC; it provides a means to support multiple receiver frame rates from a single sender frame rate. So, if Alice transmits at 60fps, a temporally-smart coding will allow a switch between Alice and Bob to drop some frames so Bob only gets 30fps, say, while Charlie gets the full 60fps. This is useful if Alice has a wonderful upstream connection and Charlie has a great downstream connection, but Bob is connecting via a crappy mobile connection.
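The switching step for the temporal case is about as simple as it sounds; a sketch with an illustrative two-layer layout (not any particular bitstream's syntax):

    // 60 fps split into two temporal layers: even frames are the 30 fps base
    // layer, odd frames the enhancement layer. This only works because the
    // encoder never lets a base-layer frame reference an enhancement frame.
    interface Frame {
      index: number;
      temporalLayer: 0 | 1;
      payload: Uint8Array;
    }

    // The media switch just filters by layer id -- no decode or re-encode.
    function framesFor(maxLayer: number, stream: Frame[]): Frame[] {
      return stream.filter(f => f.temporalLayer <= maxLayer);
    }

    // framesFor(1, stream): every frame (Charlie, 60 fps)
    // framesFor(0, stream): every other frame (Bob, 30 fps)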
Spatial SVC is the same thing, but cast in terms of resolution. Imagine that Alice is transmitting 1080p, and Bob is still on that bad mobile network. An SVC coding would mean that Alice could transmit a single high-res stream, and a smart media switch could send Bob a "down-sampled" 360p stream without having to decode and re-encode, while still sending the full 1080p to Charlie.
So. Clearly, this is awesome when dealing with real-time coding (e.g., video calling). But when streaming content from Netflix or YouTube or whatever, it's trivial to provide different streams at different resolutions to different users, based on prevailing network conditions for each user. The up-front cost of encoding and storing a dozen or so different versions of the same content is minimal, and the work to identify the right stream for each user is about the same in either case.
Even for real-time use cases, the range of bitrates where (spatial) SVC provides an advantage over simulcast is relatively small, and it's not a big advantage. Plus, it's a patent minefield. I've also seen people argue that it has a computational complexity advantage over coding multiple streams, but a properly written simulcast encoder can share a bunch of the work between encodes as well (libvpx has an implementation of this for VP8, for example, for use in WebRTC).
Agreed; the way the numbers work out today, the advantages are minimal. The story looks like it might be a bit different with 4k on the horizon, though.
> it's likely to be relatively clean of patents and may enable a cross-everything royalty free implementation. But time will tell how MPEG LA will feel about that.
So that's the thing. I'm all for royalty-free encoders, but the only way to know if an algorithm is really royalty-free is for it to be older than the current maximum patent age. I believe that Daala is being designed to be royalty-free, but that's not measurable in advance.
There are other good reasons to support Daala, and explicitly building something to avoid royalty issues is certainly laudable (esp. compared to explicitly doing the opposite), but you can't just go and guarantee that something new is royalty-free, and it's frustrating to see the community make exactly those sorts of statements.
The U.S. patent system makes it impossible to guarantee anything with respect to patent infringement. But how many royalty-free codecs does Xiph need to develop and deploy at scale [0-4], without anyone filing lawsuits, before people believe that success here is possible?
P.S., in case it's not clear, I've volunteered for the Xiph.Org Foundation for over a decade.
I don't know if there has been any concerted effort to find possible patents; however, Daala is touted as patent-free because its implementation uses different technologies from H.264/HEVC, so most or all of the MPEG LA patents shouldn't apply.
Charles Bloom has been discussing PVQ and Daala over on his blog. I am not an expert nor have I fully understood these articles, but they definitely look like interesting related reading.
I prefer the JPEG to the Daala in that one sample image; JPEG does have more artifacts, but Daala seems to preserve less detail.
For low-to-medium bitrate video that might be the correct tradeoff though.
[edit] H264 and H265 look like successive incremental improvements over JPEG in the direction I personally prefer; they reduce artifacts without losing detail.
I agree; JPEG has more artifacts which is most noticeable in low-frequency areas, but preserves more detail in high-frequency areas.
All of the non-JPEG codecs seem to have greatly reduced artifacts in the low-frequency areas, but for some of them (VP8 in particular) it came at a significant loss of detail.
Apart from JPEG's blockiness and VP8's complete loss of detail, it looks like the main difference between codecs is which parts of the image have detail preserved. Some codecs preserve subtle sky texture at the expense of the trees, others do the opposite.
I did not see detail in the JPEG that was not present in the H.265; a few regions in the grass do look slightly better in the JPEG, since they have more noise and less blur, and that's more pleasant to the eye in high-frequency areas.
Also note that there is more detail in the H.265 than the JPEG where the trees meet the sky.
Just to add as feedback to that: I much prefer the JPEG rendering of two things in particular in that sample image. First, the brownish stone in the lower-right corner, as well as the car tracks. Regarding the car tracks, the rough look in the JPEG rendering looks more natural to me. The stone is only slightly better in the JPEG, though. For the rest of it, Daala wins! I especially like the thin, tall, leafless tree on the left.
FWIW, I know that there is some research showing that students (over time) express a preference for MP3 compression artifacts. Something similar could be at play here.
I personally think the whole slider comparison method is pretty dumb: it makes direct comparison of certain image areas much harder, since you have to swivel your mouse around (not to mention how annoying this is on mobile). A simpler approach is to swap the whole image on mouse hover, which on mobile works nicely as a tap. That's how basically all screenshot comparison sites out there operate, and it really does work best for comparing two images like this.
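Something along these lines is all the hover/tap version needs (a sketch, not the comparison page's actual code):

    // Whole-image comparison: show the codec output by default, the original
    // on hover, and toggle on tap/click for touch devices.
    function makeComparator(el: HTMLImageElement, originalUrl: string, codecUrl: string): void {
      let showingOriginal = false;
      const show = (orig: boolean) => {
        showingOriginal = orig;
        el.src = orig ? originalUrl : codecUrl;
      };
      show(false);
      el.addEventListener("mouseenter", () => show(true));
      el.addEventListener("mouseleave", () => show(false));
      el.addEventListener("click", () => show(!showingOriginal)); // tap to toggle
    }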
Is it actually an animation? I assumed it was lag caused by redrawing each time the mouse moves and not ignoring queued mouse movements. Perhaps I assumed incorrectly.
I thought the BPG page earlier had a much more instant response, but it appears I remembered incorrectly because it behaves almost the same. My bad. http://xooyoozoo.github.io/yolo-octo-bugfixes/
While in this case VP9/x265 may indeed be better (they seem to preserve higher-frequency details than Daala), beware of "looks nicer" when judging codecs in general. Codecs are not in the business of making nicely distorted images (they're not Photoshop/Instagram). They're supposed to keep images as close to the original as possible.
For example, smooth images tend to be judged as "nice", but if the original had noise, then smoothness is a distortion: a lack of detail with deblocking blur covering it up. JPEG 2000 fell into the smoothness trap: the smoothing gave it better PSNR and nice-looking low-bitrate examples, but it ultimately failed, because you don't always want everything looking like plastic.
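For reference, PSNR is just log-scaled mean squared error, which is exactly why it rewards smoothing away noisy texture (quick sketch):

    // PSNR between two 8-bit images, passed as flat arrays of samples.
    // Anything that pulls pixels toward the local mean drives MSE down, so
    // replacing grain with smooth "plastic" can score well while looking wrong.
    function psnr(reference: number[], distorted: number[]): number {
      const mse =
        reference.reduce((acc, r, i) => acc + (r - distorted[i]) ** 2, 0) / reference.length;
      return mse === 0 ? Infinity : 10 * Math.log10((255 * 255) / mse);
    }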
It would be interesting to build a codec with human preferences in mind; you might end up with images that most people think look better than the original at a fraction of the size. How you make those choices is very difficult, though.
Some video encoders already incorporate perceptual optimizations (eg, x264's psy-rd and psy-trellis) that "look" better but lead to objectively worse results with traditional image quality metrics.
Audio codecs, however, have been using psychoacoustic models for decades. Frequencies outside the human hearing range are clipped, masked noises are discarded, voice codecs emphasize the range of human speech, etc.