So, now that the "but MP3 is patent-encumbered" consideration is finally dead-ish, what's the new balance on the debate between shipping {MP3, MP4-AAC, MKV-Vorbis, MKV-Opus} as music/SFX assets in software products?
Also, with the two sub-considerations:
• on mobile, where hardware decoding exists for only some codecs;
• on the web, where the audio APIs only accept some of these formats/codecs.
MP3 is the worst performer of the lot, but it's suddenly more attractive because it's free and AAC is decidedly not. If you don't need compatibility (as with BGM/SFX), it's still a worse choice than Vorbis or Opus because it compresses worse. For SFX and looping music in particular, MP3 is also a mess because there's no standard way to play back samples with zero leading delay and zero extra samples at the end.
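To make the looping problem concrete: MP3 output comes in whole frames (1152 samples per channel for MPEG-1 Layer III), and the encoder adds priming samples (delay) at the start, so the decoder emits extra samples at both ends. LAME can record the delay and padding in its informational header, but that isn't part of the MPEG standard and players aren't obliged to honor it. Below is a minimal sketch of the trimming a gapless-aware player has to do. It assumes the delay and padding have already been read from such a header; the helper names and the 1105-sample delay are illustrative, not taken from any particular library.

```python
from typing import List, Sequence

# Samples per channel in one MPEG-1 Layer III frame.
FRAME_SAMPLES = 1152


def padding_for(original_samples: int, delay: int) -> int:
    """Padding needed so that delay + original audio fills whole frames."""
    total = delay + original_samples
    frames = -(-total // FRAME_SAMPLES)  # ceiling division
    return frames * FRAME_SAMPLES - total


def trim_gapless(decoded: Sequence[float], delay: int, padding: int) -> List[float]:
    """Drop leading priming samples and trailing padding from decoded PCM.

    Mono for simplicity. `delay` and `padding` are assumed to come from the
    file's metadata; how they are parsed is out of scope (hypothetical).
    """
    if delay < 0 or padding < 0:
        raise ValueError("delay and padding must be non-negative")
    if delay + padding > len(decoded):
        raise ValueError("delay + padding exceeds the decoded length")
    return list(decoded[delay:len(decoded) - padding])


# Hypothetical example: one second of 44.1 kHz audio, illustrative delay.
example_delay = 1105
original = [float(i) for i in range(44100)]
example_padding = padding_for(len(original), example_delay)
decoded = [0.0] * example_delay + original + [0.0] * example_padding
assert trim_gapless(decoded, example_delay, example_padding) == original
```

A player that skips this step loops the clip with a short gap of silence at the seam, which is exactly the artifact that makes MP3 awkward for looping music.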
MP3 is also really accessible. Anyone who's ever worked with audio in an amateur capacity knows how to create an MP3; I don't think many people would know how to properly encode an Opus file. I did an informal survey on Quora a while back, and I was surprised to see that almost every audio industry professional had no idea what Opus even was.
MP3 is, for all intents and purposes, the animated GIF of audio.
On the other hand, due to the encoder patent, most FOSS audio-editing tools ship without MP3-encoding support included.
My first instinct, on being asked to create an MP3 from a WAV + some effects, is to download Audacity, diddle around, and then export to MP3. But (and I always forget this the first time I do it) you have to download an additional LAME plugin for Audacity before MP3 export will work.
Opus, by contrast, is just right there in the Export menu; the codec is built into the app.
(On Macs, AAC is also "right there" in Audacity's export menu, thanks to OS-native QuickTime encoding support; on other OSes, AAC export requires the same kind of fiddling as MP3 export.)
And, mind you, this isn't nearly as much of a concern on Linux, where usually the app package will have optional dependencies on the MP3/AAC encoder libraries, and so—if you have your "non-free" package repositories enabled—they'll get installed along with the software.
> I was surprised to see that almost every audio industry professional had no idea what Opus even was.
Assuming you mean people who deal with audio production, I'm not surprised. Opus is primarily for real-time audio streaming, not for streaming music or for general audio compression or storage.
> Opus is unmatched for interactive speech and music transmission over the Internet, but is also intended for storage and streaming applications.
> Opus can handle a wide range of audio applications, including Voice over IP, videoconferencing, in-game chat, and even remote live music performances. It can scale from low bitrate narrowband speech to very high quality stereo music
Sure, their marketing copy says it's intended for storage, but that's not how it's actually used in the wild. In reality, it's primarily used for interactive/real-time streaming and not used for storage. I think it just boils down to the idea that the storage formats we're using are more than sufficient, and Opus doesn't really bring much new to the table there. It does, however, perform remarkably better than most/all of the alternatives when dealing with (e.g.) VoIP, so it's no surprise it's been adopted there.
True enough: the main type of engineer you'd expect to speak the words "Opus codec" is a telecom/SIP engineer, not an audio engineer. It always feels like tech with origins in telecom never gets much notice outside of that realm, even when it's much more widely applicable. (Look at the set of "core" Erlang libraries to see what I mean. Think most programmers have heard of even half of the relevant standards, even as they implement NIH competitors to those same standards?)
I vaguely recall that Vorbis requires a fair bit more processing to decode, enough to be a potential issue 15 years ago, but on modern computers the difference is barely detectable and doesn't noticeably affect performance.
My recollection matches yours. I remember loading a custom firmware on my iPod, around 2005. Vorbis support was a big new feature; they'd just figured out how to get decode working fast enough on its 80MHz ARM7 CPU.
Probably Rockbox? The iPod's click-wheel interface and software were amazing for the time, but other MP3 players generally had kind of icky interfaces. Rockbox fixed that :)
The "fast enough Vorbis" was likely the addition of Tremor, the first fixed-point Vorbis implementation. There's a nice, if somewhat dated, chart of decode speeds here: https://www.rockbox.org/wiki/CodecPerformanceComparison
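If a chart expresses decode speed as a percentage of real time (audio duration divided by decode time), that figure converts directly into the clock speed a codec needs just to keep up. The sketch below does that conversion. It assumes decode time scales linearly with CPU clock, which ignores memory and cache effects, and the function names and benchmark numbers are hypothetical.

```python
def realtime_percent(audio_seconds: float, decode_seconds: float) -> float:
    """Decode speed as a percentage of real time (100.0 means it just keeps up)."""
    if decode_seconds <= 0:
        raise ValueError("decode_seconds must be positive")
    return 100.0 * audio_seconds / decode_seconds


def mhz_needed(cpu_mhz: float, percent_realtime: float) -> float:
    """Clock speed needed to decode at exactly real time.

    Assumes decode time scales linearly with clock speed, so treat the
    result as a rough estimate only.
    """
    if percent_realtime <= 0:
        raise ValueError("percent_realtime must be positive")
    return cpu_mhz * 100.0 / percent_realtime


# Hypothetical benchmark: 60 s of audio decoded in 15 s on an 80 MHz CPU.
speed = realtime_percent(60.0, 15.0)
print(speed, mhz_needed(80.0, speed))  # 400.0 20.0
```

On a CPU like the iPod's, a codec running at only slightly over 100% of real time leaves almost nothing for the UI, disk access, or idling to save battery. That is why a faster fixed-point decoder mattered so much.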
On newer hardware + software, like the Clip Zip, the gap is much smaller and the performance way higher than needed in any case (except for HE-AAC, the clear loser).
MP3 almost always has hardware decoding, which is very power-efficient. Lacking it isn't as bad nowadays, since software audio decoding isn't that intensive compared with keeping your Wi-Fi on.
Wi-Fi might draw 100-200 milliwatts, but a laptop's speakers are likely to draw 500-3000 milliwatts. Headphones are likely to be about the same as the Wi-Fi, but can vary wildly.
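To put rough numbers on that comparison, take the midpoints of the ranges above and add a software-decode cost. The 20 mW decode figure is a made-up placeholder for illustration, not a measurement, so substitute a real number for your device.

```python
from typing import Dict


def midpoint(low_mw: float, high_mw: float) -> float:
    """Middle of a power range, in milliwatts."""
    return (low_mw + high_mw) / 2.0


def share(component: str, draws_mw: Dict[str, float]) -> float:
    """Fraction of the total power draw attributable to one component."""
    total = sum(draws_mw.values())
    return draws_mw[component] / total


draws = {
    "wifi": midpoint(100, 200),        # midpoint of the Wi-Fi range above
    "speakers": midpoint(500, 3000),   # midpoint of the speaker range above
    "software_decode": 20.0,           # HYPOTHETICAL placeholder, not measured
}
print(f"{share('software_decode', draws):.1%}")  # 1.0%
```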
Bluetooth speakers/headphones change that dynamic slightly: the amp and the battery that powers it end up on the receiving end, so you only have to figure out what the Bluetooth stream itself costs.
I thought the "balance" was, and continues to be, that Vorbis delivers quality at least as good as AAC at equivalent bitrates (if not better) and better than MP3 at equivalent bitrates, and that being FOSS tips the scale well in its favor. I haven't seen, and still don't see, any reason to use MP3/MP4 in the manner you describe.
(And if Vorbis isn't on par with MP4-AAC, my impression is that it's certainly good enough at decent bitrates that it wouldn't matter.)
I'm curious: why do you think shipping MP3 audio in software/games was ever seriously at issue due to patents? Decoding MP3s has always been unproblematic, and encoding MP3s was only ever problematic for a few years about two decades ago.
Some modern games use uncompressed audio because it frees up a few CPU cycles, and the trade-off between disk space and CPU time makes that choice desirable if you want to push your game's minimum spec as low as possible (so that more people can play it). Even then, it's only necessary if your main game engine is already hitting the CPU pretty hard.
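The disk side of that trade-off is easy to quantify. Here is a sketch assuming CD-quality 16-bit stereo PCM and, for comparison, a hypothetical 160 kbit/s lossy encode of a 3-minute track.

```python
def pcm_bytes_per_second(sample_rate: int, channels: int, bits_per_sample: int) -> int:
    """Data rate of uncompressed linear PCM."""
    return sample_rate * channels * bits_per_sample // 8


def compressed_bytes_per_second(kbit_per_second: float) -> float:
    """Data rate of a constant-bitrate stream (1 kbit = 1000 bits)."""
    return kbit_per_second * 1000 / 8


def mib(num_bytes: float) -> float:
    """Bytes to mebibytes."""
    return num_bytes / (1024 * 1024)


seconds = 180                               # a 3-minute music track
pcm = pcm_bytes_per_second(44100, 2, 16)    # CD-quality stereo: 176400 B/s
lossy = compressed_bytes_per_second(160)    # 160 kbit/s is an assumed bitrate
print(f"PCM:   {mib(pcm * seconds):.1f} MiB")
print(f"Lossy: {mib(lossy * seconds):.1f} MiB")
print(f"Ratio: {pcm / lossy:.2f}x")
```

Roughly a ninefold size increase per track is the price paid for skipping the decode step. That is cheap for a handful of short SFX and expensive for a full soundtrack.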