A sequence like that, if you happen to see all of the posts, can make a topic feel over-represented, but I'd say this one isn't really excessive, wouldn't you say? Keep in mind that on HN, reposts are fine if a topic hasn't had significant attention in the last year or so (https://news.ycombinator.com/newsfaq.html).
In this case there was a major thread back in Feb but I decided not to treat the current post as a dupe because the two articles seemed to differ significantly.
That might be true in theory, but not relevant in practice unless you're on a really old or really esoteric CPU (one could extend that to: the C standard is important in theory, but usually not relevant in practice - the only thing that matters in the real world is how specific C compilers behave).
In the real world, for this type of code (audio code, most likely for games), only a very small number of CPUs (Intel and ARM) and C compilers (Clang, GCC and MSVC) matter, and on those, int is 32 bits wide.
PS: For new code I would also suggest using the "new" C99 fixed-width integer types.
Might be (Arduino maybe?). AFAIK there's still quite a bit of C code written in the homecomputer retro scene, using specialized C compilers like sdcc or cc65, but IME this requires so much manual tweaking and platform-specific hints that it is very far away from 'standard C', and most common open source libraries wouldn't work even if the code were fully standard-compliant C, simply because of memory constraints.
It's possible to play ~8KHz+ 4bit PCM. Just need to update the wave RAM four times per frame using an hblank interrupt handler. Or write completely deterministic code that unpacks the audio and writes it out at the correct time intervals.
It's also odd that everything is big-endian, but then the official format is C structures (which will be LE on every PC and phone that would ever run this)
And in that C struct.... is `char` and also `uint8_t`. I can never remember if `char` is signed or not. It should be `int8_t` or `uint8_t`.
This reminds me of the Farbfeld image format. So simple that I'm content to just make my own little-endian ad-hoc implementation and ignore the official spec.
> I can never remember if `char` is signed or not.
char is effectively a trivalent type. Only use it when signedness is inconsequential. This is addressed in C23 with char8_t expressly to avoid the problems that happen with UTF-8 bytes as accidental signed chars.
> I can never remember if `char` is signed or not.
Technically it's neither (e.g. 'char', 'signed char' and 'unsigned char' are three different types), but in practice this also usually doesn't matter (though you're more likely to be bitten by this specific detail than by int not being 32 bits).
> which will be LE on every PC and phone that would ever run this
Why would you want to run this on a modern high-end PC or phone? There's almost no point in using a simple codec like this if you have the computing resources or dedicated hardware to use a "proper" codec.
The codec is probably a reasonable choice for a microcontroller where you do want to save memory but you barely have CPU cycles to process the audio data. Many microcontrollers are big endian.
For cross-platform 'gamey code' (e.g. game jam stuff or smaller indie games), having an audio file format which is very simple to load and decode in your own code without having to use the plethora of platform-specific audio decoder APIs, or having to integrate decoder libraries with often questionable API design choices (which is often a challenge of its own) has a lot of advantages.
Same arguments for using the Quite OK Image Format over things like PNG or JPEG for instance (although since stb_image.h this isn't such a big issue anymore).
Your post: "Why would you want to run this on a modern high end PC or phone?"
Another post in the same thread: "In the real world, for this type of code ..., only a very small number of CPUs (Intel and ARM) and C compilers (Clang, GCC and MSVC) matter ..."
These posts are strongly disagreeing with each other but neither of you directly replied to the other, so I'm not sure if you've seen it.
I'm replying to you because I think, between the two of you, you're the one that's wrong and the other poster is right. The original announcement doesn't mention microcontrollers at all and does mention computer games[1]. Further, their defense for the endianness decision was simply that they like it, rather than that they expect any target CPU to be big endian. I think GP is right--this will almost never be run on a big endian CPU.
Indeed, the start of this thread was a piece of code that won't work at all on smaller microcontrollers. I think it's likely this code has never been run on a microcontroller.
Sure, there is more diversity in microcontrollers and there are big-endian ones, but many more are, or can be configured as, little endian: ARM, RISC-V, AVR, MIPS, MSP430...
I'm adapting QOA to the Dreamcast for streaming music. I've modified it to use 32-bit little endian values instead of 64-bit since (besides avoiding having to byteswap) the CPU in the Dreamcast is badly suited for 64-bit shifts.
To convert a "slice" from the 64-bit format (which has a 4-bit value in the highest bits, followed by 20 3-bit values read from high to low), I treated it as two separate 32-bit values. The bottom two bits of each are combined to get the 4-bit value, and the rest holds two groups of 10 3-bit values organized low to high.
So this:
sample = (slice >> 57) & 0x7; //Get sample
slice <<= 3; //Advance to next sample (64-bit shift)
Became this:
sample = slice & 0x7; //Get sample
slice >>= 3; //Advance to next sample (32-bit shift)
I think the conversion to 32-bit little endian was something like an 8-10% speedup, with no other changes? (This was a few months ago, and I didn't write down what the exact difference was.) The overhead of big-endian conversion and 64-bit shifts would become even more noticeable after further optimizations. (Replacing qoa_lms_predict with inline asm to use an integer multiply-accumulate instruction that GCC can't generate got something like another 20-25% speedup. Properly written asm would be even faster.)
BE is still common in very resource-constrained applications like microcontrollers. You know, applications where having to do an endianness swap actually has a significant impact? Applications where it might actually make sense to use this kind of codec?
I was really thankful that Doom WAD file was big endian (side effect of being developed on NeXT?) when I ported doom to an nRF microcontroller.
Considering that little endian is objectively better for more types of data structures than big endian is [1], one would expect things to eventually converge on little endian (barring the entrenched big endian network protocols that hold everything back). Building for the eventuality of little endian would make more sense in a modern format.
If you're exporting assets for a game, and the swap would be significant, shouldn't it really be generating big or little endian depending on the target? Many audio engines do this, using e.g. RIFX on big endian targets and RIFF on little.
I believe that in the case of Doom, and many other games that released their code as open source, the game assets are not in the public domain and remain closed.
Looks like this fixes the issue raised in the last discussion [1] by no longer re-quantizing the decoder state at the start of every frame. I still think it's not worth the extra complexity over a simpler fixed set of second order linear predictors like BRR or ADX, though that clocks in at over 4 bits per sample. Might still be ok with 3 and more predictors?
Regarding clamping, might it be better to require the encoder to never generate QOA that would overflow? I've seen at least one ADPCM implementation (GameCube "AFC") that takes this approach. Seems like it wouldn't produce more distortion than clipping would.
Semi-relatedly, is there anyone working on hardware encoders or decoders for audio codecs that are plausibly patent-free (e.g. mp3, vorbis, flac, opus)?
AIUI, decoding for some of these formats can be a battery drain, particularly for devices with tiny batteries like earbuds, watches, and iPod nanos.
I guess because the implementations aren't intended to support a lot of different input codecs, so there's no benefit to the flexibility of AIFF/AIFC. Also avoids having to deal with an 80-bit extended-precision float just for the sample rate.
If I had to guess, the codec isn't meant to handle this well. To me this looks like a codec that wins because it is optimized for one specific use case (when you need "quite ok" audio on a modern-day platform).
It could be that it handles slower sampling rates well, but I am not sure about the bits.
Haxe comes to mind but this seems a lot simpler.