24-Bit vs. 16-Bit Audio Test – Part II: Results and Conclusions (archimago.blogspot.com)
209 points by signa11 on Dec 10, 2014 | hide | past | favorite | 132 comments



More people really need to watch this video. http://xiph.org/video/vid2.shtml


This is such an awesome video. I really hope there's a follow-up at some point, because as someone with a decent amount of experience with computer graphics, but only a fairly general understanding of the sampling theorem, the stuff that would naturally come next is what most interests me, namely:

- What happens when you start combining waves together to create more complex signals? This is pretty important, since any real instrument produces hundreds of harmonics with complex attack and decay properties, so decomposition of the fundamental frequencies won't be as accurate as with toy examples.

- Following on from that: the effects of aliasing. It definitely exists and I have a very good understanding of aliasing with respect to computer graphics, but what effect does aliasing really have on an audio signal? In CG it's something that's talked about all the time and there are tons of papers about it, but it seems (from the outside at least) that audio guys only ever talk about aliasing in very hand-wavy terms.

- Although we can't hear sounds (much) above 20KHz, we can detect artefacts such as "beats" produced when harmonics are slightly mismatched. Is it possible to show that such information either isn't lost, or that what is lost is either below the noise floor or outside the audible frequency range? This particular one is a fairly common complaint made by audiophiles about 44KHz/Nyquist, so it would be nice to see it addressed head-on.

FWIW, I'm a bit of an audiophile myself, but not one of those people who thinks they can hear a difference between 192kbps MP3 and uncompressed, let alone 16/44 vs 24/192. Generally the noise floor in the original recording is too high to tell the difference, even if you could hear a difference in artificially constructed pathological cases. But I am interested in really understanding what information is lost, what isn't, and why that may or may not make any difference. In other words, what are those pathological cases? This video is a really good start and clears up a lot, but in some ways it only scratches the surface.


Not entirely related to your question on audio aliasing, but many moons ago I was an EE on a team using electro-optics to design a much better Analog-to-Digital converter than you could buy commercially. The signals we sampled were in the hundreds of MHz, but the concepts are exactly the same as at audio frequencies.

What we effectively did was use a pulsed fiber laser, which produces _very_ short pulse widths of light. Connecting this to an electro-optical modulator, we could effectively capture a very fast snapshot of the electrical signal. Use some analog circuitry to hold that level constant for the slower analog-to-digital conversion, and we had an AD converter that, while sampling relatively slowly, sees only a tiny portion of the signal for each actual sample.

What was amazing about this was that the laser pulses were so short, and the 'timing jitter' of the pulses so low, that we could cleanly see the expected aliased sampling frequency when sampling a sine wave signal at large multiples of our sample frequency (IIRC up to the 40s). E.g., inputting a sine wave around 10 GHz, we'd see only a single sine wave in the sampled signal at its aliased frequency, with minimal artifacts.

So if you are wondering how aliasing affects audio sampling, draw a sine wave and see what happens if you sample it at a much lower frequency. You'll get a lower frequency sine wave. Without filtering the input signal, it's impossible to tell if this is a real low-frequency sine wave or something above Nyquist that has been downsampled.

N.B. If you're wondering why our AD converter was better than a commercial one: we also added optical demultiplexing to get an extra order of magnitude improvement in sample rate. This is possible with our electro-optical setup thanks to the laser's low jitter and the ability to hold the signal steady; a purely electronic version will have difficulty with the timing and won't see much actual improvement in SNR.
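The "draw a sine wave and subsample it" thought experiment above can be checked numerically. A quick NumPy sketch (the sample rate and tone frequencies are arbitrary illustrative values): a 900 Hz tone sampled at 1 kHz lands on exactly the same sample values as a genuine 100 Hz tone.

```python
import numpy as np

fs = 1000          # sample rate in Hz (illustrative)
f_high = 900       # input tone, well above Nyquist (fs/2 = 500 Hz)
t = np.arange(0, 1, 1 / fs)

aliased = np.sin(2 * np.pi * f_high * t)          # 900 Hz sampled at 1 kHz
genuine = np.sin(2 * np.pi * (fs - f_high) * t)   # a real 100 Hz tone

# sin(2*pi*900*n/1000) = sin(2*pi*n - 2*pi*100*n/1000) = -sin(2*pi*100*n/1000),
# so the sampled points are numerically indistinguishable (up to sign):
assert np.allclose(aliased, -genuine, atol=1e-9)
```

Nothing in the sampled data distinguishes the two; only an analog filter in front of the sampler can.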


I can point you to a follow-up on one of those: An example on Wikipedia [1] lets you hear the effect of aliasing on a sawtooth wave.

When audio is aliased, the high harmonics will wrap around the Nyquist frequency and come back as audible, inharmonic tones. Aliased audio really sounds quite bad, in a way that would make any listener say "oh god make that horrible noise stop".

This is probably why it's easier to recognize the effects of aliasing in video than in audio -- you're not hearing much aliased audio. Aliased video is fairly common and seen as acceptable in some contexts, while anyone who produces aliased audio is going to fix their mistake before inflicting it on listeners.

[1] https://en.wikipedia.org/wiki/Aliasing#Online_audio_example
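The wrap-around rule can be made concrete with a small folding helper; the sample rate and harmonic numbers below are illustrative, not taken from the Wikipedia example:

```python
def alias_freq(f, fs):
    """Frequency (Hz) at which a tone at f appears after sampling at fs:
    reduce modulo fs, then fold anything above Nyquist back down."""
    f = f % fs
    return fs - f if f > fs / 2 else f

fs, f0 = 44100, 1000
# Harmonics of a 1 kHz sawtooth; those above Nyquist fold back inharmonically:
folded = [alias_freq(k * f0, fs) for k in (1, 2, 22, 23, 44, 45)]
# 22 kHz survives intact, but 23 kHz folds to 21.1 kHz and 44 kHz
# lands at 100 Hz -- tones that are not harmonics of 1 kHz's overtone
# series positions, which is why aliased audio sounds so wrong.
```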


Presumably though the effects of aliasing will also produce distortions at a smaller scale, which might not be as obvious. That would result in a loss of information, but would not be quite so easy to spot and filter out. The pathological case is really just a good illustration of what happens and why, whereas what's trickier is understanding the true impact it has on real-world data.

For instance, you won't notice Moiré patterns in photos very often, but it's easy to construct a test image that demonstrates the problem. The underlying issue of sampling error hasn't gone away in the photo: it just gets lost in the background noise or among smooth-shaded surfaces without hard edges, which makes it harder to see. But maybe now and then you'll have a sharp edge in a real-world image where the effect is noticeable. Depending on how close it is to the pathological case, it might not jump out and make itself immediately obvious.

I mention Moiré patterns because these are (I believe) precisely the same effect as the given audio aliasing example. Moiré patterns aren't a form of sampling error per se, but they can be caused by uniform sampling (which is how digital capture typically works for both audio and images).


Moiré patterns are the same thing, but if you put your video signal through an analog low-pass filter before sampling, then you wouldn't get Moiré patterns. Audio is typically low-passed before sampling for this reason.


"Presumably though the effects of aliasing will also also produce distortions at a smaller scale, which might not be as obvious"

Why? What do you mean? What scale? What do you mean by distortion? The only only only thing sampling does is move frequencies beyond the Nyquist frequency into the sampled signal, by taking the signal at f and sending it to f mod f_n, where f_n is the Nyquist frequency.


Aliasing on a triangle wave can sound quite beautiful:

http://kmkeen.com/awk-music/


There is an Episode 1. http://xiph.org/video/

See also http://xiph.org/~xiphmont/demo/neil-young.html which was posted here a week ago and received 441 comments. https://news.ycombinator.com/item?id=8689231


With respect to aliasing, an analog-to-digital converter contains an analog low-pass filter which reduces the amplitude of the signal content above the Nyquist frequency prior to digitization. This effectively eliminates aliasing in the digitized signal.

In terms of more complex signals, or frequency "beats", the same principles apply. If two high frequency (above-20kHz) signals interact in such a way to create an audible (below-20kHz) beating, the beating will be accurately recorded by the ADC.


Bit depth and sample rate are completely separable. If you are recording something important, it may well be worth the higher SR just in case some technology comes along that can exploit it. But the 20-20kHz bandlimit is a pretty well-entrenched thing. People are working in it. It's tough sledding.

So the rule of thumb is that each bit lowers the noise floor by about 6dB. This gives us a rough digital noise floor of -96dB for 16 bit or -144dB for 24 bit. But you can add dither to randomize the digital noise and turn it into "white noise" or the like.
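That rule of thumb is just 20·log10(2) ≈ 6.02 dB per bit; a one-line sketch:

```python
import math

def quantization_noise_floor_db(bits):
    # Each bit doubles the number of levels, i.e. buys 20*log10(2) ~ 6.02 dB.
    # (The exact SNR for a full-scale sine is 6.02*bits + 1.76 dB; the plain
    # -6 dB/bit floor is the figure quoted above.)
    return -20 * math.log10(2 ** bits)

print(round(quantization_noise_floor_db(16), 1))  # about -96.3
print(round(quantization_noise_floor_db(24), 1))  # about -144.5
```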

Aliasing is taken care of by the "reconstruction filter" (named for the "reconstruction" part of the Shannon-Nyquist theorem). This has been a solved problem for some time now. There's a super-high-speed digital filter in the DAC phase that can be measured to eliminate all aliasing products. Implementation quality varies, but even cheap stuff now does a good job of it.

Yes; it's unusual to have a room for recording that gets to -70dB, much less -96dB. -40 to -50 is much more typical for, say, acoustic guitar. Noise doesn't add like you'd think; the dominant noise wins, basically.

Combined waveforms make for another waveform. They add brilliantly and this isn't lossy at all. There are pathological cases, but they're not likely found in nature.

Transducers - headphones, speakers, microphones - all impart much more error than the electronics in the middle.


> What happens when you start combining waves together to create more complex signals

Aliasing is a linear effect. If L is an operator that aliases at some frequency, L(Aa(t) + Bb(t)) = AL(a(t)) + BL(b(t)). Aliasing is in fact multiplicative! Any time you wish to understand aliasing, you really want to write the signal multiplied by a dirac comb (a train of equally spaced, infinitely narrow pulses). If you wish to understand finite sampling widths, it's effectively convolution with the shape of the sampling pulse... which looks like low-pass filtering. Ya dig? This is the frequency on your scope that's stupidly above the sampling rate. Why's my scope have a bazillion GHz bandwidth when it only samples at 4 samples per week? It's because it's real good at getting you that signal, aliased down into the sampled signal.
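The linearity claim is easy to verify numerically. A sketch where "aliasing" is modeled as naive decimation (keeping every tenth sample with no anti-alias filter, so out-of-band content folds in); the signals and gains are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = rng.standard_normal(1000)
A, B = 2.0, -3.0

def alias(x, factor=10):
    """Naive downsampling: multiply by a dirac comb and keep the nonzero
    samples. No anti-alias filter, so high frequencies fold down."""
    return x[::factor]

# L(A*a + B*b) == A*L(a) + B*L(b): aliasing is a linear operator.
assert np.allclose(alias(A * a + B * b), A * alias(a) + B * alias(b))
```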

> what effect does aliasing really have on an audio signal

PRECISELY the same effect it does on video, except in one dimension, and you haven't understood aliasing until you've understood it in one dimension.

> Is it possible to show that such information either isn't lost

It's lost. Completely... if done properly. First you need to understand Whittaker-Shannon interpolation. The result of which is that a CONTINUOUS BAND LIMITED signal is PERFECTLY represented by a train of discrete samples. This is WEIRD. Really weird. Prove it to yourself. It's VERY important in sampling. Say you then perfectly sample a signal with harmonic content ALL over the place. EVERYTHING gets mirrored down into your view, if the dirac comb has infinitely narrow pulses. So at the least you lose frequency information! You only know the original frequencies modulo the Nyquist frequency! If you sample CORRECTLY, you low-pass filter your incoming signal so there's NOTHING above the Nyquist frequency, and then you have a perfect digital representation of your original signal.
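Whittaker-Shannon interpolation can be sketched directly: reconstruct a band-limited tone at an instant between samples from a (truncated) sample train. All values here are illustrative, and truncating the infinite sum leaves a small residual error:

```python
import numpy as np

fs = 100                              # sample rate (Hz, illustrative)
n = np.arange(-200, 200)              # a "long enough" finite sample train
x = np.sin(2 * np.pi * 13 * n / fs)   # 13 Hz tone, well below fs/2

def reconstruct(t, samples, n, fs):
    """Whittaker-Shannon: x(t) = sum_n x[n] * sinc(fs*t - n).
    np.sinc is the normalized sinc, sin(pi*u)/(pi*u), as required here."""
    return np.sum(samples * np.sinc(fs * t - n))

t = 0.5037                            # an arbitrary instant between samples
approx = reconstruct(t, x, n, fs)
exact = np.sin(2 * np.pi * 13 * t)
assert abs(approx - exact) < 0.05     # small error, from truncating the sum
```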

If you wanna understand signal processing, you really can't use your intuition. Practice understanding things in the frequency domain and time domain, and generally there's only one proper basis to understand a phenomenon. Filtering is best understood in the frequency domain, for instance. If you try to understand filtering in the time domain, you might begin to think that it bounds the speed at which the signal can vary but that's not a useful intuition. Aliasing is best understood in the frequency domain, since it moves stuff down in frequency mod the nyquist frequency. If you try to build an intuition in the time domain, you'll think that there's some signal lost when you multiply by zero between pulses... and you'll guess at what it was and think "distortion" or whatever but what's lost is just WHERE each frequency is... so be very careful about what basis you choose!


Wow! Just amazing! I can't believe the amount of confusing science this video dispelled for me. This should be seen by everybody who thinks they know what a digital signal is.


Wow, I haven't learned that much in 20 minutes in years! Too bad this new found wisdom has such limited application for anything I'm actively involved with.


What technologies do you work with, and of these, what would you enjoy a ten minute enlightenment on?


Wow, that video is awesome. It definitely corrected a few misconceptions I had in my head.


The argument for mainstream 24bit dynamic range shouldn't be for higher fidelity reproduction for audiophiles. It should be for enabling more sophistication in our playback devices. In the YouTube era not every recording is perfectly mastered by audio engineers to fill the available range. And I could be listening on high end home theater equipment or the tiny speaker in my smart phone.


What about 24 bit enables "more sophistication in our playback devices". Doesn't sufficiently high fidelity mean you can already listen on your high end equipment?


It would allow our playback devices to correct poor quality source material without introducing distortions or clipping. All the same reasons why 24bit is preferred in the studio. Basically, studio-quality post-processing would be possible on any mainstream content, ultimately resulting in higher fidelity. Also, if you consider mashup-style content, where people produce derivatives based on already-mastered material, it would generally increase the quality of that material.


I learned more from this about digital/analogue conversion than from my EE class that covered this topic!


I watched the full 20 minutes and have absolutely no idea what he was talking about for most of it. I did have that misconception about the stepping in digital signals though - so I did learn something!


Try watching his other videos; I think there's one that is more introductory.


Is it just me, or is anyone else hearing something that sounds like wind noise over the narration? (It goes away when he plays sample sounds, and other audio on my system sounds normal.)


You mean the "noisy fan" (as Monty calls it) of the signal generator?


Oh, is that what that is? Makes sense. Thanks.


That was an awesome video, and it left me with some follow-up questions. Is it correct to say that lowering the bit depth increases the noise floor? If so, why? (Basically, is quantization applied to the amplitude of the wave?)

If lowering the number of bits still produces the same output wave as the input wave, what and where is the lower limit at which the sound changes?

Finally, how are 24-bit and 16-bit related to my 128kbps MP3s?


With proper dithering, lowering the bits raises the noise floor. This is because dithering essentially changes the quantization error from something regular into something random.

If you don't dither, then the quantization error will likely be something harmonic rather than the hiss you hear in that video. This is much more noticeable[1].

Lowering the number of bits produces the same wave, but with more error. Dithering ensures that the error is random noise, which is why you hear "tape hiss" in that video as the bits get lowered. With a tone frequency chosen to maximize harmonic quantization error, and no dithering, you would start to hear it.

Lastly, 24 and 16 bit are largely unrelated to MP3s. In practice, MP3s are decoded to 16-bit audio, but they don't store the audio as PCM; they store it (roughly) as frequency/amplitude pairs, which could in theory be decoded at any sample size. In any event, the absolute magnitude of error introduced by encoding a 128kbps MP3 is way, way, way larger than 16-bit quantization error, and the goal of the MP3 encoder is to introduce that error in a way that is as inaudible as possible.

MP3 is an old format at this point, so it does have audible defects in some types of music even at the maximum bitrate (certain percussive instruments are an issue). Vorbis and AAC work in ways similar to MP3 but are strict improvements (though for the love of music, don't use the FAAC encoder; it's generally considered horrible).

1: Still not noticeable at 16-bit audio: a test similar to the one in the article was done with no dithering on the 24 -> 16 bit conversion. At somewhere between 12 and 14 bits it would start to be discernible.
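The harmonic-vs-random distinction above shows up clearly in a spectrum. A sketch (8-bit quantization of a low-level sine, TPDF dither of ±1 LSB; the parameters are exaggerated purely to make the effect obvious): without dither the error piles up in spectral spikes, with dither it spreads into a flat noise floor.

```python
import numpy as np

rng = np.random.default_rng(1)
N, bits = 1 << 16, 8
step = 2.0 / (1 << bits)                 # quantizer step for [-1, 1)
t = np.arange(N)
x = 0.5 * np.sin(2 * np.pi * 1000.7 * t / 48000)

def quantize(sig, dither=False):
    # TPDF dither: triangular noise of +/- 1 LSB, added before rounding.
    d = (rng.random(N) + rng.random(N) - 1) * step if dither else 0.0
    return np.round((sig + d) / step) * step

err_plain = quantize(x) - x
err_dither = quantize(x, dither=True) - x

def peakiness(e):
    """Peak-to-median ratio of the error spectrum, in dB: high for
    harmonic spikes, low for a flat noise floor."""
    s = 20 * np.log10(np.abs(np.fft.rfft(e * np.hanning(N))) + 1e-12)
    return s.max() - np.median(s)

# Dither turns harmonic distortion into benign broadband hiss:
assert peakiness(err_plain) > peakiness(err_dither)
```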


Ahh, a lot's been cleared up, but I don't understand "Lowering the number of bits produces the same wave, but with more error." If the wave is the same, how can there be more error? And is quantization error caused by quantizing the steps in amplitude?


If you dither properly, then if you use a small number of bits you'll have a sine wave with noise and with more bits you'll have a sine wave with less noise.

If you dither improperly, then you'll end up with a sine wave plus some noise (less than if you had dithered) plus some harmonic distortion.

When talking about audio, one typically doesn't compare in the time domain, since that's not even remotely how ears work. One compares in the frequency domain, and until your quantization error is very large compared to the signal, there will still be some of the original content in the result.

This isn't a great analogy, as eyes and ears operate quite differently, but it does sort of get to the issue. Look at this image: http://en.wikipedia.org/wiki/Dither#mediaviewer/File:1_bit.p...

If you consider the original to have been an unquantized, but already sampled grayscale image, then essentially all of the pixels have large quantization error (since they were originally some shade of gray, but now they are all either black or white), so in one sense the original image is gone. But in another sense what you see is the original image, plus some noise.


xiph.org is like a MOOC without the name, spectacular pedagogical quality.


That video player is amazing! What is it?


Seems to just be a browser-native <video> tag with an overlay that pretty simply controls the quality/time/subtitles. The logic for the controls is at video.js[1] and the logic for the subtitles is at subtitles.js[2] (check the `playSubtitles` function for VTT[3] decoding). All of the sources (the videos for each quality, and the VTT files for each subtitles track) are specified as HTML attributes in vid2.shtml, the linked page.

In short, it was seemingly made in-house.

[1] http://xiph.org/video/video.js

[2] http://xiph.org/video/subtitles.js

[3] http://dev.w3.org/html5/webvtt/


Thanks for sharing! I really enjoyed the video, though I didn't understand everything (English is not my mother tongue).


Caveat: Xiph works mainly on promoting digital codecs. So...

there is a reason they'd like you to think the DAC (digital-to-analog converter) is a moot problem.


I'd like to point out the potentially non-obvious here. This is a test of delivery format.

There still are benefits to using 24-bit audio in the recording and processing stages. This is in large part because most recording systems expect 0dBVU = -18dBFS, and subsequent processing can bring the noise floor well into the audible range (dynamic range processors are notoriously effective at this, and heavily used in modern music). Take a simple example: a snare drum recorded at 16 bit, EQ'd with a +6dB boost anywhere on the spectrum, then compressed with gain reduction peaking around 10dB (not uncommon). After brickwall limiting in the final mix, this track will easily have a noise floor (via quantization error only) above -68dBFS best case (-96dB starting, -12dBFS peaking snare, +6 + 10 to the noise floor, assuming limiting with no gain reduction). -68dBFS is already audible in a critical listening scenario. With dozens (sometimes hundreds) of uncorrelated signals subjected to similar processing, this noise floor rises well into the audible range for even a modest playback system.
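The arithmetic in that example can be laid out explicitly; these are the comment's own assumed stage gains, not universal figures:

```python
# All values in dBFS. Each processing stage that adds gain lifts the
# 16-bit quantization noise floor along with the signal.
noise_floor = -96.0   # 16-bit quantization noise floor
eq_boost = 6.0        # +6 dB EQ boost
comp_makeup = 10.0    # ~10 dB of compression gain reduction, made up
limit_gain = 12.0     # snare peaked at -12 dBFS, limited up toward 0

final_floor = noise_floor + eq_boost + comp_makeup + limit_gain
print(final_floor)    # -68.0, matching the "-68dBFS best case" above
```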

While I realize that delivery format is the only thing important to most people, it is important to differentiate since the article does make a point to separate out musicians, sound engineers and hardware reviewers. These are groups of people that _should_ be aware of the benefits of higher sample resolution. Since it's fairly obvious that most people in these categories are confused about their ability to discern delivery formats, it's not beneficial to confuse them even further about working formats.

To be more succinct, the difference between 16bit and 24bit is largely inaudible when the source material is worked in a higher resolution format and properly converted.


> These are groups of people that _should_ be aware of the benefits of higher sample resolution.

I don't think the author is necessarily dismissing the idea of high fidelity audio; especially for the reasons you point out. Rather, the author is claiming that if you're simply _listening_ to the playback, it won't make a bit of difference, regardless of how your ears are trained.

Edit: Also note that several of the respondents were really confident that they heard a difference. This small study demonstrates that their confidence was misplaced. This, I believe, is what the author is trying to drive home.


> There still are benefits to using 24-bit audio in the recording and processing stages.

An analogy of what you said:

It's similar to HDR for audio[1] (but not exactly like it). HDR can be used for photography that, once composed and edited, will present more realistic information to our eyes. For example, with HDR you wouldn't have an overexposed sky - however the HDR is only used in order to get to that final 16 bit image (and even with 16 bit your eyes have a hard time discerning different colors).

The same applies to audio. Listening to 24-bit is pointless; however, if you are editing something, you want to retain as much information as possible until the final render so that you don't run into clipping issues as you described.

Therefore, sites that provide 192/24 downloads are valuable. If I'm a DJ getting music for my gig I do want those production quality files, as I cross-fade between two songs I don't want artefacts popping (excuse the pun) up.

On to my own opinion: 24-bit is still not good enough. DAWs should be working in floats. Audio needs to go true HDR; 24-bit is a cop-out. Why would you even use a 24-bit int when floats are there and ready to go? Imagery went floating point, what, 10 years ago? Why can't audio catch up? Being able to exceed the clip in one channel of my DAW, and then wrangle it back down in another, would be awesome.

Unrelated: That Xiph video really amazes in terms of what nature does. We rarely care about it (we do in terms of e.g. intercontinental fibre cables), but nature does all of this when we send a signal to a speaker. Even normal sound actually has a band limit and behaves (albeit with far higher dynamic range) exactly the same way automatically. Shoot a signal down a fibre cable that can't handle it, and you'll hit the Nyquist limit. Too high a frequency for the air? Expect distortion (that we can't hear). You don't even have to include electronics to get nature to impose these limitations for you; you have to do no extra work. Completely amazing - a deeper level of logic that is mind-boggling.

[1]: http://www.slideshare.net/DICEStudio/audio-for-multiplayer-b...


192kHz, however, is not beneficial to processing. There is an argument to be made for 96kHz in a limited set of processing cases, as a form of implicit pre-process upsampling, but it can actually be detrimental. (See for instance: https://www.gearslutz.com/board/mastering-forum/968641-some-...)

Since this is a discussion about bit-depth, I don't see much of a reason to clutter it with a discussion about sample rate. This subject is already difficult enough for most people to understand it seems.


I might add that it's factually impossible to determine what it means, from the listener's perspective.

As far as "limited cases", I work at an ISV, so I am preconditioned to not accepting limited cases - as demonstrated by excessive overtime just this very week. Our customers do some really crazy shit with our software.


Typical DAWs already run in floating point, as do most digital mixing consoles. It's only input/output to physical audio cards that's quantized to 24 bit.

If you want to "render", i.e. produce a final .wav file from your input tracks with applied effects, filters, gain changes, ... most DAWs let you output a .wav file with floating-point data (not that it would make sense).

https://imgur.com/MJvf3cJ

But, you probably guessed it, there are people already discussing the merits of 64bit floating point over 32bit floating point (e.g. "double" over "float")... [already in 2007] https://www.gearslutz.com/board/music-computers/117203-does-...


> there are people already discussing the merits of 64bit floating point over 32bit floating point

In my very humble opinion (I haven't had the fortune to be formally educated on DACs and the math involved there in general), I think that discussion is greatly valuable. When you are compounding information onto information, you can never be accurate enough. Error accumulates - that is the woe of digital composition.

I would love to accelerate all this stuff on a GPU - given workable knowledge on how to specifically turn all those crazy mathematical formulas into code; which given only a rip-off degree is practically impossible.


Even if you add one unit in the last place of error at each processing stage (and since rounding tends not to be malicious, you may well do better than that), you'd need 128 poorly-implemented processing stages for a 32-bit float to be reduced to mere 16-bit integer precision - and in practice, likely more.

When it comes to clipping or loss of data on the lower end, well, 32-bit floats have an 8 bit exponent (254 reasonable values); that means that the loudest full-precision unclipped signal is 765 dB (!) louder than the softest un-quantized signal. Even with mediocre centering, that's more than enough.
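The 765 dB figure can be reproduced directly; this assumes the commenter's framing of ±127 exponent steps from a centered exponent:

```python
import math

# 32-bit float: 8-bit exponent, ~254 usable values. Each exponent step
# is a factor of 2 in amplitude, i.e. 20*log10(2) ~ 6.02 dB.
db_per_octave = 20 * math.log10(2)

# From a centered exponent, 127 steps of headroom in one direction:
headroom_db = 127 * db_per_octave
print(round(headroom_db))   # about 765 dB, the figure quoted above
```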

I don't think 64-bit audio is likely to be noticeable, even for processing purposes, outside of really specialist kinds of niches.


> "all those crazy mathematical formulas"

Most of it is convolution or multiply/accumulate, nothing crazy at all.


Really, you don't understand how much of a marketing scheme my degree was. An absolute disgrace in terms of what education should be, I'm an idiot for falling for it.

I learnt convolution in my own time, so if you have resources (that don't include integration) I would absolutely love to read them. DSP is something that seems hard and I'd love to get into it.


Heya,

I keep track of a large number of resources for learning DSP over at http://diydsp.com

Take a look at the Theory, FAQs and Books sections. There are some recommended books, links to university course lectures, etc.


This is why HN is great: industry experts on-hand. Thank you so much, I'm going to dig right into it this weekend.


Every major DAW already does this- 32 bit float is standard for internal processing, and quite a few of them support import/export of 32 bit files as well. Some even work at 64 bit precision internally (Studio One and Reaper both come to mind).


Sadly, I think that mine (FL Studio) doesn't, or at least not between channels when mixing. It definitely goes above the clip, but horrible things are to be expected when mixing it with other channels.

I assumed it was the norm, as FL Studio has been gaining immense amounts of traction with the advanced capabilities in its more recent updates. I might have been wrong in that assertion.

Either way, my DAW does not represent it in a way that makes sense to a software engineer.

I would certainly love to actually work in HDR, as opposed to it being a ghost in the machine.


I would guess it's because even really good ADCs have trouble reaching a noise floor as low as 24 bits, so capture is in 24 bits. I think most DAWs support floats as an intermediate format though.


That's true. But I understood this as talking about playback, not about mastering. The article linked below makes this distinction more clearly: https://news.ycombinator.com/item?id=8727591


Slight problem - DXD is not PCM. It's downsampled DSD, which isn't a true PCM format and is of debatable value in a bit-depth test.

DSD uses single-bit delta-sigma modulation at a very high sample rate. You have to downconvert it before you can hear it, and this adds noise/dither/distortion. One of the problems with DSD is that it's not entirely clear what useful bit-depth you're left with after downsampling, because there are theoretical reasons for criticising one-bit sampling. See e.g.

http://sjeng.org/ftp/SACD.pdf

A useful test would start with high quality unmastered and unprocessed 24-bit PCM recordings and A/B them with 16-bit downconversions. (Remember, even orchestral recordings are mixed in a studio and the individual stems usually have some dynamic processing and gain riding, even if it's not as obvious as dance music pumping.)

I'd expect a test like this to use a bit meter like Bitter to confirm there's useful information in the lower bits, and not just rely on a vague estimate of the dynamic range.

http://www.stillwellaudio.com/plugins/bitter/

Ironically, all of the reviews of the Bozza track say that the BluRay audio version sounds cleaner than the SACD source used here. (I have no idea if this is true. But if someone has both and wants to do a blind A/B, that would be interesting.)

It's also worth mentioning there are easy-to-find test tones you can use to check how clean your audio hardware is at extreme sample rates. They're not directly relevant to bit-depth tests, but they're a good torture test for audio.

http://www.audiocheck.net/testtones_highdefinitionaudio.php


It doesn't make sense for it to be unmastered. We want to test the difference between a properly made 24-bit output downsampled to 16 bit and the original.

There is a difference between 16, 24, and 32 bit recordings, but it just so happens that 16 bits are enough to give an effective 120dB dynamic range if you use proper dithering. If you do not want to use dithering "because it's not pure", you still have 96dB of dynamic range. So with dithering, 16 bits allow you to represent a mosquito and a jackhammer in the same room. And even without (which isn't a good idea, as the noise then consists of harmonic distortion, which is a lot easier to hear than dithered noise), I really doubt anyone can tell the difference; 96dB is still quite a bit.


That Lipshitz paper is really worth reading.


This was posted here before, but it's a great article if anyone hasn't read it yet: https://people.xiph.org/~xiphmont/demo/neil-young.html


I'm really glad there have been more audio-related posts on HN lately. Maybe it's just my own bias, but it seems that print/online media related to audio is dying off and is now limited to forums and hobbyist sites. The few that are still around are difficult to take seriously with some of the snake oil that gets reviewed. (Carbon fiber disc stabilizers, anyone?) I would love to see a serious attempt at an enthusiast "magazine" done again...


I agree - audio technology, music production tools and of course music itself have always been topics of interest for me. However, there's a whole jungle of misconceptions and half-knowledge to browse through before you find good content.

For audio production tools and other related niche topics, I can recommend createdigitalmusic.com


I have long had the idea of an HN for music. I have found few places that resemble it, such as the muffwiggler forum and some Stack Exchanges...


Ha! I used to chat all the time with Muff Wiggler on KVR... I still lurk there but rarely post. My handle is Stupid American Pig.


There's a lot of great data here, and the author obviously tried to cover all of the bases. Unfortunately I'm still bothered by a couple of aspects.

Firstly, the question of "can you hear a difference" is completely orthogonal to the question of "which do you think is 24 bit". By using the answer to the second question to infer an answer to the first, you're entangling them. If someone could reliably hear the difference but on half the songs they preferred 16 bit and on the other half preferred 24 bit, their own answers would cancel each other out.

Secondly, all it takes is ONE PERSON who can reliably tell the difference [1] to prove that the difference is audible, even if it's only to a very small subset of the population. The test was structured to detect the abilities of a group, not a single person. I'm perfectly willing to believe that as a group, people on average can't tell a difference, but that doesn't tell me whether I can tell a difference.

[1] Reliably telling the difference would mean being consistent on double-blind A/B testing, repeated enough times to achieve statistical significance.
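For what it's worth, the significance math behind that footnote is a one-sided binomial tail. A small sketch (standard statistics, not tied to the article's methodology):

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided p-value: chance of scoring at least `correct` out of
    `trials` in a forced-choice ABX test by guessing alone."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# The common "12 of 16" ABX benchmark just clears p < 0.05:
p = abx_p_value(12, 16)
print(f"p = {p:.4f}")
```

So a single person who repeatedly hits 12/16 or better really would be strong evidence, regardless of how the group as a whole performs.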


I think the conclusion ("there was no evidence that 24-bit audio could be appreciably differentiated from the same music dithered down to 16-bits") is not correct.

EDIT: I'm not sure if the conclusion is correct or not, but the logic that led to the conclusion has flaws.

50%/50% accuracy means a random guess - people can't distinguish 24-bit from 16-bit.

But if accuracy is less than 50% for a large enough sample it means the difference between 24-bit and 16-bit is heard.

Article said: "As a subgroup (total of 31 respondents), the self identified respondents with a "good amount" of musical background did not do well. In fact, this group of respondents consistently scored worse than the combined result."

People with presumably better ears ("musicians" and "hardware reviewers") were less accurate than regular people, especially on the Vivaldi. I think this means they did well. This means they heard the difference - it is not possible to be significantly less accurate than 50% without hearing a difference. They failed at deciding which one was "better", but they were able to differentiate 16-bit music from 24-bit.


This means they heard the difference - it is not possible to be significantly less accurate than 50% without hearing a difference.

For small sample sizes, being far off is quite likely; for example, the chance of getting at most 1/3 of 31 (11 or fewer) 50/50 guesses right is about 1 in 14.
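That figure is easy to check with a direct binomial sum (my own sketch); it comes out to about 7.5%, i.e. roughly the 1-in-14 quoted:

```python
from math import comb

# Chance of at most 11 correct out of 31 fair 50/50 guesses:
p_low = sum(comb(31, k) for k in range(12)) / 2 ** 31
print(f"P(<= 11 of 31) = {p_low:.4f}")
```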


I agree that sample sizes are small, and there is no proper analysis done - the difference can be only from a chance.

But the article states the difference is statistically significant and then draws a wrong conclusion from it (not accurate => can't differentiate).


> People with persumably better ears ("musicians"...

I'd expect musicians to generally have worse hearing than similarly-aged non-musicians, especially in the non-young demographic. Although it's a moot point, as it looks like the results were driven by chance rather than hearing ability.


There is a difference between how much you can hear and how much you can distinguish in what you can hear. The former decreases as you age (or when you are exposed to loud sounds); the latter can be trained.

So I agree that it's likely that musicians have worse hearing (playing in an orchestra can/will give you ear damage), but they are trained to distinguish sounds.


But the article itself states that this difference was not statistically significant, and should therefore be attributed to randomness?


The article does state that the difference is statistically significant: "Curiously, the musician group seemed to select the 16-bit dithered Vivaldi as the "better" sounding version (p-value 0.047)."


How many individual comparisons were made throughout the article? A dozen? Two? You'd expect a p-value less than 0.05 by chance about one time in 20, and indeed this p-value is just marginally below that threshold. I wouldn't view this one result as particularly significant in that context.
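To put numbers on that intuition, here's a sketch of the family-wise error rate for m independent comparisons at α = 0.05 (assuming independence, which the article's overlapping cohorts only approximate):

```python
# Probability of at least one spurious p < 0.05 "hit" across m
# independent comparisons when every null hypothesis is true.
alpha = 0.05
fwer = {m: 1 - (1 - alpha) ** m for m in (1, 6, 12, 20)}
for m, p in fwer.items():
    print(f"{m:2d} comparisons -> {p:.2f}")
```

With a dozen comparisons, the odds of seeing at least one "significant" result by pure chance are close to a coin flip.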


I'm not convinced it is, though. If you take a sample of 140 people and divide it into a dozen different cohorts, it's pretty likely that one of those groups will show a difference that would be statistically significant if observed across the whole sample.


Perhaps the 16-bit sample really did sound better in some cases?

The lowest bits of a D/A converter are the most non-linear. By avoiding them you might get a more accurate waveform overall.

This would explain the people who were confident that they knew which was which, even when they got it consistently wrong. It would depend greatly on the specific D/A converter so you'd expect it to go both ways.


Sorry, but no!

    1.0 1.1 1.9 2.0 2.1 2.9 3.0 
is still a better approximation of a linearly increasing list of numbers than

    1.0 1.0 2.0 2.0 2.0 3.0 3.0
and I doubt that any 24-bit DAC or ADC in existence interpolates as badly as I did in this example. Whatever distortion due to quantisation the latter creates, the former will create less of it.
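The point is easy to check numerically. Assuming the target is an even ramp from 1.0 to 3.0 (my reading of the "linearly increasing list"), the non-linear-but-fine steps land closer than the linear-but-coarse ones:

```python
import math

ramp = [1 + 2 * i / 6 for i in range(7)]        # ideal linear ramp, 1.0 .. 3.0
fine = [1.0, 1.1, 1.9, 2.0, 2.1, 2.9, 3.0]      # "non-linear" high-resolution steps
coarse = [1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0]    # "linear" low-resolution steps

def rms_error(approx):
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(approx, ramp)) / len(ramp))

print(f"fine steps:   {rms_error(fine):.3f}")
print(f"coarse steps: {rms_error(coarse):.3f}")
```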


Notably, with such a small sample size and multiple testing, it's to be expected to see small deviations in preference such as the one in the article.

The article mentions p=0.28, which isn't entirely easy to interpret given that he doesn't quite explain what that p value is actually measuring (did he correct for multiple testing, for instance?), but it's certainly not a claim of any meaningful significance.


Your example is overly simplistic. Dithering is what allows a low bit sample to emulate a high bit one, and I don't see any evidence that you applied any; even if you did you wouldn't get a feel for how it operates with such a short sample.


The initial statement was: "lowest bits of a D/A converter are the most non-linear. By avoiding them you might get a more accurate waveform overall."

You can of course apply dithering to any signal to mask quantization noise.

It's just that a perfect N-bit DAC will require less dithering to mask its quantization distortion/noise than a non-perfect/not-quite-linear N-bit DAC, which in turn will introduce less quantization distortion/noise than an (N-k)-bit DAC.

In fact, you could say that an N-bit DAC is just "perfect reproduction" plus a non-linearity that amounts to the step width of 1 LSB, i.e. its resolution.

And a 24-bit DAC that unfortunately is off by 8 LSB (2^3) along its 0...2^N-1 to 0V...Vref curve is still as good as or better than a 21-bit DAC (24 bits - 3 bits) at reproducing a waveform, and still better than a 16-bit DAC. A graphical representation of this is commonly found in datasheets and is called "integral non-linearity."


Quality reports on audio depend, as far as I know, entirely on conscious reporting of quality that's accessible to a test subject through introspection.

By definition, this makes it easy to debunk "golden ears." Because loudness (energy) determines what we pay attention to in sound, sonic detail that's low in energy compared to the total can be dropped without test subjects being able to report the missing information. And maybe this is valid. If we can't report our experiences, are they really experiences?

But I find this unsatisfactory if only from the point of view of experimental design. Does the brain really throw this information away at a low level? Does our ear "compress" audition on the way to other parts of the brain? Or does our subconscious experience uncompressed music differently?


While the conclusion is in accordance with what I would expect, I think the study suffers a lot from not having a control group. From these results, there is no telling if participants screwed up in replay, or if they were all just guessing anyway, or whatever. This should be redone with at least one sample pair where one of the samples, deliberately reduced in quality, is delivered to a subset of the participants.


Shoot, most people won't be able to tell 8 bit audio from 16 bit audio. Try the following:

    sox highres.flac --bits 8 lowres.wav dither
WAV is required because FLAC doesn't do 8-bit, and we want to be 100% certain nothing sneaky is going on. You might be able to notice a slight increase in background hiss if you are in a very quiet room.


That background hiss IS the difference. I'm not sure what you think the difference would be otherwise, but the increase in the noise floor caused by quantization error will be the difference between the formats.

I also don't know what 'lowres.wav' is (is this linked in the article?), but on classical or jazz recordings the difference is very noticeable due to the lower 'average' amplitude of the recordings. If you did this on a modern pop recording that's smashed to hell and back... then yeah, many people won't even notice the noise.


No, most think that if you reduce something to 8 bits that it will sound like NES-powered voice mail being played back through a piezo buzzer.

If you converted something to 8 bits and used proper dithering, no one who heard it would exclaim "You monster! That is only 8 bit audio!"

Whereas if you reduced something to a low-bitrate MP3, people would notice immediately and call you out on it.


That is an issue. People don't understand what bit-depth reduction sounds like. That doesn't change the fact that the hiss is the difference. That is the quantization noise floor, and the primary artifact of the conversion.

Rather than try to goad people into thinking that there is no difference, it is better to educate them on what the difference actually is (or could be). From there an honest interaction can be had regarding the potential perception of these differences.

Quite simply, just because some people are misinformed about what qualitative effect is occurring, that does not discount the fact that there is a qualitative effect occurring.


I wonder whether the output is linearly quantized or has A-law/μ-law companding applied in this case?


Why are there so few women in the audiophile community? Do female audiophiles face the same adversities as women in tech?


> Do female audiophiles face the same adversities as women in tech?

How would that work? There isn't even any opportunity to be excluded from being an audiophile.

It's the exact same thing that accounts for a lot of the discrepancy between men and women in professional tech: the fact that men, on average, like gadgets more than women.


I really doubt men like gadgets more than women per se, it's a social/cultural/education thing that leads them to move away from those interests.



It could simply be women having less disposable money?


Is your second question independent or leading from the first? If the latter, then you realize that just because there is a lack of certain demographics in a group does not necessarily mean it is due to adversities, right? There could be countless reasons.

As for what those reasons are, tech enthusiast communities of all sorts tend to be predominantly male. I don't think it's anything particular to audio. I agree that I was a bit surprised at how overwhelmingly male the sample was, though.


I'd gamble it's because they're less into bragging about how much money they spent on audio equipment / dick measuring. But that's just my cynical point of view. I'm sure the distribution between men and women that can appreciate good music (and good music equipment) is much less skewed than men/women in tech; it's just the public part where they're not represented as much.


Finally a gender-bias I can get behind.

That is, this finally proves women are smarter. Or at least less likely to spend 3-figure sums on an interconnect cable.


You gathered all that from 2 data points?


Why are there so few women in the coal mining community? Do female coal miners face the same adversities as women in tech?


It's always amazing how people repeat this coal mine idea: http://wehuntedthemammoth.com/2013/03/28/sexism-in-tech-not-...


Sorry, I could not get through this cynical and pretentious piece.


It's all about obsessive-compulsive behavior. I count myself in that group... I bought every recommended pair of headphones promising more and more magic, but one day you come to notice that it's just a sound equalizer and some people like it one way, some another. So 192kbps, 44kHz, and MDR-V6 into a $30 player; that's it.


Not really. Headphones vary a lot within the frequencies you DO hear.

Also, the weight vs. outside-sound-isolation ratio varies with price.

Those are all observable and measurable things.

Speakers also vary in those audible frequencies, but past $150 per speaker you are only dealing with quality at very loud volumes.

Over $3000 for a home system? Just be honest with yourself and confess you are buying the prettiest furniture to match your decor.


The louder you play, the less you can tell the difference. Bass is probably what you're referring to.

Last time I looked for speakers in the $400 (each) range, I could still tell that some combos were clearer in the mid range or the highs. It's all about what music you use to test them: some recordings (great voices mostly) have very, very clear mids where you can tell the difference between sets; the best really open up. Others have detail in the highs that is a lot harder to hear on speakers that are weak there. I don't know past what point you cannot tell the difference, but I do know it's not $150.


True. What often separates a good speaker from a mediocre one is how it behaves at high volumes. Most speakers made for home use, regardless of price, lose composure when cranked high. The main culprit is usually the dome tweeters (a type of high-frequency driver), which start to compress at high volumes. The solution is to use a compression driver (CD) with a waveguide, but, mostly for aesthetic reasons, there are virtually no commercial speakers on the market that use CDs.

The alternative is to use public address (PA) speakers. Virtually all of them are equipped with CDs, but they are relatively large, and unless you have a dedicated listening/home theater room, they won't really fit a home's decor well.


The other option is ribbon tweeters: https://en.wikipedia.org/wiki/Tweeter#Ribbon_tweeter

They sound great. Sometimes too great.


It's actually very easy to hear different speakers have different quality in prices much higher than $150, even at moderate sound levels. Just go to a store and try it out (especially compare a $150 speaker with a speaker for $1500).


> speakers, they also vary on those audible frequencies, but after you are past $150 per speaker you are only dealing with quality after very loud volumes.

This is simply not true, and you could determine that for yourself easily if you wanted to. There are many, many factors that affect how a loudspeaker sounds besides how loudly it's playing. $150 speakers are much better today than they were say 30 or 40 years ago, but they're still full of design compromises.


I'd say it's actually the other way around: a /very/ good speaker will generally sound good, with lots of punch, even at very low volume. A low-end one will need to be at 50% capacity or so to start to give its best. Obviously the amp and preamp also play a big part here.


Yeah, if you are going to spend money, spend it on the reproduction side. A nice set of speakers will make you feel like you've never heard a song before.


As I said in the other recent 24-bit audio thread, any improvement in sound quality offered by using 24-bit will be inaudible for the vast majority of listeners, the extra low-level detail of the increased dynamic range lost in the noise floor of a typical room.


You mean it's inaudible for ALL the listeners. There is no subset of listeners that would be able to tell the difference between 24-bit and 16-bit audio with any type of statistical significance.


Downvoted for saying "vast majority" instead of ALL? Sigh.

For ALL the listeners in this specific test? Yes, almost certainly. Naturally, just randomly picking a song, even one that subjectively sounds really good, and listening to 16-bit or 24-bit versions of it at moderate volume in a typical room will absolutely prevent anyone from choosing correctly with any statistical significance. The OP's test was doomed to fail from the outset.

That doesn't mean it's impossible to detect any difference under any circumstances.


16 bit gives you an effective dynamic range of 120dB - no one, and I challenge you to prove me wrong, can detect differences beyond that. To quote from the last article, that is enough to record the difference between a jackhammer and a mosquito in the same room.


> that is enough to record the difference between a jackhammer and a mosquito in the same room.

That is utterly laughable. Stated like someone who has never actually tried to record sounds with large dynamic range. But I suppose it depends on how you define "record". So if a mosquito is 40dBA and a jackhammer is 130dBA, that's a difference of 90dB. Now I don't know of any preamp that has such a low noise floor, but assuming one existed, if we set the gain staging such that the jackhammer is 0dBFS, then the mosquito is peaking at -90dBFS. That's 6dB above null, or ONE BIT. So your "recording" of the mosquito is one bit flipping on and off.

Quite the recording! Statements like these are what make recording engineers roll their eyes and think here we go again.

> 16bit gives you an effective dynamic range of 120dB

16 bit gives you 96dB of dynamic range, and less than that of usable dynamic range. You may say "effectively", as in, once dithered - but then you're accepting that the recording is done at a higher bit depth and then dithered down, thus refuting the original statement that 16 bits is enough dynamic range to record said sounds.
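As a sanity check on the bit arithmetic above (my own sketch; the 40/130 dBA figures are the parent's), each bit of depth is worth 20·log10(2) ≈ 6.02 dB, so a 90 dB gap consumes almost 15 of the 16 bits:

```python
import math

# Each bit of depth buys 20*log10(2) ~ 6.02 dB of range.
db_per_bit = 20 * math.log10(2)

range_16bit = 16 * db_per_bit            # undithered 16-bit range, ~96.3 dB
bits_needed = (130 - 40) / db_per_bit    # jackhammer (130 dBA) vs mosquito (40 dBA)
print(f"16-bit range: {range_16bit:.1f} dB")
print(f"bits spanned by a 90 dB gap: {bits_needed:.1f}")
```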


Dithering gives roughly 120dB of dynamic range in 16-bit audio; that puts the mosquito 30dB above the noise floor. That isn't ideal, but I struggle to find a situation where it matters. If the volume is low enough for the jackhammer not to damage your ears, you wouldn't be able to hear the mosquito anyway. One could argue that if we had the audio equipment, it would be nice to represent the mosquito and the jackhammer without gain control, but with all the dynamic compression we have seen in the last decade, I doubt it.


Some confidence intervals for the hypothesis would be really handy. Still, great article!


Discussion of this test at hydrogenaud.io: http://www.hydrogenaud.io/forums/index.php?showtopic=106156


What I would like to know is:

Is there any individual human who can reliably distinguish between 16 and 24 bit audio? If somebody believes they can, where can I send them to establish whether it's true or not?


According to https://people.xiph.org/~xiphmont/demo/neil-young.html the answer is no. 16 bits are more than enough to cover the entire range the human ear can pick up (assuming the source material was mixed and sampled correctly).


You can hear the difference between summing 16 channels of 24-bit audio vs. 16-bit audio in a DAW; 24 sounds better. Then when you render it, you can't tell the difference between a 16-bit dump and a 24-bit dump.

I think the recent Aphex Twin release was out in both 24-bit and 16-bit; it would make great test subject matter for the foobar2000 ABX plugin.

With all types of music, you can train your ear to listen for what MP3 hiccups on. I know nothing about classical music, but I can spot a 320kbps MP3 a mile off due to terrible-sounding hi-hats and crashes in genres where they are prominent. Disco records also suffer very badly; just something about how they were recorded. I wouldn't know what to listen for in classical.


Hi-hats and crashes are exactly the thing I noticed improved when switching my music from MP3 to FLAC many years ago. The difference is very obvious to me; I really don't think it's placebo.

That said, I've heard that AAC handles them much better, and that modern MP3 encoders do a better job. I haven't had a chance to do an A/B test to check that. I'd love to confirm that AAC has solved this problem, because my music collection is taking up way more space than I'd like!


Try Opus. I was amazed how good it sounds at ridiculously low bitrates. I'd be surprised if you could tell the difference between 80kbps Opus and FLAC (i.e. raw) files outside of some very rare corner cases. And unlike MP3, those corner cases don't sound terrible, though that's subjective. Even at 64kbps the difference wasn't obvious without careful listening - to me, YMMV :-).

If you do try it, make sure to use the 1.1 encoder (which deals with difficult samples by detecting them and upping the bitrate more aggressively than previous versions), and you might as well increase the frame size to the maximum (60ms), since you're not interested in low-latency applications.


> Furthermore, 20% used an ABX utility in the evaluation process suggesting good effort in trying to discern sonic differences.

Take those results with a (large) grain of salt.


Doesn’t matter - the respondents didn’t know which sample is which.


> biological gender

This is getting ridiculous. Just call it "sex" already.


Try repeating the experiment without the dither.


i found this http://www.audiocheck.net/audiotests_dithering.php

And honestly, I can easily tell the samples with dither because they added a LOT of white noise. The original and the 8-bit non-dithered version sound practically the same to me.


Listen for "graininess" in the 8 bit undithered sound as the voice gets quieter. You should be able to make that out. That's quantization distortion. The dithering removes that, but adds the noise.


This is the clearest demonstration of audio dither I've come across: http://youtu.be/h59LwyJbfzs


Oh my god, that is even worse!

So explain why 'dithering' was 'masking' the results?! It makes it extremely obvious which one is the lower-resolution sample!


That would allow the subject to easily tell the difference between the two samples by looking to see which had all 0's in the 4 lowest bits.


This makes no sense to me. Please explain. The difference between 24 and 16 is 8, not 4 for one.

Are you saying the "16-bit" sample files actually had noise added to the least significant 4 (8?) bits? This is not dithering. Dithering is adding noise BEFORE truncation. A dithered 16 bit rendering of 24 bit audio will only be 16 bit. An undithered 16 bit rendering of 24 bit audio will also only be 16 bit.


I think the GP is basically saying, how would you propose to run an internet survey asking anonymous audiophiles to blindly A/B (no peeking at the bits) full and truncated files? The dithering is probably the least questionable part of the methodology.


Dithering is known to increase the perceived dynamic range of music. As an example, try listening to a song in foobar2000 in 8-bit output mode, then 8-bit dithered.


Are you also a fan of non-oversampling DACs?


I'm a fan of good sounding DACs.

By dithering you increase the perceived dynamic range. You change it from a listening test of the quality of the sound to a test of the character of the noise floor. Maybe this is why some "engineers" could tell. If you are trained in what to listen for, you would know in these listening tests to ignore the majority of the sound and frequency range and instead listen for the high-frequency hiss on the noise floor (for instance, in very quiet or completely silent sections: just isolate a very quiet part and turn it up loud). This would be the telltale sign of the dithering, and thus of 16 bit.


I do not agree that you increase the perceived dynamic range; you use dither to change the noise into something more favorable. Instead of harmonic distortion, you get much more uniform noise that doesn't distort the original signal. I really cannot see how that can be seen as anything other than better audio quality.


"Random numbers such as these translate to random noise (hiss) when converted to analog. The amplitude of this noise is around 1 LSB, which for 16 bit lies at about 96 dB below full scale. By using dither, ambience and decay in a musical recording can be heard down to about -115 dB, even with a 16-bit wordlength.

Thus, although the quantization steps of a 16-bit word can only theoretically encode 96 dB of range, with dither, there is an audible dynamic range of up to 115 dB! The maximum signal-to-noise ratio of a dithered 16-bit recording is about 96 dB. But the dynamic range is far greater, as much as 115 dB, because we can hear music below the noise. Usually, manufacturers' spec sheets don't reflect these important specifications, often mixing up dynamic range and signal-to-noise ratio.

Signal-to-noise ratio (of a linear PCM system) is the RMS level of the noise with no signal applied, expressed in dB below maximum level (without getting into fancy details such as noise modulation). It should be, ideally, the level of the dither noise. Dynamic range is a subjective judgment more than a measurement--you can compare the dynamic range of two systems empirically with identical listening tests.

Apply a 1 kHz tone, and see how low you can make it before it is undetectable. You can actually measure the dynamic range of an A/D converter without an FFT analyzer. All you need is an accurate test tone generator and your ears, and a low-noise headphone amplifier with sufficient gain. Listen to the analog output and see when it disappears (use a real good 16 bit D/A for this test). Another important test is to attenuate music in your workstation (about 40 dB) and listen to the output of the system with headphones. Listen for ambience and reverberation; a good system will still reveal ambience, even at that low level. Also listen to the character of the noise--it's a very educating experience."

-- Bob Katz http://www.digido.com/articles-and-demos12/13-bob-katz/16-di...


I am aware of that statement. I still stand by mine, but let me clarify a bit: I meant that it's not just a perceived increase - it is really there. Looking at a sine wave through a frequency analyzer, without dithering it's hard to tell what the dynamic range is: you can turn down the amplitude until you can no longer see the signal on the analyzer. Apply dithering at that level and you can clearly see the signal again. You do not need to use your ears.
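That analyzer experiment can be simulated. The sketch below (mine, not from this thread; single-bin DFT, TPDF dither) attenuates a tone to 0.3 LSB - below the undithered rounding threshold, so it quantizes to pure silence - and shows it reappearing in the dithered output:

```python
import math, random

random.seed(0)                   # reproducible run
BITS, N, K = 16, 4096, 128       # 16-bit depth, 4096 samples, DFT bin 128
lsb = 1.0 / 2 ** (BITS - 1)
amp = 0.3 * lsb                  # tone attenuated to well under half an LSB

sig = [amp * math.sin(2 * math.pi * K * n / N) for n in range(N)]

def quantize(x, dither):
    # TPDF dither: difference of two uniforms, spanning +/-1 LSB
    d = (random.random() - random.random()) if dither else 0.0
    return round(x / lsb + d) * lsb

def tone_amplitude(samples):
    """Amplitude the analyzer would show in the tone's DFT bin."""
    re = sum(s * math.cos(2 * math.pi * K * n / N) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * K * n / N) for n, s in enumerate(samples))
    return 2 * math.sqrt(re * re + im * im) / N

plain = tone_amplitude([quantize(s, False) for s in sig])
dith = tone_amplitude([quantize(s, True) for s in sig])
print(f"undithered: {plain / lsb:.3f} LSB")  # the tone rounds away to silence
print(f"dithered:   {dith / lsb:.3f} LSB")   # the 0.3-LSB tone is back
```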


Excellent point. This makes the test somewhat flawed. Are we testing "performance" vs file size? Or overall performance for music with few quiet/silent sections?



