Seems really opinionated, in a bad way: > rates are internally converted to 48 k...

nullc · on July 22, 2016

Voice recognition is very accurate on Opus (especially when trained on it). An engineer at a big voice recognition house told me that switching to Opus _improved_ accuracy in their testing.

> There are times when I need 44.1kHz and not 48 kHz, like on an embedded system with everything running at that and no performance left for converting or extra PCM channels

Unfortunately we live in an interoperable world. And there are _many_ devices running at 48kHz without the resources to resample. Your devices would be unable to interoperate.

Being "opinionated" in this way is specifically to accomplish the goal of guaranteeing interoperability in a world with cheap throwaway IoT devices.

> the fact that all major players now have voice assistants like Siri, Google Assistant, Cortana?

You might want to look at how these systems are sending audio...

are595 · on July 22, 2016

Good points, but I just wanted to point out that we are only missing a 4 kHz range (20-24 kHz) as the highest frequency a 48 kHz digital signal can represent is 24 kHz (Nyquist sampling theorem).
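To make that arithmetic concrete, here is a rough sketch in Python. The function names are made up for illustration, and the 20 kHz reference is just the lower edge of the range mentioned above, not something read out of any codec.

```python
def nyquist_khz(sample_rate_khz: float) -> float:
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_khz / 2.0


def band_above_khz(sample_rate_khz: float, reference_khz: float = 20.0) -> float:
    """Width of the band between reference_khz and the Nyquist frequency.

    reference_khz defaults to 20 kHz, the figure used in the comment above
    (an assumption for illustration only).
    """
    return max(0.0, nyquist_khz(sample_rate_khz) - reference_khz)


if __name__ == "__main__":
    for rate in (44.1, 48.0):
        print(
            f"{rate} kHz sampling: Nyquist = {nyquist_khz(rate):.2f} kHz, "
            f"band above 20 kHz = {band_above_khz(rate):.2f} kHz"
        )
```

By the same formula, 44.1 kHz material tops out at 22.05 kHz, so the difference between the two rates is small either way.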

MP3s actually tend to have a worse cutoff depending on encode settings, with values from 20.5 down to 16 kHz [1].

[1] https://www.whatinterviewprep.com/prepare-for-the-interview/...

ssalazar · on July 22, 2016

I agree but

> voice recognition needs the full spectrum for accuracy

do you have a source for this? Voice signals are conventionally low-bandwidth; 16kHz is usually "good enough" for human-human transmission. Formant frequencies top out around 3kHz [1] and upper vocal harmonics are not really important outside musical applications. Consonants are a bit more complicated but I'd be interested to know what voice information is present above 20kHz.

[1] https://en.wikipedia.org/wiki/Formant#Formants_and_phonetics

baq · on July 22, 2016

yeah, i can't hear anything past 13.5kHz and can understand speech just fine. can't imagine why a computer couldn't.

nullc · on July 22, 2016

Opus works great for speech recognition, but I wanted to point out that your argument doesn't logically support the conclusion.

Let's imagine that human speech had a nearly unique property of having another whole copy of the speech in the form of ultrasonic overtones at 10x the normal frequency at a loud volume.

You couldn't hear them and yet you hear speech fine. But a computer could make good use of the ultrasound portion-- and maybe understand speech much better than you as a result.

This isn't how it works in reality, but it does show a flaw in your logic.

cyphar · on July 23, 2016

The argument is that if a human brain can recognise speech accurately without needing your hypothetical ultrasonic overtones, why would a computer need them? Not to mention that most mid-range microphones won't pick up such overtones anyway. There isn't a flaw in their logic, you're just arguing that there might be more information that a computer can use -- but the fact that we don't need it leads to the conclusion that a computer doesn't need it either.

adiabatty · on July 22, 2016

They're likely thinking of people talking to people over the Internet (Skype/Mumble/Discord), not talking to machines (Siri/OK Google).