
Seems really opinionated, in a bad way:

> rates are internally converted to 48 kHz

> only frequencies up to 20 kHz are encoded.

> In particular, software developers should not use Opus Custom for 44.1 kHz support

There are times when I need to not drop non-audible frequencies. Like when my microphone is on one system and my voice recognition is on another, voice recognition needs the full spectrum for accuracy. There are times when I need 44.1kHz and not 48 kHz, like on an embedded system with everything running at that and no performance left for converting or extra PCM channels for playing with different settings at once.

They keep saying it is designed for the internet, but did they miss the fact that all major players now have voice assistants like Siri, Google Assistant, Cortana? Did they miss the fact that more and more sensors are cheap embedded throwaway IoT devices? It's like it is designed for the internet of the 90s.




Voice recognition is very accurate on Opus (especially when trained on it). Commentary from an eng at a big voice rec house to me was that switching to opus _improved_ accuracy on their testing.

> There are times when I need 44.1kHz and not 48 kHz, like on an embedded system with everything running at that and no performance left for converting or extra PCM channels

Unfortunately we live in an interoperable world. And there are _many_ devices running at 48kHz without the resources to resample. Your devices would be unable to interoperate.
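To give a rough sense of what converting between the two rates involves, here is a small sketch of the arithmetic only. The names are made up for illustration and none of this is taken from the Opus code; a real resampler also needs a low-pass filter, which is where the CPU cost comes from.

```python
from fractions import Fraction

# Rates from the discussion above; the constant names are illustrative.
SOURCE_RATE_HZ = 44_100
TARGET_RATE_HZ = 48_000

# Reduce the rate ratio to lowest terms: upsample by `up`, downsample by `down`.
ratio = Fraction(TARGET_RATE_HZ, SOURCE_RATE_HZ)
up, down = ratio.numerator, ratio.denominator
print(f"resample ratio: {up}/{down}")


def output_samples(input_samples: int) -> int:
    """Number of whole output samples produced for a given input length."""
    return input_samples * up // down


# 10 ms of audio at 44.1 kHz is 441 samples.
print(output_samples(441))
```

The ratio does not reduce to anything small, which is part of why 44.1 kHz to 48 kHz conversion is more awkward than, say, 24 kHz to 48 kHz.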

Being "opinionated" in this way is specifically to accomplish the goal of guaranteeing interoperability in a world with cheap throwaway IoT devices.

> the fact that all major players now have voice assistants like Siri, Google Assistant, Cortana?

You might want to look at how these systems are sending audio...


Good points, but I just wanted to point out that we are only missing a 4 kHz range (20-24 kHz) as the highest frequency a 48 kHz digital signal can represent is 24 kHz (Nyquist sampling theorem).
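The arithmetic, as a quick sketch (the function and constant names are just for illustration; the 48 kHz and 20 kHz figures are the ones quoted at the top of the thread):

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampled signal can represent: half the sample rate."""
    return sample_rate_hz / 2.0


OPUS_INTERNAL_RATE_HZ = 48_000   # "rates are internally converted to 48 kHz"
OPUS_ENCODED_LIMIT_HZ = 20_000   # "only frequencies up to 20 kHz are encoded"

nyquist = nyquist_hz(OPUS_INTERNAL_RATE_HZ)
missing_band_hz = nyquist - OPUS_ENCODED_LIMIT_HZ

print(f"Nyquist limit at 48 kHz: {nyquist:.0f} Hz")
print(f"Band not encoded: {missing_band_hz:.0f} Hz wide")
```

For comparison, `nyquist_hz(44_100)` gives 22050 Hz, so a 44.1 kHz stream tops out only about 2 kHz above the 20 kHz limit.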

MP3s actually tend to have a worse cutoff depending on encode settings, with values from 20.5 down to 16 kHz [1].

[1] https://www.whatinterviewprep.com/prepare-for-the-interview/...


I agree but

> voice recognition needs the full spectrum for accuracy

do you have a source for this? Voice signals are conventionally low-bandwidth; 16kHz is usually "good enough" for human-human transmission. Formant frequencies top out around 3kHz [1] and upper vocal harmonics are not really important outside musical applications. Consonants are a bit more complicated but I'd be interested to know what voice information is present above 20kHz.

[1] https://en.wikipedia.org/wiki/Formant#Formants_and_phonetics


yeah, i can't hear anything past 13.5kHz and can understand speech just fine. can't imagine why a computer couldn't.


Opus works great for speech recognition but I wanted to point out how your argument doesn't support the conclusion logically.

Let's imagine that human speech had the nearly unique property of carrying a whole second copy of the speech as loud ultrasonic overtones at 10x the normal frequency.

You couldn't hear them, and yet you would hear speech fine. But a computer could make good use of the ultrasound portion, and maybe understand speech much better than you as a result.

This isn't how it works in reality, but it does show a flaw in your logic.


The argument is that if a human brain can recognise speech accurately without needing your hypothetical ultrasonic overtones, why would a computer need them? Not to mention that most mid-range microphones won't pick up such overtones anyway. There isn't a flaw in their logic. You're just arguing that there might be more information that a computer can use, but the fact that we don't need it leads to the conclusion that a computer doesn't need it either.


They're likely thinking of people talking to people over the Internet (Skype/Mumble/Discord), not talking to machines (Siri/OK Google).



