
Very cool. Seems like they addressed the problem of hallucination. It would be interesting to see an example of it hallucinating without redundancy and being corrected with redundancy.



Isn't packet loss concealment (PLC) a form of hallucination? Not saying it's bad, just that it's still Making Shit Up™ in a statistically-credible way.


Well, there are different ways to make things up. We decided against using a pure generative model to avoid making up phonemes or words. Instead, we predict the expected acoustic features (using a regression loss), which means the model is able to continue a vowel. If unsure, it'll just pick the "middle point", which won't be something recognizable as a new word. That's in line with how traditional PLCs work; it just sounds better. The only generative part is the vocoder that reconstructs the waveform, but it's constrained to match the predicted spectrum, so it can't hallucinate either.
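For intuition, here's a toy numpy sketch (not the codec's actual code; the feature vectors are invented) of why a regression loss tends toward a "middle point" while a generative model commits to a specific, possibly wrong, continuation:

    import numpy as np

    # Toy illustration (not the actual Opus PLC model): with a regression loss,
    # an uncertain prediction collapses toward the "middle point" of the
    # plausible continuations, whereas a generative model commits to one of
    # them and can therefore invent a phoneme that was never there.

    rng = np.random.default_rng(0)

    # Pretend these are acoustic feature vectors for two equally plausible
    # continuations of a lost frame (numbers are made up).
    candidates = np.array([
        [1.0, 0.2, -0.5],   # e.g. the current vowel simply continues
        [-1.0, 0.8, 0.3],   # e.g. a different phoneme starts
    ])

    # The L2-optimal regression output under this uncertainty is the mean.
    regression_prediction = candidates.mean(axis=0)

    # A generative model instead samples one candidate and commits to it.
    generative_sample = candidates[rng.integers(len(candidates))]

    print("regression ->", regression_prediction)   # bland "middle point"
    print("generative ->", generative_sample)       # confident, possibly wrong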


Any demos of this to listen to? It sounds potentially really good.


There is a demo in the link shared by OP.


That's really cool. Congratulations on the release!


The PLC intentionally fades off after around 100 ms so as not to cause misleading hallucinations. It is really just about filling small gaps.
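A rough sketch of that behaviour, assuming 48 kHz audio and a made-up ramp length (the real PLC's fade shape and constants aren't specified here):

    import numpy as np

    # Sketch of "conceal for ~100 ms, then fade to silence". The ramp shape
    # and FADE_LENGTH_MS are assumptions for illustration, not Opus internals.

    SAMPLE_RATE = 48000      # samples per second
    FADE_START_MS = 100      # full-level concealment up to roughly 100 ms
    FADE_LENGTH_MS = 20      # assumed length of the ramp down to silence

    def concealment_gain(n_lost_samples: int) -> np.ndarray:
        """Per-sample gain for a concealed gap: 1.0 until FADE_START_MS,
        then a linear ramp to 0 over FADE_LENGTH_MS, then silence."""
        t_ms = np.arange(n_lost_samples) * 1000.0 / SAMPLE_RATE
        return np.clip(1.0 - (t_ms - FADE_START_MS) / FADE_LENGTH_MS, 0.0, 1.0)

    # A 200 ms gap: concealed audio plays at full level for the first ~100 ms,
    # then fades out instead of guessing further into the future.
    gap = np.ones(SAMPLE_RATE // 5)             # placeholder concealed audio
    print(concealment_gain(len(gap))[::2400])   # gain sampled every 50 ms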


In a broader context, though, this happens all the time. You’d be surprised what people mishear in noisy conditions. (Or if they’re hard of hearing.) The only thing for it is to ask them to repeat back what they heard, when it matters.

It might be an interesting test to compare what people mishear with and without this kind of compensation.


As part of the packet loss challenge, there was an ASR word accuracy evaluation to see how PLC impacted intelligibility. See https://www.microsoft.com/en-us/research/academic-program/au...

The good news is that we were able to improve intelligibility slightly compared with filling with zeros (it's also a lot less annoying to listen to). The bad news is that you can only do so much with PLC, which is why we then pursued the Deep Redundancy (DRED) idea.
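For anyone curious what that kind of evaluation looks like, here's a minimal sketch: compute word accuracy (1 − WER) of ASR transcripts of zero-filled vs. concealed audio against a reference transcript. The transcripts below are invented placeholders; the actual challenge used a real recognizer on real degraded audio.

    # Minimal word-accuracy comparison. The transcripts are made-up placeholders;
    # in a real evaluation they would come from running ASR on the degraded audio.

    def word_error_rate(reference: str, hypothesis: str) -> float:
        """Standard WER computed via word-level edit distance."""
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edits to turn the first i reference words into the first j
        # hypothesis words.
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                               dp[i][j - 1] + 1,          # insertion
                               dp[i - 1][j - 1] + cost)   # substitution/match
        return dp[len(ref)][len(hyp)] / max(len(ref), 1)

    reference    = "please confirm the block on the up fast line"
    asr_zeros    = "please confirm the on the fast line"           # invented
    asr_with_plc = "please confirm the block on the up fast line"  # invented

    for name, hyp in [("zero fill", asr_zeros), ("PLC", asr_with_plc)]:
        print(f"{name}: word accuracy = {1 - word_error_rate(reference, hyp):.2f}")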


Right, this is why proper radio calls for a lot of systems have mandatory read-back steps, so that we're sure two humans have achieved a shared understanding regardless of how sure they are of what they heard. It not only matters whether you heard correctly, it also matters whether you understood correctly.

E.g. a train driver asks for an "Up Fast" block. His train is sat on Down Fast, the Up Fast is adjacent, so once the block is granted he can walk on the (now safe) railway track and inspect his train at track level, which is exactly what he, knowing the fault he's investigating, was taught to do.

Signaller hears "Up Fast" but thinks duh, stupid train driver forgot he's on Down Fast. He doesn't need a block, the signalling system knows the train is in the way and won't let the signaller route trains on that section. So the Up Fast line isn't made safe.

If they leave the call here, both think they've achieved understanding but actually there is no shared understanding and that's a safety critical mistake.

If they follow a read-back procedure they discover the mistake. "So I have my Up Fast block?" "You're stopped on Down Fast, you don't need an Up Fast block". "I know that, I need Up Fast. I want to walk along the track!" "Oh! I see now, I am filling out the paperwork for you to take Up Fast". Both humans now understand what's going on correctly.


To borrow from Joscha Bach: if you like the output, it's called creativity. If you don't, it's called a hallucination.


That sounds funny, but is it true? Certainly there's a bias in the direction your quote describes, but would you otherwise genuinely call the computer creative? Is that a positive aspect of a speech codec, or of an information source?

Creativity is when you ask a neural net to create a poem, or something else from "scratch" (meant to be unique). Hallucination is when you didn't ask it to make its answer up, but to recite or rephrase things it has directly observed.

That's my layman's understanding anyway, let me know if you agree


That's almost the same. You could say it's being creative by not following directions.

Creativity isn't well-defined. If you generate things at random, they are all unique. If you then filter them to remove all the bad output, the result could be just as "creative" as anything someone could write. (In principle. In practice, it's not that easy.)

And that's how evolution works. Many organisms have very "creative" designs. Filtering at scale, over a long enough period of time, is very powerful.

Generative models are sort of like that in that they often use a random number generator as input, and they could generate thousands of possible outputs. So it's not clear why this couldn't be just as creative as anything else, in principle.

The filtering step is often not that good, though. Sometimes it's done manually, and we call that cherry-picking.


Doesn't the context affect things much more than whether you like the particular results?

Either way, "creativity" in the playback of my voice call is just as bad.


I love that, what's it from? (My Google-fu failed.) Unexpected responses are often a joy when using AI in a creative context. https://www.cell.com/trends/neurosciences/abstract/S0166-223...


It was from one of his podcast appearances. Which doesn't narrow it down much, unfortunately. Most likely options:

https://www.youtube.com/watch?v=LgwjcqhkOA4

https://www.youtube.com/watch?v=sIKbp3KcS8A

https://www.youtube.com/watch?v=CcQMYNi9a2w


There's something darkly funny about Opus acting psychotic because it glitched out or was fed something really complex. But you could argue transparent lossy compression at 80–320 kbps is a controlled, deliriant-like hallucination, given how few people can tell it apart from lossless.



