It would be more useful if it could narrate text along with those background effects...

simonw · on Sept 30, 2022

You can already achieve that by combining models - use a dedicated speech synthesis model for the narration, then layer that over background effects from AudioGen.

Given that, I don't think AudioGen particularly needs to add full narration. That seems like a very different problem to me, likely requiring a completely different architecture.
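
A rough sketch of the layering step, assuming both models give you mono waveforms as floats in [-1, 1] at the same sample rate. The function name `mix_narration_over_background` and the 0.3 gain are made up for illustration and are not part of AudioGen or any speech model, and the sine and noise signals below are stand-ins for real model output. It uses plain Python lists to stay dependency-free; in practice you would probably use an array library.

```python
import math
import random


def mix_narration_over_background(narration, background, background_gain=0.3):
    """Overlay narration on a quieter background bed (hypothetical helper).

    Both inputs are sequences of float samples at the same sample rate.
    """
    length = max(len(narration), len(background))
    # Zero-pad the shorter signal so both cover the full duration.
    narr = list(narration) + [0.0] * (length - len(narration))
    bed = list(background) + [0.0] * (length - len(background))
    # Attenuate the background so the narration stays intelligible.
    mixed = [n + background_gain * b for n, b in zip(narr, bed)]
    # Rescale only if the sum would clip outside [-1, 1].
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed


# Stand-in signals (assumptions, not real model output):
sample_rate = 16000
# 1 second of a 220 Hz tone in place of synthesized speech.
narration = [0.5 * math.sin(2 * math.pi * 220 * i / sample_rate)
             for i in range(sample_rate)]
# 2 seconds of noise in place of generated background effects.
rng = random.Random(0)
background = [rng.uniform(-0.5, 0.5) for _ in range(2 * sample_rate)]

mixed = mix_narration_over_background(narration, background)
```

The result is as long as the longer input, so the background keeps playing after the narration ends. A real pipeline would also need to resample if the two models use different sample rates.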

godmode2019 · on Oct 1, 2022

What is the current state-of-the-art speech synthesis model?

ricopags · on Oct 3, 2022

It was Tacotron 2[0] (originally from Google; Nvidia maintains a popular implementation), but now I believe it's NaturalSpeech[1].

[0] https://paperswithcode.com/method/tacotron-2

[1] https://speechresearch.github.io/naturalspeech/

godmode2019 · on Oct 16, 2022

Thank you