AudioGen: Textually Guided Audio Generation (felixkreuk.github.io)
146 points by pierre on Sept 30, 2022 | 16 comments



The last thing you'll hear before the AI eats you: https://felixkreuk.github.io/text2audio_arxiv_samples/large_...


It would be very interesting indeed to have an ebook reader, paired with Bluetooth earphones, that feeds the words on the page into this as you read to make an ambient soundtrack, perhaps also choosing music appropriate to the word choice on the page.


That could be another missing piece for generative videogame art: sound effects now, and soon soundtracks.


The speech samples are really funny. Very Sims-esque.


I found them very unsettling. My brain is trying so hard to resolve words from that mess. This is the first time I’ve really thought about how the uncanny valley applies to spoken words.


The valley has no end https://youtu.be/Vt4Dfa4fOEY


It would be more useful if it could narrate text along with those background effects.


You can already achieve that by combining models - use a dedicated speech synthesis model for the narration, then layer that over background effects from AudioGen.

Given that, I don't think AudioGen particularly needs to add full narration. That seems like a very different problem to me, likely requiring a completely different architecture.
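
For what it's worth, the layering step itself is simple. Here is a minimal sketch, assuming both models give you mono float samples in [-1.0, 1.0] at the same sample rate. The names `layer_audio` and `background_gain` are made up for illustration; none of this is AudioGen's or any speech model's API.

```python
from typing import List


def layer_audio(
    narration: List[float],
    background: List[float],
    background_gain: float = 0.3,
) -> List[float]:
    """Mix a narration track over a quieter background track.

    Assumes both inputs are mono, share one sample rate, and hold
    float samples in [-1.0, 1.0]. The shorter track is padded with silence.
    """
    length = max(len(narration), len(background))
    mixed = []
    for i in range(length):
        voice = narration[i] if i < len(narration) else 0.0
        ambience = background[i] if i < len(background) else 0.0
        sample = voice + background_gain * ambience
        # Hard-clip so the sum never leaves the valid sample range.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

In practice you would probably resample both tracks to a common rate first and use something gentler than hard clipping, but the idea is the same: scale the ambience down and add it to the voice.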


What is the current state of the art speech synthesis model?


It was Tacotron 2[0] (Google's model, which Nvidia published a widely used implementation of), but now I believe it's NaturalSpeech[1].

[0] https://paperswithcode.com/method/tacotron-2
[1] https://speechresearch.github.io/naturalspeech/


Thank you


-__- I wish researchers would train a stereo 44.1kHz version... why always 16kHz? I know, I know, 16kHz saves compute, but come ooooon, you're Meta.


Text2audio is impressive, but I wanna see dance2audio. Just need a million dollars in funding to pay for cameras and dancers.


[code] redirects to the same page


According to one of the authors, the code and the models will be available soon [0]

[0] - https://twitter.com/FelixKreuk/status/1575846953333579776


s/textually/sexually

i giggled :)



