ElevenLabs has been around for a while now. The genie has been out of the bottle for a bit, and the sooner the notion that anything digital can be easily faked seeps into the wider consciousness, the better. Trust nothing.
I've seen some prank calls (a YouTuber cloned Tucker Carlson's voice and called Alex Jones), but he only had a sound bank with a few pre-generated lines, and it fell apart pretty quickly.
At least for now there's too much lag to hold a real-time conversation with a cloned voice.
Speech to Text > LLM Response > Generate Audio
If that time can shrink to subsecond, I think there'll be madness. (Specifically thinking of romance scammers)
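To make the "subsecond" point concrete, here's a rough sketch of the latency budget for that three-stage pipeline. The stage names mirror the chain above; the millisecond figures and the 1000 ms budget are made-up placeholders for illustration, not measurements of any real service.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    latency_ms: float  # hypothetical figure, not a measurement


def total_latency_ms(stages):
    """Stages run one after another, so their latencies simply add up."""
    return sum(stage.latency_ms for stage in stages)


def is_conversational(stages, budget_ms=1000.0):
    """True if the whole round trip fits under the (assumed) one-second budget."""
    return total_latency_ms(stages) < budget_ms


# Placeholder numbers chosen only to illustrate the arithmetic.
pipeline = [
    Stage("speech_to_text", 400.0),
    Stage("llm_response", 900.0),
    Stage("generate_audio", 700.0),
]

print(total_latency_ms(pipeline), is_conversational(pipeline))
```

Because the stages are sequential, every one of them has to get faster (or be streamed and overlapped) before the total drops under a second; speeding up just one doesn't get you there.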
At last summer's WeAreDevelopers World Congress in Berlin, one of the talks I went to was by someone who did this with their own voice, to better respond to (really long?) WhatsApp messages they kept getting.
It worked a bit too well, as it could parse the sound file and generate a complete response faster than real-time, leading people to ask if he'd actually listened to the messages they sent him.
Also they had trouble believing him when he told them how he'd done it.
Believe it or not, this is how much of the population saw The Internet when it first came close to being mainstream. Everyone and their mother said "Don't believe anything you read on the cybernet", which ended up ironic as everyone and their mother ended up being the ones to believe anything on the cybernet anyways.
> everyone will live in a warped and fragile alternate reality that no one can agree on.
How is this any different from today? The various corners of the internet (which are mostly divided by language: English, Russian, Spanish, Chinese and Portuguese) already have vastly different realities and ground truths.
I'm sure we could survive another Internet-Winter where people trust everything a bit less than today.
It's vastly different than today because today (or at least a few years ago), I could trust videos and voices delivered digitally. I can't do that anymore.
How long has society had voice and video delivered digitally? We managed to survive fine before we had it.
If it now becomes impossible to trust a voice received through the internet unless it's tied to a verified telephone number, I don't know how that can be classified as society-changing.
Technology and society will adapt. Just as we adapted encryption to verify credentials and secure banking data online, we'll end up with a validation signal for video and audio.
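For a sense of what such a validation signal might look like at its simplest, here's a minimal sketch that attaches an authentication tag to a blob of media bytes and checks it later. It uses HMAC from Python's standard library purely to keep the example self-contained; that's a shared-secret scheme, so it's an assumption for illustration only. A real deployment would more plausibly use public-key signatures so anyone can verify without holding the secret. The function names `sign_media` and `verify_media` are hypothetical.

```python
import hashlib
import hmac


def sign_media(key: bytes, media: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the raw media bytes."""
    return hmac.new(key, media, hashlib.sha256).hexdigest()


def verify_media(key: bytes, media: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign_media(key, media)
    return hmac.compare_digest(expected, tag)


# Illustrative values only; a real key would be random and kept secret.
key = b"example-shared-secret"
clip = b"raw audio bytes would go here"
tag = sign_media(key, clip)

print(verify_media(key, clip, tag))         # untouched clip
print(verify_media(key, clip + b"x", tag))  # clip altered after signing
```

The point is just that any change to the bytes after signing makes verification fail. It says nothing about whether the original recording was genuine, only that it hasn't been altered since whoever holds the key tagged it.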