Hacker News

My first and ongoing thought is that immoral/criminal uses of voice cloning vastly exceed any legitimate ones.



My first thought is anonymity. I can make YouTube videos without needing to use my real voice... while being able to keep my personal inflections and emphasis, something TTS (AI) voices can't do.

Or...! Indie game development. I can learn basic voice acting (to get rid of the cringe), and act out all of my characters using different voices.


Indie game development and animated short content are the primary uses of this type of NN for me. I’m working (not very successfully) on a single-source-voice to many-target-voice ‘style transfer’ solution using standard PyTorch components. Realistically, I can pay the owner of each target voice to record a varied set of vocal performances. The hope is that if the net is trained specifically on my voice as the source, the transfer can capture the ‘performance’ qualities of my original.

And in case anyone is concerned, I intend to make the purpose of the vocal samples clear to the provider and then arrange appropriate credit and compensation for those whose voices I use. I also don’t intend to train with anything but public domain and purchased data.


Out of curiosity, what/how many legitimate use cases have you considered?


Potential legitimate uses I can think of -

1. Licensing a voice for other uses - people with recognizable, trademarkable voices (actors, singers) gain another potential revenue stream. Yay!

2. Use of past voices - voices from the past that are not 'owned', let's say Humphrey Bogart's, can be used in projects without having to pay for an imitator. This would be useful for both marketing and artistic projects, though probably less so for marketing, because marketers will want to go with option 1.

3. Teach yourself to talk like X. People who need to learn to talk like a particular person / have a particular accent could learn quicker. Just think - you will be able to supplement your comedy routine with kickass Christopher Walken impersonations any day now!

Variations of 3 and 2 together open up interesting modes of aesthetic impression, but I won't go into that here. But definitely I have some ideas that might benefit from being able to do this.


Surely software that communicates with people using natural language should be topping all these lists. Direct communication through voice with a local LLM is already possible. It won't be long before it's fast enough to approach natural interaction (if we can solve the "when is it my turn to speak" challenge) and then we enter a new phase of digital interaction and AI training.


A legitimate use, in the abstract, is one where a particular individual is willing to have their voice used to say X. The entertainment industry - movies and games - is likely to want this.

But if it's trivial to use somebody's voice to say any arbitrary thing, then it'll be done. Combined with deepfake videos, the result will be the ability to show anyone saying anything, including lies and things they find incredibly objectionable, in a disturbingly realistic way, and more so as time wears on.

The fundamental issue is that we don't live in a rights-respecting world. Making it easy to utter anything in the voice of anyone will lead to many more abuses than legitimate instances.


People will become immune to it if they aren't already. It's already common to fake screenshots of tweets, etc. It's not a real problem unless you want to believe falsehoods, and then you will anyway.


What of commercial uses being greater than illegitimate ones? YouTube will give people the ability to hear a video in their own language, in the author's voice.


I disagree, we should just not accept voice as authentication.

I think the most common use case will be making art & content programmatically without voice actors (and most likely without actors at all once we nail video or a 3d model pipeline + frame by frame transformation to make it look realistic)


Talk with your loved ones and agree on a passphrase for when you're stuck in an emergency and need money wired or something.

Some banks have voice authentication when you call in and you have to ask to opt out.


Which just means we need to build protocols around this risk rather than foolishly trying to shove the genie back in the bottle, lest we be left with only the criminal uses.



