It's not just a random "voice for your chatbot"; it's that particularly breathy, chatty voice that she performed for the movie.
I would agree with you completely if they'd created a completely different voice. Even if they'd impersonated a different famous actress. But it's the fact that Her was about an AI, and this is an AI, and the voices are identical. It's clearly an impersonation of her work.
1. Multiple people agree that the casting call mentioned nothing about SJ/her
2. The voice actress claims she was not given instructions to imitate SJ/her
3. The actress's natural voice sounds identical to the AI-generated Sky voice
I don't personally think it's anywhere near "identical" to SJ's voice. It seems most likely to me that they noticed the similarity in concept afterwards and wanted to try to capitalize on it (hence later contacting SJ), as opposed to the other way around.
>I don't personally think it's anywhere near "identical" to SJ's voice. It seems most likely to me that they noticed the similarity in concept afterwards and wanted to try to capitalize on it (hence later contacting SJ), as opposed to the other way around.
So your theory is that this was completely coincidental. But after the voice was recorded, they thought, "Wow, it sounds just like the voice of the computer in Her! We should contact that actress and capitalize on it!"
That's what you're going with? It doesn't make sense, to me.
Listen to the side-by-side comparisons. Sky has a deeper voice overall, and in the GPT-4o demo Sky displays a wider pitch range because the omni model is capable of emotional intonation. Her voice slides quite a bit while emoting but notably doesn't break, and when she returns to her normal speaking voice you can hear a very distinct rhotic sound, almost an over-pronounced American accent. She also has a tendency towards deepening into vocal fry, especially before pauses. I'd describe her voice as mostly in her chest when speaking clearly.
Now listen to SJ's Samantha in Her and the first thing you'll notice is the voice breaks, and that they break to a higher register with a distinct breathy sound; it's clearly falsetto. SJ seems to have this habit in her normal speaking voice as well, but it's not as exaggerated and seems more accidental. Her voice is very much in her head or mask. The biggest commonality I can hear is that they both have a sibilant S and their regional accents are pretty close.
I was thinking someone thought "oh that sounds a fair bit like SJ in Her, if we can get SJ onboard, perhaps we can fine-tune what we got to sound like SJ in Her".
... that it would be even better to have a famous voice from Her than the rather generic female voice they had, but their proposal was declined. Well oops, but SJ, famous as she is, doesn't hold a copyright on all female voices other than her own.
No-one had to explicitly say any of that for it to still be an impersonation. Her was a very popular film, and Johansson's voice character was very compelling. They literally could have said nothing and just chosen the voice audition closest to Her unconsciously, because of the reach of the film, and that would still be an impersonation.
> They literally could have said nothing and just chosen the voice audition closest to Her unconsciously, because of the reach of the film, and that would still be an impersonation
That's a very broad definition of impersonation, one that does not match the legal definition, and one that would be incredibly worrying for voice actors whose natural voice happens to fall within a radius of a celebrity's natural voice ("their choice to cast you was unconsciously affected by similarity to a celebrity, therefore [...]")
What you're arguing fails to pass the obviousness test; if I were running the company it would be blatantly obvious that the optics would be a problem, so I would start to collect a LOT of paperwork documenting that the casting selection was done without a hint of bias towards a celebrity's impression. Where is that paperwork? The obviousness puts the burden on them to show it.
Otherwise your argument lets off not just this scandal but an entire conceptual category of clever sleazy moves that are done "after the fact". It's not the Kafka trap you're making it out to be.
> if I were running the company it would be blatantly obvious that the optics would be a problem, so I would start to collect a LOT of paperwork documenting that the casting selection was done without a hint of bias towards a celebrity's impression. Where is that paperwork? The obviousness puts the burden on them to show it.
I think optics-wise the best move at the moment is quelling the speculation that they resorted to a deepfake or impersonator of SJ after being denied by SJ herself. The article works towards this by attesting that it's a real person, speaking in her natural voice, without instruction to imitate SJ, from a casting call not mentioning specifics, cast months prior to contacting SJ. Most PR effort should probably go into giving this as much reach as possible among those who saw the original story.
Would those doing the casting have the foresight to predict, not just that this situation would emerge, but that there would be a group considering it impersonation for there to be any "hint of bias" towards voices naturally resembling a celebrity in selection between applicants? Moreover, would they consider it important to appeal to this group by altering the process to eliminate that possible bias and providing extensive documentation to prove they have done so, or would they instead see the group as either a small fringe or likely to just take issue with something else regardless?
> Would those doing the casting have the foresight to predict, ...
Yes, this should all have been obvious to those people. It would require a pretty high degree of obliviousness for it to not be obvious that this could all blow up in exactly this way.
It blew up by way of people believing it was an intentional SJ deepfake/soundalike hired due to being rejected by SJ. I think this article effectively refutes that.
I don't think it blew up by way of people believing simply that those doing the casting could have a hint of a subconscious bias towards voices that sound like celebrities. To me that seems like trying to find anything to still take theoretical issue with, and the complaint would've just been about something else had they made the casting selection provably unbiased and thoroughly documented.
Again, I think it requires a high degree of obliviousness to not have the foresight during casting to think, "if we use a voice that sounds anything like the voice in the famous smash hit movie that mainstreamed the idea of the kind of product we're making, without actually getting the incredibly famous voice actress from that movie to do it, people will make this connection, and that actress will be mad, and people will be sympathetic to that, and we'll look bad and may even be in legal hot water". I think all of that is easily predictable!
It seems way more likely to be a calculated risk than a failure of imagination. And this is where the "ethics" thing comes into play. They were probably right about the risk calculation! Even with this blow-up, this is not going to bring the company down, it will blow over and they'll be fine. And if it hadn't blown up, or if they had gotten her on board at any point, it would have been a very nice boon.
So while (in my view) it definitely wasn't the right thing to do from a "we're living in a society here people!" perspective, it probably wasn't even a mistake, from a "businesses take calculated risks" perspective.
> "if we use a voice that sounds anything like the voice in the famous smash hit movie that mainstreamed the idea of the kind of product we're making [...]
I think it's deceptively easy to overestimate how likely it is for someone to have had some specific thought/consideration when constructing that thought retroactively, and this still isn't really a specific enough thought to have caused them to have set up the casting process in such a way to eliminate (and prove that they have eliminated) possible subconscious tendency towards selecting voice actors with voices more similar to celebrities.
But, more critically, I believe the anger was based on the idea that it may be an intentional SJ soundalike hired due to being turned down by SJ, or possibly even a deepfake. Focusing on refuting that seems to me the best PR move even when full knowledge of what happened is available, and that's what they're doing.
I'm sorry, but your first paragraph is a level of credulity that I just can't buy, to the point that I'm struggling to find this line of argument to be anything besides cynical. The most charitable interpretation I might buy is that you think the people involved in this are oblivious, out of touch, and weird to a degree I'm not willing to ascribe to a group of people I don't know.
If you are an adult living and working in the US in the 2020s, and you are working on a product that is an AI assistant with a human voice, you are either very aware of the connection to the movie Her, or are disconnected from society to an incredible degree. I would buy this if it were a single nerd working on a passion project, but not from an entire company filled with all different kinds of people.
The answer is based on "they wanted a voice that sounds like the one in Her, but the person whose voice that is told them no, but then they did it anyway". The exact sequence of events isn't as important to the anger as you seem to think, though it may be more important to the legal process.
My claim is not that they hadn't heard of the movie Her. It is that, while setting up auditions, the chain of thought that would lead them to predict that a group would take issue in this very particular way (marcus_holmes's assertion that unconsciously favoring the VA's audition would constitute impersonation), one that necessitates the proposed rigor (setting up auditions in a way that eliminates the possibility of such bias, with paperwork to prove as much), and to consider it worthwhile to appease the group holding this view, is not so certain to have occurred that the seeming lack of such paperwork can be relied on to imply much at all.
I would go further and say that chain of reasoning is not just uncertain to have occurred, but would probably be flawed if it did, in that I don't think it would noticeably sway that group. That is as opposed to the evidence in the article, or some other possible forms of evidence, which I think can sway people.
> The exact sequence of events isn't as important to the anger as you seem to think, though it may be more important to the legal process.
Less the order of events, and more "seeking out an impersonator and asking them to do an imitation" vs "possibility of unconscious bias when selecting among auditions"
The way you write it makes it sound very complicated, but in this situation, I would definitely think "we better be really careful about who we hire here in order to avoid people making the connection with the movie voice, unless we can actually get Scarlett Johansson to do the voice", and that thought process would take less than 5 seconds.
And it is not unusual at all for there to be things that everyone knows should not be written down, but either discussed only in person, or left implicit. There are usually a few slip-ups though, which would come out in discovery.
> "possibility of unconscious bias when selecting among auditions"
I think "conscious but not stated to the actress" is the more likely explanation, and that is not inconsistent with this reporting.
For what it's worth, if this does go to court (which I doubt), and there is discovery and depositions, and they don't find any documentation, or get any statements suggesting that this was indeed understood to be the goal, then I would be a lot more convinced.
But I think it's a giant stretch to have the base case be that nobody thought of this and they were all shocked, shocked! that people made this connection after they released it.
Wouldn't say it's complicated, but it is a specific point. Attacking claims like "they were all shocked, shocked! that people made this connection after they released it" is meaningless when that is not a claim I'm making or relying on. This stems from me disputing a claim that the VA impersonated SJ/her, because of possible unconscious bias of the casting directors, and the supposed obviousness that they would've set up and extensively documented the auditions in such a way to disprove that.
I'd be more convinced, at least of the fact that it would have even been a good call, if I saw outrage sparked by the possibility of unconscious bias, as opposed to what can be or has been addressed by other forms of evidence. Claims along the lines of "I'd totally have thought [...]" made in retrospect are entirely unconvincing, particularly in cases where the suggested thought is not sufficient.
> I don't know why you started focusing on "unconscious bias"
That's what I've been taking issue with from the beginning of this chain[0]. In all but one comment since then I've explicitly specified "[un|sub]conscious bias".
On that topic, would you agree with me that it is not "obvious" that they would predict a group would take issue in this very particular way such that it would necessitate setting up and documenting auditions to prove they have eliminated such bias, and then additionally determine it worthwhile to actually do so?
Fair enough. I guess I just shouldn't have responded. I can't really say whether I agree with you or not; I think the whole line of speculation is a non sequitur.
A lot of legal constructs are defined by intent, and intent is always something that is potentially hard to prove.
At most the obviousness should put the burden of discovery on them, and if they have no records or witnesses that would demonstrate the intent, then they should be in the clear.
> I would start to collect a LOT of paperwork documenting that the casting selection was done without a hint of bias towards a celebrity's impression.
IMO having records that explicitly mention SJ or Her in any way would be suspicious.
SJ's voice has some very distinctive characteristics and she has distinctive inflections that she applies. None of that inflection, tonality, or character is present in the chat bot voice. Without those elements, it can be said to be a voice with vaguely similar pitch and accent, but any reasonable “impersonation” would at least attempt to copy the mannerisms and flairs of the voice they were trying to impersonate.
Listening to them side by side, the OpenAI voice is more similar to Siri than to SJ. That Sam Altman clearly wanted SJ to do the voice acting is irrelevant, considering the timings and the voice differences.
Intent on whose part, though? Like, suppose arguendo that the company's goal was to make the voice sound indistinguishable from SJ's in Her, but they wanted to maintain plausible deniability, so instead cast as wide a net as possible during auditions, happened upon an actor who they thought already sounded indistinguishable from SJ without special instruction, and cast that person solely for that reason. That seems as morally dubious to me as achieving the same deliberate outcome by instruction to the performer.
> happened upon an actor who they thought already sounded indistinguishable from SJ without special instruction, and cast that person solely for that reason
So who was doing the selecting, and were they instructed to perform their selection this way? If there was a lawsuit, discovery would reveal emails or any communique that would be evidence of this.
If, for some reason, there is _zero_ evidence that this was chosen as a criterion, then it would be pretty hard to prove the intent.
I have this sinking feeling that in this whole debate, anyone's position mostly depends on whether they think it's good that OpenAI exists or not.
No, I'm happy that OpenAI exists. But alarmed that they're being so mendacious.
If they just said "we loved the film, we wanted that feel, SJ wasn't willing, so we went for it anyway. Obviously that's backfired and we're rethinking" then I would have a thousand times more comfort than this corporate back-covering bullshit.
First two claims are "according to interviews with multiple people involved in the process", direct quotes from the casting call flier, and "documents shared by OpenAI in response to questions from The Washington Post". Given the number of (non-OpenAI) people involved, I think it would be difficult to maintain a lie on these points. Third claim is a comparison carried out by The Washington Post.
This is why things are decided by juries. You may well truly believe this all seems unrelated and above board. But very few people will agree with you when presented with these facts, and it would be hard to find them during a jury selection.
> > 3. The actress's natural voice sounds identical to the AI-generated Sky voice
> No it doesn't.
That's a verbatim quote from the article (albeit based on brief recordings).
I haven't heard the anonymous voice actress's voice myself to corroborate WP's claim, but (unless there's information I'm unaware of) neither have you to claim the opposite.
Just to make sure: have you correctly understood the article's claim?
It's saying that the anonymous voice actress's natural voice sounds identical to the AI-generated Sky voice (which implies it has not been altered by OpenAI to sound more like SJ, nor that they had her do some impression beyond her own natural voice).
If so, could you link the clips of the voice actress's natural voice, to compare to the AI version? I've searched but was unable to find such clips.
The AI voice is what’s in question here and there are plenty of examples; just listen to ones that aren’t intentionally selected to try to create the impression of similarity. Sky has been around for half a year, so you don’t have to limit yourself to the tech demo, but even if you do, if you listen to the whole thing you’ll see the normal speaking voice is very different from ScarJo.
I know and agree. The article's claim is that the anonymous voice actress's natural voice sounds identical to the AI-generated Sky voice. That is, VA = AI, not AI = SJ or VA = SJ. That corroborates the claim that OpenAI was not asking her to do any particular impression.
This is unclear. What is clear is that OpenAI referenced Her in marketing it. That looks like it was a case of poor impulse control. But it's a basis for a claim.
How do you explain the many people saying that the voices do not sound especially similar?
"The pitch is kiiiiiind of close, but that's about it. Different cadence, different levels of vocal fry, slightly different accent if you pay close attention. Johansson drops Ts for Ds pretty frequently, Sky pronounces Ts pretty sharply. A linguist could probably break it down better than me and identify the different regions involved."
There are approximately 4 billion women in the world. Given that I know a few people who sound very similar to me, I would say that there are (subjectively) perhaps 1,000 to 10,000 different types of women's voices in the world.
This would mean that a celebrity could possess a voice similar to roughly 0.4 million to 4 million other women (4 billion divided by 10,000 and by 1,000, respectively), and potentially claim royalties if their voice is used.
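For anyone who wants to check the division, here is a minimal sketch in Python. It takes the guesses above at face value (about 4 billion women, 1,000 to 10,000 voice types) and additionally assumes voice types are spread evenly across the population, which is an illustrative simplification and not a measured fact. The function and constant names are made up for this example.

```python
# Back-of-the-envelope check of the estimate above.
# All inputs are the commenter's rough guesses, not measured data.
WOMEN_WORLDWIDE = 4_000_000_000  # "approximately 4 billion women"
VOICE_TYPES_LOW = 1_000          # lower guess for distinct voice types
VOICE_TYPES_HIGH = 10_000        # upper guess for distinct voice types


def women_per_voice_type(population: int, voice_types: int) -> int:
    """Average number of women sharing one voice type, assuming an even split."""
    return population // voice_types


# More voice types means fewer women per type, so the high guess gives the low end.
fewest = women_per_voice_type(WOMEN_WORLDWIDE, VOICE_TYPES_HIGH)
most = women_per_voice_type(WOMEN_WORLDWIDE, VOICE_TYPES_LOW)

print(f"{fewest:,} to {most:,} women per voice type")
```

Under those assumptions the range comes out in the hundreds of thousands to low millions; a different population figure or an uneven distribution of voice types would shift it.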
This type of estimate is sadly deeply flawed. Voices are affected not just by ethnicity but also by language and culture. I know because I can feel slight but noticeable differences in tone between communities just 30-50 miles apart, within the same social classes and everything. Bilinguals also sound noticeably different from monolinguals, even in their primary language.
I think people think others sound the same not because they're similar from the beginning, but because voices must homogenize under peer pressure. There's a proverb: "the nail that sticks out gets hammered". Most people probably have hammered-flat voices intended not to stand out.
You first said that Sky is "clearly an impersonation" of Johansson. Now you say that it's not a coincidence they chose Sky's voice actress. These are two different claims. It may not be a coincidence in the sense that they may have chosen Sky's actress because she sounds similar to Johansson. But that alone doesn't constitute an impersonation. Impersonation means deliberately assuming the identity of another person with the intent to deceive. So you'd have to demonstrate more than a degree of similarity to make that case.
In the Ford case they hired an impersonator to sing one of her copyrighted songs, so it's clearly an impersonation.
In OpenAI's case the voice only sounds like her (although many disagree) but it isn't repeating some famous line of dialog from one of her movies etc, so you can't really definitively say it's impersonating SJ.
A very simple answer is that they also wanted SJ as a voice. That doesn't mean the original voice was trying to copy SJ.
They had a voice, the natural comparison when watching the interaction is to Her, and there is likely still time to actually get SJ's voice before a public rollout.
In particular, I'm pretty sure you can bake a voice model well enough for a canned demo in two days given we have cloning algs which will do it with 15 seconds of samples.
Production ready? Probably not, but demos don't have to be.
I find this whole thing very confusing because I never thought the voices sounded identical or very similar. I was initially even more confused about this whole thing because I thought I couldn't find clips of the SJ AI voice only to realize I had but it didn't sound like her.
This opinion is independent of Sam being a conman, scammer, creepy Worldcoin weirdo, and so on.
I may be wrong, but I believe this case would be made to a jury, not to a judge.
I think it would be hard to seat a jury that, after the facts about the attempts to hire Johansson and the tweet at the time of release were laid out, would have even one person credulous enough to be convinced this was all an honest mix-up.
Which is why it will never in a million years go to a trial.
Can someone explain to me the outrage about mimicking a public person's voice, while half the people on Hacker News argue that it's fine to steal open source code? I fail to see the logic here. Why is this more important?