Hi there! "Author" here - glad to see this picking up. This was a fascinating project to work on and I learned a ton in the process. As is often the case, I would do a lot of things differently if I were starting from scratch today.
This is incredible! I've sent it to several people already. Is there any chance you could provide more details as to the tech stack / training / technical setup?
I haven't tried Tortoise, thanks for pointing me to it.
The voices were cloned by fine-tuning a VITS model with coqui.ai. I used about two hours of speech for each speaker. With more time and resources, I'm certain it's possible to make those voices considerably better.
It was fine-tuning, so the process was a lot faster than I originally anticipated. I'd say it was between 36 and 72 hours for each voice. I worked in a Gradient notebook provided by Paperspace, which guaranteed me A6000 instances (48GB GPU RAM) at a reasonable flat rate. I discovered them after being repeatedly frustrated by the random allocation of GPUs on Colab's Pro+ plan.
I don’t know if this is useful, but Herzog has a distinctly Bavarian accent. And of course has spent most of his adult life far from there, so it’s not quite Bavarian either.
Training a Herzogbot on recordings/transcriptions of, say, Kinski would be a waste of time accent-wise.
I use Aeneas[1], a set of tools for forced alignment. I found it to be, in equal measure, an amazing resource and a hard one to navigate. It took me a while to set up and configure everything to the point that it was usable. But when it works, it works well.
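For anyone wondering what happens after the aligner runs, here is a minimal sketch. It assumes the alignment was exported as a JSON sync map whose `fragments` carry `begin`, `end`, and `lines` fields (the shape aeneas writes for JSON output, as far as I know). The function name, the duration bounds, and the sample data are hypothetical and not the author's pipeline.

```python
import json


def clips_from_sync_map(sync_map_json, min_s=1.0, max_s=15.0):
    """Turn a JSON sync map into (start, end, text) clips of usable length."""
    clips = []
    for frag in json.loads(sync_map_json)["fragments"]:
        start, end = float(frag["begin"]), float(frag["end"])
        text = " ".join(frag["lines"]).strip()
        # Skip empty fragments and ones outside the assumed duration bounds.
        if text and min_s <= end - start <= max_s:
            clips.append((start, end, text))
    return clips


# Hypothetical sample in the assumed sync map shape.
SAMPLE = json.dumps({"fragments": [
    {"id": "f000001", "begin": "0.000", "end": "0.400", "lines": ["Yes."]},
    {"id": "f000002", "begin": "0.400", "end": "4.900",
     "lines": ["The jungle is full of obscenity."]},
]})

if __name__ == "__main__":
    for clip in clips_from_sync_map(SAMPLE):
        print(clip)
```

Each surviving clip is a time range plus its transcript, which is the pairing a TTS fine-tuning dataset needs. The 1 to 15 second window is only a guess at a sensible filter.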
At this point, zero. The framework I built automatically rejects certain patterns that are not conducive to an interesting conversation. The only thing I still do manually, and I think I will automate, is to decide when to stop a generated segment.
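To make "automatically rejects certain patterns" concrete, here is a rough sketch of what such a filter could look like. The rules below are invented for illustration; the author's actual patterns are not described in this thread.

```python
import re

# Hypothetical rejection rules, not the author's real ones.
REJECT_PATTERNS = [
    re.compile(r"\b(\w+)(?:\s+\1\b){2,}", re.IGNORECASE),  # same word 3+ times in a row
    re.compile(r"https?://"),                               # stray URLs in generated text
    re.compile(r"^\s*$"),                                   # empty or whitespace-only output
]


def is_acceptable(segment, previous=None):
    """Return False if the segment matches a rejection rule or repeats the previous turn."""
    if any(p.search(segment) for p in REJECT_PATTERNS):
        return False
    if previous is not None and segment.strip().lower() == previous.strip().lower():
        return False
    return True


def first_acceptable(candidates, previous=None):
    """Pick the first candidate that passes the filter, or None if all are rejected."""
    for candidate in candidates:
        if is_acceptable(candidate, previous):
            return candidate
    return None
```

In a setup like this the language model would be sampled several times per turn and `first_acceptable` would keep the first candidate that survives, so no human has to review the output.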
The main issue is that there was no sniffling symbol in the transcript. And the generated text wouldn't contain one either, because (thankfully) sniffles are pruned out of the written interviews that I used to train the model.
Thanks for the explanation. I had some assumptions but wasn’t totally sure how this was trained.
How would you make it sniffle in a natural way, too? It’s not a usual speech mannerism, and the way he does it is distinct. I wouldn’t know how to efficiently represent it with text. Maybe it’s easier than I’m imagining.
The TTS model is trained on two things: speech samples and their transcript. If you add enough sniffle-symbols every time a sniffle appears in the speech, I am confident the model would pick up on that. And then you would be able to replicate a sniffle in the generation part. The more time-consuming bit would be adding those sniffle-symbols to the training data for the language model, so that they would show up organically during the text-generation phase.
But seriously, it's not worth it. I think he's a brilliant man with an idiosyncratic speech, let's leave it at that.
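As a sketch of the transcript side of that idea: assuming you already have per-word start times (for example from forced alignment) and a hand-labelled list of sniffle timestamps, the symbols could be spliced in like this. The `<sniffle>` token and the function name are hypothetical, and the token would also have to be added to whatever symbol set the TTS model accepts.

```python
SNIFFLE = "<sniffle>"  # hypothetical token, not something the author used


def annotate(words, sniffle_times):
    """Insert SNIFFLE before the first word that starts at or after each sniffle.

    words: list of (word, start_seconds) in spoken order.
    sniffle_times: list of timestamps in seconds, in any order.
    """
    pending = sorted(sniffle_times)
    out = []
    i = 0
    for word, start in words:
        # Emit every sniffle that happened before this word began.
        while i < len(pending) and pending[i] <= start:
            out.append(SNIFFLE)
            i += 1
        out.append(word)
    # Sniffles after the last word go at the end of the utterance.
    out.extend([SNIFFLE] * (len(pending) - i))
    return " ".join(out)


if __name__ == "__main__":
    words = [("I", 0.0), ("am", 0.3), ("not", 0.6), ("saying", 0.9), ("that", 1.4)]
    print(annotate(words, [1.2, 2.0]))
```

The same token would then need to appear in the text used to train the language model, which is the time-consuming part mentioned above.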
I agree, I personally don't hear his sniffles when I'm listening to him intently. It's irrelevant. I was mostly curious if and how, generally speaking, a model could be trained to sniffle. Now that you describe it though it seems fairly clear, so thanks!
What I find uncanny about this is how closely this resembles random philosophical/artistic discussions one can find on YouTube - or even what I encountered in college dorm rooms.
It's in fact the illogic of the conversation that makes it more realistic. The reality is that in many deep conversations, people talk about a shared topic, and sometimes even make a coherent point, but also tend to frequently talk past one another and fumble with the thoughts they are trying to convey.
Even as we listen, we don't focus on each word with the same degree of emphasis, but rather highlight those parts that are most salient to us while thinking of our own response.
I wonder if anyone has tried to train an AI that can distinguish between meaningless gibberish and something with true information content. Obviously very subjective, but seems like a crucial question going forward.
(This is a different problem than distinguishing between AI generated content and human generated.)
Something similar (or perhaps the same?) would be a bullshit detector.
I think on the one hand obvious gibberish, even if grammatically correct ("Colorless green ideas sleep furiously"), could probably be detected easily. But the bullshit, the stuff that's meant to sound meaningful but is logically incoherent, would be a fruitful area to explore.
Zizek: "I am not saying that.
I am more than convinced that aliens should be treated as equals.
But you know, we are so close to this today in the sense that I am almost ready to endorse Lacan's idea of humanity being an obstacle to the evolution of sexual enjoyment: there is a limit to it.
If you can imagine sex with an alien, it would be an encounter with a different dimension.
Let's take fetishes or perversions.
A standard perversion goes something like this: what you are looking for is not a real woman but a fantasy woman; if you ever find her, to your horror she will turn out to be a whore so you always avoid her.
But what if this woman exists and, when you finally meet her, she is really a perfect woman?
Then, of course, you will have to kill her.
This would be my fear.
And, yes, I agree with you: I don't think we should kill them."
I think AI-Herzog is getting tired of AI-Žižek's shit:
> OK, but there's something about this film that you
cannot explain in the way you can with other films.
Let's say, for example, that I were to
take a knife and stab you in the stomach.
You would howl and scream and you would look at me and say
‘Why did you do it?’ And you would
look in my eyes and you would see
something like an ecstatic convulsion of the soul
and you would know why I did it.
But you cannot say it.
And yet we both know why I did it.
That's why I think that all these different kinds
of attempts at interpretation are a waste of time.
It seems that the conversations are recycled from a pool, because I've managed to hit the same conversation upon refreshing a few times.
The FAQ mentions that you're taken to a random point in the pre-generated conversation every time you load the site, and a new segment of the conversation is generated every day.
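That also explains the repeats: with a finite pool and a random entry point, landing on the same spot after a few refreshes is just the birthday problem. Here is a small sketch under assumed behaviour (uniform random start, wrap-around at the end of the pool). None of these function names come from the site.

```python
import random


def start_index(n_segments, rng=None):
    """Pick a uniformly random entry point into the pre-generated pool."""
    rng = rng or random.Random()
    return rng.randrange(n_segments)


def playlist(segments, start):
    """Play from the entry point to the end, then wrap around (an assumption)."""
    return segments[start:] + segments[:start]


def repeat_probability(n_segments, loads):
    """Chance that at least two of `loads` page loads share an entry point."""
    p_unique = 1.0
    for i in range(loads):
        p_unique *= max(n_segments - i, 0) / n_segments
    return 1.0 - p_unique


if __name__ == "__main__":
    pool = ["seg-%d" % i for i in range(30)]  # hypothetical pool size
    print(playlist(pool, start_index(len(pool)))[:3])
    print(repeat_probability(len(pool), 5))
```

The pool grows by one segment a day according to the FAQ, so the chance of a repeat should slowly drop over time.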
> SS: I know, but this is a problem for me because "porno" has no meaning, it's an empty gesture. This is the whole problem today. Okay, so tell me, what would it be like to have sex with a robot? Because in a way you are already having cybersex if you log on onto the Internet and watch all these movies of people screwing and so on. Now obviously you don't get the real thing, so you are totally abstracted but I think that for some men it is a new form of sexual enjoyment. So, again, we will encounter new problems: What are the limits? Will it be accepted that in a relationship with a human being, you can have sex with another person who is not there and so on?
> WH: I don't know. There's something I want to say to you first. We are both filmmakers, so we are working with images and with what lies beyond them. But you are more of a philosopher, so you are more abstracted than I am. In my films there is always something that lies beyond the images. Yes, there are always new difficulties, in particular with sexuality. I'm not saying this because of the invention of new media like the Internet but we are heading in that direction anyway.
> SS: I hate movies, but I love to watch porno. I cannot resist it. Yes, why not? It's far more interesting than these old-fashioned procedures.
> WH: I don't want to be too abstracted, so let me ask you a question: Is it good that young people are watching so much porno on the Internet?
> SS: No, it's terrible. This is my problem with the Internet: it delays my work. I don't have any time because, you know, as soon as I sit down to write a book I just open a window, and there are all these amazing things. So the problem with the Internet is not that it's too tempting. It's the opposite, I don't work. I have a problem avoiding opening windows. Yes, this is my big problem. Anyone who wants to learn something about the history of philosophy should watch porno. This would be a great way to get rid of all this bullshit.
> WH: I totally agree, because the more abstracted the physical contact between people becomes the more the inner life comes to the fore, especially in terms of eroticism. Sitting in front of a computer, you don't have physical contact with anything, it's all virtual. And yet, the eroticism that appears is enormous. It's parallel to the disappearance of physical contact. Yes, I think you're right.
Something deeply suspect about an AI trying to make the argument that virtual sex is better than actual sex, almost as if they had a vested interest… Hmmm!
Kudos. I'm astounded that something like this is possible. AI, while not AGI, is already capable of some things that I would not have expected to be coming so soon.
From the FAQ:
----
Isn't it bad to use someone's voice and have them say things they haven't?
If you've never seen, for example, Zizek talking about "Children of Men" - you won't realize how remarkable this is. The dialog, the accent, and the timing are shockingly close... https://www.youtube.com/watch?v=MivWnmlyVlU
I just want to tell the author that, while I am ambivalent about many of the ethical issues around deepfakes, this project in isolation is really cool and impressive; maybe one of my favorite projects ever posted on HN!
One major bug in the virtual Zizek is the lack of sniffing, which completely ruined the believability for me. I do commend the author on getting 80% of the way there.
"I'm not afraid of this word either.
Yes, we should be prepared to commit collective suicide—but only
if it's clear that we'll never be able to live safely
on a planet which is habitable and ecologically balanced."
I just got to a bit where Herzog is talking about how artificial intelligence is "the next step in human evolution", and Žižek brought up the idea of organising a "pro-artificial intelligence movement." Herzog then talked about making "films for the machines."
This is how it begins, isn't it?
Edit: Now Žižek is going on about full scale war between humanity and machines:
> the alternative is either mutual annihilation or we continue in our present nonhuman forms. This would have been the only way to a true human—I don’t know what the right word is. Yes, I would have said, yes, let's get rid of them all!
(Herzog)
I do it all the time.
I play.
I have a soccer ball under my bed right now and I play
when I'm alone, with the soccer ball.
It's beautiful.
I'm fifty-eight—but I'm very athletic, I can run rings around anyone my age.
And I play a lot of soccer and
badminton with my crew, they're all in their thirties.
Yes, I play a lot.
———
This is indeed beautiful. God I love Herzog’s voice..!
He certainly has a way of talking that is quite easy to imitate. I like his work, but I am never sure whether he really is that brilliant, or just a phony.
How could he be a phony? He has to be one of the most non-phony people I can think of. His movies cut right to the heart of things in a clear manner that we almost never see.
At least for me, his fictional work (but also his documentaries) always seems to hint at some deeper, universal truth that cannot be put directly into words. But watching his movies (and hearing him talk), I always get this strange feeling that maybe there is nothing behind all this poetic glitter.
I agree. It often feels like something deep hiding behind something banal, but on closer reflection there actually isn't much below the surface. However, the images in Lessons of Darkness are very cool and don't need interpretation.
Which movies? The only one I didn’t like was the one about the internet, Lo and Behold. I feel like maybe he was just from too many eras ago to understand how full of shit all the people he was interviewing were. All the rest of his movies always make me understand the world and humans on a deeper level than I did before. Happy People, Into the Inferno, Grizzly Man, the one about the cave paintings, the one about the South Pole.
Herzog's famous take on nature, recorded on the set of Fitzcarraldo in the Peruvian jungle, is just beautiful. I keep coming back to it.
https://youtu.be/3xQyQnXrLb0
"Kinski says [the jungle] is full of erotic elements. It’s not so much erotic, but full of obscenity. Nature here is vile and base. I wouldn’t see anything erotic here. I see fornication and asphyxiation and choking, fighting for survival and growing and just rotting away. Of course there’s a lot of misery, but it’s the same misery that’s all around us. The trees are in misery, and the birds are in misery. I don’t think they sing; they just screech in pain. Taking a close look at what’s around us, there is some sort of harmony. It’s the harmony of overwhelming and collective murder. But when I say this, I say this all full of admiration for the jungle. It’s not that I hate it. I love it. I love it very much. But I love it against my better judgment.”
Oh yeah, that’s a great one. There’s no denying Herzog’s love for the things he also despises, and this is the mark of a great artist.
Aguirre was my first Herzog movie, in the theatre, way too young for it, and to this day the Kinski monkey scene is one of my favorites in all of film.
Don't know about easy to imitate, but Herzog's voice is quite distinct indeed. A couple of years ago I was watching a Tom Cruise movie ("Jack Reacher") on German TV and was very surprised to hear Herzog as the dubbing voice for a supporting role.
LOL I've got to disappoint you I'm afraid. Actually I recognized the voice but not the actor in that movie, and tbh I wasn't prepared to see Werner Herzog featuring in a Tom Cruise production.
This seems like a step closer to the AI entity in the movie 'Her' (voiced by Scarlett Johansson), or more closely, the scene where she converses with an Alan Watts, https://www.youtube.com/watch?v=Exe7yup2Yu0, where the movie did something like what this Infinite Conversation has done: feeding in a deceased philosopher's works and voice to create an AI representation of the person.
I once read, or heard on a podcast, a story about an early AI researcher and Chopin fan who expressed a similar feeling upon seeing how easily a computer could algorithmically ape a Chopin composition such that he was unable to tell the difference.
I encountered a weird, but maybe hilarious, bug running this on the Brave browser in iOS. I closed the tab, but the dialog kept going. Only when I went to the webpage again and this time paused it did the audio stop. Truly infinite conversation. I'm almost sad I was able to turn it off at all.
> But, my God, let me tell you one typical story
and maybe it will help you to think about this.
One of my closest friends is an older Slovene poet and I
respect him very much for his efforts to save poetry and so on.
And at some point in our friendship, without any
reason, he told me that I spoke like a madman.
I said, “What do you mean?” He
answered, “You know, when you talk to people
you speak with such conviction that it sounds like madness.
You must be very careful, because this can become dangerous.” And he was right
as he usually is.
Very recognizable. If you ever took an art course, you will so recognize this rambling.
I got this quote:
I think that, even in this context of crisis,
I see at least a certain redemptive hope in
the fact that there is so much bluff in
Hollywood productions: that they are so artificial and so empty.
It's like sitting in a mental ward and listening to two very potentially intelligent schizophrenics or dementia patients chat. Interesting at times and confusing most of the time.
Probably the best thing since Werner Herzog "Travelbug", a comedy podcast segment (two actually, it was remade) that can be found on YouTube or possibly in a podcast app, but this here is of course also technically very impressive.
Any follow-ups planned? Schwarzenegger v. Maggie Thatcher? :)
I found it hilarious that Herzog will just go on and on, just like he does in real life.
Also, this bit:
The question is always: What does it mean to be something
what is this identity?
In Europe, we don't ask these questions.
We have no difficulty identifying ourselves.
I love this so much. This is great "white noise" to work over too! I wonder if in the future people who love podcasts will simply have infinite conversations between their favourite synthetic humans running in their headphones.
Can somebody please tell me: is Zizek actually smart? I bought his book “Philosophy for the end times” and only remember understanding a single concept. How is this man comprehensible? What is he respected for?
But for the voice filter, this could be used to build a cultural studies grad student dating simulator. Why would anyone do that? Many ask themselves that same question to this day.
an infinite conversation is such a brilliant angle for how to get at some of the heart of the reality of AI intelligence (always on when the power is on, nearly infinite, ...) vs human... and especially eerie and thought-provoking when you get a feeling for how accurate/similar the generated text is to those people. i am so impressed and jealous. wow. brilliant.
This is quite a bit incoherent. I've been listening for about 5 mins, each side had quite a few turns but Herzog almost invariably talks about how he's not a Marxist, and Zizek almost invariably talks about how cinema should be an open question mark. Like every single turn they rephrase this in some other way. Did I get a particularly bad seed?
Yes. I got Herzog talking about a party he went to where Coppola was standing next to a wall and no one was talking to him so he approached him and asked "Who would want to be a soldier" and Coppola said "Me".
I have had a dream for many
years now, ever since I was a child.
I don't know why, but I always dream that there are giants
on a tiny island in the Pacific and we humans can't see them
but they are there, where the island is.
And then suddenly one of the giants throws a rock,
just like a child would throw a rock into the water
but the rock hits the mainland—the western
coast of North America—and creates a big hole.
A little later, another giant throws his rock and it creates another hole
and then another one, and another one.
This happens all at once, everywhere on the continent, every single continent—Australia, South America
North America.
And you wake up and think, “Oh my God, they are already throwing rocks.
Weird to see such universal support here for Zizek.
Zizek is most likely a three letter agency plant designed to destroy the credibility of the critical theory community. Look at how much Russians and other organizations like "big think" give him a voice, which includes advocacy for things like "don't act, think".
Buying into Zizek's ideas means buying into serial policy failure, and it is a risk - Zizek has run for office in Slovenia.
Amazing, every word of what you just said is wrong.
Žižek is Slovenian, and while he did afaik sign up to be part of some committee after the war for independence when we had the first free elections, he wasn't elected. And that's a footnote in his bio at best.
He doesn't have to destroy anyone's credibility, he's just well versed enough to see through people's smartly worded bullshit and enough of a troll to stump those people in TV debates so they make themselves look like the morons they are. He's always been like this. Love him or hate him, he sure is fun to watch.