Hacker News
Create AI videos by simply typing in text (synthesia.io)
256 points by vladoh on May 29, 2021 | 110 comments



Synthesia is also the name of a much more established, extremely popular MIDI/piano visualisation software[1]. If you've ever looked up "<song> piano tutorial" on YouTube, you've probably seen that program.

It's a shame they chose that name, since it was such a great play on words for the MIDI software (synesthesia is sound turned into colorful visuals, and MIDI uses synths), whereas this product has basically no relation.

[1] https://synthesiagame.com/


> this product has basically no relation

Sure it does. It synthesizes video from text.


Ok, but is there anything that connects it to the word synthesia beyond a connection to the word synthesis? Synthesia means something more specific.

Is there anything other than the “synthes” that synthesis shares with synthesia, connecting this product to synthesia in particular?


Is synthesia an actual word? I'm not able to find any reference to it that isn't the piano software or this. Are you thinking of synesthesia?


I am thinking of synesthesia, yes.

Thought the piano software was named after that word. Didn’t check the spelling.


“Synthesia” is probably an amalgam of “synthesis”/“synthetic” and “synesthesia”. You would really have to ask the author to find out.

Edit: It's worth pointing out that while the words look quite alike, they don't mean the same thing:

synthesis: to put together

aesthesia: ability to feel or perceive sensations


Synesthesia isn't specific to just visual experience from auditory stimulus. It really denotes any mix of senses, be it experiencing visuals when listening to music (reported by e.g. LSD users), bodily sensations from colors, etc.


Avoid getting your video rejected. Please make sure you adhere to our content guidelines. Please keep your script professional and business related. Political, sexual, personal, criminal and discriminatory content will not be tolerated or approved.

Ahh... the Anchor FM problem... guess I'll need an open-source version.

I started toying with libreBot, I think it's called, which allows you to do anything you want with these things if you buy a self-host license, for a grand I think it was.

This Synthesia didn't even accept the first sentence I tried. It also requires a 'business email' and agreeing to terms that include "I agree to receive occasional product information as per Synthesia Privacy Policy *"

Trying hard to keep the genie in the bottle, aren't they?


Bot Libre [1] "allows anyone to create their own chat bot for free, including free hosting, even for commercial bots." Paid plans [2] provide more bot interactions, larger knowledge bases, and design support. You can download the free community edition [3] to run on your own server.

[1] https://www.botlibre.com/

[2] https://www.botlibre.com/upgrade.jsp

[3] https://www.botlibre.com/download.jsp


Yes, this is the one! I have kicked around a few tutorials on how to manipulate the digital avatars add-in. So far I'm loving everything I've found with this one.

I may end up hiring some merc help to get it self-hosted and running with some customized avatars. The custom digital avatar stuff can be really complex if you want it to be (which I like), but I know I would spend weeks playing with all that, and someone who knows the digital stage/lighting tools out there can likely slap together what I would need for a small launch project in a day.


Worked fine for me, just used tempmail https://share.synthesia.io/4cc3f6d7-dfdb-49b3-b16d-9c02ab702... This totally cracked me up.


Curse the fun police, I wanted to make a virtual corporate avatar recite gangsta rap lyrics.


This sounds like exactly the sort of thing Omega Mart would want to experiment with. Though perhaps they might feel an AI would be overkill in this case…


While the tech is impressive in itself, it still doesn't look like something I'd pay for. The lip sync is annoyingly off, and the bland expressions that come from not understanding context make the communication even worse. If having a visual talking head is that important for a project, it still seems better to just hire someone.

(On a side note, I'm not sure I understand the appeal of emotionally bland fake-smile talking heads in general, even when they're real.)


Mass influencing with added custom targeting. Or faking someone else's identity. Honestly I can’t think of anything other than that. It’s probably just information pollution for the most part. Now, if they had worked on a digital sign language avatar, that would be something innovative or useful.


Would robotic AI sign language avatar be better than just text?

Would it not just have the same issue the existing product has for non-deaf people?


It’s just something I tried to come up with. Instead of a caption, a digital avatar giving context might be more useful?


I suspect there's a whole new wave of content farming coming ...


You see the flaws and you're right, there's room for improvement. It needs to feel more human, and it's slowly getting there.

Use cases are education, museums, games, sweet sweet Jesus


Just wait until big companies start doing it. Then everybody else must do it too. Just look at how people copied the boring corporate web style which is now all over the place.


Can you think of one good use for this product?

No, I'm not asking whether you think you can use this to make money; I'm asking whether you personally want to sit through a video of a robot telling you to do things. Are we supposed to believe this is preferable to simply reading this or hearing recorded audio? This is flat-out consumer hostility, basically telling your customers to talk to a sock puppet instead of a real person. I hope this fails; I would pay money to make this illegal.


I think their hook is it's only $3/min, which is pennies on the dollar compared to getting a real produced video. It's not useful yet though, because it's creepy.


The website includes a lot of examples. I think making it easier to create marketing videos (including advertisements, but also internal ones) is one use case.

Another use case outside marketing is education: making lecture videos is a lot of work, even with the speech pre-written, and recording often goes through several passes. If we continue in the direction of MOOCs/online education, then making it easier for teachers to make videos is valuable.

Their multi-language support is also a big use case.


Employment laws being overbearing, hiring a script reader may be practically illegal (“living wage”, mandatory health care, ... I just need some words read).


I rely on reading lips for a good portion of my hearing comprehension. This might not be better than seeing an actual person speaking, but I'd VASTLY prefer this over hearing recorded audio. Reading is a separate beast, but for boring stuff that you just want to listen to instead of actively reading, I can see this being preferred also.


Yes, interactive language exercises. Use it to create tailored dialogues, like the Pimsleur tapes.


The TTS is fairly good, but it's still TTS. IMO, real native-speaker content is a must for any language program worth its salt.


Here's the cookie text if you're too lazy to read it... it sounds a bit creepy: https://share.synthesia.io/a4159eee-f70b-4318-a8bc-ec0fdf6af...


Thanks for posting this. I wanted to try this but didn't want to give out my info.

Honestly, I'm not impressed.


I would pay for any product that impresses you if this didn't.


The product is alright; the annoying requirements to sign up and receive spam, and the long list of forbidden content, aren't.


Jesus. If aliens dropped out of the sky and showed you technology so powerful that it allowed you to restructure reality itself according to your imagination, you would still be like... meh, I wanted rainbows made entirely of blue but that’s clearly aquamarine.


That jarring transition when she says the word "interact" reminds me of the glitches from Max Headroom.


Her mouth missed multiple consonants, especially jarring was the "t" in "about".


Are sales spam emails going to start including personalized videos? I guess I'll look forward to the "Hello dollar sign firstname. I'm dollar sign agentname. My colleague recommended I connect with you, as you both work at dollar sign employer" template misfires.


I think that's one of those 'logical conclusion' things. And it won't be emails. It will be the places where video is more appropriate: instagram, facebook, text message video, zoom calls.


So where's the version that allows NSFW content? Can't be the only one who wanted to test this with erotica.


Valid use case! Starting with porn is probably a better business idea to be honest. Porn is like a drug, easy to sell, and when it differentiates itself, it stands out. That's why there's stepmom and 3D tentacle porn. It's different, people gain tolerance to average porn. This idea applied to porn would probably have some novelty factor.


It must be available somewhere, in order to satisfy Rule 34.


Impressive. Funny enough I've started to see those faces appear on YouTube. The intention may be to create these corporate style videos, but I'm counting down the minutes until my aunt starts forwarding questionable things on WhatsApp.


https://share.synthesia.io/d8860a05-2870-4315-9316-b03cbc76a...

Animations are pretty good. Pronunciation could use some work. There also does not seem to be a way to influence the inflection, which is an absolutely crucial component of sales pitches. It's not so much what you say, but how you say it. Also, the right people have to sell the right things. Words coming from Elon's mouth with regard to cryptocurrency have a far greater effect on market behavior than the exact same words coming from this AI person's mouth.


Uncanny valley meets mixed messages and bad delivery.

The incoherent facial expressions actually manage to confuse the message more than the dissociated pronunciation.... "witch is know small feet".

This tech is a neat trick at this stage but is less useful than just leaving the text as text, in fact adding negative value to an already fully functional process.

Fiverr is a better option, and I would not recommend that.

For an interesting and highly unethical experiment, someone should raise a thousand infants with this drivel and see what happens... I’m going to posit that the result is not good. Children’s narration is exactly where this is headed, though; I can see this as a multimillion-view, no-effort YouTube babysitter.

Children find a pleasant, smiling female face soothing...so this is going to be another way that the dollar and human laziness will use AI to make the world a slightly worse place.


What awful comments here; you're all criticizing something really exciting. Of course AI can't beat real humans, what do you expect? But it's the closest we've ever been, especially since it's available to consumers. People in sales and marketing know how valuable this is for improving conversion rates... if you're not in those fields, it's not for you. Saying something is useless just because you have no knowledge of other domains is highly ignorant.


If I think about things that should be automated with robots in this world, my number one priority would be boring or exhausting work like repairing roads, mining, or delivering goods from A to B. Why should we replace the "human" part in human-to-human communication with robots when social interaction is one of the pillars that define us? I'd rather speak with an unfriendly real person in a call center than an emotionless "friendly" robot.

But yes, if "improving conversion rates" is your main priority, it may be helpful.


If the only exciting thing about it is potential for more manipulative marketing, then "awful" comments feel warranted.


Sure, or this is what came out of an effort by some startup to monetize existing ML technologies in a low-hanging-fruit use case: generating a deepfake. Might not be the revolution we're all waiting for.


Wow, this feels like a blast from the past. There used to be a service in the mid 2000s that did exactly this (little help chats with "AI"-generated voices), but instead of human avatars they were animated. Seeing the woman speak immediately unlocked a memory in my kid brain.


I was aware of "Sitepal" since before 2010.

https://www.sitepal.com/demo


Had the same thought; it made videos like this: https://invidious.fdn.fr/watch?v=FL7yD-0pqZg

Looks like it's still a thing as well: https://www.nawmal.com/


Fantastic technology and I love that the videos look and sound super lifelike. The face looks like most instagram influencers with vanilla broad-appeal pretty faces, which I guess is the style these days.

But what’s the point?

If you’re gonna send someone a soulless corporate drone video, is that really better than a soulless corporate email? I thought the goal of doing video was that it’s more personable and human ... an AI video doesn’t quite hit those goals does it?


This could be the face of an AI personal assistant, with GPT-3 like dialogue skills and using web search to find relevant information in order to solve its tasks.


>video was that it’s more personable and human ... an AI video

It can be made even more personable and human, and it can be customized not just for broad appeal but individually, for appeal to the target person based on, say, the target's profile, browsing history, etc. Similar to what Cambridge Analytica did with text-based messages.


Here’s a sample video with a custom script produced earlier https://share.synthesia.io/4b75b584-9b3b-4a96-86c2-6b34b8711...


Pretty good.... but not quite there yet, in my humble opinion.

The lips, eyes, and facial features move in natural ways, but the head remains frozen in a somewhat unnatural manner. It's just inside the uncanny valley, with barely perceptible creepiness.

I would hope to see improvements to make face/neck movements look more natural, to overcome these issues over time!


There's something quite cyberpunk about smiling AI-generated corporate headshot faces extolling the wonders of <insert product here>. And I don't mean that in a good or bad way. I imagine we'll start seeing these all over the place quite soon.

I mean, combine it with GPT-3 and you've got something that's nearly science fiction. Really interested to see where this goes.


The eyes aren't quite right and sometimes the voice is a little off, but I probably wouldn't notice in a real-world setting without prior knowledge.


I want to see her on my wall, every day, bald, with green eyes. Spouting Shakespearean slurs at Alexa, then following up with some Rumi poetry and a dash of Alan Watts... all powered by a Markov chain.


Very close but not quite human. A textbook example of the uncanny valley https://en.m.wikipedia.org/wiki/Uncanny_valley


Related: given a script, "generating all aspects of a cinematic scene, including staging, acting, editing, framing and lighting in Assassin's Creed Odyssey."

https://youtube.com/watch?v=DFM5zbekZ7c hour-long dev talk (GDC)


Their David Beckham video is pretty good https://www.synthesia.io/post/david-beckham


Oh, they created the Lionel Messi one too? It was AWFUL. Messi's voice was so far from his real voice that my friends and I created a few videos as a joke.


What's the point of using AI if it needs to be manually reviewed? I suppose the outputs are also manually reviewed, to keep the AI from going rogue?


I believe the selling point is that it's cheaper than paying a person to act, where you'd still need to manually review it anyway.


People don't want to talk to computers; that's why chatbots (in their current form) fail one after the other. People also don't want to listen to emotionless robots. As long as this technology is not 100% accurately mimicking a human, the uncanny valley effect will kick in and just leave an uncomfortable feeling.


Here is an instructional reading of advice I gave my friend over text on how to use enzymatic cleaner should his new kittens have an accident:

https://share.synthesia.io/2761933d-4ec7-48c7-b67e-85fc9d686...


I know I'll probably sound a bit Luddite by saying this, but just the examples already make me cringe: a welcome video for a corporation saying "we're looking forward to have you here", narrated by a _bot_, is as dehumanizing as it gets. :(


Interesting. I hope the models were paid adequately, considering that their likenesses can now effectively be used for free, infinitely.

Reminds me of the movie The Congress.

Obviously this technology has a long way to go, but it seems that actors should feel less secure about their jobs being resistant to automation.


Ari has really good foresight/insight on media. I thought the commodification aspect was a little thick/cheesy at the time. Now, I think it's dead on.

As more things become intellectual property, the tendency of property to pool becomes severe.


Founder here

Actors get a % rev share + upfront fee to work with us :)


These videos are incredibly life like. I can see many virtual companions being made with this.


Glad you like it!


Impressive, but not quite good enough to avoid the 'uncanny valley': the lips are not perfectly synced to the audio. Also, it should allow a way to stress certain words in the input script.


I'm a bit curious how this factors in emotion and depth, which could vary depending on the nature of the video [onboarding vs. launch videos, say]. And how do you avoid running out of options for voice/person selection? It shouldn't end up like stock images (same faces used by multiple brands). How well is brand identity maintained for, say, paying customers?


Ah dang, I pasted some literal Lorem Ipsum in to see how it would sound from the AI, and it just puts you through an invite funnel. Oh well.


>> Synthesia lets you create great business videos in minutes. Say goodbye to actors, film crews and expensive equipment.

Yay! At last! And when we've automated away everyone's work, also say goodbye to synthesia and every other automation service, because there's no business left to use it. Woo-hoo, future world, here I come!


1 - We will review your video
2 - You will receive your video in your email
3 - You will receive an account creation invite

What a great sample.


Did you use a throwaway email too? I'm wondering if using one triggered a manual verification check


A really creepy use case for this would be to combine it with one of those IP-to-company name lists. If you visit a vendor it could play a video greeting you by mentioning your business name. “Click here to learn what we can do for Acme Industries!”

Again, super creepy and not really clear if it would drive engagement.


Wow, the Portuguese pronunciation, intonation and lip sync are incredibly accurate, 10x more so than the English voice. I wonder if that's true for other Latin-ish languages and if that means those languages are easier to learn.


I think in general the quality is quite good, but the characters lack personality. I think that is the opportunity: create something with more lively movement. Think the ShamWow guy.

Anybody can stand blankly in front of a camera without emotion. But this is an impressive start.


Will not demo anything that requires me to put in that much of my data to try their product.


I love it 1000%. Need to create videos for a new crypto. This helps translate the videos to 10 different languages and kick off a global service. It's not perfect but it's fast and looks very professional.


Groups like NXIVM are going to do strange things with this tech in the future.


They require agreeing to receive promotional emails before creating the video.


Would have been interesting to try out, but unfortunately the email prompt ended my evaluation. A lot of people will probably stop there and move on as well.


Aw man, it kind of made it seem like it would be generated fast, but then you find out after putting in your information that it requires manual review.


I'm more stunned by the good speech synthesis than by the already good visuals.

Does anyone know what's under the hood for the text to speech?


No thanks. I don't like having to give you all of this personal information you really don't need in order to try your product.


Founder here. AMA :)

To answer a few recurring questions in the thread

---> Use case.

Video is a far more effective way to communicate than text. Not for the HN crowd, but if you're a blue-collar worker, a 2-minute video in your native language is much preferred to a 5-page PDF for training.

Anyone who has tried to record a simple corporate video knows the pain of cameras, film crews, 25 takes to get one that works, and post-production. Cumbersome, slow and multidisciplinary. By the time the video is done the content is out of date.

Synthetic video is not yet at the quality of real video. Eventually it will be. But the mistake many are making here is comparing it to real video; it should be compared with text.

In X years we'll be able to make Hollywood films on a laptop without needing anything but time and imagination. Just like we can digitally compose music in Ableton, create images in Photoshop and type novels on keyboards rather than with pen and paper.

My (obviously biased ;)) belief is that synthetic media will eventually become foundational technology that moves media production from cameras/microphones to APIs. We'll be able to do all kinds of things we couldn't do before.

Eg. personalized and interactive rich media, video-driven chatbots and eventually Hollywood blockbusters made by your favourite YouTuber from his or her bedroom.

---> Uncanny valley

Simulating real video is incredibly hard. We're constantly improving and launching more expressive synthesis soon.

From our tests with some of our largest clients 8/10 people don't realise it's a synthetic video (unless they are asked to look for it).

---> Tech

Has been developed over the last 3 yrs. Origins/team from Stanford/UCL/TUM.

Learning: going from research to a working, scalable product is hard and takes time. But it's very rewarding when it works.

[1] https://www.youtube.com/watch?v=ohmajJTcpNk [2] https://www.youtube.com/watch?v=qc5P2bvfl44

---> Bad uses

Bad actors will do bad things with synthetic media. Like with any other technology from smartphones to cars. We're moderating all content and building safeguards and verification + working with FAANG and others on detection and provenance technology.

Recommended read - deepfakes perfectly follow the story arc of any new, powerful technology: https://journals.sagepub.com/doi/full/10.1177/17456916209193...

---> Actors

Real actors get a rev share + upfront fee from every video generated with their likeness. Like being a stock photo actor.


The Snoop Dogg advertisement rebranding case study was pretty impressive to me, since there were obvious savings from reuse. Neat to see how this technology could be integrated in a subtle way with other editing techniques.

It seems to me that this technology could have immediate application to dubbing over curse words in movies (since that's already done in a not so subtle way today).

The next step I see in that progression is full dubbing for translation, which already exists in a very conspicuous form. The old meme about out-of-sync karate movie dubs comes to mind.

How close do you think this technology is to use for syncing lips in Hollywood tier movie dubs using real voice actors? What are the main obstacles left to achieving that?


I’m deeply interested in synthetic media but it’s hard to believe there is a shortage of people who want to be video presenters.


Not that different from being a regular stock photo/video actor.


I created a step-by-step tutorial, but the voice still sounds too robotic. Unfortunately, it doesn't inspire trust in users.


Gonna have dynamic open-world video games too, where custom cutscenes can play based on your character's actions.


Is this based on a paper/demo previously posted on HN? I vaguely remember seeing the faces elsewhere.


Founder here

Maybe – one of my co-founders is Prof Matthias Niessner who's been behind a large chunk of the seminal and widespread research in this space.

[1] https://www.youtube.com/watch?v=ohmajJTcpNk [2] https://www.youtube.com/watch?v=qc5P2bvfl44


It forces you to select that option to receive promotional emails from them before submitting a script.


This site does not let me try the demo without giving them permission to send spam emails.


After filling out the recaptcha I cannot scroll to the submit button on mobile safari.


Warning: you have to agree to receive marketing emails from them


Just give me the ability to be offensive. Who are you to stop me?


Scripts require manual review. It's not automated.


The sample videos made me incredibly uncomfortable


You want my email to try it out? Hard pass.


Just use temp-mail.org or something similar.


The lack of empathy in her voice is chilling


This is really cool.


$3/minute of video seems a bit steep. $180/hour of video.


I would rather watch paint dry than an hour of this uncanny valley... thing.


Welcome to the new economy.



