Synthesia is also the name of a much more established, extremely popular midi/piano visualisation software[1]. If you've ever looked up "<song> piano tutorial" on youtube, you've probably seen that program.
It's a shame they chose that name, since it was such a great play on words for the midi software (synesthesia is sound into colorful visuals, and midi uses synths) whereas this product has basically no relation.
Synesthesia isn't specific to just visual experience from auditory stimulus. It really denotes any mix of senses, be it experiencing visuals when listening to music (reported by e.g. LSD users), bodily sensations from colors, etc.
Avoid getting your video rejected. Please make sure you adhere to our content guidelines.
Please keep your script professional and business related. Political, sexual, personal, criminal and discriminatory content will not be tolerated or approved.
Ahh... the Anchor FM problem... guess I'll need an open source version.
I started toying with LibreBot, I think it's called, which allows you to do anything you want with these things if you self-host; the license was around a grand, I think.
This Synthesia didn't even get the first sentence I tried. It also requires a 'business email' and agreeing to terms that include "I agree to receive occasional product information as per Synthesia Privacy Policy *".
Trying hard to keep the genie in the bottle, aren't they?
Bot Libre [1] "allows anyone to create their own chat bot for free, including free hosting, even for commercial bots." Paid plans [2] provide more bot interactions, larger knowledge bases, and design support. You can download the free community edition [3] to run on your own server.
Yes, this is the one! I have kicked around a few tutorials on how to manipulate the digital avatars add-in. So far I'm loving everything I've found with this one.
I may end up hiring some merc help to get it self-hosted and running with some customized avatars. The custom digital avatar stuff can be really complex if you want it to be, which I like, but I know I would spend weeks playing with all that, and someone who knows the digital stage / lighting tools out there could likely slap together what I would need for a small launch project in a day.
While the tech is impressive in itself, it still doesn't look like something I'd pay for. The lip sync is annoyingly off, and the bland expressions that come from not understanding context make the communication even worse. If having a visual talking head is that important for a project, it still seems better to just hire someone.
(On a side note, I'm not sure I understand the appeal of emotionally bland fake-smile talking heads in general, even when they're real.)
Mass influencing with added custom targeting. Or impersonating someone else. Honestly I can't think of anything other than that. It's just information pollution for the most part, probably. Now if they had worked on a digital sign language avatar, that would be something innovative or useful.
Just wait until big companies start doing it. Then everybody else must do it too. Just look at how people copied the boring corporate web style which is now all over the place.
No, I'm not asking whether you think you can use this to make money; I'm asking whether you personally want to sit through a video of a robot telling you to do things. Are we supposed to believe this is preferable to simply reading this or hearing recorded audio? This is flat-out consumer hostility, basically telling your customers to talk to a sock puppet instead of a real person. I hope this fails; I would pay money to make this illegal.
I think their hook is it's only $3/min, which is pennies on the dollar compared to getting a real produced video. It's not useful yet though, because it's creepy.
The website includes a lot of examples. I think making it easier to create marketing videos (including advertisements, but also internal ones) is one use case.
Another one outside marketing: making educational lecture videos is a lot of work even with the speech pre-written. Often execution goes through several passes. If we continue in the direction of MOOCs/online education, then making it easier for teachers to make videos is valuable.
The multi-language support they have is also a big use case.
Employment laws being overbearing, hiring a script reader may be practically illegal (“living wage”, mandatory health care, ... I just need some words read).
I rely on reading lips for a good portion of my hearing comprehension. This might not be better than seeing an actual person speaking, but I'd VASTLY prefer this over hearing recorded audio. Reading is a separate beast, but for boring stuff that you just want to listen to instead of actively reading, I can see this being preferred also.
Jesus. If aliens dropped out of the sky and showed you technology so powerful that it allowed you to restructure reality itself according to your imagination, you would still be like... meh, I wanted rainbows made entirely of blue but that’s clearly aquamarine.
Are sales spam emails going to start including personalized videos? I guess I'll look forward to the "Hello dollar sign firstname. I'm dollar sign agentname. My colleague recommended I connect with you, as you both work at dollar sign employer" template misfires.
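For what it's worth, that misfire is just the classic mail-merge failure mode. A minimal sketch, using Python's stdlib string.Template purely as an illustration (no particular vendor's tooling is implied): when a field is missing from the CRM record, safe_substitute leaves the literal placeholder in the text, which is exactly the "dollar sign firstname" email you end up receiving.

    # Minimal illustration of a template misfire: safe_substitute leaves
    # any placeholder it can't fill as literal text instead of raising.
    from string import Template

    pitch = Template(
        "Hello $firstname. I'm $agentname. My colleague recommended "
        "I connect with you, as you both work at $employer."
    )
    lead = {"firstname": "Alex"}  # CRM record missing the other fields
    print(pitch.safe_substitute(lead))
    # -> Hello Alex. I'm $agentname. My colleague recommended I connect
    #    with you, as you both work at $employer.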
I think that's one of those 'logical conclusion' things.
And it won't be emails. It will be the places where video is more appropriate: instagram, facebook, text message video, zoom calls.
Valid use case! Starting with porn is probably a better business idea to be honest. Porn is like a drug, easy to sell, and when it differentiates itself, it stands out. That's why there's stepmom and 3D tentacle porn. It's different, people gain tolerance to average porn. This idea applied to porn would probably have some novelty factor.
Impressive. Funny enough I've started to see those faces appear on YouTube. The intention may be to create these corporate style videos, but I'm counting down the minutes until my aunt starts forwarding questionable things on WhatsApp.
Animations are pretty good. Pronunciation could use some work. There also does not seem to be a way to influence the inflection, which is an absolutely crucial component for sales pitches. It's not so much what you say, but how you say it. Also, the right people have to sell the right things. Words coming from Elon's mouth in regards to cryptocurrency have a far greater effect on market behavior than the exact same words coming from this AI person's mouth.
Uncanny valley meets mixed messages and bad delivery.
The incoherent facial expressions actually manage to confuse the message more than the dissociated pronunciation.... "witch is know small feet".
This tech is a neat trick at this stage but is less useful than just leaving the text as text, in fact adding negative value to an already fully functional process.
Fiverr is a better option, and I would not recommend that.
For an interesting and highly unethical experiment, someone should raise a thousand infants with this drivel and see what happens... I'm going to posit that the result is not good. Children's narration is exactly where this is headed, though; I can see this becoming a multimillion-view, no-effort YouTube babysitter.
Children find a pleasant, smiling female face soothing...so this is going to be another way that the dollar and human laziness will use AI to make the world a slightly worse place.
What awful comments here; you're all criticizing something really exciting. Of course AI can't beat real humans, what do you expect? But it's the closest we've ever been, especially since it's available to consumers. People in sales and marketing know how valuable this is for improving conversion rates... if you're not in those fields, this isn't for you; saying something is useless just because you have no knowledge of other domains is highly ignorant.
If I think about things that should be automated with robots in this world, my number one priority would be boring or exhausting work like repairing roads, mining, or delivering goods from A to B. Why should we replace the "human" part in human-to-human communication with robots when social interaction is one of the pillars that define us? I'd rather speak with an unfriendly real person in a call center than an emotionless "friendly" robot.
But yes, if "improving conversion rates" is your main priority, it may be helpful.
Sure, or this is what came out of an effort by some startup to monetize existing ML technologies in a low hanging fruit use case of generating a deep fake. Might not be the revolution we're all waiting for.
Wow this feels like a blast from the past. There used to be a service that did exactly this (little help chats with "AI" generated voices), in the mid 2000s but instead of having human avatars they were animated. Seeing the woman speak immediately unlocked a memory in my kid brain.
Fantastic technology and I love that the videos look and sound super lifelike. The face looks like most instagram influencers with vanilla broad-appeal pretty faces, which I guess is the style these days.
But what’s the point?
If you’re gonna send someone a soulless corporate drone video, is that really better than a soulless corporate email? I thought the goal of doing video was that it’s more personable and human ... an AI video doesn’t quite hit those goals does it?
This could be the face of an AI personal assistant, with GPT-3 like dialogue skills and using web search to find relevant information in order to solve its tasks.
>video was that it’s more personable and human ... an AI video
It can be made even more personable and human, and can be customized not just for broad appeal; it can be individually customized to appeal to the target person based, say, on the target's profile, browsing history, etc. Similar to what Cambridge Analytica did for text-based messages.
Pretty good.... but not quite there yet, in my humble opinion.
The lips, eyes, and facial features move in natural ways, but the head remains frozen in a somewhat unnatural manner. It's just inside the uncanny valley, with barely perceptible creepiness.
I would hope to see improvements to make face/neck movements look more natural, to overcome these issues over time!
There's something quite cyberpunk about smiling AI-generated corporate headshot faces extolling the wonders of <insert product here>. And I don't mean that in a good or bad way. I imagine we'll start seeing these all over the place quite soon.
I mean, combine it with GPT-3 and you've got something that's nearly science fiction. Really interested to see where this goes.
I want to see her on my wall, every day, bald, with green eyes. Spouting Shakespearean slurs at Alexa, then following up with some Rumi poetry, and a dash of Alan Watts... all powered by a Markov chain.
Related: given a script, "generating all aspects of a cinematic scene, including staging, acting, editing, framing and lighting in Assassin's Creed Odyssey."
Oh, they created the Lionel Messi one too? It was AWFUL. Messi's voice was so far from his real voice that my friends and I created a few videos as a joke.
What's the point of using AI if it needs to be manually reviewed? I suppose the outputs are manually reviewed as well, to keep the AI from going rogue?
People don't want to talk to computers, that's why chatbots (in their current form) fail one after the other. People also don't want to listen to emotionless robots. As long as this technology is not 100% accurately mimicking a human, the Uncanny valley effect will kick in and just leave an uncomfortable feeling.
I know I'll probably sound a bit Luddite by saying this, but just the examples already make me cringe: a welcoming video for a corporation saying "we're looking forward to have you here", narrated by a _bot_, is as dehumanizing as it gets. :(
Impressive, but not quite good enough to avoid the 'uncanny valley' - the lips are not perfectly synced to the audio. Also, it should allow a way to stress certain words in the input script.
So, a bit curious how this factors in the emotion and depth that could vary depending on the nature of the video [onboarding vs launch videos, say]? And how do you not run out of options for voice/person selection? It shouldn't end up being like stock images (the same face used across multiple brands). How well does brand identity get maintained for, say, paying customers?
>> Synthesia lets you create great business videos in minutes. Say goodbye to actors, film crews and expensive equipment.
Yay! At last! And when we've automated away everyone's work, also say goodbye to synthesia and every other automation service, because there's no business left to use it. Woo-hoo, future world, here I come!
A really creepy use case for this would be to combine it with one of those IP-to-company name lists. If you visit a vendor it could play a video greeting you by mentioning your business name. “Click here to learn what we can do for Acme Industries!”
Again, super creepy and not really clear if it would drive engagement.
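If anyone is wondering what that would look like mechanically, here is a rough, purely hypothetical sketch: resolve the visitor's IP to a company name through some reverse-IP/firmographics lookup, then drop the name into the greeting script. The lookup endpoint and response fields below are invented stand-ins, not any real provider's API.

    import requests

    def company_for_ip(ip: str) -> str | None:
        # Hypothetical reverse-IP lookup service; stands in for whatever
        # IP-to-company list or API you actually use.
        resp = requests.get(
            f"https://reveal.example.test/v1/lookup?ip={ip}", timeout=5
        )
        if resp.ok:
            return resp.json().get("company_name")
        return None

    def greeting_script(ip: str) -> str:
        # Fall back to a generic greeting when the visitor can't be resolved.
        company = company_for_ip(ip)
        if company:
            return f"Click here to learn what we can do for {company}!"
        return "Click here to learn what we can do for your business!"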
Wow, the Portuguese pronunciation, intonation and lipsync are incredibly accurate, 10x more so than the English voice. I wonder if that's true for other latin-ish languages and if that means those languages are easier to learn.
I think in general the quality is quite good, but the characters lack personality. I think that is the opportunity. Create something with more lively movement. Think the Sham-wow guy.
Anybody can stand blankly in front of a camera without emotion. But this is an impressive start.
I love it 1000%. Need to create videos for a new crypto. This helps translate the videos to 10 different languages and kick off a global service. It's not perfect but it's fast and looks very professional.
Would have been interesting to try out, but unfortunately the email prompt ended my evaluation. A lot of people will probably stop there and move on as well.
Aw man, it kind of made it seem like it would be generated fast, but then you find out after putting in your information that it requires manual review.
Video is a way more effective way to communicate than text. Not for the HN crowd, but if you're a blue collar worker a 2 minute video in your native language is much preferred to a 5 page pdf for training.
Anyone who has tried to record a simple corporate video knows the pain of cameras, film crews, 25 takes to get one that works, and post-production. Cumbersome, slow and multidisciplinary. By the time the video is done the content is out of date.
Synthetic video is not yet at the quality of real video. Eventually it will be. But the mistake many are making here is comparing it to real video; it should be compared with text.
In X years we'll be able to make Hollywood films on a laptop without needing anything but time and imagination. Just like we can digitally compose music in Ableton, create images in Photoshop and type novels on keyboards rather than with pen and paper.
My (obviously biased ;)) belief is that synthetic media will eventually become foundational technology that will move media production from cameras/microphones to APIs. We'll be able to do all kinds of things we couldn't do before.
E.g. personalized and interactive rich media, video-driven chatbots and eventually Hollywood blockbusters made by your favourite YouTuber from his or her bedroom.
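To make the "cameras/microphones to APIs" point concrete, here's a purely illustrative sketch of what requesting a rendered video might look like; the endpoint, field names and identifiers are invented for the example and are not Synthesia's actual API.

    import requests

    payload = {
        "script": "Welcome aboard! Here's how your first week will look.",
        "avatar": "presenter_01",   # invented identifier
        "voice": "en-GB-female",    # invented identifier
        "background": "office",
    }
    resp = requests.post(
        "https://api.video-platform.test/v1/videos",
        json=payload,
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        timeout=30,
    )
    # Rendering would typically be asynchronous; poll or fetch the result later.
    video_url = resp.json().get("download_url")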
---> Uncanny valley
Simulating real video is incredibly hard. We're constantly improving and launching more expressive synthesis soon.
From our tests with some of our largest clients 8/10 people don't realise it's a synthetic video (unless they are asked to look for it).
---> Tech
Has been developed over the last 3 yrs. Origins/team from Stanford/UCL/TUM.
Learning: Going from research to a working, scalable product is hard and takes time. But very rewarding when it works.
Bad actors will do bad things with synthetic media. Like with any other technology from smartphones to cars. We're moderating all content and building safeguards and verification + working with FAANG and others on detection and provenance technology.
The Snoop Dogg advertisement rebranding case study was pretty impressive to me, since there were obvious savings from reuse. Neat to see how this technology could be integrated in a subtle way with other editing techniques.
It seems to me that this technology could have immediate application to dubbing over curse words in movies (since that's already done in a not so subtle way today).
The next step I see in that progression is full dubbing for translation, which already exists in a very conspicuous form. The old meme about out-of-sync karate movie dubs comes to mind.
How close do you think this technology is to use for syncing lips in Hollywood tier movie dubs using real voice actors? What are the main obstacles left to achieving that?
[1] https://synthesiagame.com/