Hacker News
MagicEdit: High-fidelity temporally coherent video editing (magic-edit.github.io)
261 points by lnyan on Aug 29, 2023 | 98 comments



Looks like it's building on the same concepts as StableVideo.

https://github.com/rese1f/StableVideo


Interestingly enough, ByteDance (TikTok/Douyin) seems to be behind it.


Coming soon to TikTok filters (if it doesn't already play a role in some of them)


ByteDance does a lot of ML research on music/video.


Their video editor CapCut is really good.


Their multi-object tracker ByteTrack is also one of the best available.


When I click on the code link, there is no code, just the same images and an assets folder with 1 gif file.

Why is this even on HN? There are zero details of how it's built, or what the source looks like. Am I missing something?


there is a link to the paper (https://arxiv.org/abs/2308.14749)


This looks like a nice improvement on current video to video techniques: https://stable-diffusion-art.com/video-to-video/


Site doesn't work correctly for me on Edge. Every row has at least one disappear after a fraction of a second when loading the page.


Imagine in 5-10 years where, just like people make video games all by themselves or with a small group of people, people can make their own movies that rival Hollywood productions, for a fraction of the cost as there's no need to hire anyone. When the output is just pixels on a screen and you can manipulate the placement of every pixel, it's really no different to Hollywood making them with real actors and crew members or someone drawing them, as is the case with current animation methods. Now, we can do the same process as drawing each frame but then make it look photorealistic.


Things will certainly change. However, I don't think professional studio productions will go away.

In the age of MP3's and streaming services, "anyone" can put their music out there. And while many do, the most popular and successful music still comes from major labels.

With Amazon's distribution platform and tooling for self-published authors, "anyone" can write a book. And while many do put out some great stuff, the overwhelming majority of books that don't come from major publishers are shovelware.

As shitty of a curator as Hollywood might be, for most people there is value-add in curation.


It's identical to the current state of the publishing world. They've already gone through this transition: the marginal cost to "publish" a book is zero. However to "publish a book that people will read" is a different question altogether and is still very much gatekept by the major publishing houses.


There's still a lot of value in the publishing process. Yes, the gatekeeping is inequitable and suboptimal, but "publishing" is more than simply formatting a manuscript and making it available in retail channels. And a competent editor is doing much more than simply proofreading.


And aside from Kindle, even the 'formatting the manuscript' part is very much underestimated by most self-publishers. Professional-grade typesetting and cover design makes a great deal of difference to reading comfort, shelf appeal, and overall perception of value.


like what?


I'm finding that fanfic seems quite good (though the fan-service is a bit glaring) these days and it's entirely free.


Doesn’t TikTok solve the distribution problem in a way that’s not currently possible with published literature?


In a manner of speaking. TikTok solves the distribution problem for short videos, but not TV shows or movies. As the time to consume goes up, you face more of a gamble on whether consuming that piece of media will be "worth it". Books are the most extreme form of the problem.


> It's identical to the current state of the publishing world. They've already gone through this transition: the marginal cost to "publish" a book is zero. However to "publish a book that people will read" is a different question altogether and is still very much gatekept by the major publishing houses.

Not at all identical.

Writing and selling a book only requires a text editor (or typewriter), maybe a copy editor, and maybe a publicist. It's been that way for the better part of a century.

Present-day films require enormous capital, equipment, logistics, personnel, pre-production, post-production, and marketing.

Film is about to be wholly transformed by several orders of magnitude of cost, talent, personnel, and logistics reductions. All at the same time.

Kids at home will have access to this tech and grow up on it.


That already happened. Cameras, lights and a computer that can edit are hundreds of times cheaper than they were a few decades ago. Kids can make videos, but they aren't making movies anyone wants to watch.


You always take the contrarian opinion to my comments, CyberDildonics.

> That already happened. Cameras, lights and a computer that can edit are hundreds of times cheaper than they were a few decades ago.

We're not talking about the same thing. None of these trends towards fast and dirty filmmaking incorporated Gen AI. You couldn't get Hollywood-quality out of an Android phone and a boom microphone. That will change.

> Kids can make videos, but they aren't making movies anyone wants to watch.

They're certainly watching each other's content. YouTube is filled with lots of young creators with enormous audiences. That trend will continue to grow.


> You couldn't get Hollywood-quality out of an Android phone and a boom microphone.

You can shoot video that looks as good as or better than 35mm film from the 90s on a video camera that is in the low hundreds of dollars, if that. People can go out and make movies, but that doesn't mean they are making things people want to watch.

I think you are underestimating how detailed and exact film making really is for "hollywood" quality. You need extreme directability at every stage of the pipeline.

People being able to use a system that creates something decent but makes you accept whatever you get has existed for a long time through animations with game engine characters.


You can already direct the generation for Stable Diffusion with ControlNet and OpenPose (a pose library). It's not too far off to rig an animation of a certain movement and then have the AI generate the exact movements you want. If you look at something like ComfyUI, there is an entire node-based workflow akin to game development software. It's not a "type prompt, get output" process at all: the AI can't read your mind (yet), so you must put in the effort to get out what you want. It's really not much different from making an animated or CGI film, only now the CGI is photorealistic.


You are making the mistake of equating incremental progress with the most extreme litmus test for quality. Posing is great, but again in a system with a lot of automation, once what you want isn't coming out automatically, you are sunk.

Already I could see concept art, animatics, previs and other areas benefiting a lot from these tools. Almost nothing so far is temporally coherent or actually final quality.

There is still a lot of use for that, but to replace actual final shots in films everything has to be perfect. The lighting of every element has to match, the perspective has to match, the model and deformations have to be perfect etc. That isn't going to happen with 2D techniques that hallucinate details from other photos.


> You are making the mistake of equating incremental progress with the most extreme litmus test for quality

Sure, but at the same time don't think that incremental progress cannot get to a superlative product, either. A translator app might improve incrementally but when it gets to 99.9% accuracy, the way people use it transforms from a mere tool to being relied upon everywhere they might need it. We see this with every technology today, from cars to smartphones.

We already have bad CGI that most except the most eagle-eyed moviegoers will not notice or care about. If AI generation tech improves such that you can trust the output 99.9% of the time, then AI generation will be what is predominantly used. It's simply good enough and much cheaper than manually rigging models or filming actors (never mind that Photoshop and video editing style tools are already being made for AI generation that will get you the exact output you want, more often than not).


> A translator app might improve incrementally

A translator needs to convert some text to another that has the same information.

I think you are misunderstanding what happens with the current image generation stuff out there. You give it text, it gives you something that might look plausible for what you described. It ends up full of artifacts and temporally extremely unstable.

> We already have bad CGI that most except the most eagle-eyed moviegoers will not notice or care about,

I don't know what you mean by this exactly. CGI in movies takes an extreme amount of labor and every shot out of the hundreds of thousands that go into a film is very exacting, being directed to a detail level that people can't notice while watching a movie in real time.

Your comment basically boils down to "what if it gets good enough, then it will be good enough".

Remember that the original claim here was that "kids will make hollywood blockbusters in their bedrooms".

Some people saw early cars and predicted they would fly too. You could also say that phone cameras have gotten much better and they will eventually replace "hollywood cameras".

Show me something where you can keep typing in text to change the image, move the lights, move the camera and have it be temporally stable. Then you might have a tool that has a chance to be used for a final image and even then, it's still a far cry from "kids making hollywood blockbusters in their bedrooms".


> I think you are misunderstanding what happens with the current image generation stuff out there. You give it text, it gives you something that might look plausible for what you described. It ends up full of artifacts and temporally extremely unstable.

Have you used ComfyUI with inpainting? From what you're saying, I don't believe you have, or perhaps the last time you used image generation was with Midjourney or another "type in text and get an output" tool. In reality, the field has evolved to contain entire workflows such that you can get something with no artifacts, it just takes time and effort, as with other sorts of art too.

> CGI in movies takes an extreme amount of labor and every shot out of the hundreds of thousands that go into a film is very exacting, being directed to a detail level that people can't notice while watching a movie in real time.

Oh they can notice alright, Corridor Digital on YouTube has videos of them reacting to bad CGI. Even in discussions online I see people complaining about bad CGI, especially in Marvel media. But my point is that those films still make billions of dollars, so at some point most people simply don't care and will watch it anyway.

> Your comment basically boils down to "what if it gets good enough, then it will be good enough". [...] Show me something where you can keep typing in text to change the image, move the lights, move the camera and have it be temporally stable.

Well, yes, that is my point, that when it gets good enough, kids will be making blockbusters in their bedrooms. I'm not a future seer though so it's not like I can give you an exact timeline, but based off what I know of the field, it's trending towards that. You could say it's incremental progress, but again, at some point it'll be good enough that most people looking won't care about artifacts, just like bad CGI as above. If you want a fully temporally coherent image when moving a virtual camera around, wait a few years.


> Have you used ComfyUI with inpainting?

I have, have you? It is high level automation. Go ahead and show me an image where you can move the camera, move the lights and keep it temporally stable.

> Oh they can notice alright, Corridor Digital on YouTube

No they don't. I know you think you watching some youtubers kick and scream means you understand an entire industry, but it isn't true. You don't watch a video on brain surgery and then think you understand everything that's going on either and the guys you are talking about are not exactly brain surgeons.

Picking a single bad shot out of hundreds is good for a youtube video but it has nothing to do with anything I was saying and it has nothing to do with noticing all the detail work that goes in to every shot. You didn't even understand what I was saying in the first place.

> But my point is that those films still make billions of dollars, so at some point most people simply don't care and will watch it anyway.

People don't care about what? Hundreds of millions spent on CGI? People do care, it is extremely difficult and you have no idea what you are talking about.

> kids will be making blockbusters in their bedrooms

If there were any truth to this, kids would be able to at least write the scripts to blockbusters in their bedrooms so show me that first.

> If you want a fully temporally coherent image when moving a virtual camera around, wait a few years.

Based on what? These 2D techniques that were just invented? The fact that people can paint colors and a badly photoshopped looking image comes out?

I have no doubt that people will continue to refine what they are working on, but the claim was "kids will make hollywood blockbusters in their bedrooms". What do you think the actual visual effects people will do with these techniques?

You can actually go back 25 years and see techniques for extracting a foreground from a background called natural image matting. You could look at that and say "in the future no one will need green screens, no one will need outlines, compositing will be automatic" but a quarter of a century later, it is still being worked on and is just another tool.


> Go ahead and show me an image where you can move the camera, move the lights and keep it temporally stable.

Again, I'm not a future seer, and I also never said that this technology exists right now. In fact, I explicitly said that it will take "a few years" until we get to that stage.

Edit: actually, because you asked, I found something just for you: https://www.youtube.com/watch?v=xzTgDvcSRXU

They show it being used for normal photos as well as for AI generated images, seems to work fine. It's not yet at the level of realtime movement of the camera and lights, but something like this also exists: https://v.redd.it/kewv3epujbra1.

You said:

> being directed to a detail level that people can't notice while watching a movie in real time.

I said that people do notice, maybe not most, but yes, some people do notice. If you would like to make a different claim, then do so. It's not about some YouTuber, it's the fact that at least some people do notice bad CGI, while you said "people can't notice" it.

> People do care, it is extremely difficult and you have no idea what you are talking about.

I think you're arguing my point for me, I never said it wasn't difficult. My point was that most people can't tell good CGI from bad, some can, but most people will watch the movie anyway and thus the movies rack up billions of dollars. Not sure what point you're making, that people do care about hundreds of millions of dollars of CGI? I mean sure, like I said, some do care but unlike those who actually work in the field, most don't.

I can similarly write the most beautiful software for a CRUD app, using all the best practices, but similarly users will not care about that, beyond some ancillary benefits like fast loading times.

> kids would be able to at least write the scripts to blockbusters in their bedrooms so show me that first.

People write scripts for fun, yes. I'm not sure what there is to show you, look on fanfic sites or /r/readmyscript or /r/writingprompts. Even further, we already have LLMs today so in the future I doubt that people will be writing scripts from scratch without the use of AI entirely.

> Based on what? These 2D techniques that were just invented? The fact that people can paint colors and a badly photoshopped looking image comes out?

Based on this sentence, I can tell you haven't used the state of the art recently, even if you say you used something like ComfyUI with inpainting.

> I have no doubt that people will continue to refine what they are working on, but the claim was "kids will make hollywood blockbusters in their bedrooms". What do you think the actual visual effects people will do with these techniques?

Did I claim that visual effects people will not continue to make even better blockbusters? It raises the floor up, it does not diminish the ceiling. I think you're arguing something I never even said.

> You can actually go back 25 years and see techniques for extracting a foreground from a background called natural image matting. You could look at that and say "in the future no one will need green screens, no one will need outlines, compositing will be automatic" but a quarter of a century later, it is still being worked on and is just another tool.

Things like Ultimatte with Unreal Engine's virtual sets do just that actually. But even if they didn't, I could say the same thing about any technology, to be honest. Also, I never claimed that AI is not "just another tool."

All in all, it seems like you're arguing against things I never claimed. Where did I say that kids would be automatically making blockbusters in their bedrooms with no input from them whatsoever? My point, again, is that people will be making current blockbuster level content in their houses, that they will use AI tools (yes, tools, since you think I'm arguing that AI is not a tool) to do so, and that they will still need to put in human effort to do so. It does not say anything about future blockbusters with the same tools used by professionals, which will undoubtedly be better.


> Again, I'm not a future seer,

Then why do you keep pretending you can predict the future?

> I said that people do notice, maybe not most, but yes, some people do notice. If you would like to make a different claim, then do so. It's not about some YouTuber, it's the fact that at least some people do notice bad CGI, while you said "people can't notice" it.

You still don't even understand what I'm saying. There is a huge amount of explicit detail demanded by the people making movies and it all needs to be exact.

> People write scripts for fun, yes. I'm not sure what there is to show you

There is nothing to show, because kids aren't writing "hollywood blockbuster scripts" in their bedrooms even though they could just type it out on their computer.

It only seems that easy to someone who knows absolutely nothing about writing.

> Based on this sentence, I can tell you haven't used the state of the art recently, even if you say you used something like ComfyUI with inpainting.

Or maybe these results aren't as great as you think.

> Things like Ultimatte with Unreal Engine's virtual sets do just that actually

They absolutely do not. Ultimatte is not a natural image matting plugin. The virtual set stuff isn't even in the same ballpark as natural image matting, it is a live screen behind people.

https://www.blackmagicdesign.com/products/ultimatte

https://www.google.com/search?q=natural+image+matting

> My point, again, is that people will be making current blockbuster level content in their houses

You don't even understand what that means. Lots of people read article headlines and watch 30 second youtube clips, but it is a mistake to buy into so much hype without understanding that new tools are still just a piece of the puzzle.


Alright, seems like you can't or are not articulating what you profess to claim, instead expecting me to read your mind and when I can't, saying "yoU StilL doN't EvEN uNdeRStaNd," and it is clear you don't even use the tools you're dismissing, so I think this is an unproductive conversation for me. Goodbye.


I explained it many times and I will again.

Automation produces something plausible, high end movies need something exact.

Plausible is fine for animatics and previs, not for hundreds of millions of dollars.

The amount of work required is so vast you could automate away 90% of it and it is still out of reach for one person to make a "hollywood blockbuster in their bedroom" just as it is for one person to launch themselves to the moon and make it back.

If you don't believe me, try to make a single shot of CG pool balls on a live action pool table. Nothing but spheres, it should be easy right? Automate some of it with 'AI' if you can.


The majors aren't so major anymore. Many big artists actually have their own labels now. A lot of 'em have distro deals... but it's a very different model. No advances, so the artist pays all recording costs, but they also end up owning the masters and having total creative control.


Taylor Swift _is_ the music industry.


I think it will create a new middle class artist category, much like youtube, commodity video editing software and good 4k cameras have done today. Certain kinds of niche content will become more viable.


>And while many do, the most popular and successful music still comes from major labels.

Isn't that just the result of marketing? Rich Men North of Richmond just came out of nowhere and went organically viral.

>And while many do put out some great stuff, the overwhelming majority of books that don't come from major publishers are shovelware.

How would you know? This is the "you don't know what you don't know" problem. Nobody has read all the books there, it's mathematically impossible I think.

>As shitty of a curator as Hollywood might be, for most people there is value-add in curation.

Mass appeal is lowest common denominator targeting, not value itself. Popularity is only correlated with quality.


> Rich Men North of Richmond just came out of nowhere and went organically viral.

Are you being sarcastic, or do you really know nothing about music promotion?

https://twitter.com/zei_squirrel/status/1691212356364226560?...

Pay a publicist with the right connections five figures and a review of your work will make its way into the New York Times.

Right wing D-listers have a much lower bar.


> In the age of MP3's and streaming services, "anyone" can put their music out there.

Not the same thing. Putting music out still requires a lot of.. artistry. The tools don't do the work for you.

"AI" is heading in a direction where you only need to supply the most rudimentary of ideas and it fills in the rest. At some point you won't even need to supply the ideas as the system will have some concept of what pleases you.


> Things will certainly change. However, I don't think professional studio productions will go away.

You'll still have auteurs such as Wes Anderson shooting film, but the era of $100 million blockbusters is coming to an end.


I think we’ll find, among other things, that making tons of minute decisions is exhausting. Iterating your way to an end result is a way of getting something, but so far, the rule of thumb seems to be that the more control you have over a creative process, the harder it is to not end up with a stiff end product.


We should call this the Chinese Democracy Problem.


Why?


https://en.wikipedia.org/wiki/Chinese_Democracy

A decade of perfectionism resulting in a mediocre product.

That reference going over people's heads is just more evidence that I'm really, really old.


> making tons of minute decisions is exhausting.

But what if the software can predict what choices you would make?


You are forgetting that as the tooling gets better and the capability of individuals increases, so too does the capability of a group of individuals working together, using the same tools you would be. You are still not going to match the level of quality of a talented team pooling their experience, time and cognitive abilities to create something. This team can also afford more compute capacity, to do the same thing in half the time, or in the same time at twice the quality, so to speak.

I am excited at being able to do exactly the things you mentioned, and I definitely plan to. But I don't live under any illusion that the capability of one man will supersede the capabilities of a group that uses the same "equipment" per se.


Maybe, maybe not; not all groups are created equal. In another comment I cite Stardew Valley, which was made by one person, while, say, EA games are made by teams. I consider the former much better made than the latter. So too can there be such examples in this sort of AI media generation tool usage.


Yeah, I'm skeptical that generative AI is really that different from other new filmmaking technologies (or new technologies in other creative fields), although it may be premature to say. Will individuals create good films that it wasn't feasible for them to do alone before? Probably. Will that mean large teams are obsolete? Probably not.


Even in animation, there are teams of people involved, each team specializing in different aspects. This single person hollywood replacement dream is something to be really afraid of, at least as far as the quality of the content. I couldn't imagine watching something that only one person has ever worked on with no input from other people with suggestions/tweaks/edits to improve the product. We've already seen things like True Detective Season 2, which was produced with people involved who did not push back.


I played Stardew Valley, a game made entirely by one person, and by all accounts it is one of the highest rated games of all time. Can anyone make shit? Sure, that is true in any medium. But individuals or even small groups of people can do amazing things, if they have the tools. It is not "something to be really afraid of," which I find to be an extremely hyperbolic view.


Stardew Valley is great. But note that the author took a classic approach to solo assets and development: pixel art on a 2d canvas. This is a great game in a well established medium, and the concept itself is part homage to Harvest Moon, originally a 2d title released in 1996.

Contrast this with the fact that Steam is now averaging about 1000 new games per month. [1] There are undoubtedly some excellent games in there that haven't survived the onslaught of choices. Sadly, without either significant marketing by the dev or the right conditions, it is nearly impossible to sift out the gems from the asset flips.

[1] https://steamdb.info/stats/releases/


There are 1000 new games per month and 90% of them are crud, sure. But the end result is that we have more original and innovative games than ever before (and cheaper, too). I certainly wouldn't want to go back to the days of big publisher gatekeeping. Would you?


My original point was meant to nod towards survivorship bias. I won't argue that choices are great now, with tons of niche offerings that are a delight for many. But using Stardew Valley as an example doesn't hold up well for creator success; many (if not the majority) fail in the Steam store despite their efforts and quality.

I think this is true for much of the gaming industry in general. There are indeed so many titles that several very big releases years in the making can drop off the scene shockingly quickly, if only because new ones show up so often now.


> I won't argue that choices are great now, with tons of niche offerings that are a delight for many. But using Stardew Valley as an example doesn't hold up well for creator success; many (if not the majority) fail in the Steam store despite their efforts and quality.

The argument was "This single person hollywood replacement dream is something to be really afraid of, at least as far as the quality of the content", and the likes of Stardew Valley (provided it's not a unique case, and I don't think it is), prove that wrong.

Making games is probably an even worse way to make a living than it was prior to the indie-dev explosion, sure (not that it was ever a great way to make a living). But top quality content is still being made, and I see no reason to think that won't continue.


And yet Steam is a shining example of curation compared to, say, the App Store or Google Play.

Excellent curation will be critical to gen AI. The window for such curation to be established feels extremely small, otherwise "app stores" will take hold and we will end up with a sea of unnavigable spam.


Hard to argue against that. There's clearly a higher bar in Steam. Hopefully that will continue.


Have you seen The Room? Anytime I see the same person's name in the credits for all the roles, I immediately start to get nervous.

Also, a small team is not a single person


> Have you seen The Room? Anytime I see the same person's name in the credits for all the roles, I immediately start to get nervous.

Sounds like that's more of a personal problem than one about this sort of video generation in general.

> Also, a small team is not a single person

Yes, which is why I likened this to what was in my original comment: "just like people make video games all by themselves or with a small group of people"


>Yes, which is why I likened this to what was in my original comment: "just like people make video games all by themselves or with a small group of people"

You're reading that out of context. I intentionally separated that comment. The original post I replied to discussed all of the work being done by a single person. That's what this is about. You introduced a small team. A small team is not one person. A small team still has the potential of having discussion on edits, creative, etc vs just the ideas of one person.


You replied to my original post, did you not [0]? The post where I, as stated above, also included the words, "all by themselves or with a small group of people," no? The post where I explicitly did not "[discuss] all of the work being done by a single person" that denies work also being done by groups. You were the one who omitted the "with a small group of people" part and started talking only about a "single person hollywood replacement dream."

I think you're just arguing semantics at this point, as to me, it doesn't really matter if it's a single person or a small team making stuff, my greater point was that it will be a lot cheaper than a full-blown Hollywood production and will usher in a new sort of industry like YouTubers today, as ubiquitous phone cameras have, only now these creators will also make photorealistic productions rather than just filming themselves doing things.

[0] https://news.ycombinator.com/item?id=37310759


There are also movies like Primer (https://en.wikipedia.org/wiki/Primer_(film))

which show what can be created with minimal budget and crew. I heard a lot of the cast also doubled as backstage workers during the shoot.


It's also a kind of cult classic and there's strong evidence to suggest he knew what he was doing by making a horrible movie.


Even assuming it's impossible for a single person to produce something as good as a large team, why would this be "something to be really afraid of"? There will still be demand for high-quality films -- why wouldn't that demand continue to be met? It's been the status quo for at least a decade that one "normal person" can make and publish a "movie" (e.g. filming something on their phone and posting it on YouTube), and yet Hollywood somehow hasn't been upended.


>I couldn't imagine watching something that only one person has ever worked on with no input from other people with suggestions/tweaks/edits to improve the product.

You don't have to imagine, there's Astartes: https://www.youtube.com/watch?v=lr5-JXDkonc

The original HQ video is no longer on his page because they poached him: https://www.youtube.com/watch?v=LdI3WuiC6Pw


In the last 20 years, virtually all professional capabilities have become accessible to the public, especially in software, which has become very affordable and sometimes free. This definitely helped creative people get a leg up and create some amazing things. However, the majority of people still don't really use this stuff. You still need a creative vision and coherent direction, and both are major weak points in the current state of generative AI (and for humans).

I for one totally embrace the latest AI tools and managed to leverage Stable Diffusion in several places. But I don't see it replace genuine creativity soon.


> But I don't see it replace genuine creativity soon.

I never said it will either. I also see it as a tool for creatives, just one that democratizes film making at the level of Hollywood productions.


I think you'll see that high production quality becomes more accessible, while content quality remains as elusive as it has ever been...

Unless you can successfully train the population to equate content quality with production values.


imho the thing that makes film the pinnacle of art is that it is the amalgamation of many artists' vision and interpretation. And what makes great film great is when those differences form a cohesive story.

And at the end of the day, art is a reflection of the human condition. Removing humans from the process is not a feature, it's a bug, because it reduces the scope of what the art can address.


Since when is film the pinnacle of art? If collaboration makes it so then both music and architecture also have that trait. Theatre as well.


imho: in my humble opinion.

And you're right, but you also need all of those art forms to make a great film.


I don't agree that film is the "pinnacle of art," but even discounting that, many great works throughout history were made by individuals: books and paintings, for example. I don't see anyone saying that such works should have more than one person making them, but for movies and shows, it seems that we are used to the status quo and cannot see how great works could similarly arise from individuals in the future via advances in technology. I'm sure that in 2100, when this tech is commonplace, people will think about how archaic it was to have thousands of people work to produce a film.


It's an interesting concept, which will further fragment the amount of attention that each individual piece can get. If the actors are AI generated, then we are also past the point where sex and nudity can't be part of the experience. It will likely be more attractive than anything else for that reason alone.


It'll be like YouTube, anyone can put out anything they want there, the vast majority of which will be bad, but there are still good content creators making high quality videos.


I don't think so. There is a qualitative difference between YouTube and what can be made for streaming, but there are also entire groups of things that are not on YouTube: there are lots of video essays, but basically no TV shows over multiple episodes, in any genre.

This is currently a limitation, but it doesn't have to be and with the new system it won't be.


Huh? Many content creators on YouTube have made TV shows over many episodes on YouTube, Wong Fu for one.


Watching your own movie sounds quite boring?

Spoiler alert?


Imagine the amount of fake videos to spread misinformation.


People have been saying that since the Photoshop days, if not earlier. Stable Diffusion has been out for over a year, I don't see any fake images being put out that are taken seriously by anyone notable, such as news organizations.


>anyone notable, such as news organizations

The problem is the less notable people. It is not the people from news organisations who storm the Capitol.

They already believe the fake pictures of Trump.


It's crazy how consistent the Shutterstock logo is in the outpainted examples.


Seems like it's reversed along the Y axis as well? I'm curious what led to that. The nefarious side of my brain says it was a very basic attempt at making the source training data less immediately recognizable in any generated output, but I do wonder if there's a more innocent explanation.


A "more innocent explanation" could simply be data augmentation. It seems pretty clear they don't care that it's obviously using watermarked Shutterstock videos.
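
For anyone unfamiliar with the term, here is a minimal sketch of what flip augmentation could look like for video clips. This is an illustration only, not MagicEdit's training code (none has been released); the function name `random_horizontal_flip`, the `(frames, height, width, channels)` layout, and the 0.5 flip probability are all assumptions. The one detail that matters for video is that the whole clip is flipped on a single coin toss, so the frames stay consistent with each other, and any watermark in the clip comes out mirrored too.

```python
import numpy as np


def random_horizontal_flip(clip, rng, p=0.5):
    """Mirror an entire clip left-to-right with probability p.

    clip: array of shape (frames, height, width, channels); this layout
    is an assumption made for the sketch.
    """
    if clip.ndim != 4:
        raise ValueError("expected a (frames, height, width, channels) array")
    # One random draw per clip, not per frame, so all frames agree.
    if rng.random() < p:
        # Reverse the width axis: a mirror about the vertical (Y) axis.
        return clip[:, :, ::-1, :].copy()
    return clip


# Tiny made-up clip: 2 frames, 2 rows, 3 columns, 1 channel.
rng = np.random.default_rng(0)
clip = np.arange(2 * 2 * 3 * 1).reshape(2, 2, 3, 1)
flipped = random_horizontal_flip(clip, rng, p=1.0)
```

A model trained on a mix of flipped and unflipped clips would see the same watermark in both orientations, which is one plausible way a mirrored logo could end up in generated output.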


All of the source videos that have people's faces seem to be blurred. Are they blurred before transforming or do they blur it just to post the samples on their web page?


The latter, presumably they didn't ask.


Wait what?


what would happen if they didn't say "pretty girl"?


I’m disappointed that all of their examples of humans are “a pretty girl”. Yeah, I get that people use generative models for that, but there’s a lot more you can do.


They seem to have thrown a few "handsome man" examples in there, too.


That's not the point though. To me, it reeks of a bunch of dude bros. I'm guessing they can't say "a hot chick" otherwise the LLM would show a baby chicken on fire or at least sweating in front of a fan??? Does "hot chick" even translate to Chinese well?

It's the fact that they felt the need to use "pretty" instead of just "girl" or "young woman".


Also, the one source video of a black woman is transformed into a white "pretty girl". The "pretty girls" are all either white or Asian.


(¬‿¬)


"a pretty girl, white singlet, dark pants, on the stage"

Well, glad they kept the 6 fingers on her right hand.


No code...


Right? I clicked on the "Code" link expecting code and found code for the project presentation page basically. The paper seems cool but also looks like something that'd take me a while to implement and I don't really have time right now.


I have been reading a lot of adjacent papers to this recently. Here is a useful collection for anyone interested: https://github.com/zengyh1900/Awesome-Image-Inpainting

I have noticed an overwhelming trend: the vast majority of authors tend to have Chinese-sounding names, even when associated with a US-based university. Obviously some of those could be Americans as well, but it stood out to such a degree that I was curious if anyone had any insight.

Also, if there is code (this project links to GitHub, but the repo is empty), it tends to be basically abandonware once these papers are published, with no effort towards commercializing it or turning it into a healthy open source project, for some reason.


This is the no-code future that all those platforms like Retool, Airtable, etc. have been talking about.


Kendrick Lamar: "I'm so sick and tired of the Photoshop"

Who actually wants to look at this? It's a neat trick, but I greatly prefer to look at what's real, and I imagine most people outside of this AI hype bubble do too. It's implied that the stable diffusion here is making the video better, but by most definitions of the word "better", it's not.


Porn, man. They're going to use it for porn, a lot of porn, like, a gigantic shitload of porn, things you can't even imagine. Things have been spiraling around the drain for years now; remove the 'people' aspect of it, and a large number of people's objections to it are completely removed. The studios stand to make more money, because there are no actors/crews/locations/etc. to pay, and they don't have to deal with any of the management associated with those things either.

Don't get me wrong, there's plenty of research to suggest that porn consumption is absolutely terrible for your mental health, but without the "that poor woman" ex-pornstar to sit on daytime TV and talk about the horrors they experienced, it's not likely to go much further in terms of regulation, which is going to make sure this sort of tech gets adopted there first.

It'll absolutely get rolled out into Hollywood eventually; it's just that there's so much infrastructure already in place to do everything this does that I can't imagine much will change all that quickly.



