OpenAI: Sora: First Impressions (openai.com)
90 points by Josely 9 months ago | 25 comments



> Sora is at its most powerful when you’re not replicating the old but bringing to life new and impossible ideas we would have otherwise never had the opportunity to see.

Roughly what you would expect: good for artsy pieces where you don't need the model to generate anything very specific, but not very useful for most work, since for most work you want that control.

In other words, it will be used for much the same things as current image generators: intro scenes, short one-offs, concept art, etc.


> not very useful for most work

We seem to be on a timeline where most of the significant use cases the model doesn't handle well today are less than two years away from significant improvement.

My (completely baseless) guess is that within two years we will begin to see "high budget" feature-length productions moving towards a cost-saving model that allocates the production budget primarily to virtual content.

Within a few years there will almost certainly be a vast ecosystem of production and post-production tools giving creators the controls to reliably create and fine-tune their shots.


The cool demos from OpenAI, Figure and the like make us hallucinate a future that will take much (much) longer to pan out, because they ignore the domain-specific knowledge inherent to the fields they pretend to disrupt.

I’ll be impressed when ILM talks about it.


this'll age well...


It's "God of the Gaps" all the way down with these folks.


I agree with you, and have just a few more observations about where I think the current bottleneck might be. I wonder how well the model handles re-using objects/people/scenes. Like, can I create a character and then use him again across 10 different shots? Also, I'm pretty curious what the user interface looks like, because the text-to-video model interfaces seem pretty limited compared to the freedom a person has using Unreal Engine or Blender, or shooting a movie in real life.

What would the gold-standard text-to-video user interface look like? I have been thinking about this for years, even before the current generative AI boom, and I wonder if it could generate a 3D representation of the scene you described: a file where you could very easily change things around, as if it had been created in Blender or whatever, but very user-friendly and easy to edit.

What I'm going to say will seem silly, but the ideal interface reminds me of the movies people made using the game "The Sims", where you could very easily move objects, move the camera, and so on. What I'm trying to say is that I imagine these models creating a 3D representation of the scene, with the movie-making process ending up somewhat similar to how you customize objects and people in that game.
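
To make that concrete: purely as a hypothetical sketch (no real Sora interface works like this), such an editable scene file could be as simple as structured data, something like:

    # Hypothetical: what a model-generated, editable scene
    # description might look like. No real tool exposes this format.
    scene = {
        "camera": {"position": (0, 1.6, -4), "look_at": (0, 1.0, 0)},
        "lighting": {"sun_angle": 35, "mood": "golden hour"},
        "objects": [
            {"id": "hero", "type": "character", "position": (0, 0, 0)},
            {"id": "dog", "type": "animal", "position": (1.5, 0, 0.5)},
        ],
    }

    # Editing a shot would then be Sims-style object pushing:
    scene["objects"][1]["position"] = (-1.0, 0, 0.5)  # move the dog left
    scene["camera"]["position"] = (2, 2.0, -3)        # reframe the camera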


I have only a vague idea about this (I worked on small 3D games many, many years ago), but I imagined something similar to what you described.

Basically you use Sora to generate a promising scene, then you ask it (or another model) to turn that scene into a scene graph in a text file.

It will make mistakes, but it could work similarly to the Python interpreter in ChatGPT--it can iterate until everything is OK. Maybe there could even be some adversarial stuff where the scene graph is rendered on the fly to compare it to the generated clip, etc.
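
Hand-waving heavily, the loop might look something like this. Every function named here is a hypothetical stand-in (a scene-graph extractor model, a fast renderer, a frame-comparison metric), not any real API:

    # Hypothetical sketch of the "iterate until everything is OK" loop.
    def refine_scene_graph(clip, extract, render, similarity, max_iters=10):
        """Adjust a scene graph until its render matches the source clip."""
        graph = extract(clip)  # first pass: describe the clip as a scene graph
        for _ in range(max_iters):
            preview = render(graph)              # render the graph on the fly
            score, feedback = similarity(preview, clip)
            if score > 0.95:                     # close enough to the clip
                return graph
            # adversarial-ish step: feed the mismatch back to the extractor
            graph = extract(clip, prior=graph, notes=feedback)
        return graph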

And then you can use your standard toolset to edit it, probably enhanced with a copilot model to automate as much as possible.


Even image generators are still in that phase where they are excellent at generating faces and dream-like sequences but suck at details.

Pair that with the increasing copyright headwinds around ingesting the world's data, and I can see this flatlining.

All that processing power is expensive, and these companies have yet to find a clear path to monetization. Stability AI is already circling the drain.

This has the "let's back the cash truck up to lock in market share" vibes of the streaming wars, right up until they realized it just wasn't sustainable.


> Below are a few examples of the artists’ work, with early thoughts from them on how they see Sora fitting into their workflows and businesses.

I wish it were clearer how Sora was used by each artist and how it impacted the provided examples. (I think I can spot some Sora-generated output, but I'd imagine it's not as clear-cut in artistic works.)


> As great as Sora is at generating things that appear real - what excites us is its ability to make things that are totally surreal.

Finally, software that makes images that don’t quite look right. The use cases for this will be unending.


A good study could compare artists' output and self-satisfaction when working with LLMs vs. conversing with a rubber duck or just imagining what an LLM might do. A lot of this reads to me as the artists selling themselves short.


Sometimes I feel like I'm seeing something completely different from what is described in the popular narrative. This is a good example. I wrote a post detailing:

* What does Sora actually do?
* What does it not do?
* What will it likely be useful for?
* And finally, what will be needed to actually replace the majority of video generation use cases?

https://shorturl.at/auIK0


How much of this is truly Sora and how much is not?


I'm seeing shots that would be incredibly expensive for some productions - even if we ignore the ones requiring visual effects work. Some of them would need small crews, permits, rentals of expensive equipment, casting, and travel. It's impressive and concerning at the same time.


Do we have any ballpark figure for what a single Sora video costs to make?


I’ll be downvoted for this, but all these videos feel like the high-fructose corn syrup of cooking.


Successful, widespread, and not differentiable in taste tests?


a cheap way to make everything sweet so that prepackaged goods are preferable to ever learning how to make something yourself


I think it's good that cheap food is available!


I couldn't really disagree more. It is a ridiculous comparison.

Anyone into experimental film making can see this is basically the ultimate creative tool.

If you can't see the creative possibilities here you just don't have much imagination.


The problem is always in the tooling. Prompts aren't suitable for creative work at all; you need a large set of non-textual tools that let you guide the model and create exactly what you want. For instance, Stable Diffusion has crude higher-order tools, although they're still poorly suited for actual productive usage because they've been made by ML nerds, not creatives.

OpenAI doesn't have even that, because they're an AI company, not a VFX company. Besides not understanding the needs of their users, they see this model as a neat intermediate result on their path to AGI, and as a progress report to raise more money. They're really interested in the advanced emergent behavior it exhibits, not in artistic tools. For this reason they've never bothered to fix all the artifacts DALL-E 3 produces, let alone add any tooling to it. Sora will be the same, and its quality doesn't even remotely approach what is required in production. It's more of an experiment.

What you see in the OP is simply marketing material made by OpenAI in an attempt to look less nefarious to creatives by appealing to authority (it took them quite a while to figure that out; usually they're superb at marketing). I can guarantee you won't see any real use of it in production, because that's just not what OpenAI is in this for. They probably already have another, better model in the making anyway.

Models made by actual VFX software companies will have a chance to be used, because those companies care about usability. Models made by Stability (SD3 has the same architecture as Sora, although it's for image generation) also have a chance, because they are open-weights and have tremendous amounts of tooling around them. Models from OpenAI? Unlikely.


I always tell people that if I couldn't work on computers in any capacity, the thing I'd most like to do is direct movies. I've already played with AnimateDiff, and I can't wait to play with Sora. These new AI tools (especially the FOSS ones) are an absolute boon for anyone without a major budget.


Exactly.

I have always wanted to play around with experimental film making, but not enough to find a group of actors, let alone pay them.

The negative reaction to this reminds me of when rap was new. "It is not music, they aren't even singing!"


I feel like all the GPU time should first go to improving GPT or solving AGI rather than image/video generation.


There is no art without an artist.



