Two figures that stood out to me were the 1.1 million render hour / day capacity of the farm and the quoted total of 190 million render hours for the film.
That means almost half a year (~172 days) of the farm's capacity was spent rendering the film. The listed running time of 108 minutes * 60 seconds * 24 frames per second = 155,520 frames in the film, giving us an average render time of about 1,221 compute-hours per frame, or a rendering speed of roughly 2.27e-7 FPS. Which means that, if Moore's Law continues to hold (call it one doubling per year), in 26.7 years or so we'll have a supercomputer that could render this film in real time at 24 FPS, and that's just neat.
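The arithmetic, for anyone who wants to poke at it (this just re-derives the figures above; the one-doubling-per-year assumption is the loose part):

    import math

    total_render_hours = 190e6                     # quoted total for the film
    frames = 108 * 60 * 24                         # 108 min * 60 s/min * 24 fps = 155,520
    hours_per_frame = total_render_hours / frames  # ~1,221 compute-hours per frame
    fps = 1.0 / (hours_per_frame * 3600)           # ~2.27e-7 frames per second
    doublings = math.log2(24 / fps)                # ~26.7 doublings to reach 24 FPS
    print(frames, round(hours_per_frame), fps, round(doublings, 1))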
Congrats to everyone who worked on this; really impressive technical achievement.
1.2 hours per frame sounds about right for a feature. Also, apart from Moore's Law, there's Blinn's Law (http://en.wikipedia.org/wiki/Jim_Blinn): as technology advances, rendering time remains constant.
My bad. It has been dogma for a decade or two now that a production-quality feature runs 0.5 to 2 hours per frame for finals, though, so that was my reaction without reading carefully. You have to factor in numerous passes and production previews, which add up. Source: am rendering daily.
TV work, on the other hand, can't deal with that kind of time burden, so it's shifting to more novel solutions, which lately means near real-time rendering (see Element 3D, for example).
In my experience, typical render times are whatever can get finished overnight. So up to 8 hours or so on average. Many complicated renders will take much, much longer. And then of course many simple things like previews and non-final renders go significantly faster. It varies widely depending on the complexity of geometry and lighting model.
Blinn's law is completely true. The limiting factor on render times is the impatience and work schedule of the people submitting the jobs.
Remember that you do not get it right immediately, thus most sequences are repeatedly re-rendered until everything is correct. Also a frame often isn't rendered in a single pass but is composed of multiple layers -- although this varies depending on the animation studio and their philosophy.
Isn't this layering what they referred to as "time-consuming manual lighting"? From what I understood, everything is treated as global illumination and effects are done without "tricks".
No, layering just means generating multiple renders for a given shot, which can then be combined to create the final image using 2d compositing software such as Nuke. Full 3d renders are always going to be much slower than just doing image manipulations in 2d, so it is much more efficient to iterate by making adjustments in 2d than by re-rendering everything from scratch every time you need to make a change. Additionally, if just one element in a shot changes, say a character's animation, it's much more efficient to just render the layer for that element again than to re-render ALL the geometry for the entire shot. Each layer may still be rendered with full global illumination and whatever other fancy lighting is needed; the layering just makes it easier to iterate on individual pieces of a shot.
In feature animation things are changing and being adjusted constantly. Each shot is worked on by many people across multiple disciplines, all of whom do many, many iterations to home in on the final shot. So the entire production pipeline is designed to make handling changes as efficient and easy as possible. Look at the amount of processing resources Disney used for this movie; I can guarantee you they are cutting all the corners they can and maximizing efficiency wherever possible. Doing a single monolithic render every time something changed would be extremely inefficient.
That said, there are a few studios that do produce complete final frames in-render. Usually places that write their own renderers in-house and therefore have a sort of macho academic attachment to showing off how much their renderer can achieve out of the box. Blue Sky is this way because their whole pipeline is designed around their in-house CGI Studio renderer. Anecdotally I have heard that Pixar did everything in RenderMan for a long time, but more recently they have started using more of a compositing workflow since it's so much more efficient. So it's possible that Disney may be this way now since they have this fancy new Hyperion renderer they wrote, but I certainly wouldn't bet on final frames all being done in-render.
But the layers aren't independent at all, so how is it done? For each small change you would have to re-solve the full rendering equation (although that should be easier given small changes). So I understand they may want to use something like that for quickly visualizing changes, but I can't see it being used for the actual final rendering.
Each layer might only have some subset of the objects in the scene, but the lighting on those objects might still use the indirect contribution from all other scene objects.
So for example if your scene has objects A, B and C, you might break it up into a layer for each. Then to render the A layer, your primary rays would intersect with object A only. But any secondary/indirect rays would intersect all of A, B and C, so you'd still get the correct global illumination on object A. Breaking it up this way just makes things easier to adjust in compositing. Also the indirect for each layer often will be rendered as a separate pass entirely so it can be dialed in comp independently, or regenerated if the scene geometry changes.
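A toy sketch of that idea in Python (hypothetical names, nothing to do with any real renderer's API): primary rays test only the layer's own objects, while shadow and indirect rays test the whole scene:

    # Hypothetical helpers are passed in; the point is just which object set
    # each kind of ray gets tested against.
    def render_layer(camera_rays, layer_objects, full_scene,
                     intersect, direct_lighting, indirect_lighting):
        image = []
        for ray in camera_rays:
            hit = intersect(ray, layer_objects)             # primary hit: this layer only
            if hit is None:
                image.append(None)                          # left empty for compositing
                continue
            col = direct_lighting(hit, full_scene)          # shadow rays see everything
            col = col + indirect_lighting(hit, full_scene)  # GI bounces see everything
            image.append(col)
        return image   # combined with the other layers later in 2d comp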
It's also important to understand that almost all the lighting in feature rendering is incredibly faked and not physically-correct at all. This is particularly true in animated movies. It's just important that the end product looks plausible, not that it's academically correct.
Global illumination helps add a bit of realism and nuance, but its contribution to the final image is pretty subtle, especially between objects that are far apart. Nobody would ever notice if the indirect lighting wasn't perfectly correct in all but the most close-together objects. So a lot of the time it won't be re-rendered if the scene changes slightly, as long as it still looks decent.
Without global illumination, to get the illusion of light bouncing between objects lighters would have to place spot lights for every bounce they want to fake. Like a ground light pointing upwards below a character to fake light bouncing off the ground, etc. People really used to do this, and it's a real pain. This is the "time-consuming manual lighting" they talk about. Global illumination tools make that process automatic and are able to get much richer light interactions than anyone could set up by hand. But they're still just tools. And like any tool, artists will break them apart and use them in whatever hacky way they need to get a shot to look right. Getting the exactly correct solution to the rendering equation is just not important.
It's not like they create the film and then push render though. I assume these figures include massive amount of iterating in different work stages. So if you are interested in only rendering the end product you will get there a lot faster.
Hope someone knowledgeable can comment on this. The impression from the short making-of documentaries packaged with Pixar's films is that they're completely animated, voiced, and viewable with very basic working rendering, and that the proper render is something that could in principle be done as a final, complete step.
Of course, that's probably a bit of movie magic; in the real world there's surely some iteration.
Yeah, previs (pre-visualisation) is generally always done these days, as it lets the director preview what things "look" like, but it only has very basic geometry, low-res textures (if any), no proper cloth/skinning deformation animation, and no proper shaders/lighting. So it's basically using a game engine, without even decent game-engine lighting/shading for the most part.
So while it's great for placing objects in the scene and getting the camera position correct, that's about it. There has been a move to using progressive raytracing more and more over the past year or so, but it's still very early days with that.
On top of that, as previs is almost always done early (before most asset generation - modelling, texturing, maybe animation too - is done), it doesn't always give you all the info you need. It's often the case with mechanical hero objects in the scene (aircraft, robots, weapons, etc.) that the basic geometry used for previs isn't good enough, and later on in real lookdev/lighting (or sometimes even at the animation stage) serious issues are found - e.g. the previs has a robot moving through a street, but when animators actually start trying to rig the robot to get it moving realistically, they find that for the arms to swing, it can't actually fit in the street.
So there's a huge amount of iteration that works up-and-down the pipeline, sometimes causing lots of different parts of it to re-do work.
Generally it goes:
Previs -> Modelling / Texturing (linked, as you need UVs) -> Layout -> Animation -> Lookdev / Lighting -> Compositing.
Any non-trivial change before Lookdev / Lighting will trigger new renders having to be done for some layers / scenes.
The above poster is right. It's not just some iteration, there is tremendous iteration. The quality that you see comes from iterating until there are no more imperfections and that takes a lot of iterations and a lot of renders. A typical lighter may have 4 shots they are working on at one time, with one finishing about once a week.
So basically the impression from the DVDs is a simplistic fantasy. While the narration says 'it's such a wonderful place' the people working there until 9 every night are thinking 'why is everything broken'.
I can't find the source, but I remember reading that when Pixar went back to render Toy Story in 3D they got close to 24 fps, just because the rendering hardware and software had improved so much.
But without a source, I'm not going to stand by that...
Well, performance scaling in software has advanced (on average) with a stronger exponent than Moore's Law. Which is why we can get visuals as awesome as Jurassic Park on modern GPUs - while the hardware isn't quite as powerful as the rendering farms used for that movie, other advancements allow us to achieve what are in many ways more spectacular results.
So, while he may be wrong about what makes this happen, in essence he's probably correct. In the time frame he outlined, I expect both ray tracing and radiosity solutions, in both hardware and software, to match and possibly exceed what he has outlined in terms of capability.
"Moore's Law" as a phrase in popular usage dates from way after Moore's 1965 paper on transistor density doubling every year, and anyways Moore wrote a memo endorsing it's usage for things other than that (It was included in the lecture notes of some semiconductor physics class I was taking).
And more transistors doesn't mean more performance. Why do you think modern CPUs have 6 or 8 cores? Because we can fit more transistors on a die but CANNOT make individual cores go any faster.
You can massively parallelize rendering a movie in advance because you can do each frame on its own CPU. Real-time rendering is much harder to extract this kind of parallelism from, particularly if you have hard real-time constraints.
I thought the problem with faster as you go smaller is that smaller pipes leak higher pressure water (electron tunneling), which is why we have divided the pressure into a larger number of small pipes (multi-core).
If we can make small pipes capable of higher pressure, we could run them all faster. I bet there's billions being invested in solving this problem, but that doesn't mean it's solvable.
One of the main limitations is actually power, as increasing the frequency increases the power draw. All that power turns into heat, which has to be dissipated. Modern processors already shut off inactive cores and change the frequency of the active cores on the fly depending on the workload to reduce the power draw [1], so it's easier to add performance by adding cores. The problem is that software hasn't quite caught up to take advantage of the added cores.
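The usual first-order model is dynamic power ~ C * V^2 * f, and bumping the clock usually means bumping the voltage too, so power grows faster than linearly with frequency. Rough illustration with made-up numbers:

    # Made-up numbers, just to show the shape of the trade-off.
    def dynamic_power(c_eff, voltage, freq_ghz):
        return c_eff * voltage ** 2 * freq_ghz

    base = dynamic_power(1.0, 1.00, 3.0)   # baseline core
    fast = dynamic_power(1.0, 1.15, 3.6)   # +20% clock, needs a voltage bump too
    print(fast / base)                     # ~1.59x the power for 1.2x the clock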
Modern CPU cores are faster than CPU cores from before the "multicore" era. I think we're figuring out how to make CPU cores faster _and_ pack many of them into the same space.
I was about to take issue with the term "supercomputer" as this is really just a cluster of standard boxes. But then I looked at the latest Top 500 list (http://www.top500.org/list/2014/06/?page=1) and found that 55000 cores and standard gigabit ethernet easily lands you in the top 100 (#73 is a close approximation). I clearly spend too much time thinking farms like this are no big deal and that "obviously" a supercomputer is more like BlueGene/L or Tianhe.
The latency between the sites would kill its linpack performance. It's a "supercomputer" in the same sense that seti@home is a supercomputer -- it's really just a few thousand machines working on an embarrassingly parallel problem.
Right. Calling it a supercomputer is the same as calling a Facebook data center a supercomputer. The computers don't work together as one system; a frame is queued onto a single computer (and actually multiple frames are queued onto one computer). The latency is the same as your internal network (and actually most companies use NFS and have cheap 4-port routers in offices). One article on Disney says they are using 10Gb Ethernet, which would make them the first CG company I've heard of to do it.
The first cg company to use 10GbE? I find that hard to believe. I develop storage boxes for video production, and 10GbE is pretty much standard these days, you only go with gigabit if you're being really cheap. We are now starting to sell 40GbE systems on the high end, as well as bonded 10GbE for those that need a bit better than 10 but don't want to shell out for all of the really expensive 40G gear.
It is the first CG company I have heard of or seen using it. You are talking about a different industry. Not everything in visual effects studios is higher tech than anywhere else. The bottleneck is often expensive network storage and not the network itself.
I would imagine that right now studios are looking at 10Gb Ethernet, especially since SSDs should be making their way into the enterprise level gradually. Many times reading a scene goes at 50 MB/s.
It helps to realize that these studios are not flush with cash. They also might not always spend money in the best places (although usually they know what they are doing or they go out of business). Articles paint them to be super high tech, but really everything is built out of commodity hardware except for the backbone disks and routers.
Also upgrading to 10Gb means upgrading thousands of boxes, not just a couple. It could also mean running new cable but you would probably know more about that.
In seti@home, they are all being controlled from a single location to work on a single task, while the entire internet consists of independent machines operated independently. That's what allows you to talk about seti@home as a single supercomputer, even though it's somewhat tenuous as the problem is so embarrassingly parallel, it would never compete on any actual supercomputer benchmarks which require low-latency communication between nodes.
Really, 55,000 cores isn't all that "big"; it's 2,300 dual-CPU Ivy Bridge-class servers from Supermicro, which will set you back about $10M if you load them up with RAM (which I would do with this workload). Why not go a bit higher and get 25,000 machines and stick a couple of GPUs in each of them? Figure the movie is rendered at 8K resolution, 60 frames per second (for 30 frames per second 3D stereo); if you render the left and right view frames on a machine in, say, 5 minutes, that renders an entire 2-hour movie overnight. Sure, setting the lights and setting the motion takes most of the time, but we've reached a point where a "big budget" animated movie can afford the hardware. Even more so if it is done on a rental type deal.
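Taking those numbers at face value (25,000 machines, both eyes of a stereo pair on one machine in 5 minutes, 2-hour film at 30 fps per eye), the overnight claim has plenty of headroom:

    movie_seconds = 2 * 60 * 60                      # 2-hour film
    stereo_pairs = movie_seconds * 30                # 30 fps per eye -> 216,000 pairs
    machines = 25000
    minutes_per_pair = 5                             # left + right eye on one box
    pairs_per_machine = stereo_pairs / machines      # ~8.6 pairs each
    wall_minutes = pairs_per_machine * minutes_per_pair
    print(pairs_per_machine, wall_minutes)           # ~43 minutes of wall-clock time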
Take the infrastructure of an Amazon or a Google and it's not really a material chunk of their resources. In fact, Amazon could no doubt offer an 'Elastic Render' service a la EC2 and really help a lot of CGI companies become profitable. The thing that kills those companies is keeping all the hardware after they don't need it any more. If you don't store it properly it becomes worthless, if you store it too long it becomes worthless, and if you leave it powered up and running it sucks money out of your account long after the checks from the projects come in.
The movie would have been rendered at 2K (maybe 4K, but that's still rare).
GPUs still don't cut it for high-end VFX/animation rendering, as you're always memory/IO bound, e.g. reading in 20GB+ of geometry and >200GB of textures for an average large scene. That doesn't fit on a GPU, and copying stuff on and off over PCIe is really slow.
Generally you get the geometry in memory (so 20GB in memory, + stuff like acceleration structures) and page the textures with 10/20 GB texture cache sizes. A dual high-end Xeon can almost compete with a K6000 at raytracing, and copying stuff in and out of memory is much faster. Then you're just limited by network bandwidth / latency.
A couple of things that can lay some of the conjecture to rest.
1. Films are 24fps. 48fps is exotic and extremely rare.
2. Some stereo is planned to be done with conversion, some films are a hybrid between conversion and actual renders for each eye. If this movie is stereo it was very likely done by rendering each eye, as that has been the approach for their stereo films after Meet The Robinsons.
3. Films are still rendered at 2k. 4k is still extremely rare.
4. Typical frame times are probably between 2 to 10 hours on 8 cores. Many passes go into a typical frame, but usually only a couple are heavy (many hours).
5. Amazon and other clusters actually are being used and investigated by CG companies to be used during peak times (the last 6-8 weeks of a show). Profitability is not something that comes down to rendering however. It is an expense, but the vast majority of the money goes to pay the hundreds of people who work at the studio.
6. Hardware is not stored. It is on the farm and switched on, or decommissioned.
Since we are talking about recursive lighting and global illumination, GPUs aren't particularly well suited to this task. The major bottleneck I'd expect would be memory coherency or interconnect speeds rather than brute compute at high enough levels of parallelism. There is probably a sweet spot of memory speeds, reliability, parallelism and processing power which they aimed for.
tl;dr: I doubt the engineers building the cluster didn't think of that.
Actually GPUs can raytrace extremely fast, and shading the hit points is basically the same as running the shaders of a game (local linear algebra). GPUs are not used much though because the video card memory is very limiting.
Indeed, but raytracing data structures aren't complex. Primary rays scream on GPUs; they cache amazingly. However, secondary rays for GI/FG and similar are problematic since they thrash the cache and branch like mad. So classic Whitted ray tracing is mad fast on GPUs, but add global illumination or similar and you've got a problem. Not to mention the memory issues others have mentioned. Octane, Brigade, and KeyShot are the top GPU renderers these days - all very limited compared to what you can do with the classic offline renderers which are dominant now, like Mental Ray, V-Ray, Arnold, RenderMan...
The movie was most probably rendered at 24 fps (double that for 3D), at 4K at most but more realistically at 2K - with the actual resolution depending on the aspect ratio. Lots of smaller companies are using Amazon directly or 3rd-party render services built on top of Amazon, but the bigger ones aren't and won't. For example: http://www.zdnet.com/for-peter-jacksons-weta-digital-the-clo...
Also, it's not really practical to use an Amazon-like service for production previews if you don't have a fat pipe running to it, so there will always be a need for a small local farm, high-end GPUs, or both.
But then what happens if in a few years they want to re-release the film in 4K, or even 8K in a decade or two? Do they just go and re-render the entire film?
Running Renderman [1], Blender [2], or your own in-house/custom rendering engine in AWS should be trivial at this point (using S3 as a cache for staging before processing).
Now, if you meant that they should wrap a nice API around it like Elastic Transcoder (which is still terrible in my opinion compared to services built for transcoding like encoding.com), my hunch would be that the market isn't big enough for that sort of effort.
Nobody uses EC2 for this type of workload. The requirements to get any sort of real performance are far too specific.
Dozens of companies exist that have massive render farms and they rent them out to shops working on movies and commercials, but most of the major players have large dedicated in-house infrastructure.
Zync (https://www.zyncrender.com/) was offering this kind of service on AWS, but Google recently acquired them. I think it will get folded into Google Compute Engine somehow.
I think its usage is limited to students and very small shops.
AFAIK they won't even let you spin up anything other than one or two instance sizes they kinda got working. The pricing was ridiculous, $6+/hr if I remember correctly.
The market for this kicked off in the past year or two. Amazon is already on the cloud rendering bandwagon in some fashion[0], and Google joined them recently[1], so it's coming - it won't really be here, of course, until another generation of software comes through and does the integration work.
What makes you think GPUs are suitable for the problem? What does their global illumination model look like, and what are the limiting factors here?
(I have absolutely no idea, and the advert... cough... article is light on technical content, but the fact that they didn't use them makes me suspect they couldn't use them.)
Whenever I give talks about WebGL and try to describe "real time rendering," I like to start off with a description of what real-time rendering is _not_. [0]
The factoid I like to use is: "Toy Story 3 took on average 7 hours to render 1 frame (24 fps), and at most 39 hours." (source [1]) That means if we wanted to render the next frame dynamically based on user input, the user would have to wait on average 7 hours to see the next frame, vs the 16.666 ms we shoot for in real-time rendering. If one frame takes on average 7 hours, and there are 24 frames per second, then it takes 168 hours, or 1 week, to render 1 second of video. Better get it right the first time (or have a supercomputer cluster to do all of that math)!
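Spelled out, with the same numbers:

    avg_render_hours = 7                                # average per frame, per the quote
    fps = 24
    hours_per_film_second = avg_render_hours * fps      # 168 hours
    print(hours_per_film_second / 24, "days per second of footage")          # 7.0 days
    print(round(1000 / 60, 3), "ms per frame budget for real-time at 60 FPS")  # 16.667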
Many corners are cut to achieve something of lesser quality than can be achieved via pre-rendering; we call them "approximations."
It does - though it gets fully attributed to its Global Illumination rendering. Which is odd since every renderer and his dog supports GI these days, many of them have for years already. I wonder what's new or different about Hyperion's GI?
There are many ways of handling GI (which is, basically, taking into consideration light that bounces through the scene in addition to light emitted directly by light sources). Until now, Disney used a hybrid approach, rasterizing the main scene and raytracing some effects that a rasterizer can't handle. Now, with this new renderer, they have joined the full-raytracing wagon.
In fact, they are far from being pioneers in the field (right now raytracing is widely used in production rendering), but it's a big step for Disney, and they have built their raytracer from the ground up, allowing them to implement some clever tricks.
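For anyone curious what "full raytracing with GI" means structurally, here's a heavily simplified, textbook-style schematic (this says nothing about how Hyperion actually does it; all the helpers and hit attributes are hypothetical): outgoing light at a hit point = emitted + direct light + a recursively traced indirect bounce.

    # Schematic only; helper callables and hit attributes are made up.
    def radiance(hit, scene, trace, sample_direct, sample_bounce,
                 depth=0, max_depth=3):
        if hit is None or depth >= max_depth:
            return 0.0
        emitted = hit.emission
        direct = sample_direct(hit, scene)       # shadow rays toward the lights
        bounce_ray = sample_bounce(hit)          # random direction off the surface
        indirect = hit.albedo * radiance(trace(bounce_ray, scene), scene, trace,
                                         sample_direct, sample_bounce,
                                         depth + 1, max_depth)
        # Drop the 'indirect' term and you're back to plain direct lighting,
        # which is roughly what the old rasterize-plus-tricks approach gave you.
        return emitted + direct + indirect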
Probably one of the best use cases for parallel processing and truly saturating a massive multi-core environment to the fullest.
On the other hand in 2014, having >4 cores in the desktop space hardly has any benefits over 2-4 cores with high single threaded performance.
If Disney goes to such lengths just to render scenes, I really have no idea what scale of computers the security teams in some countries' intelligence agencies use just to brute-force passwords...
I'd love to learn the details of how cluster rendering works. How do you break down the task? There is much more dependency between components (light, physics, and materials) compared to the web-server world.
Light does not interact with itself under normal circumstances, so you can render each pixel completely separately once you have the environment.
In raytracing you render in reverse starting from the "eye", not forward from the lights, so each pixel of each frame can be parallelized. (With perhaps a small amount of mixing at the end to handle aliasing/quantization errors.)
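Structurally it's just this (with hypothetical 'camera_ray' and 'trace' stand-ins), which is why any subset of pixels can be handed to any worker:

    # Every pixel depends only on the scene, not on its neighbours.
    def render(width, height, camera_ray, trace, scene):
        return [[trace(camera_ray(x, y), scene)    # one eye ray per pixel
                 for x in range(width)]
                for y in range(height)]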
It's done on frames because that's how renderers work. You build a description of a single image and the renderer renders it. The renderer will generate threads which will all render a 'bucket', which is just a square block of pixels in the image.
It's possible to parallelise this across several processes on different machines, but this is obviously less efficient because you'll have to do all the render startup tasks (reading resource files, building acceleration structures) multiple times.
I can't go into any specifics, but the general approach that most places use these days is to assign complete frames to each node on the renderfarm. The nodes are multi-core and threads divvy up the work of rendering the image in tiles. That's not to say that production renderers can't do multi-process renders too, but you don't see it nearly as often as multi-threaded renders. Remember that it can often take several minutes just to read scenes in; even if you split up an image into individual pixels and gave each node just a single pixel to do there'd still be a limit to how quickly you could get a frame back. So if you parallelize across your cluster on frames, then yes, there's usually several hours latency for a frame sequence, but you also get much better throughput.
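A rough sketch of those two levels of splitting (names are made up): whole frames go to farm nodes, and within a node, threads work through square tiles of that one frame:

    from concurrent.futures import ThreadPoolExecutor

    def tiles(width, height, size=32):
        # Square buckets covering the image.
        for y in range(0, height, size):
            for x in range(0, width, size):
                yield (x, y, min(size, width - x), min(size, height - y))

    def render_frame_on_node(frame, scene_path, load_scene, render_tile, threads=16):
        scene = load_scene(scene_path, frame)        # paid once per frame, per node
        with ThreadPoolExecutor(max_workers=threads) as pool:
            return list(pool.map(lambda t: render_tile(scene, t),
                                 tiles(scene.width, scene.height)))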
By the way, if you haven't heard of Blinn's Law [0], you may find it interesting.
This is not true. Frames are queued to computers with free cores and memory. Splitting up a frame across multiple computers is a brute-force option that is rarely used. When you get to that point on a show, it is because render times are huge and often because renders have failed and need to be forced through during the day to make a deadline. It is a telltale sign of a disaster show.
On an individual box frames are split up into tiles and the tiles are rendered on individual cpu cores.
I don't know what you mean by "a few" frames per day, but it sounds like you have a misconception about how it works. Typically many, many thousands of frames are rendered every day, and most of those are rendered multiple times as artists iterate.
The unit of parallelization for renders is always the frame. On a machine with multiple cores each core will render tiles in parallel, but across machines jobs are split up by frame. This is both because it's the simplest type of distribution for people to understand (machine goes bad? You lose one frame, not arbitrary pixels in an image). But also because it's most efficient for a single machine to read all the data for a single frame instead of multiple machines requesting the same data repeatedly across the network. I think people tend to underestimate just how massive the geometry and scene description files are for a typical feature and how much of the work involves managing the storage and network efficiency.
They don't do final renders immediately. They render previews with expensive features like global illumination turned off.
Rendering a single frame across multiple machines sounds wasteful. They would have to load the exact same textures and models for a single frame across all of them. When batch rendering, it would be more efficient to do that work just once per frame.
Wow, that little? That puts the machine at about 3,500 Cray XK7 nodes, which would rank it below computers in the TOP 100 list. I expected a bigger machine from a company whose job is to render movies.