As with almost everything that comes out of Wenzel Jakob's lab at EPFL, the documentation for Mitsuba 3 is _incredible_. Check out this writeup of how polarization is handled in Mitsuba 3 (complete with animated diagrams): https://mitsuba.readthedocs.io/en/latest/src/key_topics/pola...
That writeup is PhD Thesis material in terms of scope and detail, but the student that wrote it, Tizian Zeltner, actually wrote his PhD on another topic entirely. This is just the level of stuff that routinely comes out of that lab.
This is the result of a group working in harmony, made up of people who love what they are doing. When you like the subject and pour what you have into it, this is what comes out. I'm sure it wasn't tiring for them.
Make no mistake, this is a very high achievement, and is very rare in both private companies and research labs. It's a very hard point to reach if you don't like or care about what you are doing.
Agree on the quality. I have several documents from the Mitsuba 2 era that are of the absolute highest quality. And for anyone wondering, that covers both the quality of the content and the actual success in communicating complex and varied ideas in commendable fashion.
Inverse rendering is such a cool idea... solving for the inputs (3D models and materials) to a renderer such that its output matches real-world photographs, by computing gradients automatically.
As someone coming from the physics side rather than the computer graphics side of physically based rendering, I have high hopes for differentiable rendering as a general tool to extract knowledge from light-based measurements.
Most differentiable renderers stop at the object surface, but differentiable volumetric light transport is what would let us use the renderer to optimise a measurement device, support medical diagnostics, or just estimate the optical properties of a material (that is, the absorption and scattering coefficients and the scattering phase function), which is a pretty hard problem in itself.
From a quick look, Mitsuba 3 seems to have everything in place to be used in such a setting, and I'm really excited to see some use cases outside of computer graphics.
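For the curious, here is a minimal sketch of what such an optimization loop looks like with Mitsuba 3's Python API, modelled on its official optimization tutorials. The scene file and the parameter key are hypothetical placeholders, and the "measurement" here is just a rendered reference image standing in for a real photograph.

    import drjit as dr
    import mitsuba as mi

    mi.set_variant('llvm_ad_rgb')              # differentiable CPU variant ('cuda_ad_rgb' on GPU)

    scene = mi.load_file('scene.xml')          # placeholder scene with the material to recover
    image_ref = mi.render(scene, spp=512)      # stand-in for the measurement; in practice a photograph

    params = mi.traverse(scene)                # expose the differentiable scene parameters
    key = 'object.bsdf.reflectance.value'      # hypothetical key; depends on the actual scene
    opt = mi.ad.Adam(lr=0.05)
    opt[key] = mi.Color3f(0.1, 0.1, 0.1)       # deliberately wrong initial guess to optimize
    params.update(opt)

    for it in range(100):
        image = mi.render(scene, params, spp=16)   # differentiable forward rendering
        err = image - image_ref
        loss = dr.mean(err * err)                  # pixel-wise L2 loss against the reference
        dr.backward(loss)                          # backpropagate through the renderer
        opt.step()                                 # Adam update of the material parameter
        params.update(opt)                         # write the new value back into the scene

The same pattern would extend to volumetric parameters (e.g. a medium's albedo and extinction), which is the setting the parent comment is after.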
We're at a cusp in technology where "reverse rendering" is rapidly progressing from a research curiosity to a mainstream product. As soon as high-quality model generation becomes possible on a typical workstation, there will be a dramatic shift in how computer game art assets are produced. Many artists will be replaced by this mechanical, automated method of 3D model production.
No longer will you need a team of 100 artists for every 2-3 developers to manually model and texture objects; instead, you'll have 10-20 photographers running around taking pictures of real-life objects and scenes. These will then be reverse-rendered into models, which can then be placed directly into the game world by just a handful of map designer artists.
You can see this in Unreal Engine 5, where they combined their "infinite detail" polygon engine Nanite with their acquisition of Quixel's Megascans library of models captured via photogrammetry.
However, photogrammetry is not the same as reverse rendering: it's only really suitable for diffuse objects captured in good conditions (foggy or overcast days). It can't handle transparency, reflections, or complex/dynamic self-shadowing. Currently it's mainly used for backdrops like rocks, logs, and road surfaces.
Reverse rendering can in principle capture anything a renderer can do, including specular materials. It can "undo" self-shadowing and produce objects that can be re-lit in different scenes.
For gamers, these are going to be exciting times!
For graphic artists... perhaps also exciting, but not necessarily in a good way.
> For graphic artists... perhaps also exciting, but not necessarily in a good way.
I think you're missing an important aspect of art - style.
The CG industry has been chasing photorealism for decades and we've been slowly inching closer. Photogrammetry is an important step that will provide a big boost in that goal.
But what happens when we get there? Once it's (relatively) cheap to make a AAA game that's virtually indistinguishable from reality, what happens next?
Everyone will love it for a few years, but then they'll start to demand something new, and that might be non-photorealistic styles. Or it might be off-world or impossible environments. Photogrammetry won't help much with a game set in space in the distant future, populated with non-Earth creatures, plants, and architecture. Although it will help a bit, of course. There will still be a place for ultra-realistic environments (war games, spy games, romance games, etc.) but it will become just one style amongst many.
If anything this might lead to a revival in making real-world models and then scanning them. Imagine a Dark Crystal-style world except you are the player. All the characters and models were crafted by hand and then scanned into a 3D world where they are indistinguishable from their real-world counterparts when viewed in VR/AR or on screen. That would be cool.
As for artists, perhaps they should be worried about AI stealing their jobs. But not photogrammetry, that'll just remove some of the tedious stuff like modeling furniture.
I've been both a game programmer, and a computer graphicist coveting photorealism, for ~30 years. In my opinion, this is exactly right. Photorealism is and was never going to put artists out of a job. But other algorithms... might. I can't rule it out, but the older I get the less certain I am of where so-called 'AI' is going.
Rather than putting significant numbers of artists out of work, I anticipate AI tools will act as a powerful productivity amplifier for everybody, but especially artists. As the cost of capturing, creating, and altering 3D content declines exponentially, I predict an explosion in the amount of such content being generated.
Individuals and small teams will be able to take on projects that historically have required much larger teams of people working over several years.
I agree. I'm also a long-time CGI algorithms guy, ~30 years. I can definitely see computer graphics getting swallowed by machine learning / AI in the next few years. The main things standing in the way, as far as I can see, are fine-grained control over what the AI algorithm produces, and the resource requirements. But those barriers will fall, or be gradually lowered, for sure.
Neural nets can do style transfer. It might be as simple as making photorealistic models and then transferring a style from a handful of non-realistic samples.
For one place of work, yes, just like you only need a couple of actual engineers to design a product and then an army of robots actually builds it.
This doesn't diminish the supply of jobs much, though. Because it lowers the barrier to entry, many startups appear and compete, each needing its own artists to feed their AI with new and interesting styles.
And a new category of artists will emerge, one that is expert in crafting styles specifically amenable to AI transfer.
I think you misjudge how much artistry goes into placing those photographed assets.
You are probably way off on your estimates in how this will change the workforce.
Small teams of artists can now produce television VFX that rival what took hundreds of film artists to do decades ago, but if you pay attention to movie credits, the number of artists working on big films seems to be increasing.
Also, it seems to me that AI and procedural generated assets are more likely to be the future than real world scans.
It seems like this means artists would mostly be needed for creative art, e.g. spaceships, monsters, etc. That is, things that don't exist, so can't be photographed.
But there are already commercial and free libraries of commonplace items like furniture, cars, and buildings. So isn't what I described pretty close to the same situation today already?
But spaceships and monsters do exist. There are thousands of them, just look on Google Images. An AI inverse rendering system could just as easily ingest a ton of space sci-fi images as real-world photos.
It'll be interesting to see which of these wins: scanning real-life objects as you describe, or just generating 3D assets directly through a DALL-E-style model.
Traditionally, rendering has been concerned with RGB colors, because that's what the screen has to output. So your materials are defined in terms of what color objects are. Then they started adding "effects" on top, like reflectiveness, bump and normal maps, noise, etc., to make objects look more... real.
Physically based rendering approaches it from the opposite end. It asks: what are the characteristics of the material, in terms of simulated light interacting with a nontrivial surface, such that the object appears on screen as it would if it were a real object with real light bouncing off its nanostructure?
Ray tracing is one method of rendering, but you can do physically based rendering the old way too, with fragment shaders. You just won't get global illumination, so it won't look as "real".
> Traditionally rendering has been concerned with the RGB colors
I do need to point out that many (most?) so-called physically based renderers are still RGB based, with all the downsides that come with that. Mitsuba is notable for also supporting spectral rendering, a distinct feature that models light as a spectrum instead of an RGB triplet.
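For concreteness, here is a minimal sketch of how that choice shows up in Mitsuba 3's Python API; the variant names come from its docs, and the scene file is a placeholder.

    import mitsuba as mi

    # 'scalar_rgb' carries light as (R, G, B) triplets through the whole simulation;
    # 'scalar_spectral' instead samples and transports wavelengths, converting to RGB
    # only at the very end for display.
    mi.set_variant('scalar_spectral')

    scene = mi.load_file('scene.xml')   # placeholder scene
    image = mi.render(scene, spp=64)
    mi.util.write_bitmap('out.exr', image)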
Man they really need to explain what is unique about it on the home page better. I know what differentiable rendering and spectral rendering are but neither are mentioned. I have no idea what a retargetable renderer is and it seems like nobody else uses that term at all (seriously - Google "retargetable renderer").
There are ways of faking GI, even with current techniques and without using ray tracing. Screenspace GI can look convincing (and also miserably fail). Radiosity techniques exist, etc. We've been able to fake global illumination for a while now. The results aren't perfect, but then again, that's what realtime computer graphics have been forever, a bunch of tricks that are close enough to reality, and 10 artists crying in the corner because they had to spend 5 hours moving an asset around so it looks just right.
I’ve always seen “physically based” to mean the renderer does a physical simulation (photons bouncing, refracting, having wavelengths stretched, subsurface scattering, BSDF calculations, etc.).
So much in computer graphics are just (clever) hacks upon hacks to get something that looks “good enough” but isn’t really simulating physics in any meaningful way (like SSAO, texture baking, bump mapping, etc.). These hacks are much, much faster than simulating the physical process of photons interacting with the world.
I don’t know where “physically based” originated but my first introduction to it was pbrt, which I suspect popularized “physically based” naming.
Differentiable rendering is the name for taking the final image and "reverse rendering" it to reconstruct a 3D model of it.
Some clarification regarding differentiable rendering vs. inverse rendering.
Differentiable rendering is what it says: differentiating the rendering process. Imagine that x is the scene and f(x) is the function that renders the image. Then differentiable rendering is simply taking the derivative: f'(x).
Inverse rendering is the process of finding scene parameters x such that f(x) produces the given image y. This is often achieved by using differentiable rendering together with an optimization algorithm (like SGD or Adam). However, due to the nature of rendering, it's easy to get stuck in a local minimum. Therefore, even a perfect differentiable rendering engine is not sufficient for inverse rendering.
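In symbols, with y the target image and a squared-error loss chosen purely for illustration, inverse rendering is the optimization

    x^* = \arg\min_x \; \lVert f(x) - y \rVert^2

and gradient descent (or Adam, etc.) iterates

    x_{t+1} = x_t - \eta \, \nabla_x \lVert f(x_t) - y \rVert^2

where the gradient is exactly what differentiable rendering provides; because f is highly non-convex, those iterates can stall in a local minimum.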
Mind you, physically based renderers also use a bunch of clever hacks upon hacks to be able to run in realtime. Nobody is simulating photons bouncing in the nanostructure of a material. We just have formulas that say "for this roughness and metalness, your material will look like that, and it'll be good enough". We still abuse the hell out of UV maps to fake things, and still include ambient occlusion (mostly as a stylistic choice these days).
It's a different workflow that gives you consistency once you know what you're doing, as opposed to the days of Phong shading.
A physically-based renderer tries, as much as possible, to ground its equations and techniques in physical principles: for example, conservation of energy[1] (surfaces shouldn't reflect more light than what shines on them) and Helmholtz reciprocity[2].
This affects the choice of approaches and algorithms, such as using unbiased rendering[3], and their implementation, like using energy-conserving bidirectional scattering distribution functions[4] to describe how light interacts with a surface.
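Written out for a BRDF f_r (these are the standard textbook statements, not anything specific to Mitsuba):

    % Helmholtz reciprocity: swapping the incoming and outgoing directions leaves the BRDF unchanged
    f_r(x, \omega_i, \omega_o) = f_r(x, \omega_o, \omega_i)

    % Energy conservation: for any incoming direction, the surface reflects at most the energy it receives
    \int_{\Omega} f_r(x, \omega_i, \omega_o) \cos\theta_o \, \mathrm{d}\omega_o \le 1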
> It sounds like ray tracing that can be done in reverse on a final image?
I forgot to mention: if the renderer respects Helmholtz reciprocity, then you can choose to either do forward rendering (rays originate at a light and bounce until they hit the camera or dissipate) or backwards rendering (rays originate at the camera and bounce until they hit a light or dissipate), or even do both, so-called bidirectional path tracing[1].
It just means that the renderer is making some attempt to mimic the way light behaves in the real world. This, hopefully, implies that the resulting image will look realistic.
From what I see, the point of this renderer is that it can invert the rendering to come up with a 3D scene that matches the input render image, so I guess outputs aren't as important?
Couple this with VR glasses and you can relive your favorite memories in first person rather than through a photograph. I've been thinking about this for probably a decade now, and can't wait for it to become reality.
I'd be careful with that. My suspicion is that our memories would transform themselves into memories of the simulation, rather than the original event.
It's not possible to do it uniquely, i.e. to recover the unique scene that produced the image, but it is possible to produce some scene that would produce that image.
This is true, though machine learning models will learn typical scene structures, alongside the rendering parameters. It is also common to use more than one source image (from a different viewpoint, a bit like photogrammetry).
If it's possible to model the diffractive materials they use for those diffractive neural networks, it would be entertaining to do an inverse rendering on a digit recognizer and see what, if anything, it can do.
It is possible to produce infinitely many scenes that would produce the image, including the scene that contains the render itself as a flat object viewed through a camera. Sampling from this infinite set of possibilities isn't useful unless you introduce a ton of fragile assumptions about what you could be looking at to begin with, in which case you kind of already knew what was in the scene.
> While these techniques came closer and closer to photorealism, another question arose: what if instead of going from a 3D scene to a 2D image (rendering), we went from a 2D image to a 3D scene? As you may imagine, reconstructing 3D scenes from 2D information is quite complex, but there have been many advances in the last few years. This area of study is called inverse graphics.
It relates to GPU ray tracing because the engine can compile either CPU- or GPU-specific code, specialized to the available hardware and tailored to the specific scene data, which lets you render faster than you otherwise could.
There are “variants” that use simple scalar floats, CUDA, and an intermediate mode that does CPU-based vectorization. The same folks have made the vectorization/JIT libraries behind them: Enoki (used in v2) and a newer one called Dr.Jit.
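As a rough sketch of how that looks from Python (variant names as in the Mitsuba 3 docs; which ones are actually available depends on how the package was built):

    import mitsuba as mi

    print(mi.variants())        # e.g. ['scalar_rgb', 'llvm_ad_rgb', 'cuda_ad_rgb', ...]

    # 'scalar_rgb'  : plain scalar C++ code on the CPU
    # 'llvm_ad_rgb' : Dr.Jit traces the computation and JIT-compiles vectorized CPU kernels
    # 'cuda_ad_rgb' : the same trace compiled to CUDA kernels on the GPU ('ad' = differentiable)
    mi.set_variant('llvm_ad_rgb')

    scene = mi.load_file('scene.xml')   # placeholder scene
    image = mi.render(scene, spp=64)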
Raytracing tries to create images by simulating how light bounces around an environment. It’s physically-based in that it uses laws of physics/optics to do that (e.g., Snell’s Law when light passes through different media), rather than some method that just happens to look cool.