Real-Time Global Illumination by Precomputed Local Reconstruction (aalto.fi)
215 points by ingve on Sept 20, 2017 | 87 comments



As far as I can tell this is mostly the same capability as the Enlighten middleware used by e.g. Unity and Frostbite (i.e. recent Battlefield titles) with some small improvements on top (e.g. the opaque occluder approximation at the end). A demo for reference: https://www.youtube.com/watch?v=ARorKHRTI80

Is there some improvement in the technique beyond that - e.g. some clever speedup across the board?


I feel I should point out that Frostbite is used by a lot more than Battlefield, including Anthem, Dragon Age: Inquisition, the latest FIFA, Mirror's Edge Catalyst, and the Need for Speed games by Ghost Games (which is where I work).


And don't forget about Star Wars: Battlefront and the upcoming Battlefront 2. Ridiculously good looking games.


Unity seems to have "shines through wall" errors? https://imgur.com/a/0dSYA


You know it's a good paper when you reach the end and ask "is that it?" and yet no one else has thought of it before.

Side note: local-global approximation seems to do well in graphics and visual tasks. For example in the field of alpha matting the state of the art for a while was KNN Matting which sampled locally and globally. Most methods since then have taken a similar approach.


The article mentions that they used an Nvidia Titan X Pascal, and that it took < 5 ms to compute. 5 ms is still a large part of the time budget for each frame (a 60 fps frame is only ~16.7 ms), and most users have a slower GPU than that.


Titan X Pascal isn't too far off a 1080 Ti, which implies in about 2-3 years this will be possible in the new budget cards (the xx60 and xx70 lines).


I remember buying a new GPU every time a new edition of Quake came out. Those were the days...


Has anything changed, other than the particular game?


Yes. Quake 2 was December 1997, Quake 3 was December 1999. In that span, the game went from "use a GPU and you'll get colored lights and a higher resolution" to "use a GPU or else you don't get to play the game at all".

GPU power seems to be increasing as fast as it always was, but the things we can do with that extra power are less impressive. So if you're playing games, it's not so big a deal any more if your graphics card is five years old.


Things have actually greatly slowed down.

The GTX 1080 Ti (March 2017) is rated at 11.3 TFLOPS; going back five years, the GTX 680 (March 2012) was 3.1 TFLOPS. So less than 4x gains in 5 years.

Around 1997 you were seeing nearly 2x gains per year.
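
For what it's worth, here's the arithmetic behind that comparison as a small Python sketch (the TFLOPS figures are just the ones quoted above):

    # Annualized GPU throughput growth, 2012-2017, vs. the ~2x/year of the late 90s.
    ratio = 11.3 / 3.1            # GTX 1080 Ti vs. GTX 680, five years apart
    annual = ratio ** (1 / 5)     # annualized growth factor
    print(f"total: {ratio:.2f}x, per year: {annual:.2f}x")
    # -> total: 3.65x, per year: 1.30x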


Up to a point; obsolescence happens when games require graphics APIs that didn't exist five years ago.


There aren't many games that fall into that category. For example, look at Direct3D 12. You can count on your fingers the number of games that require D3D 12, and it's supported by graphics cards made in 2011 (although not all of them).

Just look at the new API features we've been getting: they aren't really the kind of things you'd rewrite your engine for, with the exception of Vulkan / D3D 12 / Metal / etc. But Vulkan / D3D 12 are really just API changes that don't reflect underlying hardware changes.

On top of that, a large percentage of games are made on top of engine middleware with good support for mediocre hardware (like Unreal and Unity, both of which run on phones if you want).


Not to mention that lots of neat effects can be produced by modern shaders, and do not need further API support.


Does this support dynamic moving lights? That's the 'wow' moment that's going to feel like a move to next generation graphics - when all/most lights are fully dynamic and contribute to GI.


Haven't read the paper yet, but it looks like yes. The caption at the top says "Real-time global illumination rendered using our method in 3.9ms with fully dynamic lights, cameras and diffuse surface materials."

It's a couple of years old now, but Crassin's voxel cone tracing paper is a neat one: http://research.nvidia.com/publication/interactive-indirect-...



The approximate dynamic occluders are the most impressive part of the demo.

Games etc. really need that stuff in real time.


Games need more. The occluders from the paper completely absorb light. A more realistic occluder would also reflect light.


> The occluders from the paper completely absorb light. A more realistic occluder would also reflect light.

My understanding is that "approximate occluders" aren't visible geometry but stand-in replacements (very coarse proxies for much finer-grained visible geometry) used during the GI pass(es). Probably even geometric primitives (sphere, box, torus, etc.) that, indeed, simply approximate occlusion.

Physically based reflections across all geometry have already worked brilliantly well in real time for the last few years.


It's important to note that this is for "mostly static scenes", so I'm not sure how suitable it is for games.


Most games are "mostly static scenes", where the level and lighting stays almost the same. Of course there is a varying amount of dynamic, moving objects but they don't necessarily need to have perfect indirect illumination (that would be hard to spot anyway, since the objects are on the move).

The primary contributor is from the games industry and the results they show were generated in <4ms, so this is probably specifically targeted at games.

There were even some "approximate dynamic occluders", which seem to work alright as long as the occluder isn't too near the camera.


Yes, but are most games mostly static scenes because the tech is limited? Getting a decent looking scene which can be rendered quickly has historically relied on precomputed light maps, light volumes and space partitioning, which massively restrict the sort of game you can make if you want scenes to look nice. If more could be done in realtime, we would see games which were not mostly static scenes.

I compare the situation to 90s games. Machines did hardware scrolling and were optimised for blitting sprite sets, and so, lo and behold, most games were scrolling shooters and beat 'em ups. I always felt GPUs have put similar constraints on our imaginations, hence most games are mostly static scenes.


That is incredible.

Finally, realistic-looking GI in real time.


It took me a minute to appreciate it, didn't know what to look for. But wow!


What should I look for?


Light bouncing around, lighting up things that aren't in direct light.


This kind of lighting (global illumination) is typically "baked in", i.e. lights don't move around.


Anyone have links to papers describing how GI works in Unreal and Unity?


Enlighten is used in UE4 and Unity as I understand it. You can read about Enlighten in Frostbite circa 2010 here:

http://advances.realtimerendering.com/s2010/Martin-Einarsson...


UE4 doesn't use Enlighten out of the box. It uses its own system called Lightmass, which generates very nice static lightmaps, but is not that great for dynamic GI (it has some basic support through what's called "indirect lighting caches" https://docs.unrealengine.com/latest/INT/Engine/Rendering/Li...). However, you can get Enlighten as a licensed plugin.


Does anybody know where the Gallery scene is from?


Isn't it frustrating? All of this effort, and it looks nothing like real camcorder footage.

That's not dismissive -- no one has ever made any program that outputs a string of images indistinguishable from a real camcorder. It's just that hard.

I think whatever the next leap forward looks like, it will come from a nontraditional approach. Something strange, like powering your real-time lighting model by an actual camcorder -- set it up, point it at a real-world scene, then write a program that analyzes the way the light and color behaves in the camcorder's ground truth input. Then you'd somehow extrapolate that behavior across the rest of your scene.

That last step sounds a lot like "Just add magic," but we have deep learning pipelines now. You could train it against your camcorder's input feed. Neural nets tend to work well when you have a reliable model, and we have the perfect one. So more precisely, you'd train your neural net against the camera's input video stream: at each generation, the program would try to paint your scene using whatever it thinks is the best guess for how the colors should look. Then you move your camcorder around, capturing how the colors actually looked, giving the pipeline enough data to correct itself. Rinse and repeat a few thousand times.
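
Purely as an illustration of that idea (nothing from the paper, and every name here is hypothetical), a minimal PyTorch sketch of such a training loop might look like this, assuming you already have camcorder frames paired with the camera pose at which each was captured:

    import torch
    import torch.nn as nn

    # Hypothetical setup: a network that maps a 6-DOF camera pose directly
    # to a low-res image, supervised by real camcorder frames of the scene.
    model = nn.Sequential(
        nn.Linear(6, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 3 * 64 * 64),      # predict a 64x64 RGB frame
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # Stand-in data; in the proposal these would be (pose, frame) pairs
    # captured by walking the camcorder around the real scene.
    poses = torch.randn(1000, 6)
    frames = torch.rand(1000, 3 * 64 * 64)

    for epoch in range(10):
        pred = model(poses)            # the net's "best guess" at each view
        loss = loss_fn(pred, frames)   # compare against ground-truth footage
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()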

The key to realism, and the central problem, is that colors affect colors around them. The way colors blend and fade across a wall has to be exactly right. There's no room for deviation from real life. Our visual systems have been tuned for a billion years to notice that.

There are all kinds of issues with this idea: the real-world scene would need to be identical to the virtual scene, at least to start. The program would need to know the camera's orientation in order to figure out how to backproject the real-life illumination data onto the virtual scene. But at the end of it, you should wind up with a scene whose colors behave identically to real life.

It seems like a promising approach because it gets rid of the whole idea of diffuse/ambient/specular maps, which don't correspond to reality anyway. My favorite example: What does it mean to multiply a light's RGB color by a diffuse texture's RGB value? Nothing! It's a completely meaningless operation which happens to approximate reality quite well. There are huge advantages with that approach, like the flexibility of letting an artist create textures. But if the goal is precise, exact realism as defined by your camcorder, then we might be able to mimic nature directly.

(Those dynamic occluders looked incredibly cool, by the way!)


We have spectral renderers that simulate light transport extremely accurately and in fact produce results indistinguishable from a photograph (see http://www.graphics.cornell.edu/online/box/compare.html)

The problem isn’t solving light transport itself (the rendering equation has been known for years, and is asymptotically exactly solved using unbiased numerical integration techniques); it’s modeling the interaction of light with complex physical materials and doing so quickly that poses the challenge.
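
For reference, the rendering equation being referred to (Kajiya, 1986), which unbiased path tracers estimate via Monte Carlo integration, is:

    L_o(x, \omega_o) = L_e(x, \omega_o)
        + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, \mathrm{d}\omega_i

i.e. outgoing radiance is emitted radiance plus incoming radiance weighted by the material's BRDF and the cosine of the incident angle, integrated over the hemisphere.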


They are nothing like indistinguishable and have the exact problems I would look for to see if it was CGI.

* The lower edge on the foreground box, particularly the lower left corner.

* The lower edges of the mirror box.

* To a lesser extent, the bottom corners of the surrounding box.

Once you spend time in rendering, you know where to look for flaws.

If your example was animated, it would be obviously rendered.


I'm pretty confident that if you asked laypeople to mark which was real and which was fake they couldn't reliably tell.


Note the important distinction: no one, anywhere, has ever created a real-time video simulation indistinguishable from the output of a real camcorder. Real-time video is significantly different from still frames. It's why video compression uses completely different algorithms from JPEG compression, for example.

Our visual systems are tuned to notice problems with real-time outputs, whereas a still frame is ambiguous. There are all kinds of transformations you can do to a perfect still frame where it still seems perfect afterwards: You can tweak the contrast, brightness, hue, throw a softpass filter on it, etc, and a human observer still won't notice a big difference. In other words, there are many ways to "cheat"! Even though we're doing operations that don't correspond to how nature behaves, a human observer is still fooled by them.

Yet the moment that you string a bunch of still frames together into a video, that human observer will pounce on you immediately and call it out as fake. I'm not sure why motion makes such a massive difference, but it does.

To put it another way, if it were possible to make a real-time video whose ground truth was identical to a camcorder, we'd see it in Hollywood. But no one has been able to; Avatar was the best we could do.

I think it should be possible to train a neural net to paint the scene, similar to a human painter. DaVinci didn't make diffuse/ambient/specular textures; he simply painted the scene according to how it would look in real life.

And in real life, color changes are gradual. It's crucial that the colors change in exactly the right way, but the changes themselves are rarely discontinuous. That means a neural net might be able to catch on pretty quickly.

Bots are already doing similar work:

https://www.reddit.com/r/aww/comments/505zzr/colour_run/d71g...

Before: https://i.imgur.com/EqMLNFo.jpg

After: https://i.imgur.com/omYmn7Q.jpg

If it's possible to clean a puppy, it might be possible to extrapolate how light should behave in the general case.


> That's not dismissive -- no one has ever made any program that outputs a string of images indistinguishable from a real camcorder. It's just that hard.

> Note the important distinction: no one, anywhere, has ever created a real-time video simulation indistinguishable from the output of a real camcorder

You redefined what you're talking about: clarifying that it has to be real-time. What if someone generated 2 images within 1/24th of a second? Would that count as a ~42 ms long video?

To learn more about how the eye interprets the structure of light, I recommend skimming Minnaert's Light and Color in the Outdoors (won't cover motion).


Hollywood is not optimizing for perfect realism. They are operating under many constraints such as budget (yes, even on Avatar they had budgets), schedule, and the need for directorial control (which often conflicts with realism). It would absolutely be possible to produce prerendered videos of mostly static scenes using today's technology that would be indistinguishable from reality, although the expense would be high.


> It would absolutely be possible to produce prerendered videos of mostly static scenes using today's technology that would be indistinguishable from reality, although the expense would be high.

And yet, no one has. :)

It'd be a fun trophy to claim.

My theory is that it's not merely a matter of computing power. We have massive horsepower now, but our algorithms are still wrong at a fundamental level. We still approach it by trying to simulate the underlying physics of light: the transport algorithms, radiance sampling, and so on. But between the RGB textures that have nothing to do with nature and the physics approximations that we're forced to settle with, something ends up lost in translation.

No matter what we try, that human observer ends up looking at a final product that looks nothing like a real camcorder video.

Back during ~iPhone 4 days, someone forged some pictures of the "next gen iPhone." It was a hoax, but a lot of people were fooled at the time. Remarkably, it was a computer rendering. The reason it looked real is because he took a picture of the rendering using a real camera, which muddied up the colors to the point that you couldn't tell it was a rendering.

The idea here would be similar: analyze the actual pixels coming out of a real camcorder and try to mimic the nuances as closely as possible. In the best case, it wouldn't be a physics simulation at all: the program might be able to guess how the scene should look based on partial data and past experience.


> And yet, no one has. :)

I would be surprised if most people noticed the amount of CGI that goes into regular TV

https://www.youtube.com/watch?v=rDjorAhcnbY

Plenty of it is composites but parts of those are rendered


> We still approach it by trying to simulate the underlying physics of light: the transport algorithms, radiance sampling, and so on. But between the RGB textures that have nothing to do with nature and the physics approximations that we're forced to settle with, something ends up lost in translation.

You keep talking about rendering being the problem and yet you acknowledge that photos can be indistinguishable from reality and motion being the hard problem. Do you not see the disconnect in your own argument? Light simulation isn't the issue, motion simulation is. Movement and model and texture complexity all have a long way to go.

The people practicing graphics in research and in production know what the problems are, they are aware of the approximations they are making, they know what they need to make graphics more realistic, and they are making conscious choices of where to spend their limited budgets. The big problems just aren't due to light transport or radiance sampling, your pet theory seems ignorant of what's going on in CG practice today. If you would just talk to (and listen to) some production people and researchers...

You're still ignoring the fact that realism in CG is increasing. There's a clear trend, and we are closing the gap on the details that make CG look unrealistic. The standards are higher now than they were last year, which in turn were higher than the year before. Realism has always increased every year, and yet at no time was the increase due to a fundamental change in multiplying colors together.


> Light simulation isn't the issue, motion simulation is. Movement and model and texture complexity all have a long way to go.

FWIW I completely agree, and this is a key observation. DaVinci devoted several chapters to the problem of motion in art, and it will always be with us.

That said,

> You're still ignoring the fact that realism in CG is increasing. There's a clear trend, and we are closing the gap on the details that make CG look unrealistic. The standards are higher now than they were last year, which in turn were higher than the year before. Realism has always increased every year, and yet at no time was the increase due to a fundamental change in multiplying colors together.

This is the same argument that's always trotted out. Graphics are improving, but if you diff 2018 to 2013, it's nothing like 2013 to 2008. The fundamental leaps we've been accustomed to seeing are simply not happening anymore. The rate of progress is very clearly slowing down.

It depends which axis you measure, of course. We're able to render more things each year, which is nice. But the visual quality from a fundamental perspective is more or less the same as it was a few years ago.

The quality issues stem largely from light transport -- the colors are all wrong! If you compare them to a photo, you'll see that we don't end up remotely close. You can see this vividly in the YT link above (the Unreal engine walkthrough). If you try to picture yourself in the video, you'll get a strange feeling of being in a candy world, or a shrinkwrapped house.

That's certainly a promising axis to explore, and there are hundreds of papers published each year solely about light simulation.


>> Light simulation isn't the issue, motion simulation is. Movement and model and texture complexity all have a long way to go.

> FWIW I completely agree, and this is a key observation. DaVinci devoted several chapters to the problem of motion in art, and it will always be with us.

Now we are getting somewhere! Let's talk about those things instead of rendering! Which parts of motion are killing realism? Fluids and rigid body dynamics are pretty good these days. Facial animation is still in the uncanny valley. Why?

> Graphics are improving, but if you diff 2018 to 2013, it's nothing like 2013 to 2008. The fundamental leaps we've been accustomed to seeing are simply not happening anymore. The rate of progress is very clearly slowing down.

The rate of progress of rendering is slowing down, that goes right to my point that rendering isn't the main problem anymore, it's approaching good enough for realism. Movement and model and texture complexity progress is increasing. Papers on fluid simulation and facial animation and foliage and texture synthesis and multi-dimensional textures are on the rise.

> The quality issues stem largely from light transport -- the colors are all wrong!

No. The way we handle colors and light transport is fine, those pieces of human knowledge are ready for the jump to realism. You've already acknowledged that by refusing to consider still photos, because they already look realistic. One bad example from a game engine doesn't prove anything. I can show you lots of bad examples from game engines.

You would have a stronger argument if you gathered the very best examples on earth and we talked about those. By pointing at known bad examples and claiming that they say something about the state of the art, it makes me feel like you either don't know what the state of the art is, or you're taking cheap shots.


> And yet, no one has. :)

Plenty of people have. How would you spot it in a movie or music video?


Movies and music videos mix in footage from real life. In fact, they're mostly footage of real life.

It's interesting that it's so easy to fool us when they use that technique, but 100% simulated video is the tempting target.

To put it differently, if the goal is to render a laptop, movies and TV shows cheat by filming an actual laptop and overlaying the footage. Whereas our goal is to synthesize a laptop.


You don't put a limit on the length of the video: any movie or music video with 2 or more sequential frames of completely synthetic footage disproves your claim.

Do you think composing real footage and generated together doesn't count? What about two generated sources?

> they're mostly footage of real life.

Historically, yes. Modern-day, most blockbusters heavily rely on generated video.

You have made your point abundantly clear; I am trying to show you that your desired state has already happened and you probably didn't notice, and to get you to clarify what it is you're looking for, and why.


Don't forget that sometimes too much realism is a problem. The "soap opera" effect with high-frame-rate source material or displays that interpolate frames is commonly regarded as looking awful. However, it is more realistic but less appealing to our current aesthetic sensibilities.

Photorealism is a great tool to have in your box but it's not the end goal in any creative medium.


The problems with full motion video are mostly not light transport problems, they're animation problems. Capturing realistic human movement is tricky enough, let alone generating it, and people are exquisitely sensitive to the creepy effect of unnatural movement. Also, even James Cameron's budget might have been insufficient to model out the material properties of a physically credible alternate world, making everything weirdly glossy and fluorescent as an artistic oversight rather than a limitation of the rendering technology.

Creating realistic video panning over still-life scenes, like the view out of a helicopter, is a solved problem at sufficient modelling budgets; you don't need AI, just a fully described scene and lots of FLOPS to run your light transport algorithm. Colorizing NNs are cute but not relevant to this problem.

There's a good analogue to sound: a long time ago we mastered algorithms to precisely simulate any musical instrument, acoustic environment, pre-digital audio transformation, etc., but we still use voice actors because modelling human speech behavior is a much harder problem than modelling audio physics.


> Real-time video is significantly different than still-frames. It's why video compression uses completely different algorithms than jpeg compression, for example.

You are right that video is different from still images, namely there is a lot of temporal redundancy that can be used to compress things to an acceptable level. But h.264 and whatnot are, at their root, a series of JPEG-like frames (the "I" intra frames), and various temporal compression techniques are used to interpolate between the I frames (P = predicted, B = bidirectionally predicted).

Motion JPEG is a thing, and it is often preferred in video editing over h.264 or whatever because each frame is self-contained. It is essentially nothing but a stream of I frames.


> Real-time video is significantly different than still-frames. It's why video compression uses completely different algorithms than jpeg compression, for example.

That's not quite true, mainstream video codecs are all based around discrete cosine transforms (or integer approximations thereof) + scalar quantization, followed by entropy coding.

If you look at early codecs like MPEG1 or MPEG2, individual frames are compressed in a manner almost identical to JPEG. The main difference is that video allows you to do motion compression, but that still leaves you with a residual (or difference) frame, which is compressed much like a jpeg.

Modern codecs like h264 and h265 are far more complex, but the basic algorithms at the heart of them are not fundamentally different from JPEG. In fact you can use h264 as a still image codec, at about 2x the compression efficiency!
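
As a toy illustration of that shared machinery (not how any particular codec implements it), here is an 8x8 DCT of an image block, the transform at the heart of JPEG and of the intra/residual coding in MPEG-style codecs; the only assumption is that SciPy is available:

    import numpy as np
    from scipy.fftpack import dct

    # Toy 8x8 "image block" with a smooth gradient, like a patch of sky.
    block = np.outer(np.arange(8), np.ones(8)) * 16.0

    # 2D DCT-II, applied along rows then columns (as in JPEG).
    coeffs = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    # Most of the energy lands in a few low-frequency coefficients,
    # which is what makes coarse quantization (the lossy step) work.
    print(np.round(coeffs, 1))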

> If it's possible to clean a puppy, it might be possible to extrapolate how light should behave in the general case.

That bot did not "clean" a puppy; it just converted the image to black and white. Since the dye only changed the fur color (chroma) without darkening/lightening it, it was not visible in the newly added synthetic colors. If you look at the image in black and white, the dye is obviously not present: https://i.imgur.com/xIkCP2m.jpg
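
To make the chroma/luma point concrete, here's a rough sketch of what "converting to black and white" does, assuming a standard Rec. 601 luma weighting (the exact coefficients vary by standard and the fur values are made up): a dye that changes hue but not brightness produces nearly the same luma as undyed fur, so it vanishes.

    import numpy as np

    def luma(rgb):
        # Rec. 601 luma: brightness only, hue is discarded.
        return rgb @ np.array([0.299, 0.587, 0.114])

    white_fur = np.array([0.90, 0.90, 0.90])
    dyed_fur  = np.array([0.98, 0.86, 0.88])   # pinkish dye, similar brightness

    print(luma(white_fur), luma(dyed_fur))     # nearly identical luma values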


There should be no disagreement that, given a graphics Turing test, there are going to be scenes that, when replicated with REALTIME 3D, will not even come close to looking like real life, let alone like something recorded and viewed on screen. People need to stop defending real-time graphics: they are cool, but they don't look real. But I also don't think that a neural-net-based magic wand is a viable solution.


There are actually also quite a few transformations you can do to a video that would be noticeable in a still frame.

In particular, I remember some video stabilization a few years ago that just warped the input images instead of building a proper 3D model. It worked in real time and looked great in the video, but in a still frame you could see that the angles were all slightly off.


> and doing so quickly

Most important part. The best solution which arrives too late is completely useless.


> My favorite example: What does it mean to multiply a light's RGB color by a diffuse texture's RGB value? Nothing! It's a completely meaningless operation which happens to approximate reality quite well.

You're still stuck on this idea? Your example is wrong, and you really should let it go. I explained this before: https://news.ycombinator.com/item?id=14655828

Multiplying colors is modeling absorption, and it's a provably physically correct and accurate way to model absorption. You are simply dead wrong on this point. You can get more correct graphics by using more channels, and by modeling scattering. But even when you model scattering, you still multiply colors at every step to model absorption.
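
To spell that out with a tiny sketch (a simplification, since real spectra have far more than three bands): per-channel multiplication models the fraction of light in each band that survives absorption at the surface, i.e. its reflectance.

    import numpy as np

    light  = np.array([1.0, 0.9, 0.8])   # incoming radiance per RGB band
    albedo = np.array([0.6, 0.3, 0.1])   # fraction reflected (not absorbed)

    reflected = light * albedo            # componentwise: what survives absorption
    print(reflected)                      # [0.6  0.27 0.08]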

> All of this effort, and it looks nothing like real camcorder footage.

What do you think about this video? http://vimeo.com/15630517

Would you know it's CG if they didn't use intentionally fake physics?


Lights are one thing. The realistic camcorder effect is not that appealing. It looks "real" because the cameraman's hand is shaking, the sound is crap, the wind is possibly blowing, and the exposure is constantly adjusting because, again, the average user doesn't know what they are doing. We could simulate that, but why?

I've seen this last night and I was really impressed:

https://www.youtube.com/watch?v=bXouFfqSfxg

Add all the things I mentioned before and I'm sure many people could be fooled into thinking it's real.

Also don't forget that immersive VR is coming. Your expectation of the "camcorder effect" will be achieved by the user's head movement if rendering is responsive enough. And it will be. 3.2 ms as shown in the OP's video is really fast and leaves lots of room for additional computation while still staying within a 60 fps frame budget.


Just found this, not even recent but looks pretty good to me.

https://www.youtube.com/watch?v=E3LtFrMAvQ4

Hook up the camera movement to a phone's gyroscope in real time, add a few post-processing filters and 3D ambient audio, and it'll be pretty close to what you describe.


https://youtu.be/E3LtFrMAvQ4?t=169 looks good, but it looks nothing like real-life footage. If you were to walk through a real house that looked like that and film with your phone, you'd wind up with a shockingly different video.

That's the problem with talking about stuff like this. These renderings are all amazing -- they're the best of the best. But we're talking about realism indistinguishable from a camcorder, which is simply impossible given our current technology. (After all, we haven't figured out any techniques to do it.) So it's not enough that it looks pretty good: if you set up side-by-side footage with a real-life scene, viewers would immediately identify the real footage.


> But we're talking about realism indistinguishable from a camcorder, which is simply impossible given our current technology. (After all, we haven't figured out any techniques to do it.)

Every single comment moves the goal posts. It's not clear what your standard is. Would you mind specifying your entire criteria for realistic graphics in one place?

Why is it impossible? Not having an example doesn't mean it's impossible, it just means it hasn't been done yet. Can you back your claim that it's impossible with a proof? Why do you think zero techniques we have now will apply or be involved once a video that satisfies you is produced? Considering that we can render realistic-looking still photos, and realistic video in constrained situations, I find it puzzling to assume that none of it will carry forward.


Yes, it's a simple test:

Show a shuffled playlist of 20 videos, half of which are simulations. The simulations must be entirely synthesized; no mixing in footage of real life.

As long as the videos are non-trivial (~60sec and reasonably complex scenes), the viewers will correctly select all 10 of the simulated videos. If we were capable of generating simulated video identical to a camcorder, people would score no better than random guessing.
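
As a side note, that "no better than random guessing" criterion can be checked statistically; a sketch of the analysis (assuming each viewer labels all 20 videos real or fake, and that SciPy >= 1.7 is available) might look like:

    from scipy.stats import binomtest

    # Suppose a viewer labels 18 of 20 videos correctly.
    # Under the null hypothesis of pure guessing, each label is a coin flip.
    result = binomtest(k=18, n=20, p=0.5, alternative='greater')
    print(result.pvalue)   # a tiny p-value -> viewers are reliably spotting the CG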

So not only have we failed to reach that goal, we are nowhere close -- it's effortless to identify all the fake and real videos. Especially nature videos, for example. When put side-by-side with real-life footage, our best simulation attempts are nowhere near the same ballpark. They're not even the same area code.

The idea that we really haven't accomplished this may seem offensive. My frustration is what led me to work on it. But unfortunately, we're nowhere close. I'm not sure traditional techniques will be enough to close the realism gap.

I've outlined a scientific test that anyone can conduct. The results may be discouraging, but they establish our capabilities. The goalpost is rigid.

To answer your other questions: Obviously everything is impossible until it's not. But I never meant to claim this was impossible, just impossible with our current techniques. I think physically-based rendering circa 2010-2020 will turn out to be a failure from the standpoint of replicating camcorder-grade video, but that's speculation. When you try to simulate physics in such great detail, you end up with hundreds of approximations. When these integrate together, you get modern Unreal Engine videos: they don't look real. They look pretty great! But not real.

Who cares about looking real? Well, like AGI, it's a goal worth striving for. Ever since we've been scribbling on cave walls we've tried to capture realism. Unlike AGI, there isn't much use for perfect realism; it might not affect the world at all. But I'm pointing at the summit; wouldn't it be a fun climb?


> As long as the videos are non-trivial (~60sec and reasonably complex scenes)

I think this is way too vague. You have already talked about camcorders and phones: handheld cameras. You are clearly assuming moving cameras and moving objects. Be more specific, what needs to be in these videos before you'd accept them?

> the viewers will correctly select all 10 of the simulated videos

Maybe, you're predicting the outcome of an experiment nobody has ever done. And I would predict that eventually some, and then later all videos could be mis-predicted.

If it's not possible, then you should be able to demonstrate it's not possible without doing the experiment.

> I've outlined a scientific test that anyone can conduct.

You've outlined a test that can prove that we got there at some point before the test was conducted. It cannot show that we're not there yet, and failure to get a positive result proves nothing. The test you outlined can fail and we can still have all the technology we need to create realism, both can be true at the same time.


Mm, this isn't meant to be a personal remark: careful not to let yourself get too prejudiced against the idea. This test is the one metric we have for precisely judging whether we're capable of making fully-simulated footage that can be classified as real. If you reject it, we fall back to subjective "Well, it looks pretty good to me" type evaluations.

If we can't fool viewers, we haven't achieved realism. And the only way to check is to put some nature videos side by side with our simulations and ask "Which one is real?"

https://youtu.be/Bey4XXJAqS8?t=1089 will kick the crap out of any simulated video. No contest.

I agree that it's not a happy thought, but it is objective truth. On the other hand, the moment we reach that goal, the world will notice. Or at least Hollywood.


> If we can't fool viewers, we haven't achieved realism. And the only way to check is to put some nature videos side by side with our simulations and ask "Which one is real?"

I think you are making a classic fallacy of 'argument from ignorance'. https://en.wikipedia.org/wiki/Argument_from_ignorance

Your proposed experiment cannot tell us whether current techniques fall short.

The experiment can only demonstrate when we are there, after the fact. It cannot demonstrate that we won't get there on our current heading, and it can't even demonstrate that we're not already there.


> it can't even demonstrate that we're not already there.

This is simply false. That's the whole point of the test.


I realize it's the intended point of your test, that doesn't mean it will work. Your test cannot do what you claim just because you want it to.

Read the article about argument from ignorance. For your test to work, you'd have to guarantee it had the best CG that's possible. You can't guarantee that. You can never prove we're not already there because no matter what CG you make for the test, someone else can always try harder, take more time, use better artists and bigger computers, spend more money. Absence of evidence is not evidence of absence.

Your logic is faulty, your argument is suffering from a fallacy. You can speculate that your idea of realism isn't possible yet, but no test can prove it.


> This test is the one metric we have for precisely judging whether we're capable of making fully-simulated footage that can be classified as real.

You are completely misunderstanding this topic area. We already know how to compute ground truth references based on the real physics. I suggest starting with the book Physically Based Rendering.

The question now is, how close can we get a real time approximation to match unbiased path tracing? The OP's link is a pretty definitive step along this direction. But even if that whole research agenda fails, so long as we keep getting more computation at a cheaper price, even unbiased renderers will become real time.


No worries, I don't take that as a personal remark at all. I agree with all of that. I agree (and I agreed when we discussed this before) that we have not achieved unconstrained realism yet.

I'm objecting to your claims that it's not possible. We can't check whether it's not possible. We can only verify that it has been done, after it's been done.

Nobody has done the experiment you've laid out. Not having examples that contradict you is not evidence that it can't be done. The criteria and claims you made are yours to defend.

I also counter that light transport, and specifically color multiplication, are not the problems getting in the way of realism.

It seems like you're intentionally discounting and ignoring progress in constrained realism. We have achieved some CG video that is indistinguishable from reality. Not hours of alligators in Thailand, or of people talking, but short videos of fruit bowls and architecture for sure. The dominos are falling, one by one.


I have to say that I might be fooled into believing something like the rock island video was actual HD camera footage. At least when I'm watching it on my phone. The only thing that gave it away was the rapid shadow movement, which of course was on purpose.


Minor details would make a huge difference. Why do you think we use motion capture of humans to animate 3D models?

Watch Cloverfield, there's a ton of CGI involved dumbed down to camcorder quality.


> no one has ever made any program that outputs a string of images indistinguishable from a real camcorder

We know how to simulate light essentially perfectly. Path tracers are capable of producing results indistinguishable from camera images in many cases, including the colored wall case you mention, and for video as well. The main problem is it can't be done in real time. Also, doing the modeling required to get perfect images is extremely labor intensive. But simulation of light and its interaction with common materials is mostly solved.


If that were true, it should be easy to point to some simulated video and say "Look! This one looks identical to nature. It's so perfect it looks like you were really there, filming it with your own camera."

But no one has ever made a video like that, so we can only show videos of trivial scenes (like a room composed solely of cubes with only one or two lights). There'd be no contest with any real-world comparison -- say, a messy desk with a laptop on it, with the sun shining through a window. The moment you give the viewer enough visual cues, they'll pick out the simulated video nearly 100% of the time, let alone getting anywhere close to fooling them half the time.

More precisely, if you show a playlist containing a mix of simulated videos and real-life footage, viewers shouldn't be able to select the simulated footage any better than random chance. That's the proof no one has ever done this.


Try to distinguish what is real and what is not in The Hobbit.

AFAIK, except for the dialogue scenes, practically everything is CGI.


Did you not see The Jungle Book (2016) ?



" All of this effort, and it looks nothing like real camcorder footage."

Well, who decided that was the benchmark and explicitly stated goal of real-time synthetic image generation? I've worked in video games for several years, and I've never heard an Art Director/Artist tell us: "Make it look like it's shot by a camcorder" ... Of course some games do it, and do it fairly well.

The whole idea is to give the player an immersive visual experience, not to make you seasick by watching wobbly footage from your uncle Ron's camcorder. You can achieve very realistic and immersive visuals without anything looking like a camcorder or some other type of video footage.


To clarify, I wasn't referring to animation. A camcorder is the only instrument that can produce a series of still frames that look identical to what a human observer saw in the same spot.

All simulated video falls short of that goal. But a camcorder can! That's why it seems like if we throw out our physics simulations and try to mimic a camcorder's output, we could do the same thing in software. Our renderer would look real by definition -- it'd be considered a bug for the renderer's output to deviate too far from whatever the camcorder would've recorded in the same orientation in real life.

I'm not saying it's a worthwhile goal, just that no one has achieved it yet. This seems like a possible strategy.

If anyone knows any prior research along these lines, I'd appreciate the references. It's pretty common to use a camcorder to tune an art pipeline, but the proposal here is to use the camcorder to force every single aspect of all rendering stages to produce a final output that looks identical to whatever the camcorder would capture in real life.

It's the total absence of artistic freedom: you wouldn't even be able to move the lights around in the simulation during the training stage, since the training corpus is a real-world scene with corresponding lights. But afterwards it (hopefully) would be able to extrapolate what other scenes should look like, just like an experienced painter can.


> A camcorder is the only instrument that can produce a series of still frames that look identical to what a human observer saw in the same spot.

This is not exactly true and shows a misunderstanding of cinema technique.

(Not least is that human vision is foveal and includes a huge amount of interpolation, while cameras take images with a fixed depth-of-field)

> throw out our physics simulations

If the thing to be produced is to be useful and general, it will be an optical physics simulation.


> A camcorder is the only instrument that can produce a series of still frames that look identical to what a human observer saw in the same spot.

Like a camera? A non-colourblind human with perfect 20/20 vision and no incidental visual impairment?

What is with the repeated camcorder mentions? A camcorder will make people think of something like the Sony Video8 Camcorder from the 1990s, or other early consumer-grade videocameras.


The video is the important part. If you say camera, people will think still frames. That's poisonous to the goal of realism, since video is the area no one has yet achieved.

Also, a camcorder from the 1990s still generates images that are far beyond our current capabilities of realism.


> people will think still frames

Video is still frames… you can't tell the difference between a 2-frame 24fps video and an animation of two photos taken ~42ms apart. This holds as you add more images.

The things that record video for films and TV are called 'video cameras'. In computer graphics, the software concept representing the point of view for the rendered scene is called a camera. Calling these things cameras is standard and well-understood in this domain.

The obsession with the particular recording device seems like pointless minutia. If you called it 'a video capturing device', your message would not be weakened. Specifying exactly the model of 1960s triple-lens Bolex which the video has to emulate distracts from your message.

> That's poisonous to the goal of realism, since video is the area no one has yet achieved.

I don't know what to say: this has already been achieved, and you missed it. It doesn't seem important to expound on whether this has been achieved or not, since it's trivial to find an example which fulfills whatever your latest requirement is.

I still don't know what the message is you're trying to convey: real-time generation of convincingly near-photorealistic video will… something? Don't worry about correcting my definitions; that makes your message "I'm a specialised dictionary in human form*".


> Our visual systems have been tuned for a billion years to notice that.

I think it's actually quite nuanced. Some things we are highly attuned to noticing and need to be exactly correct. Other things are surprisingly easy to fool the eye over.

And of course many things lie somewhere in between these two extremes.

CGI is an ongoing process of discovery about the limits of and the requirements for realism. And the artistry comes from knowing how to work around current limitations.


One way to make CG images more real is to pass them through a neural network. Some GANs can do that. Apple had a paper about synthesizing eyes using such a process.

Here: "Improving the Realism of Synthetic Images" https://machinelearning.apple.com/2017/07/07/GAN.html

Also: "Visual Attribute Transfer through Deep Image Analogy" https://arxiv.org/abs/1705.01088 (this paper even shows a refined CG Mona Lisa)


Thank you. I'd almost given up on this comment chain, but yours made wading through them worthwhile.

Turns out you can't point out that no one has achieved fully-simulated videorealism without being labeled ignorant. You'd think HN would be the one place that people would be intrigued.


> Turns out you can't point out that no one has achieved fully-simulated videorealism without being labeled ignorant.

I guess that's a minor dig at me since I'm the only other person who used that word.

I agreed multiple times that no-one's made your idea of CG realism yet, so you're distorting the truth.

I'm sorry if you felt it was personal, I'm not using ignorance as an insult. I'm using it to help explain why your proposed "scientific test" can't work.

That said, you did demonstrate ignorance of basic math & rendering issues by claiming that color multiplication is meaningless. You demonstrated ignorance of the state of CG production, and of basic logic, by claiming that the lack of realism that meets your criteria somehow proves it's not possible and that current techniques won't get us there. You might turn out to be right, but claiming it's a scientific absolute like you did is wrong. It's either maliciously wrong, or wrong due to ignorance, and I choose to assume the best about you.

> You'd think HN would be the one place that people would be intrigued

I'd love to have a productive discussion on how to achieve video realism. I have yet to see any attempt from you to start one. I tried to respond positively regarding motion simulation, and you didn't reply. I've posted a realistic video, twice, to which you haven't commented.


One of the problems with your idea is that the camera position is one of the parameters of the rendering equation, so the same surface looks different from different points in 3D space. To fully capture how light interacts with the space, you can't get away with one photo; you would need to place the camera in an infinite number of positions and store the results in an infinite amount of memory.
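
Put more formally (this framing is mine, not the commenter's), the quantity you would have to capture is the plenoptic function, radiance as a function of position and viewing direction:

    L(x, y, z, \theta, \phi)

which is five-dimensional even before adding wavelength and time, so any finite set of photographs can only ever sample it sparsely.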


It's similar to what Tesla is doing with autopilot. Theoretically they'd need to train their model on an infinite number of positions, but after a certain amount of training and tweaking you start getting good results.


The article isn't about creating a perfect rendering. It is about creating a reasonably convincing rendering in a limited amount of time. Here "reasonably convincing" should be interpreted relative to current computer game graphics, which is also rendered in a limited amount of time.



