Extracting Triangular 3D Models, Materials, and Lighting from Images (nvlabs.github.io)
91 points by tomatowurst on May 21, 2022 | 24 comments



The rate of progress in this field -- inverse rendering -- has just been phenomenal. It seems like groundbreaking research is published every couple of months.

Something I have mentioned in other similar forums is that up until recently, computers were catching up to mathematics. The algorithms we use are often centuries old, but didn't have widespread practical applicability until computers could realise them.

The movie Prometheus (2012) made me realise that we had crossed a threshold. In one particular scene[1], some scientists throw a few drones up into the air. The drones zip down curved corridors, mapping them in real time using spinning lasers (LIDAR). At the time, that was science fiction. It was "future tech" that appeared to rely on not-yet-invented computing power.

But the computing power was available! GPUs of the era were already able to put out multiple teraflops, and should have been able to process that amount of data.

What was missing was the mathematics. We had the substrate, but not the code to run on it.

I doubt anyone had noticed that the tables had turned, that things had flipped over to the other side.

Since then, high-end workstations have been rapidly approaching a petaflop of computing power. Suddenly the maths is catching up. This paper is a wonderful example of that happening!

Or to put it another way, I like to challenge computer scientists with this thought experiment:

"Imagine being given a workstation with 1024 CPU cores, a terabyte of memory, and a GPU to match. Could you fully utilise this machine to solve everyday problems in your speciality? If not, why not? What are you missing?"

-- more often than not, the answer is now "the code".

[1] https://www.youtube.com/watch?v=yO-eduvo904


> The drones zip down curved corridors, mapping them in real time using spinning lasers (LIDAR). At the time, that was science fiction.

I think it was the drone part that was science fiction there, though - the real-time mapping using spinning lasers seems to have been well underway by 2012 (see e.g. https://ieeexplore.ieee.org/abstract/document/6385501 from 2012, or https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3478806/ which mentions ROAMER from 2007).


I kind of agree with you. I'd summarize it as "lack of knowledge in mathematics" rather than "the code".

But I disagree about the rate of progress in the field. This work is very impressive, I do agree. But I remain skeptical because they only used synthetic data for the robot demo. And that golden necklace can already be reconstructed at similar quality with other tools that use explicit algorithms. For example, the block matching and voxel volume tracing inside Agisoft Photoscan work wonders for reflective, mostly smooth objects with high-frequency surface detail. And the idea of using known geometry + pictures to estimate a lighting envmap and PBR material maps was already published 10+ years ago. The only difference between then and now is the growth in compute power. We already showed that this works on toy examples years ago, and now we have the compute power to use it in practice.

But despite all that research, give any of these algorithms (including the instant-NGP NeRF that this appears to be based on) a few pictures of a tree, and it'll fail. What all of these approaches still lack is a way to use pre-acquired knowledge of the real world. And that's crucial for resolving ambiguities, like differentiating between lots of similar-looking, similar-colored leaves.


> "Imagine being given a workstation with 1024 CPU cores, a terabyte of memory, and a GPU to match. Could you fully utilise this machine to solve everyday problems in your speciality? If not, why not? What are you missing?"

1. Shared memory space between the GPU and CPU

2. Tons of fast disk

Basically a memory hierarchy that descends from GPU RAM -> CPU cache/RAM -> fast hot storage (NVMe) as easily as CPU cache/RAM or memory mapping works today.

Often that 1TB RAM is “wasted” with all of the copying required to feed things to the GPU in right-sized chunks.
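
For illustration, here's a minimal sketch of the plumbing that copying implies today, written with Numba's CUDA bindings (the array, chunk size, and kernel launches are placeholders, not anything from the paper): a pinned host buffer plus a stream, so chunked host-to-device transfers can overlap with kernel work.

```python
# Sketch: streaming a large host array to the GPU in right-sized chunks.
# Assumes Numba with CUDA support; names and sizes are illustrative only.
import numpy as np
from numba import cuda

CHUNK = 16 * 1024 * 1024  # elements per transfer, tuned to fit GPU RAM

big_host_array = np.random.rand(4 * CHUNK).astype(np.float32)

stream = cuda.stream()
pinned = cuda.pinned_array(CHUNK, dtype=np.float32)  # page-locked staging buffer

for start in range(0, big_host_array.size, CHUNK):
    chunk = big_host_array[start:start + CHUNK]
    pinned[:chunk.size] = chunk                                   # stage into pinned memory
    d_chunk = cuda.to_device(pinned[:chunk.size], stream=stream)  # host-to-device copy on the stream
    # ... launch kernels on d_chunk here, on the same stream ...

stream.synchronize()
```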


I just had to check the dates. I remember the exact moment I realized everyone had a secret supercomputer in their gaming rig that no one was really putting to work. It was a year before your Prometheus example.

I saw this video, while I was in college playing around with XNA, python and pygame. https://youtu.be/ACHJ2rGyP10?t=30

It's pretty mundane now, but at the time I was gobsmacked. Watching that performance and comparing it to the measly entity counts I could run in real time in C#, or in the Python pandemic simulator I was writing, made me FANATICALLY BELIEVE that for many problems I had been taught to use the wrong tool. Shader programs had been misadvertised and their true power obfuscated under a veil of painting pretty pictures etc etc.

I became something of a GPU evangelist in that moment, and for years when someone would say such and such problem was computationally intractable on current hardware, I'd go into a whole spiel, pull out some previously "impossible" things it allowed, and usually they would run away, even though I would call for them to come back.

Despite that, HLSL/GLSL became my behind-the-counter, deep-stock black magic for solving all sorts of problems, plenty of which had no pretty pictures show up on screen. If I could encode your problem as a series of textures smaller than 4096x4096 (and that's most problems), I could make your function with minutes of execution time finish 60 times a second.
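
The same trick maps almost directly onto today's compute APIs. Here's a rough sketch of the "problem as a 4096x4096 texture" pattern, written with Numba's CUDA JIT rather than a fragment shader; the function being evaluated is just a placeholder:

```python
# Sketch: one GPU thread per "texel", evaluating some function over a 4096x4096 grid.
# Same spirit as a full-screen fragment-shader pass; the math inside is a stand-in.
import math
import numpy as np
from numba import cuda

SIZE = 4096

@cuda.jit
def evaluate(grid_out):
    x, y = cuda.grid(2)  # this thread's "texel" coordinates
    if x < grid_out.shape[0] and y < grid_out.shape[1]:
        # stand-in for whatever per-element work the CPU loop used to do
        grid_out[x, y] = math.sin(x * 0.001) * math.cos(y * 0.001)

out = cuda.device_array((SIZE, SIZE), dtype=np.float32)
threads = (16, 16)
blocks = (SIZE // threads[0], SIZE // threads[1])
evaluate[blocks, threads](out)   # ~16.7M results per launch
result = out.copy_to_host()
```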

It allowed for some clutch performance improvements in a stuffy .NET application that looked to my coworkers like space magic. Because it was space magic. To this day I have unused fingers when I count all the times I've multithreaded on the CPU. What a half measure! What a loathsome simulacrum of the massive power of the GPU! Why debase yourself? Oh, it's running on a pizza box somewhere that doesn't have a GPU, I suppose that's a reason.

These days GPU compute is a more standard tool for people, with more direct ways to use it. I'm not mad that it's no longer a secret weapon. It's cool to see what things people are doing with it, like this! This is incredible! I selfishly wish Nvidia would permissively license future tech like this so we could get to the next phase faster. Oh well.

I agree with you strongly and believe that we haven't even scratched the surface.

Thanks for making me remember all that.

I still try to figure out weird/hard things made easy/possible by GPUs. Within the year I've coerced a GPU into doing millions of string operations inside a fragment shader. That was weird and I think it made the GTX 1080 uncomfortable, but it did it a few hundred times a second.

Admittedly it's a lot more pretty picture drawing these days. I still try to reach a little and put the massive power we have to work whenever I can.

https://www.reddit.com/r/Unity3D/comments/updci7/i_made_this...


Please show me the light: any more examples of how to use the GPU for more generic problems? I'd love to play around with it.

Thanks!


This is a very platform-specific question.

https://web.dev/gpu-compute/ This gets into it for web development. I've never used this though.

https://web-dev.imgix.net/image/vvhSqZboQoZZN9wBvoXq72wzGAf1... Just _behold_ that graph. Imagine where it goes even further to the right. Magnificent.

I LOVE gpu.js. https://gpu.rocks/

You have python in your name, I found this! https://developer.nvidia.com/how-to-cuda-python

Another good source for modern tutorials would be Unity compute shader tutorials. With some hand-wringing, anything you do in one of those .computes can be done from any other launch point.


Out of curiosity, what kind of problems did you solve on the GPU?


They've all kind of blurred together. Anything that was effectively just map/filter/reduce operations but was too slow. I've done it on a C#/.NET ERP frontend, a local search function in a wargaming app, and a real-time web application written in node.js (gpu.js is awesome).

I used this technique to write a performant Electron application that served as a driver for a PoE+ lidar. That's just what comes to mind right now.
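
For anyone curious what map/filter/reduce on the GPU looks like without writing shaders, here's a minimal CuPy sketch (the data and threshold are made up):

```python
# Sketch: map / filter / reduce on the GPU with CuPy; data and threshold are illustrative.
import cupy as cp

data = cp.random.rand(50_000_000, dtype=cp.float32)  # array lives in GPU memory

mapped = cp.sqrt(data) * 2.0   # "map": elementwise kernel on the GPU
kept = mapped[mapped > 1.5]    # "filter": boolean mask, also on the GPU
total = float(kept.sum())      # "reduce": parallel reduction, one scalar copied back
```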


Anyone had any luck installing it? I'm getting a ton of errors at the penultimate step. This is on Windows. I've got other CUDA/PyTorch stuff working OK but it's always hit or miss.

Also - despite using a conda environment, it goes ahead and pip installs a bunch of stuff globally into site-packages. I've seen this behaviour a few times with ML stuff. Could anyone explain the reason for this, or is it just sloppiness?


It has less to do with sloppiness and more to do with Windows being a nightmare to target for this sort of thing. I dual-boot now so I can play games and run ML compatibly. Conda is not Docker, unfortunately, and doesn't do as much as you may think to guarantee Windows support for CUDA. Further, nvidia-docker seems to only work on Linux.

Not to suggest your frustrations aren't valid, but you have some options at least.


Good point, and it's a big reason why I haven't dived into ML: the platform I'm on is very unfriendly.

Can't do much when you get stuck on step 23 and realize Windows 10 requires you to google and look for workarounds.

I'm thinking of doing a completely Linux-only build with a powerful GPU, but this is also where things get tricky, because you don't know what you really need, and it's a hefty investment when you're not building a machine for fun but for experimentation with AI.


But I presume installing this on Linux also installs into global site-packages? I don't see anything in the setup that would be different in that regard.


Only if you choose to, same as on Windows, and it's _highly_ discouraged on both. I haven't read the README but culturally, even if someone uses just `python3` in documentation, they still probably expect you to understand how to do so inside of a virtual environment of some sort.

You can still use conda envs/venv/poetry on Linux, after all.

edit: re-reading your question, it sounds like you're in PATH hell. You should examine the contents of your PATH environment variable and make sure you don't have conflicting installations of python.
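
As a quick sanity check, something like this little sketch prints every python/pip executable the shell could be picking up, in PATH lookup order:

```python
# Sketch: list every python/pip-like executable visible on PATH, in lookup order.
import os

for directory in os.environ.get("PATH", "").split(os.pathsep):
    try:
        for name in os.listdir(directory):
            if name.startswith(("python", "pip")):
                print(os.path.join(directory, name))
    except OSError:
        pass  # skip PATH entries that don't exist
```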


I'm not sure but I think you've misunderstood my point.

The readme gives specific steps to install including the creation of a conda environment.

It also uses pip directly which results in stuff being installed into site-packages.

Now - I presume conda is capable of handling this, but for some reason it's not done that way. I'm familiar with virtualenv but not with conda, so I'm not sure how I'd go about doing this correctly, and I'm also not sure whether the author had a good reason not to do it this way in the readme.

So - I'm simply asking "Why is some stuff isolated and other parts not?"

My hunch is that the author doesn't care about virtualenvs/isolation and is just using conda as a package installer. When it came to pip they just ignored this aspect.


In general, activating a conda environment _should_ override your PATH to include the environment's local, contained copy of both python and pip. As such, using `pip install x` in a conda environment will install those dependencies using your conda environment's python/pip, not your global python/pip.

On bash, you would test this like:

```sh
$ conda activate env-name
$ which python
# should print a path inside your conda env
$ which pip
# should also point into your conda env, not the global pip
```

If it _is_ using your global pip, that means somehow your PATH isn't being set properly. This is common with conda on Windows, although I'm not certain why exactly.

The reason they use pip inside of conda instead of conda itself may be that CUDA needs dependencies which aren't found in conda's repositories, or it may simply be personal preference.
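
Either way, a quick check from inside Python shows whether a given package actually landed in the environment rather than the global site-packages (the package name is just an example):

```python
# Sketch: check where a package was installed relative to the active interpreter.
# "torch" is only an example package name.
import sys
import torch

print(sys.prefix)      # should be the conda env's directory while the env is active
print(torch.__file__)  # should live under that prefix; if not, it came from a global install
```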


Ah. Now I understand. That's very helpful and I now know where to start looking.


No problem! Good luck. If you can find where your conda-env's python _should_ be, you can invoke it manually to mitigate the issue.

For instance:

```sh
/home/user/.miniconda/envs/my-env/bin/python3.8 -m pip install <x>
```


Have you tried WSL? The NVidia developer CUDA repo has a specific folder for "wsl-ubuntu" where you only install the toolkit and it reuses the Windows graphics drivers IIRC.


Yes, I did for a while, but it used to involve very specific CUDA versions and other dependencies, which I found restricted what I could do outside of WSL. I was waiting for it to settle down.

I believe it has in Windows 11 but I'm holding off on that because it's still unstable for some VR usage.


One thing I'd love to see is someone pushing the original Doom and Doom II monster images through this, as a base for the newer engines. Somehow every later retexture felt wrong.

This might of course not be possible. AFAIK id started with photos of real models, but heavily modified the resulting photos, so there's no guarantee that a consistent model is even possible.


Very interesting... it could be possible a handful of papers later. You would simply show it a series of sprite animations (typically drawn from all angles), and it should be able to generate the texture, mesh, and lighting.


Can this method be applied to photographs directly, or does it need all the camera positions/orientations/intrinsics explicitly?


You can apply structure-from-motion to recover those, for example with this fairly robust implementation: https://github.com/AIBluefisher/DAGSfM
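
If you'd rather not set up DAGSfM, COLMAP's automatic pipeline can recover the same poses/intrinsics; here's a rough sketch of driving it from Python (the paths are placeholders, and the flags should be checked against your COLMAP version's docs):

```python
# Sketch: recover camera poses/intrinsics with COLMAP's automatic reconstructor.
# Paths are placeholders; verify the flags against your COLMAP version.
import subprocess

workspace = "/path/to/workspace"  # COLMAP writes its database and sparse model here
images = "/path/to/images"        # folder of input photographs

subprocess.run(
    ["colmap", "automatic_reconstructor",
     "--workspace_path", workspace,
     "--image_path", images],
    check=True,
)
# The sparse model it produces (cameras.bin / images.bin) holds the poses and intrinsics.
```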



