There are a lot of issues with it, but perhaps the biggest is that there just aren't troves of easily scrapable, digestible 3D models lying around on the internet to train on, the way there are for text, images, and video.

Almost all of the generative 3D models you see are actually generative image models that (to crudely simplify) perform something like photogrammetry to produce a 3D model: 'does this 3D object, rendered from 25 different views, match the text prompt, as evaluated by a model trained on text-image pairs?'
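
To make that concrete, here's a minimal sketch of the scoring loop such pipelines optimize against, assuming OpenAI's CLIP as the text-image model. render_view is a hypothetical stand-in for whatever (usually differentiable) renderer the pipeline actually uses; the 25-view count just mirrors the example above.

    # Sketch: score a candidate 3D object by rendering it from many views
    # and asking a text-image model (here, OpenAI's CLIP) how well each
    # render matches the prompt. `render_view` is hypothetical.
    import torch
    import clip  # pip install git+https://github.com/openai/CLIP.git

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    def render_view(scene, azimuth_deg):
        """Hypothetical renderer: return a PIL.Image of `scene` from this angle."""
        raise NotImplementedError

    def multiview_score(scene, prompt, n_views=25):
        text = clip.tokenize([prompt]).to(device)
        with torch.no_grad():
            text_feat = model.encode_text(text)
            text_feat /= text_feat.norm(dim=-1, keepdim=True)
            scores = []
            for i in range(n_views):
                img = preprocess(render_view(scene, 360.0 * i / n_views))
                img_feat = model.encode_image(img.unsqueeze(0).to(device))
                img_feat /= img_feat.norm(dim=-1, keepdim=True)
                scores.append((img_feat @ text_feat.T).item())  # cosine similarity
        return sum(scores) / len(scores)  # the scene is optimized to maximize this

The key point: the supervision signal never sees geometry directly, only 2D renders, which is why consistency across views is so hard to get right.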

This is a shitty way to generate 3D models, and it's why they almost all look kind of malformed.




If reinforcement learning were farther along, you could have it learn to reproduce scenes as 3D models. Each episode's task is to mimic a target image, each step is a command mutating the scene (adding a polygon, rotating the camera, etc.), and the reward signal is image similarity. You could even bootstrap it with synthetic data: generate small random scenes, make them increasingly sophisticated, and later switch over to mimicking real images.
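
A minimal sketch of that environment, assuming a toy 2D rasterizer (grayscale discs on a canvas) standing in for a real 3D renderer. SceneMimicEnv and the add_disc action are made-up names; the reward is the per-step reduction in pixel MSE against the target image.

    # Sketch: RL environment where each step mutates a scene and the
    # reward is how much closer the render got to a target image.
    import numpy as np

    H = W = 64

    def rasterize(discs):
        """Toy 'renderer': paint each (x, y, radius, gray) disc onto a canvas."""
        canvas = np.zeros((H, W), dtype=np.float32)
        yy, xx = np.mgrid[0:H, 0:W]
        for x, y, r, g in discs:
            canvas[(xx - x) ** 2 + (yy - y) ** 2 <= r ** 2] = g
        return canvas

    class SceneMimicEnv:
        def __init__(self, target_image):
            self.target = target_image

        def reset(self):
            self.discs = []
            self._err = self._mse()
            return rasterize(self.discs)

        def _mse(self):
            return float(np.mean((rasterize(self.discs) - self.target) ** 2))

        def step(self, action):
            # action = ("add_disc", x, y, radius, gray); a real env would
            # also expose camera moves, polygon edits, deletions, etc.
            op, *params = action
            if op == "add_disc":
                self.discs.append(tuple(params))
            err = self._mse()
            reward = self._err - err  # reward = reduction in image error
            self._err = err
            return rasterize(self.discs), reward, err < 1e-3, {}

    # Synthetic-data curriculum: random target scenes, small at first.
    rng = np.random.default_rng(0)
    target = rasterize([(rng.integers(8, W - 8), rng.integers(8, H - 8), 10, 1.0)])
    env = SceneMimicEnv(target)
    obs = env.reset()
    obs, reward, done, info = env.step(("add_disc", 32, 32, 10, 1.0))
    print(reward)  # positive if the disc reduced the error
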

You wouldn't need any existing 3D models to learn from. But my intuition is that RL is still quite weak, and that the model would flounder after learning to mimic the background color and place a few spheres.




