It's not possible to do this uniquely, i.e. to recover the one scene that actually produced the image, but it is possible to produce some scene that would produce that image.
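A toy illustration of that ambiguity (the one-pixel "renderer" and the numbers are made up purely for this sketch): if a pixel only records albedo times light intensity, it can't distinguish a bright light on a dark surface from a dim light on a bright one.

```python
# Toy "renderer": one pixel whose value is albedo * light_intensity.
# Two different "scenes" produce exactly the same image, so the image
# alone cannot tell us which scene we started from.

def render(albedo: float, light_intensity: float) -> float:
    return albedo * light_intensity

scene_a = {"albedo": 0.5, "light_intensity": 2.0}  # dark surface, bright light
scene_b = {"albedo": 1.0, "light_intensity": 1.0}  # bright surface, dim light

image_a = render(**scene_a)
image_b = render(**scene_b)

assert image_a == image_b == 1.0  # identical images, different scenes
print(image_a, image_b)
```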
This is true, though machine learning models will learn typical scene structures alongside the rendering parameters. It is also common to use more than one source image (taken from different viewpoints, a bit like photogrammetry); see the sketch below.
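A rough sketch of why a second viewpoint helps (idealized 2D "cameras" that measure only a bearing, no noise, all values invented for illustration): one bearing leaves the depth along the ray unknown, but two bearings from known positions intersect at the point, which is essentially what photogrammetry exploits.

```python
import numpy as np

# Two cameras at known positions observe the same point. A single bearing
# leaves the depth along the ray unknown; the second bearing pins the
# point down at the intersection of the two rays.

def intersect_rays(o1, d1, o2, d2):
    """Return the intersection of rays o1 + t*d1 and o2 + s*d2 (2D)."""
    # Solve o1 + t*d1 = o2 + s*d2 for (t, s) as a 2x2 linear system.
    A = np.column_stack((d1, -d2))
    t, s = np.linalg.solve(A, o2 - o1)
    return o1 + t * d1

point = np.array([2.0, 3.0])                       # ground-truth scene point
cam1, cam2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])

# Each camera only measures the direction toward the point.
d1 = (point - cam1) / np.linalg.norm(point - cam1)
d2 = (point - cam2) / np.linalg.norm(point - cam2)

recovered = intersect_rays(cam1, d1, cam2, d2)
print(recovered)   # ~[2. 3.]: the second view resolves the depth ambiguity
```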
If it's possible to model the diffractive materials used in those diffractive neural networks, it would be entertaining to run inverse rendering on a digit-recognition setup and see what, if anything, it can do.
It is possible to produce infinitely many scenes that would produce the image, including the scene that contains the render itself as a flat object viewed through a camera. Sampling from this infinite set of possibilities isn't useful unless you introduce a ton of fragile assumptions about what you could be looking at to begin with, in which case you kind of already knew what was in the scene.
> While these techniques came closer and closer to photorealism, another question arose: what if instead of going from a 3D scene to a 2D image (rendering), we went from a 2D image to a 3D scene? As you may imagine, reconstructing 3D scenes from 2D information is quite complex, but there have been many advances in the last few years. This area of study is called inverse graphics.
It's a research renderer and its key functionality is reversible rendering—going from an image to the scene parameters that would create that image.
The purpose of this kind of renderer is to develop research ideas that will eventually make it into a production renderer…
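For anyone curious what "going from an image to the scene parameters that would create that image" tends to look like in practice, here is a self-contained toy sketch of inverse rendering by optimization: render with guessed parameters, compare against the target image, and push gradients of the pixel loss back into the parameters. The blob "renderer" and the finite-difference gradients below are stand-ins for illustration only, not any real renderer's code.

```python
import numpy as np

# A deliberately tiny "renderer": the scene is one bright blob described by
# (brightness, position), and the rendered "image" is a row of 32 pixels.
def render(params, width=32, sigma=4.0):
    brightness, position = params
    x = np.arange(width)
    return brightness * np.exp(-0.5 * ((x - position) / sigma) ** 2)

# Target image produced by an unknown scene we want to recover.
true_params = np.array([0.8, 20.0])
target = render(true_params)

def loss(params):
    return np.mean((render(params) - target) ** 2)

# Inverse rendering as optimization: nudge the scene parameters until the
# rendered image matches the target. Finite-difference gradients stand in
# for the automatic differentiation a real differentiable renderer provides.
params = np.array([0.2, 15.0])   # initial guess: too dim, wrong position
eps, lr = 1e-4, 2.0
for _ in range(3000):
    grad = np.array([
        (loss(params + eps * e) - loss(params - eps * e)) / (2 * eps)
        for e in np.eye(2)
    ])
    params -= lr * grad

print(params)   # approaches [0.8, 20.0], the scene that produced the target
```

A research-grade differentiable renderer does the same loop with a physically based light-transport simulation and automatic differentiation, over far more parameters than two.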