Uses coordinate-based neural networks to model the scene volumetrically. However, in the case of this paper does not use an MLP to represent the scene. Instead, proposes to directly learn a voxel grid representation.
Yes, and then polygonal models (and other things) are built from those.
For anyone who wants a more technical dive into the photogrammetry pipeline, here's a video I made for a company called Mapware for NVIDIA GTC 21: https://youtu.be/ktDVWzshR4w?t=331
Some techniques for downsampling point clouds use voxelgrid representations but in general you're mapping pixel data from varied images to each other in space and producing points from that to try and capture surface geometry.
But it does? Agisoft will first estimate depth maps and then project them into a voxel volume for extracting the high-resolution mesh. Debug logging even lists the voxel grid dimensions.
For an excellent review check out Advances in Neural Rendering: https://arxiv.org/abs/2111.05849