Exactly. With GPU virtualization, the driver can share the GPU's resources among multiple systems, such as the host operating system and guest virtual machines. Shame on NVIDIA for arbitrarily locking us out of this feature.
I finally got some time to try this. It worked as expected, and I now have vgpu_vfio. However, it doesn't quite fit my needs. My host system is "heavy": I need it to run CUDA and so on, while the VM only needs to run games. It seems the 460.32.04 driver on the host doesn't have full functionality, so I can no longer run CUDA on the host.
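For anyone who wants to confirm the same state on their own host, here is a rough sketch of how one might check whether a module with "vgpu_vfio" in its name is loaded. It only parses /proc/modules, where the module name is the first field on each line; the helper names `loaded_modules` and `find_modules` are made up for this example, and the exact module name on your system may differ.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> list[str]:
    """Return module names from text in /proc/modules format.

    Each non-empty line starts with the module name as its first field.
    """
    return [
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    ]


def find_modules(proc_modules_text: str, needle: str) -> list[str]:
    """Return the loaded module names that contain `needle` (hypothetical helper)."""
    return [name for name in loaded_modules(proc_modules_text) if needle in name]


if __name__ == "__main__":
    # /proc/modules only exists on Linux; fall back to empty text elsewhere.
    path = Path("/proc/modules")
    text = path.read_text() if path.exists() else ""
    matches = find_modules(text, "vgpu_vfio")
    print("matching modules:", matches if matches else "none found")
```

This only tells you the module is loaded; it says nothing about whether CUDA still works on the host, which has to be tested separately.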
Is there info on this sort of usage? I'd love to use the host for NVENC and a VM guest for traditional GPU stuff, but haven't been able to find anything on doing that.