You can do without bifurcation if you use a PCIe switch such as [1]. This is more expensive but also can achieve more speed, and will work in machines without bifurcation. Downside is it uses more W.
Right, and whilst 3.0 switches are semi-affordable, 4.0 or 5.0 costs significantly more, though how much that matters obviously depends on your workload.
True. I think a switch which could do for example PCIe 5.0 on the host side and 3.0 on the device side would be sufficient for many cases, as one lane of 5.0 can serve all four lanes of a 3.0 NMVe.
But I realize we probably won't see that.
Perhaps it will be realized with higher PCIe versions, given how tight signalling margins will get. But the big guys have money to throw at this so yeah...
[1] https://www.aliexpress.com/item/1005001889076788.html