In my experience, 40G and 100G are still mostly used for interconnects (switch-to-switch links, peering connections, etc.), largely because of the cost of the NICs and optics. 25G or Nx10G seems to be the sweet spot for server/ToR uplinks, partly for cost, but also because it's non-trivial to push even a 10G NIC to line rate (which is ultimately what this entire thread is about).
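To put a rough number on that last point, here's a back-of-the-envelope sketch (assuming minimum-size 64-byte Ethernet frames, which is the worst case):

    # Per-packet CPU budget at 10 Gb/s with minimum-size Ethernet frames.
    # 64-byte frame + 8-byte preamble/SFD + 12-byte inter-frame gap = 84 bytes on the wire.
    LINK_BPS = 10e9
    WIRE_BYTES = 64 + 8 + 12

    pps = LINK_BPS / (WIRE_BYTES * 8)   # ~14.88 million packets/sec
    ns_per_packet = 1e9 / pps           # ~67 ns to handle each packet

    print(f"{pps / 1e6:.2f} Mpps, {ns_per_packet:.0f} ns per packet")

At ~67 ns per packet there's barely room for a couple of cache misses, which is why small-packet line rate is so much harder to hit than bulk throughput.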
There's some interesting reading in the Maglev paper from Google about the work they did to push 10gb line rate on commodity Linux hardware.
I guess it'll also depend a lot on what size of server you have. You'd pick a different NIC for a 384-vCPU EPYC box running a zillion VMs in an on-prem server room than for a small business's $500 1U web server in a colo rack.
The 2016 Maglev paper was an interesting read, but note that the 10G line rate was with tiny packets and without stuff like TCP segmentation offload (because it's a software load balancer that processes each packet on the CPU). Generally, if you browse around, you'll find there's no issue saturating a 100G NIC when using multiple concurrent TCP connections.
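For the multi-connection case, something like the toy sketch below is the idea (host/port are placeholders, and in practice you'd just run iperf3 with parallel streams): multiple flows let RSS spread the work across NIC queues and CPU cores, while TSO/GSO keep the per-packet cost down.

    # Toy multi-stream TCP sender (placeholder host/port; a crude stand-in for iperf3 -P).
    import socket
    import threading

    HOST, PORT = "192.0.2.10", 5001      # hypothetical receiver
    STREAMS = 8                          # one flow per connection; RSS spreads them over cores
    CHUNK = b"\x00" * (1 << 20)          # 1 MiB writes keep syscall overhead low
    PER_STREAM = 1 << 30                 # 1 GiB per connection

    def send_stream():
        with socket.create_connection((HOST, PORT)) as s:
            sent = 0
            while sent < PER_STREAM:
                s.sendall(CHUNK)
                sent += len(CHUNK)

    threads = [threading.Thread(target=send_stream) for _ in range(STREAMS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Each sendall releases the GIL while the kernel copies the data, so plain threads are enough to keep all eight flows busy.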