My guess is that because in most common x86_64 calling conventions the caller is...

ncmncm · on March 2, 2021

This is correct. Chandler Carruth gave a talk about unfortunate pessimizations imposed by the ABI. Where the call is inlined, that stuff is optimized out, but functions that take a moved unique_ptr or pimpl object -- taking ownership of a heap object -- tend not to be inlined unless they mostly just pass it along to one that isn't inlined.

So, the cost of moving a smart pointer onto the stack and, presumably, off again to someplace less ephemeral might matter, on a critical path. But it would be a mistake to exaggerate this: we are talking about a couple of unfortunate ordinary memory operations, not traps, function calls, or atomic synchronizations, when you might have hoped to be using just registers. If you are already touching memory -- something pretty common -- it is just more of that.

wheybags · on March 2, 2021

Wow, did not know this, I'm going to go and have a deeper look at it now. Thanks to all in this comment thread!