My guess is that because in most common x86_64 calling conventions the caller is responsible for destructing the parameters, it has to run the unique_ptr parameter destructor which needs to check if it was moved from in the callee or not. Additionally, because this destructor is not a trivial one (i.e. it does something), unique_ptr cannot be passed directly through a register but must be spilled on the stack.
This is correct. Chandler Carruth gave a talk about the unfortunate pessimizations the ABI imposes. Where the call is inlined, the extra stack traffic and the destructor's null check are optimized out, but functions that take ownership of a heap object through a moved unique_ptr or pimpl object tend not to be inlined, unless they mostly just pass it along to another function that isn't inlined.
So the cost of moving a smart pointer onto the stack and, presumably, off again to someplace less ephemeral might matter on a critical path. But it would be a mistake to exaggerate this: we are talking about a couple of unfortunate ordinary memory operations where you might have hoped to use just registers, not traps, function calls, or atomic synchronizations. If you are already touching memory, which is pretty common, it is just more of that.
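To make the mechanism concrete, here is a minimal sketch, assuming the Itanium C++ ABI (the one used with the System V x86-64 calling convention). The names `g_slot`, `store_owned`, `store_raw` and `demo` are made up for illustration. It only shows the properties that drive the ABI decision; it does not measure anything.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// unique_ptr has a non-trivial destructor, so (assuming the Itanium C++ ABI)
// a by-value argument lives in a caller-owned temporary in memory and the
// callee receives its address. A raw pointer can travel in a register.
static_assert(!std::is_trivially_destructible<std::unique_ptr<int>>::value,
              "unique_ptr's destructor does real work");
static_assert(std::is_trivially_destructible<int*>::value,
              "a raw pointer has nothing to destroy");

// Hypothetical long-lived owner, standing in for "someplace less ephemeral".
std::unique_ptr<int> g_slot;

// Takes ownership by value. The callee moves out of the parameter, leaving
// it null; the caller still runs the parameter's destructor after the call,
// and that destructor has to check for null before deleting.
void store_owned(std::unique_ptr<int> p) {
    g_slot = std::move(p);
}

// Same transfer with a raw pointer: no temporary object, no destructor call.
// The ownership contract is now only a convention, which is the trade-off.
void store_raw(int* p) {
    g_slot.reset(p);
}

int demo() {
    auto p = std::make_unique<int>(42);
    store_owned(std::move(p));
    assert(p == nullptr);  // a moved-from unique_ptr is guaranteed to be null
    return *g_slot;
}
```

If you want to see the difference, compile `store_owned` and `store_raw` with optimizations on and compare the assembly at the call sites. When the call is inlined, the two should end up looking much the same.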