A predictable branch is almost free on a modern processor.
Clang will optimize
bool swap_if(bool c, int& a, int& b) {
  int ta = a, tb = b;
  a = (-c & tb) | ((c - 1) & ta);
  b = (-c & ta) | ((c - 1) & tb);
  return c;
}
l += swap_if( *r <= pv, *l, *r);
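To make the one-liner easier to place, here is a minimal sketch of a partition loop it could sit in. It assumes a Lomuto-style partition over a non-empty `int` range with the last element as the pivot. The name `partition_branchless`, the pointer-based signature, and the loop structure are illustrative assumptions, not something the text specifies. When `c` is true, `-c` is an all-ones mask and `c - 1` is zero, so `a` receives `tb`; when `c` is false, the masks flip and each value is kept.

```cpp
// swap_if as given in the text: swaps a and b when c is true, using
// bit masks instead of a conditional branch. Returns c so the caller
// can add it to a pointer or index.
inline bool swap_if(bool c, int& a, int& b) {
    int ta = a, tb = b;              // read both first, so a and b may alias
    a = (-c & tb) | ((c - 1) & ta);  // c ? tb : ta
    b = (-c & ta) | ((c - 1) & tb);  // c ? ta : tb
    return c;
}

// Hypothetical Lomuto-style partition (an assumption for illustration).
// Precondition: begin < end. The last element is used as the pivot.
// Returns a pointer to the pivot's final position.
inline int* partition_branchless(int* begin, int* end) {
    int* last = end - 1;
    const int pv = *last;
    int* l = begin;
    for (int* r = begin; r < last; ++r) {
        // Swap *l and *r when *r belongs on the left side, and advance l
        // by 0 or 1 without branching on the comparison.
        l += swap_if(*r <= pv, *l, *r);
    }
    // Move the pivot between the two sides.
    int tmp = *l;
    *l = *last;
    *last = tmp;
    return l;
}
```

The only branch left in the loop is the loop condition itself, which is highly predictable. The comparison `*r <= pv`, which depends on the data, no longer steers control flow. Whether this is actually faster depends on the compiler and target, so it is worth measuring rather than assuming.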