But your point still stands. In my experience, the greatest benefit of inlining comes from removing range or null checks inside the inlined function, not from eliminating the call itself.
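Here is a minimal sketch of the kind of check I mean. The helper `get_checked` and the loop around it are hypothetical and only for illustration. Once the helper is inlined into the loop, the compiler can see that `i < len` always holds, so the range test typically becomes dead code. Whether that actually happens depends on the compiler and flags, so it's worth confirming in the assembly, e.g. on godbolt.

```c
#include <stddef.h>

/* Hypothetical helper: bounds- and null-checked element access. */
static inline int get_checked(const int *arr, size_t len, size_t i) {
    if (arr == NULL || i >= len) {
        return 0; /* out-of-range or null access yields 0 */
    }
    return arr[i];
}

/* Sums all elements through the checked accessor. Once get_checked is
   inlined, the loop condition guarantees i < len, so the i >= len test
   is typically optimized away, and the loop-invariant arr == NULL test
   can be hoisted out of the loop. */
int sum_all(const int *arr, size_t len) {
    int total = 0;
    for (size_t i = 0; i < len; ++i) {
        total += get_checked(arr, len, i);
    }
    return total;
}
```

If `get_checked` were kept out of line, both tests would run on every iteration, because the callee cannot know anything about the caller's loop bounds.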
Ah, I missed that my version became a tail call; good catch. Still, a plain C function call mostly amounts to saving a return address (and perhaps a few other values) on the stack and performing an unconditional jump, so it's not much worse.
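To illustrate the tail-call distinction, here is a sketch with hypothetical functions. It assumes GCC or Clang, since `__attribute__((noinline))` is a compiler extension, and x86-64 with optimizations on (e.g. `-O2`). When the call is the last thing a function does, the compiler can usually replace `call` + `ret` with a single `jmp`. When the result is still used afterwards, it can't.

```c
/* noinline keeps the callee out of line so the call itself stays
   visible in the generated assembly. */
__attribute__((noinline)) int helper(int x) {
    return 3 * x + 1;
}

/* Call in tail position: nothing happens after helper returns, so at
   -O2 GCC and Clang typically emit a plain `jmp helper` here. */
int tail_caller(int x) {
    return helper(x);
}

/* Not a tail call: the result is multiplied after helper returns, so a
   real `call` (which pushes the return address on x86-64) and a later
   `ret` are needed. */
int non_tail_caller(int x) {
    return helper(x) * 2;
}
```

Comparing the two callers side by side on godbolt is the quickest way to see whether your compiler actually performed the sibling-call optimization.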
That site is indeed very neat! It uses Google Benchmark under the hood, but it's excellent for these kinds of discussions, and it's lovely that it links directly to godbolt if you want to inspect the assembly. It's a shame that it doesn't give you the actual latency numbers like regular Google Benchmark does, but I suppose that's to be expected when you're running on server VMs, where those numbers aren't necessarily meaningful.
E.g., if we force a call to abs(), the difference is 60%.
https://quick-bench.com/q/Hpk1YFViS6lqtV5oYcXx2ZCY9us
Really cool site btw!
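For anyone who doesn't want to click through, here is a rough idea of what "forcing a call" can look like. This is not the code behind the link. The wrapper `abs_outlined` and the two loops are hypothetical, and the sketch assumes GCC or Clang for `__attribute__((noinline))`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical out-of-line wrapper around abs(). noinline forces every
   use to go through a real function call. */
__attribute__((noinline)) int abs_outlined(int x) {
    return abs(x);
}

/* abs() is free to be inlined here. Compilers usually turn it into a
   few branch-free instructions and may vectorize the whole loop. */
long sum_abs_inlined(const int *v, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += abs(v[i]);
    }
    return total;
}

/* Same loop, but each element pays for a call, which also typically
   prevents the loop from being vectorized. */
long sum_abs_outlined(const int *v, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += abs_outlined(v[i]);
    }
    return total;
}
```

Part of any measured gap in a loop like this may come from lost vectorization rather than from the call overhead alone, which fits the point that the surrounding optimizations often matter more than the call itself.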