Doing a function call in C/C++ is not expensive. I wish people would stop saying this; it's just not true, and it's one of the most annoying myths in programming. A C function call takes on the order of a few nanoseconds, if that. Here's a benchmark for comparison [1]. It's a tiny amount, and it is absolutely swamped by something like a cache miss. If you eliminate one cache miss but add 50 function calls, that's probably a net benefit.
The only sense in which function calls in C are "expensive" is at compile time: a call the compiler can't inline blocks optimization across the call boundary. If the compiler can inline it, it can potentially do tons of new optimizations that it couldn't do before, and that can yield a huge improvement in performance. But the function call ITSELF is almost never the problem.
But your point still stands. In my experience the greatest benefit of inlining comes when range or null checks in the inlined function can be removed, not from eliminating the call itself.
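To make that concrete, here is a minimal sketch in C. The names `get_or_default` and `sum_first_four` are made up for illustration, and whether the checks actually disappear depends on your compiler and optimization level, so check the generated assembly before relying on it.

```c
#include <stddef.h>

/* Hypothetical helper: defensively checks its arguments on every call. */
static inline int get_or_default(const int *data, size_t len, size_t i, int fallback)
{
    if (data == NULL || i >= len)   /* null check and range check */
        return fallback;
    return data[i];
}

/* Hypothetical caller. Once get_or_default is inlined here, the compiler
 * can see that data is never NULL and that i < 4 always holds, so an
 * optimizing compiler is free to drop both checks from the loop. */
int sum_first_four(void)
{
    static const int data[4] = {1, 2, 3, 4};
    int total = 0;
    for (size_t i = 0; i < 4; i++)
        total += get_or_default(data, 4, i, 0);
    return total;
}
```

If `get_or_default` lived in another translation unit with no link-time optimization, the caller would have to pay for both checks on every iteration, and that is typically a bigger loss than the call instruction.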
Ah, good catch, I missed that my version became a tail call. Still, a C function call is just pushing a couple of values onto the stack and then performing an unconditional jump, so it's not much worse.
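For anyone following along, the difference looks roughly like this. It's a sketch with made-up function names, not the code from the benchmark, and `__attribute__((noinline))` is a GCC/Clang extension used here only so the call stays visible in the assembly.

```c
/* Hypothetical callee, kept out of line so the call is not optimized away. */
__attribute__((noinline)) int add_one(int x)
{
    return x + 1;
}

/* Tail position: the call is the last thing this function does, so an
 * optimizing compiler may emit a plain jump to add_one instead of a
 * call followed by a return. */
int tail_caller(int x)
{
    return add_one(x);
}

/* Not a tail call: work remains after add_one returns, so a real call
 * (save the return address, jump, come back) is needed. */
int non_tail_caller(int x)
{
    return add_one(x) * 2;
}
```

Pasting this into godbolt with optimizations enabled should show whether your compiler turns `tail_caller` into a jump; that is an optimization compilers are allowed to make, not one the C standard guarantees.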
That site is indeed very neat! It just uses Google Benchmark in the background, but it's excellent for these kinds of discussions (and it's lovely that it links directly to godbolt if you want). It's a shame that it doesn't give you the actual latency numbers like regular Google Benchmark, but I suppose that is to be expected when you're running in server VMs, where those numbers aren't necessarily meaningful.
[1]: https://quick-bench.com/q/2q8ch4HmQSpsSdrwd082NpV3KeI