> That eliminates all of the function call overhead and may also enable further ...

> That eliminates all of the function call overhead and may also enable further optimizations (dead code elimination, constant propagation, etc.).

AFAIK, the speedup is almost never function call overhead. As you mention at the tail end, it's all about the compiler optimizations being able to see past the dynamic branch. Good JITs support polymorphic inlining. My (somewhat dated) experience for C++ is that PGO is the solve for this, but it's not widely used. Instead people tend to avoid dynamic dispatch altogether in performance sensitive code.

I think the more general moral of the story is to avoid all kinds of unnecessary dynamic branching in hot sections of code in any language unless you have strong/confidence your compiler/JIT is seeing through it.