But that depends on the number of objects. So unless you are creating objects like there is no tomorrow in a hot loop, the GC should be able to keep up with your workload quite well.
GC will cost roughly 50-100 instructions per object reclaimed AND 5-10 per object it examines but does not reclaim.
Even if we ignore the latter, GC can add 2x overhead for objects that you don't do much with.
And then there are the cache effects. Close to top-of-stack is pretty much guaranteed to be in L1 and might even be in registers. Heap-allocated stuff is wherever. Yes, usually L2/L3, but that's at >2x the latency for the first access.
Are your instruction numbers from a generational GC? Not doubting them, but perhaps that can further amortize the cost (new objects, which are likely to die, are looked at more often).
Cache misses are a real cost, but I think we should not pose the problem as if the only two alternatives are a GC with pointer chasing versus some ultra-efficient SoA or array-based language. Most programs will require allocations, and those will cause indirections, or they simply won't loop over some region of memory at all. In that case GC really is cheap and may be the better tradeoff (faster allocation, parallel reclaim). But yeah, runtimes absolutely need a way to express value classes.
Generational collection affects the number of dead objects that can be found, not the cost of freeing or examining an object, or even the number of objects examined.
I thought that the comparison was between "heap-allocate every object" vs "zero allocation" (C/C++ and similar languages that make it easy to stack-allocate objects, which is not far from zero allocation).
If the application is such that zero-allocation isn't easy, then that comparison doesn't make sense.
However, we're discussing situations when zero (or stack) allocation is possible.