It also depends heavily on the context. You pay for each cache miss twice: once for the miss itself, and again the next time you access whatever was evicted to make room. This is why LUTs often shine in microbenchmarks but drag down performance in real-world scenarios, where they compete for cache with other cache-bound code.
An access to main memory can take on the order of a hundred cycles or more; a short routine whose code is already in cache may be able to recompute a value more quickly than pulling it from main memory.
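To make the trade-off concrete, here is a minimal sketch in C that counts the set bits in a byte both ways. All names (`popcount8_table`, `popcount8_lut`, `popcount8_compute`, and so on) are made up for illustration, and the comment about cache lines assumes 64-byte lines, which is common but not universal. It shows the two shapes of code, not which one wins; that depends on what else is competing for the cache.

```c
#include <stdint.h>

/* Hypothetical 256-entry lookup table: 256 bytes, i.e. four cache
 * lines if lines are 64 bytes (an assumption, not a guarantee). */
static uint8_t popcount8_table[256];

/* Fill the table once at startup with a straightforward bit loop. */
static void popcount8_init_table(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t bits = 0;
        for (int v = i; v != 0; v >>= 1)
            bits += (uint8_t)(v & 1);
        popcount8_table[i] = bits;
    }
}

/* LUT version: a single load. Cheap if the line is in cache,
 * expensive if it has to come from main memory. */
static inline unsigned popcount8_lut(uint8_t x)
{
    return popcount8_table[x];
}

/* Recompute version: a handful of ALU operations and no data
 * memory access, so it cannot miss the data cache or evict anything. */
static inline unsigned popcount8_compute(uint8_t x)
{
    x = (uint8_t)(x - ((x >> 1) & 0x55u));           /* 2-bit sums */
    x = (uint8_t)((x & 0x33u) + ((x >> 2) & 0x33u)); /* 4-bit sums */
    return (unsigned)((x + (x >> 4)) & 0x0Fu);       /* final sum  */
}

/* Sanity check: returns 1 if both versions agree on every byte value.
 * Call popcount8_init_table() first. */
static int popcount8_versions_agree(void)
{
    for (int i = 0; i < 256; i++) {
        if (popcount8_lut((uint8_t)i) != popcount8_compute((uint8_t)i))
            return 0;
    }
    return 1;
}
```

In a tight loop that does nothing else, the table stays hot and the LUT version tends to look good. Mixed in with other code that needs the cache, each lookup risks a miss plus an eviction of someone else's data, while the recompute version only costs its few instructions. Measure in the real workload rather than in isolation.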