There are many reasons. The latency of getting data back and forth to the GPU is...

There are many reasons. The latency of getting data back and forth to the GPU is a pretty high threshold to cross before you even see benefits, and many tasks are still CPU bound because they have data dependencies and logic that benefit from good branch prediction and deep pipelines.

Many high compute tasks are CPU bound. GPUs are only good for lots of dumb math that doesn't change a lot. Turns out that only applies to a small set of problems, so you need to put in lots of effort to turn your problem into lots of dumb math instead of a little bit of smart math and justify the penalty for leaving L1.