A lifting implementation of a wavelet is pretty low complexity; iirc the CDF 9/7 used in JPEG2000 was effectively a 5-tap filter. Implementation-wise, the biggest issue is that approximately no one bothered optimizing their memory access patterns for CPU caches with anywhere near the same attention FFTs got. On top of that, unlike block-based codecs, you basically need to store 16-bit coefficients for an entire frame of 8-bit pixels, instead of just one block at a time.
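To give a feel for how little arithmetic lifting needs, here is a rough sketch of one level of a 1-D CDF 9/7 transform in plain Python. The constants are the ones commonly quoted for the 9/7 lifting factorization (recalled, not taken from this comment), and the function names, the mirrored edge handling, and the even-length restriction are all assumptions made for illustration. It is not how any particular JPEG2000 codec does it.

```python
# Lifting constants commonly quoted for the CDF 9/7 factorization.
# Assumption: recalled from the usual lifting scheme, not given in the comment.
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001


def _lift(x, start, coeff):
    """In place: x[i] += coeff * (x[i-1] + x[i+1]) for i = start, start+2, ..."""
    n = len(x)
    for i in range(start, n, 2):
        left = x[i - 1] if i > 0 else x[i + 1]       # mirror at the left edge
        right = x[i + 1] if i + 1 < n else x[i - 1]  # mirror at the right edge
        x[i] += coeff * (left + right)


def forward_97(signal):
    """One level of the transform: returns (lowpass, highpass) halves."""
    x = [float(v) for v in signal]
    if len(x) < 2 or len(x) % 2:
        raise ValueError("sketch only handles even lengths >= 2")
    _lift(x, 1, ALPHA)  # odd samples from even neighbours
    _lift(x, 0, BETA)   # even samples from odd neighbours
    _lift(x, 1, GAMMA)
    _lift(x, 0, DELTA)
    low = [v / K for v in x[0::2]]   # scaled so a constant input passes through
    high = [v * K for v in x[1::2]]
    return low, high


def inverse_97(low, high):
    """Undo forward_97 by running the same steps backwards with flipped signs."""
    x = [0.0] * (len(low) + len(high))
    x[0::2] = [v * K for v in low]
    x[1::2] = [v / K for v in high]
    _lift(x, 0, -DELTA)
    _lift(x, 1, -GAMMA)
    _lift(x, 0, -BETA)
    _lift(x, 1, -ALPHA)
    return x
```

Each lifting step only touches a sample and its two neighbours, so the per-sample cost is a handful of multiply-adds. The expensive part in practice is the strided access pattern once you run this along columns of a full frame, which is where the cache problem comes from.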

But ultimately, DCTs make it way cheaper to encode similar-but-wrong details, whereas in wavelets details are effectively repeated across multiple highpass bands, so the only cheap option is to not code the detail at all.
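A toy way to see the "repeated across multiple highpass bands" part: decompose a single step edge over several levels and count the nonzero highpass coefficients per level. This sketch swaps in a Haar wavelet instead of the 9/7 purely to keep it short, and `haar_levels` is a made-up helper that assumes a power-of-two length.

```python
def haar_levels(signal):
    """Return the highpass band of each level of a Haar decomposition."""
    low = [float(v) for v in signal]
    bands = []
    while len(low) > 1:
        pairs = list(zip(low[0::2], low[1::2]))
        bands.append([(a - b) / 2 for a, b in pairs])  # detail (highpass)
        low = [(a + b) / 2 for a, b in pairs]          # average (lowpass)
    return bands


# One step edge that does not line up with the pair boundaries.
edge = [0] * 5 + [1] * 11
for level, band in enumerate(haar_levels(edge), start=1):
    nonzero = sum(1 for v in band if v != 0)
    print(f"level {level}: {nonzero} nonzero of {len(band)}")
```

The one edge leaves a nonzero coefficient in every level's highpass band, so the same detail has to be paid for once per band.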