The problem with the library doing this is that you end up with the same issue as in Haskell: you accumulate thunks, and both memory use and the location of CPU hotspots become much harder to predict and troubleshoot.
Comparing memory use, call counts, and time spent with thunking turned on vs. off would give a good idea of how big that effect is.
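A rough sketch of that on/off comparison in Python. `Thunk`, `work`, and `run` are hypothetical names made up for illustration, not the API of the library being discussed, and `work` is a trivial stand-in for real computation. It counts calls, records peak traced memory with `tracemalloc`, and times the build and consume phases separately so you can see where the work lands:

```python
import time
import tracemalloc


class Thunk:
    """Deferred computation: holds a function and its arguments until forced.

    Hypothetical helper; no memoization, so forcing twice would recompute.
    """

    __slots__ = ("fn", "args")

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

    def force(self):
        return self.fn(*self.args)


calls = 0  # how many times the work function actually ran


def work(x):
    """Stand-in for real computation (hypothetical)."""
    global calls
    calls += 1
    return x * x


def run(n, lazy):
    """Build n results (eagerly, or as thunks), then consume them all.

    Returns (total, calls, peak_bytes, build_seconds, consume_seconds).
    """
    global calls
    calls = 0
    tracemalloc.start()
    t0 = time.perf_counter()
    # Eager: compute now. Lazy: only record what to compute later.
    items = [Thunk(work, i) if lazy else work(i) for i in range(n)]
    t1 = time.perf_counter()
    # Lazy mode does all the real work here, when the thunks are forced.
    total = sum(x.force() if lazy else x for x in items)
    t2 = time.perf_counter()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return total, calls, peak, t1 - t0, t2 - t1


for lazy in (False, True):
    total, n_calls, peak, build_s, consume_s = run(100_000, lazy)
    print(f"lazy={lazy}: calls={n_calls} peak={peak} B "
          f"build={build_s:.4f}s consume={consume_s:.4f}s")
```

With thunking on, the calls to `work` should move from the build phase into the consume phase, and peak memory should go up because every pending thunk keeps its function and arguments alive until it is forced. `tracemalloc` itself adds overhead, so treat the timings as relative, not absolute.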
I can say that, on my side of things, thunking is a huge performance gain: it allows for quickly gathering the intent of an expression, with the actual computation happening on the other side of a cache.
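A minimal Python sketch of that split, assuming expressions are immutable trees and the cache is keyed by structural equality. `Lit`, `Op`, `add`, `mul`, and `Evaluator` are hypothetical names, not the library under discussion. Building an expression only records intent; the evaluator does the arithmetic and caches each subexpression so shared or repeated pieces are computed once:

```python
import operator
from dataclasses import dataclass
from typing import Union

OPS = {"+": operator.add, "*": operator.mul}


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Op:
    """Records an operation and its operands; constructing one computes nothing."""
    name: str
    left: "Node"
    right: "Node"


Node = Union[Lit, Op]


def add(a: Node, b: Node) -> Op:
    return Op("+", a, b)


def mul(a: Node, b: Node) -> Op:
    return Op("*", a, b)


class Evaluator:
    """Computation side: evaluates expression trees behind a cache."""

    def __init__(self):
        self.cache = {}    # structurally equal Op nodes share one entry
        self.computed = 0  # number of Op nodes actually evaluated

    def eval(self, node: Node) -> int:
        if isinstance(node, Lit):
            return node.value
        if node in self.cache:
            return self.cache[node]
        result = OPS[node.name](self.eval(node.left), self.eval(node.right))
        self.computed += 1
        self.cache[node] = result
        return result


x = add(Lit(2), Lit(3))  # intent only: no arithmetic yet
y = mul(x, x)            # x appears twice in the tree
ev = Evaluator()
print(ev.eval(y), ev.computed)                    # 25 2
print(ev.eval(add(Lit(2), Lit(3))), ev.computed)  # 5 2 (cache hit)
```

Frozen dataclasses give structural hashing and equality for free, which is what lets an independently built `add(Lit(2), Lit(3))` hit the cache. The trade-off is that each lookup rehashes the whole subtree; a real implementation would more likely cache hashes or intern nodes.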