Other research from Meta FAIR actually suggests that pruning the deeper layers is the way to cut inference cost while maintaining accuracy [1]. So there must be some cutoff in network size below which this approach stops working; otherwise the two results contradict each other. Or, if it does still hold at that scale, we could improve these new models even more drastically.
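For anyone wondering how [1] decides which layers to drop: my rough reading is that it looks for the block of n consecutive layers whose input and output representations are most similar (smallest angular distance), removes that block, and then "heals" the model with a bit of QLoRA fine-tuning. Below is a minimal sketch of just the selection step, assuming you already have one hidden-state vector per layer boundary. The function names and the toy data are hypothetical, and the paper computes the distance over real model activations on a dataset rather than a single vector per layer, so treat this as an illustration, not their implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def angular_distance(x: Vector, y: Vector) -> float:
    """Angular distance in [0, 1]: arccos of the cosine similarity, divided by pi."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cos = dot / (norm_x * norm_y)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cos = max(-1.0, min(1.0, cos))
    return math.acos(cos) / math.pi


def best_block_to_prune(hidden_states: List[Vector], n: int) -> int:
    """Return the start index l of the n consecutive layers whose removal
    changes the representation least, i.e. the l minimising d(h[l], h[l + n]).

    hidden_states[i] is the (hypothetical) vector entering layer i, and the
    last entry is the output of the final layer, so a model with L layers
    supplies L + 1 vectors.
    """
    num_layers = len(hidden_states) - 1
    if not 0 < n <= num_layers:
        raise ValueError("n must be between 1 and the number of layers")
    return min(
        range(num_layers - n + 1),
        key=lambda l: angular_distance(hidden_states[l], hidden_states[l + n]),
    )


# Toy 4-layer "model": layer 0 changes the vector a lot, later layers barely do.
states = [[1.0, 0.0], [0.0, 1.0], [0.1, 1.0], [0.11, 1.0], [0.12, 1.0]]
print(best_block_to_prune(states, 2))  # 2, i.e. drop the two deepest layers
```

In the toy data the deepest layers barely move the vector, which is the pattern [1] reports for real LLMs and why this criterion ends up favouring deep blocks. Whether that pattern still holds for much smaller or differently shaped networks is exactly the cutoff question above.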
[1] Gromov et al., "The Unreasonable Ineffectiveness of the Deeper Layers": https://arxiv.org/html/2403.17887v1