Something equivalent to deferred loads is part of AMD's GCN architecture, which has existed for many years now. (It's not exactly a deferred load; rather, there's an explicit wait instruction, s_waitcnt, that can block a thread until a previous load has returned its results. But in the end that's almost the same thing.)
Of course, the characteristics of memory accesses are quite different in a GPU: very high bandwidth at the cost of very high latency.
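To make the "issue the load now, wait for it later" pattern concrete, here is a rough software analogy in Python. It is only a sketch and not GCN code: the names slow_load and load_then_wait are made up for illustration, a worker thread stands in for the memory system, and the sleep is an assumed stand-in for load latency.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_load(memory, address, latency_s=0.01):
    """Hypothetical stand-in for a high-latency memory load."""
    time.sleep(latency_s)  # assumed latency, purely illustrative
    return memory[address]


def load_then_wait(memory, address, independent_work):
    """Issue a load, overlap independent work, then explicitly wait for the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Issue the load; this call returns immediately without blocking.
        pending = pool.submit(slow_load, memory, address)
        # Do work that does not depend on the loaded value while the load is in flight.
        other = independent_work()
        # Explicit wait: block here until the earlier load has returned its result.
        loaded = pending.result()
    return loaded, other


memory = {0x10: 42}
loaded, other = load_then_wait(memory, 0x10, lambda: sum(range(100)))
print(loaded, other)
```

The point of the analogy is just the ordering: the wait is a separate step placed as late as possible, so the latency is hidden behind whatever independent work fits in between.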