
Copy paste:

Fun fact: the Mill's proposed method for hiding that latency, "deferred loads", has already been done by Duke's Architecture Group: http://people.duke.edu/~bcl15/documents/huang2016-nisc.pdf (warning: PDF link). The big gain? A measly ~8%.




IIRC the Mill's presentation about deferred loads predates this paper, though the paper is a lot more detailed and has simulations. It's not clear how the Mill's gain from deferred loads would compare (it differs in a lot of other ways that would interact).


8% seems huge for an architectural change to me.


8% for an unoptimized PoC is pretty substantial, or it could be nothing once all the details are accounted for. If it also comes with simpler silicon, that's even better, and surely worth evaluating for GP processors.


I'm pretty out of my depth here, but I thought the deferred loads latency was part of the 12% gained from dynamic scheduling, not the 88% that the Mill claims to tackle with its phasing thing. In that case, 8% doesn't sound measly at all.

But like I said, I wouldn't be surprised if I'm just not understanding what I'm reading/watching.



