
I don't think those scheduling constraints exist on the Mill. Certainly the ones illustrated in figure 6(a) don't. Add operation number 5 could issue during the empty slot on cycle 5 instead of cycle 6, either thanks to deferred loads or because the Mill uses a multitude of specialized execution units rather than a handful of general-purpose ones; a busy load unit could never stall an independent add operation to begin with.

It looks like add operation number 5 can't use the empty cycle on lane 2 because compare number 3 would already be queued in that lane. The time-reversed, consumers-first scheduling heuristic that a Mill compiler uses wouldn't paint itself into this corner the way this model's dispatcher does, as the sketch below tries to show.
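
For anyone curious, here's a rough sketch of what consumers-first (time-reversed) scheduling looks like. This is Python with entirely made-up names (Op, schedule_reversed, the lane labels); it's not taken from any real Mill toolchain, just an illustration under the assumption of one issue slot per specialized lane per cycle: place the last consumers first, then push each producer early enough that its result lands exactly when the earliest consumer reads it, sliding to an earlier cycle only when its lane is already taken.

    from collections import defaultdict

    class Op:
        def __init__(self, name, lane, latency, consumers=()):
            self.name = name                  # e.g. "add5", "cmp3"
            self.lane = lane                  # specialized pipeline it must issue on
            self.latency = latency            # cycles until its result is available
            self.consumers = list(consumers)  # ops that read this op's result

    def topo_sort(ops):
        # Order ops so producers come before their consumers (Kahn's algorithm).
        indeg = {op.name: 0 for op in ops}
        for op in ops:
            for c in op.consumers:
                indeg[c.name] += 1
        ready = [op for op in ops if indeg[op.name] == 0]
        order = []
        while ready:
            op = ready.pop()
            order.append(op)
            for c in op.consumers:
                indeg[c.name] -= 1
                if indeg[c.name] == 0:
                    ready.append(c)
        return order

    def schedule_reversed(ops):
        # Consumers first: cycle numbers count backwards from the end of the
        # block (0 = last cycle), so each consumer gets a slot before any of
        # its producers is placed.
        cycle_of = {}
        busy = defaultdict(set)  # lane -> reversed cycles already occupied
        for op in reversed(topo_sort(ops)):
            # A producer must issue at least `latency` cycles before its most
            # demanding consumer, i.e. at a larger reversed-cycle number.
            earliest = max((cycle_of[c.name] + op.latency for c in op.consumers),
                           default=0)
            cyc = earliest
            while cyc in busy[op.lane]:  # lane conflict: slide one cycle earlier
                cyc += 1
            cycle_of[op.name] = cyc
            busy[op.lane].add(cyc)
        return cycle_of

    # Toy chain: load -> add -> compare, each on its own lane.
    cmp3  = Op("cmp3",  lane="alu1",  latency=1)
    add5  = Op("add5",  lane="alu0",  latency=1, consumers=[cmp3])
    load2 = Op("load2", lane="load0", latency=3, consumers=[add5])
    print(schedule_reversed([load2, add5, cmp3]))
    # {'cmp3': 0, 'add5': 1, 'load2': 4}  (cycles counted back from the end)

Because the compare already holds its slot before the add is placed, the add simply lands on an earlier free cycle of its own lane rather than getting queued behind something else in a shared lane.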

Figure 6(a) for a Mill would look like an ALU pipeline doing an add every cycle without stalling, a load pipeline issuing a load every cycle without stalling, another ALU pipeline doing the compares every cycle but lagging several cycles behind instead of just one, and a branch unit doing the conditional jump every cycle immediately after the corresponding compare. And on a mid-range or high-end Mill there would be plenty of execution units to spare for a non-trivial loop body without reducing throughput, and the whole thing would be vectorized too.



