Hacker News new | past | comments | ask | show | jobs | submit login

The "wider" claim is one that I highly suspect.

Mill has very specific storage for arguments and results. For each operation you need 3 lines to that storage - two for arguments and one for result. The storage has O(num-of-wires^2) wiring overhead and you cannot get away from that in general. You can get away with less (Alpha AXP (OoO) has, for example, split register file where results migrated from one half to another behind the scenes) but it is very complex to implement. The VLIW architectures has some nice wideness and THEY ALL suffer from problem that register file of that width presents. You can see that in the clock speeds - Pentiums run around the Itanium in terms of clock speeds. It is precisely because Itanium has large register file (128 registers) and wide issue.

I keep coming to the discussion of the Mill and keep finding the argument that Mill will be wider. And no one bother to address my arguments that for wideness you need O(width^2) wires in the storage, which makes storage impractical or too complex to implement.




Mill has no register file, and is essentially a bypass network internally. That bypass network has a source X sink complexity (not N^2 because the number of sources need not match the number of sinks). You are right that the cross product is a limit on Mill scaling. Our internal work gives us some confidence that we can handle 30-wide issue with tolerable clock impact; beyond that is unclear, and indeed we may hit other constraints that preclude going further; memory bandwidth is a likely issue.


What makes you think you can extract that much instruction level parallelism in the first place?


One major one is speculation. Each Mill operand, or element in a vector operand, has a 'Not a Result' (NaR) flag.


You still have quadratic complexity of the network, the "need not match" argument can be reduced to a constant multiplier.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: