
Mill has no register file, and is essentially a bypass network internally. That bypass network has sources × sinks complexity (not N^2, because the number of sources need not match the number of sinks). You are right that the cross product is a limit on Mill scaling. Our internal work gives us some confidence that we can handle 30-wide issue with tolerable clock impact. Beyond that is unclear, and we may well hit other constraints that preclude going further; memory bandwidth is a likely one.
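
To make the sources × sinks point concrete, here is a rough sketch that counts crosspoints, assuming a full crossbar in which every sink can select any source. The function name and the example port counts are made up for illustration; they are not actual Mill figures.

```python
def bypass_crosspoints(sources: int, sinks: int) -> int:
    """Crosspoints in a full crossbar: each sink can select any source.

    Assumption: a full crossbar with no partitioning or clustering.
    """
    return sources * sinks


# Hypothetical port counts, chosen only to show the asymmetry.
print(bypass_crosspoints(sources=8, sinks=16))   # 8 * 16
print(bypass_crosspoints(sources=16, sinks=16))  # the square N^2 case
```

The cost is the product of the two port counts, so it only equals N^2 when the two happen to be the same.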



What makes you think you can extract that much instruction level parallelism in the first place?


One major source is speculation. Each Mill operand, or each element in a vector operand, has a 'Not a Result' (NaR) flag.
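
A toy model of how a per-operand flag like this can enable speculation. The comment only says the flag exists; the sketch assumes that a NaR propagates through speculative arithmetic and only faults when it reaches a side-effecting operation such as a store. All names here (`Operand`, `spec_div`, `spec_add`, `store`) are hypothetical, not Mill operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    value: int = 0
    nar: bool = False  # 'Not a Result' flag carried with the operand


def spec_div(a: Operand, b: Operand) -> Operand:
    # Assumption: a failing speculative op yields a NaR instead of trapping.
    if a.nar or b.nar or b.value == 0:
        return Operand(nar=True)
    return Operand(a.value // b.value)


def spec_add(a: Operand, b: Operand) -> Operand:
    # Assumption: a NaR input simply propagates to the output.
    if a.nar or b.nar:
        return Operand(nar=True)
    return Operand(a.value + b.value)


def store(memory: dict, addr: int, x: Operand) -> None:
    # Assumption: only a non-speculative op faults on a NaR.
    if x.nar:
        raise ArithmeticError("NaR reached a non-speculative operation")
    memory[addr] = x.value


# Both arms are computed before the condition is known; only the chosen
# arm is stored, so the NaR from the divide by zero is never observed.
divisor = Operand(0)
risky = spec_add(spec_div(Operand(10), divisor), Operand(1))
safe = Operand(7)
memory = {}
store(memory, 0, risky if divisor.value != 0 else safe)
```

Under these assumptions the divide can be hoisted above the branch that guards it, which is where the extra parallelism would come from.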


You still have quadratic complexity in the network; the "need not match" argument reduces to a constant multiplier.
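
A quick sketch of this objection, assuming the number of sinks stays a fixed multiple of the number of sources as the machine gets wider. The ratio of 2 and the widths are arbitrary illustration values.

```python
def crosspoints_at_width(sources: int, sinks_per_source: int = 2) -> int:
    """Crosspoints when sinks scale as a fixed multiple of sources.

    Assumption: the sink/source ratio is constant across widths.
    """
    sinks = sources * sinks_per_source
    return sources * sinks  # equals sinks_per_source * sources**2


for width in (8, 16, 32):
    print(width, crosspoints_at_width(width))
```

Doubling the width quadruples the count regardless of the ratio, so the growth is still quadratic; the ratio only sets the constant in front.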



