Hacker News new | past | comments | ask | show | jobs | submit login

The article says as much.

> Although this may seem like a significant drawback, it's actually not that alarming, at least not to me. While it's true that you must insert various "move mask to v0" instructions that wouldn't otherwise be there, it's important to remember that these will not really be actual computation instructions. Moves from one vector register to another will always be simple register renames handled by the front end of any high-performance chip, and I would consider it highly unlikely that you would change masks so frequently as to overburden the front end.

This is not the point the author was making.




If you only write to the lower 32-bit of the v0 register, which could be 1024 bit wide, that claims that the hardware somehow has to allocate a 1024-bit wide register to back those up, and then makes some "locality" arguments.

The hardware can back up the 1024-bit register with a pool of 32-bit registers, and if you only wrote to the first 32-bits, and all others are zero, it can use a single 32-bit register to back it up, making this "as good" as the single mask register solution, which the author thinks is good.


Determining that you only wrote to the bottom 32 bits of the register being copied from is hard for hardware to see; and if the compiler can see, it has no way to tell the hardware.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: