
In a three-address machine, separating the integer and floating point registers basically saves you three bits per instruction word compared to a unified register file of the same aggregate size: with two equal-sized files, each of the three register specifiers needs one fewer bit because it only has to name half as many registers. Also, on a 32-bit machine, you save a few transistors by making the integer rename registers 32 bits wide instead of the full 64 bits needed to hold a double. (And if you have vectors, it really makes no sense to throw away 128 or 256 or 512 bits to store a 32-bit or 64-bit integer.)
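To make the three-bit figure concrete, here is a small calculation under a hypothetical configuration: 32 integer plus 32 fp registers in split files, versus one unified 64-entry file. A register specifier needs log2(register count) bits, and a three-address instruction carries three of them. The function names are illustrative only.

```python
def specifier_bits(num_regs: int) -> int:
    """Bits needed to name one register out of num_regs."""
    return (num_regs - 1).bit_length()

def operand_field_bits(num_regs: int, operands: int = 3) -> int:
    """Total register-specifier bits in an instruction with `operands` register fields."""
    return operands * specifier_bits(num_regs)

# Hypothetical sizes: the opcode already implies which split file is meant.
split = operand_field_bits(32)     # 3 * 5 bits
unified = operand_field_bits(64)   # 3 * 6 bits
print(split, unified, unified - split)  # 15 18 3
```

The same arithmetic shows why the saving stays at three bits no matter how large the files are: doubling the number of nameable registers always costs exactly one bit per specifier.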



As I mentioned, though, there aren't many functions that use both the full complement of integer and fp registers, so I think the aggregate register file size is rarely a factor. A larger aggregate register file is also a detriment to fast context switches, since there is more state to save and restore.

As long as you define consistent semantics for switching among integer, floating point, and vector use of the same logical register, there's nothing stopping you from using a 32-bit-wide integer register file, a 64-bit-wide fp register file, and a 512-bit-wide vector register file. At the ISA level, you could (for instance) define all operations as if they worked on vector registers. Your imul could always compute a 32-bit result, sign-extend it to 64 bits as the first vector element, and zero out all the remaining elements. You wouldn't actually store it that way: for the result of any integer operation, the top 33 bits of that first element (bit 31 plus the 32 sign-extension bits above it) would always be identical, and every later element would always be zero. So, from the outside, it would look like all operations worked on very wide registers; the vast majority of operations would just do very trivial things with most of the output bits. The sign extension and zeroing would only actually happen when moving values between the internal register files.
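A rough sketch of that scheme, modeling registers as Python integers. The 512-bit architectural width, the 64-bit element size, and all function names are hypothetical choices for illustration. The narrow integer file physically holds only 32 bits; `widen` produces the architectural view (sign-extended element 0, all higher bits zero) only when a value moves to the wide file, and `is_narrowable` checks the invariant that lets a wide value be stored back in 32 bits.

```python
VLEN = 512   # hypothetical architectural register width, in bits
ELEM = 64    # width of element 0 (one 64-bit lane)
MASK32 = (1 << 32) - 1
MASK64 = (1 << ELEM) - 1

def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x

def imul32(a: int, b: int) -> int:
    """What the narrow integer file stores: only the low 32 bits of the product."""
    return (a * b) & MASK32

def widen(compact: int) -> int:
    """Architectural VLEN-bit view: element 0 is the 32-bit value sign-extended
    to 64 bits, and every bit above element 0 is zero."""
    return to_signed32(compact) & MASK64

def is_narrowable(reg: int) -> bool:
    """True if a VLEN-bit value could live in the 32-bit file: no bits set above
    element 0, and bits 31..63 of element 0 (the top 33 bits) all equal."""
    assert 0 <= reg < (1 << VLEN)
    if reg >> ELEM:
        return False
    top33 = reg >> 31
    return top33 in (0, (1 << 33) - 1)

# Usage: -3 * 5 computed in the narrow file, then moved to the wide file.
p = imul32(-3 & MASK32, 5)
wide = widen(p)
print(hex(p), hex(wide), is_narrowable(wide))
# 0xfffffff1 0xfffffffffffffff1 True
```

The point of the check is that any value produced by an integer operation satisfies it by construction, so the hardware never needs to store the redundant upper bits.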

Presumably, you'd use the same tricks other processors use to track how much vector state actually needs to be saved on context switches. You might reuse some of the same techniques for economizing the amount of vector state saved across function boundaries. Or, perhaps you'd define an ABI such that system calls preserve vector state, but all vector state beyond the first double is caller-saved state across function boundaries.
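One such trick, sketched here with hypothetical names: the processor records whether a thread's vector state is still in its initial (reset) state, clean (identical to the copy already in memory), or dirty, and the OS only writes the registers out when they are dirty. This is loosely in the spirit of the Off/Initial/Clean/Dirty status encoding in RISC-V's `mstatus.VS` field, but it is a simplified model, not that mechanism's exact semantics (there is no Off state here, for example).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

NUM_VREGS = 32  # hypothetical number of architectural vector registers

class VState(Enum):
    INITIAL = "initial"  # registers still hold their reset value (zero)
    CLEAN = "clean"      # registers match the copy already saved in memory
    DIRTY = "dirty"      # registers modified since the last save

@dataclass
class Thread:
    name: str
    vregs: List[int] = field(default_factory=lambda: [0] * NUM_VREGS)
    vstate: VState = VState.INITIAL
    saved: Optional[List[int]] = None

def write_vreg(t: Thread, idx: int, value: int) -> None:
    """Any vector write marks the thread's vector state dirty."""
    t.vregs[idx] = value
    t.vstate = VState.DIRTY

def save_vector_state(t: Thread) -> int:
    """Called on switch-out; returns how many registers were written to memory."""
    if t.vstate is VState.DIRTY:
        t.saved = list(t.vregs)
        t.vstate = VState.CLEAN
        return NUM_VREGS
    # INITIAL: nothing worth saving. CLEAN: the memory copy is already current.
    return 0

def restore_vector_state(t: Thread) -> int:
    """Called on switch-in; returns how many registers were loaded from memory."""
    if t.vstate is VState.INITIAL:
        t.vregs = [0] * NUM_VREGS  # just reset, no memory traffic
        return 0
    t.vregs = list(t.saved)
    return NUM_VREGS

# Usage: a thread that never touches vectors costs nothing to switch out.
scalar = Thread("scalar-only")
vec = Thread("vector-user")
write_vreg(vec, 3, 42)
print(save_vector_state(scalar), save_vector_state(vec), save_vector_state(vec))
# 0 32 0
```

A scalar-only thread never pays for the wide registers, and a vector thread that was switched out and back in without writing any vector registers is not saved a second time.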



