But you need 64 bits to represent an affinity mask for scheduling, i.e. which of the 64 logical processors a thread is allowed to be scheduled onto.
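To make the "one bit per logical processor" idea concrete, here is a minimal sketch in Python. The helper names `mask_from_cpus` and `cpus_from_mask` are made up for illustration and are not part of any Windows API; the only thing taken from the discussion is the 64-bit width of the mask.

```python
MASK_BITS = 64  # width of one affinity mask, as discussed above


def mask_from_cpus(cpus):
    """Return an integer with bit i set for every logical processor i in cpus."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < MASK_BITS:
            raise ValueError(
                f"logical processor {cpu} does not fit in a {MASK_BITS}-bit mask"
            )
        mask |= 1 << cpu
    return mask


def cpus_from_mask(mask):
    """Return the sorted list of logical processor indices whose bits are set."""
    return [i for i in range(MASK_BITS) if (mask >> i) & 1]


print(hex(mask_from_cpus([0, 1, 2, 3])))   # 0xf
print(cpus_from_mask(0x8000000000000001))  # [0, 63]
```

A 65th logical processor has no bit to live in, which is why a single mask tops out at 64.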
----------
Furthermore, your Threadripper 3990X should really ONLY be setting affinities across one 8-core / 16-thread die at a time anyway, at least as far as the Windows scheduler is concerned.
Windows programmers should use a separate group of threads for each 16-thread die (CCD), because it's extremely costly to move all of your local state from one die's L3 cache into another's.
https://images.anandtech.com/doci/15318/amd_rome-678_678x452...
Look at the chip, like physically look at it. AMD has 9 chips here (1 I/O die for memory, and 8 compute dies, each with 8 cores). You really only want to be moving threads among the 8 cores of one die, because moving a thread across dies is less efficient.
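As a rough sketch of what "one group of threads per die" could look like, the Python below splits a list of worker IDs into per-die pools. The function name `assign_workers_to_dies` and the fill-one-die-first policy are my own assumptions for illustration; only the 8 dies and 16 hardware threads per die come from the chip layout described above.

```python
DIES = 8          # compute dies on the package
LPS_PER_DIE = 16  # 8 cores x 2 hardware threads each


def assign_workers_to_dies(worker_ids, dies=DIES, per_die=LPS_PER_DIE):
    """Split workers into per-die pools, filling one die before starting the next."""
    worker_ids = list(worker_ids)
    if len(worker_ids) > dies * per_die:
        raise ValueError("more workers than logical processors")
    return [worker_ids[d * per_die:(d + 1) * per_die] for d in range(dies)]


pools = assign_workers_to_dies(range(40))
print([len(p) for p in pools])  # [16, 16, 8, 0, 0, 0, 0, 0]
```

Workers that share data would ideally land in the same pool, so that whatever they share stays on one die instead of bouncing between dies.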
In an ideal world, people would understand Windows's scheduler and work around it, instead of complaining about how it is different from Linux's. Windows's processor groups being 64 wide is reasonable for the job they do... and there are other parts of the API that let you work across Windows's 64-wide processor groups, in particular to extend your affinities into a 2nd processor group.
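Here is a hedged sketch of reaching a 2nd processor group. It assumes logical processors are numbered contiguously per die (die 0 is processors 0-15, die 4 starts group 1); real code should query the topology, e.g. with `GetLogicalProcessorInformationEx`, rather than trust that layout. `group_affinity_for_die` and `pin_current_thread_to_die` are illustrative names of my own; `SetThreadGroupAffinity` and the `GROUP_AFFINITY` structure are the actual Win32 pieces, and that part only runs on Windows.

```python
import sys

GROUP_SIZE = 64   # logical processors per Windows processor group
LPS_PER_DIE = 16  # 8 cores x 2 hardware threads each


def group_affinity_for_die(die):
    """Return (group, mask) covering one die, assuming contiguous numbering."""
    group, offset = divmod(die * LPS_PER_DIE, GROUP_SIZE)
    mask = ((1 << LPS_PER_DIE) - 1) << offset
    return group, mask


print(group_affinity_for_die(4))  # (1, 65535): first die of the 2nd group

if sys.platform == "win32":
    import ctypes
    from ctypes import wintypes

    class GROUP_AFFINITY(ctypes.Structure):
        # Mirrors the Win32 GROUP_AFFINITY layout: KAFFINITY is pointer-sized.
        _fields_ = [("Mask", ctypes.c_size_t),
                    ("Group", wintypes.WORD),
                    ("Reserved", wintypes.WORD * 3)]

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentThread.restype = wintypes.HANDLE
    kernel32.SetThreadGroupAffinity.argtypes = [
        wintypes.HANDLE,
        ctypes.POINTER(GROUP_AFFINITY),
        ctypes.POINTER(GROUP_AFFINITY),
    ]
    kernel32.SetThreadGroupAffinity.restype = wintypes.BOOL

    def pin_current_thread_to_die(die):
        """Restrict the calling thread to one die's logical processors."""
        group, mask = group_affinity_for_die(die)
        affinity = GROUP_AFFINITY(Mask=mask, Group=group)
        ok = kernel32.SetThreadGroupAffinity(
            kernel32.GetCurrentThread(), ctypes.byref(affinity), None
        )
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())
```

The mask stays 64 bits wide; the group number is what selects which 64 logical processors it refers to, so 128 hardware threads need two (group, mask) pairs rather than one wider mask.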