Figure 4 from https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blak... is a related benchmark. On that particular machine we saw good scaling up to 16 threads, but yeah, somewhere in that neighborhood you start to run into memory bandwidth limits. Which is the problem you want to have, I guess :)
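
If you want to poke at where scaling flattens out on your own box, here's a rough sketch. To be clear, this is not how the Figure 4 numbers were produced: it uses `hashlib.blake2b` from the Python standard library as a stand-in (BLAKE3 isn't in the stdlib), and it hashes independent buffers on separate threads rather than doing BLAKE3's tree hashing of one input. The function names, buffer sizes, and thread counts are all made up for illustration. It relies on CPython's `hashlib` releasing the GIL while hashing large buffers.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def hash_chunk(chunk: bytes) -> bytes:
    # CPython's hashlib releases the GIL for large inputs,
    # so several of these calls can run in parallel threads.
    return hashlib.blake2b(chunk).digest()


def throughput_mb_s(chunks, threads):
    """Hash every chunk on a pool of `threads` workers.

    Returns (megabytes per second, list of digests in chunk order).
    """
    total_bytes = sum(len(c) for c in chunks)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        digests = list(pool.map(hash_chunk, chunks))
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6, digests


if __name__ == "__main__":
    # Hypothetical workload: 16 separate 4 MiB buffers, so the data
    # does not all sit in cache and has to come from main memory.
    chunks = [bytes([i]) * (4 * 1024 * 1024) for i in range(16)]
    for n in (1, 2, 4, 8, 16):
        rate, _ = throughput_mb_s(chunks, n)
        print(f"{n:2d} threads: {rate:8.1f} MB/s")
```

If the MB/s figure stops growing well before you run out of cores, memory bandwidth is a likely suspect, though thread pool overhead and the small workload here can muddy the picture. Bigger buffers and repeated runs give a more trustworthy curve.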