Re-implementations often turn out faster simply because you understand the problem better, which seems to be the case here.
The same algorithm is not going to run faster in Rust than in tuned C, period. Just like C is never going to run faster than tuned Assembly. I wish we could move on to discussing something that matters.
> The same algorithm is not going to run faster in Rust than in tuned C, period.
Weirdly enough, before I added that optimization my Rust code was ~20% faster than the C code anyway. And I have no idea why - the programs were (as far as I could tell) identical. I was using the same compiler backend (LLVM) for both, with -march=native in both cases, and this was before alias analysis was turned on for Rust.
Could just be a weird coincidence of the compiler’s inlining decisions - though I suspect not. I tried investigating it, but I don’t know x86_64 assembly well enough to tell how the binaries differ.
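For anyone wondering what alias analysis can buy here, below is a minimal sketch. It is purely illustrative: `add_twice` and `add_twice_raw` are made-up names and have nothing to do with the benchmarked program. The idea is that a `&mut` reference cannot alias another reference, so once that information is passed to LLVM the compiler is allowed to load `*src` once. With raw pointers (roughly what C gives you without `restrict`) it has to assume the two might point at the same value.

```rust
// Hypothetical illustration only; not taken from the program discussed above.

// `dst` is `&mut`, so it cannot alias `src`. With noalias information the
// compiler may load `*src` once and reuse it for both additions.
pub fn add_twice(dst: &mut i32, src: &i32) {
    *dst += *src;
    *dst += *src;
}

// Raw-pointer version, closer to C without `restrict`: `dst` and `src` may
// point at the same value, so the compiler cannot assume `*src` is unchanged
// after the first store.
//
// Safety: both pointers must be valid and properly aligned for an `i32`.
pub unsafe fn add_twice_raw(dst: *mut i32, src: *const i32) {
    unsafe {
        *dst += *src;
        *dst += *src;
    }
}
```

The two versions give the same result when the arguments are distinct, and only the raw-pointer one can be called with both arguments pointing at the same value, which is exactly the case the optimizer has to stay correct for. Whether this kind of thing explains a 20% gap in a real program is a separate question.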