Hacker News new | past | comments | ask | show | jobs | submit login

Wasn't there a nice post somewhere talking about using anything but division for the task of splitting stuff into buckets?



Yeah, it figures Lemire would have written it. I don't think OP actually needs or wants division when you can have faster algorithms with the same statistical properties.

https://lemire.me/blog/2019/06/06/nearly-divisionless-random...


Lenore’s approach uses rejection sampling, which translates to needing an uncertain amount of input — a uint128 may or may not be enough.

But there is one related algorithm that doesn’t: https://github.com/apple/swift/pull/39143. This method only requires (output word size + 64) bits, which is really convenient for us since we probably don’t have 2^64 buckets. Same ballpark as (hash >> 64) * n, just with bias correction.

With this method the output value will be very not-scrambled relative to the high bits of the hash. Now whether that’s a problem is… someone else’s question. One can always do some shifts and XORs.


I wrote about this at https://dotat.at/@/2022-04-20-really-divisionless.html

You don’t need to worry about this for mapping hashes to buckets, because a hash has a limited width, whereas unbiased random numbers are based on an unbounded stream of random bits.

In this article they are working with 128 bit hashes so they already have plenty of bits for bias not to be a problem. A single straightforward multiplication is fine.





Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: