What do you think the threshold number is where scanning a list will outperform a hash table? 10 items? 100? 1000? For what it’s worth, for small data sets I agree with you. But I think it’s very hard to be calibrated correctly on what small means here.
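A toy benchmark makes the tradeoff concrete (my own sketch, nothing rigorous): scan a std::vector with std::find versus look up the same keys in a std::unordered_set. The crossover it reports will vary a lot with CPU, compiler, element size, and hash quality, which is rather the point.

    // Toy scan-vs-hash benchmark. Every element is looked up once in
    // both structures; the times are only meant to show that the
    // crossover has to be measured, not guessed.
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <unordered_set>
    #include <vector>

    int main() {
        for (std::size_t n : {8, 64, 512, 4096, 32768}) {
            std::vector<int> vec(n);
            std::iota(vec.begin(), vec.end(), 0);  // keys 0..n-1
            std::unordered_set<int> set(vec.begin(), vec.end());

            std::size_t hits = 0;  // printed later so the loops aren't optimized away

            auto t0 = std::chrono::steady_clock::now();
            for (int key : vec)
                hits += std::find(vec.begin(), vec.end(), key) != vec.end();
            auto t1 = std::chrono::steady_clock::now();
            for (int key : vec)
                hits += set.count(key);
            auto t2 = std::chrono::steady_clock::now();

            auto ns = [](auto a, auto b) {
                return std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count();
            };
            std::printf("n=%6zu  scan: %9lld ns  hash: %9lld ns  (hits=%zu)\n",
                        n, (long long)ns(t0, t1), (long long)ns(t1, t2), hits);
        }
    }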

Re: binary search, the biggest cost to scanning data in a database is loading that data from persistent storage. A few mispredicted branches aren’t going to matter much if it means you can do fewer reads.
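Rough numbers, using the usual back-of-envelope latency figures (order of magnitude only):

    mispredicted branch: ~15 cycles at ~3 GHz  ≈ 5 ns
    NVMe 4 KiB random read                     ≈ 100,000 ns (~100 µs)
    => one saved read pays for roughly 20,000 mispredictions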

> But I think it’s very hard to be calibrated correctly on what small means here.

It depends.

If you measure, or otherwise determine, the sizes of the CPU's L1/L2/L3 caches, the relative speed and size of main memory, and the relative speed and size of local storage (as opposed to, say, network-remote data), then it comes down to some arithmetic and, just to be on the safe side, some benchmarks.
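On Linux with glibc you can query those cache sizes directly (a sketch; the _SC_LEVEL* names are glibc extensions, so other platforms need sysfs, sysctl, or CPUID instead):

    #include <unistd.h>  // sysconf; _SC_LEVEL* are glibc extensions
    #include <cstdio>

    int main() {
        // Each call returns the size in bytes, or 0/-1 if the
        // information isn't available on this system.
        std::printf("L1d: %ld  L2: %ld  L3: %ld (bytes)\n",
                    sysconf(_SC_LEVEL1_DCACHE_SIZE),
                    sysconf(_SC_LEVEL2_CACHE_SIZE),
                    sysconf(_SC_LEVEL3_CACHE_SIZE));
    }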

Any data set that fits into the CPU cache is definitely small, for that particular CPU.

If you want a single number that applies to all hardware, the problem goes from hard to basically impossible.
