What do you think the threshold number is where scanning a list will outperform a hash table? 10 items? 100? 1000? For what it’s worth, for small data sets I agree with you. But I think it’s very hard to be calibrated correctly on what small means here.
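For what it's worth, the quickest way to find that crossover is to measure it on your own machine. Here is a rough micro-benchmark sketch in C++ (the element counts, key type, and access pattern are all assumptions, and the crossover point will differ across hardware and compilers):

```cpp
// Minimal sketch: time a linear scan of a std::vector against lookups in a
// std::unordered_set for a range of sizes, to see where they cross over.
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <unordered_set>
#include <vector>

int main() {
    for (std::size_t n : {8, 16, 32, 64, 128, 256, 1024}) {
        std::vector<std::uint64_t> vec;
        std::unordered_set<std::uint64_t> set;
        for (std::uint64_t i = 0; i < n; ++i) {
            std::uint64_t key = i * 2654435761u;  // spread the keys out a bit
            vec.push_back(key);
            set.insert(key);
        }

        constexpr int kIters = 100000;
        volatile bool sink = false;  // keep the optimizer from deleting the loops

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < kIters; ++i)
            sink = std::find(vec.begin(), vec.end(), vec[i % n]) != vec.end();
        auto t1 = std::chrono::steady_clock::now();
        for (int i = 0; i < kIters; ++i)
            sink = set.count(vec[i % n]) != 0;
        auto t2 = std::chrono::steady_clock::now();

        auto ns = [](auto a, auto b) {
            return (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count();
        };
        std::printf("n=%5zu  scan: %9lld ns  hash: %9lld ns\n", n, ns(t0, t1), ns(t1, t2));
    }
}
```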
Re: binary search, the biggest cost to scanning data in a database is loading that data from persistent storage. A few mispredicted branches aren’t going to matter much if it means you can do fewer reads.
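To put rough numbers on "fewer reads", here is a back-of-envelope sketch; the page size and row width are assumptions chosen only to show the arithmetic, not figures from any particular database:

```cpp
// Back-of-envelope: pages read by a full scan vs. a binary search over sorted
// data, assuming 8 KiB pages and ~100-byte rows (both numbers illustrative).
#include <cmath>
#include <cstdio>

int main() {
    const double page_bytes = 8 * 1024.0;
    const double row_bytes = 100.0;
    const double rows_per_page = page_bytes / row_bytes;

    for (double rows : {1e4, 1e6, 1e8}) {
        double pages = std::ceil(rows / rows_per_page);
        double scan_reads = pages;                           // a scan touches every page
        double bsearch_reads = std::ceil(std::log2(rows));   // worst case: one page per probe
        std::printf("rows=%.0e  scan: %9.0f page reads  binary search: %3.0f page reads\n",
                    rows, scan_reads, bsearch_reads);
    }
}
```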
> But I think it’s very hard to be calibrated correctly on what small means here.
It depends.
If you measure (or otherwise determine) the sizes of the CPU's L1/L2/L3 caches, the relative speed and size of main memory, and the relative speed and size of local storage (as opposed to, say, network-remote data), then it comes down to some arithmetic, plus a few benchmarks to be on the safe side. A sketch of that arithmetic is below.
Any data set that fits into the CPU cache is definitely small, for a particular CPU.
If you want a single number that applies to any and all hardware, the problem goes from hard to basically impossible.
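Here is a sketch of that arithmetic. It is Linux/glibc-specific (sysconf may report 0 for these fields on other systems), the example working-set size is made up, and the thresholds are rules of thumb rather than universal constants:

```cpp
// Rough sketch: query cache sizes via POSIX sysconf (glibc exposes these on
// Linux) and classify a working-set size against them. The honest answer is
// still "benchmark on the target machine".
#include <cstdio>
#include <unistd.h>

int main() {
    long l1 = sysconf(_SC_LEVEL1_DCACHE_SIZE);
    long l2 = sysconf(_SC_LEVEL2_CACHE_SIZE);
    long l3 = sysconf(_SC_LEVEL3_CACHE_SIZE);
    std::printf("L1d: %ld bytes, L2: %ld bytes, L3: %ld bytes\n", l1, l2, l3);

    long working_set = 64 * 1024;  // example: 64 KiB of keys (illustrative)
    if (l1 > 0 && working_set <= l1)
        std::puts("fits in L1: scanning is almost certainly fine");
    else if (l3 > 0 && working_set <= l3)
        std::puts("fits in some cache level: probably still small, but benchmark");
    else
        std::puts("spills to main memory (or disk): a hash table or index starts to pay off");
}
```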