An algorithm sees its best time complexity over some percentage of the input space. Over some other percentage it sees its worst time complexity. And, on average over the entire input space, it sees its average time complexity. The algorithms being compared necessarily differ in what those best, average and worst cases are and, importantly, in which parts of the input space each category applies. If they didn't, we'd call the algorithms mathematically equivalent and it wouldn't matter which one you chose. And if they all handled 'problem' data equally, we'd see the same performance penalty between best and average, and between average and worst, for every one of them. We don't. Sometimes best and average are equal. Sometimes worst and average are equal. Sometimes all three are equal.
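To make that last bit concrete, here's a minimal sketch (Python; the function name and the input size are just for illustration) that counts the comparisons insertion sort performs on sorted, random, and reversed input. For insertion sort the best case is roughly n comparisons while the average and worst are roughly n^2/4 and n^2/2, so best and average are far apart; for something like merge sort, all three cases are the same O(n log n).

    import random

    def insertion_sort_comparisons(a):
        # Count the key comparisons insertion sort makes while sorting a copy of `a`.
        a = list(a)
        comparisons = 0
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0:
                comparisons += 1
                if a[j] <= key:
                    break
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return comparisons

    n = 2000
    print("sorted  :", insertion_sort_comparisons(list(range(n))))              # best case, ~n
    print("random  :", insertion_sort_comparisons(random.sample(range(n), n)))  # average, ~n^2/4
    print("reversed:", insertion_sort_comparisons(list(range(n - 1, -1, -1))))  # worst case, ~n^2/2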

If your data is truly random, you should pick the algorithm with the best average time complexity. Is your data random? Almost certainly not. It might be 'random enough', or it might not be. A real-world data set may fall completely within the best case of algorithm A and the worst case of algorithm B, even if the average case of B is better than the average case of A.
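As a rough, hypothetical illustration of that last sentence (none of this comes from the benchmark under discussion): insertion sort has a worse average case, O(n^2), than a quicksort that always takes the last element as its pivot, O(n log n), but already-sorted input is insertion sort's best case and that quicksort's worst case. A Python sketch, with made-up names and sizes:

    import random
    import time

    def insertion_sort(a):
        # O(n) on already-sorted input (its best case), O(n^2) on average.
        a = list(a)
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a

    def quicksort_last_pivot(a):
        # Naive quicksort with the last element as pivot, written with an explicit
        # stack so sorted input doesn't hit Python's recursion limit.
        # O(n log n) on average, O(n^2) on already-sorted input (its worst case).
        a = list(a)
        stack = [(0, len(a) - 1)]
        while stack:
            lo, hi = stack.pop()
            if lo >= hi:
                continue
            pivot = a[hi]
            i = lo
            for j in range(lo, hi):
                if a[j] <= pivot:
                    a[i], a[j] = a[j], a[i]
                    i += 1
            a[i], a[hi] = a[hi], a[i]
            stack.append((lo, i - 1))
            stack.append((i + 1, hi))
        return a

    def timed(sort_fn, data):
        start = time.perf_counter()
        sort_fn(data)
        return time.perf_counter() - start

    n = 4000
    already_sorted = list(range(n))          # e.g. data that arrives in order
    shuffled = random.sample(range(n), n)    # genuinely random data

    for label, data in [("already sorted", already_sorted), ("shuffled", shuffled)]:
        print("%-14s insertion %.3fs   naive quicksort %.3fs"
              % (label, timed(insertion_sort, data), timed(quicksort_last_pivot, data)))

On the sorted input the timings flip relative to what the average cases alone would predict, which is the point: the average only decides the question if your data actually samples the input space the way the average assumes.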

In this case, the data points that are most likely to negatively affect an algorithm are specifically being excluded from the model. Far from comparing even the average case, we're comparing best cases. We aren't using a 'real world' data set. So it's an interesting simulation, but it isn't enough to reach any conclusions at all.



