It is important to know whether a benchmark comparison is statistically significant[1], although in practice this usually only matters when the pieces of code being compared score similarly. It becomes even more of a concern when the benchmarked code runs on a machine that is not well isolated and dedicated to running the benchmark. I wrote a library[2] a while back to check for this.
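
To make the idea concrete, here is a minimal sketch of one way to test whether two sets of timings really differ. It is a generic permutation test on the difference of means, written in Python with only the standard library. It is not the API or the method of the library linked above, and the names `time_runs` and `permutation_p_value` are made up for illustration.

```python
import random
import statistics
import time


def time_runs(fn, runs=30):
    """Call fn() `runs` times and return each wall-clock duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def permutation_p_value(a, b, rounds=10_000, seed=0):
    """Two-sided permutation test on the difference of mean timings.

    Estimates how often a random relabelling of the pooled samples gives
    a mean difference at least as large as the observed one. A small
    value suggests the gap between a and b is not just noise.
    """
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    split = len(a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:split]) - statistics.fmean(pooled[split:]))
        if diff >= observed:
            hits += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return (hits + 1) / (rounds + 1)
```

Usage would look something like `permutation_p_value(time_runs(old_impl), time_runs(new_impl))`, where `old_impl` and `new_impl` stand in for whatever is being compared. If the result is above the threshold you picked beforehand (0.05 is a common choice), the difference is hard to tell apart from noise. On a shared or busy machine, collecting more runs and interleaving the two implementations helps keep background load from favouring one side.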

[1]: http://en.wikipedia.org/wiki/Statistical_significance

[2]: http://github.com/Pistos/better-benchmark