A consistent 5 ms difference in micro-benchmarks is definitely not "measurement noise". Noise averages out long before it accumulates to 5 ms. There must be a reason, and it most likely relates to the change. So you can confidently say that removing bounds checking (at least the way you did it) is a regression.
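The post doesn't say exactly how the checks were removed, but a minimal sketch, assuming it was something like swapping checked indexing for get_unchecked:

    // Checked: v[i] panics if i is out of range (spatial safety).
    fn sum_checked(v: &[u64], idx: &[usize]) -> u64 {
        idx.iter().map(|&i| v[i]).sum()
    }

    // Unchecked: the caller must guarantee every index is in range;
    // an out-of-range index here is undefined behavior.
    fn sum_unchecked(v: &[u64], idx: &[usize]) -> u64 {
        idx.iter().map(|&i| unsafe { *v.get_unchecked(i) }).sum()
    }

Counterintuitively, the unchecked version isn't guaranteed to be faster: the checks can feed the optimizer information, and removing them changes code layout, either of which could produce a regression like this.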
... that being said, I'd argue that the most beneficial memory-safety feature of Rust is temporal safety (i.e. preventing use-after-free and the like) rather than spatial safety such as bounds checks.
Well, there is both random and systematic error in any experiment, and if 5 ms is small relative to the effects you'd expect (or there is some other reason to discount it), then it might come from a flaw in the benchmarking setup that's too small to be worth tracking down. Any test is only accurate to within some level, and results don't average out to infinite precision just because you rerun them enough times.
The 5 ms isn't the key number. It's 5 ms extra over a 28 ms baseline, which is about an 18% difference. If your noise threshold is 18%, then I think you have to accept that the benchmark probably isn't any good for the stated task.
https://github.com/bheisler/criterion.rs is good for tests like that. It gives you much more than a single number and handles things like outliers, which makes identifying noisy benchmarks simpler.
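For example, a minimal sketch of such a benchmark (run_queries is a hypothetical stand-in for the workload under test; the group/throughput setup is what produces Kelem/s output like the numbers quoted below):

    use criterion::{criterion_group, criterion_main, Criterion, Throughput};
    use std::hint::black_box;

    // Hypothetical stand-in for the cached query loop being measured.
    fn run_queries(n: u64) -> u64 {
        (0..n).map(|i| black_box(i).wrapping_mul(2654435761)).sum()
    }

    fn bench(c: &mut Criterion) {
        let mut group = c.benchmark_group("news_app");
        // Tell Criterion one iteration processes 8192 elements,
        // so it reports throughput alongside wall time.
        group.throughput(Throughput::Elements(8192));
        group.bench_function("ranges_and_joins/cached", |b| {
            b.iter(|| run_queries(8192))
        });
        group.finish();
    }

    criterion_group!(benches, bench);
    criterion_main!(benches);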
    news_app/ranges_and_joins/cached
            time:   [28.583 ms 29.001 ms 29.526 ms]
            thrpt:  [277.45 Kelem/s 282.48 Kelem/s 286.61 Kelem/s]

    news_app/ranges_and_joins/cached
            time:   [33.271 ms 33.836 ms 34.418 ms]
            thrpt:  [238.01 Kelem/s 242.11 Kelem/s 246.22 Kelem/s]
Given that 33.836 ms × 242.11 Kelem/s ≈ 8192 (Kelem/s is the same as elements per millisecond), my understanding is that the time reported here is how long it takes to do 8192 queries. It also reports three numbers per metric; in Criterion those are the lower bound, point estimate, and upper bound of a confidence interval, not single samples. All this means the benchmark harness ran the test many times, and the 5 ms difference is not random at all.
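Spelling out that unit arithmetic as a quick check:

    fn main() {
        let time_ms = 33.836_f64;       // median time per iteration
        let thrpt = 242.11_f64;         // median throughput; Kelem/s == elem/ms
        let elems = time_ms * thrpt;    // elements processed per iteration
        assert!((elems - 8192.0).abs() < 1.0);
        println!("elements per iteration ≈ {elems:.1}");
    }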