> It's the same for any other kind of test; your tests should pass before you merge, not after
Rust is very good about testing before merge, not after. It somewhat pioneered doing complete testing before merge (at least, it built significant tooling and mind-share around doing it): https://graydon2.dreamwidth.org/1597.html and https://bors.tech/
That is to say, if Rust isn't validating something pre-merge, it's more likely than not that there's a good reason. I don't personally know Rust's specific reason, but, for performance testing in general, it's easy to overwhelm infrastructure because benchmarks should be run on a single physical machine that's otherwise quiet. This means the testing is completely serialised and there's no opportunity for parallelism. Based on https://perf.rust-lang.org/status.html, it looks like the performance tests currently take a bit longer than an hour each.
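To put the serialisation constraint in rough numbers (a back-of-the-envelope sketch; the ~70 minutes per run is read off the status page above, everything else is my own assumption):

```rust
fn main() {
    // From https://perf.rust-lang.org/status.html: each full benchmark run
    // takes a bit over an hour on the dedicated machine (assume ~70 minutes).
    let minutes_per_run = 70.0;

    // With runs fully serialised on one quiet machine, this is the hard cap
    // on how many changes can be benchmarked per day.
    let max_runs_per_day = 24.0 * 60.0 / minutes_per_run;
    println!("at most ~{:.0} benchmark runs per day", max_runs_per_day);
    // Roughly 20 per day, which is why benchmarking every single PR
    // pre-merge would quickly back the queue up.
}
```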
As others have mentioned, there's automatic tooling for running the benchmarks on PRs that may be performance-sensitive, and PRs that accidentally cause a regression are reverted so the change can be reworked and re-analysed.
I think the heuristic is really easy: "Is it slower than before?"
If it's slower, then you require a manual override to accept the merge.
That would make performance regressions a whole lot less likely. Sure, in some cases a regression is necessary, but in many cases it's unintentional; the change can be fixed and then merged.
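If you wanted to enforce that heuristic mechanically, the gate itself can be tiny. A minimal sketch (the `PERF_OVERRIDE` variable, the 2% noise threshold, and the way the timings reach the script are all my own assumptions, not how rust-lang's perf infrastructure actually works):

```rust
use std::{env, process};

// Assumed noise threshold: regressions smaller than 2% are ignored, since
// benchmark runs jitter slightly even on a quiet dedicated machine.
const NOISE_THRESHOLD: f64 = 0.02;

fn main() {
    // Hypothetical inputs: baseline and candidate timings in seconds,
    // passed on the command line by the CI job.
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: perf-gate <baseline-secs> <candidate-secs>");
        process::exit(2);
    }
    let baseline: f64 = args[1].parse().expect("baseline must be a number");
    let candidate: f64 = args[2].parse().expect("candidate must be a number");

    let relative_change = (candidate - baseline) / baseline;

    if relative_change > NOISE_THRESHOLD {
        // Hypothetical escape hatch: a maintainer sets PERF_OVERRIDE=1 to
        // accept an intentional, justified regression.
        let overridden = env::var("PERF_OVERRIDE").map(|v| v == "1").unwrap_or(false);
        if overridden {
            println!(
                "regression of {:.1}% accepted via manual override",
                relative_change * 100.0
            );
        } else {
            eprintln!(
                "blocked: candidate is {:.1}% slower than baseline",
                relative_change * 100.0
            );
            process::exit(1);
        }
    } else {
        println!("ok: within noise threshold");
    }
}
```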