The arguments I've read for or against undefined behavior are always:
1. fully binary: "[all] UB is nonsense" vs. "[all] UB is necessary for performance"
2. qualitative / anecdotal: "look at this compiler doing something unexpected" vs. "look at this optimizing compiler's transformation that would be impossible without UB".
Instead I would love to see a more fine-grained and quantitative study. There are many different types of UB. I'm sure every one of them is exploited by optimizing compilers. However, what would be the quantitative impact of disabling each of them on, say, SPECINT benchmarks? Of course some types of UB-related assumptions are deeply embedded in compilers, so it's clearly a question easier asked than answered. But I think simple things like signed-integer overflow handling can be changed (in clang at least?). Has anyone done benchmarking with this? Are you aware of any papers on the topic?
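To make the signed-overflow case concrete, here is a minimal sketch of the kind of kernel such an experiment could time. It assumes the `-fwrapv` flag (accepted by both GCC and Clang), which makes signed overflow wrap instead of being undefined; the function name `sum_strided` is made up for illustration, and whether the flag changes the generated code for this loop at all depends on the compiler and target.

```c
#include <assert.h>

/* Sums every `stride`-th element of a[0..n).
 * The index is a signed int. Under standard C rules the compiler may
 * assume `i += stride` never overflows, which lets it treat `i` as a
 * simple increasing induction variable. With -fwrapv it has to allow
 * for `i` wrapping around to a negative value instead. */
long sum_strided(const int *a, int n, int stride)
{
    assert(n >= 0 && stride > 0);   /* precondition of this sketch */
    long total = 0;
    for (int i = 0; i < n; i += stride)
        total += a[i];
    return total;
}
```

Building the same file twice, once with `-O2` and once with `-O2 -fwrapv`, and comparing both the assembly and the timings would give one data point for one kind of UB. It says nothing about the other kinds, which is exactly why a broader study would be interesting.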
People focus too much on SPECINT. What I’d like to see instead is for the companies that already run lots of real-world benchmarks to report behavior changes. That would give more meaningful real-world results. Obviously this isn’t something you could develop locally, so these companies would have to build tooling to gather these kinds of reports. It would be extremely beneficial for compiler authors (e.g. “run experiment X”). Rust has the best setup I’ve seen: it has both microbenchmarks and non-microbenchmarks, and it looks for statistical significance across all workloads rather than hyper-focusing on one small thing.