Trying to reproduce is a good point, but it's usually a bad idea to write tests of the form assert(result == expected) against a floating point result. You're asking for trouble in all but the simplest of cases. Floating point tests should typically allow for LSB rounding differences, or use an epsilon or an explicit tolerance knob.
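As a minimal sketch of what that looks like (the approx_eq helper and the tolerance values here are just illustrative, not from any particular test framework):

    /// Compare two floats using a combined absolute/relative tolerance.
    /// `abs_tol` handles values near zero; `rel_tol` scales with magnitude.
    fn approx_eq(a: f64, b: f64, abs_tol: f64, rel_tol: f64) -> bool {
        let diff = (a - b).abs();
        diff <= abs_tol || diff <= rel_tol * a.abs().max(b.abs())
    }

    fn main() {
        // 0.1 + 0.2 is famously not bit-equal to 0.3 in binary floating point.
        let result = 0.1_f64 + 0.2_f64;
        assert!(result != 0.3);                        // exact comparison fails
        assert!(approx_eq(result, 0.3, 1e-12, 1e-12)); // tolerance comparison passes
    }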
There’s no guarantee that a computation will be bit-identical even when the hardware primitives are, unless you execute exactly the same instructions in exactly the same order. Order of operations matters, therefore valid code optimizations can change your results. Demanding that everything always be bit-identical also rules out hardware that can produce more accurate results than other hardware; it would hold us back or even cause regressions. FMA units are one example: a fused multiply-add produces different (and usually more accurate) results than separate MUL and ADD instructions, yet hardware without FMA cannot match it. There is room for more hardware improvements of this kind in the future.
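To make the FMA point concrete, here's a small sketch using Rust's f64::mul_add, which computes the product and sum with a single rounding (via a hardware FMA where one exists). The fused and unfused forms disagree on the same inputs:

    fn main() {
        let (a, b, c) = (0.1_f64, 10.0_f64, -1.0_f64);

        // Unfused: a * b rounds to exactly 1.0 first, so the sum is 0.0.
        let unfused = a * b + c;

        // Fused: one rounding at the end preserves the tiny residue left over
        // from the fact that 0.1 is not exactly representable in binary.
        let fused = a.mul_add(b, c);

        println!("unfused = {unfused:e}"); // 0e0
        println!("fused   = {fused:e}");   // ~5.551115123125783e-17
        assert!(unfused != fused);
    }

Neither answer is "wrong"; the fused one is actually closer to the value you'd get computing 0.1 × 10 − 1 exactly from the stored operands.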
> Order of operations matters, therefore valid code optimizations can change your results.
This is exactly why optimizations that change the order of floating point operations aren't valid! The same goes for many other optimizations, like (I learned this just recently) transforming x + 0.0 into x: those are not the same thing when x is -0.0. In other news, -ffast-math produces broken code.
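A minimal sketch of the -0.0 case, assuming IEEE 754 default (round-to-nearest) addition, where -0.0 + 0.0 yields +0.0:

    fn main() {
        let x = -0.0_f64;

        // IEEE 754 addition: -0.0 + (+0.0) yields +0.0, so the "optimized"
        // form x and the original x + 0.0 differ in the sign of zero.
        let original = x + 0.0;

        assert!(original.is_sign_positive());       // x + 0.0 is +0.0
        assert!(x.is_sign_negative());              // but x itself is -0.0
        assert!(original == x);                     // == can't tell: -0.0 == +0.0
        assert!(original.to_bits() != x.to_bits()); // the bit patterns differ
    }

Since == treats the two zeros as equal, the difference only bites when something downstream inspects the sign, e.g. 1.0 / x is -inf while 1.0 / (x + 0.0) is +inf.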
Current programming languages let you write 100% deterministic floating point code just fine (even with compiler optimizations, as long as they aren't buggy). The trouble is writing cross-platform deterministic floating point code that behaves the same on every machine, but with great care even that can be done, as in https://rapier.rs/docs/user_guides/rust/determinism/ (that project achieves it for every platform that supports IEEE 754-2008).
Cross-platform bit-matching determinism is a tradeoff, not a correctness or accuracy issue. It's one of many goals one might have, and it comes with advantages and disadvantages. As I pointed out above, you may be trading away higher accuracy to achieve cross-platform determinism, and you almost certainly trade away performance too.
You say “aren’t valid” and “broken code” as though they were factual statements, when in reality you’re making opinionated assumptions about your preferred tradeoff. Those opinions are only true if you assume that only bit-matched results are “valid”. The hyperbole breaks down once we talk about the accuracy of floating point calculations: bit-matching FP results on two different machines is just making two wrong values agree, and there’s nothing “exact” about it.
It is absolutely fine to have bit-matching determinism as a goal, and I’m in favor of compilers supporting it. I’m not suggesting anyone shouldn’t, but I hope you recognize that your language implicitly demands everyone care about floating point determinism just because you do. Some people have serious floating point calculations where they want cross-platform determinism, but -ffast-math exists precisely because many people don’t need it, or because they prioritize performance over bit-matching, or because they engineered with epsilons instead of unrealistic expectations. There are good reasons why Rapier’s cross-platform determinism is not the default, right?
Generally speaking, even people with strong reasons to want bit-matching results across different hardware do not depend on it being true: because they understand the nature of floating point and the reality of the hardware landscape, they still write their tests using tolerances.
I can see how that increases confidence somewhat, although the result can still be deterministically wrong...