Is it possible for Rust to store this in 64 bits? enum Boxed<T> Number(double), ...

kibwen · on Sept 15, 2018

An interesting question! Let's start this nerd-snipe by thinking about easier cases first.

1. Could Option<f64> be optimized to be 64 bits in size? This would imply that the implementation can guarantee that a particular one of the zillions of possible NaN values will never be generated as a result of a legal operation. I want to say that this is probably true in practice, though it will be implementation-dependent; I don't know how LLVM handles this, but I suspect that it only uses one or a handful of the possible NaN values to represent NaN in practice. So while it's probably feasible, it would make part of Rust's ABI dependent on implementation details of the backend (consider: is it likely that every possible backend reserves, and will always reserve, one NaN value that will never be used?).

2. Could the following enum be 64 bits in size?

  enum NanBox {
    Number(f64),
    Pointer(u32)
  }

A 64-bit float should have 2^53-2 possible values for NaN (two signs times 2^52-1 nonzero significands), which means that theoretically that should give us plenty of space to hide an entire 32-bit integer in the significand. This still presents the same worries as the prior point, except 4.2 billion times more severe. :) Furthermore, now you might have a new problem: consider that an Option<&Foo> that is Some is already a valid pointer, with no bit-twiddling required. Additionally, above in point 1, a theoretical 64-bit Option<f64> is already a valid f64 in the Some case, with no bit-twiddling required. But here in our NanBox case, while Number is always a valid f64, the bit pattern of Pointer is not automatically a valid u32! At best, we'd have to mask out the irrelevant upper 32 bits, but now we're assuming something very specific about what bit patterns LLVM will never use for NaN. And in the worst case, we might have to do a nontrivial amount of extra work at runtime to make the Pointer look like a valid value. It might still work, for all I know, but it's another thing that requires careful thought.
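
To make the counting and the masking concrete, here is a rough sketch done by hand on raw bits. All the names (`is_nan_bits`, `hide_u32`, `recover_u32`) are made up for illustration, and parking the integer under the plain positive quiet-NaN pattern is just an assumption for the example, not something the compiler does or promises to leave alone.

```rust
// Bit layout of an IEEE-754 double: 1 sign bit, 11 exponent bits, 52 fraction bits.
const EXP_MASK: u64 = 0x7FF0_0000_0000_0000;
const FRAC_MASK: u64 = 0x000F_FFFF_FFFF_FFFF;

/// A bit pattern is a NaN when the exponent is all ones and the fraction is nonzero.
fn is_nan_bits(bits: u64) -> bool {
    (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0
}

/// Two signs times (2^52 - 1) nonzero fractions.
const NAN_PATTERNS: u64 = 2 * ((1u64 << 52) - 1);

/// Positive quiet NaN with an otherwise empty payload (illustrative choice).
const QUIET_NAN: u64 = 0x7FF8_0000_0000_0000;

/// Hide a 32-bit integer in the low bits of the NaN payload.
fn hide_u32(p: u32) -> u64 {
    QUIET_NAN | u64::from(p)
}

/// Recover the integer by masking out the upper 32 bits.
fn recover_u32(bits: u64) -> u32 {
    (bits & 0xFFFF_FFFF) as u32
}
```

Note that `hide_u32(0)` is exactly the pattern that is commonly produced as the default quiet NaN, which is the collision worry in miniature: nothing here can tell a "real" NaN from a boxed zero.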

In the end it boils down to what sort of guarantees LLVM wants to provide about its FP code generation (and maybe it's even architecture-dependent?), and whether Rust wants to risk irrevocably making its ABI nonportable. In contrast, there's no risk in assuming that references have an uninhabited value at 0x0, because Rust is in full control of that.
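
For comparison, here is a small sketch of what the layout looks like today. The helper function names are mine, and the `Option<f64>` result reflects current compilers rather than any promise about the future.

```rust
use std::mem::size_of;

/// The null-pointer niche: `Option<&T>` is guaranteed to be pointer-sized.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u64>>() == size_of::<&u64>()
}

/// No NaN is reserved as a niche, so `Option<f64>` needs room for a separate tag.
fn option_f64_needs_extra_space() -> bool {
    size_of::<Option<f64>>() > size_of::<f64>()
}

/// Look at the raw bits of `None` for a reference: it is the null address.
fn none_ref_bits() -> usize {
    let none: Option<&u64> = None;
    // Sound because `Option<&u64>` and `usize` have the same size and
    // every bit pattern is a valid `usize`.
    unsafe { std::mem::transmute::<Option<&u64>, usize>(none) }
}
```

All three observations hold on the usual targets: the reference case costs nothing, while the float case pays for a discriminant.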

evntdrvn · on Sept 15, 2018

http://llvm.org/doxygen/APFloat_8cpp_source.html#l00689

cesarb · on Sept 15, 2018

That would require Rust to convert all NaNs to a single "canonical" NaN. I don't think it does that; in fact, you don't want Rust to do that, since it would make it impossible to do these NaN-boxing tricks by hand.

On the other hand, a CanonicalNaN<T> could be created in the future to allow these tricks, similar to the NonNull<T>/NonZeroU8/etc mentioned above.

(As an interesting aside, for the RISC-V ISA, all floating-point operations generate a single canonical NaN: "Except when otherwise stated, if the result of a floating-point operation is NaN, it is the canonical NaN. The canonical NaN has a positive sign and all significand bits clear except the MSB, a.k.a. the quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000." So this scheme would work even better on RISC-V, since after any floating-point operation, it's guaranteed that any NaN is the canonical NaN.)
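
A minimal sketch of what canonicalizing in software could look like, for targets that don't do it in hardware. The `canonicalize` helper is hypothetical, and the f64 constant is the double-precision analogue of the single-precision pattern quoted above (positive sign, only the quiet bit set).

```rust
/// Canonical quiet NaN patterns: positive sign, only the quiet bit set.
const CANONICAL_NAN_F32: u32 = 0x7fc0_0000;
const CANONICAL_NAN_F64: u64 = 0x7ff8_0000_0000_0000;

/// Hypothetical helper: collapse every NaN to the canonical one and
/// leave all other values untouched.
fn canonicalize(x: f64) -> f64 {
    if x.is_nan() {
        f64::from_bits(CANONICAL_NAN_F64)
    } else {
        x
    }
}
```

If every float passed through something like this before being stored, all the other NaN patterns would be free for boxing, at the cost of a branch on each store.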

Rusky · on Sept 15, 2018

What I do instead (given the excellent sibling replies about how it's not that simple) is store the value in a `u64` and then provide a method to decode it into a non-layout-optimized enum. Since that enum is not stored anywhere but the stack, it can be optimized out if the decoding is inlined: https://github.com/rpjohnst/dejavu/blob/a768ce515b3a110542aa...
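
A rough sketch of that shape, for readers who don't want to click through. This is not the code from the linked repo; `Value`, `Decoded`, and the tag constants are invented for the example, and it assumes incoming NaNs are collapsed to one pattern so that a number can never alias the tag.

```rust
/// The stored form: always exactly 64 bits.
#[derive(Clone, Copy)]
struct Value(u64);

/// The decoded form: an ordinary enum that only lives on the stack.
#[derive(Debug, PartialEq)]
enum Decoded {
    Number(f64),
    Pointer(u32),
}

impl Value {
    // Illustrative tag: a negative quiet NaN with one extra payload bit set.
    const TAG: u64 = 0xFFFC_0000_0000_0000;
    const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
    const PLAIN_NAN: u64 = 0x7FF8_0000_0000_0000;

    fn from_number(x: f64) -> Value {
        // Collapse all NaNs to one pattern that does not match the tag.
        Value(if x.is_nan() { Self::PLAIN_NAN } else { x.to_bits() })
    }

    fn from_pointer(p: u32) -> Value {
        Value(Self::TAG | u64::from(p))
    }

    #[inline]
    fn decode(self) -> Decoded {
        if self.0 & Self::TAG_MASK == Self::TAG {
            // The truncating cast keeps only the low 32 bits.
            Decoded::Pointer(self.0 as u32)
        } else {
            Decoded::Number(f64::from_bits(self.0))
        }
    }
}
```

Callers would `match` on `value.decode()`; whether the temporary enum is actually optimized away depends on inlining, so treat that part as a hope rather than a guarantee.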

nynx · on Sept 15, 2018

No, because NaN is a valid floating-point number in Rust.

alkonaut · on Sept 15, 2018

There really should be a "normal" f32 and f64 type with guarantees for non-NaN, similar to the nonzero integers. More important than size, these floats would be totally ordered, unlike the partially ordered regular IEEE floats.

Edit: turns out these exist in various forms, e.g. "noisy float".
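
A minimal sketch of the idea, assuming a hand-rolled wrapper. `NotNan` here is a made-up type for illustration and is not the API of "noisy float" or any other crate.

```rust
use std::cmp::Ordering;

/// Hypothetical wrapper that can only be built from a non-NaN value.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NotNan(f64);

impl NotNan {
    fn new(x: f64) -> Option<NotNan> {
        if x.is_nan() {
            None
        } else {
            Some(NotNan(x))
        }
    }
}

// With NaN excluded, equality is reflexive, so `Eq` is sound.
impl Eq for NotNan {}

impl Ord for NotNan {
    fn cmp(&self, other: &Self) -> Ordering {
        // `partial_cmp` on f64 only returns None when a NaN is involved,
        // which `new` rules out.
        self.0.partial_cmp(&other.0).expect("NotNan never holds NaN")
    }
}

impl PartialOrd for NotNan {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

This buys the total ordering (so `sort()` and `BTreeMap` keys work), but not the size win: the compiler has no way of knowing that NaN is unused, so `Option<NotNan>` is still bigger than an f64.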

tlb · on Sept 15, 2018

"in IEEE-754 there are 2^53-2 different bit patterns that all represent NaN"

From https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Sp..., which describes how Mozilla implements nan-boxing.