NaN Boxing (2020) (piotrduperas.com)
137 points by jstanley on Sept 16, 2022 | hide | past | favorite | 62 comments



NaN boxing sounds like it would have very significant pointer provenance issues. CHERI would also stick its nose up at it.

The tagged pointer type has the advantage of holding real pointers, meaning that any machine[0] that sticks extra information onto its pointers transparently just works. And reading the int half of the union to check if the pointer is valid should always be sound, but I'm not 100% sure on that. Architectures that don't enforce alignment are also probably not the sort of thing we care about running on, and I doubt there's a way for an optimizer to legally screw over programs that depend on pointer alignment being there on architectures where it has to be there.

But, then again, that's what we said about int-to-pointer casts...

[0] Real or virtual. Remember that when you compile your program with an optimizing compiler or JIT, it is executing on two architectures:

1. The compiler's expression/AST interpreter, which enforces ISO C undefined behavior rules

2. The target architecture, where those undefined behaviors become merely unspecified.


I don't think that pointer provenance rears its ugly head here. You are generally free to convert a pointer to an integer and then back, which is what's happening here, and you get pointer provenance problems because different pointers may be supposed to point to different objects. Basically, casting to an integer and back is supposed to be "safe", but doing math and getting pointers to different objects is not.

Note that there are some implementation-dependent factors here which I'm not getting into.

> And reading the int half of the union to check if the pointer is valid should always be sound, but I'm not 100% sure on that.

After various alias problems a while back (Linux kernel folks had some words), everyone got together and agreed that you can do type punning through a union... you just get the byte representation of one type, reinterpreted as the byte representation of another type. This was codified a few C standards ago and it's fairly explicit in the spec now.
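
For illustration, a minimal sketch of that kind of union punning (my example, not from the article):

  #include <inttypes.h>
  #include <stdio.h>

  int main(void) {
      union { double d; uint64_t u; } pun;
      pun.d = 1.5;                           /* write one member...        */
      printf("0x%016" PRIx64 "\n", pun.u);   /* ...read another; the bytes
                                                are reinterpreted          */
      return 0;
  }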

I'll also just mention that you don't need an architecture that enforces alignment--you just need an allocator that returns aligned pointers.


> You are generally free to convert a pointer to an integer and then back

(I'll take your use of the term "generally" as acknowledging that there are exceptions, because I want to elaborate on a particular exception that highlights an error in reasoning I believe most programmers make when they manipulate memory addresses and then cast them straight to pointers.)

On CHERI, pure integer addressing can be turned off. In that case, pointers cannot necessarily be derived from integers alone. This is because CHERI implements pointer capabilities in hardware. You have to use special capability-preserving instructions to handle them, or else they permanently become invalid.

You can still manipulate memory addresses, but you generally cannot use them directly without them being part of a valid pointer. So it's valid to take a memory address, but invalid to try to use it directly.

I personally consider it unsound to try to transmute an integer into a pointer. It doesn't make sense, because [pointers are not necessarily memory addresses](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html). The Rust programming language is also considering taking this stance -- they already have a lint against treating pointers as integers, and with it an option to forbid it, though I'm unsure what the status on that is (it will probably remain optional forever).

This doesn't necessarily make it wrong or incorrect if the language allows for it, but I would advise not doing so, unless it's a domain-specific optimization (i.e. there is a fallback for architectures where it does not apply).


> I don't think that pointer provenance rears its ugly head here. You are generally free to convert a pointer to an integer and then back, which is what's happening here

C [optionally] supports converting pointers to intptr_t (or uintptr_t), not to any integer type (even with nominally sufficient width), and certainly not to a floating point type, which is how NaN boxing works.

In CHERI, intptr_t requires special treatment by the compiler. Plus, the nominal width of both pointers and intptr_t doubles--128 bits on 64-bit architectures. Most environments don't even have a 128-bit floating point type, even when the hardware might support it. (According to Wikipedia, Fortran is an exception, but it's still uncommon in C and other popular language environments.)


> C [optionally] supports converting pointers to intptr_t (or uintptr_t)

Yes it's optional, but it's "generally" supported.

> (even with nominally sufficient width)

C99 says: "Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type."

I'm not entirely sure how I'm supposed to parse that, but it sounds like if the result is in range of an integer then it's supposed to work?

I guess you could technically say all results have a zillion added to the actual value? Rules lawyering gets annoying pretty fast.

> and certainly not to a floating point type, which is how NaN boxing works

NaN boxing works by converting to an integer type first. What it does with the bits afterwards is not important to C, as long as it reconstructs the same integer before converting back.
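
As a rough sketch of that (mine, not from the article; it assumes IEEE 754 doubles and pointers that fit in 48 bits, and the constants and names are illustrative):

  #include <assert.h>
  #include <stdint.h>

  #define BOX_TAG     UINT64_C(0xFFFC000000000000) /* a quiet-NaN bit pattern */
  #define BOX_PAYLOAD UINT64_C(0x0000FFFFFFFFFFFF) /* low 48 bits             */

  static uint64_t box_ptr(void *p) {
      uint64_t bits = (uint64_t)(uintptr_t)p;     /* pointer -> integer      */
      assert((bits & ~BOX_PAYLOAD) == 0);         /* must fit in 48 bits     */
      return BOX_TAG | bits;                      /* bit fiddling on the int */
  }

  static void *unbox_ptr(uint64_t v) {
      return (void *)(uintptr_t)(v & BOX_PAYLOAD); /* same integer -> pointer */
  }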


The part you quoted seems pretty clear. It's implementation-defined. That's why it works on most modern architectures, but you can't rely on it in portable code, as there are certainly architectures where it doesn't.


Implementation-defined is not "optional". The implementation must provide the conversion and document how it's done; it cannot say "oh, it's not supported and the program will refuse to compile, or behave erratically".

However, some stupid definition of the behavior could be given, like having the conversion of any pointer to an integer reliably yield 0.

C has pointer arithmetic; it must support subtraction of pointers that point into the same array-like object. So even on that implementation, you might perhaps be able to obtain a large-ish array and treat that as a heap in which you can convert pointers to integers by subtracting the base.
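
Something like this, say (my sketch; `heap_base` is a hypothetical array playing the role of the heap):

  #include <stddef.h>

  static char heap_base[1 << 20];   /* hypothetical "heap" array */

  /* Pointer subtraction within one array is well-defined, so offsets
     from the base can stand in for integer-converted pointers. */
  static size_t  to_offset(char *p)    { return (size_t)(p - heap_base); }
  static char   *from_offset(size_t o) { return heap_base + o; }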


> not to any integer type (even with nominally sufficient width),

I’m going to have to say that you’re definitely wrong about that, because when I read the C standard, it seems pretty clear to me. From N2176 (C17 draft) §6.3.2.3:

> An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

> Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

As I said, there are some implementation-dependent factors here, but it should be clear that casting a pointer to and from an integer is OK, depending on the implementation, and depending on what integer type—and it’s not just limited to intptr types. The C standard guarantees that you can always cast to and from intptr_t / uintptr_t (if it exists), and other casts are implementation-defined. Relying on implementation-defined behavior is ok.

There’s nothing here that requires casting to float. You’re passing around a union which has both a float and a uint64_t in it.


This issue isn't round-tripping integers through object pointers, but round-tripping object pointers through integers.[1] C only guarantees the latter to occur through intptr_t, if at all. And you can't use plain pointers to round-trip integers because they have no observable structure by which you could inspect them for these user space tagging tricks. So you're stuck with intptr_t.

Also, none of your points pertain to NaN boxing in particular. NaN boxing literally requires floating point types, and nothing in the standard permits floating point types to be used this way. NaN boxing fundamentally relies upon undefined behavior. All the modern hysteria regarding undefined behavior notwithstanding, clearly people can manage to implement workable solutions even in the face of what the standard makes undefined. But such de facto allowances are the first to be discarded as situations change; otherwise they would have been codified as part of a standard. As the article describes, most of what can be achieved from an optimization standpoint with NaN boxing can be achieved by smuggling floats through pointers or integers, rather than vice versa, but strictly speaking NaN boxing is especially problematic.

[1] The standard says "Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined." But in CHERI 1 bit of capability data (the most critical bit!) is stored outside addressable memory and by design cannot be represented as part of an integer, except intptr_t, which as I described requires special compiler treatment. Moreover, this special treatment might not be free, negating the performance advantages of type punning tricks. I forget the details, but IIRC in some cases a call into the kernel is actually required to preserve C's existing conversion semantics. And intptr_t semantics in CHERI more generally likely present barriers to other kinds of compiler optimizations, even ignoring the fact that our memory-saving type punning hack doesn't actually save memory anymore on account of both pointers and intptr_t being double the width of every other type in the union.


I've repeated the part about implementation-dependent behavior here twice and it seems like you've ignored it both times, which is frustrating.

I understand CHERI. What I don't understand is why you are assuming that the implementation MUST be CHERI, and that non-CHERI implementations are somehow not included in the discussion.

> NaN boxing literally requires floating point types, and nothing in the standard permits floating point types to be used this way.

Nothing in the standard requires intptr_t to exist, or floating-point types to be IEEE 754, etc. You'd have to be completely deranged to try and think that this is portable C.

The pointers are, again, NOT being interpreted as floating-point numbers. They are, instead, being converted into integers and put into unions, and then taken back out as integers. The "NaN boxing" part is just a way to do this so that when you stick a float into the same union, it will occupy a different space of integers.

Which OBVIOUSLY relies on implementation-defined behavior, and certain assumptions about the platform you're working on--so I do not understand the motivation for the discussion here.


> nothing in the standard permits floating point types to be used this way. NaN boxing fundamentally relies upon undefined behavior.

What do you mean? memcpy from bytes to a NaN should work fine, and there's also a "nan" function whose string argument can encode a number to insert into the NaN payload.

There are certainly implementation-defined aspects, but all of floating point is implementation defined. As long as you can get a number back out, you can make it into an integer and cast back to a pointer on 98% of platforms.


> memcpy from bytes to a NaN should work fine

Signaling NaNs are explicitly undefined in C11 F.2.1: "This specification does not define the behavior of signaling NaNs." In practice they may be "quieted" by conversion to quiet NaNs, changing their bit patterns, or may trigger actual signals. Fast math optimization flags will also break the hell out of your code by assuming NaNs are impossible. I want to say there are more circumstances where optimizers and compiler-generated code can butcher your NaN payloads, but I'd be working off recollected hearsay and I can't find a source, so don't quote me on that.

NaN boxing is common enough that, if you take the right precautions, a modern compiler should probably support it, maybe. NaN boxing is uncommon enough that, if your codebase needs to be sufficiently portable, you need an opt out for when it breaks. Let's review duktape's scars:

https://github.com/svaarala/duktape/blob/123d9426d5e5b36d5da...

https://github.com/svaarala/duktape/blob/5252b7a50611a3cb8bf...

https://github.com/svaarala/duktape/blob/224a0b89ca08a36e37e...

(for context, a "packed" duk_tval involves NaN boxing)

Note that "the right precautions" involve unions and proper integer types to avoid optimizer-invoked rewrites of the value and debugging when things go wrong, not simply YOLOing bytes into a double via memcpy. Note that debugging when it all goes terribly wrong can be quite painful. I've personally had the misfortune of being forced to debug duktape being built with fast math optimizatoins enabled on one "rare" platform + build configuration that wasn't caught by duktape's #if defined(__FAST_MATH__) checks linked above (wasn't Clang nor GCC, so go figure it didn't make the same #define)


Well, half of NaNs are quiet, so that's easy to deal with. And the fast-math optimization flags are themselves violating the standard, so those don't count.

> Note that "the right precautions" involve unions and proper integer types to avoid optimizer-invoked rewrites of the value and debugging when things go wrong, not simply YOLOing bytes into a double via memcpy.

Using a union is iffy behavior that has been argued about in the past and is even less safe in C++.

I suggested memcpy because it's the safe way. It's the opposite of YOLO. You memcpy from one type into a char buffer, then memcpy that char buffer into the other type.
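
i.e., something along these lines (my sketch of that route; assumes sizeof(double) == sizeof(uint64_t)):

  #include <stdint.h>
  #include <string.h>

  /* double -> uint64_t via an intermediate byte buffer; the reverse
     direction is symmetric. Compilers optimize the copies away. */
  static uint64_t double_bits(double d) {
      unsigned char buf[sizeof d];
      uint64_t u;
      memcpy(buf, &d, sizeof d);
      memcpy(&u, buf, sizeof u);
      return u;
  }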


> Well half of NaNs are quiet, so that's easy to deal with

Which half varies by architecture (I'm looking at you, MIPS - and apparently RISC-V at one point was going to go the MIPS route with an all-1s payload for canonical qnans?) - so platform-specific spaghet and requirements testing is in the mix.

> And the fast optimization flags are themselves violating the standard so those don't count.

Some jerk will enable them, standards-violating or not. While it's valid for the solution to disable the optimization, in practice you will need to debug and write defensive code when this happens. I know of at least one platform which enables such optimizations by default for its "release" builds, and while I'm angry at them for doing so, I'm unfortunately relegated to existing in the same reality as them.

Perhaps you're lucky enough to exist in a different reality?

> I suggested memcpy because it's the safe way.

More generally, yes, but in the specific context of preserving a NaN payload I would not trust the optimizer to keep my NaN payloads untouched when stored as a floating point value. LLVM developers appear to agree that NaN payload preservation is not guaranteed - I guess you can quote me on my earlier "optimizers can butcher your NaN payloads", presumably even without fast math optimizations:

https://lists.llvm.org/pipermail/llvm-dev/2018-November/1276...

Which causes a good bit of awkwardness for Rust:

https://github.com/rust-lang/rust/issues/73328

The solution is to not attempt to store a payload-laden NaN as any kind of floating point value, even via memcpy. A union is acceptable in the sense that at least, then, you're supposedly storing an integer, and the bit pattern of that would be preserved. A memcpy to a temporary float immediately before NaN testing / floating point usage - and never back to integer in a naive attempt to extract the possibly discarded payload - would work, but is a hell of a caveat to omit when saying "memcpy from bytes to a NaN should work fine", especially when mentioning `nan` with its payload argument, which is unextractable without doing the naive "back to integer" extraction which, if the above is to be believed, is unreliable at best.


> Perhaps you're lucky enough to exist in a different reality?

Any code I've touched that was going to deal with boxing had enough control over the build system to avoid that.

> More generally, yes, but in the specific context of preserving a NaN payload I would not trust the optimizer to keep my NaN payloads untouched when stored as a floating point value. LLVM developers appear to agree that NaN payload preservation is not guaranteed - I guess you can quote me on my earlier "optimizers can butcher your NaN payloads", presumably even without fast math optimizations:

That looks like it only butchers the NaN if you do math on it or try to have compile-time NaNs, which shouldn't be an issue here? Are there other factors making it worse that I'm missing in a quick read?

> A union is acceptable in the sense that at least, then, you're supposedly storing an integer, and the bit pattern of that would be preserved.

Oh, you're suggesting a union as permanent storage, not just to perform the cast. That makes sense.


> I don't think that pointer provenance rears its ugly head here. You are generally free to convert a pointer to an integer and then back

It's worth noting that Rust is experimenting with making this not quite so free to do:

https://doc.rust-lang.org/std/ptr/index.html#pointer-usize-p...


NaN boxing can be implemented with pointer-priority. That means that pointers into the heap represent themselves as-is; with the upper 12 bits being zero, that bit pattern looks like a valid double. To represent double values that would look like pointers, the 64-bit image of the double value is altered by adding an offset (as an integer) so no double value is represented with a 0x0000 in the upper bits.

So then, that is pretty much just a variation on pointer tagging. When the pointer has certain bits set in certain ways, it represents something else, including a full double.
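
A sketch of the idea (my constants, purely illustrative; a real scheme also has to reserve or rewrite the few NaN patterns whose shifted image would collide with the pointer space):

  #include <stdint.h>
  #include <string.h>

  #define DOUBLE_DELTA UINT64_C(0x0004000000000000) /* example offset */

  /* Pointer-favoring encoding: pointers are stored as-is (top bits
     zero); a double is stored as its 64-bit image plus a delta, so no
     encoded double has all-zero top bits. */
  static uint64_t encode_double(double d) {
      uint64_t bits;
      memcpy(&bits, &d, sizeof bits);
      return bits + DOUBLE_DELTA;
  }

  static double decode_double(uint64_t v) {
      uint64_t bits = v - DOUBLE_DELTA;
      double d;
      memcpy(&d, &bits, sizeof d);
      return d;
  }

  static int is_pointer(uint64_t v) {
      return (v >> 48) == 0;   /* top bits zero => a raw pointer */
  }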

To defend against undefined behaviors not going the way you want, you have tests. Programming language run-times are easily testable.

If your language's compiler compiles itself and a sizeable library, and all kinds of tests pass, it's unlikely that there is any hidden time bomb in the value tagging scheme due to the undefined behavior of aliasing or pointer/integer conversion.

ISO C doesn't guarantee that if some line of code performing some undefined conversion has worked a million times, it will work for the million-and-first time. In practice, there is no reasonable translation scheme from C to machine code which would have that hostile behavior; it would have to be contrived with a hidden counter.


Any incompatibility can be resolved by having an explicit “NaN box” type. For OSes where pointers always end in 0b000 and NaN boxing just works, it compiles to 8 bytes. For weird targets like CHERI, or those which store metadata in pointers, it compiles to an ordinary tagged union.
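
Something like the following, as a sketch (the type name and config macro are made up):

  #include <stdint.h>

  #if defined(MYLANG_NAN_BOXING)   /* hypothetical configuration macro */
  typedef uint64_t value_t;        /* 8 bytes: a NaN-boxed word        */
  #else
  typedef struct {                 /* portable fallback: ordinary      */
      uint8_t tag;                 /* tagged union                     */
      union { double d; void *p; int64_t i; } u;
  } value_t;
  #endif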


NaN boxing doesn't need pointers to end in b000. You need more than ten bits that don't matter. So you need something like 48 bit pointers, and once you have that you're already done. That's enough to tag 30 different types of pointer and every smaller type.


I wonder if you could do better: instead of stuffing that bit at the bottom as an int marker, do your best to represent the int faithfully (with the higher-order NaN bits doing their thing), so you optimistically issue both the float addition and the integer addition at the same time as the conditional. By the time the conditional resolves, both additions have come back, and you pick the winner with no branching.


It's an interesting idea. In practice one must also consider int + double and double + int, so there are four possible paths.


NaN-boxing has other uses - RISC-V systems that support FP have 32 FP registers that are used for 32-bit floats, 64-bit doubles (and possibly 128-bit quad-precision values).

According to the spec, in a system with 64-bit FP registers, 32-bit floats are stored (in the registers) with the upper 32 bits set to 0xffff_ffff - a NaN when interpreted as a 64-bit FP value (and the same applies on systems with 128-bit registers). This catches (some) code that mixes 32/64-bit floats without conversion.
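
In software terms the check amounts to something like this (my sketch; `bits` is the raw 64-bit register image):

  #include <stdint.h>

  /* Per the spec, a 32-bit float held in a 64-bit FP register is valid
     only if the upper 32 bits are all ones ("NaN-boxed"); otherwise
     single-precision instructions see it as the canonical NaN. */
  static int is_nan_boxed_float(uint64_t bits) {
      return (bits >> 32) == UINT32_MAX;
  }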


Some other things on the same topic:

https://leonardschuetz.ch/blog/nan-boxing/

https://anniecherkaev.com/the-secret-life-of-nan

Is NaN-boxing actually used in the major browser JS engines?


V8 does not use NaN boxes; instead it uses the low bit to distinguish between a 31-bit small integer ("smi") and a pointer. Doubles are additionally sometimes stored inline ("double field unboxing"); I'm not sure how this works exactly. Other times they are heap-allocated. I am not sure if there is a specialized double allocator; I'd like to know.
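
(For anyone unfamiliar with smis, the scheme is roughly this shape - my sketch, not V8's actual code; it assumes arithmetic right shift of signed values:)

  #include <stdint.h>

  /* Low-bit tagging: small integers are stored shifted left by one
     (low bit 0); heap pointers get low bit 1. Illustration only. */
  static int       is_smi(uintptr_t v)    { return (v & 1) == 0; }
  static uintptr_t make_smi(int32_t n)    { return (uintptr_t)(intptr_t)n << 1; }
  static intptr_t  smi_value(uintptr_t v) { return (intptr_t)v >> 1; }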

JavaScriptCore uses a tweaked NaN-box [1]: values are stored via a NaN box minus a constant, which avoids requiring a mask when chasing pointers. This makes pointers cheaper but floating point operations more expensive.

SpiderMonkey and Hermes both use straight NaN-boxing to my knowledge.

https://github.com/WebKit/WebKit/blob/ec6b5337e777f9b460ec6b...


> avoids requiring a mask when chasing pointers

This, unfortunately, is an artificial limitation; negative NaNs may correspond to canonical addresses, but most OSes won't let you map negative memory, keeping it all to themselves.


The way it works is that pointers are represented as themselves, assuming the top 15 bits are zero. Doubles are shifted by an offset such that NaNs land at top 15 bits being zero.

So assuming pointers are at most 49 bits you're fine.


My point is that, if the system were more cooperative, you would not need to 'shift by an offset'.


It’s not even always architecturally possible - ARMv8 doesn’t support user-space mappings with bit 63 set, for example.


I see. I was talking specifically about x86 here (and the conventions of mainstream OSes hosted on it); obviously, every platform is unique.

It looks like on ARM, the high bit of an address is used as a special marker to indicate which translation table should be used to look up the rest of the address. I am no ARM expert, but looking at the documentation at [0], it does not seem like allowing 'userspace' access to addresses mapped in TTBR1 would be an insurmountable challenge.

0. https://developer.arm.com/documentation/ddi0406/b/System-Lev...


Afaik, it is not used by Chrome, but it is used by Firefox. LuaJIT also uses it.


I’m curious for feedback about the very beginning of this post.

How many people see the “let a” example as “the variable ‘a’ keeps changing shape/type” and how many see it as, “three different things were created and the variable ‘a’ kept getting reassigned to point at the newest one.”

Or is this not really a distinction people see?


It seems like the beginning of the post avoids the issue. "First a was this, now it's this" doesn't take a position on whether it's really the same object, just how it appears in the debugger.


My mental model of JavaScript is that the name/label “a” in that scope was re-used 3 times, but for a different value each time


I released TXR 282 with NaN boxing support. The pre-built binary for Android doesn't use it though; I discovered late in the release process that on Android's Bionic Libc, pointers are coming out of the heap with 64-bit-wide values like 0xb40000NNNNNNNNNN. So, swift crash in early startup. I will research into this for the next release.

Anyone finding things dead in the water like this has to use ./configure --no-nan-boxing.

Another issue I ran into is that when I tested with gcc 12.2 on a RISC-V machine, the double to integer conversion code threw a weird uninitialized warning, not just the warning about strict aliasing against the use of u in the second line:

  ucnum u = coerce(ucnum, num) - NAN_FLNUM_DELTA;
  return *coerce(double *, &u);
I added pragmas to suppress that. Anyway, everything built and tests passed on that installation.


Here is the issue:

https://source.android.com/docs/security/test/tagged-pointer...

Android is tagging the top byte of a pointer to check for misuses, like a pointer into the stack being passed to free() and such.

I have successfully worked around it. The only pointers affected by this that we care about are Lisp heap objects. These are allocated in large blocks.

When we allocate a heap block, we clear the top 16 bits of the pointer, saving the stripped bits in heap->tag.

Then, if and when we free that heap block, instead of calling free(heap), we first mask back the heap->tag that we stripped away and free the resulting pointer.
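
In code, that's roughly the following (my sketch; the struct and names are illustrative, not TXR's actual code):

  #include <stdint.h>
  #include <stdlib.h>

  struct heap {
      uint64_t tag;    /* the stripped top bits of this block's pointer */
      /* ... object storage ... */
  };

  /* On allocation: remember the top 16 bits, then clear them, so the
     rest of the runtime only ever sees 48-bit pointers. */
  static struct heap *heap_alloc(size_t bytes) {
      void *raw = malloc(bytes);            /* bytes includes the header */
      if (raw == NULL)
          return NULL;
      uint64_t all = (uint64_t)(uintptr_t)raw;
      struct heap *h = (struct heap *)(uintptr_t)
                       (all & UINT64_C(0x0000FFFFFFFFFFFF));
      h->tag = all & UINT64_C(0xFFFF000000000000);
      return h;
  }

  /* Before freeing: put the tag back so free() sees the exact pointer
     that malloc() returned. */
  static void heap_free(struct heap *h) {
      uint64_t all = (uint64_t)(uintptr_t)h | h->tag;
      free((void *)(uintptr_t)all);
  }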

Everything builds on a Pixel 4a; tests pass.

In spite of the rude language in that Android documentation, whereby some troglodytes disparage the work of their superiors, and give grave admonishments, it's possible to peacefully coexist.


I just implemented NaN boxing for TXR Lisp several days ago.

https://www.kylheku.com/cgit/txr/commit/?id=073e4a84090b11fd...

I'm rolling out a release; it's going to be enabled by default on 64 bits by the build configure script. I tested on PPC, RISC-V, AArch64, X86-64 so far. I will do Loongarch64 and Mips64 also.

I developed my own "flavor" of NaN boxing, which gets us the maximum width of fixnum integers, and harmonizes with the pointer tagging scheme (which is still there, configurable at build time).

Some of the decisions made were to keep the code base common between the pointer tagging and NaN boxing. This arises in the arithmetic library code.

The representation is "pointer favoring"; so floating-point values are not represented as themselves; an offset has to be added to their integer image.

I chose the offset 0x0004 0000 0000 0000. What this means is that in the upper 16 bits, the values 0x0000, 0x0001, 0x0002 and 0x0003 can be used as tags for whatever purpose. We make the 0x0000 one denote pointers. Floating-point values start at 0x0004.

This fits perfectly into TXR Lisp which has two tag bits in the pointer tagging scheme: tags 0, 1, 2 and 3. We keep exactly the same tags.

When the upper 14 bits are all 1 (0xFFFC), which is the non-signaling NaN, the remaining 50 bits are a fixnum integer: the most we can have.

So we have 48-bit pointers, 50-bit integers, plus three additional kinds of values that have 48 bits of room. Under the pointer tagging scheme, the three values are characters, integers and C string literals.

Since we otherwise represent integers under NaN, one of the tags, TAG_NUM (value 1) is not actually used. The function which returns a value's tag fakes out this TAG_NUM value when it detects a NaN-coded integer, which has no such tag.

This value must be returned because tons of code depends on it. There has to be compatibility between pointer tagging and NaN boxing so that a bunch of code doesn't have to be written in two different ways.
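
As an illustration of the layout described above (my sketch, not TXR's actual code; the tag names other than TAG_NUM are made up):

  #include <stdint.h>

  #define NAN_FLNUM_DELTA UINT64_C(0x0004000000000000)

  enum tag { TAG_PTR = 0, TAG_NUM = 1, TAG_CHR = 2, TAG_LIT = 3, TAG_FLNUM };

  /* Upper 16 bits 0x0000-0x0003: tagged values, same tags as the
     pointer tagging scheme (0x0000 = heap pointer). Upper 14 bits all
     ones: a 50-bit fixnum, reported as TAG_NUM. Everything else: a
     double, stored offset by NAN_FLNUM_DELTA. */
  static enum tag value_tag(uint64_t v) {
      uint64_t top16 = v >> 48;
      if (top16 <= 0x0003)
          return (enum tag)top16;
      if ((v >> 50) == 0x3FFF)      /* upper 14 bits all ones */
          return TAG_NUM;
      return TAG_FLNUM;
  }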


Is that a good trade-off sacrificing fixnum bits? I guess it would make sense when the language is intended for scientific programming.


Hard to say. It hurts in the compiler which uses integer bitmasks for register sets. Fewer registers are handled without boxing. But I see hardly any difference in the library compile time.

fixnum-max is 562949953421311, which is still astronomically large. It can represent 5.6 trillion dollars, in integer cents. We could do the accounting of some large multi-nationals in fixnums.

We will never need an array index that cannot fit into this. (Proof: pointers are 48 bits under this scheme, so using positive integers, we can enumerate every nybble.)

We can seek 562 terabytes into a file, which doesn't seem too shabby.

There are situations in which having anything less than the full 64 bits in the integer hurts, like externally imposed bitfields and masks.


You can also double the boxed space by using subnormals (nee denormals) if you don't need them for precision (and if you do need them .... my condolences).


Can you explain this more? I mean why is the space doubled, exactly?


Because there are as many subnormal values as NaN values. At least if you don't count the two 0's, which are "paired" with the two infinities.


oh right - sorry stupid question.


It's been ages, but this use of unions reminds me of the VARIANT structure definition from Windows OA/COM:

[1]: https://docs.microsoft.com/en-us/windows/win32/api/oaidl/ns-...


> Because of that, 64-bit architectures use only 48 lowest bits of every pointer which still give plenty of addresses. It means that only 48 bits in a pointer are significant and that is enough for our NaN boxing.

Congratulations, you have just ruined your software's forward c11y.

Those 16 bits aren't going to stay reserved forever.


48 bits lets you have 256 terabytes of addressable memory. If you go to 56 bits, you get ~64 petabytes of addressable memory. 48 bits is probably good for 5-10 years as we hit the limits of process shrinks. 56 bits should be good for a quite a bit longer. This is all assuming we don't find some magical memory tech that lets us add orders of magnitude more memory without needing absurd amounts of cooling and power.


I don’t understand what you mean. In your process, you can control the address range of your memory, so the address range of your pointers. It’s irrelevant if the 16 bits are reserved by the architecture or not - just don’t map space that uses them. What’s the issue?


In fact, today already there are CPUs that support larger address spaces. But OSes should be smart and only provide pointers with the top 16 bits free while the application fits - even if some programs in the future end up using >256TB of address space, not all will, and those that don't should be able to take advantage of 48-bit pointers.

Moreover, a single program could even take advantage of a "low heap" that stores metadata, and can have 48-bit pointers pointing to it, and a "high heap" of the actual large data (whose addresses take the full 64 bits, but may include tagged pointers to the low heap without doubling the memory footprint of the already humongous 256TB)


When? Too long to care.

How long will it be until 256TB of RAM (which is what 48 bits will address) becomes even remotely common in a single system? Even 256GB, 1024x less, is not common today, except in huge servers.

How long will it take to change the implementation when/if it does?

I've seen far too much "future proofing" that makes the present worse, for a future that might never even happen.


Quad-precision floats will be widespread before applications that use >256TiB of memory are common.

In that case, we can NaN-box the quad float and have up to 112 bits of pointer space to play with. Let's assume we double the number of pointer bits to 96 - we can support YiBs of address space.


Note that the benchmark code is inconsistent - nanb3 and nanb4 allocate 10x the number of objects of nanb and nanb2, so I was getting very surprising results until I realised that and fixed it.

On a 32-bit system, it seems tagged pointers have the advantage:

    nanb 18.562s - NaN boxing
    nanb2 17.484s - tagged pointer
    nanb3 25.468s - tagged structure
    nanb3 with pack(1) 26.000s
    nanb4 20.093s - tagged union
    nanb4 with pack(1) 19.109s
I also tried with pack(1) to see if cache effects would show up, and although it made the tagged structure slightly slower, indeed you can see that the tagged union became faster without alignment padding.


> A compiler can also change the order of fields if this could improve the speed of reading them

No, C does NOT allow that! C++ theoretically allows reordering members across sections with different access specifiers (public, protected, private), but I don't think any compiler actually does this. It would just create a big mess for little gain.


All I see is 1% performance improvement for 1,000% complexity increase.


Not boxing floating-point values on the heap can be a big deal.

I retrofitted NaN boxing into a production Lisp implementation literally in one evening and morning, with minimal impact to all the existing code in the implementation that is built around the pointer tagging scheme that existed for 13 years. It's nicely delimited with #if CONFIG_NAN_BOXING.

Accumulator-incrementing loop on pointer tagging:

  This is the TXR Lisp interactive listener of TXR 281.
  Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
  1> (compile-toplevel '(for ((i 1.0) (acc 0.0)) ((< i 100000000.0) acc) ((inc acc i) (inc i))))
  #<sys:vm-desc: 188b8d0>
  2> (pprof [*1])
  malloc bytes:             0
  gc heap bytes:   6399999936
  total:           6399999936
  milliseconds:          5860
  4.99999995e15
NaN Boxing:

  This is the TXR Lisp interactive listener of TXR 281.
  Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
  1> (compile-toplevel '(for ((i 1.0) (acc 0.0)) ((< i 100000000.0) acc) ((inc acc i) (inc i))))
  #<sys:vm-desc: 24b7dc0>
  2> (pprof [*1])
  malloc bytes:             0
  gc heap bytes:            0
  total:                    0
  milliseconds:          4079
  4.99999995e15
 
The speedup is 43.6% here.

  3> (/ (- 5860 4079) 4079)
  0.436626624172591
This is not native code. The difference would be more pronounced in native code.

However, code that doesn't do floating-point calculation (or even some that does) will see a small performance penalty due to the more complex access to all values. That is a bummer.

At least users can choose; if you don't care about floating-point, build it with pointer tagging.


I was referring to the table on their original web page. The last two rows are a 1.5% performance difference.


But a very small team of compiler people take on the extra complexity so a huge team of users can get the performance improvement.


Besides the people already saying the performance difference is much larger, the memory savings are quite important too, and I'd argue the complexity isn't that bad either. Sure, NaN-boxing isn't completely trivial, but it's largely a one-time cost for quite a small part of the definition of your vm. It even brings some nice things with it, like being able to use regular 64-bit routines to work with tagged data, and having a single nice 64-bit int to copy&paste around while using a debugger, whereas otherwise you'd have a whole struct.



I know we're supposed to stay on topic, but I seriously thought this was going to be an article about elderly women punching each other.


For a while, this was a popular Christmas novelty gift. https://www.gigglegadgets.co.uk/home/307-boxing-grannies.htm...


HN title mangling strikes again. "NaN Boxing", which is very likely what the OP submitted it as, would've been obvious.


Thanks, I hadn't noticed that HN had mangled it. Fixed now.


'Twon't happen again.



