
"256 TB should be enough for.."

Instead of using true 64-bit addresses, AArch64 systems only use 48 bits. In a decade or two (or maybe even less time) it's reasonable to expect systems will exist with > 256TB of memory in a single machine.

Given that, is it ill-advised to leverage such a hacky feature, particularly since the bits are guaranteed to rot?

From a power-efficiency perspective, the appeal of using only the minimal 6 of 8 "strictly required" bytes is understandable.. Oh wait, this design choice doesn't reduce power consumption? Hrm.

Would appreciate hearing from an expert about why this is a thing. Also, if the addresses are virtualized, why not take the memory savings that'd come with efficiently packing the addresses into only 48 bits? Too much overhead?




Yup, something like that happened with MacOS, which used 24-bit pointers a loooong time ago.

In general it's not a big issue. You only want to use it with proper CPU support anyway, and it's best to avoid it in user code and leave it to the compiler and/or operating system.

For example, AArch64 supports Pointer Authentication[0]. This uses the additional bits to essentially "sign" a pointer, which can serve as a final defense against some attacks. It is effectively free, so why not use it for the dozen or so years those bits aren't needed yet?

[0]: https://www.qualcomm.com/content/dam/qcomm-martech/dm-assets...
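
As a rough, hedged sketch (not Arm's or Apple's actual API; it assumes an ARMv8.3-A CPU and a toolchain invoked with -march=armv8.3-a, and the helper names are made up), signing and authenticating a pointer with the raw PAC instructions looks roughly like this:

    #include <stdint.h>

    /* Sign a pointer with key A; the signature lands in the unused top bits. */
    static inline void *pac_sign(void *p, uint64_t modifier) {
        void *signed_ptr = p;
        __asm__("pacia %0, %1" : "+r"(signed_ptr) : "r"(modifier));
        return signed_ptr;
    }

    /* Verify and strip the signature; a mismatch yields a pointer that
       faults when dereferenced. */
    static inline void *pac_auth(void *p, uint64_t modifier) {
        void *plain_ptr = p;
        __asm__("autia %0, %1" : "+r"(plain_ptr) : "r"(modifier));
        return plain_ptr;
    }

In practice you rarely write this by hand; compilers emit it for return addresses when built with something like -mbranch-protection=pac-ret.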


The situation on classic MacOS was a bit different. The issue in that case was that the 68000 was a 32-bit architecture with a micro-architectural limitation of 24-bit addresses. Nobody thought deeply about the implications of that in terms of bincompat and ABI (which is not shocking; very few machines from that era were fully compatible with their predecessors). As such the processor just did not wire up the top 8 address lines at all. A number of pieces of software (including the MacOS Toolbox) used those high bits for various things, and would read and write to addresses without masking the bits, which worked because the address lines were not hooked up.

That is the reason why almost every architecture after the 68000 specified the behavior for such unused bits (in general they must be sign extended, IOW the top bits must be all 1s or all 0s). If that had been the case on the 68000, then software that attempted to access pointers with data in the high bits would have caused an exception when the load/store was issued. It is worth noting that this would not have prevented people from using the top bits; it would have just forced them to manually mask them in software (a technique that is used all over the place for things like tagged pointers, etc.). That still might have been an issue when the memory space was actually expanded to the point where the high bits were needed for real physical or virtual addresses.

TBI is actually about explicitly stating, architecturally, that these 8 bits are used for data, not addresses. That is different from the situation on the 68000 because it is an architectural guarantee that future processors will not use those bits for addresses, which means you won't have bincompat issues and you don't have to bother masking them before dereferencing them.
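
A minimal sketch of what that buys you in user code, assuming a Linux/AArch64 system where the kernel leaves TBI enabled for userspace loads and stores (the tag value and helper names here are just illustrative):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TAG_SHIFT 56  /* top byte of a 64-bit pointer */

    static void *set_tag(void *p, uint8_t tag) {
        uintptr_t bits = (uintptr_t)p & ~((uintptr_t)0xFF << TAG_SHIFT);
        return (void *)(bits | ((uintptr_t)tag << TAG_SHIFT));
    }

    static uint8_t get_tag(void *p) {
        return (uint8_t)((uintptr_t)p >> TAG_SHIFT);
    }

    int main(void) {
        int *x = malloc(sizeof *x);
        int *tagged = set_tag(x, 0x2A);   /* stash data in the top byte */
        *tagged = 7;                      /* dereference without masking: TBI ignores the tag */
        printf("tag=%#x value=%d\n", (unsigned)get_tag(tagged), *tagged);
        free(x);                          /* free the original, untagged pointer */
        return 0;
    }

Syscalls are pickier than plain loads/stores, though; on Linux you're expected to opt in via prctl(PR_SET_TAGGED_ADDR_CTRL, ...) before passing tagged pointers to the kernel.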


Those old 68k machines used a single shared address space for all processes. Now, with a per-process virtual address space, it makes a lot of sense not to require full-width pointers; ideally the process should declare how many pointer bits it actually needs and the hardware should mask the rest before the page table lookup.


> Would appreciate hearing from an expert about why this is a thing. Also, if the addresses are virtualized, why not take the memory savings that'd come with efficiently packing the addresses into only 48 bits?

Not an expert at all, but the place where I've really used this sort of method is when writing a GC and trying to figure out where to store a "pointer has been visited" marker in the pointer itself instead of in a different data structure.

This was also on ARM, but quite a long time ago on a Sharp Zaurus. The word-aligned pointer model gave 2 bits at the end of the pointer to mark cycles in the GC pathway.

This was much simpler than a separate data structure to keep the mark phase of the traversal in memory, and was an extension of the "1 bit reference counter" paper[1], with white, gray and black encoded in those bits.

Of course, you need to go back and rewrite them out to 00, but with virtual memory in the mix you can play bigger tricks. This saves a lot of memory and hash lookups, and is a very low-overhead way of keeping 1:1 metadata storage for every pointer you have.
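
Something along these lines, as a hedged sketch rather than the actual Zaurus code (names invented; the two alignment bits are assumed free in every word-aligned pointer):

    #include <stdint.h>

    enum { GC_WHITE = 0x0, GC_GRAY = 0x1, GC_BLACK = 0x2, GC_MARK_MASK = 0x3 };

    /* Drop the mark bits to get a dereferenceable pointer back. */
    static void *gc_strip(void *p) {
        return (void *)((uintptr_t)p & ~(uintptr_t)GC_MARK_MASK);
    }

    static unsigned gc_color(void *p) {
        return (unsigned)((uintptr_t)p & GC_MARK_MASK);
    }

    /* Repaint a pointer in place of a separate mark table. */
    static void *gc_paint(void *p, unsigned color) {
        return (void *)(((uintptr_t)p & ~(uintptr_t)GC_MARK_MASK) | color);
    }

The sweep at the end is what "rewrites them out to 00": every surviving pointer gets gc_paint(p, GC_WHITE) so it can be used directly again.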

You'll see the same trick in the JVM ZGC where higher bits are used to keep track of the metadata about the items being collected [2].

With a little bit of pointer arithmetic, you can also turn a pointer read-only to prevent a process from modifying shared data outside of a lock, to avoid copying data before unlocking. I had a double-mmap mode in PHP APC which segv'd when the core engine mutated a cached data structure outside of a lock[3].

[1] - https://link.springer.com/article/10.1007/BF01932156

[2] - http://hg.openjdk.java.net/zgc/zgc/file/59c07aef65ac/src/hot...

[3] - https://notmysock.org/blog/2009/Jan/08/
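
The double-mmap trick in [3] can be approximated like this (hedged sketch, not the actual APC code; it assumes Linux with memfd_create, and error handling is mostly omitted):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t len = 4096;
        int fd = memfd_create("shared-cache", 0);
        ftruncate(fd, (off_t)len);

        /* Same physical pages, two views: one writable, one read-only. */
        char *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

        strcpy(rw, "written through the RW view");
        printf("seen through the RO view: %s\n", ro);

        /* *ro = 'x'; */  /* would SIGSEGV: writes through the RO view fault */
        return 0;
    }

Hand the read-only view to the code that should never mutate the cache, and a stray write becomes a crash at the faulting instruction instead of silent corruption.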


Years back I was building some concurrent data structures, and I had CPU instructions to operate atomically on 64-bit quantities. Since my data structures stored pointers, it was very useful to use the 3 low bits (zero in the pointer due to alignment) for 3 booleans, and I think sometimes a counter in the high bits, so my atomic operations could cover a few more variables.
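
A hedged sketch of the idea with C11 atomics (the names and layout are invented; it assumes 8-byte-aligned nodes so the low 3 bits are always zero in a real pointer):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define FLAG_MASK ((uintptr_t)0x7)   /* low 3 bits: three booleans */

    typedef struct node node;
    static _Atomic uintptr_t head;       /* pointer bits | flag bits */

    static node     *word_ptr(uintptr_t w)   { return (node *)(w & ~FLAG_MASK); }
    static uintptr_t word_flags(uintptr_t w) { return w & FLAG_MASK; }

    /* One 64-bit CAS updates the flags only if neither the pointer nor the
       old flags changed underneath us. */
    static bool try_set_flags(node *expected_ptr, uintptr_t expected_flags,
                              uintptr_t new_flags) {
        uintptr_t expected = (uintptr_t)expected_ptr | expected_flags;
        uintptr_t desired  = (uintptr_t)expected_ptr | new_flags;
        return atomic_compare_exchange_strong(&head, &expected, desired);
    }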


Yes, this kind of thing is common in mutex and other concurrency primitive implementations. You can store the lock's owner (an aligned pointer) plus some bits indicating whether any reader is waiting for the lock; that kind of thing.


ARM wants to use the extra bits for memory tagging, a feature that would allow detecting C memory bugs.


In a few years, 256 TiB will be barely enough to run VSCode, Slack, Teams, and a browser concurrently.

But yeah, the idea is that you can add tagging info to the high bits of pointers (to enforce data types, reference counting, etc.) and pay zero cost to indirect through such a tagged pointer. No need to mask off the tag bits, just use it like a regular pointer.


Most modern processors internally read their native word size from memory, and that's it. So to read a 16-bit value on a machine with a 64-bit data bus when that value straddles two 64-bit words (e.g. its bytes at addresses 0x1007 and 0x1008), two 64-bit reads are performed.

It's standard to pad a byte or half-word field in a struct out to a full 32-bit or 64-bit word, for performance reasons.

Some RISC machines couldn't even do such sub-word accesses at all; the original DEC Alpha, for example, had no byte or half-word loads and stores until the BWX extension was added.
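
A quick illustration of the padding point above (field names are made up; the sizes are what a typical 64-bit ABI produces):

    #include <stdint.h>
    #include <stdio.h>

    struct padded  { uint64_t key; uint8_t  flag; };  /* compiler pads to 16 bytes */
    struct widened { uint64_t key; uint64_t flag; };  /* widened by hand: also 16 bytes */

    int main(void) {
        printf("%zu %zu\n", sizeof(struct padded), sizeof(struct widened));  /* 16 16 */
        return 0;
    }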


Most modern processors do reads on the level of cache lines.


The JVM has a special mode called CompressedOops: when you reserve a heap of 32 GB or less, all references are stored as 32-bit instead of 64-bit.


> it's reasonable to expect systems will exist with > 256TB of memory in a single machine.

Note that this is a limit on address space, not on physical memory. You can need that much address space e.g. for mmapping large files, and disk arrays holding files larger than 256 TB are realistic even today (that is just 16 x 16 TB HDDs).
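
For a sense of how address space can outrun RAM, here's a hedged sketch (64-bit Linux assumed, file name and size arbitrary, error handling trimmed): a 1 TiB sparse file is mapped, but only the pages actually touched ever consume physical memory.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        const off_t len = (off_t)1 << 40;                /* 1 TiB of address space */
        int fd = open("big.dat", O_RDWR | O_CREAT, 0644);
        ftruncate(fd, len);                              /* sparse file: no blocks allocated yet */

        char *p = mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p + ((size_t)1 << 39), "only this page becomes resident");
        printf("%s\n", p + ((size_t)1 << 39));

        munmap(p, (size_t)len);
        close(fd);
        return 0;
    }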


It is commonly used to mitigate concurrency issues such as ABA, which typically come up in lock-free data structure design: https://en.wikipedia.org/wiki/ABA_problem
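
As a hedged sketch of that idea (names invented; it assumes user-space pointers fit in 48 bits and that nodes aren't freed while another thread might still be reading them), a generation counter folded into the spare top bits lets a single 64-bit CAS catch ABA:

    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    #define ADDR_BITS 48
    #define ADDR_MASK (((uint64_t)1 << ADDR_BITS) - 1)

    typedef struct node { struct node *next; } node;
    static _Atomic uint64_t top;   /* [63:48] generation | [47:0] pointer */

    static uint64_t pack(node *p, uint64_t gen) {
        return ((uint64_t)(uintptr_t)p & ADDR_MASK) | (gen << ADDR_BITS);
    }
    static node *unpack(uint64_t w) { return (node *)(uintptr_t)(w & ADDR_MASK); }

    static node *pop(void) {
        uint64_t old = atomic_load(&top);
        for (;;) {
            node *head = unpack(old);
            if (!head) return NULL;
            /* Bump the generation on every pop: a recycled node at the same
               address still fails the CAS, so ABA is detected. */
            uint64_t desired = pack(head->next, (old >> ADDR_BITS) + 1);
            if (atomic_compare_exchange_weak(&top, &old, desired)) return head;
        }
    }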


Using shorter virtual addresses makes for a simpler page table and translation lookaside buffer (TLB) design.

Intel processors support up to 57-bit virtual addresses (128 PB) with a 5-level page table. The trade-off for the increased size is higher worst-case latency when translating a virtual address.


Only special applications will need that much in a single address space. OSes will likely support 48-bit processes alongside full 64-bit ones, which isn't complicated; it's basically just restricting the virtual memory mapper to addresses below a certain limit.



