
It seems to me that such a memory space could be physically mapped quite large while still presenting 64-bit virtual memory addresses to the local node? How likely is it that any given node would be mapping out more than 2^64 bytes worth of virtual pages?

The VM system could quite simply track the physical addresses as a pair of `u64_t`s or whatever, and present those pages as 64-bit pointers.
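Something like this, as a rough sketch (the type and field names here are made up for illustration, not any real kernel's):

    #include <stdint.h>

    /* Hypothetical >64-bit physical address, tracked as a pair of u64s. */
    typedef struct {
        uint64_t hi;  /* upper bits: e.g. which remote node / memory pool */
        uint64_t lo;  /* lower bits: byte address within that pool */
    } phys_addr128_t;

    /* One entry in the local node's mapping: a 64-bit virtual page number
     * points at a 128-bit physical location. Processes on the node still
     * only ever see ordinary 64-bit pointers. */
    typedef struct {
        uint64_t       vpn;    /* virtual page number, within the 64-bit space */
        phys_addr128_t frame;  /* where the page actually lives */
        uint32_t       flags;  /* present / writable / remote, etc. */
    } vm_map_entry_t;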

It seems in particular that you might want this anyway, because the actual costs of dealing with such external memories would have to be much higher than for local memory. Optimizing access would likely involve complicated cache hierarchies.

I mean, it'd be exciting if we had need for memory space larger than 2^64 but I just find it implausible with current physics and programs? But I'm also getting old.




Leaving a cluster-coherent address space behind - like you say - is doable. But you lose what the parent was saying:

> If everyone has the same address space, then you can share pointers / graphs between nodes and the underlying routing/ethernet software will be passing the data automatically between all systems. Its actually quite convenient.


Let's say you have nodes that have 10 TiB of RAM in them. You then need about 1.7 million nodes (not CPUs, but actual boxes) to use up 64 bits of address space: 2^64 bytes is 16 EiB, and 16 EiB / 10 TiB ≈ 1.68 million. It seems like the motivation is to continue to enable Top500 machines to scale. This wouldn't be coming to a commercial cloud offering for a long time.


Why limit yourself to in-memory storage? I'd definitely assume we have all our storage content memory-mapped onto our cluster too, in this world. People have been building exabyte (1M terabytes) scale datacenters since well before 2010, and 16 exabytes, the current Linux limit according to the most upvoted post here, isn't that much more inconceivable.

Having more space available usually opens up more interesting possibilities. To rattle off some assorted options:

- If there are multiple paths to a given bit of data, we could use different addresses to refer to different paths.

- We could do something like ILA in IPv6, using some of the address as a location identifier: having enough bits for both the location and the identity parts of the address, without being too constrained, would be helpful.

- We could use the extra pointer bits for tagged memory or something like CHERI, which allows all kinds of access-control, permission, or security capabilities.

- Perhaps we create something like id's MegaTexture, where we can procedurally generate data on the fly when given an address.

Those are several reasons you might want more address space than addressable storage. And I think some folks are already going to be quite limited & have quite a lot of difficulty partitioning up their address space, if they only have, for example, 16M buckets of 1TiB (one possible way of carving up 2^64 bytes).
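As a purely hypothetical sketch of the location/identity/tag idea (the field widths and names below are made up, not taken from ILA or CHERI):

    #include <stdint.h>

    /* Entirely hypothetical 128-bit address layout, just to show how much
     * room there is: a locator (which node / which path to the data), an
     * object identity, and a few CHERI-flavoured tag/permission bits.
     * None of these widths come from a real spec. */
    typedef struct {
        uint64_t locator;  /* top half: node or route, ILA-style */
        uint64_t lower;    /* bottom half: [8 tag bits][16 identity bits][40 offset bits] */
    } addr128_t;

    #define TAG_BITS    8
    #define IDENT_BITS  16
    #define OFFSET_BITS 40

    static inline uint64_t addr_tags(addr128_t a)   { return a.lower >> (IDENT_BITS + OFFSET_BITS); }
    static inline uint64_t addr_ident(addr128_t a)  { return (a.lower >> OFFSET_BITS) & ((1ULL << IDENT_BITS) - 1); }
    static inline uint64_t addr_offset(addr128_t a) { return a.lower & ((1ULL << OFFSET_BITS) - 1); }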

The idea of being able to refer to everything anywhere that does or did exist across a very large space sure seems compelling & interesting to me!


Maybe. You are paying a significant performance penalty for ALL compute to provide that abstraction though.


Sounds like a disaster in terms of potential bugs.


It is, but it's the same disaster of bugs we already have from multiple independent cores sharing the same memory space, not a brand new disaster or anything.

It's a disaster of latency issues too, but it's not like that's surprising anyone either, and we already have NUMA on some multi-core systems which is the same problem.

We have existing tools that can be extended in straightforward ways to deal with these issues. And it's not like there's a silver bullet here; having separate address spaces everywhere comes with its own disaster of issues. Pick your poison.


Just because you can refer to the identity of a thing anywhere in the cluster doesn't mean it can't also be memory-safe, capability-based, and just an RPC.


Then why make said identity a fixed-size number pretending to be a flat address space, instead of some other type of key?


For example, reads (when allowed) might just be transparent RAM accesses.

I'm not advocating 128-bit pointers, or saying they're useful or realistic. I'm just saying, what if.


The issue is latency. If you don’t want to care if the read or write will take 10ns or 10000ns, then that can work, assuming a flat permission structure too. Latency is a fundamental restriction for any computer that is > zero size, and the larger the ‘computer’ (data center), the more noticeable it is.

(For that matter, what happens when different segments of memory have complex access controls? What about needing to retry after failures like network partitions, which don’t happen in a normal memory space?)

If latency matters, which it usually does, then you need some kind of memory access hierarchy, copying things back and forth, etc. and then you’ll almost certainly need a library of some kind to manage all this, prefetch from a slow range of memory and populate some of your fast local memory, etc.
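Roughly the shape of the library you end up needing, as a sketch (every name below is invented for illustration, not any real API):

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch of the kind of library you end up needing once latency and
     * failures matter. All names here are hypothetical. */

    typedef struct remote_region   remote_region_t;    /* slow, far-away memory */
    typedef struct prefetch_handle prefetch_handle_t;  /* an in-flight copy */

    /* Asynchronously pull a range of the remote region into a local buffer;
     * returns a handle to wait on. */
    prefetch_handle_t *prefetch_start(remote_region_t *r,
                                      uint64_t offset, size_t len,
                                      void *local_buf);

    /* Block until the copy lands, or report failure - network partitions are
     * a thing "normal" memory never had to surface. */
    int prefetch_wait(prefetch_handle_t *h);

    /* Push dirty local data back out, with whatever checksumming and retry
     * policy the fabric needs. */
    int writeback(remote_region_t *r, uint64_t offset,
                  const void *local_buf, size_t len);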

At that point, we’ve done a lot of work, and are still pretending there is no network or the like, even though it’s there. It isn’t free, anyway. We’d also need checksums, error handling across the network/fabric/access path, etc.

And with 128 bits, we could also use something like IPv6, with the lower 64 bits being the byte address, hah.


Data centers are already doing "disaggregated memory" (page out to remote RAM), as it's been faster than local disk for years now (or at least was in the pre-NVMe world).

The HPC world is all about RDMA for direct access to its huge data sets, and hyperscaler clouds are likely starting to do that too.

The latency gap is just treated as yet another layer of the cache model.

And we already have error correction for local RAM.


Which, as you note, NVMe and the latest PCIe generations turn on its head.

Those were due to bottlenecks that don’t exist anymore, and even in your example they were only ever emergency measures due to local resource shortages.

ECC is also not reliable/sufficient in the face of the issues that arise when networks start playing their part.


Not that the industry doesn't broadly deserve this FUD take/takedown, but perhaps possibly maybe it might end up being really good & useful & clean & clear & lead to very high-functioning, very performant, highly observable systems, for some.

Having a single system image has many potential upsides, and understandability & reasonability are high among them.


Instead of having another thread improperly manipulating your pointers and scribbling all over memory, now you can have an entire cluster of distributed machines doing it. This is a clear step forward.


I think the challenge is that within those pages you might have absolute pointers rather than offsets from some “page” boundary. In that case, everything really must share a single uniform address space even if any given node accesses only a small portion, no?
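A tiny sketch of the difference (hypothetical structs): the offset-based form is relocatable per node, while the absolute form only works if every node maps the shared region at the same virtual address.

    #include <stddef.h>
    #include <stdint.h>

    /* Absolute-pointer version: the raw virtual address is stored in the
     * shared pages, so it is only meaningful if every node maps those pages
     * at the same virtual address. */
    struct node_abs {
        struct node_abs *next;
        uint64_t         value;
    };

    /* Offset-based version: "pointers" are offsets from the start of the
     * shared region, so each node may map the region wherever it likes. */
    struct node_rel {
        uint64_t next_off;  /* offset of the next node, or 0 for end of list */
        uint64_t value;
    };

    static inline struct node_rel *rel_next(void *region_base, const struct node_rel *n)
    {
        if (n->next_off == 0)
            return NULL;
        return (struct node_rel *)((char *)region_base + n->next_off);
    }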

At that point, maybe you want 256-bit or 512-bit pointers so that you can build a single globally addressable system for all memory in the world.


> How likely is it that any given node would be mapping out more than 2^64 bytes worth of virtual pages?

In the Grace Hopper whitepaper, NVIDIA says that they connect multiple nodes with a fabric that allows them to create a virtual address space across all of them.



