This is probably a dumb question, but what does the hardware of such a massive machine look like? Is it just a single server box with a single motherboard? Are there server motherboards out there that support 2 TB of RAM, or is this some kind of distributed RAM?
For example, Dell sells 4U servers straight from their webshop that max out at 96x32GB (that's 3TB) of RAM with 4 CPUs (max 18 cores/CPU => 72 cores total). They seem to have some (training?) videos on youtube that show the internals if you are curious:
We have some Supermicros that have about 12TB RAM, but the built-in fans sound like a jumbo jet taking off, so consider the noise pollution for a second there.
Er, are you summing a TwinBlade chassis? You have to be.
6TB is about where single machines currently top out, due to hardware constraints across multiple vendors and architectures, and memory bandwidth starts being an issue. You have to throw 96x64GB at the ones that exist, so wave buh-bye to a cool half a million USD or so. If you're sitting on a 12TB box I want a SKU (I want one!).
I don't actually think Supermicro makes a 6TB SKU, even. That's Dell and HP land.
We do have a TwinBlade chassis, but I'm pretty sure they are a 6TB SKU. To be honest, I'm not the one who procured them, so I can ask if you are interested in a SKU.
Once upon a time I hacked on the AIX kernel, which ran on POWER hardware (I think they're up to POWER8 or higher now). In my time there the latest hardware was POWER7-based. It maxed out at 48 cores (with 4-way SMT giving you 192 logical cores) and a max of, I think, 32TB RAM. Not the same hardware as mentioned in the OP, but pretty big scale nonetheless.
I've seen these both opened up and racked up. They are basically split into at most 4 rackmount systems, each 2U IIRC. The 4 systems (max configuration) are connected together by a big fat cable, which is the interconnect between nodes in the Redbook I've linked above. The RAM was split 4 ways among the nodes, and NUMA really matters in these systems, since memory local to your node is much faster to access than memory across the interconnect.
This is what I observed about 5-6 years ago. I'm sure things have miniaturized further since then...
Yeah, sure, you can get a quad-Xeon 2U server with 2TB of RAM for around $40K. Here's a sample configurator:
https://www.swt.com/rq2u.php
change the RAM and CPUs to your preference and add some flash.
No insight into what Amazon uses, but we've got HP DL980s (G7s, so they're OLD) with 4TB of RAM, and just started using Oracle X5-8 x86 boxes with 6TB of RAM across 8 sockets. I believe 144 cores / 288 threads.
Yeah, just realized my knowledge of server hardware is hopelessly outdated. They seem to be a couple of orders of magnitude more powerful than what I assumed was available.
How flipping awesome is it that some very large portion (90% or so?) could probably all be one nice contiguous block of mine, from x86_64 userspace, with a quick mmap() and mlockall()?