* Second socket increases the memory channels and RAM available: 16-channel dual-EPYC with 8TB of RAM will be faster than 4TB of RAM on single-EPYC 8-channel.
* SQL optimizers automatically search for sequential scans, because sequential scans are faster.
* While JOIN can be done in GPU space, GPUs have extremely low memory capacity (only 80GB on the latest A100 that costs $10,000+). CPU will be faster because you can keep a much larger dataset hot in RAM. Your 80GB of VRAM on a GPU means nothing if your dataset is in the multi-TB range. (8TB of CPU-RAM on the other hand, serves as a reasonable cache)
More sockets add memory controllers, but we can also think about moving HBM closer to the cores as a L4 cache or scratch memory that’s not expected to be synchronised with other cores/sockets.
* Second socket increases the memory channels and RAM available: 16-channel dual-EPYC with 8TB of RAM will be faster than 4TB of RAM on single-EPYC 8-channel.
* SQL optimizers automatically search for sequential scans, because sequential scans are faster.
* While JOIN can be done in GPU space, GPUs have extremely low memory capacity (only 80GB on the latest A100 that costs $10,000+). CPU will be faster because you can keep a much larger dataset hot in RAM. Your 80GB of VRAM on a GPU means nothing if your dataset is in the multi-TB range. (8TB of CPU-RAM on the other hand, serves as a reasonable cache)