I would sell a kidney for one of these. It's basically impossible to train language models on a consumer 24GB card. The next step up is the A6000 Ada, at 48GB for $8,000, and this one will probably be priced somewhere in the $100k+ range.
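For a rough sense of why 24GB is so limiting, here's a back-of-the-envelope sketch. It assumes standard mixed-precision Adam training, which is commonly estimated at ~16 bytes of state per parameter (fp16 weights and gradients, plus fp32 master weights and two fp32 optimizer moments); activations come on top of that:

```python
# Rough memory estimate for training with mixed-precision Adam.
# Assumes ~16 bytes/param: 2 (fp16 weights) + 2 (fp16 grads)
# + 4 (fp32 master weights) + 4 + 4 (fp32 Adam moments).
# Activations and framework overhead are extra on top of this.
BYTES_PER_PARAM = 16

def training_mem_gb(n_params: float) -> float:
    return n_params * BYTES_PER_PARAM / 1e9

for billions in (1, 3, 7):
    print(f"{billions}B params -> ~{training_mem_gb(billions * 1e9):.0f} GB "
          "before activations")
# 7B params -> ~112 GB before activations: far beyond a 24GB card.
```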
Use four consumer-grade 4090s then. That would be much cheaper and better in almost every respect. And even then, forget about training foundation models: Meta spent 82k GPU-hours on the smallest LLaMA and about 1M hours on the largest.
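To put those GPU-hour figures in perspective, here's the simple arithmetic, taking the LLaMA numbers at face value and (very optimistically) assuming a 4090 matches the per-GPU throughput of Meta's cluster with perfect scaling:

```python
# Wall-clock time to replicate LLaMA's training budget on 4 GPUs,
# naively assuming one 4090 ~= one of Meta's training GPUs and
# perfect scaling with zero communication overhead.
gpu_hours_smallest = 82_000     # smallest LLaMA
gpu_hours_largest = 1_000_000   # largest LLaMA
n_gpus = 4

for name, hours in (("smallest", gpu_hours_smallest),
                    ("largest", gpu_hours_largest)):
    days = hours / n_gpus / 24
    print(f"{name}: ~{days:.0f} days ({days / 365:.1f} years)")
# smallest: ~854 days (2.3 years); largest: ~10417 days (28.5 years)
```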
If I remember correctly, NVLink adds about 100GB/s (where PCIe 4.0 is 64GB/s). Is it really worth dropping to 3090 performance (roughly half that of a 4090) for the extra bus speed?
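For intuition on what that bandwidth gap buys you, here's a rough estimate of per-step gradient sync time. It assumes fp16 gradients, a ring all-reduce (which moves roughly 2*(n-1)/n of the gradient volume over each GPU's link), and the bandwidth figures above; real throughput depends heavily on topology, message sizes, and how much communication overlaps with compute:

```python
# Rough per-step gradient all-reduce time: ring all-reduce moves
# about 2*(n-1)/n of the gradient volume over each GPU's link.
# Bandwidths are the rough figures from the parent comment.
def allreduce_seconds(params: float, n_gpus: int, link_gb_s: float) -> float:
    grad_bytes = params * 2                          # fp16 gradients
    volume = 2 * (n_gpus - 1) / n_gpus * grad_bytes  # ring all-reduce
    return volume / (link_gb_s * 1e9)

params = 7e9  # hypothetical 7B-parameter model
for link, bw in (("NVLink (~100 GB/s)", 100), ("PCIe 4.0 (~64 GB/s)", 64)):
    print(f"{link}: ~{allreduce_seconds(params, 2, bw) * 1e3:.0f} ms/step")
# NVLink: ~140 ms/step vs PCIe 4.0: ~219 ms/step for 7B fp16 grads.
```

Whether that per-step difference matters depends on how long each step's compute takes and how well the sync hides behind it.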