Sorry, it sounds like you know a lot more than I do about this, and I'd appreciate it if you'd connect the dots. Is your comment a dig at either Snowflake or Llama? Where are you finding the unquantized size of Llama 3 70B? Isn't it extremely rare to do inference with large unquantized models?
For decent performance, you need to keep all the parameters in memory for both. Well, with a RAID-0 of two PCIe 5.0 SSDs (or four PCIe 4.0) you might get ~1 t/s loading experts from disk on Snowflake Arctic... but that is slooow.
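For a rough sense of where that ~1 t/s comes from, here's a back-of-envelope sketch. The numbers are my assumptions, not measurements: Arctic at ~480B total / ~17B active params per token (a ~10B dense part plus top-2 of 128 ~3.66B-param experts), fp16 weights, and SSDs hitting their sequential-read specs:

```python
# Back-of-envelope: tokens/s if expert weights are streamed from disk
# every token. All figures below are assumptions, not benchmarks.

BYTES_PER_PARAM = 2           # fp16; halve for 8-bit quant, etc.
ACTIVE_PARAMS = 17e9          # ~10B dense + 2 routed ~3.66B experts
EXPERT_PARAMS = 2 * 3.66e9    # just the routed experts, if the dense
                              # part stays resident in RAM

raid0_bw = 2 * 14e9           # two PCIe 5.0 SSDs at ~14 GB/s each;
                              # four PCIe 4.0 at ~7 GB/s is similar

def tokens_per_s(params_streamed: float, bandwidth: float) -> float:
    """Tokens/s when each token must read this many params from disk."""
    return bandwidth / (params_streamed * BYTES_PER_PARAM)

print(f"all active params from disk:  {tokens_per_s(ACTIVE_PARAMS, raid0_bw):.1f} t/s")
print(f"experts only (dense in RAM):  {tokens_per_s(EXPERT_PARAMS, raid0_bw):.1f} t/s")
# -> roughly 0.8 and 1.9 t/s respectively: the same order of magnitude
#    as the ~1 t/s above, and why disk-streaming experts is so slow.
```

The point being: even with ~28 GB/s of aggregate sequential read, you're disk-bandwidth-bound at around a token per second, which is why keeping everything in memory matters.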