Hacker News

The big question is: why does it need to run in 10 s? The main reason I can see is to be able to run this analysis very frequently, but then your workload approaches a constant load anyway.

The total amount of data is 150 GB; that would easily fit into memory on a single powerful 2-socket server with 20 cores and would then run in less than 15 minutes. The hardware required to do that will cost you ~$6,000 from Dell; assuming a system lifetime of five years and assuming (like you do) that you can amortize across multiple jobs, the cost is roughly the same as in the cloud: about $0.036 per analysis.
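Back-of-envelope, the ~$0.036 figure checks out under one assumption that isn't stated above (and is mine): the server runs analyses back to back, 24/7, for its whole lifetime.

```python
# Sanity-check the ~$0.036-per-analysis amortization.
# Assumption (mine): continuous utilization over the full lifetime.
server_cost = 6000        # dollars, 2-socket 20-core Dell from the comment
lifetime_years = 5
runtime_min = 15          # one in-memory analysis run

analyses_per_hour = 60 / runtime_min                      # 4
total_analyses = lifetime_years * 365 * 24 * analyses_per_hour
cost_per_analysis = server_cost / total_analyses
print(f"${cost_per_analysis:.4f} per analysis")           # ~$0.0342
```

At anything less than full utilization the per-analysis cost rises proportionally, which is exactly the amortization caveat the comment makes.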

I'm fairly certain that, in the end, it's not more expensive for the customer to just buy a server to run the analysis on.

Edit: I see OP says 80% of the time is spent reading data into memory, at about 100 MB/s. Add $500 worth of SSD to the example server I outlined, and we can cut the application runtime by >70%, making the dedicated hardware significantly cheaper.
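The ">70%" claim follows from simple Amdahl-style arithmetic; the SSD throughput number below is my assumption (the thread only gives the 100 MB/s baseline):

```python
# 80% of runtime is reading data at 100 MB/s.
# Assumption (mine): a modest SSD sustains ~1000 MB/s sequential reads.
io_fraction = 0.80
baseline_mb_s = 100
ssd_mb_s = 1000

new_io = io_fraction * (baseline_mb_s / ssd_mb_s)   # 0.08 of original runtime
new_total = (1 - io_fraction) + new_io              # 0.28
print(f"runtime cut by {1 - new_total:.0%}")        # runtime cut by 72%
```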




A vCore is a hyperthread of an unknown CPU, so in reality 1000 vCores is 500 real cores; with all the overheads and the low utilization while the dataset loads, it's more like 450. To keep it at 10 sec you would need 90 real cores, or 4 × 3-node dual-socket boxes (eBay, ~$1.5K each) and 2 × InfiniBand switches (eBay, ~$300 each). For ~$6,600 you have a dedicated solution with no latency bubbles and a fixed low cost.
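The numbers in the comment above, laid out explicitly (the prices and counts are the commenter's eBay estimates, not mine):

```python
vcores = 1000
physical_cores = vcores / 2     # a vCore is one hyperthread -> 500 real cores
effective_cores = 450           # after overheads and load-time idling

boxes = 4 * 1500                # four used multi-node dual-socket boxes
switches = 2 * 300              # two used InfiniBand switches
total = boxes + switches
print(total)                    # 6600
```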


Briefly... We have many data sets, and the <10 sec calculations happen every few seconds for every data set in active use. Caching results is rarely helpful in our case because the number of possible results is immense. The back end drives an interactive/real-time experience for the user, so we need the speed. Our loads are somewhat spiky; overnight in US time zones we're very quiet, and during the daytime we can use more than 1k vCPUs.

We've considered a few kinds of platforms (AWS spot fleet/GCE autoscaled preemptible VMs, AWS Lambda, bare metal hosting, even Beowulf clusters), and while bare metal has its benefits as you've pointed out, at our current stage it doesn't make sense for us financially.

I omitted from the blog post that we don't rely exclusively on object storage services, because their performance is relatively low. We cache files on compute nodes, so much of the time we avoid that "80% of time is spent reading data" penalty.

(Re: Netflix, in qaq's other comment, I don't have a hard number for this, but I thought a typical AWS data center is only under 20-30% load at any given time.)



