Honest question: if you're running at 10%, why have you gone with 12-core, 512 GB RAM servers? Couldn't you start with a more reasonable 4-core, 64 GB RAM box until you hit some threshold? What value do you get from overallocating resources so early?
Who rips the $1,000 processors and $200 voltage regulator modules out of their servers to upgrade to $2,000 processors and $250 voltage regulator modules, then re-plans their entire infrastructure, and possibly their code, around the change?
Symmetry between servers has value.
The main board, RAID controller, network, and usually the storage are going to be planned out meticulously ahead of time based on the maximum load the server will see during its lifetime. Often, processor and memory come down to "What if we needed feature X and didn't have it?" or "If we took the server down for 1 hour to upgrade it, how expensive is that?".
I misunderstood, sorry. I was talking in the context of virtual servers that can scale resources somewhat dynamically. (Side note: isn't it awesome that we see numbers like "512 GB RAM" and don't immediately assume it's not a single node in a deployment?)
I initially pictured someone picking the highest-spec option they can afford when setting up a new service, rather than choosing based on the actual demand on each node.
> "If we took the server down for 1 hour to upgrade it, how expensive is that?"
Putting my CI / DevOps hat on for a sec: who takes production servers down for upgrades without some level of HA to avoid downtime? ;)
Several reasons. We did some PoC testing before we deployed this gear and knew that we would achieve high density. Honestly, I didn't expect this much. Hard to believe that we've deployed basically every non-database app we have on this cluster and we're only at 10% utilization.
Every service gets run in two environments: test and prod. Both are co-located on the same Kube cluster in different namespaces. We also don't put any datastores in Kube. That stuff still lives in OpenStack for now. Ceph can make it possible but for fast disk I/O, it's tough to beat the local SAS bus.
> Ceph can make it possible but for fast disk I/O, it's tough to beat the local SAS bus.
Personal anecdote: we have everything on-prem using a pretty standard vSphere setup, and I've got a couple of PostgreSQL databases that aren't even heavily used (they top out around 20-40 tx/sec), backed by a hybrid Tegile array. Randomly throughout the day my IOWAIT starts spiking, because even over 8 Gb Fibre Channel our storage latency climbs when everything else on the same array (well over 200 full VMs) gets busy. Or sometimes I need to run a table scan over a 40 GB table for a one-off query, and the storage bottlenecks the crap out of us because only our small active set is cached on the SSDs.
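If you want to put a number on those spikes, per-request wait time can be derived from two snapshots of the block-device counters Linux exposes in /proc/diskstats (completed I/Os and milliseconds spent waiting). A minimal sketch with hypothetical counter values, not the author's actual tooling:

```python
def avg_await_ms(before, after):
    """Average I/O wait in ms per request between two counter snapshots.

    Each snapshot is a dict with:
      'ios'   - total completed I/O requests (reads + writes)
      'ticks' - total ms spent waiting on those requests
    """
    d_ios = after["ios"] - before["ios"]
    d_ticks = after["ticks"] - before["ticks"]
    return d_ticks / d_ios if d_ios else 0.0

# Hypothetical snapshots taken a few seconds apart during a spike:
before = {"ios": 1_000_000, "ticks": 4_000_000}
after = {"ios": 1_000_500, "ticks": 4_060_000}
print(avg_await_ms(before, after))  # → 120.0 ms/request
```

120 ms per request on a database volume is the kind of number that shows up as IOWAIT; `iostat -x` reports the same figure in its await column.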
You can run databases on remote storage, people obviously do it, but there's a reason why even on AWS the best practice is to use your instance disks instead of EBS volumes if you care about performance. Noisy neighbors suck.
Because the marginal cost of hardware, once the chassis is racked, is trivial compared to the public cloud, which is insanely marked up at retail on exactly this concept.
I have heard from a staffer on the GCE team that they deploy three CPU cores for every user-facing core. That might have something to do with the cost.
GCE provides significantly stronger reliability promises, from a user's point of view. As far as I know, DigitalOcean and RamNode don't provide things like no-downtime host maintenance (hell, AWS doesn't provide that).
Interestingly, the "3 cores for 1 user-facing one" would explain the pricing of "preemptible" instances, which can cost as little as a third of a full instance. The main difference is that they are fully rescheduled every 24 hours (or more often), and there's no live migration for maintenance...
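The back-of-the-envelope version of that guess, using the hearsay 3:1 ratio from upthread (all numbers hypothetical):

```python
# If 3 physical cores back every user-facing core, 2 of the 3 sit idle
# as headroom. Selling that spare capacity with no uptime guarantee
# lets it be priced near 1/(provision ratio) of a standard instance.
standard_price = 1.0   # normalized hourly price of a regular instance
provision_ratio = 3    # physical cores per user-facing core (hearsay)

preemptible_price = standard_price / provision_ratio
print(round(preemptible_price, 2))  # → 0.33
```

Which lines up neatly with preemptible instances costing "as little as 1/3" of the full price.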
… I am now wondering if GCE runs fault-tolerant VMs. Because if it does, holy moly
Except they don't. You give up reliability, uptime, and not being on oversold hosts over a tiny bit more money?? Run some real-world benchmarks between DO and GCE. Not the same ballpark.
I'm aware of one similar company that used to fit sixty $20/mo VPSes in 1U. They were working on a couple hundred in 2 or 3U when I left (harder than you'd think), but they also have a cheaper tier now because competitors are pushing VPS prices down. Margins on VPS hosting, even without oversubscribing RAM, are pretty decent once you get past the capital outlay, but they are shrinking in a race to the bottom, just like shared hosting before it.