Cheap Kubernetes Cluster on AWS with Kubeadm (github.com/cablespaghetti)
180 points by unkoman on Feb 27, 2019 | 49 comments



I'm running a three node cluster on Hetzner Cloud for less than $10 a month. Comprehensive guide and automated provisioning available here: https://github.com/hobby-kube/guide


+1 for Hetzner, it's amazing -- I'm particularly fond of their dedicated server offerings.

One of the things I like most about Kubernetes is that it will likely lower the barrier to entry to "PaaS land" (and, as a result, "cloud land") for providers like Hetzner (see OVH's recently announced k8s offering[0]).

Once these "bare metal" providers get into the managed-Kubernetes game, I'm certain companies will spring up to chip away at cloud provider offerings like S3/RDS/etc. -- they'll just connect to your k8s infrastructure and bring their Kubernetes-compatible know-how. This is going to cause prices to plummet, as it will make the tiers of value-added services more distinct -- i.e. "a platform to run things on" vs. "really good managed databases".

[0]: https://www.ovh.co.uk/kubernetes/


What's Hetzner's network like? I like their offering, and the AMD Epyc servers in particular seem to give great bang for the buck, but I'm a bit put off by their low 99.9% network uptime guarantee.

https://www.hetzner.com/rechtliches/agb


I'm using both their dedicated servers and the cloud offering and have never had any issues. Speed and latency are consistent both within Europe and to the US (I don't monitor much traffic to Asia). Latency from their DCs in Germany to Frankfurt is around 3ms, so comparable to the central-Europe offerings of most cloud providers.


So their network hasn’t had any downtime? Any idea why they would only guarantee 99.9% uptime on their network?


I've been a Hetzner customer for the past 5.5 years now. So far I've received 21 emails from their status reports, as follows:

10 about failures (2 or 3 of which directly affected me),

10 announcing planned maintenance of certain resources,

1 letting me know about the Spectre and Meltdown vulnerabilities.

So far I'm a pretty happy customer, and I'm sending business their way and migrating clients' sites to their cloud offerings.


I haven't noticed any downtime either -- all my downtime has been my own doing. I only run a few small sites and, periodically, staging versions of client applications, so I don't have a crazy amount of load.

I've been using Hetzner for roughly 2 years and am a pretty happy customer.

I can say, though, that the latency is pretty bad compared to local options -- their servers are geographically far from where I and the people who use some of my apps connect from, and I get ~200ms of latency that I can't do much about. I've looked at a traceroute and it's not their network but rather some hops in between.


In the past two years, I remember them doing scheduled infrastructure maintenance twice, plus an unscheduled interruption of a few hours. So, no, it's not 100%. But AWS has also had some interruptions. In my experience, this Hetzner downtime has simply been negligible. The bigger issue has been replacing commodity memory a few times, which could have been avoided by paying a few more dollars for higher-quality hardware.


Actually, I'm pretty sure OVH and GCloud do run the backplane on k8s itself; that's why it is so cheap.


I like Hetzner's offerings. I wish they were also in North America and Asia.


Hetzner lists VMs that are ultra cheap, but then has a separate listing for more expensive VMs with dedicated cores. Alibaba does the same.

With these shared-core machines, I wonder what performance-reliability guarantees are available and how they compare to the dedicated-core machines.


I don't have any actual statistics to back it up, but I've hosted a variety of things in Hetzner's cloud (and dedicated) options, and... it depends.

If you're hosting a website (or a database backing one), there is going to be a natural ebb and flow as traffic comes in, gets processed, and gets answered. Since you're not utilizing a large amount of CPU consistently, this actually fits really well into the shared-core model - that's exactly why they can offer it. Even if a subset of requests take twice as long to fulfill, this is usually not even noticeable to the user.

On the other hand, if you're routinely running a large ETL process, resizing video or images, or doing any of the dozens of other things I'm sure you can think of that use a lot of CPU for prolonged periods, you're going to notice it.

Specifically, there are two situations I've run into: my TeamCity build agent is unpredictable for longer builds - one might finish in 5 minutes, the next might take 10 - and when I had a Windows Server running as an Amazon Workspaces alternative (RDP in, run Visual Studio, etc.), things like building and debugging an app were noticeably slower than doing it locally or on my dedicated server (even with similar specs).

So if CPU usage isn't normally your bottleneck, or you're scaling horizontally and CPU performance isn't as important, it's a great option that will save you quite a bit. If your workload is very CPU-sensitive you probably shouldn't be using a VM anyway and should look into more dedicated infrastructure, but obviously there is also a middle ground to be had...


They also have a dedicated vCPU option, but it is not that cheap.


Depending on your data, I'd also wonder about the security isolation between tenants. It's something I consider on AWS as well, depending on what I'm doing.


Do shared cores add any attack vectors that other shared machines (with dedicated cores) don't have?


If the vendor shares bare metal servers you'll definitely want to keep this in mind: https://www.wired.com/story/dark-metal-cloud-computers-invis... - the gist is that there are some interesting attacks that can be run on bare metal servers that are re-used.


There are always risks with shared hardware: known and unknown vulnerabilities in the hypervisor or the hardware itself. All the recent Intel stuff, Rowhammer, etc.

The interesting question comes when people start implementing hardware hypervisors, and what the risk profile is there.

Security, at the end of the day, isn't about what is secure and what isn't. If you want to be secure, don't get on the internet. Everything else is an exercise in risk tradeoffs and mitigation.

If I were doing anything with PII, credit card numbers, or any other data I never want to touch, I wouldn't use shared hardware without thinking hard about it.


Shared hardware from the big cloud players adds attack vectors, but it also comes with some of the best security minds trying to keep the entire platform secure.

For example, they'll typically be on secret mailing lists and aware of security vulnerabilities weeks before you know about them.


I don't think they guarantee anything, but I'm running a few servers close to max utilization most of the time and performance appears relatively stable. That will obviously depend on the machine you're on, though.

Most of the smaller cloud providers seem to run a similar model; I guess it's worth it unless you need predictability.


Indeed, cloud + dedicated infrastructure is really cheap after a certain load/volume, compared to pure GCE/AWS. I have been using k8s with Hetzner for 6 months now, with dirt-cheap SSD/NVMe storage and 1080 GPUs. Can't recommend them enough, and I don't really see any competition here.


> If anyone has any suggestions on a better way of doing this without shelling out $20 a month for an ELB, please open an Issue!

kubehost was designed for this purpose on GKE: https://github.com/GoogleContainerTools/kubehost


You could use Route53 health checks and do DNS failover.

https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dn...
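
A minimal sketch of what that looks like with the AWS CLI (the hostname, IP, and hosted zone ID are placeholders; the secondary record would be the same with "Failover": "SECONDARY"):

  # Create a health check against one node's ingress port
  aws route53 create-health-check \
    --caller-reference node-1-http \
    --health-check-config '{"IPAddress":"203.0.113.10","Port":80,"Type":"HTTP","ResourcePath":"/healthz","RequestInterval":30,"FailureThreshold":3}'

  # Attach it to a failover A record for the primary node
  aws route53 change-resource-record-sets --hosted-zone-id Z123EXAMPLE --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "SetIdentifier": "node-1",
        "Failover": "PRIMARY",
        "TTL": 60,
        "HealthCheckId": "<id returned by create-health-check>",
        "ResourceRecords": [{"Value": "203.0.113.10"}]
      }
    }]
  }'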


You can totally install Minikube on AWS [1], which removes the need for a dedicated master and drops the price to that of a single instance.

Not sure why you'd ever want to do this, given that GKE or DO will always be cheaper and AWS's core services aren't all that special, but as a thought experiment it's interesting.

[1] https://www.radishlogic.com/kubernetes/running-minikube-in-a...
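
One common way to do this is Minikube's "none" driver, which runs the Kubernetes components directly on the host instead of inside a nested VM. A rough sketch (the linked walkthrough may differ; assumes Docker and kubectl are already installed):

  # On the EC2 instance (the "none" driver needs root)
  curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
  chmod +x minikube && sudo mv minikube /usr/local/bin/
  sudo minikube start --vm-driver=none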


New to the k8s scene, but does Minikube support multi-node clusters?


No. We discuss this exact topic with Minikube author Dan Lorenc here: https://kubernetespodcast.com/episode/039-minikube/


Your podcast is great! You two make a fairly dry topic engaging and I look forward to every episode.


I enjoy listening to your podcast every week! Thanks for your work. What a small world here on HN.


An alternative that will soon support multi-master is rancher/k3s, if you want a lightweight k8s.


I love the attitude here - because in my privileged Western world, worrying whether my monthly hosting costs are one latte and croissant or two seems quaint.

But even in the "rich" EU, average monthly wages can be as low as 1,000 USD, making this a tenth of a day's work, and we don't have to go much further afield to see 6 bucks becoming a significant chunk of a worker's day.

So thank you, some kids somewhere will be able to afford to develop skills because you started penny pinching.

cheers


Even for people who can afford it, this makes a difference for preserving low-traffic hobby sites you only mildly care about.

I have a couple of silly sites that I've run on App Engine for many years, using its generous free tier. If they were on Digital Ocean I'd have shut them down by now, as I eventually did for a Go search engine that I wasn't using much.


It certainly makes getting into this field accessible to students like me. Very glad for that.


Google Cloud and DigitalOcean both offer managed Kubernetes clusters; you only pay for the workers.


Not only that, Digital Ocean's fully managed cluster starts at $10.


AWS Fargate starts at $0

If it's only a hobby project with a small cluster, the Fargate costs could very well be under $10.


That's not really an apples-to-apples comparison though, is it? I mean yeah, a cluster that you never power up costs nothing, that's what I would have hoped.

https://aws.amazon.com/fargate/pricing/

Assume that you are actually _doing something_ with the cluster; then there will be charges for vCPU and memory. The pricing on this page indicates that if you only need your containers for 10 minutes a day, every day, then you will pay $1.23 for the month at 2GB/1vCPU. That's pretty modest usage at a pretty modest cost.

Sure, you can do it, but... you may need to re-architect your product to take advantage of transient workers. (Now I think we all should do that, but that's another conversation...)

Compare that to DigitalOcean's $10/mo (single-worker, managed) cluster, which provides one full-time node with the same specs that you can slice and dice to run as many tasks as you can fit in 2GB of memory and a single vCPU. Now it becomes clearer that Fargate is priced at a premium. If I'm doing the math right, you'll pay $177.12 for that same task space with your Fargate cluster.

If your hobby project needs to run for more than a couple of hours a month, you will usually pay a lot more with Fargate.


I had the impression that the $1.23 was for 10 workers?


My mistake:

> For example, your service uses 5 ECS Tasks, running for 10 minutes (600 seconds) every day for a month (30 days) where each ECS Task uses 1 vCPU and 2GB memory.

5 workers, at 10 minutes each. That's still $35 / $10 = 350% of the cost, and they're not full-time. The point was that Fargate is not priced to be the cheapest option, unless your product is designed specifically with Fargate in mind.
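
Spelling that out (a rough back-of-the-envelope using only the numbers from the quoted AWS example, so treat the cents as approximate):

  \$1.23 \text{ buys } 5 \times \tfrac{10}{60}\,\text{h} \times 30 = 25 \text{ task-hours}
  \text{one task, full month} = 24 \times 30 = 720\,\text{h} \;\Rightarrow\; \$1.23 \times \tfrac{720}{25} \approx \$35
  \tfrac{\$35}{\$10\ \text{(DO single-node)}} \approx 3.5 = 350\%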

(And ok, not specifically Fargate, really any architecture that permits the workers to come and go, but mostly go.)

Fargate is a tool for a job, and in the suite of Amazon's offerings that includes Lambda, S3, and CloudFront, you really can make some cool stuff for cheap -- if you pay attention to what kind of resources your product needs, whether it _really needs_ those resources, and whether there's a cheaper offering that is a better fit, or at least could do the job just as well.

Don't get me wrong! But if I want to take, say, my Ruby on Rails app and host it somewhere without reinventing my whole stack from scratch, I definitely won't use Lambda because any Rails app just won't fit the model very well, and I won't use Fargate because it's [N] integer times as expensive as comparable offerings.

If Fargate were capable of auto-scaling tasks to zero workers during periods of inactivity while remaining mostly responsive, like Lambda "cold start" vs "warm start", then it would be a much easier sell for me.


Sure, I never had the impression it was cheaper for the same workload.

My take was that hobby projects probably don't run 24/7, and at some point you pay while not using them.


Yup, you're both right. I use my $10 kube cluster for 2 apps, one of which has 3 users and the other a dozen. So it's idle >99% of the time, but it needs to be available 24/7. And they're Rails apps, so it's not easy to "scale to 0", and even if it were, the startup times would probably be horrendous enough that it wouldn't be worth saving that fraction of $10...


Keep an eye on Knative and/or Project Riff. They _can_ scale pods to zero during periods of inactivity; that's one of the bigger selling points, and the startup times might not be as horrendous as you think. (YMMV depending on your runtime and your requirements.)

Riff is really a Function-as-a-Service library, built on Knative, but Knative can run any arbitrary workloads. Riff will soon have support for Ruby again (I promise, I'm working on it[1]) and Knative has this capability now[2]

So, there's a convergence coming and it's going to make a big difference in this cost/benefit trade-off. You need a cluster to run your Knative workloads on, and that cluster has to be persistent and always available to make the magic trick work.

But say you have a small to medium enterprise that runs 75 custom apps, and any 10-30 of them could be in use at any given time. You'd like to take advantage of office hours and turn things off when they're not in use, but you can't guarantee that nobody is going to need some app, some time after 5pm.

The cluster itself can autoscale up and down to a smaller or larger size, depending on how many balls are in the air. Your apps remain responsive even at nighttime, when your cluster footprint is a tiny fraction of what supports the company's daytime operations.

(By "I'm working on it" I really mean, they've made it as easy as they possibly can for new runtimes to be added with their buildpack v3 framework. I riffed on Ruby back in v0.0.7 and it was possible to reuse the work in v0.1.0, after they ported their stack over to Knative. Now in v0.2.0, the work I did on my Ruby invoker is not wholly reusable, but I'm hoping it will be a pretty smooth transition.)

And you didn't say you're using Ruby, but the point again is that anyone can add support for any language, and it's no big deal. I'm basically nobody here, and I can do it...

So that may sound a little crazy, but keep in mind that there are also K8s "virtual node" solutions in the works or already out there. So binpacking your containers into nodes could soon be a thing of the past, and as clunky as what I'm proposing sounds, it may not always be that way.

Sure, AWS could do it with Fargate too, and it might turn out to be just as good, but right now that's speculative. This is pretty much stuff that is all out there now. It's just parts that need to be put together.

[1]: https://github.com/projectriff/riff/issues/1093

[2]: https://github.com/knative/docs/tree/master/serving/samples/...


I started using Fargate this last weekend. From a platform (not cost) perspective, I have always liked GKE. Fargate is very AWS-ish; Kubernetes is pretty buried. The good news is I found a Fargate CLI tool written in Go on GitHub that is really nice. It's called "fargate" but I don't have the link handy.


Can it be considered cheap if the cluster is also small? The overhead of the control plane plus networking can be quite big, and can easily use more resources than what an m1.small has to offer.


How's this better than kops? https://github.com/kubernetes/kops


Kubeadm gives you a "core" or bare-bones setup, which makes it much more flexible in terms of add-ons, versions, etc.

It's a bit more work, which is the trade-off, but as of the last few versions kubeadm makes it really easy to spin up clusters -- roughly the sketch below.

Personally, I opt for complete flexibility.
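
A minimal sketch of the standard kubeadm workflow (assumes a container runtime and the kubeadm/kubelet/kubectl packages are already installed; Flannel is shown as just one pod-network option, and the manifest URL may have moved since):

  # On the master (control plane) node:
  sudo kubeadm init --pod-network-cidr=10.244.0.0/16

  # Make kubectl work for your regular user:
  mkdir -p $HOME/.kube
  sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

  # Install a pod network add-on:
  kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

  # On each worker node, run the join command that `kubeadm init` printed:
  sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>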


Also, kubeadm has a pretty sane set of default security settings, which some other k8s distributions do not.

Specifically, kops (by default) does not enable authentication on the kubelet, meaning any attacker who gets access to one container in your cluster is very likely to be able to compromise the whole thing.
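
For comparison, kubeadm's defaults amount to roughly the following KubeletConfiguration fragment (illustrative, not copied from any particular release): anonymous kubelet access off and authentication/authorization delegated to the API server.

  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  authentication:
    anonymous:
      enabled: false    # reject unauthenticated requests to the kubelet API
    webhook:
      enabled: true     # check bearer tokens against the API server
  authorization:
    mode: Webhook       # authorize kubelet API requests via SubjectAccessReview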


My project keights[1] can build a cheap two-node cluster in AWS, but it is not limited to small clusters. It does spin up an ELB, though, unlike this one.

1. https://github.com/cloudboss/keights


You can use HAProxy as a TCP load balancer for the apiserver instead of an ELB.
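
For example, something along these lines balances TCP 6443 across the masters (a sketch only; the IPs are placeholders):

  # /etc/haproxy/haproxy.cfg (fragment)
  frontend kube-apiserver
      bind *:6443
      mode tcp
      option tcplog
      default_backend kube-apiserver

  backend kube-apiserver
      mode tcp
      option tcp-check
      balance roundrobin
      server master1 10.0.1.10:6443 check
      server master2 10.0.1.11:6443 check
      server master3 10.0.1.12:6443 check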


Could you please comment on how you use HAProxy servers across multiple AWS availability zones? How do you configure and manage DNS for your HAProxy, etc.? My reason for asking is that I am super cheap and I want to avoid AWS ELB charges.


Put your HAProxy (or nginx-ingress, or whatever) into hostPort mode, or enable hostNetwork.

I have a comment somewhere in my history that explains in a bit more detail: https://news.ycombinator.com/item?id=18660503

That's it... hostPort mode, hostNetwork, and run it as a DaemonSet. Now all of your nodes are load balancers for ingress, and you don't need any ELBs. This is not a recommended configuration, because something has to point DNS at your nodes, and the nodes are really not designed to be permanent.

If you autoscale, or scale your cluster manually, your DNS needs to be updated to keep up with that. You may be able to find a way to automate that, but DNS has limitations related to TTL, such that if you are doing this too frequently, visitors to your cluster are likely to experience issues.

But if your nodes never come and go, this is a pretty good way to run a cluster and keep it on the cheap. If the traffic you want to balance is not HTTP then ingress won't help you (for now?), but the configuration for HAProxy will be similar.
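
Roughly, the DaemonSet part of that setup looks like this (a sketch only -- the namespace, labels, and image tag are placeholders, and a real nginx-ingress deployment needs RBAC and extra args on top of this):

  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: ingress-controller
    namespace: ingress
  spec:
    selector:
      matchLabels:
        app: ingress-controller
    template:
      metadata:
        labels:
          app: ingress-controller
      spec:
        hostNetwork: true        # bind straight to each node's network namespace
        containers:
        - name: controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0
          ports:
          - containerPort: 80
            hostPort: 80         # redundant with hostNetwork, but documents intent
          - containerPort: 443
            hostPort: 443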



