I've been using Kubernetes on Azure and GCE recently and it's absolutely wonderful.
I was able to set up an entire ecosystem from scratch in a week that scales well and can be managed in one place.
When I first looked at Kubernetes, the complicated part was setting it up on a cluster. If you use Kubernetes on GCE or Azure, you don't have to do that step; everything else is ready to go for you!
- Automatic scaling of your application
- Service discovery
- Secrets and config management
- Logging in one central dashboard
- Able to deploy various complicated, distributed pieces of software very easily using Helm (Jenkins, Kafka, Grafana + Prometheus)
- Able to add new nodes to the cluster easily
- Health checks and automatic restarts
- Able to deploy any container to your cluster in a really simple way (see the Deployment sketch after this list)
- Switch between cloud providers and still maintain the same workflow.
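Here's roughly what a minimal Deployment looks like (a sketch; the name, image, and port are made-up examples):

    # Hypothetical minimal Deployment; name, image, and port are illustrative.
    kubectl apply -f - <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: nginx:1.15
            ports:
            - containerPort: 80
    EOF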
I won't ever touch Ansible again; I really prefer the Kubernetes way of handling operations (it's like a live organism instead of something you apply changes to.)
Also, the entire argument that you probably don't need Kubernetes because your organization doesn't have tens or hundreds of nodes just doesn't make sense after using it.
Having a Kubernetes cluster with 3 nodes is 100% worth it, even for rather simple applications in my opinion. The benefits are just way too good.
I'm glad you mentioned Azure and GCE -- I started off with minikube locally, and things didn't really become magical until I started using a kops-deployed AWS EC2 cluster and saw the magic of elastic scaling, ease of deployment, and migratability.
PS - Could you share which books/websites/resources you used to get up to speed to where you are now?
Would you mind sharing links to any of the resources you used to get yourself going with kops on AWS? I'm in the midst of setting up my first cluster as we speak.
It is also nice that I could do this on Macs and Linux. I used my personal MBP for some of it and my office machine (a System76 running Ubuntu) for the rest.
The best advice would be to create, test, delete over and over and over. I used personal AWS funds to do this while in grad school, so the added pressure of making the best use of limited funds was great -- I would spin up dozens and dozens of machines, join, run, and tear down in short timeframes. As usual, choose a simple project. I chose multi-machine TensorFlow inference of a model, since that is an obvious candidate for elasticity.
Funny story - I once spun up a K8s cluster, tried to delete the machines manually on AWS, and witnessed how incredibly robust the cluster is (new nodes kept getting re-instantiated.) Lessons learned: 1. K8s is robust; 2. use kops to also tear down the cluster.
AWS has their own tutorial, but I didn't like it -- it was too focused on tangential details.
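For reference, the create/tear-down loop with kops looks roughly like this (a sketch; the cluster name, state-store bucket, and sizes are hypothetical):

    # kops keeps cluster state in an S3 bucket (hypothetical name here).
    export KOPS_STATE_STORE=s3://my-kops-state-bucket
    kops create cluster --name=k8s.example.com --zones=us-east-1a --node-count=3 --yes
    kops validate cluster
    # ...run the experiment...
    kops delete cluster --name=k8s.example.com --yes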
Not sure why you would throw Ansible into your pod.
Someone has to build your Docker images for Jenkins, Kafka, and Grafana + Prometheus.
I also like Kubernetes, but I don't think Kubernetes is necessary for small controlled environments.
Ansible just works, with the help of Galaxy. I set up Jenkins, Kafka, and Grafana + Prometheus faster with Ansible than with Kubernetes, with more/easier control, specifically when I take care of stuff inside those services that isn't yet accounted for in the Docker container.
Also, small companies just don't need the scaling.
EDIT: I should preface my little rant by saying that this post is one of the best I've seen at explaining the basic concepts of Kubernetes. But obviously I'm not an expert :)
> The Kubernetes API should now be available at http://localhost:8001, and the dashboard at this rather complicated URL. It used to be reachable at http://localhost:8001/ui, but this has been changed due to what I gather are security reasons.
I was playing around with GCE Hosted Kubernetes about a year ago, and things were pretty clear as far as I recall. I've read lots of positive things, and figured it's a good way to start.
Then I tried again recently, and I couldn't even get to the dashboard. Eventually, after several cryptic StackOverflow copy&pastes, I managed to load it (I don't even remember how), only for the session to expire after 10 minutes or so... It was utterly frustrating. As a result, I never actually got to the more interesting part I was planning to play with...
People say that there's a learning curve, and I get it. And also I'm not even trying to install Kubernetes on my own, but try to use a hosted service. I'm also pretty switched on when it comes to security and trying new things (or I'd like to think I am), but there are some things that feel like too much of an obstacle for me unfortunately.
GKE actively advises against using the Kubernetes Dashboard and recommends its own dashboards as a replacement.
On their website [1] they list the following:
> Caution: As of September 2017, Kubernetes Dashboard is deprecated in Kubernetes Engine. You can use the dashboards described on this page to monitor your clusters' performance, workloads, and resources.
That's good to know. I guess I was just heading in the wrong direction from the start. I think I did try the built-in dashboard in the GCE web console, but somehow didn't see any of the stuff I was expecting (maybe I just assumed the "real" dashboard was the one I was trying to load).
I gave up on my own Kubernetes writeup a while back. I just had a lot of trouble with basic networking configuration, logging, etc.
I've been at one shop with a large-scale DC/OS installation. You can run a k8s scheduler on DC/OS, but by default it uses Marathon. DC/OS has its own problems for sure, and both tools require a full-time team of at least 3 people (we had 8~10), and there are a lot of things that will probably need to be customized for your shop (which labels to use, scripts to set up your ingress/egress points in AWS, HAProxy configuration or marathon-lb configuration, which is just an HAProxy container/wrapper), but I think I still prefer Marathon.
I briefly played with Nomad and wish I had spent more time with it. I know people from at least one startup around where I live using it in production. It seems to be a bit more minimal and potentially more sane.
The thing I hate about all of these is that there is no 1-to-n scaling. For a simple project, I can't just set up one node with a minimal scheduler. DC/OS is going to cost you ~$120 a month for one non-redundant node:
I hear people talk about Minikube, but that's not something you can expand from one node to 100, right? You still have to build out a real k8s cluster at some point. All of these tools are just frontends around a scheduling and container engine (typically Docker and VMs) that track which containers are running where and track networking between nodes (and you often still have to choose and configure that networking layer: Weave Net, Flannel, etc.).
I know someone will probably mention Rancher, and I should probably look at it again, but last time I looked I felt it was all point-and-click GUI and not enough command-line flags (or at least not enough documented CLI) to really be used in an infrastructure-as-code fashion.
I feel like there's still a big missing piece of the Docker ecosystem: a really simple scheduler that can easily be stood up on new nodes to attach them to an existing cluster, and that has a simple way of handling public IPs for web apps/HAProxy containers. I know you can do this with K8s, DC/OS, etc., but there is a lot of prep work that has to be done first.
> All of these tools are just frontends around a scheduling and container engine ...
Well, that's a gross simplification of what Kubernetes is. (I don't know about Marathon.)
Kubernetes is a "choreographer" of cluster operations. At its core it's a consistent object store that contains a description of the state you want your cluster to be in. Various controllers monitor this store and try to "reconcile" the real world with the desired state. Operations include things like creating persistent volumes, setting up networking rules, and, of course, running applications. To say that it's a frontend for a container engine is a bit misleading, since Kubernetes can control so much more.
It's a nicely layered system — a "pod" describes the desired state of a single instance of an app, a "replica set" describes the desired state of a set of pods, a "deployment" describes the desired incremental rollout of a replica set, and so on. It's also a design that scales down to a single node (hence the popularity of Minikube as well as Docker for Mac, which includes Kubernetes), as well as up.
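You can watch that reconciliation happen yourself; a sketch, assuming a Deployment named `web` with pods labeled `app=web` already exists:

    # Change the desired state; the controllers converge the real state to it.
    kubectl scale deployment web --replicas=5
    # Delete the pods by hand; the ReplicaSet controller notices and replaces them.
    kubectl delete pod -l app=web
    kubectl get pods -l app=web --watch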
It's also a design that means that with a few exceptions, your configuration can target any Kubernetes cluster, not a specific cloud vendor. Without a single modification, I can deploy my app to the local Kubernetes on my laptop, or to our production cluster on Google Cloud. While migrating to Docker/Kubernetes took nearly a year, migrating away from GCP would take us probably less than a week (most of it involving pointing DNS to new load balancers, and moving persistent volumes over).
Beyond Google Kubernetes Engine and various other clouds (Azure is apparently very good), there's a bunch of tools now that do the heavy lifting of creating a cluster somewhere. Kubeadm and Kops are both popular.
It abstracts away all the work of setting up a close-to-production HA cluster, so you can jump quickly into developing & deploying your app. You can start with 1 node and ask GKE to scale to N when you want it.
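For example, with GKE that's a single command at creation time (a sketch; the cluster name and bounds are made up):

    # Start with one node; let GKE scale between 1 and 5 as load demands.
    gcloud container clusters create demo \
      --num-nodes=1 \
      --enable-autoscaling --min-nodes=1 --max-nodes=5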
It basically comes down to bootstrapping it like normal and then removing the 'node-role.kubernetes.io/master' taint so that things can run on the master node.
The one area in kubeadm that is still being worked on is bootstrapping a HA cluster, but if you don't mind having a single master node, you can easily bootstrap a cluster and then add nodes to it later.
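Concretely, untainting the master for single-node use is one command (this one is straight from the kubeadm docs):

    # Allow regular workloads to schedule on the master node.
    kubectl taint nodes --all node-role.kubernetes.io/master-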
I'm re-evaluating k8s again - tried it one or two years ago and hit some roadblocks for my use-case.
Kelsey's tutorial is a bit outdated (Oct 2, 2017, with k8s v1.8; v1.11 just got released). Here is a link to the official kubeadm guide, Creating a single master cluster with kubeadm:
Run it on a server/VM (after installing Docker and kubeadm, of course). Add a pod network add-on (Calico seems to work well, 2 commands to install), remove the mentioned taint, and optionally join more worker nodes (also a single kubeadm command). Every step is in the guide, just copy & paste. ;)
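The core of it boils down to a few commands (a sketch; the Calico manifest URL, token, and hash are placeholders, take the real ones from the guide and from kubeadm init's output):

    # On the master (Calico's default pod CIDR is 192.168.0.0/16):
    kubeadm init --pod-network-cidr=192.168.0.0/16
    kubectl apply -f <calico-manifest-url-from-the-guide>
    # On each worker, using the token printed by kubeadm init:
    kubeadm join <master-ip>:6443 --token <token> \
      --discovery-token-ca-cert-hash sha256:<hash>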
Note: this is no production-ready cluster (it has a single master), and you should have some basic understanding of k8s, which the OP provides. I also highly recommend digging around kubernetes.io/docs - good material there.
I started with kubeadm some days before the release of k8s v1.11, which made some stuff I wrote obsolete, oh well... :) I really like the new kubeadm phase stuff, though.
There is also an official guide for Creating Highly Available Clusters with kubeadm (it's updated for v1.11) which I just went through:
I opted for the "stacked masters" approach (run HA etcd right on the k8s master nodes), wrote some Ansible tasks to automate the boring stuff like copying configs/certs etc., and am currently (re-)exploring add-ons and advanced functionality (helm, network policies, ingress controller, ceph via helm, ...).
I have yet to see a good tutorial that shows an automated build of a kubernetes cluster. Yeah, you can use GKE, but that gets to be prohibitively expensive.
Mist.io provides a Cloudify blueprint that can be used to deploy and scale Kubernetes clusters on any supported cloud. It's using kubeadm under the hood.
I really want to like Kubernetes, but going beyond the basics seems to require a way higher understanding of systems engineering than I currently have. Yes I know you can create container networks and stateful pods with attached storage, but how is always seemingly beyond me. Network and storage in distributed computing is hard and Kubernetes seems to be a slightly more magical bullet than Docker Swarm alone.
Completely agree... Going beyond the basics is really hard. But as I understand it, that's because of inadequate knowledge of advanced network/storage virtualization concepts. Any help on how to get started with those? As a side note, I have decent knowledge of basic networking/storage.
Look up persistent volume claims and dynamic provisioning. If your cluster is properly configured, then you are one YAML file away from having that. By "properly configured" I mean having the correct cloud provider set; otherwise it cannot talk to the cloud APIs.
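That one YAML file is roughly the following (a sketch; the claim name, size, and reliance on a default storage class are assumptions):

    # A PVC; with dynamic provisioning, a matching PV is created automatically.
    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    EOF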
To be fair, stateful containers in general are a relatively new thing in K8s, and support has been improving.
Also, K8s is trying to do and abstract away a lot; it is more like a distributed operating system in itself. So it is more complicated than Swarm.
We use Kubernetes to spin up the application that I work on (in private-cloud and, at some point, hybrid- and public-cloud deployments). It's an end-user-installed tool. In deployment, about 1/4 of new installations fail because of some problem or another: either the GPU plugins for NVIDIA weren't loaded correctly, kube-dns won't start because docker0 isn't in a "trusted" zone on Red Hat (not being in trusted seems to cause iptables to subtly screw up container-to-container communication between the various private networks), or Helm just decides that it can't start.
Are we doing it wrong?
We’re using hyperkube and k8s 1.8 which came out around q4 of last year.
Almost all of these I can trace back to user error (i.e., we told folks to do X, they didn't, and stuff broke). We're now having to write a preflight checklist of sorts that the app runs through to make sure a bunch of stuff is "ok." That in itself becomes brittle in my experience, so I'm reluctant to do that.
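For what it's worth, the docker0/trusted-zone failure mode described above can usually be preempted on the host before installation; a sketch, assuming RHEL/CentOS with firewalld running:

    # Put the docker bridge in the trusted zone so iptables doesn't
    # interfere with container-to-container traffic.
    firewall-cmd --permanent --zone=trusted --change-interface=docker0
    firewall-cmd --reload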
It requires considerable operational experience and effort to run well on bare metal. Have you considered experimenting with a managed Kubernetes offering? Out of EKS, AKS, and GKE, I cannot recommend GKE highly enough.
Even if you can’t use it for production, it’s highly worth setting up a prototype environment on GKE to see what it gives you. I believe they now have GPU support, including support for preemptible GPUs (much cheaper).
Also GKE does a good job of staying up to date with releases. They are on 1.10, with support expected soon for 1.11. They _fully_ manage the upgrade of both the Kubernetes and etcd masters, and the worker nodes.
As mentioned in the linked blog post, Kubernetes is a fast moving project, and to use it you should plan and allocate significant resources in your team to keeping up with the new releases. There are a large number of fixes and improvements since 1.8 and I would look very seriously at both upgrading, and changing your processes to allow you to stay closer to the current release version.
The Kubernetes project does not, and has no current plans to, have a long term support release.
What is now happening around Kubernetes is that various companies and projects are coming up with "distributions" of their own. These package the underlying OS, the Kubernetes binaries, and a number of built-in system pods that take care of various things such as logging, scheduling, and monitoring. Red Hat's OpenShift is one of them, as is Weaveworks's. Maybe try having a look at one such solution?
There is also the Canonical Distribution of Kubernetes (CDK).
If you want to try a stripped-down version on your local machine, kubernetes-core can be installed into LXD containers via conjure-up with a snap of the fingers - nice for playing around.
Not sure if I got the scenario. Are you installing K8s on bare metal using a previously installed OS that you do not control?
If so, that will be harsh. I am not sure how to help you there. There are many things that could go wrong as you say.
If you do have control over the OS (you are providing an ISO or virtualization image) then it should mostly "just work". My company is doing a similar thing, only we ship boxes with everything pre-installed. There is another scenario for hybrid cloud, but even then they download a VMWare image.
Also, if you do have control over the OS: can you use CoreOS instead? It is very well suited to running K8s and has fewer things that can go wrong. Red Hat bought them anyway. With Ignition (or even old-fashioned cloud-init with a config drive), it is a no-touch deployment (you do have to generate and inject the certificates beforehand).
One thing that sounds weird is that you are "telling folks to do X". Can you avoid telling them anything and have it automated?
Actually, you should probably look into kubeadm or bootkube.
I tried to run some kind of from-scratch cluster myself; it's just way easier to maintain/update clusters that are built on kubeadm or bootkube.
Also, the stuff you describe does not look like user error.
BTW, you should not expose the deployment YAML/JSON to "users"/"developers".
You should have a CI that just runs `kubectl set image deployment/name pod-name=IMAGE` and keeps all deployment descriptors etc. in a separate source repository.
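A sketch of such a CI step (the deployment, container, and registry names are hypothetical):

    # Update the image, then wait for the rollout to converge or fail.
    kubectl set image deployment/myapp myapp=registry.example.com/myapp:$GIT_SHA
    kubectl rollout status deployment/myapp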
Isn't this exactly what you should expect from Kubernetes? If you can't afford a team of people working on it, you are supposed to give up and accept GKE lock-in.
I don't think GKE is lock-in when there are other offerings from other companies and the migration is quite simple. Also, if at some point you feel like it's time to host your own, you can do it. And I don't find GKE that expensive, especially paired with their cheap VMs, or preemptible VMs if you're fine with those.
We are working on a project with standard LXC containers [1] which tries to make orchestration and some of this stuff especially networking simpler.
We support provisioning servers, building overlay networks with VXLAN, BGP & WireGuard, distributed storage, and rolling out things like service discovery, load balancers, and HA.
It may be worth exploring for those struggling with some of the complexity around container deployments. At the minimum it will help you understand more about containers, networking and orchestration.
I wish I could find a tutorial for bare metal:
- how to set up cert creation for FQDNs with Cloudflare
- storage
- ingress supporting multiple IPs glued to different nodes (so if a service gets IP x, it gets routed through node z, which has that external IP)
I spent 6 months trying to do that and had no luck.
What I find the hardest to figure out is how to properly deploy databases in kubernetes. What kind of volumes should I use and how do I configure them for production instead of some hello world situation?
A database would normally be deployed as a stateful set, with the data stored in a volume created and maintained by a PVC (persistent volume claim). This mechanism ensures that if your pod dies and gets recreated on a different node, it sees the same data as before, with the same volume getting mounted into the new pod. It is still one of the weaker links in the Kubernetes chain, in my experience, as the PVC feature is implemented completely differently on each platform and is prone to bugs (see https://github.com/kubernetes/kubernetes/issues/60101, from Kubernetes version 1.7 on Azure, for example). Kelsey Hightower himself mentioned in a tweet that he prefers hosted solutions for persistence, and Kubernetes for stateless applications. Also, you should have a look at operators for managing applications on Kubernetes, such as this one for PostgreSQL: https://github.com/zalando-incubator/postgres-operator.
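A skeletal example of that pattern (a sketch, not production tuning; the PostgreSQL image and volume size are illustrative):

    # StatefulSet: each replica gets its own PVC via volumeClaimTemplates.
    kubectl apply -f - <<EOF
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: db
    spec:
      serviceName: db
      replicas: 1
      selector:
        matchLabels:
          app: db
      template:
        metadata:
          labels:
            app: db
        spec:
          containers:
          - name: postgres
            image: postgres:10
            volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
    EOF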
I think in every Docker ecosystem shop I've worked at, databases were always on RDS. For large storage, we bought NAS solutions in the local data center. Our DC/OS cluster was primarily for things that didn't need to keep any sort of state. We had a few containers that needed NFS access, and that was a lot of special work with the platform team, using labels to pin containers to certain nodes.
Some, like Cassandra, are easier: it is multi-master, so you can just use local storage, and if a node goes down it's not the end of the world.
There is an interesting solution for MySQL: https://vitess.io, which apparently is used by YouTube and Slack.
What is the typical use case for clustered applications? What size organization really needs it? I understand that a simple nginx static site can accept thousands of simultaneous connections per host. That sounds pretty huge to me. If you were to sell Kubernetes-based solutions, who would you consider selling to? What makes Kubernetes fundamentally superior to Docker Swarm?
If you are on AWS, try ECS first; it is much simpler and has the main features you need: HA, autoscaling, version control.
When you mature from using Docker to high-volume production, you should ditch containers altogether; they are good for prototyping and testing, but not for production loads and production security.
ECS has a Docker image size limit of 4GB, beyond which it crashes silently with no user-facing error message. It is also pretty buggy, with the agent sometimes not starting and you paying for nodes that aren't used.
Also, containers are better for production security than just having apps sitting side by side on the same disk.
ECS is simple to get started with, but it does not have the vast array of features necessary to run a service reliably at scale. It is possible to add many of these to ECS, but it's left as an exercise to the reader to wire up various APIs, CloudWatch, Lambda, etc.
I can provide more specific details if there’s questions.
I've recently moved from ECS to Kubernetes, and much prefer Kubernetes.
In general, ECS is fine when you have 1 or 2 services, but once you have tens or hundreds it gets pretty unusable.
Logging: While ECS ships everything to CloudWatch, actually accessing the logs is a nightmare and usually more effort than it's worth. It's very easy on k8s to get logs shipped from every container to Elasticsearch and browse them with Kibana.
Autoscaling: Possibly less of an issue now that Fargate's a thing - but Fargate's expensive. Kubernetes makes it easy to set up both node-level autoscaling and deployment-level autoscaling.
Ingress: With ECS you have to set up load balancers for every service individually - in k8s, once the controller's properly set up, every service defines its own ingress, and the only manual change you have to make is creating a DNS entry (and only if you can't use a wildcard). See the sketch at the end of this comment.
Metrics: Prometheus is very ingrained in kubernetes, and you get a wealth of information from every service almost "for free" once you've installed it.
And more: service meshes, secret control (through e.g. Vault), declarative definitions, provider-agnostic
Kubernetes has a huge startup cost and learning curve, but is incredibly powerful once you're up and running
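Here's the ingress sketch referenced above (host, service name, and port are hypothetical, and it assumes an ingress controller such as nginx-ingress is already running in the cluster):

    kubectl apply -f - <<EOF
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: myapp
    spec:
      rules:
      - host: myapp.example.com
        http:
          paths:
          - backend:
              serviceName: myapp
              servicePort: 80
    EOF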
Autoscaling: No need for Fargate. Create 'AWS::ApplicationAutoScaling::ScalableTarget' and 'AWS::ApplicationAutoScaling::ScalingPolicy' for your 'AWS::ECS::Service' resource.
Logging: Subscribe the Elasticsearch Lambda to the CloudWatch Logs group to have all logs in an Elasticsearch domain. Kibana is included in the 'AWS::Elasticsearch::Domain' resource.
Ingress: Yes, one ALB per autoscaled ECS Service; I don't see it as a problem.
Metrics, and more: not sure I see the benefits. CloudWatch has a myriad of metrics; you can get lost there :)
We've had good luck with ECS (though occasionally I'll find it doesn't shut old tasks down). Honestly though I feel like ECS will go the way of services like SimpleDB, Cloud Search, etc. (Never going away, but a secondary offering)
If you're just launching a standard app, pretty easy. However, anything complex requires writing a ton of AWS SDK code. (For example, we launch a spot instance with a given task definition, assign a given customer's work to it, then shut it down when the work is finished, and then on to the next customer)
We had a simple setup before (and have used spot instances everywhere it made sense). Without getting into the specifics of the use case, the ability to target a specific instance for specific backend processing tasks became necessary. Rest assured, it's not "too complex", only "as complex as necessary". Be careful not to respond with a rote "you're doing it wrong"-ism when discussing technical merits.
I couldn't get my nginx container to communicate with my app container in ECS, even though it all worked locally. And there are very few docs or troubleshooting resources.
Not the person that originally asked you, but I'm interested in hearing why ditching docker is needed for high volume production. First, what do you consider high volume? After that, what's wrong with docker within that definition?
Maxing out your CPU and networking. If you're there, you'll be running many hosts under load balancers, be it cloud instances or physical machines. Here you have no need for containers; web servers and backend apps are capable of multithreading.
We max out our compute about 3% of the time, and run at about 10% load the rest of the time. It's a similar story for memory consumption. And we run this multitenanted, where the max doesn't overlap for different tenants. And we can't move our customer data out of our data centre, so we can't use cloud elastic compute.
It's in this kind of a world that elastic compute provided by a cluster scheduler with efficient packing looks really appealing. And our jobs, while meaty in time and space, are stateless once the initial data is poured in, up until the results pop out - they are big functions, basically. Good match for a stateless container.
Excellent point :) So, first of all, you wouldn't have to worry about the following if you didn't use containers.
If your app, running in a Docker container, is compromised, e.g. via a PHP webshell, it might try to escape the container. What capabilities have you granted your containers? CAP_SYS_ADMIN, CAP_NET_ADMIN, or do you have no idea? This is just one example of escaping: https://www.twistlock.com/2017/12/27/escaping-docker-contain... And let's talk about namespaces, like the user namespace: is root inside the container also the root user on the host system?
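For context, this is what locking those down looks like on a plain docker run (a sketch; the image name and capability choice are illustrative):

    # Drop all capabilities, add back only what the app needs, and
    # prevent privilege escalation inside the container.
    docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE \
      --security-opt=no-new-privileges --user 1000:1000 \
      php-app:latest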
1. Every system has vulnerabilities. You can defend against them.
2. Any improperly configured system can be abused. In particular, the exploit you linked can be completely stopped in a litany of ways. https://news.ycombinator.com/item?id=16030107
Your argument going from "containers are unfit for production, you'll mature out of them one day" to "here's a small, preventable vulnerability" seems more like a security non sequitur than an actual argument against containerization.
Further, the claim that containers are not production-ready is empirically and literally negated by their use, in production, at the largest tech companies that have ever existed.
If the same compromised app is running natively then the entire system is now compromised. Capabilities and namespaces can be used with or without containers.