“Let’s use Kubernetes.” Now you have eight problems (pythonspeed.com)
719 points by signa11 on March 5, 2020 | 469 comments



The odd thing about having 20 years of experience (while simultaneously being wide-eyed about new tech), is that I now have enough confidence to read interesting posts (like any post on k8s) and not think "I HAVE to be doing this" – and rather think "good to know when I do need it."

Even for the highest scale app I've worked on (which was something like 20 requests per second, not silicon valley insane but more than average), we got by perfectly fine with 3 web servers behind a load balancer, hooked up to a hot-failover RDS instance. And we had 100% uptime in 3 years.

I feel things like Packer (allowing for deterministic construction of your server) and Terraform are a lot more necessary at any scale for generally good hygiene and disaster recovery.


I have, at various times in my career, tried to convince others that there is an awful, awful lot of stuff you can get done with a few copies of nginx.

The first “service mesh” I ever did was just nginx as a forward proxy on dev boxes, so we could reroute a few endpoints to new code for debugging purposes. And the first time I ever heard of Consul was in the context of automatically updating nginx upstreams for servers coming and going.

There is someone at work trying to finish up a large raft of work, and if I hadn’t had my wires crossed about a certain feature set being in nginx versus nginx Plus, I probably would have stopped the whole thing and suggested we just use nginx for it.

I think I have said this at work a few times but might have here as well: if nginx or haproxy could natively talk to Consul for upstream data, I’m not sure how much of this other stuff would have ever been necessary. And I kind of feel like Hashicorp missed a big opportunity there. Their DNS solution, while interesting, doesn’t compose well with other things, like putting a cache between your web server and the services.

I think we tried to use that DNS solution a while back and found that the DNS lookups were adding a few milliseconds to each call. Which might not sound like much except we have some endpoints that average 10ms. And with fanout, those milliseconds start to pile up.
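
(For anyone curious what the glue looks like: a rough sketch of the pattern, assuming Consul's HTTP health API on a local agent and an nginx include file for the upstream block - the service name and file paths here are made up.)

    # Sketch: poll Consul for healthy instances and rewrite an nginx upstream file.
    # Assumes a local Consul agent on :8500 and that nginx includes /etc/nginx/conf.d/upstream_api.conf.
    import json
    import subprocess
    import urllib.request

    SERVICE = "api"  # hypothetical service name

    def healthy_instances(service):
        # The ?passing flag makes Consul return only instances with passing health checks.
        url = f"http://127.0.0.1:8500/v1/health/service/{service}?passing"
        with urllib.request.urlopen(url) as resp:
            entries = json.load(resp)
        return [(e["Service"]["Address"] or e["Node"]["Address"], e["Service"]["Port"])
                for e in entries]

    def write_upstream(instances):
        servers = "\n".join(f"    server {addr}:{port};" for addr, port in instances)
        conf = f"upstream {SERVICE}_backend {{\n{servers}\n}}\n"
        with open(f"/etc/nginx/conf.d/upstream_{SERVICE}.conf", "w") as f:
            f.write(conf)
        subprocess.run(["nginx", "-s", "reload"], check=True)  # graceful reload

    write_upstream(healthy_instances(SERVICE))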


> I think I have said this at work a few times but might have here as well: if nginx or haproxy could natively talk to Consul for upstream data, I’m not sure how much of this other stuff would have ever been necessary.

To be fair, half of the API Gateways and edge router projects out there are basically nginx with a custom consul-like service bolted on.


You can get around the nginx Plus requirement by using a module like ngx_mruby to customize the backend selection. I haven't measured the latency, so it may not be suitable for your 10ms example.

Here's a post I wrote on that ~4 years ago that uses an in-process cache [1]. It'd be fairly easy to add an endpoint to update it and pull data from Consul. I agree with you, it's a missed opportunity - there are alternatives, but being able to rely on a battle-tested server like nginx makes a difference.

[1] http://hokstadconsulting.com/nginx/mruby-virtualhosts


As a fan of nginx, I really liked your comment. In sleuthing after reading I came across this:

https://learn.hashicorp.com/consul/integrations/nginx-consul...

It appears that if the consul client has the right permissions it can restart the nginx service after editing the configuration file. It uses the consul templating engine to generate an nginx config file.

I haven't tried it myself but it looks promising.


> if nginx or haproxy could natively talk to Consul for upstream data, I’m not sure how much of this other stuff would have ever been necessary

Airbnb's Smartstack works well for this. It's not built into nginx as a module, but I think it's more composable this way.

Blog post: https://medium.com/airbnb-engineering/smartstack-service-dis...

The two main components are nerve (health checking of services + writing a "I'm here and healthy" znode into zookeeper, https://github.com/airbnb/nerve) and synapse (subscribes to subtrees in zookeeper, updates nginx/haproxy/whatever configs with backend changes, and gracefully restarts the proxy, https://github.com/airbnb/synapse).

It's fairly pluggable too if you don't want to use haproxy/nginx.
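
The core pattern is small enough to sketch; here's a rough illustration with the kazoo ZooKeeper client - the znode path and the regenerate step are made up, and nerve/synapse do considerably more in practice.

    # Sketch: watch a ZooKeeper subtree of healthy backends and regenerate the proxy config.
    # Assumes nerve-style registration under /services/myapp and a local ZooKeeper.
    import time
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()

    def regenerate_proxy(backends):
        # Hypothetical: render an haproxy/nginx config from the backend list and reload it.
        print("backends changed:", backends)

    @zk.ChildrenWatch("/services/myapp")
    def on_children_change(children):
        backends = []
        for child in children:
            data, _stat = zk.get(f"/services/myapp/{child}")
            backends.append(data.decode())  # nerve writes host/port info into the znode
        regenerate_proxy(backends)

    # keep the process alive so the watch keeps firing
    while True:
        time.sleep(60)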


Then you have a dependency on zookeeper when you already have consul... it seems like consul template + nginx or haproxy is the solution hashicorp went with.


I totally agree, especially about being able to serve content out of a cache instead of files. It would simplify some of my configuration especially for static sites that point to a CDN.

I like what Caddy is doing, exposing their entire configuration through a REST interface.


You should check out fabio (https://github.com/fabiolb/fabio), it is really awesome.

I 100% agree with you. I've been using Consul for four years now to run 100s of services in 1000s of VMs across datacenters distributed globally, and not once have I seen the need for anything else...

Maybe I just don't have the scale to find service mesh or kubernetes interesting. Nomad however is something I am willing to give a go for stateless workflows that I would usually provision a VM running a single docker container for.


> I have, at various times in my career, tried to convince others that there is an awful, awful lot of stuff you can get done with a few copies of nginx.

From a load point of view, yes. Absolutely. No doubt.

From a speed-of-action point of view, no way. If your k8s cluster is properly managed, you can let developers do most of the operations work themselves, confined to their namespaces, touching only the kinds of resources that you tell them to touch.


I personally would advise against using DNS for service discovery, it wasn't designed for that.

The few milliseconds that you see, though, are most likely due to your local machine not having DNS caching configured, which is quite common on Linux. Because of that, every connection triggers a request to the DNS server. You can install unbound, for example, to do the caching; nscd or sssd can also be configured to do some caching.


> I personally would advise against using DNS for service discovery, it wasn't designed for that.

It was designed for that, but the SRV record requires protocols and their clients to explicitly support it. You can argue that this is an unreasonable design choice, but load balancers like HAProxy do support SRV records.
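
For anyone who hasn't used SRV records: they carry host and port (plus priority/weight), which is exactly what a plain A record lacks. A quick sketch with dnspython 2.x (the record name is made up):

    # Sketch: look up an SRV record to discover host *and* port for a service.
    # "_api._tcp.example.internal" is a made-up name for illustration.
    import dns.resolver

    answers = dns.resolver.resolve("_api._tcp.example.internal", "SRV")
    for rr in sorted(answers, key=lambda r: (r.priority, -r.weight)):
        print(f"{rr.target.to_text().rstrip('.')}:{rr.port} "
              f"(priority={rr.priority}, weight={rr.weight})")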


Why is DNS not used for service discovery? The internet as a whole uses it for service discovery currently.


Internet as a whole uses it to provide human friendly names.

I'm saying it is not a good idea to use DNS for service discovery. There's a way of using it correctly, but it requires software to do the name resolution with service discovery in mind, and you're guaranteed that the majority of your software doesn't work that way.

Why shouldn't you use DNS? Because when you communicate over TCP/IP, an address is really the only thing you actually need.

If you use DNS for discovery you will probably set a low TTL on the records, because you want to update them quickly. This means that every connection you make checks the DNS server, putting extra load on the DNS server and adding latency when connecting.

On failure of a DNS server, even if you set a large TTL, you will see immediate failures on your nodes, because of how DNS caching works: different clients made the DNS request at different times, so the records will expire at different times. And if you did not configure a local DNS cache on your hosts (most people don't), then you won't even cache the response, every connection request will go to a DNS server, and upon a failure everything is immediately down.

Compare this to having a service that edits a configuration (let's say HAProxy's) and populates it with IP addresses. If the source that provides the information goes down, you simply won't get updates during that time, but HAProxy will continue forwarding requests to the IPs (and if you use IPs instead of hostnames, you also won't be affected by DNS outages).

Now there are exceptions to this. Certain software (mainly load balancers such as pgbouncer; I think HAProxy also added some dynamic name resolution) uses DNS with those limitations in mind: it queries the DNS service on startup to get the IPs and then periodically queries it for changes. If there's a change it is applied, and if the DNS service is down it keeps the old values.

Since they don't throw away the IPs when a record expires, you don't have these kinds of issues. Having said that, the majority of software will use the system resolver the way DNS was designed to work and will have these issues, and if you use DNS for service discovery, you, or someone in your company, will use it with such software and you'll hit the issues described above.
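
That "resolve at startup, refresh periodically, keep the old answers on failure" behaviour is simple enough to sketch - this is illustrative only, using the stock resolver:

    # Sketch: pgbouncer/HAProxy-style name handling - resolve on a timer,
    # keep the last known-good IPs if the DNS server is unreachable.
    import socket
    import threading

    class BackendSet:
        def __init__(self, hostname, refresh_seconds=30):
            self.hostname = hostname
            self.ips = []                      # last known-good addresses
            self._refresh(refresh_seconds)

        def _refresh(self, interval):
            try:
                infos = socket.getaddrinfo(self.hostname, None, proto=socket.IPPROTO_TCP)
                self.ips = sorted({info[4][0] for info in infos})
            except OSError:
                pass                           # DNS down: keep serving the old IPs
            threading.Timer(interval, self._refresh, args=(interval,)).start()

    backends = BackendSet("api.service.consul")
    # connections always use backends.ips, never a per-request DNS lookup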


>Compare this to have a service that edits (let say an HAProxy) configuration and populates it with IP addresses.

Just edit the hosts file? If you have access to machines that run your code and can edit configuration, and also don't want the downsides of resolvers (pull-based instead of push-based updates, TTLs), DNS still seems like a better idea than some new stacks, plus you can push hosts files easily via ssh/ansible/basically any configuration management software

EDIT: The only issue I see with DNS as service discovery is that you can't specify ports. But usually software should use standard ports for their uses and that's never been a problem in my experience.
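
It really is very little work. A rough sketch of generating a hosts fragment and pushing it over ssh (the hosts and mappings are made up; in practice a config management tool would manage this block for you):

    # Sketch: render service name -> IP mappings and push them to each box over ssh.
    # SERVICES and HOSTS are made-up examples.
    import subprocess

    SERVICES = {"api.internal": "10.0.1.10", "db.internal": "10.0.2.10"}
    HOSTS = ["web1.example.com", "web2.example.com", "web3.example.com"]

    fragment = "".join(f"{ip} {name}\n" for name, ip in SERVICES.items())
    with open("hosts.fragment", "w") as f:
        f.write(fragment)

    for host in HOSTS:
        subprocess.run(["scp", "hosts.fragment", f"{host}:/tmp/hosts.fragment"], check=True)
        # Naive append for illustration; a real setup would replace a managed block instead.
        subprocess.run(["ssh", host,
                        "sudo sh -c 'cat /tmp/hosts.fragment >> /etc/hosts'"], check=True)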


You can specify ports using SRV resource records.


You could, but there's no integration for that that I know of, so it'd be a bit of work to get working, which is why I didn't include it.



It's how mdns works with Avahi/Bonjour/Zeroconf


This is interesting! Do you have some material on load testing the DNS servers and seeing their breaking point? I've heard as much from other people but never experienced it in practice even using Consul with 0 TTL everywhere.

Perhaps the network infrastructure team always scaled it correctly behind the scenes but they never once complained about the amount of DNS queries.


DNS is fairly lightweight, and if you have one locally on premises it might be less noticeable, especially if latency is not critical (in previous places I worked that was the setup; we still had a local cache on every host, and I would encourage doing that, as it increases resiliency). If latency is critical, not having a cache adds an extra round trip on every connection initiated.

If you have hosts on a public cloud and use a DNS server that is also shared with others, the latency typically might be bigger, and with a high number of requests you might also start seeing SERVFAIL on a large number of requests.

I can't find the forum post anymore, but people who had applications that were opening a large number of connections (bad design of the app imo, but still) saw huge performance degradation when they moved from c4 to c5 instances. It turned out that this was because of the move from Xen to Nitro (based on KVM).

A side effect of using Xen was that the VM host was actually caching DNS requests by itself, which all guests benefited from. On KVM, all DNS requests were going directly to the DNS server.


> I think we tried to use that DNS solution a while back and found that the DNS lookups were adding a few milliseconds to each call. Which might not sound like much except we have some endpoints that average 10ms. And with fanout, those milliseconds start to pile up.

Don't resolve DNS inline; rather, on every DNS update, resolve it and insert the new IP addresses.


Correct me if I'm wrong, but I believe Consul, lacking a mesh of its own, is leveraging the early 1990's era trick of using round robin DNS to split load over available servers.

Caching those values for very long subverts the point of the feature.


By way of correction: Consul does not simply "round robin" DNS requests unless you configure it in a particularly naive manner.

Prepared queries [1] and network tomography (which comes from the Serf underpinnings of the non-server agents) [2] allow for a much wider range of topologies just using DNS without requiring proxies (assuming well behaved client software, which is not a given by any stretch).

Furthermore, Consul _does_ have a mesh as of around 2 years ago - [3].

You are correct though that long caches subvert much of the benefit.

[1]: https://www.consul.io/api/query.html

[2]: https://www.consul.io/docs/internals/coordinates.html

[3]: https://www.consul.io/docs/connect/index.html


Not really - resolve all backend servers to IPs and list all of them as the nginx backends. When a backend server is removed, update nginx backends.

Round-robin balancing using DNS towards a small cluster is silly - you know when any new instance is added to the pool or removed from a pool, so why not push that load balancing onto the load balancer which in your case is nginx?


You're talking about layering the thing on top of Consul that I already identified in my top level comment.

Consul itself advertises DNS resolution for service discovery.


Maybe I was not clear.

Whatever technology you use to register the active backends in DNS, rather than doing a name => IP address lookup per request, you can resolve all those name => IP address mappings when a service is brought up or taken down and push the resolved map as a set of backends into the nginx config, thus removing the need to query DNS per request.


Kubernetes has complexity for a reason. It's trying to solve complex problems in a standardized and mature manner.

If you don't need those problems solved then it's not going to benefit you a whole lot.

Of course, if you are using docker already and are following best practices with containers, then converting to Kubernetes really isn't that hard. So if you do end up needing more problems solved than you are willing to tackle on your own, then switching over is going to be on the table.

The way I think about it is: if you are struggling to deploy and manage the life cycle of your applications... failovers, rolling updates... and you think you need some sort of session management like supervisord or something like that to manage a cloud of processes, and you find yourself trying to install and manage applications and services developed by third parties...

Then probably looking at Kubernetes is a good idea. Let K8s be your session manager, etc.


I would qualify that a little more. If you are using docker and deploying to a cloud environment already, then moving to a cloud-managed kubernetes cluster really isn't that hard.

I've seen too many full-time employees eaten up by underestimating what it takes to deploy and maintain a kubernetes cluster. Their time would have been far better spent on other things.


That story is getting a lot easier but historically was awful


What is a "mature manner"?


For example by covering all of the NFRs.

You don't always find open source programs that have dedicated so much effort to security, monitoring, governance etc. And doing so in a very professional and methodical way.


There's always more than one way to do things, and it's good to be aware of the trade-offs that different solutions provide. I've worked with systems like you describe in the past, and in my experience you always end up needing more complexity than you might think. First you need to learn Packer, or Terraform, or Salt, or Ansible - how do you pick one? How do you track changes to server configurations and manage server access? How do you do a rolling deploy of a new version - custom SSH scripts, or Fabric/Capistrano, or...? What about rolling back, or doing canary deployments, or...? How do you ensure that dev and CI environments are similar to production so that you don't run into errors from missing/incompatible C dependencies when you deploy? And so on.

K8s for us provides a nice, well-documented abstraction over these problems. For sure, there was definitely a learning curve and non-trivial setup time. Could we have done everything without it? Perhaps. But it has had its benefits - for example, being able to spin up new isolated testing environments within a few minutes with just a few lines of code.


> First you need to learn Packer, or Terraform, or Salt, or Ansible - how do you pick one?

You don't. These are complementary tools.

Packer builds images. Salt, Ansible, Puppet or Chef _could_ be used as part of this process, but so can shell scripts (and given the immutability of images in modern workflows, they are the best option).

Terraform can be used to deploy images as virtual machines, along with the supporting resources in the target deployment environment.
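
A rough sketch of how they complement each other, just driving both CLIs from a script (the template and config file names are placeholders; the flags are the standard ones):

    # Sketch: bake an image with Packer, then roll it out with Terraform.
    # Assumes packer/terraform are on PATH and web.pkr.hcl / main.tf already exist.
    import json
    import subprocess

    # 1. Build the immutable image (shell provisioners inside the template do the setup).
    subprocess.run(["packer", "build", "web.pkr.hcl"], check=True)

    # 2. Deploy it: Terraform picks up the new image via a data source or variable.
    subprocess.run(["terraform", "init", "-input=false"], check=True)
    subprocess.run(["terraform", "apply", "-auto-approve", "-input=false"], check=True)

    # 3. Inspect outputs (e.g. the load balancer address) for smoke tests.
    out = subprocess.run(["terraform", "output", "-json"],
                         check=True, capture_output=True, text=True)
    print(json.loads(out.stdout))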


> Salt, Ansible, Puppet or Chef _could_ be used as part of this process, but so can shell scripts

I don't see the point of your post, and frankly sounds like nitpicking.

Ansible is a tool designed to execute scripts remotely through ssh on a collection of servers, and makes the job of writing those scripts trivially easy by a) offering a DSL to write those scripts as a workflow of idempotent operations, and b) offer a myriad of predefined tasks that you can simply add to your scripts and reuse.

Sure, you can write shell scripts to do the same thing. But that's a far lower-level solution to a common problem, and one that is far harder and requires far more man-hours to implement and maintain.

With Ansible you only need to write a list of servers, ensure you can ssh into them, and write a high-level description of your workflow as idempotent tasks. It takes you literally a couple of minutes to pull this off. How much time would you take to do the same with your shell scripts?


As he mentioned, immutable images make those type of tools largely moot.


Yes. Three web servers and a load balancer is fine. Three web servers and a load balancer, repeated 1,000 times across the enterprise in idiosyncratic ways and from scratch each time, is less fine. That’s where Kubernetes-shaped solutions (like Mesos that came before it) become appropriate.

You can get a lot done with a sailboat. For certain kinds of problems you might genuinely need an aircraft carrier. But then you’d better have a navy. Don’t just wander onto the bridge and start pressing buttons.


You're right.

However, a lot of new (or just bad) devs miss the whole Keep It Simple Stupid concept and think that they NEED Kubernetes-shaped solutions in order to "do it the right way".

Many times three web servers and a load balancer are exactly what you need.


And then what about when you need to add monitoring, logging, APM, tracing, application metrics etc.

Suddenly you have gone from 3 instances to 20.


Adding an entirely new instance is not the only way to accomplish each of those things. A lot of those things can be treated just like applications. You don't need a whole new computer to run Outlook, another computer to run Sublime Text, another computer to run Chrome, etc etc.

All of that is irrelevant to my main point though. It's never one size fits all and then all your problems are solved.

You are far better off actually assessing your needs and picking the right solution instead of relying on solutions that "worked for bigger companies so they'll work for me" without really giving it a lot of thought if you need to go that far.


> Adding an entirely new instance is not the only way to accomplish each of those things. A lot of those things can be treated just like applications.

That's what containers are. Containers are applications, packaged to be easily deployable and ran as contained processes. That's it.

Kubernetes is just a tool to run containers in a cluster of COTS hardware/VMs.

I've said it once and will say it again: the testament of Kubernetes is that it simplifies the problem of deploying and managing applications in clusters of heterogeneous hardware communicating through software-defined networks so much that it enables clueless people to form mental models of how the system operates that are so simple that they actually believe the problem is trivial to solve.


Again .... that might be the best solution for your company, or it might be introducing complexity where it's not actually needed.

It all depends on the situation and needs of whatever problem you are trying to solve.


Almost all of those aren't single binaries like Chrome.

They often have their own databases, search engines, services etc to deploy along with it. And necessitate multiple instances for scalability and redundancy.


They can be shared across a company. Once one team handles those, all other teams can just use them.


I personally went from 4 servers to 5, self hosting ELK and it works great.


Aren't those rather trivial to do if you didn't make the mistake of choosing a microservice architecture?


I get the point, but I’ve also seen three web servers and a load balancer go terribly wrong at a number of places as well. K8s provides a lot of portability that you would need a disciplined and experienced engineer to match with raw deployments.


> Many times three web servers and a load balancer are exactly what you need.

Maybe, just maybe, they want k8s not to create value but to develop/enrich resumes - in order to signal that they are smart and can do complex stuff.


I think anyone considering these wild setups should read about how Stack Overflow is hosted on a couple of IIS servers. It’s a sobering reminder that you often don’t need the new cool.


Joel, if anyone, has always been super pragmatic and very realistic.

Not to be misunderstood: for FogBugz they wrote a compiler/transpiler for ASP and PHP because the product had to run on customers' servers - because "clients should not leave their private data at another company".

Google it, great read.


Tried to google for it, couldn't find anything - could you provide a link, please?


For the details on that transpiler: https://www.joelonsoftware.com/2005/03/30/the-road-to-fogbug...

I would recommend going through all of Joel Spolsky’s posts between 2000 and 2010, there are plenty of absolute diamonds. Part of why StackOverflow was so successful was because Joel had built a big audience of geeks and entrepreneurs with his excellent blog posts (he was the Excel PM during the creation of VBA and had plenty of accrued wisdom to share), so they adopted SO almost instantaneously when he and Jeff Atwood built it.


"“don’t you mean translator?“

Let me explain.

In computer science jargon a translator IS a compiler. It’s exactly the same thing. Those are synonyms."

Every time someone says "transpiler", god kills a kitten. Please, think of the kittens.


> I think anyone considering these wild setups should read about how stackowerflow is hosted on a couple of IIS servers.

Apparently in 2019 Stack Overflow was hosted on at least 25 servers, including 4 servers dedicated to running haproxy.

https://meta.stackexchange.com/questions/10369/which-tools-a...


Nothing much happens if SO goes down, they are not doing a ton of business transactions.


That's right. As opposed to stock exchange software, which runs on complex micro-services cloud k8s thing-as-a-thing virtualized rotozooming engines. Wait, no, it's literally just good-old n-tier architecture with one big server process and some database backend.

Pet food delivery startups use k8s to manage their MEAN stack. Meanwhile grown-ups still have "monoliths" connected to something like Oracle, DB2 or MS SQL server, because that's obviously the most reliable setup.

The cloud/k8s stuff is an ad-hoc wannabe mainframe built on shaky foundations.


> Meanwhile grown-ups still have "monoliths" connected to something like Oracle, DB2 or MS SQL server, because that's obviously the most reliable setup.

More often than not they just crystallized their 90s knowledge and pretended there aren't better tools for the job, because it would take some work to adopt them and no one notices it in their work anyway.

The "Oracle" keyword is a telltale sign.


Nothing much? The world stops producing software:)


He's got a point though. Ironically, the world's most highly scaled software is often fairly unimportant. Think Facebook - people would get annoyed if it went down for a day, but it'd soon be forgotten. Your banks are built with less scalable software but are much more critical.


A typical PHP application that does a bit of database updating per request, gets some new data from the DB and templates it should handle 20 requests per second on a single $20/month VM. And in my experience from the last years, VMs have uptime >99.99% these days.

What made you settle on a multi-machine setup instead? Was it to reach higher uptime or were you processing very heavy computations per request?


Higher uptime and more computation, although this was mostly C# so very efficiently run code. It was an e-commerce site doing +100MM a year.

There was little to no room for error. I once introduced a bug in a commit that, in less than an hour, cost us $40,000. So it wasn't about performance.

Also this was 9 years ago. So adjust for performance characteristics from back then.


100 000 000$ a year with only 20 requests a second? That's some crazy revenue per request, 100 000 000 / (365 * 24 * 60 * 60) = 3.17$ per request!

What were you selling?


Insurance policies :)

Good point, actually the 100MM may have included brick and mortar.


Forgot to multiply by 20, it's actually around $0.15 per request. Still high but certainly not as crazy.


I worked on a site 20 years ago that got even better revenue per request (though with fewer requests).

It did analytics on bond deals. Cost $1k/month for an account. Minimum 3 accounts. Median logins, ~1/month/account.

On the other hand people would login because they were about to trade $10-100 million of bonds. So knowing what the price should be really, really mattered.

Wall St can be a funny place.


That's quite interesting! That's why I asked, I wasn't exactly doubting the figure, just curious about the market that allowed that kind of customer targeting.


It should be (365 * 24 * 60 * 60 * 20) which brings it to $0.15/req if I did my mental math right. Still a high amount of course.


Oh thanks, I actually did the math right the first time, kept only the result and then when I was about to hit send I thought it was too good to be true, thus I did it again and got 3.15$, which was even crazier, but couldn't find why my math was wrong.


A single server isn't redundant. 3 behind a load balancer, where each is sized to handle 50% of the volume lets you take systems offline for maintenance without incurring downtime.

Heck, Raspberry Pis have more horsepower than the webservers in the cluster I ran around Y2k.


For performance context: nginx on a $5 DO droplet does 600 requests per second on a static file.

Serving static files with Elixir/Phoenix has a performance of 300 requests per second.

Python+gunicorn serves about 100 requests per second of JSON from postgres data.
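
Numbers like these are easy to sanity-check yourself. A crude sketch of a concurrent load test (the URL is a placeholder; a real benchmark would use wrk or similar):

    # Sketch: hammer an endpoint with N concurrent workers and report requests/second.
    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URL = "http://127.0.0.1:8080/"   # placeholder target
    WORKERS, REQUESTS = 20, 2000

    def fetch(_):
        with urllib.request.urlopen(URL) as resp:
            resp.read()

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        list(pool.map(fetch, range(REQUESTS)))
    elapsed = time.monotonic() - start
    print(f"{REQUESTS / elapsed:.0f} requests/second over {elapsed:.1f}s")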


(outside of GP's reply) Generically, life is messy and unpredictable, never put all your eggs in one basket. Your cloud server is sitting on a physical hyp which will need maintenance or go down, or even something in your VM goes wrong or needs maintenance. Using a basic N+1 architecture allows for A to go down and B to keep running while you work on A - whether that's DNS, HTTP or SQL etc.


If your physical hyp dies, how do you redirect your traffic to a different one?


Replace "your" with "the" - the hyp can be run by your provider (Linode, DO, Vulture, AWS, GKE, whoever). Most cloud providers have virtual/shared/managed load balancers to rent time on as well, such that you don't have to maintain N+1 of those (let them do it). You could even use basic round-robin DNS, it's a possible choice just not generally suggested.


For the load balancer solution, a lot more is needed than to just rent a load balancer.

Example: What do you expect to happen when the server with your DB goes down? Just send the next UPDATE/INSERT/DELETE to DBserver2, which is replicated from DBserver1? When DBserver1 comes back, how does it know that it is now outdated and has to sync from DBserver2? How does the load balancer know whether DBserver1 is synced again and ready to take requests?

Even if you set up all the moving parts of your system in a way that handles random machine outages: now the load balancer is your single point of failure. What do you do if it goes down?


Respectfully, I am not designing a full architecture here in HN comments and have not presented a full HA solution for you to pick apart. Your leading questions seemed basic, you received basic answers - going down the rabbit hole like this is just out of left field for this comment thread.


Load balancing--and yes that can become a source of failure too.


Two load balancers using either keepalived or bgp anycast.


The odd thing about having 10 years of experience as a consultant is that you know when to write "Kubernetes" into a project proposal, even though everyone agrees that it'll be a sub-optimal solution.

But both you and their tech lead want to be able to write "used Kubernetes" on your CV in the future, plus future-oriented initiatives inside your contact's company tend to get more budget allocated to them. So it's a sound decision for everyone and for the health of the project to just go with whatever tech is fancy enough, but won't get into the way too badly.

Enter Kubernetes, the fashionable Docker upgrade that you won't regret too badly ;)


I worked on a transacted 20-60k messages/s system and am not sure K8S wouldn't be a hindrance there... Imagine writing Kafka using K8S and microservices.


I don't know about "a lot more necessary". The images are one part of the equation, especially to meet various regulations. There is a ton to running a large-scale service, especially if you are the service that the people posting about how wicked smart they are at k8s run their services on. Google found that out yesterday when they said "oh hey, people expect support, maybe we should charge". That is not new for grown-ups.

The cloud existed before k8s, and k8s's creator has a far less mature cloud than AWS or Azure.

But this thread has convinced me of one thing. It's time to re-cloak and never post again, because even though the community is a cut above some others, at the end of the day it's still a bunch of marks, and if you know the inside it is hard to bite your lip.


The most important video I've ever watched for my career and sanity: https://www.youtube.com/watch?v=bzkRVzciAZg


Who is giving you 100% uptime? All major providers (AWS, GCP, Azure, etc.) have had outages in the past 3 years. And that level of infrastructural failure doesn't care about whether or not you're using k8s.


EKS (AWS Kubernetes) has the control plane across 3 AZs.

It is very rare to have a complete region outage so it is pretty close to 100% uptime.


Not OP but you can achieve higher availability through means like redundancy, availability zones, multi-region deployments, etc.


> Even for the highest scale app I've worked on (which was something like 20 requests per second,

Kubernetes is not for you. 5kQPS times a hundred or more services and Kubernetes fits the bill.

> And we had 100% uptime in 3 years.

Not a single request failed in that time serving at 20 QPS? I'm a little suspicious.

Regardless, if you were handling 10 or 100 times this volume to a single service, you'd want additional systems in place to assure hitless deploys.


> Not a single request failed in that time serving at 20 QPS? I'm a little suspicious.

Things that aren't monitored are also things that don't fail.


Same. I like trying out new things, so I have a feel for what they're good for. I tried setting up Kubernetes for my home services and pretty quickly got to "nope!" As the article says, it surely makes sense at Google's scale. But it has a large cognitive and operational burden. Too large, I'd say, for most one-team projects.


I'm in a similar boat, only my eyes are wide, glazed over, and I'm lost in the word salad...which only seems to be getting worse.


These kinds of posts always focus on the complexity of running k8s, the large amount of concepts it has, the lack of a need to scale, and that there is a "wide variety of tools" that can replace it, but the advice never seems to become more concrete.

We are running a relatively small system on k8s. The cluster contains just a few nodes, a couple of which are serving web traffic and a variable number of others that are running background workers. The number of background workers is scaled up based on the amount of work to be done, then scaled down once no longer necessary. Some cronjobs trigger every once in a while.

It runs on GKE.

All of this could run on anything that runs containers, and the scaling could probably be replaced by a single beefy server. In fact, we can run all of this on a single developer machine if there is no load.

The following k8s concepts are currently visible to us developers: Pod, Deployment, Job, CronJob, Service, Ingress, ConfigMap, Secret. The hardest one to understand is Ingress, because it is mapped to a GCE load balancer. All the rest is predictable and easy to grasp. I know k8s is a monster to run, but none of us have to deal with that part at all.

Running on GKE gives us the following things, in addition to just running it all, without any effort on our part: centralized logging, centralized monitoring with alerts, rolling deployments with easy rollbacks, automatic VM scaling, automatic VM upgrades.

How would we replace GKE in this equation? what would we have to give up? What new tools and concepts would we need to learn? How much of those would be vendor-specific?

If anyone has a solution that is actually simpler and just as easy to set up, I'm very much interested.


I'm in the same camp. I think a lot of these anti-k8s articles are written by software developers who haven't really been exposed to the world of SRE and mostly think in terms of web servers.

A few years ago I joined a startup where everything (including the db) was running on one, not-backed-up, non-reproducible, VM. In the process of "productionizing" I ran into a lot of open questions: How do we handle deploys with potentially updated system dependencies? Where should we store secrets (not the repo)? How do we manage/deploy cronjobs? How do internal services communicate? All things a dedicated SRE team managed in my previous role.

GKE offered a solution to each of those problems while allowing me to still focus on application development. There's definitely been some growing pains (prematurely trying to run our infra on ephemeral nodes) but for the most part, it's provided a solid foundation without much effort.


Exactly, all these articles seem to come from operational novices, who think in terms of 1-2 click solutions. K8s is not a 1-2 click solution, and clearly isn't designed to be; it's solving particular tough operational problems that if you don't know exist in the first place you won't really be able to evaluate these kinds of things properly.

If a group literally doesn't have the need to answer questions like the ones you posed, then OK, don't bother with these tools. But that's all that needs to be said - no need for a new article every week on it.


> it's solving particular tough operational problems that if you don't know exist in the first place

They probably don't exist for the majority of people using it. We are using k8s for when we need to scale, but at the moment we have a handful of customers and it isn't changing quickly any time soon.


They basically exist for everyone.

As soon as you go down the road of actually doing infrastructure-as-code, using (not running) k8s is probably as good as any other solution, and arguably better than most when you grow into anything complex.

Most of the complaints are false equivalence: i.e. running k8s is harder than just using AWS, which I already know. Of course it is. You don't manage AWS. How big do you think their code base is?

If you don't know k8s already, and you're a start-up looking for a niche, maybe now isn't the time to learn k8s, at least not from the business point of view (personal growth, another issue).

But when you do know k8s, it makes a lot of sense to just rent a cluster and put your app there, because when you want to build better tests, it's easy, when you want to do zero trust, it's easy, when you want to integrate with vault, it's easy, when you want to encrypt, it's easy, when you want to add a mesh for tracing, metrics and maybe auth, it's easy.

What's not easy is inheriting a similarly done product that's entirely bespoke.


No the complaint is that we aren't scaling to google levels, in fact we are barely scaling at all. K8s isn't needed.

We ran applications without it fine a few years ago. And it was a lot simpler.


"Fine"

As in, doesn't get hacked or doesn't go down? We live in different worlds.


> Of course it is. You don't manage AWS. How big do you think their code base is?

This seems like a fairly unreasonable comparison. The reason I pay AWS is so that I _do not_ have to manage it. The last thing I want to do is then layer a system on top that I do have to manage.



Remember that not everyone is using an operational model that would benefit from K8s.


Yet, they write articles that prescribe their operational model to the rest of the world.


Problem is hidden assumptions. Happens a lot with microservices too. People write about the problems they're solving somewhat vaguely and other people read it and due to that vagueness think it also is the best solution to their problem.


They are engineers, writing about what they do and what their companies sell. I wouldn't ascribe an "imposition" to that!

As a practitioner or manager, you need to make informed choices. Deploying a technology and spending the company's money on the whim of some developer is an example of immaturity.


> I think a lot of these anti-k8s articles are written by software developers who haven't really been exposed to the world of SRE ...

Think again. There's plenty of SREs at FAANGs that dislike the unnecessary complexity of k8s, docker and most "hip" devops stuff.


From my knowledge, pretty much all FAANGs got their own tooling in place before k8s went public, covering those same issues.

Now imagine you have to do it from scratch.


Agreed. I've been saying for years that if you go with Docker-Compose, or Amazon ECS, or something lower level, you are just going to end up rebuilding a shittier version of Kubernetes.

I think the real alternative is Heroku or running on VMs, but then you do not get service discovery, or a cloud agnostic API for querying running services, or automatic restarts, or rolling updates, or encrypted secrets, or automatic log aggregation, or plug-and-play monitoring, or VM scaling, or an EXCELLENT decoupled solution for deploying my applications ( keel.sh ), or liveness and readiness probes...

But nobody needs those things right?


You do in fact get a lot of this stuff with ECS and Fargate - rolling updates, automatic restart, log aggregation, auto scaling, some discovery bits, healthchecks, Secrets Manager or Parameter Store if you want, etc.


This. We’ve been chugging along happily on Fargate for a while now. We looked into EKS, but there is a ton of stuff you have to do manually that is built into Fargate or at least Fargate makes it trivial to integrate with other AWS services (logging, monitoring, IAM integration, cert manager, load balancers, secrets management, etc). I’d like to use Kubernetes, but Fargate just works out of the box. Fortunately, AWS seems to be making EKS better all the time.


And what if Amazon feels like changing Fargate or charging differently or deprecating it or... What's your strategy for that?


We think about it when it happens, if it ever happens.

I have seen too many projects burn money with vendor independence abstraction layers that were never relevant in production.


Like others have said, cross that bridge if/when we get to it--there's no sense in incurring the Kubernetes cost now because one day we might have to switch to Kubernetes anyway.

It's also worth noting that Fargate has actually gotten considerably cheaper since we started using it, probably because of the firecracker VM technology. I'm pretty happy with Fargate.


The fallback is to use something else less convenient--maybe k8s, nomad, some other provider's container-runner-load-balancer-thing.


I'm pretty sure AWS has a better track record than Google when it comes to keeping old crud alive for the benefit of its customers.

In my experience AWS generally gives at least a year's notice before killing something or they offer something better that's easy to move to well in advance of killing the old.

Hell, they _still_ support non-VPC accounts...


And suddenly you have a MUCH higher degree of vendor lock-in than on k8s.


The dirty secret of virtually every k8s setup--every cloud setup--is that the cloud providers' stuff is simply too good, while at the same time being just different enough from one another, that the impedance mismatch will kill you when you attempt to go to multi-cloud unless you reinvent every wheel (which you will almost certainly do poorly) or use a minimal feature set of anything you choose to bring in (which reduces the value of bringing those things in in the first place).

"Vendor lock-in" is guaranteed in any environment to such a degree that every single attempt at a multi-cloud setup that I've ever seen or consulted on has proven to be more expensive for no meaningful benefit.

It is a sucker's bet unless you are already at eye-popping scale, and if you're at eye-popping scale you probably have other legacy concerns in place already, too.


I'm not saying you'd be able to migrate without changes, but having moved workloads from managed k8s to bare-metal and having some experience with ECS and Fargate, I can tell you that the scale of disparities is significant.


To me the disparities aren't "how you run your containers". They're "how you resolve the impedance mismatch between managed services that replace systems you have to babysit." Even something as "simple" as SQS is really, really hard to replicate, and attempting to use other cloud providers' queueing systems has impedance mismatch between each other (ditto AMQP, etc., and when you go that route that's just one more thing you have to manage).

Running your applications was a solved problem long before k8s showed up.


This.

The whole point of k8s, the reason Google wrote it to begin with, was to commoditize the management space and make vendor lock-in difficult to justify. It's the classic market underdog move, but executed brilliantly.

Going with a cloud provider's proprietary management solution gives you generally a worse overall experience than k8s (or at least no better), which means AWS and Azure are obliged to focus on improving their hosted k8s offering or risk losing market share.

Plus, you can't "embrace and extend" k8s into something proprietary without destroying a lot of its core usability. So it becomes nearly impossible to create a vendor lock-in strategy that customers will accept.


Amen to that. We use ECS to run some millions of containers on thousands of spot nodes for opportunistic jobs, and some little Fargate for where we need uptime. It's a LOT less to worry about.


Pesky monitoring and logging just makes you notice bugs more and get distracted from making the world a better place


Why would docker be shittier? It's easier and flexible.

Eg. Sidecar pattern resolves most things (eg. logging)


1000%. If you take a little bit of time to learn k8s and run it on a hosted environment (e.g. EKS), it’s a fantastic solution. We are much happier with it than ECS, Elastic Beanstalk, etc.


> If anyone has a solution that is actually simpler and just as easy to set up, I'm very much interested.

Sure. @levelsio runs Nomad List (~100k MRR) all on a single Linode VPS. He uses their automated backups service, but it's a simple setup. No k8s, no Docker, just some PHP running on a Linux server.

As I understand it, he was learning to code as he built his businesses.

https://twitter.com/levelsio/status/1177562806192730113


Many startups are not as resource efficient. The ones I am familiar with spend 50%+ of their MRR on cloud costs.


My first production k8s use was actually due to resource efficiencies. We looked at the >50 applications we had to serve, the technical debt that would lead us down the road of building a customized distribution to avoid package requirement incompatibilities, and various other things, and decided to go with containers - just to avoid the conflicts involved.

Thanks to k8s, we generally keep to 1/5th of the original cost, thanks to bin packing of servers, and we sleep sounder thanks to automatic restarts of failed pods, the ability to easily allocate computing resources per container, and globally configured load balancing (we had to scratch the use of the cloud provider's load balancer because our number of paths was too big for the URL mapping API).

Everything can be moved to pretty much every k8s hosting that runs 1.15, biggest difference would be hooking the load balancer to the external network and possibly storage.
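
As an aside, the per-container resource allocation is what makes the bin packing work. A small sketch of what that looks like through the official Python client (names and values are illustrative):

    # Sketch: a container spec with explicit resource requests/limits,
    # which is what the scheduler uses to bin-pack pods onto nodes.
    from kubernetes import client

    container = client.V1Container(
        name="worker",                          # illustrative name
        image="registry.example.com/worker:1.0",
        resources=client.V1ResourceRequirements(
            requests={"cpu": "250m", "memory": "256Mi"},   # what the scheduler reserves
            limits={"cpu": "1", "memory": "512Mi"},        # hard ceiling enforced at runtime
        ),
    )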


That's very surprising to me! As an employee at multiple startups in multiple countries, I've never seen cloud costs anywhere near payroll.


Well, I said 50% of monthly revenue, not total costs.


You're asking exactly what I've been wondering. The answer in this thread so far has been "maybe dokku." I can totally buy that K8s is overkill, but the alternatives for small scale operators seem to require a lot more work internally than just using a K8s service.


Same. We use Kubernetes not for scale but for the environment repeatability. We can spin up any branch at any time on any provider and be sure it's exactly as we have it in production, down to networking and routing. To build that out of plain devops tools would require a lot more in-depth knowledge, and it wouldn't port exactly from one provider to another.


> These kinds of posts always focus on the complexity of running k8s...

Yes, but that is also already the worst thing you could criticize about k8s.

Complexity is dangerous because if things grow beyond a certain threshold X you will have side effects that nobody can predict, a very steep learning curve and therefore many people screwing up something in their (first) setups, as well as maintainability nightmares.

Probably some day someone will prove me wrong but right now one of my biggest goals to improve security, reliability and people being able to contribute is reducing complexity.

After all this is what many of us do when they refactor systems.

I am sticking with the UNIX philosophy at this point, and in the foreseeable future I will not have a big dev team at my disposal, as companies like Alphabet have, to maintain and safeguard all of this complexity.


From a developer perspective, k8s seems ripe for disruption.

It does a bunch of junk that is trivial to accomplish on one machine - open network ports, keep services running, log stuff, run in a jail with dropped privileges, and set proper file permissions on secret files.

The mechanisms for all of this, and for resource management, are transparent to unix developers, but in kubernetes they are not. Instead, you have to understand an architectural spaghetti torrent to write and execute “hello world”.

It used to be similar with RDBMS systems. It took months and a highly paid consultant to get a working SQL install. Then, you’d hire a team to manage the database, not because the hardware was expensive, but because you’d dropped $100k’s (in 90’s dollars) on the installation process.

Then mysql came along, and it didn’t have durability or transactions, but it let you be up and running in a few hours, and have a dynamic web page a few hours after that. If it died, you only lost a few hours or minutes of transactions, assuming someone in your organization spent an afternoon learning cron and mysqldump.

I imagine someone will get sufficiently fed up with k8s to do the same. There is clearly demand. I wish them luck.


It seems to be. When you start implementing something, you'll soon find that most of that complexity is inherent, not accidental. Been there, done that, didn't even get the T-Shirt.

Today it's much easier to package nicer API on top of the rather generic k8s one. There are ways to deploy it easier (in fact, I'd wager that a lot of complexity in deploying k8s is accidental due to deploy tools themselves, not k8s itself. Just look at OpenShift deploy scripts...)


Exactly - I deployed a k8s cluster with eksctl in about 10 min. Deployed all my services within another 10min (plain Helm charts).


This. Kubernetes is a container orchestration framework. If you know your needs, then you have the opportunity to make better decisions and planning. I had a similar experience with ingress; also, I would like to add that installing Kubernetes on bare metal is a pain in the neck, even with kubespray.

Here is a comparison with other frameworks, from 2018: https://arxiv.org/pdf/2002.02806.pdf


Running k8s on GKE is the only rational choice for anyone who isn't a large enterprise with a team of Google SREs to support k8s.

Period.


I'm curious how, with your experience, you came to the conclusion that k8s is a monster to run?


"Pod, Deployment, Job, CronJob, Service, Ingress, ConfigMap, Secre"

Wow, as a new developer coming onboard your company, I will walk out the door after seeing that, and the fact that you admit it's a small service.


The names may seem daunting, but here's what they do: a Pod is one or more containers, usually 1, that run an application. A Deployment manages a number of the same pods, making sure the right number is running at all times and rolling out new versions. A Job runs a pod until it succeeds. A CronJob creates a job on a schedule. A service provides a DNS name for pods of a certain kind, automatically load balancing between multiple of them. An Ingress accepts traffic from the outside and routes it to services. A ConfigMap and a Secret are both just sets of variables to be used by other things, similar to config files but in the cluster.
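
If it helps, all of those are just API objects you can poke at. A rough sketch with the official Python client (assumes a working kubeconfig; the namespace is illustrative):

    # Sketch: list a few of the resource kinds mentioned above in one namespace.
    from kubernetes import client, config

    config.load_kube_config()            # reads ~/.kube/config (e.g. from gcloud/GKE)
    apps = client.AppsV1Api()
    core = client.CoreV1Api()
    ns = "default"                       # illustrative namespace

    for d in apps.list_namespaced_deployment(ns).items:
        print("Deployment", d.metadata.name, "ready:", d.status.ready_replicas)
    for s in core.list_namespaced_service(ns).items:
        print("Service", s.metadata.name, "->", s.spec.cluster_ip)
    for c in core.list_namespaced_config_map(ns).items:
        print("ConfigMap", c.metadata.name, "keys:", list((c.data or {}).keys()))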

It's a small service according to "web scale", but it's serving, and vital for, a good number of customers.


People seem to hate learning vocabulary for concepts that they are already dealing with. Personally I love having a common vocabulary to have precise conversations with other developers.


It’s more than that. People hate learning concepts that should be abstracted away from them.

As an example, why can’t ConfigMap and Secret just be plain files that get written to a well known location (like /etc)?

Why should the application need to do anything special to run in kubernetes? If they are just files, then why do they have a new name? (And, unless they’re in /etc, why aren’t they placed in a standards-compliant location?)

If they meet all my criteria, then just call them configuration files. If they don’t, then they are a usability problem for kubernetes.


ConfigMaps and Secrets can hold multiple files, or something that isn't a file at all, and your application doesn't have to do anything special to access them. You define how they configure your application in the pod/deployment definition. They solve a problem not previously solved by a single resource.

Maybe you don't personally find value in the abstraction, but there are certainly people who do find it useful to have a single resource that can contain the entire configuration for a application/service.


They can be files on the disk. They can also be used as variables that get interpolated into CLI args. What they really are is an unstructured document stored in etcd that can be consumed in a variety of ways.

As the other user said, they can also be multiple files. I.e. if I run Redis inside my pod, I can bundle my app config and the Redis config into a single ConfigMap. Or if you're doing TLS inside your pod, you can put both the cert and key inside a single Secret.

The semantics of using it correctly are different, somewhat. But you can also use a naive approach and put one file per secret/ConfigMap; that is allowed.
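
Concretely, the "multiple files in one object" part looks something like this through the official Python client (a sketch; names and contents are placeholders):

    # Sketch: one ConfigMap carrying two "files" (an app config and a redis config).
    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    cm = client.V1ConfigMap(
        metadata=client.V1ObjectMeta(name="myapp-config"),   # placeholder name
        data={
            "app.conf": "listen_port = 8080\nlog_level = info\n",
            "redis.conf": "maxmemory 256mb\n",
        },
    )
    core.create_namespaced_config_map(namespace="default", body=cm)
    # Mounting it with a configMap volume makes each key appear as a file,
    # e.g. /etc/myapp/app.conf and /etc/myapp/redis.conf inside the container.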


It's an afternoon's worth of research to understand the basic concepts. Then, with the powerful and intuitive tooling you can spin up your own cluster on your computer in minutes and practice deploying containers that:

- are automatically assigned to an appropriate machine (node) based on explicit resource limits you define, enabling reliable performance

- horizontally scale (even automatically if you want!)

- can be deployed with a rolling update strategy to preserve uptime during deployments

- can rollback with swiftness and ease

- have liveness checks that restart unhealthy apps (pods) automatically and prevent bad deploys from being widely released

- abstracts away your infrastructure, allowing these exact same configs to power a cluster on-prem, in the cloud on bare metal or vms, with a hosted k8s service, or some combination of all of them

All of that functionality is unlocked with just a few lines of config or kubectl command, and there are tools that abstract this stuff to simplify it even more or automate more of it.

You definitely want some experienced people around to avoid some of the footguns and shortcuts but over the last several years I think k8s has easily proven itself as a substantial net-positive for many shops.


So why should I do all of that instead of throwing a little money at AWS, run ECS and actually spend my time creating my product?

Heck, if my needs are simple enough why should I even use ECS instead of just putting my web app on some VM's in an auto-scaling group behind a load balancer and used managed services?


I don't think anyone is arguing that you should use k8s for a simple web app. There's definitely some inherent stack complexity threshold before solutions like k8s/mesos/nomad are warranted.

When you start having several services that need to fail and scale independently, some amount of job scheduling, request routing... You're going to appreciate the frameworks put in place.

My best advice is to containerize everything from the start, and then you can start barebones and start looking at orchestration systems when you actually have a need for it.


If you need to fail and scale independently.

- you can use small EC2 instances behind an application load balancer and within autoscaling groups with host based routing for request routing.

- converting a stand-alone api to a container is not rocket science and nor should it require any code rewrite.

- if you need to run scheduled Docker containers that can also be done with ECS or if it is simple enough lambda.

- the first thing you should worry about is not “containerization”. It’s getting product-market fit.

As far as needing containerization for orchestration, you don’t need that either. You mentioned Nomad. Nomad can orchestrate anything - containers, executables, etc.

Not to mention a combination of Consul/Nomad is dead simple. Not saying I would recommend it in most cases (I’ve used it before), but only because the community and ecosystem are smaller. But if you’re a startup, you should probably be using AWS or Azure anyway so you don’t have to worry about the infrastructure.


What requirement is driving containers?

How are you managing your infrastructure, and if you have that already automated, how much effort is it to add the software you develop to that automation vs. the ROI of adding another layer of complexity?

The idea everything needs to be in containers is similar to the idea everything needs to be in k8s.

Let the business needs drive the technology choices; don't drive the business with the technology choices.


Portability - the idea being that you can migrate to an orchestration technology that makes sense when you have the need. The cost and effort of containerizing any single service from the get-go should be minimal. It also helps a lot with reproducibility, local testing, tests in CI, etc.

Valid reasons to not run containerized in production can be specific security restrictions or performance requirements. I could line up several things that are not suitable for containers, but if you're in a position of "simple but growing web app that doesn't really warrant kubernetes right now" (the comment I was replying to), I think it's good rule of thumb.

I agree with your main argument, of course.


The overhead of managing a container ecosystem to run production is not trivial. If you are doing this via a managed service, then by all means leverage that packaging methodology.

If you are managing systems that already have a robust package management layer, then by adding container stacks on top of the OS layers you have just doubled the number of systems your operations team is managing.

Containers also bring NAT and all sorts of DNS/DHCP issues that require extremely senior, well-rounded guys to manage.

Developers don't see this complexity and think containers are great.

Effectively, containers move the complexity of managing source code into infrastructure, where you have to manage that complexity.

The tools to manage source code are mature. The tools to manage complex infrastructure are not mature and the people with the skills required to do so ... are rare.


> If you are managing systems that already have a robust package management layer, then by adding container stacks on top of the OS layers you have just doubled the number of systems your operations team is managing.

Oh yeah, if you're not building the software in-house it's a lot less clear that "Containerize Everything!" is the answer every time. Though there are stable Helm charts for a lot of the commonly used software out there - do whatever works for you, man ;)

> Containers also bring NAT and all sorts of DNS/DHCP issues that require extremely senior, well-rounded guys to manage.

I mean, at that point you can just run with host mode networking and it's all the same, no?
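
If I'm remembering the flag right, that's just (image name made up):

    docker run -d --network host my-api:latest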


Or you can just use ECS/Fargate and each container registers itself to Route53 and you can just use DNS...


Out of all of the business concerns that a startup or basically any company has, “cloud lock in” is at the bottom of the list.


Creating the infrastructure is using your infrastructure as code framework of choice - Terraform or CloudFormation.

Monitoring can be done with whatever your cloud platform provides.


Hate to tell you this but that second idea is much easier to manage from an operations standpoint and significantly more reliable.

Also easier to debug and monitor... but you run your business to make developers happy, right?


Having worked with ECS 2.0 when Fargate was released, the setup was much harder to use from an operations standpoint than k8s the moment you needed anything more complex (not to mention any kind of state). Just getting monitoring out of AWS was an annoyance involving writing Lambda functions to ship logs...


Fargate has built in logging and monitoring via CloudWatch. It redirects console output to CloudWatch logs.


And I'll be blunt: CloudWatch, especially Logs, isn't all that hot in actual use. It might have gotten better, but at all the places I worked over the last two years, if they used AWS they depended on their own log aggregation, even if using AWS-provided Elasticsearch.



CloudWatch sucks. Sorry.


I think that was his point (you are referring to using VMs, right?)


> It's an afternoon worth of research to understand the basic concepts. Then, with the powerful and intuitive tooling you can spin up your own cluster on your computer in minutes and practice deploying containers that.

Source? I’ve never heard of someone going from “what’s kubernetes?” to a bare metal deployment in 4 hours.


That's more for minikube, which takes a few minutes to run, as it is specifically for developers to test things locally on their computers.

The basic concepts in k8s are also pretty easy to learn, provided you go from the foundations up -- I have a bad feeling a lot of people go the opposite way.


That’s a pretty small API. I don’t know what developers are doing if they can’t learn that in stride.


Really? If someone told me they were going to write all the glue code that basically gets you the same thing a UI deployment of k8s and a couple yaml files can provide, I’d walk out.


Reminds me of the business trying to reduce our spend on VMware licenses while keeping the same architecture.

A high level person actually asked me to reimplement vCenter :|


This, and the other articles like it, should be required reading on any "how to startup" list. I personally know startups for whom I believe drinking the k8s/golang/microservices kool-aid has cost them 6-12 months of launch delay and hundreds of thousands of dollars in wasted engineering/devops time. For request loads one hundredth of what I was handling effortlessly with a monolithic Rails server in 2013.

It is the job of the CTO to steer excitable juniors away from the new hotness, and what might look best on their resumes, towards what is tried, true, and ultimately best for the business. k8s on day one at a startup is like a mom and pop grocery store buying SAP. It wouldn't be acceptable in any other industry, and can be a death sentence.


When I was younger, I was much more enticed with new technologies. But as I grow older, I've grown much more cynical. I just want to solve problems with ideally as little coding/thinking as possible.


When I was younger, I believed technology is developed to solve problems people have. Today, I've grown much more cynical, and believe that most technology is developed to sell worthless crap to other people, which may or may not occasionally do something useful for them (but not as much as it would do if the vendor actually cared).


And articles like the submission often have a "let me write a contrarian article for internet points, who cares about details" feel to me :/

K8s solved very real problems that might not be seen when you're running one app, shitty standard syslog, and a cloud-provided database. But those problems still exist, and k8s provided real, tangible benefit in operations where you don't need to remember a thousand and one details of several services, because you have a common orchestration layer to use as an abstraction.


Tech has become much more self-referential in recent years. Now we build tools that help build systems that can build apps, to cash in on minimal productivity gains.


Also known as consulting, conferences, trainings, certifications and books.


Yes, but also a lot of software products, open-source or not (a lot of open-source these days is created as a part of various business models of for-profit companies).


When I was younger, I was already suspicious of hyped, marketing-driven technologies and lock-in. But as I grow older, I've grown more aware at how people constantly ignore the problem.

I just want to solve problems with ideally as little complexity as possible.

Nothing cynical about it.


Old-man-yells-at-cloud.


I think your comment is probably tongue-in-cheek but I still want to throw my support in with the person you replied to.

The longer I've been at the company I'm at, the less interested I am in how cool something is and the more interested I am in the least effort possible to keep the app running.


I am not that old, develop most apps besides embedded firmware for the cloud and have been yelling at it from the beginning to absolutely no effect. In my experience it is often C-level customers who embrace the cloud with most enthusiasm.

Interestingly the older generation often had the most reservations against hosting data on external systems. They are generally very big on everything surveillance though.


Yeah, every so often the bizdev guy at my office who also serves as our POC for the web developers pings me about some Azure IoT thing or another, and every time I read the documentation I think "that's great, if only we had any of the problems that that solves."


If you grow old and stay in this business, you'll be doing a lot of that too, I promise you.


Damn cloud.


Cloud get off my lawn.


In late-stage capitalism, the cloud gets you off "your" lawn, when your Lawn-as-a-Service provider gets acquihired by an adtech company.


"One of us...one of us..."


Anecdata: series B startup. First year was a single VM with two docker containers for api and nginx. Deploy was a one shot “pull new container, stop old, start new” shell command. Year 2 onwards, k8s. No regrets, we only needed to make our first dedicated ops hire after 15 engineers, due in large part to the infra being so easy to program.

I used GKE and I was also very familiar with k8s ahead of time. I would not recommend someone in my shoes to learn k8s from scratch at the stage I rolled it out, but if you know it already, it’s a solid choice for the first point that you want a second instance.


When people talk about companies using k8s too soon, they are talking about deploying it on your own, not using a hosted service like GKE. That's a whole new ball game and takes 100x the effort.


I read TFA as mostly complaining about the conceptual and operational complexity of running your code on k8s, not so much about operating a cluster itself.

Lots of ink spilled on irrelevant concepts that most users don't need to know or care about, like EndpointSlices.

And, arguing against microservices is a reasonable position -- but IF you have made that architectural choice, then Docker-for-Mac + the built-in kubernetes cluster is the most developer-friendly way of working on microservices that I am aware of. So a bit of a non sequitur there.


I don’t see when you’d need to understand what an EndpointSlice is, unless running k8s itself. The concept does not leak through any of the pod management interfaces.


Some seniors get excited by that crap as well.

The tech lead on my team is full of common sense when it comes to not getting excited by the JavaScript framework of the season. But he loves to add new tools to the back end. He is choosing the "best tool" for the job, but we have a small team and already have way more things than we can comfortably manage. Good-enough tools that are already in our system would be more pragmatic.


Hopefully that's not a case of "must be seen to be doing something" syndrome. If you can use pre-existing tools and they serve the purpose, how easy or difficult is it to resist the "shift to this" pressure?


Exactly this. For most startups Heroku, Dokku, or plain docker is enough. The stack will certainly evolve as growth (and success) comes.

Building with 12-factor principles makes that transition effortless when the time comes.


From experience:

Plain docker - hell on earth. Literally some of the worst stuff I had to deal with. A noticeable worsening vs. running the contents of the container unpacked in /opt/myapp.

Heroku, Dokku - It really depends. A dance between "simple and works" and "simple and works, but my startup is bankrupt".

K8s - Do I have more than 1-2 custom deployed applications? Predictable cost model, simple enough to manage (granted, I might be an aberration), easy to combine infrastructural parts with the money-making parts. Strong contender vs. Heroku-likes, especially on a classic startup budget.


I have desperately avoided K8s since it became popular, but this seems to be pretty opposite of everything else I've read? Can you share what specifically makes plain docker so terrible?


Plain docker (no swarm etc.) for practical purposes resulted in containers that were in a weird position when it comes to decoupling from the host OS.

Integration with the init system was abysmal. The Docker daemon had its own conventions it wanted to follow. Unless you ensured that the state of the docker daemon was deleted on reboot, you could end up in weird conditions when it tried to handle starting the containers by itself.

A very easy thing to use for a developer, a pretty shitty tool (assuming no external wrappers) on a server.

One of the greatest joys of k8s for me was always "it abstracts docker away so I don't have to deal with it, and it drops pretty much all broken features of docker"


> One of the greatest joys of k8s for me was always "it abstracts docker away so I don't have to deal with it, and it drops pretty much all broken features of docker"

It also offers portability away from docker via the Container Runtime Interface; we use containerd and it has been absolutely rock solid, without the weird "what happens to my containers if I have to restart a wedged dockerd?" situation


Indeed - Back when the two alternatives were very alpha quality rktnetes and hypernetes, I was already full of joy that I was seeing the end of docker on server.

Since then we got CRI-O and life looks even better.


You've decoupled from the host OS (good thinking), and you've abstracted away from Docker (even better thinking), but how did you decouple from k8s?


I always start and keep my containers stateless.

The docker run --rm command-line switch tells Docker to remove the container when it dies. Never a problem on restarts.
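
e.g. (image name made up):

    docker run --rm -d -p 8080:8080 my-stateless-api:latest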


If you are a user of k8s, then the experience is relatively smooth and painless, since all the complexity is hidden behind the API you're calling to provision resources for you.

If you are an operator of k8s, then you've entered the nightmare zone, where you have to make all of those endpoints actually work and do the right thing with code that was written 17 days ago. Unlimited terrible middleware to try and form static services into dynamic boxes.

k8s was not designed to be deployed on-prem by someone who doesn't have a dedicated team of developers and ops people working on just k8s.


I mostly use it, but I also tend to keep to stable features so that running on-prem is also relatively simple.

My biggest cryparty story when it comes to on-prem kubernetes is not actually due to kubernetes, but due to Red Hat. There are words I could say about their official OpenShift deployment scripts and their authors, but I would be rightly banned for writing them.

The biggest issues I've encountered involve things at the interface between a "classic distro" and running k8s on top of it, and that goes down when you move towards a base OS that is more optimized for k8s (for example Flatcar).

When it comes to the size of the team involved, I'd say keeping a "classic" stack with load balancers, Pacemaker, custom deployment methods, etc. was comparable effort - at least if we're matching feature for feature what I'd see in a "base" k8s on-prem setup (and based on what I wish I had, or could have replaced with k8s, back in 2016 on an on-prem project).

There's one thing, however, where it gets much harder to deploy k8s and I won't hide it - when you're dealing with random broken classic infrastructure with no authority to change it. K8s gets noticeably harder when you need to deal with annoying pre-existing IP layouts because you can't get IP address space allocated, when you have to sit on badly stretched L2 domains, where the local idea of routing is OSPF - or worse, static routes to a gateway and a stretched L2. To the point that sometimes it's easier to set up a base "private cloud" (I like VMware for this) on the physical machines, then set up the hosts on top of that - even if you're only going to have one VM per hypervisor. The benefits of abstracting away possibly broken infrastructure are too big.


> Badly stretched L2 domains, where the local idea of routing is OSPF

Hahaha… welp. So you’re saying that stretching every “overlay” L2 domain to every hypervisor/physical host with VXLANs and OSPF isn’t maintainable. Color me surprised. I need a drink.


Hey, at least you got VXLAN! And OSPF! Though I do have reasons to dislike it from a past life as a network admin, and somehow there are no integrations - or at least none I found - for hooking things into an OSPF routing area (like you can with ExaBGP).

Dealing with overstretched VLANs where somehow, somehow, STP ("What is that RSTP you're telling us about?") decided to put a "trunk" across a random slow link >_>


Since you've made this distinction, can you explain more? How would the person I replied to not be both? In what scenario are they different people?


You seem like you know what you're talking about. What do you think about AWS RDS for db and EBS for auto-scaling and deployment? (It's my first fullstack product and it feels as if it's both a good developer experience and something that won't cost as an arm and a leg, so I just have to ask).


AWS RDS is pretty nice, and 95% of the time you won't even notice any problems (it is, however, pretty complex, like all of AWS, and there are ways to wedge yourself into a weird condition). Generally, if you don't have a specific and well-reasoned motivation to deploy a database on your own (assuming your stack is based on a DB supported by RDS), and you're running in AWS, it's a no-brainer.

As for EBS, remember that the SLAs for EBS are not the same as for S3, and that EBS volumes can be surprisingly slow (especially in IOPS terms once you go above a certain limit; I don't have the numbers in cache at the moment). So it's important to have a good backup/recovery/resiliency plan for anything deployed on EC2 or dependent on EBS volumes. Planning for speed mostly matters when you need a more custom datastore than those offered by AWS.

Remember that AWS definitely prefers applications that go all in on vendor lock-in with their various higher-level options, or ones that are at least "cloud native". Replicating an on-prem setup usually ends up in large bills and lower availability for little to no gain.


I run plain docker without issue.


What about raw systemd? It can do resource control, service monitoring, and restarts, and seems pretty lightweight compared to Docker. Fewer ways to footgun with data being erased, no separate shells, no discovering that a tool you need isn't available inside the container.
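
A minimal unit with restarts and resource limits (names and paths made up; MemoryMax= wants a reasonably recent systemd) looks something like:

    # /etc/systemd/system/myapp.service
    [Unit]
    Description=My app
    After=network.target

    [Service]
    User=myapp
    ExecStart=/opt/myapp/bin/myapp --port 8080
    Restart=on-failure
    MemoryMax=512M
    CPUQuota=50%

    [Install]
    WantedBy=multi-user.target

Then it's just systemctl enable --now myapp, and journalctl -u myapp for logs.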


You can use systemd to manage your Docker containers.


Because systemd is an operating system by itself nowadays.


Or, you could use a distro with a sane init system, which eliminates all the systemd and reinvented-wheel footguns.

The last time I installed Ubuntu 18.04, DNS queries took ~5 seconds. It's a well-known issue with no diagnosed root cause. The solutions involved uninstalling the local DNS stack, starting with systemd's resolver.

2018 was well after DNS was reliable. How can stuff like that break in a long term support release?


5 seconds is the default amount of time the Linux resolver takes to fail over to the next DNS server in resolv.conf. You likely had a bad/unreachable DNS server.
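
You can tighten that with resolver options in resolv.conf (values here are just an example):

    # /etc/resolv.conf
    nameserver 10.0.0.2
    nameserver 10.0.0.3
    options timeout:1 attempts:2 rotate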


Which is ironic, because systemd got a lot of hate for doing the right thing and not just querying DNS servers in order, but discovering and remembering the servers that are up/fastest.

Turns out a lot of people just ignored the fact that all configured DNS servers are assumed to serve the same records and used DNS ordering to implement shitty split-horizon DNS.


Two virtual machines are more than enough for a startup for the first 2 to 4 years.


One for Website + database, the other for monitoring?

I guess backups could just be snapshots or something, depending on how active the database is. ;)


Haha, some scripts to manage the deployments and rollbacks, plus some other scripts to manage the environment (ansible, terraform..).. also some scripts to manage the machines.. log aggregation, cleanup, upgrades... Later add some crons.. some more scripts to manage dev environments.. and a bit more for having a staging env... That's without using any big cloud provider (add to it firewalls, VPCs, IAM..)

You know, managing some yml files that describe all of this is so hard and so expensive...


One for dev, one for production. Database snapshots and code can be backed up locally.


While your average startup doesn't need to go "uberscale", they probably need to be highly available.


High availability is much easier when your product is a single runtime, and not a bunch of services running a complex communication network, any of which is likely to go belly up at any given moment.


Do they though? I've had plenty of virtual machines with over a year of uptime, no problem (sometimes quite a bit more). I'd argue the only important thing in an early-stage startup is a robust backup plan, including monitoring your backups and testing restores often.


Sure and my home server had an uptime for a year and a half until the local power company had a few brownouts one summer.

Downtime doesn't happen until it does.


So then your website is down, the monitoring alerts you and you fix it. Or, worst case, your monitoring doesn't alert you and you fix it on monday when you notice. You've been down the whole weekend. For most startups: so what, your dog-walker-as-a-service marketplace didn't work for 48 hours, that doesn't really matter.

You've had a working system very quickly and saved plenty of money that you were able to invest into more features or runway though.


Except as a startup potentially targeting enterprise customers or hospitals, that downtime may cost you significant contracts or future work. It's probably something that gets disclosed in investor decks - and depending on your offering may really hurt you.


I totally agree with "you can't just replace availability with pagerduty" if you're dealing with stuff like pacemakers. But that's not what most startups do.

I believe that too many engineers worry about "what if a million users sign up tomorrow" and plan a system that will handle that (which also happens to be fun and tickles all the right places), which takes a lot of time and money and manpower instead of building something that works reasonably well and worrying about building the "right" solution when they're beginning to actually grow rapidly. I'd much rather hear "our servers can't handle the load, there are too many users signing up" than "when we're done here in six months, our system will be able to handle any load you throw at it".

I wouldn't say that it's a no-go (there absolutely are situations where it makes sense), but it often looks like premature optimization.


Don't forget DDoS


A commercial CDN solves that in most cases, if needed.


Two VMs in two separate availability zones is pretty ridiculously available. Certainly more so than other least-common-denominators in your stack.

There are a ton of successful startups who made it a long way with less than 2 VMs.


I'm in the "first 4 years were on two physical machines colo'd sitting at a local datacenter" club. Our cheap operating costs are actually a competitive advantage in our space.


With something like Heroku, you can have multiple VMs in staging and production, w/ a deployment pipeline that supports rollbacks, monitoring, alerting, autoscaling, all in a managed environment w/ a managed, highly available Postgres setup, with very little effort and 0 maintenance. This is what I've set up at my current startup. My last company was on K8s and I loved it -- but this is nearly as good and requires literally no maintenance and _far_ less expertise / setup.


If you have your k8s cluster in a single region, you probably aren't much more HA than a couple of instances load balanced.


> I personally know startups for whom I believe drinking the k8s/golang/microservices kool-aid has cost them 6-12 months of launch delay and hundreds of thousands of dollars in wasted engineering/devops time.

If Kubernetes had only cost us a year and two hundred thousand dollars then we'd have been luckier than we actually are.

It definitely has a place, but it is so not a good idea for a small team. You don't need K8s until you start to build a half-assed K8s.


> You don't need K8s until you start to build a half-assed K8s.

which you start doing with more than one db server and more than one app server. Until you realize you have an ansible script that is tied to your specific program. Oh shit, now you have two programs, and copied some stuff from the ansible, but not everything is the same - damn. Deployments incur downtime (5 seconds), which some users notice - until you add like a thousand lines of ansible code. Now you need monitoring, and oh shit, more ansible bloat; soon your ansible alone has outgrown the k8s codebase. (P.S. this does not account for how you start your bare-metal servers, etc. - that would be another story)


What are you talking about? Just create two more virtual machines. Copy some files over and run `docker-compose up`. After that you only need to configure a load balancer.
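
Something like this per box (image made up) really is the whole config:

    # docker-compose.yml
    version: "3"
    services:
      api:
        image: registry.example.com/api:1.4.2
        restart: unless-stopped
        ports:
          - "8080:8080"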


>It is the job of the CTO to steer excitable juniors away from the new hotness, and what might look best on their resumes, towards what is tried, true, and ultimately best for the business.

Then they might simply join another startup or a big tech company as competition for good engineers is fierce. Startups also famously underpay versus larger companies so you need to entice engineers with something.


Well, when you pay your engineers 6-12 months of extra salary before you ship anything because they had to use Kubernetes-on-Highways to host this clever NoNoNoPleaseNoSQL DB that some guy on GitHub wrote last week, hosted on ZeroNinesAsAService.com, and with a new UI built in ThreeReact (the hot new React-based framework that implements an OpenGL interface that works on approximately 3% of devices in the wild right now, and approximately 0% of your target user base's devices), don't forget to account for that in the investor pitch and salary offers.

I mean, seriously, this is a startup killer. Our host wrote an essay a long time ago about beating standard companies stuck in boring old Java or C++ with your fast, agile Python code, but in 2020 it seems to me it's almost more important now to try to convince new startups to be a little more boring. Whatever your special sauce that you're bringing to market is, it isn't (with no disrespect to the relevant communities) that you're bringing in Rust or Nim or whatever for the first time ever, for Maximum Velocity. Just use Python, cloud technologies, and established databases. Win on solving your customer needs.

While by no means is everyone in the world using effective tech stacks well-chosen to meet the needs and without over-privileging "what everyone else is doing and what has always been done", enough people are now that it's probably not a competitive advantage anymore.

Honestly, you can beat most companies in just getting stuff out the door quickly.

(Excuse me, off to file incorporation papers for ZeroNines LLC. Why wonder whether your provider will be up when you can know? Nobody else in the business can make that promise!)


I like to refer to this availability as "nine fives".


We like to call it "A 9, a 4, and a 7". You pay depending on what order you want those numbers to be in.


Hah! I often encourage teams to start by shooting for five eights. Once they have that nailed, we can start talking about nines.


>when you pay your engineers 6-12 months of extra salary before you ship

Money may not be the limiting factor for a startup and time is a counter-factual as you don't know the alternative. Had they not been able to hire any engineers they may have taken an extra 2 years to ship the same thing. Or maybe not.

Hiring at startups is time consuming and difficult with heavy competition for good engineers. Salaries lag behind large tech companies and equity may be worth nothing. Scale isn't there so the problems are less interesting than at a larger company. And good engineers can 5x better than average in an early stage startup because there is no process and technical debt is fine (in a larger organization the 10x ones leave havoc in their wake).

That may not be the right decision for a startup to make but there is a logical basis for making it.


Maybe you need fewer engineers trained in the new hotness and just ones with experience in, uh, older hotness. Which may be easier to find.


Engineers want what is best for them and not what is best for the company. That is the right choice for them.

New hotness is one way to entice them. Far, far from the only one but it is a tool in the CTO's tool belt.


Then consider this: a startup in its job ad is probably simultaneously talking about the hottest new tech and passion for building a reliable product. But the intersection between these two is vanishingly small. So what kind of hires do you want to optimize for? CV-driven developers, or people who know how to build reliable and efficient software fast?


This is why you need to be interviewing for things like "customer focus" as well as the technical side of things.

I've rejected candidates who have been great on their technical skills . . . who I would never want to be making ANY decisions about customers, or the technical direction of the company.


I feel like for any company, there should probably be a balance between old a new. If all of the technology from 10 years ago was inarguably "the best", I don't think we'd be in the situation we're in now. Everything is pros and cons.

My team right now, for example, had a mantra of "No JS frameworks, just Rails" which was absolutely dreadful. Rails UI is absolutely dreadful. I can't say enough, it is absolutely dreadful. So we recently made the move to use React for more "dynamic" UIs, which has brought up somewhat of a happy medium? React will be here in 5 years, Rails will be here in 5 years, everyone wins.


Developers will stop caring about having experience in the latest hot thing when that stops being very important in interviews.

I hope, but doubt, that will happen before I'm retired or have been ageism'd into something else.


> New hotness is one way to entice them.

Only the bad ones.


If they are more focused on playing with toys or polishing their resumes for their next job, then I'd want them to leave. Their goal should be understanding what users need and building it as simply and effectively as possible.


On the flip side, a lot of developers are more motivated by what they can actually build and release with their tools, rather than learning new and complicated tools for the heck of it. If I were running a company I'd much rather court devs in this group.


This is a really good argument for why startups should build the initial product with only the founding team, and hire only once you have more demand than you can possibly handle. You get mission-alignment from your technical team (because they only get paid if they ship and customers appear), and then finding and retaining engineers is a lot easier once you're clearly a rocket-ship, their stock options double in value every 6 months, and they become a manager after 6 months in the company.


It’s ok to let them go.


That depends on the situation and is the CTO's decision to make. Given the difficulty and time frames of hiring, letting them go could mean another 6 months of too few engineers. Acting like there is a single universal answer is the wrong world view, I'd say.


If you have a nominally full compliment of people who are over-building or playing around with shiny toys instead of delivering value, then you still have too few actual engineers, but are paying a lot more money for the privilege.

You're also setting up a bill that will come due eventually. I've made some really good money going into companies and ripping out the hot-three-years-ago garbage that some long-gone goof put in. Last time this happened I looked up the responsible party. Turned out he was doing the work so he could give a conference talk on shiny tech. Not long after the talk was done, he took a job somewhere else, leaving his buzzword-compliant half-baked system to rot.


1000 users, almost 0 requests per second / 10,000 euro a month to deploy a very simple piece of software on GKE on 3 continents and have some development instances.

Why? Because Docker and the "scalability" it offered looked much better on the investor slides...

How? Instead of actually hiring someone who at least had experience with Docker, he decided it was a very easy thing to learn, so he did it himself, and we ended up with things like running two applications in the same container, having the database in containers that disappear (a container restart when RAM is full is the best way to run a database), etc...

And after all that he started talking about microservices and how cool they are for code decoupling... Of all the things... I don't work there anymore...

And when challenged on these reasons, some people (that supposedly have more experience) give blanket statements like: "Docker is more safe by default, thus we should use it..."

Maybe when you go through these situations you get to write articles like this.

Of course containers, Docker, k8s, etc. have their place, but in reality you can find all kinds of stunning nonsense.


I worked at a startup where the CTO was the impressionable junior and it caused all of those issues you described and more. I ended up leaving when I decided I shouldn't be having panic attacks in the middle of the night due to ridiculously tight deadlines that I couldn't meet because of the overhead introduced by k8s and microservices.


I've seen the pressure come from customers too, in the case of b2b between tech companies. My company embraced Kubernetes after years of customers asking us "what's your Kubernetes story?" and not liking the answer, "that doesn't solve any problems we have".


> It is the job of the CTO to steer excitable juniors away from the new hotness

Sometimes it's the CTO who is the excitable one pursuing new hotness...


Then the company is doomed.


Sure, it is the job of the CTO to steer juniors in an appropriate direction. But, and I hate to say this, isn't it also the job of the CTO to help source funding and it seems VCs like to go after the buzzwords?


I absolutely love Kubernetes, it makes my job so much easier...

I can’t understand for the life of me why any start up uses it, it’s insane.


The lack of type checking must have cost way more than standing up Golang services when the time came that you actually had customers though right?


People managed with dynamically typed languages for a long, long time in building pretty vast services. Static typing has its merits but it really depends on what you're building - if you're a bank, it offers excellent guarantees. If you're building something smaller scale or lower impact, you can cope if you're logical and have good tests.


My understanding is that 1. sooner or later you end up having to check all the types in your tests, which means 2. you end up with the same number of lines of code either way - one version where the type is declared in the model, which realistically is easier to document and keeps your data consistent, and the other where the only place you verify it is in the tests, or worse, in both tests and at runtime, in which case by definition it's more lines of code. If lines of code are a proxy for bugs, and you build on something that doesn't have type safety, wouldn't that just increase your bugs across two very important, immutable dimensions of computing? Stripe has a type checking team now...


Having been at a company that was starting to move things to Kubernetes, when it had absolutely no reason to, I can say that it was being done because: 1) the developers wanted to be able to say they knew how to use Kubernetes, when they applied for their next job (perhaps at a company big enough to need it) 2) the managers didn't really understand much about what it was, to evaluate if it was necessary, but 3) some of the managers wanted to say they had managed teams that used Kubernetes, for the same reason as the developers

Which is not to say that it should never be used. But we have a recurring pattern of really, really large companies (like FAANG) developing technologies that make sense for them, and then it gets used at lots of other companies that will never, ever be big enough to have it pay off. On the other hand, they now need 2-3x the developers they used to, because they have too many things going on, mostly related to solving scale problems they'll never have.

Don't use a semi-tractor trailer to get your groceries. Admit it when you're not a shipping company. For most of us, the compact car is a better idea.


Well, for engineers sometimes it's just interesting to learn new tech, you know.

And managers are ok with keeping the team motivated, especially if they can deliver new features while playing with the new tech.

UPDATE: I mean, it's not about the resume in most cases I've seen. Usually, people who don't love programming do not care about what tech to use, so resume-oriented people are usually not very vocal about using cool new and shiny tech. It's the nerds who are - that's what keeps them motivated.


Playing with new tech is fun. Using a nuclear weapon when a hammer would do is bad engineering.


We are talking about an orchestrator here, right? It's a piece of software that is safe for use in front of kids, last time I checked.

Linux is a pretty complex system too, you know, with millions of lines of code and a gazillion moving parts. It's probably safe to say that no one alive, including Linus himself, has ever read all the code required for running a base Linux system serving a static site with Apache. And yet we do.

Anyone who knows how to code well can learn how to pack containers and run them within a Kubernetes cluster in a way that will work reliably, even if those containers do nothing except serve plain HTML files over port 80, and in fact the whole cluster is a waste of money, not required, and not helping a thing. It will serve the content just fine. No nukes will be harmed during the operation, don't worry.


"Anyone who know how to code well can learn how to pack containers and run them within a kubernetes cluster in a way that will work reliably"

If this is your criterium, sure. That's nice and simple and for much the same way that physics cows are perfectly spherical - this isn't how real life happens and I suspect you have enough life experience to know that this is the case.


> this is your criterion, sure. That's nice and simple, in much the same way that physics cows are perfectly spherical

Ironically, your example is a good one, because tools like Kubernetes are what enable the proverbial cows of deploying services on heterogeneous clusters of COTS hardware to be treated as spherical, in the sense that they are just contained processes that execute somewhere.

And the irony is that you might seriously argue that being forced to waste time modeling the cow in more detail than a sphere is somehow necessary or even desirable.


I don't know, I have seen a few simple Kubernetes deployments alive and well; as to spherical cows...

Your comment reads like stating "I am right and deep down you know that too", and that is less than pleasing.


I love playing with things that expand my mind to what is possible. I have never found newness in itself to be attractive.

However I have come to respect that many people are different. These are often the same people who insist on buying a new phone, never mind that they haven't understood a millionth of the old one yet.


> Using a nuclear weapon when a hammer would do

The hammer will only suffice in the eyes of those who only envision problems that resemble nails.

But there's more to operations than occasionally hammering on a single nail.


Not using modern weapons and only using a hammer always is also bad engineering.


All startups are simply temporarily embarrassed unicorns.


Ah, because no software project ever experienced problems due to their newly found inability to scale in any way.


Job interviews should always consider if the correct tool was used for the use case.

So, if someone used k8s where it was not needed, that experience means zilch:

1) it was the wrong tool for the job

2) it was not used at the right scale. So, if I need k8s to manage a really large deployment, someone who used k8s to manage 5 machines does not have the right experience since they never used it at my scale.


Before you get to a technical interview, an automated system, an HR person, and a project manager (or similar) will have to pass the application along, which encourages putting buzzword tech terms on a CV.

Sadly, today fewer and fewer companies look at soft and engineering skills, preferring to hire people based on experience with specific tools.


This is exactly the problem. "Able to learn new technologies, and pick the right one for the problem at hand", is virtually impossible for HR or a recruiting company to evaluate. If they're supposed to filter out the obviously unsuitable candidates so that the devs and tech managers only have to interview plausible candidates, then they will go with keywords, because that's something they can do.

In other words, we have a systemic problem. If everyone does what the current system incentivizes, we get the wrong result.


> was starting to move things to Kubernetes, when it had absolutely no reason to

There is "absolutely no reason" to use a system that automatically handles blue/green deployment of your containers for free and supporting auditable and revertible deployment histories?

I'd like to hear what you consider to be operational best practices!


I think teaching devs to use k8s is generally a good thing. The patterns in k8s are very well established and a developer can learn a lot by understanding how to implement them. For basic web apps on something like GKE, the time needed to get operational should be minimal.


Well-said @rossdavich... also says a lot about tech company cultures: most people are just looking for the next job. Is there anyone actually interested in creating and contributing to building great products, ones that people actually need? #rhetorical


Developers learn quickly that the only way to get a decent salary and/or job security is to cram as many buzzwords as possible into their resume. Companies don't like to invest in training, so the only other way to get those buzzwords is to force the use of whatever cool tech into a project, whether it's appropriate or not.

And no, experience in cool buzzword tech does not count if it's a side project or open source contribution - as much as we would wish that to be the case.

We can blame developers, but they are just adapting, as humans always do, to their environment. Sometimes it is just bored devs chasing the next fad, but there is plenty of blame to be laid at the feet of modern tech companies.


For those companies I recommend Rancher... It's Kubernetes under the hood, but a lot of stuff is abstracted away.


So docker runs a bunch of system services but abstracts them away... And kubernetes runs docker but abstracts that away... and rancher runs kubernetes but abstracts that away..

Should I just wait a year for something that lets me use rancher without knowing anything about it?


The problem of infrastructure is that low level interfaces are always consumed by higher-level interfaces.

And if you want to run a process, but you want to distribute the apps and run them as process containers, and you want to run them in an automatically configurable cluster of COTS computers communicating through a virtual private network...

Don't you understand where and why are there abstractions?

If anything, having people naively complain about how things are layered and abstracted is a testament to the huge success of the whole tech stack, because complainers have formed such a simple mental model of how to distribute, configure, run, and operate collections of heterogeneous services communicating over a virtual network that they simply have no idea of the challenge of implementing a workable system that does half of this.

But with docker+kubernetes it only takes a click, so it must be trivial right?


I haven't used kubernetes, but it must be a very difficult click if another tool (Rancher) exists to make it easier.

I understand why abstractions exist, but the number of abstractions in the chain I mentioned is amusing to me.


Why is it amusing? Do you find the amount of abstraction between the CPU and a browser similarly amusing? That judgement seems arbitrary. The reason why an abstraction is created is because it's sometimes helpful to have complexity managed automatically if full control of the complexity is not necessary for your needs, your reaction seems to suggest "kubernetes doesn't need to be so complex", but I am not sure if you really believe that.

I can understand the "kubernetes may not be the best engineering decision for your needs" argument, but that's a different argument from kubernetes is too complex.


I suppose amusement is arbitrary.

This comment chain started with: "Having been at a company that was starting to move things to Kubernetes, when it had absolutely no reason to, I can say that it was being done because: 1) the developers wanted to be able to say they knew how to use Kubernetes... "

Someone responded by saying "For those companies i recommend rancher... It's kubernetes under the hood but a lot is stuff is abstracted away.."

So if you don't need Kubernetes, and are just using it to learn Kubernetes, you should throw an additional tool on top of Kubernetes that abstracts away Kubernetes?

I'm sorry, that is amusing to me.

Some abstractions are necessary. Some aren't.


I said the judgement is arbitrary, not the amusement.

> Some abstractions are necessary. Some aren't.

It just seems bizarre to me that you can suggest that the abstraction is unnecessary when you also claim to have never used the tool. What makes you think it's unnecessary?


1. I didn't judge anything. I said I was amused. You inferred judgement.

2. I didn't say it wasn't necessary. The poster of the parent comment did. I didn't work there, I don't know what was necessary. But it's safe to say, if you don't need Kubernetes (which the parent poster said, not me), then you don't need something to abstract Kubernetes (Rancher)...

And also, if I did know the environment, and the environment was incredibly simple, I don't think it's necessary for me to have Kubernetes experience to determine that it is not necessary... Sometimes a couple of VMs in different zones behind a load balancer is just fine...

And if you don't agree, you probably also think a static landing page requires React to be "done properly." How's that for inferring things you didn't say? I've never used React either, I guess I'll never know if I really need it for that landing page!


I worked at a company with k8s+Rancher, and was constantly bumping into Rancher-specific issues. I am not sure Rancher is really worth the effort.


I'm also a fan of Rancher. Especially the newer versions. It significantly simplifies the process of spawning up and managing a Kubernetes cluster.

I do think that Kubernetes is overkill if you just want to spawn a couple of server instances. But if you want to build a complex system which scales and you happen to have experienced developers on the team who understand how to build scalable systems and who have experience with Kubernetes then you'd be foolish not to use K8s IMO.

That said, having that specific talent and experience on the team is critical or else you're just wasting your time and money with K8s - And that talent is very hard to find. There is no point building some complex architecture for Kubernetes if that architecture is not designed to scale linearly from the ground up.

Kubernetes can allow you to operate highly scalable systems easily and elegantly, but simply using Kubernetes doesn't itself guarantee in any way that your systems will be scalable and will benefit from K8s at all (aside from pretty UI). Very few people can build systems that scale and also meet business requirements.


I call this "resume driven development"


You clearly understood one of the primary incentives that drives k8s adoption (career self-interest).

And then you issued advice like "Don't do this". I'm curious if you truly believe that anybody in the above class of people would change their ways on that basis, and without a greater incentive.


Doing a migration when you're not clear on what problem you're solving and why is always a bad idea. But I don't think this is a fair characterization of Kubernetes. There are lots of well supported hosted ways to use it, in which case it works a lot like other PAAS tools like Heroku or Elastic Beanstalk, except with much more powerful and better designed primitives and a big open source community of tools to make working with it even easier.

It's not like using a tractor trailer to get your groceries, it's more like using a swiss army knife to cut a piece of string. Sure you don't need all the extra accessories for that task, but they're not really getting in your way, it's still a good tool for the job. And the extras might even come in handy at some point in the future.


I am a solo developer (full stack, but primarily frontend), and Kubernetes has been a game changer for me. I could never run a scalable service on the cloud without Kubernetes. The alternative to Kubernetes is learning proprietary technologies like "Elastic Beanstalk" and "Azure App Service" and so on. No thank you. Kubernetes is very well designed, a pleasure to learn and a breeze to use. This article seems to be about setting up your own Kubernetes cluster. That may be hard; I don't know; I use Google Kubernetes Engine.

For others considering Kubernetes: go for it. Sometimes you learn a technology because your job requires it, sometimes you learn a technology because it is so well designed and awesome. Kubernetes was the latter for me, although it may also be the former for many people.

The first step is to learn Docker. Docker is useful in and of itself, whether you use Kubernetes or not. Once you learn Docker you can take advantage of things like deploying an app as a Docker image to Azure, on-demand Azure Container Instances and so on. Once you know Docker you will realize that all other ways of deploying applications are outmoded.

Once you know Docker it is but a small step to learn Kubernetes. If you have microservices then you need a way for services to discover each other; Kubernetes lets you use DNS to find other services. Learn about Kubernetes' Pods (one or more containers that must reside on the same machine to work), ReplicaSets (run multiple copies of a Pod), Services (expose a microservice internally using DNS), Deployments (let you reliably roll out new software versions without downtime, and restart pods if they die) and Ingress (HTTP load balancing). You may also need to learn PersistentVolumes and StatefulSets.
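
To make the core objects concrete, a Deployment plus a Service for one microservice (names and image made up) is roughly:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: orders
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: orders
      template:
        metadata:
          labels:
            app: orders
        spec:
          containers:
            - name: orders
              image: registry.example.com/orders:1.0.0
              ports:
                - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: orders
    spec:
      selector:
        app: orders
      ports:
        - port: 80
          targetPort: 8080

Any other pod in the same namespace can then reach it at http://orders via cluster DNS.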

The awesome parts of Kubernetes include the kubectl exec command, which lets you log into any container with almost no setup or password, kubectl logs to view stdout from your process, kubectl cp to copy files in and out, kubectl port-forward to make remote services appear to be running on your dev box, and so on.
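
For example (pod and service names made up):

    kubectl exec -it orders-7d4b9c77f-x2kqp -- /bin/sh
    kubectl logs -f orders-7d4b9c77f-x2kqp
    kubectl cp orders-7d4b9c77f-x2kqp:/tmp/heap.dump ./heap.dump
    kubectl port-forward svc/orders 8080:80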


> Once you know Docker you will realize that all other ways of deploying applications are outmoded.

This is a strong and absolute statement to be making in a field as broad and diverse as software engineering. My experience from being on both sides of these statements it that they're often wrong, or at least short sighted.

In this case, while I get the packaging benefits of Docker, there are other ways to package applications that don't require as much extra software/virtualization/training. So the question isn't as much about whether Docker/K8s/etc. provides useful benefits as whether or not those benefits are worth the associated costs. Nothing is free, after all, and particularly for small to moderate sized systems, the answer is often that the costs are too high. (And with hardware as good as it is these days, small-to-moderate is an awful lot of capacity.)

I've personally gotten a lot of value out of packaging things up into an uber jar, setting up a standard install process/script, and then using the usual unix tooling (and init.d) to manage and run the thing. I guess that sounds super old fashioned, but the approach has been around a long time, is widely understood, and known to work in many, many, many worthwhile circumstances.


Indeed. Containers suck when your entire filesystem is 60 megabytes.


When I know how to use a hammer, everything starts looking like nails?


When something breaks you log in to the machine and make incremental updates to fix it, right? This approach leads to non-reproducible deployment environments. Immutable systems are better, and a Dockerfile is essentially a written record of how to reproduce the environment.
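
A minimal example (app layout made up) shows the idea - every dependency and build step is written down and versioned alongside the code:

    FROM python:3.8-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    CMD ["python", "app.py"]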


> When something breaks you log in to the machine and make incremental updates to fix it, right?

Not generally, and you do a good job explaining why I don't in your next sentence.

> This approach leads to non-reproducible deployment environments.

It's true that there's some discipline involved, but it's not necessarily a huge amount. For me, what it tends to look like is a build that produces some sort of deployable artifact, an idempotent install script, and following standard Unix patterns. Except for maybe that last bit, this is exactly what you'd do in a Docker environment. And of course, Docker and the like are always still candidates for adoption, if the circumstances warrant.

Part of what surprises me about conversations like this is that the idea of an environment in a known and stable state isn't a novel development. The question is really about what degree of environment stability you need to achieve to meet your requirements, and then the specific tools and procedures you choose to adopt to meet that goal. Docker is one choice, but not the only choice, and even if you choose it, there is still a set of disciplines and procedures you'll need to follow manually for it to be effective.


Everybody feels confident in the stack they have spent time using. You like Kubernetes because you took the time to learn it; someone else will find Elastic Beanstalk or AWS ECS equally easy to set up and scale. It's not that Docker is the only way to deploy an application either; there are virtues to learning the serverless deployment modes on the various clouds as well. For many of the "proprietary lock-ins" you run into, you often get something back.

I do agree on the point that Kubernetes and Docker are nice, of course :)


Another advantage of Kubernetes over things like Elastic Beanstalk is portability. Your app can move from one cloud to another with minimal effort.

Yet another advantage is portability and durability of your knowledge. Kubernetes has so much momentum, so it is here to stay. It is extensible so third parties can innovate without leaving Kubernetes, which is yet another reason it is going to be around for a long time.


That's clearly also a disadvantage, because part of the source of k8s's complexity is that it is a generic platform for arbitrary services.

Please apply some level of critical thinking before copying/pasting generic selling points that could apply to almost any other open-source IAC framework.


EB- or ECS-specific knowledge is AWS-specific. I can (and do) run k8s on my laptop and can (and do) deploy Helm charts (the ones I wrote or 3rd-party ones) on any k8s install. So that's quite different from the usual vendor lock-in that comes with proprietary cloud services.
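
e.g. the same chart and values file (names made up) install identically on my laptop and on a managed cluster:

    helm install my-api ./charts/my-api -f values.yaml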


...or you could deploy your app on Google App Engine or Heroku and spend all your time developing features your customers care about.


I have no idea how to deploy my app on Google App Engine or Heroku. So instead of spending time developing features my customers care about, I'll spend time learning how to deploy my app on those services.


You will spend orders of magnitude more time fiddling with K8s. You may end up with employees working on infrastructure fulltime.

These are not even remotely comparable things.


This is true for any way of deploying, & depends on what you already know versus what you need to learn about. But different deployment approaches require you to understand different things, or different volumes of stuff.

There's also the difference between what you need to know to get started vs what you need to know to run a service reliably.

If you deploy to a platform that uses thing X for your app in production, and thing X has unhelpful defaults or will behave poorly in some situation and cause or amplify an outage, then not only do you need to learn the minimum about how to deploy, but to also learn about the pitfalls and what you need to do to overcome or mitigate them -- either proactively or reactively when production breaks and you don't understand why & don't understand how to fix it.

The amount of latter stuff you need to learn to have a reliable production system that you're able to maintain in a more complicated configurable deployment system is going to be much larger even if it happens to be quick & easy for you to get started.


> This is true for any way of deploying, & depends on what your already know versus what you need to learn about.

The difference is that Kubernetes is portable from cloud to cloud. Also, when you invest in learning Kubernetes, your knowledge is both portable and durable. That made a huge difference for me, because I am not a backend dev, so I am not willing to invest time in learning something unless the knowledge I acquire is both portable and durable.


> Also, when you invest in learning Kubernetes your knowledge is both portable and durable.

this may be true, let's check back in 10 years to validate the durability!

e.g. to give a non-tech counterpoint: I'm currently working on some logic to fit statistical models to data. The foundations of much of this knowledge is hundreds or thousands of years old (e.g. algebra, calculus, statistics). Orders of magnitude more durable than any knowledge related to the particular tech stack I am using.


I'm skeptical that the service is any more scalable than it would be with regular instances and multi-AZ, mainly because in my experience scalability has far more to do with network topology and the architecture of how requests flow than with the implementation tech.


> I could never run a scalable service on the cloud without Kubernetes

Can you give us an indication of the scale of your app? e.g rpm.


It is still in development, so no rpm at the moment.

That’s another thing: some people think Kubernetes is something you use if you need high scalability. I disagree. Kubernetes should be the default if your app consists of more than 1 service. If you don’t have high scalability requirements you can rent a single-node GKE “cluster” for about $60 per month.

If you have just 1 service, then a single Docker container is all you need, so Kubernetes isn't needed.
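For that single-container case, something as small as this goes a long way (image name and ports are just placeholders):

    docker run -d --name web --restart unless-stopped -p 80:8080 example/web:1.0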


This mentality is how we end up with overly engineered piles of dung. Instead of building something in the most simple way practical which would fulfill our requirements, we go all out. Now changing things takes longer because to do anything you have to weave through 10+ layers of opaque abstraction. No thanks.


If you don't have high scalability requirements, virtually anything will work. You're probably paying $55/month over the odds.


What will you use for service discovery?


off the top of my head?

a) Shared data source; each service writes pid/state to a file in the shared data store. It could be a single directory in a single server setup or a dedicated NFS/SMB server for hundreds/thousands of nodes.

b) Pub/Sub service; Kafka, et al, in which services simply subscribe to and publish to a central channel to see everyone else.

c) Determinism; You use predictable naming/addressing and simply infer. This is tricky to scale but not impossible.

d) Any number of stand alone discovery services ala Zookeeper or Eureka. They all end up being effectively the same pub/sub model as B, just prepackaged.

e) You don't discover shit; you have a single load-balanced endpoint that can scale out instances as needed behind the balancer, with zero knowledge required by the rest of the system.

Pick one to suit your needs. Service Discovery is not that hard and has been way over engineered.
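To make (e) concrete, it can be as little as an nginx upstream block in front of a couple of instances (addresses and ports here are made up):

    upstream app_backend {
        server 10.0.0.11:8080;
        server 10.0.0.12:8080;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://app_backend;
        }
    }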


As I was reading this, I thought to myself "How does this scale" and then I re-read the parent comment that said "If you don't have high scalability requirements, virtually anything will work."

The fact of the matter is that Kubernetes solves certain problems well but also presents other problems/challenges. For some organizations, the problems K8s solves is bigger than the problems/challenges it creates. It's all about trade offs.

Some people do want to hop on the next big thing in order to keep their imposter syndrome in check. Others know a certain technology and stick with it.

Sorry, I'm just ranting.


There are lots of ways to avoid learning Kubernetes, but why? Kubernetes is so well designed and easy to learn and use!


This is the comment you see from people on EKS or GKE. Many companies have compelling reasons to keep a large part, or all, of their services in-house. Nobody who actually has to install and administer K8s is on here commenting about how easy it is to run, maintain, and upgrade on their bare metal hosts. Troubleshoot, I almost forgot troubleshoot! All of those moving pieces, and something is hosed at 3am. This will be fun.

It will be great if that changes someday, and there's certainly been progress, but for places where they'd need to run it themselves, K8s is a tough proposition.


I took the time to learn it, and for just my side projects it's a ridiculous amount of overkill.


/s


If you're not worrying about scalability like the OP said, static configs. Add a new service? Roll out config changes. Server goes down? Let the redundancy handle it, roll out config changes in the morning.


If you are a single dev writing a couple of small services all by yourself, then the odds are you don't need a technical solution for service discovery.


Your comment makes me so irrationally angry. I totally disagree. But I'll be civil.

I write scalable apps without K8. I moved away from it. Stateless services are trivial to scale.


[flagged]


Most of us have to deal with the design decisions made by others. I can see how poor decisions can make someone angry down the road.


That's exactly what my therapist said.


It was probably more the "if you have more than 1 service you need Kubernetes".

No. You don't.


Well, even if you have just 1 service, k8s is already useful. Try doing blue/green or any other no-downtime deployment, especially with database changes.

In k8s, deployments are deterministic: it rolls out a configurable number of containers at a time.
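A minimal sketch of what that looks like in a Deployment spec (image and probe path are placeholders, not anything canonical):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1        # bring up one new pod at a time
          maxUnavailable: 0  # never drop below the desired replica count
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: example/web:2.0     # placeholder image
            readinessProbe:            # new pods only get traffic once they pass this
              httpGet:
                path: /healthz         # placeholder path
                port: 8080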


This is my experience too. I've used smaller-scale tools (such as docker-compose, Dokku, Heroku etc) but I've found them to be a mixture of unreliable or unsuitable in the case of fairly modest complexity.

Eventually I turned to Kubernetes to see how it compared. I spent a day-ish reading through the 'core concepts' in the docs, which was plenty to get me started on GKE. It took me a week or two to migrate our workloads over, and once everything stabilised it has been pretty much fire-and-forget.

I have about twenty pieces of software deployed for my current client and I feel that I can trust Kubernetes to just get on with making everything run.

I've since deployed clusters manually (i.e. bare metal), but I certainly wouldn't recommend it for anyone starting-out. Personally I'm keeping a close eye on k3s.

I think my main learning during this process – at least for my situation – was to run any critical stateful services outside of Kubernetes (Postgres, message queues, etc). I think this applies less now than it did when I started out (v1.4), but nonetheless it is a choice that is still serving me well.


"I could never run a scalable service on the cloud without Kubernetes."

But also

"The alternative to Kubernetes is learning proprietary technologies like "Elastic Beanstalk" and "Azure App Service" and so on. No thank you"

So can we clarify that you truly meant: "I decided not to run a scalable service in the cloud using any of the existing cloud tools that do and have supported that scenario for years. And decided to use k8s instead" :)


> I could never run a scalable service on the cloud without Kubernetes.

I find this statement quite bizarre.


Not bizarre at all - it's perfectly fine - this poster could never run a service without kubernetes.

Doesn't make any kind of judgement, just stating their personal fact.

I could never make a souffle without a recipe. Do you find this statement bizarre as well?


> I could never make a souffle without a recipe. Do you find this statement bizarre as well?

Of course. You most likely could, after making it dozens of times with a recipe.


I'm in a similar situation and Kubernetes is honestly pretty easy to use once you get it. If your team is small, use a managed Kubernetes like GKE or EKS.

It's worth noting that Kubernetes uses containers, which can be created via Docker but are not dependent on Docker.


Can you point me to a good doc on deploying a small production service on k8s?

The official documentation provides a super simple tutorial, and then nothing. There's not even documentation of the primary config file. Frustrating.

https://github.com/kubernetes/website/issues/19139


    If you have microservices then you need
    a way for services to discover each other
Why not run them in docker containers with fixed IPs?


What happens when the IP address changes? You need some way to lookup current IP addresses. Why re-invent DNS? Also, how do you protect these services from unauthorized access?


    What happens when the IP address changes?
Changes how? It's not as if the IP of a server magically changes out of the blue.

    Why re-invent DNS?
There is no reason to re-invent DNS. Each docker container will have to have the info where the other containers are. So you could write that into /etc/hosts of the containers for example.

    Also, how do you protect these services
    from unauthorized access?
You need to do this no matter if you use Kubernetes or your own config scripts.
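For example, with docker-compose you can pin the addresses and bake the host entries in; subnet, names and images below are purely illustrative:

    version: "3.7"
    services:
      api:
        image: example/api              # placeholder image
        networks:
          backend:
            ipv4_address: 172.28.0.10
      worker:
        image: example/worker           # placeholder image
        extra_hosts:
          - "api.internal:172.28.0.10"  # ends up in the container's /etc/hosts
        networks:
          backend:
            ipv4_address: 172.28.0.11
    networks:
      backend:
        ipam:
          config:
            - subnet: 172.28.0.0/16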


> What happens when the IP address changes?

Erm, he literally said "with fixed IPs" (i.e. a "static IP")

You DO realize this is possible and easy to configure, right? If it changes anyway after that, that's an entirely new problem.

I feel like some networking knowledge will fall through the cracks eventually; static IPs might be one of those things


Because you want to scale, or roll out during a deploy. Or one goes down and you need a new host.


Do you have any resources you'd recommend to learn Docker?


So much this.


I'm also enjoying Kubernetes. I started a hobby project on GKE just to learn, but now the project has 8,000 MAU or so and will be scaling up more in the near future. K8s is totally overkill, but I've had a good time and it's worked well so far.


I run a SaaS business solo, eight years now, netting six figures, and I've been on Heroku the entire time for just under $1,000 a month. Monolithic Rails app on a single database, 300 tables.

Sometimes I feel teased by 'moving to EC2' or another hot topic to save a few bucks, but the reality is I've spent at most 2 hours a month doing `heroku pg:upgrade` for maintenance once a year, and `git push production master` for deploys and I'd like to keep it that way. I just hope Heroku doesn't get complacent as they are showing signs of aging. They need a dyno refresh, http/2, and wildcard SSL out of the box. I honestly have no idea what the equivalent EC2/RDS costs are and I'm not sure I want to know.


You should look into render.com which provides a service similar to Heroku. I haven't used them myself and have no connection with them. Their name does pop up a fair bit though.


Congrats on the business taking off! What is it?


Based on jblake's profile it seems to be guestmanager.com, but I might be wrong.

(Also, for everyone who doesn't know:

- clicking on a username takes you to that user's profile

- clicking on the n minutes/hours/days ago take you to a permalink directly to that comment

)


Software engineering is the perfect example of the "blind men and the elephant" problem. It is a very complex field, with a number of related but distinct disciplines and activities required to make it work; it's impossible to be an expert in everything, so we tend to specialise: we have back-end engineers, front-end engineers, data engineers, SRE experts, devops specialists, database experts, data scientists and so on. Additionally, the software we are building varies wildly in terms of complexity, dependencies, external requirements etc; and finally, the scale of that software and the teams building it can vary from one person to literally thousands.

Articles like this one, and even more comments on HN and similar sites, generally suffer from a perspective bias, with people overestimating the frequency of their own particular circumstances and declaring something outside of their needs as "niche" and generally misguided and "overhyped".

The reality is that various technologies and patterns -- microservices, monoliths, Kubernetes, Heroku, AWS, whatever -- are tools that enable us to solve certain problems in software development. And different teams have different problems and need different solutions, and each needs to carefully weigh their options and adopt the solutions that work the best for them. Yes, choosing the wrong solutions can be expensive and might take a long time to fix, but that can happen to everyone and actually shows how important it is to understand what is actually needed. And it's completely pointless to berate someone for their choices unless you have a very detailed insight into their particular needs.


> Articles like this one, and even more comments on HN and similar sites, generally suffer from a perspective bias, with people overestimating the frequency of their own particular circumstances and declaring something outside of their needs as "niche" and generally misguided and "overhyped".

It's my experience the opposite is true. The blindness is people overestimating their needs (or resume-padding) and using specialized, overcomplicated tools meant for traffic in the billions (e.g. cassandra, kafka, mapreduce) for 20-person startups that haven't hit rapid growth (most of which never do).


I'm afraid you might be falling into the exact trap I described. Realistically, how many such cases have you seen? And of those, how many actually implemented such a complex solution and ran it for a long time without either closing down or transforming into something more suited to their needs?


I kind of suspect you may be the one lacking experience.

I've worked at at least 8 different tech companies, mostly startups in SF or NY. The vast majority used overcomplicated technologies that didn't fit the needs of the project (most frequently microservices and NoSQL).

Off the top of my head I can't think of a single time such mistakes got corrected. More often than not things would continue to be even more poorly designed with the addition of new unnecessary technology.

In short -- I'm annoyed about this stuff because I've seen it first hand and had to struggle with it for numerous years.

Your weird theory that people are inventing hypothetical situations to be angry about... well I think you're the one inventing hypotheticals here...


In 24 years I've only rarely seen scenarios that actually require something at the level of complexity that k8s represents. I worked on Bing some years ago, and it would definitely have benefited but MS rolled their own solution (which has since been replaced by I don't know what).

I've seen k8s USED many times where it was wholly and completely unnecessary and being pushed by juniors who wanted to go apply to Google in a year or two.

I am currently running a service that sees 3,000 rpm spikes and averages 500k requests a day.

On a single server behind cloudflare deployed straight from Github.

We have a version of the service also running on ElasticBeanstalk with a single server.

Neither experiences downtime.

People severely overestimate their needs.

Google, Facebook, Microsoft, Amazon? Are serving literally billions of requests per minute. They have a need for that level of complexity.

Most of us here... do not.


> mostly startups in SF or NY

This explains a lot. Fair point, this kind of approach is likely very common in this very small (except financially) corner of the software industry.

> Your weird theory that people are inventing hypothetical situations to be angry about

Uh, can you point where I said or implied that? That doesn't have much in common with the point I was trying to get across, but I can believe that I failed in my intent.


I disagree with the HN consensus here: I think managed kubernetes is really useful for startups and small teams. I also commonly hear folks recommending that I use docker-compose or nomad or something: I don't want to manage a cluster, I want my cloud to do that.

We run a fairly simple monolith-y app inside kubernetes: no databases, no cache, no state: 2 deployments (db-based async jobs and webserver), an ingress (nginx), a load balancer, and several cron jobs. Every line of infrastructure is checked into our repo and code reviewed.

With k8s we get a lot for free: 0 downtime deployments, easy real time logging, easy integration with active-directory for RBAC, easy rollbacks.
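Concretely, most of that day-to-day surface is just a few kubectl commands against the manifests in the repo (the deployment name here is a placeholder):

    kubectl apply -f deployment.yaml
    kubectl rollout status deployment/web    # watch the zero-downtime rollout
    kubectl logs -f deployment/web           # real-time logs
    kubectl rollout undo deployment/web      # easy rollback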


Overengineering is a real problem out there. I’ve seen k8s deployed for internal back office apps that have literally 5 users - a raspberry pi could’ve hosted it. Keeping things simple and reliable is often a harder skill to learn than $BIGCO_TECH, and often confounded by political incentives.


So if you do that another internal enforcement group will come and be like "policy is that everything is cloud now, what's your migration plan?" regardless of anything else, and the only answer is to have a plan that keeps costs similar to the Raspberry Pi.


So, request budget to move the app to a cloud VM / container, and work out the security issues, etc. The latter will be expensive, but the RPi deferred the cost. Bean counters like deferred costs.

Then, ask finance to figure out the billing. It costs about $5 / month to rent a raspberry pi equivalent, but multitenancy might reduce that.


I'm getting tired of these "you don't need k8 posts". Sure, if you have a simple web application with a REST API, don't use k8, unless it's for learning purposes. But nobody does that anyway.

If you have something more complex with many moving parts that are separate services, k8 is a great option. I've been using it in production for close to 2 years now - not a single service downtime, great fault-tolerance, and absolutely zero management effort. Deploying complex applications, databases, and monitoring systems is easier than ever before. I don't think using k8 is overly complex. Yes, you need to invest some time to learn it, but that's the case for every new technology.


Practically every SaaS business could be built as a single application or a set of services, especially at the beginning when all the traffic could be handled on one server. So whether or not k8s is appropriate isn't driven by your business's complexity but rather by your architecture choices. Choosing to build microservices is what adds complexity and makes tools like k8s necessary, not the business.

That complexity can be useful. It can mean you don't need to build a simple version and a more scalable version later. It means you can replace parts as the business grows rather than the entire thing. You can outsource or buy some services. These are benefits, but they are also choices, and all of them increase the time it takes to get to market. In a very well funded startup that has years of runway sometimes choosing microservices makes sense.

In most small startups that are just proving their model they'd be much better off hacking something together that they'll throw away later in order to validate the business idea. Finding out that your business doesn't work as a business is far more useful than building something that can scale just in case it does.

The real difficulty in that is accepting that the idea you have might not actually work.


"But nobody does that anyway."

You'd be surprised, but they do.

EDIT: I work at a company with < 5 devs and we are in the process of moving our services to k8s. We'll see how it goes.


> But nobody does that anyway.

Please speak for yourself. In my experience, most of the software that's written with k8s/µServices could have been written with simple Flask + Nginx + monitoring on a DO Droplet. K8s and the extra overhead lead to extra effort in development and testing, and many software teams are perennially behind schedule on delivering features.

There are ways to achieve fault tolerance without tying yourself into the complex ecosystem of K8s.


We have a possible counter-example: our service computes A/B-test results (essentially reading from a database, processing totals, writing significance back). It has no non-system users, no dependencies, etc. All the testing and strangeness is handled internally.

We use k8s, which is indeed over-engineered for this — it ran fine as a reminder and a local script for years. But the rest of the company has a release process that they like. We just integrate with it. Our service has a name in that space, resources, a schedule that other engineers can read. Our description looks a little… film-school-credit-rolly because my name appears as the lead, architect, project, emergency contact, etc.

I think the main oversight of those "you don't need k8s" posts is that most projects are part of a system, and fitting into that system gives you legibility to your peers that an nginx might not.


I have ~62 apps (that's applications, not instances) deployed on kubernetes right now, for single client.

It started out on 2 VMs with I think 2-4 cpus, already running kubernetes. The actual containers inside ran lighttpd and served static files while we fixed up the sites that we have mirrored as static to run in containers.

If we had to run one, or maybe a few, of those sites, it would have been easy to run a single Apache + mod_php + vhost. We would have had some annoying work to do on the monitoring and logging side.

But we have 62. Some of them are mutually incompatible on a "standard" distro, as they have mutually exclusive dependencies (for example, PHP versions). This meant we ended up with containers to manage this in a somewhat doable way (we are two people). We can't expend the manpower to do a complete redo of the apps, though we have ideas on that (moving to a single common CMS for all of them).

K8s saved our sanity, because those "simple apps" altogether made for a hard-to-manage setup, and the client doesn't like it when they are not available, so we brought up HA as well.

Doing this as separate VMs would be hard and expensive. Doing it on Heroku is expensive - I made a calculation, and our original setup would result in somewhere around ~1800 USD a month.

Our 2016-2017 spend on GCP (GKE, Cloud SQL, a VM to host Gitlab + network traffic and DNS) was around 1000/month.


Do they need to be 62 though? Without having looked, my ignorant guess is that it could be reduced by an order of magnitude, it's just that doing so would take time that no one has, so the simpler choice is just to shovel the complexity into K8 instead of dealing with the complexity of reducing the number of apps.


The guess is a big miss, yes.

Assuming certain pruning is done that I can see, we would reduce it to maybe 40-50. They are all separate concerns, independent from each other, the pruning would merge the most mergable elements back (those are, honestly, a tech debt and I'd welcome replacing them with one common app).

BTW, those 62 apps? They map to ~260 domain names. Those domain names and what shows when you go there are what the client is paying us for.


We used to manually ssh to deploy to our dozens of nodes, with just a handful of developers: git pull, restart service.

Then we got to hundreds of nodes. Chef, chef, and more chef. Deploys were typically run with a chef-client run via chef ssh (well, a wrapper around that for retries). With dozens of services and many dozens of engineers, this worked well enough.

Then we got to thousands of nodes. And hundreds of developers working on a multitude of services.

We've adopted k8s. It has been a lot of work, but the deploy story is wonderful. We make a PR and between BuildKite and ArgoCD we can manage canary nodes, full roll outs, roll backs, etc. We can make config changes or code changes easily, monitor the roll out easily, and revert anytime. I still don't _like_ k8s mind you - I don't think programming with templates and yaml is a good thing. But I've come to terms with that being the best we will have for now.


We deploy small clusters everywhere in the same pattern, I love argocd. This article fails to understand the use case for kubernetes, and arguably doesn't fully understand the cloud.

Kubernetes is revolutionary, to think it's not is foolish.


> Then we got to hundreds of nodes.

Until you get there, Kubernetes is most likely overkill.


Kubernetes solves very real problems in a way that handles a full suite of them.

This is very complex because the problem set is complex.

If you're running a substantially smaller system, k8s makes less sense.

That said, if you're familiar with running and monitoring k8s, a gke deploy will solve a lot of the pain a traditional LB + EC2 ASG will incur out of the gate. Let me explain:

Notionally, we need 4 basic services operationally for a single typical service deployment. 1 of FooService, 1 load balancer, 1 database, 1 monitoring/logging system. All of these should tolerate node death; this means roughly 3 pieces of hardware for this notional system. This is complexity that k8s covers, at a high cost of knowledge. If you're bought into AWS, the Beanstalk system will do this decently well, last I checked.

I think there is room for a k8s-like tool that is good for teams with < 10 services, and less than 10 engineers. Even k3s (https://rancher.com/docs/k3s/latest/en/) has substantial complexity at the networking layer that, I think, can be stripped for the "Small Team".

So I agree with the author in theory that k8s is overkill. But also other infra types can start getting difficult to deal with in time, and "just deploy onto a single big box" doesn't cover the operational needs.
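For what it's worth, getting k3s itself onto a box is a documented one-liner; the complexity I'm talking about shows up later, in the concepts and the networking:

    curl -sfL https://get.k3s.io | sh -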


Would AWS Elastic Beanstalk fit that <10 services profile?


yes, it would.

Costs start really getting heavy with EB at a certain point, since you're spinning up 1+ ASG & LB per service (a tier is an ASG and a LB, possibly a DB). I wouldn't build a microservice architecture against EB, at _all_.

I'd say EB probably is cost effective up to, IDK, maybe 3 services with 3 nodes per ASG. Then you're breaking even or worse with k8s ops cost, and now you're looking at "how much time (= money) is it to manage k8s with KOPS" vs "how much are we spending on EB". KOPS is a very low-effort solution once you get it rolling.


Probably unpopular, but I am generally opposed to using Docker/Kubernetes for ~75%+ of projects. I've been in arguments over this, but containers being unmaintained and the complexity of Kubernetes can cause major issues. It's over engineering for smaller projects. That's just my opinion. I think a flat VM is more appropriate most of the time. But there is no denying the advantages of Docker when it's done right and used right.

A developer told me just a few weeks ago that you should "always" use Docker, which I just found to be so ridiculous.


So I wasn't really on the Docker train for a good while, but using containers takes so many sysadmin and environment issues out of play that I've started to change my mind. The position I find myself in is actually that for development environments Docker Compose and similar tools are wildly convenient, but that there's often not a really obvious path from that to production at small scale low cost.


That's not unpopular at all. Everyone who has had to run k8s and keep it up to date, or deal with unmaintained docker containers, understands this 100%.


People who have issues with unmaintained docker containers are doing it wrong. You still need to assess the quality of your dependencies for container images just like you should be doing for any other dependency.

The issue is that docker lets some developers get in over their head very easily. Many orgs have system admins to install and configure server operating systems, but docker shifts some of those responsibilities back on to the developer.


yes, but no.

Usually I would agree with you, but in today's world where we curl-install stuff from the internet, you'll always have someone pull a container to 'just get it to work'. Once the prototype works, it's production. People who have the discipline to actually research the quality of deps or... god forbid... actually build the containers they rely on from scratch will not get into this kind of issue, but again: kids these days...


It's not that hard to use Kubernetes, and it makes the developer's life easier. It's very easy to deploy Helm charts, and even though there are many gotchas and complex things, if you want to deploy something simple it is easy and completely doable, even solo.

(rant)

After over 10 years in development I've done and used literally all the things people here complain about a lot: virtual machines, single-page apps, Docker, microservices, FP, and the list goes on. Even though I've struggled, I feel very lucky to have been able to try all those things; it's been a joy, and I've shipped shitloads of great code that is making a lot of money for a lot of people and improving businesses in general.

I don't mean you need to use K8S or even like it, but there are definitely developers who know their shit very well, can make great single-page apps using more than 3 different JS frameworks, write good backend code, and so on. And they enjoy all of this and make companies genuinely successful. It sickens me a bit how so many posts of this kind get a lot of attention when they could be replaced with "yes, software, like everything in life, is complex!!!11". I think the article itself is too shallow to actually touch the real difficulties of using Kubernetes and is mostly useless information. There are at least 10 posts with better and more structured criticism, but because it's cool to complain about new things, this automatically gets traction on HN (which used to be a place where people like new things...).

So... yes, you shouldn't use K8S everywhere (that applies to everything...), but it is the new thing (well, not really new...). Should we just talk about Apache mod_php instead? It's natural that people want to try new stuff and actually enjoy working with software. Not everybody sees everything as a problem. "Now you have eight problems, hehehehehe!!11".

Am I the only one who found this post completely useless and, to some degree, toxic?

(/rant)


Well good for you that your SPAs work every time, and you never broke production with kubernetes. Here's a cookie.

Now for the rest of us who work with engineers across all skill ranges and experience levels, we actually do need to care about such factors.

The question is -- you hire a guy off craigslist to run your site, and every minute of downtime costs $1,000. Are you going to want him to use Kubernetes or a braindead simple hosted solution?


Some people are tired of acceleration and fear the industry moves too fast for them and some people are tired of the people who would like to take a step back because they feel like they're being held back. Both feelings are valid and it's good to have the full spectrum represented.


Having an industry veteran like Itamar weigh in on a pointless complexity that everyone seems to be in love with right now? Extremely valuable. Possibly saves people's lives, financially speaking.

Old IT folks are like explosives experts: if you see them fleeing in panic, follow them!


I see the author is a proponent of docker-compose, which I use myself for small projects. I have a docker-compose configuration in all my repos, and a `docker-compose up` brings the app up on my laptop. I could use minikube in almost exactly the same way. i.e. there is effectively no difference from a development perspective.
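For a small project that file can stay tiny, something along these lines (images and ports are just placeholders):

    version: "3.7"
    services:
      web:
        build: .
        ports:
          - "8000:8000"
        depends_on:
          - db
      db:
        image: postgres:11
        environment:
          POSTGRES_PASSWORD: changeme
        volumes:
          - dbdata:/var/lib/postgresql/data
    volumes:
      dbdata: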

If you are managing kubernetes yourself, on your own hardware, the moving parts can indeed be a burden for a small team - but all of these pain points go away with a managed kubernetes, as offered by most IaaS providers. i.e. if you are using an IaaS provider, there is (usually) no difference from a production perspective.

There are fewer moving parts in docker-compose, and it's easier to run on a single VM - but it doesn't offer any of the dynamic features of Kubernetes that you would want at scale. The same containers can run on both.

If you need to dynamically scale your application, or grow beyond a single machine (I disagree with the vertical scaling proposed by the author - that's for a very specific use-case IMHO), then docker-compose is simply no good. Then you need to use docker-swarm. At this point, you either need to manage a docker-swarm cluster or a kubernetes one. Kubernetes is the obvious choice here. Fortunately, there is a trivial migration path from docker-compose to kubernetes.


> there is a trivial migration path from docker-compose to kubernetes

The migration path of docker-compose to swarm is basically:

    eval $(docker-machine env my_cluster)
    docker deploy --compose-file docker-compose.yml PROJECT_NAME

I have looked into k8s and it wasn't as easy as this.


Yeah, it's not quite as easy as swarm - you basically need new configs - but you can use the existing containers.

From experience, this was no more than a few hours work on an app consisting of ~20 services - but I already had kubernetes experience so knew what I was doing.


Also docker-compose doesn't have zero downtime deploys which might be a deal breaker depending on your application.


To be fair, it's quite simple to do this.

As part of your config, you would specify a reverse proxy service, and a couple of app aliases. Then you would bring up the new deployment under one alias, and shut down the other.

But yeah, it's nice to have that kind of stuff (and much more) built in. docker-swarm also offers it out of the box, but my experience leans more towards k8s here.
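As a rough sketch of doing it by hand with compose plus a proxy service (the service names proxy/app_blue/app_green are made up):

    docker-compose up -d app_green                # start the new version alongside the old
    # switch the proxy's upstream/proxy_pass from app_blue to app_green, then:
    docker-compose exec proxy nginx -s reload     # reload without dropping connections
    docker-compose stop app_blue                  # retire the old version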


Or Nomad or Mesos!


I'll check these out. Thanks!


There’s a lot of configuration to understand with k8s and even GKE. Badly configured probes, resource budgets, pod disruption budgets, node affinities etc. can have disastrous effects. I’m pushing my teams more towards serverless since it takes out nearly all ops/scaling/rollout complexities. Right now we’re seeing our serverless apps on GCF, GAE and cloud run outperform our GKE apps easily in scaling, reliability, and simplicity (configuration and time spent getting it deployed in a satisfactory manner)


This is the lesson my last company learned hard. For anything serving less than tens of thousands of requests a second, you just can't really beat GAE in terms of simplicity and cost.


I'm planning to use GAE in a production environment.

Can you share the specifics on how GAE manages to scale, please?


It’s interesting that this critique of kubernetes is on a blog called “python speed” because my most recent project with kubernetes was deploying a large dask cluster. For this use case k8s was really valuable. It made the devops part so much easier than it otherwise would have been, so we could put most of our time into application logic. In other words, when we wanted to achieve substantial “python speed” kubernetes was very helpful. For data engineering projects, even with a small number of data engineers, it can be a big productivity booster.

Personally, I like kubernetes and find it easier to use than other devops tool sets, so it’s become my go-to tool. Probably wouldn’t recommend it to someone who doesn’t know it and has a simple app architecture.


I've taken over a project containing 6 DB entities. Instead of building a monolith (or normal REST API), the Architects used 7 µServices based on k8s and NoSQL DB. Now simple development tasks take extra time, and anything that affects multiple µServices needs n times the development efforts. I wish they had started with a simple monolith, and refactored to µServices if needed.


Your problem, my friend, is not Kubernetes or anything else technological. It is the people around you that call themselves architects :-)


True that, I must add that k8s added to the delay for features and enhancements.

Like most enterprise projects, by the time I got my hands on this project, all the Architecture Astronauts had already moved to their next planet :-)


I’m a very happy user of Rancher 1.6 for years. Simple, nice GUI, got everything I need, works fast, can deploy as many apps /services as you wish, no new concepts to learn (if you know Docker that is).

Used it in my previous agency to manage clients websites and use it now in my startup to manage multiple envs with few apps (api, front end, workers) and nice and easy deployments via GitLab CI.


Heh, it’s quite amusing to see the posts here arguing that “you can do the same thing with multi az deployments on aws with VMs, packer and ebs. Kubernetes needs you to learn so much shit” ... do you even read what you write?

Kubernetes is not gospel. It’s an opinionated, incomplete framework for orchestrating container workloads. There are other ways to do the same thing which are fine too. It works well for the most part but has disgusting failure scenarios. So do other techs.

People who use and like kubernetes are comfortable with its trade offs and portability. You may not be. It’s fine.

Shitting on kubernetes just because you’re comfortable with another technology just because you can: that’s not fine.


> The more you buy in to Kubernetes, the harder it is to do normal development

This demonstrates the bias and perspective of the author. The best way I can describe it is code-centric rather than system centric. If that's "normal" then the article makes some very valid points. For example, I've seen quite a few folks make the attempt to scale out badly when they could've scaled up rather easily. Very many "bigdata" problems can be handled on a single machine with a terabyte of memory.

If one shares that code-centric perspective, then yeah, k8s probably isn't for you. The real benefit in overcoming the very validly criticized complexity of k8s is the number of things that happen without intervention.

From a systems-level perspective, all these things are crucial. Services are abstracted with endpoints by default. Liveness and readiness are built in. Self-healing is built in. A consistent model by which apps are deployed is built in. Logging, metrics, and SLA monitoring while not built in can all be added and employed without intervention.

Ideally, these things abstract the infrastructure sufficiently well that it allows developers to focus on development, rather than ancillary tasks like deployment, monitoring, resilience, etc.


k8s is raw technology, like the Linux kernel. You shouldn't use it directly; it will be hard to maintain. There are a bunch of packaged solutions around k8s, like Google GKE or AWS EKS. By leveraging them, you work at a higher level of abstraction and get productivity back.


What's the best package for running in-house?


Rancher labs have a wide offering that has worked well for me. I’m currently running a couple K3S clusters on an array of Rock64 sbcs and have used the „real“ rancher distro for a while too with no problems. But I’m hardly an expert regarding the k8s on prem market so there might be even better ones!


Not sure about your use case but you could have a look at Minikube:

https://kubernetes.io/docs/setup/learning-environment/miniku...


I assume GP means Rancher (not a recommendation for nor against) or similar.


I’m only on the start of my k8s journey but I’m finding typhoon a nice option for on prem currently.


OpenShift is pretty neat!


It seems to me like Kubernetes aims to replace the existing service mesh consisting of de facto microservices (load balancers, remote logging, systemd, xinetd, ...) bonded by unixy conventions with... a monolithic, proprietary system. Proponents are then advocating building decoupled microservices on top of this. Am I the only one who thinks this is schizophrenic?

(on the other hand: companies claiming to make the world a better place by selling ads to the highest bidder are schizophrenic too...)


I’m positively attuned to the ... de facto Microservices bonded by unixy conventions ... part. And it has worked well in the past. You had much more freedom but everyone needed to pretty much roll their own.

Kubernetes is a compromise on that. You don't need to build your ops framework from scratch now, and deploying something has very well-defined APIs. OTOH you are now in a world of figuring out which k8s extensions/plugins/whatever you should use and which are just "fancy", and answering questions like: is Knative v0.8 good enough for our workloads? Because Ai Defi needs that for some reason...

EDIT: Kubernetes is open source though right?


yeah it is open source, but half a million loc for a simple admin tool seems like the perfect path to vendor lockin ;)


This is exactly what lock-in looks like.


I'm curious what HN would recommend as alternatives, especially for small/early teams that are outgrowing single-machine setups.

It seems to me that there's something of a gap between "for single machine setups" (eg docker-compose) and "for 500-engineer teams" (eg kubernetes).


I would recommend two machines and a load balancer. Most often the thing that doesn't scale isn't the applications you're able to run in Kubernetes, at least not at first. Most customers I've dealt with who have scaling issues need to take a look at their database before Kubernetes.

Kubernetes is awesome for many reasons, but it's a big step from a single-server setup to full Kubernetes, and there are plenty of options in between. We're able to scale large national infrastructure projects using VMs and load balancers, but we also run stuff on Kubernetes. The thing is: when stuff breaks, you'll prefer that it's not the Kubernetes stuff.

If you run the infrastructure yourself, you should be VERY sure that you know how it works, because debugging it is extremely complex.


Full-disclosure: I co-founded a company that's building a developer tool for K8s (and distributed systems in general).

But we've had a lot of success with running on GKE. The tool that we're making takes away the complexity of building, testing and deploying the stack, and GKE takes care of running it.

In fact we use GKE for both our development and staging/prod environments.

There are a lot of great tools out there that vastly improve the K8s developer experience[1], and the cloud providers take away the pain of operating it. And with tools like Terraform and Pulumi you can codify the whole setup.

Here's an example of how you can quite easily get started on GKE: https://medium.com/garden-io/gke-and-cloud-sql-a-complete-wo...

Here's a video of the same workflow: https://www.youtube.com/watch?v=iHyeD97GrE4.

[1]

https://github.com/GoogleContainerTools/skaffold

https://github.com/garden-io/garden

https://github.com/windmilleng/tilt


Often you don't need docker-compose either as long as you aren't deploying things that have weird dependencies.

Single monolithic application? Can run on a VM.

Multiple smaller applications written in Java, C# or Go? Can probably run side by side on a VM.


Fwiw docker compose is great for running stuff side by side on a VM :-)


I work in a small team, we just install and configure everything without Docker, Kubernetes or anything else. Just install it like I would with any other software and then run it after configuration. When it's time to deploy we have a bash script that bundles the app and pushes it onto the server.

Almost too simple to work, but it does ;)
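A sketch of that kind of script, with a made-up host, paths and build step:

    #!/usr/bin/env bash
    set -euo pipefail

    make build                                  # whatever your build step is
    tar czf app.tar.gz build/

    scp app.tar.gz deploy@app.example.com:/srv/app/
    ssh deploy@app.example.com \
      'cd /srv/app && tar xzf app.tar.gz && sudo systemctl restart app'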


ECS works great and ties into IAM/VPC/etc.

Also you're not paying $80 per control plane, they're free.


I’ve been using Rancher for years. Version 2 is all about K8s, but version 1 is super simple to set up and manage with a nice and clean GUI and pretty much no new concepts.

Unfortunately version 1 is abandoned now (I think) but it’s stable and I haven’t had any problems with it. Give it a go.


Docker swarm?


Does anyone actually use it in anger though? I've not followed it well, but it seems to me like it's Docker Inc's failed attempt to compete with K8s, which doesn't bode well for its future.


Anyway, it fills the gap between single-node docker-compose and k8s, because you can use docker-compose against a Swarm instance (so your app runs across multiple hosts).

Swarm is so simple. A few commands and you get scaling, routing, load balancing, and desired-state reconciliation.

https://docs.docker.com/compose/production/#running-compose-...

Yeah, there is a concern about the docker swarm future: https://github.com/docker/swarm/issues/2965 But mirantis just last week announced they will support and INVEST in docker swarm: https://www.mirantis.com/blog/mirantis-will-continue-to-supp...


Yup. It's good enough if you're already invested in Docker for dev, and good enough if you've outgrown a single machine. But it's not big enough for more than 10 nodes and user roles.

From an ops point of view it’s simple to deploy and I’m yet to hit any really sharp edges. However the lack of cronjobs and init containers is an annoyance, but you can work around them.

Mirantis’ recent announcement of further dev on Swarm after the initial announcement of only 2 years of support has not helped. I had been looking at moving to k8s. I’m now undecided if we should just continue with the plan to dump Swarm or keep it.


K8s is complex. For this reason cloud providers sell it as a service. K8s and microservices are a trendy topic, so it is true that you must think with your own head before creating microservices at will.

But I think the article is a bit too negative.

For instance, in my humble experience the application server is always a bottleneck before the database (i.e. if the database is Oracle or PostgreSQL).

Microservices move a bit of complexity to the client side, require smarter clients, and offer a lot more resilience and fault tolerance.

The article focuses only on scaling and forgets the "Single Point of Failure" problem.


It's sold as a service because it drives compute. Compute is one of the top money makers. The more people who buy into things like k8 that are essentially VM deployment launchers the better. The providers aren't emotional about this like 90% of the posters here. If a new flavor of the week gets people to launch VMs next year then that will be sold.


> you can use docker-autoheal or something similar to automatically restart those processes

I consider it sloppy to accept that a process will crash and become unresponsive as a normal fact of life, and that it subsequently has to be automatically restarted. A process should keep doing what it was designed to do. Reasons for the crash/unresponsiveness should be investigated (memory leaks, race conditions, etc.) and not swept under the carpet with an automatic restart.


Restarts should go into your reporting too.

Erlang seems to take the opposite approach. Processes are cheap, when one wears out you dispose of it instead of trying to fix it.


It would be unfair to simply mention a technology's "problems" without mentioning even a single "feature".

IMO, Itamar Turner-Trauring has misrepresented Kubernetes.

The sole reason Kubernetes is popular and on the rise is that there are genuine pros and features that make many people's lives easier rather than more miserable.

So let me point out the pros here:

[1]: https://www.infoworld.com/article/3173266/4-reasons-you-shou...

[2]: https://hackernoon.com/why-and-when-you-should-use-kubernete...

[3]: https://opensource.com/article/19/6/reasons-kubernetes

[4]: https://www.weave.works/technologies/the-journey-to-kubernet...


I'll take Kubernetes over the myriad of AWS services any day.

Kubernetes is the de facto cloud standard. If I have to know stuff about the cloud, I would like my knowledge to be transferable to the other clouds too. I understand that these clouds have to compete and so on, but why should I have to pay the price of learning their particular ways of naming things, their private APIs, and whatnot?

So of course devs like kubernetes.


We're being pushed to move to Kubernetes/EKS (or, ECS as a plan B) away from Elastic Beanstalk.

Elastic Beanstalk does the job, but it's slow to deploy things, it's inflexible, and the Amazon AMIs that underlie it are riddled with ancient packages that infuriate the security people running scans.

Are we making a mistake moving to Kubernetes instead of redesigning our infra as code using Docker + Terraform + Ansible + a CI pipeline? We run a client-facing app; it doesn't have super-low-latency requirements, although it does need to be able to scale up to run jobs.

Something else: I have to be honest, the complaints here about "people doing it for their CV," while true, need to be understood in the context that it's extremely hard to move jobs without significant Kubernetes and Docker experience.


Lol. I've been using k8 for 5+ years and am currently interviewing for more 'traditional' sys admin roles. The number of traditional problems I don't have by running even a small k8 cluster has become even more obvious as I go through the process. For every alternative you may suggest, I can point out dozens of problems you can run into that are solved with k8s. Just like any other operations tool, yes, it takes time and effort to learn...that's why sys admins exist. I always find these posts by non-sys admins humourous. Devs often have so little respect for the knowledge base required to maintain a stable, secure, scalable, and flexible system. It ain't nothing, no matter the tool you use.


> "You can get cloud VMs with up to 416 vCPUs and 8TiB RAM, a scale I can only truly express with profanity. It’ll be expensive, yes, but it will also be simple."

It's simple until you need to update. Good luck meeting any SLAs with your fleet of singletons.


We originally built OpenFaaS for Swarm, then moved to Kubernetes and support both now. The complexity of K8s is harrowing, but by and large works well, if you can keep up with the pace of change. Try running a controller you last modified 12 months ago on Kubernetes 1.17.

Now we've spent time looking into containerd and trying to provide microservices/faas on top of that instead - without the clustering (https://github.com/openfaas/faasd)

Something I do like about K8s is the ecosystem - in 5 minutes I can automate TLS with LetsEncrypt on a managed cluster.
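One common way to do that (not necessarily what I meant above, just an illustration) is cert-manager plus an annotated Ingress; issuer name, hostnames and service names below are placeholders:

    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: web
      annotations:
        kubernetes.io/ingress.class: nginx
        cert-manager.io/cluster-issuer: letsencrypt-prod   # assumes a ClusterIssuer with this name
    spec:
      tls:
      - hosts:
        - app.example.com
        secretName: web-tls        # cert-manager creates and renews this secret
      rules:
      - host: app.example.com
        http:
          paths:
          - path: /
            backend:
              serviceName: web
              servicePort: 80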


Kube isn't for small businesses. It's for enterprises.

Most developers aren't running the clusters; 99% of the time they are managed by the cloud provider.

Why would I ever order VMs and manually run a kube cluster?

But as an operator needing to manage clusters and multiple teams, I spend my time coding automation.

I'm not saying kube is simple, but it's a lot easier than managing application runtimes on VMs.

Before kube I was managing 260+ VMs (Hyper-V in internally managed DCs, central/east) for my product's in-house app. I essentially had to build my own poor man's orchestration platform to manage applications and deploys.


I find the assumption of many companies that they NEED stuff like K8S because they are going to be so "webscale", frankly, pretty arrogant. Most systems out there can run on a 2010 laptop, if done right.


In my previous company we had 20 million daily active users and we ran that on 4x M4.large EC2 instances. 4 instances not because it had any significant load (probably ~10-15% sustained), but purely because of high-availability and the ability to do a rolling release update.


I use docker swarm (mode? I am not sure) in production and it works great.


I really like docker swarm for the on-prem stuff we have.


I'm coming from the serverless side of things where people always say that only a few companies on earth even need K8s, like cloud providers.

How do you rationalize using it in your company?

For example, a learning platform that allows teachers to integrate frontend and backend code into their lessons uses Docker containers for its product. They can't offer runtimes for every programming language running on the frontend and backend, so they allow the teachers to upload their own container. I'd say they are a good example of a company that could need k8s.


> there are wide variety of tools that will do just as well: from Docker Compose on a single machine, to Heroku and similar systems, to something like Snakemake for computational pipelines.

Even though it has the disadvantages of being vendor-dependent and not open source, I've found ECS to be a very nice solution. Conceptually, it's very similar to Kubernetes, with much of the plumbing that makes Kubernetes so complex baked into the AWS platform (much of which you're already using if you're on AWS)


I wouldn’t recommend ECS to anyone. It’s technically worse than k8s in every way and tied to a vendor.

For me the biggest issue is speed of deployments. In practice it’s hard to get deploys on ECS under 1 minute. With k8s, 5 seconds is easy.


+1 for ECS as well; I think it's a good middle ground and much easier to use. It's getting better too, with spot-instance termination detection in the ALB and more seemingly coming soon. If you are on AWS it's worth considering rather than EKS or rolling your own K8s cluster.


EndpointSlice is a really bad cherry pick. I have written K8S controllers for 2+ years and only recently learned about it when I had to write an ingress controller. Funnily enough I also had to learn about externalTrafficPolicy because it turns out if you have a lot of pods and use an AWS ELB the traffic distribution can be terrible, so a daemonset with local-only routing to them and then round-robining to pods works wonders.

You need to know none of this if you're not even close to the level of scale we are.
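For anyone curious, the local-only routing piece is a single field on the Service in front of the daemonset (names here are placeholders):

    apiVersion: v1
    kind: Service
    metadata:
      name: edge-proxy
    spec:
      type: LoadBalancer
      externalTrafficPolicy: Local   # only route to pods on the receiving node; also preserves client IPs
      selector:
        app: edge-proxy
      ports:
      - name: http
        port: 80
        targetPort: 80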


Every time I feel secretly embarrassed for running my small projects on a simple cloud VPS VM, an article like this comes along and restores faith in my decision to not over-engineer things.

This has come up on HN before, and it's a great read - "You are not Google": https://news.ycombinator.com/item?id=19576092


You can get pretty far with docker and things like ECS / Fargate etc too.


What service would the HN folks recommend for someone who needs to run a few dozen different docker services that require persistent storage? It would be nice to just have a pool of compute resources tied to persistent storage and be able to spin up docker instances at will. K8s has been suggested as the correct solution, but it sounds like a lot of overhead for services that require no scaling at all.


I feel that the pro-monolith / anti-microservice attitude has become something of a cargo cult (at least here on HN).


Cargo cults arose among islanders who saw cargo planes land. They didn't understand why the planes landed, but they did want the cargo they carried. They built makeshift runways and put out boxes of cargo in the hope of attracting planes.

If anything, the 5 person startup with 20 microservices “because Google/Netflix” and “we want to scale” is the cargo cult in this debate.

Not saying one side’s wrong or right, just that cargo cult doesn’t seem to being used correctly here.

(I admit I’m skeptical of microservices, since they add so much complexity, and even Netflix suffers partial service outages on a regular basis...)


Most of the "You don't need kubernetes" posts should be "you don't need docker".


There is something amusingly ironic about a Python blog complaining about k8s being overly popular relative to how difficult it is to actually use. K8s is extremely complex sometimes, but at least it maintains some semblance of semantic and logical consistency, unlike certain other tools.


Great tools make hard problems approachable. They also reduce impossible problems to hard ones.

But, it takes experience (having solved things with easy and hard ways) to prefer easy problems and easy solutions.

It's a good time to be a consultant where you get to solve the same problems using different approaches.


It's also more common now to add on third-party functionality like a service mesh, various serverless implementations, secrets management, logging frameworks, etc., making it even more complex. Not disputing that some of these add value, but the number of moving parts is high.


How about this rule of thumb when it comes to the question of "Should you Kube"

How many developers in your organization?

0-10: No

10-100: Maybe

100-1000: Yes

1000+: Definitely


You are the man. Preach on.

I wonder how many teams with active users in the 1000s have fully dockerized/Kubernetes/microservice type shit designed for 10000x load that they will likely never get to because they didn't spend their time iterating on product.


I think the network part of Kubernetes is hard to do right, and it's very complex.

Furthermore, I wonder about the performance of a local monolithic service on metal, with a full local CPU cache, compared with microservices making distributed network requests.

Disclosure I run Kubernetes in Production.


It's IMO much easier if you don't run complex CNIs, but unfortunately not everyone can afford that :(


We use k8s at $JOB, so I decided to look into it on my own time. A wasted weekend trying to install it and a profanity-laden #ragequit later, I moved all my stuff over to OpenBSD instead.

Never been happier.

In the case of k8s, ignorance truly is bliss. Keep it simple, people.


So if Kubernetes makes management simpler and more robust for teams of 500+, but is overly complex for teams of 5, what solution would people recommend for teams of 5?


Docker compose and a bunch of shell scripts?


Yeah, always seemed like madness to me. Docker Compose seems to be the sweet spot: still sort of infrastructure as code via YAML, without the fleet/swarm orchestration overkill.
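
Something like this minimal docker-compose.yml (a sketch; the service names, images, and credentials are made up for illustration), plus a couple of shell scripts, goes a long way:

    version: "3.7"

    services:
      web:
        image: myorg/web:latest        # placeholder app image
        ports:
          - "80:8000"
        environment:
          DATABASE_URL: postgres://app:secret@db:5432/app
        depends_on:
          - db
        restart: unless-stopped

      db:
        image: postgres:12
        environment:
          POSTGRES_USER: app
          POSTGRES_PASSWORD: secret
          POSTGRES_DB: app
        volumes:
          - db-data:/var/lib/postgresql/data   # named volume so data survives container rebuilds
        restart: unless-stopped

    volumes:
      db-data: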


There is something amusingly ironic about a Python blog complaining about k8s being overly popular relative to how difficult it is to actually use. To be fair, I've had nothing but mediocre experiences with Python, so I'm a bit jaded.

On the actual content of the article... well, it gets worse somehow. There are good arguments against using k8s (or any tool, really), but I don't think any of them made it into this article. "Why scale with microservices when you can just get a single massive VM" was probably my fav.


If your solution comprises only one or a few systems, and the primary reason you are considering k8s is to tackle clustering/scalability/service discovery, then you can always start by building simple clustering into your system.

Here is how I built it into mine: https://www.titanoboa.io/cluster.html

Obviously this will not always be the right solution, but in some cases it might be a better fit than k8s...


Kubernetes local environment setup

Kubernetes debugging

Too many configurations to worry about

Ever-evolving features

No established best practices

All of these hit hardest for small teams that just want to get a product out to market.


For our team, with many containers scattered across Elastic Beanstalk and ECS, K8S makes everything much cleaner.


News flash! Software engineering is hard.


So, Kubernetes is the 2020 version of the enterprise application server of the '00s?


they say kubernetes is greek for 'pilot', but kube (kybos) also means 'dice'

are you feeling lucky, punk?


Because Greek upsilon gets transliterated as both "y" and "u", "cyber" comes from the same word as "kube". "Cybernetics" was supposed to be the art of piloting, but it was very loosely defined and the term was deliberately overhyped.


the proper way to use kubernetes is:

- if your org is big enough: hire/train your devops engineers to manage the kubernetes cluster(s)

- if your org is small or waaay too big: use some form of managed kubernetes cluster (AWS EKS, GKE, DO k8s, etc.)

- don't


In the future my CV will also have a "Will Not Work With" skillz matrix. Kubernetes goes on it first... unless I'm applying to Google or something of similar size.


A lot of the problem with using Kubernetes is it appears to be the only option for running microservices in a cloud environment. People choose it because they think there's no other option (and they're somewhat right). But there's Nomad, DC/OS, Docker Swarm (for a little bit anyway), ECS, GKE, etc. That's still not a ton of options, but there are options.

That's just microservice orchestration, which is a small part of the totality of things needed to implement a full-out SDLC. You can't just stand up Kubernetes and think you're done; your code will need to integrate with a lot more stuff, and you'll end up writing 10 layers of glue because that's just how many use cases you have to support.

And it's weird that all that glue doesn't use standards. I mean, we have TCP/IP & RPC & REST, we have pipes & filehandles, we have the OCI specs. That gets us to a point where (at most) half of the stack of an architecture is portable and interoperable with any system following those standards. But then there's every other component of the architecture that connects all the pieces together, meaning you're writing glue that will only work for one implementation. Change your implementation, and you have to change your glue, and probably more stuff.

I think a lot of that non-reusable glue could be erased if it all followed standards, such that the configuration and operation of each part followed a standard interface, set of data types, etc. Tools and libraries could just "talk container orchestrator" or "talk load balancer" or "talk object storage" or "talk secrets management", and virtually any component could be integrated into any other, by virtue of either a system-wide or application-specific configuration.

You could argue we have something like that now with a "kubectl file" or similar, but that's not only still platform-specific, but other tools don't speak it, so K8s has to do everything, because it's the only thing that speaks its language (config file/backend data store/IAM/secrets/roles/etc).

Rather than resign ourselves to those limitations, we could bundle everything in an implementation-agnostic standard way with standard interfaces. The exact same configuration (as code) could be used to run the same complete architecture on a dozen different platforms, because every component would speak the same language and handle all the other components in the same ways. The backend services could all translate the standard based on how they were configured, such that generic instructions are then translated into implementation-specific actions. You could really write your architecture once and run it anywhere, without the caveat of "anywhere on this platform only".

I feel like we're not talking about doing that because we keep getting caught up in "Fuck, Kubernetes is pretty hard" conversations. Yeah, it's hard; building and operating an 18-wheeler is hard. But what about the roads? What about the gas stations? What about the containers we put on the trucks? All that stuff is standard, and so we don't have to worry about what implementation of gas station or road we use. I feel like we still don't have those things in the cloud, and it's just weird.


Then don't use it. Like, wtf.


Once you configure the Kubernetes network layer for whatever hosting platform you're using, it's really not difficult to administer. It's funny to me how much Kubernetes hate there is on HN.


Until you need to debug an issue and you're in way over your head due to all the moving pieces.


Then assign someone to debug it who understands that Kubernetes is a wrapper around common Linux functionality.

It's not that hard to debug issues in Kubernetes: check the status of your pods, memory levels, storage mounts, network configuration, and the stack trace you're debugging. Not that difficult.

I'm not saying there aren't edge cases, but if you set up your system with centralized logging (Filebeat) and some way to scrape metrics (JMX, built-in tooling), you'll be fine.


For my personal projects, k8s is so useful that I wouldn't ever build a server by hand again. I can spin up my blog or whatever easily on one cluster, and if it becomes too expensive I can just move elsewhere. If I want to reduce my costs, I could run a single-node cluster (I don't need HA) on a DO droplet or something and still get the ease of being able to destroy and rebuild my apps anytime I want. It might be "overkill", but so are most of the tools I use each day. Of course, I never create my own clusters, but it wouldn't be that hard to follow a tutorial if I had to.



