“Let’s use Kubernetes.” Now you have eight problems

atonse · on March 5, 2020

The odd thing about having 20 years of experience (while simultaneously being wide-eyed about new tech), is that I now have enough confidence to read interesting posts (like any post on k8s) and not think "I HAVE to be doing this" – and rather think "good to know when I do need it."

Even for the highest scale app I've worked on (which was something like 20 requests per second, not silicon valley insane but more than average), we got by perfectly fine with 3 web servers behind a load balancer, hooked up to a hot-failover RDS instance. And we had 100% uptime in 3 years.

I feel things like Packer (allowing for deterministic construction of your server) and Terraform are a lot more necessary at any scale for generally good hygiene and disaster recovery.

hinkley · on March 5, 2020

I have, at various times in my career, tried to convince others that there is an awful, awful lot of stuff you can get done with a few copies of nginx.

The first “service mesh” I ever did was just nginx as a forward proxy on dev boxes, so we could reroute a few endpoints to new code for debugging purposes. And the first time I ever heard of Consul was in the context of automatically updating nginx upstreams for servers coming and going.

There is someone at work trying to finish up a large raft of work, and if I hadn’t had my wires crossed about a certain feature set being in nginx versus nginx Plus, I probably would have stopped the whole thing and suggested we just use nginx for it.

I think I have said this at work a few times but might have here as well: if nginx or haproxy could natively talk to Consul for upstream data, I’m not sure how much of this other stuff would have ever been necessary. And I kind of feel like Hashicorp missed a big opportunity there. Their DNS solution, while interesting, doesn’t compose well with other things, like putting a cache between your web server and the services.

I think we tried to use that DNS solution a while back and found that the DNS lookups were adding a few milliseconds to each call. Which might not sound like much except we have some endpoints that average 10ms. And with fanout, those milliseconds start to pile up.

rumanator · on March 6, 2020

> I think I have said this at work a few times but might have here as well: if nginx or haproxy could natively talk to Consul for upstream data, I’m not sure how much of this other stuff would have ever been necessary.

To be fair, half of the API Gateways and edge router projects out there are basically nginx with a custom consul-like service bolted on.

vidarh · on March 5, 2020

You can do get around the nginx Plus requirement by using a module like ngx_mruby to customize the backend selection. I haven't measured the latency, so it may not be suitable for your 10ms example.

Here's a post I wrote on that ~4 years ago that uses an in-process cache [1]. It'd be fairly easy to add an endpoint to update it and pull data from Consul. I agree with you, it's a missed opportunity - there are alternatives, but being able to rely on a battletested server like nginx makes a difference.

[1] http://hokstadconsulting.com/nginx/mruby-virtualhosts

saym · on March 6, 2020

As a fan of nginx, I really liked your comment. In sleuthing after reading I came across this:

https://learn.hashicorp.com/consul/integrations/nginx-consul...

It appears that if the consul client has the right permissions it can restart the nginx service after editing the configuration file. It uses the consul templating engine to generate an nginx config file.

I haven't tried it myself but it looks promising.

philsnow · on March 5, 2020

> if nginx or haproxy could natively talk to Consul for upstream data, I’m not sure how much of this other stuff would have ever been necessary

Airbnb's Smartstack works well for this. It's not built in to nginx as a module, but I think it's more composeable this way.

Blog post: https://medium.com/airbnb-engineering/smartstack-service-dis...

The two main components are nerve (health checking of services + writing a "I'm here and healthy" znode into zookeeper, https://github.com/airbnb/nerve) and synapse (subscribes to subtrees in zookeeper, updates nginx/haproxy/whatever configs with backend changes, and gracefully restarts the proxy, https://github.com/airbnb/synapse).

It's fairly pluggable too if you don't want to use haproxy/nginx.

aprdm · on March 6, 2020

Then you have a dependency on zookeeper when you already have consul... it seems like consul template + nginx or haproxy is the solution hashicorp went with.

atonse · on March 5, 2020

I totally agree, especially about being able to serve content out of a cache instead of files. It would simplify some of my configuration especially for static sites that point to a CDN.

I like what Caddy is doing, exposing their entire configuration through a REST interface.

aprdm · on March 6, 2020

You should check fabio ( https://github.com/fabiolb/fabio ), it is really awesome ).

I 100% agree with you, I've been using Consul for four years now to run 100s of services in 1000s of VMs across datacenters distributed globally and not once I saw the need for anything else...

Maybe I just don't have the scale to find service mesh or kubernetes interesting. Nomad however is something I am willing to give a go for stateless workflows that I would usually provision a VM running a single docker container for.

znpy · on March 6, 2020

> I have, at various times in my career, tried to convince others that there is an awful, awful lot of stuff you can get done with a few copies of nginx.

under the load point of view, yes. absolutely. no doubt.

under the speed of action, no way. if your k8s cluster is properly managed, you can let developers do most of the operations work themselves, confined into their namespaces, touching only the kind of resources that you tell them to touch.

takeda · on March 5, 2020

I personally would advise against using DNS for service discovery, it wasn't designed for that.

The few milliseconds that you get though, most likely is due to your local machine not having DNS caching configured, this is quite common in Linux. Because of that every connection triggers a request to DNS server. You can install unbound for example to do it. nscd or sssd can also be configured to do some caching.

imtringued · on March 6, 2020

> I personally would advise against using DNS for service discovery, it wasn't designed for that.

It was designed for that but the SRV record requires protocols and their clients to explicitly support it. You can argue that this an unreasonable design choice but load balancers like HAproxy do support SRV records.

totony · on March 5, 2020

Why is dns not used for service discovery? The internet as a whole uses it for service discovery currently

takeda · on March 6, 2020

Internet as a whole uses it to provide human friendly names.

I'm saying it is not good idea to use DNS for service discovery, there's a way of using it correctly, but it requires software to do the name resolution with service discovery in mind, and you're guaranteed that majority of your software doesn't work that way.

Why you shouldn't use DNS? It's because when you communicate over TCP/IP you need an address that's really the only thing you actually need.

If you use DNS for discovery you probably will set low TTL for the records, because you want to update them quickly, this means for every connection you make you will be checking DNS server providing extra load on the DNS server and adding latency when connecting.

On failure of a DNS server, even if you set a large TTL, you will see an immediate failure on your nodes the reason is that's how DNS cache works. Different clients made the DNS request at different time so the records will expire at different times. If you did not configure a local DNS cache on your hosts (most people don't) then you won't even cache the response and every connection request will go to a DNS server, so upon a failure everything is immediately down.

Compare this to have a service that edits (let say an HAProxy) configuration and populates it with IP addresses. If the source that provides the information goes down, you simply won't have updates during the time, but the HAProxy will continue forwarding requests to IPs (if you use IPs instead of hostnames, then you also won't be affected by DNS outages).

Now there are exceptions to this, certain software (mainly load balancers such as pgbouncer (I think HAProxy also added some dynamic name resolution)) use DNS with those limitations in mind. They basically query DNS service on the start to get IP and then periodically query it for changes, if there's a change it is being applied, if the DNS service is down they will keep the old values.

Since they don't throw away the IPs when a record expires, you don't have this kind of issues. Having said that, majority of software will use system resolver the way DNS was designed to work and will have these issues, and if you use DNS for service discovery, you, or someone in your company will use it with such service and you'll have the issues described above.

totony · on March 6, 2020

>Compare this to have a service that edits (let say an HAProxy) configuration and populates it with IP addresses.

Just edit the hosts file? If you have access to machines that run your code and can edit configuration, and also don't want the downsides of resolvers (pull-based instead of push-based updates, TTLs), DNS still seems like a better idea than some new stacks, plus you can push hosts files easily via ssh/ansible/basically any configuration management software

EDIT: The only issue I see with DNS as service discovery is that you can't specify ports. But usually software should use standard ports for their uses and that's never been a problem in my experience.

drybjed · on March 6, 2020

You can specify ports using SRV resource records.

totony · on March 6, 2020

You could but there's no integration for that that i know of so it'd be a bit of work to get working, which is why i didnt include it

imtringued · on March 6, 2020

https://www.haproxy.com/documentation/aloha/9-5/traffic-mana...

nitrogen · on March 6, 2020

It's how mdns works with Avahi/Bonjour/Zeroconf

aprdm · on March 6, 2020

This is interesting! Do you have some material on load testing the DNS servers and seeing their breaking point? I've heard as much from other people but never experienced it in practice even using Consul with 0 TTL everywhere.

Perhaps the network infrastructure team always scaled it correctly behind the scenes but they never once complained about the amount of DNS queries.

takeda · on March 6, 2020

DNS is fairly lightweight and if you have one local on premises, it might be less noticeable, especially if latency is not critical (in previous places I worked that was the setup, we still had a local cache on every host and I would encourage doing that, it increases resiliency). If latency is critical, not having a cache adds extra round trip on every connection initiated.

If you have hosts on public cloud and use DNS server that is also shared with others the latency typically might be bigger and on high number of requests you might also start seeing SERVFAIL on large number of requests.

I can't find the forum post anymore, but people who had applications that were opening large number of connections (bad design of the app imo, but still) had huge performance degradation when they moved from c4 to c5 instances. It turned out that this was because of the move from Xen to Nitro (based on KVM).

Side effect of using Xen was that the VM Host was actually caching DNS requests by itself, from which all guests benefited. In the KVM, all DNS requests were going directly to the DNS server.

notyourday · on March 5, 2020

> I think we tried to use that DNS solution a while back and found that the DNS lookups were adding a few milliseconds to each call. Which might not sound like much except we have some endpoints that average 10ms. And with fanout, those milliseconds start to pile up.

Don't resolve DNS inline rather on every DNS update, resolve it and insert new IP addresses.

hinkley · on March 5, 2020

Correct me if I'm wrong, but I believe Consul, lacking a mesh of its own, is leveraging the early 1990's era trick of using round robin DNS to split load over available servers.

Caching those values for very long subverts the point of the feature.

jen20 · on March 6, 2020

By way of correction: Consul does not simply "round robin" DNS requests unless you configure it in a particularly naive manner.

Prepared queries [1] and network tomography (which comes from the Serf underpinnings of the non-server agents) [2] allow for a much wider range of topologies just using DNS without requiring proxies (assuming well behaved client software, which is not a given by any stretch).

Furthermore, Consul _does_ have a mesh as around 2 years ago - [3].

You are correct though that long caches subvert much of the benefit.

[1]: https://www.consul.io/api/query.html

[2]: https://www.consul.io/docs/internals/coordinates.html

[3]: https://www.consul.io/docs/connect/index.html

notyourday · on March 5, 2020

Not really - resolve all backend servers to IPs and list all of them as the nginx backends. When a backend server is removed, update nginx backends.

Round-robin balancing using DNS towards a small cluster is silly - you know when any new instance is added to the pool or removed from a pool, so why not push that load balancing onto the load balancer which in your case is nginx?

hinkley · on March 5, 2020

You're talking about layering the thing on top of Consul that I already identified in my top level comment.

Consul itself advertises DNS resolution for service discovery.

notyourday · on March 5, 2020

Maybe I was not clear.

Whatever is the technology that you use to register the active backends in the DNS, rather than doing name => ip address lookup per request, you can resolve all those names => ip address maps upon the service being brought up/taken down and push the resolved map as a set of backends into nginx config, thus removing the need to query DNS per request.

lazyier · on March 5, 2020

Kubernetes has complexity for a reason. It's trying to solve complex problems in a standardized and mature manner.

If you don't need those problems solved then it's not going to benefit you a whole lot.

Of course if you are using docker already and are following best practices with containers then converting to Kubernetes really isn't that hard. So if you do end up needing more problems solved then you are willing to tackle on your own then switching over is going to be on the table.

The way I think about it is if you are struggling to deploy and manage the life cycle of your applications... fail overs, rolling updates, and you think you need some sort of session management like supervisord or something like to manage a cloud of processes and you find yourself trying to install and manage applications and services developed by third parties....

Then probably looking at Kubernetes is a good idea. Let K8s be your session manager, etc.

pavon · on March 5, 2020

I would qualify that a little more. If you are using docker and deploying to a cloud environment already, then moving to a cloud-managed kubernetes cluster really isn't that hard.

I've seen too many full-time employees eaten up by underestimating what it takes to deploy and maintain a kubernetes cluster. Their time would have been far better spent on other things.

mountainriver · on March 6, 2020

That story is getting a lot easier but historically was awful

throwsprtsdy · on March 5, 2020

What is a "mature manner"?

threeseed · on March 5, 2020

For example by covering all of the NFRs.

You don't always find open source programs that have dedicated so much effort to security, monitoring, governance etc. And doing so in a very professional and methodical way.

sciyoshi · on March 5, 2020

There's always more than one way to do things, and it's good to be aware of the trade-offs that different solutions provide. I've worked with systems like you describe in the past, and in my experience you always end up needing more complexity than you might think. First you need to learn Packer, or Terraform, or Salt, or Ansible - how do you pick one? How do you track changes to server configurations and manage server access? How do you do a rolling deploy of a new version - custom SSH scripts, or Fabric/Capistrano, or...? What about rolling back, or doing canary deployments, or...? How do you ensure that dev and CI environments are similar to production so that you don't run into errors from missing/incompatible C dependencies when you deploy? And so on.

K8s for us provides a nice, well-documented abstraction over these problems. For sure, there was definitely a learning curve and non-trivial setup time. Could we have done everything without it? Perhaps. But it has had its benefits - for example, being able to spin up new isolated testing environments within a few minutes with just a few lines of code.

jen20 · on March 6, 2020

> First you need to learn Packer, or Terraform, or Salt, or Ansible - how do you pick one?

You don't. These are complementary tools.

Packer builds images. Salt, Ansible, Puppet or Chef _could_ be used as part of this process, but so can shell scripts (and given the immutability of images in modern workflows, they are the best option).

Terraform can be used to deploy images as virtual machines, along with the supporting resources in the target deployment environment.

rumanator · on March 6, 2020

> Salt, Ansible, Puppet or Chef _could_ be used as part of this process, but so can shell scripts

I don't see the point of your post, and frankly sounds like nitpicking.

Ansible is a tool designed to execute scripts remotely through ssh on a collection of servers, and makes the job of writing those scripts trivially easy by a) offering a DSL to write those scripts as a workflow of idempotent operations, and b) offer a myriad of predefined tasks that you can simply add to your scripts and reuse.

Sure, you can write shell scripts to do the same thing. But that's a far lower level solution to a common problem, and one that is far hardsr and requires far more man*hours to implement and maintain.

With Ansible you only need to write a list of servers, ensure you can ssh into them, and write a high-level description of your workflow as idempotent tasks. It takes you literally a couple of minutes to pull this off. How much time would you take to do the same with your shell scripts?

mixmastamyk · on March 6, 2020

As he mentioned, immutable images make those type of tools largely moot.

closeparen · on March 5, 2020

Yes. Three web servers and a load balancer is fine. Three web servers and a load balancer, repeated 1,000 times across the enterprise in idiosyncratic ways and from scratch each time, is less fine. That’s where Kubernetes-shaped solutions (like Mesos that came before it) become appropriate.

You can get a lot done with a sailboat. For certain kinds of problems you might genuinely need an aircraft carrier. But then you’d better have a navy. Don’t just wander onto the bridge and start pressing buttons.

mmazing · on March 5, 2020

You're right.

However, a lot of new (or just bad) devs miss the whole Keep It Simple Stupid concept and think that they NEED Kubernetes-shaped solutions in order to "do it the right way".

Many times three web servers and a load balancer are exactly what you need.

threeseed · on March 5, 2020

And then what about when you need to add monitoring, logging, APM, tracing, application metrics etc.

Suddenly you have gone from 3 instances to 20.

mmazing · on March 5, 2020

Adding an entirely new instance is not the only way to accomplish each of those things. A lot of those things can be treated just like applications. You don't need a whole new computer to run Outlook, another computer to run Sublime Text, another computer to run Chrome, etc etc.

All of that is irrelevant to my main point though. It's never one size fits all and then all your problems are solved.

You are far better off actually assessing your needs and picking the right solution instead of relying on solutions that "worked for bigger companies so they'll work for me" without really giving it a lot of thought if you need to go that far.

rumanator · on March 7, 2020

> Adding an entirely new instance is not the only way to accomplish each of those things. A lot of those things can be treated just like applications.

That's what containers are. Containers are applications, packaged to be easily deployable and ran as contained processes. That's it.

Kubernetes is just a tool to run containers in a cluster of COTS hardware/VMs.

I've said it once and will say it again: the testament of Kubernetes is simplify so much the problem of deploying and managing applications in clusters of heterogeneous hardware communicating through software-defined networks thay it enable clueless people to form mental models of how the system operates that are so simple that they actually believe the problem is trivial to solve.

mmazing · on March 7, 2020

Again .... that might be the best solution for your company, or it might be introducing complexity where it's not actually needed.

It all depends on the situation and needs of whatever problem you are trying to solve.

threeseed · on March 5, 2020

Almost all of those aren't single binaries like Chrome.

They often have their own databases, search engines, services etc to deploy along with it. And necessitate multiple instances for scalability and redundancy.

jbergens · on March 6, 2020

They can be shared across a company. Office one team handels those all other teams can just use them.

bdibs · on March 5, 2020

I personally went from 4 servers to 5, self hosting ELK and it works great.

aganame · on March 6, 2020

Aren't those rather trivial to do if you didn't make the mistake of choosing a microservice architecture?

mountainriver · on March 6, 2020

I get the point but I’ve also seen three web servers and a load balancer go terribly wrong at a number of places as well. L8s provides a lot of portability that you would need a disciplined and experienced engineer to match with raw deployments.

raghava · on March 6, 2020

> Many times three web servers and a load balancer are exactly what you need.

May be, just may be, they want k8s not to create value but to develop/enrich resumes - in order to signal that they are smart and can do complex stuff.

moksly · on March 5, 2020

I think anyone considering these wild setups should read about how stackowerflow is hosted on a couple of IIS servers. It’s a sobering reminder that you often don’t need the new cool.

danielovichdk · on March 5, 2020

Joel, if any, has always been super pragmatic and very realistic.

Not to misunderstand. For FogBugz they wrote a compiler/transpiler for Asp and PHP because the product had to run on customers servers - because "clients should not leave their private data at another company".

Google it, great read.

tommica · on March 6, 2020

Tried to google for it, couldn't find anything - could you provide a link, please?

toyg · on March 6, 2020

For the details on that transpiler: https://www.joelonsoftware.com/2005/03/30/the-road-to-fogbug...

I would recommend going through all of Joel Spolsky’s posts between 2000 and 2010, there are plenty of absolute diamonds. Part of why StackOverflow was so successful was because Joel had built a big audience of geeks and entrepreneurs with his excellent blog posts (he was the Excel PM during the creation of VBA and had plenty of accrued wisdom to share), so they adopted SO almost instantaneously when him and Jeff Atwood built it.

08-15 · on March 7, 2020

"“don’t you mean translator?“

Let me explain.

In computer science jargon a translator IS a compiler. It’s exactly the same thing. Those are synonyms."

Every time someone says "transpiler", god kills a kitten. Please, think of the kittens.

rumanator · on March 7, 2020

> I think anyone considering these wild setups should read about how stackowerflow is hosted on a couple of IIS servers.

Apparently in 2019 stack overflow was hosted in at least 25 servers, including 4 servers dedicated to run haproxy.

https://meta.stackexchange.com/questions/10369/which-tools-a...

jiofih · on March 5, 2020

Nothing much happens if SO goes down, they are not doing a ton of business transactions.

pnako · on March 6, 2020

That's right. As opposed to stock exchange software, which runs on complex micro-services cloud k8s thing-as-a-thing virtualized rotozooming engines. Wait, no, it's literally just good-old n-tier architecture with one big server process and some database backend.

Pet food delivery startups use k8s to manage their MEAN stack. Meanwhile grown-ups still have "monoliths" connected to something like Oracle, DB2 or MS SQL server, because that's obviously the most reliable setup.

The cloud/k8s stuff is an ad-hoc wannabe mainframe built on shaky foundations.

rumanator · on March 7, 2020

> Meanwhile grown-ups still have "monoliths" connected to something like Oracle, DB2 or MS SQL server, because that's obviously the most reliable setup.

More often than not they just crystalized their 90s knowledge and just pretended there aren't better tools for the job because it would take some work to adopt them and no one notices it in their work anyway.

The "Oracle" keyword is a telltale sign.

fishnchips · on March 5, 2020

Nothing much? The world stops producing software:)

sgt · on March 5, 2020

He's got a point though. Ironically the world's most highly scaled software are often fairly unimportant. Think Facebook - people would get annoyed if it went down for a day, but it'd soon be forgotten. Your banks are built with less scalable software but is much more critical.

FreeHugs · on March 5, 2020

A typical PHP application that does a bit of database updating per request, gets some new data from the DB and templates it should handle 20 requests per second on a single $20/month VM. And in my experience from the last years, VMs have uptime >99.99% these days.

What made you settle on a multi-machine setup instead? Was it to reach higher uptime or were you processing very heavy computations per request?

atonse · on March 5, 2020

Higher uptime and more computation, although this was mostly C# so very efficiently run code. It was an e-commerce site doing +100MM a year.

There was little to no room for error. I once introduced a bug in a commit that, in less than an hour, cost us $40,000. So it wasn't about performance.

Also this was 9 years ago. So adjust for performance characteristics from back then.

dwild · on March 5, 2020

100 000 000$ a year with only 20 requests a second? That's some crazy revenue per request, 100 000 000 / (365 * 24 * 60 * 60) = 3.17$ per request!

What were you selling?

atonse · on March 5, 2020

Insurance policies :)

Good point, actually the 100MM may have included brick and mortar.

mytherin · on March 5, 2020

Forgot to multiply by 20, it's actually around $0.15 per request. Still high but certainly not as crazy.

btilly · on March 5, 2020

I worked on a site 20 years ago that got even better revenue per request (though with fewer requests).

It did analytics on bond deals. Cost $1k/month for an account. Minimum 3 accounts. Median logins, ~1/month/account.

On the other hand people would login because they were about to trade $10-100 million of bonds. So knowing what the price should be really, really mattered.

Wall St can be a funny place.

dwild · on March 6, 2020

That's quite interesting! That's why I asked, I wasn't exactly doubting the figure, just curious about the market that allowed that kind of customer targeting.

aaron_m04 · on March 5, 2020

It should be (365 * 24 * 60 * 60 * 20) which brings it to $0.15/req if I did my mental math right. Still a high amount of course.

dwild · on March 5, 2020

Oh thanks, I actually did the math right the first time, kept only the result and then when I was about to hit send I thought it was too good to be true, thus I did it again and got 3.15$, which was even crazier, but couldn't find why my math was wrong.

Damogran6 · on March 5, 2020

A single server isn't redundant. 3 behind a load balancer, where each is sized to handle 50% of the volume lets you take systems offline for maintenance without incurring downtime.

Heck, Raspberry Pis have more horsepower than the webservers in the cluster I ran around Y2k.

_klm7 · on March 6, 2020

For performance context: nginx on a $5 DO droplet does 600 requests per second on static file.

Serving static files with Elixir/Phoenix has a performance of 300 requests per second.

Python+gunicorn serves about 100 requests per second of JSON from postgres data.

gravitas · on March 5, 2020

(outside of GP's reply) Generically, life is messy and unpredictable, never put all your eggs in one basket. Your cloud server is sitting on a physical hyp which will need maintenance or go down, or even something in your VM goes wrong or needs maintenance. Using a basic N+1 architecture allows for A to go down and B to keep running while you work on A - whether that's DNS, HTTP or SQL etc.

FreeHugs · on March 5, 2020

If your physical hyp dies, how do you redirect your traffic to a different one?

gravitas · on March 5, 2020

Replace "your" with "the" - the hyp can be run by your provider (Linode, DO, Vulture, AWS, GKE, whoever). Most cloud providers have virtual/shared/managed load balancers to rent time on as well, such that you don't have to maintain N+1 of those (let them do it). You could even use basic round-robin DNS, it's a possible choice just not generally suggested.

FreeHugs · on March 5, 2020

For the load balancer solution, a lot more is needed then to just rent a load balancer.

Example: What do you expect to happen when the server with your DB goes down? Just send the next UPDATE/INSERT/DELETE to DBserver2? Which is replicated from the DBserver1? When DBserver1 comes back, how does it know that it now is outdated and has to sync from DBserver2? How does the load balancer know if DBserver1 is synced again and ready to take requests?

Even if you set up all moving parts of your system in a way that handles random machine outtages: Now the load balancer is your single point of failure. What do you do if it goes down?

gravitas · on March 5, 2020

Respectfully, I am not designing a full architecture here in HN comments and have not presented a full HA solution for you to pick apart. Your leading questions seemed basic, you received basic answers - going down the rabbit hole like this is just out of left field for this comment thread.

vajrabum · on March 5, 2020

Load balancing--and yes that can become a source of failure too.

NewJazz · on March 5, 2020

Two load balancers using either keepalived or bgp anycast.

fxtentacle · on March 5, 2020

The odd thing about having 10 years of experience as a consultant is that you know when to write "Kubernetes" into a project proposal, even though everyone agrees that it'll be a sub-optimal solution.

But both you and their tech lead want to be able to write "used Kubernetes" on your CV in the future, plus future-oriented initiatives inside your contact's company tend to get more budget allocated to them. So it's a sound decision for everyone and for the health of the project to just go with whatever tech is fancy enough, but won't get into the way too badly.

Enter Kubernetes, the fashionable Docker upgrade that you won't regret too badly ;)

bitL · on March 5, 2020

I worked on a transacted 20-60k messages/s system and am not sure K8S wouldn't be a hindrance there... Imagine writing Kafka using K8S and microservices.

cmhnn · on March 5, 2020

I don't know about "lot more necessary". The images are one part of the equation especially to meet various regulations. There is a ton to running a large scale service especially if you are the service that the people who are posting how wicked smart they are at k8 service runs on. Google found that out yesterday when they said "oh hey people expect support maybe we should charge". That is not new for grown ups.

The cloud existed before k8 and k8's creator has a far less mature cloud than AWS or Azure.

But this thread has convinced me of one thing. It's time to re-cloak and never post again because even though the community is a cut above some others at the end of the day it's still a bunch of marks and if you know the inside it is hard to bite your lip.

Ill_ban_myself · on March 6, 2020

The most important video I've ever watched for my career and sanity: https://www.youtube.com/watch?v=bzkRVzciAZg

JMTQp8lwXL · on March 5, 2020

Who is giving you 100% uptime? All major providers (AWS, GCP, Azure, etc) all have had outages in the past 3 years. And that level of infrastructural failure doesn't care about whether or not you're using k8s.

threeseed · on March 5, 2020

EKS (AWS Kubernetes) has the control plane across 3 AZs.

It is very rare to have a complete region outage so it is pretty close to 100% uptime.

barbazoo · on March 5, 2020

Not OP but you can achieve higher availability through means like redundancy, availability zones, multi-region deployments, etc.

echelon · on March 5, 2020

> Even for the highest scale app I've worked on (which was something like 20 requests per second,

Kubernetes is not for you. 5kQPS times a hundred or more services and Kubernetes fits the bill.

> And we had 100% uptime in 3 years.

Not a single request failed in that time serving at 20 QPS? I'm a little suspicious.

Regardless, if you were handling 10 or 100 times this volume to a single service, you'd want additional systems in place to assure hitless deploys.

rumanator · on March 7, 2020

> Not a single request failed in that time serving at 20 QPS? I'm a little suspicious.

Things that aren't monitored are also things that don't fail.

wpietri · on March 5, 2020

Same. I like trying out new things, so I have a feel for what they're good for. I tried setting up Kubernetes for my home services and pretty quickly got to "nope!" As the article says, it surely makes sense at Google's scale. But it has a large cognitive and operational burden. Too large, I'd say, for most one-team projects.

Damogran6 · on March 5, 2020

I'm in a similar boat, only my eyes are wide, glazed over, and I'm lost in the word salad...which only seems to be getting worse.

jorams · on March 5, 2020

These kinds of posts always focus on the complexity of running k8s, the large amount of concepts it has, the lack of a need to scale, and that there is a "wide variety of tools" that can replace it, but the advice never seems to become more concrete.

We are running a relatively small system on k8s. The cluster contains just a few nodes, a couple of which are serving web traffic and a variable number of others that are running background workers. The number of background workers is scaled up based on the amount of work to be done, then scaled down once no longer necessary. Some cronjobs trigger every once in a while.

It runs on GKE.

All of this could run on anything that runs containers, and the scaling could probably be replaced by a single beefy server. In fact, we can run all of this on a single developer machine if there is no load.

The following k8s concepts are currently visible to us developers: Pod, Deployment, Job, CronJob, Service, Ingress, ConfigMap, Secret. The hardest one to understand is Ingress, because it is mapped to a GCE load balancer. All the rest is predictable and easy to grasp. I know k8s is a monster to run, but none of us have to deal with that part at all.

Running on GKE gives us the following things, in addition to just running it all, without any effort on our part: centralized logging, centralized monitoring with alerts, rolling deployments with easy rollbacks, automatic VM scaling, automatic VM upgrades.

How would we replace GKE in this equation? what would we have to give up? What new tools and concepts would we need to learn? How much of those would be vendor-specific?

If anyone has a solution that is actually simpler and just as easy to set up, I'm very much interested.

envoked · on March 5, 2020

I'm in the same camp. I think a lot of these anti-k8s articles are written by software developers who haven't really been exposed to the world of SRE and mostly think in terms of web servers.

A few years ago I joined a startup where everything (including the db) was running on one, not-backed-up, non-reproducible, VM. In the process of "productionizing" I ran into a lot of open questions: How do we handle deploys with potentially updated system dependencies? Where should we store secrets (not the repo)? How do we manage/deploy cronjobs? How do internal services communicate? All things a dedicated SRE team managed in my previous role.

GKE offered a solution to each of those problems while allowing me to still focus on application development. There's definitely been some growing pains (prematurely trying to run our infra on ephemeral nodes) but for the most part, it's provided a solid foundation without much effort.

mslm · on March 5, 2020

Exactly, all these articles seem to come from operational novices, who think in terms of 1-2 click solutions. K8s is not a 1-2 click solution, and clearly isn't designed to be; it's solving particular tough operational problems that if you don't know exist in the first place you won't really be able to evaluate these kinds of things properly.

If a group literally doesn't have the need to answer questions like the ones you posed, then OK, don't bother with these tools. But that's all that needs to be said - no need for a new article every week on it.

collyw · on March 5, 2020

> it's solving particular tough operational problems that if you don't know exist in the first place

They probably don't exist for the majority of people using it. We are using k8s for when we need to scale, but at the moment we have a handful of customers and it isn't changing quickly any time soon.

ownagefool · on March 5, 2020

They basically exist for everyone.

As soon as you go down the road of actually doing infrastructure-as-code, using (not running) k8s is probably as good as any other solution, and arguably better than most when you grow into anything complex.

Most of the complaints are false equivalence: i.e. running k8s is harder than just using AWS, which I already know. Of course it is. You don't manage AWS. How big do you think their code base is?

If you don't know k8s already, and you're a start-up looking for a niche, maybe now isn't the time to learn k8s, at least not from the business point of view (personal growth, another issue).

But when you do know k8s, it makes a lot of sense to just rent a cluster and put your app there, because when you want to build better tests, it's easy, when you want to do zero trust, it's easy, when you want to integrate with vault, it's easy, when you want to encrypt, it's easy, when you want to add a mesh for tracing, metrics and maybe auth, it's easy.

What's not easy is inheriting a similarly done product that's entirely bespoke.

collyw · on March 5, 2020

No the complaint is that we aren't scaling to google levels, in fact we are barely scaling at all. K8s isn't needed.

We ran applications without it fine a few years ago. And it was a lot simpler.

ownagefool · on March 6, 2020

"Fine"

As in, doesn't get hacked or doesn't go down? We live in different worlds.

jen20 · on March 6, 2020

> Of course it is. You don't manage AWS. How big do you think their code base is?

This seems like a fairly unreasonable comparison. The reason I pay AWS is so that I _do not_ have to manage it. The last thing I want to do is then layer a system on top that I do have to manage.

ownagefool · on March 6, 2020

https://www.digitalocean.com/products/kubernetes/

https://cloud.google.com/kubernetes-engine/

???

Spooky23 · on March 5, 2020

Remember that not everyone is using an operational model that would benefit from K8s.

013a · on March 5, 2020

Yet, they write articles that prescribe their operational model on the rest of the world.

bcrosby95 · on March 5, 2020

Problem is hidden assumptions. Happens a lot with microservices too. People write about the problems they're solving somewhat vaguely and other people read it and due to that vagueness think it also is the best solution to their problem.

Spooky23 · on March 5, 2020

They are engineers, writing about what they do and what their companies sell. I wouldn't ascribe an "imposition" to that!

As a practitioner or manager, you need to make informed choices. Deploying a technology and spending the company's money on the whim of some developer is an example of immaturity.

ex_amazon_sde · on March 5, 2020

> I think a lot of these anti-k8s articles are written by software developers who haven't really been exposed to the world of SRE ...

Think again. There's plenty of SREs at FAANGs that dislike the unnecessary complexity of k8s, docker and most "hip" devops stuff.

p_l · on March 5, 2020

From my knowledge, pretty much all FAANGs got their own tooling in place before k8s went public, covering those same issues.

Now imagine you have to do it from scratch.

honkycat · on March 5, 2020

Agreed. I've been saying for years that if you go with Docker-Compose, or Amazon ECS, or something lower level, you are just going to end up rebuilding a shittier version of Kubernetes.

I think the real alternative is Heroku or running on VMs, but then you do not get service discovery, or a cloud agnostic API for querying running services, or automatic restarts, or rolling updates, or encrypted secrets, or automatic log aggregation, or plug-and-play monitoring, or VM scaling, or an EXCELLENT decoupled solution for deploying my applications ( keel.sh ), or liveness and readiness probes...

But nobody needs those things right?

sk5t · on March 5, 2020

You do in fact get a lot of this stuff with ECS and Fargate - rolling updates, automatic restart, log aggregation, auto scaling, some discovery bits, healthchecks, Secrets Manager or Parameter Store if you want, etc.

throwaway894345 · on March 5, 2020

This. We’ve been chugging along happily on Fargate for a while now. We looked into EKS, but there is a ton of stuff you have to do manually that is built into Fargate or at least Fargate makes it trivial to integrate with other AWS services (logging, monitoring, IAM integration, cert manager, load balancers, secrets management, etc). I’d like to use Kubernetes, but Fargate just works out of the box. Fortunately, AWS seems to be making EKS better all the time.

Logishort · on March 5, 2020

And what if Amazon feels like changing Fargate or charging differently or deprecating it or... Whats your strategy for that?

pjmlp · on March 5, 2020

We think about it when it happens, if it ever happens.

I have seen too many projects burn money with vendor independence abstraction layers that were never relevant in production.

throwaway894345 · on March 5, 2020

Like others have said, cross that bridge if/when we get to it--there's no sense in incurring the Kubernetes cost now because one day we might have to switch to Kubernetes anyway.

It's also worth noting that Fargate has actually gotten considerably cheaper since we started using it, probably because of the firecracker VM technology. I'm pretty happy with Fargate.

sk5t · on March 5, 2020

The fallback is to use something else less convenient--maybe k8s, nomad, some other provider's container-runner-load-balancer-thing.

russellendicott · on March 5, 2020

I'm pretty sure AWS has a better track record than Google when it comes to keeping old crud alive for the benefit of its customers.

In my experience AWS generally gives at least a year's notice before killing something or they offer something better that's easy to move to well in advance of killing the old.

Hell, they _still_ support non-VPC accounts...

Legogris · on March 5, 2020

And suddenly you have a MUCH higher degree of vendor lock-in than on k8s.

eropple · on March 5, 2020

The dirty secret of virtually every k8s setup--every cloud setup--is that the cloud providers' stuff is simply too good, while at the same time being just different enough from one another, that the impedance mismatch will kill you when you attempt to go to multi-cloud unless you reinvent every wheel (which you will almost certainly do poorly) or use a minimal feature set of anything you choose to bring in (which reduces the value of bringing those things in in the first place).

"Vendor lock-in" is guaranteed in any environment to such a degree that every single attempt at a multi-cloud setup that I've ever seen or consulted on has proven to be more expensive for no meaningful benefit.

It is a sucker's bet unless you are already at eye-popping scale, and if you're at eye-popping scale you probably have other legacy concerns in place already, too.

Legogris · on March 5, 2020

I'm not saying you'd be able to migrate without changes, but having moved workloads from managed k8s to bare-metal and having some experience with ECS and Fargate, I can tell you that the scale of disparities is significant.

eropple · on March 5, 2020

To me the disparities aren't "how you run your containers". They're "how you resolve the impedance mismatch between managed services that replace systems you have to babysit." Even something as "simple" as SQS is really, really hard to replicate, and attempting to use other cloud providers' queueing systems has impedance mismatch between each other (ditto AMQP, etc., and when you go that route that's just one more thing you have to manage).

Running your applications was a solved problem long before k8s showed up.

tylerl · on March 5, 2020

This.

The whole point of k8s, the reason Google wrote it to begin with, was to commoditize the management space and make vendor lock-in difficult to justify. It's the classic market underdog move, but executed brilliantly.

Going with a cloud provider's proprietary management solution gives you generally a worse overall experience than k8s (or at least no better), which means AWS and Azure are obliged to focus on improving their hosted k8s offering or risk losing market share.

Plus, you can't "embrace and extend" k8s into something proprietary without destroying a lot of it's core usability. So it becomes nearly impossible to create a vendor lock-in strategy that customers will accept.

archisgore · on March 5, 2020

Amen to that. We use ECS to run some millions of containers on thousands of spot nodes for opportunistic jobs, and some little Fargate for where we need uptime. It's a LOT less to worry about.

dilyevsky · on March 5, 2020

Pesky monitoring and logging just makes you notice bugs more and get distracted from making the world a better place

NicoJuicy · on March 5, 2020

Why would docker be shittier? It's easier and flexible.

Eg. Sidecar pattern resolves most things (eg. logging)

rclayton · on March 5, 2020

1000%. If you take a little bit of time to learn k8s and run it on a hosted environment (e.g. EKS), it’s a fantastic solution. We are much happier with it than ECS, Elastic Beanstalk, etc.

AlchemistCamp · on March 5, 2020

> If anyone has a solution that is actually simpler and just as easy to set up, I'm very much interested.

Sure. @levelsio runs Nomad List (~100k MRR) all on a single Linode VPS. He uses their automated backups service, but it's a simple setup. No k8s, no Docker, just some PHP running on a Linux server.

As I understand it, he was learning to code as he built his businesses.

https://twitter.com/levelsio/status/1177562806192730113

icedchai · on March 5, 2020

Many startups are not as resource efficient. The ones I am familiar with spend 50%+ of their MRR on cloud costs.

p_l · on March 5, 2020

My first production k8s use was actually due to resource efficiencies. We looked at the >50 applications we had to serve, the technical debt that would lead us down the road of building customized distribution to avoid package requirements incompabilties, and various other things, and decided to go with containers - just to avoid the conflicts involved.

Thanks to k8s, we generally keep to 1/5th of the original cost, thanks to bin packing of servers, and sleep sounder thanks to automatic restarts of failed pods, ability to easily allocate computing resources per container, globall configure load balancing (we had to scratch use of cloud-provider's load balancer because our number of paths was too big for URL mapping API).

Everything can be moved to pretty much every k8s hosting that runs 1.15, biggest difference would be hooking the load balancer to the external network and possibly storage.

AlchemistCamp · on March 5, 2020

That's very surprising to me! As an employee at multiple startups in multiple countries, I've never seen cloud costs anywhere near payroll.

icedchai · on March 5, 2020

Well, I said 50% of monthly revenue, not total costs.

Glyptodon · on March 5, 2020

You're asking exactly what I've been wondering. The answer in this thread this so far has been "maybe dokku." I can totally buy that K8s is overkill, but the alternatives for small scale operators seem to require a lot more work internally than just using a K8s service.

avereveard · on March 5, 2020

same we use kubernet not for scale but for the environment repeatability. we can spin any branch at any time on any provider and be sure it's exactly as we have it in production down to networking and routing, to built that out of plain devops tools would require a lot more in depth knowledge and it won't port exactly one provider to another

VvR-Ox · on March 5, 2020

> These kinds of posts always focus on the complexity of running k8s...

Yes but that is also the worst already that you could criticize about k8s.

Complexity is dangerous because if things are growing beyond a certain threshold X you will have side effects that nobody can predict, a very steep learning curve and therefor many people screwing up something in their (first) setups as well as maintainability nightmares.

Probably some day someone will prove me wrong but right now one of my biggest goals to improve security, reliability and people being able to contribute is reducing complexity.

After all this is what many of us do when they refactor systems.

I am sticking with the UNIX philosophy at this point and in the foreseeable future I will not have a big dev-team at my disposal as companies like Alphabet have to maintain and safe-guard all of this complexity.

hedora · on March 5, 2020

From a developer perspective, k8s seems ripe for disruption.

It does a bunch of junk that is trivial to accomplish on one machine - open network ports, keep services running, log stuff, run in a jail with dropped privileges, and set proper file permissions on secret files.

The mechanisms for all of this, and for resource management, are transparent to unix developers, but in kubernetes they are not. Instead, you have to understand a architectural spaghetti torrent to write and execute “hello world”.

It used to be similar with RDBMS systems. It took months and a highly paid consultant to get a working SQL install. Then, you’d hire a team to manage the database, not because the hardware was expensive, but because you’d dropped $100k’s (in 90’s dollars) on the installation process.

Then mysql came along, and it didn’t have durability or transactions, but it let you be up and running in a few hours, and have a dynamic web page a few hours after that. If it died, you only lost a few hours or minutes of transactions, assuming somone in your organization spent an afternoon learning cron and mysqldump.

I imagine someone will get sufficiently fed up with k8s to do the same. There is clearly demand. I wish them luck.

p_l · on March 5, 2020

It seems to be. When you start implementing something, you'll soon find that most of that complexity is inherent, not accidental. Been there, done that, didn't even get the T-Shirt.

Today it's much easier to package nicer API on top of the rather generic k8s one. There are ways to deploy it easier (in fact, I'd wager that a lot of complexity in deploying k8s is accidental due to deploy tools themselves, not k8s itself. Just look at OpenShift deploy scripts...)

rclayton · on March 5, 2020

Exactly - I deployed a k8s cluster with eksctl in about 10 min. Deployed all my services within another 10min (plain Helm charts).

hrgiger · on March 5, 2020

This, kubernetes is container orchestration framework. If you know your needs then you have opportunity to make a better decision and planning. I had similar experience with ingress also I would like to add installing Kubernetes on bare metal is a pain in the neck, even with kubespray.

Here is a comparison with other frameworks, from 2018: https://arxiv.org/pdf/2002.02806.pdf

geggam · on March 5, 2020

Running k8s on GKE is the only rational choice for anyone who isnt a large enterprise with a team of Google SREs to support k8s.

Period.

halbritt · on March 5, 2020

I'm curious how, with your experience, that you came to the conclusion that k8s is a monster to run?

rawoke083600 · on March 5, 2020

"Pod, Deployment, Job, CronJob, Service, Ingress, ConfigMap, Secre"

Wow as a new developer coming onboard your company, I will walk out the door after seeing that, and the fact that you admit its a small serivce.

jorams · on March 5, 2020

The names may seem daunting, but here's what they do: a Pod is one or more containers, usually 1, that run an application. A Deployment manages a number of the same pods, making sure the right number is running at all times and rolling out new versions. A Job runs a pod until it succeeds. A CronJob creates a job on a schedule. A service provides a DNS name for pods of a certain kind, automatically load balancing between multiple of them. An Ingress accepts traffic from the outside and routes it to services. A ConfigMap and a Secret are both just sets of variables to be used by other things, similar to config files but in the cluster.

It's a small service according to " web scale", but it's serving, and vital for, a good number of customers.

eska · on March 5, 2020

People seem to hate learning vocabulary for concepts that they are already dealing with. Personally I love having a common vocabulary to have precise conversations with other developers.

hedora · on March 5, 2020

It’s more than that. People hate learning concepts that should be abstracted away from then.

As an example, why can’t ConfigMap and Secret just be plain files that get written to a well known location (like /etc)?

Why should the application need to do anything special to run in kubernetes? If they are just files, then why do they have a new name? (And, unless they’re in /etc, why aren’t they placed in a standards-compliant location?)

If they meet all my criteria, then just call them configuration files. If they don’t, then they are a usability problem for kubernetes.

kube-system · on March 5, 2020

ConfigMaps and Secrets can hold multiple files, or something that isn't a file at all, and your application doesn't have to do anything special to access them. You define how they configure your application in the pod/deployment definition. They solve a problem not previously solved by a single resource.

Maybe you don't personally find value in the abstraction, but there are certainly people who do find it useful to have a single resource that can contain the entire configuration for a application/service.

curryst · on March 5, 2020

They can be files on the disk. They can also be used as variables that get interpolated into CLI args. What they really are is an unstructured document stored in etcd that can be consumed in a variety of ways.

As the other user said, they can also be multiple files. I.e. if I run Redis inside my pod, I can bundle my app config and the Redis config into a single ConfigMap. Or if you're doing TLS inside your pod, you can put both the cert and key inside a single Secret.

The semantics of using it correctly are different, somewhat. But you can also use a naive approach and put one file per secret/ConfigMap; that is allowed.

nrb · on March 5, 2020

It's an afternoon worth of research to understand the basic concepts. Then, with the powerful and intuitive tooling you can spin up your own cluster on your computer in minutes and practice deploying containers that:

- are automatically assigned to an appropriate machine(node) based on explicit resource limits you define, enabling reliable performance

- horizontally scale (even automatically if you want!)

- can be deployed with a rolling update strategy to preserve uptime during deployments

- can rollback with swiftness and ease

- have liveness checks that restart unhealthy apps(pods) automatically and prevent bad deploys from being widely released

- abstracts away your infrastructure, allowing these exact same configs to power a cluster on-prem, in the cloud on bare metal or vms, with a hosted k8s service, or some combination of all of them

All of that functionality is unlocked with just a few lines of config or kubectl command, and there are tools that abstract this stuff to simplify it even more or automate more of it.

You definitely want some experienced people around to avoid some of the footguns and shortcuts but over the last several years I think k8s has easily proven itself as a substantial net-positive for many shops.

scarface74 · on March 5, 2020

So why should I do all of that instead of throwing a little money at AWS, run ECS and actually spend my time creating my product?

Heck, if my needs are simple enough why should I even use ECS instead of just putting my web app on some VM's in an auto-scaling group behind a load balancer and used managed services?

Legogris · on March 5, 2020

I don't think anyone is arguing that you should use k8s for a simple web app. There's definitely some inherent stack complexity threshold before solutions like k8s/mesos/nomad are warranted.

When you start having several services that need to fail and scale independently, some amount of job scheduling, request routing... You're going to appreciate the frameworks put in place.

My best advice is to containerize everything from the start, and then you can start barebones and start looking at orchestration systems when you actually have a need for it.

scarface74 · on March 5, 2020

If you need to fail and scale independently.

-you can use small EC2 instances behind an application load balancer and within autoscaling groups with host based routing for request routing.

- converting a stand-alone api to a container is not rocket science and nor should it require any code rewrite.

- if you need to run scheduled Docker containers that can also be done with ECS or if it is simple enough lambda.

- the first thing you should worry about is not “containerization”. Its getting product market fit.

As far as needing containerization for orchestration, you don’t need that either. You mentioned Nomad. Nomad can orchestrate anything - containers, executables, etc.

Not to mention a combination of Consul/Nomad is dead simple. Not saying I would recommend it. In most cases (I’ve used it before), but only because the community and ecosystem is smaller. But if you’re a startup, you should probably be using AWS or Azure anyway so you don’t have to worry about the infrastructure.

geggam · on March 5, 2020

What requirement is driving containers ?

How are you managing your infrastructure and if you have that already automated how much effort is it to add the software you develop to that automation vs the ROI of adding another layer of complexity ?

The idea everything needs to be in containers is similar to the idea everything needs to be in k8s.

Let the business needs drive the technology choices don't drive the business with the technology choices

Legogris · on March 5, 2020

Portability - the idea being that you can migrate to an orchestration technology that makes sense when you have the need. The cost and effort of containerizing any single service from the get-go should be minimal. It also helps a lot with reproducibility, local testing, tests in CI, etc.

Valid reasons to not run containerized in production can be specific security restrictions or performance requirements. I could line up several things that are not suitable for containers, but if you're in a position of "simple but growing web app that doesn't really warrant kubernetes right now" (the comment I was replying to), I think it's good rule of thumb.

I agree with your main argument, of course.

geggam · on March 5, 2020

The overhead managing a container ecosystem to run production is not trivial. IF you are doing this in a service then by all means leverage that package methodology.

If you are managing your systems who already have a robust package management layer then adding the container stacks on top of managing the OS layers you have just doubled the systems your operations team is managing.

Containers also bring NAT and all sorts of DNS / DHCP issues that require extremely senior well rounded guys to manage.

Developers dont see this complexity and think containers are great.

Effectively containers moves the complexity of managing source code into infrastructure where you have to manage that complexity.

The tools to manage source code are mature. The tools to manage complex infrastructure are not mature and the people with the skills required to do so ... are rare.

Legogris · on March 5, 2020

> If you are managing your systems who already have a robust package management layer then adding the container stacks on top of managing the OS layers you have just doubled the systems your operations team is managing.

Oh yeah, if you're not building the software in-house it's a lot less clear that "Containerizate Everything!" is the answer every time. Though there are stable helm charts for a lot of the commonly used software out there, do whatever works for you, man ;)

> Containers also bring NAT and all sorts of DNS / DHCP issues that require extremely senior well rounded guys to manage.

I mean, at that point you can just run with host mode networking and it's all the same, no?

scarface74 · on March 5, 2020

Or you can just use ECS/Fargate and each container registers itself to Route53 and you can just use DNS...

scarface74 · on March 5, 2020

Out of all of the business concerns that a startup or basically any company has, “cloud lock in” is at the bottom of the list.

scarface74 · on March 5, 2020

Creating the infrastructure is using your infrastructure as code framework of choice - Terraform or CloudFormation.

Monitoring can be done with whatever your cloud platform provides.

geggam · on March 5, 2020

Hate to tell you this but that second idea is much easier to manage from an operations standpoint and significantly more reliable.

Also easier to debug and monitor... but you run your business to make developers happy right ?

p_l · on March 5, 2020

Having worked with ECS 2.0 when Fargate was released, the setup was much harder to use from operations standpoint the moment you needed anything more complex (not to mention any kind of state) than k8s. Just getting monitoring out of AWS was annoyance involving writing lambda functions to ship logs...

scarface74 · on March 5, 2020

Fargate has built in logging and monitoring via CloudWatch. It redirects console output to CloudWatch logs.

p_l · on March 5, 2020

And I'll be blunt that CloudWatch, especially logs, aren't all that hot in actual use. It might have gotten better, but all places I worked for last two years if they used AWS they depended on their own log aggregation, even if using AWS-provided Elastic Search.

scarface74 · on March 6, 2020

There are other supported options.

https://aws.amazon.com/blogs/opensource/fargate-container-lo...

rclayton · on March 6, 2020

CloudWatch sucks. Sorry.

takeda · on March 5, 2020

I think that was his point (you are referring to using VMs, right?)

hedora · on March 5, 2020

> It's an afternoon worth of research to understand the basic concepts. Then, with the powerful and intuitive tooling you can spin up your own cluster on your computer in minutes and practice deploying containers that.

Source? I’ve never heard of someone going from “what’s kubernetes?” to a bare metal deployment in 4 hours.

p_l · on March 5, 2020

That's more for minikube, which is a few minutes to run, as it is specifically for developer to test things locally on their computer.

The basic concepts in k8s are also pretty easy to learn, provided you go from the foundations up -- I have a bad feeling a lot of people go the opposite way.

akvadrako · on March 5, 2020

That’s a pretty small API. I don’t know what developers are doing if they can’t learn that in stride.

rclayton · on March 5, 2020

Really? If someone told me they were going to write all the glue code that basically gets you the same thing a UI deployment of k8s and a couple yaml files can provide, I’d walk out.

p_l · on March 5, 2020

Reminds me of the business trying to reduce our spend on VMware licenses while keeping the same architecture.

A high level person actually asked me to reimplement vCenter :|

sho · on March 5, 2020

This, and the other articles like it, should be required reading on any "how to startup" list. I personally know startups for whom I believe drinking the k8s/golang/microservices kool-aid has cost them 6-12 months of launch delay and hundreds of thousands of dollars in wasted engineering/devops time. For request loads one hundredth of what I was handling effortlessly with a monolithic Rails server in 2013.

It is the job of the CTO to steer excitable juniors away from the new hotness, and what might look best on their resumes, towards what is tried, true, and ultimately best for the business. k8s on day one at a startup is like a mom and pop grocery store buying SAP. It wouldn't be acceptable in any other industry, and can be a death sentence.

AznHisoka · on March 5, 2020

When I was younger, I was much more enticed with new technologies. But as I grow older, I've grown much more cynical. I just want to solve problems with ideally as little coding/thinking as possible.

TeMPOraL · on March 5, 2020

When I was younger, I believed technology is developed to solve problems people have. Today, I've grown much more cynical, and believe that most technology is developed to sell worthless crap to other people, which may or may not occasionally do something useful for them (but not as much as it would do if the vendor actually cared).

p_l · on March 5, 2020

And articles like the submission often have "let me write an contrarian article for internet points, who cares about details" feel to me :/

K8s solved very real problems that might not be seen when you're running one app, shitty standard syslog and cloud-provided database. But those problems still exist, and k8s provided real, tangible benefit in operation where you don't need to remember thousand and one details of several services, because you have a common orchestration to use as abstraction.

_klm7 · on March 6, 2020

Tech has become much more self-referential in the last years. Now we do tools that help build a system that can build an app to cash in minimal productivity gains

pjmlp · on March 5, 2020

Also known as consulting, conferences, trainings, certifications and books.

TeMPOraL · on March 5, 2020

Yes, but also a lot of software products, open-source or not (a lot of open-source these days is created as a part of various business models of for-profit companies).

eeZah7Ux · on March 5, 2020

When I was younger, I was already suspicious of hyped, marketing-driven technologies and lock-in. But as I grow older, I've grown more aware at how people constantly ignore the problem.

I just want to solve problems with ideally as little complexity as possible.

Nothing cynical about it.

ttflee · on March 5, 2020

Old-man-yells-at-cloud.

elpatoisthebest · on March 5, 2020

I think your comment is probably tongue-in-cheek but I still want to throw my support in with the person you replied to.

The longer I've been at the company I'm at, the less interested I am in how cool something is and the more interested I am in the least effort possible to keep the app running.

raxxorrax · on March 5, 2020

I am not that old, develop most apps besides embedded firmware for the cloud and have been yelling at it from the beginning to absolutely no effect. In my experience it is often C-level customers who embrace the cloud with most enthusiasm.

Interestingly the older generation often had the most reservations against hosting data on external systems. They are generally very big on everything surveillance though.

jschwartzi · on March 5, 2020

Yeah, every so often the bizdev guy at my office who also serves as our POC for the web developers pings me about some Azure IoT thing or another, and every time I read the documentation I think "that's great, if only we had any of the problems that that solves."

aganame · on March 6, 2020

If you grow old and stay in this business, you'll be doing a lot of that too, I promise you.

hkt · on March 5, 2020

Damn cloud.

missosoup · on March 5, 2020

Cloud get off my lawn.

TeMPOraL · on March 5, 2020

In late-stage capitalism, the cloud gets you off "your" lawn, when your Lawn-as-a-Service provider gets acquihired by an adtech company.

dvtrn · on March 5, 2020

"One of us...one of us..."

theptip · on March 5, 2020

Anecdata: series B startup. First year was a single VM with two docker containers for api and nginx. Deploy was a one shot “pull new container, stop old, start new” shell command. Year 2 onwards, k8s. No regrets, we only needed to make our first dedicated ops hire after 15 engineers, due in large part to the infra being so easy to program.

I used GKE and I was also very familiar with k8s ahead of time. I would not recommend someone in my shoes to learn k8s from scratch at the stage I rolled it out, but if you know it already, it’s a solid choice for the first point that you want a second instance.

jiofih · on March 5, 2020

When people talk about companies using k8s too soon, they are talking about deploying it on your own, not using a hosted service like GCE. That’s a whole new ball game and takes 100x the effort.

theptip · on March 5, 2020

I read TFA as mostly complaining about the conceptual and operational complexity of running your code on k8s, not so much about operating a cluster itself.

Lots of ink spilled on irrelevant concepts that most users don't need to know or care about like EndpointSlices.

And, arguing against microservices is a reasonable position -- but IF you have made that architectural choice, then Docker-for-Mac + the built-in kubernetes cluster is the most developer-friendly way of working on microservices that I am aware of. So a bit of a non-sequiteur there.

jiofih · on March 6, 2020

I don’t see when you’d need to understand what an EndpointSlice is, unless running k8s itself. The concept does not leak through any of the pod management interfaces.

collyw · on March 5, 2020

Some seniors get exited by that crap as well.

Tech lead in my team is full of common sense when it comes to not getting exited by JavaScripts framework of the season. But he loves to add new tools to the back end. He is choosing the "best tool" for the job, but we have a small team and already have way more things than we can comfortably manage. Good enough tools that are already in our system would be more pragmatic.

zentiggr · on March 5, 2020

Hopefully that's not a case of "must be seen to be doing something" syndrome. If you can use pre-existing tools and they serve the purpose, how easy or difficult is it to resist the "shift to this" pressure?

simplecto · on March 5, 2020

Exactly this. For most startups Heroku, Dokku, or plain docker is enough. The stack will certainly evolve as growth (and success) comes.

Building with 12-factor principles make that transition effortless when the time comes.

p_l · on March 5, 2020

From experience:

Plain docker - hell on earth. Literally some of the worst stuff I had to deal with. A noticeable worsening vs. running the contents of the container unpacked in /opt/myapp.

Heroku, Dokku - Very depends. A dance between "Simple and works" and "Simple and Works but my Startup is bankrupt".

K8s - Do I have more than 1-2 custom deployed applications? Predictable cost model, simple enough to manage (granted I might be an aberration), easy to combine infrastructural parts with the money-making parts. Strong contender vs Heroku-like, especially on classic startup budget

_t0du · on March 5, 2020

I have desperately avoided K8s since it became popular, but this seems to be pretty opposite of everything else I've read? Can you share what specifically makes plain docker so terrible?

p_l · on March 5, 2020

Plain docker (no swarm etc.) for practical purposes resulted in containers that were in weird position when it comes to decoupling from the host OS.

Integration with init system was abysmal. Docker daemon had its own conventions it wanted to follow. Unless you ensure that the state of the docker daemon is deleted on reboot, you could have weird conditions when it tried to handle starting the containers by itself.

A very easy thing to use for developer, a pretty shitty tool (assuming no external wrappers) on server.

One of the greatest joys of k8s for me was always "it abstracts docker away so I don't have to deal with it, and it drops pretty much all broken features of docker"

mdaniel · on March 5, 2020

> One of the greatest joys of k8s for me was always "it abstracts docker away so I don't have to deal with it, and it drops pretty much all broken features of docker"

It also offers portability away from docker via the Container Runtime Interface; we use containerd and it has been absolutely rock solid, without the weird "what happens to my containers if I have to restart a wedged dockerd?" situation

p_l · on March 5, 2020

Indeed - Back when the two alternatives were very alpha quality rktnetes and hypernetes, I was already full of joy that I was seeing the end of docker on server.

Since then we got CRI-O and life looks even better.

pnako · on March 6, 2020

You've decoupled from the host OS (good thinking), and you've abstracted away from Docker (even better thinking), but how did you decouple from k8s?

simplecto · on March 5, 2020

I always start and keep my containers stateless.

the docker run --rm command line switch tell docker to remove the container when it dies. Never a problem on restarts.

Spivak · on March 5, 2020

If you are a user of k8s then the experience relatively smooth and painless since all the complexity is hidden behind the API you're calling to provision resources for you.

If you are an operator or k8s then you've entered the nightmare zone where you have to make all of those endpoints actually work and do the right thing with code that written 17 days ago. Unlimited terrible middleware to try and form static services into dynamic boxes.

k8s was not designed for someone to deploy on-prem that doesn't have a dedicated team of developers and ops people to work on just k8s.

p_l · on March 5, 2020

I mostly use it, but I also tend to keep to stable features so that running on-prem is also relatively simple.

My biggest cryparty story when it comes to on-prem kubernetes is not actually due to kubernetes, but due to Red Hat. There are words I could say about their official OpenShift deployment scripts and their authors, but I would be rightly banned for writing them.

Biggest issues I've encountered involve things on the interface between "classic distro" and running k8s on top of it, and that goes down when you move towards base OS that is more optimized for k8s (for example flatcar).

When it comes to the size of team involved, I'd say keeping "classic" stack with loadbalancers, pacemaker, custom deployment methods etc. was comparable effort - at least if we're matching feature for feature what I'd see on "base" k8s on-prem setup (and based on "what I wish I had, or that I could replace with k8s, back in 2016 for an on-prem project).

There's one thing, however, where it gets much harder to deploy k8s and I won't hide it - when you're dealing with random broken classic infrastructure with no authority to change it. K8s gets noticeably harder when you need to deal with annoying pre-existing IP layouts because you can't get IP address space allocated, when you have to sit on badly stretched L2 domains, where the local idea of routing is OSPF - or worse, static routes to gateway and a stretched L2. To the point that sometimes it's easier to setup a base "private cloud" (I like VMware for this) on the physical machines, then setup the hosts on top of that - even if you're only going to have one VM per hypervisor. The benefits of abstracting possibly broken infrastructure are too big.

Spivak · on March 6, 2020

> Badly stretched L2 domains, where the local idea of routing is OSPF

Hahaha… welp. So you’re saying that stretching every “overlay” L2 domain to every hypervisor/physical host with VXLANs and OSPF isn’t maintainable. Color me surprised. I need a drink.

p_l · on March 6, 2020

Hey, at least you got VXLAN! And OSPF! Though I do have reasons to dislike it from past life as network admin, and somehow there are no integrations - or at least none I found - for integrating things into OSPF routing area (like with ExaBGP).

Dealing with overstretched VLANs where somehow, somehow, STP ("What is that RSTP you're telling us about?") decided to put a "trung" across random slow link >_>

_t0du · on March 5, 2020

Since you've made this distinction, can you explain more? How would the person I replied to not be both? In what scenario are they different people?

golergka · on March 5, 2020

You seem like you know what you're talking about. What do you think about AWS RDS for db and EBS for auto-scaling and deployment? (It's my first fullstack product and it feels as if it's both a good developer experience and something that won't cost as an arm and a leg, so I just have to ask).