Hacker News
Ask HN: Have You Left Kubernetes?
317 points by strzibny on Aug 1, 2022 | 313 comments
If so, what did you replace it with?



We started with a full-stack k8s approach (on GKE); left (switching to plain GCE VMs); then came back much more conservatively, just using GKE for the stateless business-layer while keeping stateful components on dedicated VMs. Much lower total maintenance burden.

(Hard-won bit of experience: k8s + Redis really don't like each other if Redis (1) is configured to load from disk, and (2) the memory limit for the Redis container is somewhat tightly bounded. At least from the k8s controller's perspective, Redis apparently uses ~400% of its steady-state memory while reading the AOF tail of an RDB file — getting the container stuck in an OOM-kill loop until you come along and temporarily de-bound its memory.)

However, we're considering switching back to k8s for stateful components, with a different approach: allocating single-node node-pools with taints that map 1:1 to each stateful component, effectively making these more like "k8s-managed VMs" than "k8s-managed containers." The point would be to get away from the need to manage the VMs ourselves, giving them over to GKE, while still retaining the assumptions of VM isolation (e.g. not having/needing memory limits, because the single pod is the only tenant of the VM anyway.)
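
To make that concrete, a rough sketch of what the pod side of this could look like (the pool, taint, and image names here are made up for illustration; the matching taint would be set on the GKE node pool itself via gcloud/Terraform):

  # Hypothetical pod pinned to a dedicated single-node pool named "redis-pool".
  apiVersion: v1
  kind: Pod
  metadata:
    name: redis
  spec:
    nodeSelector:
      cloud.google.com/gke-nodepool: redis-pool   # GKE labels each node with its pool name
    tolerations:
    - key: dedicated
      operator: Equal
      value: redis
      effect: NoSchedule      # matches a taint like dedicated=redis:NoSchedule on the pool
    containers:
    - name: redis
      image: redis:7
      # deliberately no resources.limits: the pod is the sole tenant of the node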


This makes so much sense to me that I don't understand why people use Kubernetes any other way. Why must one use Kubernetes for EVERYTHING unless there is some higher-order reason?

I use Kubernetes to host something like 6 of our own services, and it excels at that and is fairly simple. Databases and other things use different services.


We did the reverse. We started with purely stateless k8s and got really comfortable with the platform before moving to support stateful loads. For databases, many vendors have dedicated operators that introduce best-practice deployments with less operational fuss when provisioning the instances. Strimzi for Kafka is an example here.
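
For anyone who hasn't seen it, a minimal Strimzi Kafka resource looks roughly like this (replica counts and sizes are purely illustrative; check the Strimzi docs for the current schema):

  apiVersion: kafka.strimzi.io/v1beta2
  kind: Kafka
  metadata:
    name: my-cluster
  spec:
    kafka:
      replicas: 3
      listeners:
        - name: plain
          port: 9092
          type: internal
          tls: false
      storage:
        type: persistent-claim       # the operator provisions PVCs for the brokers
        size: 100Gi
    zookeeper:
      replicas: 3
      storage:
        type: persistent-claim
        size: 10Gi
    entityOperator:
      topicOperator: {}              # manage topics/users as k8s resources too
      userOperator: {}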


This may be the issue with stateful operators:

Do you really think they have ALL the operations coded properly for all operational conditions?

Stateless is so much easier for operations. It's either running or it's not; A/B upgrades, yada yada.

Stateful has backups, restores, outages, patching, migrations, corruptions, fixes. For distributed systems it gets even hairier. Do they support your distribution? What if you have a multicloud or blended internal/cloud? Does the operator provide a turnkey restore from backup, and are you testing it?

Operators shouldn't be viewed as replacements for stateful operations knowledge, but it's probably what they'll be used for.

If you're using a large scale distributed stateful/database system, you need an operations team to support it, or pay for that.


Stateful has all of those problems anytime, anywhere. Even if you run a single stateful app in a single VM, you will still have all those issues.


You have a lot fewer problems in a single VM: there's (hopefully) a clear path for how to install software etc., and it is easy to understand. If your distributed PostgreSQL operator, written by someone else, fails, you're out of luck and it will be really, really hard to restore your system. With non-Kubernetes, non-automated solutions you may have some docs saying "Update the replication address and restart".

Getting back to a running system is much easier with old, traditional VM approaches.


In K8 the paths are clearer: every component that goes into a deployment is versioned. It deploys the same way in every build. It's 100% certain.

The problem with stateful app scaling and HA comes from two things: the need to make files and databases multi-read-write, and the need to make them highly available. Other than that, app scaling is pretty easy in K8 even for stateful apps.


I am wondering: did you ever look at tuning MALLOC_ARENA_MAX? This sort of constant consumption of memory aligns fairly well with the default tuning of MALLOC_ARENA_MAX, which is 8 * nproc.

We've just tuned it for a java-based app which was also stuck in OOMkill hell, and this has completely resolved the situation (MALLOC_ARENA_MAX=2).
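
For anyone wanting to try the same thing on k8s, it's just an environment variable on the container. A fragment of a hypothetical pod spec:

  containers:
  - name: app
    image: registry.example.com/java-app:1.0   # placeholder image
    env:
    - name: MALLOC_ARENA_MAX
      value: "2"          # glibc default is 8 * cores on 64-bit; fewer arenas = less fragmentation
    resources:
      limits:
        memory: 2Gi       # illustrative limit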


At Coherence (withcoherence.com - I'm a cofounder) we generally agree that GKE and other managed k8s offerings are best used for stateless workloads. Rather than move stateful workloads back to VMs, leveraging managed services will yield the best results in the long term. In the case of Redis on GCP, something like Memorystore is going to be a better fit than managing a nest of node pools over time (think about version upgrades, resource differences across environments, etc.). However, the complexity of managing the different kinds of configuration across GKE and managed services can be a nightmare.

That's a problem we're hoping to help solve, where you define your application and its dependencies, and we help run it in the right way to leverage managed cloud services across environments without passing that headache on!


Crossplane [1] is a great way to create and manage resources across cloud providers and MSPs via Kubernetes objects.

[1] https://crossplane.io


Config Connector [1] is also an option in this space for GCP, it supports many GCP resources and thus far our experience with it has been largely positive.

[1] https://cloud.google.com/config-connector/docs/overview


Have you checked out Managed Instance Groups? I used them a while back, and they worked as advertised :)

https://cloud.google.com/compute/docs/instance-groups#manage...


We do something similar with Elasticsearch. We use ECK (a k8s operator) but give each ES node a full k8s node using pod anti-affinities and taints. That way we can just select a sensible disk and instance size on our node pool and not worry about resource requests/limits. It's been working very well so far.

ES handles node restarts or upgrades pretty gracefully though. I'd imagine for databases or "non-clustered" things you'd have to consider GKE's aggressive upgrade schedule. We use CloudSQL for some databases, but our larger ones are still on GCE because we get more control over replication and CDC, and can use tools like proxysql to reduce downtime.
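
For reference, the "one ES pod per k8s node" part of that is roughly this kind of anti-affinity plus toleration on the pod template (the label and taint names here are illustrative, not ECK's actual defaults):

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: elasticsearch               # assumes the ES pods carry this label
        topologyKey: kubernetes.io/hostname  # at most one matching pod per node
  tolerations:
  - key: dedicated
    operator: Equal
    value: elasticsearch
    effect: NoSchedule                       # matches the taint reserving the node pool for ES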


> Much lower total maintenance burden.

Devil's advocate but isn't having to maintain VMs (and then software deployed to those VMs) and k8s YAML/charts/whatever more "maintenance burden" than just one or the other?


I guess it would depend on how you manage your deployments/infrastructure, but in general I would say no. In my experience stateless services are all managed differently because they typically need fewer resources and can be scaled more easily. Services that require state tend to have a more hands-on approach since they are usually in the critical path for many other services. Deploying a cookie-cutter service is something where k8s excels, so it makes sense to use it for those types of workloads.


How do you define a stateless server? As... not needing to talk to a filesystem or a cache server or a database?

Just something that takes in API requests (or a cron-like scheduled job) and makes other API calls/does plumbing?


At least in my experience, no.

We used managed services for our stateful stuff which significantly eased the operational burden there. Might be a different story if we looked at doing the absolute minimum cost optimization. However, at least for us, the extra cost of managed services is worth the price.

The yaml tends to be a "one and done" sort of thing. We touch it MAYBE once every 2 months if that.


Why did you run Redis on K8 in the first place? (One of the reasons we did not move to K8 was the default recommendation to not run Redis, SQL etc on the clusters.)


Our Redis use-case started off as ephemeral per-deployment storage for things like rate-limit counters, before evolving into durable per-deployment storage for things like service-metadata discovery.

Ephemeral Redis is well-suited to k8s — you can treat it as just a sidecar to your app layer deployment. Durable Redis is not. But sometimes the transition can sneak up on you.
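
The ephemeral flavour really is just another container in the app's pod template. A sketch (names and image tags are placeholders):

  containers:
  - name: app
    image: registry.example.com/api:1.0    # placeholder app image
    env:
    - name: REDIS_URL
      value: redis://localhost:6379        # sidecar shares the pod's network namespace
  - name: redis
    image: redis:7-alpine
    args: ["redis-server", "--save", "", "--appendonly", "no"]   # no RDB, no AOF: data dies with the pod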


> The point would be to get away from the need to manage the VMs ourselves, giving them over to GKE, while still retaining the assumptions of VM isolation (e.g. not having/needing memory limits, because the single pod is the only tenant of the VM anyway.)

Isn't this just moving the problem from per-pod resource constraints to per-VM resource constraints?


Yes. They could have just set the memory limit high enough to handle that workload. There's really no difference.


With the pod, you still have the per-VM/node resource constraints, so it's an additional layer.


You can set node affinity on PersistentVolumes.

  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <hostname>
this ensures that your workload will be rescheduled on the matching node


Thanks for your insights! One question regarding the approach to hosting stateful components you describe last:

What's the difference between doing this as single-node node pools vs pod constraints like anti-affinity?


It would make node pool operations like version upgrades more predictable, since you'd know for sure which apps are running on a given node pool.

It can also make monitoring resource usage a little easier since you can just monitor at the node level


While I like Kubernetes generally, I agree that the OOM handling is not ideal; it feels quite a bit like an "exercise left to the reader".


For stateful stuff, give Google's Filestore a try. It has pretty good performance.

As for K8-managed VMs - that's a good idea, resource-management-wise, security-wise, etc.


I've yet to encounter a non-smelly k8s deployment that was started before everyone knew how it works or why it works.

On the other hand, once everyone on the team has experience building such a system from scratch, then deploying k8s and using it somehow becomes straightforward.

It's almost as if we need to learn how a tool works before being able to use it effectively.

Anyways, what we (actually didn't) replace it with:

  - Don't let your devs learn about k8s on the job.
  - Let them run side-projects on your internal cluster.
  - Give them a small allowance to run their stuff on your network and learn how to do that safely.
  - Give your devs time to code review each other's internally-hosted side-projects-that-use-k8s.
  - Reap the benefits of a team that has learnt the ins and outs of k8s without messing up your products.


Maybe it's just my team, but devs don't need to know k8s. It certainly doesn't hurt, but they should be able to write code and get their jobs done without knowing much about k8s at all. Basic shit like how to get logs, sure, but that's a given for all platforms.


> but devs don't need to know k8s

If somebody else on the team already knows k8s. The problem is a lot of places just give devs admin access and let them go hog wild. If devs don't know k8s they can't make significant changes without waiting for the one guy who knows k8s to do it.

Give a man a k8s, he will deploy for a day. Teach a man to k8s, he will deploy for at least 5 years while the hype cycle continues.


But can your team succeed without having people who:

1. Are motivated by the sort of curiosity that would frustrate them if they were blocked from knowing about k8s

2. Are motivated by the sense of responsibility that would unnerve them if they didn't understand 1 abstraction layer beneath their work.

?


You don't need to solve this problem if k8s already works for you.


What do you mean by side projects? Are they paid?

If you want your Devs to learn kubernetes you should pay them for doing it.

If you can't, hire a contractor with the expertise you need.


I would interpret 'side project' here as a work project that is not your primary and has low stakes for delivery into production timelines or expectations.


I don't really understand this. Work is usually prioritized. If it has low stakes for delivery into production then it should have a lower priority than other activities, but that doesn't make it a side project.


If you don't let developers prioritize some time to play around with new concepts and ideas and to learn how they work, they'll do their playing and learning in your products.


I feel like you're conflating multiple unrelated topics together. This isn't advice on how to use another team's experience, or cut costs, or maintain your team morale.

It is difficult to tell at a glance whether an engineer is qualified to effectively use a tool. Letting them self-train by working on side projects in isolation compounds this effect.

The goal of this exercise is to give time and space to your devs to practice in a safe environment, while allowing them to push, deploy and review projects internally as if they were core products, so that other SMEs are allowed to spend some time every week reviewing those projects for smells and issues before those ideas make it into a core product.


Running a side project on company infrastructure seems like a disaster waiting to happen for both parties.


It's true that running side projects on a company cluster in an environment where no one is quite sure how to use the tool properly is a disaster waiting to happen.

Fortunately, k8s can act as a very secure sandbox when it's configured properly, so you'll know how to mitigate such a disaster once your company has trained its engineers on how to use the tool effectively.


I took it to mean "sandbox"


I took it to mean that infamous mythical Google 20% project.


It's not mythical. Gmail and Google Calendar were both 20% projects. They were built to solve for bad internal web mail (mirapoint, I think... ?) and Oracle Calendar (which was absolute trash on Linux).

It's not as common as some would have you believe but it is real. A teammate spent his 20% on underwater topography on Google Earth and another spent his on the glider thing.


I'm not surprised 20% is not more common, but I am surprised among 20%-enabled companies that the norm isn't to have the company host side projects for all employees. Insurance, snacks, gym memberships, mobile plans and laptops, but not the one thing all hackers need?

Having an employee turn up a popular side project while being vendor-locked onto your platform sounds like it should be more popular among rich people.


Yes I was being deliberately hyperbolic.

The way it should actually be sold to the average entry-level fresh-out-of-college Google hire is probably closer to the way I framed it, though, than the examples of Gmail and Google Calendar, which are pretty much two unicorns.


It likely means internal tools that stay within the company, like the company's internal wiki.


You can provision vclusters to give each dev (or even each team) a space to play with the environment without it being a problem.

Cattle not pets after all.


Not yet. We are still deluding ourselves that the 3x cost increment and insane complexity increase we can barely manage to keep spinning is actually a business benefit.

Note: this isn't everyone's end game but I suspect it's realistic for a lot of people.

I would like to go back to cleanly divided, well-architected IaaS and Ansible. It was fast, extremely reliable, cheaper to run, had a much lower cognitive load and a million fewer footguns. Possibly more important: not everything can be wedged into containers cleanly, despite the promises.


Also a big fan of sticking to Ansible and plain VMs, at least for most cases I've encountered. To me, a VM in the cloud already feels like a container, and you can use the cloud provider's APIs to scale virtual instances up and down as needed.


> To me, a VM in the cloud already feels like a container

This is the mental abstraction I've been operating with for over a decade now.

All of our products are monolithic binaries that can be installed on bare-ass windows or linux machines. For all intents & purposes, basic AWS/Azure/et. al. VM hosting is our containerization strategy. We just pushed the tricky bits down into our software.

95% of our pain is resolved by using a modern .NET stack and leaning hard on their Self-Contained Deployment model. Our software has zero external dependencies at deploy time, so there isn't much to orchestrate. Anything that talks to a 3rd party system is managed purely via configuration in our software.


Agreed. But I do think there are places for containers. I will often package single binaries in containers for built-in distribution and rolling-upgrade capabilities, especially for tooling that relies on a lot of externalities that can taint the system. Python applications, as an example, are much easier to deploy and manage this way than by dealing with Terraform / Ansible to provision correctly. Even if you're just using host networking and good ol' Docker, there is a ton of operational upside with very low maintenance overhead (mental and otherwise).

I'm working with a product now that's made their k8s deployment the standard and all it's done is create bigger issues. Ops got behind on Strimzi and so we got stuck on 1.21 because we couldn't upgrade due to being locked to the Strimzi version. This caused issues because of log4j and we ran into a wall quickly with customers on GCP as soon as 1.22 ended up as GA. Honestly I'm not sure we're getting much, if any, overhead advantage since I feel like the app has become bloated due to container creep.

That and supporting 4 different ways to provision storage across customers on every cloud / on-prem is a nightmare. Customer environment installed applications on k8s is a nightmare today.


> To me, a VM in the cloud already feels like a container

A VM provides better isolation than a container as it has a separate kernelspace. Hence the DevOps mantra "containers don't contain".


Unless you have massive scale, VMs are your best option. If you need VM configuration on startup (elastic scaling), you may need to maintain your own image. SaltStack and/or Fabric are good alternatives to Ansible.

You could look at containerization without K8S (podman or docker) especially if you use python and don’t want to mess with the Linux native python installation.


Unless you have money to burn, K8s excels compared to VMs in my experience.

Its original purpose wasn't to do elastic scaling or anything like that - it was to binpack workloads onto a set of nodes, and not everyone has Silly Valley money to pay Silly Valley prices (especially when one's currency is weak against the dollar).


> Its original purpose wasn't to do elastic scaling or anything like that - it was to binpack workloads onto a set of nodes

This is arguably still its primary purpose, and all the rest of its features are ancillary and only exist for the sake of operational convenience.


A considerable portion of the internet, even people who supposedly know k8s, has this weird notion that it's for "scaling up" ... except they never talk about scaling what a single engineer can do, just less useful things like dynamically adding lots of servers ;)


You might consider migrating to systemd-controlled, rootless, dockerless Podman. Helm even has a plugin for podman.


I wouldn't bother. I'd just consolidate our product out of microservices and run more small clusters of monoliths all started from systemd.


Do you know of some writeups for this? I am like halfway there but just mess with podman+systemd on the weekend.


We did. Our use case is spinning up containers on demand in response to user actions, giving them ephemeral, internet-routable hostnames, and shutting them down when all inbound connections have dropped. Because users are waiting to interact with these containers, we found the start times with Kubernetes too slow and its architecture to be a bad fit.

We ended up writing our own control plane that uses NATS as a message bus. We are in the process of open sourcing it here: https://github.com/drifting-in-space/spawner


> we found the start times with Kubernetes too slow

Just curious if you could elaborate here? I work with k8s on docker, and we're also going to be spinning up ephemeral containers (and most of the other things you say) with jupyter notebooks. We're all in on k8s, but since you might be ahead of me, just wondering what hurdles you have faced?

Our big problem was fetching containers took too long since we have kitchen sink containers that are like 10 GB (!) each. They seem to spin up pretty fast though if the image is already pulled. I've worked on a service that lives in the k8s cluster to pull images to make sure they are fresh (https://github.com/lsst-sqre/cachemachine) but curious if you are talking about that or the networking?

From what it looks like in your repo it might be that you need to do session timing (like ms) response time from a browser?


Jupyter notebooks are actually a use case we think about a lot, you can try a live demo with a Jupyter notebook here: https://jamsocket.com/tmpenv/

It wasn't really one thing with Kubernetes that was slow, but that the more we tried to optimize it the less of core Kubernetes we were using and so the less value we were getting for the complexity tax we were paying. The image pulling you mention is a good example of that; having pre-pulled images is a big factor, but we have too many images to push every image to every node, instead we'd like the scheduler to be aware of which node has which image. We could do that with node affinity, but what we'd end up building would be more work than if we wrote our own scheduler to support it from day one.

> From what it looks like in your repo it might be that you need to do session timing (like ms) response time from a browser?

Our goal is subsecond container starts. We're not there yet, and might not get there with Docker, but we have a POC that is there with WebAssembly-based workloads. Too bad those are rare :)

(By the way, I'm always happy to chat about this stuff, my email is in my profile)


> we'd like the scheduler to be aware of which node has which image

The kubernetes scheduler should be aware of which node has which image, that is why the Node object has the status.images field: https://kubernetes.io/docs/reference/generated/kubernetes-ap....

It turned out to be somewhat tricky, because it increased the size of the Node object, and colocating node heartbeats onto the same object meant that a bigger object was changing relatively often. But that was addressed by moving heartbeats to a different object: https://github.com/kubernetes/enhancements/issues/589


TIL, thanks. Looks like there's a corresponding ImageLocality score used by the scheduler: https://kubernetes.io/docs/reference/scheduling/config/#sche...

It doesn't get all the way to what we want, but it could be used to build a piece of it.
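
If someone did want to lean on it harder, the plugin's weight can be raised in a KubeSchedulerConfiguration (a sketch; this assumes you can pass a custom scheduler config, which managed offerings may not allow):

  apiVersion: kubescheduler.config.k8s.io/v1
  kind: KubeSchedulerConfiguration
  profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        enabled:
        - name: ImageLocality
          weight: 100     # score plugins default to weight 1; higher strongly favors nodes that already have the image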


Very cool, I didn't know about this either. I feel like so many of these features are coming in which is great, but also part of the drag of k8s is the kind of constant upgrade churn and having to keep your yaml fresh.


AWS has put work into fast-starting containers [1] using tricks like lazy loading container storage, profiling container startup, non-lazily priming critical blocks, and caching shared blocks. IIRC parts of it are open source. I don't know if enough of it is open source to be helpful, but it's cool stuff!

[1] Gigabytes in milliseconds: Bringing container support to AWS Lambda without adding latency. https://www.youtube.com/watch?v=A-7j0QlGwFk


On the Google side, Artifact Registry supports image streaming

https://cloud.google.com/kubernetes-engine/docs/how-to/image...


Doesn’t the latest version of k8s let you use your own custom scheduler?


You can, but that falls into this bucket:

> the more we tried to optimize it the less of core Kubernetes we were using and so the less value we were getting for the complexity tax we were paying

Since we were headed down that path, we took a step back and asked what we were really getting out of Kubernetes, and most of it was things that were orthogonal to our intended use case. The way Kubernetes is architected around control loops works great for its intended use case, but we wanted a more event-driven system.


Event driven ... like a streaming data pipeline? Given your comment about Jupyter notebooks, that makes sense. It might be the Mesos project is better architected for your use-case. Then again, I think Mesos ported some of their schedulers to Kubernetes.


If you're pulling big images you could try kube-fledged (it's the simplest option, a CRD that works like a pre-puller for your images), or if you have a big cluster you can try a p2p distributor, like kraken or dragonfly2.

Also there's that project called Nydus that allows starting up big containers way faster. IIRC, starts the container before pulling the whole image, and begins to pull data as needed from the registry.

https://github.com/senthilrch/kube-fledged

https://github.com/dragonflyoss/Dragonfly2

https://github.com/uber/kraken

https://nydus.dev/


Lazy pulling is already supported by a lot of container runtimes, most notably containerd with estargz

https://github.com/containerd/stargz-snapshotter/blob/main/d...


Ah thanks! "Lazy pulling" is what I was looking for. I was trying to find estargz (didn't remember the name) and I couldn't find a proper keyword to do it :P :D


Yeah I think we considered this, but we want the container to actually run as the user and have all the permissions set up so they can have all the right access on the cluster (kind of like a PaaS), although I think we are doing some of the stuff with the starting the container while the data is still streaming down. Black magic.


Wow, this is excellent! At a previous job, we had been using k8s + knative to spin up containers on demand, and likewise were unhappy with the delays. Spawner seems excellent.

One question: have you had to do any custom container builds on demand, and if so, have you had to deal with large "kitchen sink" containers (e.g. a Python base image with a few larger packages installed from PyPI, plus some system packages like Postgres client)? We would run up against extremely long build image times using tools like kaniko, and caching would typically have only a limited benefit.

I was experimenting using Nix to maybe solve some of these problems, but never got far enough to run a speed test, and then left the job before finishing. But it seems to me some sort of algorithm like Nixery uses (https://nixery.dev) to generate cacheable layers with completely repeatable builds and nothing extraneous would help.

Maybe that's not a problem you had to solve, but if it is, I'd love your thoughts.


It's always been my understanding that with things like k8s and other orchestration stuff, you're supposed to spin up before you need the capacity? You set a threshold, like 75% capacity, and if you're over that for a bit, you spin up a new container(s) to get you back to under effectively 75% capacity.

Is that not how this works?


Yes, that's the scaling model that works best for Kubernetes if the use case supports it. Our use case precludes it, because we are focused on uses where containers need to be spun up on a per-user (or per-group-of-users) basis as they use an application.


Precludes feels like the wrong word here. Nothing prevents you from satisfying your use case and spinning up VMs before they are needed.

I wrote the student vm system for udacity, and I spun up student vms before they needed them, with some last mile loading to finalize the files they need. The student VMs were not using k8s, although a small piece of the infrastructure did.

I worried most about untrusted users working in a complex environment with the ability to harm the experience of other users, and just used GCE.

For me, boot time was < 5 minutes, so if you can predict the next five minutes of demand, you can boot those machines early. If you are wrong, you will pay extra or someone will wait extra time, but still less than the full boot time. Generally it takes less than 10 seconds to access a VM with your coursework on it.


Keep a pool of e.g. 5 pods that are warm and good to go. When a user needs one, the pod pulls in the necessary config data, which should be quick as it's already initialized.


Can you not have a queue of whatever type of containers the users are likely to be using already ready to go, like the GP suggests?


This is really awesome. Thank you for sharing this.

One of my ideas lately has been to upgrade FaaS to a full-on server after a set amount of traffic. Or said differently: spin up a dedicated server that serves the same app as callable functions (a la scalable RPC), upgrading to a dedicated instance composed of said functions. The best of both worlds.

Combine the scale-to-zero of serverless with the scalability and capacity of a dedicated server.


Kind of curious what made it too slow for your use case? I'm guessing you did not want users to wait for something like kube-dns to update or the workload scheduler? Of course things like spinning up a Pod can be slow. Or non-Kubernetes things like doing DNS ACME challenges could affect things.

But on other hand, I can't quite figure out why something would prevent, you, yourself, from running the service that hosts the VMs that hosts the containers on demand on Kubernetes.


Our goal is sub-second container starts (admittedly, we're not there yet), and with Kubernetes we'd have to create Pods, create Services, wait for the scheduler to update, etc. We didn't go down the rabbit hole of profiling where the slowness was, but it was clear that Kubernetes just wasn't built with the type of speed we wanted. We realized we'd have to contend with a lot of design decisions that were the right choice for the things Kubernetes optimizes for (replication, resiliency), but not the right choice for us (fast launches of ephemeral containers).

> But on other hand, I can't quite figure out why something would prevent, you, yourself, from running the service that hosts the VMs that hosts the containers on demand on Kubernetes.

I'm not sure I understand this part, I guess we could use Kubernetes operators to scale up the underlying compute resources and manage the containers ourselves? This adds a lot of complexity for our use case.


I just wrote a controller that does pretty much that – spawn containers on demand and report back status changes. While this solution does require some knowledge, it so far has been perfectly reliable and reasonably fast. I can fathom the need for processes to spawn and tear down faster in specific use cases than the Kubernetes scheduler would allow for, but for us a few seconds of wait time has been perfectly reasonable.


Is there a fundamental reason why Kubernetes cannot start pods and services fast (outside of pulling images of course!)?


There isn't. K8 does start pods and services fast. I'm able to launch an entire stateful WordPress pod (multi-container) in just ~10 seconds, including the provisioning and attaching times of PVs from scratch. This is at DigitalOcean. You can easily run stateful things like WP if you build your pods well and use PVCs - even without needing to make them StatefulSets. It ends up being a neatly constructed, integral VM living on virtualization. Everything is taken care of by K8.

When using K8, if you use the most basic K8 features and concepts, things generally work out pretty ok.


10s isn't terribly long, but honestly I think 1s should be achievable and would open up more use cases for K8s in general (cloud provider physical machine latency notwithstanding, of course).


1s is very achievable iff you have spare capacity and images already pulled. There are good reasons the latter is hard to achieve, and the former is a tradeoff few are willing to make as it increases costs.


No, in fact we've gone running towards it after some initial success, especially when combined with ArgoCD for CD and Istio as a service mesh. My company has a lot of experience with running applications on VMs and Amazon's ECS. Our VM automation ultimately became expensive to maintain and ECS had its own set of issues I could probably fill up a blog post with.

From the Operations side, Kubernetes is scary. It's easy to screw things up and you can definitely run into problems. I understand why folks who work mostly on that side of the house are put off by the complexity of Kubernetes.

However, from the application side of things, our developers have been THRILLED with Kubernetes. For most developers my company provides a nice paved-road experience with minimal customization required. For advanced use cases, we allow developers to use the Kubernetes API (along with ArgoCD + Gatekeeper policies) as a break-glass type of approach. Istio gives the infra team the ability to easily move services between clusters and make policy changes. It also allows us to make use of Knative, although I think the Istio requirement is no longer there.

That said, you should be using managed Kubernetes wherever possible and not running your own clusters. That's where trouble lurks.


ArgoCD was our missing linchpin for getting workloads migrated over and supported.

It makes it that much easier to actually use the cluster rather than mess with endless configuration tooling. Is it the best engineered tool? Probably not. But it's the one that works best for us.


Same story for us. We’ve been moving towards k8s and it’s been great for app devs. We ran in plain VMs for a decade and it was a good time to switch at 2k employees, maybe 500 devs?


I’m curious, do you use Vault, Datadog, or some Falco maybe? What is the rest of your Infra stack?


I migrated a company from k8s to ECS/Fargate in 2019. Kubernetes is very flexible, but I opted for simplicity.

The result of the migration was that there is little underlying infrastructure to maintain, and ongoing operational costs were lowered by 50% year over year. The CTO and I liked the setup so much, we started converting another large client of theirs. I followed up with them at the beginning of 2022 to see how things were going, and they still love it. There is so little maintenance, and now they have more time to focus on what they do best: software!

Other options on the horizon that I'm testing include utilizing AWS Copilot with ECS/Fargate, and/or Copilot with Amazon App Runner.


I have settled on the ECS camp as well. Took a run at Kubernetes and was blown away by the complexity. With ECS/Fargate I don't spend any time on it. It just works for our setup.

I still wonder from time to time if I am missing something not going Kubernetes.


Are you big enough to need Terraform? If the answer is yes, you may have a good justification to move to Kubernetes: migrating tf->k8s brings lots of benefits for the app teams (if they care). If you just YOLO set up your cloud in the AWS web console and you're fine with that, then you may not see much lift. A good reason to use a declarative (often infrastructure-as-code) approach to deployments is that it improves bus factor and the ability to hire people who can pick up and maintain the infrastructure.


AWS CDK exists and IMO is way better than terraform if you're on AWS. So much so that terraform is making their own variant to be more CDK like.


I didn't know they were trying to be like CDK. Now I have to look this up :)


The CDK for Terraform went GA today (https://www.terraform.io/cdktf and https://www.hashicorp.com/blog/cdk-for-terraform-now-general...). It's a framework that extends the capabilities of CDK so that you can use the whole Terraform ecosystem of providers and modules.

Under the hood it means that the `cdktf synth` command ultimately generates Terraform configuration that can be executed like any other Terraform config. It's definitely not a case of Terraform trying to be like CDK. Each has its strengths; choose whichever makes the most sense for your workflow.


We are big users of Terraform. I couldn't imagine running our setup without it or some other tooling like CDK.


What about Pulumi? I love it


I use AWS Copilot and find it to be really easy to use and helpful. It is still a pretty young project and as such doesn't really handle all the edge cases, but for the things it supports, it makes using ECS even easier than it already is.


Chose Fargate over K8 too. I made the call, so no need for migrations :)


We have had a few teams try, but as soon as you go beyond "I want to run some code for a bit", nobody really has anything for you. Instead of trying to re-invent the wheel (service discovery, mutual TLS, cross-provider capabilities) successfully, it went downhill quite fast and they moved back. (this was mostly due to cost as other services can get expensive really quickly, and because of the lack of broadly available knowledge for the custom stuff they had to build)

If a team were to start with no legacy and no complexity and there isn't going to be multi-team/multi-owner/shared-services I could see them using something else. But that applies to anything.


I've been a K8s user for some time, but it does drive me bat shit crazy. My main beef with it is I often cannot discern the logic of how things work. For the developer platforms and systems I enjoy working with, you are presented with primitive axioms that you can then bootstrap your knowledge upon to derive more complex ideas (e.g., any decent programming language, or OS). K8s does not work that way -- at least as far as I can tell. A priori knowledge gains you nothing. When I run into a problem on K8s, I copy/paste the error into a search engine and I am presented with a 200 message long GitHub issue with users presenting their various solutions (how does this command relate to my original problem, who knows?), some work, but most of the time, they don't and you are left in a bigger hole than when you started. I end up tearing the whole things down and starting over, most of the time. That last comment is the biggest "code smell" for me with K8s. When it is easier just to nuke the thing and begin again, there is a problem.


I'll put blame on bad documentation and tutorials becoming the norm for k8s versus what was common early on, because k8s is very much about building more complex ideas from primitive axioms. The whole resource model is built around simple ideas being used to build more complex ideas.

Wish there was some better docs out there, not sure if I could handle writing one from scratch :/


I've never gotten too deep with K8s. It always came across as incredibly complex to maintain, with limited managed service support. Whenever I spoke to engineers pushing it, the problems it solved didn't resonate with me as someone who's spent the last 10 years running hundreds of services across thousands of servers.

These days I'm a huge fan of CDK and Pipelines style deployments. I prefer to treat my compute layer as a swappable component which I'll change as and when I need to. I tend to lean towards serverless offerings which take care of the internal scaling details if I can while still giving me a traditional "instance", and if I can't then I'll go for the next best managed offering.

I've yet to see an example where internal tooling doesn't become a mess over time, and K8S requires a ton of work to keep things sensible.


Yep CDK and/or Pulumi. It’s very easy to map your own custom concepts and logic to your cloud provider, rather than making a cloud provider on top of the cloud provider you already pay for.


I've moved to a company that doesn't use Kubernetes at the moment (and that's a 100% calculated and rational decision). What I see is that a lot of effort is put into providing functionality that Kubernetes brings. In the case of running a bunch of services, when you wish to do that in a stable and secure way, Kubernetes cuts down running costs. It covers so many cross-cutting concerns that reimplementing those capabilities is not possible unless you have heavy $$$ to spend.


I think you're right to point out how much ground k8s covers and to replace every vertical that it integrates could be challenging/costly. But k8s is not a zero-cost abstraction, so I think the calculus here is often more nuanced.

In the case of my org, we optimized for the features we thought were valuable and amortized that effort over time. Notably this was early in k8s history (2014/2015), but the fruits of those efforts have aged well so far (8 years or so). Small code footprint to cover service discovery, cert provisioning, release orchestration, and configuration management. The whole devops stack is less than 3k SLOC. Service ecosystem is ~150 distributed systems, roughly about 5 million SLOC, running on just over 1k servers on AWS.

I think if the aim is not to completely replace what k8s does, but to cherry-pick the features that give you some Pareto distribution of value, sometimes it's worth it to build in-house. Nothing wrong of course with going with k8s for many orgs, but in our case we didn't have to reinvent the whole wheel to live without it.


3k SLOC of devops code to cover a system of that scale is super impressive. And I agree, there's no reason to invest in k8s when only a small fraction of its capabilities is necessary (or when you have a team that's already experienced). Otherwise we may end up bending our requirements to k8s abstractions (even though they are well designed).


At least for most hosted solutions, Kubernetes seems to be "cheap" (compared to other offerings at that provider) after you pass some reasonable threshold: something like 6-8 services each running 3-4 instances or so. This threshold seems to roughly end up being $500/m.


When I hear numbers like this I wonder what percentage of the compute and memory resources of that 18-32 node cluster — not to mention the engineering that went into making it work — goes into the “hard” problems of horizontal scaling, cramming stateful services into an architecture designed for stateless ones, etc.

You can actually get a couple of pretty beefy bare metal boxes for that budget. Or a couple of more modest ones for app servers plus a nice big RDS instance with all the trimmings. Based on past experience, that’ll get you to a few hundred rps for even a fairly complicated, poorly-tuned Rails or PHP app; your well-factored Go API server should handle 10x that pretty easily.

You might have to write some Bash or systemd unit files instead of a bunch of YAML, which may or may not bug you. I find shell easier to understand and debug than YAML-based scripting but YMMV.


Right, I don't think you can beat buying the machines, of course.


You don't have to buy - Hetzner, OVH, etc will happily rent you these machines dirt-cheap and that includes hardware maintenance & replacement.


Sure, that solves some problems. Try to get your average CTO on board, though!


OVH may be problematic if you don't speak French (their English support sucks), but Hetzner is pretty famous and well regarded. Their network is great. They provide a lot of automation. They provide little tooling compared to AWS, but what is there works, and it works great. It's an engineering-minded provider. Also, its VM pricing is the lowest. Best of all, their egress pricing cannot be matched by anyone else in the US or Europe.


Contabo is interesting too…


First time I hear of Contabo.


Same, we are on ecs and there is a lot of reinventing the wheel


Genuinely curious what things k8s solves that you are reinventing. I run ECS and find that using microservices and their managed offerings (i.e. RDS, SQS) we don't need any complicated topology to do complex work.


1. Creating confined development environments containing multiple services. With k8s we can spin up a cluster locally, install all the dependencies and develop on it.

2. Remote development. With k8s we can develop right out of the cluster; ECS has no equivalent.

3. Installing OSS software. K8s has loads of supported packages for OSS tooling.


Nope, I like k8s. What I don't like is people trying to be overly smart with it and leaving a configuration hell of templates, weird network configurations and broken certs behind them. For my personal workloads it's all basic containers with a reverse proxy, though.


Hell no,

I remember managing hundreds of virtual machines in datacenters & cloud, using Ansible and a myriad of other tooling.

It's nice when you're at a small scale and you don't have a lot of people making changes, but over time as it grows the pain grows with it unless you've enforced a consistent cattle model.

The longer VMs live with custom changes/code and updates over time the more brittle they can become. Part of the cattle model is so that you can recreate/rebuild when changing code so things stay consistent. The drift from infrastructure as code can be scary otherwise.

With the cattle model you need to have pipelines in place to build new VM images for infrastructure updates (packer etc), have multiple APIs to hit (easier in cloud) to upload images and serve them in a non damaging way. (HA deployments/rollouts/dealing with load balancers) It's certainly a non-trivial amount of work.

With Kubernetes, a lot of this tooling comes out of the box. You've got autoscaling, load balancing, health-checks, limits/requests, failure mitigation, service mesh options. On top of that it's served in a strict semi-consistent way. Good luck replicating that with virtual machines without a lot of tooling and effort.

If you can learn the Kubernetes tooling it can do a lot for you. However, I agree that not all setups need it; a lot of times small setups never grow and that's OK, a few virtual machines aren't that big of a deal.

We still use virtual machines for workloads that aren't container friendly, and to be honest these days I abhor it, even with pipelines in place.


> The longer VMs live with custom changes/code and updates over time the more brittle they can become.

Honestly, Kubernetes is not harder than dealing with this. It keeps you in the land of default, googleable problems for longer, since weird tweaks and unique configs aren't piling up to create esoteric issues.


Not only have we left Kubernetes, we left Docker.

Replaced with Linux servers and SSH.

Have done a lot of work with k8s in the past. Not the right tool for my startup.


Interesting! Feel free to elaborate. What does your CI/CD/deployment pipeline look like? Do you use something like Ansible, Puppet, Chef, Salt etc?


Our CI/CD pipeline is GitHub Actions, which runs commands over SSH on our servers that execute deployment scripts: https://github.com/bugout-dev/spire/blob/main/deploy/deploy....

We use systemd to manage services.

We use Ansible to set up servers.

Our infrastructure spans AWS, Google Cloud, and servers in a datacenter.
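
For anyone curious, the shape of such a workflow is roughly this (the secret name, host, and script path are placeholders, not the actual pipeline linked above):

  name: deploy
  on:
    push:
      branches: [main]
  jobs:
    deploy:
      runs-on: ubuntu-latest
      steps:
        - name: Deploy over SSH
          env:
            SSH_KEY: ${{ secrets.DEPLOY_SSH_KEY }}   # placeholder secret name
          run: |
            echo "$SSH_KEY" > key && chmod 600 key
            ssh -i key -o StrictHostKeyChecking=accept-new deploy@app.example.com \
              'sudo /opt/app/deploy.sh'              # placeholder host and script path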


Why did you come to this conclusion and how are Linux servers a better fit?


Came to this conclusion for teams of our current size based on years of experience and experimentation (never at the expense of the business).

They are a better fit because they are much easier to manage and it's much easier to debug issues when something goes wrong. We are a small team and we hope to stay that way. But our operational responsibilities are growing significantly. The extra cognitive overhead of working with technologies like kubernetes would prevent us from scaling up our effort the way that we want to.

It's hard to answer your question in detail outside of a very long essay.


Kubernetes is overkill, but one will have to pry containers from my cold, dead hands. I will not deal with installing dependencies from the OS package manager and editing /etc files any more.

Linux namespaces are a brilliant idea.


Yeah I am not hardline anti-Docker. And anti-k8s only for small teams/companies.

With larger teams, I have written and maintained custom k8s operators in production. It was a great fit for the problems we had at that scale of developers.

A terrible fit for the problems my current team has.


Are you using at least Nomad or something?


Nope, literally SSH and bash scripts. We are fully open source (except for our security/operational code): https://github.com/bugout-dev


Went to nomad, which is working better for my workloads.

There's still use-cases where k8s wins; but nomad handles state a bit better and is easier to reason about from scratch.


I really like the look of nomad and want to give it a go. The two things holding me back are:

1) I don't really want to manage the installation but there aren't any(?) cloud hosts for Nomad that I can see.

2) It doesn't seem as widely used so community support seems thin. There aren't many blog posts about good patterns with it etc, and I'd worry that we'd get stuck and end up reverting back to k8s.


There is no installation needed with Nomad; it's a standalone binary. Just fire it up on a small Debian (or Alma) instance on EC2 or GCE and you're done. That should solve point 1.

Point 2 is debatable. Lots of people nowadays put Kubernetes on their resume but that doesn't mean they are great architects or technicians, yet a good part of running production on Kubernetes is doing it right.

You'll see much fewer people with Nomad on their resume, but on the other hand you know they're not here for the buzz, they're usually more experienced and know what they're talking about.


HCP might be what you want, but it doesn't support Nomad yet, and unfortunately it's not clear when it will. https://discuss.hashicorp.com/t/status-of-hcp-nomad/33374


Koyeb also moved off Kubernetes and went with Nomad. We started with Kubernetes, thinking it was the right abstraction layer for us to build our platform, but then quickly ran into major limitations. The big ones: as others have mentioned in this thread, its complexity; security (we wanted to explore using Firecracker on Kubernetes, but it was very experimental at that time); we were not interested in keeping up with its release cycles; global and multi-zone deployments were not as straightforward as we needed; and the overhead (10-25% of RAM) was a cost we were not willing to take (we are around 100MB with our new architecture).

We wrote about our decision to switch here: https://www.koyeb.com/blog/the-koyeb-serverless-engine-from-...


Nomad replaces parts of K8; it is not a drop-in replacement. If one only wants the container orchestration, that is fine, but then you need Consul for service discovery and so on.


I've described this as "Nomad is a container orchestrator, while Kubernetes has a container orchestrator".

Nomad, Consul and Vault interoperate extremely well and are mostly pleasant to use, but I found myself missing the rest of the ecosystem pretty quickly, especially around ingresses, and I think they made the wrong decision on the networking model compared to Kubernetes.

That said, I haven't played with Consul Connect, the Consul+Envoy service mesh, yet. That might address a lot of the problems. But fundamentally I can't help but think that Nomad and Kubernetes both made a run of it and Kubernetes came out the winner of mindshare and ecosystem.


Traefik and Nomad play very nice together if ingress is your main concern.


This is not true anymore. They released service discovery a little while ago.


Thank you, I did not know that. It seems a bit limited in comparison to Consul but it would probably work in many cases.


They're also adding in a basic secrets k/v store in the next version. Their intention (I believe) is to target small use-cases and IOT use-cases; while allowing folks to scale to gigantic levels when mixing in Consul/Vault.


>"There's still use-cases where k8s wins; but nomad handles state a bit better and is easier to reason about from scratch."

Can you elaborate on how Nomad handles state differently than K8S and what makes it better?


Yes! My startup of 5 people did. We started out with a managed Kubernetes cluster on DigitalOcean, but there were a number of reasons that caused us to not be very comfortable with that setup.

   - Taking random .yml configs from The InternetTM to install an Nginx Ingress with automatic LetsEncrypt certs felt not-exactly-great. It's no better than piping curl to bash, except the potential impact is not that your computer is dead, but the entirety of prod goes down.
   - Because of this, upgrades of Kubernetes are a pain. The DigitalOcean admin panel will complain about problems in 'our' configs, that aren't actually OUR configs. We don't know how to fix that, or if ignoring the warnings and upgrading will break our production apps.
   - Upgrades of Kubernetes itself aren't actually zero downtime, and we couldn't figure out how to do that (even after investing a significant amount of research time).  
   - We were using only a tiny subset of the functionality in Kubernetes. Specifically we wanted high-availability application servers (2+ pods in parallel) with zero-downtime deployments, connecting to a DO managed PostgreSQL instance, with a webserver that does SSL-termination in front of it.  
   - Setting up deployments from a GitLab CI/CD pipeline was pretty hard, and it turned out the functionality for managing a Kubernetes cluster from GitLab was not really done with our use case in mind (I think?).  
   - It would be bad enough if DigitalOcean shit the bed, but the biggest problem was that we couldn't reliably recognize if something was a problem caused by us, or by DO. Try explaining that one to your customers.
Summarizing: it was just too complex and fragile, even once you wrap your head around what the hell a Pod, a Deployment, an Ingress and Ingress Controller, and all of the other Kubernetes lingo actually means. I suspect you need a dedicated infra person who knows their stuff to make this work, so it could very well make sense for larger companies, but for our situation it was overkill.

We were not intellectually in control of this setup, and I do not feel comfortable running production workloads (systems used by 20k high-school students, mission-critical applications used by logistical companies) on something we couldn't quite grasp.

We went to a much simpler setup on Fly.io, and have been happy since. It's a shame they seem to be too young of a company to really be super reliable, but I suspect this is only a matter of time. In terms of feature set, it's all we need.


For context, I ran a DevOps team for the last 4 years that managed two products on AWS - one on EKS and one on ECS. I also just finished building out more or less that exact infrastructure on DO.

I can pretty confidently say, that's not K8s, that's Digital Ocean. On AWS, we ran the EKS infrastructure (which was not simple) with basically half a dev's time for years. It was only when it started to scale to millions of users that we needed to build a team to support it. It was still a much smaller team than the one that supported the ECS product (two devops).

I was mostly managing and not coding by the time Kubernetes was in our stack, so while I'm very familiar with infrastructure in general (and I know ECS inside and out, unfortunately), I hadn't used Kubernetes directly much before I built this DO infrastructure. But I got it up in a week, and though DO is a nightmare, k8s is an absolute joy as a DevOps. Holy shit it's perfect. It does exactly what it needs to, with exactly the right abstractions, with perfectly reasonable defaults.

The reality is that infrastructure work is just that complicated.

You wouldn't try to have a team of front end engineers build your rest backend. It's not reasonable to expect javascript engineers to know how to build and operate an infrastructure - at least not without dedicating themselves to learning the tooling and space full time for a while. Think of it from the perspective of a frontend engineer learning Python and Django to build out a rest backend, and then multiply the complexity by 4. That's just infrastructure regardless of what you're using.

That said, if something like Fly.io can fit your needs, that's great! I haven't used them so I can't speak to them directly, but I know that with Heroku, the trade off was cost and, eventually, being limited in what you could build. Eventually you would need to build something that just couldn't be built with Heroku. A quick glance at Fly, the pricing looks reasonable, but I'm guessing the build limits will still apply.


That's fair enough. We took a look at 'native' AWS, but there are a multitude of reasons why just dealing with AWS at all is a huge upfront time investment too if you don't hire somebody already skilled at this (complicated billing, just figuring out the product names for their various services, to name a few).

> The reality is that infrastructure work is just that complicated.

Yes, if you need the flexibility of running anything in any setup. What we really wanted was 'yeet a docker image with a web server in it + env vars at some magic beast that'll run it for me, slap an SSL-cert on it, and make sure it's always online'. We tried to replicate this with Kubernetes, so we got the full complexity of k8s unloaded upon us.

Heroku was what we really wanted, but it was always too expensive. Fly.io strikes a good balance here, the defaults are sane, it's still flexible enough for other services, and it's relatively cheap (spend is similar to DO K8s).

> You wouldn't try to have a team of front end engineers build your rest backend.

Well, yes and no. I wouldn't expect frontend engineers to know the ins and outs of everything backend, but to build on your metaphor a bit further: Setting up a basic Node backend with express serving static files shouldn't take multiple weeks, even for a frontend engineer. I feel like I was trying to do the infra equivalent of that, and it did take me forever.

> A quick glance at Fly, the pricing looks reasonable, but I'm guessing the build limits will still apply

The build limits could be an issue but really isn't for us right now. It's fairly easy to build locally though (in our case: in our GitLab CI/CD runners)


> Well, yes and no. I wouldn't expect frontend engineers to know the ins and outs of everything backend, but to build on your metaphor a bit further: Setting up a basic Node backend with express serving static files shouldn't take multiple weeks, even for a frontend engineer. I feel like I was trying to do the infra equivalent of that, and it did take me forever.

Yeah, that's just what infrastructure work is. Like I said, take that analogy, multiply the complexity by 4 (at least... really maybe multiply it by an order of magnitude).

Let me put it in perspective. I've been coding since I was 12, I taught myself C to build a MUD in middle school and high school. I had about a decade of full stack professional experience in Java, PHP, javascript and I'd done infrastructure work with EC2 and chef before. When I moved into DevOps it was overwhelming.

I've been in DevOps for 4 years. I built that equivalent DO infrastructure with Kubernetes just last week (and in a week). I started the week going "Fuck, I don't know what I'm doing." The first 3 days were just spent reading documentation. Day four was spent writing the terraform and kubernetes manifests - with a distinct feeling that none of this was going to work because I was missing several key pieces. Day 5 was spent putting a few of those pieces in place and debugging. I finally got it working late Friday night. I took on a ton of tech debt and made a bunch of compromises just to get something working. I'm not the least bit happy with what I have working and intend to totally rebuild it on AWS when it comes time to build production.

And that's with 4 solid years of doing infrastructure work full time under my belt. For someone with no infrastructure experience? I would estimate 1 - 3 months. There's just way too much to learn to think you could do it quickly and simply.

With an express backend, if you have javascript experience, you really don't have much to learn. You need to learn how http interacts with the backend, how the backend interacts with the database, and databases (SQL). That's it. Learning databases is not nothing, there's a lot that comes with it, but that's still only 2 new tools really.

With infrastructure, you need to learn networking, databases, security, container orchestration (how does high availability work? Scaling?), bash, linux, provisioning, terraform, Docker, Kubernetes manifests, monitoring, secrets handling, and more. And for a lot of these things, the solutions are far from simple or perfect. Even when done as well as can be with modern tech it feels shaky and cobbled together at the end. You're tying a dozen different tool types together to solve a dozen different problems and you have dozens of choices for each tool type.

Like I said, infrastructure is just like that. And it's important to have the right expectations going in to it.

If you can't tell, I've had this conversation with my peers who stayed in full stack a lot.


Sounds like you never understood what you had deployed. Kubernetes is complex; you need somebody with the know-how on your team.

Meanwhile, going with fly.io sounds sensible to me.


Yep! And we had to make a decision whether we would focus on our core business of developing great applications for end-users, or spend more time running infra and try to wrap our heads around the mountain of complexity that is k8s. That choice at our size is a no-brainer, although that trade-off might be very different for larger teams.


Well all of those issues are fixable, but I think it is a totally valid reason not to use k8s if you don't have a dedicated infra person/team.


Yeah, definitely! This is why I am not that harsh on Kubernetes as a tool at all, I'm just saying that it's not suitable for us for these reasons. In our context of <4 FTE of dev power it just isn't worth the manpower we have to throw at it to make it work, I'd much rather invest that time into moving our core business forward. I might see ourselves moving back to it in the future, but in the meantime we really just need a Heroku / Fly.io / DO apps / AWS ELB or so.

At my previous employer (~50 FTE of devs, 2-ish FTE dedicated to infra) Kubernetes worked perfectly fine, and I think in that context it made a lot more sense.


Kinda? We use Cloud Run because for our workloads GKE was a lot more expensive. So far, it's been great. I wouldn't say I've "left" Kubernetes, since from what I understand Cloud Run implements the Knative standard, which is itself built on Kubernetes. But much like it was predicted early on, I think Kubernetes is best used as a means of building an infrastructure platform, not an infrastructure platform in and of itself. You certainly can cobble all this stuff together and build a nice system, but it takes a lot of work, and there's probably a hosting company out there which already does something similar enough that you can adopt.

With this approach to hosting and deployment, I think Kubernetes' main advantage is that it opens the door to new kinds of infrastructure businesses, not that it makes hosting a website any easier.


+1 for Cloud Run.

I've tried many of the serverless platforms and maybe it's the types of applications I work on, but I've found most of their limitations (short runtime, limited access to resources on your private network) basically make them useless. The more self-hosted types that don't have these limitations lose out on many of the benefits or are leaky abstractions on k8s.

Cloud Run has all the benefits I want: extremely easy deployment and scaling, as well as the ability to scale to zero if you need it (though generally you don't), while still being able to run basically whatever workload I want. My current employer is mostly a Python shop but we recently deployed a little .NET core service on Cloud Run and it's been awesome.


Note that Cloud Run is not built on Kubernetes, but on Borg. It implements the Knative Serving API spec, mainly for portability reasons with Knative and Kubernetes.

Source: I'm the Cloud Run PM and we have communicated about that publicly in the past.


TIL!

Do you have any Google docs or blog posts that talk about this?

I always wondered why you need a Serverless VPC connector for "vanilla" Cloud Run (or you have to use Cloud Run on GKE) to access VPC resources, but I suppose this answers that question.


Yep! Well, kinda... I still use it at work but for any of my personal stuff at home or for my side projects I use Fedora CoreOS [1] with Butane YAML [2] which I template with Jinja2. Being able to define a VM with Butane and launch it quickly is pretty great. Nothing I am running requires the benefits that Kubernetes can bring to my workloads and the reduced complexity is a breath of fresh air.

I am slowly moving towards using Hashicorp's Nomad running on Fedora CoreOS using the Podman and QEMU drivers. I rolled out Nomad at work for internal projects and it lets me get things done quickly without living in a total YAML hellscape.

1: https://docs.fedoraproject.org/en-US/fedora-coreos/getting-s...

2: https://coreos.github.io/butane/examples/


We use Nomad from Hashicorp, it's super simple. Never liked the complexity K8s brings along.


I never liked the cost Hashicorp products bring along.


What, zero? (GP didn't say anything about using Hashicorp-managed products, they're open source and free as in beer to use. Another comment says Hashicorp's platform doesn't even offer Nomad (yet?) anyway.)


If you need features offered by the self-managed Enterprise version of a Hashicorp product, I've heard the price tag is something like low six figures per product.


Hashicorp is very inflexible about support plans -- either you go all in on their Enterprise product, or you're self supported. By the time you've licensed Nomad, Consul and Vault -- because they interact and you will find Nomad support ends where Consul support begins, and so on -- it is a LOT of money.


Using terraform without TFE is something I would never recommend to any large org. Been there, done that.


Hmm. Please could you explain further? I'm genuinely curious what costs you associate with Hashicorp products.


It depends on your size. For a fairly minimal, close-to-best-practices setup you'll need the following for each DC, each on a separate physical host (I may be missing something):

  3 x Consul server
  3 x Nomad server
  2/3 x Vault server
It's been a while since I operated k8s, but IIRC you can get similar capabilities and redundancy with 3-5 machines?

That's before you start looking at actual runner nodes, load balancers, proxies, logging and monitoring infra, etc...

Unless you cheat (which I think many do) or you're big enough, that overhead can be meaningful.


(Disclosure: Nomad team lead)

FWIW we recognized this was too much overhead for many users. Nomad 1.3 supports service discovery so you can start without Consul, and 1.4 will support secure variables to get folks farther along without requiring Vault.

So 3 Nomad servers should give you a pretty featureful and highly available cluster these days.


Yeah or, like, spin up three medium servers in different zones and have each server run all three services. We did that for a production setup for years and it worked fantastically. There's no need to have nomad/consul/vault all on different servers unless they are significantly underpowered or the workloads are crazy.

If best practices say otherwise, then maybe they should be reconsidered.


Sure, but at this point there's so much else we get from Consul that, like, what's the point...

I guess the path is set but I'd personally much prefer having a recognized deployment scenario be hosting Consul server and Nomad server on the same physical machines, and accommodating (be it through code or just docs) for making that play well with security, certs, and resource usage without becoming a confounding mess.

Even Vault, if the operator accepts and/or mitigates the sidechannel aspects - from a security perspective that still shouldn't be a step down from anything Nomad-specific?

Seeing as HC already provides solutions for all of these, supposedly in service of Nomad, doesn't it make more sense to make them play together smoothly and nicely on the same machine rather than reinventing a lesser wheel for each of them?


Entirely true, but I also think that neither k8s nor Nomad is that useful if you're not at a scale where the above is negligible? Those 9 servers cost roughly 500 USD a month on AWS.


OT: I really do not like Hashicorp. Terraform has a terrible DSL, and terrible documentation. Also, I paid $70 for their VMWare Vagrant plugin ages ago, it was so buggy as to be unusable, they were unresponsive on the Github repo, and they completely ignored my request for a refund under their own 30-day guarantee. Not very professional.

I really don't get why people love that company so much.


I would love to. But what I hate about K8s is how you can't not use it. It's like Jenkins. A total piece of shit, slow, buggy, insecure, maintenance headache, expensive to maintain, never works the way you want without a ton of work, lots of footguns, bad practice is the default. But try explaining to management how you don't want to use Jenkins and they'll just come back with "but it's free" and "everyone uses it" and "no vendor lock-in". They don't understand that they're asking you to become a Ferrari mechanic when you really need a Ford F-350 pick-up.


No and we are happily using it within our overcommitted cluster (combination of shared and dedicated nodepools).

We are a small team of 5 infrastructure engineers and previously managed 200+ libvirt VMs running on bare-metal HA hypervisors in a GlusterFS storage pool (software agency, different customer application services). We started to migrate to GKE in 2017 and finished within a year or so.

I know many associate k8s with a yaml mess, but this is actually our most favourite part of it. We are able to describe a whole customer project in this format and it's not something we have to maintain in-house (Ansible). As long as you don't try to be smart (templating/helm, operator dependence), it works out pretty well; prefer plain manifests and extend that with your own validation scripts.

Nevertheless, if you have no 24/7 operations, stay the hell away from bare-metal - go managed.


I'm particularly interested in a variant of this question.

My company has clients who usually have very simple requirements. A Python/Django app server and a database. Sometimes there will be another background service or two (memcached or equivalent etc).

The most complex site we had was the above but with some Postgres replication clients.

We use docker and docker-compose. We've used ansible in the past as well as fabric and other simple solutions.

We've had a couple of devs try and convince us that we should be using Kubernetes and I counter with "it's overkill for what we need". Am I wrong?


Nope.

You can install all of those servers in different containers, and then combine them in the same pod. For all intents and purposes from the outside, it will be a singular VM. But from the inside, you will be able to separate all those servers/tasks into separate containers, running from inside the same virtual localhost machine. They can also use the same PVC, making running a stateful app much easier. You don't even need to make it a StatefulSet.

You get a lot of benefits with this - you will be able to easily manage each different server in the containers inside the pod. Easily manage their resource constraints. Security. You can make the pod's containers not accept connections from anything that does not belong to the particular app that they belong to. K8 will manage resources in the cluster, autoscaling up and down, everything. All of the stuff that you had to maintain scripts or Ansible for in non-k8 setups will be automated.

K8 is basically an abstraction of the non-business stuff a lot of infra approaches were doing. It's containers inside VMs without you needing to manage VMs.
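
A minimal sketch of what that can look like - one pod, two containers, one shared PVC (the names, image, and the pre-existing PVC claim are placeholders, not a recommendation for any particular app):

  apiVersion: v1
  kind: Pod
  metadata:
    name: myapp                       # placeholder name
  spec:
    volumes:
      - name: shared-data
        persistentVolumeClaim:
          claimName: myapp-data       # assumes this PVC already exists
    containers:
      - name: web
        image: registry.example.com/myapp:latest   # placeholder image
        ports:
          - containerPort: 8000
        volumeMounts:
          - name: shared-data
            mountPath: /data
      - name: memcached
        image: memcached:1.6          # reachable from the web container at localhost:11211

Both containers share the pod's network namespace, so the web container talks to memcached over localhost, just like on a single VM.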


> We've had a couple of devs try and convince us that we should be using Kubernetes and I counter with "it's overkill for what we need". Am I wrong?

You're not necessarily wrong, as long as Docker Compose isn't incompatible with what you're trying to do - e.g. if you'd need overlay networking across multiple nodes, or scheduling things across them in one go, then Docker Compose might not be the best fit and you might instead be better served by looking in the direction of Nomad or even Docker Swarm, though the future there is unclear - maintenance mode project, but very similar to Docker Compose and comes out of the box with any Docker install.

Either way, Kubernetes might indeed be overkill for simple setups, unless you're using just a subset of its functionality and are running lightweight clusters, like K3s or K0s. I guess some might be pushing it because it's basically become the industry standard, at least in some capacity, in some places, or maybe people just want to put it on their CVs.


Well, "it depends". This is similar to my first production k8s deployment, where we used k8s to host a lot of PHP, some node.js, some other stuff, and due to legacy code it meant a lot of apache2 containers with mod_php.

The reason we went for that setup is that it helped us cut cloud/hw costs significantly (at the start we pretty much had two workers and that was because we ran everything with replicas=2) - each individual site had small requirements, and with k8s we could guarantee enough resources while binpacking as many of them per server as possible.

The actual deployment story can possibly get simpler than docker-compose, but I'd say the real question is whether you'd get a financial win out of it, as it seems you have a pretty good steady state going.


Back in time, k8s was a glorified docker swarm and swarm was largely compose spread over multiple servers, so if you deploy everything on a single computer and don't have requirements to care about redundancy/failover and all that, then k8s is almost certainly overkill.


I would first do a calculation of what it would cost to host it on vanilla cloud services. They are often cheaper than what people think, if you include the work hours needed.


I'm not sure I understand the distinction you're making?

I would be hosting on vanilla cloud with or without Kubernetes.


Kind of?

For my new projects nowadays, I'm pushing mainly serverless approaches using AWS Lambdas (behind API Gateways for stuff that needs to be reachable by HTTP).

I think this shifts the complexity from managing Kubernetes and its accompanying ten-thousand-yaml-files to infrastructure-as-code and the complexities of dealing with AWS. And I happen to prefer the latter, even though it's not better by any great margin.
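
As a rough illustration of what that IaC side can look like - a hypothetical AWS SAM template for one Lambda behind an API Gateway HTTP API (the function name, handler, and path are made up, not my actual setup):

  AWSTemplateFormatVersion: '2010-09-09'
  Transform: AWS::Serverless-2016-10-31
  Resources:
    ApiFunction:                        # hypothetical function name
      Type: AWS::Serverless::Function
      Properties:
        Handler: app.handler            # assumed module/handler layout
        Runtime: python3.11
        CodeUri: ./src
        MemorySize: 256
        Timeout: 15
        Events:
          GetItems:
            Type: HttpApi               # provisions the API Gateway in front
            Properties:
              Path: /items
              Method: get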

For the few things that need to be always-online, or 3rd party self-hosted apps, I'm still on Kubernetes, or pure Docker if possible.


How do you get over vendor lock-in, and the issue where it is often difficult to debug these solutions?

Also how is the cost of Lambda? I know that on Azure, Logic Apps have a really high cost (but Function Apps seem to be reasonable)


Vendor lock-in is omnipresent, even if you are using Kubernetes. The only difference is that the vendor lock-in you get when you run Kubernetes is mainly with the infrastructure layer hosting the code, instead of the code entrypoints.

Not to mention any managed services you want to use, which also will lock you in. So I don't do anything extreme to avoid vendor lock-in, other than making my code general enough to only have a small surface area for the lambda entry point. As a practical example, all of my APIs hosted on Lambdas are ordinary ASP.Net apps that would work identically if hosted in Docker containers.

Pricing so far is one of the biggest benefits of doing serverless approaches. I'm down to paying a couple of dollars per month for something that I'd pay tenfold for if doing Kubernetes. Both the monetary sum of only paying for what you're using, and also not having to worry about cluster maintenance, scaling and management is a godsend.


> infrastructure-as-code and the complexities of dealing with AWS

So cloudformation YAML? or CDK?


Not the OP, but I have had success with CDK. The main advantages for me have been discoverability with respect to resource properties, along with proper, higher-level abstractions pertaining to AWS infrastructure. https://aws.amazon.com/blogs/devops/leverage-l2-constructs-t...


Or terraform/terragrunt, Pulumi or one of the other options out there


Terraform only for me, have not experienced anything better.


Sort of.....

I went from on-metal K8s clusters, which were a complete PITA and required a full team to manage, to using EKS which has been everything K8s should be... easy peasy.



Nomad + Consul(with Consul Connect) + Vault. With Terraform obv.

We don't really have a use-case for Boundary but it looks pretty neat as well if you do.

Was on k8s for years and I don't miss it one bit.

While there definitely is some complexity once you get serious and set everything up properly with raft, federation, Connect, CAs, proxies, ACLs, proper secrets lifecycles... I find it's worth it. With the current assumptions that HC will keep improving and existing bugs and edge-cases will be ironed out.


We adopted it in 2017 and got rid of it in 2021. It introduced a lot of complexity, while still leaving a lot of issues up to us to figure out. E.g. deployment strategies.

Also: our main reason to adopt Kubernetes was to stay cloud-agnostic, but we soon realized that this is as unrealistic as writing a complex app's SQL in a vendor-independent way.

Instead, we decided to embrace our cloud (AWS) by using their CDK tooling and leveraging their features as much as possible. If we ever need to switch to another cloud we will bear the cost then, but for now it is clearly YAGNI.


I am not so sure that Kubernetes itself is the issue, as far as the technology goes. I'm personally a fan of serverless/lambda style functions, but my understanding is that many of those can run on Kubernetes under the hood.

Same goes for heroku/digital ocean app services. Even elastic beanstalk. If you are large enough that you need to manage your own k8s cluster, that is one thing, but I would encourage you to look at your needs from a usage and compute perspective long before you start solutionizing with trendy technologies.


I really wish Rancher hadn't abandoned Rancher 1.6 and moved to k8s. It was a perfect solution for a small business and bare metal.

I am trying to move to k3s, but it is just too complex to run anything and there is still the unsolved problem of exposing services to the internet.

What I want is to declare that I want this service to be under this domain and this IP - so for that you still need to configure your load balancer (bare metal) manually, set up certificates, etc. I am writing a tool to automate this, but it's been a pain.


> What I want is to declare that I want this service to be under this domain and this IP - so for that you still need to configure your load balancer (bare metal) manually, set up certificates, etc. I am writing a tool to automate this, but it's been a pain.

After initial setup you can do it quite easily.

Exposing a service on a selected domain is several lines in an Ingress and adding certificates is several more. Example: https://cert-manager.io/docs/tutorials/acme/nginx-ingress/#s...
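
Something along these lines (the hostname and Service name are placeholders, and it assumes an nginx ingress controller plus a cert-manager ClusterIssuer called letsencrypt-prod are already installed):

  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: myapp
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed issuer name
  spec:
    ingressClassName: nginx
    tls:
      - hosts:
          - myapp.example.com
        secretName: myapp-tls          # cert-manager writes the certificate here
    rules:
      - host: myapp.example.com
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: myapp          # placeholder Service
                  port:
                    number: 80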


So this is not going to work for several reasons. One being that on bare metal you don't have a cloud provider, so there is no load balancer it can talk to. Second - it will set up a hostname and a certificate on the ingress, but there is no way to contact it from the outside world. The domain still needs an A record pointing at the server, and in the cluster that may be a local IP or a set of IPs.

What I have in mind is an external server that is not part of the cluster that bears the role of load balancer. It will contact the cluster and look for services and then set up a reverse proxy based on their declared hostname, then set up certificates and update DNS records at the DNS provider.

As far as I know something like this does not exist.

Maybe Traefik has such a capability, but their documentation is so complex I have no idea.


Actually I'm using it on bare metal and it works. Initial setup wasn't very hard but I think it could be more intuitive. Overall I think the documentation for self-hosting kubernetes is sometimes a bit incomplete.

Yes, I need to add A records with IPs for each domain, but that's a one-time setup. I did it manually, but you can automate it [1] (depends on what you use for a DNS provider, but you can extend it to support your provider or maybe there is another existing solution).

I'm not sure that one server in front of the cluster is more reliable than using all cluster nodes for load balancing. I guess that in automated solutions like [1] cluster's node could be automatically deleted from DNS if it went down.

My setup is not so big so I don't have real need for load balancing, but it seems possible with existing solutions.

[1] https://github.com/kubernetes-sigs/external-dns


Sure it does. I ran kube-vip[1] (but there are many others, e.g. MetalLB) as my cloud controller; all it needs are valid static IPs/a range/DHCP and it will assign these to LoadBalancer services (of which you usually only need one, for your ingress), and it will either ARP or use BGP to route external traffic.

As for DNS records, external-dns[2] works perfectly as long as your DNS has some way of doing automatic updates.

1. https://kube-vip.io/

2. https://github.com/kubernetes-sigs/external-dns
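
Roughly, the ingress controller's Service ends up looking something like this (the hostname is a placeholder; kube-vip or MetalLB hands out the external IP for the LoadBalancer, and external-dns picks up the annotation to create the DNS record):

  apiVersion: v1
  kind: Service
  metadata:
    name: ingress-nginx-controller     # hypothetical ingress controller Service
    annotations:
      external-dns.alpha.kubernetes.io/hostname: myapp.example.com
  spec:
    type: LoadBalancer
    selector:
      app.kubernetes.io/name: ingress-nginx   # assumed controller labels
    ports:
      - name: https
        port: 443
        targetPort: 443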


The problem with kube-vip is that it has poor documentation. I have read it many times and still don't know how I could use it. Last time I was running something that assigned IP addresses to the dedicated server's interface, I got it null-routed and the provider threatened to terminate the service because it was interfering with other clients' network. So if I see things like ARP, BGP, DHCP, it is not clear what exactly it does on the network and how that would work in the real world. I am missing an example where I have a server with a static IP from which I want to access the exposed services that are on a private network. All I really want is an automatically configured reverse proxy that will direct traffic to the appropriate services and take care of certificates and DNS.

Before the Kubernetes I used Rancher 1.6 and that was super simple. For instance I would start a wordpress container and then all I needed to do was to add a reverse proxy entry with its hostname as a backend and point where the certificates are (that was before lets encrypt).

The closest I could get was exposing a NodePort and having nginx reverse proxy to the nodes at a given port, but that seems more complex / fragile, as I need to keep track of which service uses which port and it is still manual, so I might as well just use containers without Kubernetes.


Another option is running something like haproxy ingress in external mode on dedicated vms

https://www.haproxy.com/documentation/kubernetes/latest/inst...


My opinion on Kubernetes is that it's great orchestration software that is trying to fix poor underlying application architecture issues. The biggest underlying software application architecture issue today is the idea of a single responsibility worker. Why do applications have one worker only working on messages from one queue? This architecture looks great on a whiteboard, but has issues around spikes in traffic, uses tons of unused server resources, and requires lots of custom software plumbing. The solution is a generic worker that does work from any queue. It is such an obvious fix to lots of the scaling issues that large software applications face today. I'm personally only using Temporal.io, which is a generic worker orchestrator, from now on when making large distributed applications.


I think this is due to "needing teams to be independent" which for some orgs is a real thing (Conway's law) but there are small teams trying to build microservices, which, IMO, is a bad idea.


I don't think the two are mutually exclusive


To those who have used K8s extensively...

1. Is it really so complicated?

2. Is that complexity incidental or essential?

3. Could we get away with a simpler set of abstractions for 90% of applications?


In my experience, Kubernetes can be straightforward if:

1. You read the O'Reilly book first (or another good book). There are a few unexpected abstractions (replicas, services, deployments, etc) which the book explains nicely.

2. You pay for a hosted Kubernetes. Google's is great. EKS is workable, but you may need to spend more time configuring it.

3. You don't mess with the networking system, and nothing goes horribly wrong.

Our clusters peak out at close to 400 CPUs, and Kubernetes generally does what it says it will do.

One caveat: If your app can be deployed using a "platform as a service" (Heroku, Render, etc), that's usually a better idea than Kubernetes. Kubernetes makes sense when a PaaS starts feeling too limited.


On the PaaS front, I've also found that for smaller applications / new startups the pricing between starting with k8s and starting with a PaaS is pretty similar.


Definitely. Works for FaaS too up to some extent (especially if you need hot-standby/zero-latency startup). Problem is mostly that once you scale beyond the "look it says hello world" levels the price goes up so fast you can essentially pay someone to "make it cheaper by running it elsewhere" and still be better off.

Most questions seem to revolve around a tiny part of the puzzle, or a small "just starting out" phase, and completely forget about the lifecycle of the business process that it is built for, and the existing systems it needs to interact with. Even a startup will have that problem considering most are trying to get bought, which essentially means being absorbed into a legacy company. So even starting out with no legacy to worry about is just a stay of execution.


If you are based somewhere else than in USA and without VC backing, the pricing wall hits you even faster :V


Yep, and it's even worse if you also have to account for traffic, the usual lack of GDPR compliance in the USA and even time zone issues.


1. I used to use it at an old company in the early days of K8s. We ran our own setup, as EKS and AKS didn't exist. GCP did, but we were on AWS. It really is very complicated, however, with EKS, GCP, and AKS, it makes it a lot easier. Note, for users it's a lot simpler than the alternatives. Sure it's no heroku, but it vastly makes things easier compared to running on AWS, GCP, Azure, or worse, on bare metal.

2. Essential. K8s solves a problem that's quite complex. You can't really solve it in a simple manner.

3. Probably, but that 10% will require the additional abstractions and complications anyway, and it'll be easier to manage one system rather than 2.


It's too dynamic to have universal answers.. but I'll give it a shot:

1. It's only as complicated as you make it. Kubernetes is essentially PKI (which is a must in any case), a REST API, and a scheduler. It stores some stuff somewhere, and you can add more stuff for it do have more features. I wouldn't call that complicated and it's essentially what Swarm and Mesos do as well (minus the PKI part).

2. PKI is essential. If you think that's complicated that's a whole different problem. Everything else is incidental. If adding more OpenAPIV3 schemas or REST API seem complex, again, not really a Kubernetes thing, mostly a general software development thing.

3. Yes, as 90% of applications really only exist as mediocre CRUD viewers you could run on a potato. Also, 90% of applications don't need to be as highly available or scalable as people might think. Then again, ecosystem complexity in software development combined with the lack of general knowledge (i.e. how to use an RDBMS properly) means that while the software is simple and could be run as a single statically compiled binary, it generally is a mess, requiring more messes to make it run. But since that is cheaper (less developer time spent, more cheaper developers available to do that type of work), that is where we end up.


1 - No. People try to implement old, complex, stateful apps on K8 by just slapping on some stuff. That creates problems.

2- See 1.

3 - If you can containerize your app in a simple way, then yes.

Note that a stateful app that would require attention in a bare metal server or a singular VM would still require that kind of attention on K8 as well. K8 just removes the need to manage the VM infra. And makes running your infra as code much easier.

If you need to run a stateful app in a highly available manner, you can do it in K8 and it would be good - however you will spend a similar effort maintaining the highly available services like you do in other venues. I.e., if your stateful app requires a Percona cluster and an NFS cluster off of K8, you will still need to launch and maintain those services. K8 operators make these a lot easier to launch and maintain. But it's still maintenance nonetheless.

Using managed, hosted databases can work for the database part. But they are expensive. So launching a database cluster via a K8 operator would be cheaper to maintain. NFS is a problematic thing across all platforms. So if you need it, you either launch a rook-ceph cluster to provide a shared filesystem or use a hosted service like Google File Store.


K8s does not have to be complex. If all you're doing is hosting a bunch of various web services, it is really simple. Actually, it doesn't have to be just web: you can host services that can only communicate within kubernetes, or services that monitor and manage some XYZ resource, etc., and that will all be really simple.

Even hosting redis etc.. is really straight forward.

It is funny, but the complexity starts to happen where you want kubernetes to handle other stuff: like hosting databases, or other storage resources, and if you want to, for some reason I will never understand, have your external services essentially communicate directly with kubernetes rather than have some middleware service you pay for do that for you (like a load balancer, etc.)

One thing I did have an issue with was setting up SSL... that was surprisingly stupid. Should have been much easier to do that with LetsEncrypt.


The problem is that it's often simple at first until you dive into the management of the cluster.

Then you run into a litany of issues with networking (like you mentioned SSL termination) and stateful apps or databases.

Even in this thread, someone mentioned how Redis defaults lead to a lot of issues in containers.


Do you write code in python? Is it really complicated to write a script to fetch some data, extract info, upload it someplace else? It’s not.

But then, someone is trying to fetch 50GB files and now you need to play with buffers. The script misbehaves so the API rate limits you and now you need to handle credentials, back-off, etc. The script hangs in some strange state and you need to add structured logs to figure out what is happening. Now we need to upload multiple files in parallel, are we going multi-process or multi-thread? Is python the right language? Are we going to use one pod or many?

See how it quickly gets complicated? Add to all that the fact that it’s easy to spin-up rabbitMQ with some defaults with helm locally. So you do that in production as well and when it goes down you don’t know what’s happening.

As another commentator said, there is a level of knowledge that is required to run things in production reliably.


I feel like you might be listing a bunch of edge cases that won't affect most people, or at least are as likely as a bunch of other edge cases with completely different optional solutions


k8s is actually very simple as a user. It's complicated to operate it yourself without EKS, GKE, etc. But from an end user perspective you write some declarative manifests, they get put into an event bus, and then the state of the world is reconciled with your manifests. Easy peasy.


> It's complicated to operate it yourself without EKS, GKE, etc.

... and we're now fully back to the mainframe era with the people in white coats who "run" the computer.

The cloud truly is mainframe 2.0.


> 1. Is it really so complicated?

Depends. Are you a >500 Developer org with many services? Then it's easy compared to what's out there. Anything less than that I'd say it's complex and you'd be better off using a PaaS

> 2. Is that complexity incidental or essential?

Depends. If you're going to do simple things forever then it's an overkill. But if you expect to grow in unknown ways in the future and don't want to waste your time doing bunch of migrations in the future then it's essential.

> 3. Could we get away with a simpler set of abstractions for 90% of applications?

Maybe? Heroku, AppEngine, CloudFoundry tried, but didn't go too far. Let's see what new crop of PaaS offerings are able to do


> Anything less than that I'd say it's complex and you'd be better off using a PaaS

Kubernetes is available as a managed service in AWS, Azure, Google etc and this is likely to be the most popular deployment model.

By any definition this is a PaaS, and if you add in custom monitoring, logging, security, ingress, etc., it is going to be just as simple and significantly cheaper than using a managed solution.

If you're just building a basic website then sure it's an overkill but fewer people are building those these days.


it really comes down to what you're building. many web app startups would be better served paying for PaaS that manage this for them. as an example: Netlify/Vercel. if you need a database add FaunaDB to that. if that sounds risky or expensive, consider the cost of building a DevOps team.


It’s not any more complicated than doing it other ways if you want control of the full infra stack. Advantage of k8s is you get a hardened unified API that works everywhere.

> 3. Could we get away with a simpler set of abstractions for 90% of applications?

Yeah but you can do that in k8s too, check out knative serving for example. K8s encourages the creation of higher level abstractions, with the advantage that you always have the break glass to dig into the primitives, which you don’t get with a lot of other systems.
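
For instance, a Knative Service collapses the Deployment, Service and autoscaling bits into a single object - a rough sketch, with a placeholder name and image:

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: hello                        # placeholder
  spec:
    template:
      spec:
        containers:
          - image: registry.example.com/hello:latest   # placeholder image
            ports:
              - containerPort: 8080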


1) No, the problem is it is all or nothing. Knowing a little k8s means your stuff doesn't work.

2) necessary at scale, incidental before then.

3) yes.


imagine trying to create an infinitely scalable system that is bound by aws account limits


a) AWS account limits are very flexible if you spend enough money with them.

b) Kubernetes clusters can span multiple accounts, clouds etc.


I tried using k8s for a personal projects cluster a while back and found it very frustrating to use in a whole bunch of ways, whether managed or not. I ended up just using straight docker swarm and it works fine for that level of need, especially combined with something like portainer. Much simpler and easy to understand what's going on. Obviously it's not a very useful solution on its own beyond a certain scaling point but it probably meets most small use case needs.

But it doesn't get brought up as an option very often because docker basically FUDed themselves by having two things called swarm and then loudly killing the older one making everyone think it no longer exists.


+1 for Swarm.

It's not for hyperscalers and it's got a limited feature set compared to k8s, but it's simple enough that you can really learn how it works and how to make it do what you want even if it's only a small part of your job.

If you just want redundant services, zero-downtime upgrades, and either manual-only scaling or very restricted autoscaling, Swarm is likely sufficient.

Main downside? Not available as a managed service, at least from major providers. Then again, if you're OK with managed services, you would probably prefer either a fully-managed PaaS (Heroku, Azure Web Apps, etc.) or a managed k8s.


This is kinda the mental-cage I am in right now: For some small amount of containers (300 at most, almost all webserver-like), I would like to have some basic high availability and scheduling on a few nodes. K8S, K3S and even Nomad feel overkill, I tried all of them. Swarm on the other hand is so easy to set up and get running, it seems like the perfect solution. The only thing stopping me is the stigma of Swarm being dead, which is not even the case right now (there is still support but no new features / communication). I feel like starting with Swarm right now would be perfectly fine, but using a technology which may likely be declared officially dead in about 1-2 years just somehow feels wrong. This is my own mental-cage-issue here, right?


This is what I mean about how they FUDed themselves. There is a thing called swarm that isn't supported anymore but there's no reason to think the newer thing called swarm is gonna go away, and the only way I think it matters if it gets new "features" is if docker as a whole does. If it started collecting new features unique to swarm it'd just become another k8s.


This is my first time hearing about swarm and swarm? I always thought they killed it and brought it back zombie-style soon after. How can I distinguish between them? Like is there any way to make sure I use the new one? Is there documentation? Now you made me question reality :D


This SO thread covers it I think https://stackoverflow.com/a/40045865

The messaging around this was terrible, but it's basically that a separate product got killed and they made it a core feature with the same name at the same time.


Never used it.

Didn't pass my BS test.

I am glad that people are moving on to something that will exhaust their creative juices on something ... pointless instead of focusing on delivering value for their customers :)

More people using the brand new tech - less competition :)


My company (and sector for that matter) is typically 10 years behind the mainstream, so we're just transitioning a giant legacy monolith to kubernetes/micro-services now (literally installing Longhorn today).

To be honest, even with the technical overhead it'll probably solve a lot of problems for us from a workflow perspective. We've (the engineers) been arguing for more component-level testing for years (as opposed to the all-up E2E testing we're required to do now, which typically turns into component-level testing anyway), and containerizing everything is a good excuse to push it into reality. It'll also make deployments a lot easier (just roll back to X image if there's a problem). Right now we have tens of thousands of lines of hand-written deployment scripts that manage everything and have to be maintained, and intimate knowledge of how they work is often limited to whoever wrote them (many of whom are no longer with the company), and if there's a problem you have to do surgery on the environment. Kubernetes will give us a unified deployment architecture with problems you can google.


From my experience, making a downgrade of a single component will not be easier with K8 unless you design for it.

Also from my experience, people will start complaining as soon as the new deployment with k8 starts failing and they have to fix it.

But it's a good opportunity to make the transition to a more stable architecture.

My suggestion is to take it slow and make changes one system at a time. Start with a stateless application with a less risky deployment and, as you learn, move the others.


Why would I leave Kubernetes? It's the best thing since the microwave and sure as heck way better than Mesos.


No. It's really good.

We have about 100 devs in multiple teams. Kubernetes provides great level of standardization and transparency - completely different experience than VMs, where admin team had too much ability to cut corners and build technology debt. People would riot if they had to go back to these days.

A few warnings:

* It takes some resources. Maybe this can be mitigated with k3s or similar, but I don't have first-hand knowledge here.

* It requires some time to learn and configure properly. If your entire team is 3 people and you are on a limited budget, probably not a good idea.

* Adopt some tools (helm?) and standardize deployments where possible. Bare k8s is a bit too much for daily work.

* Read good practices and don't try to be smarter, at least until you really know what you are doing. A limits misconfiguration may really burn you at the least convenient moment.


I've been dragging my feet on implementing k8s, suspecting that its complexity would eventually be reduced by its evolution.

And then a couple weeks ago, I was tasked with standing up a new Ansible AWX server, which is now done via a k8s operator. It was an exquisitely painful experience. This is potentially a bad example because I'm pretty sure IBM's plan with AWX is now to make me suffer, but through that entire process, k8s just felt like extreme overkill.

I'm pretty sure that's going to be the last time I use k8s. I know it makes sense for some use cases, but it just doesn't feel intuitive in any way. And although it may seem more efficient, I absolutely dread having to troubleshoot any problems down the road.

I'm probably not the target audience, but thought I'd leave a comment for fun anyway.


Can you share more about your awx-operator and/or k8s pains? I was tinkering with it this weekend and I'd like to compare notes. I got the basic install via kustomize working just now, because I got stuck with the helm-based method.


Yes. I manage hosting for a sizeable online business. I thought it would be more efficient to run our containers on Kubernetes, but it’s very complex. Now I run all production workloads on FreeBSD jails managed with iocage. It works very well for our needs and with less overhead.


No, and I wouldn't, since I absolutely love it. I've put our entire build pipeline and everything into one single cluster at the moment, and been finding it incredibly straight-forward and easy to build our CI/CD pipelines using it.

Do I recommend Kubernetes to other people/companies though? Absolutely not! The learning curve is incredibly steep, and it really does take investment into understanding how it works.

But to anyone who is looking to use Kubernetes, I highly recommend https://helm.sh since it actually makes templating deployments significantly easier.
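
As a small illustration of why (this is a made-up chart excerpt, not anything from helm.sh itself): the bits that differ per environment live in values.yaml, and the manifest becomes a template.

  # templates/deployment.yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: {{ .Release.Name }}-web
  spec:
    replicas: {{ .Values.replicaCount }}
    selector:
      matchLabels:
        app: {{ .Release.Name }}-web
    template:
      metadata:
        labels:
          app: {{ .Release.Name }}-web
      spec:
        containers:
          - name: web
            image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"

  # values.yaml
  replicaCount: 2
  image:
    repository: registry.example.com/web   # placeholder image
    tag: "1.0.0"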


My guess is most people are hitting the Trough of Disillusionment

https://en.wikipedia.org/wiki/Gartner_hype_cycle


Personally, I never joined. I have tried, I really have. I spent so long trying to move my business over, because on paper it's a developer's dream. Everything in config files. Perfection!

But in reality, I think I developed an allergic reaction to complexity and hype. I took some metrics; things like recording the time taken, steps taken and happiness generated from my current build/release stages, then comparing to k8s.

In conclusion, struggling to learn k8s forced me to find joy in the simplicity - knowing that one day (that will never come), I can just hire someone to do this... "It's only a problem when it's a problem".

For now, I have a lovely bash script that is triggered on Github releases (using Actions), which uses doctl to do the following:

1) Create a new server from my baseline image

2) Run the setup steps as defined in the Dockerfile, although it doesn't use docker (it just makes sense to keep the configuration I used to have)

3) Copy the built-and-tested version of the repository to the new server

4) Run any post deployment scripts, like database migrations, whatever

5) Move the reserved IP to the new server
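
For what it's worth, the GitHub Actions side of this can stay tiny - a rough sketch, where the workflow, script name, and secret are placeholders for whatever you actually use:

  name: deploy
  on:
    release:
      types: [published]
  jobs:
    deploy:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v3
        - uses: digitalocean/action-doctl@v2
          with:
            token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}
        - name: Provision the new server and cut over
          run: ./deploy.sh "${{ github.ref_name }}"   # placeholder for the bash script doing steps 1-5 above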

It takes about a minute from me clicking "new release" in Github to seeing the changes hit production. If there's a problem, I move the reserved IP back. Load balancers, database clusters, etc... they're all set up manually because "it's only a problem when it's a problem".

Kuberneeties only ever generated problems for me.


No. It's working so well for us. I love it.


We never started because we realized how silly it was. A clustering solution is a great thesis, but nobody has "done it right" yet. K8s was over-designed from the get-go and missed what actually causes issues in scaling.

We feel the same about Docker. People have no idea what's running when they download a docker image. Stuff can be buried deep deep within an operating system image. Security should be simple, transparent, and minimal so it can be reviewed easily. Reviewing a docker image is impossible. I'm convinced the correct place for isolation is systemd. This guy wrote a great starter for hardening the crap out of your services: https://docs.arbitrary.ch/security/systemd.html Systemd offers a bridge too with nspawn if you're not ready to undertake ultra minimal hardening of services.

Scaling is a "sexy" problem to have though, and software "engineers" love to think that their SAAS product with 100 users is going to take Google scale workloads; thusly what could be done in a LAMP stack on a single DO server, is inflated into a fantasy that will never come to fruition.


> People have no idea what's running when they download a docker image.

This is true of software packages and especially of third-party libraries; supply chain attacks are supply chain attacks. But similarly, supply chain controls are supply chain controls, and using Docker does not mean running someone else's container.

(For example, we build our own hardened base images, and on those we install our own services, and the result is precisely as trusted as building our own hardened AMI and installing our services on that.)


I only ever see Kubernetes mentioned on HN in two contexts: "Kubernetes was ruining my business" and "Kubernetes saved my business".


Well, "Kubernetes didn't really have much effect on my business" doesn't really make for an interesting or memorable post.


Not really. It is working well for us, but it wasn't as easy to begin with. Storage, Networking & Debugging are the biggest challenges.


My current company uses it and that is not a bad choice. I prefer nomad which I have used since 2016.

I have spent the past couple weeks working with kustomize since I do not like helm and while it gets the job done I think Tanka would be better.

We are on GKE which makes things a lot easier and I personally would not choose to run my own cluster.

(disclaimer: i worked at hashi for 4 months in 2020 but not related to nomad)


I switched jobs recently and became the de facto DevOps person so have been able to deploy mostly how I want. I’ve used kubernetes at multiple jobs, side projects and at home but for a cost and time constrained startup we are leveraging ECS/Lambda/Batch/Cloudfront. B2C application, mostly low traffic with nearly no traffic off hours. Occasionally we’ll get a big rush, 2 to 3 orders of magnitude more traffic than usual, from a marketing push and haven’t run into any issues yet.

I still run KEDA at home for managing plex, home assistant, some game servers and other of my own projects. But being the only one who is using the cluster is a different use case than getting RBAC, ingress and management set up correctly for a production cluster IMO. I’ve never had the sole responsibility or permission over a cluster before, so it was a daunting step I decided not to take for my own sake


The second question is my question as well. Because K8s has failed to deliver as a true "platform". i.e Application teams still have to care about infrastructure. That statement is true even for the "managed K8s" by cloud providers. But what's the alternative? We are stuck.


Fargate


Solo creator. Never went there. Don't miss anything. :-)


We switched to it in the last year. It's been good for us, but it really depends on your use case. We deploy lots of different applications with different scaling requirements, so it's a great fit. If you're deploying one app without any unusual deployment requirements, it's probably overkill. Certainly there is a learning curve, but once you've got it down it's easy to throw new things into the mix.

People talk about it being incredibly complex, and honestly I don't see it. Yeah there's a layer of jargon you have to dive into, but it all makes sense once you start building something with it. By far the most complex pieces for us are the integration points with AWS (we're using EKS.) The examples/docs available are just not that great.


Our story involves moving onto k8s, then moving off it.

We run most of our app on Google App Engine explicitly to avoid devops work. However, we have a stateless-but-memory-hungry image manipulation service that was just too expensive on GAE. We migrated that service to k8s on Digital Ocean.

It was a disaster. I mean, it worked, but suddenly we were spending a lot of time learning k8s and fussing with k8s and it slowed down feature development. K8s is a time sink. So we migrated the service to Digital Ocean App Platform and velocity returned to normal.

I'm not wholly thrilled with DO App Platform. It has some maturity issues, and while it's cheaper than GAE, RAM is still more expensive than Elastic Beanstalk (which charges you more or less the EC2 VM cost). So we'll probably move it there someday.


If RAM cost is an issue, why not rent dedicated servers? You can rent a dedicated server with 256gb RAM for less than $400/mo from various low cost providers such as Hetzner and OVH.


We don't need anywhere near that much RAM per instance, and we have somewhat bursty traffic. Cloud is a decent fit. I'm willing to pay for the convenience, but that doesn't mean I won't cost-optimize.


No. In my current company k8s + helm + istio + argocd + good support from the SRE/infra team has made things pleasurable. Seniors introduce the complexities of the system to juniors in a controlled, paced way. One thing I would change is to replace terraform with crossplane.


What I see as a consultant (and I have been with k8s since v1.2) is that companies try to get from 0 to 100 and then wonder why they fail. So you are going from Java 1.7 on JBoss with a tightly coupled monolith to microservices on k8s with Docker. All the nice things k8s can provide go hand in hand with the ability to work together: cert provisioning, network infra, storage, DB offerings, etc. There is so much k8s needs to succeed, and all the teams have to work hand in hand, which is what companies brutally underestimate.

I would stick, no matter the company size, with IaaS + $Deploymenttool (Ansible or so) and Docker, get comfortable with that, and only then, when everything works as intended, make the switch to k8s.


we have used k8s for about 4 years and are now slowly moving back from k8s to fargate

creating a scalable system is complicated within aws account limits

all we really want is to shove docker containers behind a load balancer and not worry about having to manage yet another system


I'm curious what AWS "account limits" you ran into. I've very rarely come across a quota/limit that wasn't increasable upon request in AWS.


Fargate runs with EKS or ECS. Does this mean you dropped EKS for ECS?


Fargate just sort of registered as a “hosted Kubernetes” in my mind, guess not.


It can be "hosted serverless Kubernetes" (in the EKS flavour), or "hosted serverless Amazon ECS" (in that flavour).


Yes, the last two companies I've worked for migrated to AWS Fargate and Serverless/Lambda. There are some advantages to using k8s when you have large stateless applications that need to scale, but it requires well thought out patterns for things like caching and involvement with the dev team from the beginning. Most small to medium size companies get no real benefit from Kubernetes as it introduces a lot of devops overhead and rot (examples: the companies that chose to use Skaffold instead of Helm, or customized deploy scripts that don't make sense to developers and aren't integrated into a CI/CD pipeline)


I use remote Docker contexts with my home cloud now instead of Kubernetes. I'm actually looking at moving to Docker Swarm. As a single dev mostly satisfying my own needs, it's pretty much all I need. Happy to answer questions.


I've also looked at docker swarm mode, but as a noob in these things, it's the persistent database that makes me nervous - how do you do it? And do you have any tips or tricks?


I keep it simple by scheduling services to particular nodes: https://docs.docker.com/engine/swarm/services/#control-servi...
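
For example, in a stack file the database service can be pinned to a particular node (the hostname, image, and volume are placeholders):

  version: "3.8"
  services:
    db:
      image: postgres:14
      volumes:
        - db-data:/var/lib/postgresql/data   # local volume on the pinned node
      deploy:
        replicas: 1
        placement:
          constraints:
            - node.hostname == db-node-1     # placeholder hostname
  volumes:
    db-data: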


I hacked together https://github.com/piku and run all my personal projects using it. Haven’t looked back in years, although I do use k8s at work.


Approaching the 1 year mark of running my team's various data pipelines on k8s. Keeping up to date with k8s/EKS version lifecycles has been more work than expected. No plans to stop using it anytime soon.


Luckily I never picked up K8s; I found Docker Swarm simple and easy to use.


It's a little sad about Docker Swarm, because up to a certain critical mass of nodes Docker Swarm is actually quite nice, with lower overhead and complexity. The problem is that Docker Swarm seems to be a dead project now.



In my lab at home I ditched everything, and I now run k8s on my servers with everything managed using ArgoCD and GitOps; the best config I've had by far.

At work, we're currently trying to migrate our stack to k8s. Why? Because our startup is getting bigger and bigger, our current platform sucks, and our products are becoming a lot more complex as time goes on. We benched a few platforms and landed on EKS + ArgoCD + Vault. Works really well.
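For anyone wondering what the GitOps part boils down to in practice: the core of it is an Argo CD Application that points the cluster at a Git repo and keeps it synced. A minimal sketch (the repo URL, path and names are made up):

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: homelab-apps
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/example/homelab-gitops   # illustrative repo
        targetRevision: main
        path: apps
      destination:
        server: https://kubernetes.default.svc
        namespace: default
      syncPolicy:
        automated:
          prune: true      # delete resources that were removed from Git
          selfHeal: true   # revert manual drift back to what Git says

Everything else is just committing manifests to that repo.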


Still going strong in client work at big companies.

For personal projects I roll with just Docker.

I'm bothered by the minimum requirements of k8s; I want to deploy on $5 machines.


Take a look at k3s[0] for lighter k8s.

[0] https://rancher.com/docs/k3s/latest/en/


K3s is awesome!

I used it to host several small projects on cheap virtual machines. The setup is very straightforward. I guess we just need better editing support for YAML.

Thanks for the recommendation!


do you run docker swarm or anything?

or just containers on the virtual machine?

I would love to deploy with docker and no other orchestration tools.


We run a few largish Rails apps. Main Rails containers, nginx, Sidekiq, and almost everything else in ECS. RDS for Postgres. ElastiCache for Redis. OpenSearch for ES. We tried moving to k8s but found the current setup simpler. Our Terraform state is a huge blob though, and if we started from scratch I think we would have defaulted to EKS for more things.


Story of one of the projects I am involved in:

We came from Ansible-managed deployments of vanilla Docker, with nginx as a single-node ingress and another load balancer on top of that.

Worked fine, but HA for containers that are only allowed to exist once in the stack was one thing that caused us headaches.

Then we had a workshop on Rancher RKE. It looked promising at the start, but operating it became a headache as we didn't have enough people on the project team to maintain it. Expiring certificates were an issue, and the fact that you actually kinda had to baby-sit the cluster was a turn-off.

We killed the switch to Kubernetes and moved back to Ansible + nginx + Docker.

In the meantime we were toying around with Docker Swarm for smaller-scale deployments and in-house infrastructure. We didn't find anything not to like and are currently moving in that direction.

How we do things in Swarm:

1. Monitoring using an updated Swarmprom stack (https://github.com/neuroforgede/swarmsible/tree/master/envir...)

2. Graphical Insights into the Cluster / Debugging -> Portainer

3. Ingress: Traefik together with tecnativa/docker-socket-proxy, so that Traefik does not have to run on the managers (see the compose sketch after this list)

4. Container Autoscaling: did not need it yet for our internal installations as well as our customer deployments on bare metal, but we would go for a solution based on prometheus metrics, similar to https://github.com/UnclePhil/ascaler

5. Hardware Autoscaling: we would build a custom script for this, based on Prometheus, that automatically orders servers from Hetzner using their hcloud CLI

6. Volumes: Hetzner Cloud Plugin, see https://github.com/costela/docker-volume-hetzner - Looking forward to CSI support though.

7. Load Balancer + SSL: in front of the Swarm using our Cloud Provider
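Here's a minimal sketch of the ingress pattern from point 3, assuming Traefik v2; the service names, network, and exact proxy permissions are illustrative rather than our literal config:

    version: "3.8"
    services:
      socket-proxy:
        image: tecnativa/docker-socket-proxy
        environment:
          CONTAINERS: 1   # read-only access to what Traefik needs for discovery
          SERVICES: 1
          TASKS: 1
          NETWORKS: 1
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
        networks:
          - traefik-net
        deploy:
          placement:
            constraints:
              - node.role == manager   # only the proxy touches a manager's socket
      traefik:
        image: traefik:v2.9
        command:
          - --providers.docker.swarmMode=true
          - --providers.docker.endpoint=tcp://socket-proxy:2375
          - --entrypoints.web.address=:80
        ports:
          - "80:80"
        networks:
          - traefik-net
        deploy:
          placement:
            constraints:
              - node.role == worker    # Traefik itself stays off the managers
    networks:
      traefik-net:
        driver: overlay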

Reasons that we would dabble in k8s again:

1. A lot of projects are k8s only (see OpenFaaS for example)

2. Finer-grained control over user permissions

3. Service mesh to introduce service accounts without having to go through a custom proxy


I went from Kubernetes to Ansible to Nix for my personal installation. Kubernetes was too complicated, and Ansible too brittle.


Never started. My work infrastructure is Elastic Beanstalk and my personal infra is either hand-managed containers or Dokku. Previous gig maintained an internal abstraction on top of Kubernetes but it wasn't something I ever had to mess with.


No. I do feel like k8s will be superseded by vendor-specific offerings though. Even with a CKA, I think there's just too much overhead for what is fundamentally (usually) compute at scale.


No. Nor do we plan to.

Honestly - I even use it personally for my self-hosted stuff at this point. The learning curve is... steep. But once you come out the other side, it's a great tool.


Yes, Kube is a mistake for many. I'm a consultant who specialises in Kube, but I actually spend most of my time decommissioning it and migrating to other solutions.


- Terraform to spin up VMs in the cloud (e.g., "give me an Ubuntu machine with 4GB of RAM")

- Ansible to provision such VMs

- Docker to start/stop containers on such VMs

It feels like a breath of fresh air!
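If it helps anyone picture the Docker step, here's a hedged sketch of the Ansible side (the host group, image and ports are invented; assumes the community.docker collection is installed):

    # provision.yml -- run one app container on each provisioned VM
    - hosts: app_vms
      become: true
      tasks:
        - name: Ensure Docker is installed
          ansible.builtin.apt:
            name: docker.io
            state: present
            update_cache: true

        - name: Start (or restart) the app container
          community.docker.docker_container:
            name: myapp
            image: registry.example.com/myapp:latest   # illustrative image
            state: started
            restart_policy: unless-stopped
            published_ports:
              - "80:8080"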



If you are using AWS, you don't even really need Kubernetes there. It goes without saying that you can handle pretty much any task and any load with AWS.


Good luck managing a labyrinth of virtual machines in a cattle-friendly way, like a Kubernetes cluster can, without a bucket of other tooling to invest in.


Depends on your workload. AWS is pretty slow and constrained on ECS-on-EC2 and Fargate. App Runner is even more limited and more expensive (to the point where we could hire a full-time, 6-FTE team to run on-prem K8s instead).

AWS has EKS for a reason.


Yes, after using it for 5 or so years.

Now building fully with serverless.


Do you have your full server code run in a cloud function, and let the function handle routing based on inputs? Or do you have a function per behavior?


I switched back to Docker Swarm managed by Portainer. So much easier, and all I really needed is one node taking over when another dies.


I haven't tried it yet. The docs look promising, but if I can't run it on bare metal with officially supported docs I won't bother.


a lot of these "just use bare VMs" arguments don't address how you do service discovery, load balancing, networking, rolling deployments, etc. A lot of the benefit my team gets from K8s is from these abstractions, not anything to do with "scale" per se


Sure, we are replacing Kops with EKS now.


Managing k8s requires a strong DevOps team. A strong DevOps engineer is a solid software engineer plus a specialty. These people are rare and expensive and extremely hard to attract to startups.

Therefore you either can’t find anyone or more likely you hire less good DevOps engineers.

The solution is to not use k8s as a startup. The less a DevOps engineer can shoot themselves in the foot the better.


We migrated all Erlang, Golang and Node.js code to edge workers on Cloudflare and simply did not need to run containers anymore, so there was also no need for k8s. I would say this reduced the operational complexity by two orders of magnitude.


Do you mean that you got erlang and golang code running in Wasm under Workers, or did you convert the logic to JavaScript?


No, we rewrote all the Golang and Erlang code in JavaScript. The Erlang and Golang runtimes and concurrency primitives do not really matter for most use cases when you have unlimited edge functions running wherever users are. On top of that, hiring is just so much simpler when you have a single language everywhere.


In our case we integrated k8s into our dev pipeline, and all devs need to do to release an app is merge a PR to the main branch.

It all scales and works fine. There are one or two problems around, but not enough for me to consider that it doesn't work.


Kubernetes is horribly complex.


Kubernetes is not really complex for what it tries to achieve, but there might be some scenarios where you need a more tailored tool for the job.

Would I use k8s just for static websites or a single API? No. Would I use k8s for a rarely updated solution where low cost is the #1 priority? No. Would I use k8s for a complex microservice architecture with a long list of ever-growing implicit/explicit requirements and a lot of moving parts? Definitely, because then you just need to either use some built-in k8s feature and/or reach into the CNCF ecosystem to supply almost anything you need.

Kubernetes gives you standardization; it's good in the enterprise, where high complexity and poor communication are normal. It covers a lot of typical application requirements, and you can learn a lot about a solution just by looking at its k8s cluster. However, it's a time sink if you really want to learn more about k8s and the CNCF ecosystem.


Can you be more specific?


It's often very difficult for developers to test applications in development because of the complexity of setting up `minikube` or `k3s`, so if your small to medium sized company doesn't have a dedicated devops or QA team, it can be too much overhead. I've worked with talented engineers who struggle getting up to speed with networking in k8s because there is a litany of tools, best practices, and terms to learn (NodePort, LoadBalancer, and various ingress controllers), and it's heavily dependent on which cloud provider you're using (e.g. GKE, AWS, or Azure), so that adds weeks to dev time.

This is a major problem if the team isn't well versed in devops tools (which is often the case at smaller companies) and can lead to lots of issues, pushing a lot of work onto a devops resource (or team), which in turn requires setting up a separate dev/staging cluster.

I think the preferred alternative for companies that struggle with this overhead is to use a slightly more expensive managed service, especially if you're just developing a typical MVC/MV* app.


Not definitively; it was more of a "see you later". The company didn't have enough resources (people) to maintain the cluster, and we decided to use ECS while we were still maturing as a team.


Too soon. Everyone is still milking it. Ask again in 10 years time.


Hell no. I'm a big convert. It solves a !lot! of problems.


WellYesButNo.gif

The two philosophies at megacorp here seem to be "I built it from the ground up to target X service" where X service is usually amazon serverless or something, and "I built it in docker containers but I don't know about the cloud".

The former is a conscious decision, and we (Architecture) have a serious, sit-down discussion with them about what it actually means to be fully cloud native for that particular service. This discussion ranges from cost analysis, to things like "is your application actually built correctly to do this", to "you're not going to have access to on-prem resources if you do this", to even asking them simply "why".

A lot of the time, when the teams realize they're going to be on the hook for the cost alone, they back out; and a lot of teams try to do it because "we don't understand K8s". Well, it doesn't get much better in Cloud Run either, folks, because you're trading K8s YAML for Terraform or CloudFormation.

Where it has been successful is for teams which own APIs which only get called once a month, or very low traffic APIs. I hate to say it boils down to cost, but a lot of the time it really does boil down to cost.

Additionally we've seen a weird boomerang effect as clouds offer K8s clusters which are simply priced per pod rather than per worker node (like GKE Autopilot). A lot of teams which straddled the middle of "low traffic but not low enough to really migrate" have found they're quite happy in GKE Autopilot. They use autoscalers to provide surge protection, but they just use Autopilot with 1 or 2 pods running and it keeps the costs down. That also means we can migrate them to beefier clusters in a heartbeat if they get the Hug of Death or something from HN. ;)
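The "couple of pods plus an autoscaler for surge protection" setup is basically just a HorizontalPodAutoscaler with a low floor; a hedged sketch (the target name and thresholds are made up):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: low-traffic-api
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: low-traffic-api
      minReplicas: 2       # the cheap steady state on Autopilot
      maxReplicas: 20      # headroom for a sudden spike
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70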

The second use case I discussed gets railroaded into the K8s clusters we built ourselves, because we can typically get those teams to use our templates, which provide ingresses and service meshes so the developers don't have to think about it too much, and the DevOps team is comfortable with the technologies. While it means that there's a bit of "rubber stamping" and potential waste, it's allowed us to use K8s and the nice features it provides without having to invest too much in thinking about it for an individual application.


At work: no.

Personal: I use docker-compose on VMs.


Any love for LXC here?


LXC is nice, but different scope.


Okay, good point.


Nope, moving to Kubernetes and happily staying there.

For context, I ran a DevOps team for the last 4 years that managed two products on AWS - one on EKS and one on ECS. I was mostly managing by the time we had k8s in our stack, so I didn't get to interact with it much directly, but I know infrastructure generally (and I know ECS inside and out, unfortunately). For that infrastructure, we had a whole team managing the ECS deployment. We managed the EKS infrastructure with the equivalent of one DevOp's time or less for years. It was only when it started scaling to millions of users that we needed to give it more time and attention. Both infrastructures (ECS and EKS) were pretty complex with multiple services that needed their own configuration and handling.

I left that company a few months back to try to build my own thing and I just finished building out the alpha infrastructure for it on Kubernetes. I can now safely say, as an infrastructure engineer, Kubernetes is an absolute joy to work with compared to lesser abstractions. At least, when someone else is managing the control plane for you. It has exactly the right abstractions, with the right defaults, and it behaves basically exactly as it should.

Yes, it's complicated. Yes, there are a lot of moving pieces. Yes, there are hard problems. That's just the reality of software infrastructure. That's not kubernetes, those are just the problems of infrastructure. There's a whole set of problems kubernetes is working to solve in addition to those. Remove kubernetes and you still have those problems, but then you also have the whole set of problems Kubernetes solves as well.

I think what's really happening with this whole "Kubernetes is too complicated" thing is that a lot of teams expect to be able to use it like Heroku. That's not what it is. Or they try to build out infrastructures with javascript/php/python/etc engineers. You wouldn't try to have a team of front-end engineers build your REST backend. It's not reasonable to expect JavaScript engineers to know how to build and operate an infrastructure - at least not without dedicating themselves to learning the tooling and space full time for a while. Think of it from the perspective of a frontend engineer learning Python and Django to build out a REST backend, and then multiply the complexity by 4. That's just infrastructure, regardless of what you're using.

If you just need to run and scale a container fast and simple, with maybe a single database - then sure the PaaS providers might fit your needs for a while. But eventually the trade off is going to be cost and limitations. You'll eventually need a piece of infrastructure they don't provide.

TL;DR Kubernetes isn't the problem here. Infrastructure work is just plain complicated. If you want multiple services, high availability, reliability, scalability, security, and performance, it's just complicated and hard. Don't short change it. Dedicate someone to learning it or hire someone who knows it.


This is what people tend to forget: it's generally the combination of requirements and the reality of technology that is "Complicated". Not some individual tool or scheduler.

It's like asking people if they have moved from VLANs to duplicated flat networks because that is "better". You're just trading one complexity for another.



