Karmada: Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration (github.com/karmada-io)
117 points by kwang0126 on April 27, 2021 | 36 comments



No questions about the technical implementation; this looks impressive. Nevertheless, I'm getting more and more convinced that the whole multi-cloud topic is snake oil, engineering for its own sake, and basically the ORM debate of the '00s reloaded.

When you're betting on any of the large public clouds, what you're paying extra compared to colocation is basically renting an insanely large and experienced operations team that's solely incentivized to keep the services running and connected. I'm not saying the large players will never fail, but at least they are designed and built to that standard at any single point of contact.

Even at a _single_ public cloud, bugs and misconfigurations far outweigh any hardware failure. I've seen routing configurations or the Kubernetes control plane fail on GCP in ways that took down my services, but never ever a whole Availability Zone going down.

The farther you zoom out, the more the probability of misconfiguration and of failure from excessive technological complexity outweighs that of individual server or service failure.

I did myself a favor once and worked out what the actual cost is if our site goes down, and for how long. Sure, it's high, but you might be surprised when you actually compare it against what you're paying for risk management.

With that number in hand, I decided, for example, that the risk-adjusted cost of a whole-AZ failure is negligible compared to what we were paying in inter-AZ traffic, and happily put all of our production network back into one AZ (backups excluded, of course).
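
For illustration, a minimal sketch of that kind of back-of-envelope comparison in Python; every number below is a hypothetical placeholder, not a figure from the setup described above:

    # Hypothetical back-of-envelope: expected annual cost of a whole-AZ outage
    # vs. the annual cost of the redundancy that would mitigate it.
    # All numbers are made up for illustration.
    revenue_loss_per_hour = 2_000.0   # what an hour of downtime costs
    az_failures_per_year = 0.2        # whole-AZ failures are rare
    outage_duration_hours = 4.0       # time to recover or fail over

    expected_downtime_cost = (
        revenue_loss_per_hour * az_failures_per_year * outage_duration_hours
    )

    # What the multi-AZ setup was actually costing per year.
    inter_az_traffic_cost = 30_000.0
    extra_ops_effort = 20_000.0
    redundancy_cost = inter_az_traffic_cost + extra_ops_effort

    print(f"expected annual downtime cost: ${expected_downtime_cost:,.0f}")
    print(f"annual cost of multi-AZ redundancy: ${redundancy_cost:,.0f}")
    if redundancy_cost > expected_downtime_cost:
        print("the insurance costs more than the risk it covers")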

Also, on the cost side, you'll usually get larger discounts and save massively on traffic costs if you just bet on a single cloud; that should far outweigh any savings you might get from the leverage of avoiding lock-in, or from buying individual services more cheaply on specific clouds.

Then again, we're not servicing any life critical systems. Your mileage may vary.


Regardless of multi-cloud deployment preferences, multi-cloud readiness has tremendous value in itself.

The biggest of these is having the freedom and choice to pack up from one cloud and go to another at the drop of a hat.

Another, equally important reason is to provide some commoditization of cloud providers so they don't grow all-powerful.

It also keeps the cloud market open for new entrants that can come in with better or different value propositions than the existing players and stand a fighting chance of competing on merit.

Unless you (developers in general, not you specifically) want cloud providers to become massive rent-seeking institutions, keep optionality in the ecosystem.


For us, it's less that I want to run things on more than one cloud at a time. It's that the next time there's a shuffling of the deck-chairs 5 levels above my pay grade and we end up under someone who says "Oceania Web Services is much better than the Eurasia Cloud Platform! We must move everything at once!" it's not a 3 year migration project (by which time, guess what, Eurasia is mysteriously better than Oceania again).


In my understanding, the parent is talking about having a system run across several clouds at the same time. That does sound like madness indeed.

On the other hand, I agree wholeheartedly that having a cross-cloud-compatible application is actually quite sane.


Even if you're not concerned with real-time redundancy, running across multiple clouds is the only way to ensure that a complex system is truly multi-cloud capable.

If the system is theoretically multi-cloud but only runs on one in practice, it will eventually drift into a state of dependence on that specific environment, even by accident.


Consider yourself lucky if "being better" is the only reason to initiate a multiyear cloud migration project. We are doing one at the moment where everything is going to be objectively worse with the new provider.

I could probably do multi-cloud with Oceania Web Services and Eurasia Cloud Platform, and still end up spending less money *and* having fewer issues than with this provider.

But your point still stands. Multi-cloud is the way to go.


Ex-Amazon SDE here.

> Then again, we're not servicing any life critical systems.

Reminder: you almost never increase reliability by adding layers of abstraction.

You increase reliability and security by removing complexity/code as much as possible and implementing redundancy in a few crucial places.


Some places are mandated by regulators to demonstrate multi-cloud capability as an extension of supplier diversity requirements. It can also provide greater leverage in contract negotiations, and more flexibility to take advantage of promotions and differentiated services, if you're fluent in belt-and-braces compute and networking.

With those caveats in mind, I largely agree with you. Many folks treat these incremental fees, redundancies, and inefficiencies as a sacrifice made at the altar of availability without acknowledging that the ensuing complexity creates a risk all its own.


It's a big advantage to be able to move to another cloud; not for reliability, of course, but because a lever that good in negotiations with public cloud vendors will help you a lot.


You're right that it requires a very specific strategy and cost layout to go multi-cloud, and 99% of people don't get it. Just like most people shouldn't use K8s, most people shouldn't go multi-cloud.

If it costs $500K in opex to run in one cloud, expect it to cost $1M for two clouds. It's like having Windows tech support for 200 users and then asking them to support 200 more users on Macs when support has never touched a Mac before. Support gets spread thin real fast, and that leads to quality issues and service problems.

There's no magic software which erases all of the hurdles required to double your infrastructure using a completely different system.


Sometimes some service you want or need is not in the cloud you're in. So you start to move some workload closer to whatever you need to use... and, surprise, surprise, you just went multi-cloud.


> multi-cloud topic is snake oil, engineering for its own sake, and basically the ORM debate of the '00s reloaded.

Well, it really depends.

Guess what: with k8s you can do a lot of things if you want the simplest form of high availability:

    - regional clusters/multi-AZ, which most clouds offer
    - single-region/single-AZ clusters, but still behind a combined load balancer
    - multi-cloud single-region/single-AZ, i.e. only target a single AZ
    - multi-cloud multi-region/multi-AZ
The thing is, for many it might be simplest to "use multi-cloud single-region/single-AZ, i.e. only target a single AZ".

The biggest problem in these multi-cloud setups is mostly not the orchestration that needs to happen on top; there are many tools that can do that. It's mostly how to have a backing store that supports it, and how to have a load balancer that won't fail if a cloud goes down.
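
To make the "orchestration on top is the easy part" point concrete, here is a minimal sketch using the official Kubernetes Python client that pushes the same Deployment to one cluster per cloud, each addressed by a kubeconfig context. The context names and image are hypothetical, and it deliberately says nothing about the hard parts (backing store and cross-cloud load balancing):

    # Minimal sketch: apply the same Deployment to one cluster per cloud.
    # Assumes ~/.kube/config already has contexts named "gcp-prod" and
    # "aws-prod" (hypothetical names) pointing at the respective clusters.
    from kubernetes import client, config

    CONTEXTS = ["gcp-prod", "aws-prod"]  # one cluster per cloud, single AZ each

    def make_deployment() -> client.V1Deployment:
        container = client.V1Container(
            name="web",
            image="registry.example.com/web:1.0",  # hypothetical image
            ports=[client.V1ContainerPort(container_port=8080)],
        )
        template = client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web"}),
            spec=client.V1PodSpec(containers=[container]),
        )
        spec = client.V1DeploymentSpec(
            replicas=3,
            selector=client.V1LabelSelector(match_labels={"app": "web"}),
            template=template,
        )
        return client.V1Deployment(
            api_version="apps/v1",
            kind="Deployment",
            metadata=client.V1ObjectMeta(name="web"),
            spec=spec,
        )

    for ctx in CONTEXTS:
        # Build an API client scoped to this cluster's kubeconfig context.
        apps = client.AppsV1Api(config.new_client_from_config(context=ctx))
        apps.create_namespaced_deployment(namespace="default", body=make_deployment())
        print(f"applied Deployment to {ctx}")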


In addition to regulated industries that require this because they are life-critical, there are applications with legal requirements to host data in the same country as users, where vendor X may not operate, plus military applications where the threat model includes people firing missiles at your data centers and spies and saboteurs infiltrating vendors.

Most use cases for self propelled flight don't require rockets that can get to Mars, either, but we still celebrate their existence on Hacker News.


All healthcare and financial service providers have no choice here.

Especially for business continuity audits and the like.


Every healthcare and finance service provider I've ever seen is single-cloud. They may talk big about multi-cloud but none of them are actually doing it for their whole stack.


I've seen different projects that aspire to this, but this may be the most complete. The rationale is simple: run workloads that can survive a public cloud failure, or local workloads that can fail over to public clouds. Kubernetes promises to avoid vendor lock-in, but this facilitates a level of resilience that could otherwise only be achieved with some hardcore network configuration, and usually while still accepting some single point of failure.


Is there a solution like this that also includes bare metal? I'm a Kubernetes novice, but we run (large) Kubernetes dev/test clusters on bare metal (because it's vastly cheaper than the cloud providers). The part that especially annoys me is handling all the pieces that are there with cloud providers but not on bare metal. The most obvious example is load balancers/port networking: all bare-metal load balancers need layer 2 or BGP; we can't use either, so we use NodePort + Nginx ingress, which sucks (NodePort opens ports we don't want opened) but works. I'm specifically looking for a multi-cloud solution that handles this identically on all platforms, including bare metal.

Like I said, I'm not very experienced with Kubernetes, but from reading and asking around, I have not found a good solution for it.


This is a system that basically adds missing features as a superset on top of another system. Rather than dealing with one complex system, you are now dealing with a complex system within a complex system.

For this to actually pay off for you, it needs to cost less to run this super-system than to run the base system alone. Without any numbers (quality, reliability, bugs, etc.) to speak to, there's no way to know whether using this system will result in a better outcome. All you can do is hope that the base system is such a pain in the ass, and the super-system is so good, that somehow you come out on top.


It's inevitable that we'll see many of these as open-source projects, along with the cloud providers' own "multi-cloud" solutions that pin the control plane to their own platforms. This is the holy grail of hybrid solutions we've been chasing for years, but whenever we arrive at a solution, a replatforming effort begins. Before it was VMs; this time it's containers. Maybe, just maybe, the Kubernetes container wave will stick and we'll see some runtime-agnostic system where workloads can be placed based on a set of tags, e.g. gdpr, eu-west-1, etc.

But at the same time, what does it matter? Surely the cloud was about handing off to this magical thing somewhere far away and letting it handle the details. I think cloud got dissected further and further into the nitty-gritty details, to the point where we just turned everything that was physical into an API, a UI click, or a Terraform script. It's useful for a certain operator, a certain layer in the stack, but honestly it is not useful to anyone trying to build a consumer-facing product. It is a level of detail no one should have to contend with. Cloud vendor risk is the same as datacenter risk of the past. There are tradeoffs that you accept in exchange for the ability to move faster and let the cloud provider handle everything. There's always the potential for black swan events, but optimising for that 0.001% leads to overly complex solutions for the sake of it.

I really hope that in a few years we are not talking about this stuff. I really hope cloud is reimagined as an ever evolving network, where you build programmatically for a local experience and it gets replicated everywhere without having to think about the details. Hopes and dreams and who am I kidding, these things never happen.


>Surely the cloud was about handing off to this magical thing somewhere far away and letting it handle the details.

One of those details that is frequently ignored on the engineering side: capex. Cloud effectively outsources capital expense management. For smaller shops this may be insignificant, but when you have (aging) datacenters on multiple continents it becomes a larger factor.


SaaS is handing off to some magical thing that handles everything for you. Managed services in clouds offer many different levels of how much they manage for you. IaaS just frees you from having to buy, install, and maintain your own servers, disks, network cabling, HVAC units, physical security, land, and dedicated fiber lines from the local ISP, but you still get all the same complexity in terms of setting up all of these things to run your application.

What level of detail you have to worry about is entirely up to you. Use nothing but fully managed application services and everything will be easy. Turning "buy a new server and attach it to the various networks it needs access to" into button clicks is still making at least a few things much easier.


How does it work with Ingress?



From what I see, it doesn't; it only takes care of deployments. You need something cross-cluster in front for load balancing and something else behind for data storage.


Is there a website other than the GitHub page? You might consider a name change before you reach 1.0, given this already exists: https://www.karmada.org/. They came up first in web search and you came up second, but they have an actual web page.

Some implementation questions. First, what is this perceived to buy on top of existing systems like Helm and Rancher, which use templating engines and overrides at build and deployment time to support deploying identical services to multiple clusters, possibly hosted across multiple cloud providers? You remove some of the complexity of dealing with Go templates, but add an entirely new control plane and a master cluster that I'm not sure is needed if all it gives you is another level of indirection regarding where you deploy to.

Second, regarding the quick start guide: why use resets of the KUBECONFIG variable to control which cluster you're talking to? Your scripts can merge configs and set current-context so the user can just use contexts instead. Also, the KUBECONFIG variable itself supports multiple files, so you can do this without having to merge definitions into a single file if you want the control plane's config to live in the special /var/run path instead of the user's home.

Third, you may want to change the output of the quick start to indicate that the kubeconfig file generated for the host cluster ends up in the current user's home, not necessarily root's. I'm guessing whoever wrote the guide was simply running as root (probably not recommended, but a lot of people like to live dangerously) and copy/pasted out of their own terminal. Granted, they'll see the real location when they run the script anyway, but the README gave me the impression at first that it will always put the file in root's home, which would clearly fail if you're not running as root.

Lastly, your local-up script should support installing the control plane into different local cluster solutions instead of having a hard dependency on kind. People might already be using k3d, minikube, or even have a homelab setup with a fleet of NUCs or libvirt hosts, and they don't want yet another Kubernetes installation on their laptop. You might even prefer using something else for your own development. For instance, k3d does all that docker hackery to inject the container IP as the api-server endpoint for you, so it supports multiple clusters on the same host out of the box without you needing to manually manipulate the kubeconfig after creating a cluster.
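
As a sketch of that context-based approach (not how the current quick start does it), using the official Kubernetes Python client; the context name "karmada-host" is just a hypothetical placeholder:

    # Minimal sketch of "use contexts instead of resetting KUBECONFIG".
    # Assumes the clusters have been merged into the user's kubeconfig;
    # the context name below is a hypothetical placeholder.
    from kubernetes import client, config

    # List every context in the kubeconfig and show which one is active.
    contexts, active = config.list_kube_config_contexts()
    print("active context:", active["name"])
    for ctx in contexts:
        print("available:", ctx["name"])

    # Talk to a specific cluster by naming its context, no env-var juggling.
    host = client.CoreV1Api(config.new_client_from_config(context="karmada-host"))
    for node in host.list_node().items:
        print(node.metadata.name)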

Also, maybe even de-emphasize multi-cloud? All the other comments so far seem to be fixating on that "99% of all applications don't need to be multi-cloud" bugaboo you see every time posts like these come up, but this is really just about supporting deployments to multiple clusters with cluster-specific overrides of API object definitions. In some cases, like storage plugins, those overrides may be specific to a cloud vendor, so you're also buying the potential for a multi-cloud deployment. But whether or not an application is multi-cloud is a separate concern on top of it being multi-cluster. Multi-cluster is needed a lot more often than multi-cloud, e.g. if you scale to the size where Kubernetes simply doesn't support that many nodes in one cluster, or you want to easily deploy multiple versions of the same application, or you simply want to orchestrate staging and production clusters from one control plane. Maybe readers would then focus more on your actual technology and not fixate on the single phrase "multi-cloud."


I couldn't help but think "I heard you like orchestrators, so we put an orchestrator on your orchestrator".


To be fair, the authors of Kubernetes were pretty clear from very early on that it wasn't expected to scale to "one cluster to rule them all," at least in part since that's not how the system it was designed to replicate/replace worked. I don't think that's changed, but at the same time I think there was always an idea that the same concepts used to orchestrate e.g. pods and services, etc, could be extended to the clusters themselves. From what I can tell at a quick glance, that's exactly what karmada looks like it's intending to do.


One (most likely more) of the public cloud vendors actually does this (uses k8s to orchestrate k8s).


It's actually the recommended approach from K8s upstream - https://github.com/kubernetes-sigs/cluster-api


This is how VMware Tanzu works, too.


But what will we use to orchestrate the Karmada cluster?


Some bash scripts


It's turtles all the way down.


Write a systemd service?


Admin panel


Drain you of your sanity... Face the thing that should not be.



