This is awful - I don’t think GCP is fully aware of their position in the market...

sethvargo · on March 4, 2020

Thank you for the feedback. The management fee is per cluster. You are not billed for replicated control planes. You can use the pricing calculator at https://cloud.google.com/products/calculator#tab=container to model pricing, but it should work out to $73/mo regardless of nodes or cluster size (again, because it's charged per-cluster).

There's also one completely free zonal cluster for hobbyist projects.
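As a rough illustration of the per-cluster arithmetic: the $73/mo figure is consistent with a flat fee of $0.10 per cluster per hour over a 730-hour month. The hourly rate, the 730-hour month, and the function name below are assumptions for illustration, not something stated in the comment above; treat it as a sketch, not a billing reference.

```python
HOURLY_FEE_USD = 0.10   # assumed flat management fee per cluster per hour
HOURS_PER_MONTH = 730   # assumed average month (8760 hours / 12)


def monthly_management_fee(clusters: int, free_zonal_clusters: int = 0) -> float:
    """Return the monthly management fee in USD for a number of clusters.

    The fee is per cluster, so node count and cluster size do not appear here.
    """
    billable = max(clusters - free_zonal_clusters, 0)
    return round(billable * HOURLY_FEE_USD * HOURS_PER_MONTH, 2)


if __name__ == "__main__":
    print(monthly_management_fee(1))                         # one paid cluster
    print(monthly_management_fee(1, free_zonal_clusters=1))  # only the free zonal cluster
```

Under these assumptions, a single paid cluster comes to 73.0 and an account that runs only the free zonal cluster comes to 0.0.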

rolleiflex · on March 4, 2020

Seth — I appreciate you being here to take feedback, and for the clarification as well. The very surprising email I’ve received this morning is very hazy on the details, and the docs linked from the email are not updated yet.

The main issue is that not charging for the control plane and charging for the control plane leads to two very different Kubernetes architectures, and as per your docs, those decisions made at the start are very much set in stone. You cannot change your cluster from a regional cluster to a single zone cluster for example. So you have customers who built their stacks taking into account your free control plane, and you’re turning the screws in by adding a cost for it — but they cannot change the type of their cluster to optimise their spend, since, per your docs, those decisions are set in stone. That’s entrapment.

You should keep existing clusters in the pricing model they’ve been built in, and apply this change for clusters created after today.

That said, many of us made a bet on GCP. For us in particular, we made a bet to the point that our SQL servers are on AWS, but we still switched to GCP for ‘better’ Kubernetes and for not nickel and diming, since AWS had a charge that looked like it was designed to convey that they’d much rather have you use their own stuff than Kubernetes. It is a relatively trivial amount, but it makes a world of difference in how it feels and you guys know more than anyone how much of these GCP vs AWS decisions are made based not on data sheets but for the ‘general feel’ for the lack of a better word.

AWS’ message is that they’re the staid, sort of old fashioned, but reliable business partner. GCP’s message, as of this morning, is stop using GCP.

sethvargo · on March 4, 2020

Thank you <3. I apologize the email was hazy on details. I can't un-send it, but I'll work with the product teams to make sure they are crystal clear in the future. I'm interested to learn more about what you mean about outdated docs? The documentation I'm seeing appears to have been updated. Can you drop me a screenshot, maybe on Twitter (same username, DMs are open).

These changes won't take effect until June - customers won't start getting billed immediately. I'm sorry that you feel trapped, that's not our intention.

> You should keep existing clusters in the pricing model they’ve been built in, and apply this change for clusters created after today.

This is great feedback, but clusters should be treated like cattle, not pets. I'd love to learn more about why your clusters must be so static.

rolleiflex · on March 4, 2020

> This is great feedback, but clusters should be treated like cattle, not pets. I'd love to learn more about why your clusters must be so static.

What’s inside our clusters are indeed cattle, but the clusters themselves do carry a lot of config that is set via GCP UI for trivial things like firewall rules. Of course we could script it and automate, but your CLI tool also changes fast enough that it becomes an ongoing maintenance burden shifted from DevOps to engineers to track. In other words, it will likely incur downtime due to unforeseen small issues.

It’s also in you guys’ interest that we don’t do this and clusters are as static as possible right now, since if we are risking downtime and moving clusters, we’re definitely moving that cluster back to AWS.

sethvargo · on March 4, 2020

Hmm - have you considered a tool like Terraform or Deployment Manager for creating the clusters? In general, it's best practice to capture that configuration as code.

leg100 · on March 4, 2020

Managing clusters in terraform is not enough to "treat clusters like cattle". Changing a cluster from a zonal cluster to a regional cluster in the terraform configuration will, upon a terraform apply, first destroy the cluster then re-create the cluster. All workloads will be lost.

I'm sure there are tools out there to help with cluster migrations, but it is far from trivial.

Aeolun · on March 4, 2020

I think it’s a bit optimistic to assume that your customers will just change their deployment model because you introduced a fee.

You provide a web interface, so it’s reasonable to assume people will use it.

jiggawatts · on March 5, 2020

Cloud providers assume everyone is just like them, or like Netflix: Load balancers in clusters balancing groups of clusters. Clusters of clusters. Many regions. Many availability zones. Anything can be lost, because an individual data centre is just 5% of the total, right?

Meanwhile most of my large government customers have a couple of tiny VMs for every website. That's it. That's already massive overkill because they see 10% max load, so they're wasting money on the rest of the resources that are there only for redundancy. Taking things to the next level would be almost absurd, but turning things off unnecessarily is still an outage.

This is why I don't believe any of the Cloud providers are ready for enterprise customers. None of you get it.

onefuncman · on March 5, 2020

I think you're wrong -- containers aren't ready for legacy enterprise, VMs are a better choice of abstraction for an initial move to cloud.

Get your data centers all running VMWare, then VMDK import to AWS AMIs, then wrap them all in autoscaling groups, figure out where the SPOFs have moved to, and only then start moving to containers.

In the mean time, all new development happens on serverless.

Don't let anything new connect to a legacy database directly, only via API at worst, or preferably via events.

varshithr · on March 5, 2020

Have you considered the Google App Engine on GCP in standard mode? That seems like a good fit based on your explanation but I could be wrong.

jiggawatts · on March 5, 2020

I had a similar conversation with a government customer, saying that they should pool their web applications into a single shared Azure Web App Service Plan, because then instead of a bunch of small "basic" plans they could get a "premium" plan and save money.

They rejected it because it's "too complex to do internal chargebacks" in a shared cluster model.

This is what I mean: The cloud is for orgs with one main application, like Netflix. It's not ready for enterprises where the biggest concern is internal bureaucracy.

derefr · on March 4, 2020

Why would one want lots of little GKE clusters, anyway? Google itself doesn't part up its clusters this way, AFAIK. I don't want a cluster of underutilized instances per application tier per project; I want a private Borg to run my instances on—a way to achieve the economies-of-scale of pod packing, with merely OS-policy-level isolation between my workloads, because they're all my workloads anyway.

(Or, really, I'd rather just run my workloads directly on some scale-free multitenant k8s cluster that resembled Borg itself—giving me something resembling a PaaS, but with my own custom resource controllers running in it. Y'know, the k8s equivalent to BigTable.)

busterarm · on March 5, 2020

We run lots of small clusters in our projects and identical infrastructure/projects for each of our environments.

Multiple clusters lets us easily firewall off communication to compute instances running in our account based on the allocated IP ranges for our various clusters (all our traffic is default-deny and has to be whitelisted). Multiple clusters lets us have a separate cluster for untrusted workloads that have no secrets/privileges/service accounts with access to gcloud.

Starting in June our monthly bill is going to go up by thousands. All regional clusters.

jrockway · on March 5, 2020

Namespaces handle most of these issues. A NetworkPolicy can prevent pods within a namespace from initiating or receiving connections from other namespaces, forcing all traffic through an egress gateway (which can have a well-known IP address, but you probably want mTLS which the ingress gateway on the other side can validate; Istio automates this and I believe comes set up for free in GKE.) Namespaces also isolate pods from the control plane; just run the pod with a service account that is missing the permissions that worry you, or prevent communication with the API server.
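A minimal sketch of the kind of NetworkPolicy described here, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts alongside YAML). The policy name, the `team-a` namespace, and the helper function are hypothetical, and the policy only has an effect on clusters whose network plugin enforces NetworkPolicy.

```python
import json


def same_namespace_only_policy(namespace: str) -> dict:
    """Build a NetworkPolicy manifest that limits pod traffic to the pod's own namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            # An empty podSelector applies the policy to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
            # A peer with only a podSelector matches pods in the policy's own namespace.
            "ingress": [{"from": [{"podSelector": {}}]}],
            "egress": [{"to": [{"podSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(same_namespace_only_policy("team-a"), indent=2))
```

As written, this also blocks DNS lookups and any egress gateway living in another namespace, so a real policy would need extra egress rules for those.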

GKE has the ability to run pods with gVisor, which prevents the pod from communicating with the host kernel, even maliciously. (I think they call these sandboxes.)

The only reason to use multiple clusters is if you want CPU isolation without the drawbacks of cgroups limits (i.e., awful 99%-ile latency when an app is being throttled), or you suspect bugs in the Linux kernel, gVisor, or the CNI. (Remember that you're in the cloud, and someone can easily have a hypervisor 0-day, and then you have no isolation from untrusted workloads.)

Cluster-scoped (non-namespaced) resources are also a problem, though not too prevalent.

Overall, the biggest problem I see with using multiple clusters is that you end up wasting a lot of resources because you can't pack pods as efficiently.

busterarm · on March 5, 2020

Aware of all of this, but we have a need to run things relatively identically in GKE/EKS/AKS and gVisor can't be run in EKS, for example.

We're okay with the waste as long as our software & deployment practices can treat any hosted Kubernetes service as essentially the same.

yoshiat · on March 5, 2020

Exactly!

https://cloud.google.com/kubernetes-engine/docs/best-practic...

sharms · on March 5, 2020

For those that didn't click through, I believe the parent is demonstrating that it is a best practice to have many clusters for a variety of reasons such as: "Create one cluster per project to reduce the risk of project-level configurations"

yoshiat · on March 5, 2020

For a robust configuration, yes. However, one can certainly collapse/shrink if having multiple clusters is going to be a burden cost-wise and operation-wise. These best practices were modeled on the most robust architecture.

busterarm · on March 5, 2020

This is it exactly.

Thank you.

iampims · on March 5, 2020

Namespaces are not always well suited to hermetically isolate workloads.

jrockway · on March 5, 2020

It's probably not worth $75/month to prevent developer A's pod from interfering with developer B's pod due to an exploit in gVisor, the linux kernel, the hypervisor, or the CPU microcode. Those exploits do exist (remember Spectre and Meltdown), but probably aren't relevant to 99% of workloads.

Ultimately, all isolation has its limits. Traditional VMs suffer from hypervisor exploits. Dedicated machines suffer from network-level exploits (network card firmware bugs, ARP floods, malicious BGP "misconfigurations"), etc. You can spend an infinite amount of money while still not bringing the risk to zero, so you have to deploy your resources wisely.

Engineering is about balancing cost and benefit. It's not worth paying a team of CPU engineers to develop a new CPU for you because you're worried about Apache interfering with MySQL; the benefit is near-zero and the cost is astronomical. Similarly, it doesn't make sense to run the two applications in two separate Kubernetes clusters. It's going to cost you thousands of dollars a month in wasted CPUs sitting around, control plane costs, and management, while only protecting you against the very rare case of someone compromising Apache because they found a bug in MySQL that lets them escape the sandbox.

Meanwhile, people are sitting around writing IP whitelists for separate virtual machines because they haven't bothered to read the documentation for Istio or Linkerd which they get for free and actually adds security, observability, and protection against misconfiguration.

Everyone on Hacker News is that 1% with an uncommon workload and an unlimited budget, but 99% of people are going to have a more enjoyable experience by just sharing a pool of machines and enforcing policy at the Kubernetes level.

iampims · on March 5, 2020

It doesn't have to be malicious. File descriptors aren't part of the isolation offered by cgroups, so a misconfigured pod can exhaust FDs on the entire underlying node and severely impact all other pods running on that node. Network isn't isolated either: you can saturate the network on a node by downloading a large amount of data from, say, GCS/S3 and impact all pods on the node.

I agree with most things you’ve said around gVisor providing sufficient security, but it's not just about security, noisy neighbors are a big issue in large clusters.

alexeldeib · on March 5, 2020

IOPS and disk bandwidth aren't currently well protected either.

aaronblohowiak · on March 5, 2020

RLIMIT_NOFILE seems to limit FDs, or am I missing something?
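For reference, RLIMIT_NOFILE is a per-process limit: it caps how many descriptors one process may hold, not the total across every process on a node (the node-wide ceiling is a separate kernel setting, `fs.file-max`). A small sketch using Python's standard `resource` module (Unix only) shows the per-process scope; the helper names are made up for illustration.

```python
import resource
from typing import Tuple


def fd_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def lower_soft_fd_limit(new_soft: int) -> Tuple[int, int]:
    """Lower the soft limit for this process only; the hard limit is left unchanged."""
    _, hard = fd_limits()
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return fd_limits()


if __name__ == "__main__":
    print("before:", fd_limits())
    print("after: ", lower_soft_fd_limit(128))
```

Because the limit is applied per process, a pod that runs many processes can still hold many more descriptors in total than the limit suggests.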

AlphaSite · on March 5, 2020

CRDs can’t be safely namespaced atm, aiui.

rolleiflex · on March 4, 2020

We use Skaffold and it’s great. I’m talking about very minor unforeseen stuff that causes outages, not that we do it manually.

btmiller · on March 4, 2020

This is an interesting exchange if only for the thread developing instead of a single reply from a rep; it’s nice to see that level of engagement.

More importantly, this dialogue speaks volumes to Google’s stubbornness. Seth’s/Google’s position is: do it the Google way, sorry-not-sorry to all those that don’t fit into our model.

Like we haven’t heard of infrastructure as code? That can’t paper over basics like being unable to change your K8s cluster control plane. This is precisely the attitude that lands GCP as a distant #3 behind AWS and Azure.

busterarm · on March 5, 2020

Google stubbornly resists the idea that their platforms have actual users who depend on things not being broken for them constantly. It's cultural.

AWS has the complete opposite model.

te_chris · on March 4, 2020

Even still, it’s not like it’s trivial to just bring up and drop clusters. Just setting up peering with Cloud SQL or HTTPS certs with GKE ingress can be fraught with timing issues that can torpedo the whole provisioning process.

mitsoz · on March 5, 2020

How is this helpful at all? Are they supposed to implement everything in Terraform or similar, is that your suggestion? Why don't you completely remove the editable UI then, if whoever is using it is doing it wrong? What a typically arrogant and out-of-touch-with-customers Google response.

varshithr · on March 5, 2020

It could be totally unrelated but having an option such as equivalent TF along with REST and CLI options could dramatically speed up the configuration process.

bluhbi · on March 6, 2020

Right now with k8s it is definitely an 'ongoing maintenance'. We allocate around 0.5-2 pt per week on only doing that. If we did not do that, most of our stuff would already be outdated.

I already know too many people who are stuck at a certain k8s version. Do not allow that to happen!

yongjik · on March 4, 2020

> clusters should be treated like cattle, not pets.

Off-topic, but is this really how people do k8s these days? Years ago when I was at Google, each physical datacenter had at most several "clusters", which would have fifty thousand cores and run every job from every team. A single k8s cluster is already a task management system (with a lot of complexity), so what do people gain by having many clusters, other than more complexity?

slovenlyrobot · on March 4, 2020

The most common thing I've heard is "blast radius reduction", i.e. the general public are not yet smart enough to run large shared infrastructures. That seems something that should be obviously true.

People had exactly the same experiences with Mesos and OpenStack, but k8s has decent tooling for turning up many clusters, so there is an easy workaround

yongjik · on March 4, 2020

I still feel like that would only work in very niche cases.

I mean, if people aren't smart enough to run a large shared infrastructure, how can I trust them to run a large number of shared clusters, even if each cluster is small? The final scale is still the same.

iampims · on March 5, 2020

Updating 100 clusters bears less risk than updating a single giant one.

GauntletWizard · on March 4, 2020

And no SRE would allow you to run your application in a single cluster. Borg Cells were federated but not codependent - Google's biggest outages were due to the few components that did not sufficiently isolate clusters from one another.

Clusters are probably still pets to most orgs, but the lessons about how to manage complexity still apply. Each of my terraform state files is a pet and I treat it like such... but I also use change-control to assure that even though I don't regularly recreate it from scratch, I understand all that was there.

skboosh · on March 5, 2020

There are potentially quite a few benefits of being able to spin up clusters on demand [1]:

* Fully reproducible cluster builds and deployments.

* The type of cluster (can be) an implementation detail, making it easy to move between e.g Minikube, Kops, EKS, etc. After all, K8s is just a runtime.

* Developers can create temporary dev environments or replicas of other clusters

* Promote code through multiple environments from local Minikube clusters to cloud environments

* Version your applications and dependent infrastructure code together

* Simplify upgrades by launching a brand new cluster, migrating traffic and tearing the old one down (blue/green)

* Test in-place upgrades by launching a replica of an existing cluster to test the upgrade before repeating it in production

* Increase agility by making it easier to rearchitect your systems - if you have a pet, modifying the overall architecture can be painful

* Frequently test your disaster recovery processes as a by-product for no extra effort (sans data)

* Reduced blast radius

[1] https://docs.sugarkube.io/#benefits-of-sugarkube

shaklee3 · on March 5, 2020

I think for one, you cannot easily have Masters span regions without risk of them falling out of communication. Similarly the workers should be located nearby. If there's a counterexample to this I'd love to see it.

geerlingguy · on March 4, 2020

> clusters should be treated like cattle, not pets

Heh... how many teams actually treat their clusters like cattle, though? Every time I advocate automation around cluster management, people start complaining that "you don't have to do that anymore, we have Kubernetes!"

Some people get it, yes, but even of that group, few have the political will/strength to make sure that automation is set up on the cluster level—especially to a point where you could migrate running production workloads between clusters without a potentially large outage / maintenance window.

skboosh · on March 4, 2020

> clusters should be treated like cattle, not pets

Sugarkube is designed to do exactly that.

[1] https://docs.sugarkube.io

blazespin · on March 4, 2020

For any real production system you have to use terraform and their ilk to manage clusters, as you need to be spinning up and down dev/qa/prod clusters.

I don't know GCP though. In the past I've seen kube cluster archs which are very very fragile as they spin up. If that's the case with GCP I can see why you wouldn't do the above and rather hand hold their creation.

geerlingguy · on March 5, 2020

I would love if this happened in the real world, but for every well-architected automated cluster management setup I’ve seen using Terraform, Ansible, or even shell scripts and bubble gum, there are five that were hand-configured in the console, are poorly (or not at all) documented, and might not be re-creatable without a substantial multi-day effort.

pm90 · on March 5, 2020

GKE makes it incredibly easy to spin up + tear down GKE clusters. UI/CLI/Terraform etc, all just work for 99% of the cases.

BossingAround · on March 4, 2020

> but clusters should be treated like cattle, not pets

Ha. They should, but they are absolutely not. Customers typically ask "why should we spend time on automating cluster deployment when we are going to do it just once?" and when I explain that it's for when the cluster goes away, if it goes away, they say it's an acceptable risk.

The truth of the matter is, even in some huge international companies, they don't have the resources to keep up with development of tools to have completely phoenix servers. They just want to write automation and have it work for the next 10 years, and that's definitely not the case.

p_l · on March 4, 2020

> This is great feedback, but clusters should be treated like cattle, not pets. I'd love to learn more about why your clusters must be so static.

Clusters often are not "cattle". If your operation is big enough, then yes, they might be. Usually they aren't; they are named systems and represent a mostly static entity, even if the components of said entity change every hour.

Personally, I'm running in production a cluster that by now has witnessed upgrades from 1.3 to 1.15, in GKE, with some deployments running nearly unchanged since then.

Treating it as cattle makes no sense, especially since on API level, the clusters aren't volatile elements.

apple4ever · on March 5, 2020

I think our architects’ heads would explode if they were told we should treat them like cattle.

For us, Clusters are a promise to our developers. We can’t just spin up a new cluster because we feel like it. I must be missing something or maybe our culture is just different.

jmb12686 · on March 5, 2020

As an architect, I am currently working on our org's first cloud deployment initiative. Due to federal compliance / regulations, we have no write access in higher / production boundaries, and everything is automated via deployment jobs, IaC, etc. Given the experience of the teams involved, I took the opportunity (burden) of writing nearly all the automation. If your architects can't handle shooting sick cattle in prod, I'd say get new architects.

battery_cowboy · on March 5, 2020

> If your architects can't handle shooting sick cattle in prod, I'd say get new architects.

For every 1 competent person who can develop a solution to fully automate everything, there are 99 others who can automate most of that, maybe minus a cluster or DB or two, and another 500 who cannot do either, but can run a CentOS box at a reasonable service level.

You get to use great tools and your vast knowledge of k8s each day to do all that, and you have the support of your org, but those other folks may not have the tools, support, knowledge, or sometimes even the capability to attain the knowledge to do that. That doesn't mean they're useless to anyone, to be cast off at will!

The type of thinking that leads to, "get new engineers/developers/designers/architects if yours aren't perfect" needs to die, and needs to be replaced with, "let's do what we can to train and support our current employees to do a great job" because, frankly, there aren't enough "superstar" people who have your skills to do that at every org.

We need to work on accepting people for who they are-- helping them to strive to be a bit better each day of course-- and utilizing those skills in the right place, rather than trying to make everyone the same person with the same skills doing the same things.

Some applications don't need clusters which can be rebuilt and destroyed at will, so let's not make that the bar for every project.

p_l · on March 5, 2020

That only means that the "named system" moves upwards, somewhere. It doesn't mean you don't have "pets".

P.S. I consider "pet vs cattle" to be a horrible metaphor that should be taken behind the barn and shot like the diseased plague bearer it is.

gowld · on March 4, 2020

> I'm sorry that you feel trapped, that's not our intention.

Please don't do this. You can apologize for your actions and work to improve in the future, but you cannot apologize for how someone feels as a result of your actions.

Also, intent doesn't matter unless you plan to change your behavior to undo or mitigate the unintended result.

patrec · on March 5, 2020

> clusters should be treated like cattle, not pets

So my understanding is that the official k8s way to upgrade your cluster is also to throw it away and start a new one (with some cloud provider proprietary alternatives).

Let's say there is something actually important, stateful, single-source-of-truth in my k8s cluster, like a relational DB that must not lose data. I don't want downtime for readers or writers, and I want at least one synchronous slave at all times (since the data is important). I also don't want to eat non-trivial latency overheads from setting up layers of indirection.

What's the recommended way of doing this?

zekrioca · on March 5, 2020

In this case, one needs fault tolerance. One way to achieve it is through replication, where an extra copy (or perhaps a reconstruction recipe) of your DB instance runs somewhere else. Usually DBs achieve this through transactions. Additionally, you can have distributed DBs, which then use distributed transactions to achieve it.

I am not an expert, but K8s handles task replication, and either spawns a new task instance or routes a request to another one somewhere else. However, the application logic itself must handle fault tolerance (by handling its state through transactions or something else) should an instance fail. K8s doesn't do that for you.

patrec · on March 5, 2020

Distributed transactions are a non-starter – I already said I want to run master-slave with synchronous replication, which is basically what you want to do in >99% of cases where you have a DB with important stuff in it.

zekrioca · on March 5, 2020

It is not really about what you want, but about how to migrate the DB* to another location while also minimizing its downtime/slower-ness.

You need to instantiate a secondary DB replica somewhere else and start the DB migration. Since there will be "two instances" of the same DB running, you will also need to set up a (temporary) proxy for routing and handling the DB requests w/ something like this:

1) If the data being requested is already migrated, the request is handled by the (new) secondary replica.

2) Otherwise, the primary instance handles the request.

2.1) The requested data should then be migrated to the secondary replica (asynchronously, but note that a repeated request may invalidate a migration).

Turn the proxy router off once the whole state of your primary DB instance is fully migrated, making the secondary replica the primary one. That's really just a napkin recipe for completing a live migration, though.

* We are now getting into the distributed transaction world because you can never be 100% sure that writing to 2 databases can succeed or fail at the same time. There is a talk from someone who deals with a problem similar to yours: http://www.aviransplace.com/2015/12/15/safe-database-migrati...
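The napkin recipe above can be sketched as a toy routing proxy. This is only an in-memory illustration under heavy assumptions: both "databases" are plain dicts, migration happens inline rather than asynchronously, and nothing here provides atomicity or synchronous replication. The class and method names are hypothetical.

```python
from typing import Dict, Optional, Set


class MigrationProxy:
    """Routes requests between the old primary and the new secondary during a migration."""

    def __init__(self, primary: Dict[str, str], secondary: Dict[str, str]) -> None:
        self.primary = primary
        self.secondary = secondary
        self.migrated: Set[str] = set()  # keys now owned by the secondary

    def _migrate(self, key: str) -> None:
        # Step 2.1: copy the record over (inline here; the recipe suggests doing it async).
        if key in self.primary:
            self.secondary[key] = self.primary[key]
        self.migrated.add(key)

    def read(self, key: str) -> Optional[str]:
        if key in self.migrated:
            return self.secondary.get(key)  # step 1: already migrated
        value = self.primary.get(key)       # step 2: primary still owns the key
        self._migrate(key)
        return value

    def write(self, key: str, value: str) -> None:
        if key in self.migrated:
            self.secondary[key] = value
        else:
            self.primary[key] = value
            self._migrate(key)

    def finish(self) -> Dict[str, str]:
        """Migrate whatever is left; the secondary then becomes the new primary."""
        for key in list(self.primary):
            if key not in self.migrated:
                self._migrate(key)
        return self.secondary
```

Once `finish()` returns, the proxy can be switched off and clients pointed at the returned store. The hard parts the thread is arguing about (crashes between the two writes, concurrent requests for the same key) are exactly what this sketch leaves out.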

patrec · on March 5, 2020

It's kind of revealing that there are zero replies to this, 10 hours later.

dserodio · on March 9, 2020

Thanks for replying to feedback Seth. Stuff like this - following the Google Maps API massive pricing increase, the G Suite pricing increase - is what makes me wary about building stuff on GCP: I'm afraid that Google will increase prices for stuff I rely on. AWS has made users expect pricing for cloud services to only go down.

rexreed · on March 4, 2020

Can someone explain to me the cattle vs. pets analogy? I'm not sure I get it.

randomdude402 · on March 4, 2020

"In the old way of doing things, we treat our servers like pets, for example Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line."

http://cloudscaling.com/blog/cloud-computing/the-history-of-...

dashwav · on March 4, 2020

https://devops.stackexchange.com/questions/653/what-is-the-d...

This is a pretty good in-depth explanation, but at a high level: if your server dies and you are extremely upset about it (similar to if your pet died), you are putting too many eggs in that single basket, with no secondary plan. Conversely, if you build your infra in such a way that your server dying is something you see as no worse than how a farmer sees one of his cattle dying (which are raised to be killed), you are much better prepared for the inevitable downtime from your server and can very easily recover.

rexreed · on March 12, 2020

I think pets vs. corn would be a better analogy as it's a high value single instance vs. a commodity farm.

dward · on March 4, 2020

GKE can't offer financially backed SLOs without charging for the service. This is something that, I assume, significant customers want and that competitors already have:

https://aws.amazon.com/eks/sla/

outworlder · on March 4, 2020

Workers are not free and never were. So they were already charging.

sethvargo · on March 4, 2020

Correct, but the control plane nodes _were_ free and had no SLA. This changes that. [edit: spelling]

gowld · on March 4, 2020

_were_ free. (Emphasis yours.)

lilbobbytables · on March 4, 2020

> and the docs linked from the email are not updated yet.

That about sums up most things Google does for developers.

alasdair_ · on March 4, 2020

I thought the standard advice for Google stuff was "there are always two systems - the undocumented one, and the deprecated one"

nickbarnwell · on March 5, 2020

What I most frequently heard was "There are two solutions: the deprecated one, and the one that doesn't work yet."

andybak · on March 4, 2020

That's a wonderful quote that applies to many companies. (I think that will resonate with the Unity developer community right now.)

ones_and_zeros · on March 4, 2020

I agree the rollout is a little bumpy but I'm curious what workloads you are using k8s for where a $74/mo (or $300/mo) bill isn't a rounding error in your capex?

ssmw · on March 4, 2020

Think about any medium sized dev agency managing 3x environments for 20x customers. That's 50k/year out of the blue.
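To make the "50k/year" estimate concrete, here is the arithmetic as a short sketch. It assumes one cluster per environment per customer and uses the $73/mo per-cluster figure quoted earlier in the thread; the function name is made up for illustration.

```python
MONTHLY_FEE_PER_CLUSTER_USD = 73  # per-cluster figure quoted earlier in the thread


def annual_management_fee(customers: int, environments_per_customer: int) -> int:
    """Annual fee in USD, assuming one cluster per environment per customer."""
    clusters = customers * environments_per_customer
    return clusters * MONTHLY_FEE_PER_CLUSTER_USD * 12


if __name__ == "__main__":
    print(annual_management_fee(20, 3))  # 60 clusters -> 52560
```

That is 60 clusters at $4,380 per month, or roughly the 50k per year mentioned above, before any free zonal cluster is taken into account.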

My problem is that this fee doesn't look very "cloud" friendly. Sure the folks with big clusters won't even notice it, but others will sweat it.

The appeal of cloud is that costs increase as you go, and flat rates are typically there to add predictability (see BigQuery flat rate). This fee does the opposite.

manigandham · on March 5, 2020

It's charged per-cluster. GKE encouraged (and was great for) running multiple clusters for all kinds of isolation and security reasons.

This cost increases rapidly for those scenarios.

oeoe · on March 4, 2020

$3600/year is significant for a startup on a shoestring budget.

dahfizz · on March 5, 2020

Then manage k8s yourself.

Or, better yet, don't use k8s. You don't need it, especially as a startup on a shoestring budget. You can migrate later if you decide you really need to, but just a plain LAMP gets you 99% of the way.

sheeshkebab · on March 5, 2020

But then you can’t put k8s on your resume for when said startup implodes.

Glyptodon · on March 5, 2020

If there were a lower complexity way to deploy containerized apps supported widely I think tons of people would go for it. Currently there's not really much of a middle ground between Cloud Run and K8s offered. It's kind of absurd, honestly.

stickfigure · on March 5, 2020

Google App Engine has been exactly this since 2008.

Glyptodon · on March 5, 2020

My impression of app engine is that you have to use all the cloud* services like SQL, cache, etc, which will make it significantly more expensive, even if it does that app layer fine. Is that wrong?

stickfigure · on March 5, 2020

It's wrong today. It was true in 2008, when GAE was Google's entire cloud offering (and there was no Docker or K8s).

Around the time "Google Cloud Platform" became a thing, Google changed GAE from an encapsulated bubble into a basic frontend management system that interacts with normal services through public APIs (either inside or outside GCP). It's more expensive than GCE, but it's fully managed and lets you skip the devops team.

nimish · on March 4, 2020

So ask for Google cloud for startups? One free cluster is enough to get started.

jorams · on March 4, 2020

> Google Cloud for Startups is designed to help companies that are backed by VCs, incubators, or accelerators, so it's less applicable for small businesses, services, consultancies, and dev shops.[1]

This makes it seem like Google Cloud for Startups is aimed at startups that aren't really on a shoestring budget.

[1]: https://cloud.google.com/developers/startups/

p_l · on March 4, 2020

Like every "special offer for startups", it's a vulture waiting for funding round to close.

Glyptodon · on March 5, 2020

My boss viewed it as the main way to deploy containerized systems offered by cloud providers and figured we could run most of our internal only things in it for a couple hundred a month - we don't really need the guarantees and scale, and he saw it as a way to avoid creating excess numbers of dedicated VMs, as cloud run isn't sufficient for our non-static stuff. This view up until now has actually been quite accurate because of the dedicated usage discounts.

So I guess the big question in my mind is how do you run containerized apps in the major clouds besides K8s if it's a bulldozer and you just need a cargo bike? Is there something simpler?

Aethelwulf · on March 5, 2020

https://en.wikipedia.org/wiki/Apache_Mesos

ickyforce · on March 6, 2020

E.g. https://aws.amazon.com/ecs/ (that was quite nice) or https://aws.amazon.com/fargate/ (haven't tried).

YawningAngel · on March 5, 2020

You could consider the flexible app engine

a_imho · on March 5, 2020

How is it not a rounding error for Google?

aaronblohowiak · on March 5, 2020

I think you mean opex and not capex here.

cmhnn · on March 4, 2020

Sorry for technical tangent but curious. Your decision making on GCP appears to appeal to best of breed + cost. But you put SQL Server on AWS? If you are saying SQL Server is better on AWS than on Azure it would be interesting to learn why.

rolleiflex · on March 4, 2020

We need MySQL 8 because of window functions, which GCP does not offer. That is available on AWS.

cmhnn · on March 4, 2020

My bad. A clever marketing decision made me see the capital SQL as SQL Server since I am used to people saying Postgres, MySQL or SQL.

carterehsmith · on March 4, 2020

I see. Curious about the latency between your GCP apps and the database on AWS - is it like 1 ms or 100ms? Does it affect the product?

rolleiflex · on March 5, 2020

About 4ms for us. However, we chose our data centres on both ends very carefully. There are tables online where you can look up those ping times; one such is here: https://medium.com/@sachinkagarwal/public-cloud-inter-region...

However this means we are paying for egress on both sides. This was something we chose to eat due to GCP Kubernetes, but considering today’s changes, it probably no longer makes sense.

vasco · on March 5, 2020

So you decide to eat egress costs in perpetuity, which will scale as you go, but a one time increase of $70 per month is enough to make you go back? What are you even trying to optimize for?

sheeshkebab · on March 5, 2020

sheesh, the lengths you guys went to build that monstrosity.

andrewmutz · on March 4, 2020

If this cost bothers you a great deal, why not just deploy a new cluster?

dcolkitt · on March 4, 2020

Hi Seth,

What about clusters that are used for lumpy workloads? Like data science pipelines? For example, our org has a few dozen clusters being used like that.

Each pipeline gets its own cluster instance as a way to enforce rough and ready isolation. Most of the time the clusters sit unused. To keep them alive we keep a small, cheap, preemptible node alive on the idle cluster. When a new batch of data comes in, we fire up kube jobs, which then trigger GKE autoscaling to process the workload.

This pricing change means we're looking at thousands of dollars more in billing per month. Without any tangible improvement in service. (The keepalive node hack only costs $5 a month per cluster.) We could consolidate the segmented cluster instances into a single cluster with separate namespaces, but that would also cost thousands in valuable developer time.

I don't know how common our use pattern is, but I think we would be a lot better served by a discounted management fee when the cluster is just being kept alive and not actually using any resources. At $0.01, maybe even $0.02, per hour we could justify it. But paying $0.10 to keep empty clusters alive is just egregious.

thockingoog · on March 4, 2020

Those empty clusters that you get for free cost Google money. Perhaps it never should have been free, because that skewed incentives towards models like this.

p_l · on March 4, 2020

Unfortunately, even if they switch to dynamically started clusters, the latency of spinning a new cluster is much higher than the latency of adding a bunch of preemptible nodes to existing node pool :/

yebyen · on March 4, 2020

Google are (were) not the only ones offering this free control plane model, though. My DigitalOcean DOk8s managed clusters tend toward instability if they are used with node pools that are too small. (I don't know why that is, but it seems like a good way to make sure I pay attention to the workloads and also spend at least $20/mo for each cluster I run with them.)

It will be interesting in any case to see if DigitalOcean and Azure are going to follow suit! I'd be very surprised if they do, (but I've also been wrong before, recently too.)

zerotolerance · on March 5, 2020

The term is "loss leader." GKE provides the manager node, and cluster management so that we don't have to. And in exchange you sell more compute, storage, network, and app services. This is some ex-Oracle, "what can we do to meet growth objectives," "how can we tax the people who we own" thinking. They're customers, not assets Tim. Your cloud portability play should be the last project to jerk them around on.

twistedpair · on March 5, 2020

Keep in mind that GKE cluster management was paid in the original GKE. GCP only stopped billing for cluster management when EKS released free cluster management.

aphistic · on March 5, 2020

When did EKS release free cluster management?

zomglings · on March 4, 2020

On GKE, you can use a single cluster with multiple node pools to achieve a similar effect. Just set the right affinity on your job resources.
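
A minimal sketch of what "set the right affinity" could look like, written as a Python dict that serializes to a Job manifest. It relies on the `cloud.google.com/gke-nodepool` label that GKE puts on nodes; the job name, image, and pool name are hypothetical.

```python
import json


def job_pinned_to_pool(name: str, image: str, node_pool: str) -> dict:
    """Build a batch/v1 Job manifest that may only schedule onto one node pool."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "affinity": {
                        "nodeAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": {
                                "nodeSelectorTerms": [{
                                    "matchExpressions": [{
                                        # GKE labels each node with its pool name.
                                        "key": "cloud.google.com/gke-nodepool",
                                        "operator": "In",
                                        "values": [node_pool],
                                    }]
                                }]
                            }
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical pipeline, image, and pool names.
    manifest = job_pinned_to_pool(
        "pipeline-a", "example.com/pipeline-a:latest", "pipeline-a-pool"
    )
    print(json.dumps(manifest, indent=2))  # kubectl accepts JSON as well as YAML
```

Affinity only steers this job toward the pool. Keeping other workloads off those nodes would additionally need a taint on the pool and a matching toleration on the job, and none of this is a security boundary by itself.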

yoshiat · on March 5, 2020

PTAL at doing Multi-Tenancy in GKE!

https://cloud.google.com/kubernetes-engine/docs/best-practic...

We don't recommend using node pools for isolation.

fluuuhi · on March 5, 2020

If it is only workload isolation, why not?

yoshiat · on March 5, 2020

For secure isolation, we learned it's not sufficient. It's good for resource isolation though.

PTAL at https://www.youtube.com/watch?v=6rMGRvcjvKc

zomglings · on March 5, 2020

That guide looks nice. Have you guys thought about releasing a terraform module or even a cloud composer workflow that will set that up in a project?

yoshiat · on March 5, 2020

Thanks! We actually do and shipped together with the best practices.

https://github.com/GoogleCloudPlatform/gke-enterprise-mt

Please give us feedback there in case you hit any issue!

blazespin · on March 4, 2020

Yes, this is the general approach. However it unfortunately has security implications as you are putting MT workloads on a pool with access back to a shared control plane. Dealing with customer uploaded code is a nightmare.

rcarmo · on March 5, 2020

Here you go:

https://github.com/rcarmo/azure-k3s-cluster (this is an Azure template that I use precisely for testing that kind of workload - spinning up one of these, master included, takes a couple of minutes at most).

(full disclosure: I work at Microsoft - Azure Kubernetes Service works fine, but I built the above because I wanted full control over scaling and a very very simple shared filesystem)

blazespin · on March 4, 2020

Likely this model is precisely why they are introducing this fee.

I guess they realized they couldn't make cluster management MT.

xur17 · on March 4, 2020

We currently spin up dev clusters with a single node. $73/mo is going to basically double the cost of all of these..

lvh · on March 4, 2020

This highlights a sorta-weird consequence of this pricing change: suddenly pricing incentivizes you to use namespacing instead of clusters for separating environments.

(As a security person: ugh.)

rolleiflex · on March 4, 2020

That’s interesting - I think you’re right. We might move our staging cluster into our main production deployment.

More likely though, AWS or OpenShift running on bare metal on a beefy ATX tower in the office. We want to have production and staging as close to each other as possible, so this is an additional reason and a p0 flag on reducing the dependency on Google-specific bits of Kubernetes as much as possible, hopefully also useful for our exit strategy as well.

lazyier · on March 4, 2020

Kubespray works well for me for setting up a bare bones kubernetes cluster for the lab.

I'll use helm to install metallb for the load balancer, which you can then tie into whatever ingress controller you like to use.

For persistent storage a simple NFS server is the bee's knees. It works very well, and an NFS provisioner is a helm install away. Very nice, especially over 10GbE. Do NOT dismiss NFSv4. It's actually very nice for this sort of thing. I just use a small separate Linux box with software raid on it for that.

If you want to have the cluster self-host storage or need high availability then GlusterFS works great, but it's more overhead to manage.

Then you just use normal helm install routines to install and setup logging, dashboards, and all that.

Openshift is going to be a lot better for people who want to do multi-tenant stuff in a corporate enterprise environment. Like you have different teams of people, each with their own realm of responsibility. Openshift's UI and general approach is pretty good about allowing groups to self-manage without impacting one another. The additional security is a double-edged sword. Fantastic if you need it, but an annoying barrier to entry for users if you don't.

As far as AWS goes... EKS recently lowered their cost from 20 cents per hour to 10 cents. So the cost for the cluster is on par with what Google is charging.

Azure doesn't charge for cluster management (yet), IIRC.

geerlingguy · on March 4, 2020

(replying to freedomben): NFS has worked fairly well for persistent file storage that doesn't require high performance for reads/writes (e.g. good for media storage for a website with a CDN fronting a lot of traffic, good for some kinds of other data storage). It would be a terrible solution for things like database storage or other high-performance needs (clustering and separate PVs with high IOPS storage would be better here).

lazyier · on March 4, 2020

It's good to have multiple options if you want to host databases in the cluster.

For example you could use NFS for 90% of the storage needs for logging and sharing files between pods. Then use local storage, FCOE, or iSCSI-backed PVs for databases.

If you are doing bare hardware and your requirements for latency are not too stringent then not hosting databases in the cluster is also a good approach. Just use dedicated systems.

If you can get state out of the cluster then that makes things easier.

All of this depends on a huge number of other factors, of course.

lazyier · on March 4, 2020

> Have you used NFS for persistent storage in prod much?

I think NFS is heavily underrated. It's a good match for things like hosting VM images on a cluster and for Kubernetes.

In the past I really wanted to use things like iSCSI for hosting VM images and such things, but I've found that NFS is actually a lot faster for a lot of things. There are complications to NFS, of course, but they haven't caused me problems.

I would be happy to use it in production, and have recommended it, but it's not unconditional. It depends on a number of different factors.

The only problem with NFS is how do you manage the actual NFS infrastructure? How much experience does your org have with NFS? Do you already have an existing file storage solution in production you can expand and use with Kubernetes?

Like if your organization already has a lot of servers running ZFS, then that is a nice thing to leverage for NFS persistent storage. Since you already have expertise in-house it would be a mistake not to take advantage of it. I wouldn't recommend this approach for people not already doing it, though.

If you can afford some sort of enterprise-grade storage appliance that takes care of dedupe, checksums, failovers, and all that happy stuff, then that's great. Use that and it'll solve your problems. Especially if there is some sort of NFS provisioner that Kubernetes supports.

The only place where I would say it's a 'Hard No' is if you have some sort of high scalability requirements. Like if you wanted to start some web hosting company or needed to have hundreds of nodes in a cluster. In that case distributed file systems are what you need... Self-hosted storage aka "Hyper Converged Infrastructure". The cost and overhead of managing these things is then relatively small compared to the size of the cluster and what you are trying to do.

It's scary to me to have a cluster self-host storage because storage can use a huge amount of ram and cpu at the worst times. You can go from a happy low-resource cluster, then a node fails or another component takes a shit, and then while everything is recovering and checksumming (and lord knows what) the resource usage goes through the roof right during a critical time. The 'perfect storm' scenarios.

rcarmo · on March 5, 2020

I use an SMB file share for my node pools - here's how to set up a non-managed cluster on Azure that does that: http://github.com/rcarmo/azure-k3s-cluster

freedomben · on March 4, 2020

Have you used NFS for persistent storage in prod much? I know people do it, but numerous solutions architects have cautioned against it.

yetanotherme · on March 4, 2020

My experience with NFS over the years has taught me to avoid it. Yes, it mostly works. And then every once in a while you have a client that either panics or hangs, despite the versions of Linux, BSD, Solaris, Windows changing over the decades. The server end is usually a lot more stable. But it's little to no comfort to know that yes, other clients are fine.

However, if you can tolerate client side failure then go for it.

bluhbi · on March 4, 2020

What? Shouldn't you try to make the creation and deletion of your staging cluster cheap instead of moving it to somewhere else?

And if that is your central infrastructure, shouldn't it be worth the money?

I do get the appeal of having cheap and beefy hardware somewhere else; I do that as well, but only privately. My hourly salary spent or wasted on stuff like that costs the company more than just paying for an additional cluster with the same settings but perhaps with far fewer nodes.

If more than one person is using it, the multiplying effect of suddenly unproductive people is much higher. Sharing it also decreases the per-head cost.

mrbrowning · on March 4, 2020

I suspect I'm in the minority on this, but I would love for k8s to have hierarchical namespaces. As much as they add complexity, there are a lot of cases where they're just reifying complexity that's already there, like when deployments are namespaced by environment (e.g. "dev-{service}", "prod-{service}", etc.) and so the hierarchy is already present but flattened into an inaccessible string representation. There are other solutions to this, but they all seem to extract their cost in terms of more manual fleet management.

aludwin · on March 4, 2020

Hey - I'm a member of the multitenancy working group (wg-multitenancy). We're working on a project called the Hierarchical Namespace Controller (aka HNC - read about it at http://bit.ly/38YYhE0). This tries to add some hierarchical behaviour to K8s without actually modifying k/k, which means we're still forced to have unique names for all namespaces in a cluster - e.g., you still need dev-service and prod-service. But it does add a consistent way to talk about hierarchy, some nice integrations and builtin behaviours.

Do you want to mention anything more about what you're hoping to get out of hierarchy? Is it just a management tool, is it for access control, metering/observability, etc...?

Thanks, A

sah2ed · on March 4, 2020

Any reason why you put your link behind a URL shortener besides tracking number of clicks?

Since there are no character limits to worry about here unlike Twitter, better to put up the full URL so the community can decide for themselves if the domain linked to is worth clicking through or not.

aludwin · on March 4, 2020

Nope (other than that Google Docs URLs are looong), sorry.

Friendly docs: https://docs.google.com/document/d/1R4rwTweYBWYDTC9UC-qThaMk...

Code: https://github.com/kubernetes-sigs/multi-tenancy/tree/master...

mrbrowning · on March 5, 2020

Hey, thanks for asking! My interests in it are primarily for quota management -- in my experience, this is inevitably a hierarchical concern, in that you frequently run into the case of wanting to allot a certain cluster-wide quota to a large organizational unit, and similarly subdivide that quota between smaller organizational subunits. Being able to model that hierarchy with namespaces localizes changes more effectively: if you want to increase the larger unit's quota in a flat namespace world, for example, there's no way to talk about that unit's quota except as the sum of all of its constituent namespace quotas.
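
To make the quota point concrete, here is a toy sketch of a hierarchical quota check. It is not how Kubernetes ResourceQuota or HNC behaves; the `QuotaNode` type, the unit names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QuotaNode:
    """A unit in a hypothetical quota hierarchy (org, team, namespace)."""
    name: str
    cpu_limit: Optional[int] = None  # cap for this whole subtree; None = uncapped
    cpu_used: int = 0                # usage charged directly to this node
    children: List["QuotaNode"] = field(default_factory=list)

    def total_used(self) -> int:
        """Usage of this node plus everything beneath it."""
        return self.cpu_used + sum(c.total_used() for c in self.children)


def can_allocate(path: List[QuotaNode], cpus: int) -> bool:
    """Check a request against every node on the path from root to leaf."""
    return all(
        node.cpu_limit is None or node.total_used() + cpus <= node.cpu_limit
        for node in path
    )


if __name__ == "__main__":
    dev = QuotaNode("ml-dev", cpu_limit=40, cpu_used=10)
    prod = QuotaNode("ml-prod", cpu_limit=80, cpu_used=60)
    ml = QuotaNode("ml-org", cpu_limit=90, children=[dev, prod])

    print(can_allocate([ml, dev], 20))  # fits both the namespace and the org cap
    print(can_allocate([ml, dev], 30))  # fits the namespace cap, not the org cap
    ml.cpu_limit = 150                  # raising the unit's quota is one change
    print(can_allocate([ml, dev], 30))
```

With a flat list of namespaces there is no `ml-org` node to edit, so the same increase would have to be spread by hand across the individual namespace quotas.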

aludwin · on March 8, 2020

Thanks! We're not currently planning on implementing a hierarchical resource quota in HNC, but HNC is trying to establish a definition of hierarchy that could certainly be used to create an HRQ. Give me a shout if you're interested in contributing.

yoshiat · on March 4, 2020

Yes, namespace alone isn't sufficient for isolation. Would you be able to look at our latest Multi-Tenancy best practices?

https://cloud.google.com/kubernetes-engine/docs/best-practic...

It's a living product which comes with Terraform modules. We introduced various features to enable doing Multi-Tenancy as well (and more on their way!)

aaronblohowiak · on March 5, 2020

I’m sorry if I am reading it wrong, but this guide to multi-tenancy seems to suggest not being multi-tenant and instead running a separate cluster per project. This seems more like scaling single-tenancy than multi-tenancy (no bin packing opportunity, for instance). Or did I read it wrong?

yoshiat · on March 5, 2020

Sorry if it was confusing. You need to read into more about how to set up in-cluster multi-tenancy.

We do recommend robust configurations for production setup (e.g. dev, staging and production) however you can certainly squash and skip it if not necessary.

Thanks for the feedback though. We'll consider adding such notes explicitly.

aaronblohowiak · on March 5, 2020

>You need to read into more about how to set up in-cluster multi-tenancy.

I am trying to do that. Where would you suggest? Throughout the comments on this post, when people suggest namespace, pod or node level separation you ask them to PTAL and read the link which suggests the single-tenant cluster-per-project approach (that is under the Multi-tenant cluster, confusingly.) The link you sent talks about cluster-per-project, which is not multi-tenancy as I understand it. Perhaps a different name would be less confusing (robust federated cluster administration?)

yoshiat · on March 6, 2020

"This guide provides best practices to safely and efficiently set up multiple multi-tenant clusters for an enterprise organization."

This "multiple" multi-tenant clusters part isn't coming through. Please do jump into the "Securing the cluster" section to cut corners and learn what to do in a single cluster. We're fixing the sections to avoid the confusion. Thanks for the feedback!

https://cloud.google.com/kubernetes-engine/docs/best-practic...

lokar · on March 4, 2020

You can dedicate nodes by namespace, at which point the isolation is pretty strong.

yoshiat · on March 4, 2020

Nit: we don't recommend dedicated nodes for isolation. PTAL at https://www.youtube.com/watch?v=6rMGRvcjvKc And the guidance from GKE is at https://cloud.google.com/kubernetes-engine/docs/best-practic...

dharmab · on March 4, 2020

* Assuming you also configure strong RBAC, network isolation and don't let persistent volumes cross-talk
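
As one illustration of the network isolation piece, here is a sketch of a NetworkPolicy that lets pods in a namespace accept traffic only from pods in that same namespace, built as a Python dict. The policy name and the `team-a` namespace are hypothetical, and the policy only has an effect when the cluster's network plugin enforces NetworkPolicy.

```python
import json


def same_namespace_only_policy(namespace: str) -> dict:
    """NetworkPolicy allowing ingress only from pods in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [
                # A podSelector without a namespaceSelector matches only
                # pods in the policy's own namespace.
                {"from": [{"podSelector": {}}]}
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(same_namespace_only_policy("team-a"), indent=2))
```

This covers ingress only. RBAC and persistent volume access would need their own, separate configuration.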

sethvargo · on March 4, 2020

As also a security person (:wave:), you can use dedicated node pools and workload identity to isolate workloads in the same cluster.

lvh · on March 4, 2020

Workload identity is a GCP-specific beta feature for mapping to GCP IAM, right?

yoshiat · on March 5, 2020

Yes and it's going GA.

https://cloud.google.com/blog/products/containers-kubernetes...

https://cloud.google.com/kubernetes-engine/docs/best-practic...

bluhbi · on March 4, 2020

blazespin · on March 4, 2020

Or move to minikube and friends. If it's a dev environment you can usually get away with such things.

atombender · on March 4, 2020

Kubernetes consumes a lot of CPU even when idle, due to the polling design, which makes Minikube a really poor fit for developer machines. It's well known [1] to sit there eating 20-30% CPU, draining your battery and frustrating your life while doing absolutely nothing. This applies to all the Kubernetes distributions, including Kind and Docker Desktop. Not sure if the same applies to K3s, though.

[1] https://github.com/kubernetes/minikube/issues/3207, https://github.com/docker/for-mac/issues/3065, https://github.com/docker/for-mac/issues/3539, etc; there must be dozens and dozens of these

rcarmo · on March 5, 2020

It does. My k3s setups (like https://github.com/rcarmo/azure-k3s-cluster) take up nearly 100% of the puny master node I allocate to them, and kill my Raspberry Pi SD cards as well.

Swarm, in comparison, is much friendlier (and you can use it for dev/test across multiple machines just fine)

outworlder · on March 4, 2020

Assuming you can do that, and your system is not using namespacing for its own purposes.

aludwin · on March 4, 2020

I know kubeflow can use namespaces for its own purposes, but otherwise I thought that was quite rare. Namespaces are intended to be used for exactly this usecase (isolating teams and/or workloads).

What kind of system have you seen where this isn't true?

lvh · on March 5, 2020

We've seen plenty of examples where people do this. Sometimes it's different teams (e.g. the ML team is namespaced away from the primary customer flow) and sometimes it's for different customers. Really, it sounds like we're in agreement about why this happens, the confusion is just whether or not that normally happens within a company in the course of doing business?

aludwin · on March 5, 2020

Gotcha. If you're looking for subnamespacing, HNC offers self-service subnamespace creation, have you looked into that? https://github.com/kubernetes-sigs/multi-tenancy/tree/master...

(My apologies if we've chatted about this before in another venue, I'm losing track of whom I've already talked to)

bboreham · on March 5, 2020

I use namespaces to isolate names. So I can have a service called memcached in namespace A and also one in namespace B.

bluhbi · on March 4, 2020

It's still billed by the minute. If you run your dev clusters 24x7, then they apparently are critical enough.

dahfizz · on March 4, 2020

For a dev environment, why not host your own hardware? Especially if cost is a concern, it seems like a no brainer.

rad_gruchalski · on March 4, 2020

Generally curious, isn’t Docker Kubernetes an option?

xur17 · on March 4, 2020

It is - especially on OSX, it is very CPU and memory intensive, though.

halbritt · on March 4, 2020

> There's also one completely free zonal cluster for hobbyist projects.

Nice.

tmpz22 · on March 4, 2020

Too many people drank the cloud kool-aid. The move from day one was to create provider agnostic cloud architectures and repent the use of provider-specific services.

That said they do make it damn hard. Our k8s cluster is as basic as it comes, no databases, simple deployments, but we do still have a dependency on Google Cloud Loadbalancer (which we hate).

If pricing goes up too much from this we'll move, but the GCL dependency will be a PITA :/

rolleiflex · on March 4, 2020

We’re in the same situation — we’ve engineered for minimum provider-specific dependencies but GKE LoadBalancers were where they got us via arm twisting as well. There is no way to expose a cluster to the outside world in a production environment otherwise.

lvh · on March 4, 2020

It's kind of ridiculous internal load balancers can't get automatic certs. We've had to do a stupid dance just to get certs via the LE DNS challenge out of band, and then regularly install them on internal LBs.

tmpz22 · on March 4, 2020

Same! I still manually provision some certificates just because LEGO/etc. just don't work with GCP + Google Cloud Load balancer! And the docs for the entire subject are useless..

rolleiflex · on March 4, 2020

We paid for long-lasting wildcard certs because of that. Which Apple killed a few days ago. It’s going to be fun when they are close to expiry.

hitpointdrew · on March 5, 2020

Maybe I don't understand your problem, but can't you just use Traefik (https://docs.traefik.io/user-guides/crd-acme/). It will get certs from letsencrypt for you.

jrockway · on March 5, 2020

TCP coming into your cluster means that you practically have to go through kube-proxy (because the load balancer and the Kubernetes scheduler aren't perfectly synchronized) and that the load-balancer can't balance per-request, only per-connection. If the load balancer terminates TLS, then it can just watch cluster endpoints and automatically route to the right node without any extra hop through kube-proxy, and it can also split large individual requests out of HTTP/2 and GRPC streams.

I'm guessing 99% of workloads won't notice either of these issues, but it is an actual issue.

lvh · on March 5, 2020

Key word: "internal" -- these aren't on the internet, Traefik does ALPN, which means the LB itself has to be on the Internet. (Or something else that leaks the cert to the LB, but that doesn't sound any less complicated than using the DNS challenge.)

zomglings · on March 4, 2020

Does cert-manager not fit your needs?

https://github.com/jetstack/cert-manager

tmpz22 · on March 4, 2020

> As this project is pre-1.0, we do not currently offer strong guarantees around our API stability.

> Notably, we may choose to make breaking changes to our API specification (i.e. the Issuer, ClusterIssuer and Certificate resources) in new minor releases.

yebyen · on March 4, 2020

In practice, the cert-manager team has made breaking changes in probably close to 1/3 of minor releases (which is really fine pre-1.0, IMHO), but there has been comprehensive guidance that walks users or cluster admins through exactly what steps each upgrade needs and that, followed well, does not interrupt your cluster's service in any way.

It's not dark magic, it might make building off of it in the form of integrations prohibitive, but they have done a great job making sure users can upgrade one release to the next.

It is a little bit of a treadmill, but it certainly beats manually renewing certificates!

tmpz22 · on March 4, 2020

Do you also have occasional outages because the load balancer gets into a confused state and changes take 10+ minutes to propagate, with no recourse other than to destroy and re-create the entire resource?

uberduper · on March 4, 2020

There are ways to expose your cluster to public and/or run your own load balancers on GKE (or any other cloud k8s deployment).

manigandham · on March 5, 2020

Load balancers are created by the pre-installed controller that each cloud provides to let external traffic reach the nodes. You don't have to use it.

It's no different than running your own load balancer like HAProxy pointed at the nodes which forward to a node-port service.

There's also MetalLB if you're running your own hardware: https://metallb.universe.tf/
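
A small sketch of the node-port arrangement described above: a Service of type NodePort, plus the list of node addresses an external balancer such as HAProxy would be pointed at. The service name, label, port numbers, and node IPs are hypothetical; 30000-32767 is Kubernetes' default NodePort range.

```python
from typing import Dict, List

NODE_PORT_MIN, NODE_PORT_MAX = 30000, 32767  # default NodePort range


def node_port_service(name: str, app: str, port: int, node_port: int) -> Dict:
    """Service manifest that opens node_port on every node for pods labeled app=<app>."""
    if not NODE_PORT_MIN <= node_port <= NODE_PORT_MAX:
        raise ValueError(f"nodePort {node_port} is outside the default range")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": {"app": app},
            "ports": [{"port": port, "targetPort": port, "nodePort": node_port}],
        },
    }


def backend_addresses(node_ips: List[str], node_port: int) -> List[str]:
    """The host:port pairs an external load balancer would forward to."""
    return [f"{ip}:{node_port}" for ip in node_ips]


if __name__ == "__main__":
    svc = node_port_service("web", "web", 80, 30080)
    print(svc["spec"]["ports"])
    print(backend_addresses(["10.0.0.11", "10.0.0.12"], 30080))
```

Keeping that backend list in sync as nodes come and go is the part the cloud provider's controller otherwise handles for you.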

pojzon · on March 4, 2020

How about managing own k8s running on VMs / bare-metal?

Pretty much anyone who has worked in ops for a while understood from the get-go that it's impossible to be totally provider-agnostic. K8s is just a nice API on top of the provider API that still requires provider-specific configuration.

freedomben · on March 4, 2020

Disclaimer: I work for Red Hat and am very biased, but this is my own honest opinion.

If you're going to run on bare-metal or in your own VMs, OpenShift is very much worth a look. There are hundreds, maybe thousands of ways to shoot yourself in the foot, and OpenShift puts up guard rails for you (which you can bypass if you want to). OpenShift 4 runs on top of RHCOS which makes node management much simpler, and allows you to scale nodes quickly and easily. Works on bare metal or in the cloud (or both, but make sure you have super low latency between data centers if you are going to do that). It's also pretty valuable to be able to call Red Hat support if something goes wrong. (I still shake my head over the number of days I spent debugging arcane networking issues on EKS before moving to OpenShift, which would have paid for a year or more of support just by itself).

JeremyNT · on March 5, 2020

So you move from vendor lock in with the cloud provider, to vendor lock in with an expensive, proprietary* IBM k8s distribution with its strange nonstandard opinions about workflows that you have to manage yourself?

Don't get me wrong, I appreciate RHAT's code contributions very much, they have done a lot for k8s! But running OKD on one's own is a bad idea, while paying for IBM support makes you as much a hostage as anything Google will do to you. Better to just stick with a distribution with better community support and wait for RHAT's useful innovations to be merged upstream (while avoiding the pitfalls of their failed experiments...)

* Yes it's open source, but the community version (okd) isn't really supported, nor is it widely used, so if you're serious about running this you're doing so for the Enterprise support and you're going to be writing those checks

freedomben · on March 5, 2020

Thanks for the edits (and acknowledging our contributions). I wasn't sure if you were just trolling or not before, so I didn't want to engage.

Your concern is valid, and I agree with you that OKD is not supported enough. I have my own theories as to why, but I will keep my criticism "in the family" (but do know there are people that want to see OKD be a first-class citizen, and know we are falling short right now). We had some challenges supporting OKD 4.x because the masters in 4.x now require Red Hat CoreOS (and for nodes it is highly recommended), but RHCOS was not yet freely available. This is obviously a big problem. Now that Fedora CoreOS is out, there is a freely distributable host OS on which to build OKD, so it will be better supported and usable. FWIW I have a personal goal to have a production-ish OKD cluster running for myself by end of the year.

I'll admit I am a little offended at being called a "proprietary IBM K8s distribution," but I don't think you meant to be offensive. IBM has nothing to do with OpenShift, beyond the fact that they are becoming customers of it. Every bit of our OpenShift code is open source and available. You are right that it's not in a production-usable state (although there are people using it) but it's a lot better than you'll get from other vendors. We are at least trying to get it to a usable state, unlike many of them. (We are strapped for resources like everyone else, and K8s runs a mile a minute and requires significant effort to stay ahead.) This space is still really young and really hot, and I am confident we'll get the open source in a good, usable state, much like Fedora and CentOS are. I also don't think OpenShift is really all that expensive considering the enormous value it provides. The value really does shine at scale, and may not be there for smaller shops.

I don't blame you for waiting, I probably would too. Our current offering is made for enterprise scale, so isn't tailored to everyone. I've heard OpenShift online has gotten better, but haven't tried it myself. Eventually I plan to run all my personal apps on OKD (I have maybe a dozen, mostly with the only users being me and my family), but until then I've been using podman pods with systemd, which will be trivial to port to OKD once it's in a good state.

JeremyNT · on March 5, 2020

Yes, I realize I will come across as being overly negative here, and I apologize for this.

It's not that openshift is bad per se, I just don't imagine it solves many problems an org that is fretting about lock in or gcp pricing will have. Such an org is probably cost sensitive and looking for flexibility, but openshift is expensive, and if you adopt its differentiating features you are de facto locking yourself in. And if you do not leverage those features out of a desire to avoid lock in, you are effectively paying a whole lot just for k8s support...

And I really should say, for certain orgs (especially bigcos) this may well be worth it, I just don't think it is a good option for anybody worried primarily about avoiding vendor lock in and keeping costs in check.

shaklee3 · on March 5, 2020

Honestly, the fact that it requires rhel or centos was enough to make it not feasible for us. I wish that would change, since I can't think of any reason the distribution should affect openshift.

freedomben · on March 5, 2020

There are several reasons, many of them are that the nodes themselves are managed by OpenShift operators. If you run (as cluster admin) `oc get clusteroperators` you'll see plenty that are for hosts, such as an upgrader and a tuner. If the operators had to be distro agnostic it would be a support nightmare, and we wouldn't be able to do it. With RHCOS (immutable OS) we also have enough guarantees to safely upgrade systems without human intervention. Can you imagine doing that in an Enterprise environment while trying to support multiple distributions? I can't.

shaklee3 · on March 5, 2020

Can you describe what kind of tuning the rhel tuning tools do that are not available using the normal kernel constructs? Last I checked tuna and others did everything you could do in Ubuntu, but without knowing the guts of the system.

Again, I think the idea of OS is great, but you've lost us, and likely other big customers because of that restriction. Having old kernels is just not an option for some people.

freedomben · on March 5, 2020

I'm not informed enough to tell you what the tuning tools do, so I'll dodge that question. But "[h]aving old kernels is just not an option for some people" is exactly the type of problem this solves. You literally don't have to know or care what kernel your node runs, because it doesn't matter! The OS is a very thin layer underneath K8s, a layer which is entirely managed by applications running as pods (supervised by an operator) on the system. Whatever apps/daemons/services you need to run move to pods on OpenShift. If you need to manage the node itself there is an API for it. If you truly need underlying access, then this is not for you, but you'd be amazed at how many people (myself included) started out balking at this and thought "no way, for compliance we need <tool>" but after re-thinking the system realized you really don't. By "complicating" the system with immutable layers, we actually simplify the system. It was much like learning functional programming to me. By "complicating" programming by taking away stuff (like global variables, side-effects, etc) it actually simplified it and reduced bugs by a huge margin.

If you are like me and are old school and think "huh, yeah that makes me nervous" I completely understand that, but we've seen some serious success with it. I'm a skeptical person, and telling me I can't SSH to my node freaks me out a bit, but I'm becoming a convert.

I would also note that if you buy OpenShift you get the infrastructure nodes (masters, and some workers for running openshift operator pods) for free (typically, but I'm not a salesperson so don't hold me to that if I've misspoken :-P), so you aren't paying for the super locked in OS. I suppose you do have to pay for RHEL8 or RHCOS on the worker nodes running your pods, and we don't support other distros (because we expect a very specific selinux config, CRI-O config (container runtime), among other things), so I guess there's some dependence there, although I recommend RHCOS for all your nodes and then just use the Machine API if you need it.

merb · on March 5, 2020

btw. I would never run OpenShift after the CoreOS debacle. This was a really sketchy move and still is.

Yes, RH did a lot for k8s, but killing off a working distribution without a direct migration path, where the answer is basically "start again", will make your customers angry.

Also, I think the OpenShift terminology is way too much, and OpenShift should be a much thinner layer on top of k8s.

freedomben · on March 5, 2020

I agree, the CoreOS thing went down grossly. It was a technical nightmare tho. They deeply merged CoreOS and Fedora/RHEL and created a hybrid animal. Creating an upgrade path would have been an insane challenge, and in the end the advice would have been to rebuild anyway to avoid unforeseen issues. They could have left CoreOS repos and stuff up tho and given a longer transition period.

rcarmo · on March 5, 2020

I work for Microsoft and I quite like OpenShift (although we have AKS as the default, "vanilla" managed k8s service, you can also run OpenShift on Azure).

Sure it's opinionated, but at this point, which flavor of k8s isn't? Even k3s (which I play around with on https://github.com/rcarmo/azure-k3s-cluster) has its quirks.

Everyone who's targeting the enterprise market has a "value-added" brew of k8s at this point, so kudos to RH for the engineering effort :)

stas2k · on March 4, 2020

Don't want to sound snarky, but how about an upgrade path from 3.11 to 4.x? I am a heavy OpenShift user and it seems that RH just dumped whatever architecture they had with pre-4 clusters and switched to Tectonic-like 4.x installations without any way to upgrade other than a new installation. This makes it hard to migrate with physical nodes.

freedomben · on March 5, 2020

Not snarky at all, you are more correct than you may realize. The upgrade is a challenge because we move from RHEL 7 to Red Hat CoreOS 8 as the host OS, as well as move all the OpenShift code into operators instead of the binary. OpenShift itself can now also manage the nodes (thanks to RHCOS), as well as fully self-upgrade. For container runtime we move from Docker to CRI-O (for a number of reasons, high on the list is security). It's a massive overhaul which is really more akin to a brand new product than a major version. I generally can't stand major overhauls because most of the time they don't give you much new and often bring regressions. However, this really was a tremendous improvement, the fruits of which are not even fully realized.

Because of that major change, the cluster upgrade path is a little more involved than usual. It's a complete reinstall of the OS and rebuild of the cluster. There are tools to help tho, and as the path gets more well-trodden things will get easier and better supported. If you go through the same wave as me, you'll be annoyed at first but then once it is done and you have a 4.x cluster you'll be really happy with it (especially when you can manage everything as an operator).

Luckily from an app perspective very little will change since Kubernetes is the API. A sibling comment linked to some helpful documentation. I can't be specific right now but I can tell you that your need is known, and some very smart people are working on it. If you want to email me (check my HN bio page) or jump in the Keybase group called "openshift_okd," I'm happy to chat more about it (not in an official Red Hat support capacity, just as friend to friend :-) ). I haven't done a migration myself yet but I know people who have, and I plan to get into it personally soon as well.

bgracely · on March 5, 2020

There is a free tool to help with migrations from OpenShift 3.x to 4.x - https://access.redhat.com/documentation/en-us/openshift_cont...

It's not a rolling upgrade, but it allows you to move the apps, PVs, policies in as granular a manner as you wish.

bcheung · on March 4, 2020

I'm running my own on bare metal dedicated servers. You will need to install a few extra things (MetalLB for LoadBalancer, CertManager for SSL, an ingress controller (nginx, Ambassador, Gloo), and one of the CSI plugins for your preferred storage method). It is extra work but as a personal cluster for hobby work, I'm paying $65/mo total for the cluster. Same specs would probably be $1000/mo at a public cloud provider.

theptip · on March 5, 2020

MetalLB looks fun, hadn't seen that one.

If you want something production-grade (i.e. doesn't say "beta" on the tin) then I think Calico should solve most of the same problems too (it does BGP peering to your ToR switch):

https://docs.projectcalico.org/networking/determine-best-net...

Does MetalLB do something extra I'm missing?

shaklee3 · on March 5, 2020

MetalLB had more features than Calico before Calico had the external service advertisement feature. Now that it does, services can use ECMP load balancing just as with MetalLB.

hitpointdrew · on March 5, 2020

Traefik can also automatically get letsencrypt certs, if you don't want to use CertManager. Traefik gives the added benefit of also doing Ingress.

takeda · on March 4, 2020

From what I've seen, it looks like managing k8s on your own often ends up requiring a dedicated team to keep up with their insane release cycle.

freedomben · on March 4, 2020

Can confirm. Depending on your cluster size you will need at least 2 dedicated people on the "Kubernetes" team. You'll probably also end up rolling-your-own deployment tools because K8s API is a little overwhelming for most devs.

pojzon · on March 5, 2020

To be honest, we are heavily using EKS and AKS in other teams, and each of those teams has a dedicated devops subteam to help them not only with k8s but also with other infrastructure, because bare k8s is pretty useless for business.

So either way you end up in a situation where you require a dedicated devops team or dedicated team members to keep up with changing requirements.

mleonhard · on March 4, 2020

I started learning Kubernetes and was overwhelmed. The biggest problem was the missing docs. I filed a Github issue asking for missing Kubernetes YAML docs:

https://github.com/kubernetes/website/issues/19139

Google will ignore it like all of the tickets I file. The fact is that Google is in the business of making money and they are focused on enterprise users. Enterprise users are not sensitive to integration difficulty since they can just throw people at any problems. So eventually everything Google makes will become extremely time-consuming to learn and difficult to use. They're becoming another Oracle.

jrockway · on March 5, 2020

"kubectl explain deployment"

apple4ever · on March 5, 2020

Confirmed here too. I don’t think management realized the amount of toil that Kube requires to stay up to date.

jrockway · on March 4, 2020

The big problem with running your own cluster is the extra machines you need for a high-availability control plane, which is expensive. That is why Amazon and now Google feel like they can charge for this; you can't really do it any better yourself.

Glyptodon · on March 4, 2020

Doing load balancing, DNS, and egress has been way uglier in Google Cloud K8s than I expected. It pushes projects towards doing it themselves in-cluster, IMO.

endymi0n · on March 4, 2020

Interesting. As a mid level GCP customer, it won't make a big dent on our bill specifically, but in the end, I'm not sure this pricing move is a smart strategy.

With this fixed fee model, the change will barely make a difference (== Google revenue) for the large customers who can spare the money, but will create a significant entry barrier to that side project / super-early stage that considers getting hooked on GCP, specifically GKE.

Then again, not my decision to make.
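
To put rough numbers on the fixed-fee point, here is a minimal sketch. The $73/mo figure is the per-cluster fee quoted upthread; the three compute bills are made-up illustrative values, not real GCP prices.

```python
# Flat per-cluster management fee quoted upthread (roughly $73/mo).
MANAGEMENT_FEE = 73.0


def fee_share(monthly_compute_spend: float, fee: float = MANAGEMENT_FEE) -> float:
    """Return the fraction of the total monthly bill taken up by the flat fee."""
    total = monthly_compute_spend + fee
    return fee / total


# Hypothetical monthly compute bills, chosen only to show the shape of the curve.
examples = [
    ("side project", 30.0),
    ("small startup", 500.0),
    ("large customer", 50_000.0),
]

for label, spend in examples:
    print(f"{label:>15}: {fee_share(spend):.1%} of the bill is the cluster fee")
```

The flat fee dominates a tiny bill and all but disappears in a large one, which is the entry-barrier effect described above.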

bluhbi · on March 4, 2020

That's the most frustrating thing for me with GCP, AWS and Azure. I would never use them as a very early, small 3-person startup or for private projects.

There is no billing protection (which could make you very poor very fast) and every service has a certain cost and quality which is just not feasible in the beginning.

Even GKE with its free kubernetes master does block a lot of resources on the nodes: https://cloud.google.com/kubernetes-engine/docs/concepts/clu...

Also, there are a ton of great features on GKE that you will probably never use if you are too small. It is so much cheaper to just get cheap hardware somewhere and put your own k8s onto it if you have more time than money.

Even on Digital Ocean you have the load balancer problem: you need to use the provided and also 'costly' LoadBalancer service. There is only one hacky way to avoid it, by exposing your ingress on the host and mapping that one IP, but then you lose all the self-healing stuff and load balancing capability.

ak217 · on March 5, 2020

Both AWS and Google offer free tier products and pay-for-what-you-use products. Reserved instance pricing starts at around $25/year. Many other incredibly useful products (S3, Lambda, VPC, etc.) are free with an instance or start at $0.

You can set billing alerts that will project your monthly budget every hour, and send you an alert when it's projected to be exceeded.
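
I don't know how either provider actually computes its forecast, but the idea can be sketched as a naive linear extrapolation of month-to-date spend. The function names and the budget check below are hypothetical, not any cloud provider's API.

```python
import calendar
from datetime import datetime


def projected_month_spend(spend_so_far: float, now: datetime) -> float:
    """Naively extrapolate month-to-date spend to the end of the month."""
    days_in_month = calendar.monthrange(now.year, now.month)[1]
    hours_in_month = days_in_month * 24
    hours_elapsed = (now.day - 1) * 24 + now.hour + now.minute / 60
    if hours_elapsed <= 0:
        # Start of the month: nothing to extrapolate from yet.
        return spend_so_far
    return spend_so_far / hours_elapsed * hours_in_month


def over_budget(spend_so_far: float, now: datetime, budget: float) -> bool:
    """Return True when the projected month-end spend exceeds the budget."""
    return projected_month_spend(spend_so_far, now) > budget


# Example: halfway through March with $100 spent and a $150 budget.
halfway = datetime(2020, 3, 16, 12, 0)
if over_budget(100.0, halfway, budget=150.0):
    print("alert: projected spend exceeds budget")
```

Note that this only raises an alert; it does not stop anything from running.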

IMHO your claim (that there is an entry cost barrier) is the opposite of reality. AWS and Google have brought incredible power and choice to developers starting at zero initial cost.

fluuuhi · on March 5, 2020

My main concern is that I can't define an upper limit. My billing alert is nice and I'm aware of it, but it doesn't help if someone takes over your account, mines bitcoins on expensive machines, and you only read your email a day later.

ak217 · on March 5, 2020

AWS refunds you when that happens.

The fundamental issue with setting a limit is it's technically infeasible to decide what to do when it's exceeded. They have no way of knowing what assets to terminate. The way to avoid what you describe is to shut off access to APIs that you don't want to use, and keep your credentials safe.