
Typically, you would have a set of "helm charts" (packages) for your application(s). So deploying, without data, would be something like a sequence of "helm install app-appId --values /path/to/config/for/environment.yaml"
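A minimal sketch of what that sequence might look like (release names, chart paths, and values files here are all hypothetical, and this assumes Helm 2's --name flag):

    # One release per component, with a values file per environment:
    helm install --name myapp-web    ./charts/web    --values ./config/staging/web.yaml
    helm install --name myapp-worker ./charts/worker --values ./config/staging/worker.yaml

    # Same charts against the production cluster, different values:
    helm install --name myapp-web    ./charts/web    --values ./config/production/web.yaml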

At this stage, you have no data. Normally, if your app depends on data, the pods will keep failing until the data has been moved and is available (at which point they stabilize in an equilibrium state).
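One way to make that dependency explicit, instead of relying on the crash loop, is an initContainer that blocks until the data shows up. A rough sketch, where the image, mount paths, PVC name, and readiness marker are all assumptions:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: app-with-data-gate
    spec:
      initContainers:
      # Polls until the volume has been populated, so the app
      # container only starts once the data is actually there.
      - name: wait-for-data
        image: busybox
        command: ['sh', '-c', 'until [ -e /data/.ready ]; do sleep 5; done']
        volumeMounts:
        - {name: data, mountPath: /data}
      containers:
      - name: app
        image: myapp:latest            # hypothetical image
        volumeMounts:
        - {name: data, mountPath: /data}
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: myapp-data        # hypothetical claim
    EOF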

If you run Ceph in both environments, then you may want to use Ceph replication. If you use another storage layer, then it's also up to you to make sure the data is moved around.
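If you do go the Ceph route, RBD mirroring is the usual building block. Very roughly, and glossing over the rbd-mirror daemon setup entirely (pool, image, and peer names are placeholders):

    # On both clusters: mirror everything in the pool that has the
    # journaling feature enabled.
    rbd mirror pool enable mypool pool

    # Journaling (which needs exclusive-lock) is what makes an image
    # mirrorable:
    rbd feature enable mypool/myimage exclusive-lock journaling

    # Register the other cluster as a peer:
    rbd mirror pool peer add mypool client.mirror@remote-cluster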

Rook should support that use case. However, it is still alpha per the GitHub README, so use it at your own risk. Alternatively, you may want to treat the data replication problem as outside the scope of k8s.

If the two clusters are physically close to each other, you may want to just point the new apps at the old data and cut over from the first one.

Another option would be Federation, a beta feature in K8s that lets you manage several clusters via a single control plane.
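With the kubefed CLI of that era, the shape of it is roughly this (cluster names, contexts, and the DNS zone are invented):

    # Stand up the federation control plane in a host cluster:
    kubefed init myfed \
        --host-cluster-context=us-east \
        --dns-provider=google-clouddns \
        --dns-zone-name=example.com.

    # Register each member cluster with it:
    kubefed join us-east --host-cluster-context=us-east
    kubefed join eu-west --host-cluster-context=us-east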

Sorry for the long comment, your question has a broad scope, and it's hard to answer without diving into details.




Thanks for your explanation. It looks like k8s was designed by Google with Google in mind: data lives somewhere over the network, and the main problem is managing code. I thought it could be the perfect solution for completely abstracting and isolating such small services and moving them around on local hardware at the push of a button, but it seems I'll have to think about external ways to manage data replication.


So, your thought process has apparently led you quickly to the same conundrum that I have found myself in, after experimenting with this stuff for several generations.

You might be interested in the Deis v1 PaaS as a historical reference. Deis is a company that specializes in Kubernetes (it was just bought by Microsoft). They have been in the container orchestration game since before Kubernetes was a kid (and even before containers were really in vogue). The Deis v1 PaaS is the ancestor of Deis Workflow (v2), a product that runs solely on Kubernetes.

Workflow does not do distributed filesystems internally, where PaaS v1 did. That is why I'm telling you about it. PaaS v1 had its own storage layer called deis-store, which is (was) essentially CephFS and RBD under the hood. They did the best they could to make sure you did not have to be a competent Ceph admin just to get it started, but as it happens you would still be running Ceph, susceptible to all of the usual Ceph issues.

Distributed filesystems are complicated business.

Deis ran Ceph for internal purposes: the Store component took care of log aggregation ("Logger"), container image storage ("Registry"), and database backups. When Workflow was released, it targeted Kubernetes and required PVC support (AWS S3, GCE PD, or one of the other storage drivers).

It still handles log aggregation, database backups, and image storage, but it uses the platform services to do this in an agnostic way (that is, whatever way you have configured to enable PVC support in Kubernetes).
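Concretely, "PVC support" just means the cluster has some storage provisioner behind a StorageClass; a generic claim looks like this (the class name depends on what your provider exposes):

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: workflow-data
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: standard   # e.g. GCE PD or EBS, per your cluster
      resources:
        requests:
          storage: 10Gi
    EOF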

The Ceph support provided by Deis v1 was never intended to be an end-user service, it was for internal platform support. I thought about using it for my own purposes but never got around to it. The punchline is this: porting your applications to Deis requires you to re-think the way they are built to support 12factor ideology. Porting your applications to Kubernetes requires no such thing... but it helps!

Also, distributed storage is a complicated problem, and if you undertake to solve it yourself, you should not take it lightly. (Or do take it lightly, but with the understanding that you haven't given much rigor to that part.)

What was good advice for Deis v1 is still good advice for Kubernetes today. If you are building a cluster or distributed architecture to scale, you should really consider separating it into tiers or planes. In Deis v1, the advice was to have a control plane (etcd, database), storage plane (deis-store or Ceph), data plane (your application / worker nodes), and routing mesh plane (deis-router, traefik, or the front-end HTTP serving nodes.) All of those planes may require special attention to make them reliable and scalable.

In my opinion none of this has anything to do with AWS or Google, but those two providers have positioned themselves well to be the ones solving those hard problems for you. I would certainly start experimenting with Rook; I had good experiences with deis-store and I've been looking for something to fill that void.


Thanks, I know that distributed storage is hard; that's why I would be OK if K8s could just work with something like Docker Compose volumes on local storage and copy them between servers when needed.
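For what it's worth, you can approximate that today with hostPath volumes plus an out-of-band copy, at the cost of pinning the pod to whichever node holds the data. A crude sketch (node names, image, and paths invented):

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp
    spec:
      nodeName: node1                  # pin to the node with the data
      containers:
      - name: app
        image: myapp:latest            # hypothetical image
        volumeMounts:
        - {name: data, mountPath: /var/lib/myapp}
      volumes:
      - name: data
        hostPath: {path: /srv/myapp-data}
    EOF

    # "Moving" the service is then a copy plus a re-pin to node2:
    rsync -a node1:/srv/myapp-data/ node2:/srv/myapp-data/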


It would be cool if Kubernetes had a native distributed filesystem. I don't follow the roadmap closely, but I wouldn't be too surprised to see it coming in a future release.

The first thing that anyone doing serious deployments needs is an image registry. For that to be HA hosted on a cluster, you need some kind of distributed filesystem.

But those PD/EBS solutions are pretty compelling and they're not going away.
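(Or object storage: the stock Docker registry becomes stateless if you point it at e.g. S3 instead of a shared filesystem, which is what makes it easy to run several replicas behind one Service. Bucket and credentials below are placeholders.)

    # Docker Distribution takes its config from environment variables;
    # with an S3 backend, any number of registry replicas can serve
    # the same content.
    docker run -d -p 5000:5000 \
        -e REGISTRY_STORAGE=s3 \
        -e REGISTRY_STORAGE_S3_REGION=us-east-1 \
        -e REGISTRY_STORAGE_S3_BUCKET=my-registry-bucket \
        -e REGISTRY_STORAGE_S3_ACCESSKEY=<access-key> \
        -e REGISTRY_STORAGE_S3_SECRETKEY=<secret-key> \
        registry:2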



