Hi! I’m David Aronchick, PM on Kubeflow, I’m happy to answer any questions! I was one of the early PMs on Kubernetes, and we very much want to make this a community project, so please join us in thinking about what’s next!
NOTE: The name "flow" does not refer directly to TensorFlow; if anything, it's a nod to all the river themes that pop up in the ML community (e.g., FBLearner Flow).
Thank you! We still love Google Cloud ML Engine - it's perfect for those who want to run in the cloud and want a layer of abstraction. This is for people who want portable stacks and a bit more control, and/or who want to use their existing Kubernetes deployments (particularly on-premise or multi-purpose).
Off the top of my head, maybe a maintained "ml-engine aligned" kubeflow setup, to the extent that's possible.
The use case I'm thinking of is an ML dev team building on Kubeflow and proving out a system, then wanting to hand it off to a non-engineering team while washing their hands of any ongoing infrastructure ops responsibility.
Knowing that an "ml-engine aligned" kubeflow config would transfer cleanly (including the associated bells and whistles) would make that a much more attractive option.
Caveat: I'll admit I'm not keeping up on what's in the managed offering, but I'm assuming there are a number of value-adds of the type that end users like (visualizations, etc).
Yes, this is EXACTLY what we're trying to do! However, it's a bit early, so I can't say when or where we'll be able to get to it. Also, I should be clear: though I'm from Google, we would really like the same story to work with other clouds' hosted offerings as well, but we'll need their support to do so!
Interesting, though I don't see how it's better than a plain Docker image on Kubernetes; that's not much of a hassle these days either. And how is it different from what DL4J is already doing with Zeppelin, supporting both Keras and TF, with MXNet and PyTorch on the way?
> ...how [is it] better than a plain docker image over kubernetes?
Scalability for people with existing on-premise (or cloud-based) Kubernetes workflows, especially when it comes to training or heavy crunching.
That's not to say that Docker Machine/Swarm/Compose couldn't handle the same, but it's an extra step for Kubernetes users and pushes people onto a slightly different toolchain than minikube->K8s.
Correct! Many folks have more complicated deployments in the cloud, and we're trying to align (as close as humanly possible) your on-prem stack with your cloud stack, to minimize the pain in migration.
If you have a single container, and a simple pipeline, this may be a bit more than you need. We've just found that there are normally 5 or more services/systems that people wire together to create an ML stack, and that's what we're trying to solve for/simplify.
Correct! We're currently thinking a lot about orchestration of the various components, but for now our goal is to use the native loose coupling between services available in K8s. So if you wanted Spark for data processing, for example, you could start a Spark service and deployment, and feed its output into the TF CRD.
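To make that concrete, here's a rough sketch of what a TF CRD job manifest looked like around the time of the launch. This is a hypothetical example: the API group/version, field names (`replicaSpecs`, `tfReplicaType`), and the image are illustrative and vary by release, so check the repo for the current schema.

```yaml
# Hypothetical TfJob manifest (field names vary by version).
# A separately deployed Spark service would handle preprocessing;
# this resource just describes the TensorFlow training job.
apiVersion: "tensorflow.org/v1alpha1"
kind: "TfJob"
metadata:
  name: "example-training-job"
spec:
  replicaSpecs:
    - replicas: 1
      tfReplicaType: MASTER
      template:
        spec:
          containers:
            - name: tensorflow
              image: my-registry/my-training-image:latest  # hypothetical image
          restartPolicy: OnFailure
```

Because it's just another Kubernetes resource, you can wire it to other services (Spark, serving, etc.) with the usual Service/Deployment plumbing rather than a bespoke orchestrator.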
It's very close to how we think about ML internally, but not what we use. Your best bet is to read the TFX paper[1], which describes our internal thinking in great detail. (Though Kubeflow is not designed to be an externalization of TFX, we're very much working in collaboration with that team.)
We're absolutely looking at it! Please join our discussion, we'd love to talk about what you're building and if we can help and/or what you'd like us to OSS.
I am at KubeCon 2017 in Austin, TX, and yeah, based on the presentation it looks like an internal tool they just opened to the public, with some bold goals.
Hi! It was actually designed from the start to be an extension of github.com/tensorflow/k8s, and then it took on larger goals: making an entire ML stack (for any ML framework) easy to use, portable, and composable on K8s.
The controller part is a Custom Resource Definition. It can run on any cloud.
However, to benefit from GPUs you need to configure the controller correctly. Default configurations are provided for GCP and Azure; for other clouds you would need to set that up manually (not that it is very hard).
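For reference, a GPU-enabled replica is just a pod template that requests a GPU resource. This is a hedged sketch: the exact resource name (`nvidia.com/gpu` vs. the older `alpha.kubernetes.io/nvidia-gpu`), any driver volume mounts, and the image depend on your Kubernetes version and GPU device plugin.

```yaml
# Sketch: pod template fragment requesting one GPU.
# Resource name and driver setup depend on your cluster's
# Kubernetes version and installed GPU device plugin.
containers:
  - name: tensorflow
    image: my-registry/tf-gpu-training:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1
```

On GCP and Azure the defaults handle this wiring for you; elsewhere you'd add the equivalent of the fragment above to the controller's config yourself.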
- GitHub: https://github.com/google/kubeflow
- kubeflow-discuss: https://groups.google.com/forum/m/#!forum/kubeflow-discuss
Disclosure: I work at Google on Kubeflow