Kubeflow – Machine Learning Toolkit for Kubernetes (github.com/google)
203 points by nikolay on Dec 7, 2017 | 34 comments



Hi! I’m David Aronchick, PM on Kubeflow, and I’m happy to answer any questions! I was one of the early PMs on Kubernetes, and we very much want to make this a community project, so please join us in thinking about what’s next!

- GH: https://GitHub.com/Google/Kubeflow

- kubeflow-Discuss: https://groups.google.com/forum/m/#!forum/kubeflow-discuss

NOTE: The name "flow" does not refer directly to TensorFlow; if anything, it's a nod to all the river themes that pop up in the ML community (e.g., FBLearner Flow).

Disclosure: I work at Google on Kubeflow


Kudos on the release, and thanks for your and the team's work!

Any thoughts on this vs managed ml-engine? Cost-aside, seems like this nibbles on the smaller scale "but ml tooling is too hard" use cases?


Thank you! We still love Google Cloud ML engine - it's perfect for those who want to run in the cloud and want a layer of abstraction. This is for people who want portable stacks and a bit more control; and/or want to use their Kubernetes deployments (particularly on-premise or multi-purpose).

Does that help?

Disclosure: I work at Google on Kubeflow


Do you foresee some kind of integration between Google Cloud ML engine and Tensorflow on k8s in the future?


We're always ready to talk roadmap - anything in particular you'd like to see integration-wise?

Disclosure: I work at Google on Kubeflow


Off the top of my head, maybe a maintained "ml-engine aligned" kubeflow setup, to the extent that's possible.

The use case I'm thinking of is an ML dev team building on Kubeflow and proving out a system, then wanting to transfer it to a non-engineering team and wash their hands of any ongoing infrastructure ops responsibility.

Knowing that an "ml-engine aligned" kubeflow config would transfer cleanly (including associated bells and whistles) would make that a much more attractive option.

Caveat: I'll admit I'm not keeping up on what's in the managed offering, but I'm assuming there are a number of value-adds of the type that end users like (visualizations, etc).


Yes, this is EXACTLY what we're trying to do! However, it's a bit early, so I can't say when or where we'll be able to get to it. Also, to be clear: though I'm from Google, we would really like the same story to work with other clouds' hosted offerings as well, but we'll need their support to do so!

Disclosure: I work at Google on Kubeflow


Great to hear this is not tied to TensorFlow! How would one use a different DL platform, say PyTorch or DyNet?


The steps would basically be:

- Containerize the DL platform

- Create a k8s manifest (similar to our CRD if necessary)

- Create a service endpoint

- Integrate all that into the JupyterHub (JH) deployment

This is less hard than it sounds, but we'd love help! We only started with TF because that's what we know.

Disclosure: I work at Google on Kubeflow
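
A minimal sketch of what the second and third steps above (a k8s manifest and a service endpoint) might look like for an already containerized framework. It is not taken from the Kubeflow repository: the image name, app label, and port are hypothetical placeholders, and a plain Deployment stands in for whatever CRD a real integration might define. The JupyterHub integration step is not covered.

```python
import json

# Hypothetical placeholders: none of these names come from Kubeflow or PyTorch.
APP = "pytorch-worker"
IMAGE = "example.com/ml/pytorch-worker:0.1"
PORT = 8080


def deployment(name, image, port, replicas=1):
    """Return a Kubernetes Deployment that runs the containerized framework."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


def service(name, port):
    """Return a Service that exposes the Deployment's pods inside the cluster."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


def render(manifests):
    """Serialize the manifests as a single JSON List object."""
    bundle = {"apiVersion": "v1", "kind": "List", "items": manifests}
    return json.dumps(bundle, indent=2)


if __name__ == "__main__":
    print(render([deployment(APP, IMAGE, PORT), service(APP, PORT)]))
```

The output is plain JSON, so it could be saved to a file and applied with kubectl, which accepts JSON as well as YAML.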


Interesting, though I don't see how it is better than a plain docker image over kubernetes? That's not much of a hassle these days either. And how is it different from what DL4J is already doing with Zeppelin and supporting Keras, TF, MXNet and PyTorch on the way?


> ...how [is it] better than a plain docker image over kubernetes?

Scalability for people with existing on-premise (or cloud-based) kubernetes workflows, especially when it comes to training or heavy crunching.

That's not to say that Docker Machine/Swarm/Compose couldn't handle the same, but it's an extra step for kubernetes users and pushes people onto a slightly different toolchain than minikube->K8s.


Correct! Many folks have more complicated deployments in the cloud, and we're trying to align (as close as humanly possible) your on-prem stack with your cloud stack, to minimize the pain in migration.

If you have a single container, and a simple pipeline, this may be a bit more than you need. We've just found that there are normally 5 or more services/systems that people wire together to create an ML stack, and that's what we're trying to solve for/simplify.

Disclosure: I work at Google on Kubeflow


Are you at kubecon this week?


I am! You can reach me at aronchick (at) Google if you'd like to meet or have questions.


Looks like different components could be added in the future, but not clear how.

The following are included:

- A JupyterHub to create & manage interactive Jupyter notebooks.

- A Tensorflow Training Controller that can be configured to use CPUs or GPUs, and adjusted to the size of a cluster with a single setting.

- A TF Serving container.


Correct! We're currently thinking a lot about orchestration of the various components but for now, our goal is to use the native loose coupling between services available in K8s. So if you wanted Spark for data processing, for example, you could start a service, and the deployment, and feed that into the TF CRD.

Disclosure: I work at Google
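
One way to picture the loose coupling described above: a data-processing Service gets a stable in-cluster DNS name, and the training container is simply told that address. The sketch below assumes the default `cluster.local` cluster domain; the service name, port, environment variable, and image are hypothetical, and the container spec is a generic one rather than the TF CRD's actual schema, which is not shown in this thread.

```python
def service_dns(name, namespace="default", cluster_domain="cluster.local"):
    """Return the in-cluster DNS name Kubernetes assigns to a Service."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


def with_upstream(container, env_name, service_name, port, namespace="default"):
    """Return a copy of a container spec with the upstream address as an env var."""
    env = list(container.get("env", []))
    address = f"{service_dns(service_name, namespace)}:{port}"
    env.append({"name": env_name, "value": address})
    return {**container, "env": env}


if __name__ == "__main__":
    # Hypothetical trainer container and Spark service name.
    trainer = {"name": "trainer", "image": "example.com/ml/trainer:0.1"}
    print(with_upstream(trainer, "DATA_SOURCE_ADDR", "spark-preprocess", 7077))
```

Because the trainer only sees an address, the upstream service could be swapped for another data source without changing the training container itself.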


Any chance to make this a helm chart?



We're looking at a bunch of deployment packaging solutions, helm is probably one of the top ones!

Disclosure: I work at Google on Kubeflow


Pretty interesting. I'm guessing this is something that Google uses internally for their Kubernetes workflows.


It's very close to how we think about ML internally, but it's not what we use. Your best bet for reading about that is the TFX paper [1], which describes our internal thinking in great detail. (Though Kubeflow is not designed to be an externalization of TFX, we're very much working in collaboration with that team.)

[1] http://www.kdd.org/kdd2017/papers/view/tfx-a-tensorflow-base...

Disclosure: I work at Google on Kubeflow


Does Google intend to open source TFX? I only ask because we're building a lot of the same infrastructure.


We're absolutely looking at it! Please join our discussion, we'd love to talk about what you're building and if we can help and/or what you'd like us to OSS.

Disclosure: I work at Google on Kubeflow


Sure, is there an issue/doc/PR to comment on?


No, would you mind adding one?


I am at KubeCon 2017 in Austin, TX and, yeah, based on the presentation, it looks like an internal tool they just opened to the public with some bold goals.


Hi! It was actually designed from the start to be an extension of github.com/tensorflow/k8s, and then it took on a larger goal: making an entire ML stack (of any ML framework) easy to use, portable and composable on k8s.

Disclosure: I work at Google on Kubeflow


This looks pretty cool. Is it dependent on Google’s Kubernetes, or can it be run on OpenShift or DC/OS as well?


It says it runs "in any environment in which Kubernetes runs." So as long as you are asking whether it runs on OpenShift's Kubernetes, then yes.


Absolutely! Red Hat is already contributing :)

Disclosure: I work at Google on Kubeflow


The controller part is a Custom Resource Definition. It can run on any cloud.

However, to benefit from GPUs you need to configure the controller correctly. Default configurations are there for GCP and Azure, but you would need to do that manually for other clouds (not that it is very hard).


Correct, anytime you bleed through hardware, it requires some setup, sadly

Disclosure: I work at Google on Kubeflow
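
As a rough illustration of the scheduling side of that GPU setup, the sketch below adds a GPU limit to a generic container spec. It assumes a cluster where the NVIDIA device plugin exposes the `nvidia.com/gpu` resource; older clusters used a different alpha resource name, and driver installation and library mounts remain cluster-specific. The helper name and the container are hypothetical and are not part of the Kubeflow controller.

```python
import copy

# Resource name exposed by the NVIDIA device plugin (assumed to be installed).
GPU_RESOURCE = "nvidia.com/gpu"


def request_gpus(container, count):
    """Return a copy of a container spec that asks the scheduler for `count` GPUs."""
    if count < 0:
        raise ValueError("GPU count must be non-negative")
    updated = copy.deepcopy(container)
    limits = updated.setdefault("resources", {}).setdefault("limits", {})
    if count == 0:
        # CPU-only: drop any GPU limit that was set earlier.
        limits.pop(GPU_RESOURCE, None)
    else:
        limits[GPU_RESOURCE] = count
    return updated


if __name__ == "__main__":
    trainer = {"name": "trainer", "image": "example.com/ml/trainer:0.1"}
    print(request_gpus(trainer, 1))
```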


With OpenShift you can also use the S2I (Source-to-Image) feature, which is not based on a CRD. This blog post covers some ways this can be done for TF: https://blog.openshift.com/openshift-commons-briefing-110-co...


Yep, anywhere k8s runs! (Or that's the idea anyway). If it doesn't, please file a bug!

Disclosure: I work at Google on Kubeflow



