Deepo: a Docker image containing almost all popular deep learning frameworks (github.com/ufoym)
194 points by ufoym on Oct 30, 2017 | 38 comments



From the reddit thread 2 days ago:

> The whole point of Docker containers is that they're very efficient with their resource and space usage. If you have 15 redundant frameworks pre-loaded in a container, of which you use at most 3, you're using Docker wrong.


Well, I'd say the point of Docker is to make distributing and running something easy, reliable, and reasonably safe. Which is what is happening here.

It is true that Docker is more space efficient than setting up whole VMs for each app. But it's a lot less space efficient than installing everything on a big server. If you are running, say, a bunch of Python apps, the lowest-space approach is making sure they all use the same set of libraries, modules, and other required resources. But harmonizing those versions is a pain in the ass, so Docker lets you easily use multiple copies of python, overlapping libraries, python packages, etc.
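
For example, each app can carry its own interpreter and packages without touching the host (the image tags below are just illustrative):

    docker run --rm python:2.7 python --version
    docker run --rm python:3.6 python --version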

It's true that if you were deploying this in production, you'd want to trim this back. But this is explicitly a "research environment".


Shall we talk about Deepo, not Docker?

Absolutely, loading loads of packages into a container is never a good idea for production. But Deepo is not for production, it is a playground. It is a thing that helps people explore and learn these packages without hassle or the worry of messing up their system.

Indeed, having played around with dl-docker, I really appreciate the author of Deepo. It seems more up to date, better covered, and more promising to me.


It's also much easier to make "sure they all use the same set of libraries, modules, and other required resources" with smaller teams.

The larger you get, the higher the chance that dependency hell will cripple the velocity of a project.

Docker helps a ton in these types of scenarios.


> Well, I'd say the point of Docker is to make distributing and running something easy, reliable, and reasonably safe. Which is what is happening here.

No, that's http://flatpak.org/


You may not think that this is what containers are for, but it's certainly how people use them, and it's echoed on https://www.docker.com/what-container


That seems targeted at applications, not development, and it's got much less momentum behind it.


Watch it turn into a Docker rival soon enough...


"Don't use it that way! That's not how it was intended."

Good to see the hacker spirit is alive and well here.


For production - 100% agree. For playing with code - not at all.


I agree with this sentiment. That said, someone is totally going to deploy this.


That's immediately where I saw the value of this as well.


As someone who has wasted hours setting up these various tools on an EC2 instance (just to play around with cool projects and experiment) this is a gorgeous constellation of utility.
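
If the image name on Docker Hub matches the repo (an assumption on my part), getting started should be roughly a pull and a run, with nvidia-docker handling GPU passthrough:

    docker pull ufoym/deepo
    nvidia-docker run -it ufoym/deepo bash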


I thought the point of Docker was to compartmentalize software setups to make them portable? Who are you to say someone is using it "wrong" if people find the utility in it?


I think it's ok to have them all set up but not running.

We have a 9GB+ docker image that we create containers from some 10k times per day for our users in production.


Why?

Would be interesting to hear about what your use case is, and how it works for you.


To offer an online compiler for 20+ languages. We create a temporary container for each user on our servers.

Here is one of our high-traffic products: https://codepad.remoteinterview.io/

It's used by interviewers, students, teachers who need a quick place to practice and pair-program.
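
A rough sketch of how a throwaway per-user sandbox along these lines might be launched; the image name and resource limits here are assumptions, not their actual setup:

    # hypothetical image; limits and isolation flags are illustrative
    docker run --rm -it \
      --memory 256m --cpus 0.5 \
      --network none --read-only \
      codepad-sandbox:latest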


It feels more and more like containers in general are being used as a band-aid over various underlying issues concerning inter-dependencies and conflicts between various projects within the Linux ecosystem.

Why take the effort to get the kids to play nice with each other when you can lock each of them in their own room?


If it works and it's useful, who cares?


For a Dockerfile with multiple layers, use overlay2 and not AUFS, or you will suffer from slow I/O.
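
A minimal sketch of switching the daemon over (assumes a systemd host; images stored under the old driver will need to be re-pulled):

    # set the storage driver and restart the daemon
    echo '{ "storage-driver": "overlay2" }' | sudo tee /etc/docker/daemon.json
    sudo systemctl restart docker
    docker info | grep "Storage Driver"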


I've been having issues with slow IO on some of my containers, and I'm using AUFS. What is the cause of the slowdown with AUFS? Why do they not use Overlay2 as default if AUFS has such a major issue?


Until recently, AUFS was the only such filesystem that was broadly distributed. Now it's been removed from the latest Linux distributions because it's too buggy and unstable.

Overlay was started and quickly abandoned, only to be replaced by overlay2, which is a work in progress and not widely available yet.

This article contains a bit of history on Docker filesystems: https://thehftguy.com/2016/11/01/docker-in-production-an-his...


I just want to say one thing: setting up your Python for DL is a piece of cake compared to using Python for DL. It's like automating astronaut application forms.


"It's like automating astronaut application forms."

Good potential astronauts could get tired of filling out application forms for everything and just choose another career.


But have you set up 15 of them?

And even if you just set up one framework, did you find it getting corrupt after a lot of use, and did you have to tear down and rebuild it?

I found it easy to set up, but have had to do it a few times, and that was just for Tensorflow. I haven't tried other frameworks (CNTK, for instance) because of the work of figuring out how to get it up and running.

That's just for 1 other framework, let alone 14 others.


I don't think this is true. How many (proper) mathematicians have you met?


I met a few, but what does that have to do with Python installations?


I used LXC in the past, and the current Docker still puzzles me a bit.

To save resources (CPU, memory, etc.) and stay lightweight, the image I pull needs to reuse whatever is on my host Linux; otherwise it has to provide its own dependencies, which nearly doubles the storage. Pulling someone else's Docker image normally means it ships different libraries from my host, so I end up running duplicated libraries/dependencies on the same host. How does that save anything? Why not just use KVM?

For small applications that use the same libraries/dependencies as the host, I can run way more Docker containers than KVM guests, since the former is indeed resource efficient when it can share those with the host OS. But that only happens when the host has essentially the same software installed for the containers to reuse, and the container itself should be lightweight (otherwise, why not KVM and avoid all the Docker/container complexity?).

So is it true that Docker is _only_ good for lightweight applications that happen to run the same libraries/dependencies as the host OS, to save resources? At least that's how I used LXC in the past.

Or is Docker just for ease of deployment, where most of the time it doesn't save any resources but increases them (since most of the time the host OS won't have the same shared resources installed)?

Also, I can't understand why a Linux Docker image saves any resources on a Windows host, other than being easier to deploy. Then again, the old *.exe installers worked fine too, so why Docker?


It seems Docker is being "abused" as a way to build one-off playrooms for spare-time tinkering.

And frankly I can't help but wonder if this is because more and more languages are sprouting their own package managers, while at the same time developers are getting ever more lax about dependency hygiene.


It's not being abused. It's a much better alternative than having users download a VM image full of preinstalled tools.


How easily does it work with various types/brands of GPU?


This is a very useful project.

Something similar (even if it's not exactly the intended functionality) that already exists is the Kaggle image:

https://github.com/Kaggle/docker-python


How is this better than Kaggle's Docker image?

https://hub.docker.com/u/kaggle/


seem's basically the same (didn't check every framework but seemed like most big names were on both) but being built on the cuda base image might make it more easy to attach a GPU?

Also I just noticed this is about the craziest dockerfile. It's all one RUN which good luck trying to troubleshoot if theres a non obvious error during building


Every RUN statement creates a new layer in the image. Best practice for image size and I/O performance is to use the lowest number of layers possible.

That said, recent versions of Docker do have the ability to squash layers at the end of a build, which could give the best of both worlds.
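
Roughly the trade-off being described, as a sketch (the package names are just examples, and --squash needs the daemon's experimental mode enabled):

    # one RUN per step: easy to troubleshoot, but a layer each
    RUN apt-get update
    RUN apt-get install -y python3-pip
    RUN pip3 install numpy

    # chained into a single RUN: one layer, harder to bisect failures
    RUN apt-get update && apt-get install -y python3-pip && pip3 install numpy

    # or keep separate RUNs and collapse the layers at build time:
    #   docker build --squash -t myimage .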


It would be amazing if this also included a step-by-step tutorial for setting it up on the various cloud GPU providers (AWS, Compute Engine, that newfangled Floyd thing...).


A CPU version would be nice.


I recall that there used to be lots of special-use Linux distributions that included specific sets of packages.



