An API for hosted deep learning models (algorithmia.com)
111 points by treblig on July 15, 2016 | 34 comments



The history of machine learning startups is littered with companies that thought a hosted web service was a good idea. The problem with this model is that big data, by definition, is costly to move. So if a managed service isn't already generating and storing the data you need to process with machine learning or deep learning (as AWS conceivably might be), then you probably don't want to move your data to those algorithms or models. All you'll get are small-data users. The models and algos need to go to the data. That's the most efficient approach, and it means you have to go on prem... Fwiw, that's what we're trying to do with Skymind and Deeplearning4j.

https://skymind.io/ http://deeplearning4j.org/


My understanding of the post is that it would host trained models. You'd train them wherever, but host the trained model on Algorithmia, which exposes it through an API, making it easy for others to use your model.
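
In other words, consumers of the model never see the weights; they just POST inputs and get predictions back over HTTP. Very roughly, something like this (a sketch only; the endpoint path, auth header, and payload are made up and the real Algorithmia API may differ):

```python
import requests

# Hypothetical endpoint, key, and payload, just to illustrate the
# "trained model behind an HTTP API" idea; not Algorithmia's actual API.
API_KEY = "sim_xxxxxxxxxxxx"
URL = "https://api.algorithmia.com/v1/algo/some_user/image_classifier/0.1.0"

resp = requests.post(
    URL,
    json={"image_url": "http://example.com/cat.jpg"},
    headers={"Authorization": "Simple " + API_KEY},
)
resp.raise_for_status()
print(resp.json())  # e.g. {"result": {"label": "cat", "confidence": 0.97}, ...}
```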


It takes big data to train, but the trained model can work on "small data" or big data. One-off uses in apps really do lend themselves well to a hosted solution like this, IMHO. If you need to classify lots of data, then you're probably at the point where you'd either train your own model or buy it from the developer via this site, I'd think.


Because if I am capable of training a fancy-pants deep learning model to do something helpful, obviously I need a service to host my model so badly that giving someone my model is a better idea than paying $10 a month for an EC2 instance.


There's a clear trend in the industry toward increasingly relying on cloud services, so it seems reasonable that machine learning would follow. As long as the compute is in the same data center, data transfer is rarely the bottleneck for these kinds of deep learning algorithms, which is why we designed Algorithmia to be able to operate anywhere -- on all the major cloud providers, as well as on premise.


Right, but the question is: Whose cloud and what kind of cloud? Are we talking private cloud, virtual private cloud? Who manages it? Even saying "as long as the compute is in the same data center" is a huge assumption. I think it's great that Algorithmia can go operate anywhere. How do you do that? What do you need to operate well on prem?


At "enterprise" level there is a lot of interest in Hybrid Clouds, because on premises is still a requirement.


(Disclaimer: I'm biased; I'm vonnik's cofounder.)

I agree that most startups need to get an MVP out the door as soon as possible, which leads them to the cloud. I think hybrid cloud will be the way to go long term.

If you think about it, on one side we have things like AWS, and on the other we have devops tooling that makes running your own infra at scale easy, like Docker and k8s. On-prem in some form isn't going anywhere. What WILL be interesting are plays like, say, Convox, where you can manage a cloud like you would an on-prem OpenStack/k8s deployment.


Can the data be anonymized before they receive it from their clients? It would be a great advantage to be able to use their service in a privacy-conscious way.


As an algorithm developer and manager, I have thought about business ideas similar to what Algorithmia is pursuing. There are a few reasons why I think "algorithms as a service" will not work so well.

In most products and services that rely on non-trivial algorithms, the core algorithms are often the "secret sauce" of the business. They are what gives you your edge over your competition, and you need to fully understand and control your secret sauce. You need to know where the core algorithms work well and where they don't. With an outsourced service, your core algorithms are basically a black box outside your control.

Another problem: for most real-world algorithms it is pretty rare to be able to take an off-the-shelf algorithm and have it "just work" well enough for your problem. Often there is a bit of parameter tuning and domain-specific knowledge that must be incorporated to get the best results (this is how people like myself get a lot of consulting work). If a generic algorithm does work quite well for your problem, your competitors probably already know about it and you have no real edge over them.

A third problem, and this is really the main one: one of the main benefits of developing an advanced algorithm is that once you have it, you "own" it and can deploy it as you see fit. You amortize your costs upfront and can reuse that sunk development cost over and over again at no extra cost. But with a service like Algorithmia, you are never able to take full advantage of the tremendous leverage that algorithms can give you: the more you use the algorithms, the more you pay. And if you start paying a lot to use an algorithm, at some point you're going to find it better to develop your own implementation and stop paying someone else for the service.


So you can use Algorithmia as just one component of your ML pipeline, or you can use it to try out various out-of-the-box algorithms before going to the effort of running your own (roughly as sketched below). Finding the best setup takes lots of experimentation, and anything that speeds that up is useful. We should have easy access to the best-performing models in every category of data: image, video, audio, text, decision making (RL).

Also, these guys could offer support for these models on private cloud servers, to enable privacy.
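
For example, swapping between hosted models could look roughly like this (a sketch that assumes the Algorithmia Python client; the algorithm names and input format are made up):

```python
import Algorithmia

# Assumes the official Algorithmia Python client is installed;
# the algorithm names below are placeholders, not real marketplace entries.
client = Algorithmia.client("YOUR_API_KEY")

candidates = [
    "deeplearning/ImageTagger/1.0.0",
    "vision/IllustrationTagger/0.2.1",
]

for name in candidates:
    algo = client.algo(name)
    response = algo.pipe({"image": "http://example.com/photo.jpg"})
    print(name, "->", response.result)
```

Trying a handful of hosted models this way is a lot cheaper than standing up GPU boxes for each one just to benchmark them.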


> "Using GPUs inside of containers is a challenge. There are driver issues, system dependencies, and configuration challenges. It’s a new space that’s not well-explored, yet. There’s not a lot of people out there trying to run multiple GPU jobs inside a Docker container.”

Er, Nvidia itself has an official Docker application which allows containers to interface with the host GPU, optimized for the deep learning use case: https://github.com/NVIDIA/nvidia-docker

Training models is one thing that can be commoditized, like with this API, but building models and selecting features without breaking the rules of statistics is another story, and that is the true bottleneck for deep learning. It can't be automated as easily.


Algorithmia founder here. nvidia-docker is helpful, but it doesn't address all the issues with running GPU computing inside of Docker. There are driver issues on the host OS, and the real challenge is running multiple GPU jobs inside separate Docker containers while sharing the GPU.

I agree that building models is still definitely a big challenge, but the tooling and knowledge are getting better every day. Either way, our goal with Algorithmia is to create a channel for people to make their models available, and to create an incentive for people to put in the effort to train really solid, useful models.


Agreed. I brought up a system with nvidia-docker last week for a computer vision application and while it works, it seems fragile. There are more pieces than there should be and it seems easy to break. I also don't know if we can use multiple containers on one host, but it doesn't sound like it.

It is not the final solution for containerized GPU applications.


Author of nvidia-docker here. You can definitely have multiple containers on each GPU if you want. If you find a bug, or if you think the documentation isn't great, please file an issue!


Awesome. Thanks for the reply and I apologize for suggesting something incorrect.

It does strike me as tricky needing to match driver versions between the host and the container. Do you know if there is any effort to eliminate that requirement?

Also while we're chatting, is there any hope of NVIDIA open sourcing their linux drivers? How would such a move affect nvidia-docker?


You don't need to match the driver version between the host and the container. Actually, you shouldn't include any driver files inside the container at all.

All the user-level driver files required for execution are mounted via a volume when the container is started. This way you can deploy the same container on any machine with the NVIDIA drivers installed.

We have more details on our wiki: https://github.com/NVIDIA/nvidia-docker/wiki/Internals
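
For example, a quick sanity check from inside a container that the mounted user-level library is loadable and reports the host's driver version (just a sketch; it assumes Python and ctypes are available in the image):

```python
import ctypes

# libcuda.so.1 is one of the user-level driver files that nvidia-docker
# mounts from the host; if the volume is set up, this works in any container.
cuda = ctypes.CDLL("libcuda.so.1")
assert cuda.cuInit(0) == 0, "cuInit failed -- is the driver volume mounted?"

version = ctypes.c_int()
cuda.cuDriverGetVersion(ctypes.byref(version))
print("CUDA driver version: %d.%d"
      % (version.value // 1000, (version.value % 1000) // 10))
```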

Concerning your last question: I don't have any information on this topic, but in any case it would not really impact nvidia-docker.


Thanks for your superb work. Is it possible to use nvidia-docker on several AWS instances, to use multiple GPUs? (To spread training across multiple GPUs for more speed and RAM. TensorFlow and Caffe support distributed training, but I'm not sure if it's viable in dockerized environments on AWS?)


One container can use multiple GPUs on the same machine without problems.

For distributed training (which official Caffe doesn't actually support), you would have to run one container per instance, but this is more a configuration problem at the framework level than a Docker or nvidia-docker problem.
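
To make the "configuration at the framework level" point concrete, with distributed TensorFlow it looks roughly like this (addresses are made up; each one would be a container running on a separate instance):

```python
import tensorflow as tf

# Made-up addresses: one container per EC2 instance, each exposing the
# gRPC port of its TensorFlow server to the others.
cluster = tf.train.ClusterSpec({
    "ps":     ["10.0.0.10:2222"],
    "worker": ["10.0.0.11:2222", "10.0.0.12:2222"],
})

# Inside each container, start a server with that container's role and index;
# the training script then places ops on /job:worker/task:N as usual.
server = tf.train.Server(cluster, job_name="worker", task_index=0)
server.join()
```

Docker only needs to publish the port; everything else is the framework's job.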


Graphistry co-founder here. We do that every day with nvidia-docker & AWS.

The real challenge is doing this on 100+ GPUs and leveraging multitenancy for an additional 100X+ economy of scale. We're actively working on it, and in my experience this seems like a classic scheduling area where different domains will want to do it differently. However, even there, it'll end up as something like "plug in a new user-level Mesos scheduler X", and Nvidia is working on exactly that.

I'll wait for someone at Baidu or the Titan lab to blow up those numbers by another 100-1000X ;-)

Edit: If this sounds like a cool problem, we're leveraging GPU cloud computing and visual graph analytics for event analysis (e.g., as a core tool for enterprise security teams). We would love help, esp. on cloud infrastructure or on connecting the ecosystem together! Contact build@graphistry.com and we'll figure something out :)


Well yes, you do need to have the driver installed on the host OS :)

You can run multiple containers on the same GPU with nvidia-docker; it's exactly the same as running multiple processes (without Docker) on the same GPU.
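
For instance, two containers sharing a GPU can each be told to claim only part of the device's memory, exactly as you would with two plain processes (a TensorFlow-flavored sketch; the option name assumes the TF 0.x/1.x API):

```python
import tensorflow as tf

# Run this in each of two containers (or two plain processes) sharing one GPU;
# each process limits itself to roughly half of the device's memory.
config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.45)
)

with tf.Session(config=config) as sess:
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    print(sess.run(tf.matmul(a, a)))
```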


What's the first step towards easily automating it? What would the long term roadmap look like?


Mind going a bit deeper into this?


I'm not a big fan of taking the openness in machine learning and turning it into a web-based product. For me, the "whoa" moment with the new, approachable machine learning frameworks is that I can train a TensorFlow network on my computer and then embed it in an Android/iOS app that works offline.

Also, a much more minor grievance, but I really dislike websites that don't work on my 15" laptop. What's going on here? http://i.imgur.com/q13lCLK.png


Thanks for pointing out that share button issue. Should be fixed now!


How does this compare in price to AWS GPU instances?


The service operates at a higher level than EC2, and pricing is calculated on a per-second of compute basis. Comparing prices is going to depend a lot on the specifics of your workload and your affinity for managing infrastructure.


So, "more expensive than AWS".


Speaking strictly of infrastructure costs, stripping away any other value of our marketplace and platform, ignoring an on-premise deployment of our platform that uses your own AWS instances, and given a sustained workload that saturates system resources for the lifetime of your instance(s): yes.

For workloads where you aren't making full use of system resources at all times, the economy of scale provided by our compute cluster often makes compute-per-second more cost-effective, even before considering the cost of managing your own infrastructure. It fits into the "serverless" trend, FWIW.

Each algorithm in our marketplace has a cost calculator that breaks down the price using a per-API-call estimate. If you have a specific workload in mind, feel free to reach out and we'd be happy to further discuss the pricing.
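
As a back-of-the-envelope illustration of the utilization point (all numbers below are made up, not Algorithmia's or AWS's actual prices):

```python
# Illustrative numbers only -- not actual Algorithmia or AWS pricing.
calls_per_month = 100000           # sporadic API traffic
seconds_per_call = 2.0             # compute time per request
price_per_compute_second = 0.0005  # hypothetical per-second rate

dedicated_instance_hourly = 0.65   # hypothetical GPU instance, on-demand
hours_per_month = 730

per_second_billing = calls_per_month * seconds_per_call * price_per_compute_second
dedicated_box = dedicated_instance_hourly * hours_per_month

print("pay-per-second: $%.2f/month" % per_second_billing)  # ~$100
print("dedicated box:  $%.2f/month" % dedicated_box)        # ~$475, mostly idle
```

At sustained full utilization the comparison flips, which is the caveat in the first paragraph above.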


Of course it is more expensive. They need to pay for something like the equivalent of AWS to provide the service. Did you honestly think it would be competitive with raw compute, where you have to do all the work yourself? I suppose it could be if you had a service that could run on a fraction of an instance, but for raw compute plus the application, have you seen anything as cheap as AWS?


If you're interested in doing this, check out 21 Inc. You can essentially do the same thing, with tutorials on getting set up on EC2, but get paid directly in bitcoin.

https://21.co/learn/deep-learning-aws/

Disclaimer: I work for 21.


Cool idea. I spent some time playing around with it and found that some of the algos are a bit buggy or not working.


Does anybody know if these "free APIs" are actually used to get "free training" for the API owner's models? I mean, is it free as in free beer, or free as in Facebook?


I was under the impression that training data needed to contain the answer to the question to be effective, while users of these models would be using them to answer questions.

Unless the users of this service then feed whether the answer given by the service was correct back into the service, I don't see how it would help to train their model.

Happy to be corrected by someone with a better understanding of the space.



