The history of machine learning startups is littered with companies that thought a hosted web service was a good idea. The problem with this model is that big data, by definition, is costly to move. So if a managed service is not generating and storing the data you need to process with machine learning or deep learning (as you might conceivably with AWS), then you probably don't want to move your data to those algorithms or models. All you'll get are small-data users. The models and algos need to go to the data. That's the most efficient approach, and it means you have to go on prem... Fwiw, that's what we're trying to do with Skymind and Deeplearning4j.
My understanding of the post is that it would host learned models. You'd train them wherever, but host the learned model in Algorithmia, which exposes it through an API, making it easy for others to use your model.
It takes big data to train but that trained model can work on "small data" or big data. One-off uses for apps really do lend themselves well to a hosted solution like this IMHO. If you need to classify lots of data – then you are probably at a point to either train your own model or buy it from the developer via this site, I'd think.
Because if I am capable of training a fancy pants deep learning model to do something helpful, obviously I need a service to host my model so badly that giving someone my model is a better idea than paying $10 a month for an EC2 instance.
There's a clear trend in the industry to increasingly rely on cloud services, so it seems reasonable that machine learning would follow the same trend. As long as the compute is in the same data center, data transfer is rarely the bottleneck for these kinds of deep learning algorithms, which is why we designed Algorithmia to be able to operate anywhere -- on all the major cloud providers, as well as on premise.
Right, but the question is: Whose cloud and what kind of cloud? Are we talking private cloud, virtual private cloud? Who manages it? Even saying "as long as the compute is in the same data center" is a huge assumption. I think it's great that Algorithmia can go operate anywhere. How do you do that? What do you need to operate well on prem?
I agree that most startups need to get an MVP out the door as soon as possible, which leads them to the cloud. I think hybrid cloud will be the way to go long term.
If you think about it, on one side we have things like AWS, and on the other devops tooling that aims to "make running your own infra at scale easy", like docker and k8s. On prem in some form isn't going anywhere. What WILL be interesting are plays like, say, convox, where you can manage a cloud like you would an on prem openstack/k8s deployment.
Can they anonymize the data before receiving it from their clients? It would be a great advantage to be able to use their service in a privacy-conscious way.
As an algorithm developer and manager I have thought of business ideas similar to what Algorithmia is pursuing. There are a few reasons why I think “algorithms as a service” will not work so well.
In most products / services that rely on non-trivial algorithms, the core algorithms are often the “secret sauce” of the business. They are what gives you your edge over your competition. And you need to fully understand and control your secret sauce. You need to know where the core algorithms work well and where they don’t work so well. With an outsourced service, your core algorithms are basically a black box outside your control.
Another problem: for most real world algorithms it is pretty rare to be able to take an off the shelf algorithm and have it “just work” well enough for your problem. Often there is a bit of parameter tuning and domain specific knowledge that must be incorporated to get the best results (this is how people like myself get a lot of consulting work). If a generic algorithm does work quite well for your problem, your competitors probably already know about it and you have no real edge over them.
A third problem, and this is really the main one: one of the main benefits of developing an advanced algorithm is that once you have it, you “own” it and can deploy it as you see fit. You amortize your costs upfront and are able to use this sunk development cost over and over again without extra cost. But with a service like Algorithmia, you are never able to take full advantage of the tremendous leverage that algorithms can give you. The more you use the algorithms, the more you pay. And if you start paying a lot to use an algorithm you’re going to at some point find it to be better to develop your own implementation and stop paying someone else for the service.
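To put rough numbers on the "the more you use it, the more you pay" argument, here is a minimal Python sketch of the break-even point between paying per call and building your own implementation. The function name and every figure in it are hypothetical and chosen only to show the arithmetic; none of them are Algorithmia's actual prices.

```python
import math


def break_even_calls(dev_cost, own_cost_per_call, service_price_per_call):
    """Number of calls after which owning the implementation is cheaper.

    dev_cost: one-off cost of developing your own implementation.
    own_cost_per_call: what each call costs you on your own infrastructure.
    service_price_per_call: what the hosted service charges per call.
    """
    saving_per_call = service_price_per_call - own_cost_per_call
    if saving_per_call <= 0:
        # The service is never more expensive per call, so the
        # development cost is never recovered.
        return math.inf
    return dev_cost / saving_per_call


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    calls = break_even_calls(
        dev_cost=50_000.0,
        own_cost_per_call=0.0005,
        service_price_per_call=0.0025,
    )
    print(f"Owning the algorithm pays off after roughly {calls:,.0f} calls")
```

Below that call volume the hosted service is the cheaper option, which fits the reply that it is useful for trying out algorithms before committing to your own.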
So you can use Algorithmia as only a component of your ML pipeline, or you can use it to try out various out-of-the-box algorithms before taking the effort to run your own. Finding the best setup takes lots of experimentation; anything that can speed that up is useful. We should have easy access to the best performing models in all categories of data: image, video, audio, text, decision making (RL).
Also, these guys could offer support for these models on private cloud servers, to enable privacy.
> "Using GPUs inside of containers is a challenge. There are driver issues, system dependencies, and configuration challenges. It’s a new space that’s not well-explored, yet. There’s not a lot of people out there trying to run multiple GPU jobs inside a Docker container.”
Er, Nvidia itself has an official Docker application which allows containers to interface with the host GPU, optimized for the deep learning use case: https://github.com/NVIDIA/nvidia-docker
Training models is one thing that can be commoditized, like with this API, but building models and selecting features without breaking the rules of statistics is another story and is the true bottleneck for deep learning. That can't be automated as easily.
Algorithmia founder here. nvidia-docker is helpful but does not address all the issues with running GPU computing inside of docker. There are driver issues on the host OS, and the real challenge is running multiple GPU jobs inside of separate docker containers and sharing the GPU.
I agree that building models is still definitely a big challenge, but the tooling and knowledge are getting better every day. Either way, our goal with Algorithmia is to create a channel for people to make their models available, and create incentive for people to put in the effort to train really solid, useful models.
Agreed. I brought up a system with nvidia-docker last week for a computer vision application and while it works, it seems fragile. There are more pieces than there should be and it seems easy to break. I also don't know if we can use multiple containers on one host, but it doesn't sound like it.
It is not the final solution for containerized GPU applications.
Author of nvidia-docker here. You can definitely have multiple containers on each GPU if you want. If you find a bug or if you think the documentation was not great, please file a bug!
Awesome. Thanks for the reply and I apologize for suggesting something incorrect.
It does strike me as tricky needing to match driver versions between the host and the container. Do you know if there is any effort to eliminate that requirement?
Also while we're chatting, is there any hope of NVIDIA open sourcing their linux drivers? How would such a move affect nvidia-docker?
You don't need to match the driver version between the host and the container. Actually, you shouldn't include any driver file inside the container.
All the user-level driver-files required for execution are mounted when the container is started using a volume. This way you can deploy the same container on any machine with NVIDIA drivers installed.
Thanks for your superb work. Is it possible to use nvidia-docker on several AWS instances, to use multiple GPUs? (To spread training across multiple GPUs for more speed and RAM. TensorFlow and Caffe support distributed training, but I'm not sure if it's viable in dockerized envs on AWS.)
One container can use multiple GPUs on the same machine without problems.
For distributed training (which Caffe doesn't actually support, at least not in the official version), you would have to run one container per instance, but this is more a configuration problem at the framework level than a Docker or nvidia-docker problem.
Graphistry co-founder here. We do that every day with nvidia-docker & AWS.
The real challenge is doing this on 100+ GPUs and leveraging multitenancy for an additional 100X+ economy of scale. We're actively working on it, and in my experience, this seems like a classic scheduling area where different domains will want to do it differently. However, even there, it'll end up something like "plug in a new user-level Mesos scheduler x", and Nvidia is working on exactly that.
I'll wait for someone at Baidu or the Titan lab to blow up those numbers by another 100-1000X ;-)
Edit: If this sounds like a cool problem, we're leveraging GPU cloud computing and visual graph analytics for event analysis (e.g., core tool for teams in enterprise security). We would love help, esp. on cloud infrastructure or on connecting the ecosystem together! Contact build@graphistry.com and we'll figure something out :)
Well yes, you do need to have the driver installed on the host OS :)
You can run multiple containers on the same GPU with nvidia-docker; it's exactly the same as running multiple processes (without Docker) on the same GPU.
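For anyone who wants to try the multiple-containers-per-GPU case, here is a rough Python sketch that builds the commands for two containers pinned to the same GPU. It assumes the nvidia-docker 1.0 wrapper, which as far as I know reads a comma-separated list of GPU ids from the `NV_GPU` environment variable; the helper names (`nvidia_docker_cmd`, `gpu_env`, `launch`) are made up for this example. The main block is a dry run that only prints the commands, since actually launching them needs Docker, nvidia-docker, and a GPU host.

```python
import os
import subprocess


def nvidia_docker_cmd(image, command):
    """Build the argv for `nvidia-docker run` (hypothetical helper)."""
    return ["nvidia-docker", "run", "--rm", image, *command]


def gpu_env(gpu_ids):
    """Copy the current environment and select the GPUs to expose.

    Assumption: nvidia-docker 1.0 treats NV_GPU as a comma-separated
    list of GPU ids that the container is allowed to see.
    """
    env = dict(os.environ)
    env["NV_GPU"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch(image, command, gpu_ids):
    """Start one container in the background.

    Not called in the dry run below: it requires Docker, nvidia-docker,
    and an NVIDIA driver installed on the host.
    """
    return subprocess.Popen(nvidia_docker_cmd(image, command), env=gpu_env(gpu_ids))


if __name__ == "__main__":
    # Dry run: show what two containers sharing GPU 0 would execute.
    for job in range(2):
        cmd = nvidia_docker_cmd("nvidia/cuda", ["nvidia-smi"])
        print(f"job {job}: NV_GPU={gpu_env([0])['NV_GPU']} {' '.join(cmd)}")
```

On a real GPU host, calling `launch("nvidia/cuda", ["nvidia-smi"], [0])` twice should start two containers that both see GPU 0, the same way two ordinary processes would share it.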
I'm not a big fan of taking the openness in machine learning and making it a web based product. For me the "whoa" moment from the new approachable machine learning frameworks is that I can train a TensorFlow network on my computer, then embed it in an Android/iOS app that'll work offline.
Also, much more minor grievance but I really dislike websites that don't work on my 15" laptop, what's going on here? http://i.imgur.com/q13lCLK.png
The service operates at a higher level than EC2, and pricing is calculated on a per-second of compute basis. Comparing prices is going to depend a lot on the specifics of your workload and your affinity for managing infrastructure.
Speaking strictly of infrastructure costs, stripping away any other value of our marketplace and platform, ignoring an on-premise deployment of our platform that uses your own AWS instances, and given a sustained workload that saturates system resources for the lifetime of your instance(s): yes.
For workloads where you aren't making full use of system resources at all times, the economy of scale provided by our compute cluster often results in compute-per-second being more cost effective even before considering the costs of managing your own infrastructure. It fits into the "serverless" trend, FWIW.
Each algorithm in our marketplace has a cost calculator that breaks down the price using a per-API-call estimate. If you have a specific workload in mind, feel free to reach out and we'd be happy to further discuss the pricing.
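The utilization argument above can be sketched in a few lines of Python. This compares an always-on instance billed by the hour with compute metered per second, and finds the fraction of time the instance would have to be busy before it becomes the cheaper option. The function names and both prices are hypothetical placeholders, not real EC2 or Algorithmia rates.

```python
SECONDS_PER_HOUR = 3600.0


def break_even_utilization(instance_hourly_rate, metered_price_per_second):
    """Busy fraction above which the always-on instance is cheaper.

    Returns a value capped at 1.0; a result of 1.0 means the instance
    never becomes cheaper, even when fully saturated.
    """
    if metered_price_per_second <= 0:
        raise ValueError("metered_price_per_second must be positive")
    metered_hourly = metered_price_per_second * SECONDS_PER_HOUR
    return min(1.0, instance_hourly_rate / metered_hourly)


def cheaper_option(utilization, instance_hourly_rate, metered_price_per_second):
    """Return 'metered' or 'instance' for a given busy fraction (0.0 to 1.0)."""
    metered_cost = utilization * metered_price_per_second * SECONDS_PER_HOUR
    return "metered" if metered_cost < instance_hourly_rate else "instance"


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    hourly, per_second = 0.90, 0.001
    share = break_even_utilization(hourly, per_second)
    print(f"Instance wins once it is busy more than about {share:.0%} of the time")
    for u in (0.10, 0.80):
        print(f"utilization {u:.0%}: {cheaper_option(u, hourly, per_second)}")
```

This only covers raw infrastructure cost; it leaves out the cost of managing your own machines, which the comment above counts as a further point in favor of metered compute.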
Of course it is more expensive. They need to pay for something like the equivalent of AWS to provide the service. Did you honestly think it would be competitive with raw compute where you have to do all the work yourself? Maybe if you had a service that could run on a fraction of an instance, but for raw compute + application, have you seen anything as cheap as AWS?
If you're interested in doing this, check out 21 Inc. You can essentially do the same thing, with tutorials on getting set up on EC2, but get paid directly in bitcoin.
Does anybody know if these "free APIs" are actually used to get "free training" for the API owner's models? I mean, is it free as in free beer or as in Facebook?
I was under the impression that training data needed to contain the answer to the question to be effective, while users of these models would be using them to answer questions.
Unless the users of this service then feed whether the answer given by the service was correct back into the service, I don't see how it would help to train their model.
Happy to be corrected by someone with a better understanding of the space.
https://skymind.io/ http://deeplearning4j.org/