Ask HN: How can I set up an ML model as a scalable API?
16 points by rococode on Dec 22, 2019 | 8 comments
I have a custom ML (PyTorch) model that I would like to set up as a service/API: it should be able to receive an input at any time and promptly return an output, and it should scale automatically to thousands of requests per second. The model itself takes around a minute to load; an inference step takes around 100 ms. The model is called only from my product's backend, so I have some control over request volume.
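
(Back-of-the-envelope: at ~100 ms per inference, a single worker handles roughly 10 requests per second, so thousands of requests per second would mean on the order of a few hundred concurrent model replicas, before any batching or GPU parallelism.)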

I've been searching around and haven't found a clear standard/best way to do this.

Here are some of the options I've considered:

- Algorithmia (came across this yesterday, unsure how good it is and have some questions about the licensing)

- Something fancy with Kubernetes

- Write a load balancer and manually spin up new instances when needed

Right now I'm leaning towards Algorithmia as it seems to be cost-effective and basically designed to do what I want. But I'm unsure how it handles long model loading times, or if the major cloud providers have similar services.

I'm quite new to this kind of architecture and would appreciate some thoughts on the best way to accomplish this!




I work on a free and open source project called Cortex that deploys PyTorch models (as well as models from other frameworks) as scalable APIs. It sounds perfect for what you're looking for: https://github.com/cortexlabs/cortex

Cortex automates all of the devops work, from containerizing your model, to orchestrating Kubernetes, to autoscaling instances to meet demand. We have a bunch of PyTorch examples in our repo, if you're interested: https://github.com/cortexlabs/cortex/tree/master/examples
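
To give a rough feel for it, an API is basically a Python predictor plus a bit of YAML config. The sketch below is from memory, so treat the interface as approximate and check the repo's examples for the real thing (the payload shape is whatever you define):

    # predictor.py (sketch only; see the repo's examples for the real interface)
    import torch

    class PythonPredictor:
        def __init__(self, config):
            # Runs once per replica at startup, so the slow model load is paid once.
            self.model = torch.load("model.pt")
            self.model.eval()

        def predict(self, payload):
            # Called per request; payload is the parsed request body.
            with torch.no_grad():
                return self.model(torch.tensor(payload["input"])).tolist()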


You can use SageMaker to deploy ML models: https://aws.amazon.com/sagemaker/

SageMaker takes care of the infrastructure for you. It also integrates with orchestration tools like Kubernetes, Airflow, etc.

https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_...
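
Roughly, deployment with the SageMaker Python SDK looks like the sketch below; the bucket, role ARN, entry point, and instance type are placeholders:

    from sagemaker.pytorch import PyTorchModel

    model = PyTorchModel(
        model_data="s3://my-bucket/model.tar.gz",              # placeholder
        role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
        entry_point="inference.py",  # your load/predict handlers
        framework_version="1.3.1",
    )
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.c5.xlarge",
    )

Endpoint autoscaling is configured separately (via Application Auto Scaling), and since the model takes ~1 minute to load, expect new instances to lag a bit behind traffic spikes.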


Maybe you could try saving the pre-trained model to a storage bucket (e.g. S3) and then using Flask (or whatever framework you like) to create the endpoints. When the Flask app starts, the model can be loaded into memory from the storage bucket, and then you could create, for example, a /predict endpoint that accepts whatever data is needed to make the prediction. Deploy this to some PaaS (Heroku, AWS Elastic Beanstalk, GCP App Engine) that has auto-scaling as a feature and you're sorted.
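
A minimal sketch of that, assuming a TorchScript model at s3://my-bucket/model.pt (bucket and key are placeholders):

    import boto3
    import torch
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    # Pay the ~1 minute load cost once at startup, not per request.
    boto3.client("s3").download_file("my-bucket", "model.pt", "/tmp/model.pt")
    model = torch.jit.load("/tmp/model.pt")
    model.eval()

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()
        with torch.no_grad():
            output = model(torch.tensor(payload["input"]))
        return jsonify({"prediction": output.tolist()})

One caveat with the ~1 minute load: make sure the platform's health check doesn't send traffic to a fresh instance until the model is actually in memory, or the first requests will time out.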


You can go with Kubernetes; it's my preferred tool for this.

With Kubernetes, you can either wrap your model inside a container or mount it into the container from a persistent volume.

As for scaling, you have two options:

1) Horizontal Pod Autoscaler (see the sketch after this list): https://kubernetes.io/docs/tasks/run-application/horizontal-...

2) Knative, which is a serverless solution that runs on your own Kubernetes cluster (on-prem or otherwise).
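
For option 1, a minimal HPA manifest might look like the sketch below; the "model-server" deployment name and the thresholds are placeholders. Given your ~1 minute model load, also give the pods a readinessProbe so new replicas don't receive traffic until the model is in memory:

    # Sketch: scale a hypothetical "model-server" deployment on CPU usage.
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: model-server
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: model-server
      minReplicas: 2
      maxReplicas: 100
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70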


Algorithmia here. What are you concerned about license-wise? You own all IP, always. There are some restrictions if you choose to commercialize on our service (mostly guaranteeing you won't take it down from under your users). The system was built for exactly this. Happy to answer questions.


Oh hi! I was looking at the terms on this page (https://algorithmia.com/api_dev_terms)

The Software License section states:

> You do not transfer ownership of the Software to Algorithmia, but you do hereby grant Algorithmia, in its capacity as the provider of the Services, a worldwide, non-exclusive, perpetual, irrevocable, fully paid-up and royalty free license to use and permit others to use the Software (including the source code if made viewable) in any manner and without restriction of any kind or accounting to you, including, without limitation, the right to make, have made, sell, offer for sale, use, rent, lease, import, copy, prepare derivative works, publicly display, publicly perform, and distribute all or any part of the Software and any modifications, derivatives and combinations thereof and to sublicense (directly or indirectly through multiple tiers) or transfer any and all such rights; <and then some stuff about FOSS>

I'm no lawyer, but my reading of this is that I own the IP, yet by using the platform Algorithmia receives a perpetual and irrevocable license to do as they please with the models, even if they're intended to be private.

Please correct me if I'm mistaken! I've been playing around with Algorithmia and quite like it, but that specific part is a bit off-putting and makes me hesitate to put the most important parts of our product on Algorithmia.


It's not used for private models. It's meant to make sure that if someone builds an application based on your models, you won't yank that version out from under them. If you never expose your model (for profit), you can delete it at will and we have no rights.


TensorFlow has a decent C++ layer, just sayin.

PyTorch? Dunno. Last I spoke to those people, they had a solution too.
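
(For anyone curious, the TensorFlow piece is TensorFlow Serving; the standard way to run it is the official Docker image, with the model path below as a placeholder:)

    docker run -p 8501:8501 \
      --mount type=bind,source=/path/to/saved_model,target=/models/mymodel \
      -e MODEL_NAME=mymodel -t tensorflow/serving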




