Show HN: Deploy and Retrain Cutting Edge ML Image Recognition via REST API (modeldepot.io)
13 points by hsikka on June 12, 2018 | 7 comments



Hey Everyone!

Harsh and I built ModelDepot with the goal of empowering everyday developers to use Machine Learning simply and quickly.

After having thousands of engineers use ModelDepot, we learned that many teams need a simple and effective way to deploy our models and use them for both inference and further training. Deploying should be easy, fast, and on your terms, and training should be effective with less data.

With ModelDepot Percept, you can deploy cutting-edge pretrained image classification models with just one line on any infrastructure you choose. The model can be used immediately for predictions, or it can be effectively trained on your own data with just a few training samples. Interacting with Percept is easy via a REST API, and it costs only as much as the infrastructure you choose to deploy it on.
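
To give a rough feel for the interaction (the endpoint paths, port, and field names below are just illustrative placeholders for this sketch, not the exact documented API):

    # Hypothetical example: a Percept container listening on localhost:8000
    import requests

    # Get a prediction for a single image
    with open("cat.jpg", "rb") as f:
        resp = requests.post("http://localhost:8000/predict", files={"image": f})
    print(resp.json())

    # Fine-tune on your own data with just a few labeled samples
    with open("shiba.jpg", "rb") as f:
        requests.post("http://localhost:8000/train",
                      files={"image": f}, data={"label": "dog"})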

We’ve launched ModelDepot Percept with state-of-the-art image classification, and are quickly moving to include other use cases as well.

Let us know if you have any questions or comments! We're more than happy to chat!

If you want more training and prediction free credits, reply here or tweet us your username @ModelDepot :)


I really think these types of services miss the big picture. The hard part of creating these services in-house is not wrapping things up in a single discrete unit of deployment like a Docker container, and it's not even following research or tutorials for how to actually train a fine-tuned or transfer-learned modification to a popular pre-trained model.

Those are the easy parts, and they're not that interesting to pay someone else to do.

The hard parts are always

(1) your own data cleaning, pre-treatment and ingestion pipeline to get the data ready in the first place and into a conformable and reproducible state for tracking trained models and experiments; and

(2) [most important] writing the acceptance testing and model checking plumbing code to validate that trained models are working as expected on custom use cases with custom KPIs or metrics that business stakeholders want (e.g. basic accuracy metrics built in to most modeling tools only matter to the team training the model -- other people could not care less; they want summarization in 'business' terms and demonstrations of accuracy that reflect exact customer or stakeholder use cases).
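
To make (2) concrete, the plumbing I'm talking about looks less like "report top-1 accuracy" and more like the sketch below (the model/holdout objects, labels, and threshold are all made up for illustration):

    # Hypothetical acceptance check tied to a business KPI, not a benchmark metric:
    # "do we auto-approve fewer than 1% of images a human reviewer would reject?"
    def test_false_approval_rate(model, holdout):
        # holdout: samples labeled by the actual reviewers, not a public dataset
        approved = [s for s in holdout if model.predict(s.image) == "approve"]
        false_approvals = [s for s in approved if s.human_label == "reject"]
        rate = len(false_approvals) / max(len(approved), 1)
        assert rate < 0.01, f"false approval rate {rate:.2%} breaks the 1% requirement"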

Because of (1) and especially (2), you still need to employ expensive in-house machine learning staff to understand how to evaluate your models, even if those models are consumed from a third-party service like this one, or like Google's AutoML or AWS Rekognition.

It's why this idea of "machine learning for people who don't know machine learning" seems so silly and cost-ineffective to me. You still have to pay the salary cost to have in-house modeling experts who can understand diagnostics, bugs, training errors, etc., for these third-party models and who can understand any methodological issues with data preparation and pre-treatment, and who can translate machine learning diagnostic jargon into meaningful metrics and demos for specific stakeholder use cases.

Finally, you also still need some mix of machine learning infrastructure expertise to understand if the cost-per-request of paying for these third party services is actually worth it, especially when adjusted for the level of accuracy you need for your specific application (something that is always suspiciously hard to get information about on the pricing pages of these services).

Oftentimes you don't need to offload for the sake of autoscaling features, and you do need to verify whether the latency and throughput of the third-party service are acceptable for the performance constraints of your specific problem, as well as what overall cost it amounts to in supporting the throughput you expect from your traffic to the service.

Basically, it seems like the value add these services want to advertise is that you can outsource the problem of developing a machine learning model, even when you need to provide your own data for fine-tuning.

But you can only outsource the tiny amount of work that would have been the implementation of some training code and the execution of some training regimen to arrive at the deployable model artifact (e.g. a trained model wrapped in a Docker container).

This is probably only 10% of the work at best, and it's also the "fun part" that the machine learning engineers you still have to employ are going to be grumpy about outsourcing rather than keeping their skills sharp and tailoring the model more closely to your specific business problem.

The other 90% of the work, particularly understanding model diagnostics in an acceptance testing sort of way that ties directly to a real use case, is still there and fundamentally cannot be outsourced unless you're prepared to just fully pay for an entire third party consultant to do the entire project.

Either way, I just don't see why this type of service would be valuable to companies. They might get fooled into thinking that if they have a precocious general engineer who can sort of figure some stuff out, they can offload the remaining effort and save a bunch of money on headcount for expensive machine learning engineers.

But I think they'll quickly get burnt by that idea, and have a weird situation where they thought they paid for a service like this for scalability purposes, but where the lack of in-house machine learning expertise is what really limits scalability, and they end up with a very nasty form of vendor lock-in.


Hey! I think all of those points are 100% valid and I understand where you're coming from.

I'd like to preface this by saying that the product you see today isn't the end goal we're trying to reach; it's very much just the beginning of the vision of easy-to-use ML we want to accomplish, and we hope to address the rest of the pains you've outlined in the future.

1. Data cleaning is the most important and painful thing we experience, and all the companies we talk to experience it as well. "Data cleaning" is often a very broad term and can mean a lot of different things across different companies. It's something we absolutely want to tackle, but it's not on the immediate roadmap (it's still a bit unclear how to bring together all the different problems companies face).

2. This is something we hope to move into in the short term. We understand that production does not mean "ML runs on a server and returns responses that look right." We know it means the model scores well on offline validation data and on online data, and that it can notify you appropriately of shifting input-data distributions. We hope to have tools in place that allow teams to understand the performance of the model in both an offline and a continual online setting, and to be alerted when the model is failing due to unseen feedback loops or drastic differences between training and inference distributions.
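
Just to give a flavor of the kind of check we have in mind (a generic sketch, not our implementation; the feature dictionaries and threshold are placeholders), comparing the live input distribution against the training distribution already catches a lot:

    # Sketch: flag drift with a two-sample KS test per feature/embedding dimension.
    from scipy.stats import ks_2samp

    def check_drift(train_features, live_features, alpha=0.01):
        drifted = []
        for name, train_vals in train_features.items():
            stat, p_value = ks_2samp(train_vals, live_features[name])
            if p_value < alpha:
                drifted.append((name, stat))
        return drifted  # non-empty -> alert on a shifted input distribution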

Where we think this stands today is that any engineer on any team can get started more quickly with validating an ML PoC and get it deployed into production faster. While I understand there's a ton of nuance to "production", sometimes "good enough" is fine for the short term (and there are a lot of tools that can support them with the problems outlined above as well!), while we work out how to support customers even better in the long term.

As for accuracy, one of our core values is transparency; you can read about the technical workings here: https://medium.com/modeldepot/percept-whats-inside-the-ml-co.... This should arm a team with the appropriate terms to search for and help them understand whether our product is the right fit for their use case. We also hope to expand the capabilities and options customers can tune if this exact solution doesn't work for them out of the box. Additionally, users are free to experiment with how to understand the accuracy of the model (beyond a single accuracy metric); we even go a bit into this at the end of our guide here: https://medium.com/modeldepot/apples-oranges-a-machine-learn...
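
For instance, instead of a single accuracy number, a per-class breakdown already tells you much more about whether the model fits your use case (sketch with made-up labels; in practice y_true/y_pred would come from your own test set and the API's predictions):

    # Sketch: per-class precision/recall instead of one overall accuracy number.
    from sklearn.metrics import classification_report, confusion_matrix

    y_true = ["apple", "orange", "apple", "orange"]  # your own ground-truth labels
    y_pred = ["apple", "apple", "apple", "orange"]   # predictions from the API
    print(classification_report(y_true, y_pred))     # precision/recall/F1 per class
    print(confusion_matrix(y_true, y_pred))          # where the mistakes actually land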

As for performance, since it's deployed on your own infrastructure, you can scale horizontally as much as you'd like without hitting arbitrary rate limits. In our own internal benchmarks we get around 6 sustained QPS on a c5.4xlarge and about 12 sustained QPS on a c5.9xlarge. If you'd like to learn more, we can go into more detail about p90s and the like across various concurrent-connection settings.
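
If you want to sanity-check that kind of number on your own hardware, a quick-and-dirty throughput test looks something like this (endpoint, image path, and worker/request counts are placeholders):

    # Rough sustained-QPS check: 8 concurrent workers against the predict endpoint.
    import time, requests
    from concurrent.futures import ThreadPoolExecutor

    def one_request(_):
        with open("sample.jpg", "rb") as f:
            requests.post("http://localhost:8000/predict", files={"image": f})

    start = time.time()
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(one_request, range(200)))
    print(f"{200 / (time.time() - start):.1f} QPS")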

I hope you have a chance to check out the product and leave us more feedback. We're striving to be different from other MLaaS offerings out there, with more transparency in what we do, to empower engineers to leverage ML in a more meaningful way.


How do you enforce request limits when the code is not hosted on your server? And why should I pay tiered based on number of requests if I’m the one paying the infrastructure cost anyway?

(Sorry if I’m misunderstanding your pricing model, only had a quick glance)


The pricing model is there to help support the continual development costs of the platform; it takes time to develop the tech, continually test it with our users, and refine the experience so that it's easy to use out of the box for you. We have a pay-as-you-go structure to make it easier for users who aren't sure about their usage yet to try it out at a very low cost and decide whether it'll work for them; we didn't want a steep initial pricing structure to be a barrier to entry for people exploring ML.

The requests are loosely enforced through our central key server (though there is no request rate limit). While absolutely no training/inference data leaves the container, we do occasionally send back usage metrics to help keep track of usage. If you're interested in a totally isolated solution, we can talk about having on-prem deployed key servers so that your cluster can be completely isolated.

I hope that clarifies everything, let me know if you have any feedback or further questions :)


Usage-based pricing seems very misplaced for a self-hosted solution. Questions of enforceability aside (all I need to do is remove the telemetry code to get it for free), the issue is that usage-based pricing is meant to scale your revenue alongside your costs. When your customer is hosting the infrastructure, you have the same costs regardless of how many requests they make, so as a customer it doesn’t feel right to pay you for each request that my own servers are handling.

A competitor can easily undercut you here. I’d also be interested in hearing of any other company that charges per-request for self-hosted software, because it’s certainly not a model I’ve heard of. Typically the way to approach this is a licensing fee for running the software self-hosted.


I don't think usage-based pricing is esoteric; if you look at products in the enterprise space (GitLab, Splunk, Mongo, etc.), they're all priced off a usage metric of some sort (in our case, API requests). We're making on-prem ML accessible at SaaS prices (try negotiating an on-prem contract at other MLaaS providers). Our costs are continual as we keep improving the product over time, and those improvements are passed down to you as a user.

If you're interested in running the product on a licensing fee instead of pay-as-you-go, feel free to shoot us an email at hi@modeldepot.io :) We found a licensed pricing model to be more restrictive for new ML users and a higher barrier to entry. If you're not interested in using our paid product, you can check out our primary pre-trained ML platform at https://modeldepot.io/browse and hopefully find an ML solution that works for you without paying a cent.



