How does this work under the hood? Is the model loaded every time it receives a request? Is it run in Docker or in a Lambda? How does it work after "uploading it" to Amazon?
Each model is loaded into a Docker container, along with any Python packages and request handling code. The cluster runs on EKS in your AWS account. Cortex reads the declarative configuration in 'cortex.yaml' and creates the deployment each time you run 'cortex deploy', so the containers don't change unless you run 'cortex deploy' again with an updated configuration. This post goes into more detail about some of our design decisions: https://towardsdatascience.com/inference-at-scale-49bc222b3a...
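To make the request lifecycle concrete, here's a rough sketch of the pattern (not Cortex's actual request handling code, just a generic Flask app showing that the model is loaded once at container startup and reused for every request, rather than being loaded per request):

    # A rough sketch of the serving pattern -- not Cortex's actual request
    # handling code. The model loads once when the container starts.
    from flask import Flask, jsonify, request
    from tensorflow import keras

    app = Flask(__name__)

    # Runs once at container startup, not per request.
    model = keras.models.load_model("/model")  # hypothetical path baked into the image

    @app.route("/predict", methods=["POST"])
    def predict():
        features = request.get_json()["features"]
        prediction = model.predict([features]).tolist()
        return jsonify({"prediction": prediction})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)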
From the MLflow Models docs: "An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools."
Cortex is what they are referring to as a downstream tool for real-time serving through a REST API. In other words, MLflow helps with model management and packaging, whereas Cortex is a platform for running real-time inference at scale. We are working on supporting more model packaging formats, and I think it's a good idea to support the MLflow format as well.
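To illustrate the MLflow side of this, here's a small standalone example using the standard mlflow APIs (the model and paths are just placeholders). The saved directory is the "MLflow Model" that a downstream serving tool would consume:

    # MLflow's packaging format in action, independent of Cortex. The model
    # and path names are placeholders.
    import mlflow.pyfunc
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Saves the model with both an sklearn flavor and a generic pyfunc flavor.
    mlflow.sklearn.save_model(model, "iris_model")

    # Downstream tools can reload it through the generic pyfunc interface.
    loaded = mlflow.pyfunc.load_model("iris_model")
    print(loaded.predict(X[:5]))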
Contributor here - Cortex supports TensorFlow SavedModels in addition to ONNX. PyTorch support is on the roadmap. Do you have specific frameworks in mind that you would like Cortex to support?
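For anyone wondering what exporting to those formats looks like, here's a quick sketch using the standard framework export calls (the models are trivial placeholders; exporting a PyTorch model to ONNX is also a common workaround until native PyTorch support lands):

    # Exporting to the two formats Cortex currently accepts; the models are
    # trivial placeholders, only the export calls matter.
    import tensorflow as tf
    import torch

    # TensorFlow SavedModel: writes a directory the serving layer can point at.
    tf_model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    tf_model.save("tf_saved_model")

    # ONNX: a common route for PyTorch models in the meantime.
    torch_model = torch.nn.Linear(4, 1)
    dummy_input = torch.randn(1, 4)
    torch.onnx.export(torch_model, dummy_input, "model.onnx")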
Unfortunately, a somewhat popular Clojure machine learning library on GitHub is also called Cortex, which is going to make discussing machine learning APIs in the context of Clojure that much more confusing.
And I imagine many more machine learning tools will take the same name in the years to come, since it's about the most obvious one you could think of other than "brain".
Calling it an alternative to SageMaker might be a bit misleading, as SageMaker is also a platform for training models on automatically allocated EC2 resources, even spot instances.
Cortex contributor here - you're right; it's more accurate to compare Cortex to SageMaker's model deployment functionality. We are currently working on spot instance support for serving, and training is on our roadmap.