You should be using Terraform as your configuration management language. Please do not invent your own. Your YAML is pretty much a pseudo configuration management language.
That will become one of the biggest blockers to adoption because of the amount of automation that already exists.
By leveraging Terraform, you also get all the other AWS/GCP/Azure components for free, and it is rock stable and production tested.
Thanks for the feedback! We aren't trying to invent another infrastructure provisioning language, and I agree that Terraform would be the right choice if that were the case. Our YAML is more similar to the configuration of deployment tools like Netlify or CircleCI. We use CloudFormation and Kubernetes under the hood, but our goal is to provide a much higher-level abstraction for data scientists / ML engineers.
Not entirely. The abstractions for infrastructure deployment are different from those of a CircleCI configuration YAML.
The declaration of deployment state is a very BIG and hard problem that has had millions of collective man hours spent over decades. I urge you not to think of it as a simple configuration.
In fact, it is so hard that AWS had to build a new language on top of TypeScript (the CDK) rather than rely on the CloudFormation templates it already had.
The Terraform provider idea is interesting, I'll think about it more carefully. Almost all of our deployment configuration under the hood is done with Kubernetes (which is focused on the declaration of deployment state). We modeled our configuration after Kubernetes for that reason. We also want to go beyond low-level infrastructure configuration by letting users configure prediction tracking, model retraining thresholds, and other ML-specific features using the same declarative paradigm and in the same configuration files.
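For readers unfamiliar with the declarative paradigm mentioned here, a rough sketch of the idea: the user states the desired end state, and the tool works out what has to change. The field names (`replicas`, `track_predictions`, `retrain_threshold`) and the `plan` helper are hypothetical illustrations, not Cortex's or Kubernetes' actual schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ApiSpec:
    # Hypothetical fields, chosen only to mirror the features named above.
    name: str
    replicas: int
    track_predictions: bool
    retrain_threshold: float


def plan(desired: ApiSpec, observed: ApiSpec) -> dict:
    """Return {field: (observed, desired)} for every field that differs."""
    want, have = asdict(desired), asdict(observed)
    return {key: (have[key], want[key]) for key in want if want[key] != have[key]}


desired = ApiSpec("classifier", replicas=3, track_predictions=True, retrain_threshold=0.9)
observed = ApiSpec("classifier", replicas=1, track_predictions=True, retrain_threshold=0.9)

changes = plan(desired, observed)
print(changes)  # {'replicas': (1, 3)}
```

The user never writes "scale up by two"; they only declare `replicas=3`, and the difference from the observed state determines the action.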
Well, CDK actually produces CloudFormation templates. Sorry, but I always feel the urge to jump in when people claim Terraform should be used instead of CloudFormation because of personal preferences. If you are AWS native and already using CloudFormation, I see no reason to switch. CloudFormation provides a ton of functionality out of the box and Amazon handles it for you. Rollbacks alone are a huge reason one might want to use it over Terraform.
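To make the "produces CloudFormation templates" point concrete, here is a minimal sketch of a program in a general-purpose language emitting a template. This is not the CDK API; `bucket_template` and the logical IDs are made up for illustration. Only the template keys (`AWSTemplateFormatVersion`, `Resources`, `Type`) and the `AWS::S3::Bucket` resource type come from CloudFormation itself.

```python
import json


def bucket_template(logical_ids):
    """Build a CloudFormation template dict with one S3 bucket per logical ID."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            logical_id: {"Type": "AWS::S3::Bucket"} for logical_id in logical_ids
        },
    }


# Hypothetical logical IDs; a loop replaces copy-pasted template blocks.
template = bucket_template(["ModelArtifacts", "PredictionLogs"])
print(json.dumps(template, indent=2))
```

The output is an ordinary template, so deployment and rollback would still be handled by CloudFormation.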
You should be using an existing scripting language as your configuration language.
Seriously, every single fucking stupid infrastructure deployment tool/"platform"/whatever has its own dumb in-house language that winds up basically being a bad re-implementation of the programming language the tool is written in.
- Puppet: Has a stupid ad-hoc config language.
- Terraform: Has a stupid ad-hoc config language.
- SaltStack: Has a stupid ad-hoc config language.
- Ansible: Has a stupid ad-hoc config language.
If you're even considering implementing a tool like this, use a goddamn existing language for your configuration files.
You don't need to use the entire language, but at least use the language's lexer/parser (cf. JSON/JavaScript). That way, all existing tooling for the language will work for the config files (ask me about how SaltStack happily breaks their API because you're not "supposed" to use it, despite the fact that they have public docs for it). Additionally, people won't need to figure out all the stupid corner cases in your terrible piecemeal configuration language.
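As one possible illustration of "use the parser, not the entire language": Python's standard `ast.literal_eval` accepts only literal syntax (dicts, lists, strings, numbers, and so on) and rejects anything else. The config keys and the `load_config` helper below are hypothetical.

```python
import ast

# A config file written in a subset of Python: literals only.
CONFIG_SOURCE = """
{
    "name": "classifier",   # comments come for free with Python syntax
    "replicas": 3,
    "regions": ["us-west-2", "us-east-1"],
}
"""


def load_config(source: str) -> dict:
    """Parse a config using Python's own parser, allowing literals only."""
    config = ast.literal_eval(source.strip())
    if not isinstance(config, dict):
        raise ValueError("config must be a dict literal")
    return config


config = load_config(CONFIG_SOURCE)
print(config["replicas"])  # 3

# Function calls are not literals, so arbitrary code is rejected.
try:
    load_config("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

Editors, linters, and formatters that understand Python already understand this file, with no custom grammar to maintain.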
Additionally, by making your configuration language an actual language, you also simplify a lot of the system design, because the configuration can act directly against your API. This means using your tool from other tools becomes much more straightforward, because the only interface you actually need is the API.
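A minimal sketch of what "the configuration can act directly against your API" might look like. `Client` is a hypothetical in-memory stand-in, not any real tool's API; the point is only that the config is ordinary code calling the same interface other tools would call.

```python
class Client:
    """Hypothetical in-memory stand-in for a deployment tool's API."""

    def __init__(self):
        self.deployments = {}

    def deploy(self, name: str, replicas: int = 1) -> None:
        # Record the requested deployment; a real client would call a service.
        self.deployments[name] = {"replicas": replicas}


def configure(client: Client) -> None:
    # This function is the configuration: loops and variables work as usual.
    for env, replicas in [("staging", 1), ("prod", 3)]:
        client.deploy(f"classifier-{env}", replicas=replicas)


client = Client()
configure(client)
print(client.deployments)
```

Because `configure` only touches `Client`, another tool can drive the same API without going through a separate config format.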
Pulumi made the mistake of immediately making remote state a paid-only feature. Even if it's not actually paid-only, all the recent marketing I looked at suggested that everything useful required payment; for getting started with a project, that's a non-starter.
On top of that, most of the worst parts of Terraform are no longer an issue with 0.12.
It's completely possible to host your state in S3 or on a filesystem; it takes a bit of setup and there may be a few rough edges, but the effort or subscription is completely worth it. The secrets management alone makes it worth it, but their programming model is definitely the future. I think the fact that AWS just released their Cloud Development Kit is strong validation of the approach.
My understanding is that Seldon and Kubeflow are more geared towards infrastructure engineers. Our goal is to hide the infrastructure tooling so that Kubernetes, Docker, or AWS expertise isn't required. Cortex installs with one command, models are deployed with minimal declarative configuration, autoscaling works by default, and you don't need to build Docker images / manage a registry.