I really dislike the idea of declarative infrastructure. It's literally a program that is designed to do one thing, but will change a million things in order to do that one thing. It's Configuration Management for Infrastructure. Yet so many people have this idea that it's something else, like it's supposedly simpler or more effective.
Saying I want an S3 bucket named "foo" is the same as
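Presumably something like this one-liner with the standard AWS CLI:

```shell
# Imperative equivalent: one command, bucket exists.
aws s3 mb s3://foo
```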
Did I need a big fat declarative infrastructure system to make that? No. But people want more complexity and features, and they want to make it look simple. So they write big fat applications and libraries to do that. The idea that there's some inherently superior difference of "declarative programming" over "regular programming" is giving people the idea that wrapping everything up in interfaces somehow removes the complexity, or somehow ends up in a better program.
This example is really simple -- it gets more complicated when you want to check things that don't serialize perfectly to strings you can easily grep for.
Once you start writing complex scripts you have a choice -- you either do it imperatively, or declaratively. Eventually you'd come to the fact that it doesn't make sense to just... run imperative commands when you can't guarantee that the other end is idempotent, so you'd arrive at:
- (Optionally) take a lock on performing changes
- Check existing state
- Perform your changes based on existing state
- (Optionally) release your change lock
And voila, we're at complexity. I'd argue that this complexity is essential, not accidental, given the goal of making an easy-to-use system that ensures state.
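That check-then-apply loop can be sketched in a few lines, using a hypothetical in-memory "cloud" to stand in for a real provider API:

```python
import threading

# Hypothetical in-memory "cloud" standing in for a real provider API.
cloud_state = {}            # resource name -> properties
state_lock = threading.Lock()

def ensure_bucket(name, props):
    """Idempotently ensure a bucket exists with the given properties."""
    with state_lock:                      # (optionally) take a change lock
        current = cloud_state.get(name)   # check existing state
        if current == props:
            return "no-op"                # already converged, do nothing
        action = "create" if current is None else "update"
        cloud_state[name] = dict(props)   # perform changes based on state
        return action                     # lock released on exit

print(ensure_bucket("foo", {"versioning": True}))  # create
print(ensure_bucket("foo", {"versioning": True}))  # no-op
```

Calling it twice with the same input is a no-op, which is the whole point: the idempotence lives in the check-before-change logic, not in anything magically "declarative".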
> Once you start writing complex scripts you have a choice -- you either do it imperatively, or declaratively.
I don't think declarative programming exists. I think it's just a regular old program with a poorly defined interface. Moreover, I think the claims of idempotence are overblown to the point of near falsehood.
Declarative Infrastructure is really just Configuration Management applied to cloud infrastructure rather than operating system software. Neither has really solved anything, other than turning the management of complexity into a Sisyphean task: forever pushing drifting state back up the hill.
Compare this to Immutable Infrastructure, where state never drifts. One never "fixes" a container once deployed, or a package once built and installed. One merely rolls back or upgrades. Any uncertainty is resolved in the build and test process, and in providing both all the dependencies and the execution environment.
I think eventually people will wise up to the fact that Terraform is just Puppet for infrastructure. I think the real fix is to make the infrastructure less like an operating system and more like versioned packages. Install everything all at once. If anything changes, reinstall everything. Never allow state change.
SQL is due for replacement. The combination of schema and data in one constantly mutating hodgepodge, with no atomic immutable versioning or rollback, is absolutely ancient. Migrations are an okay hack but definitely not good enough.
ZFS and LVM prove filesystems can do snapshots and restores of filesystem history without a lot of pain, so clearly we just need more work here to make it an everyday thing. Versioning should be the default, and probably also an infinite transaction log, seeing as capacity and performance are ridiculous now.
And couldn't we lock writes, revert a perpetual write journal/transaction log to some previous version, and then continue a new write history tree? If you run out of space, overwrite the old history. If you don't run out of space, allow reverting back.
And allow bulk atomic updates by specifying a method to write files that aren't 'live' until you perform some ioctl, and then atomically expose them and receive the new history version. Then you could do immutable version-controlled storage on a filesystem, right?
Blob/object stores should be much simpler to do the same with. Just an API call rather than an ioctl.
In this way, replacing a data store immutably will just be replacing a reference to a storage version, the same as using snapshots, but built into the filesystem/API.
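One way to picture that versioned store is an in-memory transaction log where the "live" version is just a movable reference (class and method names here are hypothetical):

```python
class VersionedStore:
    """Append-only store: every commit is a new immutable version."""

    def __init__(self):
        self.log = [{}]          # version history; version 0 is empty
        self.head = 0            # reference to the current live version

    def commit(self, changes):
        """Atomically expose a batch of writes as a new version."""
        snapshot = dict(self.log[self.head])
        snapshot.update(changes)
        self.log = self.log[: self.head + 1]  # start a new history tree here
        self.log.append(snapshot)
        self.head += 1
        return self.head

    def revert(self, version):
        """Roll the live reference back without destroying history."""
        assert 0 <= version < len(self.log)
        self.head = version

    def read(self, key):
        return self.log[self.head].get(key)

s = VersionedStore()
s.commit({"a": 1})
s.commit({"a": 2, "b": 3})
s.revert(1)
print(s.read("a"))  # 1
```

Reverting never mutates old versions; committing after a revert just begins a new history tree, which is the "continue a new write history" behavior described above.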
Hm, isn’t there still nondeterminism from a path dependency, because reinstalling a datastore that has arbitrary history isn’t exactly equivalent to creating a datastore with none?
> I don't think declarative programming exists. I think it's just a regular old program with a poorly defined interface. Moreover, I think the claims of idempotence are overblown to the point of near falsehood.
I think this really depends on how you define the term "declarative programming" -- pinning down a singular meaning and a singular interpretation is really hard. If we think about it as a spectrum, there's a clear difference between Ansible and Terraform, like there is between Python and Prolog. That's "declarative" enough for me.
Idempotence is also really tricky and hard -- I'm not surprised most large codebases can't handle it, but getting close is definitely worth something.
> Declarative Infrastructure is really just Configuration Management applied to cloud infrastructure rather than operating system software. Neither has really solved anything, other than turning the management of complexity into a Sisyphean task: forever pushing drifting state back up the hill.
While I agree that declarative infrastructure is configuration management applied to cloud infra (especially in the literal sense), I would argue that they have solved things. In the 90% case they're just what the doctor ordered compared to writing every Ansible script yourself (or letting someone on Ansible Galaxy hand it to you) -- and Ansible actually supports provisioning! The thing with this declarative infrastructure push is that it's encouraged the companies themselves to maintain providers (with or without the help of zealous open source committers), so now someone else is writing your Ansible script, and it has a much better chance of staying up to date.
> Compare this to Immutable Infrastructure, where state never drifts. One never "fixes" a container once deployed, or a package once built and installed. One merely rolls back or upgrades. Any uncertainty is resolved in the build and test process, and in providing both all the dependencies and the execution environment.
People are often using these two concepts in tandem -- the benefits of immutable infrastructure are well known, and I'd argue that declarative infrastructure tools make this easier to pull off, not harder (again, because you don't have to write/maintain the script that puts your deb/rpm/VM image/whatever on the right cloud-thing).
> I think eventually people will wise up to the fact that Terraform is just Puppet for infrastructure. I think the real fix is to make the infrastructure less like an operating system and more like versioned packages. Install everything all at once. If anything changes, reinstall everything. Never allow state change.
Agreed, but I'm not sure this is very practical, and there's a lot of value in going part of the way. There is a lot of complexity hidden in "reinstall everything" and "never allow state change" -- getting that working without downtime usually requires the cooperation of the systems involved, and you'll never get away from the fact that some efficiency is lost.
But again, we were talking about the scripts you'll have to write -- in a world that is not yet ready for fully immutable infrastructure, it's just a question of how you write the scripts, not whether an option exists that will prevent you from writing them altogether (because there isn't; most things are not fully immutable-ready yet).
> there's a clear difference between Ansible and Terraform
The only difference I can see is that Terraform attempts more of an estimation of what might happen when you apply. Otherwise they're the same.
Terraform has multiple layers of unnecessary complexity which were added with good intentions (the belief that you could "plan" changes before applying them) but don't actually work in practice. Your state file never reflects the actual state, so it's pretty much meaningless. The plan step is (in theory) supposed to tell you what will happen before you hit apply. But actually knowing that beforehand is impossible.
Part of that is the fault of the providers that don't do the same validation as the actual APIs you're calling do. But the other part is the fact that the system is mutable; it's always changing, so you can never know what will happen until you pull the trigger. The only way to say "only apply these changes if they will actually work" is to move the logic into the system, turning them into transactions (à la SQL).
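The drift problem is easy to demonstrate without any cloud at all: a plan computed against a recorded snapshot of state goes stale before apply runs (all names here are hypothetical):

```python
# Hypothetical sketch: a "plan" diffed against recorded state goes stale
# before "apply" runs, because the real system is mutable.
actual = {"bucket": "foo"}        # the real system
state_file = dict(actual)         # recorded state at last refresh

desired = {"bucket": "foo", "tag": "prod"}

# "plan": diff the desired config against the recorded state
plan = {k: v for k, v in desired.items() if state_file.get(k) != v}
print(plan)  # {'tag': 'prod'}

# meanwhile, someone changes the real system out of band
actual["bucket"] = "foo-renamed"

# "apply" executes the stale plan; the outcome differs from what plan showed
actual.update(plan)
print(actual)  # {'bucket': 'foo-renamed', 'tag': 'prod'}
```

The plan promised a one-field change to a bucket named "foo"; the applied result is something plan never showed, because nothing stopped the world from moving between the two steps.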
Honestly, the only reason I use Terraform at all is because writing a bunch of scripts is not scalable. With large teams, you have to use some kind of uniform library/tooling to manage changes. Terraform is currently the best "free" option for that, but I don't find Ansible any more or less reliable, it's just more annoying to use. I definitely don't use them for any "declarative" approach they may have. And in fact, for regular app deployments, I actively do not use Terraform/Ansible at all, and instead write deployment scripts that can manage my particular deployment model requirements. I intentionally abandon the "declarative" model because it's so uncertain (and unwieldy).
> The thing with this declarative infrastructure push is that it's encouraged the companies themselves to maintain providers (with or without the help of zealous open source committers), so now someone else is writing your Ansible script, and it has a much better chance of staying up to date.
I agree with you here, it's very good that companies can invest in supporting a provider so people can benefit from common solutions. I'm not sure that is specific to declarative infrastructure as much as just being more proactive about supporting the people using their services, though. For example, NewRelic didn't have a Terraform provider until one of their customers wrote one, and eventually they took it over. It's still not great (I have to supplement a lot of missing features with custom scripts calling their APIs directly), but it's better than nothing.
Infrastructure should be defined in an easily digestible, human-readable format.
Your manifests serve two purposes: defining infrastructure and self-documenting it.
While you can achieve the same infrastructure automation with shell scripts, they're rarely written well enough to be easily understood, introducing operational risk when handed off to other people or teams.
Documentation needs to express the intent of the author and how they arrived at a solution -- and, more importantly, why they arrived at that solution. As someone who's had to clean up "self-documented" code, I can say unequivocally it will be a disaster. A decade from now we will be untangling thousands of lines of some ancient Python library to understand the intent of infrastructure that could have otherwise been properly documented in 5 minutes.
Yes, but AWS CLI commands change over time, and there's no native way of pinning which version of the CLI you use. Also, you have to maintain that knowledge for however many things you have to do across however many providers.
The point of Terraform isn't to add complexity, it's to have a general way of interacting with a vast number of APIs that's effectively the same everywhere, and to abstract away the tribal knowledge of how each individual API works.
On the same provider version, you can generally expect Terraform to work the same over time (okay, this is less true for, say, the Google provider...) even as the CLI keeps evolving.
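For what it's worth, that pinning story is at least explicit in Terraform: version constraints live in the config itself (version numbers here are illustrative):

```hcl
terraform {
  required_version = "~> 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```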
It's still helpful to understand the providers and their CLIs, but Terraform is a substantial force multiplier because of how generic it is across the absurdly long list of APIs that it talks to. That is where its value lies.
But it's not generic. I have to track the provider version, the Terraform version, my module version, and any sub-module versions, long-term. Each internal team has to jump through hoop after hoop just to run terraform apply reliably.
I've never yet had to rewrite a shell script that used a new version of AWS CLI. It's very possible that that's only because I've not been using it enough. But even that would be just one level of complexity to manage, rather than four.
And in fact, even within a single provider, interfaces aren't generic at all.
You have to write every single resource of every single provider to be specific to its definition. It would be the same amount of work if you were writing a shell script with curl to plug into each API call. I know, because it was actually faster for me to write a Bash implementation of NewRelic's APIs than to figure out its ridiculous Terraform provider and resources with no documentation.
The only benefit of Terraform is that I don't have to write the API implementation [for most providers].