Where I last worked, all terraform changes went through PR, requiring approval, ...

arlk · on July 8, 2021

Same, though not with Atlantis: we used GitLab CI and Jenkins steps that required an approval for any change in production, while staging changes were auto-deployed. The Terraform plan output was posted to the PRs using tfnotify[0]. Normal deployments typically took 1 minute and 20 seconds (for each environment, in parallel), which I consider very reasonable given that we deployed a medium-size infrastructure with only two Terraform layers, though there was still room for optimization.

[0]: https://github.com/mercari/tfnotify

Pokepokalypse · on July 8, 2021

This is the way.

(We use GitLab CI's built-in review process for approval.)

At the end of the day, approval's still a human job, and humans make mistakes. Right? :D

Terraform is an incredibly powerful tool, and you can make some monumentally huge mistakes with it.

yashap · on July 9, 2021

Atlantis was actually created at my previous workplace by a couple of my ex-coworkers! Agreed that it's a great way to bring a bit more care and rigour to always-dangerous infrastructure changes. IIRC we had it configured so that you always had to do things in this order:

- plan against staging

- get a PR approval

- apply against staging

- plan against prod

- apply against prod

- merge
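The ordering above is essentially a checklist that a single PR must walk through in sequence. A minimal sketch of how such a gate could be enforced, assuming a hypothetical `PullRequest` record that tracks completed steps (this is an illustration, not Atlantis's actual configuration or API):

```python
from dataclasses import dataclass, field

# The order from the list above; a step may run only after every earlier step is done.
REQUIRED_ORDER = [
    "plan_staging",
    "approve",
    "apply_staging",
    "plan_prod",
    "apply_prod",
    "merge",
]


@dataclass
class PullRequest:
    """Hypothetical record of which workflow steps a PR has completed."""
    completed: list = field(default_factory=list)

    def next_step(self):
        """Return the next allowed step, or None when the workflow is finished."""
        if len(self.completed) == len(REQUIRED_ORDER):
            return None
        return REQUIRED_ORDER[len(self.completed)]

    def run(self, step):
        """Record `step`, refusing anything that is out of order."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"cannot run {step!r}; next allowed step is {expected!r}")
        self.completed.append(step)


pr = PullRequest()
pr.run("plan_staging")
try:
    pr.run("apply_prod")  # skipping review and staging is rejected
except ValueError as err:
    print(err)  # prints: cannot run 'apply_prod'; next allowed step is 'approve'
```

The point of the design is that "apply against prod" is simply unreachable until a reviewed plan has already gone through staging.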

Being forced to plan (and get someone to review said plan) before applying makes it far, far less likely you'll do the level of damage described in this blog post.

arlk · on July 9, 2021

In my experience, per-environment branches are a bad practice that eventually becomes a big burden, especially when environments don't match. The concept of "staging" in infrastructure is different from staging in code, which is a common source of confusion.

The best strategy is to keep your modules in their own repository, so that each environment can pin the module version[0] it uses, and to separate environments into folders.

[0]: https://www.terraform.io/docs/language/modules/sources.html#...

yashap · on July 10, 2021

Yeah, we just had a single feature branch, which we would merge into the single master branch. We'd simply apply it to staging first, make sure nothing terrible happened, then apply it to prod. All the steps I listed above happened on the same branch, in the same PR.

carrja99 · on July 8, 2021

Atlantis is great. Once you've grown beyond five or so engineers, there's no excuse for running terraform apply from laptops.

throwaway290232 · on July 8, 2021

It's completely embarrassing how many engineers we have while we still apply manually from laptops. Changes are slow and error-prone, and we don't even have them hooked up to CI/CD. I think it still works because we have so many damn engineers and we don't actually need to change infrastructure multiple times a day.

That said, Terraform breaks so often that if we automated it all, we'd have a million more Git commits from trying to fix broken applies.