
I'm one of the HashiCorp founders.

Terraform 0.11 to 0.12 is by far the most difficult of the versions to upgrade between. I am really sorry about that. The other upgrades should be relatively minor as long as you read and follow the upgrade guides and upgrade one minor version at a time (0.11 => .12 => .13 etc.). There are rough edges for very specific cases but most of our customers were able to upgrade from 0.12 to subsequent versions same day without issue.

Breaking changes and difficult upgrades are not something we want to do with Terraform (0.12 being the big exception, since that was a very core "reset," so to speak). The reason for the changes in these recent releases is that we've been getting Terraform into a place for 1.0 where we won't have to have difficult upgrades.

You can see this path happening in each release:

- Terraform 0.15: state file format stability

- Terraform 0.14: provider dependency lock file

- Terraform 0.13: required provider block (for more deterministic providers)

- Terraform 0.12: stable JSON formats for plans/config parsing/etc. (particularly useful for interop with things like Terraform Cloud and 3rd party tooling)

This was all done to lead up to a stable 1.0 release.
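For reference, the 0.13-style required provider block mentioned above looks roughly like this (the provider name and version constraints here are just illustrative):

```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    aws = {
      source  = "hashicorp/aws"   # explicit registry source, introduced in 0.13
      version = "~> 3.0"          # constraint recorded in the 0.14 dependency lock file
    }
  }
}
```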

As noted in the blog post, 0.15 is effectively a 1.0 pre-release so if everything goes well, we've MADE IT. For 1.0, we'll be outlining a number of very detailed compatibility promises which will make upgrades much easier going forward. :)




Our teams have something like 100,000 LOC in Terraform 0.12, and it's not all in one big monorepo. At that scale there is no such thing as a relatively minor version upgrade.

We want to upgrade to get away from some persistent 0.12 bugs, but we literally don't have the time. We have to change all of the code, and then test every single project that uses that code in non-prod, and pray that the testing finds most of the problems that will appear in production. And it's all owned by different groups and used in different projects, so that makes things longer/more complex. We also have to deal with provider version changes, upgrading CI pipelines and environments to be able to switch between Terraform binaries, and conventions to switch between code branches.

I am already looking around for some way to remove Terraform from our org because it is slowly strangling our productivity. It's way too slow, there are too many footguns, it doesn't reliably predict changes, it breaks on apply like half of the time, and it's an arduous manual process to fix and clean up its broken state when it does eventually break. Not to mention just writing and testing the stuff takes forever, plus the very obvious missing features like auto-generation and auto-import. I keep a channel just to rant about it. After Jenkins, Ansible and Puppet, it's one of those tools I dread but can't get away from.


You can use tfenv to upgrade individual workspaces one at a time. You don't need to do a big bang upgrade.

Note that upgrading to 0.13 is quite easy; Terraform actually has a subcommand that does most of the work for you (usually no additional steps required).
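A per-workspace upgrade with tfenv can be sketched like this (paths and versions are examples only; check the official upgrade guide for your setup):

```shell
# pin this workspace to a specific version (tfenv reads .terraform-version)
cd infra/network
echo "0.13.7" > .terraform-version
tfenv install 0.13.7

# Terraform 0.13 ships a helper subcommand that rewrites provider
# requirements in the current directory's configuration
terraform 0.13upgrade .
terraform init
terraform plan   # verify the plan is a no-op before applying anything
```

Because the pin lives in each workspace directory, other workspaces keep running their old binary until you get to them.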

> I am already looking around for some way to remove Terraform from our org because it is slowly strangling our productivity.

The only real alternative you have is Pulumi. All other options are, in my opinion, way worse. You can use Ansible, which is even worse because you have to manage Ansible version upgrades and have no way of figuring out what changes will be made (yes, --diff is usually useless). You can manage things manually, but good luck. Lastly, your option is CFN (or the Azure/GCP equivalent), but then you have no way of managing anything outside of that cloud environment.


There is no solution where 100k LOC is not going to be challenging to maintain over time.


While it's not possible to make an apples-to-apples comparison (Terraform-to-?), if we compare to something based on an imperative language, say Puppet or Chef, there is a huge difference.

In my opinion, Terraform's big issue is that it was born as a declarative tool for managing infrastructure. Large configurations (IMO) necessarily ossify, because you don't have an imperative language that makes small progressive changes tolerable - it's a giant interrelated lump.

What's worse, when it grows, one needs to split it into different configurations, and one loses referential safety (resources will need to be linked dynamically).
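Concretely, once a configuration is split, cross-references typically go through something like terraform_remote_state, where a checked reference becomes a runtime output lookup (the backend settings and resource names below are placeholders):

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "example-tf-state"          # placeholder bucket name
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# The link is now a stringly-typed output lookup instead of a direct,
# statically checked resource reference:
resource "aws_instance" "app" {
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
  # ...
}
```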

A Chef project of equivalent size, say, can be changed with more ease, even if it's in a way even less safe, because you have the flexibility of a programming language (of course, configuration management frameworks like that have a different set of problems).

I'm really puzzled by the design choice of a declarative language. Given my experience with configuration management, it's obvious to me that a declarative language is insufficient and destined to implode (and to make projects implode). Look at the iterative constructs, for example, or the fact that some entities like modules have taken a long time to become more first-class citizens (for example, we're stuck with old-style modules that are hard to migrate).


> compare to something based on an imperative language, say Puppet or Chef

I'm puzzled by this comparison. I consider both of these to be primarily declarative languages. You declare the state you want puppet or chef to enforce, not how they get there.

E.G. https://puppet.com/blog/puppets-declarative-language-modelin...


I've indeed stretched the concept by equating Chef and Puppet (I guess the latter is closer to TF).

To be more accurate, I'd say that Chef has a declarative structure supported by the imperative constructs of the underlying language, and this is what makes for me a big difference.

Consider the for loop as an example. By the time it was added (v0.12), there was already a 200-page commercial book available. And there are people in this discussion still stuck on v0.11.

The difference in the declarative vs. imperative nature, as I see it now that the for loop is implemented in TF, is that it's embedded inside resources - that is, it fits strictly within a declarative approach and has limits. In Chef, you can place a for loop wherever you prefer.
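To illustrate the embedding point: in Terraform a for expression can only appear inside another expression, and repeating a whole resource goes through for_each declared on that resource (the names and values here are made up):

```hcl
variable "names" {
  type    = list(string)
  default = ["a", "b"]
}

locals {
  # 'for' lives inside an expression, never as a standalone statement
  upper_names = [for n in var.names : upper(n)]
}

resource "aws_s3_bucket" "per_name" {
  # repetition is declared on the resource itself (for_each, 0.12.6+)
  for_each = toset(var.names)
  bucket   = "example-${each.key}"   # placeholder naming scheme
}
```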

Object instances are another significant difference. It took a while for TF to be able to move (if I remember correctly) module instances around (that is, to promote them to "more" first-class citizens), which made a big difference. In an imperative language, accessing and moving instances around is a core part of the language. In Chef, pretty much everything is global - for good and for bad. But certainly the good part is that refactoring is way more flexible.

I think TF has always been plagued by repetition; in my view, this is inherent in the "more" declarative approach (since they're trying to embed imperative constructs in the language).


I have really bad memories of the change between puppet 2 and 3 for example.


Same. I went through a puppet 2->3 migration and also through a terraform 0.11->0.12 update.

The puppet migration was definitely more painful, because of the entangled code.


It's not clear whether it was entangled because it was written in that specific framework or because it was just badly written code. In the latter case, it hasn't really anything to do with the framework. Additionally: did you make the v0.12 migration just work, or did you change the codebase to take advantage of the new features (and remove inherent duplication)?

There are inherent problems in the TF framework and its migrations. 0.12 introduced for loops, and 0.13 extended that support to modules. So a proper migration should deduplicate resources into lists of resources. This is painful for big codebases, since one needs to write scripts to convert resource associations in the statefile. And hope not to miss anything!

Due to the strictly declarative nature, it's also difficult to slowly move duplicated resources into lists and handle both forms at the same time.
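The statefile surgery described above is typically done with terraform state mv, one address at a time (the resource and module addresses below are hypothetical):

```shell
# move each standalone resource into its slot in the new list form,
# so the next plan doesn't try to destroy and recreate it
terraform state mv 'aws_instance.db_primary' 'aws_instance.db[0]'
terraform state mv 'aws_instance.db_replica' 'aws_instance.db[1]'

# the same applies to modules once 0.13 allows count/for_each on them
terraform state mv 'module.app_a' 'module.app["a"]'
```

Each move must match the new addresses in the code exactly; a missed one shows up as a destroy/create pair in the next plan.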

At this time, our team is stuck on a certain TF version and can't move without spending considerable resources.


Yeah, same boat. We ended up doing several complete rewrites and finally giving up. My main grievance is HCL; it's so close to, yet so far from, an actual programming language that it drives me mad, even after a few kilolines of it in prod. We ended up going with Pulumi, which so far has served us well.


It seems that Terraform CDK has been introduced to compete directly with Pulumi.

I think both are a great idea as the DSL has given me so many headaches over the years.


Tangential, but curious: how did you get to 100k lines of TF? I'd imagine most things within your company would follow very similar patterns and therefore be extracted into modules, and the per app/team code would be relatively small and focused on how to compose these modules together.


Modules are useful only up to a point. Creating complex modules with a ton of moving parts makes it difficult to make changes, to upgrade, etc. The best recipe that I've found is to use modules to encapsulate some core functional component and then compose these modules to build infrastructure, rather than defining your entire stack in a single module.
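A sketch of that composition style, with small functional modules wired together at the root rather than one mega-module (the module names, paths, and outputs are invented):

```hcl
module "network" {
  source = "./modules/network"   # small, single-purpose module
  cidr   = "10.0.0.0/16"
}

module "service" {
  source    = "./modules/service"
  # composition happens at the root: outputs of one module feed another
  vpc_id    = module.network.vpc_id
  subnet_id = module.network.private_subnet_ids[0]
}
```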


We also found TF 0.12 to be quite slow. But this was fixed in 0.13, and now it feels lightning fast compared to before.


> and it's not all in one big monorepo

There's your first problem.


Thanks, good to know that the upgrade to 12 is the biggest jump.


I had the same question or concern. I also realized too late that 0.12 was a bigger jump than it first seemed. I was not severely impacted in the end, but boy, it had been a long time since I'd experienced such a tough upgrade. Happy to know that the hardest part is behind us, and looking forward to trying 0.15. Thanks!




