Terraform 1.0

solatic · on June 8, 2021

Terraform is such an underappreciated tool. It seems like so much of the hate surrounds HCL1 (back in Terraform before 0.12) and doesn't reflect modern Terraform.

For example, after introducing `for_each` and dynamic blocks, it's possible to nearly entirely ditch variables files and local modules, and just add more infrastructure by editing a local YAML file. The only variables your Terraform code should have should be credentials / other secrets that are not loaded from environment variables by providers. A great public example of this usage pattern is supplied by https://github.com/concourse/governance to manage their GitHub repositories.

majormajor · on June 8, 2021

My problem with this approach is that it's still too much "infrastructure as data" and not "infrastructure as code." Moving infrastructure data into flat files is not a clear-cut win over having it in a database - you get easier version control with external tools like git, but you lose everything that makes a database a joy to work with instead of flat files, like schema validation and easy queries, etc.

Things like for_each and variables exist because "infrastructure as data" would be incredibly tedious and brittle and hard to extend, but an approach that tries to get to "infrastructure as code" by starting with a data format instead of a programming language just seems like too big a gap to cross. I haven't seen a lot of teams unit testing their terraform, for instance.

hpoe · on June 8, 2021

But at the end of the day your infrastructure is essentially data, not code. Your infrastructure is permanent: it exists even if it isn't being used, and it has inertia. At the end of the day your "infrastructure" is really just an entry in a database of a cloud provider; it is data, not code.

I think we are seeing things come full circle again where people are finding the limitations of declarative infrastructure tools and decreeing declarative infrastructure dead and moving back to imperative infrastructure tools like Salt or Ansible.

Does anyone else feel that the infrastructure tooling environment/space is in the same place the JS world was 5 years ago?

throwaway894345 · on June 8, 2021

> But at the end of the day your infrastructure is essentially data, not code. Your infrastructure is permanent: it exists even if it isn't being used, and it has inertia. At the end of the day your "infrastructure" is really just an entry in a database of a cloud provider; it is data, not code.

That may well be true, but it doesn't solve the problem (note also that HTML is just data, but we don't typically expect people to copy/paste the same HTML blob for every blog entry they write nor do we expect them to update each of them when they need to make a change):

We often have N very similar, large, complex YAML/HCL/etc objects that we want to manage with Terraform. If we need to make a change to all of them, we have to update N different places. Keeping these in sync is tedious and error prone. So we need to be able to factor out the common code into some reusable unit that accepts the bits that vary as parameters. Terraform's notion of "modules" is a great big acknowledgement of this need, although it's amazing that the whole time they were building this no one thought to themselves "guys, this seems really heavyweight and cumbersome for what ultimately is just a function" (and that general failure to notice that they were accidentally building a fully fledged programming language seems like an apt summary of Terraform's development).

Note also that there's nothing special about infrastructure as code here, this is a general application of the DRY principle.
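To make the "a module is ultimately just a function" point concrete, here is a minimal Python sketch. The function name `service_config` and every field in it are hypothetical, chosen only for illustration; nothing here is Terraform's API.

```python
from typing import Any


def service_config(name: str, port: int, replicas: int = 2) -> dict[str, Any]:
    """Build one config object; only the arguments vary between services."""
    return {
        "name": name,
        "port": port,
        "replicas": replicas,
        # Shared settings are defined once, so a change here reaches every service.
        "health_check": {"path": "/healthz", "interval_seconds": 30},
        "labels": {"managed_by": "config-generator"},
    }


# N similar objects become N short calls instead of N copied blobs.
services = [
    service_config("api", 8080),
    service_config("worker", 9090, replicas=4),
]
```

Changing the shared health check now means editing one place rather than N.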

> I think we are seeing things come full circle again where people are finding the limitations of declarative infrastructure tools and decreeing declarative infrastructure dead and moving back to imperative infrastructure tools like Salt or Ansible.

Just because you're using a programming language doesn't mean you're imperatively updating state. You use a programming language to generate the static configuration (e.g., the YAML) that verbosely describes the desired state of the world that the application engine can then diff against the current state to figure out what changes need to be made. This is sort of what Terraform is doing these days, but by all appearances they didn't realize what they were doing and consequently the programming language they built was predictably awful.
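A rough sketch of that split, assuming a toy model where each resource is just a named dict: ordinary code generates the desired state, and a separate `plan` function (a hypothetical name, not any real tool's API) diffs it against the current state. The code never mutates infrastructure itself.

```python
def plan(current: dict[str, dict], desired: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two state snapshots and report what would have to change."""
    create = sorted(k for k in desired if k not in current)
    delete = sorted(k for k in current if k not in desired)
    update = sorted(k for k in desired if k in current and current[k] != desired[k])
    return {"create": create, "update": update, "delete": delete}


# Desired state is generated by code (a comprehension), but it is still just data.
desired = {
    f"bucket-{env}": {"versioning": env == "prod"}
    for env in ("dev", "staging", "prod")
}

# Current state as an engine might have recorded it (made-up values).
current = {
    "bucket-dev": {"versioning": False},
    "bucket-prod": {"versioning": False},
    "bucket-old": {"versioning": False},
}

changes = plan(current, desired)
```

Here `changes` lists one resource to create, one to update, and one to delete; the generator stays declarative because it only describes the end state.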

fierro · on June 9, 2021

excellent summary of the problem.

polendri · on June 10, 2021

It sounds like you're suggesting that there's some inherent reason why your infrastructure definition must have the same structure as the output of that definition (the infrastructure). I agree that the infrastructure is state, but it seems obvious to me that sometimes it requires non-trivial computations to decide on the desired state, something which is best served by code.

It's also a false dichotomy IMO that configuration files are the only declarative alternative to imperative tools like Salt/Ansible. You can have declarative code too: my laptop is running NixOS and its system state is defined in code (in a purpose-built language that looks much like a config file).

So really I think there are three approaches, not two, each with upsides and downsides which keep us all ping-ponging between them:

1. Config files are ideal for simple use cases, but a mess for complex ones

2. General-purpose programming languages are completely flexible, but allow you to create a huge unmaintainable mess

3. Dedicated declarative languages constrain you enough to mostly provide the best of both config files and code, but then you have to learn a whole new language, one which was probably conceived hastily (I find the Nix language awful honestly)

Some people need arbitrary computations to define their infrastructure, so I think pure config files are a non-starter from a purist's perspective. But, so far we haven't been able to come up with a programming language for infrastructure that isn't a mess to use.

tehbeard · on June 9, 2021

Side tangent, but I'm curious as to why you list Ansible as imperative, when it seems to be declarative in how you configure a module?

Or is this a case of scope? (At the level of a single Ansible module, its config is declarative, but runbooks/roles are imperative? Is it the variable substitution/loop mechanics that make it imperative?)

randomswede · on June 10, 2021

In Ansible, you declare a set of actions, that are then performed, one by one. Occasionally being skipped, if a certain condition holds true.

So, you are basically saying "do this, do that, then do that".

In a declarative model, you would say "this is how the end result should look" and the tool would then go off and make that happen, in whatever order its scheduling tools would say.

Sort of the difference between Rust on one side, and Prolog on the other (yes, it is possible to get a specific flow of instructions in Prolog, but it is much easier to let the Prolog interpreter/compiler Just Make It Happen Somehow).

FWIW, Puppet gets closer to a declarative model, but unfortunately, the last version I played around with seriously was actually quite bad at inferring ordering on its own, so a LOT of work ended up going into "well, A has to happen before B, so let us string a dependency here".

tehbeard · on June 10, 2021

I guess I need an example of the declarative model, as I can see the Ansible model in my head and it still looks declarative to me, at least at the singular module level.

in Ansible, you say "make sure these packages are installed" and they'll be installed as needed, to match that state, or ignored if already there.

Even the file level stuff you can say "make sure this line is in the file" and it either adds it or says "nope, that's already in there".

Is it that there's modules that aren't declarative? Sort of the esoteric ones to poke specific cloud infrastructure (though even the few of those I looked at seemed to be declarative if needed).

randomswede · on June 11, 2021

It is not declarative. The simple fact that the playbook will ALWAYS be run in the order you specify, even if a later step is (technically) a prerequisite of a previous step, means that you are in an imperative mode.

Puppet is declarative, you simply say "these things must, or must not, hold" and a combination of user-declared and inferred dependencies arrange the sequencing, which can be different in each run (as long as the before/after dependencies hold).
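The distinction can be sketched in a few lines of Python, assuming a toy model where each resource only names the resources it depends on. This is not Puppet's actual algorithm; the resource names are made up, and `graphlib` is the standard-library module available from Python 3.9.

```python
from graphlib import TopologicalSorter


def apply_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Derive an execution order from declared dependencies, not from file order."""
    return list(TopologicalSorter(dependencies).static_order())


# Declared "out of order": the service comes first in the file,
# yet it depends on both the package and the config file.
deps = {
    "service:nginx": {"package:nginx", "file:/etc/nginx/nginx.conf"},
    "file:/etc/nginx/nginx.conf": {"package:nginx"},
    "package:nginx": set(),
}

order = apply_order(deps)
```

An imperative playbook would run the three entries top to bottom; here the tool works out that the package must come first, whatever order the entries were written in.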

tehbeard · on June 12, 2021

Ah so it's that the playbooks are imperative and "dumb" (does exactly what you tell it, rather than inferring "these actions must happen, do them in a sensible order"). That makes sense.

Dove into the puppet docs/wiki article, I guess part of the difference as well is that puppet considers each "unit" a resource, vs. ansible being a "module/action".

It does seem like ansible roles have a dependency mechanism, I guess that might be the intended level for a "declarative" approach in ansible, to encapsulate the playbooks/modules underneath that are more of an implementation detail at that point.

thayne · on June 8, 2021

It's a lot better than it used to be. But there are still quite a few annoyances. For example, you still need to use count as a hack for the absence of any kind of "if". You can't make custom functions. Modules can be kind of awkward to work with. There are still some places that can't take any dynamic values such as lifecycle.ignore_changes and arguments to providers and backends.

frenchman99 · on June 8, 2021

The `count` "hack" is so common that it barely qualifies as a hack anymore. It's just common practice and immediately understandable when you read code.
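For readers who have not seen the idiom: it is usually written as `count = var.enabled ? 1 : 0` (with `var.enabled` standing in for any boolean). The Python below is only a model of how that expands into zero or one resource instances; the function names and the `aws_eip.nat` address are illustrative assumptions.

```python
def conditional_count(enabled: bool) -> int:
    """Model of the `enabled ? 1 : 0` idiom: a boolean becomes an instance count."""
    return 1 if enabled else 0


def expand_count(address: str, count: int) -> list[str]:
    """Model of how a counted resource expands into indexed instances."""
    return [f"{address}[{i}]" for i in range(count)]


with_nat = expand_count("aws_eip.nat", conditional_count(True))
without_nat = expand_count("aws_eip.nat", conditional_count(False))
```

The "if" is expressed as "a list of length zero or one", which is why other resources then have to refer to the instance by index.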

bxbxbuuu · on June 8, 2021

This reminds me of Shopify's Liquid DSL: a horror to work with, but you can just about make it do what you want. Sometimes it feels like writing assembly to do string manipulation if they haven't built a function for your exact scenario.

jchook · on June 8, 2021

I really prefer the Pulumi approach where you define the configuration in your favorite Turing-complete language.

Not sure why Hashicorp felt the need to reinvent the wheel instead of having a library in an existing language generate markup or JSON or something like that.

solatic · on June 8, 2021

The biggest issue with Pulumi is that Pulumi doesn't support adding custom API providers. Part of the power of Terraform is in provisioning infrastructure, orchestration, deployment, and application configuration all in one tool. For example:

(aforementioned GitHub provider)

https://registry.terraform.io/providers/terraform-provider-c... for Concourse (CI/CD)

https://registry.terraform.io/providers/coralogix/coralogix/... (full disclosure: I work for Coralogix)

This would be completely impossible with Pulumi. If Pulumi didn't bless it, it doesn't exist in Pulumi's world. In the meantime, Terraform lets you push all the network calls into a custom provider so that you can just focus on the configuration. The number of paid external APIs is expanding exponentially; Pulumi can't possibly build and support them all in-house. This sounds like a current limitation of Pulumi's "use any programming language you want" design and something that really needs to be addressed. It's not that writing a custom Terraform provider is easy, but it is quite simple to get started by using any of the bajillion open-source providers as a template.

TrueTeller · on June 8, 2021

(Pulumi providers dev here)

This has been the case in the past but we are investing in our provider ecosystem. We built several first-party native providers that aren't based on TF: Kubernetes, Azure, Google. Now, we also encourage third-parties to build their integrations.

Here is a boilerplate repo of a resource-based provider: https://github.com/mikhailshilkov/pulumi-provider-boilerplat...

Here is a provider that is driven by an Open API spec: https://github.com/mikhailshilkov/pulumi-provider-boilerplat...

For simple use-cases, you've always been able to build Dynamic Providers in TypeScript or Python: https://www.pulumi.com/blog/dynamic-providers/

Please reach out if you want to build a provider and we'll definitely help you out.

mdaniel · on June 8, 2021

> If Pulumi didn't bless it, it doesn't exist in Pulumi's world.

That has not been my experience. I have personally ported a Sentry TF provider into Pulumi, and I will grant you that their docs and examples are bordering on active user hatred for exercising the process, but it does work:

https://github.com/pulumi/pulumi-terraform-bridge#adapting-a...

https://github.com/pulumi/pulumi-tf-provider-boilerplate#rea...

What mystifies me about that situation is that I do actually appreciate the amount of silliness that is required to avoid using Pulumi cloud: they are not financially incentivized to make that easy, but I'd guess a lot more folks would nope right out if they didn't make it possible

However, I would think they'd want to make ingesting a TF provider into Pulumi as smooth and reliable as possible, so they don't have people close their browser tab when they don't find a supported provider for Pulumi but it exists in TF

jen20 · on June 9, 2021

> This would be completely impossible with Pulumi. If Pulumi didn't bless it, it doesn't exist in Pulumi's world.

This is only true (temporarily) for automatic plug-in installation - and was until recently also true of Terraform. In fact I had to reverse engineer the TF provider registry protocol because the documentation is manifestly incorrect, recently.

$WORK has lots of Pulumi plug-ins which they know nothing of the existence of, and it works fine.

__jem · on June 8, 2021

Maybe I’m missing something, but I don’t think this is true? E.g., https://www.pulumi.com/blog/dynamic-providers/ There’s also an example on their blog of doing a schema migration with custom logic.

hacker_newz · on June 8, 2021

Why are you using Terraform for orchestration?

jaaames · on June 8, 2021

Can't agree enough.

Declarative programming makes sense for lots of things, React is a great example.

With such a big dependency graph for infra, adding loops and variables and templating to be able to achieve the same thing as Pulumi in a "declarative" way is ultimately just harder and worse than using a familiar powerful language with an SDK.

jen20 · on June 9, 2021

Worth noting that Pulumi IS declarative - the languages build a graph imperatively, but the evaluation is declarative in nature.

mhitza · on June 8, 2021

For me it's less about HCL annoyance nowadays, but more about discoverability. Using Pulumi I no longer have to memorize resource properties because I get IDE autocompletion.

frenchman99 · on June 8, 2021

Autocomplete is automatic in Intellij as far as I can see. I don't recall doing any kind of custom configuration to have it working. Autocomplete works on resource names, variable names, properties, etc.

giaour · on June 8, 2021

Autocomplete for Terraform/HCL is available, too, though you do have to use specific tooling (e.g., VS Code with the Terraform extension) rather than the same tools you use to work on JS.

jen20 · on June 8, 2021

The specific tool recommended here is simply not very good - despite the language server efforts, the IntelliJ HCL plugin is worlds apart from the VS Code tooling (and has been for years). Unfortunately it's not open source - if it were it would mean the availability of an open source implementation of a production quality HCL2 parser for the JVM ecosystem, which would be very useful.

sverhagen · on June 8, 2021

I have really liked the Terraform support in IntelliJ, but the "HashiCorp Terraform / HCL language support" plugin seems to have had its most-recent release on July 17, 2020[1]. And it clearly does not support a bunch of the newer constructs and properties. And that's just very unfortunate.

[1] https://plugins.jetbrains.com/plugin/7808-hashicorp-terrafor...

jen20 · on June 8, 2021

Any examples of things that aren't supported? It doesn't need to embed the metadata per-resource anymore.

sverhagen · on June 8, 2021

I'm seeing errors on each.value.foo when using for_each. Also, this gives me errors:

locals { foo = { for bar in local.bars: "${bar.x}.${bar.y}" => bar } }

Then, optional(bool) is reported as "is not a valid type constructor".

Those all seem to be "language" aspects. For a resource like "github_branch_protection" it seems to not recognize the right properties. That seems to be more of a provider issue.
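For anyone unfamiliar with the `for` expression quoted above, it builds a map keyed by a string derived from each element. A rough Python equivalent is a dict comprehension; the contents of `bars` are made up for illustration.

```python
# Stand-in for local.bars: a list of objects with x and y attributes.
bars = [
    {"x": "a", "y": "1"},
    {"x": "b", "y": "2"},
]

# Equivalent of: { for bar in local.bars: "${bar.x}.${bar.y}" => bar }
foo = {f"{bar['x']}.{bar['y']}": bar for bar in bars}
```

One difference worth keeping in mind: Python silently keeps the last value when two elements produce the same key, whereas Terraform, as far as I know, rejects duplicate keys in this form of the expression.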

stevehawk · on June 8, 2021

it took 5 years to get that useful for_each for modules though

so I'd imagine some people waited long enough that they moved on to better tools.

acdha · on June 8, 2021

What better tools do you have in mind? Most of the people I know in the space have been moving _to_ Terraform, although CDK has improved enough over CF to be appealing for people who are all in on Amazon.

thayne · on June 8, 2021

"better" is subjective. But Pulumi fixes a lot of the pain points of terraform for me.

staticassertion · on June 8, 2021

Pulumi has been such a breath of fresh air. It is the only tool that actually feels like it encompasses "infrastructure as code".

MehdiHK · on June 8, 2021

Terraform CDK is a thing too, if you want to go beyond AWS.

asenchi · on June 8, 2021

for_each is an anti-pattern for reliable Terraform IMO. Not sure it was worth the wait and there isn't much out there that can compare with the simplicity of Terraform.

devonbleak · on June 8, 2021

WAY more reliable than count which would do screwy things like rename a bunch of stuff and delete the last item if you removed an item from the middle of a list.
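The difference comes down to how instances are addressed: by position with `count`, by key with `for_each`. Below is a small Python model of that addressing (not Terraform's implementation); the resource type and user names are only examples.

```python
def count_addresses(names: list[str]) -> dict[str, str]:
    """count: instances are addressed by their position in the list."""
    return {f"aws_iam_user.this[{i}]": name for i, name in enumerate(names)}


def for_each_addresses(names: list[str]) -> dict[str, str]:
    """for_each: instances are addressed by a stable key."""
    return {f'aws_iam_user.this["{name}"]': name for name in names}


def changed(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Addresses whose content differs between two snapshots."""
    touched = set(before) | set(after)
    return sorted(a for a in touched if before.get(a) != after.get(a))


old = ["alice", "bob", "carol"]
new = ["alice", "carol"]  # "bob" removed from the middle

count_diff = changed(count_addresses(old), count_addresses(new))
for_each_diff = changed(for_each_addresses(old), for_each_addresses(new))
```

With positional addressing, removing the middle item touches two addresses (index 1 changes identity and index 2 disappears); with keyed addressing, only the removed key is affected.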

Complex architectures and reusable module encapsulation require a bit more complexity than HCL1 was capable of describing IMO (and apparently the O of most of the Internet). That doesn't necessarily make it less reliable.

Could I describe my infrastructure "reliably" just using raw resources with no loops? Sure but that sounds like a nightmare to both build and maintain.

ncmncm · on June 9, 2021

'Fraid you lost me at "YAML". Anything built on YAML seems like it has to indicate bad judgment at the root.

There could be reasons, but I don't know them.

nikau · on June 8, 2021

It's just an awful language.

Using it is like writing msdos batch files where you are constantly working around limitations and bizarre syntax.

takeda · on June 9, 2021

> Terraform is such an underappreciated tool

Are you kidding? It's the go-to tool even for people who are brand new to IaC.

If anything, I would say CloudFormation is underappreciated: a lot of the reasons why TF was created were fixed almost a decade ago. TF users are still citing those things as the reason why they use TF without ever using it.

randomswede · on June 10, 2021

I have not looked closely at CF for a couple of years, but in late 2016, I actively preferred TF over CF. But I understand that the JSON-only template format has since been changed, and since that was the only real issue I had with CF...

cactus2093 · on June 8, 2021

I haven't done a lot of infrastructure work in the past few years so haven't stayed super on top of the latest changes. I last used it heavily in the earlier days, roughly 4-7 years ago now. And while a lot of the community was great, put in a lot of work on the product, and generally wanted to improve the tool, there were also a lot of very vocal stodgy old timers that were really resistant to any improvements from the very earliest days. It definitely rubbed me the wrong way at times and made me want to look at alternatives.

I remember some old threads about loops for instance, and a lot of the core community was fully convinced that it was a terrible idea, nobody should ever need loops, and if you're a complete weirdo who does want them you should just use a separate templating language to generate your terraform configs instead. And when modules were first released, the support for using them as a means of local code encapsulation and reuse was pretty weak (it would for some reason hard-code absolute file paths in the tfstate file IIRC, so if one person ran a terraform plan on a state file somebody else had last pushed it would always show up as needing to be changed even if it was already up to date). Again I remember core developers insisting that nobody needs features for local code reuse, and modules are only needed for publishing public resources that others can pull in.

Anyway, by no means do I hate Terraform, but I definitely associate it with being unnecessarily clunky and convoluted and full of gotchas even for fairly common use cases. In my opinion that reputation is pretty deserved and built up over probably a hundred hours of experience struggling with it a few years ago. I'm glad to hear that it sounds like that is changing, but I'd still be very cautious and carefully evaluate all the newer alternatives before rushing back to use it again.

mtalantikite · on June 8, 2021

Things have changed since you last used it 4 years ago, so it's probably unfair to judge the tool now based on how it operated then. Most of these pain points (code reuse, state management, more robust HCL features) have been addressed. The one major thing I'd like to see are better LSP bindings for IDE support.

Terraform has been a great tool and it's always surprising to me to hear people hating on it.

cactus2093 · on June 8, 2021

It's a fine tool, but all the other comments that are peers of mine highlight the same kinds of issues I mentioned and got completely downvoted for. So clearly there is something to it. Nobody is hating on Terraform, just trying to avoid choosing a tool that makes their job more difficult than alternatives.

jen20 · on June 9, 2021

> very vocal stodgy old timers that were really resistant to any improvements from the very earliest days

As one of the three maintainers of Terraform (for the core and all providers) in that time frame, your characterisation is not particularly accurate - likely hence the downvotes.

Many of the “suggestions” in that time frame were “we should do something and ‘X’ is something so we should do ‘X’” - which is to a large extent how TF came into being.

From the earliest days, breaking changes were avoided - a policy which was not retained through later versions.

While you may have heard some “core developers” claim that reuse was unnecessary (I can’t claim omnipresence), the HashiCorp official training that I taught during that time period _used modules extensively_ for this.

femiagbabiaka · on June 8, 2021

agreed. before terraform the alternatives were terrible. remember cloudformation? never again. I'd rather use good patterns around Terraform design than ever go back.

saurabhnanda · on June 8, 2021

I've been sitting on the fence wrt Terraform and other such tools for quite some time now. After being _forced_ to finally write massive k8s YAML files (and ansible YAML files) for a consulting gig, I've been wondering whether these tools should be developed as _libraries_, that you glue together using a full-fledged programming language, instead of shoe-horning a programming language in YAML.

For example, could the following be library functions that you could glue together in the programming language of your choice: (a) get current state of infra, (b) calculate diff between desired state and current state, (c) perform a single step (safely) that represents a granular change in infra, (d) perform a series of steps representing infra changes with safe rollback?

Does something like this already exist?
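As a sketch of what item (d) could look like as a library function: apply a list of steps in order and undo the completed ones in reverse if any step fails. All names here are hypothetical; no existing tool's API is implied.

```python
from typing import Callable

# A step is (name, apply function, undo function).
Step = tuple[str, Callable[[], None], Callable[[], None]]


def apply_with_rollback(steps: list[Step]) -> list[str]:
    """Apply steps in order; on failure, undo completed steps in reverse order."""
    log: list[str] = []
    done: list[tuple[str, Callable[[], None]]] = []
    for name, do, undo in steps:
        try:
            do()
        except Exception:
            log.append(f"failed {name}")
            for prev_name, prev_undo in reversed(done):
                prev_undo()
                log.append(f"rolled back {prev_name}")
            return log
        done.append((name, undo))
        log.append(f"applied {name}")
    return log


infra: set[str] = set()  # stand-in for real infrastructure


def failing_step() -> None:
    raise RuntimeError("simulated provider error")


log = apply_with_rollback([
    ("vpc", lambda: infra.add("vpc"), lambda: infra.discard("vpc")),
    ("subnet", lambda: infra.add("subnet"), lambda: infra.discard("subnet")),
    ("dns", failing_step, lambda: None),
])
```

Real infrastructure makes the undo functions the hard part, since not every change can be reversed safely; this sketch assumes each one can.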

endymi0n · on June 8, 2021

You‘re pretty much describing the idea behind Pulumi which got a lot of traction lately.

Personally, I‘m still undecided on whether the unlimited freedom of a fully fledged programming language is a good or a bad idea in terms of footgun potential.

I‘m also still a bit unsure whether to play early adopter for an extremely hyped VC open core project even though it feels tempting.

Experiences appreciated!

saurabhnanda · on June 8, 2021

Pulumi sounds interesting. Spent 10 mins with their marketing website and I'm not very clear whether it is a standalone set of libraries, or do they only work in conjunction with their cloud services. Do you know?

haolez · on June 9, 2021

I've been using Pulumi for a new project after using Terraform for a long time. It's a little weird at first, but then it clicks and actually feels quite nice. The Input/Output logic with its async behavior is the weird part, but it works fine when you understand how it works.

The only (minor) problem that I've seen in it is that the JavaScript/TypeScript support seems more mature and featureful than the other backends. So, I'll simply use that.

slow_donkey · on June 8, 2021

You can use it standalone and manage the state yourself.

Looks like they actually might have added locking recently with https://github.com/pulumi/pulumi/pull/2697 but I haven't looked deeply

zymhan · on June 9, 2021

My experience is that it was definitely a foot gun.

There are too many ways to write fancy abstractions that are unreadable or not extensible, for example.

saurabhnanda · on June 9, 2021

What is a foot-gun? Ansible/Terraform, or the library approach that I'm describing?

pnathan · on June 9, 2021

If I was doing things from the ground up, I'd pulumi it, I believe.

Terraform is, however, optimized for everyone under the bell curve.

ranguna · on June 9, 2021

You also have terraform cdk, which is currently in beta.

StreamBright · on June 9, 2021

The best thing that I am aware of is Dhall. Same situation, working as a consultant, forced to use broken things.

https://github.com/dhall-lang/dhall-kubernetes

bjelly · on June 9, 2021

I'm closely tracking an effort by Microsoft that aims to do a lot of what you're describing since I find myself bridging between these tools and deploying stacks that span tools and roles. [CNAB](https://cnab.io/) and the front-running implementation, [Porter](https://porter.sh/), enable one-step infra deployments, packaged as a single OCI-compatible container, with any number of steps, using the best tools for each of those steps. Think of using aws-cli for some initialization step (create or verify presence of a state bucket), applying some terraform to create infra, and finishing with a helm chart to complete deployment of app components. Each stage in a bundle packages not only the code to run it but also the execution binary of the tool that runs it. The spec and porter are still a moving target but it's a promising space and a nice adjacent evolution of the current state of tooling.

leafmeal · on June 8, 2021

My team does something similar to this. We write our Terraform configuration as Python literals with list comprehensions, conditional expressions, etc., then use a script to dump it to JSON which the Terraform command line can parse.

Here's an example: https://github.com/DataBiosphere/azul/blob/develop/terraform...
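A minimal sketch of the general idea (not the linked project's code): build the configuration as a Python literal and dump it in Terraform's JSON syntax. The environment list and bucket names are made up for illustration.

```python
import json

environments = ["dev", "staging", "prod"]

# A Python literal using a comprehension with a condition.
# The nesting follows Terraform's JSON syntax: resource -> type -> name -> arguments.
config = {
    "resource": {
        "aws_s3_bucket": {
            f"logs_{env}": {"bucket": f"example-logs-{env}"}
            for env in environments
            if env != "dev"  # conditional inclusion is plain Python
        }
    }
}

rendered = json.dumps(config, indent=2, sort_keys=True)
```

Writing `rendered` to a file whose name ends in `.tf.json` lets Terraform load it alongside regular `.tf` files.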

tylermenezes · on June 9, 2021

Supposedly Terraform supports it, though I haven't tried it: https://www.hashicorp.com/blog/cdk-for-terraform-enabling-py...

openquery · on June 8, 2021

Congrats to the talented people at Hashicorp.

I love Terraform and have used it for years (since before 0.12 I think). The workflow, meaningful diffs and reproducible 'infrastructure-as-code' gave a user experience that really was a massive step up from what I was used to (basically cloud console and scripts in CI).

In fact the Terraform workflow / philosophy inspired some of the design of an OSS 'data-as-code' tool (https://www.getsynth.com/) that we're building a company around. We wanted to use HCL instead of JSON for our config to start off with, but the Rust HCL parsers when we started the project weren't really robust so we settled.

Anyway, congratulations Hashicorp!

cube2222 · on June 8, 2021

An interesting document is also what is actually covered by the 1.0.0 compatibility guarantee: https://www.terraform.io/docs/language/v1-compatibility-prom...

throwawaygo · on June 8, 2021

The v1 guarantee is they will break your code at any time just like before v1.

Pet_Ant · on June 8, 2021

That is an unfair characterisation of the policy in the link. It is not quite clear which subsets they are talking about at times, and it's definitely not complete, but there is an effort there and it looks like most cases (by volume of usage) will be unaffected:

> The Terraform v1.x series will be actively maintained for at least 18 months after v1.0.

samsquire · on June 8, 2021

There is a dupe Terraform post on the Hacker News front page. I'll post my comment here too :-)

I recommend breaking out your terraform code into separate folders and calling them "components". Write a wrapper around the terraform script to pass in -var-file which uses an argument called ENVIRONMENT that you pass to the wrapper. I think the built in support for modules is less useful for what you actually want to do because you end up with variables spread between variables.tf, outputs.tf files. I use a tool I wrote to layer my infrastructure with layers called components and I configure it with a Graphviz file.
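A minimal sketch of such a wrapper, assuming a layout of `components/<name>/` next to `environments/<env>.tfvars`. The layout and the function name are assumptions, not the author's actual tool. It only builds the command and working directory, so it can be inspected without running Terraform.

```python
from pathlib import Path


def build_invocation(component: str, action: str, environment: str) -> tuple[list[str], Path]:
    """Return the terraform command line and the directory to run it in."""
    component_dir = Path("components") / component
    # Path to the per-environment variables file, relative to the component directory.
    var_file = Path("..") / ".." / "environments" / f"{environment}.tfvars"
    command = ["terraform", action, f"-var-file={var_file.as_posix()}"]
    return command, component_dir


command, cwd = build_invocation("vault", "plan", "staging")
```

To actually run it, pass the pair to `subprocess.run(command, cwd=cwd, check=True)`.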

My tool, called mazzle (previously devops-pipeline) would run parts of the graph that can run in parallel in parallel. It can also run parts of the build on SSH workers. You bring up the workers at the beginning of the build.
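Running "parts of the graph that can run in parallel" can be sketched as grouping components into batches whose dependencies are already satisfied. This is not mazzle's implementation, and the component names and edges are illustrative; it uses the standard-library `graphlib` module from Python 3.9.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group components into batches; everything in a batch can run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all components with no pending dependencies
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished before the next round
    return batches


deps = {
    "network": set(),
    "vault": {"network"},
    "consul": {"network"},
    "kubernetes": {"network"},
    "java-app": {"vault", "kubernetes"},
}

batches = parallel_batches(deps)
```

A real runner would hand each batch to a worker pool and call `done` as individual components finish, rather than waiting for the whole batch.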

Here's an example of a graph generated from graphviz file: https://github.com/samsquire/mazzle-starter/blob/master/arch...

This graph brings up a hashicorp vault server, Java application, bastion proxy, consul, kubernetes, prometheus

here's the graphviz file:

https://github.com/samsquire/mazzle/blob/master/docs/archite...

It describes the ordering of the infrastructure, the invocation of Ansible, packer, shell scripts to set up vault etc.

The idea is to be able to bring up a new environment by changing one parameter. There's a React GUI too.

https://devops-pipeline.com

ciisforsuckas · on June 8, 2021

I've been using Terragrunt to keep my Terraform DRY in a similar manner. It's a bit of a rethink in how you structure things but I've been happy so far.

https://terragrunt.gruntwork.io/

whoomp12342 · on June 8, 2021

does terragrunt work with azure and GCP or just aws?

leetrout · on June 8, 2021

Terragrunt extends Terraform functionality so it works with all Terraform providers.

borplk · on June 8, 2021

I recently had to do a piece of AWS work that required cross-account resources (create certificate in one account with ACM, set DNS entries on Route53 in another account).

Not sure about pulumi, but AWS CDK and CloudFormation can't handle that as one step (there are some horrific hacks). With Terraform it's absolutely trivial.

I was liking CDK up to that point, but that limitation is a complete deal breaker for me. Had to come back to my old friend Terraform.

MehdiHK · on June 8, 2021

Have you seen Terraform CDK? https://github.com/hashicorp/terraform-cdk

borplk · on June 9, 2021

A CDK-style Terraform seems perfect. Seems a bit early days but I look forward to it gaining traction.

ciisforsuckas · on June 8, 2021

There are dozens of these examples. I switched a few years back after AWS released the automatic HTTP to HTTPS redirect functionality in ALBs and 6 months after release it still wasn't supported in CF. Terraform isn't perfect and it still has a ton of issues, but its rate of innovation is way ahead of CF.

lukev · on June 8, 2021

This is pretty straightforward in Pulumi. I recently built a stack that, in a single `pulumi up`, creates VPCs and subnets in a handful of different accounts with VPC peering, routing and DNS between each of them, including an AWS Client VPN set up so you can access all the VPCs from a single VPN endpoint.

Aperocky · on June 8, 2021

Not sure if this counts as a 'horrific hack' but there is official AWS guidance on how to do this:

https://aws.amazon.com/blogs/infrastructure-and-automation/m...

borplk · on June 8, 2021

Thank you, and yes in my books that's a horrific hack and too much effort compared to the 5 lines of code I just added to Terraform to get the job done.

Aperocky · on June 10, 2021

I think the root reason for this is that AWS stacks have to authenticate from a single origin (i.e. user credentials), unlike Terraform, which can utilize multiple auths. This makes it necessarily complicated for AWS stacks when they try to deploy another stack in another account, as the stacks are also account based (but I imagine Terraform stacks aren't).

Hawxy · on June 8, 2021

FYI this has been supported in CDK for a few months now. See the CrossAccountZoneDelegation at the end of this section: https://docs.aws.amazon.com/cdk/api/latest/docs/aws-route53-...

Coryodaniel · on June 8, 2021

People tend to complain about HCL a lot, I think it’s a great language for infrastructure. I don’t want a “real programming language” for provisioning infrastructure. I feel like every time I’ve seen someone “need” a real programming language, that there is a _better_ way to do the task at hand with HCL.

That being said, there are some ugly bits.

1. Remote state as a data source means your infra is broken, you just don’t know it yet. Two applies have to occur to get your infra in the correct state, but they are separated by an arbitrary amount of time between executions. Even if you automate it with CI/CD, your second root module could be broken until run since it depends on the output of the other module.

2. Public modules are absolute garbage. Go find the best one, it’s trash. Here is why, 10-20 orgs all come in and tweak the module to work for them. You’ll often see 1-10 resources in a module (sometimes more), but the module will end up with more _input complexity_ than the underlying resources. Sometimes even more inputs than all the original resources combined! In the end, you get a module that “works” for everyone, with a half baked “DRY abstraction” for N number of organizations.

3. Organizing code is hard, because we often don’t fully consider environments/workspaces, infrastructure ownership, change management, and other sociotechnical concerns. I think Terraform and IaC in general is the epitome of Conway’s Law and when the (changing) social structure of the organization isn’t followed, the code gets harder to work with. This point is at odds with #1 above.

4. People tend to think “terraform apply” is a magic transactional boundary around your infrastructure. If it applies, it worked!!! But in reality, if modules aren’t crafted correctly they can “apply” cleanly, but still introduce an outage while they are executing.

All that said, I’m excited for the 1.0 release. I love terraform. Thanks to all (except module authors) for the hard work.

joombaga · on June 8, 2021

I'm not sure I understand #1.

Your points would still apply if a resource (e.g. aws_instance.foo) is created in one module and then referenced as a data source (e.g. data.aws_instance.foo) in another module. Are you suggesting remote state is different? Or would you also advise against referencing data source attributes from resources created in other modules?

Coryodaniel · on June 8, 2021

Oh for sure, that’s point #4, but at least it’s in the same apply.

In #1 there is also a tight coupling between two different sources. If team A changes their output, the dependent team B's references break.

Also 1.2: security. If I can read an attribute from your state file, I can read the whole thing.

SpencerBratman · on June 8, 2021

hey this is super random and not related to your comment above, but I saw your comment about honey and how you worked in this space. I was wondering if you'd be open to chatting about your experience in this space. (working on something in the affiliate space). Really appreciate it! spencerbratman [@] gmail.com

kevincox · on June 8, 2021

I hate Terraform with a passion but it is probably the best tool out there for managing cloud infrastructure so I use it at work with no plans to replace it.

The biggest downsides are the awful half-baked language and the awkwardness of modules and passing values throughout your config. Also the staticness of providers is a serious pain, for example you can't create a kubernetes cluster then add a resource to it. The workaround is to use two separate Terraform stacks which brings a lot of pain for passing values across the boundary. Furthermore you can no longer effectively plan any change that affects the boundary between the two stacks. "Luckily" Terraform's performance is so bad that you need to split the stacks anyways.

The biggest feature I would like to see is the ability to dump a pure representation of your evaluated configuration. This would allow reasonable diffs in CI. There are of course complications, especially if you use `data` resources but technically it is possible to do a very good job here which would make it so much easier to make changes.

throwaway894345 · on June 8, 2021

I strongly agree both with respect to the half-baked-ness of the language and with the "it's probably the best out there". Ultimately, these tools should have a static/yaml-like "assembly language" that describes the state of your infrastructure without any of the DRY. There would be a diffing engine which would figure out what changes need to be applied and apply them accordingly. Users could use some vanilla programming language to generate that yaml in a DRY way; then the Terraform folks don't need to badly reinvent a programming language.

I know they also have a CDK, but I can't tell if it properly solves that problem or if it still forces us into Terraform idiosyncrasies (i.e., if I rename something in Terraform, it will try to delete the corresponding resource and recreate it, and I think that absurd behavior remains with the CDK).

kevincox · on June 8, 2021

100%. Terraform is half-way between a tool for generating the configuration and applying it. I think Terraform's application engine is actually quite good, but I would like to use a much better tool to generate the config. (And be able to diff that config)

You can feed JSON to Terraform however this falls over if you need dependencies for output values. This usually isn't an issue because most Cloud provider resources have predictable IDs but as soon as you have one that doesn't you are up for a lot of pain and suffering.

ekimekim · on June 8, 2021

You may be interested in Pulumi: https://www.pulumi.com/

Basically it's Terraform but instead of declaring your resources in HCL, you declare them in a real programming language. You're still producing a declarative config that the engine then diffs, applies etc. In fact, it's compatible with existing terraform providers, so it has a surprisingly large selection of things you can use it for.

Note their docs will try to guide you towards using their hosted service which basically does nothing except host the state file, but you can use an S3 or GCS bucket instead and it works fine.

It's definitely not without its own problems, but I'd say it's overall an improvement.

zinclozenge · on June 8, 2021

Unfortunately last I checked, pulumi only offers state locking with their paid service. If you want to self-host you have to implement it yourself, which seems like a non-starter for a lot of people.

lmzen · on June 8, 2021

This was addressed a couple months ago in https://github.com/pulumi/pulumi/pull/2697

zinclozenge · on June 9, 2021

Wow it took 2 years for the PR to get merged.

deadbunny · on June 8, 2021

I think this has been addressed.

https://www.pulumi.com/docs/intro/concepts/state/

arcticfox · on June 8, 2021

Glad somebody mentioned Pulumi. It solved all of the major problems I had with Terraform.

nprateem · on June 8, 2021

Not with that licensing thanks

simcop2387 · on June 8, 2021

It looks like it's Apache 2.0 licensed? What issues do you have with that license?

renewiltord · on June 8, 2021

It’s Apache 2, isn’t it? What’s wrong with that?

mwarkentin · on June 8, 2021

There's CDK for Terraform: https://github.com/hashicorp/terraform-cdk

cormacrelf · on June 8, 2021

Someone should make a Clojure demo of those Java bindings, or even cljs. I hope Clojure has good type based completions these days, because it would be a fantastic language for this.

cormacrelf · on June 8, 2021

It’s pretty wild that the object identity via name thing is still a problem. Can they not add a transitional name feature where an object is known by multiple aliases for a while and then when you have finished putting through a change, you can delete the original name? Is this not very basic SQL migration practice? Like column aliases until no longer needed.

throwaway894345 · on June 8, 2021

I don't even understand why the state needs to know the identifiers that the high level language uses for various resources. If the high level language has a binding "foo_bucket" for an AWS S3 bucket resource with a single property `name = "foo"`, then why should the state need to know that the high level language refers to that bucket with the name "foo_bucket"? Instead, the state should look something like this (obviously simplified):

    {
        "resources": [
            {
                "type": "aws_s3_bucket",
                "properties": {"name": "foo"}
            }
        ]
    }

Note that there is no reference to "foo_bucket".

devonbleak · on June 8, 2021

This doesn't make sense to me. You need to know the logical identifier in order to explicitly link the code with the resource. Otherwise if I change the code for that resource how does TF know what it needs to change if none of the existing resources in state matches the new config? Do you just always destroy and re-create every time there's a change to anything?

throwaway894345 · on June 8, 2021

> Otherwise if I change the code for that resource how does TF know what it needs to change if none of the existing resources in state matches the new config?

A resource provider defines a collection of fields that is the "identifier" for the resource. For example, an S3 bucket resource would have the "name" field for its identifier.

If you change another attribute besides the bucket name, the engine will see that the input and the state both have a s3 bucket resource with the same name but different props, so it knows it will need to update some props (rather than create a new one). However, if the name changes, the engine will see that the input has a bucket that doesn't exist in the state so it will add a "create bucket" step to the plan. It will also see that the state has a bucket that isn't in the input, so it will add a "delete bucket" step to the plan.

Maybe another way of saying the same thing is that a resource provider can mark any given field as "forces replacement", and all of the fields that force replacement are the de facto identifiers? I haven't thought through whether these are exactly equivalent.
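The identifier-based diffing described above can be sketched roughly as follows. This is only an illustration of the idea, not how Terraform's engine works: the `IDENTITY_FIELDS` table, the `plan` function, and the `versioning` property are all made up for the example.

```python
from typing import Any

# Hypothetical table: which fields identify a resource of each type.
IDENTITY_FIELDS = {"aws_s3_bucket": ("name",)}


def identity(resource: dict[str, Any]) -> tuple:
    """Build a hashable key from the resource type and its identifying fields."""
    fields = IDENTITY_FIELDS[resource["type"]]
    return (resource["type"],) + tuple(resource["properties"][f] for f in fields)


def plan(state: list[dict], desired: list[dict]) -> list[tuple[str, tuple]]:
    """Compare the desired config with the recorded state and list the steps."""
    old = {identity(r): r for r in state}
    new = {identity(r): r for r in desired}
    steps = []
    for key in new:
        if key not in old:
            steps.append(("create", key))  # in the input, not in the state
        elif new[key]["properties"] != old[key]["properties"]:
            steps.append(("update", key))  # same identity, different props
    for key in old:
        if key not in new:
            steps.append(("delete", key))  # in the state, not in the input
    return steps


state = [{"type": "aws_s3_bucket", "properties": {"name": "foo", "versioning": False}}]
changed_prop = [{"type": "aws_s3_bucket", "properties": {"name": "foo", "versioning": True}}]
renamed = [{"type": "aws_s3_bucket", "properties": {"name": "bar", "versioning": False}}]

print(plan(state, changed_prop))
print(plan(state, renamed))
```

Changing a non-identifying property yields a single update step, while changing the name yields a create plus a delete, which matches the behavior described in the comment. It does not address resources whose identifier is only known after creation.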

devonbleak · on June 8, 2021

The "identifier" is often something that's computed later or returned from the API. Think about something like an ec2 instance - the identifier is the instance ID that's returned from AWS. You can have many instances that basically look identical so how do you differentiate which one this logical resource is referencing?

And back to the s3 bucket use case sometimes you want uniqueness in your name so you use a prefix instead of specifying the whole name - how do you determine which bucket that resource is referencing if there are multiple buckets matching the prefix?

I hear what you're saying in terms of wanting state management to be simplified, but pretty much every IaC solution uses this explicit logical resource -> physical resource mapping in state.

kevincox · on June 8, 2021

Yeah, moving objects around the config is common if you want to keep it organized and requires manual actions that require essentially a global lock on the stack (and Terraform has no built-in feature to actually take this lock). It makes it basically impossible to implement a fully automated production change pipeline with Terraform.

cormacrelf · on June 9, 2021

Moreover I can never, ever, remember the syntax for moving objects around the config. It's really painful.

Edit: the aliases would have to handle moving as well as renaming. You could just have aliases in a global namespace, which means adding `alias = "portable-elb"` and doing one `terraform apply` means you can pick up that config, drop it anywhere else, and it will move it for you. It wouldn't even need to do a full `apply`, just a local JSON manipulation.

ethbr0 · on June 8, 2021

> application engine [vs] tool to generate the config

I get it from HashiCorp's perspective though.

A robust application engine with a suboptimal config generator is a viable product.

A suboptimal application engine with a brilliant config generator is not.

So given limited resources, former gets the dev grease.

throwaway894345 · on June 8, 2021

This is a false dichotomy.

You can generate these configs really easily with any off-the-shelf programming language for a small fraction of the effort they’ve put into HCL + all of the stuff on top that makes HCL the shitty programming language that it is.

Even if you insist on building your own programming language for this purpose, Hashicorp could’ve saved themselves a lot of work by looking at the prior art of the last 70 years of programming language history.

In other words, if they just picked, say, JavaScript from the start they could have saved a bunch of time and energy and put that into their application engine.

lvncelot · on June 8, 2021

> You can feed JSON to Terraform however this falls over if you need dependencies for output values

This is what I've started doing with Jsonnet for generation, and also exactly why I've stopped doing it.

jen20 · on June 8, 2021

I'm not sure I follow exactly what you're missing. `${aws_instance.example.x}` as a string value creates the same dependency as it would via HCL when used with JSON.
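As a rough sketch of that point, here is a Python script that generates a Terraform JSON configuration in which an output refers to a resource attribute through an interpolation string. The AMI ID, instance type, and output name are placeholders chosen for illustration, and `private_ip` stands in for the `x` attribute in the comment.

```python
import json

# Generated configuration in Terraform's JSON syntax (e.g. saved as main.tf.json).
# The "${...}" string is an ordinary JSON string; Terraform evaluates it as a
# template, so the output depends on the instance just as it would in HCL.
config = {
    "resource": {
        "aws_instance": {
            "example": {
                "ami": "ami-12345678",  # placeholder value
                "instance_type": "t3.micro",
            }
        }
    },
    "output": {
        "example_private_ip": {"value": "${aws_instance.example.private_ip}"}
    },
}

rendered = json.dumps(config, indent=2, sort_keys=True)
print(rendered)
```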

polynomial · on June 9, 2021

Same here, I don't see how outputs are being treated any differently by Terraform than any other .tf file written in HCL. I'm not saying it's not possible, but I haven't experienced a failure mode there yet.

lvncelot · on June 10, 2021

Thanks for the hint, now I'm not sure what went wrong when I tried something like this. I should read up on this more.

polynomial · on June 8, 2021

What are some of the tools that do this? The only ones I know of are Scalr and Pulumi.

nuker · on June 8, 2021

> Ultimately, these tools should have a static/yaml-like "assembly language" that describes the state of your infrastructure without any of the DRY.

CloudFormation ?

> There would be a diffing engine which would figure out what changes need to be applied and apply them accordingly.

CloudFormation.

oneplane · on June 8, 2021

Problem with CloudFormation is that it doesn't work with Cloudflare, Azure, GCP, Big-IP, Palo Alto, NetBox etc..

nuker · on June 8, 2021

It's a problem only if you use these vendors, you don't have to.

throwaway894345 · on June 8, 2021

It's a pretty tough sell to tell people they have to uproot all of their existing infrastructure and move to Amazon just to use an infra-as-code tool.

oneplane · on June 8, 2021

It's also unlikely that you will only use AWS, forever. At some point in time you'll have to deal with various resources (be it IT resources, time, money or people-as-a-resource), and whenever you bind your knowledge and workforce to an IaC tool that doesn't transfer or isn't portable you're going to end up with N+1 tools every time. In other words: it doesn't scale all that well. (And that doesn't mean Google-scale, but going from 2 IaC engineers to 5 IaC engineers is much harder if you can't apply universal tooling)

Tools are never 'just tools', there is context and there are externalities. And as you already pointed out: migrating/uprooting all of those other things isn't a likely scenario.

throwaway894345 · on June 9, 2021

Agreed. If you use an auth service (SaaS or self-hosted) that isn't AWS Cognito you will also find yourself wanting to integrate with your IaC tool. Having to roll this yourself with CloudFormation is a lot of effort, or at least it was last time I looked, and importing a third party "provider" wasn't really a thing.

zymhan · on June 8, 2021

Fun fact: You don't even have to use Terraform

throwaway894345 · on June 8, 2021

Yeah, CloudFormation is workable in this regard (I've created a neat generator for Python), although it has lots of its own problems (e.g., if you want to create a new resource, you have to run it as its own lambda--your infra-as-code needs its own infra which needs its own infra-as-code).

polynomial · on June 8, 2021

> I've created a neat generator for Python

care to share? (I know some hn users often don't w/o being asked, out of a sense of not wanting to be seen as self-promoting.)

throwaway894345 · on June 8, 2021

It’s hanging out in a private repo with a bunch of other stuff and I don’t care to put it in its own repo at the moment. Basically CloudFormation publishes a JSON spec of all of their resource types and I use that to generate Python code with type annotations. It’s sort of like Troposphere, but I go further: Tropo makes you reference resources by their CloudFormation string names, but my tool lets you use the Python object containing the resource and it will resolve to the correct CloudFormation “Ref” object at compile time. (Also, unlike Tropo, I generated my Python types from a spec so I don’t have to keep up with AWS changes.) That said, I’ve given up on CloudFormation altogether since Terraform has better support for resources outside of AWS.
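The tool itself is private, so the following is only a guess at the general shape of "use the Python object and resolve it to a Ref at render time". The `Resource` class, `resolve`, and `render` are hypothetical names, and the resource types and logical IDs are picked for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Resource:
    logical_id: str
    type: str
    properties: dict[str, Any] = field(default_factory=dict)


def resolve(value: Any) -> Any:
    """Recursively replace Resource objects with CloudFormation Ref objects."""
    if isinstance(value, Resource):
        return {"Ref": value.logical_id}
    if isinstance(value, dict):
        return {k: resolve(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v) for v in value]
    return value


def render(resources: list[Resource]) -> dict[str, Any]:
    """Build a CloudFormation-style template from Python objects."""
    return {
        "Resources": {
            r.logical_id: {"Type": r.type, "Properties": resolve(r.properties)}
            for r in resources
        }
    }


bucket = Resource("LogsBucket", "AWS::S3::Bucket")
policy = Resource(
    "LogsBucketPolicy",
    "AWS::S3::BucketPolicy",
    # The bucket is passed as an object, not as the string "LogsBucket".
    {"Bucket": bucket, "PolicyDocument": {"Version": "2012-10-17", "Statement": []}},
)

template = render([bucket, policy])
print(json.dumps(template, indent=2))
```

Because the reference is a Python object, a typo in a resource name becomes an undefined-variable error in the script instead of a failed deployment.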

nuker · on June 8, 2021

>if you want to create a new resource, you have to run it as its own lambda

Please don't, lol

kubanczyk · on June 8, 2021

> they also have a CDK

Terraform-CDK, as of now, needs to go through standard HCL parser. Sadly, there is no backdoor into Terraform's internal structures. If HCL (as a language) is the limitation for you, the CDK does not let you fly around it.

smaddox · on June 8, 2021

This would be great. Perhaps it could be based on https://dhall-lang.org/

throwaway894345 · on June 8, 2021

I absolutely think a statically typed language is the right way to go (from experience using a Python->CloudFormation generator even with Mypy), but Dhall is going to be really unfamiliar for most people and it's hard to sell people on new languages that are syntactically unfamiliar.

As an aside, I think functional concepts could have made their way into mainstream programming much earlier if the FP people would have been willing to lower themselves to syntax that is readable to us plebs--I think this is no small part of Rust's success. People say syntax doesn't matter, but I disagree.

verdverm · on June 8, 2021

https://cuelang.org has better syntax but its logic based unification is a struggle bus for many people.

throwaway894345 · on June 8, 2021

I looked at Cue and I don't understand what problem it solves. It certainly doesn't (seem) to solve the problem of DRYing up verbose YAML, or at least it's missing any notion of a function.

"hey, these YAML blobs are all mostly the same, but they vary based on a couple of parameters--I should write a function that takes those parameters and outputs the right YAML object"

^ This is the #1 thing that the high-level language should concern itself with. Static typing is really nice to have and it's cool that Cue has a pretty interesting type system, but (as far as I can tell) it doesn't have functions. It almost has functions, but I don't want to have to resort to a hack for the #1 thing that I care about (functions).
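To make the "function that takes those parameters and outputs the right YAML object" idea concrete, here is a minimal sketch in Python. The `deployment` helper and the image names are invented for the example, and the output is printed as JSON, which YAML parsers generally accept.

```python
import json


def deployment(name: str, image: str, replicas: int = 1) -> dict:
    """Return one Kubernetes-style Deployment blob; only the parameters vary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Two mostly identical blobs produced from a couple of parameters each.
manifests = [
    deployment("api", "example.com/api:1.4.2", replicas=3),
    deployment("worker", "example.com/worker:1.4.2"),
]

print(json.dumps(manifests, indent=2))
```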

Considering I prefer functions over sane syntax (although sane syntax is roughly tied with static typing), I'm inclined to prefer Dhall over Cue, but I'm still optimistic that something better will emerge. Also while we're on syntaxes that are deliberately obtuse, I'm pretty sure the Nix community has a Nickel language which is basically a statically typed version of the Nix language.

Maybe Cue has a more enlightened way of thinking about the infra-as-code problem and I'm just not getting it.

verdverm · on June 8, 2021

CUE's philosophy is to wrap code in data, not data in code, as learned from the major configuration systems at Google. Being a logical language, rather than telling the computer what to do, you state facts and it verifies that you are correct. It is also intentionally not Turing complete so that you cannot program in CUE.

CUE is gaining traction while still being young and changing. Grafana is adopting it for validating dashboards and such. Expect to see it more in DevOps too

randomswede · on June 10, 2021

When I stopped being an SRE at Google, my most immediate thought was relief that I would never, ever, have to deal with BCL/GCL again.

After 6 months outside Google, I desperately wished for BCL/GCL to be everywhere, because all other config languages were just plain broken. And more annoyingly, there's no better way to describe it than "I have seen better, just trust me".

CUE seems to be a step forward. Flabbergast looked like it might have been a contender. The latter is DEFINITELY inspired by BCL/GCL.

At some point, I will have to sit down with CUE and try to re-implement the "perfect little horror" in it (it should be impossible IFF CUE is not Turing-complete, but it actually turns out that there are edge cases of configuration where you want that Turing-completeness).

throwaway894345 · on June 9, 2021

> Being a logical language, rather than telling the computer what to do, you state facts and it verifies that you are correct.

Sure--it's like advanced static typing for static configuration. But that seems like a different and lesser problem than DRYing up the configuration in the first place, and moreover if you use a statically typed programming language to DRY up your configuration then you get pretty similar guarantees to Cue. You don't get Cue's "unifying many definitions" approach, but I can't honestly discern the value proposition in that.

As for turing incompleteness, that's a nice to have at best. If I had to choose between a turing incomplete declarative language like JSON and a turing complete imperative language like Lua, I'd take the latter every single time.

lmm · on June 9, 2021

Nah, the syntax is superficial. Scala has offered better-than-Rust FP in a traditional syntax for over a decade, but if anything the tension between imperative and functional people is worse there.

garethrowlands · on June 8, 2021

Indeed, https://github.com/mujx/dhall-terraform

kortex · on June 8, 2021

You can reduce a little bit of the repetition in YAML with anchors.

There are tools that convert JSON/YAML into HCL.

https://learnxinyminutes.com/docs/yaml/#:~:text=yaml%20also%...

throwaway894345 · on June 8, 2021

I think you misunderstand the problem I'm trying to solve, or maybe I misunderstand your response. My goal isn't to write YAML instead of HCL, my goal is to get rid of HCL and Terraform semantics altogether. If I had my way, Terraform's low level engine would operate on a verbose (i.e., "not DRY") YAML (or JSON or HCL or I don't care) description of resources which would be generated from (for example) a Python script.

The Python/Go/etc script is what humans interface with, and it is DRY. The YAML/HCL/etc is what the Terraform engine operates on and humans should very rarely need to interact with this.

kortex · on June 8, 2021

Ah, so like you have some process which generates your YAML/HCL, which is your "IR/assembly" layer, not meant for regular human consumption/editing, which is fed to Terraform. But it's readable/auditable, VCS-trackable, and diff-able.

I do that a lot as well and in fact I'm kinda leaning towards taking that approach from the get-go. Right now I start with the YAML, but then something inevitably leads me to templating it using make + jinja/gomplate, which eventually leads me to wanting to use python scripts, and then invoke (python package, it's like gulp or make).

It's not code, like business logic code, but it's too verbose and repetitive for human manual editing.
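The "diff-able" part of a generated intermediate layer can be sketched like this, assuming the generator serializes deterministically. The `render` helper and the resource fields are made up for the example. Since nothing here reads live infrastructure, the diff depends only on the two versions of the config.

```python
import difflib
import json


def render(resources: list[dict]) -> list[str]:
    """Serialize deterministically so equal configs produce identical text."""
    ordered = sorted(resources, key=lambda r: (r["type"], r["name"]))
    return json.dumps(ordered, indent=2, sort_keys=True).splitlines()


# Output of the generator on the main branch and on a feature branch.
before = render([{"type": "aws_s3_bucket", "name": "logs", "versioning": False}])
after = render([{"type": "aws_s3_bucket", "name": "logs", "versioning": True}])

diff = list(difflib.unified_diff(before, after, "main", "branch", lineterm=""))
print("\n".join(diff))
```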

throwaway894345 · on June 8, 2021

Yeah, in the Kubernetes world, the official interface is the YAML/assembler and different people have come up with different approaches for generating that. Helm for a long time (and even currently) uses text templates (e.g., jinja, mustache, etc) to render that YAML which is predictably abysmal.

CloudFormation used JSON (and eventually YAML) but built on top of it language-like facilities (the ability to reference resources, call pseudo-functions, etc) all very poorly. So you get an impoverished language built on top of YAML.

Terraform decided they would do approximately the same thing, except they reinvented their own JSON/YAML alternative (HCL) and built a crappy programming language atop it (instead of atop JSON/YAML).

These all give you pretty crummy means of abstraction. With CloudFormation you get nested stacks instead of functions and you can only pass scalars around (no objects or lists--except comma-delimited strings which can be parsed into a list of strings). You're also limited in how many nested stacks you can create and how many total parameters can be passed into any given top-level stack.

Terraform seems strictly better. You can pass objects and lists and I've never approached any parameter limits, but still, you have to create a whole directory just to define a function and refactoring existing code into a module is painful because it means renaming resources (putting them under the module) which Terraform interprets as intent to destroy and recreate the resource.

Helm is using text templates so you can even generate syntactically invalid YAML! I think they might be supporting Lua these days, but I haven't looked into it.
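The hazard with text templates is easy to reproduce. The sketch below uses Python's `string.Template` as a stand-in for a text templating engine and JSON as a stand-in for YAML (the standard library has no YAML parser); the field names and values are invented.

```python
import json
from string import Template

# A text template knows nothing about the syntax it is producing.
template = Template('{"name": "$name", "image": "$image"}')
value = 'my "quoted" app'

templated = template.substitute(name=value, image="example.com/app:1.0")
structured = json.dumps({"name": value, "image": "example.com/app:1.0"})


def is_valid_json(text: str) -> bool:
    """Report whether the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(templated))   # the unescaped quotes break the document
print(is_valid_json(structured))  # the serializer escapes them
```

Building the data structure first and serializing it at the end means the output is well-formed by construction, whatever the input values contain.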

I think the idea was that the whole marketing push behind infra as code was "it's just YAML! Such declarative! Wow!" as though yaml magically simplifies the inherently complex task of infrastructure, so everyone started with something YAML-like--even though we absolutely should have known that we would need to abstract--and gradually built our own half-baked languages on top of them. Of course, infra as code is absolutely worthwhile, but it's the ability to define what you want and have a tool reconcile it with some current state--it's not some magical property of YAML/JSON/HCL/etc.

fin.

zbentley · on June 9, 2021

That's an accurate summary of the arc of progress in this area. Also explains why so many folks are now turning to operators (versioned procedural code that runs in k8s and does arbitrary things, rather than arbitrary versioned yaml artifacts applied to k8s) to do advanced stuff rather than layering on more templating duct tape.

_bz2r · on June 8, 2021

> these tools should have a static/yaml-like "assembly language" that describes the state of your infrastructure without any of the DRY

the last five words are a bit of a double negative; i think you mean "without the repetition" but I can't tell.

Hasnep · on June 8, 2021

"without DRY" in this case means "with repetitions" i.e. in a verbose way. GP wants to be able to generate this verbose, machine readable syntax with DRY, human readable syntax.

throwaway894345 · on June 8, 2021

Yes, this. Thanks for clarifying for me, apologies to the parent for my lack of clarity.

MadVikingGod · on June 8, 2021

Dang, your solution sounds so much like kubernetes I'm not sure if you are joking or not.

throwaway894345 · on June 8, 2021

Kubernetes is one conceivable incarnation, but it operates differently than other infra-as-code tools. Terraform, for example, builds a dependency graph of your resources and initializes them in order. Kubernetes doesn't care about dependencies, and it just keeps trying to create resources and things will fail until their dependencies come online.
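The difference between the two strategies can be shown with a toy model. The resource names and the `retry_until_converged` helper are hypothetical, and neither half reflects the real implementation of Terraform or Kubernetes. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical resources mapped to the resources they depend on.
dependencies = {
    "vpc": set(),
    "subnet": {"vpc"},
    "security_group": {"vpc"},
    "instance": {"subnet", "security_group"},
}

# Graph-based approach: compute a valid creation order up front.
order = list(TopologicalSorter(dependencies).static_order())


def retry_until_converged(deps: dict[str, set[str]]) -> int:
    """Retry approach: attempt everything each round; a resource only
    succeeds once all of its dependencies already exist."""
    created: set[str] = set()
    rounds = 0
    while len(created) < len(deps):
        rounds += 1
        ready = {
            name for name, needs in deps.items()
            if name not in created and needs <= created
        }
        if not ready:
            raise RuntimeError("dependency cycle or missing resource")
        created |= ready
    return rounds


print(order)
print(retry_until_converged(dependencies))
```

Both approaches end in the same state. The first knows the order before making any calls, while the second discovers it through failed attempts.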

Further, Kubernetes manifests are the verbose "assembly language" layer, so you still need something for humans that is DRYer.

We use Terraform to manage Kubernetes resources (as well as cloud provider resources) at the moment, but I think you can equally use cloud provider operators for Kubernetes and manage everything with Kubernetes--I haven't tried this yet so I can't comment. In the latter case, you would still need something to DRY up your Kubernetes manifests. Also, if you aren't running on Kubernetes and you just want infra-as-code, k8s is an expensive solution (in terms of operations).

What I was picturing was a more conventional infra-as-code diffing engine (like Terraform's) but with a more verbose interface similar to Kubernetes YAML.

zbentley · on June 9, 2021

> Kubernetes manifests are the verbose "assembly language" layer, so you still need something for humans that is DRYer.

It's a little more than that. Out-of-the-box manifests for primitives are certainly assembly-like, you're right--but CRDs allow you to operate at a higher level of abstraction while staying in the same syntax, which is powerful and unique to k8s (everything else, from Helm to Terraform to Ansible, distinguishes between pseudo-assembly "language that directly expresses changes to be made" and "language that humans can write abstractions in").

Nilithus · on June 8, 2021

> "Luckily" Terraform's performance is so bad that you need to split the stacks anyways

Not sure what about Terraform's performance is so bad. Seems hard to blame a tool whose main execution path is potentially hundreds of network IO requests to 3rd party APIs. Most of the "split stacks" I've seen are more for code organization and security reasons rather than performance. Seems safer to know 100% that deploying infra for my app isn't going to mess with my VPC settings and can be executed with a lower privileged role.

> Furthermore you can no longer effectively plan any change that affects the boundary between the two stacks.

That's fair -- you do end up with these "foundational" modules a lot of the time. Like an 'aws-account basics' module or something that other modules expect the account to be set up with, as a base for being able to query data objects for subnets etc. Planning changes when that base changes can be difficult but not impossible. Good versioning is critical. Feels in the same vein as apps that need to manage framework updates and things like that. (Though it can be made more difficult or easier based on how you've broken up your use of the cloud provider -- multiple accounts by business unit or all in one.)

jmccarthy · on June 8, 2021

Our experience of building a provider: performance is fast with fast APIs, and slow with slow APIs. Haven't observed any of the core diffing, DAG, or apply scheduling to be problematic (but also haven't tried an apply at extremely high - 10^4? 10^5? - resource count)

cube2222 · on June 8, 2021

> The biggest feature I would like to see is the ability to dump a pure representation of your evaluated configuration. This would allow reasonable diffs in CI. There are of course complications, especially if you use `data` resources but technically it is possible to do a very good job here which would make it so much easier to make changes.

The planned state, current state, and diff of them are all available as separate fields in the Terraform plan file, is that not what you're looking for?

kevincox · on June 8, 2021

The key word is "pure" here. These things all depend on the current state of the infrastructure. The "planned state" is close to what I want, but it can be very confusing if someone has deployed a new change since you forked off.

lowercase1 · on June 8, 2021

Yeah. I have a poor view of terraform since my first interaction was trying to make a few one-line changes to avoid repetition, but I couldn't find out why it didn't work without setting up a connection to the AWS S3 bucket.

time0ut · on June 8, 2021

Have you tried Terragrunt [0]? It helps a lot with managing a set of related stacks. Still feels like a bandaid on a broken model, but it is what we have.

[0] https://terragrunt.gruntwork.io/

Regarding performance, last time I looked, Hashicorp's documentation implied there was no limit to the size of a Terraform stack. I think they meant theoretically in a science fiction universe where humanity had captured all of the sun's output to perform terraform plan and apply...

eKIK · on June 8, 2021

+1 from me on the "awful half-baked language" (HCL).

I just recently wrote an article about my experience, including issues and workarounds, when migrating from Terraform to Pulumi: https://blog.ekik.org/my-experience-migrating-my-infrastruct...

Hope it's OK that I'm sharing it here. I think it's relevant because there seems to be quite a lot of interest around Pulumi, and how one would go about moving from Terraform to Pulumi.

mwcampbell · on June 8, 2021

I'm actually thinking of going the other way. I've been using Pulumi for several months now, and I'm thinking of moving to Terraform, because it has a much larger third-party ecosystem, including more providers, and tools that can analyze HCL, like Infracost and security scanners. When will I learn to see the bigger picture and value popularity over quality?

eKIK · on June 8, 2021

It's a very interesting point.

I've been part of managing rather large Terraform infrastructures (1000+ resources) for a couple of years, but I'm a Pulumi n00b with only about a month of experience.

The infrastructure I'm managing right now with Pulumi is much smaller, only around 130-140 different resources.

For me it ultimately came down to developer productivity. I'm much better at convincing Pulumi to do what I want compared to how it was with Terraform. This also makes me a much happier and less frustrated developer :).

My priorities might very well be different if I were to manage much larger infrastructures (infra cost would be more important for example).

mwcampbell · on June 8, 2021

The stack I manage with Pulumi is currently around 300 resources. (I think that count is inflated by all the secrets in AWS Secrets Manager, because each secret has two resources: the secret and the current version.) I currently manage it by myself, but I'm hoping that won't be the case for very long.

Maybe the ending of my previous comment was too cynical. But I think I've repeatedly made the mistake of valuing my productivity and happiness as a currently solo developer over what will let my company take full advantage of a big third-party ecosystem (including a large talent pool).

eKIK · on June 8, 2021

I don't think you're too cynical at all - I think you're exactly right! It's often much more sensible to use the "tried and true" stuff most of the time.

In my particular case I don't plan to have my company grow much at all - we're staying small. I think Pulumi is a sensible "bet" for me, because it does what I need right now really well. Sure, there's a bit of a risk, but worst case scenario I would spend a day or two to migrate what I have back to Terraform.

I would definitely not have made the call to "let's just switch everything to Pulumi" if I was still working at a larger company. As you said, a large talent pool / community is a huge deal when you have the option to hire people who can spend time learning a particular tool or language.

0xbadcafebee · on June 8, 2021

I work in a very large shop with lots of TF and we do not use any of the "ecosystem" other than Terragrunt. Almost all of it is experimental junk.

We use almost entirely one provider, with things like a "template" or "random" provider as well, which are really just core features they decided to split off into plugins. Even when we use SaaS that there is a provider for, we don't use the provider, because we aren't constantly changing it, or managing it doesn't require lots of people across multiple teams with multiple iterations and modules.

gerbilly · on June 8, 2021

+10 from me on the "awful half-baked language" (HCL).

Only cmake's 'language' is worse.

granra · on June 8, 2021

People mention pulumi but hashicorp are creating something similar with https://github.com/hashicorp/terraform-cdk. But all the existing terraform providers work with it afaik.

tonyhb · on June 8, 2021

I don't know if people have even tried Pulumi before recommending it.

I've tried it, and it has buggy defaults, diff generation, etc. Each time I applied the same code, it would generate a diff based off of some internal defaults and... recreate the exact same infrastructure by _tearing it down_ and making it fresh. Not ideal.

Would advise using the TF CDK specifically.

ManWith2Plans · on June 8, 2021

The token system is broken in TF CDK still and it's not ready for adoption. I've built two stacks with it but I'm back at terraform for now. I intend to explore pulumi though when the opportunity presents itself.

I think using a Turing-complete language like typescript with mature tooling to define cloud infrastructure feels very natural and makes things much more manageable than using HCL.

One thing I absolutely can't do without is the state management api terraform provides with its CLI. This is absent from terraform-cdk and aws's CDK, although many of the same APIs seem to exist for pulumi.

throwaway894345 · on June 8, 2021

> I think using a Turing-complete language like typescript with mature tooling to define cloud infrastructure feels very natural and makes things much more manageable than using HCL.

Fully agree. Not sure if any of the CDKs (or Pulumi) get the ergonomics right though. The ergonomics should feel like we're just generating YAML/JSON/etc, but the CDKs I've seen require inheritance, mutable state, etc.

> One thing I absolutely can't do without is the state management api terraform provides with its CLI. This is absent from terraform-cdk and aws's CDK, although many of the same APIs seem to exist for pulumi.

AWS's CDK is built on CloudFormation, so I don't think it has analogs for Terraform's state APIs. As for TF CDK, I would think you would just use Terraform's CLI state management directly? Maybe I'm confused about what you're trying to do?

ManWith2Plans · on June 8, 2021

@throwaway894345 You can, but that means you have to introspect the generated code to determine terraform resource ids etc. A really bad developer experience on large stacks.

polynomial · on June 9, 2021

> This is absent from terraform-cdk

Curious to know how that is, or what an example would be? I don't see how you would have to give up state management with CDK, which I understand to be extending TF, not supplanting it.

ManWith2Plans · on June 9, 2021

@polynomial - You have to use the state API on the generated terraform. This means that you need to understand the structure of the generated terraform, and are dealing with generated .json files that require introspection to determine what terraform resource ids are prior to managing their state. It is possible to do, but if you're writing code, you don't want to have to worry about the generated json.
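To make that concrete, here is a rough sketch of the introspection step: walking a synthesized Terraform JSON document to list the addresses you would then hand to `terraform state` commands. The sample document, the suffixed resource names, and the `resource_addresses` helper are all hypothetical; how generated resources are actually named depends on the cdktf version.

```python
# Hypothetical excerpt of a synthesized Terraform JSON configuration.
# Generated names are assumed to carry a suffix you did not write yourself.
SYNTHESIZED = {
    "resource": {
        "aws_s3_bucket": {
            "assets_bucket_1A2B3C4D": {"bucket": "my-assets"},
        },
        "aws_iam_role": {
            "deploy_role_9F8E7D6C": {"name": "deploy"},
        },
    }
}


def resource_addresses(config):
    """Return sorted `<type>.<name>` addresses for every resource in the config."""
    addresses = []
    for rtype, instances in config.get("resource", {}).items():
        for name in instances:
            addresses.append(f"{rtype}.{name}")
    return sorted(addresses)


for address in resource_addresses(SYNTHESIZED):
    print(address)
```

On a large stack this lookup has to happen before every state operation, which is the developer-experience cost described above.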

yunwal · on June 8, 2021

I wouldn't recommend using cdktf either yet. Can't manage multiple stacks in a single repository, no full support for input variables, constant breaking changes. It's not production ready at all.

Stick with terraform if you need to provision non-aws resources. Otherwise, use aws-cdk.

tonyhb · on June 8, 2021

I do multiple stacks via changing the state file based off of env:

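  import { Construct } from "constructs";
  import { App, S3Backend, TerraformStack } from "cdktf";

  interface StackConfig {
    environment: string;
    name: (i: string) => string; // appends the environment to a base name
  }

  class Stack extends TerraformStack {
    constructor(scope: Construct, name: string, c: StackConfig) {
      super(scope, name);

      // each environment gets its own state key in the same bucket
      new S3Backend(this, {
        bucket: "some-bucket-here",
        key: c.name("state-env"),
        region: "" // wherever
      });
    }
  }

  // ... at the bottom of main
  const app = new App();
  new Stack(app, 'something-something-dev', { environment: "dev", name: (i) => `${i}-dev` });
  new Stack(app, 'something-something-prod', { environment: "prod", name: (i) => `${i}-prod` });
  app.synth();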

Then you can use stacks properly.

cmclaughlin · on June 9, 2021

Support for multiple stacks in a single file was added to cdktf recently. I’ve been managing dozens of production stacks in a single repo for a while now and highly recommend it.

ManWith2Plans · on June 9, 2021

And yet if you try to pass values from one stack to another, it will fail spectacularly.

throwaway894345 · on June 8, 2021

> Each time I applied the same code, it would generate a diff based off of some internal defaults and... recreate the exact same infrastructure by _tearing it down_ and making it fresh. Not ideal.

Not quite the same, but in vanilla Terraform if you simply rename a resource it will tear it down and recreate it even though the resource itself hasn't changed. Makes refactoring really painful. I think you can work around this by renaming the state as well as the resource, but this is often a lot of work (and a bit of risk) just to rename an identifier so I don't bother. I suspect the CDK doesn't solve this problem either.

gjhr · on June 8, 2021

  terraform state mv [old name] [new name]

I'd much rather explicitly state when real resources are renamed than have terraform diffing my code and guessing whether I wanted to rename it or I am actually trying to recreate something. I can only imagine the headaches that would happen with a tool trying to track changes to infra as well as changes to code without explicitly tying infra state to version control somehow.

https://www.terraform.io/docs/cli/commands/state/mv.html

throwaway894345 · on June 8, 2021

> I'd much rather explicitly state when real resources are renamed than have terraform diffing my code and guessing whether I wanted to rename it or I am actually trying to recreate something.

But you're not renaming real resources, you're just renaming the Terraform identifier that corresponds to them. There's no reason that changing this identifier should destroy and recreate the resource it corresponds to. If you explicitly want to destroy and recreate it, you can change an attribute that forces a recreation (typically a "name" field or whatever identifier the resource's provider cares about).

gjhr · on June 11, 2021

OK but how does Terraform know you are renaming a resource? It is not a daemon always running and watching everything you type. It only gets a snapshot of your code to work from when you run it, it doesn't know what your code was before, just the saved state from your last run and the real state in your cloud provider. The only way it can track the state is through the name which you have provided it, if you change that name it cannot know without inferring something. Maybe it matches up all the attributes in your code and state and infers that a rename has happened. What happens when only 95% of attributes match? What happens when multiple things match (An ec2 instance only requires 2 attributes so this is plausible)?

Example 1:

You have 2 essentially identical EC2 VMs with terraform names vm1 and vm2. You decide these are not good descriptive names so change them to webserver1 and webserver2, before running that change you also realise you only need 1 of the servers so delete webserver2 from your code. Terraform runs a plan and sees there is now only a single VM definition but 2 VMs in state. Neither of the terraform identifiers match the original resources. How does it know which one was renamed and which one to delete?

Example 2:

You use Terraform for IaC and something like Chef for configuration management, so your Terraform code exclusively deals with the "hardware". A service is being migrated to a new implementation, so you need to delete the old VM and bring up a new one. Both the old and new implementations have exactly the same hardware requirements. You make the change in your Terraform code, deleting the old resource and creating a new one with the same requirements but a different name, and run a plan. Terraform tells you there's nothing to change because it's inferred that you wanted to rename.
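Example 1 can be sketched in a few lines. This is a toy model, not how Terraform works internally: state and configuration are plain dicts, and `rename_candidates` is a hypothetical matcher that pairs each new identifier with every orphaned state entry whose attributes are identical.

```python
def rename_candidates(state, config):
    """For each identifier that is new in config, list the orphaned state
    entries (present in state, absent from config) with identical attributes."""
    orphans = {name: attrs for name, attrs in state.items() if name not in config}
    candidates = {}
    for name, attrs in config.items():
        if name in state:
            continue  # identifier unchanged, nothing to infer
        candidates[name] = sorted(o for o, o_attrs in orphans.items() if o_attrs == attrs)
    return candidates


# Two essentially identical VMs recorded in state.
vm = {"ami": "ami-123", "instance_type": "t3.micro"}
state = {"vm1": dict(vm), "vm2": dict(vm)}

# Renamed to webserver1/webserver2, then webserver2 deleted before applying.
config = {"webserver1": dict(vm)}

candidates = rename_candidates(state, config)
print(candidates)

# Any identifier without exactly one match cannot be resolved by inference.
ambiguous = {name for name, matches in candidates.items() if len(matches) != 1}
print(ambiguous)
```

Both `vm1` and `vm2` are equally valid matches for `webserver1`, so inference can only guess which VM to keep; an explicit `terraform state mv` removes the guess.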

eyko · on June 8, 2021

> This experimental repository contains software which is still being developed and in the alpha testing stage. It is not ready for production use.

Not sure how much you'll want to invest in being essentially an alpha tester. That being said, if you're currently using Terraform and can wait, it's worth keeping an eye on.

k__ · on June 8, 2021

Right, tfcdk and k8scdk are a thing.

Pulumi is also integrating with TF.

vluft · on June 8, 2021

> for example you can't create a kubernetes cluster then add a resource to it

I have no love for HCL, but you can do this by creating a kubernetes provider with the auth token pointing at the resource output for the auth token you generated for the cluster.

nrmitchi · on June 8, 2021

Yes, however this will (typically) work if the cluster already exists from a previous run, but typically not if you are creating the cluster and the kubernetes provider as part of the same run.

IIRC you'll end up with a kubernetes provider without auth (typically pointing at your local machine), which 1) is not helpful, and 2) can be actively bad.

I believe the core issue here is that providers don't have the ability to specify a `depends_on` relation: https://github.com/hashicorp/terraform/issues/2430

marenkay · on June 8, 2021

This works even without the depends_on property. All you need to do is have the module you use for creating the cluster expose an output that is guaranteed to be a computed property.

Then use that computed property as input variable for whatever you want to deploy into Kubernetes.

We're using this with multiple providers and it works. Of course, an actual dependency that's visible would be better.

nrmitchi · on June 8, 2021

I'd love to see an example of this actually working, because I have had the opposite experience (explicitly with the Kubernetes and Helm providers); I've had to do applies in multiple steps.

gouggoug · on June 8, 2021

This should work (as in, it will create the cluster and only then add the k8s resource to it, in the same plan/apply).

Here the module creates an EKS cluster, but this would work for any module that creates a k8s cluster.

  module "my_cluster" {
    source                          = "terraform-aws-modules/eks/aws"
    version                         = "17.0.2"

    cluster_name                    = "my-cluster"
    cluster_version                 = "1.18"
  }

  # Queries for Kubernetes authentication
  # this data query depends on the module my_cluster
  data "aws_eks_cluster" "my_cluster" { 
    name = module.my_cluster.cluster_id
  }
  
  # this data query depends on the module my_cluster
  data "aws_eks_cluster_auth" "my_cluster" { 
    name = module.my_cluster.cluster_id
  }

  # this provider depends on the data query above, which depends on the module my_cluster
  provider "kubernetes" {  
    host                   = data.aws_eks_cluster.my_cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.my_cluster.certificate_authority.0.data)
    token                  = data.aws_eks_cluster_auth.my_cluster.token
    load_config_file       = false
  }

  # this provider depends on the data query above, which depends on the module my_cluster
  provider "helm" { 
    kubernetes {
      host                   = data.aws_eks_cluster.my_cluster.endpoint
      cluster_ca_certificate = base64decode(data.aws_eks_cluster.my_cluster.certificate_authority.0.data)
      token                  = data.aws_eks_cluster_auth.my_cluster.token
      load_config_file       = false
    }
  }


  # this resource depends on the k8s provider, which depends on the data query above, which depends on the module my_cluster
  resource "kubernetes_namespace" "namespaces" { 

    metadata {
      name = "my-namespace"
    }
  }

StopHammoTime · on June 8, 2021

I literally implemented this not a month ago. I don't understand the complaint at all. Terraform is easily able to orchestrate a cluster and then use its data to configure the provider. The provider details do not need to be available until resources are created using the provider, which won't occur until the EKS cluster is available.

hnjst · on June 8, 2021

Using something similar, but it doesn't handle cluster deletion well.

clipradiowallet · on June 8, 2021

You can do this with either:

1. depends_on = ...

2. implicit dependency, i.e. reference some cluster property in your deployment, which causes the same behavior as depends_on
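A toy illustration of why the two options behave the same: either way an edge ends up in the dependency graph, and ordering follows from the graph. This is only a sketch of the idea using Python's standard `graphlib` (3.9+), not Terraform's implementation; the resource dicts and the `build_graph` helper are invented for the example.

```python
from graphlib import TopologicalSorter


def build_graph(resources):
    """Map each resource to the set of resources it must wait for."""
    graph = {}
    for address, body in resources.items():
        deps = set(body.get("depends_on", []))  # explicit edges
        for value in body.get("attrs", {}).values():
            for other in resources:
                # implicit edge: an attribute references another resource
                if other != address and str(value).startswith(other + "."):
                    deps.add(other)
        graph[address] = deps
    return graph


explicit = {
    "cluster": {"attrs": {"name": "demo"}},
    "deployment": {"attrs": {"image": "nginx"}, "depends_on": ["cluster"]},
}
implicit = {
    "cluster": {"attrs": {"name": "demo"}},
    "deployment": {"attrs": {"image": "nginx", "host": "cluster.endpoint"}},
}

explicit_order = list(TopologicalSorter(build_graph(explicit)).static_order())
implicit_order = list(TopologicalSorter(build_graph(implicit)).static_order())
print(explicit_order, implicit_order)
```

In both cases the cluster sorts before the deployment; the only difference is whether the edge was written down or derived from a reference.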

himinlomax · on June 8, 2021

The tool is ok, but developing plugins for it shows how inadequate Golang is for the job. There's so much repetition and boilerplate required. I wrote a FreeIPA plugin a few years back, it handled just registering a host and the executable weighed over 100 MB! WTF? Haven't looked at that side of things lately, I wonder if it's different nowadays.

cube2222 · on June 8, 2021

We have a large number of resources available inside of our Spacelift provider[0] and it weighs ~20 MB.

It'll probably mostly depend on the libraries you use.

[0]:https://github.com/spacelift-io/terraform-provider-spacelift...

StopHammoTime · on June 8, 2021

Definitely agree with this, Go is so verbose for the application. When I wrote a provider, I had the same problem. What made it even worse is that I was connecting to an API that made use of dynamic JSON generation. So many interfaces and other hacks to get the JSON documents to parse correctly.

tornato7 · on June 8, 2021

Is it a Go problem or a new-to-Go problem? I haven't written terraform plugins specifically but I have been writing Go for years and never find myself needing to write an excessive amount of boilerplate. There can definitely be some frustrations in dealing with dynamic JSON though. JSON-to-Go converters are your friend.

himinlomax · on June 8, 2021

I was not using anything special, I had implemented my own client for IPA. The equivalent functionality in Python (ended up using Ansible to do my thing) uses just a few kB ...

Hikikomori · on June 8, 2021

Maybe try out https://www.pulumi.com/

xorcist · on June 8, 2021

Why not use something like Ansible instead?

It too is declarative. It too can be easily extended. It's also something a lot of people already know.

I used to use Ansible or Puppet for these things before Terraform was all the rage. It was a lot more stable than trying to distribute those state files, which is a strange design to pick. There are plenty of existing modules, but it's also dead simple to write your own.

dtech · on June 8, 2021

I have limited experience with Ansible, but afaik calling it declarative when compared to Terraform is a stretch [1]

[1] https://blog.gruntwork.io/why-we-use-terraform-and-not-chef-...

xorcist · on June 8, 2021

It should be noted that the article is written to sell services for Terraform. It is unfortunately built on a few false premises that are never argued. Very few Chef developers would agree with Chef being somehow more imperative than Puppet, for example, seeing how the language was originally thought of as a superset of Puppet's.

The author does not specify which module is used for AWS, but it is not representative of how one would want to use Ansible for infrastructure. Writing idempotent playbooks is widely regarded as best practice in the Ansible community.

I have used Ansible for declaring node state in large production environments (not some dinky startup) and found it to be a very straightforward way to manage infrastructure.

akvadrako · on June 8, 2021

Ansible is not really made for managing cloud resources and it shows - the modules are not production ready.

throwdbaaway · on June 9, 2021

For GCP, both ansible modules and terraform modules are actually generated from https://github.com/GoogleCloudPlatform/magic-modules, so their "production readiness" is the same.

I understand that mitchellh himself personally created a bunch of cloud modules for terraform at the beginning, and those were likely of higher quality than whatever was created by some internal developers assigned by Google/Microsoft, and might be slightly better than the AWS modules maintained by the community.

Anyway, when it comes to ansible versus terraform, we should move the discourse to state management instead. With ansible, you don't have to deal with state, but you will need to clean up the cloud resources separately. With terraform, you can use the tool to clean up the cloud resources easily, but then you also have the headache of managing state. Plus, whenever you change something, there is always the nagging feeling that it will do a destroy/recreate instead of an in-place update.

gdubya · on June 8, 2021

I like Terraform for infrastructure, up to the point of creating the K8s cluster, then ArgoCD for keeping K8s in sync.

stuff4ben · on June 8, 2021

That's an interesting combo. What are you keeping in sync in K8s with Argo?

gdubya · on June 8, 2021

The operators we offer in our clusters (e.g. ECK, Prometheus, etc... the ArgoCD ApplicationSet generators make it easy to configure which features are installed on each cluster), as well as the applications developed by the development teams. Our work isn't complete yet (still working on sync for secrets and RBAC), but it's working nicely so far.

tdumitrescu · on June 8, 2021

Yeah, these days I try to avoid writing any HCL and instead feed Terraform with JSON generated via jsonnet (which we were already using to generate k8s YAML). Much better templating and language features while still remaining declarative, and it helps on a team to have a single source language for such configs.
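A rough sketch of the same idea, with Python standing in for jsonnet purely for illustration: build the configuration as plain data and emit it in Terraform's JSON syntax, which Terraform loads from files ending in `.tf.json`. The bucket names and the `render_buckets` helper are made up.

```python
import json


def render_buckets(env, names):
    """Build a Terraform JSON configuration with one S3 bucket per name."""
    return {
        "resource": {
            "aws_s3_bucket": {
                name: {
                    "bucket": f"{name}-{env}",
                    "tags": {"environment": env},
                }
                for name in names
            }
        }
    }


config = render_buckets("dev", ["logs", "assets"])
rendered = json.dumps(config, indent=2, sort_keys=True)
print(rendered)  # write this to e.g. main.tf.json for Terraform to pick up
```

Because the generator only produces data, the result stays declarative: Terraform sees the same kind of document it would have parsed from HCL.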

Coryodaniel · on June 8, 2021

> Also the staticness of providers are a serious pain, for example you can't create a kubernetes cluster then add a resource to it.

TF def has some rough edges, but you can certainly create a cluster and add resources in a single root module (I don’t think it’s a great practice).

In this example the EKS cluster is in a module, but it can be a ref to a resource in the same module as well.

  data "aws_eks_cluster_auth" "current" {
    name = module.eks.cluster_id
  }

  provider "kubernetes" {
    load_config_file       = false
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
    token                  = data.aws_eks_cluster_auth.current.token
  }

k__ · on June 8, 2021

I never used Terraform, I started with Vagrant, then CloudFormation, CDK, and now Pulumi.

I like Pulumi the most right now.

It integrates with services like Cloudflare and Auth0 and I can use TypeScript to write my code.

AcerbicZero · on June 8, 2021

I’ve had many similar frustrations about terraform, and the overall lack of visibility into what’s happening drives me mad at times.

A proper repl, with the ability to actually manage a config would be a huge step forward - I spend more time trying to figure out what vars get populated and how I can get a value into another resource than anything else. It’s like I’m constantly fighting with the HCL syntax to get what I want to happen.

clipradiowallet · on June 8, 2021

If you want visibility (spoiler: it's just API calls), try using `TF_LOG=DEBUG terraform <foo>`. You might also want to set `-parallelism=1`, or you'll be treated to statements printing in an order you are not expecting.

unethical_ban · on June 8, 2021

Yep, the documentation is sometimes lacking, and the concept of moving variables in and out of modules is not intuitive, to say the least.

clipradiowallet · on June 8, 2021

> The biggest feature I would like to see is the ability to dump a pure representation of your evaluated configuration.

Are you asking for a dump of existing state or desired state? For existing state, see `terraform state pull`. For delta between desired+existing, see `terraform plan -out`. My apologies in advance if I completely misunderstood what you were asking for.

kevincox · on June 8, 2021

I am asking to dump the desired state, so that I can diff the desired state of this commit against the desired state of the last commit. I don't want to include production at all.
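Something along these lines, as a sketch of the wish rather than an existing Terraform feature: assume each commit's configuration could be evaluated into plain JSON without consulting state or the provider (that evaluation step is the missing piece), then diff the two documents textually. `desired_diff` and both sample documents are hypothetical.

```python
import difflib
import json


def desired_diff(old, new):
    """Unified diff between two evaluated configurations, rendered canonically."""
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile="last-commit", tofile="this-commit",
                                     lineterm=""))


last_commit = {"aws_instance.web": {"instance_type": "t3.micro", "count": 2}}
this_commit = {"aws_instance.web": {"instance_type": "t3.small", "count": 2}}

diff = desired_diff(last_commit, this_commit)
print("\n".join(diff))
```

Sorting keys before diffing keeps the output stable across runs, so the CI diff only changes when the evaluated configuration does.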