Terraform 1.0 (github.com/hashicorp)
691 points by cube2222 on June 8, 2021 | 308 comments



Terraform is such an underappreciated tool. So much of the hate seems to be about HCL1 (the language Terraform used before 0.12) and doesn't reflect modern Terraform.

For example, now that Terraform has `for_each` and dynamic blocks, it's possible to ditch variables files and local modules almost entirely and just add more infrastructure by editing a local YAML file. The only variables your Terraform code should need are credentials and other secrets that providers don't already load from environment variables. A great public example of this pattern is https://github.com/concourse/governance, which manages Concourse's GitHub repositories.
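A minimal sketch of the shape of that pattern, written in Python and emitting Terraform's JSON configuration syntax. The repository data is JSON text here only because Python's standard library has no YAML parser; in plain HCL the map would typically come from `yamldecode(file("repos.yml"))`. The field names and the resource name `repo` are assumptions, not taken from concourse/governance.

```python
import json

# Stand-in for the hand-edited data file. Adding an entry adds a repository.
REPOS_DATA = """
{
  "governance": {"description": "Project governance", "visibility": "public"},
  "ci-config": {"description": "Internal CI settings", "visibility": "private"}
}
"""


def render_config(repos: dict) -> dict:
    """Terraform JSON config with a single for_each resource over `repos`."""
    return {
        # The data becomes a local map; each key is one repository.
        "locals": {"repos": repos},
        "resource": {
            "github_repository": {
                "repo": {
                    "for_each": "${local.repos}",
                    "name": "${each.key}",
                    "description": "${each.value.description}",
                    "visibility": "${each.value.visibility}",
                }
            }
        },
    }


if __name__ == "__main__":
    # Redirect to e.g. repos.tf.json; Terraform reads *.tf.json next to *.tf files.
    print(json.dumps(render_config(json.loads(REPOS_DATA)), indent=2))
```

The resource block never changes; growing the infrastructure means editing only the data.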


My problem with this approach is that it's still too much "infrastructure as data" and not "infrastructure as code." Moving infrastructure data into flat files is not a clear-cut win over keeping it in a database: you get easier version control with external tools like git, but you lose everything that makes a database a joy to work with, like schema validation and easy queries.

Things like for_each and variables exist because pure "infrastructure as data" would be incredibly tedious, brittle and hard to extend, but trying to reach "infrastructure as code" by starting from a data format instead of a programming language just seems like too big a gap to cross. I haven't seen a lot of teams unit testing their Terraform, for instance.
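Some of the schema checking a database would give you can be bolted on by validating the data file before any config is generated, which also gives you something to unit test. A minimal Python sketch; the repository-style fields, the default visibility and the allowed values are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Dict, List

ALLOWED_KEYS = {"description", "visibility"}
ALLOWED_VISIBILITY = {"public", "private", "internal"}


@dataclass(frozen=True)
class Repo:
    name: str
    description: str
    visibility: str


def validate(raw: Dict[str, dict]) -> List[Repo]:
    """Check every entry of a data file and report all problems at once."""
    errors: List[str] = []
    repos: List[Repo] = []
    for name, entry in raw.items():
        if not isinstance(entry, dict):
            errors.append(f"{name}: entry must be a mapping")
            continue
        unknown = set(entry) - ALLOWED_KEYS
        if unknown:
            # Typos like "visiblity" would otherwise be silently ignored.
            errors.append(f"{name}: unknown keys {sorted(unknown)}")
        visibility = entry.get("visibility", "private")
        if visibility not in ALLOWED_VISIBILITY:
            errors.append(f"{name}: invalid visibility {visibility!r}")
        repos.append(Repo(name, entry.get("description", ""), visibility))
    if errors:
        raise ValueError("; ".join(errors))
    return repos
```

Terraform's own variable `validation` blocks cover part of this, but each condition only sees a single variable.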


But at the end of the day your infrastructure is essentially data, not code. Your infrastructure is permanent: it exists even if it isn't being used, and it has inertia. Ultimately your "infrastructure" is really just an entry in a cloud provider's database; it is data, not code.

I think we are seeing things come full circle again where people are finding the limitations of declarative infrastructure tools and decreeing declarative infrastructure dead and moving back to imperative infrastructure tools like Salt or Ansible.

Does anyone else feel that the infrastructure tooling environment/space is in the same place the JS world was 5 years ago?


> But at the end of the day your infrastructure is essentially data not code. Your infrastructure is permanent, it exists even if it isn't being used it has inertia. At the end of the day your "infrastructure" is really just an entry in a database of a cloud provider, it is data not code.

That may well be true, but it doesn't solve the problem (note also that HTML is just data, but we don't typically expect people to copy/paste the same HTML blob for every blog entry they write nor do we expect them to update each of them when they need to make a change):

We often have N very similar, large, complex YAML/HCL/etc objects that we want to manage with Terraform. If we need to make a change to all of them, we have to update N different places. Keeping these in sync is tedious and error prone. So we need to be able to factor out the common code into some reusable unit that accepts the bits that vary as parameters. Terraform's notion of "modules" is a great big acknowledgement of this need, although it's amazing that the whole time they were building this no one thought to themselves "guys, this seems really heavyweight and cumbersome for what ultimately is just a function" (and that general failure to notice that they were accidentally building a fully fledged programming language seems like an apt summary of Terraform's development).
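To make the "a module is just a function" point concrete, a sketch in Python that emits Terraform's JSON configuration syntax. The bucket naming scheme, tags and team names are hypothetical:

```python
import json


def logging_bucket(team: str, purpose: str) -> dict:
    """A hypothetical 'module' written as a plain function.

    The bits that vary between copies are parameters; everything shared lives
    here, so a change to the shared settings is made once instead of N times.
    """
    return {
        "bucket": f"{team}-{purpose}-logs",
        "tags": {"team": team, "purpose": purpose, "managed_by": "terraform"},
    }


config = {
    "resource": {
        "aws_s3_bucket": {
            "payments_logs": logging_bucket("payments", "audit"),
            "search_logs": logging_bucket("search", "access"),
        }
    }
}

print(json.dumps(config, indent=2))
```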

Note also that there's nothing special about infrastructure as code here, this is a general application of the DRY principle.

> I think we are seeing things come full circle again where people are finding the limitations of declarative infrastructure tools and decreeing declarative infrastructure dead and moving back to imperative infrastructure tools like Salt or Ansible.

Just because you're using a programming language doesn't mean you're imperatively updating state. You use a programming language to generate the static configuration (e.g., the YAML) that verbosely describes the desired state of the world that the application engine can then diff against the current state to figure out what changes need to be made. This is sort of what Terraform is doing these days, but by all appearances they didn't realize what they were doing and consequently the programming language they built was predictably awful.
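A toy version of that split, assuming state is nothing more than a mapping from resource address to attributes (real engines also handle unknown values, replacements and dependency order): the program only produces the desired mapping, and a separate generic step works out what has to change.

```python
from typing import Dict, List, Tuple

State = Dict[str, dict]  # resource address -> attributes


def plan(desired: State, current: State) -> List[Tuple[str, str]]:
    """Compare desired and current state and list the actions needed."""
    actions: List[Tuple[str, str]] = []
    for addr in sorted(desired.keys() - current.keys()):
        actions.append(("create", addr))
    for addr in sorted(desired.keys() & current.keys()):
        if desired[addr] != current[addr]:
            actions.append(("update", addr))
    for addr in sorted(current.keys() - desired.keys()):
        actions.append(("delete", addr))
    return actions


print(plan({"a": {"size": 1}, "b": {"size": 2}}, {"b": {"size": 3}, "c": {}}))
# [('create', 'a'), ('update', 'b'), ('delete', 'c')]
```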


excellent summary of the problem.


It sounds like you're suggesting that there's some inherent reason why your infrastructure definition must have the same structure as the output of that definition (the infrastructure). I agree that the infrastructure is state, but it seems obvious to me that sometimes it requires non-trivial computations to decide on the desired state, something which is best served by code.

It's also a false dichotomy IMO that configuration files are the only declarative alternative to imperative tools like Salt/Ansible. You can have declarative code too: my laptop is running NixOS and its system state is defined in code (in a purpose-built language that looks much like a config file).

So really I think there are three approaches, not two, each with upsides and downsides which keep us all ping-ponging between them:

1. Config files are ideal for simple use cases, but a mess for complex ones.
2. General-purpose programming languages are completely flexible, but allow you to create a huge unmaintainable mess.
3. Dedicated declarative languages constrain you enough to mostly provide the best of both config files and code, but then you have to learn a whole new language, one which was probably conceived hastily (I find the Nix language awful, honestly).

Some people need arbitrary computations to define their infrastructure, so I think pure config files are a non-starter from a purist's perspective. But, so far we haven't been able to come up with a programming language for infrastructure that isn't a mess to use.


Side tangent, but I'm curious as to why you list Ansible as imperative, when it seems to be declarative in how you configure a module?

Or is this a case of scope? (At the level of a single Ansible module, its config is declarative, but playbooks/roles are imperative? Is it the variable substitution/loop mechanics that make it imperative?)


In Ansible, you declare a set of actions that are then performed one by one, occasionally skipped if a certain condition holds true.

So, you are basically saying "do this, do that, then do that".

In a declarative model, you would say "this is how the end result should look" and the tool would then go off and make that happen, in whatever order its scheduling tools would say.

It's sort of the difference between Rust on one side and Prolog on the other (yes, it is possible to get a specific flow of instructions in Prolog, but it is much easier to let the Prolog interpreter/compiler Just Make It Happen Somehow).

FWIW, Puppet gets closer to a declarative model, but unfortunately, the last version I played around with seriously was actually quite bad at inferring ordering on its own, so a LOT of work ended up going into "well, A has to happen before B, so let us string a dependency here".


I guess I need an example of the declarative model, as I can see the Ansible model in my head and it still looks declarative to me, at least at the single-module level.

In Ansible, you say "make sure these packages are installed" and they'll be installed as needed, to match that state, or ignored if already there.

Even the file level stuff you can say "make sure this line is in the file" and it either adds it or says "nope, that's already in there".

Is it that there's modules that aren't declarative? Sort of the esoteric ones to poke specific cloud infrastructure (though even the few of those I looked at seemed to be declarative if needed).
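What is being described at the module level is idempotence: each task checks the current state and changes only what differs. A rough Python sketch of a `lineinfile`-style task (a simplification; the real Ansible module also handles regexps, insertion points and removal):

```python
from typing import List, Tuple


def ensure_line(lines: List[str], line: str) -> Tuple[List[str], bool]:
    """Idempotent 'make sure this line is in the file': returns (new_lines, changed)."""
    if line in lines:
        return lines, False  # already there: report "ok", change nothing
    return lines + [line], True


first, changed = ensure_line(["PermitRootLogin no"], "PasswordAuthentication no")
print(changed)  # True
second, changed_again = ensure_line(first, "PasswordAuthentication no")
print(changed_again)  # False
```

Idempotent steps are still executed in the order they are written, which is a separate question from whether the tool decides that order for you.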


It is not declarative. The simple fact that the playbook will ALWAYS be run in the order you specify, even if a later step is (technically) a prerequisite of a previous step, means that you are in an imperative mode.

Puppet is declarative, you simply say "these things must, or must not, hold" and a combination of user-declared and inferred dependencies arrange the sequencing, which can be different in each run (as long as the before/after dependencies hold).
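The ordering difference can be sketched in a few lines of Python with the standard `graphlib` module (3.9+). The three resources and their dependencies are hypothetical, and the failure in the playbook-style runner merely simulates a task hitting a missing prerequisite:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical resources mapped to what must exist before them.
DEPENDS_ON: Dict[str, Set[str]] = {
    "service": {"config_file", "package"},
    "config_file": {"package"},
    "package": set(),
}


def run_as_written(steps: List[str], depends_on: Dict[str, Set[str]]) -> List[str]:
    """Playbook style: run steps exactly in the listed order."""
    done: List[str] = []
    for step in steps:
        missing = depends_on[step] - set(done)
        if missing:
            # The tool does not reorder; the author must list prerequisites first.
            raise RuntimeError(f"{step} ran before {sorted(missing)}")
        done.append(step)
    return done


def run_declared(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Puppet style: the tool derives an order from the declared dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


print(run_declared(DEPENDS_ON))  # ['package', 'config_file', 'service']
```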


Ah so it's that the playbooks are imperative and "dumb" (does exactly what you tell it, rather than inferring "these actions must happen, do them in a sensible order"). That makes sense.

Dove into the Puppet docs/wiki article; I guess part of the difference as well is that Puppet considers each "unit" a resource, vs. Ansible treating it as a "module/action".

It does seem like Ansible roles have a dependency mechanism; I guess that might be the intended level for a "declarative" approach in Ansible, encapsulating the playbooks/modules underneath, which are more of an implementation detail at that point.


It's a lot better than it used to be. But there are still quite a few annoyances. For example, you still need to use count as a hack for the absence of any kind of "if". You can't make custom functions. Modules can be kind of awkward to work with. There are still some places that can't take any dynamic values such as lifecycle.ignore_changes and arguments to providers and backends.


The `count` "hack" is so common that it barely qualifies as a hack anymore. It's just common practice and immediately understandable when you read code.
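For reference, the idiom both comments mean is `count = var.flag ? 1 : 0`. A sketch emitting it as Terraform JSON from Python; the variable name `create_bucket` and the bucket resource are hypothetical:

```python
import json


def optional_bucket(flag: str) -> dict:
    """Terraform JSON for a bucket that exists only when var.<flag> is true."""
    return {
        "variable": {flag: {"type": "bool", "default": False}},
        "resource": {
            "aws_s3_bucket": {
                "optional": {
                    # The count-as-if idiom: one instance when true, none when false.
                    "count": "${var.%s ? 1 : 0}" % flag,
                    "bucket": "example-optional-bucket",
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(optional_bucket("create_bucket"), indent=2))
```

One reason it still feels like a hack: every reference to the resource now has to index it, as in `aws_s3_bucket.optional[0]`.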


This reminds me of Shopify's Liquid DSL: a horror to work with, but you can just about make it do what you want. Sometimes it feels like writing assembly to do string manipulation if they haven't built a function for your exact scenario.


I really prefer the Pulumi approach where you define the configuration in your favorite Turing-complete language.

Not sure why Hashicorp felt the need to reinvent the wheel instead of having a library in an existing language generate markup or JSON or something like that.


The biggest issue with Pulumi is that Pulumi doesn't support adding custom API providers. Part of the power of Terraform is in provisioning infrastructure, orchestration, deployment, and application configuration all in one tool. For example:

(aforementioned GitHub provider)

https://registry.terraform.io/providers/terraform-provider-c... for Concourse (CI/CD)

https://registry.terraform.io/providers/coralogix/coralogix/... (full disclosure: I work for Coralogix)

This would be completely impossible with Pulumi. If Pulumi didn't bless it, it doesn't exist in Pulumi's world. Meanwhile, Terraform lets you isolate all the network calls in a custom provider so you can focus on the configuration. The number of paid external APIs is growing exponentially, and Pulumi can't possibly build and support them all in-house. This sounds like a current limitation of Pulumi's "use any programming language you want" design, and something that really needs to be addressed. Writing a custom Terraform provider isn't exactly easy, but it is quite simple to get started by using any of the bajillion open-source providers as a template.


(Pulumi providers dev here)

This has been the case in the past but we are investing in our provider ecosystem. We built several first-party native providers that aren't based on TF: Kubernetes, Azure, Google. Now, we also encourage third-parties to build their integrations.

Here is a boilerplate repo of a resource-based provider: https://github.com/mikhailshilkov/pulumi-provider-boilerplat...

Here is a provider that is driven by an Open API spec: https://github.com/mikhailshilkov/pulumi-provider-boilerplat...

For simple use-cases, you've always been able to build Dynamic Providers in TypeScript or Python: https://www.pulumi.com/blog/dynamic-providers/

Please reach out if you want to build a provider and we'll definitely help you out.


> If Pulumi didn't bless it, it doesn't exist in Pulumi's world.

That has not been my experience. I have personally ported a Sentry TF provider into Pulumi, and I will grant you that their docs and examples are bordering on active user hatred for exercising the process, but it does work:

https://github.com/pulumi/pulumi-terraform-bridge#adapting-a...

https://github.com/pulumi/pulumi-tf-provider-boilerplate#rea...

What mystifies me about that situation is that I do actually appreciate the amount of silliness that is required to avoid using Pulumi cloud: they are not financially incentivized to make that easy, but I'd guess a lot more folks would nope right out if they didn't make it possible.

However, I would think they'd want to make ingesting a TF provider into Pulumi as smooth and reliable as possible, so that people don't close their browser tab when they find a provider that exists in TF but isn't supported by Pulumi.


> This would be completely impossible with Pulumi. If Pulumi didn't bless it, it doesn't exist in Pulumi's world.

This is only true (temporarily) for automatic plug-in installation, and until recently it was also true of Terraform. In fact, I recently had to reverse engineer the TF provider registry protocol because the documentation is manifestly incorrect.

$WORK has lots of Pulumi plug-ins that Pulumi knows nothing about, and it works fine.


Maybe I’m missing something, but I don’t think this is true? E.g., https://www.pulumi.com/blog/dynamic-providers/ There’s also an example on their blog of doing a schema migration with custom logic.


Why are you using Terraform for orchestration?


Can't agree more.

Declarative programming makes sense for lots of things, React is a great example.

With such a big dependency graph for infra, adding loops and variables and templating to be able to achieve the same thing as Pulumi in a "declarative" way is ultimately just harder and worse than using a familiar powerful language with an SDK.


Worth noting that Pulumi IS declarative - the languages build a graph imperatively, but the evaluation is declarative in nature.


For me it's less about HCL annoyance nowadays, but more about discoverability. Using Pulumi I no longer have to memorize resource properties because I get IDE autocompletion.


Autocomplete is automatic in IntelliJ as far as I can see. I don't recall doing any kind of custom configuration to have it working. Autocomplete works on resource names, variable names, properties, etc.


Autocomplete for Terraform/HCL is available, too, though you do have to use specific tooling (e.g., VS Code with the Terraform extension) rather than the same tools you use to work on JS.


The specific tool recommended here is simply not very good - despite the language server efforts, the IntelliJ HCL plugin is worlds apart from the VS Code tooling (and has been for years). Unfortunately it's not open source - if it were it would mean the availability of an open source implementation of a production quality HCL2 parser for the JVM ecosystem, which would be very useful.


I have really liked the Terraform support in IntelliJ, but the "HashiCorp Terraform / HCL language support" plugin seems to have had its most-recent release on July 17, 2020[1]. And it clearly does not support a bunch of the newer constructs and properties. And that's just very unfortunate.

[1] https://plugins.jetbrains.com/plugin/7808-hashicorp-terrafor...


Any examples of things that aren't supported? It doesn't need to embed the metadata per-resource anymore.


I'm seeing errors on each.value.foo when using for_each. Also, this gives me errors:

  locals {
    foo = { for bar in local.bars: "${bar.x}.${bar.y}" => bar }
  }

Then, optional(bool) is flagged with "is not a valid type constructor".

Those all seem to be "language" aspects. For a resource like "github_branch_protection" it doesn't seem to recognize the right properties, but that seems to be more of a provider issue.
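For anyone less used to HCL's `for` expressions, a Python sketch of what the `locals` expression quoted above evaluates to (`bars`, `x` and `y` come from that snippet; the sample data is made up). Terraform reports an error when two items produce the same key unless the `...` grouping form is used, whereas a plain Python dict comprehension would silently keep the last one, hence the explicit check:

```python
from typing import Dict, List


def key_by_x_y(bars: List[dict]) -> Dict[str, dict]:
    """Python analogue of: { for bar in local.bars: "${bar.x}.${bar.y}" => bar }"""
    result: Dict[str, dict] = {}
    for bar in bars:
        key = f"{bar['x']}.{bar['y']}"
        if key in result:
            # Mirror Terraform's duplicate-key error instead of overwriting.
            raise ValueError(f"duplicate key {key!r}")
        result[key] = bar
    return result


print(key_by_x_y([{"x": "web", "y": 1}, {"x": "web", "y": 2}]))
# {'web.1': {'x': 'web', 'y': 1}, 'web.2': {'x': 'web', 'y': 2}}
```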


it took 5 years to get that useful for_each for modules though

so I'd imagine some people waited long enough that they moved on to better tools.


What better tools do you have in mind? Most of the people I know in the space have been moving _to_ Terraform, although CDK has improved enough over CF to be appealing for people who are all in on Amazon.


"better" is subjective. But Pulumi fixes a lot of the pain points of terraform for me.


Pulumi has been such a breath of fresh air. It is the only tool that actually feels like it encompasses "infrastructure as code".


Terraform CDK is a thing too, if you want to go beyond AWS.


for_each is an anti-pattern for reliable Terraform IMO. Not sure it was worth the wait and there isn't much out there that can compare with the simplicity of Terraform.


WAY more reliable than count which would do screwy things like rename a bunch of stuff and delete the last item if you removed an item from the middle of a list.

Complex architectures and reusable module encapsulation require a bit more complexity than HCL1 was capable of describing IMO (and apparently the O of most of the Internet). That doesn't necessarily make it less reliable.

Could I describe my infrastructure "reliably" just using raw resources with no loops? Sure but that sounds like a nightmare to both build and maintain.
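The difference comes down to instance addresses. A Python sketch, assuming a hypothetical `aws_iam_user.u` resource created from a list of user names, shows what changes when `"bob"` is removed from the middle of the list. Whether a changed attribute is updated in place or forces a replacement depends on the resource type, so "rename" here is a simplification:

```python
from typing import Dict, List


def count_addresses(names: List[str]) -> Dict[str, str]:
    """count: each instance is addressed by its position in the list."""
    return {f"aws_iam_user.u[{i}]": name for i, name in enumerate(names)}


def for_each_addresses(names: List[str]) -> Dict[str, str]:
    """for_each: each instance is addressed by its key."""
    return {f'aws_iam_user.u["{name}"]': name for name in names}


def changes(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Addresses whose user name changes, and addresses that disappear."""
    return {
        "rename": sorted(a for a in before.keys() & after.keys() if before[a] != after[a]),
        "destroy": sorted(before.keys() - after.keys()),
    }


old, new = ["alice", "bob", "carol"], ["alice", "carol"]
print(changes(count_addresses(old), count_addresses(new)))
# {'rename': ['aws_iam_user.u[1]'], 'destroy': ['aws_iam_user.u[2]']}
print(changes(for_each_addresses(old), for_each_addresses(new)))
# {'rename': [], 'destroy': ['aws_iam_user.u["bob"]']}
```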


'Fraid you lost me at "YAML". Anything built on YAML seems like it has to indicate bad judgment at the root.

There could be reasons, but I don't know them.


It's just an awful language.

Using it is like writing MS-DOS batch files where you are constantly working around limitations and bizarre syntax.


> Terraform is such an underappreciated tool

Are you kidding? It's the go-to tool even for people who are brand new to IaC.

If anything, I would say CloudFormation is underappreciated: a lot of the reasons TF was created were fixed almost a decade ago, yet TF users still cite them as the reason they use TF, without ever having used CF.


I have not looked closely at CF for a couple of years, but in late 2016 I actively preferred TF over CF. I understand, though, that the JSON-only restriction has since been lifted (CloudFormation templates can now also be written in YAML), and that was the only real issue I had with CF...


I haven't done a lot of infrastructure work in the past few years so haven't stayed super on top of the latest changes. I last used it heavily in the earlier days, roughly 4-7 years ago now. And while a lot of the community was great, put in a lot of work on the product, and generally wanted to improve the tool, there were also a lot of very vocal stodgy old timers that were really resistant to any improvements from the very earliest days. It definitely rubbed me the wrong way at times and made me want to look at alternatives.

I remember some old threads about loops for instance, and a lot of the core community was fully convinced that it was a terrible idea, nobody should ever need loops, and if you're a complete weirdo who does want them you should just use a separate templating language to generate your terraform configs instead. And when modules were first released, the support for using them as a means of local code encapsulation and reuse was pretty weak (it would for some reason hard-code absolute file paths in the tfstate file IIRC, so if one person ran a terraform plan on a state file somebody else had last pushed it would always show up as needing to be changed even if it was already up to date). Again I remember core developers insisting that nobody needs features for local code reuse, and modules are only needed for publishing public resources that others can pull in.

Anyway, by no means do I hate Terraform, but I definitely associate it with being unnecessarily clunky and convoluted and full of gotchas even for fairly common use cases. In my opinion that reputation is pretty deserved and built up over probably a hundred hours of experience struggling with it a few years ago. I'm glad to hear that it sounds like that is changing, but I'd still be very cautious and carefully evaluate all the newer alternatives before rushing back to use it again.


Things have changed since you last used it 4 years ago, so it's probably unfair to judge the tool now based on how it operated then. Most of these pain points (code reuse, state management, more robust HCL features) have been addressed. The one major thing I'd like to see are better LSP bindings for IDE support.

Terraform has been a great tool and it's always surprising to me to hear people hating on it.


It's a fine tool, but the sibling comments to mine highlight the same kinds of issues I mentioned and got completely downvoted for. So clearly there is something to it. Nobody is hating on Terraform, just trying to avoid choosing a tool that makes their job more difficult than the alternatives.


> very vocal stodgy old timers that were really resistant to any improvements from the very earliest days

As one of the three maintainers of Terraform (for the core and all providers) in that time frame, your characterisation is not particularly accurate - likely hence the downvotes.

Many of the “suggestions” in that time frame were “we should do something and ‘X’ is something so we should do ‘X’” - which is to a large extent how TF came into being.

From the earliest days, breaking changes were avoided, a policy which was not retained through later versions.

While you may have heard some “core developers” claim that reuse was unnecessary (I can’t claim omnipresence), the HashiCorp official training that I taught during that time period _used modules extensively_ for this.


agreed. before terraform the alternatives were terrible. remember cloudformation? never again. I'd rather use good patterns around Terraform design than ever go back.


I've been sitting on the fence wrt Terraform and other such tools for quite some time now. After being _forced_ to finally write massive k8s YAML files (and ansible YAML files) for a consulting gig, I've been wondering whether these tools should be developed as _libraries_, that you glue together using a full-fledged programming language, instead of shoe-horning a programming language in YAML.

For example, could the following be library functions that you could glue together in the programming language of your choice: (a) get current state of infra, (b) calculate diff between desired state and current state, (c) perform a single step (safely) that represents a granular change in infra, (d) perform a series of steps representing infra changes with safe rollback?

Does something like this already exist?
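Step (d) is the hardest part to make generic, since not every cloud change can be cleanly undone. A sketch of only the control flow, assuming each step supplies its own apply and undo callables (hypothetical; real infrastructure rollback is rarely this tidy):

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, apply, undo)


def apply_all(steps: List[Step]) -> List[str]:
    """Apply steps in order; if one fails, undo the completed ones in reverse."""
    completed: List[Step] = []
    for step in steps:
        _, do, _ = step
        try:
            do()
        except Exception:
            for _, _, undo in reversed(completed):
                undo()
            raise
        completed.append(step)
    return [name for name, _, _ in completed]
```

Steps (a) to (c) would plug in before this: fetch current state, diff it against the desired state, and turn each difference into one such step.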


You're pretty much describing the idea behind Pulumi, which has got a lot of traction lately.

Personally, I'm still undecided on whether the unlimited freedom of a fully fledged programming language is a good or a bad idea in terms of footgun potential.

I'm also still a bit unsure whether to play early adopter for an extremely hyped VC open core project, even though it feels tempting.

Experiences appreciated!


Pulumi sounds interesting. Spent 10 mins with their marketing website and I'm not very clear whether it is a standalone set of libraries, or do they only work in conjunction with their cloud services. Do you know?


I've been using Pulumi for a new project after using Terraform for a long time. It's a little weird at first, but then it clicks and actually feels quite nice. The Input/Output logic with its async behavior is the weird part, but it works fine when you understand how it works.

The only (minor) problem that I've seen in it is that the JavaScript/TypeScript support seems more mature and featureful than the other backends. So, I'll simply use that.


You can use it standalone and manage the state yourself.

Looks like they actually might have added locking recently with https://github.com/pulumi/pulumi/pull/2697 but I haven't looked deeply


My experience is that it was definitely a foot gun.

There are too many ways to write fancy abstractions that are unreadable or not extensible, for example.


What is a foot-gun? Ansible/Terraform, or the library approach that I'm describing?


If I was doing things from the ground up, I'd pulumi it, I believe.

Terraform is, however, optimized for everyone under the bell curve.


You also have terraform cdk, which is currently in beta.


The best thing I am aware of is Dhall. Same situation: working as a consultant, forced to use broken things.

https://github.com/dhall-lang/dhall-kubernetes


I'm closely tracking an effort by Microsoft that aims to do a lot of what you're describing since I find myself bridging between these tools and deploying stacks that span tools and roles. [CNAB](https://cnab.io/) and the front-running implementation, [Porter](https://porter.sh/), enable one-step infra deployments, packaged as a single OCI-compatible container, with any number of steps, using the best tools for each of those steps. Think of using aws-cli for some initialization step (create or verify presence of a state bucket), applying some terraform to create infra, and finishing with a helm chart to complete deployment of app components. Each stage in a bundle packages not only the code to run it but also the execution binary of the tool that runs it. The spec and porter are still a moving target but it's a promising space and a nice adjacent evolution of the current state of tooling.


My team does something similar to this. We write our Terraform configuration as Python literals with list comprehensions, conditional expressions, etc., then use a script to dump it to JSON which the Terraform command line can parse.

Here's an example: https://github.com/DataBiosphere/azul/blob/develop/terraform...
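A minimal sketch of that approach (not code from the linked azul repository): ordinary Python comprehensions and conditional expressions build the config, and `json.dumps` writes it out for Terraform to read as a `.tf.json` file. The queue resource, retention values and environment names are assumptions:

```python
import json

ENVIRONMENTS = ["dev", "prod"]  # hypothetical deployment names

config = {
    "resource": {
        "aws_sqs_queue": {
            f"jobs_{env}": {
                "name": f"jobs-{env}",
                # Conditional expression evaluated in Python, not in HCL:
                # 14 days of retention in prod, 1 day elsewhere.
                "message_retention_seconds": 1209600 if env == "prod" else 86400,
            }
            for env in ENVIRONMENTS
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

Terraform only ever sees the static JSON, so `plan` and `apply` work as usual.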


Supposedly Terraform supports it, though I haven't tried it: https://www.hashicorp.com/blog/cdk-for-terraform-enabling-py...


Congrats to the talented people at Hashicorp.

I love Terraform and have used it for years (since before 0.12, I think). The workflow, meaningful diffs and reproducible 'infrastructure-as-code' gave a user experience that really was a massive step up from what I was used to (basically the cloud console and scripts in CI).

In fact, the Terraform workflow/philosophy inspired some of the design of an OSS 'data-as-code' tool (https://www.getsynth.com/) that we're building a company around. We wanted to use HCL instead of JSON for our config to start off with, but the Rust HCL parsers available when we started the project weren't really robust, so we settled for JSON.

Anyway, congratulations Hashicorp!


An interesting document is also what is actually covered by the 1.0.0 compatibility guarantee: https://www.terraform.io/docs/language/v1-compatibility-prom...


The v1 guarantee is they will break your code at any time, just like before v1.


That is an unfair characterisation of the policy in the link. It is not quite clear which subsets they are talking about at times, and it's definitely not complete but there is an effort there and it looks like most cases (by volume of usage) will be unaffected;

> The Terraform v1.x series will be actively maintained for at least 18 months after v1.0.


There is a dupe Terraform post on Hacker news frontpage. I'll post my comment here too :-)

I recommend breaking out your Terraform code into separate folders and calling them "components". Write a wrapper around the terraform command that passes in a -var-file chosen by an ENVIRONMENT argument you give to the wrapper. I think the built-in support for modules is less useful for what you actually want to do, because you end up with variables spread across variables.tf and outputs.tf files. I use a tool I wrote to layer my infrastructure into layers called components, and I configure it with a Graphviz file.
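A minimal sketch of such a wrapper in Python. The `components/<name>` and `environments/<env>.tfvars` layout is an assumption, and a real wrapper would also need to run `terraform init` (typically with per-environment backend settings) before planning; this only builds the invocation:

```python
import os
from typing import List, Tuple


def terraform_invocation(component: str, environment: str, action: str = "plan") -> Tuple[List[str], str]:
    """Return (argv, working directory) for one component and one environment."""
    # Absolute path, so it resolves the same way whatever directory terraform runs in.
    var_file = os.path.abspath(os.path.join("environments", f"{environment}.tfvars"))
    argv = ["terraform", action, f"-var-file={var_file}"]
    return argv, os.path.join("components", component)


# Usage (not executed here):
#   import subprocess
#   argv, cwd = terraform_invocation("vault", os.environ["ENVIRONMENT"])
#   subprocess.run(argv, cwd=cwd, check=True)
```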

My tool, called mazzle (previously devops-pipeline), runs independent parts of the graph in parallel. It can also run parts of the build on SSH workers, which you bring up at the beginning of the build.

Here's an example of a graph generated from graphviz file: https://github.com/samsquire/mazzle-starter/blob/master/arch...

This graph brings up a HashiCorp Vault server, a Java application, a bastion proxy, Consul, Kubernetes and Prometheus.

here's the graphviz file:

https://github.com/samsquire/mazzle/blob/master/docs/archite...

It describes the ordering of the infrastructure, the invocation of Ansible, packer, shell scripts to set up vault etc.

The idea is to be able to bring up a new environment by changing one parameter. There's a React GUI too.

https://devops-pipeline.com
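Running independent parts of a dependency graph in parallel amounts to processing it in waves. A sketch using Python's standard `graphlib` (3.9+), not mazzle's actual code; the component names and edges are made up, loosely after the components listed above:

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group components into batches; members of a batch can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # Pretend the whole batch finished before starting the next one.
        sorter.done(*ready)
    return batches


# Hypothetical component graph: each component maps to what it needs first.
GRAPH = {
    "network": set(),
    "bastion": {"network"},
    "consul": {"network"},
    "vault": {"consul"},
    "app": {"vault", "bastion"},
}

print(waves(GRAPH))
# [['network'], ['bastion', 'consul'], ['vault'], ['app']]
```

A real runner would more likely start each component as soon as its own dependencies finish, calling `done()` per node instead of per batch.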


I've been using Terragrunt to keep my Terraform DRY in a similar manner. It's a bit of a rethink in how you structure things, but I've been happy so far.

https://terragrunt.gruntwork.io/


Does Terragrunt work with Azure and GCP, or just AWS?


Terragrunt extends Terraform functionality so it works with all Terraform providers.


I recently had to do a piece of AWS work that required cross-account resources (create certificate in one account with ACM, set DNS entries on Route53 in another account).

Not sure about pulumi, but AWS CDK and CloudFormation can't handle that as one step (there are some horrific hacks). With Terraform it's absolutely trivial.

I was liking CDK up to that point, but that limitation is a complete deal breaker for me. Had to come back to my old friend Terraform.
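In HCL this is a second `provider "aws"` block with an `alias`, plus a `provider = aws.<alias>` line on the DNS record. The same thing written as Terraform JSON from Python, to keep one language for the examples here. The role ARN, domain, region and zone variable are placeholders, and complete ACM validation would also add an `aws_acm_certificate_validation` resource:

```python
import json

# Hypothetical role that Terraform assumes in the account owning the DNS zone.
DNS_ROLE_ARN = "arn:aws:iam::111111111111:role/dns-admin"
VALIDATION = "tolist(aws_acm_certificate.cert.domain_validation_options)[0]"

config = {
    "provider": {
        # Two configurations of one provider: the default one for the
        # certificate account, and an aliased one for the DNS account.
        "aws": [
            {"region": "us-east-1"},
            {"alias": "dns", "region": "us-east-1",
             "assume_role": {"role_arn": DNS_ROLE_ARN}},
        ]
    },
    "variable": {"zone_id": {"type": "string"}},
    "resource": {
        "aws_acm_certificate": {
            "cert": {"domain_name": "app.example.com", "validation_method": "DNS"}
        },
        "aws_route53_record": {
            "validation": {
                "provider": "aws.dns",  # this record lives in the other account
                "zone_id": "${var.zone_id}",
                "name": "${%s.resource_record_name}" % VALIDATION,
                "type": "${%s.resource_record_type}" % VALIDATION,
                "records": ["${%s.resource_record_value}" % VALIDATION],
                "ttl": 60,
            }
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```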


Have you seen Terraform CDK? https://github.com/hashicorp/terraform-cdk


A CDK-style Terraform seems perfect. Seems a bit early days but I look forward to it gaining traction.


There are dozens of these examples. I switched a few years back after AWS released the automatic HTTP-to-HTTPS redirect functionality in ALBs, and 6 months after release it still wasn't supported in CF. Terraform isn't perfect and it still has a ton of issues, but its rate of innovation is way ahead of CF.


This is pretty straightforward in Pulumi. I recently built a stack that, in a single `pulumi up`, creates VPCs and subnets in a handful of different accounts with VPC peering, routing and DNS between each of them, including an AWS Client VPN set up so you can access all the VPCs from a single VPN endpoint.


Not sure if this counts as a 'horrific hack', but there is official AWS guidance on how to do this:

https://aws.amazon.com/blogs/infrastructure-and-automation/m...


Thank you, and yes in my books that's a horrific hack and too much effort compared to the 5 lines of code I just added to Terraform to get the job done.


I think the root reason for this is that AWS stacks have to authenticate from a single origin (i.e. user credentials), whereas Terraform can use multiple sets of credentials. This makes things necessarily complicated when an AWS stack tries to deploy another stack in another account, as stacks are also account-based (but I imagine Terraform stacks aren't).


FYI this has been supported in CDK for a few months now. See the CrossAccountZoneDelegation at the end of this section: https://docs.aws.amazon.com/cdk/api/latest/docs/aws-route53-...


People tend to complain about HCL a lot, I think it’s a great language for infrastructure. I don’t want a “real programming language” for provisioning infrastructure. I feel like every time I’ve seen someone “need” a real programming language, that there is a _better_ way to do the task at hand with HCL.

That being said, there are some ugly bits.

1. Remote state as a data source means your infra is broken, you just don’t know it yet. Two applies have to occur to get your infra into the correct state, but they are separated by an arbitrary amount of time between executions. Even if you automate it with CI/CD, your second root module could be broken until it runs, since it depends on the output of the other module.

2. Public modules are absolute garbage. Go find the best one, it’s trash. Here is why, 10-20 orgs all come in and tweak the module to work for them. You’ll often see 1-10 resources in a module (sometimes more), but the module will end up with more _input complexity_ than the underlying resources. Sometimes even more inputs than all the original resources combined! In the end, you get a module that “works” for everyone, with a half baked “DRY abstraction” for N number of organizations.

3. Organizing code is hard, because we often don’t fully consider environments/workspaces, infrastructure ownership, change management, and other sociotechnical concerns. I think Terraform and IaC in general is the epitome of Conway’s Law and when the (changing) social structure of the organization isn’t followed, the code gets harder to work with. This point is at odds with #1 above.

4. People tend to think “terraform apply” is a magic transactional boundary around your infrastructure. If it applies, it worked!!! But in reality, if modules aren’t crafted correctly they can “apply” cleanly, but still introduce an outage while they are executing.

All that said, I’m excited for the 1.0 release. I love terraform. Thanks to all (except module authors) for the hard work.


I'm not sure I understand #1.

Your points would still apply if a resource (e.g. aws_instance.foo) is created in one module and then referenced as a data source (e.g. data.aws_instance.foo) in another module. Are you suggesting remote state is different? Or would you also advise against referencing data source attributes from resources created in other modules?


Oh for sure, that’s point #4, but at least it’s in the same apply.

In #1 there is also a tight coupling between two different sources. If team A changes their output, the dependent team B's references break.

Also 1.2: security. If I can read an attribute from your state file, I can read the whole thing.


hey this is super random and not related to your comment above, but I saw your comment about honey and how you worked in this space. I was wondering if you'd be open to chatting about your experience in this space. (working on something in the affiliate space). Really appreciate it! spencerbratman [@] gmail.com


I hate Terraform with a passion but it is probably the best tool out there for managing cloud infrastructure so I use it at work with no plans to replace it.

The biggest downsides are the awful half-baked language and the awkwardness of modules and passing values throughout your config. Also, the staticness of providers is a serious pain; for example, you can't create a Kubernetes cluster and then add a resource to it in the same stack. The workaround is to use two separate Terraform stacks, which brings a lot of pain for passing values across the boundary. Furthermore, you can no longer effectively plan any change that affects the boundary between the two stacks. "Luckily" Terraform's performance is so bad that you need to split the stacks anyway.

The biggest feature I would like to see is the ability to dump a pure representation of your evaluated configuration. This would allow reasonable diffs in CI. There are of course complications, especially if you use `data` resources but technically it is possible to do a very good job here which would make it so much easier to make changes.


I strongly agree both with respect to the half-baked-ness of the language and with "it's probably the best out there". Ultimately, these tools should have a static/yaml-like "assembly language" that describes the state of your infrastructure without any of the DRY. There would be a diffing engine which would figure out what changes need to be applied and apply them accordingly. Users could use some vanilla programming language to generate that yaml in a DRY way; then the Terraform folks don't need to badly reinvent a programming language.

I know they also have a CDK, but I can't tell if it properly solves that problem or if it still forces us into Terraform idiosyncrasies (i.e., if I rename something in Terraform, it will try to delete the corresponding resource and recreate it, and I think that absurd behavior remains with the CDK).


100%. Terraform is half-way between a tool for generating the configuration and applying it. I think Terraform's application engine is actually quite good, but I would like to use a much better tool to generate the config. (And be able to diff that config)

You can feed JSON to Terraform; however, this falls over if you need dependencies for output values. This usually isn't an issue because most cloud provider resources have predictable IDs, but as soon as you have one that doesn't, you are in for a lot of pain and suffering.


You may be interested in Pulumi: https://www.pulumi.com/

Basically it's Terraform but instead of declaring your resources in HCL, you declare them in a real programming language. You're still producing a declarative config that the engine then diffs, applies etc. In fact, it's compatible with existing terraform providers, so it has a surprisingly large selection of things you can use it for.

Note their docs will try to guide you towards using their hosted service which basically does nothing except host the state file, but you can use an S3 or GCS bucket instead and it works fine.

It's definitely not without its own problems, but I'd say it's overall an improvement.


Unfortunately last I checked, pulumi only offers state locking with their paid service. If you want to self-host you have to implement it yourself, which seems like a non-starter for a lot of people.


This was addressed a couple months ago in https://github.com/pulumi/pulumi/pull/2697


Wow it took 2 years for the PR to get merged.



Glad somebody mentioned Pulumi. It solved all of the major problems I had with Terraform.


Not with that licensing, thanks.


It looks like it's Apache 2.0 licensed? What issues do you have with that license?


It’s Apache 2, isn’t it? What’s wrong with that?



Someone should make a Clojure demo of those Java bindings, or even cljs. I hope Clojure has good type based completions these days, because it would be a fantastic language for this.


It’s pretty wild that the object-identity-via-name thing is still a problem. Can they not add a transitional name feature, where an object is known by multiple aliases for a while, and once you have finished putting through a change you can delete the original name? Is this not very basic SQL migration practice? Like column aliases until they're no longer needed.


I don't even understand why the state needs to know the identifiers that the high level language uses for various resources. If the high level language has a binding "foo_bucket" for an AWS S3 bucket resource with a single property `name = "foo"`, then why should the state need to know that the high level language refers to that bucket with the name "foo_bucket"? Instead, the state should look something like this (obviously simplified):

    {
        "resources": [
            {
                "type": "aws_s3_bucket",
                "properties": {"name": "foo"}
            }
        ]
    }
Note that there is no reference to "foo_bucket".


This doesn't make sense to me. You need to know the logical identifier in order to explicitly link the code with the resource. Otherwise if I change the code for that resource how does TF know what it needs to change if none of the existing resources in state matches the new config? Do you just always destroy and re-create every time there's a change to anything?


> Otherwise if I change the code for that resource how does TF know what it needs to change if none of the existing resources in state matches the new config?

A resource provider defines a collection of fields that is the "identifier" for the resource. For example, an S3 bucket resource would have the "name" field for its identifier.

If you change another attribute besides the bucket name, the engine will see that the input and the state both have a s3 bucket resource with the same name but different props, so it knows it will need to update some props (rather than create a new one). However, if the name changes, the engine will see that the input has a bucket that doesn't exist in the state so it will add a "create bucket" step to the plan. It will also see that the state has a bucket that isn't in the input, so it will add a "delete bucket" step to the plan.

Maybe another way of saying the same thing is that a resource provider can mark any given field as "forces replacement", and all of the fields that force replacement are the de facto identifiers? I haven't thought through whether these are exactly equivalent.
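A minimal Python sketch of the identifier-keyed planning described above. The `IDENTIFIER_FIELDS` table, the flat `{"type", "properties"}` resource shape, and the `versioning` property are hypothetical simplifications, not how Terraform actually stores state:

```python
# Hypothetical: which properties identify a resource of each type.
IDENTIFIER_FIELDS = {"aws_s3_bucket": ("name",)}


def identity(resource):
    """Key a resource by its type plus the values of its identifier fields."""
    fields = IDENTIFIER_FIELDS[resource["type"]]
    return (resource["type"],) + tuple(resource["properties"][f] for f in fields)


def plan(desired, state):
    """Return (action, identity) steps with no logical names involved."""
    desired_by_id = {identity(r): r for r in desired}
    state_by_id = {identity(r): r for r in state}
    steps = []
    for key, res in desired_by_id.items():
        if key not in state_by_id:
            steps.append(("create", key))
        elif res["properties"] != state_by_id[key]["properties"]:
            steps.append(("update", key))
    for key in state_by_id:
        if key not in desired_by_id:
            steps.append(("delete", key))
    return steps


state = [{"type": "aws_s3_bucket", "properties": {"name": "foo", "versioning": False}}]
changed = [{"type": "aws_s3_bucket", "properties": {"name": "foo", "versioning": True}}]
renamed = [{"type": "aws_s3_bucket", "properties": {"name": "bar", "versioning": False}}]

print(plan(changed, state))
print(plan(renamed, state))
```

Changing a non-identifier property yields a single update, while renaming the bucket yields a create plus a delete. The scheme only works when the identifier is known before the resource exists, which is the objection raised in the following reply.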


The "identifier" is often something that's computed later or returned from the API. Think about something like an ec2 instance - the identifier is the instance ID that's returned from AWS. You can have many instances that basically look identical so how do you differentiate which one this logical resource is referencing?

And back to the S3 bucket use case: sometimes you want uniqueness in your name, so you use a prefix instead of specifying the whole name. How do you determine which bucket that resource is referencing if there are multiple buckets matching the prefix?

I hear what you're saying in terms of wanting state management to be simplified, but pretty much every IaC solution uses this explicit logical resource -> physical resource mapping in state.


Yeah, moving objects around the config is common if you want to keep it organized and requires manual actions that require essentially a global lock on the stack (and Terraform has no built-in feature to actually take this lock). It makes it basically impossible to implement a fully automated production change pipeline with Terraform.


Moreover I can never, ever, remember the syntax for moving objects around the config. It's really painful.

Edit: the aliases would have to handle moving as well as renaming. You could just have aliases in a global namespace, which means adding `alias = "portable-elb"` and doing one `terraform apply` means you can pick up that config, drop it anywhere else, and it will move it for you. It wouldn't even need to do a full `apply`, just a local JSON manipulation.
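Here is a rough Python sketch of how such a global alias could drive a purely local state rewrite. The state and config shapes and the `migrate_state` helper are hypothetical. In Terraform today, the manual equivalent is `terraform state mv <source> <destination>`, run once per moved address:

```python
def migrate_state(state, config):
    """Re-key state entries whose alias is now declared at a different config address.

    state:  {address: {"alias": str or None, "attributes": dict}}  (hypothetical shape)
    config: {address: {"alias": str or None}}                       (hypothetical shape)
    """
    # Map each alias to the address that currently declares it in the config.
    address_for_alias = {
        spec["alias"]: addr for addr, spec in config.items() if spec.get("alias")
    }
    migrated = {}
    for addr, entry in state.items():
        alias = entry.get("alias")
        # No alias, or an alias missing from the config: keep the old address.
        target = address_for_alias.get(alias, addr) if alias else addr
        if target in migrated:
            raise ValueError(f"two state entries map to {target}")
        migrated[target] = entry
    return migrated


state = {
    "aws_elb.portable": {"alias": "portable-elb", "attributes": {"id": "elb-123"}},
    "aws_vpc.main": {"alias": None, "attributes": {"id": "vpc-456"}},
}
# The ELB block was cut and pasted into a module, and its alias travelled with it.
config = {
    "module.edge.aws_elb.lb": {"alias": "portable-elb"},
    "aws_vpc.main": {"alias": None},
}
print(migrate_state(state, config))
```

Because the rewrite touches only the state document, it needs no API calls. It does still need the same exclusive lock on the state that the previous comment mentions.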


> application engine [vs] tool to generate the config

I get it from HashiCorp's perspective though.

A robust application engine with a suboptimal config generator is a viable product.

A suboptimal application engine with a brilliant config generator is not.

So given limited resources, former gets the dev grease.


This is a false dichotomy.

You can generate these configs really easily with any off-the-shelf programming language for a small fraction of the effort they’ve put into HCL + all of the stuff on top that makes HCL the shitty programming language that it is.

Even if you insist on building your own programming language for this purpose, Hashicorp could’ve saved themselves a lot of work by looking at the prior art of the last 70 years of programming language history.

In other words, if they just picked, say, JavaScript from the start they could have saved a bunch of time and energy and put that into their application engine.


> You can feed JSON to Terraform however this falls over if you need dependencies for output values

This is what I've started doing with Jsonnet for generation, and also exactly why I've stopped doing it.


I'm not sure I follow exactly what you're missing. `${aws_instance.example.x}` as a string value creates the same dependency as it would via HCL when used with JSON.
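To make this concrete, here is a Python sketch that emits Terraform's JSON syntax with a `${...}` reference. The `aws_instance` and `aws_eip` arguments are ordinary AWS provider arguments, and the AMI ID is a placeholder. The `references` and `dependency_edges` helpers use a deliberately simplified regex to show which edge the string implies. They are illustrative only and are not Terraform's actual expression parser:

```python
import json
import re

# Simplified "${<type>.<name>.<attr>}" pattern; real Terraform expressions are far richer.
REF = re.compile(r"\$\{([a-z0-9_]+)\.([A-Za-z0-9_-]+)\.[A-Za-z0-9_.]+\}")

config = {
    "resource": {
        "aws_instance": {
            "example": {"ami": "ami-12345678", "instance_type": "t3.micro"}
        },
        "aws_eip": {
            "example_ip": {"instance": "${aws_instance.example.id}"}
        },
    }
}


def references(value):
    """Yield (type, name) pairs for every interpolation found in a nested value."""
    if isinstance(value, str):
        for match in REF.finditer(value):
            yield (match.group(1), match.group(2))
    elif isinstance(value, dict):
        for item in value.values():
            yield from references(item)
    elif isinstance(value, list):
        for item in value:
            yield from references(item)


def dependency_edges(cfg):
    """Return {(dependent, dependency)} edges implied by the interpolation strings."""
    edges = set()
    for rtype, by_name in cfg["resource"].items():
        for name, body in by_name.items():
            for dep in references(body):
                edges.add(((rtype, name), dep))
    return edges


print(json.dumps(config, indent=2))  # what you would save as main.tf.json
print(dependency_edges(config))
```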


Same here, I don't see how outputs are treated any differently by Terraform than in any other .tf file written in HCL. I'm not saying it's not possible, but I haven't hit that failure mode yet.


Thanks for the hint, now I'm not sure what went wrong when I tried something like this. I should read up on this more.


What are some of the tools that do this? The only ones I know of are Scalr and Pulumi.


> Ultimately, these tools should have a static/yaml-like "assembly language" that describes the state of your infrastructure without any of the DRY.

CloudFormation ?

> There would be a diffing engine which would figure out what changes need to be applied and apply them accordingly.

CloudFormation.


Problem with CloudFormation is that it doesn't work with Cloudflare, Azure, GCP, Big-IP, Palo Alto, NetBox etc..


It's a problem only if you use these vendors; you don't have to.


It's a pretty tough sell to tell people they have to uproot all of their existing infrastructure and move to Amazon just to use an infra-as-code tool.


It's also unlikely that you will only use AWS, forever. At some point in time you'll have to deal with various resources (be it IT resources, time, money or people-as-a-resource), and whenever you bind your knowledge and workforce to an IaC tool that doesn't transfer or isn't portable you're going to end up with N+1 tools every time. In other words: it doesn't scale all that well. (And that doesn't mean Google-scale, but going from 2 IaC engineers to 5 IaC engineers is much harder if you can't apply universal tooling)

Tools are never 'just tools', there is context and there are externalities. And as you already pointed out: migrating/uprooting all of those other things isn't a likely scenario.


Agreed. If you use an auth service (SaaS or self-hosted) that isn't AWS Cognito you will also find yourself wanting to integrate with your IaC tool. Having to roll this yourself with CloudFormation is a lot of effort, or at least it was last time I looked, and importing a third party "provider" wasn't really a thing.


Fun fact: You don't even have to use Terraform


Yeah, CloudFormation is workable in this regard (I've created a neat generator for Python), although it has lots of its own problems (e.g., if you want to create a new resource, you have to run it as its own lambda--your infra-as-code needs its own infra which needs its own infra-as-code).


> I've created a neat generator for Python

care to share? (I know some hn users often don't w/o being asked, out of a sense of not wanting to be seen as self-promoting.)


It’s hanging out in a private repo with a bunch of other stuff, and I don’t care to put it in its own repo at the moment. Basically, CloudFormation publishes a JSON spec of all of their resource types, and I use that to generate Python code with type annotations. It’s sort of like Troposphere, but I go further: Tropo makes you reference resources by their CloudFormation string names, but my tool lets you use the Python object containing the resource, and it will resolve to the correct CloudFormation “Ref” object at compile time. (Also, unlike Tropo, I generated my Python types from a spec, so I don’t have to keep up with AWS changes.) That said, I’ve given up on CloudFormation altogether since Terraform has better support for resources outside of AWS.
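The core trick, resolving Python objects to `Ref` intrinsics when the template is rendered, can be sketched in a few lines. The `Resource` class here is a hand-written stand-in, whereas the tool described above generates typed classes from the CloudFormation resource specification. The logical IDs and CIDR blocks are hypothetical:

```python
import json


class Resource:
    """Minimal stand-in for a generated CloudFormation resource class (hypothetical)."""

    def __init__(self, logical_id, cfn_type, **properties):
        self.logical_id = logical_id
        self.cfn_type = cfn_type
        self.properties = properties


def resolve(value):
    """Replace Resource objects with {"Ref": logical_id}, recursing into dicts and lists."""
    if isinstance(value, Resource):
        return {"Ref": value.logical_id}
    if isinstance(value, dict):
        return {k: resolve(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v) for v in value]
    return value


def template(*resources):
    """Render resources into a CloudFormation template dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            r.logical_id: {"Type": r.cfn_type, "Properties": resolve(r.properties)}
            for r in resources
        },
    }


vpc = Resource("AppVpc", "AWS::EC2::VPC", CidrBlock="10.0.0.0/16")
# The subnet holds the Python object, not the string "AppVpc".
subnet = Resource("AppSubnet", "AWS::EC2::Subnet", VpcId=vpc, CidrBlock="10.0.1.0/24")
print(json.dumps(template(vpc, subnet), indent=2))
```

Renaming a logical ID then only requires changing one constructor argument, and a typo becomes a Python `NameError` instead of a deploy-time failure.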


>if you want to create a new resource, you have to run it as its own lambda

Please don't, lol


> they also have a CDK

Terraform-CDK, as of now, needs to go through standard HCL parser. Sadly, there is no backdoor into Terraform's internal structures. If HCL (as a language) is the limitation for you, the CDK does not let you fly around it.


This would be great. Perhaps it could be based on https://dhall-lang.org/


I absolutely think a statically typed language is the right way to go (from experience using a Python->CloudFormation generator even with Mypy), but Dhall is going to be really unfamiliar for most people and it's hard to sell people on new languages that are syntactically unfamiliar.

As an aside, I think functional concepts could have made their way into mainstream programming much earlier if the FP people would have been willing to lower themselves to syntax that is readable to us plebs--I think this is no small part of Rust's success. People say syntax doesn't matter, but I disagree.


https://cuelang.org has better syntax but its logic based unification is a struggle bus for many people.


I looked at Cue and I don't understand what problem it solves. It certainly doesn't (seem) to solve the problem of DRYing up verbose YAML, or at least it's missing any notion of a function.

"hey, these YAML blobs are all mostly the same, but they vary based on a couple of parameters--I should write a function that takes those parameters and outputs the right YAML object"

^ This is the #1 thing that the high-level language should concern itself with. Static typing is really nice to have and it's cool that Cue has a pretty interesting type system, but (as far as I can tell) it doesn't have functions. It almost has functions, but I don't want to have to resort to a hack for the #1 thing that I care about (functions).

Considering I prefer functions over sane syntax (although sane syntax is roughly tied with static typing), I'm inclined to prefer Dhall over Cue, but I'm still optimistic that something better will emerge. Also while we're on syntaxes that are deliberately obtuse, I'm pretty sure the Nix community has a Nickel language which is basically a statically typed version of the Nix language.

Maybe Cue has a more enlightened way of thinking about the infra-as-code problem and I'm just not getting it.
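The "function that takes those parameters and outputs the right object" is trivial in a general-purpose language. Here is a Python sketch that uses a Kubernetes Deployment as the repetitive blob. The `service_manifest` helper, the service names, and the images are hypothetical:

```python
import json


def service_manifest(name, image, replicas=2, port=8080):
    """Return a Deployment-shaped dict; only the parameters vary between services."""

    def labels():
        # Fresh dict each call so the three label maps are not aliased.
        return {"app": name}

    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels()},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels()},
            "template": {
                "metadata": {"labels": labels()},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


manifests = [
    service_manifest("api", "example/api:1.0"),
    service_manifest("worker", "example/worker:1.0", replicas=4),
]
print(json.dumps(manifests, indent=2))
```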


CUE's philosophy is to wrap code in data, not data in code, as learned from the major configuration systems at Google. Being a logical language, rather than telling the computer what to do, you state facts and it verifies that you are correct. It is also intentionally not Turing complete, so that you cannot program in CUE.

CUE is gaining traction while still being young and changing. Grafana is adopting it for validating dashboards and such. Expect to see more of it in DevOps too.
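Here is a toy Python sketch of the "state facts, verify they agree" idea. Partial configs are merged, a type acts as a constraint that a concrete value must satisfy, and two different concrete values are an error. This is a heavy simplification for illustration, not CUE's actual lattice semantics, and the `unify` helper is hypothetical:

```python
def unify(a, b, path="root"):
    """Merge two partial configs; concrete values must agree with each other and with types."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            merged[key] = unify(a[key], value, f"{path}.{key}") if key in a else value
        return merged
    # Normalize so that if exactly one side is a type, it ends up in b.
    if isinstance(a, type) and not isinstance(b, type):
        a, b = b, a
    if isinstance(b, type) and not isinstance(a, type):
        if not isinstance(a, b):
            raise ValueError(f"{path}: {a!r} is not a {b.__name__}")
        return a
    if a == b:
        return a
    raise ValueError(f"{path}: conflicting values {a!r} and {b!r}")


schema = {"replicas": int, "image": str}
defaults = {"replicas": 2}
service = {"image": "example/api:1.0"}

# For consistent inputs, the order in which the facts are combined does not matter.
config = unify(unify(schema, defaults), service)
print(config)

try:
    unify(config, {"replicas": 3})
except ValueError as err:
    print(err)  # root.replicas: conflicting values 2 and 3
```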


When I stopped being an SRE at Google, my most immediate thought was relief that I would never, ever, have to deal with BCL/GCL again.

After 6 months outside Google, I desperately wished for BCL/GCL to be everywhere, because all other config languages were just plain broken. And more annoyingly, there's no better way to describe it than "I have seen better, just trust me".

CUE seems to be a step forward. Flabbergast looked like it might have been a contender. The latter is DEFINITELY inspired by BCL/GCL.

At some point, I will have to sit down with CUE and try to re-implement the "perfect little horror" in it (it should be impossible IFF CUE is not Turing-complete, but it actually turns out that there are edge cases of configuration where you want that Turing-completeness).


> Being a logical language, rather than telling the computer what to do, you state facts and it verifies that you are correct.

Sure--it's like advanced static typing for static configuration. But that seems like a different and lesser problem than DRYing up the configuration in the first place, and moreover if you use a statically typed programming language to DRY up your configuration then you get pretty similar guarantees to Cue. You don't get Cue's "unifying many definitions" approach, but I can't honestly discern the value proposition in that.

As for turing incompleteness, that's a nice to have at best. If I had to choose between a turing incomplete declarative language like JSON and a turing complete imperative language like Lua, I'd take the latter every single time.


Nah, the syntax is superficial. Scala has offered better-than-Rust FP in a traditional syntax for over a decade, but if anything the tension between imperative and functional people is worse there.



You can reduce a little bit of the repetition in YAML with anchors.

There are tools that convert JSON/YAML into HCL.

https://learnxinyminutes.com/docs/yaml/#:~:text=yaml%20also%...


I think you misunderstand the problem I'm trying to solve, or maybe I misunderstand your response. My goal isn't to write YAML instead of HCL, my goal is to get rid of HCL and Terraform semantics altogether. If I had my way, Terraform's low level engine would operate on a verbose (i.e., "not DRY") YAML (or JSON or HCL or I don't care) description of resources which would be generated from (for example) a Python script.

The Python/Go/etc script is what humans interface with, and it is DRY. The YAML/HCL/etc is what the Terraform engine operates on and humans should very rarely need to interact with this.
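Here is a minimal sketch of that split, using Terraform's JSON configuration syntax (files ending in `.tf.json`) as the "assembly" layer. The `render` and `to_json` helpers, the `acme` prefix, and the bucket names are hypothetical. Sorting keys keeps the generated file stable, so it can be reviewed and diffed in version control:

```python
import json
from pathlib import Path


def render(prefix, names):
    """DRY side: one loop produces one aws_s3_bucket block per name."""
    return {
        "resource": {
            "aws_s3_bucket": {
                name: {"bucket": f"{prefix}-{name}", "tags": {"ManagedBy": "generator"}}
                for name in names
            }
        }
    }


def to_json(config):
    # Verbose side: deterministic output, so reordering inputs adds no diff noise.
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


config = render("acme", ["logs", "artifacts"])
text = to_json(config)

if __name__ == "__main__":
    Path("main.tf.json").write_text(text)
```

Only the short Python file is edited by hand. The generated `main.tf.json` is committed alongside it, so reviewers can see exactly what the engine will receive.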


Ah, so like you have some process which generates your YAML/HCL, which is your "IR/assembly" layer, not meant for regular human consumption/editing, which is fed to Terraform. But it's readable/auditable, VCS-trackable, and diff-able.

I do that a lot as well, and in fact I'm kinda leaning towards taking that approach from the get-go. Right now I start with the YAML, but something inevitably leads me to templating it using make + jinja/gomplate, which eventually leads me to Python scripts, and then to invoke (a Python package, it's like gulp or make).

It's not code, like business logic code, but it's too verbose and repetitive for human manual editing.


Yeah, in the Kubernetes world, the official interface is the YAML/assembler and different people have come up with different approaches for generating that. Helm for a long time (and even currently) uses text templates (e.g., jinja, mustache, etc) to render that YAML which is predictably abysmal.

CloudFormation used JSON (and eventually YAML) but built on top of it language-like facilities (the ability to reference resources, call pseudo-functions, etc) all very poorly. So you get an impoverished language built on top of YAML.

Terraform decided they would do approximately the same thing, except they reinvented their own JSON/YAML alternative (HCL) and built a crappy programming language atop it (instead of atop JSON/YAML).

These all give you pretty crummy means of abstraction. With CloudFormation you get nested stacks instead of functions, and you can only pass scalars around (no objects or lists, except comma-delimited strings which can be parsed into a list of strings). You're also limited in how many nested stacks you can create and how many total parameters can be passed into any given top-level stack.

Terraform seems strictly better. You can pass objects and lists and I've never approached any parameter limits, but still, you have to create a whole directory just to define a function and refactoring existing code into a module is painful because it means renaming resources (putting them under the module) which Terraform interprets as intent to destroy and recreate the resource.

Helm is using text templates so you can even generate syntactically invalid YAML! I think they might be supporting Lua these days, but I haven't looked into it.

I think the idea was that the whole marketing push behind infra as code was "it's just YAML! Such declarative! Wow!" as though yaml magically simplifies the inherently complex task of infrastructure, so everyone started with something YAML-like--even though we absolutely should have known that we would need to abstract--and gradually built our own half-baked languages on top of them. Of course, infra as code is absolutely worthwhile, but it's the ability to define what you want and have a tool reconcile it with some current state--it's not some magical property of YAML/JSON/HCL/etc.

fin.


That's an accurate summary of the arc of progress in this area. Also explains why so many folks are now turning to operators (versioned procedural code that runs in k8s and does arbitrary things, rather than arbitrary versioned yaml artifacts applied to k8s) to do advanced stuff rather than layering on more templating duct tape.


> these tools should have a static/yaml-like "assembly language" that describes the state of your infrastructure without any of the DRY

the last five words are a bit of a double negative; i think you mean "without the repetition" but I can't tell.


"without DRY" in this case means "with repetitions" i.e. in a verbose way. GP wants to be able to generate this verbose, machine readable syntax with DRY, human readable syntax.


Yes, this. Thanks for clarifying for me, apologies to the parent for my lack of clarity.


Dang, your solution sound so much like kubernetes I'm not sure if you are joking or not.


Kubernetes is one conceivable incarnation, but it operates differently than other infra-as-code tools. Terraform, for example, builds a dependency graph of your resources and initializes them in order. Kubernetes doesn't care about dependencies, and it just keeps trying to create resources and things will fail until their dependencies come online.

Further, Kubernetes manifests are the verbose "assembly language" layer, so you still need something for humans that is DRYer.

We use Terraform to manage Kubernetes resources (as well as cloud provider resources) at the moment, but I think you can equally use cloud provider operators for Kubernetes and manage everything with Kubernetes--I haven't tried this yet so I can't comment. In the latter case, you would still need something to DRY up your Kubernetes manifests. Also, if you aren't running on Kubernetes and you just want infra-as-code, k8s is an expensive solution (in terms of operations).

What I was picturing was a more conventional infra-as-code diffing engine (like Terraform's) but with a more verbose interface similar to Kubernetes YAML.
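For contrast with the retry-until-ready approach, here is a sketch of the graph walk described above using Python's standard `graphlib` module (3.9+). The resource addresses are hypothetical. Terraform itself also runs independent nodes concurrently, bounded by its `-parallelism` flag (default 10):

```python
from graphlib import TopologicalSorter

# Hypothetical resource graph: each key depends on the resources in its set.
graph = {
    "aws_vpc.main": set(),
    "aws_subnet.a": {"aws_vpc.main"},
    "aws_subnet.b": {"aws_vpc.main"},
    "aws_eks_cluster.main": {"aws_subnet.a", "aws_subnet.b"},
}


def apply_in_waves(graph):
    """Group resources into waves; everything within a wave could be created in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every create in this wave succeeded
    return waves


for wave in apply_in_waves(graph):
    print(wave)
```

Nothing is attempted before its dependencies exist, so there are no failed-then-retried creates. The trade-off is that the full graph must be known up front.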


> Kubernetes manifests are the verbose "assembly language" layer, so you still need something for humans that is DRYer.

It's a little more than that. Out-of-the-box manifests for primitives are certainly assembly-like, you're right--but CRDs allow you to operate at a higher level of abstraction while staying in the same syntax, which is powerful and unique to k8s (everything else, from Helm to Terraform to Ansible, distinguishes between pseudo-assembly "language that directly expresses changes to be made" and "language that humans can write abstractions in").


> "Luckily" Terraform's performance is so bad that you need to split the stacks anyways

Not sure what about Terraform's performance is so bad. It seems hard to blame a tool whose main execution path is potentially hundreds of network I/O requests to third-party APIs. Most of the "split stacks" I've seen are more for code organization and security reasons than performance. It seems safer to know 100% that deploying infra for my app isn't going to mess with my VPC settings and can be executed with a lower-privileged role.

> Furthermore you can no longer effectively plan any change that affects the boundary between the two stacks.

That's fair; you do end up with these "foundational" modules a lot of the time, like an 'aws-account basics' module that other modules expect the account to be set up with, so they can query data sources for subnets, etc. Planning changes when that base changes can be difficult, but not impossible. Good versioning is critical. It feels in the same vein as apps that need to manage framework updates and things like that (though it can be made more or less difficult by how you've split up your cloud provider usage: multiple accounts per business unit, or all in one).


Our experience of building a provider: performance is fast with fast APIs, and slow with slow APIs. Haven't observed any of the core diffing, DAG, or apply scheduling to be problematic (but also haven't tried an apply at extremely high - 10^4? 10^5? - resource count)


> The biggest feature I would like to see is the ability to dump a pure representation of your evaluated configuration. This would allow reasonable diffs in CI. There are of course complications, especially if you use `data` resources but technically it is possible to do a very good job here which would make it so much easier to make changes.

The planned state, current state, and diff of them are all available as separate fields in the Terraform plan file, is that not what you're looking for?
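For reference, a saved plan can be rendered as JSON with `terraform show -json <planfile>`. The output includes `prior_state`, `planned_values`, and a `resource_changes` list with per-resource `actions`. Here is a small Python sketch that tallies those actions. The sample plan below is a hand-written excerpt, not real output. As the reply points out, the result still depends on the current infrastructure state:

```python
import json
import subprocess
from collections import Counter


def load_plan(plan_file):
    """Render a saved plan (from `terraform plan -out=...`) as JSON and parse it."""
    result = subprocess.run(
        ["terraform", "show", "-json", plan_file],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def summarize(plan):
    """Count resource changes by kind; delete+create in either order is a replacement."""
    counts = Counter()
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        kind = "replace" if sorted(actions) == ["create", "delete"] else actions[0]
        counts[kind] += 1
    return counts


# Hand-written excerpt in the shape of `terraform show -json` output.
sample = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["update"]}},
        {"address": "aws_instance.web", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    ]
}
print(summarize(sample))
```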


The key word is "pure" here. These things all depend on the current state of the infrastructure. The "planned state" is close to what I want, but it can be very confusing if someone has deployed a new change since you forked off.


Yeah. I have a poor view of terraform, since my first interaction was trying to make a few one-line changes to avoid repetition, but I couldn't figure out why they didn't work without setting up a connection to the AWS S3 bucket.


Have you tried Terragrunt [0]? It helps a lot with managing a set of related stacks. Still feels like a bandaid on a broken model, but it is what we have.

[0] https://terragrunt.gruntwork.io/

Regarding performance, last time I looked, Hashicorp's documentation implied there was no limit to the size of a Terraform stack. I think they meant theoretically in a science fiction universe where humanity had captured all of the sun's output to perform terraform plan and apply...


+1 from me on the "awful half-baked language" (HCL).

I just recently wrote an article about my experience, including issues and workarounds, when migrating from Terraform to Pulumi: https://blog.ekik.org/my-experience-migrating-my-infrastruct...

Hope it's OK that I'm sharing it here. I think it's relevant because there seems to be quite a lot of interest around Pulumi, and how one would go about moving from Terraform to Pulumi.


I'm actually thinking of going the other way. I've been using Pulumi for several months now, and I'm thinking of moving to Terraform, because it has a so much larger third-party ecosystem, including more providers, and tools that can analyze HCL, like Infracost and security scanners. When will I learn to see the bigger picture and value popularity over quality?


It's a very interesting point.

I've been part of managing rather large Terraform infrastructures (1000+ resources) for a couple of years, but I'm a Pulumi n00b with only about a month of experience.

The infrastructure I'm managing right now with Pulumi is much smaller, only around 130-140 different resources.

For me it ultimately came down to developer productivity. I'm much better at convincing Pulumi to do what I want compared to how it was with Terraform. This also makes me a much happier and less frustrated developer :).

My priorities might very well be different if I were to manage much larger infrastructures (infra cost would be more important for example).


The stack I manage with Pulumi is currently around 300 resources. (I think that count is inflated by all the secrets in AWS Secrets Manager, because each secret has two resources: the secret and the current version.) I currently manage it by myself, but I'm hoping that won't be the case for very long.

Maybe the ending of my previous comment was too cynical. But I think I've repeatedly made the mistake of valuing my productivity and happiness as a currently solo developer over what will let my company take full advantage of a big third-party ecosystem (including a large talent pool).


I don't think you're too cynical at all - I think you're exactly right! It's often much more sensible to use the "tried and true" stuff most of the time.

In my particular case I don't plan to have my company grow much at all - we're staying small. I think Pulumi is a sensible "bet" for me, because it does what I need right now really well. Sure, there's a bit of a risk, but worst case scenario I would spend a day or two to migrate what I have back to Terraform.

I would definitely not have made the call to "let's just switch everything to Pulumi" if I was still working at a larger company. As you said, a large talent pool / community is a huge deal when you have the option to hire people who can spend time learning a particular tool or language.


I work in a very large shop with lots of TF and we do not use any of the "ecosystem" other than Terragrunt. Almost all of it is experimental junk.

We use almost entirely one provider, with things like a "template" or "random" provider as well, which are really just core features they decided to split off into plugins. Even when we use SaaS that there is a provider for, we don't use the provider, because we aren't constantly changing it, or managing it doesn't require lots of people across multiple teams with multiple iterations and modules.


+10 from me on the "awful half-baked language" (HCL).

Only cmake's 'language' is worse.


People mention pulumi but hashicorp are creating something similar with https://github.com/hashicorp/terraform-cdk. But all the existing terraform providers work with it afaik.


I don't know if people have even tried Pulumi before recommending it.

I've tried it, and it has buggy defaults, buggy diff generation, etc. Each time I applied the same code, it would generate a diff based off of some internal defaults and... recreate the exact same infrastructure by _tearing it down_ and making it fresh. Not ideal.

Would advise using the TF CDK specifically.


The token system in TF CDK is still broken, and it's not ready for adoption. I've built two stacks with it but I'm back at terraform for now. I intend to explore pulumi though when the opportunity presents itself.

I think using a Turing-complete language like typescript with mature tooling to define cloud infrastructure feels very natural and makes things much more manageable than using HCL.

One thing I absolutely can't do without is the state management api terraform provides with its CLI. This is absent from terraform-cdk and aws's CDK, although many of the same APIs seem to exist for pulumi.


> I think using a Turing-complete language like typescript with mature tooling to define cloud infrastructure feels very natural and makes things much more manageable than using HCL.

Fully agree. Not sure if any of the CDKs (or Pulumi) get the ergonomics right though. The ergonomics should feel like we're just generating YAML/JSON/etc, but the CDKs I've seen require inheritance, mutable state, etc.

> One thing I absolutely can't do without is the state management api terraform provides with its CLI. This is absent from terraform-cdk and aws's CDK, although many of the same APIs seem to exist for pulumi.

AWS's CDK is built on CloudFormation, so I don't think it has analogs for Terraform's state APIs. As for TF CDK, I would think you would just use Terraform's CLI state management directly? Maybe I'm confused about what you're trying to do?


@throwaway894345 You can, but that means you have to introspect the generated code to determine terraform resource ids, etc. It's a really bad developer experience on large stacks.


> This is absent from terraform-cdk

Curious to know how that is, or what an example would be? I don't see how you would have to give up state management with CDK, which I understand to be extending TF, not supplanting it.


@polynomial - You have to use the state API on the generated terraform. This means that you need to understand the structure of the generated terraform, and are dealing with generated .json files that require introspection to determine what terraform resource ids are prior to managing their state. It is possible to do, but if you're writing code, you don't want to have to worry about the generated json.


I wouldn't recommend using cdktf either yet. Can't manage multiple stacks in a single repository, no full support for input variables, constant breaking changes. It's not production ready at all.

Stick with terraform if you need to provision non-aws resources. Otherwise, use aws-cdk.


I do multiple stacks via changing the state file based off of env:

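  import { Construct } from "constructs";
  import { App, S3Backend, TerraformStack } from "cdktf";

  interface StackConfig {
    environment: string;
    name: (i: string) => string; // appends the environment suffix
  }

  class Stack extends TerraformStack {
    constructor(scope: Construct, name: string, c: StackConfig) {
      super(scope, name);

      // One state file per environment, e.g. key "state-env-dev"
      new S3Backend(this, {
        bucket: "some-bucket-here",
        key: c.name("state-env"),
        region: "" // wherever
      });
    }
  }

  // ... at the bottom of main
  const app = new App();
  new Stack(app, 'something-something-dev', { environment: "dev", name: (i) => `${i}-dev` });
  new Stack(app, 'something-something-prod', { environment: "prod", name: (i) => `${i}-prod` });
  app.synth();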
Then you can use stacks properly.


Support for multiple stacks in a single file was added to cdktf recently. I’ve been managing dozens of production stacks in a single repo for a while now and highly recommend it.


And yet if you try to pass values from one stack to another, it will fail spectacularly.


> Each time I applied the same code, it would generate a diff based off of some internal defaults and... recreate the exact same infrastructure by _tearing it down_ and making it fresh. Not ideal.

Not quite the same, but in vanilla Terraform if you simply rename a resource it will tear it down and recreate it even though the resource itself hasn't changed. Makes refactoring really painful. I think you can work around this by renaming the state as well as the resource, but this is often a lot of work (and a bit of risk) just to rename an identifier so I don't bother. I suspect the CDK doesn't solve this problem either.


  terraform state mv [old name] [new name] 
I'd much rather explicitly state when real resources are renamed than have terraform diffing my code and guessing whether I wanted to rename it or I am actually trying to recreate something. I can only imagine the headaches that would happen with a tool trying to track changes to infra as well as changes to code without explicitly tying infra state to version control somehow.

https://www.terraform.io/docs/cli/commands/state/mv.html
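A toy model makes this trade-off concrete. It is a hypothetical sketch, not Terraform's real implementation: state is modeled as a map keyed by resource address, `plan` matches config to state purely by address, and `stateMv` only re-keys an entry the way `terraform state mv` does, leaving the real resource (its ID) alone. The names `plan`, `stateMv` and the IDs are made up.

```typescript
// Toy model, not Terraform's implementation: state maps a resource
// address (e.g. "aws_instance.vm1") to the real resource it tracks.
type State = Record<string, { id: string }>;

interface Plan {
  create: string[];
  destroy: string[];
  keep: string[]; // real Terraform may still update these in place
}

// Config and state are matched purely by address; renames are not guessed.
function plan(configAddrs: string[], state: State): Plan {
  const result: Plan = { create: [], destroy: [], keep: [] };
  for (const addr of configAddrs) {
    if (addr in state) {
      result.keep.push(addr);
    } else {
      result.create.push(addr);
    }
  }
  for (const addr of Object.keys(state)) {
    if (configAddrs.indexOf(addr) === -1) {
      result.destroy.push(addr);
    }
  }
  return result;
}

// Roughly what `terraform state mv OLD NEW` does in this model: re-key the
// entry and leave the real resource (its ID) untouched.
function stateMv(state: State, from: string, to: string): State {
  if (!(from in state)) throw new Error(`no resource at ${from}`);
  if (to in state) throw new Error(`${to} already exists`);
  const next: State = {};
  for (const addr of Object.keys(state)) {
    next[addr === from ? to : addr] = state[addr];
  }
  return next;
}
```

Renaming `aws_instance.vm1` to `aws_instance.webserver1` in code alone plans one create and one destroy; after `stateMv` the same plan is empty and the ID is unchanged.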


> I'd much rather explicitly state when real resources are renamed than have terraform diffing my code and guessing whether I wanted to rename it or I am actually trying to recreate something.

But you're not renaming real resources, you're just renaming the Terraform identifier that corresponds to them. There's no reason that changing this identifier should destroy and recreate the resource it corresponds to. If you explicitly want to destroy and recreate it, you can change an attribute that forces a recreation (typically a "name" field or whatever identifier the resource's provider cares about).


OK, but how does Terraform know you are renaming a resource? It is not a daemon always running and watching everything you type. It only gets a snapshot of your code to work from when you run it; it doesn't know what your code was before, just the saved state from your last run and the real state in your cloud provider. The only way it can track state is through the name you have given it; if you change that name, it cannot know without inferring something. Maybe it matches up all the attributes in your code and state and infers that a rename has happened. What happens when only 95% of attributes match? What happens when multiple things match (an EC2 instance only requires 2 attributes, so this is plausible)?

Example 1:

You have 2 essentially identical EC2 VMs with terraform names vm1 and vm2. You decide these are not good descriptive names, so you change them to webserver1 and webserver2; before running that change you also realise you only need 1 of the servers, so you delete webserver2 from your code. Terraform runs a plan and sees there is now only a single VM definition but 2 VMs in state. Neither of the terraform identifiers match the original resources. How does it know which one was renamed and which one to delete?

Example 2:

You use Terraform for IaC and something like Chef for configuration management, so your Terraform code exclusively deals with the "hardware". A service is being migrated to a new implementation, so you need to delete the old VM and bring up a new one. Both old and new implementations have the exact same hardware requirements. You make the change in your Terraform code, deleting the old resource and creating a new one with the same requirements but a different name, and run a plan. Terraform tells you there's nothing to change because it has inferred that you wanted to rename.
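Both examples can be reproduced with a deliberately naive matcher. This is a hypothetical sketch of attribute-based rename inference (something Terraform does not do); `guessRenames` and the AMI/instance-type values are made up for illustration.

```typescript
// Hypothetical rename inference: treat an orphaned state entry as "renamed"
// to a new config entry whenever their attributes are identical.
type Attrs = Record<string, string>;

interface Guess {
  from: string;         // address that disappeared from the code
  candidates: string[]; // new addresses whose attributes match exactly
}

function sameAttrs(a: Attrs, b: Attrs): boolean {
  const ka = Object.keys(a);
  if (ka.length !== Object.keys(b).length) return false;
  for (const k of ka) {
    if (a[k] !== b[k]) return false;
  }
  return true;
}

function guessRenames(
  orphaned: Record<string, Attrs>, // in state, no longer in config
  added: Record<string, Attrs>     // in config, not yet in state
): Guess[] {
  const guesses: Guess[] = [];
  for (const from of Object.keys(orphaned)) {
    const candidates = Object.keys(added).filter((to) =>
      sameAttrs(orphaned[from], added[to])
    );
    guesses.push({ from, candidates });
  }
  return guesses;
}
```

In Example 1 both orphaned VMs report `aws_instance.webserver1` as their only candidate, so there is no safe answer; in Example 2 the matcher confidently reports a rename the author never intended.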


> This experimental repository contains software which is still being developed and in the alpha testing stage. It is not ready for production use.

Not sure how much you'll want to invest in being essentially an alpha tester. That being said, if you're currently using Terraform and can wait, it's worth keeping an eye on.


Right, tfcdk and k8scdk are a thing.

Pulumi is also integrating with TF.


> for example you can't create a kubernetes cluster then add a resource to it

I have no love for HCL, but you can do this by creating a kubernetes provider with the auth token pointing at the resource output for the auth token you generated for the cluster.


Yes, however this will (typically) work if the cluster already exists (from a previous run), but not if you are creating the cluster and the kubernetes provider as part of the same run.

IIRC you'll end up with a kubernetes provider without auth (typically pointing at your local machine), which is 1) not helpful and 2) can be actively bad.

I believe the core issue here is that providers don't have the ability to specify a `depends_on` relation: https://github.com/hashicorp/terraform/issues/2430


This works even without the depends_on property. All you need to do is have the module you use for creating the cluster expose an output that is guaranteed to be a computed property.

Then use that computed property as input variable for whatever you want to deploy into Kubernetes.

We're using this with multiple providers and it works. Of course, an actual dependency that's visible would be better.


I'd love to see an example of this actually working, because I have had the opposite experience (explicitly with the Kubernetes and Helm providers); I've had to do applies in multiple steps.


This should work (as in, it will create the cluster and only then add the k8s resource to it, in the same plan/apply).

Here the module creates an EKS cluster, but this would work for any module that creates a k8s cluster.

  module "my_cluster" {
    source                          = "terraform-aws-modules/eks/aws"
    version                         = "17.0.2"

    cluster_name                    = "my-cluster"
    cluster_version                 = "1.18"
  }

  # Queries for Kubernetes authentication
  # this data query depends on the module my_cluster
  data "aws_eks_cluster" "my_cluster" { 
    name = module.my_cluster.cluster_id
  }
  
  # this data query depends on the module my_cluster
  data "aws_eks_cluster_auth" "my_cluster" { 
    name = module.my_cluster.cluster_id
  }

  # this provider depends on the data query above, which depends on the module my_cluster
  provider "kubernetes" {  
    host                   = data.aws_eks_cluster.my_cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.my_cluster.certificate_authority.0.data)
    token                  = data.aws_eks_cluster_auth.my_cluster.token
    load_config_file       = false
  }

  # this provider depends on the data query above, which depends on the module my_cluster
  provider "helm" { 
    kubernetes {
      host                   = data.aws_eks_cluster.my_cluster.endpoint
      cluster_ca_certificate = base64decode(data.aws_eks_cluster.my_cluster.certificate_authority.0.data)
      token                  = data.aws_eks_cluster_auth.my_cluster.token
      load_config_file       = false
    }
  }


  # this resource depends on the k8s provider, which depends on the data query above, which depends on the module my_cluster
  resource "kubernetes_namespace" "namespaces" { 

    metadata {
      name = "my-namespace"
    }
  }


I literally implemented this not a month ago. I don't understand the complaint at all. Terraform is easily able to orchestrate a cluster and then use its data to configure the provider. The provider details do not need to be available until resources are created using the provider, which won't occur until the EKS cluster is available.


I'm using something similar, but it doesn't handle cluster deletion well.


You can do this with either:

1. depends_on = ...
2. implicit dependency, ie reference some cluster property in your deployment, which causes the same behavior as depends_on


The tool is ok, but developing plugins for it shows how inadequate Golang is for the job. There's so much repetition and boilerplate required. I wrote a FreeIPA plugin a few years back; it only handled registering a host, and the executable weighed over 100 MB! WTF? I haven't looked at that side of things lately; I wonder if it's different nowadays.


We have a large number of resources in our Spacelift provider[0] and it weighs ~20 MB.

It'll probably mostly depend on the libraries you use.

[0]:https://github.com/spacelift-io/terraform-provider-spacelift...


Definitely agree with this, Go is so verbose for this application. When I wrote a provider, I had the same problem. What made it even worse is that I was connecting to an API that made use of dynamic JSON generation. So many interfaces and other hacks to get the JSON documents to parse correctly.


Is it a Go problem or a new-to-Go problem? I haven't written terraform plugins specifically but I have been writing Go for years and never find myself needing to write an excessive amount of boilerplate. There can definitely be some frustrations in dealing with dynamic JSON though. JSON-to-Go converters are your friend.


I was not using anything special, I had implemented my own client for IPA. The equivalent functionality in Python (I ended up using Ansible to do my thing) uses just a few kB ...



Why not use something like Ansible instead?

It too is declarative. It too can be easily extended. It's also something a lot of people already know.

I used to use Ansible or Puppet for these things before Terraform was all the rage. It was a lot more stable than trying to distribute those state files, which is a strange design to pick. There are plenty of existing modules, but it's also dead simple to write your own.


I have limited experience with Ansible, but afaik calling it declarative when compared to Terraform is a stretch [1]

[1] https://blog.gruntwork.io/why-we-use-terraform-and-not-chef-...


It should be noted that the article is written to sell services for Terraform. It is unfortunately built on a few false premises that are never argued. Very few Chef developers would agree with Chef being somehow more imperative than Puppet, for example, seeing how the language was originally thought of as a superset of Puppet's.

The author does not specify which module is used for AWS, but it is not representative of how one would want to use Ansible for infrastructure. Writing idempotent playbooks is widely regarded as best practice in the Ansible community.

I have used Ansible for declaring node state in large production environments (not some dinky startup) and found it to be a very straightforward way to manage infrastructure.


Ansible is not really made for managing cloud resources and it shows - the modules are not production ready.


For GCP, both ansible modules and terraform modules are actually generated from https://github.com/GoogleCloudPlatform/magic-modules, so their "production readiness" is the same.

I understand that mitchellh himself personally created a bunch of cloud modules for terraform at the beginning, and those were likely of higher quality than whatever was created by internal developers assigned by Google/Microsoft, and might be slightly better than the AWS modules maintained by the community.

Anyway, when it comes to ansible versus terraform, we should move the discussion to state management instead. With ansible, you don't have to deal with state, but you will need to clean up the cloud resources separately. With terraform, you can use the tool to clean up the cloud resources easily, but then you also have the headache of managing state. Plus, whenever you change something, there is always the nagging feeling that it will do a destroy/recreate instead of an in-place update.


I like Terraform for infrastructure, up to the point of creating the K8s cluster, then ArgoCD for keeping K8s in sync.


That's an interesting combo. What are you keeping in sync in K8s with Argo?


The operators we offer in our clusters (e.g. ECK, Prometheus, etc... the ArgoCD ApplicationSet generators make it easy to configure which features are installed on each cluster), as well as the applications developed by the development teams. Our work isn't complete yet (still working on sync for secrets and RBAC), but it's working nicely so far.


Yeah, these days I try to avoid writing any HCL and instead feed Terraform with JSON generated via jsonnet (which we were already using to generate k8s YAML). Much better templating and language features while still remaining declarative, and it helps on a team to have a single source language for such configs.
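The same approach works from any general-purpose language, since Terraform also reads a JSON form of its configuration from `*.tf.json` files, where string values are templates so `"${...}"` is still a reference. A minimal sketch in TypeScript, swapped in for jsonnet; `bucketsConfig`, the bucket names and the environments are hypothetical.

```typescript
// Build Terraform's JSON configuration syntax from plain data.
// Bucket names and environments are hypothetical.
interface TfJson {
  resource: Record<string, Record<string, Record<string, unknown>>>;
  output: Record<string, { value: string }>;
}

function bucketsConfig(envs: string[]): TfJson {
  const buckets: Record<string, Record<string, unknown>> = {};
  const output: Record<string, { value: string }> = {};
  for (const env of envs) {
    const name = `logs_${env}`; // Terraform resource name
    buckets[name] = {
      bucket: `example-logs-${env}`,
      tags: { environment: env },
    };
    // Escaped so the generated JSON contains a literal "${...}" reference.
    output[`${name}_arn`] = { value: `\${aws_s3_bucket.${name}.arn}` };
  }
  return { resource: { aws_s3_bucket: buckets }, output };
}

// Write this string to e.g. main.tf.json; Terraform loads it alongside .tf files.
const json = JSON.stringify(bucketsConfig(["dev", "prod"]), null, 2);
```

Because everything is ordinary data until the final `JSON.stringify`, loops, helper functions and intermediate variables are just TypeScript.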


> Also the staticness of providers are a serious pain, for example you can't create a kubernetes cluster then add a resource to it.

TF def has some rough edges, but you can certainly create a cluster and add resources in a single root module (I don’t think it’s a great practice).

In this example the EKS cluster is in a module, but it can be a ref to a resource in the same module as well.

  data "aws_eks_cluster_auth" "current" {
    name = module.eks.cluster_id
  }

  provider "kubernetes" {
    load_config_file       = false
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
    token                  = data.aws_eks_cluster_auth.current.token
  }


I never used Terraform, I started with Vagrant, then CloudFormation, CDK, and now Pulumi.

I like Pulumi the most right now.

It integrates with services like Cloudflare and Auth0 and I can use TypeScript to write my code.


I’ve had many similar frustrations about terraform, and the overall lack of visibility into what’s happening drives me mad at times.

A proper REPL, with the ability to actually manage a config, would be a huge step forward. I spend more time trying to figure out which vars get populated and how I can get a value into another resource than anything else. It’s like I’m constantly fighting with the HCL syntax to get what I want to happen.


If you want visibility (spoiler: it's just API calls), try using `TF_LOG=DEBUG terraform <foo>`. You might also want to set `-parallelism=1` or you'll be treated to statements printing in an order you are not expecting.


Yep, the documentation is sometimes lacking, and the concept of moving variables in and out of modules is not intuitive, to say the least.


> The biggest feature I would like to see is the ability to dump a pure representation of your evaluated configuration.

Are you asking for a dump of existing state or desired state? For existing state, see `terraform state pull`. For delta between desired+existing, see `terraform plan -out`. My apologies in advance if I completely misunderstood what you were asking for.


I am asking for a dump of the desired state, so that I can diff the desired state at this commit against the desired state at the previous commit. I don't want to involve production at all.


> for example you can't create a kubernetes cluster then add a resource to it

Of course you can! DM me if you want details.



I can confirm, we're using a similar approach and it mostly works.

There are still issues though, if you try to remove your cluster the k8s provider can't be configured (no module.my_cluster.cluster_id anymore) and the refresh phase of plan will fail. You can find workarounds but those I know are quite manual / ugly.


Amen! I found it excruciating that the language was always a few simple steps away from being homomorphic to JSON. I desperately needed to be able to manipulate it as data structures, not as strings. All of the ways I found to work around its limitations made me wish for something else entirely.


Have you used it since they introduced HCL2? It supports other data structures much better than it used to. Maps, lists, sets, etc. are much easier to work with.


Still a far cry from a proper programming language, which is what we need. For example, if you want to loop over some config and generate a resource for each config, but the resources need different providers (e.g., different AWS accounts), then you just can't do it. Further, if you just want a little function, you have to build a fully fledged module. Then there are the crazy namespaces (`var`, `local`, `resource`, `module`, etc).


Yes you can... assuming your config is a map, include a key for "provider", and set it appropriately. EG in your example for multiple AWS accounts, define providers aliased as `aws.account1`, `aws.account2`, and so on. Reference those provider aliases in your map you are iterating through, and set the provider to that value.


I'm 99% sure that will fail with an error because "provider" can't be set dynamically. See this issue, for example: https://github.com/hashicorp/terraform/issues/24476


You can write Terraform using JSON if you want to: https://www.terraform.io/docs/language/syntax/json.html

Or do you mean something deeper?


I mean being able to reliably convert HCL↔JSON.


You can use JSON for your configuration: https://www.terraform.io/docs/language/syntax/json.html !


It's a language for a reason, there's a grammar, parser, lexer, ast. What's the problem?


The module passing got a lot better in 0.12 when you could pass full modules or resources as outputs and vars.


Have you tried pulumi? What's your opinion?


I have not.


Same (hate it, love it, use it every day). I can't believe you left out the stilted looping syntax.


I didn't figure it was worth starting...

- Lack of functions (the only real functions are modules which are basically unusable for quick computations such as "slugify this string").
- Very primitive loops.
- Lack of temporary variables. I often end up looping over a list multiple times and storing intermediates.
- No panic or log functions.


Add a secrets provider (to avoid having secrets in state files) to that list.


How does terraform compare to ansible?


Ansible focuses on provisioning machines whereas Terraform focuses on creating Cloud infrastructure. A common combo is using Terraform to provision VMs and networking settings then using Ansible to configure those VMs.

I find few if any reasons to use Ansible over a shell script. IMHO Ansible is just a weird YAML syntax to generate a "shell" script with some utilities to ship that script to nodes over the network. I find it super awkward not to mention slow and inconsistent.

For deployments I much prefer using Nix and for imperative actions I just use actual shell/python.


You can totally provision using ansible too, on most cloud vendors.

The reason to use ansible over a shell script is that the ansible playbook will be idempotent. That is to say you can run/rerun the playbook from any point without having to wipe any previous work, or worry about double applying your config changes.


> is that the ansible playbook will be idempotent

This isn't really true. I think you are correct that most of the built-in operations are idempotent but you can also do this with a small library of functions in a shell/python script or whatever you prefer. Most things you want to do on provision are idempotent anyways (install this package, download this file) or are trivial to make so (create this directory).

I would take a real programming language any day for the minor cost of having to handle idempotency myself. It would take a couple of hours to reimplement idempotent primitives to replace the Ansible standard library in just about any language.

In my mind the main value of Ansible is playbooks that others have made for you, but many people avoid these anyways to have full control.
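A rough illustration of the "small library of idempotent primitives" approach. This is a hypothetical sketch: to stay self-contained it acts on an in-memory `FakeHost` rather than a real machine, and the function names are made up, loosely mirroring Ansible's changed/ok reporting.

```typescript
// In-memory stand-in for a machine; a real version would shell out or use
// the OS package manager and filesystem instead.
interface FakeHost {
  dirs: string[];
  packages: string[];
  files: Record<string, string[]>; // path -> lines
}

// Each primitive reports whether it actually changed anything.
type Result = "changed" | "ok";

function ensureDir(host: FakeHost, path: string): Result {
  if (host.dirs.indexOf(path) !== -1) return "ok";
  host.dirs.push(path);
  return "changed";
}

function ensurePackage(host: FakeHost, name: string): Result {
  if (host.packages.indexOf(name) !== -1) return "ok";
  host.packages.push(name);
  return "changed";
}

function ensureLine(host: FakeHost, path: string, line: string): Result {
  const lines = host.files[path] || (host.files[path] = []);
  if (lines.indexOf(line) !== -1) return "ok";
  lines.push(line);
  return "changed";
}

// A "playbook" is just a function built from the primitives.
function provision(host: FakeHost): Result[] {
  return [
    ensurePackage(host, "nginx"),
    ensureDir(host, "/srv/app"),
    ensureLine(host, "/etc/hosts", "10.0.0.5 db.internal"),
  ];
}
```

Running `provision` twice reports `changed` for every step the first time and `ok` every time after, without duplicating anything on the host.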


I think that it's difficult to keep an idempotent shell script or programming language implementation as clean as Ansible over a long period. I deal with a similar thing at work, and the Ansible stuff is still mostly good over the long haul, with the weird bits like calling other scripts being obvious. The Bash script provisioner we have is just a mess. It's not that an individual can't write a better Bash or Python script, but a team of mixed experience, opinions and skillsets coming and going over 7 years definitely cannot. Our Ansible scripts are about half as old, but I don't think the shell script saw significant decline after hitting an inflection point or anything; it just gradually crept away from pure ideals.

I personally find Ansible's value lies in what it makes difficult.


They're not competition. I use Terraform for infra provisioning, and Ansible for post-provisioning application setup. I also use Packer + Ansible playbooks to build my AMIs.


You can create infra with Ansible. The downside to Ansible is the Cloud Provider modules are "community" not core and some of them are buggy.


Yup, that's the best use-case. The more that cloudy / container stuff takes over the less I use Ansible tbf.


A lot of post provisioning tasks I used to do with Ansible are now handled with cloud-init.


Exactly. Unfortunately, at my current place we use it in the pipeline for building the AMI, but it's not optimal.


I like Packer + Ansible for building machine images. I haven't really tried any alternative workflows but that has been great for my needs so far!


What kinds of tasks can Ansible do that Packer isn't also capable of?


Both tools can be used to create cloud resources and configure machines but fundamentally they are very different.

Ansible is a list of actions that you apply linearly. Each action might be a noop if it already exists.

Terraform is a tree of resources that are applied by order of dependency. Terraform also records the previous run and deletes resources that are no longer in the code.

Generally, Ansible is great at performing actions on a lot of hosts. A sort of multi-ssh. And Terraform is best adapted to manage cloud resources.
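The difference can be sketched in a few lines. This is a toy model of the Terraform side (not its real graph engine; the resource names are hypothetical): declared resources are applied in dependency order via a topological sort (Kahn's algorithm), and anything recorded by the previous run but no longer declared is marked for deletion.

```typescript
type Graph = Record<string, string[]>; // resource -> resources it depends on

// Kahn's algorithm: returns an order in which dependencies come first.
function applyOrder(graph: Graph): string[] {
  const remaining: Record<string, number> = {};
  const dependents: Record<string, string[]> = {};
  for (const node of Object.keys(graph)) {
    remaining[node] = graph[node].length;
    for (const dep of graph[node]) {
      if (!(dep in graph)) throw new Error(`${node} depends on unknown ${dep}`);
      (dependents[dep] = dependents[dep] || []).push(node);
    }
  }
  const ready = Object.keys(graph).filter((n) => remaining[n] === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const node = ready.shift() as string;
    order.push(node);
    for (const next of dependents[node] || []) {
      remaining[next] -= 1;
      if (remaining[next] === 0) ready.push(next);
    }
  }
  if (order.length !== Object.keys(graph).length) {
    throw new Error("dependency cycle");
  }
  return order;
}

// Resources recorded by the previous run that the code no longer declares.
function toDelete(previousRun: string[], graph: Graph): string[] {
  return previousRun.filter((r) => !(r in graph));
}
```

An Ansible playbook, by contrast, is already the ordered list, and deleting a task from it just means that task stops running; whatever it created stays in place.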


It’s a symbolic step but hopefully goes a long way to convince various decision makers about long term viability of hashicorp products.

I remember when a colleague didn’t want to use TF from an unknown company in his saas. His company is now gone but terraform is alive and widely used.


I think it's a very meaningful step, as it signals maturity - the platform changed significantly over the last couple of years, and it's (unfortunately but necessarily) a pain to perform some upgrades, or at least, to redesign according to the new features.

For example, we can't make full use of the modules flexibility which I think was added to 0.15 (module.kount anybody? :)), because it's a very painful process. Had we started using TF now, we wouldn't have had this problem. But of course, nobody's at fault here.


I think the biggest problem is that the state file is a JSON blob whose hierarchy directly mirrors the structure of your code in Terraform. This makes refactoring a nightmare, as you're continually having to fudge the state file and/or declare that a resource defined in code relates to a resource defined in state (I forget the exact CLI flag you pass to do this offhand).


Yes, I think this is reflected in our situation. For our TF codebase, specifically, it would be great to deduplicate modules (which is something that couldn't be done some time ago), but there is no simple way of, say, creating a new module and slowly migrating resources into it.

In particular (AFAIK), there are no tools for moving stuff around, so in addition to the TF restructuring, one also needs to write scripts to manually move the resources.


You're looking for 'terraform state mv'. After my first handful of these it's now as natural as refactoring and moving modules in any code-base, almost.

Or if it's a big mess you can 'rm' and 'import'.


I'm aware one can rename the resources via mv. But when multiple self-standing modules with hundreds/thousands of resources in each have to be merged into an array of modules, it's a lot of work.

I'm not even sure that the new resource address can be figured out, and the list of resources can be search/replaced in order to produce a single renaming (mv'ing) script.

Even if this was possible, it would likely require:

- either each module to be moved monolithically, which is risky (e.g. data sources may break, since there's no referential integrity) and requires a fully designed and implemented destination module (carrying two different representations of the resources contained in each module).

- or, and I don't know if this works in the real world, creating a structured but empty destination module, and slowly moving resources from the leaves down to the root. This is a lot of work, and probably requires a very large number of references to be carried across modules.

Big refactorings are difficult in any language/framework, but in TF they are particularly so, because referencing between resources is rigid, so it's hard to move small parts and their references. Doing this in Chef is much simpler, since resource name and address match and are under the control of the developer (but Chef has a different approach, of course).
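For the module-by-module option, generating the moves themselves is mechanical once a naming rule is fixed. A hypothetical sketch, assuming the standalone modules are named `service_a`, `service_b`, and so on, and become instances of one `for_each` module called `services`, keyed by their old names; the input would be the addresses printed by `terraform state list`.

```typescript
// Old: module.service_a.aws_instance.web
// New: module.services["service_a"].aws_instance.web
// The prefix/target names and the keying rule are assumptions.
function newAddress(oldAddr: string, prefix: string, target: string): string | null {
  const m = /^module\.([A-Za-z0-9_-]+)\.(.+)$/.exec(oldAddr);
  if (!m || m[1].indexOf(prefix) !== 0) return null; // not one of ours
  const key = m[1]; // reuse the old module name as the for_each key
  return `module.${target}["${key}"].${m[2]}`;
}

function mvScript(addresses: string[], prefix: string, target: string): string[] {
  const cmds: string[] = [];
  for (const addr of addresses) {
    const to = newAddress(addr, prefix, target);
    if (to === null) continue; // leave unrelated resources alone
    // Single quotes keep the shell from interpreting [ ] and ".
    cmds.push(`terraform state mv '${addr}' '${to}'`);
  }
  return cmds;
}
```

The output is a reviewable shell script; the referential-integrity problems described above (data sources, cross-module references) still have to be handled by hand.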


terraform state mv is working for this but yeah, a migration path in some defined way would certainly be preferred.


module.count was added in TF 0.13 [1], but what you say still makes total sense, and I really welcome a 1.0 release, hoping that there won't be any more disruptive changes and revolutions in the DSL, because yeah, you definitely need to invest a lot of time in refactors/rewrites to keep up with newer Terraform versions and language features.

[1] https://github.com/hashicorp/terraform/blob/v0.13/CHANGELOG....


I see everyone raving about Terraform but I always found it awkward how the DSL works. It might be an improvement over stuff like CloudFormation but feels strange to move the complexity into the language.

Things like the CDK which operate on top of CF feel much more natural and more flexible to me.


Serious question. What value does Terraform provide?

Two years ago I looked into it and, rather than having an abstraction over cloud providers, it seemed to still require targeting (and coding against) each one specifically.

So, I was quite disappointed as I thought the value proposition was to not have to know x cloud provider specific terminologies.

Any insights much appreciated.

Edit: I was a little worried asking such a naive question but the comments are super useful! Thanks everyone for sharing your insights.


> Two years ago I looked into it and rather then having an abstraction from cloud providers

This is a misrepresentation I've seen multiple times and I don't know how it's come to be.

Terraform doesn't abstract resources. It simply supports all cloud providers and lets you intermix resources from different clouds inside a single project. Resources can depend on each other and use each other's attributes.

As an example, you can bring up a load balancer in AWS and create a DNS record for it on Cloudflare in the same Terraform project and maintain them together.


I believe the issue is that Terraform has been labelled “Cloud agnostic”. That was why I believed that Terraform would abstract away the individual cloud providers.

It depends on your interpretation of the word “agnostic”. Personally I would say a more correct description would be: Support for multiple cloud providers.


While resources are fundamentally the same across clouds (i.e. they're all VMs, they all have firewalls etc), they are vastly different concepts and have different feature sets. It's almost impossible to do a like for like api call between two providers.

However, you can develop cloud agnostic modules that you can then consume, which allows for a decent cloud-agnostic experience.


This. Resources are named differently in different providers, and I found that HCL code would vary from one provider to another. This is the reason I have been a huge proponent of using container-based applications that happen to get launched in a specific cloud, rather than using a base OS / function app service.


It makes things repeatable, organized, and most importantly checked into source control.

It is actually usable, unlike CloudFormation, which is a nightmare of unreadable, barely editable YAML files and fifty commands to then upload and apply the files. Let's not even get into debugging or unsticking CloudFormation when it breaks, something that usually requires writing a support ticket.

Additionally, you can build your own modules. I can have a module that is `ServiceFoo`. Pass in a param that causes it to switch between different backends. Yes, I have to write the AWS and GCP parts separately, but then anything that needs `ServiceFoo` can just call the module and have things split across both sides.

You can then also do things like have your DNS in AWS, but have nodes in both GCP and AWS. Use the settings pulled from GCP to input into Route53, etc.


Abstracting over Cloud vendors is not a use case for Terraform itself. The value it brings is that you get to specify your infrastructure 'as code', which means you'll be able to re-create it from code, and reliably deploy changes.

There are a lot more benefits; it depends on what you are comparing it against. Coming from a software development background, I'd compare it to a WordPress app versus web app development with something like Rails. The WordPress app works fine and is faster to write, but once it gets complex things fall apart. The Rails app is maybe a bit more difficult to develop at first, and some features might take longer, but it's more flexible and powerful, and when engineered well, it will not hit a ceiling where it all falls apart.


So... is Terraform the WordPress app or the Rails app? :D


Rails.


I'll chime in. When I first used Terraform, it was described as this tool that would create resources in a cloud agnostic way. That's still possible, but not the main focal point.

Terraform just takes any API (through plugins that Terraform calls providers) and applies the GitOps philosophy to it. That's it. Now you can easily recreate resources with a single command, modify whatever parameters you need, store those changes in Git, etc.

Yes, one long curl command would give you the same results but then you miss out on the concept of dependencies or API versioning or simple programming constructs like reading from a file or looping over values.
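To show what "reading from a file or looping over values" buys you, here is a minimal Python sketch that expands a data file into one resource definition per entry, keyed by name. It is similar in spirit to Terraform's `for_each`. The input data, the private-by-default rule and the output fields are assumptions made up for illustration. `github_repository` is used only as a familiar-looking type name.

```python
import json

# Hypothetical input file contents: one entry per repository.
REPOS_JSON = """
[
  {"name": "api", "private": true},
  {"name": "docs", "private": false}
]
"""


def expand_repos(raw: str) -> dict:
    """Turn a list of entries into resources keyed by name, so each key
    addresses exactly one resource instance."""
    resources = {}
    for entry in json.loads(raw):
        key = entry["name"]
        if key in resources:
            raise ValueError(f"duplicate key: {key}")
        resources[key] = {
            "type": "github_repository",
            "name": key,
            # Assumption for the sketch: repositories are private unless stated.
            "visibility": "private" if entry.get("private", True) else "public",
        }
    return resources
```

Adding a repository then means adding one entry to the data file rather than copying a block of configuration.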


You're right, Terraform is not "write once, deploy infrastructure anywhere" tool. It does however allow you to reuse many portions of your infrastructure descriptions between cloud providers by using module composition[1].

Terraform can also be used to put your infrastructure under version control, which is a pretty big deal.

[1] https://www.terraform.io/docs/language/modules/develop/compo...


> Serious question. What value does Terraform provide?

Having your cloud environment documented and reproducible in a code repository instead of stuff manually clicked together in the AWS Console.


Writing infrastructure as code is quite often an exercise in:

a) define what I want
b) write an API call to find out whether what I want already exists
c) write an API call to create it if it doesn't exist
d) sometimes people do stuff manually, and your code should tolerate working around these manual changes (i.e. update in place when possible, tear down and recreate when not possible)
e) to be efficient, your code should run things in parallel when possible

Terraform allows you to write (a) and outsource the rest: (b) and (c) to a provider, usually maintained by the API provider themselves, and (d) and (e) to Terraform itself.
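Steps (b) through (d) can be sketched as a single "reconcile" function. The sketch is a toy under stated assumptions: `FakeCloud` is a hypothetical in-memory stand-in for a real API, and the set of attributes that can be updated in place is invented. Real providers decide this per resource type.

```python
from dataclasses import dataclass, field

# Hypothetical: attributes that can change in place. Any other difference
# forces destroy-and-recreate.
UPDATABLE = {"tags", "instance_type"}


@dataclass
class FakeCloud:
    """In-memory stand-in for a cloud API (illustration only)."""
    resources: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def read(self, name):
        return self.resources.get(name)

    def create(self, name, attrs):
        self.resources[name] = dict(attrs)
        self.log.append(("create", name))

    def update(self, name, attrs):
        self.resources[name].update(attrs)
        self.log.append(("update", name))

    def delete(self, name):
        del self.resources[name]
        self.log.append(("delete", name))


def reconcile(cloud, name, desired):
    """Make resource `name` match `desired` and report what happened."""
    actual = cloud.read(name)                    # (b) does it exist?
    if actual is None:
        cloud.create(name, desired)              # (c) create if missing
        return "created"
    changed = {k: v for k, v in desired.items() if actual.get(k) != v}
    if not changed:
        return "unchanged"                       # already matches: no-op
    if set(changed) <= UPDATABLE:
        cloud.update(name, changed)              # (d) update in place
        return "updated"
    cloud.delete(name)                           # (d) tear down and recreate
    cloud.create(name, desired)
    return "replaced"
```

Running `reconcile` twice with the same input does nothing the second time, which is the idempotence people mean. A manual edit to an updatable attribute is reverted in place, while a change to anything else triggers a replacement.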


Creating and destroying infra resources with the click of a button. Better visibility into what's provisioned and how it's configured. If you need to maintain many individual pieces of infrastructure, it's nice to have a central, codified way to do that.


> having an abstraction from cloud providers

I kinda expected to see many examples of this in this thread (whether they use Terraform or not). So, to ask explicitly: are there usable abstractions of the type "use-case-achievable-with-3-top-public-clouds"? Even something extremely simple, like a bunch of Linux VMs behind a regional load balancer. I don't mean the lowest common denominator of all the clouds, just a few popular ones, with the obvious intention of reducing vendor lock-in.

These would be probably very basic scenarios, but still the whole multi-cloud hype could have produced something decent-ish by now?


It's better than CloudFormation (or a bunch of home-grown bash scripts), and you can also manage things beyond the cloud host (e.g. Datadog alerts, database users and permissions, etc.).


Just to expand on your second point, because I think it's often missed: there are a ton of 'providers' available that extend management to their products (to varying, sometimes hilariously limited, extents). If you're running products from any of the vendors on this list, you might be able to use Terraform to manage them as well:

https://registry.terraform.io/browse/providers


There are 3 ways to allocate cloud resources:
1. Use the GUI
2. Use the Boto library with Python
3. Use DevOps tools like Terraform

As you go from 1 to 3, the programmability increases. 3 is additionally idempotent. As others mention, 3 has competing products and each has its warts.


> Any insights much appreciated.

Don't do multi-cloud, it's not worth it. And tools claiming to do multi-cloud lie. For AWS, stick to CloudFormation.


Do you mean Terraform specifically, or what codifying infrastructure provides in general?



Congrats to Hashicorp for this milestone!

We've been using terraform for a couple of years now to manage our infra for dev/qa/prod, and aside from minor HCL changes we haven't had any major problems keeping up with the latest versions. Having the ability to rebuild everything (Kubernetes, DNS, MySQL, etc.) automatically has saved us more than once!


I've been using Terraform for years, and I'm really glad to see it reach 1.0. Congrats to the Terraform team at Hashicorp, and thanks to you and your colleagues for consistently making the tooling that makes operations at scale possible and keeping it open source.


After I was introduced to Pulumi, I felt at home ... there is no going back to Terraform.


So, v1.0, but still no dynamic providers, resulting in piles of copypasta, especially when creating Kubernetes clusters and then wanting to do something with them right away using the Kubernetes provider. So sad! Secrets are still stored in the state without encryption when retrieving them from the CLI. Also, even when using their commercial products, there's no way to do phased workspaces, i.e. do something in one workspace, then in another, then continue in the first one, etc. Last, but not least: you can't override sensitive variables when importing using Terraform Cloud or Enterprise!


You people hating on Terraform are spoiled. My company insists on using CloudFormation, which I hate with a passion.


I've used both, including having to use CF to create a particularly gnarly and sprawling environment. I constantly ran into limitations hidden behind cryptic or unrelated error messages. It was infuriating.

Terraform syntax is definitely not sexy, but it's a robust piece of software, and in fact, can be used to learn better Go techniques.

A total aside, but people who claim Golang is easy are full of it. It's an extremely hard language to write well at scale, and Terraform is a good example to study.


It took me a couple of hours to write a TF module years ago; it was that easy.


Been using CF for a few years and haven't had many issues. Sure beats having to manually set up infra. The only thing that bothers me is running into the 200-resource stack limit every 6 months or so.


Your company sounds like they know what they are doing. CloudFormation will take your infrastructure from point A to point B, or roll it back in case of failure. Terraform, not so much.
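A toy Python sketch of the all-or-nothing behavior described here: apply steps in order and, if one fails, undo the completed ones in reverse. This is not how CloudFormation is implemented. The steps are hypothetical (apply, undo) pairs. Terraform instead stops at the error and records in its state whatever it already created.

```python
def apply_with_rollback(steps):
    """Run (apply, undo) pairs in order. If any apply raises, undo the
    steps that already succeeded, newest first, then re-raise the error."""
    done = []
    try:
        for apply, undo in steps:
            apply()
            done.append(undo)
    except Exception:
        for undo in reversed(done):
            undo()
        raise
```

The sketch has no answer for an `undo` that itself fails. That is the hard case where a real rollback can get stuck.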


So much this. If you hate CloudFormation, have a look at CDK, which allows you to programmatically define a stack in a language of your choice, instead of trying to write code in huge, unreadable YAML.

I really hope Terraform will one day reach the same features and maturity as CloudFormation.


To be fair to Terraform, this is hard, especially when you are dealing with multiple cloud providers, since you have to keep state somewhere. Network failures or underlying cloud failures are going to trip TF up every time.

If there is one thing TF needs to learn, it's handling failure. Right now it takes a rosy YOLO approach, leaving you to pick up the pieces when it fails.


> roll it back in case of failure

Until it doesn't. I've lost count of how many times I've seen a stack stuck in an error state because it could not roll back.

CF can't do trivial things like creating a resource in account A that is needed for account B.


lol. Terraform cannot do basic things like rollback the deployment in case of failure. Also, I have yet to see CF losing track of its resources.

Here is a challenge for you: deploy a moderate-to-complex infra with Terraform, and afterwards try to clean up all the resources it created. $50 says Terraform cannot do it and you need some sort of manual or scripted intervention. The future is bright.


CloudFormation is too limited. I imagine most companies use much more than AWS. Off the top of my head, we use Cloudflare, PagerDuty, GitLab, etc all of which have Terraform providers.

What happens when you have to use something outside of AWS? How do you codify those changes?


Merely as the technical answer to your question, not as advocacy: CFN has custom providers [0] and they've started publishing quite a few implementations on GH (but I haven't tried them to know if they're for real): e.g. https://github.com/aws-cloudformation/aws-cloudformation-res...

As far as I know, it is possible to bridge terraform providers into a CFN stack using that mechanism, similar to how Pulumi works

0: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGui...


> My company insists on using CloudFormation, which I hate with a passion

I turned down offers when I saw they use Terraform. I was lucky to find purely CloudFormation based infra development.


I completely agree, I have PTSD from running an ansible playbook to deploy CF from the command line.


Does anyone use Terraform for on-premises clusters? If so, what is your setup, and what hypervisor do you use it with? Are you happy with it? Or would you rather replace it with a set of Ansible roles?


I set up Terraform with libvirt for my local VM host; I think it's much better suited for managing infrastructure components than Ansible is.

Outside personal stuff, I've done a few environments where the (mostly unchanging) infrastructure is set up with Terraform and then configuration and operations (like upgrades) are orchestrated with Ansible, and it works well.

Now, I bet someone might be tempted to claim you should never even need to upgrade VM instances and immutable infrastructure solves everything, but sometimes it's just ridiculously simpler to do in-place upgrades; orchestrating image building, testing and deployment is not easier than running an Ansible playbook to do in-place upgrades unless you already have infrastructure that does it for you.

If the software you're installing is properly written and provided via OS package management, you often just don't gain enough benefit from immutable systems to justify the overhead.


We have an on-site OpenStack cluster and use Terraform in an ad hoc way for managing (some) infrastructure. It's by far the easiest way to do so, as opposed to OpenStack's API and SDK (the latter of which is so poorly documented it beggars belief!). Ansible is usually used in tandem with Terraform, to decouple infrastructure from configuration management.


We use it for provisioning vSphere VMs, with all of the provisioning done through cloud-init. I can't say I like cloud-init, but our on-prem stuff is pretty simple and almost all of the VMs can be rebuilt if a change is needed, so it works well enough. Still, I'd probably use Ansible if I were starting fresh.


Love the tool, hate the fact that they still work with a CLA [0]

If anyone at HashiCorp is reading this, could you guys consider changing to a DCO? [1]

[0] https://github.com/hashicorp/terraform/blob/main/.github/CON...

[1] https://drewdevault.com/2021/04/12/DCO.html


I'd love to see more pass-by-reference in Terraform, because it could simplify the API substantially. Right now you have to figure out whether you need to pass an ID, a name, or an ARN, which forces a lot of doc reading and tight coupling.

If one could instantiate a resource by passing other connected resources into it by reference, then provider APIs could pull the info they need and could be refactored without affecting devs.
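A minimal Python sketch of the proposal, contrasting today's style with pass-by-reference. Today the caller must know which identifier an argument expects. Under the proposal, the caller passes the whole resource and the provider picks the identifier. `Role`, both functions and all identifier values are hypothetical and are not part of any Terraform provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical resource object exposing all of its identifiers."""
    id: str
    name: str
    arn: str


def attach_policy_by_arn(role_arn: str, policy: str) -> dict:
    # Today's style: the caller must know this API wants the ARN.
    return {"role": role_arn, "policy": policy}


def attach_policy(role: Role, policy: str) -> dict:
    # Pass-by-reference style: the provider chooses the identifier it needs,
    # so switching from ARN to name later would not change any callers.
    return {"role": role.arn, "policy": policy}
```

Both produce the same request. The difference is where the knowledge of "ID, name, or ARN" lives.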


Just be careful with your state file when upgrading!


I've gotten into the habit of manually creating a backup of the state file:

# to view the state file

$ aws s3 cp --quiet s3://terraform/production/terraform.tfstate /dev/stdout

# to backup the state file

$ aws s3 cp --quiet s3://terraform/production/terraform.tfstate terraform.tfstate.bak


S3 automatic object versioning is useful for that.


I do like how most people here sort of forgot that a decent number of people in ops (especially in bigger enterprise companies) are not programmers and at most have shell scripting experience. Telling them to "use a real programming language" is going to create some "fun" issues (at least HCL is consistent).


I enjoy Terraform, I just wish there was a more graceful way of setting up a new module to use backend state from the get-go. Having to create the resources with local state first, then re-run terraform init after adding the backend configuration block, just gets really annoying. Small complaint in the grand scheme of things, though.


I believe in the Terraform way of doing things. A lot of other people dislike the Terraform language, and then they use Pulumi because it allows them to write infrastructure in Python or JavaScript.

But I specifically DO NOT WANT my company's infrastructure to be written in a Turing-complete dynamically-typed language. I believe the Terraform language is safer.

However, one thing I don't like about Terraform is that it provides a lot of low-level APIs for cloud providers that can take a lot of glue code to string together into a real PaaS.

I solved this at my company by writing https://provose.com -- a Terraform module that provides a high-level API for configuring containers, buckets, databases, and distributed filesystems. Provose understands what you want to deploy and automatically figures out the needed VPC settings, security groups, IAM roles/policies/etc, Route 53 records, ACM certificates, and load balancer settings.

Apologies for the self promotion, but if you decide to try Provose, I'd be happy to help you through any issues you face :)


Wow, really impressed by all the work, website, and write up being just one contributor!


One afternoon with Terraform and you'll want to give up cloud and get your own datacenter again.


I have a highly fuzzy result in my sarcasm detection here.

It's because I can understand this to mean either that:

1. Terraform makes managing cloud services so difficult that you'll give it all up and run for the hills of bare metal once again.

- or -

2. It's so good you'll want to swear off the cloud providers and switch to running your own infra using Terraform instead of the cloud services' own tools.

Given other comments I can easily see this going both ways.


Yeah, sorry, sarcasm doesn't come across well in text. I did mean 1.


Funnily enough, lots of organizations use Terraform for their own data centers!


Please give me a vim plugin that allows me to easily move through terraform files as opposed to being forced to grep for relationships.

As it stands I hate dealing with terraform because it is horrifyingly undiscoverable.


This has less to do with Vim than with Terraform's poor LSP support.

Vim/Neovim has had LSP support for some time with plugins and now built in natively.

I use coc.nvim, as it's the fastest to get started with and I rarely have to do much config, since it's modelled on VSCode and comes with sane bindings for go-to-definition, refactoring and more: https://github.com/neoclide/coc.nvim. Alternatives are https://github.com/prabirshrestha/vim-lsp and https://github.com/neovim/nvim-lspconfig.

You can try to use either of these implementations: https://github.com/juliosueiras/terraform-lsp https://github.com/hashicorp/terraform-ls

These are both fairly limited, but as far as I can tell, every editor except IntelliJ uses one of them under the hood.

I've used them at companies with 300+ terraform repos and have never had much of an issue navigating/understanding TF through Vim.


By no means am I placing the blame at vim's feet. I know it is a terraform problem.

Do those projects allow me to follow a user defined variable to its definition?


I have a love & hate relationship with terraform.

It's so frustrating when you can't create a behavior.

But the joy of orchestrating infrastructure at scale with code is overwhelming <3


Our customers increasingly use Terraform modules to deploy our product. Thanks Hashicorp.


IMHO something like Crossplane is the future for infra management.


Maybe they'll finally adopt meaningful semver now


Interestingly, I heard that Terraform's registry is hosted via Fastly. I was wondering if there was any correlation with the timing of this release and Fastly's outage earlier.


Can someone explain in a few words what this is and who may be interested in this?

The name does not give any hints, and the description tells me nothing either:

"Terraform enables you to safely and predictably create, change, and improve infrastructure. It is an open source tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned."

Is this a tool for property developers?


This is a tool to manage your infrastructure (AWS/GCP/Azure/etc) as code.

You write code, apply it, cloud providers spin up resources you declared, you commit your code (infrastructure) to git.

Now your infrastructure is version controlled and can be "easily" built from the ground up in minutes, instead of someone doing 3124 things manually in the UI.


You could have read more of the site or even looked it up on Wikipedia in the time it took for you to write your comment.


Seems pretty clear to me based on the description. What specifically are you unsure about?


They do not mention "cloud" or "data center" once in their description. How is someone, who is not dealing with this kind of infrastructure supposed to know what they are talking about?



