As a guy who's done a lot of programming and a lot of technical writing, it's clear that this is the result of a TON of work. It is a model of clarity, well-formatted, and explained with just the right level of detail. It is completely pro quality and OP should be super proud of this body of work.
This isn't just warmed-over Amazon docs. It's just what you need when you can't figure out what the docs are saying and you want to get something done now.
I do get the reasoning, but it's still funny that we're using an infrastructure management tool to manage the thing that was supposed to relieve you from the burden of infrastructure management.
It seems there is some confusion between infrastructure creation and infrastructure management. Serverless doesn't absolve you from defining what infrastructure you want to create; it just reduces the management overhead once you have created it.
For example, you can both create a lambda and EC2 infrastructure using Terraform. However, once you have created a lambda function, little maintenance is required going forward, unlike EC2. EC2 instances require continuous patching of the operating system, managing disk space (/tmp filling up over time), and you have to plan for an EC2 instance to be taken out of service if it starts failing health checks.
Was that one of the goals though? How I see it, lambda does a pretty good job of abstracting the compute layer and orchestration. Most things beyond that will introduce complexity since you’ll be interfacing with other things with different goals of their own.
Comparing SAM and Terraform is like comparing apples and oranges. Comparing Terraform to CloudFormation is more appropriate, and Terraform wins in terms of verbosity there every time.
SAM generates CloudFormation templates/stacks, which create your AWS resources.
Serverless generates CloudFormation templates/stacks, which create your AWS resources.
Related: I wouldn't recommend deploying raw lambdas without a framework. The development workflow really sucks without one. Use SAM or Serverless.
Is SAM itself the end game or does it create some other code (like Cloudformation)?
The reason I'd be reluctant to use it without thoroughly vetting it is that I don't like using any 3rd party monster suite of modules. I've been bitten by that a few times. You're at the mercy of the 3rd party to continue updating and supporting the modules. On top of that, it may not support a particular use case you want to implement. And it's almost worse if it does, because that likely means that it's no longer a simple wrapper around raw terraform resources, but a huge complex beast.
In this case, I don't think the author necessarily expects the user to directly use his code. It is a set of examples. The modules themselves are very simple (really just one or two raw resources with vars and outputs).
Problem with SAM is having to deal with one more tool.
But also, maybe even more important, it makes things less transparent. I want to be able to refer to the original CloudFormation docs (even when using Terraform) without "translating" every time.
I agree that using SAM can be convenient for standalone lambda functions. However, in the context of a larger AWS configuration, where your integration points are already defined in terraform, the cost of using SAM for these is breaking the invariant that "all our AWS resources are defined in _TOOL_" (in this case, _TOOL_ is terraform).
The verbosity is frustrating, but largely ignorable. Having to go back to string references of AWS references defined in different sources of truth is a maintenance nightmare, IMO
I’m building data pipelines in AWS (s3/sqs/dynamo/api gw/lambda/batch) + Snowflake.
Earlier this year I tried to use Terraform for everything, using principle “everything is a resource” (everything in my case is AWS, Datadog and Snowflake), so adopted “terraform apply” as universal deployment interface. Like if we need a ECR task and a Docker image, build the image from within Terraform (using null_resource which runs “docker build”). This approach works for everything but Lambda as Terraform requires a pointer to source code bundle at the plan stage. After unsuccessful fights I gave up for Lambda, so I build bundles prior to “terraform apply” (using “make build”, where build target does its magic of zipping either Go binary or Babashka Clojure sources).
That approach scales well for already two dozens of Lambdas and counting. Ping me if you want more details.
——
I disagree with this tutorial about tendency to use Terraform modules per AWS service, hiding well-documented AWS resources behind the facade of module with custom parameters with long names.
My rule of thumb is: only use a module if you want to enforce a very opinionated way to create resources across multiple projects.
Most of the times you don't want that.
Regarding lambda, I also run a "make build" before "terraform apply". After all terraform is not a build tool.
But I do make the zip with data.archive_file so that I can pass the output_base64sha256 straight to the lambda resource.
It really depends on your cloud; I recently worked with an openstack setup where I needed to create some 6 resources per compute instance. With ~200 instances that's a whole lot of resource blocks to manage, so hiding all that in a module and passing in a dict of machines to create was imo the right choice.
I use a domain driven design (ddd) approach with terraform where different business domains and subdomains are isolated by modules. For example, if an application handles photo uploads and image processing, that would be in its own module.
I tried it, doesn't work - the Lambda resource needs a hash sum to compare against the previous deployment to trigger updates of the source bundle, and Terraform needs the file to be present during the plan stage. With null_resource the file is created after plan, during apply. To work around this I tried to supply something else as the bundle hash sum (like a hash sum of the source files, not the bundle), but the value that TF keeps in the state is the one returned from the AWS Lambda API, not the one you supply; so it causes a resource update on every apply, which is not what I wanted.
Your comment made me think of trying to skip bothering with Lambda's hash sums and use custom refresh triggers instead, initiated by null_resource. Will do after holidays.
It can work. I wrote a script in Python to build stable zip files. The main thing is to ensure any timestamps are reset to zero and the files are added in a fixed order. It also filters out any unwanted files and directories to minimise the size of the bundle. I have a wrapper around it that generates the hash and uses pip3 (as I use Python for my lambdas) to download the dependencies and build the directory hierarchy for the lambda/layer, runs the stable zip script, and returns the path of the bundle and the hash.
This didn't take long to write, and reduced the amount of churn we had with our deploys. We had massive problems with one particular set of lambdas due to the sheer amount of code (mostly unavoidable dependencies, but shared, so they could go in a layer), and our deployment times plummeted to practically nothing after I knocked this together.
I'm not sure I can share the code as it's something I wrote for work, but it ought to be simple to recreate from the description above.
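For anyone wanting to recreate it, here is a minimal sketch of the stable-zip idea described above. It is not the commenter's actual script: the function names, the exclusion lists, and the choice of a base64-encoded SHA-256 (the same encoding as `output_base64sha256` mentioned earlier in the thread) are all assumptions made for illustration, and the dependency download step with pip3 is left out.

```python
import base64
import hashlib
import io
import os
import zipfile

# Hypothetical filters; adjust to whatever should stay out of the bundle.
EXCLUDED_DIRS = {"__pycache__", ".git", "tests"}
EXCLUDED_SUFFIXES = (".pyc", ".pyo")
# 1980-01-01 is the earliest timestamp the zip format can store.
FIXED_TIMESTAMP = (1980, 1, 1, 0, 0, 0)


def build_stable_zip(source_dir: str) -> bytes:
    """Zip source_dir so that identical contents give identical bytes."""
    entries = []
    for root, dirs, files in os.walk(source_dir):
        # Prune unwanted directories in place so os.walk skips them.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        for name in files:
            if name.endswith(EXCLUDED_SUFFIXES):
                continue
            full_path = os.path.join(root, name)
            arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
            entries.append((arcname, full_path))

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Fixed order: sort by archive name rather than filesystem order.
        for arcname, full_path in sorted(entries):
            info = zipfile.ZipInfo(arcname, date_time=FIXED_TIMESTAMP)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # normalise file permissions
            with open(full_path, "rb") as handle:
                archive.writestr(info, handle.read())
    return buffer.getvalue()


def bundle_hash(bundle: bytes) -> str:
    """Base64-encoded SHA-256 digest of the bundle bytes."""
    return base64.b64encode(hashlib.sha256(bundle).digest()).decode("ascii")
```

Because the timestamps, ordering, and permissions are pinned, rebuilding an unchanged source tree should give the same hash, which is what keeps the deploy from churning.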
Yes. If we use a null_resource that has the hashes of the source code files as a trigger, then in the `local-exec` provisioner of the null_resource, we can run the build. The build can also be run remotely (we use google cloud build) to be independent of the developer's machine architecture and operating system, which is important for native dependencies. Since Terraform will not re-run the null_resource provisioner so long as the source code does not change, there is no need for a reproducible build.
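To illustrate the "hashes of the source code files as a trigger" part, here is a rough sketch of a hash that depends only on file paths and contents, not on timestamps. The function name `source_tree_hash` is hypothetical, and how the value reaches the `triggers` map of the null_resource (for example through a variable) is left open.

```python
import hashlib
from pathlib import Path


def source_tree_hash(source_dir: str) -> str:
    """Hex SHA-256 over the relative paths and contents of every file."""
    digest = hashlib.sha256()
    root = Path(source_dir)
    # Sort so the result does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between path and contents
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()
```

The value stays the same until a file is added, removed, renamed, or edited, so a trigger built on it would only fire when the sources really changed.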
For various reasons (mainly auditing purposes, but it also reduces any incidental infrastructure churn, and makes it easier to guarantee a rollback happened as expected), we need to ensure reproducibility, so it's a bit more important for us that we guarantee the artifacts produced are exactly what we expect.
Instead of using source_code_hash you can push your code to s3 with the hash as the filename and update the lambda to point at the new file
Terraform can manage uploading objects to s3
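A tiny sketch of the hash-as-filename idea, assuming the bundle is already built and available as bytes. The function name and the `lambda-bundles` prefix are made up for illustration; the resulting key is what you would upload under and then point the function's S3 key at.

```python
import hashlib


def bundle_s3_key(bundle: bytes, prefix: str = "lambda-bundles") -> str:
    """Content-addressed S3 key: the same bundle always maps to the same key."""
    digest = hashlib.sha256(bundle).hexdigest()
    return f"{prefix}/{digest}.zip"
```

Since the key changes only when the bundle contents change, the function configuration only changes when there is genuinely new code to deploy.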
It also seems a bit strange to have Terraform do the packaging. We do that in CI for most of our lambdas to ensure the test suite runs, linting, etc then it creates a zip at the end and pushes to S3
The only ones Terraform deploys directly are fairly trivial Python API "glue" lambdas
Alternatively you could use the "archive_file" [0] resource provided by terraform. I use this resource to zip up my lambda source files and then use the hash of the zip file to determine if my application should be redeployed.
This works fine only as long as you do not need a step to build or download dependencies, like `npm install` or `pip install`, as part of your run of terraform apply. Otherwise, a more complex solution is necessary, like talideon's above.
One of the striking things about serverless development that is less obvious from the outset is how it blurs the lines between application and infrastructure.
Deployment of a service is rarely in practice just deployment of new code to an already provisioned lambda - because that lambda can do nothing in isolation. Instead, it tends to be the lambda alongside an SQS queue and a trigger, and an S3 bucket; or an API Gateway that links an authorizer to the Lambda's code. Because of that, evolution and development of the application tends to require evolution and development of those surrounding infrastructure pieces in tandem.
As a result, managing the infrastructure of your serverless service is often most naturally done alongside the application code itself - indeed, the distinction becomes somewhat meaningless. That also means the engineers developing the service require the ability to own and operate the infrastructure as well. That may or may not be well served by Terraform. It's a tool I absolutely love for mutable, stateful infrastructure, but something like the Serverless Framework or AWS SAM can be a much lower-friction and more natural fit for serverless work.
Yes, that’s the way I’m seeing it as well. SLS is a much better fit when developing services with lots of integrations with other services IMO, with much less code than Terraform.
Alex, OpenFaaS founder here. The author has done a huge amount of work here, I am surprised that it's being given away for free, and not being monetized (it should be).
I often hear folks complain that Kubernetes is complex, and hard to understand. We've done a lot of work to make the experience of deploying functions simple on K8s, with very little to manage. But it still costs you a K8s cluster - most of our users don't mind that, because they have one anyway.
But, for everyone else we created a new version of openfaas called "faasd" which can run very well somewhere like DO or Hetzner for 3 EUR / mo. It doesn't include clustering, but is likely to be suitable for a lot of folks who don't want to get deep into IAM territory.
https://github.com/openfaas/faasd
And there's a guide on setting up faasd with TLS on DigitalOcean, it took about 30 seconds to launch, and makes building functions much simpler than Lambda. "faas-cli up"
Thanks Alex. Those are very kind words.
I would not have made a career in IT if not for opensource. So I'm contributing my droplet to that ocean.
Lambda provides a particular challenge for Terraform. You don't normally see Terraform used as a deployment tool for containerized services, even though it could theoretically do that. But because it's the only thing close to the lambdas unless you want to introduce another third party tool, deployment ends up falling to it as well, unless you decide to choose another tool for the lambdas, like serverless or CloudFormation, and then you've got a bad build tool or a bad deployment tool for anything but the most trivial lambda builds.
And I will continue to be sad that all of the higher order first party tooling is ultimately going to be based on CloudFormation (looking at you, https://aws.amazon.com/proton/).
Ultimately, after having used Terraform to manage function code bundling/deployment and skipping Terraform completely for the lambdas, I think Terraform does best when it manages the infrastructure lifecycle for lambdas and nothing else. You can then rely on more competent tooling for deployment.
Have you taken a look at CDK and CDK-TF? It lets you programmatically generate terraform templates (or cloudformation with plain CDK) from other languages.
I have been using CDK at work and it is fantastic. It feels like React for Infrastructure. I used to write raw Cloudformation at my last internship, and being able to write Typescript instead feels like moving from assembly language to C.
I've been using CDK too at work (work at Amazon) and I feel like it's the future of infrastructure provisioning. I can tell you that it's a first class product with tons of internal use so it will only get better.
Personally I wouldn't deploy the lambda code with terraform. They inherently have different life cycles. In an ideal scenario you deploy some dummy code with terraform (just a hello world), and as a separate pipeline you deploy the actual code. Ideally, if your ci/cd supports it, you have two separate pipelines, each of which only does its thing if the relevant files have been edited, with the code pipeline depending on the terraform one.
This is how I do it. I have a file called “dummy.zip” with an empty file in it that lives in the terraform repo. That zip goes into S3 on initial apply, then CI pushes the built zip to S3 and calls the lambda update command via the CLI.
I’ve not yet had to juggle changes to the live lambda but I put in the bits to make it stamp out an alias so I can start using specific aliases to ensure new versions don’t automatically get picked up by downstream invocations. All of that has yet to be put in to practice for real.
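A rough sketch of the two pieces of that workflow: creating a placeholder zip containing a single empty file, and assembling the CLI call that CI would run afterwards. The function names and the `placeholder.txt` entry name are hypothetical; the command is only built here, not executed.

```python
import io
import zipfile
from typing import List


def make_dummy_zip() -> bytes:
    """Placeholder bundle with one empty file, for the initial terraform apply."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("placeholder.txt", b"")
    return buffer.getvalue()


def update_function_code_command(function_name: str, bucket: str, key: str) -> List[str]:
    """Argument list for the AWS CLI call that points the function at a new zip in S3."""
    return [
        "aws", "lambda", "update-function-code",
        "--function-name", function_name,
        "--s3-bucket", bucket,
        "--s3-key", key,
    ]
```

In CI the returned list could be handed to `subprocess.run(..., check=True)` after the real bundle has been uploaded.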
The problem is that CD tooling such as CodePipeline/CodeDeploy wants to use CloudFormation and SAM. I don't want to use either; they are both terrible. So in the end I end up making a pipeline to build the Lambda and deploy it in CodeBuild. It would be nice if Amazon scrapped SAM and did something better.
Not OP, but I assume they mean your lambda-specific components and configurations such as IAM roles, accounts, the function itself and its parameters are considerably long-lived in comparison to the actual code executed on top. You wouldn't be modifying your lambda-specific setup every day. That, however, doesn't apply to the actual code, which might be modified several times a week or a day to incorporate MRs coming from different people.
Yeah, it's that smart, but it's risk mitigation, because people make mistakes. You don't really want the thing that you casually run many times a day to have the ability to accidentally delete your database. Security as well: our code deploy pipeline has fewer privileges than our terraform pipeline.
The risk of accidentally deleting the database is real. However, it can be mitigated without introducing another tool by using a second terraform root module (with a corresponding second statefile). So you would have one terraform root module for foundational or stateful things like databases which rarely change and should never be accidentally deleted, and a second terraform root module that holds only the lambda. The former root module is applied only manually, the latter can run automated in a pipeline.
The problem is that Terraform is stateful. Terraform will revert the Lambda code back to the state defined in Terraform when Terraform is applied after the code was updated outside of Terraform.
There is a way to mitigate this by making Terraform ignore changes to the actual code of the Lambda.
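One way this could look, sketched as a Python function that emits Terraform's JSON configuration syntax (the thread mentions generating Terraform JSON from Python further down). The function name and all argument values are hypothetical, and which attributes to list under `ignore_changes` is an assumption: `filename` and `source_code_hash` fit a local-zip setup, while an S3-based setup would more likely ignore `s3_key`.

```python
import json


def lambda_resource_ignoring_code(name, function_name, role_arn, handler, runtime, placeholder_zip):
    """Terraform JSON for a Lambda whose code attributes are ignored after creation."""
    return {
        "resource": {
            "aws_lambda_function": {
                name: {
                    "function_name": function_name,
                    "role": role_arn,
                    "handler": handler,
                    "runtime": runtime,
                    "filename": placeholder_zip,
                    # Assumption: these are the attributes an external deploy
                    # pipeline changes, so Terraform should not revert them.
                    "lifecycle": {
                        "ignore_changes": ["filename", "source_code_hash"]
                    },
                }
            }
        }
    }


def render(config: dict) -> str:
    """Serialise the config, e.g. for writing to a file ending in .tf.json."""
    return json.dumps(config, indent=2, sort_keys=True)
```

With something like this in place, Terraform still owns the function's role, runtime, and other settings, while code pushed by a separate pipeline is left alone on the next apply.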
This is fantastic, I had tried to make terraform and lambda work together before and ended up abandoning that path and leaning on the serverless framework for that part of the project, but I was never happy about it being split out.
I look forward to trying this out the next time I want to prototype anything with some lightweight lambdas behind it.
You can also split the difference between them. I think serverless does a good job of building artifacts and a terrible job of deploying them, and I think the opposite is true of terraform, on balance. There's no reason not to use both.
For my team, I decided we would go all in on Terraform for AWS resources. Lambda has turned out to be a particularly tricky one to fit into that mold. It took us some time to sort out where the “build” step lived in our deployment pipeline so that terraform config pointed at the right build artifact.
I've been part of 2 teams that used aws lambda for their APIs. One team decided that they wanted to use chalice and build custom bash scripts to deploy parts of the system. (Chalice would perform the deployment of apigateway and get the required IAM policies built out.) For all other resources they would use terraform, but they broke it down on a per-resource basis: rather than accepting changes to all the infra in 1 go, there would be a script that runs each resource type independently (buckets, tables, service accounts, elasticsearch, ...).
The current team does their deployments in 2 stages: first they use chalice to generate the correct terraform files for the 2 lambdas that are deployed (one the indexer, the other the service api), then they generate all the other terraform json files using python. After this is complete the infra is deployed in 1 step.
The second team has way fewer errors on deployment, and managing the infra is way easier, especially if you need to nuke it all and rebuild an environment.
I would break out the terraform backend on a per-environment basis rather than clumping them into folder/bucket, that can be dangerous if the bucket gets emptied.
We’ve been using SAM with TF and I’m really not a fan of it and would prefer to move everything to TF early next year with the core infrastructure managed in the terragrunt way of folders of resource types (vpcs, dbs, etc) and having our code repos have a simple terraform file that instantiates a module and references the core remote states as needed because so far (knock on wood) we rarely have upstream changes to the lambdas configuration / parent environments themselves. Then we can dump SAM and everything can run in one spot in terraform cloud.
We don’t work in Python so I’ve not used Chalice but I’ll see if it can inspire our Go tooling.
Do you reckon the pain points you hit were due to SAM specifically? Or is it more that the way SAM does things doesn't integrate into TF? Thanks. Just curious as I've used SAM to reduce CloudFormation boilerplate and we've considered terraform or pulumi, so it would be nice to hear of dragons witnessed first hand :)
We use pyinvoke to orchestrate terraform invocations and feed the outputs to chalice configuration files. We then intercept the chalice deployed resources and feed them back into dependent terraform modules as inputs. It's been incredibly smooth for us and we're able to get deterministic environments that are easy to debug, modify and deploy. It takes a little imperative glue to manage dependencies, but the bulk of the configuration is declaratively defined and I've been incredibly happy with this workflow.
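A hedged sketch of the "feed the outputs to chalice configuration files" step, assuming the outputs come from `terraform output -json` and are passed to the app as environment variables. The function name, the upper-casing of output names, and the decision to skip sensitive outputs are illustrative choices, not a description of the commenter's setup.

```python
import json


def chalice_config_from_outputs(terraform_outputs_json: str, app_name: str, stage: str) -> dict:
    """Build a Chalice config dict from the JSON printed by `terraform output -json`."""
    outputs = json.loads(terraform_outputs_json)
    env = {
        # Each output is an object with "value", "type" and "sensitive" keys.
        name.upper(): str(entry["value"])
        for name, entry in outputs.items()
        if not entry.get("sensitive", False)  # assumption: keep secrets out of the config file
    }
    return {
        "version": "2.0",
        "app_name": app_name,
        "stages": {stage: {"environment_variables": env}},
    }
```

The resulting dict could be written with `json.dump` to `.chalice/config.json` before running the Chalice step.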
I’d previously done my terraform work with invoke and jinja2 stamping out HCL templates - never tried going straight to JSON but that makes sense. Thanks for sharing — I’ll have to try Chalice just to see how it works.
This is incredibly well written and comprehensive. It's a gentle and friendly introduction to both Lambda and Terraform. What a great job.
I'll just point out that if you're using Python, Chalice is excellent and is able to emit Terraform code for all of its resources (https://aws.github.io/chalice/topics/tf.html).
I'm wondering if people would find it useful to see the cost of using opensource projects that spin-up resources in their AWS accounts before they run `terraform apply`? Or maybe a repo shield/badge in the readme? (the idea came from https://github.com/infracost/infracost/issues/43)
I'm not sure how it could work for usage-based resources like Lambda/S3, maybe just assuming minimum usage for each resource is good enough to provide a rough monthly estimate? e.g. 1M Lambda requests, 1 GB storage and 1K S3 requests, then let users customize those numbers if they care to find out more?
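The arithmetic for such a rough estimate could look like the sketch below. The default usage figures are the ones floated above (1M Lambda requests, 1 GB storage, 1K S3 requests); the unit prices are deliberately left as inputs because real prices vary by region and over time, and the class and function names are hypothetical. It only covers these three line items and ignores things like Lambda duration charges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnitPrices:
    per_million_lambda_requests: float
    per_gb_month_storage: float
    per_thousand_s3_requests: float


@dataclass
class Usage:
    # Assumed minimum monthly usage; users could override these.
    lambda_requests: int = 1_000_000
    storage_gb: float = 1.0
    s3_requests: int = 1_000


def estimate_monthly_cost(prices: UnitPrices, usage: Optional[Usage] = None) -> float:
    """Rough monthly estimate: sum of usage times unit price for each line item."""
    usage = usage or Usage()
    return (
        usage.lambda_requests / 1_000_000 * prices.per_million_lambda_requests
        + usage.storage_gb * prices.per_gb_month_storage
        + usage.s3_requests / 1_000 * prices.per_thousand_s3_requests
    )
```

Letting users pass their own `Usage` matches the "customize those numbers if they care to find out more" idea.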
I personally have nothing against Terraform, and we’re using it for a lot of infrastructure-heavy things on our platform, but I think there are way better frameworks for deploying Lambda functions.
Particularly the Serverless framework will save you A LOT of boilerplate IAC regarding all the event-driven integrations Lambda currently has to offer, and manages the full build and packaging cycle with the relevant language specific plugins, e.g. for Node or Python.
I've always chosen to go the hard-way (use terraform) for lambda because all my other infra is in terraform, do any of those other frameworks you mentioned interoperate neatly with TF? (I mean in terms of being aware of one another, being able to use similar/same resource tags, pick-up env vars for lambda exposed by TF (e.g rds credentials from the KMS), etc?)
There’s CDK-TF that was already mentioned in another post. Apart from that I’m not sure what you mean with interoperability, because the resources can only be owned by one stack.
For sure stacks created with those frameworks can use resources that are created by Terraform in other stacks. Serverless for example has its own integration with the Parameter Store for encrypted variables, so that there’s no need to use environment variables in that use case. Terraform would then create the Parameter Store keys which SLS could reference.
CDK is amazing, we use it in typescript and it's just great, so much easier than raw cloudformation or SAM.
I can't compare to TF as I have no experience with it, but I would assume CDK is better just because you can use a well established language (instead of some custom lang that needs to be learned)
IMO, I don't think terraform is the right tool for containerized services. I had experimented with terraform and ansible for deployments earlier, but I could see simpler deployments using serverless or apex.