We are making these open source runtimes available soon:
C++[1]
Rust[2]
We are also working with our partners to provide more open source runtimes:
Erlang (Alert Logic)
Elixir (Alert Logic)
Cobol (Blu Age)
N|Solid (NodeSource)
PHP (Stackery)
This is great, but I'm ever so slightly disappointed that they called it the Lambda Runtime API instead of Common Gateway Interface. [1] It's not quite the same, but it definitely gives me an everything-old-is-new-again feeling.
Seeing this takes away one of my biggest comparative advantages, but I'm happy to see that at least I was not wrong in thinking this feature would be important for developers.
CGI still has the advantage of being an open standard, locally testable, and not vendor specific.
Instead of enqueuing a job you’d just write to an SQS queue and have a lambda as your “job handler” using the queue as an event source.
Boom, no need for Active Job.
Welcome to Serverless.
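For anyone picturing it, the “job handler” is just a function keyed off the SQS event. A minimal sketch in Python (the handler shape is the same in any runtime; the job payload here is made up):

    import json

    def handler(event, context):
        # With SQS configured as the event source, Lambda hands you a batch
        # of up to 10 messages in event["Records"].
        for record in event["Records"]:
            job = json.loads(record["body"])  # whatever the producer wrote to the queue
            process_job(job)

    def process_job(job):
        # Hypothetical payload, e.g. {"type": "send_welcome_email", "user_id": 42}
        print("processing", job)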
There’s going to be a lot of people trying to do things the “rails way” on serverless and it’s going to be a little confusing for a while. Many will write it off, some will create innovative new gems. And of course the Ruby community will come up with the best abstraction for SAM templates and deployment, because we like nice tools.
No need to use SQS here, assuming you don't need strict ordering behavior, since SNS can use Lambda functions as notification targets. The nice thing about SNS is that it can automatically fan out to multiple Lambda functions in parallel.
And then what if your Lambda function fails because a service dependency is down? The answer is, it retries twice and then you can have it go to a deadletter queue. If you want to have any configuration control over your retry or deadletter behavior, SQS is really handy.
It's also a useful pattern, instead of configuring the SQS event itself as your Lambda trigger, to have a scheduled trigger and then perform health checks on your service dependencies before pulling messages from the queue. This technique saves you wasted retries and gives you more fine-grained control over deleting messages (the SQS mechanism for saying, "this is done and you don't need to retry it after the timeout"). The SQS trigger, in contrast, deletes the entire batch of messages if the Lambda exits cleanly and does not delete any of them if it doesn't, which is a bit messy.
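Roughly, with boto3 (the queue URL and the health check are placeholders):

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # placeholder

    def handler(event, context):
        # Invoked on a schedule (CloudWatch Events) rather than by the SQS event source.
        if not dependencies_healthy():  # hypothetical health check on downstream services
            return  # skip this run; messages stay on the queue, no wasted retries

        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=5)
        for msg in resp.get("Messages", []):
            process(msg["Body"])
            # Delete each message individually, only once it has actually been handled.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

    def dependencies_healthy():
        return True  # stub

    def process(body):
        print(body)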
Also, you can have an SQS queue subscribe to multiple SNS topics, which means you can set up separate SNS topics for backfill or other use cases. This is especially useful in cross-account use cases, where often your online events are coming from an SNS from a different account, but you want to be able to inject your own testing and backfill events.
I usually don’t let the lambda SQS back end control message deletion for just that reason. I delete individual messages as I process them and throw an exception if any fail.
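In handler terms that looks roughly like this - the receipt handles come straight off the event records, and the queue URL is a placeholder:

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # placeholder

    def handler(event, context):
        failures = 0
        for record in event["Records"]:
            try:
                process(record["body"])
                # Delete this message explicitly so it can never be retried,
                # regardless of what happens to the rest of the batch.
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=record["receiptHandle"])
            except Exception:
                failures += 1
        if failures:
            # Failing the invocation leaves only the undeleted (unprocessed)
            # messages to become visible again after the timeout.
            raise RuntimeError("%d message(s) failed to process" % failures)

    def process(body):
        print(body)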
If a specific message fails to process multiple times, most often it’s because that particular message is malformed or invalid, or exposes an unhandled edge case in the processing logic, and will never successfully process, at which point continued attempts are both fruitless and wasteful.
It’s better to retry a finite number of times and stash any messages that fail to process offline for further investigation. If there is an issue that can be resolved, it’s possible to feed deadletter messages back in for processing once the issue is addressed.
Feeding your deadletter events back into your normal processing flow not only hides these issues but forces you to pay real money to fail to process them in an infinite loop that is hard if not impossible to diagnose and detect.
Using SQS as an event source will also fan out, with the advantage of being able to process up to 10 messages at once, and dead-letter queues for SQS make a lot more sense than Lambda DLQs, etc.
But, I prefer to use SNS for producers and SQS for consumers and assign the SQS queue to a topic. With SNS+SQS you get a lot more flexibility.
Be careful if there is a possibility for a job to fail and need a retry. SQS visibility and automatic dead-lettering can give you this basically for free. I don't think SNS will.
1. You can have one SNS topic, send messages to it with attributes, and subscribe different targets based on the attribute (see the sketch after this list).
2. You can have one SNS message go to different targets. There may be more than one consumer interested in an event.
3. “Priority” processing. I can have the same lambda function subscribed to both an SQS queue and an SNS topic. We do some things in batch where the message goes SNS -> SQS -> lambda. But messages with a priority of “high” go directly SNS -> Lambda, while also going through the queue. High-priority messages are one-offs triggered by a user action.
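For point 1, the attribute-based routing can be done with SNS subscription filter policies. A hedged boto3 sketch (all ARNs and the attribute name are made up):

    import json
    import boto3

    sns = boto3.client("sns")
    TOPIC = "arn:aws:sns:us-east-1:123456789012:jobs"  # placeholder

    # Subscribe a Lambda function, then attach a filter policy so it only
    # receives messages whose "priority" attribute is "high".
    sub = sns.subscribe(
        TopicArn=TOPIC,
        Protocol="lambda",
        Endpoint="arn:aws:lambda:us-east-1:123456789012:function:fast-path",
    )
    sns.set_subscription_attributes(
        SubscriptionArn=sub["SubscriptionArn"],
        AttributeName="FilterPolicy",
        AttributeValue=json.dumps({"priority": ["high"]}),
    )

    # The producer publishes with the attribute the policy matches on.
    sns.publish(
        TopicArn=TOPIC,
        Message=json.dumps({"job_id": 1}),
        MessageAttributes={"priority": {"DataType": "String", "StringValue": "high"}},
    )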
By “fan out” I mean that SNS will execute as many Lambda functions as needed to handle the published events as they occur. This gives you much greater concurrency than a typical pub/sub architecture where you typically have a ceiling on the subscriber count or are forced to wait on consumers to process the earlier enqueued messages before processing subsequent ones (i.e., the head-of-line blocking problem).
Yes. But there is an easy way around it. If you can break your process up into chunks that run less than 15 minutes, you can have all of your steps in different methods, and use Step Functions to orchestrate it. You will usually be using the same lambda instance since you are calling the same function.
A JSON file and a case statement is hard? You can even validate the JSON as you’re creating it in the console. Not just validating that it's correct JSON, but validating that it is a correctly formatted state machine. Anyone who can’t handle creating the step function and a case statement is going to have a heck of a time using Terraform, CloudFormation, creating IAM policies, yml files for builds and deployments, etc.
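The “case statement” half of that is just dispatching on whatever field the state machine passes in. A minimal sketch (the step names and the "step" field are hypothetical - each Task state would invoke this same function with a different input):

    def handler(event, context):
        # Each Task state passes something like {"step": "extract", ...}.
        step = event.get("step")
        if step == "extract":
            return do_extract(event)
        elif step == "transform":
            return do_transform(event)
        elif step == "load":
            return do_load(event)
        raise ValueError("unknown step: %s" % step)

    def do_extract(event):
        return dict(event, step="transform")

    def do_transform(event):
        return dict(event, step="load")

    def do_load(event):
        return {"done": True}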
Just run the stupid thing on a VM for however long it takes, why would anyone need all this complexity to run some dumb program that should just be a cron job to begin with. And why should the limitations of some platform dictate how one structures their code to begin with? You're solving for the wrong problem.
“Just do everything like we would do on premise” is the very reason that cloud implementations end up costing more than they should.
Let’s see all the reasons not to use a VM.
1. If this job is run based on when a file comes in, instead of a simple S3 event -> lambda (see the sketch after this list), then you have to do S3 Event -> SNS -> SQS and then poll the queue.
2. Again, if you are running a job based on when a file comes in, what do you do if you have 10 files coming in at the same time and you want to process them simultaneously? Do you have ten processes running and polling the queue? What if there is a spike of 100 files coming in - do you have 100 processes running just in case? With the lambda approach you get autoscaling. True, you can set up two CloudWatch alarms - one to scale out when the number of items in the queue is above X and one to scale in when the number of items in the queue is below Y - then set up an autoscaling group and launch configuration. Of course, since we are automating our deployments, that means we also have to set up a CloudFormation template with a user data section to tell the VM what to download, plus the description of the alarms, the scale-in and scale-out rules, the autoscaling group, the launch configuration, the definition of the EC2 instance, etc.
Alternatively, you can vastly over provision that one VM to handle spikes.
The template for the lambda and the state machine will be a lot simpler.
And after all of that work, we still don’t have the fine grained control of the autoscaling with the EC2 instance and cron job that we would have with the lambda and state machine.
3. Now if we do have a VM and just one cron job, not only do we have a VM running and costing us money waiting on a file to come in, we also have a single point of failure. Yes we can alleviate that by having an autoscaling group with one instance and set up a health check that automatically kills the instance if it is unhealthy and bring up another one and set the group up to span multiple AZs.
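For reference, the "simple S3 event -> lambda" from point 1 is just a handler that pulls the bucket and key out of the event. A sketch:

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # S3 invokes the function directly with the new object's location.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            process_file(key, body)

    def process_file(key, body):
        print("got", key, len(body), "bytes")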
I disagree - the rampant complexity of systems you can build using all these extra services is the bigger problem, and at scale, all these extra services on AWS cost more than their OSS equivalent running on VMs. Take a look at ELB for example, there's a per-request charge that yes is very small, but at 10k req/s it's significant - not to mention the extra bytes processed charges. Nginx/haproxy with DNS load balancing can do pretty much what ELB is offering. In the very small none of this matters, but you can just run smaller/bigger VMs as needed; prices starting at the cost of a hamburger per month.
Re reliability, I have VMs on AWS with multiple years of uptime - it's the base building block of the entire system, so it's expected to work all the time. You monitor those systems with any sort of nagios-type tool and alert into something like PagerDuty for when things do go wrong, but they mostly don't. Redundancy I build in at the VM layer.
Additional services on AWS however have their own guarantees and SLAs and they're more often down than the low-level virtual machines are - so it's an inherently less reliable way to build systems imo, but to each their own.
Also, this notion that you can conveniently split your processing up into 15m chunks is just crazy -- you may never know how long some batch job is going to take any given day if it's long running. They're typically leaning on some other systems outside your control that may have differing availability over time. I met a guy who built some scraper on lambda that did something convoluted and he was happy to make like 100k updates in the db overnight -- I built a similar thing using a single core python job running on a VM that did 40MM updates in the same amount of time. It's just silly what people think simple systems are not capable of.
> I disagree - the rampant complexity of systems you can build using all these extra services is the bigger problem,
So now I am going to manage my own queueing system, object storage system, messaging system, Load Balancer, etc? My time is valuable.
> and at scale, all these extra services on AWS cost more than their OSS equivalent running on VMs. Take a look at ELB for example, there's a per-request charge that yes is very small, but at 10k req/s it's significant - not to mention the extra bytes processed charges. Nginx/haproxy with DNS load balancing can do pretty much what ELB is offering.
ELB automatically scales with load. Do you plan to overprovision to handle peak load? My manager would laugh at me if I were more concerned with saving a few dollars than having the cross availability zone, fault tolerant, AWS managed ELB.
> In the very small none of this matters, but you can just run smaller/bigger VMs as needed; prices starting at the cost of a hamburger per month.
Again, we have processes that don’t have to process that many messages during a lull but can quickly scale up 20x. Do we provision our VMs so they can handle a peak load?
> Re reliability, I have VMs on AWS with multiple years of uptime - it's the base building block of the entire system, so it's expected to work all the time. You monitor those systems with any sort of nagios-type tool and alert into something like PagerDuty for when things do go wrong, but they mostly don't.
Why would I want to be alerted when things go wrong and be awakened in the middle of the night? I wouldn’t even take a job where management was so cheap that they wouldn’t spend money on fault tolerance and scalability and instead expected me to wake up in the middle of the night.
The lambda runtime doesn’t “go down”. Between Route53 health checks, ELB health checks, EC2 health checks, and autoscaling, things would really have to hit the fan for something to go down after a successful deployment.
Even if we had a slow resource leak, health checks would just kill the instance and bring up a new one. The alert I would get is not something I would have to wake up in the middle of the night for, it would be something we could take our time looking at the next day or even detach an instance from the autoscaling group to investigate it.
> Redundancy I build in at the VM layer.
> Additional services on AWS however have their own guarantees and SLAs and they're more often down than the low-level virtual machines are - so it's an inherently less reliable way to build systems imo, but to each their own.
When have you ever heard that the lambda runtime is down? (I’m completely making up that term - i.e., the thing that runs lambdas.)
> Also, this notion that you can conveniently split your processing up into 15m chunks is just crazy -- you may never know how long some batch job is going to take any given day if it's long running.
If one query is taking more than 15 minutes, it’s probably locking tables and we need to optimize the ETL process...
And you still didn’t answer the other question. How do you handle scaling? What if your workload goes up 10x-100x?
> I met a guy who built some scraper on lambda that did something convoluted and he was happy to make like 100k updates in the db overnight -- I built a similar thing using a single core python job running on a VM that did 40MM updates in the same amount of time. It's just silly what people think simple systems are not capable of.
Then he was doing it wrong. What do you think a lambda is? It’s just a preconfigured VM. There is nothing magic you did by running on a Linux EC2 instance that you couldn’t also have done running on a lambda - which is just a Linux VM with preinstalled components.
Yes, costs do scale as you grow, and they may scale linearly, but the cost should scale at a lower slope than your revenues. But then again, I don’t work for a low-margin B2C company. B2B companies have much greater margins to work with, and the cost savings of not having to manage infrastructure are well worth it.
If I get called once after I get off work about something going down, it’s automatically my top priority to find the single point of failure and get rid of it.
Another example is SFTP. Amazon just announced a managed SFTP server. We all know that we could kludge something together much cheaper than what Amazon is offering. But my manager jumped at the chance to get rid of our home-grown solution so that we no longer have to manage it ourselves.
Yes, you wouldn't use Sidekiq anymore either because you would have a different ActiveJob backend, but the important part is that you're eliminating a piece of infrastructure (Redis) without the poor scalability of DelayedJob.
Don’t get me wrong, I’m no savage. I use structured JSON logging with Serilog and use Elasticsearch as a sink with Kibana in front for most of my app logging.
I also log to Cloudwatch though and keep those logs for maybe a week.
You laugh, but depending on the actual Cobol support (e.g. whether it also supports a decent amount of CICS) it could make it easy to port a bunch of functions that are locked away on zOS.
They already have. You're entering a market, as you say, where every contemporary is a super senior developer, for old, rich companies like banks, and with demand staying static...but supply dropping.
Has anyone seen discussion of the impact of serverless on programming-language design?
It relaxes constraints that have historically restricted the shape of viable languages: massively-parallel deterministic compilation (like Stanford's gg, which just got simpler to implement); parallel, distributed, incremental, shared type checking (like Facebook's Hack); language-community-wide sharing of compilation artifacts (sort of like build images, TypeScript's type definition repos, or Coq's proof repo).
"That would be a nice language feature, but we don't know how to compile it efficiently, so you can't have it" has been a crippling refrain for decades. "[D]on't know how to either parallelize or centrally cache it" is a much lower bar. At least for open source.
This involves not just compiler tech in isolation, but also community organization for fine-grain code sharing. Any suggestions on things to look at?
Way to go, Ruby support!!!! I am irrationally excited about this, if I wanted to do Serverless Ruby up until now, my nearest options were (some community-supported thing with "traveling ruby" on AWS, or...) Kubeless, or Project Riff
We've been waiting! I thought this would never happen. Eating major crow today, as I've assumed for a long time that AWS's lack of Ruby support in Lambda was an intentional omission and sign that they are not our friends.
I will change my tone starting today!
(edit: Or Google Cloud Functions, or probably some other major ones I've missed...)
It's pretty comprehensive. And it was able to cache the execution context (and keep the cache warm) to approach native execution times. All of this obviously before native Ruby support was announced.
I had a play with it this evening implementing a basic webhook handler and it’s super smooth - I hooked a Ruby Lambda function up to API Gateway and everything just works. I suspect you could very easily create some sort of Rack shim on top as well, effectively giving near free, infinitely scalable, Ruby app servers (assuming you can keep the start time down).
I just did the same thing, but things fall apart when you need to use gems w/ native extension (mysql). I'll need to investigate the native runtime route.
It turns out one of the samples from AWS is that shim, and it’s super elegant - at least in theory I think you could deploy almost any Ruby web app to Lambda with minimal effort.
You're not the only one. I have a co-worker with inside information who said he believed it was part of a feud between Ruby and Python, since there are a lot of Python devs working at AWS. I didn't believe him for a while, but was sort of starting to, since it is clear that there is demand for Ruby on Lambda.
10 minutes before the announcement that OP is about here, one of our architects copied this line from an AWS slide deck:
> AWS Lambda supports code written in Node.js (JavaScript), Python, Java (Java 8 compatible), and C# (.NET Core) and Go.
... and intimated that, if your language is not on this list, you're not part of the dominant side of the industry.
As if to say, to the large cohort of Ruby developers we have on staff, "You guys are basically irrelevant because Amazon doesn't care if you can work on the pride and joy of their platform or not."
Every other week I've got a different person telling me that Ruby is a quirky niche language and I should get with the times and become a full-stack JavaScript developer instead.
I get it, there's growth in other languages, but if I have years of experience that means I should throw it away?
Sometimes these conversations get really personal, too. Almost like we're choosing a football team, you'd think it was politics or religion that we're debating...
I've learned a lot since I started writing Ruby at my first permanent job in 2010, and I'm 100% sure that if I looked at any of the early Ruby code I wrote in 2010, I'd grumble and groan at it just as hard as if you showed me some PHP code that I wrote at my first co-op job as a sophomore in college.
You can write great code in any language. You can also write a shanty-town or big ball of mud, in practically any language. I think it's unfair to say that PHP in and of itself represents a "bad code smell." But I'm also fairly sure from anecdotal evidence that I'm in the minority.
I'm personally very hyped for using Haskell natively on Lambda! In the keynote he mentions the partner sponsored runtimes, and actually said "Scala and Haskell, you'll just have to bring your own!" (as in, community effort).
I may be the only one(?) on here who is hyped for using CFML natively on Lambda. Now, thanks to the work of a member of our community, we have a proof of concept in Fuseless (http://www.fuseless.org), as well as the open source CFML (http://www.lucee.org) folks hinting at a Lambda project. Only six months ago people told me it was never going to happen, but I now see that changing.
Yes! We are definitely looking forward to championing awesome projects to build languages for the Runtime API. Take a look at the Rust and C++ examples as they show you a bit of how it all works. (Chris from AWS Serverless team)
There has never been anything stopping you from running binaries compiled from any language (including Haskell) on Lambdas. They run Linux after all. See serverless-haskell (link at bottom) for defining serverless Haskell functions.
serverless-haskell also takes care of packaging shared objects depended upon alongside the binary. You will have to package any binary dependencies of your Haskell code yourself.
The way it works is by generating a Node script which invokes the compiled binary with the payload as an environment variable, although this is encapsulated by an interface which you implement in the Haskell code.
One thing to make sure is that your libraries and Haskell code are compiled on a system compatible with the Lambda execution environment. Documentation going further into ELF compatibility is available, but I can’t remember where. I remember using stack to have my Haskell code compile on an Ubuntu docker image which was compatible.
I haven’t personally played with scala yet, but do you need to add any additional runtime components for it to work? Or perhaps scalaz? Or is it just making a .jar and you are good to go? :)
Honestly, for everything we are currently running on Lambda. At the moment we are using the Node.js runtime with Typescript.
We use both Lambda for data processing on incoming IoT events, and also for API interactions with our user, along with services to send mails and SMS, etc.
---
I would just want to use the robustness and correctness that I'm able to ensure with Haskell, compared to TypeScript - and especially compared to the bundling quirks of Webpack and NPM.
An example of an easy thing to let through is missing an _await_ on a call (might be for logging or something else), which means it'll go out-of-band, and can then easily not be run during the execution, if you have disabled waiting for the event loop to be done (which is necessary in many cases for global setup optimizations).
The call might then finish when the lambda function is thawed, but it might also never run, because the function gets collected.
Now, admittedly, this is not a big issue in any way, but it's the death by a thousand papercuts that I'd like to avoid.
Haskell (and many functional languages) excel at tasks where data is being transformed and/or can be modeled that way. Most of our lambda functions do exactly that.
Layers sounds like a great solution to sharing code/libraries. If anyone at AWS is here, will there be a way to pull them down for local testing? At the minute it's trivial because you're forced to pull all your dependencies into a single place you can zip them, and you can test them at that point - but will you still have to do that if you want to test locally with layers?
I'm not sure why this was downvoted/dead, but thank you - it looks like this would be enough, since it gives a link to download the layer and a hash to check it against.
Hi! Yes, there will be support in SAM CLI so that when you do local testing against a Layer, it will pull the Layer down for you. - Chris, Serverless @ AWS
Hi Chris -- it would be interesting to compare notes on Cloud Native Buildpacks, which seems to have overlapping mission with Lambda Layers. Could you come find us at
https://slack.buildpacks.io ?
* We had to hack around Lambda zip size limits. Now we can deploy fat dependencies to Layers and ditch the hacks (see the sketch after this list).
* We can drastically speed up build and packaging time locally by keeping fat dependencies in Layers.
* We use a shared in-house library to wrap all code that lives in a Lambda. Updating the version of this required us to deploy every single Lambda that used it. Now it can live in one place.
* We can eliminate repeated deployment pain for Python code with C dependencies by deploying the manylinux wheel to Layers. Now devs can go back to just packaging up pure Python and not worry about cross-platform problems.
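For the fat-dependency case, publishing a layer is a single API call. A boto3 sketch (bucket, key, and names are placeholders):

    import boto3

    lam = boto3.client("lambda")

    # Publish zipped dependencies (already uploaded to S3) as a new layer version.
    resp = lam.publish_layer_version(
        LayerName="pandas-numpy",  # placeholder
        Content={"S3Bucket": "my-artifacts", "S3Key": "layers/pandas-numpy.zip"},
        CompatibleRuntimes=["python3.6", "python3.7"],
    )
    print(resp["LayerVersionArn"])  # reference this ARN from your functions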
Only uncompressed size (250MB) really matters. You could always work around the compressed limits (50MB) by uploading to S3 first, instead of deploying the ZIP directly.
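i.e. upload the package to S3 and point the function at it instead of pushing the zip inline. Roughly, with boto3 (names are placeholders):

    import boto3

    s3 = boto3.client("s3")
    lam = boto3.client("lambda")

    # Upload the (possibly > 50 MB) package to S3 first...
    s3.upload_file("build/function.zip", "my-artifacts", "releases/function.zip")

    # ...then deploy from S3 instead of passing the zip bytes inline.
    lam.update_function_code(
        FunctionName="my-function",
        S3Bucket="my-artifacts",
        S3Key="releases/function.zip",
    )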
We've been packaging pandas in a lambda which is used to perform some calculations, but being a 50 MB zip file means cold starts of about 6-8 seconds. We're lucky that the service has little use, so our workaround is a lambda warmer which runs every 5 minutes and invokes N pandas lambdas. I'd be very interested in knowing if Layers has some feature to avoid this kind of issue.
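A warmer like that is just a scheduled function firing async invokes. A rough sketch (the function name and count are placeholders, and this only encourages - doesn't guarantee - N warm containers):

    import json
    import boto3

    lam = boto3.client("lambda")
    TARGET = "pandas-calculator"  # placeholder function name
    N = 5                         # how many containers we'd like to keep warm

    def handler(event, context):
        # Triggered every 5 minutes by a CloudWatch Events schedule.
        for _ in range(N):
            lam.invoke(
                FunctionName=TARGET,
                InvocationType="Event",                # async; don't wait for the result
                Payload=json.dumps({"warmer": True}),  # target can short-circuit on this flag
            )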
We had the same cold start problem and couldn’t find a way to reliably keep things warm. For instance, Lambda would often spawn a new instance rather than re-use the warm one.
In the end, we came to the conclusion that Amazon is smart and won’t let you hack together the equivalent of a cheaper EC2.
I don't think it's deliberately so, just that developing a solution requires scheduling and routing to cooperate. Normally they're considered by separate systems. As your execution pool expands, this problem becomes worse, not better.
On the other hand, their incentive to solve the problem is relatively weak vs an on-premise alternative.
If I were doing this today, I would prototype the problem in Python and after realising the startup penalty, would rewrite it in D's Mir [1] or Nim's ArrayMancer [2].
Life on a lambda is too short to pay a 6-8 second startup penalty over and over, millions of times.
Our problem is that we have a team of data scientists who are familiar with Python, plus a decent set of custom tools written in it, so changing languages isn't an option.
That's often the current explanation for continued use of Python and R.
Often it is a sign that the problem is not "big" enough (e.g., not crunching truly large data sets), OR that the data science team gets a disproportionate amount of goodwill (thus money) to spend on its foibles. :)
How did you get the zip down to 50MB? I was under the impression that pandas+numpy was closer to 300MB and bumped up against AWS size limits. I was considering building some hacked-together thing with S3.
I came to this thread specifically to find out about numpy and pandas on lambda.
We've been running a stripped down version of numpy + scipy + matplotlib in lambda. We'd build the environment in a docker container with Amazon linux, manually remove unneeded shared objects and then rezip the local environment before uploading to s3.
When I worked on this I used this article as a reference: https://serverless.com/blog/serverless-python-packaging/ and also ended up with a huge image. What that article didn't mention is that the virtual environment folder should be excluded in the serverless config file, as boto3 is already provided by the runtime. So adding:
    package:
      exclude:
        - venv/
would reduce the size considerably (to 50 MB in my case)
It is in our case. This is a service which is very seasonal, so it may be used during only a couple of days each month. Having a bunch of instances mostly idle would definitely be more expensive.
Note that it's long been possible to use any language with Lambda through a shim. In the early days that was using Node.js to call out to your binary. That meant you had an extra bit of startup cost (Node + your binary). Once Go was supported that no longer mattered much since Go binaries start almost instantly.
Hi! This is super common feedback and something the team is definitely thinking about! What would you want to see it increased to? (Chris Munns from Serverless @ AWS)
This would be amazing! A lot of ML use cases are largely unfeasible in Lambda on Python without serious pruning. The latest version of TensorFlow is 150MB uncompressed; add numpy, pandas, etc. to that and it adds up fast. I think 1 GB uncompressed would be pretty reasonable in the current state of ML tools, personally.
As a thought, could Lambda (perhaps in cooperation with AWS SageMaker?) offer a Lambda execution environment atop the AWS Deep Learning AMI? This would solve a lot of problems for a lot of people.
Is there any plan to add more disk space, or a way to fan out jobs? We use a lambda that does work on video files, we have to limit how many run concurrently (3) to prevent running out of disk space right now. Edit - or ability to attach volumes like Azure's lambdas.
Can't wait for this too; it seems like an old limitation not suited to Layers at all. But on the other hand, a Lambda should be small and fast - if something beyond the limits is needed, then Fargate or ECS should be used.
That said, I hope they increase the limit to at least 500 MB sooner than next re:Invent.
Noob question but is it possible/advisable to somehow (re)use prepared statements in Lambda?
"Prepared statements only last for the duration of the current database session. When the session ends, the prepared statement is forgotten, so it must be recreated before being used again. This also means that a single prepared statement cannot be used by multiple simultaneous database clients..."[1]
I don't see why not. Presumably you would connect to the database and prepare the statement when the Lambda function starts up, and execute the prepared statement from the per-request handler when it is invoked.
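A sketch of that pattern, assuming PostgreSQL and psycopg2 (the connection-string env var, statement, and event field are placeholders); anything at module scope survives across warm invocations of the same container:

    import os
    import psycopg2

    # Runs once per container (cold start): connect and prepare.
    conn = psycopg2.connect(os.environ["DATABASE_URL"])  # placeholder env var
    with conn.cursor() as cur:
        cur.execute("PREPARE get_user (int) AS SELECT * FROM users WHERE id = $1")
    conn.commit()

    def handler(event, context):
        # Warm invocations reuse the same session, so the prepared statement is still there.
        with conn.cursor() as cur:
            cur.execute("EXECUTE get_user (%s)", (event["user_id"],))
            row = cur.fetchone()
        return {"user": row}

In practice you'd also want to reconnect (and re-prepare) if the session has gone stale between invocations, but prepare-at-init is the core of it.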
You certainly can link MySQL connector libraries. Or any library for that matter that you normally link in a C++ application. The lambda-cpp-runtime does not limit you.
Check out some of the examples in the GitHub repository.
Checking the examples is the first thing I did. No example uses MySQL connector.
There are only example CMake calls to find_library, that will or will not find this library based on things that are not documented.
AWS Lambda still uses Amazon Linux AMI as the base for running all code. MySQL C++ connector library is not available to be easily installed by yum for this particular OS.
In particular, the library is distributed as an RPM, from Oracle website, which requires you to be logged in to download it, and then manually installed.
Also, an appropriate CMake find script has to be available to CMake.
Therefore, this is still a valid question, and AWS support needs to clarify.
For the previous Lambda version, I had to add the mysqlcppconn.so file to the uploaded zip and use something like this to ensure the executable runs:
PM for PowerShell here, no ETAs right now, but we're working closely with some O365 teams (starting with Exchange Online) to bring their modules up to compatibility with PS 6.x.
Forgive the noob question, but why is it necessary to have a custom runtime when using a binary from a compiled language? It seems to me that golang support should just mean binary support, and then c++ and rust would be able to comply already, no?
As of yesterday, lambda only provided a way to specify a function handler. The input to that function, and the response value from that function, needs to be deserialized and serialized (and implicitly, construct a runtime in which the concept of a function exists). Previously, a runtime for each supported language handled deserializing the wire protocol into a runtime object, invoking the specified function, routing stdout and stderr to cloudwatch, serializing the function's return value, and transforming runtime exceptions into response exceptions.
The lambda team is now letting you write that runtime, and presumably providing more documentation on the responsibilities of the runtime.
Check out the example C++ and Rust runtimes to understand why each language needs to have its own custom runtime.
You could do that, it's true, and folks did that with node.js to shim Go before Lambda had official Go support. But being able to just create/use/share a custom runtime inside of an organization will be easier for them over time vs. the shim method. - Chris from Serverless@AWS.
The way Lambda manages to achieve the performance it does is because it bootstraps the application and then runs each new request through the loaded application.
This means Lambda needs some way to communicate with an already running service.
Initially, Lambda had language specific runtimes that took care of this (i.e. with Node they have their own bootstrapper that loads your top-level JS file and then calls the defined method each time it receives a new event).
With the release of the Go runtime, they built a framework that gets compiled into your service that runs a simplified HTTP server that Lambda then submits events to.
This latest generalised version eschews an embedded HTTP server for letting your app do something like a long-poll to a local event RPC source in the Lambda container. Basically, your app boots and attempts to pull a job off the queue, if there's a job, your Lambda runs, if there isn't, your Lambda service gets paused until there's something to send it.
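In pseudo-Python, the loop a custom runtime is expected to implement against the new Runtime API looks roughly like this (endpoint paths follow the announcement; treat the details as a sketch rather than a spec):

    import os
    import json
    import urllib.request

    API = os.environ["AWS_LAMBDA_RUNTIME_API"]  # host:port of the local event source
    BASE = "http://%s/2018-06-01/runtime" % API

    def handle(event):
        return {"ok": True}  # your function's logic

    while True:
        # Long-poll for the next invocation; this blocks (and the sandbox may be
        # frozen) until there is an event to hand out.
        with urllib.request.urlopen(BASE + "/invocation/next") as resp:
            request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
            event = json.loads(resp.read())

        result = json.dumps(handle(event))

        # Post the result back for this specific invocation.
        urllib.request.urlopen(urllib.request.Request(
            BASE + "/invocation/%s/response" % request_id,
            data=result.encode(),
            method="POST",
        ))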
The Go runtime can execute any binary, but the Lambda is controlled via Go RPC, which uses Go's custom binary serialization (gob). I couldn't find any implementations of gobs in any other language when the Go runtime was announced.
I hope this Runtime API is a simplification of the Go RPC API.
Anyone know if there have been any improvements to cold start times for Lambdas in a VPC? That was the absolute death knell for us. If you're using Lambdas as a service backend for mobile/web apps, it's extremely common those Lambdas will be talking to a DB, and any decent security practice would require that DB to be in a VPC. Cold starts for Lambdas in a VPC could be on the order of 8-10 seconds: https://medium.freecodecamp.org/lambda-vpc-cold-starts-a-lat...
I just got out of a session at re:invent where they covered that they were working on improving VPC cold start times by reducing the amount of time it takes to associate an ENI with your lambda function. The method they're using also reduces the number of IPs needed to 1 per subnet.
We recently had to abandon Lambdas, 10+ second cold start, and for some reason when adding an API Gateway you get another +200ms on each request (Google it, common issue apparently).
So, 10+ seconds cold start, and 200 + 200-300ms (around 500-600ms avg) calls to the Lambda function. Complete garbage for our application at least (I imagine using it for background processing might not be an issue with latency).
Switched over to EC2, less than 200ms response total, no cold starts.
Agreed! I'm much more concerned with VPC performance - I don't have a single lambda outside of a VPC. Firecracker is extremely cool, and I'm very glad to see the improved perf at the VM level, but that's not my bottleneck.
Thankfully, in my case, I have a very steady flow of data so I don't expect too many cold starts.
One thing though: do your lambdas need both public and private access? If not, you can place them in a private-only subnet, since the slow part is the ENI for the NAT Gateway.
Cold starts for the VM are only part of the problem. If you're on a JITed runtime, a cold start also means compilation and optimization. It would be nice if they had ways to cache the JITed machine code so they could start back up with it already compiled and optimized.
You can generally resolve it yourself by poking seldom used functions to keep them hot. But no, they haven’t provided a solution to cold start (unless you consider ec2 or fargate a solution).
> You can generally resolve it yourself by poking seldom used functions to keep them hot.
We've tried this and it helps somewhat but when AWS attempts to scale your function based on load, cold starts re-appear. We've moved away from Lambdas where a dependable response time is required.
If you are experiencing cold starts it means that function is not used very often. If it's not used very often that likely means it's not user facing (or something less important like a Terms of Service page). If that's the case, why do you need instant response times?
No, that's not what it means. If you have high concurrent execution, you get 'cold start' every time the underlying service 'scales out' to support more.
The MORE you use lambda concurrently, the more you hit the cold start issue.
Granted, it's just for that one cold start execution per-scale node (and they could probably just optionally pre-fire to warm things in that instance, like with a cache), but it's definitely there horizontally.
I really wish they would add an init() callback that is called on cold start but before any traffic is sent to your lambda. It wouldn't help when there are no lambdas running, but it could be useful when things are scaling up, especially if you could ask for additional concurrency above the actual concurrency necessary for spikes.
I don't think so. When it spins it up, the request is already in flight. Otherwise this would have been solved by everyone but instead everyone sees terrible cold start times.
This is along the lines of what the other responses to this comment have said, but https://hackernoon.com/im-afraid-you-re-thinking-about-aws-l... gives a very detailed overview. It's titled "I'm afraid you’re thinking about AWS Lambda cold starts all wrong", because the way you're thinking about cold start times is common (and wrong).
That's not entirely true. While your warm lambdas can and will take the traffic, if your traffic ramps up, additional lambda instances will be spun up, and you will pay cold start prices as they are spinning up. So, even if you have a heavily used lambda fn, depending on the traffic your p99 will still look pretty bad and you will not be able to guarantee that all requests will be processed in x ms or less.
That's not for in-VPC functions, although if the underlying instance changes with the Firecracker migration users might see ENI start improvements. Currently your ENI usage is roughly (Concurrent Executions) / floor(3GB / Memory Allocation) - i.e., each ENI serves about floor(3GB / Memory Allocation) concurrent executions. If the 3GB changes, users will see huge gains, as each ENI creation can take around 9s.
I'm wondering how that's even possible if it includes the time for downloading your code from S3. I.e. normal cold starts (as I understand it) involve fetching the code from S3 to install on a VM. Perhaps they aren't including that time when claiming single milli cold start times?
I am still waiting for proper PHP support and the ability for Lambdas to use a VPC to connect to RDS servers; leaving my DBs wide open is kind of annoying... they say it's possible, but I've had 4 engineers try and no one can get it to work.
These are issues that Azure has already solved, and they make me wonder how much longer I will stay with AWS.
I suspect cold-start performance of arbitrary Docker containers would be intolerable for most customers in light of the sizes and number of layers of many of the images seen in the wild. Most people aren't building "FROM scratch" images yet, if they ever will. Single binaries, or scripting languages with well-defined runtime environments, are far easier to meet customer-demanded SLOs when building a serverless platform.
(Disclaimer: I work for AWS, but this is strictly my personal opinion/observation.)
This is in part what Cloud Native Buildpacks is working to solve. Can you come find us (https://buildpacks.io)? I think it would be really helpful to have CNB supported in/as Layers.
Docker isn't a runtime, it's a packaging technology. The Runtime API is a simple standard interface that will allow you to run even Docker containers if you so desired.
Docker doesn’t let you take the union of separately-authored base images together as a runtime image for your app. Lambda (apparently) does.
That’s the only major difference I can see, though. The rest is convention/standardization on what’s put into the image (an executable that talks to the local Runtime API over HTTP; separately-namespaced projects under /opt rather than use of the global namespace; etc.)
The key distinction for me is that Lambda is event-driven. If you have a microservice/function which is seldom called, in Fargate you would pay 24x7 just to be listening for calls.
Fargate tasks can now be initiated by CloudWatch Events as well[1]. Lambda will always be at the extreme end of "serverless", but over time I expect the lines to blur more, especially since Fargate will now be based on Firecracker[2], which has 125ms startup times for container workloads.
Yes, it's strange they didn't do this. Azure and GCP are headed that way, and there are smaller vendors doing the same thing.
Per-event invocation of a docker container instead of a specific bundle of code seems much more flexible, especially with the Firecracker tech they announced.
Sort of. Lambda is event-driven and on-demand, Fargate is "task-based". As mentioned in the sibling comment, if you build a "ping" HTTP endpoint in both, in Lambda you will have just a function which is called when necessary, in Fargate you will need to build a mini-webserver to keep listening for requests (as well as pay just to be listening instead of pay just when running).
I like how both you and the sibling comment tried to educate me on the difference between containers and event-driven compute. I'm well aware of the difference. My point, which maybe I didn't make so clear since both of you missed it, was that you can make fargate spin up a container for an event, but it would be too slow for any useful event driven computing.
If you're gonna use containers, use containers. If you want someone else to manage the container for you, use Lambda.
Fargate is a great service, but it's not a serverless solution by any mainstream definition of the term (although Amazon seems intent on stretching the definition lately).
Create a simple ping/pong HTTP service in both and you'll quickly see difference in everything from billing to startup time.
> I like how both you and the sibling comment tried to educate me on the difference between containers and event-driven compute. I'm well aware of the difference. My point, which maybe I didn't make so clear since both of you missed it, was that you can make fargate spin up a container for an event, but it would be too slow for any useful event driven computing.
> If you're gonna use containers, use containers. If you want someone else to manage the container for you, use Lambda.
In one moment you're saying "That's what Fargate is", as if Fargate == Running Serverless Containers.
On the other you're saying "you can make fargate spin up a container for an event, but it would be too slow for any useful event driven computing", which is the truth, but it contradicts your initial comment.
1. https://aws.amazon.com/blogs/compute/introducing-the-c-lambd...
2. https://aws.amazon.com/blogs/opensource/rust-runtime-for-aws...
3. https://aws.amazon.com/blogs/compute/announcing-ruby-support...