New for AWS Lambda: Use Any Programming Language and Share Common Components (amazon.com)
480 points by abd12 on Nov 29, 2018 | 211 comments



  We are making these open source runtimes available soon:
  C++[1]
  Rust[2]

  We are also working with our partners to provide more open source runtimes:
  Erlang (Alert Logic)
  Elixir (Alert Logic)
  Cobol (Blu Age)
  N|Solid (NodeSource)
  PHP (Stackery)
There is also native Ruby support[3].

1. https://aws.amazon.com/blogs/compute/introducing-the-c-lambd...

2. https://aws.amazon.com/blogs/opensource/rust-runtime-for-aws...

3. https://aws.amazon.com/blogs/compute/announcing-ruby-support...


This is great, but I'm ever so slightly disappointed that they called it the Lambda Runtime API instead of Common Gateway Interface. [1] It's not quite the same, but it definitely gives me an everything-old-is-new-again feeling.

[1] https://en.wikipedia.org/wiki/Common_Gateway_Interface


Interesting you mention CGI; I built https://bigcgi.com (https://github.com/bmsauer/bigcgi) primarily to address the issue of "any language" in FaaS environments.

Seeing this takes away one of my biggest comparative advantages, but I'm happy to see that at least I was not wrong in thinking this feature would be important for developers.

CGI still has the advantage of being an open standard, locally testable, and not vendor specific.


Hmm, I was thinking Service Component Architecture [1], but yes, I can relate to your 2nd point.

[1] https://en.wikipedia.org/wiki/Service_Component_Architecture


Thanks for the Ruby links.

With the native Ruby support[3], I wonder how hard it would be to implement a Lambda ActiveJob handler...


Why would you need to make an ActiveJob handler? What would that do for you over just a regular event-based Ruby lambda?

Honest question: I don’t Ruby, I just looked up what it is.


Instead of enqueuing a job you’d just write to an SQS queue and have a lambda as your “job handler” using the queue as an event source.
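A rough sketch of that pattern in Python with boto3 (the queue URL and job payload are made up for illustration; the SQS event shape is the standard one):

    import json
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical queue

    # Producer side: instead of ActiveJob's perform_later, drop a message on the queue.
    def enqueue_job(payload):
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(payload))

    # Consumer side: a Lambda function with the SQS queue configured as its event source.
    def handler(event, context):
        for record in event["Records"]:
            job = json.loads(record["body"])
            process(job)  # your actual job logic goes here

    def process(job):
        print("processing", job)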

Boom, no need for Active Job.

Welcome to Serverless.

There’s going to be a lot of people trying to do things the “rails way” on serverless and it’s going to be a little confusing for a while. Many will write it off, some will create innovative new gems. And of course the Ruby community will come up with the best abstraction for SAM templates and deployment, because we like nice tools.


No need to use SQS here, assuming you don't need strict ordering behavior, since SNS can use Lambda functions as notification targets. The nice thing about SNS is that it can automatically fan out to multiple Lambda functions in parallel.


And then what if your Lambda function fails because a service dependency is down? The answer is, it retries twice and then you can have it go to a deadletter queue. If you want to have any configuration control over your retry or deadletter behavior, SQS is really handy.

It's also a useful pattern, instead of configuring the SQS event itself as your Lambda trigger, to have a scheduled trigger and then perform health checks on your service dependencies before pulling messages from the queue. This technique saves you wasted retries and even gives you more fine-grained control over deleting messages (the SQS mechanism for saying "this is done and you don't need to retry it after the timeout"). The SQS trigger, in contrast, deletes the entire batch of messages if the Lambda exits cleanly and does not delete any of them if it doesn't, which is a bit messy.
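A rough sketch of that scheduled-poll pattern in Python (the queue URL and health check are placeholders; treat this as an outline, not a drop-in implementation):

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work"  # hypothetical queue

    def dependencies_healthy():
        return True  # placeholder: ping your downstream services here

    def handler(event, context):
        # Invoked on a schedule rather than by the SQS event source mapping.
        if not dependencies_healthy():
            return  # skip this run; messages simply stay in the queue
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=1)
        for msg in resp.get("Messages", []):
            try:
                process(msg["Body"])
                # Delete each message individually as it succeeds, rather than
                # relying on all-or-nothing batch deletion at the end.
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
            except Exception:
                pass  # leave the message to reappear after its visibility timeout

    def process(body):
        print("processing", body)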

Also, you can have an SQS queue subscribe to multiple SNS topics, which means you can set up separate SNS topics for backfill or other use cases. This is especially useful in cross-account use cases, where often your online events are coming from an SNS from a different account, but you want to be able to inject your own testing and backfill events.


I usually don’t let the lambda SQS back end control message deletion for just that reason. I delete individual messages as I process them and throw an exception if any fail.


What problems have you encountered using the same SNS topic as the dead letter queue for the Lambda function?


I haven’t even tried that because that sounds like an infinite retry scenario waiting to happen.


Indeed it is - I was assuming, perhaps mistakenly, that that was the desired behavior. After all, why dispatch requests that are not to be processed?


If a specific message fails to process multiple times, most often it’s because that particular message is malformed or invalid, or exposes an unhandled edge case in the processing logic, and will never successfully process, at which point continued attempts are both fruitless and wasteful.

It’s better to retry a finite number of times and stash any messages that fail to process offline for further investigation. If there is an issue that can be resolved, it’s possible to feed deadletter messages back in for processing once the issue is addressed.

Feeding your deadletter events back into your normal processing flow not only hides these issues but forces you to pay real money to fail to process them in an infinite loop that is hard if not impossible to diagnose and detect.


Using SQS as an event source, it will also fan out, with the advantage of being able to process up to 10 messages at once; dead letter queues for SQS also make a lot more sense than Lambda DLQs, etc.

But, I prefer to use SNS for producers and SQS for consumers and assign the SQS queue to a topic. With SNS+SQS you get a lot more flexibility.


Indeed that is true, SNS can fan out to Lambda. It’s my understanding (I welcome a correction) that ActiveJob is more sequential like SQS + Lambda.


More like

SQS -> Multiple lambda instances

Unless you set the concurrency to one. But as always when working with any queue, don’t depend on exactly-once processing or in-order processing.

AWS does have FIFO queues but they don’t work with lambda directly.


Be careful if there is a possibility for a job to fail and need a retry. SQS visibility timeouts and automatic dead lettering can give you this basically for free. I don't think SNS will.


Lambda functions can be configured to use SNS for dead letter queues as well. See https://docs.aws.amazon.com/lambda/latest/dg/retries-on-erro... (event sources that aren't stream-based).


Ah, very nice. Thanks for the correction.


Curious about a use case or two (yours or otherwise) for the “fan out” capability here. Do you mind sharing?


1. You can have one SNS topic, send messages to it with attributes and subscribe to different targets based on the attribute.

2. You can have one SNS message go to different targets. There may be more than one consumer interested in an event.

3. “Priority” processing. I can have the same lambda function subscribed to both an SQS queue and an SNS topic. We do some things in batch where the message will go to SNS -> SQS -> lambda. But with messages with a priority of “high” they will go directly from SNS -> Lambda but they will also go through the queue. High priority messages are one off triggered by a user action.
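A sketch of point 1 with boto3 (the topic ARN, subscription ARN, and "priority" attribute are all placeholders): publish with a message attribute, then restrict one subscription to high-priority messages via a filter policy:

    import json
    import boto3

    sns = boto3.client("sns")
    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:events"  # hypothetical topic

    # Publish with an attribute that subscriptions can filter on.
    sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps({"order_id": 42}),
        MessageAttributes={"priority": {"DataType": "String", "StringValue": "high"}},
    )

    # Only deliver "high" priority messages to this particular subscriber.
    sns.set_subscription_attributes(
        SubscriptionArn=TOPIC_ARN + ":subscription-id",  # hypothetical subscription
        AttributeName="FilterPolicy",
        AttributeValue=json.dumps({"priority": ["high"]}),
    )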


By “fan out” I mean that SNS will execute as many Lambda functions as needed to handle the published events as they occur. This gives you much greater concurrency than a typical pub/sub architecture where you typically have a ceiling on the subscriber count or are forced to wait on consumers to process the earlier enqueued messages before processing subsequent ones (i.e., the head-of-line blocking problem).


this is the truth! excited to see what people come up with!


Most people use Sidekiq, which uses Redis, for ActiveJob. If there were a Lambda backend then you could get rid of Redis.


Does the 15 minute limit not apply anymore? That might be a blocker.


Yes. But there is an easy way around it. If you can break your process up into chunks that run less than 15 minutes, you can have all of your steps in different methods, and use Step Functions to orchestrate it. You will usually be using the same lambda instance since you are calling the same function.

https://aws.amazon.com/step-functions/

Basically you define your steps with an input to each step (all of the below is pseudocode, of course):

Step 1: {“Event”:”step a”}

Step 2: {“Event”:”step b”}

Step 3: {“Event”:”step c”}

And define the handler in your lambda:

Switch event:

   Case “step a”

      DoA()

   Case “step b”

      DoB()

   Case “step c”

      DoC()

I’ve had a lambda run for two hours doing this.
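A hedged Python version of that dispatch idea (the step names and do_* helpers are hypothetical; it just assumes the state machine passes {"Event": "..."} as each task's input):

    def handler(event, context):
        step = event.get("Event")
        if step == "step a":
            do_a()
        elif step == "step b":
            do_b()
        elif step == "step c":
            do_c()
        else:
            raise ValueError("unknown step: %s" % step)

    def do_a(): pass  # chunk A, kept under the 15 minute limit
    def do_b(): pass  # chunk B
    def do_c(): pass  # chunk C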


"easy way" wtf


A JSON file and a case statement is hard? You can even validate the JSON as you’re creating it in the console. Not just validating correct JSON, but validating that it is a correctly formatted state machine. Anyone who can’t handle creating the step function and a case statement is going to have a heck of a time using Terraform, CloudFormation, creating IAM policies, yml files for builds and deployments, etc.


Just run the stupid thing on a VM for however long it takes, why would anyone need all this complexity to run some dumb program that should just be a cron job to begin with. And why should the limitations of some platform dictate how one structures their code to begin with? You're solving for the wrong problem.


“Just do everything like we would do on premise” is the very reason that cloud implementations end up costing more than they should.

Let’s see all the reasons not to use a VM.

1. If this job is run based on when a file comes in, instead of a simple S3 event -> lambda, then you have to do S3 Event -> SNS -> SQS and then poll the queue.

2. Again if you are running a job based on when a file comes in, what do you do if you have 10 files coming in at the same time and you want to run them simultaneously? Do you have ten processes running and polling the queue? What if there is a spike of 100 files coming in, do you have 100 processes running just in case? With the lambda approach you get autoscaling. True you can set up two Cloudwatch alarms - one to trigger when the number of items in the queue is X and to scale down when the number of items in the queue is below Y, then set up an autoscaling group and launch configuration. Of course since we are automating our deployments that means we also have to set up a CloudFormation template with a user data section to tell the VM what to download and the description of the alarms, scale in, scale out rules, the autoscaling group, the launch configuration, the definition of the EC2, etc.

Alternatively, you can vastly over provision that one VM to handle spikes.

The template for the lambda and the state machine will be a lot simpler.

And after all of that work, we still don’t have the fine grained control of the autoscaling with the EC2 instance and cron job that we would have with the lambda and state machine.

3. Now if we do have a VM and just one cron job, not only do we have a VM running and costing us money waiting on a file to come in, we also have a single point of failure. Yes we can alleviate that by having an autoscaling group with one instance and set up a health check that automatically kills the instance if it is unhealthy and bring up another one and set the group up to span multiple AZs.


I disagree - the rampant complexity of systems you can build using all these extra services is the bigger problem, and at scale, all these extra services on AWS cost more than their OSS equivalent running on VMs. Take a look at ELB for example, there's a per-request charge that yes is very small, but at 10k req/s it's significant - not to mention the extra bytes processed charges. Nginx/haproxy with DNS load balancing can do pretty much what ELB is offering. In the very small none of this matters, but you can just run smaller/bigger VMs as needed; prices starting at the cost of a hamburger per month.

Re reliability, I have VMs on AWS with multiple years of uptime - it's the base building block of the entire system, so it's expected to work all the time. You monitor those systems with any sort of nagios-type tool and alert into something like pager duty for when things do go wrong, but they mostly don't. Redundancy I build in at the VM layer.

Additional services on AWS however have their own guarantees and SLAs and they're more often down than the low-level virtual machines are - so it's an inherently less reliable way to build systems imo, but to each their own.

Also, this notion that you can conveniently split your processing up into 15m chunks is just crazy -- you may never know how long some batch job is going to take any given day if it's long running. They're typically leaning on some other systems outside your control that may have differing availability over time. I met a guy who built some scraper on lambda that did something convoluted and he was happy to make like 100k updates in the db overnight -- I built a similar thing using a single core python job running on a VM that did 40MM updates in the same amount of time. It's just silly what people think simple systems are not capable of.


> I disagree - the rampant complexity of systems you can build using all these extra services is the bigger problem,

So now I am going to manage my own queueing system, object storage system, messaging system, Load Balancer, etc? My time is valuable.

> and at scale, all these extra services on AWS cost more than their OSS equivalent running on VMs. Take a look at ELB for example, there's a per-request charge that yes is very small, but at 10k req/s it's significant - not to mention the extra bytes processed charges. Nginx/haproxy with DNS load balancing can do pretty much what ELB is offering.

ELB automatically scales with load. Do you plan to overprovision to handle peak load? My manager would laugh at me if I were more concerned with saving a few dollars than having the cross availability zone, fault tolerant, AWS managed ELB.

> In the very small none of this matters, but you can just run smaller/bigger VMs as needed; prices starting at the cost of a hamburger per month.

Again, we have processes that don’t have to process that many messages during a lull but can quickly scale up 20x. Do we provision our VMs so they can handle a peak load?

> Re reliability, I have VMs on AWS with multiple years of uptime - it's the base building block of the entire system, so it's expected to work all the time. You monitor those systems with any sort of nagios-type tool and alert into something like pager duty for when things do go wrong, but they mostly don't.

Why would I want to be alerted when things go wrong and be awakened in the middle of the night? I wouldn’t even take a job where management was so cheap that they wouldn’t spend money on fault tolerance and scalability and instead expected me to wake up in the middle of the night.

The lambda runtime doesn’t “go down”. Between Route53 health checks, ELB health checks, EC2 health checks, and autoscaling, things would really have to hit the fan for something to go down after a successful deployment.

Even if we had a slow resource leak, health checks would just kill the instance and bring up a new one. The alert I would get is not something I would have to wake up in the middle of the night for, it would be something we could take our time looking at the next day or even detach an instance from the autoscaling group to investigate it.

> Redundancy I build in at the VM layer. Additional services on AWS however have their own guarantees and SLAs and they're more often down than the low-level virtual machines are - so it's an inherently less reliable way to build systems imo, but to each their own.

When have you ever heard that the lambda runtime is down (I’m completely making up that term - ie the thing that runs lambdas).

> Also, this notion that you can conveniently split your processing up into 15m chunks is just crazy -- you may never know how long some batch job is going to take any given day if it's long running.

If one query is taking more than 15 minutes, it’s probably locking tables and we need to optimize the ETL process....

And you still didn’t answer the other question. How do you handle scaling? What if your workload goes up 10x-100x?

> I met a guy who built some scraper on lambda that did something convoluted and he was happy to make like 100k updates in the db overnight -- I built a similar thing using a single core python job running on a VM that did 40MM updates in the same amount of time. It's just silly what people think simple systems are not capable of.

Then he was doing it wrong. What do you think a lambda is? It’s just a preconfigured VM. There is nothing magic you did by running on a Linux EC2 instance that you also couldn’t have done running a lambda - which is just a Linux VM with preinstalled components.

Yes, costs do scale as you grow, and they may scale linearly, but the cost should scale at a lower slope than your revenues. But then again, I don’t work for a low margin B2C company. B2B companies have much greater margins to work with and the cost savings of not having to manage infrastructure is well worth it.

If I get called once after I get off work about something going down, it’s automatically my top priority to find the single point of failure and get rid of it.

Another example is SFTP. Amazon just announced a managed SFTP server. We all know that we could kludge something together much cheaper than what Amazon is offering. But my manager jumped at the chance to get rid of our home grown solution that we wouldn’t have to manage ourselves.


You won’t need Sidekiq either.


Yes, you wouldn't use Sidekiq anymore either because you would have a different ActiveJob backend, but the important part is that you're eliminating a piece of infrastructure (Redis) without the poor scalability of DelayedJob.


Sidekiq has other useful features (like the built in dashboard)


That’s what CloudWatch is for....


CloudWatch is an awful tool to have to use


Don’t get me wrong, I’m no savage. I use structured JSON logging with Serilog and use ElasticSearch as a sink with Kibana for most of my app logging.

I also log to Cloudwatch though and keep those logs for maybe a week.


The post is now updated with links to the C++ and Rust runtimes; they are available now!



Awesome! I'm so excited to see this support. :D


Thanks for sharing the Ruby links.


Guess I'll go learn Cobol now.


You laugh, but depending on the actual Cobol support (e.g. whether it also supports a decent amount of CICS) it could make it easy to port a bunch of functions that are locked away on zOS.


I was actually serious.


That might be really smart. Cobol is still in wide use, and somebody needs to replace the aging Cobol community. I bet salaries will sky rocket


They already have. You're entering a market, as you say, where every contemporary is a super senior developer, for old, rich companies like banks, and with demand staying static...but supply dropping.


I've read this multiple times, and I'm sure it's an intuitive argument. Is it actually true though?


Yep, I am aware of projects currently looking for COBOL devs in Germany.


I am aware of projects which look for developers using any language. That doesn't make COBOL special though ^^


Several related Lambda announcements around structure/reuse as well:

1. Lambda Layers - Reusable components that can be shared across lambda functions (covered in the linked article)

2. AWS (Serverless) Toolkits for PyCharm, IntelliJ & VS Code - https://aws.amazon.com/blogs/aws/new-aws-toolkits-for-pychar...

3. Nested Applications Using the AWS Serverless Application Repository - https://aws.amazon.com/about-aws/whats-new/2018/11/sam-suppo...


Has anyone seen discussion of the impact of serverless on programming-language design?

It relaxes constraints that have historically restricted the shape of viable languages: massively-parallel deterministic compilation (like Stanford's gg, which just got simpler to implement); parallel, distributed, incremental, shared type checking, like Facebook's Hack; language-community-wide sharing of compilation artifacts (sort of like build images, or TypeScript's type definition repos, or Coq's proof repo).

"That would be a nice language feature, but we don't know how to compile it efficiently, so you can't have it" has been a crippling refrain for decades. "[D]on't know how to either parallelize or centrally cache it" is a much lower bar. At least for open source.

This involves not just compiler tech in isolation, but also community organization for fine-grain code sharing. Any suggestions on things to look at?


Way to go, Ruby support!!!! I am irrationally excited about this, if I wanted to do Serverless Ruby up until now, my nearest options were (some community-supported thing with "traveling ruby" on AWS, or...) Kubeless, or Project Riff

https://www.serverless-ruby.org/

We've been waiting! I thought this would never happen. Eating major crow today, as I've assumed for a long time that AWS's lack of Ruby support in Lambda was an intentional omission and sign that they are not our friends.

I will change my tone starting today!

(edit: Or Google Cloud Functions, or probably some other major ones I've missed...)


You're missing Ruby on Jets. http://rubyonjets.com/

It's pretty comprehensive. And it was able to cache the execution context (and keep the cache warm) to approach native execution times. All of this obviously before native Ruby support was announced.


Awesome! Thank you


Happy to say that Jets is now on the official AWS Ruby runtime: https://blog.boltops.com/2018/12/12/official-aws-ruby-suppor...


I had a play with it this evening implementing a basic webhook handler and it’s super smooth - I hooked a Ruby Lambda function up to API Gateway and everything just works. I suspect you could very easily create some sort of Rack shim on top as well, effectively giving near free, infinitely scalable, Ruby app servers (assuming you can keep the start time down).


I just did the same thing, but things fall apart when you need to use gems w/ native extension (mysql). I'll need to investigate the native runtime route.


That's interesting... gems w/ native extension, I'm not sure if that was on my test case list, but it should be. Thx...


It turns out one of the samples from AWS is that shim, and it’s super elegant - at least in theory I think you could deploy almost any Ruby web app to Lambda with minimal effort.


You're not the only one. I have a co-worker with inside information who said he believed it was part of a feud between Ruby and Python, since there are a lot of Python devs working at AWS. I didn't believe him for a while, but was sort of starting to, since it is clear that there is demand for Ruby on Lambda.


So Python users are actively trying to kill Ruby? Stopping their Employers from supporting Ruby?

Not the first time I heard of it. But actively trying to kill it is a whole different level.


OK, I am skeptical of this too, but it's a thing.

10 minutes before the announcement that OP is about here, one of our architects copied this line from an AWS slide deck:

> AWS Lambda supports code written in Node.js (JavaScript), Python, Java (Java 8 compatible), and C# (.NET Core) and Go.

... and intimated that, if your language is not on this list, you're not part of the dominant side of the industry.

As if to say, to the large cohort of Ruby developers we have on staff, "You guys are basically irrelevant because Amazon doesn't care if you can work on the pride and joy of their platform or not."

Every other week I've got a different person telling me that Ruby is a quirky niche language and I should get with the times and become a full-stack JavaScript developer instead.

I get it, there's growth in other languages, but if I have years of experience that means I should throw it away?

Sometimes these conversations get really personal, too. Almost like we're choosing a football team, you'd think it was politics or religion that we're debating...


So basically Ruby is now being treated like PHP.


Hah.. bazinga!

I've learned a lot since I started writing Ruby at my first permanent job in 2010, and I'm 100% sure that if I looked at any of the early Ruby code I wrote in 2010, I'd grumble and groan at it just as hard as if you showed me some PHP code that I wrote at my first co-op job as a sophomore in college.

You can write great code in any language. You can also write a shanty-town or big ball of mud, in practically any language. I think it's unfair to say that PHP in and of itself represents a "bad code smell." But I'm also fairly sure from anecdotal evidence that I'm in the minority.

I understand completely now. Thanks for that.


I'm personally very hyped for using Haskell natively on Lambda! In the keynote he mentions the partner sponsored runtimes, and actually said "Scala and Haskell, you'll just have to bring your own!" (as in, community effort).


May be the only one(?) on here who is hyped for using CFML natively on Lambda. Now thanks to the work of a member of our community we have a proof of concept in Fuseless http://www.fuseless.org as well as the open source CFML http://www.lucee.org folks hinting at a Lambda project. Only six months ago people told me it was never going to happen but I now see that changing.


Yes! We are definitely looking forward to championing awesome projects to build languages for the Runtime API. Take a look at the Rust and C++ examples as they show you a bit of how it all works. (Chris from AWS Serverless team)


There has never been anything stopping you from running binaries compiled from any language (including Haskell) on Lambdas. They run Linux after all. See serverless-haskell (link at bottom) for defining serverless Haskell functions.

serverless-haskell also takes care of packaging shared objects depended upon alongside the binary. You will have to package any binary dependencies of your Haskell code yourself.

The way it works is by generating a Node script which invokes the compiled binary with the payload as an environment variable, although this is encapsulated by an interface which you implement in the Haskell code.

One thing to make sure is that your libraries and Haskell code are compiled on a system compatible with the Lambda execution environment. Documentation going further into ELF compatibility is available, but I can’t remember where. I remember using stack to have my Haskell code compile on an Ubuntu docker image which was compatible.

https://github.com/seek-oss/serverless-haskell


Same here, Haskell in Lambda is where we want to be.


This would be a real game changer for me, from an internal marketing perspective and for personal use.


Internal marketing for me as well! Will help persuade my colleagues to give Haskell a try.


Scala works just fine in the java runtime for us


I haven’t personally played with scala yet, but do you need to add any additional runtime components for it to work? Or perhaps scalaz? Or is it just making a .jar and you are good to go? :)


I've simply used sbt assembly to create a fat jar. It includes all of the needed dependencies.


Ditto Kotlin.


For the uninitiated, can you please provide some example use cases where you would use Lambda for Haskell for great good?


Honestly, for everything we are currently running on Lambda. At the moment we are using the Node.js runtime with Typescript.

We use both Lambda for data processing on incoming IoT events, and also for API interactions with our user, along with services to send mails and SMS, etc.

---

I would just want to use the robustness and correctness that I'm able to ensure with Haskell, compared to TypeScript, and especially the bundling quirks with Webpack and NPM.

An example of an easy thing to let through is missing an _await_ on a call (might be for logging or something else), which means it'll go out-of-band, and can then easily not be run during the execution, if you have disabled waiting for the event loop to be done (which is necessary in many cases for global setup optimizations).

The call might then finish when the lambda function is thawed, but it might also never run, because the function gets collected.

Now, admittedly, this is not a big issue in any way, but it's the death by a thousand papercuts that I'd like to avoid.


Haskell (and many functional languages) excel at tasks where data is being transformed and/or can be modeled that way. Most of our lambda functions do exactly that.


Layers sounds like a great solution to sharing code/libraries. If anyone at AWS is here, will there be a way to pull them down for local testing? At the minute it's trivial because you're forced to pull all your dependencies into a single place so you can zip them, and you can test them at that point - but will you still have to do that if you want to test locally with layers?


seems you can get the link to the layer by calling GetLayerVersion: https://docs.aws.amazon.com/lambda/latest/dg/API_GetLayerVer...
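For example, with boto3 (the layer name and version number are placeholders), the response includes a pre-signed URL plus a hash, so a local-testing script could do something like:

    import urllib.request
    import boto3

    lam = boto3.client("lambda")

    resp = lam.get_layer_version(LayerName="my-shared-deps", VersionNumber=1)  # hypothetical layer
    url = resp["Content"]["Location"]     # pre-signed S3 URL for the layer zip
    sha = resp["Content"]["CodeSha256"]   # hash to verify the download against

    urllib.request.urlretrieve(url, "layer.zip")  # then unzip into your local runtime path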


I'm not sure why this was downvoted/dead, but thank you - it looks like this would be enough since it gives a link to download the layer and a hash to check it against.


hi! yes, there will be support in SAM CLI so that when you do local testing referencing a Layer, it will pull it down for you. - Chris, Serverless @ AWS


Hi Chris -- it would be interesting to compare notes on Cloud Native Buildpacks, which seems to have overlapping mission with Lambda Layers. Could you come find us at https://slack.buildpacks.io ?


Thanks Chris! Sounds like it's probably time I learn the SAM CLI :)


Awesome. Our "Distributed Monolith" problems are now solved.

We have so many Lambdas that share common Java JAR libs.

Lambda Layers appears to solve our reuse and deployment headaches.


Lambda Layers fixes so many problems!

* We had to hack around Lambda zip size limits. Now we can deploy fat dependencies to Layers and ditch the hacks.

* We can drastically speed up build and packaging time locally by keeping fat dependencies in Layers.

* We use a shared in-house library to wrap all code that lives in a Lambda. Updating the version of this required us to deploy every single Lambda that used it. Now it can live in one place.

* We can eliminate repeated deployment pain for Python code with C dependencies by deploying the manylinux wheel to Layers. Now devs can go back to just packaging up pure Python and not worry about cross-platform problems.

And probably loads more I'm not thinking of.
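For anyone curious what that shared-library workflow looks like in practice, a minimal boto3 sketch (bucket, key, layer, function names, and the runtime are invented for illustration): publish the common dependencies once as a layer version, then attach it to each function:

    import boto3

    lam = boto3.client("lambda")

    # Publish the shared dependencies once, as a layer version.
    layer = lam.publish_layer_version(
        LayerName="company-common",  # hypothetical layer
        Content={"S3Bucket": "my-artifacts", "S3Key": "common-deps.zip"},
        CompatibleRuntimes=["python3.7"],
    )

    # Attach it to a function; the function's own zip now only needs app code.
    lam.update_function_configuration(
        FunctionName="orders-processor",  # hypothetical function
        Layers=[layer["LayerVersionArn"]],
    )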


Lambda zip size limits are still strongly in force. All your layers and your lambda code still must be < 50MB zipped / 250MB unzipped total.


Are you sure? I've been seeing conflicting information regarding that, but can't find the authoritative answer from Amazon


spullara is right, I am wrong -

> The overall, uncompressed size of function and layers is subject to the usual unzipped deployment package size limit.

https://aws.amazon.com/blogs/aws/new-for-aws-lambda-use-any-...


Only uncompressed size (250MB) really matters. You could always work around the compressed limits (50MB) by uploading to S3 first, instead of deploying the ZIP directly.


We've been packaging pandas in a lambda which is used to perform some calculations, but being a 50 MB zip file means cold starts of about 6-8 secs. We're lucky that the service has little use, so our way to work around it is to have a lambda warmer which runs every 5 minutes and invokes N pandas lambdas. I'd be very interested in knowing if Layers has some feature to avoid this kind of issue.
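A warmer along those lines is only a few lines of Python (the target function name and count are placeholders). One caveat: async invocations aren't guaranteed to land on N distinct containers, so many warmers instead invoke synchronously in parallel with a short sleep in the target.

    import json
    import boto3

    lam = boto3.client("lambda")
    TARGET = "pandas-calculator"  # hypothetical function name
    N = 5                         # how many containers we'd like to keep warm

    def handler(event, context):
        # Triggered by a CloudWatch scheduled event every 5 minutes.
        for _ in range(N):
            lam.invoke(
                FunctionName=TARGET,
                InvocationType="Event",  # fire-and-forget
                Payload=json.dumps({"warmup": True}).encode(),
            )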


We had the same cold start problem and couldn’t find a way to reliably keep things warm. For instance, Lambda would often spawn a new instance rather than re-use the warm one.

In the end, we came to the conclusion that Amazon is smart and won’t let you hack together the equivalent of a cheaper EC2.


I don't think it's deliberately so, just that developing a solution requires scheduling and routing to cooperate. Normally they're considered by separate systems. As your execution pool expands, this problem becomes worse, not better.

On the other hand, their incentive to solve the problem is relatively weak vs an on-premise alternative.


If I were doing this today, I would prototype the problem in Python and after realising the startup penalty, would rewrite it in D's Mir [1] or Nim's ArrayMancer [2].

Life on a lambda is too short to pay a 6-8 second startup penalty over and over, millions of times.

[1]: https://github.com/libmir/mir-algorithm [2]: https://mratsim.github.io/Arraymancer/


Our problem is that we have a team of data scientists who are familiar with Python, plus a decent set of custom tools written in it, so changing languages isn't an option


that's often the current explanation for continued use of Python and R.

Often it is a sign that the problem is not "big" enough (e.g. not crunching truly large data sets) OR the data science team gets a disproportionate amount of goodwill (thus money) to spend on its foibles. :)


How did you get the zip down to 50MB? I was under the impression that pandas+numpy was closer to 300MB and bumped up against AWS size limits. I was considering building some hacked together thing with S3.

I came to this thread specifically to find out about numpy and pandas on lambda.


We've been running a stripped down version of numpy + scipy + matplotlib in lambda. We'd build the environment in a docker container with Amazon linux, manually remove unneeded shared objects and then rezip the local environment before uploading to s3.

A similar method is described here: https://serverlesscode.com/post/deploy-scikitlearn-on-lamba/

Layers should make this entire process easier.


When I worked on this I used this article as a reference: https://serverless.com/blog/serverless-python-packaging/ and also ended up with a huge image. What that article didn't mention is that the virtual environment folder should be excluded in the serverless config file, as boto3 is already provided by the runtime. So adding:

    package:
      exclude:
        - venv/

would reduce the size considerably (to 50 MB in my case)


Why though? Is it cheaper than just running a bunch of servers?


It is in our case. This is a service which is very seasonal, so it may be used during a couple of days each month only. Having a bunch of instances mostly idle would definitely be more expensive


How much delay from a cold start can your application tolerate? On the order of tenths of a second or up to one second?


Being that the data is queried from a web app through HTTP, the shorter the better. Around 1 sec should be alright, but 6 - 8 definitely isn't


Note that it's long been possible to use any language with Lambda through a shim. In the early days that was using Node.js to call out to your binary. That meant you had an extra bit of startup cost (Node + your binary). Once Go was supported that no longer mattered much since Go binaries start almost instantly.

Of course an official method is nice here.


I haven't even been using a shim to run my Rust binaries. Just statically link them, and use: https://github.com/srijs/rust-aws-lambda

TBH, I'm very happy to see native support, but it was already super, super easy to use Rust on lambdas without a shim layer.


I was really hoping they would increase the deployment package size. Currently it is at 250MB unzipped including all layers.


Hi! This is super common feedback and something the team is definitely thinking about! What would you want to see it increased to? (Chris Munns from Serverless @ AWS)


This would be amazing! A lot of ML use cases are largely unfeasible in Lambda on Python without serious pruning. The latest version of tensorflow is 150mb uncompressed; add numpy, pandas etc to that and it adds up fast. I think 1 GB uncompressed would be pretty reasonable in the current state of ML tools, personally.


Roger that! Thanks!


As a thought, could Lambda (perhaps in cooperation with AWS Sagemaker?) offer a Lambda execution environment atop the AWS Deep Learning AMI? This would solve a lot of problems for a lot of people

https://aws.amazon.com/machine-learning/amis/


Is there any plan to add more disk space, or a way to fan out jobs? We use a lambda that does work on video files, we have to limit how many run concurrently (3) to prevent running out of disk space right now. Edit - or ability to attach volumes like Azure's lambdas.


Can't wait for this too; it seems like kind of an old limitation, not suitable for layers at all. But on the other hand, a lambda should be small and fast; if something beyond the limits is needed, then Fargate or ECS should be used.

That said, I hope they increase the limit to at least 500 MB sooner than the next re:Invent.


Any thoughts on what it should be raised to?


1GB ... I mean, take a J2EE service including the runtime and see what that package winds up at. That's about as big as it will likely need to get.


What does a test suite look like for an application structured using lambda functions?


Noob question but is it possible/advisable to somehow (re)use prepared statements in Lambda?

"Prepared statements only last for the duration of the current database session. When the session ends, the prepared statement is forgotten, so it must be recreated before being used again. This also means that a single prepared statement cannot be used by multiple simultaneous database clients..."[1]

1. https://www.postgresql.org/docs/current/sql-prepare.html


I don't see why not. Presumably you would connect to the database and prepare the statement when the Lambda function starts up and execute the prepared statement from the per-request handler invoked.
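A sketch of that approach, assuming Postgres via psycopg2 (the connection-string env var, statement name, and table are all placeholders):

    import os
    import psycopg2

    # Module scope runs once per container: connect and prepare at init time.
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    with conn.cursor() as cur:
        cur.execute("PREPARE get_user (int) AS SELECT * FROM users WHERE id = $1")
    conn.commit()

    def handler(event, context):
        # The prepared statement lives as long as this container's DB session does.
        with conn.cursor() as cur:
            cur.execute("EXECUTE get_user (%s)", (event["user_id"],))
            return cur.fetchone()

You'd still need to handle the session dropping (reconnect and re-prepare), which is exactly the caveat the Postgres docs quoted above describe.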


Stored procedures would be one way.


Now, if I can also link the MySQL C++ connector libraries, I could run some of my code "natively" in Lambda.

I have used C++ in Lambda before; it is quite cumbersome and it still has the performance hit of going through Node.js.


Rust was kinda nice even before this, but cutting out the Go or Nodejs indirection certainly helps.


You certainly can link MySQL connector libraries. Or any library for that matter that you normally link in a C++ application. The lambda-cpp-runtime does not limit you.

Check out some of the examples in the GitHub repository.


Checking the examples is the first thing I did. No example uses MySQL connector.

There are only example CMake calls to find_library, that will or will not find this library based on things that are not documented.

AWS Lambda still uses Amazon Linux AMI as the base for running all code. MySQL C++ connector library is not available to be easily installed by yum for this particular OS.

In particular, the library is distributed as an RPM, from Oracle website, which requires you to be logged in to download it, and then manually installed.

Also, an appropriate CMake find script has to be available to CMake.

Therefore, this is still a valid question, and AWS support needs to clarify.

For the previous Lambda version, I had to add the mysqlcppconn.so file to the uploaded zip and use something like this to ensure the executable runs:

"/lib64/ld-linux-x86-64.so.2 --library-path "+process.cwd() +"/node_modules/.bin/lib "+process.cwd() +"/node_modules/.bin/helloworld "


So will this allow me to run PowerShell 5.0?

I have O365 scripts I need to run, but I only see support for PS 6+.


PM for PowerShell here, no ETAs right now, but we're working closely with some O365 teams (starting with Exchange Online) to bring their modules up to compatibility with PS 6.x.


Hard to see them ever supporting PowerShell 5.0 given that it doesn't run on Linux.


I think so.


Forgive the noob question, but why is it necessary to have a custom runtime when using a binary from a compiled language? It seems to me that golang support should just mean binary support, and then c++ and rust would be able to comply already, no?


As of yesterday, lambda only provided a way to specify a function handler. The input to that function, and the response value from that function, needs to be deserialized and serialized (and implicitly, construct a runtime in which the concept of a function exists). Previously, a runtime for each supported language handled deserializing the wire protocol into a runtime object, invoking the specified function, routing stdout and stderr to cloudwatch, serializing the function's return value, and transforming runtime exceptions into response exceptions.

The lambda team is now letting you write that runtime, and presumably providing more documentation on the responsibilities of the runtime.

Check out the example C++ and Rust runtimes to understand why each language needed to have its own custom runtime.

https://github.com/awslabs/aws-lambda-cpp

https://github.com/awslabs/aws-lambda-rust-runtime
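At its core a custom runtime is just a loop against the Runtime API; a rough Python sketch of the happy path (error reporting and the initialization-error endpoint are omitted):

    import json
    import os
    import urllib.request

    API = os.environ["AWS_LAMBDA_RUNTIME_API"]
    BASE = "http://%s/2018-06-01/runtime/invocation" % API

    def my_handler(event):
        return {"echo": event}  # your function logic

    while True:
        # Long-poll for the next invocation event.
        with urllib.request.urlopen(BASE + "/next") as resp:
            request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
            event = json.loads(resp.read())

        result = my_handler(event)

        # Post the serialized result back for this request id.
        req = urllib.request.Request(
            BASE + "/" + request_id + "/response",
            data=json.dumps(result).encode(),
            method="POST",
        )
        urllib.request.urlopen(req)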


You could do that, it's true, and folks did that with node.js to shim Go before Lambda had official Go support. But being able to just create/use/share a custom runtime inside of an organization will be easier for them over time vs. the shim method. - Chris from Serverless@AWS.


tl;dr: Cold starts/performance.

The way Lambda manages to achieve the performance it does is because it bootstraps the application and then runs each new request through the loaded application.

This means Lambda needs some way to communicate with an already running service.

Initially, Lambda had language specific runtimes that took care of this (i.e. with Node they have their own bootstrapper that loads your top-level JS file and then calls the defined method each time it receives a new event).

With the release of the Go runtime, they built a framework that gets compiled into your service that runs a simplified HTTP server that Lambda then submits events to.

This latest generalised version eschews an embedded HTTP server for letting your app do something like a long-poll to a local event RPC source in the Lambda container. Basically, your app boots and attempts to pull a job off the queue, if there's a job, your Lambda runs, if there isn't, your Lambda service gets paused until there's something to send it.


The Go runtime can execute any binary, but the lambda is controlled via Go RPC, which uses Go's custom binary serialization (gob). I couldn't find any implementations of gob in any other language when the Go runtime was announced.

I hope this runtime API is a simplification of the Go RPC API.


We had to abandon Lambda due to cold starts. Any news if that's resolved?


Cold start performance is improving significantly as the Lambda fleets migrate to Firecracker technology: https://aws.amazon.com/blogs/aws/firecracker-lightweight-vir...


Anyone know if there have been any improvements to cold start times for Lambdas in a VPC? That was the absolute death knell for us. If you're using Lambdas as a service backend for mobile/web apps, it's extremely common those Lambdas will be talking to a DB, and any decent security practice would require that DB to be in a VPC. Cold starts for Lambdas in a VPC could be on the order of 8-10 seconds: https://medium.freecodecamp.org/lambda-vpc-cold-starts-a-lat...


I just got out of a session at re:invent where they covered that they were working on improving VPC cold start times by reducing the amount of time it takes to associate an ENI with your lambda function. The method they're using also reduces the number of IPs needed to 1 per subnet.


We recently had to abandon Lambdas, 10+ second cold start, and for some reason when adding an API Gateway you get another +200ms on each request (Google it, common issue apparently).

So, 10+ seconds cold start, and 200 + 200-300ms (around 500-600ms avg) calls to the Lambda function. Complete garbage for our application at least (I imagine using it for background processing might not be an issue with latency).

Switched over to EC2, less than 200ms response total, no cold starts.


They're working on it, coming in 2019 (announced today) [1]

[1] https://twitter.com/jeremy_daly/status/1068272580556087296


Agreed! I'm much more concerned with VPC performance - I don't have a single lambda outside of a VPC. Firecracker is extremely cool, and I'm very glad to see the improved perf at the VM level, but that's not my bottleneck.

Thankfully, in my case, I have a very steady flow of data so I don't expect too many cold starts.


I know that they are actively looking at it.

One thing though: do your lambdas need both public and private access? Otherwise you can place them in a subnet for private access only, since the slow part is the ENI for the NAT Gateway.


They all need to access S3, which I believe requires public.


Cold starts for the VM are only part of the problem. If you're on a JITed runtime, a cold start also means compilation and optimization. It would be nice if they had ways to cache the JITed machine code so they could start back up with it already compiled and optimized.


Modern lambdas can be paused for 4+ hours before experiencing cold starts (depending on a lot of variables).

AWS is making continual improvements in this area.


I was really hoping they would announce pre-warming for all Lambdas. :(


You can generally resolve it yourself by poking seldom used functions to keep them hot. But no, they haven’t provided a solution to cold start (unless you consider ec2 or fargate a solution).


> You can generally resolve it yourself by poking seldom used functions to keep them hot.

We've tried this and it helps somewhat but when AWS attempts to scale your function based on load, cold starts re-appear. We've moved away from Lambdas where a dependable response time is required.


Can you explain more about why this is?

If you are experiencing cold starts it means that function is not used very often. If it's not used very often that likely means it's not user facing (or something less important like a Terms of Service page). If that's the case, why do you need instant response times?


No, that's not what it means. If you have high concurrent execution, you get 'cold start' every time the underlying service 'scales out' to support more. The MORE you use lambda concurrently, the more you hit the cold start issue. Granted, it's just for that one cold start execution per-scale node (and they could probably just optionally pre-fire to warm things in that instance, like with a cache), but it's definitely there horizontally.


I really wish they would add an init() callback that is called on cold start but before any traffic is sent to your lambda. It wouldn't help when there are no lambdas running but it could be useful when things are scaling up, especially if you can ask for additional concurrency above the actual concurrency necessary for spikes.


More lifecycle events please! I'd love an onThawed and onFrozen or something so I can kill the DB connections neatly.


You can already do this, I think. Just put this logic in your application's static initialization.
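For example, in the Python runtime anything at module scope runs during the cold start, before the handler sees its first event (the DynamoDB table here is purely illustrative):

    import boto3

    # Executed once per container, during the cold start.
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("my-table")  # hypothetical table

    def handler(event, context):
        # Per-request work only; the heavy setup has already happened.
        return table.get_item(Key={"id": event["id"]}).get("Item")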


I don't think so. When it spins it up, the request is already in flight. Otherwise this would have been solved by everyone but instead everyone sees terrible cold start times.


Also, containers are restarted periodically (~4 hours) so even if you have very steady traffic you'll see cold starts regularly.


This is along the lines of what the other responses to this comment have said, but https://hackernoon.com/im-afraid-you-re-thinking-about-aws-l... gives a very detailed overview. It's titled "I'm afraid you’re thinking about AWS Lambda cold starts all wrong", because the way you're thinking about cold start times is common (and wrong).


that’s not entirely true. while your warm lambdas can and will take the traffic, if your traffic ramps up, additional lambda instances will be spun up. you will pay cold start prices as they are spinning up. so, even if you have a heavily used lambda fn, depending on the traffic your p99 will still look pretty bad and you will not be able to guarantee that all requests will be processed in x ms or less.


https://github.com/awslabs/aws-lambda-cpp makes claims of millisecond cold start.


That's not for in-VPC functions, although if the underlying instance changes with firecracker migration users might see ENI start improvements. Currently your ENI usage is roughly floor(3GB/Memory Allocation)*(Concurrent Executions). If the 3GB changes users will see huge gains as each ENI creation can take around 9s.


Why would you use VPC functions?


Mostly due to being forced. For example: serverless aurora forces a VPC on you.

With delays as bad as 8+ seconds for ENI attachment, this is a significant holdback (https://medium.freecodecamp.org/lambda-vpc-cold-starts-a-lat...)


I believe you can now use Aurora Serverless outside of a VPC by using the new Data API endpoint.


You indeed can, but it's beta (us-east-1 only at the moment) and leaves a lot to be desired (https://www.jeremydaly.com/aurora-serverless-data-api-a-firs...)


Ah okay, I never thought about using an SQL DB.


True. In-VPC has extra costs.


I'm wondering how that's even possible if it includes the time for downloading your code from S3. I.e. normal cold starts (as I understand it) involve fetching the code from S3 to install on a VM. Perhaps they aren't including that time when claiming single milli cold start times?


As far as I know they aren't an issue with AppSync, the managed GraphQL service from AWS.

You can basically replace all HTTP-API Lambdas with it, because it allows direct access to DynamoDB.


Out of curiosity: What is the response time on cold starts?


In VPC probably around 10s if there are no ENIs available



Anyone come up with solutions for Lambda functions to effectively use database connection pools without the use of a dedicated server?


Can't wait for that. Although I like DynamoDB, I'd love to connect to a Postgres RDS deployment from Lambda.


Agreed, this really is the last thing holding me back from going all out on Lambda.


Jeremy Daly has created the serverless-mysql[1] library, which helps alleviate a lot of the issues with using MySQL within Lambda.

MySQL tends to have very low connection times, which helps a lot in this area also.

[1] https://github.com/jeremydaly/serverless-mysql


I don't think so.

Guess you're stuck with Aurora Serverless or DynamoDB.


any improvements on cold start? this is a deal breaker for me. also doesn't seem cheaper than running a $5/month DO


If your use-case is a web-API look into AppSync.

The VTL resolvers don't have the cold-start problem, AFAIK.


most serverless frameworks take care of cold starts, no? zappa by default will just keep your functions warm


I found this to be the best update on the cold start problem. https://mikhail.io/2018/08/serverless-cold-start-war/


skimmed it and seems like < 2~3s is the expectation from AWS Lambda.

So I will go with AWS Lambda. But seems like having lot of dependency increases the lag, wonder if there's a way to put the source on a diet.


I am still waiting for proper PHP support and the ability for Lambdas to use a VPC to connect to RDS servers; leaving my DBs wide open is kind of annoying... they say it's possible but I've had 4 engineers try and no one can get it to work.

These issues, which Azure has already solved, make me wonder how much longer I will stay with AWS.


Lambda has had VPC support for over 2.5 years now: https://aws.amazon.com/blogs/aws/new-access-resources-in-a-v...


Yes, it has VPC support, but for some reason as soon as we turn off public access to MySQL in RDS, Lambdas can no longer connect.


It is possible to attach a Lambda to a VPC to connect to an RDS instance, as long as security group rules allow it and the VPC has DNS resolution enabled.

Source: just implemented cross acct lambda -> RDS connection using VPC


Nice! Though we've been playing with this for Azure Functions over the last few months.



How does it work?


Layers sound really nice.


Missed opportunity to just support one runtime to rule them all: Docker containers.


I suspect cold-start performance of arbitrary Docker containers would be intolerable for most customers in light of the sizes and number of layers of many of the images seen in the wild. Most people aren't building "FROM scratch" images yet, if they ever will. Single binaries, or scripting languages with well-defined runtime environments, are far easier to meet customer-demanded SLOs when building a serverless platform.

(Disclaimer: I work for AWS, but this is strictly my personal opinion/observation.)


This is in part what Cloud Native Buildpacks is working to solve. Can you come find us (https://buildpacks.io)? I think it would be really helpful to have CNB supported in/as Layers.


Nix allows you to easily create from scratch containers.


Docker isn't a runtime, it's a packaging technology. The Runtime API is a simple standard interface that will allow you to run even Docker containers if you so desired.


Docker doesn’t let you take the union of separately-authored base images together as a runtime image for your app. Lambda (apparently) does.

That’s the only major difference I can see, though. The rest is convention/standardization on what’s put into the image (an executable that speaks HTTP over its stdio; separately-namespaced projects under /opt rather than use of the global namespace; etc.)


Technically Dockerfiles don't let you do that (they used to: https://github.com/moby/moby/issues/13026) but in theory it should be possible to construct a Docker image from multiple base images (e.g. https://grahamc.com/blog/nix-and-layered-docker-images)


Why wouldn't Docker layers work the same way? More work but possible by stacking base images.


I think over time Lambda and Fargate[1] will get closer and closer to each other in terms of functionality and deployment speed.

1. https://aws.amazon.com/blogs/aws/aws-fargate/


The key distinction for me is that Lambda is event driven. If you have a microservice/function which is seldom called, in Fargate you would pay 24x7 just to be listening for calls.


Fargate tasks can now be initiated by CloudWatch Events as well[1]. Lambda will always be at the extreme end of "serverless", but over time I expect the lines to blur more, especially since Fargate will now be based on Firecracker[2], which has 125ms startup times for container workloads.

1. https://aws.amazon.com/about-aws/whats-new/2018/08/aws-farga...

2. https://firecracker-microvm.github.io/


Firecracker is behind Lambda as well as Fargate, so that points toward the lines blurring too.


Yes, it's strange they didn't do this. Azure and GCP are headed that way, and there are smaller vendors doing the same thing.

Per-event invocation of a docker container instead of a specific bundle of code seems much more flexible, especially with the Firecracker tech they announced.


https://aws.amazon.com/ecs/ They don't tend to miss an opportunity.


You're aware that's an entirely different service, right? We're talking about Serverless and Lambda.


Whoops, need this as well https://aws.amazon.com/fargate/


I've always found "You're aware... right?" and its close cousin "You do realize... right?" to sound pretty condescending.


They said they didn't do this for security reasons, they wanted hardware virtualisation to separate Lambdas - not container separation.


That’s what Fargate is.


Sort of. Lambda is event-driven and on-demand, Fargate is "task-based". As mentioned in the sibling comment, if you build a "ping" HTTP endpoint in both, in Lambda you will have just a function which is called when necessary, in Fargate you will need to build a mini-webserver to keep listening for requests (as well as pay just to be listening instead of pay just when running).


I like how both you and the sibling comment tried to educate me on the difference between containers and event-driven compute. I'm well aware of the difference. My point, which maybe I didn't make so clear since both of you missed it, was that you can make fargate spin up a container for an event, but it would be too slow for any useful event driven computing.

If you're gonna use containers, use containers. If you want someone else to manage the container for you, use Lambda.


Fargate is a great service, but it's not a serverless solution by any mainstream definition of the term (although Amazon seems to be intent on stretching the definition lately).

Create a simple ping/pong HTTP service in both and you'll quickly see difference in everything from billing to startup time.


In one moment you're saying "That's what Fargate is" as if Fargate == Running Serverless Containers.

On the other you're saying "you can make fargate spin up a container for an event, but it would be too slow for any useful event driven computing", which is the truth, but it contradicts your initial comment.



