Serverless Performance: Cloudflare Workers, Lambda and LambdaEdge

jhgg · on July 3, 2018

For what it's worth at work (Discord) we serve our marketing page/app/etc... entirely from the edge using Cloudflare workers. We also have a non-trivial amount of logic that runs on the edge, such as locale detection to serve the correct pre-rendered sites, and more powerful developer only features, like "build overrides" that let us override the build artifacts served to the browser on a per-client basis.

This is really useful to test new features before they land in master and are deployed out - we actually have tooling that lets you specify a branch you want to try out, and you can quickly have your client pull built assets from that branch - and even share a link that someone can click on to test out their client on a certain build. Just the other week, we shipped some non-obvious bug, and I was able to bisect the troubled commit using our build override stuff.

The worker stuff is completely integrated into our CI pipeline, and on a usual day, we're pushing a bunch of worker updates to our different release channels. The backing-store for all assets and pre-rendered content sits in a multi-regional GCS bucket that the worker signs + proxies + caches requests to.

We build the worker's JS using webpack, and our CI toolchain snapshots each worker deploy, and allows us to roll back the worker to any point in time with ease.

I wrote probably the stupidest worker script too, that streams a clock to your browser using chunked requests: https://jake.lol/clock (no javascript required).

kentonv · on July 3, 2018

At Cloudflare we're all in awe at how fast https://discordapp.com loads... nice work.

jhgg · on July 3, 2018

Thanks! And thanks for building Cloudflare workers. You have no idea how much nicer it is to write and express a lot of this logic in modern javascript, rather than nginx configs and VCLs.

adreamingsoul · on July 3, 2018

I haven't dug too deep into this, but when loading view-source:https://jake.lol/clock I'm seeing on average 14 second load times. Does this happen for anyone else?

JohnDotAwesome · on July 5, 2018

That seems right. Should probably be longer. Each second, a new chunk is emitted with the appropriate html to hide the old clock element and show the new.

jedberg · on July 2, 2018

The value of any function-as-a-service is the ecosystem within which it sits. Pretty much all of them are the same: upload your code, we will run it.

The value comes from 1) What can trigger that code to run and 2) What services that code can interact with.

And on those two points, AWS still wins hands down. They have by far the most possible triggers for Lambda, and they have by far the most services that Lambda can interact with.

It's cool that Cloudflare built something faster, but unless you're running in a vacuum, speed is the least of your concerns.

zackbloom · on July 2, 2018

Yes, that's how Amazon creates lock-in. But it depends what you're doing with it right? If you are looking to run code based on a SQS event, yes you have to use a Lambda. If you are looking to execute code when something visits a URL you have more options.

scarface74 · on July 2, 2018

Triggering Lambdas based on SQS events is new - it was just introduced at the end of June of this year.(https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simp...). For the past 14 years you've had to poll for SQS messages.

If you are referring to SNS, you've always been able to send SNS messages to HTTP endpoints. (https://docs.aws.amazon.com/sns/latest/dg/SendMessageToHttp....)

jedberg · on July 2, 2018

> Yes, that's how Amazon creates lock-in.

That is the cynical way to look at it. It also creates value because it lets you do more with what you already have.

> If you are looking to execute code when something visits a URL you have more options.

Sure, but unless that code works in isolation, it probably needs at the very least access to some sort of data store.

greenail · on July 3, 2018

The lockin argument alone is a red herring. Every technology implementation creates lockin. The valid question is how hard is something to change. A good architecture balances how easy it is to change something with how optimized it is, also balancing how much it costs to build and maintain.

Realistically you can get as locked into Amazon as you want, lambda alone does not create inescapable lockin by any measure so I would argue Jeremy still has a point in the fact that tools become more useful when you can use them to do more work (ecosystem)...

1996 · on July 3, 2018

Amazon performance is generally bad when using other services.

We are rolling out a CDN, with a goal of 20 ms latency in most countries. We want more granularity that AWS - and some zones are just not well served (No AWS in Africa, incomplete offer in Brasil, etc)

Still, we figured we would use Route 53 as you can do Latency Based Routing even with non-AWS servers. Computing latency or using EDNS0 as a proxy is not rocket science, so we thought the DNS would not be a limiting point.

Oh boy, how wrong we were! After wrongly blaming the bad performance on Cloudflare caching, further tests revealed Route 53 takes as much as 0.7s to reply to some DNS queries - and even worse when fronted by Cloudflare, as for some reason the DNS TTL seems to be ignored by Cloudflare. The latency only drops down after about 4 queries, which makes me thing they have some kind of Round-Robin that does not share the DNS queries (I could be wrong)

In the article, the author says: "Most of that delay is DNS however (Route53?). Just showing the time spent waiting for a response (ignoring DNS and connection time)". No you should not ignore the DNS delays! Route53 performance is very bad - 2 full seconds for you!!

We are fortunate it did not take 2s for us. Still, having servers all over the world that reply in 20 ms is useless when the first DNS query takes 700ms.

We ended up leaving for Azure: Traffic Manager outperforms Route 53 by a factor of 2.

Eventually, we will roll our own GeoIP with DNS resolvers on a anycast subnet.

I do not understand how this level of "performance" can be tolerated. At 2 seconds for a DNS query, you are better off using the registrar free DNS service!!

mayank · on July 3, 2018

Saying Route53 takes “2 seconds” to resolve is pretty meaningless without a distribution or at least percentiles. Route53 obviously doesn’t take 2 seconds for all or most queries.

1996 · on July 3, 2018

I am quoting the author, and their analysis of the initial query. This observation from whoever wrote the article matches my own experimental results: initial queries are very very slow on Route 53 LBR. A distribution of queries is useless and misleading, as later queries are cached if you have a sufficient TTL - so only the first few really matters in the performance results.

Later queries are fast of course, as the results are cached (TTL).

Even if the DNS is very pooly configured, all queries after the first one will benefit from the cache!! So the first few queries matter much more, and this is what we should be talking about instead of distributions and percentiless.

Said differently: If each of your visitor has to way a second or two until the site comes up the first time, then the site works normally, it may still give them a bad impression.

I measured the DNS delay on first Route53 reply to be over 700 ms personally. For the author it is 2000 ms. These results are in the same order of magnitude, and make Route53 unsuitable for many applications. Of coutse, you could start hacking, like keeping Amazon cache warm by issuing queries through chron, or by setting extremely long TTLs and hoping your visitors DSL modem will keep your A records in cache as long as you asked for - but these are just hacks trying to compensate that the first DNS query takes SECONDS to process.

Route53 LBR DNS is not as a "slow and requiring hacks". It's supposed to be fast, simple to run, and to ingrate with different ecosystems. To me, it seems to be none of that.

After assessing Route53 as fubar, I switched from AWS to Azure: TrafficManager offers the same features, and the first request takes less than 350ms. There must still be some cruft in there, but at least it is manageable.

kentonv · on July 2, 2018

As the architect of Workers I was obviously pretty happy with Zack's results in general. But, I'm not happy with the tail latency (99th percentile), even if it beats the competition. I suspect this has to do with GC pauses. The solution may be to proactively run GC in a background thread between requests. For high-traffic workers that are always processing requests, we could load multiple instances of the worker and alternate between them.

BTW, if you're into modern C++ and this kind of work interests you, please e-mail me at kenton at cloudflare.com. We're hiring!

1996 · on July 3, 2018

You are from Cloudflare? Could you tell me why the replies to geoip/latency based routing CNAMEs do not seem to be cached by Cloudflare?

The setup is: domain.com -> geoiplbr.domain.com with cloudflare caching enabled. Nothing else that is fancy and could cause delays.

If I measure the TTFB for domain.com, I see a large DNS delay until about the 4th consecutive query - and then the DNS is no longer the limiting factor.

The same measures on geoiplbr.domain.com normalize after the 2nd query.

It seems to me you have some kind of Roud Robin going on that does not share the DNS results.

Or maybe the caching is not done at the POP level?

kentonv · on July 3, 2018

Sorry, I work on Workers, not DNS, so I honestly don't know.

skunkworker · on July 3, 2018

This seems to disregard some of the other factors that make Lambda > Cloudflare Workers. We run binaries on our lambda instance with a go-based function, since Lambda allows for up to 250mb of binaries, 3gb ram and 30s max, this allows us to perform computationally and ram heavy applications without worrying about our instance being killed off.

Also I looked into using cloudflare workers to write my own custom edge cdn but they currently don't allow you to change where in the call requests are processed or telling cloudflare what to cache vs not cache. If they could have some functionality that would allow you to easily write your own multi layered CDN this would be interesting.

zackbloom · on July 3, 2018

It’s worth pointing out Amazon charges more than 10x a Worker on a per execution basis to use it as you describe for just 100ms of compute. If you’re actually using 30s it’s probably very expensive indeed.

The statements in the second paragraph are fortunately incorrect. With the exception of some security features Workers totally takes over the incoming request. It can use flags in its subrequests to configure the cache as you need, and will soon have access to the raw Cache API.

skunkworker · on July 4, 2018

Interesting, I'm glad to know that raw access to the Cache API is being added, when I contacted Cloudflare about this a number of months ago at the time they didn't support this. For my edge CDN needs I will reevaluate cloudflare workers soon.

On the first paragraph we have shifted some computationally heavy and horizontally restricted functions from our own servers to Lambda, this allows us to instantly scale to meet our non-consistent demand. With the lambda workers we are using we are averaging 5 to 11s of execution time with approximately 800mb of memory and utilize the cpu heavily. If Cloudflare workers ever expanded to allow for a similar scope I would definitely take a second look at it.

poulpi · on July 2, 2018

Have you think about including Golang based Lambda function in your benchmark?

As you're guessing that Cloudfare superior JS runtime plays a big role, it could be interesting to see if it can compete against Golang Lambda as well.

kentonv · on July 2, 2018

A lot of our performance benefit comes from lighter-weight sandboxing using V8 instead of containers, which makes it feasible for us to run Workers across more machines and more locations. It wouldn't surprise me too much if a Worker written in JS can out-perform a Lambda written in Go, as long as the logic is fairly simple. But I agree we should actually run some tests and put up numbers... :)

On another note, currently we only support JavaScript, but we're putting the finishing touches on WebAssembly support, which would let you run Go on Workers... stay tuned.

Matthias247 · on July 3, 2018

Just curious: The article mentions V8 isolates. Do you actually also run all IO of the worker in the same isolate? Or in a different one, and the API calls are bridged (via some webworker-like API)?

I guess one of the main challenges is that all resources are properly released when a worker is shut down. Releasing memory sounds pretty easy, if V8 does it for you. But releasing all IO resources might be a bit harder, especially if they are shared between isolates.

kentonv · on July 3, 2018

The Workers runtime itself is implemented in (modern) C++, not JavaScript. So, there's no need for a separate isolate -- API objects are implemented directly in C++.

In C++, memory and I/O resources are both managed through RAII. Of course, when binding to JavaScript, we often end up at the mercy of the JavaScript GC to let us know when an object is no longer reachable from JS, and the GC makes no promises as to how promptly it will notice this (maybe never). That's fine for memory (it amortizes out) but not for I/O resources. So we're back at the original problem.

Luckily, in the CF Workers environment, it turns out that all I/O objects are request-scoped. So, once a request/response completes, we can proactively release all I/O object handles bound into JS during that request/response. If JS is still holding on to those handles and calls them later, it gets an exception.

Matthias247 · on July 3, 2018

Thanks for the explanation!

Yes, I guess a part of my question was whether the destructors/finalizers that the JS object bindings in C++ might impose are called fast enough to guarantee isolation and prevent resource leakage. Looks like in your case that happens through the request scoping.

zackbloom · on July 2, 2018

I have a test which dives into the crypto performance, which seems to be largely driven by the amount of CPU allocated to the process (both Workers and Lambda is ultimately just calling a C crypto implementation). I'll have a longer post about it shortly, but the summary is a 128MB Lambda is around 8x slower than a Worker in pure CPU.

iamleppert · on July 3, 2018

This really isn't a very good benchmark. It's basically only validating the Cloudflare edge network, but the test itself is far from real-world. A service that returns the current time is not doing anything practical and borders on meaningless.

zackbloom · on July 3, 2018

We have a post which compares CPU intensive workloads which should be ready after the American holiday. The summary is a 128MB lambda provides you with roughly 1/8 of a CPU core, which is therefore 8x slower than a worker.

mrkurt · on July 2, 2018

Comparing Workers to Lambda proper seems silly. Lambda lets you connect to DBs, use a lot more than 128MB of memory, etc, etc, etc.

Comparing them to Lambda@Edge makes sense, but Lambda@Edge is not a very good product.

(Full disclosure: my company competes with Cloud Flare Workers).

BillinghamJ · on July 2, 2018

Could you expand on your opinion of Lambda@Edge? For my company's needs, it has worked superbly.

mrkurt · on July 3, 2018

Sure! I think it's fine infrastructure-wise, but the dev experience is awful. I like tools you can build, test and run locally, for example.

The fundamental problem I run into with Lambda@Edge is just that their request stages aren't a great abstraction (OpenResty/nginx has a similar problem). It really limits what kinds of problems you can solve.

kentonv · on July 3, 2018

> The fundamental problem I run into with Lambda@Edge is just that their request stages aren't a great abstraction (OpenResty/nginx has a similar problem). It really limits what kinds of problems you can solve.

Yes! I completely agree. Interesting that we both ended up with the Service Workers API instead. I'm really hoping that Service Workers becomes the standard for JavaScript HTTP handling in the future.

zackbloom · on July 3, 2018

If you happen to give Workers a try we would love to hear what you think -> zack [at] cloudflare.com

bufferoverflow · on July 2, 2018

Does Cloudflare have a free DB of sorts, like Amazon's DynamoDB? Or can I query Amazon's DynamoDB from the worker?

zackbloom · on July 2, 2018

That's something we're working on at Cloudflare. You can query any database which has an HTTP interface though right now. It looks like Dynamo has one: https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

But you also have a bunch of other options like:

- https://restdb.io/

- https://firebase.google.com/docs/reference/rest/database/

- http://couchdb.apache.org/

kentonv · on July 2, 2018

Building out storage is my current focus. The challenge is that we want to build something that actually utilizes out network of 151 locations today, 1000's of locations in the future. If your application has users on Mars (or, New Zealand), you should be able to store their data at the Cloudflare location on Mars (or, New Zealand) so that they can get to it with minimal latency.

PS. If you're a storage expert and building a hyper-distributed storage system interests you, e-mail me at kenton at cloudflare. We're hiring.

ranman · on July 2, 2018

Let me know if you need help with the Mars location in the future. I can't wait for AWS to open their utopia-planitia-1 region with SpaceX or BlueOrigin.

jchrisa · on July 2, 2018

I've written a few blog posts about using FaunaDB from FaaS functions, the steps should be the same for any FaaS provider:

1) Provision a (free) connection secret from fauna.com

2) Import the FaunaDB driver (npm install faunadb)

3) Create a client object using your connection secret.

After that you are using FaunaDB, which is purely pay-as-you go, with ACID transactions, joins, indexes, etc.

It would be simple to write a tutorial like this for Cloudflare. Hello world on Azure functions: https://blog.fauna.com/azure-functions-with-serverless-node-... and on Lambda: https://blog.fauna.com/serverless-cloud-database

zackbloom · on July 2, 2018

That would be great, please do!

jbergstroem · on July 3, 2018

I'm using cloudflare workers to "polyfill" client hints if they're missing with cookie logic. With their addition of being able to mutate cache keys via edge workers I find it to be a extremely powerful way of everything from per-device image optimization (or google data saver or hidpi support) to serving different pages for the same uri based on your requirements (and storing this in cache).

smoll · on July 2, 2018

What Infrastructure as Code (IaC) options exist for Cloudflare Workers? AFAICT neither Serverless nor Terraform support it. IaC is table stakes for any new part of my tech stack, and I would prefer not to code it from scratch - unless deployment/configuration is extremely easy to automate via CLI or something...

prdonahue · on July 3, 2018

To expand on what Zack said, we're just about ready to merge in Cloudflare Workers support to our golang SDK (see https://github.com/cloudflare/cloudflare-go/pull/188).

Once this is merged, it clears the way for us adding Terraform support (as terraform-provider-cloudflare wraps cloudflare-go).

There's been lots of interest from our customers in being able to manage Workers using Terraform, so it's high on the list.

zackbloom · on July 2, 2018

There's a Terraform provider for Cloudflare [1]. There's an issue for adding support to Serverless [2], please feel free to thumbs_up.

[1] - https://github.com/terraform-providers/terraform-provider-cl...

[2] - https://github.com/serverless/serverless/issues/4948

kevan · on July 2, 2018

>To be fair, comparing my Lambda, which only runs in us-east-1 (Northern Virginia, USA), to a global service like Workers is a a little unfair.

At least you acknowledge that it's a bit silly to use a global benchmark to compare a global service with an intentionally-regionalized service.

zackbloom · on July 2, 2018

Why would you run something in a single location if you can run it everywhere for the same price though? It's not like Lambda is cheaper for being centralized.

kevan · on July 2, 2018

Isolating failure domains and complying with data residency requirements are a couple reasons. Also, global reach usually means global blast radius if you screw something up.

For the specific use case you tested workers on the edge absolutely make more sense than lambda, but I think the headline is a bit click-baity.

mayank · on July 3, 2018

If your function is IO heavy and your datastore isn’t globalized (as most aren’t, save Spanner and its ilk), you may prefer your function running in the same region as your data to minimize DB latency and transfer cost.

speeq · on July 2, 2018

I wish Cloudflare would offer some kind of key-value store with Workers, something like Google Cloud Memorystore but globally distributed in all of their PoPs - even if it's really limited like 32 MB RAM.

zackbloom · on July 2, 2018

Would you like to be able to write to it from your Worker, or only read from it? Can you tell us more about your use case?

Feel free to email zack [at] cloudflare.com directly if you like.

adreamingsoul · on July 2, 2018

Can we see the code that was used for testing Lambda and Workers?

zackbloom · on July 2, 2018

Yes! https://github.com/cloudflare/worker-performance-examples/tr...

richardowright · on July 2, 2018

Any plans to publish the results from the pbkdf2 version?

zackbloom · on July 2, 2018

Yep, later this week!

adreamingsoul · on July 3, 2018

Thanks!

mchahn · on July 2, 2018

> The functions being tested simply return the current time

Not a very interesting benchmark. This would only measure net latency and spin-up time.

zackbloom · on July 2, 2018

I have a post which should come out later in the week which dives into the performance with CPU-intensive workloads. tl;dr is that a 128MB Lambda is about 8x slower than a Worker.

mayank · on July 3, 2018

Can you also throw in some adversarial workloads? Simple proof of work using node’s builtin crypto module would be a nice benchmark for V8 isolates vs Lambda’s processes, and would go a long way convincing people that using isolates is reliable in a shared setting relative to processes/containers.

zackbloom · on July 3, 2018

Can you elaborate on this? How would doing crypto demonstrate the security of the isolate?

mayank · on July 3, 2018

Nail the CPU and ensure that other isolates on the same process/machine don’t get starved. I meant isolation more than security.

kentonv · on July 3, 2018

Different isolates can run concurrently on different threads, so pegging the CPU in one isolate doesn't block any others. Also, if a worker spends more than 50ms of CPU time on a request, we cancel it, terminating execution of the worker even if it's in an infinite loop. (Almost all non-buggy Workers we've seen in practice use more like 0ms-2ms per request. Note this is CPU time; time spent waiting for network replies doesn't count.)

squark007 · on July 2, 2018

I would love to see fly.io included in these benchmarks - their product is also very similar.

cordite · on July 3, 2018

“Half a decade of experience” doesn’t sound like much these days.

zackbloom · on July 3, 2018

That’s fair, the truth is some of the people here have been around since the beginning of the internet, so four decades might be more fair. Unless you add up everyone’s experience, then we’re in the millennia...

xstartup · on July 3, 2018

I was working on an app which serves roughly 400*10^6 API requests/day.

One cool property of our app is that it's rarely updated and mostly read.

And the goal is to achieve the lowest possible latency at the edge.

It scales beautiful, I am not sure which other architecture can help us keep this afloat with only 4 developers working on it.

So, we have a DynamoDB table which is replicated to multiple regions using DynamoDB streams and Lambda.

For us, Lambda means achieving a lot without many developers and system administrators but I understand that not all problems yield gracefully to this pattern.

It seems using Cloudflare Workers to trigger our Lambda function instead of API gateway could prove to be cheaper.

zackbloom · on July 3, 2018

Would it be possible to read the data through the Cloudflare cache? If so your data and API would be replicated not just around the world but actually within the majority of the worlds ISPs. Based on our experiences with 1.1.1.1, Cloudflare is within 20ms roundtrip of most people on earth.