AWS’s Elastic Load Balancer is a Strangler

reilly3000 · on March 5, 2020

For what it's worth, ALB is a fantastic product. As a full L7 load balancer it can stand in for nginx, provide OIDC auth, replace API Gateway (especially for high-volume Lambdas), and has lots of tunable logic for running diverse auto-scaled workloads. AWS is good about not breaking APIs and contracts, so the "strangler" strategy is really about accommodating existing customers. I'm not going to say Classic Load Balancers are a dog of a product or anything; it's just that they pre-date VPC, and the web stack has moved on since then.

Having ALB be such a robust, turn-key product does tempt one into giving it a lot of responsibilities, which definitely lends itself towards lock-in. I guess it's the same with marriage: there are always tradeoffs.

I do wish it were better integrated with ECS + AppMesh + API Gateway, so it could be like a managed GetAmbassador.io. Envoy proxies are a hell of a good idea, and I think they are going to be something like the React of the backend. While one can dream about a grand unified request router for the cloud, ALB continues to innovate with things like canary deployments and awesome routing rules, all while continuing to 'just work' at web scale. Is there anything else you can throw up to 50K RPS at and not have to think about it that much?
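To make the "routing rules" point concrete for the strangler use case, here is a minimal local model of how ALB listener rules are evaluated: rules are checked in ascending priority order, the first rule whose path pattern matches wins, and the listener's default action catches everything else. This is a simulation, not the AWS API; the rule list, the target group names, and the use of `fnmatchcase` to approximate ALB's `*` and `?` wildcards are assumptions for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    priority: int          # lower number = evaluated first
    path_patterns: tuple   # e.g. ("/api/orders/*",)
    target_group: str      # hypothetical target group name


def route(path: str, rules: list, default_target: str) -> str:
    """Return the target group for a request path; first matching rule wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        # fnmatchcase is case-sensitive and supports * and ?, roughly like
        # ALB path patterns (it also accepts [...] classes, which ALB does not).
        if any(fnmatchcase(path, p) for p in rule.path_patterns):
            return rule.target_group
    return default_target  # default action: fall through to the legacy app


# Strangler setup (hypothetical): carve two path groups out of the monolith.
RULES = [
    Rule(10, ("/api/orders/*",), "orders-service"),
    Rule(20, ("/api/*",), "api-v2"),
]

if __name__ == "__main__":
    for p in ["/api/orders/42", "/api/users/7", "/checkout"]:
        print(p, "->", route(p, RULES, "legacy-monolith"))
```

Moving another slice of the monolith is then just a new rule with a priority that places it ahead of any broader pattern.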

philliphaydon · on March 5, 2020

One thing is that ALB to Lambda restrictions suck.

Lambda has a 6 MB response size limit. API Gateway has a 10 MB limit. If you go ALB to Lambda and your response is over 1 MB, your response will fail, because the ALB team decided to introduce an arbitrary 1 MB limit, didn't document it as a limitation, and added it to their troubleshooting notes instead.

So if you need to return a small PDF or something from a Lambda via ALB, good luck.
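For a sense of how little room a 1 MB response JSON leaves for a binary file: a Lambda behind ALB returns binary bodies base64-encoded with `isBase64Encoded` set, and base64 inflates data by 4/3. The sketch below builds an ALB-style Lambda response and measures its serialized size. Treating "1 MB" as 1,048,576 bytes and measuring the compact JSON serialization are assumptions; the thread doesn't say exactly how ALB counts.

```python
import base64
import json

ALB_LAMBDA_RESPONSE_LIMIT = 1024 * 1024  # assumption: "1 MB" = 1,048,576 bytes


def alb_binary_response(data: bytes, content_type: str) -> dict:
    """Build an ALB-style Lambda response carrying a binary body (e.g. a PDF)."""
    return {
        "statusCode": 200,
        "statusDescription": "200 OK",
        "headers": {"Content-Type": content_type},
        "isBase64Encoded": True,  # binary bodies travel base64-encoded
        "body": base64.b64encode(data).decode("ascii"),
    }


def response_size(resp: dict) -> int:
    """Size in bytes of the response JSON (compact serialization)."""
    return len(json.dumps(resp, separators=(",", ":")).encode("utf-8"))


def fits_through_alb(resp: dict) -> bool:
    return response_size(resp) <= ALB_LAMBDA_RESPONSE_LIMIT


if __name__ == "__main__":
    for kb in (500, 750, 800):
        resp = alb_binary_response(b"\x00" * (kb * 1024), "application/pdf")
        print(f"{kb} KiB file -> {response_size(resp):,} bytes of JSON, "
              f"fits: {fits_through_alb(resp)}")
```

Under these assumptions the largest raw file that fits is a little under 768 KiB (1,048,576 × 3/4, minus the JSON envelope), which is why even a "small" PDF can fail.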

jedberg · on March 5, 2020

FYI for anyone reading, the expected architecture is to return the URL of an S3 object that holds your larger-than-1 MB response.

The reason they do this is to keep the ALB running efficiently by not having to architect for arbitrarily large responses.
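A minimal sketch of that pattern, assuming the large payload has already been written to S3 and that you have some way to produce a time-limited URL for it (with boto3 that would typically be a wrapper around `generate_presigned_url`). The presigner is passed in as a plain function so the logic can be tested without AWS. The 302 redirect, the safety margin, and the names `respond` and `presign` are illustrative assumptions; returning the URL inside a small JSON body would work just as well.

```python
import json
from typing import Callable

# Assumption: "1 MB" = 1,048,576 bytes; keep a margin for JSON/header overhead.
INLINE_LIMIT = 1024 * 1024 - 16 * 1024


def respond(body: str, s3_key: str, presign: Callable[[str], str]) -> dict:
    """Return the body inline if it fits, else redirect the client to S3.

    `presign` is a hypothetical callable mapping an S3 key to a time-limited
    URL. The caller is assumed to have already uploaded `body` under `s3_key`.
    """
    inline = {
        "statusCode": 200,
        "statusDescription": "200 OK",
        "headers": {"Content-Type": "application/json"},
        "isBase64Encoded": False,
        "body": body,
    }
    if len(json.dumps(inline).encode("utf-8")) <= INLINE_LIMIT:
        return inline
    # Too big to pass back through ALB: hand the client a link instead.
    return {
        "statusCode": 302,
        "statusDescription": "302 Found",
        "headers": {"Location": presign(s3_key)},
        "isBase64Encoded": False,
        "body": "",
    }


if __name__ == "__main__":
    fake_presign = lambda key: "https://example.invalid/" + key  # stand-in
    print(respond('{"ok": true}', "reports/1.json", fake_presign)["statusCode"])
    print(respond("x" * (2 * 1024 * 1024), "reports/2.json", fake_presign)["headers"])
```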

philliphaydon · on March 5, 2020

The problem I have is that the limit is less than the limit imposed by Lambda itself, which is less than the limit imposed by API Gateway.

API Gateway: 10 MB

Lambda: 6 MB

ALB: 1 MB

Yet if your target is an EC2 instance, there's no limit on the load balancer. So IMO the limit for a Lambda target should be the limit imposed by Lambda itself: 6 MB.

sudhirj · on March 5, 2020

While I'm not disputing the limit numbers or whatever hardship they might cause, it's worth noting that this is a basic limitation of the original Lambda model and maybe FaaS in general. The capability comes from running a giant pseudo-infinite mesh of isolated execution environments that load your code and execute on demand, while having to buffer both the request and response to make sure clients are protected from the details. This buffering means that the size of the buffer will always be limited. The team managing it might make the buffers bigger based on experience, but it's not a solved problem.

ALB to containers or servers is a different beast - here the entire request and response need not be buffered at all (there might still be a very small buffer, mostly negligible), so streaming responses, websockets etc become possible.

We use lambda to resize images, so we do push against these limits a bit, but it's a fair tradeoff for the advantages - no worries about CPU throttling from too many requests, no waiting for servers to start for spiky loads etc.

inopinatus · on March 5, 2020

Lambda was not designed for request/response. It’s an event driven service. Wrapping API gateway around it is an architectural blunder, and leads to folks like the GP wondering why their use case is a shitty fit.

fulafel · on March 5, 2020

Why is request/response not a fit for an event driven service in your view? A lot of request/response server apps are written using the event model.

inopinatus · on March 5, 2020

Trying to build synchronous out of asynchronous requires state machines, buffering, and overallocation of resources. It’s the enemy of scale.

The two styles are an impedance mismatch.

markonen · on March 6, 2020

There is nothing inherently asynchronous about the Lambda product, unless you’re talking about the Node.js runtime and even then that’s more about Node than about Lambda.

Each Lambda invocation gets a dedicated VM for the duration of the request. It is a great match for synchronous code.

inopinatus · on March 7, 2020

That is a misstatement. Lambda executes functions in response to events. It is totally asynchronous with regards to its execution triggers.

Lambda does reuse VMs, so I hope you aren’t relying on containers being discarded for any integrity or security outcomes.

All the responses in this thread illustrate to me that AWS needs to put more effort into socialising how the product works. Since I was physically in the room for Lambda’s AWS internal launch, this is doubly disappointing, because the technical messaging then was very clear and compelling.

fulafel · on March 5, 2020

Lambda is natively HTTP-based, no? Like all AWS APIs. It's just that it only speaks its own JSON protocol, not generic web.

Also request/response is not inherently synchronous or asynchronous. It's just a protocol design pattern.

Buffering, overcommit, etc. are also just normal facts of life in both sync and async messaging.
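Part of this disagreement is about which layer you look at. The Lambda `Invoke` API itself lets the caller pick: `InvocationType="RequestResponse"` waits for the function and returns its result, while `InvocationType="Event"` queues the event and returns right away with only a status code (202). The sketch below takes a boto3 Lambda client, or anything with the same `invoke` method, as a parameter; the function name you pass is hypothetical.

```python
import json


def invoke_sync(client, function_name: str, event: dict) -> dict:
    """Synchronous call: block until the function returns, then decode its result."""
    resp = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode("utf-8"),
    )
    # With boto3, resp["Payload"] is a streaming body exposing .read().
    return json.loads(resp["Payload"].read())


def invoke_async(client, function_name: str, event: dict) -> int:
    """Asynchronous call: Lambda queues the event; no function result comes back."""
    resp = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(event).encode("utf-8"),
    )
    return resp["StatusCode"]  # 202 when the event was accepted for processing
```

ALB and API Gateway use the synchronous style, which is why their request and response sizes are bounded by what Lambda will buffer.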

inopinatus · on March 7, 2020

No, it is natively event driven. Don’t confuse the control plane with the operational one.

Request/response is fundamentally synchronous. If you want to nitpick about other layers not blocking, that’s missing the wood for the trees.

toomuchtodo · on March 5, 2020

I would read a deep dive blog post about this.

mlthoughts2018 · on March 5, 2020

“function as a service” absolutely does need to support request/response as a primary use case.

internalthief · on March 5, 2020

It is documented, at least on this page:

https://docs.aws.amazon.com/elasticloadbalancing/latest/appl...

Right at the top of the page, underneath the header "Limits"

> The maximum size of the response JSON that the Lambda function can send is 1 MB.

philliphaydon · on March 5, 2020

Ah, this must be relatively new. The only documentation I could find, or that AWS Business Support / the Lambda team could give me, was a link to:

https://docs.aws.amazon.com/elasticloadbalancing/latest/appl...

Which I complained about because it wasn't mentioned in the Lambda-as-a-target docs. I guess they amended it.

jk563 · on March 5, 2020

git blame suggests the change was made 15 months ago, though that doesn't take into account the time to publish. Which I guess just goes to show that it can be tough to find information in the docs, despite them being relatively decent as far as docs go (in my opinion).

jcims · on March 5, 2020

Somebody has a war story

james_s_tayler · on March 5, 2020

War stories are the best part of HN.

fulafel · on March 5, 2020

That's pretty damning, lots of innocent REST messages can be more than 1MB. Can you have larger bodies in chunked encoding?

fosk · on March 5, 2020

If you need more capabilities you can also front your Lambdas with something like Kong[1]. ALBs and AWS API Gateway can be quite slow and constrained if performance and a customizable feature set are important requirements.

[1] https://github.com/Kong/kong

greyskull · on March 5, 2020

Please chime in on https://github.com/aws/aws-app-mesh-roadmap/issues/37 if you haven't already

CoffeeDregs · on March 5, 2020

Didn't know that ALBs could front Lambdas... Will be so happy to be able to be rid of APIGW...

reilly3000 · on March 5, 2020

At the volume I was running APIGW I saved 91% in cost moving to ALB.

watermelon0 · on March 5, 2020

It's a great product, but it still imposes limitations on header sizes. It has limitations on the size of the entire header, the size of each header line, and the size of the request line (i.e. URL length) [1].

It's quite unfortunate, since this means that some use cases are limited to the classic ELB.

[1] https://docs.aws.amazon.com/elasticloadbalancing/latest/user...
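If you're deciding whether a workload can move off a Classic ELB, a quick pre-flight check over captured requests can help. The limits below (16 KB request line, 16 KB per header, 64 KB for all headers) are the ALB figures as I recall them, so treat them as assumptions and confirm against the current quotas page. How ALB counts bytes (CRLFs, whether the request line is part of the total) is also assumed here.

```python
# Assumed ALB limits in bytes; verify against AWS's current ALB quotas page.
MAX_REQUEST_LINE = 16 * 1024
MAX_SINGLE_HEADER = 16 * 1024
MAX_TOTAL_HEADERS = 64 * 1024


def header_limit_violations(request_line: str, headers: dict) -> list:
    """Return the assumed ALB header limits that a request would exceed."""
    problems = []
    if len(request_line.encode("utf-8")) > MAX_REQUEST_LINE:
        problems.append("request line too long")
    total = 0
    for name, value in headers.items():
        line_size = len(f"{name}: {value}".encode("utf-8"))
        total += line_size
        if line_size > MAX_SINGLE_HEADER:
            problems.append(f"header {name!r} too long")
    if total > MAX_TOTAL_HEADERS:
        problems.append("headers too large in total")
    return problems


if __name__ == "__main__":
    print(header_limit_violations("GET / HTTP/1.1", {"Host": "example.com"}))
    print(header_limit_violations("GET /?q=" + "a" * 20000 + " HTTP/1.1", {}))
```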

sandGorgon · on March 5, 2020

Can Envoy be the ingress, replacing nginx or Traefik? This is being debated in the k3s distribution: https://github.com/rancher/k3s/issues/817

Will something like https://www.getambassador.io/reference/core/ingress-controll... support all the use cases that an nginx ingress can (old certificates, SPDY, etc.)?

reilly3000 · on March 5, 2020

https://www.envoyproxy.io/learn/front-proxy outlines the use case for an Envoy proxy being a general-purpose ingress, with added routing and observability features. Honestly I can't speak competently to what Nginx and Nginx Plus/Pro are capable of these days, especially in relation to the sidecar proxy paradigm, but I know it's much more than I've ever used in practice.

checker · on March 5, 2020

And so is nginx, haproxy, API gateway, and any other layer 7 load balancer/proxy.

nickserv · on March 5, 2020

Everything old is new again. Except now it's on someone else's machine and you have to pay for it...

joana035 · on March 5, 2020

And people were doing that even before AWS existed.

Edit: you can do that with iptables

checker · on March 5, 2020

I didn't know that it's possible with just iptables. This sounds pretty useful. Thanks!

ufmace · on March 5, 2020

I haven't had to use the Strangler technique, but I think I would disagree with this post. It is possible to use ALBs in this way, but I think it goes against the point a little. I thought the point was that it should be a little bit annoying to maintain all of the pass-through routes in the replacement codebase, to ensure that there is motivation to actually do the replacement, instead of just leaving it in place indefinitely.

Plus, leaving the pass-through in the codebase lets you do a much wider variety of things. You could run both the original and your replacement, and check their responses against each other. You could do that, or do replacements, at as fine a granularity as you can imagine. You could choose which cases to handle where according to any algorithm you can dream up.
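A rough sketch of that in-code pass-through with side-by-side checking, assuming both implementations can be called as plain functions from path to response body. The class name, the exact-match path sets, and the choice to keep the legacy response authoritative while shadowing are all hypothetical; any routing algorithm could replace the set lookups.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]  # hypothetical: request path -> response body


class StranglerFacade:
    """Pass-through facade living in the replacement codebase (sketch)."""

    def __init__(self, legacy: Handler, replacement: Handler,
                 migrated: set, shadowed: set):
        self.legacy = legacy
        self.replacement = replacement
        self.migrated = migrated  # paths the new code now owns outright
        self.shadowed = shadowed  # paths where both run and results are compared
        self.mismatches: List[Tuple[str, str, str]] = []

    def handle(self, path: str) -> str:
        if path in self.migrated:
            return self.replacement(path)
        if path in self.shadowed:
            old = self.legacy(path)
            try:
                new = self.replacement(path)
            except Exception as exc:  # a failing shadow call must not hurt users
                new = f"<error: {exc}>"
            if new != old:
                self.mismatches.append((path, old, new))
            return old  # legacy stays authoritative while shadowing
        return self.legacy(path)  # everything else passes straight through
```

The `mismatches` list is the "check the responses against each other" step: once a path runs clean for long enough, it can move from `shadowed` to `migrated`.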

vorpalhex · on March 5, 2020

I've always called this technique "proxying", since most of the software that does this is referred to as "a proxy".

philsnow · on March 5, 2020

Small nit: doesn't this assume HTTP ELBs? I didn't think you could slice traffic in any useful way if it's raw TCP.

amdavidson · on March 5, 2020

No, ELB can terminate TLS.

https://aws.amazon.com/elasticloadbalancing/features/?nc=sn&...

travbrack · on March 5, 2020

Right. Without reading http headers, you can only differentiate traffic by IP and/or port.

zbentley · on March 5, 2020

In general this is true. But routers/load balancers exist for many TCP protocols (even stateful ones) as well. gRPC is probably the most popular such protocol at present.

travbrack · on March 5, 2020

I think you mean application protocols. There's only one TCP protocol. In the example of gRPC you'd have gRPC > HTTP/2 > TCP > IP > Ethernet.

Yes, load balancers can make decisions on application-layer protocols like gRPC, but not in TCP mode, which is what the parent comment was asking about. That's because that mode only looks at the TCP and IP headers, which don't contain any information about the application protocols.

An example of when you might use this mode is if you're terminating TLS on the web servers, so the load balancer can't read the encrypted HTTP headers as they pass through.
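A toy contrast of the two modes, to show what each one has to work with. In TCP mode only the connection's addresses and ports are visible; in HTTP mode the balancer parses the plaintext request line, which is only possible if it terminated TLS itself. The `Connection` type, the backend names, and the port-8443 rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes  # first bytes from the client (maybe TLS ciphertext)


def route_l4(conn: Connection) -> str:
    # TCP mode: only IP/port information is usable, so decide by port.
    return "new-backend" if conn.dst_port == 8443 else "legacy-backend"


def route_l7(conn: Connection) -> str:
    # HTTP mode: read the request line to route by path. This only works on
    # plaintext, i.e. when the balancer terminated TLS itself.
    try:
        request_line = conn.payload.split(b"\r\n", 1)[0].decode("ascii")
        _method, path, _version = request_line.split(" ")
    except (UnicodeDecodeError, ValueError):
        return route_l4(conn)  # unreadable (e.g. encrypted): only L4 info left
    return "new-backend" if path.startswith("/api/") else "legacy-backend"
```

With the encrypted payload, the path-based decision is simply unavailable, which is the situation the TCP-mode comment describes.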

_nalply · on March 5, 2020

Sounds really good. I didn't know that some people call this a Strangler.

I have only one small caveat if your workplace is full of politics: the pressure to provide a replacement for the legacy system disappears, and a strangler might turn into the safety net of the legacy system.

Now you have two problems instead of one: the strangler AND the legacy system.

YawningAngel · on March 5, 2020

I think this is a good thing. Moving from having no choice but to replace an entire system to being able to work on it incrementally makes an otherwise impossible problem tractable. If you end up only replacing the problematic parts of a legacy system and retaining the rest, I think that's fine.

closeparen · on March 6, 2020

The legacy has to be really bad for an abandoned mid-flight migration to be any better. Usually the intermediate state is at least a little worse (but worth it because you eventually get through). Strangler pattern only helps bound the damage.

jerf · on March 5, 2020

All following the Strangler pattern can do is provide you a path forward where previously you may not have been able to even see a path. It can't force you to walk it.

luord · on March 6, 2020

This is true for pretty much any load balancer or even any reverse proxy, I think; but nevertheless, it's true and a good way to think about stranglers.

Thaxll · on March 5, 2020

Yes, it's called A/B testing, weighted load balancing, etc.
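ALB's forward action can split traffic across target groups by weight, which is the usual way to run a canary or shift a strangled path over gradually. A minimal model of weighted selection is below. The optional `sticky_key` hashing is my stand-in for keeping one user on one side; ALB itself implements stickiness with cookies, so that part is an assumption, as are the target group names.

```python
import hashlib
import random
from typing import Dict, Optional


def pick_target(weights: Dict[str, int], sticky_key: Optional[str] = None,
                rng: Optional[random.Random] = None) -> str:
    """Pick a target group with probability proportional to its weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one target group needs a positive weight")
    if sticky_key is not None:
        # Deterministic: the same key always lands on the same target group.
        digest = hashlib.sha256(sticky_key.encode("utf-8")).digest()
        point = int.from_bytes(digest[:8], "big") % total
    else:
        point = (rng or random).randrange(total)
    for name, weight in sorted(weights.items()):  # sorted for stable ordering
        if point < weight:
            return name
        point -= weight
    raise AssertionError("unreachable: point always falls inside some weight")


if __name__ == "__main__":
    canary = {"legacy": 90, "new": 10}  # hypothetical 90/10 canary split
    r = random.Random(42)
    picks = [pick_target(canary, rng=r) for _ in range(10_000)]
    print("share sent to new:", picks.count("new") / len(picks))
```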

crankylinuxuser · on March 5, 2020

My biggest gripe is that the ALBs do not handle SSL decryption and re-encryption in a safe way.

It's not my opinion that it's unsafe; it's FedRAMP's. I'll just leave it at that.