I've built and maintain quite a large serverless system, with a large team of people, and most of these issues aren't really a big deal. Cold start is well-known and also has well-known mitigations (e.g. don't use the JVM, which wasn't designed for fast startup times).
I use AWS extensively, so I can elaborate on AWS's approach to these problems. Deployment is straightforward with CloudFormation; VPCs, security groups, KMS, etc. provide well-documented security features; and CloudWatch provides out-of-the-box logging and monitoring. Integration testing is definitely important, but becomes a breeze if your whole stack is modeled in CloudFormation (just deploy a testing stack...). CloudFormation also makes regionalization much easier.
The most painful part has been scaling up the complexity of the system while maintaining fault tolerance. At some point you start running into more transient failures, and recovering from them gracefully can be difficult if you haven't designed your system for them from the beginning. This means everything should be idempotent and retryable--which is surprisingly hard to get right. And there isn't an easy formula to apply here--it requires a clear understanding of the business logic and what "graceful recovery" means for your customers.
Lambda executions occasionally fail and need to retry, occasionally you'll get duplicate SQS messages, eventual consistency can create hard-to-find race conditions, edge cases in your code path can inadvertently create tight loops which can spin out of control and cost you serious $$$, whales can create lots of headaches that affect availability for other customers (hot partitions, throttling by upstream dependencies, etc.). These are the real time-consuming problems with serverless architectures. Most of the "problems" in this article are relatively easy to overcome, and non-coincidentally, are easy to understand and sell solutions for.
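To make the idempotency point concrete, here's a minimal sketch (not from the original comment; the table name and key are assumptions) of an SQS-triggered handler that uses a DynamoDB conditional write to swallow duplicate deliveries:

    // Sketch, assuming aws-sdk v2 and a pre-existing "processed-messages" table keyed on messageId.
    import { DynamoDB } from "aws-sdk";
    import { SQSEvent } from "aws-lambda";

    const ddb = new DynamoDB.DocumentClient();

    export const handler = async (event: SQSEvent): Promise<void> => {
      for (const record of event.Records) {
        try {
          // Claim the message ID first; the conditional write fails if we've already seen it.
          await ddb.put({
            TableName: "processed-messages",
            Item: { messageId: record.messageId },
            ConditionExpression: "attribute_not_exists(messageId)",
          }).promise();
        } catch (err: any) {
          if (err.code === "ConditionalCheckFailedException") {
            continue; // duplicate delivery -- already handled, skip it
          }
          throw err; // transient failure -- rethrow so Lambda/SQS retries
        }
        await processMessage(JSON.parse(record.body));
      }
    };

    async function processMessage(body: unknown): Promise<void> {
      // ... the actual business logic goes here ...
    }

Claiming the message ID before doing the work trades a small risk of dropping a message whose processing fails midway for protection against double-processing; deciding whether that trade-off is acceptable is exactly the kind of business-logic question being pointed at above.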
- Vendor lock-in: although I try to minimise this and have a clear picture of where the lock-in lies, it is very much present.
- Cold/warm start: this is somewhat annoying. I haven't set up keepalive requests yet, but they feel like an ugly hack, so I'm not sure if I'm going to or if I will just suck it up.
- Security/monitoring: having fewer functions makes this less of a worry, but it's still difficult, and it's clear that there isn't much of an ecosystem or established best practices yet.
- Debugging: this can be really annoying. I can usually avoid it by having a proper local testing setup (see below), but when that doesn't cut it, it's a pain.
Drawbacks I haven't really experienced:
- Statelessness: this hasn't really been a problem for me, probably because I haven't split my code into too many functions and thus don't have that many complex interactions.
- Execution limits: haven't run into them yet.
Drawbacks I've managed to contain:
- Local testing: I'm writing Node Lambdas. Since they're just stateless functions, it was relatively easy for me to write a really simple Express server, transform the Express requests into the API Gateway format, and convert the API Gateway response format back into Express format (see the rough sketch after this list). This works fine for local testing, and reduces the effects of vendor lock-in. That said, I do do a lot of the request parsing in the Lambda function itself, rather than letting API Gateway do this.
- Deployment: this is actually pretty sweet. I'm using Terraform which, although it's a bit immature and thus cumbersome to set up, has been getting the job done and allows me to easily deploy new copies of the infrastructure for every branch of my code.
- Remote testing: as mentioned above, deploying it for every branch I have allows me to see what it's going to look like in production.
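For illustration, a stripped-down version of the kind of Express shim described above might look roughly like this (this is not the author's code, which is linked further down; the ./lambda import and the port are assumptions):

    import express from "express";
    import { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";
    import { handler } from "./lambda"; // the same handler that API Gateway would invoke

    const app = express();
    app.use(express.text({ type: "*/*" })); // keep the raw body, like the proxy integration does

    app.all("*", async (req, res) => {
      // Build a minimal API Gateway proxy event from the Express request.
      const event = {
        httpMethod: req.method,
        path: req.path,
        headers: req.headers,
        queryStringParameters: req.query,
        body: typeof req.body === "string" ? req.body : null,
      } as unknown as APIGatewayProxyEvent;

      const result = (await handler(event)) as APIGatewayProxyResult;

      // Translate the proxy response back into an Express response.
      res.status(result.statusCode);
      for (const [name, value] of Object.entries(result.headers ?? {})) {
        res.set(name, String(value));
      }
      res.send(result.body);
    });

    app.listen(3000, () => console.log("Local API Gateway shim listening on :3000"));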
Hmm, perhaps I should turn this into an article sometime...
> - Vendor lock-in: although I try to minimise this and have a clear picture of where the lock-in lies, it is very much present.
The post isn't loading so I don't know exactly what it's in reference to (AWS lambda, the framework, the more generic idea?). But, the serverless framework[0] supports all major cloud providers (AWS, Google, & Azure) as well as self-hostable options like OpenWhisk and Kubeless.
Yes, but when I started putting this together it was very immature, as in: it crashed a lot. I didn't really want to lock myself into an immature product either, and I also didn't really like having to run specific commands to get to a certain setup, rather than describing the layout of my desired infrastructure.
You have to make a choice at some level of the stack, for me, that was serverless as the deployment framework. You could probably use terraform or ansible + some custom scripts, but then you could argue you're locked into terraform or ansible ;)
The login thing isn't actually necessary (or if it is, that's new)
> it was relatively easy for me to write a really simple Express server, transform the Express requests into the API Gateway format, and convert the API Gateway response format back into Express format.
For those interested in this route, there's a plugin[0] for the Serverless Framework that handles this conversion for you. Honestly takes 1 minute to convert an existing Express app to serverless.
If you want to use Serverless (the package, not the architecture) and are converting an existing Express app, that might indeed be a good idea. I was greenfielding, and neither of these conditions were true for me.
>- Vendor lock-in: although I try to minimise this and have a clear picture of where the lock-in lies, it is very much present.
I feel like this is the most overstated drawback in cloud computing by far. I don't know how most people write their code, but most people I've ever worked with (myself included) have a natural tendency to separate out abstractions to protect you from the "likely to change" parts of your application. In fact, if you use Spring Boot in Java it already abstracts away a lot of the differences for various cloud scenarios.
Further, in the worst case you have to rewrite some stuff. Before you skip a solution for fear of vendor lock-in you should attempt a cost analysis: how much will you save on these platform-specific advantages, and how does that offset the cost of rewriting when some other solution becomes more desirable? How long will it take to reach break-even, and how does that time compare to how often software is just rewritten or abandoned in your organization anyway?
I mostly share your view, and hoped not to overstate it: it's clearly a drawback, but in all, not enough to turn me away from it. By being aware of this drawback and keeping an eye on where the lock-in is, it would not be _that_ much work to rewrite the pain points to work with another provider. That said, it's still a good idea to be aware of the drawback.
Unfortunately I can't edit my post anymore, but I should've noted that I'm overall happy with my serverless setup.
> - Vendor lock-in
Try something like https://github.com/apex/up - write the code as regular servers, and it'll convert your entire server into a function.
> - Cold/warm start
Should be a temporary annoyance - I'm sure all the providers are already hard at work on fixing this.
> - Security/monitoring
Easier to do than regular servers - most FaaS systems have monitoring built in, and collect whatever you write to STDOUT.
I didn't really feel like the benefits of packages like Serverless and Apex were worth it for me, as they're not solving that big of a problem, and you'd replace it with lock-in to even less mature products. That said, the current lock-in is not that bad, just good to be aware of and keep an eye on.
I hope that the annoyances are temporary indeed, but that doesn't buy me anything today :) That said, I think for almost all my drawbacks it holds true that they're likely to grow less and less significant, so if serverless is not worth it for you today, it's still worth it to keep an eye on.
Edit: I should perhaps also link the shim that I'm using. It's slightly longer, also specific to my relatively loose configuration of API Gateway, and also simulates S3's client-side file upload functionality. As you can see, it's really not that bad: https://gitlab.com/Flockademic/Flockademic/blob/dev/app.ts
Oh, oops, didn't scan close enough. But in any case, I don't want to be writing "regular servers", I want to write FaaS functions. Ideally (and hopefully this is where we will end up), against the same interface regardless of provider, but not necessarily e.g. Express's interface.
Thanks for sharing though, it's surely useful to some.
It's too bad I can't edit my post anymore, as I should've added that overall, the drawbacks are not enough to draw me away from the advantages. I also should've noted that I'm not using Serverless (the software).
CGI scripts with a new name. Lambda and the like are interesting but any system is a composable set of components.
You can say the same about object-oriented programming or functional programming: separation of concerns, but on the network. Server functions aren't a panacea because you still have to manage all the other pieces.
An elegant thing would be having your entire app in that single lambda, but without state, there goes your DB. Even so, people can't resist taking something and adding more and more to it.
More like CGI scripts with a load balancer that promises to run your application on a host with enough memory within at most 600ms. That solves an actual problem with CGI scripts, but I really would love to see more tiers with lower latency brackets; sub-10-millisecond should be doable for certain lambdas and costs.
The other key component of serverless is the pricing scheme: pay for what you use; don’t pay for provisioned capacity. This has a huge impact on how you design systems as you no longer have to consider throughput (besides account limitations that you can raise without cost), only latency. You can even treat lambda like an async queue which will never accumulate a backlog.
Interestingly lambda seems to make async IO technologies like nodejs less compelling, because throughput capacity isn’t really relevant anymore.
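The "async queue" point above is just Lambda's event invocation type; a tiny sketch (the function name is made up):

    import { Lambda } from "aws-sdk";

    const lambda = new Lambda();

    export async function enqueue(task: object): Promise<void> {
      await lambda.invoke({
        FunctionName: "process-task",   // assumed function name
        InvocationType: "Event",        // async: the call returns as soon as Lambda accepts the event
        Payload: JSON.stringify(task),
      }).promise();
    }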
> The other key component of serverless is the pricing scheme: pay for what you use; don’t pay for provisioned capacity.
There is another way of looking at it: pay in proportion to your traffic, irrespective of how fast your application runs or what your budget is. If you reach your monthly budget because of one trending Hacker News article, yeah, then no more traffic for you. It can be insane.
If I have autoscaling with an upper bound, I can stay within a fixed budget per month. If I reach my budget, yeah, no more new servers, but whatever's running keeps running. My business doesn't disappear; I get business continuity. Paying purely in proportion to traffic sounds insane. How will I plan for business continuity if my monthly budget is reached with this kind of model? (Without any "minimum xx requests" kind of free-tier crap, business continuity means fixed upper bound AND continuing service, not OR.)
It’s more that you’re dealing with a different set of constraints on which to evaluate technologies. There’s not an overall better suited technology than nodejs - still good reasons to use it on Lambda.
To make the most of Lambda though you’ll want something that affords small code size (as there’s a 50MB limit), and has fast single execution latency.
sub 10 ms would only work if they had an instance running and warm, ready to take requests - if they still have to look up your function, provision a machine, interpret it etc it's easy to go past that. 600ms is still respectable if it's from a "cold start".
But if you've got time critical applications maybe serverless is not for you.
I know it's an edge case that doesn't fit Amazon Lambda at the moment. I do it by hand right now, so I know it can be done, but sure, I run it on too many servers of my own to handle peaks. Controlling serverless resources in a more fine-grained way is a way to use your servers more effectively.
I'm actually working on this concept; a serverless platform utilizing the CGI standard. I call it bigCGI. It addresses a lot of the drawbacks another user mentioned:
Vendor lock in: The CGI standard is supported by almost every server technology. Plus bigCGI is open source and can be self hosted.
Security/Monitoring: one of my biggest challenges, but separate processes, by their nature, offer separation of resources.
Debugging: CGI scripts are super easy to run locally. I've also configured bigCGI to collect STDERR from apps.
Statelessness: CGI is a stateless standard.
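For readers who haven't touched CGI in a while, this is roughly all a function has to do under that model (an illustrative sketch, not bigCGI's actual interface): read the request from environment variables and stdin, then write headers, a blank line, and the body to stdout.

    const method = process.env.REQUEST_METHOD ?? "GET";
    const query = process.env.QUERY_STRING ?? "";

    // Headers, a blank line, then the body -- that's the whole contract.
    process.stdout.write("Content-Type: application/json\r\n\r\n");
    process.stdout.write(JSON.stringify({ method, query }));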
If anyone is interested in this, I'd love extra eyeballs on it, especially for security. https://github.com/bmsauer/bigcgi
(Please excuse the bad README, I'll update it next chance I get)
Back when we only had shared hosting or bare metal, there were some pretty good CGI hosters. So a better comparison is the compact fluorescent lamp: an incandescent light bulb with a new name. CFLs are clumsy and expensive for many things, but a cheap and easy way to use less energy. I'm waiting for the LED version.
Would those hosters scale up to serve arbitrary amounts of traffic? Would they automatically replace hosts that failed? Could they seamlessly integrate with systems that can store and query unlimited amounts of data? Did they have a system like API Gateway to put your CGI scripts behind? Did they have deployment systems that would set up all of the above and more with a single command, like Serverless? Also are you aware that Lambda is exceedingly cheap?
Weak argument from analogy, given the (lacking) degree of relevant similarity. See "On Analogy" in A System of Logic by John Stuart Mill for a thorough analysis of analogical reasoning.
Seems to me like they are both dismissive statements about a technology that provides the same thing in a new improved way. But more importantly, linking to a book of logic is an especially poor form of argument.
Considering how tenuous the original comment's claim is that cloud functions are just a basic evolution of CGI scripting, I think providing similarly poor analogies is quite apt.
The phrase that Amazon used when launching lambda was 'deploy code not servers'. To me this sums up what 'serverless' means. It means the developer doesn't have to worry about servers in any way.
With AWS Lambda/API Gateway (and arguably with Google App Engine before it) you take away the toil of having to:
* Manage/deploy servers
* Monitor/maintain/upgrade servers
* Figure out tools to deploy your app to your server
* Scale an app globally
* Cope with outages in a data centre / availability zone
* Worry about load balancing & scaling infrastructure
Heroku is a bit higher level. If you're using Heroku, you're using their blocks and their methods of passing traffic, logging, etc. With Lambda, it's just a function. You have to connect everything yourself, but you can do it exactly the way you want.
It's just code with basic function entry point. You may get your parameters and configuration in a different way, but not much else changes. There are only so many ways to call some code.
Which is probably when you need to re-architect, anyway. There are also several optimizations that can be made along the way to reduce costs. I think there are plenty of ways to think of problems from a serverless standpoint that really don't need to go the traditional server route. I also think there are plenty of use cases where serverless may very well be a problem, that's okay.
All those issues can be postfixed with "... on Lambda". They're specific challenges that will come up when using Lambda as advertised in production :)
As with most system improvements we're not really talking about removing complexity, just reshuffling it into a new and more pleasing form. Lambda will let you wildly simplify some things, but pain points otherwise accounted for in larger deployments will get moved onto your lambda services to compensate.
Cold starts, for example, aren't generally an issue at the function level in traditional web applications, since it's the application as a whole that is or isn't warmed up. By introducing a Lambda back-end you get to take a stance on that per function, while caring less about it in your webapp front-ends.
The OP is trying to highlight where these new and less obvious challenges arise in the new stack, not pushing Lambda as the end all of app dev.
I'd expect a lot of those are things that should be supplied by AWS, especially things like providing a service that runs the code exactly as it would on AWS itself (which should fix the testing/debugging); monitoring and security should mostly be built in.
What exactly do you think it’s meant to do? Amazon even lists web application backends as its first use case example in a recent article. All those things are necessary in production.
I tried the excruciating task of building a real-time multiplayer game with server-side authority using only Firebase and Google Cloud Functions.
Basically I handle all the logic in Firebase rules and lambdas. Any destructive action goes through a GCF that updates the Firebase Realtime Database. Actions that happen often (like moving) are updated directly by the client but validated with some overly complicated Firebase rules.
It's very nice not having to worry about servers and scaling, but it can be a really painful development experience.
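As a rough illustration of that split (not the game's actual code; names like `attack` and the data layout are invented), a "destructive" action owned by a callable Cloud Function might look like:

    import * as functions from "firebase-functions";
    import * as admin from "firebase-admin";

    admin.initializeApp();

    export const attack = functions.https.onCall(async (data, context) => {
      const uid = context.auth?.uid;
      if (!uid) {
        throw new functions.https.HttpsError("unauthenticated", "Sign in first.");
      }

      // Server-side authority: the function validates and applies the change;
      // security rules keep this path read-only for clients.
      const targetRef = admin.database().ref(`players/${data.targetId}/hp`);
      await targetRef.transaction((hp) => {
        if (hp === null) return; // target doesn't exist -- abort the transaction
        return Math.max(0, hp - 10);
      });

      return { ok: true };
    });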
Let me know if you want to try it out and I'll post the URL.
It's pretty much a collaborative and social experiment where I wanted to explore the concept of using in-browser cryptocurrency mining as a monetization and bot fighting method (something I had a problem with in the original Ludum Dare entry it's based on). That's a different discussion though. (You need to manually start the mining and you can explore the map without doing it. It's only when digging and chatting it's required.)
I'm actually planning on rewriting the whole backend in a more traditional Node.js + WebSockets stack so no, I probably wouldn't do it again for this type of application. However, I will probably use it again for other things.
For me the BIGGEST gain of using serverless architecture is security. For example, this year I started two ecommerce platforms. One's a digital-delivery e-shop (selling ebooks, training and stuff) and the other is a physical-delivery e-shop. Normally, I'd write my own Phoenix/Rails app, but this time, I decided to go serverless for my furniture shop and wrote it all in Jekyll. Yes, the static site builder. I use Netlify to manage the production aspect of it (which IS pretty AWESOME) and simple Excel sheets to track inventory (which is what my vendor provides me, anyway). For payment, I simply use the checkout/cart functionality provided by PayPal, and all this just works! The site is designed in such a way that you can't even tell what's being used for the backend anyway. No one can tell it's just a bunch of static HTML pages on display.
Whereas, for my digital delivery store, I regularly need to check my logs to see if anyone's doing anything suspicious. For example, a lot of IPs randomly try to visit wp-login.php or /phpmyadmin. Maintaining a production web application is a full-time job by itself if you don't have a team.
Having said that, many people would immediately assume static page builders are generally dumb. That isn't exactly true: you can automate a lot of stuff. For example, my local machine has a custom Jekyll plugin for my store that resizes and optimizes product images before pushing to prod, to keep page load times small. If I had chosen the Rails/Phoenix route, I'd need to worry about hosting ImageMagick or the like somewhere, or maybe write some code to communicate with a third-party API, and usually, that's not free.
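The author's plugin is Jekyll (Ruby), but the same pre-push optimization is easy to sketch in Node, e.g. with the sharp library (paths and sizes here are made up):

    import { promises as fs } from "fs";
    import * as path from "path";
    import sharp from "sharp";

    async function optimizeProductImages(srcDir: string, outDir: string): Promise<void> {
      await fs.mkdir(outDir, { recursive: true });
      for (const file of await fs.readdir(srcDir)) {
        if (!/\.(jpe?g|png)$/i.test(file)) continue;
        // Cap the width and recompress so product pages stay small.
        await sharp(path.join(srcDir, file))
          .resize({ width: 1200, withoutEnlargement: true })
          .jpeg({ quality: 80 })
          .toFile(path.join(outDir, file.replace(/\.\w+$/, ".jpg")));
      }
    }

    optimizeProductImages("assets/products-src", "assets/products").catch(console.error);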
End of the day, I make sales and that's all that matters. That's when it hit me hard that my customers needn't care nor know what's behind their favorite site.
If static sites are now also Serverless, the word has lost all meaning.
> If I had chosen the Rails/Phoenix route, I'd need to worry about hosting ImageMagick or the like somewhere.
Static sites have no option but to run that stuff ahead of time, but that doesn't mean that dynamic sites can't do the same. Asset Pipelines with precompilation are pretty common - both Rails and Phoenix have one.
"Serverless" is itself a bit of a misnomer. The point seems to be to distribute stateless sub-computations. Whoop dee doo. At some point you need a "server" to modify application state. A static site has no state modifications, so it actually is serverless in this sense of having stateless computation.
I get it, but while the name is not very good, there's a reason why it came about now, despite the fact that we've had static website hosting for decades. It was coined to describe a particular kind of service, and retrofitting it makes it less useful and more prone to confusion.
Right, but we've had that since the start of the web. I get that Serverless is not a very good name, but there's a reason why it was coined now - it was meant to describe a particular kind of service that appeared recently. If we're going to apply it backwards to any web hosting service where the host was not explicitly managed, the term becomes much less useful.
> If we're going to apply it backwards to any web hosting service where the host was not explicitly managed
But that's the thing - there is no "the host" any more. In the days of yore, if the physical machine hosting your site failed, your site went down. Maybe some rare services had failover and redundancy to some degree, but they were primitive at best compared to AWS (and the other serverless providers, for that matter) today. There was simply no comparison to the ecosystem that exists today. Have you ever had to manage production hosts before? It simply boggles my mind how many people throw out these "the cloud is just somebody else's computer" comparisons - like they've never had to diagnose JVM garbage collection thrashing at 3 in the morning before, or dealt with a server that goes down due to 100% of disk space being consumed by logs, or patch a massive fleet in a matter of hours in response to a CVE, or a power outage, or a hyper-localized network event in a data center, or any of the other million+ super annoying problems that come with managing physical hosts, and to a large degree, VMs/VPSes.
I can't tell if you're agreeing or disagreeing with me :) My point is exactly that: what AWS et al. are offering now is not like what we used to have, so we shouldn't use the term "serverless" for stuff we already had, like fully-managed static sites.
Ha, I thought I was disagreeing, but having gone back and re-read just now, I realize I must have misread because, yes, I agree with you 100%. I was also partially venting a general gripe at the naiveté of some of the other commenters.
Because most devs aren’t the ones that will need to worry about how it’s going to be monitored once deployed. Not to mention troubleshooting a billion micro services when something breaks.
With APIG+Lambda you get CloudWatch integration 'out-of-the-box', and can expand the metrics and logs you send out with a few added lines of config.
You can also toggle X-Ray for a detailed view of your call graph.
If the default implementation isn't good enough, you can define your own CW events, alarms, and log filtering.
I'll grant you that you can't just ssh into a host and tail some logs, but if you're keen you can send your logs to elasticsearch for a better arbitrary search experience.
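To illustrate the STDOUT point: anything a handler logs ends up in CloudWatch Logs with no extra setup, so structured JSON like the sketch below (field names are arbitrary) is enough to drive metric filters, alarms, or an Elasticsearch subscription later.

    import { APIGatewayProxyHandler } from "aws-lambda";

    export const handler: APIGatewayProxyHandler = async (event) => {
      const started = Date.now();
      try {
        // ... business logic ...
        return { statusCode: 200, body: "ok" };
      } finally {
        // One JSON line per invocation, picked up by CloudWatch Logs automatically.
        console.log(JSON.stringify({
          path: event.path,
          requestId: event.requestContext.requestId,
          durationMs: Date.now() - started,
        }));
      }
    };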
What I've found, having to use serverless, is that the local development tools are broken. You'll find CloudFormation is the only way to reliably manage all the proprietary paid-for services, which requires you to read the documentation for several tools (Kinesis, DynamoDB streams, FIFO or standard queues) to work out which one might work -- only to find out it doesn't necessarily work in your applied function (CloudWatch alarms stuck in INSUFFICIENT_DATA, I'm looking at you). So you then have a choice of using CloudFormation plus another deployment tool to push your services. This is more typical after you've given up trying to manage dynamodb-local across platforms for your codebase.
One thing I would like to see in articles like this is more concrete use cases for serverless. The article begins talking about the concept as though you should write your entire app serverless, and then, in their case study, uses serverless as a background process to convert image types.
The other use case I always come across is image scaling.
I'd be interested if anyone would like to share their use cases as entire apps or background processes.
We use Lambda + API GW to manage the glue between our different data/service providers. So for instance we expose a "services" API (API GW) that takes a request, does some business logic (lambda code), calls the relevant provider(s) and returns the aggregate response.
That principle can (and probably will) be extended to hosting our own back-end / business logic.
We're trying to get to the point where a dev only needs to write a Swagger file, the lambda code and a bit of configuration, and the rest is taken care of by AWS and our CI framework.
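The "glue" lambda described above boils down to something like this sketch (the provider URLs and response shape are invented; assumes Node 18+ for the global fetch):

    import { APIGatewayProxyHandler } from "aws-lambda";

    export const handler: APIGatewayProxyHandler = async (event) => {
      const id = event.pathParameters?.id;

      // Fan out to the relevant providers and aggregate their answers.
      const [profile, orders] = await Promise.all([
        fetch(`https://provider-a.example.com/profiles/${id}`).then((r) => r.json()),
        fetch(`https://provider-b.example.com/orders?customer=${id}`).then((r) => r.json()),
      ]);

      return { statusCode: 200, body: JSON.stringify({ profile, orders }) };
    };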
We're a small start-up, so I worry about shipping as fluidly as possible, and being confident that my infra just runs (security, patch management, scalability, uptime, etc.), all of which serverless gives me much more easily, and at a lower cost (for now) than running my own EC2 instances.
I'll worry about vendor lock-in (or dumping serverless for that matter) once I start to get ridiculous bills from AWS, or hit performance issues, or whatnot. But I try to mitigate by relying on standard formats (Swagger) and keeping my code as close to the business logic as possible - which is fairly easy with Lambda. The only thing that is really tied to AWS is the framework we use to build and deploy the architecture: a large Cloudformation file basically. We explored Serverless (the app) to manage this, but it didn't fit our need (in particular, Serverless has apparently never heard about Swagger, so that sucks).
I'm looking forward to Amazon Fargate, which is less extreme than Lambda but still serverless in some sense. Basically, you can still write proper apps (Spring Boot apps in my case) but, so long as you containerize them, you don't have to provision servers; just tell Amazon what resources they need and how many instances you want.
TFA is unintelligible mumbo jumbo describing an architecture that amounts to (I think!) minimizing statefulness and moving all stateless computation out to a cloud.
Well, OK, but how about some examples? How about some advice as to where to draw lines? E.g., if the computations take less time to do on a server than to distribute, then maybe don't?