What a typical serverless architecture looks like in AWS (medium.com/serverless-transformation)
173 points by lxm on May 23, 2020 | 182 comments



Please don't do this.

I have seen this in practice. A simple CRUD app split across literally hundreds of "re-usable" repositories. The business logic is all over the place and impossible to reason about. Especially with step functions, the logic now lives both in the code and at the cloud level. Each developer gets siloed into their own little part and can't run the whole app locally.

The whole thing could easily fit in a single VM as a Rails or Django application.

The only ones that will be happy about this are AWS and contractors, because it's guaranteed lock-in.


What you (and many others here) are objecting to is completely orthogonal to serverless as an architecture.

There's nothing about serverless that requires separate repositories or even microservices. I know because I have built a 50,000 line serverless application that is a single repository and deploys functionally as a monolith.

We also don't use step functions heavily, because, like you say, that's basically a cloud-specific DSL that you could write in a real programming language with only slightly worse visibility.

Serverless is, plain and simple, about passing immutable "partial computation" state through events and keeping all mutable long term state in a database.
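A minimal sketch of that idea, assuming TypeScript on the Node.js Lambda runtime and a hypothetical "orders" table (the event shape and names are illustrative, not from the article):

    import { DynamoDB } from "aws-sdk";

    // Hypothetical "order placed" event: the immutable partial-computation state
    // travels inside the event itself.
    interface OrderPlacedEvent {
      orderId: string;
      items: { sku: string; quantity: number }[];
    }

    const db = new DynamoDB.DocumentClient();

    // The handler holds no state of its own: it persists the only mutable state
    // to the database and returns a new immutable value for the next step.
    export const handler = async (event: OrderPlacedEvent) => {
      await db
        .put({ TableName: "orders", Item: { orderId: event.orderId, status: "RECEIVED", items: event.items } })
        .promise();
      return { orderId: event.orderId, status: "RECEIVED" };
    };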


The whole "serverless" fad is about lock-in.

I also really wish they would not have squatted that term. It should refer to decentralized P2P systems which are truly serverless.


I build off of serverless. Not very locked in - I wrote my service to run locally without any AWS services at one point. The main 'lock in' has nothing to do with serverless - it's DynamoDB. DynamoDB is really good so I'm reluctant to port to other clouds without something that has a very similar model. If having a really good service is "lock-in", sure, we're locked in.


> The whole "serverless" fad is about lock-in.

I don't believe that's true at all.

Serverless stuff in general, and function-as-a-service solutions such as the example described in this discussion in particular, is in my opinion about placing the needs of the service provider before the needs of the customers.

More specifically, it's about enabling the service provider to waste less computational resources while providing the exact same service.

For example, how many VM instances would you need to put up an API Gateway distributing a couple of HTTP requests to at least one instance of an HTTP server whose only responsibility is to trigger a workflow or update a database? How many instances would you need to keep up just to keep a database running that barely has 2 or 3 tables? How many instances would you need to launch to run a batch job that does nothing more than sniff a bunch of files?

Without considering any concern with availability, that's about 3 to 4 instances. All idling, and mostly running support stuff that needs to be there just for your service to be able to handle a request.

This might be your go-to solution for this sort of service, but in the eyes of a service provider that is sinfully wasteful. I mean, half a dozen instances with a utilization rate that barely breaks 50%, just to do the same thing everybody else is already doing?

So, why not cut all that bullshit and simply tweak a shared API Gateway/message broker/background task/workflow automation/pubsub/database/data store service to do the stuff you need to do?

If you use the communal service and let the service provider manage it with its dedicated staff, the company doesn't waste half of its computational resources idling or just duplicating the service everyone is already using.

That's less hardware to power, less hardware to provision, less hardware to maintain, less spiky hardware utilization rates... Less work and lower costs.

If you want to discuss lock-in then focus on IAM. Everything else is a way to help the service provider better utilize their current capabilities.


Even without serverless you are pretty locked in. Things like Terraform help but the level of cloud integration required for a complex system is pretty overwhelming.


If it's "required", you need to think more.

We use GCP and GKE for some things, but we could leave pretty easily. The hardest parts would be DIY Postgres HA and DIY K8S. CockroachDB is almost ready to get rid of the former headache. Haven't looked into the latter headache yet. Of course we really only use K8S for load balancing and HA, and there are other options there like Consul and Nomad.


I’m glad you are able to move your use case easily!


Try Patroni for Postgres HA, seems really trivial.


Thank you for this comment. I've experienced this exact same nightmare in my current job: a serverless hell.


Can you expand on this with specifics please? Our serverless code is all in one repo and we’re pretty happy with it.


A bit of a translation barrier, so I'm not positive, but the company advocating this appears to be an agency that builds things for multiple clients. For that case it makes more sense to split out capabilities like that so they can easily pull them into other projects like Lego blocks. For everyone else I'd advocate starting with a monolith until you need to split things.


Makes sense for whom? Maybe for them, certainly not for their clients.


> Makes sense for whom? Maybe for them, certainly not for their clients.

It makes sense for the system, and thus for the client.

Just because there are a lot of boxes in the diagram, that does not mean the system is complex. A lambda is just a stateless message handler. Half the diagram is lambdas. The rest is just a few persistence services and then stuff related to external interfaces, like DNS services and API gateways and auth and pubsub.

Any desktop app is way way more complex than this. A dialog box has easily twice the number of handlers.


If the firm is fast and produces reliable platforms at the cost the client is looking for then their clients couldn’t care less whether it’s Django (sigh) or Lambdas.


Complexity in any form can be lock-in to contractors, eng. teams, vendors, etc.

Make no mistake, serverless cloud services are the greatest building blocks of all time. That doesn't mean you should use all of them, nor use them in the most granular manner.

As always, the architect should make prudent decisions.


The immediate thought looking at that diagram is that while at first glance serverless runtimes seem to promise that you don't have to care about the details, it turns out you really do have to care a lot about the details!

You're not really relieving yourself of concerns by using serverless runtimes, just shifting them around. Sure, you don't have to care about managing state at the leaves, but you very definitely do now have to come up with a way to hold state globally with ephemeral/stateless leaves, and it turns out that's... a really hard problem with a really complex solution space.

Not to be overly dismissive or to say that one way is better than the other; it definitely depends on the use case and probably the underlying business model of the use case. Just to point out that when it comes to engineering abstractions, there really isn't any free lunch. It's all tradeoffs.


How is that any different from a typical web server? Hopefully you don’t maintain state on your web server either. This isn’t a new concept. Even in the early 2000s, when I was maintaining two web servers behind a physical load balancer, we knew to maintain session state separately from the web server.
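For reference, a minimal sketch of that classic shape, with session state kept in a shared store so any node behind the load balancer can serve any request. This assumes TypeScript, Express, and a hypothetical "sessions" table in DynamoDB; none of the names come from the thread, and any shared store would do:

    import express from "express";
    import { DynamoDB } from "aws-sdk";

    const app = express();
    const db = new DynamoDB.DocumentClient();

    // No in-process session map: every instance looks the session up
    // in the shared store, so the web tier itself stays stateless.
    app.get("/me", async (req, res) => {
      const sessionId = req.header("x-session-id");
      if (!sessionId) return res.status(401).end();
      const { Item } = await db.get({ TableName: "sessions", Key: { sessionId } }).promise();
      return Item ? res.json({ userId: Item.userId }) : res.status(401).end();
    });

    app.listen(3000);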


I had the same question - if this guy has each web request adding to local state on his web server, I have some bad news...


I hate to be “that guy”. But “the cloud” is no more than someone else managing servers and services at varying levels of abstraction.

I have no idea why some people hear AWS/Azure and act as if the same architectural principles that have always applied no longer apply. They do, just at a more granular scale in the case of “Serverless”.


Not sure I understand the point you are making?

> act as if the same architectural principles that have always applied no longer apply

They do still apply...


I was agreeing with you.

More of a rant about the person who complains about not being able to maintain state at a tier that you never maintain state at in even a traditional architecture.


ah, gotcha! Very good :)


Soon we will be feeding plants with energy drinks.


It has what plants crave


It’s nice to see the serverless movement and its applications. However, the fact that every (public) cloud provider does it differently is killing it (incompatible logic/interfaces). Right now, serverless merely shifts the complexity of apps onto cloud services: services with a lot more logic (lines of code) that do a million other things beyond what your application needs. That often means many more (undiscovered) bugs. To get better control and predictability, there have to be standards and a minimized depth of code.

I’m convinced serverless is the next step after containers, but only if it becomes a standard that can encapsulate services in building blocks that you can easily connect with each other and that behave very predictably. We are not there yet; hopefully soon.


The only thing you need to make your code work with Lambda is a function that takes in an event body and a Lambda context. If you are following standard architectural best practices, your Lambda entry point should just be validating your event, mapping it to your business domain, and remapping the result. You should treat your Lambda event just like you would an action in your controller in most MVC frameworks, aka “skinny controllers”.

This is a standard “onion architecture” with “ports and adapters”.

Sound development principles don’t get thrown out of the window just because you “move to the cloud”.
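A sketch of that shape, assuming TypeScript behind API Gateway; placeOrder and its module path are hypothetical stand-ins for whatever your domain layer actually exposes:

    import { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";
    import { placeOrder } from "./domain/orders"; // hypothetical: pure business logic, no AWS types

    // "Skinny controller": validate the event, map it into the domain, remap the result.
    export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
      const body = event.body ? JSON.parse(event.body) : null;
      if (!body || typeof body.customerId !== "string") {
        return { statusCode: 400, body: JSON.stringify({ error: "customerId is required" }) };
      }
      const order = await placeOrder({ customerId: body.customerId, items: body.items ?? [] });
      return { statusCode: 201, body: JSON.stringify(order) };
    };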


If you're running a stateless containerized architecture you need to hold state globally too, right?


All I see is a whole bunch of dollar signs floating away.

It’s quite sad how effectively cloud marketing has penetrated the dev community.


It is sad. I will say that many years ago I was stuck on the cloud bandwagon, but today if you asked me how a new business should set up their infra, I would advocate for on-prem or leasing bare metal from some provider.

Just a simple way to provision and snapshot VMs is what 99% of businesses actually need. This is what most businesses have always needed. Many are still using this exact same technology today without any problems.


After 20 years of experience and 8 “in the cloud”, I agree.


Do you think that the cost calculator does a poor job of estimating the financial hit?

The claim is that getting out of the regular instance-and-EBS-volume architecture avoids cost.

The question being whether "avoids cost" meant cost to Amazon, or cost to your business.


In my case I see people skipping over things they don’t know when estimating like traffic and actual resource deployment and crap like “yeah Jenkins will be fine in a container”.

Always assumptions, missing info and costs, and a poor understanding of it all, because understanding cost is so damn complicated. Consider traffic costs across several regions and you can regularly just piss $20k away from one small architectural decision made wrong.

Estimating costs is actually so difficult that it has almost entirely replaced the old role of license management, under the lie that renting your shit monthly makes that concern go away.


In theory, the presence of other cloud providers should help drive costs down.

"The difference between theory and practice is greater in practice than in theory."--Herb Sutter in C++ Users Journal some decades ago.


Most companies don’t audit their cloud costs and the ones that do simply consider it the “cost of doing business” and move on.

This is a terrible practice that is being pushed by the cloud vendors. It’s gotten to the point where now cloud advocates are calling AWS and others an “operating system” your business needs to perform and fulfill its vision.

My company has given a hard push of lowering costs by 20% across all business lines and auditing our cloud usage has been very very shocking.

I am sure as the effects of the pandemic continue, more and more companies are going to realize the same thing.


I think for most places and most workloads, the state of the art in cost estimation is “figure out the fixed costs, see that they’re reasonable, then launch and keep an eye on the first few bills to hope all the bandwidth and other ‘surcharge type’ costs aren’t deal-breakers...”


This doesn’t work when cloud marketing frames this as a “cost of doing business”.

Which causes revenue chasing and profit hunting to “stay alive”.

Not to mention the massive startup credits issued which ensure complete and utter lock-in for the entirety of your business’s existence (successful or not).


I never have to patch my service's operating systems again, I'll pay for that.


I'd say this is what any distributed system looks like, it's not exclusive to AWS and serverless architecture.

It's not even exclusive to distributed systems, most computer systems look like this when you actually think about it.

Consider what an application would look like if you mapped out each service and object, it would easily look as complex.

This isn't a new concept, we've had the idea that everything in computing could be represented by little computers communicating with each other since the 60s, Alan Kay called it "object-oriented programming".

The only difference between the 60s and now is that instead of mainframes we have platforms, so now we can actually have "little computers communicating with each other".


True, if you squint then an application will look like this too. That doesn’t mean this paradigm is not different in other important ways.

For example, in an application you can use things like types and functions to make explicit the coupling between different parts. You can iterate and verify changes on the time scale of a few seconds (instead of waiting for a lambda deployment). You can use all the tools for exploring source code we’ve developed as an industry over the decades. And you can atomically change the entire application simply by deploying it.

Of course serverless has its use cases. But evolving a system like this article shows, without most of the tools and guarantees we enjoy when dealing with code, seems like an exercise in pain.


in our serverless application(s) we still use types, and yes, that requires runtime verification at more boundaries, and therefore technically more compute cycles. we share those types across all our Lambdas. Yes, technically each Lambda must be deployed separately; in our case any changes to message-passing types would require the equivalent "overhead" of a no-downtime database migration. Then again, the best practice here is the same as with databases: always try to make your types backwards-compatible, only adding new fields, etc.

there's certainly _a_ cost; I just haven't seen it be as high as people seem to imagine.
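a rough sketch of the kind of shared, backwards-compatible message type described above (TypeScript; the names are made up), where new fields are optional so consumers deployed before the change keep working:

    // shared/events.ts -- imported by every Lambda in the repo
    export interface InvoiceCreated {
      type: "InvoiceCreated";
      invoiceId: string;
      amountCents: number;
      // added later: optional, so consumers deployed before this field existed
      // still accept the message (the same idea as an additive DB migration)
      currency?: string;
    }

    // runtime verification at the boundary, since the type is erased at runtime
    export function isInvoiceCreated(msg: unknown): msg is InvoiceCreated {
      if (typeof msg !== "object" || msg === null) return false;
      const m = msg as Partial<InvoiceCreated>;
      return m.type === "InvoiceCreated"
        && typeof m.invoiceId === "string"
        && typeof m.amountCents === "number";
    }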


I continue to be astonished at how many people seem violently opposed to serverless architectures.

Having been working in one for two years and seen very few "hard" problems come up (and having solved the ones that did), I guess I'm just wondering how it is that our product, with its 5 different GraphQL endpoints, roughly 20 queues, and the equivalent of 100 REST endpoints, has somehow managed to avoid all these supposedly inevitable disasters.


It's the same dogmatic fervor seen in defense of this or that language, this or that pattern, this or that library. I'm sure most of the detractors have spent a long time learning their favorite cup of tea. Serverless models are like demanding they switch to espresso.


Boy, look at that architecture diagram. I'm going in the opposite direction: a mono repo that compiles to one single binary that does everything. Sort of like how a fertilised egg divides and each cell finds its place in the organism. A self organising architecture, you heard it here first.


We are already there. One repo for the entire organization, one binary image per customer, and one management tool to bind it all. Everyone is working in the same box and adding value in the same direction.

It is the most magnificent thing I have been a part of so far. It almost feels like we've actually figured this shit out. Looking back at separate repositories from this perspective is like looking back at decrepit playground equipment you used to enjoy as a child.


A well organized and understood single binary always beats a bunch of decoupled systems nobody can fully untangle; this is not rocket science. It’s why we started coming up with tracing systems like Jaeger and OpenTracing just to figure out what happens across different domains and code boundaries. The skill is in making changes that accommodate both current and future business needs as well as future developers. If your random CRUD app isn’t understandable without a PhD in a niche domain, you need to decide if that’s a feature (HFT perhaps?) or a fault (99%+ of commercial CRUD apps).


Your approach is great for small teams that can work closely together.

As the team size scales, the additional complexity of this architecture lets teams work more independently.

At least that’s my take on why you’d want to replace a monolith with microservices & serverless.


I used to think the same way: give frontend & backend their own repos and we will talk in a meeting about some REST API. That works fine for most, but if you want to grow your proto-anus into an intestinal tract that safely processes nutrients and uses the same pipe as the airways, then it matters what everybody else is doing.


Would love to hear more about this! Are there benefits to splitting binaries which regular architecture techniques in a single monolith can't achieve?


Yes. Given schedule pressure, a dev in a monolith can avoid good practices and create a web of dependencies. E.g. to get this feature shipped in time I’m not going to refactor the billing interface to expose the new attributes that I need. Instead I’ll just query the DB directly from the user profile controller. Repeat this over a few years and a monolith can be a nightmare.

Whereas shipping as independent services it is harder to cheat because the user profile team doesn’t have access to the billing tables and must go through the billing API.

Of course it is possible for monolith teams to enforce the separation, but no team does, because it would slow things down in the short term. Then in the long term it’s too complex to unwrap, and teams refactor using serverless because that’s what is in style. Of course, nobody is writing about the new problems this approach creates.


> Instead I’ll just query the DB directly from the user profile controller. Repeat this over a few years and a monolith can be a nightmare.

I've seen this, a rat's nest of module dependencies, poorly thought-out module/library use, and a lack of ownership and direction around shared bits.

Monorepos also make dangerous things like breaking API changes look easy, or maybe hide them altogether.

Problems around ownership, dependency management, and API versions are hard in polyrepos, and I think I'm OK with this. It makes you be more thoughtful.


This is kinda true of serverless too; it can degenerate into tight coupling. EventBridge now solves that (somewhat). I've also seen huge Step Functions, which are also awful to test (except by uploading them). Every project needs some discipline.


I started to have a look, but that first diagram has arrows everywhere. Does no one do simple any more?

It does look like what a normal web app would do in, say, PHP or Django. Is there really an advantage to splitting it into microservices and putting it in AWS Lambda over a load balancer and extra instances when needed? A normal app would have those services split into classes / modules and would run on the same machine (the async tasks being an exception). I imagine my approach would need a lot less ops and coordination between the parts.


That was my conclusion after getting all excited about serverless initially. I was seeing lower latency, no cold-start concerns, and easier log access/ability to debug just by going back to having a server. Plus, having your entire deploy be completely governed by a gigantic YAML file that no one's ever bothered to fully (or even properly) document gets old very fast. So I'm back to using servers.


In my experience, when a single engineering team owns the entire service, a single web app >>> cluster of microservices.

The advantage to splitting up systems is when multiple teams own part of the service and do not want to be dependent on each other for testing and deploying and signing off on features.

Although given the amount of time I spent this week to be able to send a string from my microservice to a partner teams' microservice and get a different string back, I'm not feeling microservice love today...


Conway's Law?


Not sure yet whether I like how AWS does this, but I started to think that serverless and microservices are about state after all. It is about ephemeral apps that hold only the bare minimum of state. For instance, you don’t need user management, as identity is better when decoupled into another service. A lot of it sounds like the Unix philosophy, as in one good tool for one job. HashiCorp YouTube videos explain a lot of those ideas quite nicely. In this world, stuff like AWS Lambda makes sense. However, you can build your own lambda and deploy it as a container service. So again, not exactly sure if AWS is doing things right, but the concepts make sense. Apps as functions.


except unix pipes don't often deal with at-least-once delivery, service limits, role and policy management and runtime deprecations


Nevermind that a network is way less reliable than a computer's memory or even disk drive.


There are a few advantages. Not worrying about maintaining and securing underlying infrastructure is huge. Not worrying about load balancing logic is helpful. Having isolated blast radius of application failures is nice. Overprovisioning can be a real problem depending on the underlying machines.

And then when you get into the surrounding services and features you really start to feel the benefits. Want to have canary deployments? Want to have a queue of failed events? Want to start piping data into a data lake for future analytics? Need to introduce a decoupling message queue in front of an expensive operation? All of it is almost turnkey.


I read the entire article, and my first thought is... why? Why go through all that complexity? How much money does this actually save over writing a regular server to prompt favoring this approach? Wouldn't this have a similar cost as a regular server if you get regular activity, minus all the complexity? What happens when your lambdas get DDoSed, do you get overcharged?


As developers we have done a poor job of managing complexity. Largely a result of schedule pressure that, over time, leads to code bases with modules that reach deep into the internals of other modules. The consequence is a complex codebase that slows down development.

So I believe serverless/microservices are a way to enforce those boundaries in such a way that a junior developer or schedule pressure won’t allow us to cheat and reach into the internals of another module to get a feature shipped in time. You simply can’t do that when the code is running on a different server. So this additional complexity forces a separation of concerns that can’t be cheated by schedule pressure or shortcuts.

Of course the downside to this is we have to now manage a different kind of complexity.

As developers, I wish we pushed back and simply built our monoliths as a set of independent npm packages, jars, or whatever your language supports. Doing this enforces a clear separation of interfaces and dependency management. If we did that, then the complexity of debugging and supporting the architecture presented, which strikes me as several orders of magnitude more complex, would be avoided, and instead we would simply manage it at the code level.


https://aws.amazon.com/api-gateway/faqs/

“ Q: How can I address or prevent API threats or abuse?

API Gateway supports throttling settings for each method or route in your APIs. You can set a standard rate limit and a burst rate limit per second for each method in your REST APIs and each route in WebSocket APIs. Further, API Gateway automatically protects your backend systems from distributed denial-of-service (DDoS) attacks, whether attacked with counterfeit requests (Layer 7) or SYN floods (Layer 3).”


It's being attributed to API Gateway, but I think this is really just Cloudfront doing all that.


CloudFront is just a CDN. APIGW lets you create usage plans based on API keys.


I wish articles like these would include sections about how their developers dev / test / promote through different environments. Only seeing how a production system is structured is less than half the battle and can mask true complications of going serverless.


Typically you spin up a dev stack with the same CloudFormation templates, so you get an exactly prod-parallel environment (with parameters for things like DDB dev table names).


Typically they simply don't test!

If you do test, you really can only test each unit in isolation - testing the kind of setup in the article is just too hard, and the tooling for doing automated tests on that kind of thing is not very mature. There is stuff like localstack/samstack/etc, but I haven't seen anyone really use it as part of a workable test strategy for serverless AWS.

Consider what happens outside of serverlessland: when people have 15+ microservices, each of which needs a different docker container and a different data store, do they tend to do whole-system testing of it? No, in my experience people tend to mumble something about "contracts" and test individual bits in isolation.

This is on top of the open secret which is: it's very, very hard to predict what the ongoing costs of a serverless solution will be. Most of the services are priced by usage but the units are small (seconds, minutes, etc) and the sums are too (eg $0.000452 per hour of xyz). Post-project bill shock is a considerable problem.

Source: worked at an AWS partner consultancy on many cloud "architectures" similar to the article.


Actually, using CloudFormation, it’s really easy to do integration testing. Oftentimes serverless can be cheaper because you only pay for what you use. High TPS is expensive on serverless, but low TPS is dirt cheap. Source: I work at AWS.


chiming in here to agree with the other replies - one of the big benefits of serverless for us is that every environment is a "complete" environment, using an identical architecture to prod and no mocking or sharing of services. And it costs maybe $50/mo/dev, most of which is a NAT gateway.


Deploying serverless code through the cycle is no different conceptually than deploying regular code. In our case we deploy CloudFormation templates that reference the same Lambda package in S3 across accounts. Rolling back is redeploying the old template version.


This looks so expensive! I've been going down the same path (with centige.com, WIP), but then I stopped and refactored some stuff:

- I had lots of lambdas written in TS. Things were getting out of hand, so I refactored them into just three lambdas: a large one that has a switch statement to execute the correct function, and two other ones for some quite long-running processes. (A rough sketch of this dispatcher pattern follows after this list.)

- The big lambda was executed too many times; moving it to an actual server would cost me way less, so I did. Plus, I could add more security features that would have needed AWS API Gateway if I had chosen to stick with lambdas, which is too expensive.

- Initially I used Amplify. And while recently its bundle size has been reduced substantially, it's still big (see https://bundlephobia.com/result?p=aws-amplify@3.0.12). In my case, I only needed the Auth package, which is 50kb (https://bundlephobia.com/result?p=@aws-amplify/auth@3.2.7). However, my options were quite limited and I found myself changing things around to make my app compatible with the way AWS Cognito worked[1] (used by Amplify Auth). I ended up removing it and handling authentication on my own.
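A rough sketch of the dispatcher pattern from the first point above, assuming TypeScript; the action names and imported modules are made up for illustration:

    import { createUser } from "./users";     // hypothetical module
    import { sendInvite } from "./invites";   // hypothetical module

    interface DispatchEvent {
      action: "createUser" | "sendInvite";
      payload: unknown;
    }

    // One "big" Lambda routing to the right function, instead of one Lambda per operation.
    export const handler = async (event: DispatchEvent) => {
      switch (event.action) {
        case "createUser":
          return createUser(event.payload);
        case "sendInvite":
          return sendInvite(event.payload);
        default:
          throw new Error(`Unknown action: ${(event as any).action}`);
      }
    };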

A better stack for a serverless app would be Vercel for deploying the app (it's cheap, it has SSL, and it has automatic deployments) and either lambdas or a server, depending on your needs, for handling backend stuff.

Also, use something like aws4fetch instead of the actual SDK if you can. It will save your visitors some kilobytes :)

---

[1]: Email/password auth with Cognito is not the best thing out there. So I wanted to add Google Sign In. At the same time, I wanted to use DynamoDB's row-level access (similar to Firebase's write rules, but more limited). My plan didn't go too well, as I needed the user's Id (provided by Google) when writing to DDB. But Cognito wasn't returning the Id... I spent a couple of days on it, then I figured that I'm just wasting my time.


That looks really complicated to me. What's the advantage of setting up a system like that over a typical server, i.e. with most of that stuff handled by one server, writing code to coordinate between a few external things (like maybe auth0 and a DB)?


For the core functionality, the only part you touch is the code in the lambdas. Everything else comes out of the box if you use a framework like Serverless or SAM. (The step functions, batches, identity, and async pieces you have to configure separately.)

The number of OSes you patch or upgrade over the course of running this for a few years is... zero. And the time and effort to scale up, run failover tests on your servers, migrate to new hardware... also zero.

But the servers are just a small part of it. The application services around it make it possible to build enterprise grade scalable services with low ops overhead really fast.


That sounds really appealing. I really love the sales pitch for serverless stuff. I think I've probably just been exposed to a lot of really advanced ways of setting it up for a big project. It seems complex.


This is what vendor lock-in looks like.


Have you seen how many “vendors” the average decently sized company is “locked in” to?


Cloud is the biggest boondoggle there ever was. Larry Ellison is jealous. He's trying to get in on some of that pie.


Too much marketing, not enough content. Glossing over the dumpster fire of Cognito is enough for me to not trust the rest of the content.

There is so much custom code to essentially fling data between services. I imagine only a small percent of the code and dev time is spent on business logic. These Rube Goldberg machines are my least favourite part of AWS, especially when dealing with at-least-once delivery.


I only used Cognito via Amplify and it was a rather pleasant experience.

What's wrong with it?


Try finding a Cognito user from a Cognito identity; you can't.

Visit the amplify-js and amplify-cli github repos and search for cognito issues.

Try using the Cognito console; it is a litter box of warning messages and exceptions. Try connecting Cognito and Pinpoint if you are in Europe: two years on and it still doesn't work, with no indication anywhere that it won't.

Look at its CloudFormation configuration and how many field changes cause a Replace. Yes, replace Cognito and goodbye users.

Until recently user names were case sensitive, and they didn't have basic account enumeration protections.

Its UX is exactly how DynamoDB can do damage to a product.


I recently built a new app on AWS and Lambda. I really wanted to use Cognito to keep everything in AWS. I fought with getting it working for a day or so before giving up and trying out Auth0. I was up and running with Auth0 in a couple hours. I found the Cognito documentation to be insufficient for my needs. Others may have better experiences.


Are you familiar with Firebase's write rules? DynamoDB has something similar with Cognito, but it's _very_ limited.

That's when I figured that I'm not gaining anything by using Cognito.


No, I was not aware of that Firebase functionality as I have never used it. Interesting. Yes, it would be ideal if I could just let another layer do all of the authentication and authorization. That would have worked for the bits of code that use the database. I still would need a way to use the authentication token for things like the Stripe integration.


I see.

I understand the wish to keep everything in AWS.

Especially after I read many times that Auth0 is pretty expensive. Is this true?


Auth0 is only $23 per month to start with. That's not a lot for my use case.

I would have preferred to keep everything in AWS just to make it easier to use one set of credentials (IAM) to manage permissions.


Cognito has an awful API and worse documentation. It's also super limited in terms of how user metadata is represented, is hard to query efficiently for users meeting certain criteria, and produces super cryptic errors when integrations aren't set up correctly. As a whole the Cognito product feels like some weird bolted-on POC that AWS decided to charge money for. Compare that to a real _product_ like Auth0 and it just falls to pieces.

If the appeal is about free usage under 40k monthly active users, you probably don't need an external complex managed auth solution in the first place.


There's a lot of negative comments, so I'll just put out - I enjoyed reading this and appreciate the time put into explaining this.

What does your dev/sandbox environment look like? Are you using localstack?


I just came across this post today. I am the one who wrote the article... I'm a bit terrified by the comments, haha. But that's a way to learn and improve stuff.

Thanks for your support in the middle of this.

To answer your question: our dev environment is identical to staging and production, with AWS accounts for each developer. The cost is close to 0. Deploying and testing a code change takes seconds. Deploying and testing a config change, however, is still a little bit longer for sure...


If you think serverless is scalable and cost efficient wait until you try servers.


Did any of the people commenting on how complicated (and/or expensive) this looks actually READ the box titles before they ran to proclaim how this should all be replaced with a simple monolith on a single VM?

There is a TON of stuff here that is not related to your application business logic but still has to exist SOMEWHERE for a complete production application.

Let's go through what's actually here that your single VM/Rails application does not/cannot encapsulate:

* CDN

* Domain name and certificate management

* OAuth/identity handling

* File upload

* Workflow orchestration

* Facebook messenger/email integration

And for those complaining that this is untestable, it's... exactly the opposite. ALL of this is infrastructure as code, declarable in a configuration file. Which means you can spin up a beta environment as quickly as production, or hell, every developer can spin up this entire environment in their own account (FOR FREE, I might add - because that's the whole value proposition with serverless: you don't pay for something if you don't use it).

What's inside these Lambda functions is still standard code to which you can apply good software engineering and SOLID principles, and which you can unit test thoroughly and in isolation.

And if you think workflows, and asynchronous queues, and event buses are overkill - well maybe they are for your use case. If you have 1 API invocation per minute you don't need ANY of this.

But if you're handling any serious scale, this will scale elastically with minimal operational intervention.

Remember - your company is not just your devs but your ops people too. And this makes their life way easier.

I understand the frustration with the 'serverless' buzzword, but let's not miss the forest for the trees. This shit is REVOLUTIONARY. And all people can do is try to one-up each other on how much of it they could replace with a simple shell script. The lack of humility in this crowd is mind-boggling.

Look, I agree - at both extremes (very little usage, A LOT of usage) serverless architectures like this are either overkill or too expensive. But there is a very fat middle (I'd venture >90%) of software projects out there for whom this would be both the easiest and the cheapest architecture (in terms of total cost of ownership) to build and maintain.


> What a typical 100% Serverless Architecture looks like in AWS

Does it look like an unmaintainable, untestable mess?


If it's decoupled, it's plenty testable.

Serverless covers 2 scenarios very well. The first is when your volume is so low that you don't want to pay for reserved capacity: if nobody is using the stuff, you pay next to nothing.

And also if there is a lot of load, it's very very easy to scale it to cover very high demand.


SAM provides a pretty painless way to test locally and is core to most CI/CD with Lambda.


Unless you use custom API Gateway authorizers, want to interact with other AWS services or ...

Don't get me wrong, I absolutely love AWS SAM, but I believe testing serverless applications locally is still an unsolved problem.


These reference architectures always look less like anything a sane person would come up with than an opportunity to cram every possible cloud service into a solution and jack the price of hosting a simple web app up into the thousands of dollars a month.


Actually when I saw the diagram at the top of the article I thought "Yup, that looks almost exactly like the architecture of the last project I worked on", as it uses all those AWS services in roughly the same arrangement.

That's not to say I disagree with your comment though...


Lots of bashing here. Please provide clear examples, based on facts. I'm not very interested in how complex you think serverless architecture seems to be. Instead, let us know: What did you try, what was the aim, why did you choose serverless, what was your experience, what did you learn? Make an addition to the discussion.

I'm a solution architect in an environment where 20 teams of five developers each have been deploying API- and web-oriented code daily for a year, on a 100% serverless architecture. The teams are able to deliver MVPs within days, from scratch. We don't pay for any allocated capacity, only usage. Aside from bonuses such as being able to spin up a complete set of services as a temporary environment, in minutes and for free, the production environment is also incredibly cheap.

I can't say whether this stack would have been as effective if we weren't at the size of benefitting from a microservice way of working, with independent team responsibilities and explicitly defined ownerships. But what I can say is that in my 12+ years in IT, I've never seen developers be this productive. Especially considering the complete development lifecycle.

I'm sure plenty of this would have been possible using a traditional architecture, but having 500 lambdas in production and knowing that you can go home and not get a phone call... pretty nice. The reason being that they are small and thus easily testable and securable, in addition to the obvious (fully managed, auto-scaling etc).

We've had to solve issues, of course. Some silly, some unfortunate. But I wanted to provide a counterweight to the apparently popular opinion that serverless is complicated. To me, that's like looking at a car and complaining that it's more complicated than a train. Both have their use cases.


thanks for this detailed and concrete counterweight opinion. I hear horror stories but my experience jibes with yours.


It's not typical (because most people don't know about it), but API Gateway can integrate directly with many AWS services without the need to put AWS Lambda in between.

If you find yourself writing many Lambdas just to do simple transforms and piping around data, it's always a good idea to check if the two services can directly talk to each other first.


The mapping template language (VTL) is frustrating to do anything in, though. Its documentation is somewhat hidden / scattered, especially for any special variables / util functions, and it suffers from limitations like the foreach operator being capped at 1000 iterations. Unless you're literally returning entire database objects (unprocessed, including all the type info) or something similar, it's pretty limited.


Yes, the usual AWS problems.

The docs for most services are really bad. I wasn't fond of learning VTL for AGW either :/


At a past job, we hired some people out of Amazon to work on our cloud services. They rewrote large portions of it using AWS serverless tools.

Cost was a big concern for us. Strangely enough, the AWS experts insisted it would be impossible to estimate the cost of the serverless architecture until we deployed it at scale and tried it out. Is this still the case, or has it become easier to estimate costs during development?

In our case, costs did go up significantly after the serverless rewrite, but they also added additional functionality and complexity that made a 1:1 cost comparison impossible.

Serverless was interesting, but if I was involved in another backend project I would want to understand the cost better before diving into the deep end with all of these intertwined services.


How long ago was this?

I used AWS for a recent personal project. Lambda was not the right approach for me, primarily due to cost. I had an idea of how long a request takes to process (ballpark estimate) and the expected throughput (requests per second). From that I knew that, at minimum, I would be paying $x a month, and as the execution time of my code went up or I got a spike in my traffic volume, it would only increase from there.

If you’re at scale, AWS Lambda blows up real quick.


That’s why you use Lambda+APIGW with proxy integration, where you can use your standard frameworks like Express, Django, Flask, or ASP.NET MVC, and you can easily move off Lambda without any code changes.


I’m not sure what specific problem is being solved with the solution you’ve written. Also, I wasn’t building a web application.


You’re not “locked in” to Lambda. When you see that Lambda isn’t the answer, you deploy the code to a server or Docker just by making a few changes to your deployment pipeline.

If you use the standard APIGW solution where it routes to your various lambdas, you would have to rewrite code.


Sorry, but maybe you misunderstood my post. I never said anything about “lock in”. I could easily estimate ballpark minimum costs up front. There’s absolutely no reason to continue using Lambda at that point.

Lock in or no lock in, there’s also a cost to writing the deployment scripts and setting up your development vs. production Lambda stack. And if you are using Lambda, you’re going to end up making various decisions to keep your costs low. For example, I was working with a JVM stack. Spring is too heavy, and even Guice has some execution cost. So on Lambda, I would use Dagger... a decision made purely because I’m operating on Lambda.

Building an architecture on Lambda requires certain decisions to be made up front. Saying that you can just lift and shift later on is very simplistic and will be costly depending on what you’ve already done on Lambda.

I would always say estimate your costs first and think about the long term picture about your request rates, patterns (spikes), and growth rates.


Setting up your lambda deployment scripts is setting your CodeUri value in CF to match the output directory of your build and running CloudFormation package and deploy.

Changing your deployment to use Docker/Fargate is creating your Docker container by copying from the same output directory to your container in your Docker file, pushing your container to ECR, and running a separate CloudFormation deploy to deploy to Docker.

I’ve deployed to both simultaneously.

There are no decisions to be made up front. You create your standard Node/Express (for example) service and add four lines of code. Lambda uses one entry point and Docker or your VM uses the other.
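Something like this sketch of the proxy-integration entry point, assuming TypeScript and the awslabs aws-serverless-express helper (linked elsewhere in this thread); app.ts is a hypothetical module holding your ordinary Express application:

    // lambda.ts -- the Lambda entry point; a plain server.ts can call app.listen() instead.
    import awsServerlessExpress from "aws-serverless-express";
    import { app } from "./app"; // hypothetical: your unchanged Express app

    const server = awsServerlessExpress.createServer(app);

    // API Gateway proxy events are translated into normal HTTP requests for Express.
    export const handler = (event: any, context: any) =>
      awsServerlessExpress.proxy(server, event, context);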

It’s simplistic because it is simple. Java is never a good choice for Lambda.


>Setting up your lambda deployment scripts is setting your CodeUri value in CF to match the output directory of your build and running CloudFormation package and deploy.

What about writing CF to begin with for all your Lambda functions and setting up all their permissions to work alongside whatever other resources you have?

>Changing your deployment to use Docker/Fargate is creating your Docker container by copying from the same output directory to your container in your Docker file, pushing your container to ECR, and running a separate CloudFormation deploy to deploy to Docker.

Nobody said anything about Docker or Fargate. This has nothing to do with the topic. My problems had no need for using Docker. Do you randomly pick a tool kit and just hope it works out? I suspect you're not really working at scale and costs aren't a concern.

>I’ve deployed to both simultaneously.

Great job. We are all proud of you I guess, but this is not how engineering works, especially when you're building at scale.

>It’s simplistic because it is simple. Java is never a good choice for Lambda.

It's actually not simplistic. Software engineering is about being able to make tradeoffs. Not everything is a CRUD/web application. It's funny you keep going back to Node/Express as your examples. Have you looked into Express and what dependencies it has, or do you just randomly pick the new framework of the week? I don't mean this as an attack, but I'm more than slightly annoyed that you keep referring to Node and Express, when that has nothing to do with the problems I'm trying to solve.

The funny thing is you've written out all these lengthy posts but never stopped to ask about whatever constraints the system has. You started with the solution and are basically now looking at one of my constraints (JVM ecosystem) and are saying it's not a good choice.

In the real world, we often don't have the ability to randomly swap out a language. There's enough properly tested code that already exists or a massive code base that entire teams are supporting. They're not going to drop that just so they can go pick up a shiny new framework or product offering. LOL.


> What about writing CF to begin with for all your Lambda functions and setting up all their permissions to work alongside whatever other resources you have?

There are no “all of your Lambda functions” with the proxy integration. You write your standard Node/Express, C#/WebAPI, or Python/Flask API as you normally would and add two or three lines of code that translate the Lambda event into the form your framework expects. I’ve seen similar proxy handlers for PHP and Ruby. I am sure there is one for Java.

As far as setting up your permissions, you would have to create the same roles regardless and attach them to your EC2 instance or Fargate definition.

But as far the CF template. Here you go:

https://github.com/awslabs/aws-serverless-express/blob/maste...

Just make sure your build artifacts end up in the path specified by the CodeUri and change your runtime to match your language. I’ve used this same template for JS, C#, and Python.

> Nobody said anything about Docker or Fargate. This has nothing to do with the topic. My problems had no need for using Docker. Do you randomly pick a tool kit and just hope it works out? I suspect you're not really working at scale and costs aren't a concern.

You’re using your standard framework that you would usually use. You can use whatever deployment pipeline just as easily to deploy to EC2 with no code changes.

> Great job. We are all proud of you I guess, but this is not how engineering works, especially when you're building at scale.

Actually it is. Since you are using a standard framework choose whatever deployment target you want....

> Have you looked into Express and what dependencies it has, or do you just randomly pick the new framework of the week? I don't mean this as an attack, but I'm more than slightly annoyed that you keep referring to Node and Express, when that has nothing to do with the problems I'm trying to solve.

I also mentioned C# and Python. But as far as Java/Spring. Here you go.

https://github.com/awslabs/aws-serverless-java-container/wik...

> The funny thing is you've written out all these lengthy posts but never stopped to ask about whatever constraints the system has. You started with the solution and are basically now looking at one of my constraints (JVM ecosystem) and are saying it's not a good choice.

I’ve also just posted a Java solution. The constraints of Lambda are well known and separate from Java: a 15-minute runtime limit, limited CPU/memory options, and a 512MB limit on local /tmp storage.

> In the real world, we often don't have the ability to randomly swap out a language. There's enough properly tested code that already exists or a massive code base that entire teams are supporting. They're not going to drop that just so they can go pick up a shiny new framework or product offering. LOL.

My suggestion didn’t require “switching out the language”. Proxy integration works with every supported language, including Java. It also works with languages like PHP that are not natively supported, via custom runtimes and third-party open source solutions.


This solution is entirely wrong and doesn’t work for the problem I was trying to solve. You seem to believe every problem is a web application or a web API of some kind. The proxy doesn’t make sense in my use case, and for the simple reason that it’s cost prohibitive alone, I would pass.

Feel free to keep writing more paragraphs. You’re trying to solve problems that don’t exist.

It seems like you start from solutions and hope that the problem will fit. Lambda seems to be your hammer and you write as if everything is a nail.

Good luck!


> It seems like you start from solutions and hope that the problem will fit. Lambda seems to be your hammer and you write as if everything is a nail.

So lambda seems to be the solution and my “hammer” even though I mentioned both EC2 and ECS? Is there another method of running custom software on AWS that I’m not aware of besides those three - Docker (ECS/EKS), Lambda, and EC2?


> At a past job, we hired some people out of Amazon to work on our cloud services. They rewrote large portions of it using AWS serverless tools.

I think they probably considered you a "sucker". Think you're ever going to be able to migrate that app to anything other than AWS?


> Think you're ever going to be able to migrate that app to anything other than AWS?

Obviously, no, we did not expect to migrate our AWS Serverless backend to something other than AWS.

Avoiding vendor lock-in makes sense in a few select circumstances.

Avoiding vendor lock-in for the sake of avoiding vendor lock-in leads to over-engineered services that take longer to deliver, because people are going out of their way to avoid using vendor-specific tools.

In all of my time as an engineering manager, the number of times that avoiding vendor lock in has solved more problems than it caused is still zero.

Use the best tools available at your disposal to get the job done. Cross the vendor-changing bridge if (and only if) it becomes a requirement.


> In all of my time as an engineering manager, the number of times that avoiding vendor lock in has solved more problems than it caused is still zero.

How long have you stuck around to find out? Never had to undertake a massive task to migrate an existing system that's closely bound to its legacy platform? Because that's the other side of the coin.

I also highly suspect that number is not zero, you're just not counting the obvious decisions people make every day that avoid lock-in because they seem like common sense. But I suspect you're not building everything using ColdFusion, though, or doing your version control through Perforce...

> Use the best tools available at your disposal to get the job done.

Behind this phrase lies the fallacy that there is any such thing as "the best tool" and the implication that anyone doing anything different is a clear idiot. In reality, everyone uses "the best tool", it's just they have different metrics by which they measure it.


I think you might be thinking it is a binary choice. Educated decisions can be made about whether a particular item is worth the lock-in or not. For example, auto scaling groups in AWS are much faster to set up and maintain than vendor-neutral choices, but leaving ASGs is pretty much the same amount of work as setting up auto scaling anyway. Switching out has a cost, but not that much. Something like Cognito... that’s almost impossible to leave. But more importantly, it’s the AWS “glue” between these services that is truly impossible to leave.

I have seen two businesses fail under the weight of AWS billing and be unable to get out from underneath it before they ran out of funding. It’s definitely not something to look at lightly.


And how many companies are successful either because they started on AWS until they found product-market fit and were capitalized well enough to migrate off (Dropbox), or because they are successfully running their entire business at scale on AWS (Netflix)?

How many companies fail because they run out of funding - period?


> And how many companies

Congratulations, you found two examples out of the millions of companies that exist in this world, both with astronomically greater funding and scale than anything you or I are likely to work on. Yet people continue to obsess about these unicorns and assume the choices they've had to make are clearly the right choices for everyone.

The greatest problem with modern devops is people's delusional ideas about the scale at which they're really operating and where they're really going to start feeling the pinch. I've come across way too many tinpot "we're Netflix!" setups that crumble under their own unmanageability as soon as they have budget and staff taken away from them for a couple of quarters.


So you are claiming that there are only two companies that have successfully run a business on AWS?

The reason for managed cloud is the same reason for using any other vendor. To let you focus on your core competencies that give you a competitive advantage. Do you also think companies shouldn’t use Github, Microsoft, Workday, Oracle, SalesForce, Atlassian, etc.?

Do you know how much infrastructure you can buy for the fully allocated cost of one employee?

Maybe they know something that you don’t know...

Seeing that even Amazon admits that only 5% of Enterprise workloads are on any cloud provider, who is arguing that it is always the right choice?

I command a higher than (local) market salary for an individual contributor in no small part because of my expertise when it comes to AWS, but I’m the first person to tell someone who asks me for advice when it doesn’t make sense to bother with the complexity and cost of a cloud provider and just use a colo, VPS, or just use AWS Lightsail (AWS’s answer to companies like Linode).


> Do you know how much infrastructure you can buy for the fully allocated cost of one employee?

$10k does not get you very far with AWS.


Where are you hiring developers for $10K? Even in a medium cost of living area you’re looking at $170K/year fully allocated.


Was there a specific service or reason with high cost that stood out?


The cost became a politically sensitive, and ultimately secretive, topic after it became clear that the costs were growing.

I was less involved with that team over time. I got the impression that serverless was quick and easy for the simple server-side operations that could be mapped easily to AWS' serverless building blocks. Of course, this is where they started and showed initial promise.

It broke down as they started taking more complicated server-side functions and forcing them into serverless style. It felt like a square peg / round hole situation that ballooned in complexity just to make it serverless.

If I did it again, I'd have the teams start with the most complicated server-side functions instead of picking the low hanging fruit first.

And, of course, be open to using regular old servers where it made sense to do so. AWS Serverless provided a lot of internal political ammunition because the cloud team could show up in a meeting with complicated diagrams exactly like what you see in this article, whereas previously we just showed single blocks for servers and another block for databases. The complexity did a great job of convincing execs that the AWS experts knew what they were doing, but then of course they were committed to going with the serverless options.


How do you evaluate cost without knowing usage patterns?


Lambda is only good for non-mission-critical services where you don't care about failures or scaling.

I've re-platformed so many projects these past few years because of these issues.

I wish more would take that to heart instead of marketing for Amazon.


I used Lambdas as our core logic processor for 4 years. We cared about every failure, and the core motivation was its ability to scale, as our traffic was bursty and would scale 1000x on random days.

Lambda -- serverless in general -- is fantastic when your application has any amount of idle time (on the order of 10 seconds). If there is even the briefest moment your system can be turned off, you'll save dramatically over a traditional server model.


This is actually a decent use case for it. I should have been more nuanced in my response.

I have seen more than one project fail when they used Lambdas to serve APIs. This is partly because of the cold-start problem, but also because Lambda does have scaling issues unless you work around them. By scaling I mean greater than 1,000 TPS.

All of the services performed within Lambda's SLA but failed to meet the requirements the project had.

The solution was always to wrap whatever the function did in a traditional app and deploy it as a containerized service, at which point response times dramatically improved and the services became more reliable.

The idea behind lambda is good, but it can't beat more traditional stacks at the moment. One could argue that most projects don't have these requirements, and I would agree, but the marketing behind lambda doesn't make that clear.


Examples?


Not a system architecture guy. When the industry says "serverless" they really mean hardware-less? It means you are renting on other people's servers?


I hate the "serverless" term, but I think people just use "serverless" for platforms where you can write some code and it will be automatically deployed, hosted and scaled on their server(s). So instead of "serverless" it's more like pre-configured, managed server clusters where you can only run restricted code/functions.

I personally don't understand who the target audience is, as small apps/companies very rarely (if ever) need to scale and large enterprise companies are probably better off hiring a sys admin and getting their own dedicated servers, which would lead to a lot more flexibility, performance and reduced costs.


Serverless means you don't manage servers. For instance, you create containers but don't deal with the hosts.


How do you manage cost, then? If you host your container on a DigitalOcean VPS, you can just spin one up in a few clicks and you have consistent, predictable pricing. Who would be the target audience using "serverless" over spinning up a few virtual or dedicated servers? I haven't heard of any company that needed to instantly scale from 1k to 1m users overnight.


I'd hate to be the maintainer of that mess, good luck!


The entire system is described by a few CloudFormation templates that you maintain in source control, plus the actual application code running in the Lambdas; these frameworks are super-thin.
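
To make "super-thin" concrete: behind an API Gateway proxy integration, a handler is usually not much more than the following (a rough Python sketch, with the payload shape made up for illustration):

    import json

    def handler(event, context):
        # API Gateway's proxy integration hands the HTTP request to the
        # function as `event`; the body arrives as a JSON string.
        body = json.loads(event.get("body") or "{}")
        # ...business logic goes here...
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"ok": True, "echo": body}),
        }

Everything else (routing, TLS, scaling) is declared in the template rather than in code.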


Maintainable and reproducible, just like Gulpfiles.


Unless something changes in gulp, gulp packages, OS, your codebase, disk write speed or network latency...

Or was your comment sarcastic and I didn't get it? Whenever I tried to run the same gulp tasks on a different machine, it almost never worked first-try.


Any diagram like this should include an estimate of usage, and how much that usage will cost, for every "block of architecture" displayed. Absent that, it's impossible to tell whether it's a good idea or a bad idea.

Moreover the cloud providers should make it harder for an uncontrolled "block of architecture" to accidentally spend too much money. The focus seems to be on "always available", but depending on how fast it's spending your money, it might be better if it crashed.

Expanding further, perhaps each "block of architecture" should have a separate LLC dedicated to it to control billing liability. Incorporate your "lambda fanout" to keep it from bankrupting your "certificate manager" when it goes haywire, and let it go out of business separately.


AWS includes a boatload of free calls in both Lambda and API Gateway per account. So set your API Gateway call limit at/below the free limit and Bob’s your uncle.
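
One way to express that cap (assuming your clients go through API keys attached to a usage plan) is a throttle plus a hard quota; a rough boto3 sketch, with placeholder IDs and numbers:

    import boto3

    apigw = boto3.client("apigateway")
    apigw.create_usage_plan(
        name="free-tier-cap",
        apiStages=[{"apiId": "abc123", "stage": "prod"}],  # placeholder API/stage
        throttle={"rateLimit": 10.0, "burstLimit": 20},    # requests per second
        quota={"limit": 1_000_000, "period": "MONTH"},     # hard monthly cap
    )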

What’s harder is when your app hits an Amazon limit and you have to figure out if it’s hard or soft, how quickly your TAM can do something about it, etc.


And it only costs $8000/month to host your website, yay!

The overuse of microservices and cloud computing is terrible.


I wonder what the difference between managed serverless and having an EC2 instance w/ Firecracker loaded up on it would be.

I've used Lambda and I fear maintenance burdens due to the underlying environments migrating; they'll keep the runtime for a running function for you but if it's old I don't think you can update that without having to upgrade runtimes.

Also you get billed in 50ms increments at minimum, and so beyond that there's little incentive to make a function faster / a lot of incentive to stuff more logic into a function and make it less composable.

I think I'd rather define an AMI w/ Firecracker using Packer, and stand up a box with Terraform, and write logs to CloudWatch, and have a much more predictable billing cycle / greater control.


Rolling your own firecracker solution would put the control in your hands, but then aren't you limited to .metal EC2 instances (with all the cost increases that come with it)?


I don't get why you have to go with .metal EC2 instances only, can you elaborate?


Firecracker uses KVM to implement virtualization to launch containers inside "micro" VMs. As far as I know, Amazon only gives you virtualization extension access on bare metal instances.


> I've used Lambda and I fear maintenance burdens due to the underlying environments migrating; they'll keep the runtime for a running function for you but if it's old I don't think you can update that without having to upgrade runtimes.

AWS supports runtimes as long as upstream security updates are available [1]. If there are no security updates anymore, it's a good idea to update to a newer version anyway. Of course you don't have to do that if you host it on your own, but it's a good idea nevertheless.

Also note that most of the runtimes AWS has deprecated so far are old NodeJS versions. That's apparently because NodeJS only offers 30 months of support for its LTS versions [2]. So choosing a language with longer support (for example Python, which offers 5 years of support [3]) might be a better choice if you don't want to update your software regularly.

You could also provide your own runtime via custom runtimes [4], which wouldn't be deprecated at all.
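
If you want to know how exposed you are to a runtime deprecation, a quick inventory is easy to script; a sketch with boto3 (not tied to any of the links below):

    import boto3
    from collections import Counter

    lam = boto3.client("lambda")
    runtimes = Counter()
    for page in lam.get_paginator("list_functions").paginate():
        for fn in page["Functions"]:
            # Custom runtimes show up as "provided"; anything else is managed.
            runtimes[fn.get("Runtime", "unknown")] += 1

    for runtime, count in runtimes.most_common():
        print(f"{runtime}: {count}")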

> Also you get billed in 50ms increments at minimum, and so beyond that there's little incentive to make a function faster / a lot of incentive to stuff more logic into a function and make it less composable.

You get billed in 100ms increments [5]. While I'd like to see smaller increments as well, I've encountered more money thrown out of the window by overprovisioning memory (and therefore CPU) for AWS Lambda functions than you could possibly save by optimizing for billable duration.

[1]: https://docs.aws.amazon.com/lambda/latest/dg/runtime-support...

[2]: https://nodejs.org/en/about/releases/

[3]: https://devguide.python.org/#status-of-python-branches

[4]: https://docs.aws.amazon.com/lambda/latest/dg/runtimes-custom...

[5]: https://aws.amazon.com/lambda/pricing/


Hmm, you make some good points w.r.t. the custom runtime. I do kind of wish that AWS would maintain its own LTS releases of things like node.js, but I guess it's difficult to say whether that would require source mutation from the user.

I do wish there were more examples of Lambda functions, like a marketplace of some sort. I'm using NodeJS only because certain tutorials offer examples in node.js (like static site authentication): https://douglasduhaime.com/posts/s3-lambda-auth.html

I actually logged into CloudWatch just to check that point about 50ms vs. 100ms, and yeah, you're right, it's 100, not 50. I'm pretty new to Lambda, and I wonder whether the billing insights dashboard will give me the breakdown between duration and memory.

Thanks!


If you want to figure out if you overprovisioned memory for your AWS Lambda functions: There is a pretty handy pre-defined query in CloudWatch Insights for that.
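
From memory it looks roughly like the query below, and you can also run it from code if you want to sweep many functions; treat the field math as approximate and check the console's sample queries for the canonical version:

    import time
    import boto3

    logs = boto3.client("logs")
    query = """
    filter @type = "REPORT"
    | stats max(@memorySize / 1000 / 1000) as provisionedMB,
            max(@maxMemoryUsed / 1000 / 1000) as maxUsedMB,
            provisionedMB - maxUsedMB as overProvisionedMB
    """
    resp = logs.start_query(
        logGroupName="/aws/lambda/my-function",   # hypothetical function name
        startTime=int(time.time()) - 7 * 24 * 3600,
        endTime=int(time.time()),
        queryString=query,
    )
    time.sleep(10)  # Insights queries are asynchronous; poll properly in real code
    print(logs.get_query_results(queryId=resp["queryId"])["results"])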


If your load is within a reasonably small range, Lambda is quite expensive on a per-computation basis. But, if your load varies wildly (and especially unpredictably), you can spend less time worrying about that than on a build-similar on EC2 solution.
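
A back-of-envelope comparison of the two regimes (ballpark list prices, rounded; they will drift):

    LAMBDA_GB_SECOND = 0.0000166667   # USD per GB-second, approx. list price
    LAMBDA_REQUEST   = 0.0000002      # USD per invocation
    EC2_SMALL_HOURLY = 0.02           # e.g. a small on-demand instance, approx.

    def lambda_monthly_cost(invocations, avg_ms, memory_gb):
        gb_seconds = invocations * (avg_ms / 1000.0) * memory_gb
        return gb_seconds * LAMBDA_GB_SECOND + invocations * LAMBDA_REQUEST

    print(lambda_monthly_cost(1_000_000, 100, 0.25))    # quiet month: ~$0.62
    print(lambda_monthly_cost(100_000_000, 100, 0.25))  # busy month:  ~$62
    print(EC2_SMALL_HOURLY * 730)                       # always-on box: ~$15/month

If every month looks like the busy one, the always-on box wins; if most months look like the quiet one, Lambda does.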


I'm using Lambda@Edge for static site auth, so I just need to pass some HTTP headers, and it still seems a bit overkill. I really like the ability to integrate with a CDN, I don't think I want to move away from that to something like nginx that could fall over and be more expensive, but being billed for every signin I do is a bit nerve-wracking. Maybe it's just new and I don't have enough experience with it yet.


That sounds like more of a maintenance nightmare. Why go through the hassle instead of just using Docker/Fargate with either ECS or EKS, if you don't want to use Lambda?


I've only used Docker w/ ECS and EC2 task definitions, but creating and draining instances is pretty time-consuming, and I'm guessing not responsive enough for serverless-specific workloads. If you use Packer, lock down your AMI, and update it atomically, I don't see maintenance as being that big of a hassle, but I don't have much experience in this regard and I value optionality to a high degree. I guess this would also depend on how many microVMs you could reliably stick on an EC2 instance.


I have serverless functions that I've written a few years ago and forgot about. They are still working when needed, costing me nothing when not in use. How much does your server cost? Is it patched? secured? Does it have a name?


In my startup, we began with a monolithic back-end where we could refine our product and prove market fit. At first, this was indeed fast to develop and simple.

The company grew and the product saw heavier use. The original monolithic server became slow and problematic. Hosting cost became very high. Separate teams ran into conflicts while working on the server.

So we have incrementally switched to an FaaS-centric architecture, similar to the one in the article. The new design makes it easier to get excellent stability, performance, cost and observability. Now that the development team is a little bigger, FaaS/microservice design makes it easier for the teams to work without blocking each other. I’m sure that same benefits could be achieved in a traditional monolithic server with careful design and control of the development process, but it seems easier and quicker to accomplish with FaaS.

Indeed, a simple REST API sometimes requires dozens of handlers. One might worry that this is complicated because each handler is somehow running in a separate container... but that’s really the cloud provider’s problem. Execution is very finely segmented, while logical division of the code-base is generally free to follow natural domain divisions. It’s pure benefit to us: endpoints have excellent isolation. We have excellent control for behavior in overload scenarios.

FaaS handlers tend to be simpler than each action in the former monolithic server. With simplicity, it’s easier for us to ensure handlers follow a common set of best practices. With simplicity, consistency, and fine-grained isolation it is easier to reason about the performance and capacity of the system. While there are scenarios where FaaS is prohibitively expensive, our use stays within its sweet spot and is significantly less expensive than running the previous monolithic server.

A downside of FaaS is that it quickly became impossible to run the entire system on a developer laptop. We really need to run things on the cloud infrastructure for any kind of integration test. So far, it has been hard for us to provide developers or test infrastructure with independent Integration test sandboxes. Vendor lock-in is a problem: it will be hard for us to change cloud providers if ours becomes problematic.


I built a FaaS-based project and it runs locally using docker-compose. When running locally it just switches to a persistent mode. It's not nearly as large as yours, I suspect, but I'm wondering what problem you're running into?


Maybe we have too much reliance on proprietary technologies of our cloud host. More likely our problem has to do with the order in which we learned to build certain things relative to the growth of the organization. Or maybe we just haven’t tried hard enough to set up a local build of our system.


Ah, fair enough I could see that. We do use Dynamo, S3, and SQS, but there are local versions of all of those.
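
For anyone curious how the local mode works, the usual trick is to point the SDK at local emulators via endpoint_url; a sketch using the emulators' common default ports (your wiring will differ):

    import os
    import boto3

    LOCAL_ENDPOINTS = {
        "dynamodb": "http://localhost:8000",  # DynamoDB Local's default port
        "s3": "http://localhost:4566",        # LocalStack's default edge port
        "sqs": "http://localhost:4566",
    }

    def client(service):
        if os.getenv("RUN_LOCAL") == "1":
            return boto3.client(
                service,
                endpoint_url=LOCAL_ENDPOINTS[service],
                region_name="us-east-1",
                aws_access_key_id="local",        # emulators accept dummy creds
                aws_secret_access_key="local",
            )
        return boto3.client(service)

    print(client("dynamodb").list_tables())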


At what point do you just run a "normal" application instead of this giant spaghetti mess? Is this really cheaper compared to ecs or eks? What do you get out of this?

This can easily be 1-2 very simple services.


When your business API is one "get data" endpoint and one "post data" endpoint, I find 100% serverless passable.

I'm starting to accrue a list of projects that started out looking like that, with product owners swearing blind that was the sum total of all they could want, but ended up becoming much more complex. Then you get into the pain of sequential lambda cold starts, etc and before you know it everyone thinks you're in too deep to turn around and the pain just won't go away.

Just one more AWS service and all our problems will be solved...


At best, our children will look back at this and laugh. Most likely they won’t though because this whole pile of garbage won’t exist anymore and will be long forgotten.


> and their pay-per-use pricing model

Sure, if you want to inflate developer costs. Isn't a huge portion of the cloud argument that people are expensive, infra is cheap so who cares if you spend a lot on AWS as long as you have fewer sysadmins? If you suddenly care how much you're spending on infrastructure but, apparently, don't care how much you spend on developers to work within such a convoluted system, why not exit the cloud at that point?


Everything is convoluted if you don't know what you are doing. For instance, I find the entire modern front-end development ecosystem a royal clusterf*&$, but people are using it effectively every day.


I have to say I saw that diagram first and started reading with the understanding that this was satire. Quite shocked this seems to be for real.

At this point, it might be feasible to reclaim "serverless" as the domain of p2p systems, as some others here have suggested. The non-p2p folks have clearly given up on the "it's too complex to reason about" line of argument, so now might be the time. :)


I always wonder how well these systems will do once they become legacy. It seems to me that if it's difficult to maintain old COBOL apps, then maintaining such a complex architecture a few decades from now will be a lot of fun.


All I see is lots of brittleness and potential failure points.


Details?


It is known that the more moving parts a system has, the more likely it is for one of them to fail, thus for the entire system to malfunction.
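
The back-of-envelope version of that claim, for independent parts with no redundancy (a toy model, not a statement about any particular architecture):

    def p_any_failure(n_parts, p_each):
        # Probability that at least one of n independent parts fails in
        # some window, if each fails with probability p_each.
        return 1 - (1 - p_each) ** n_parts

    print(p_any_failure(3, 0.01))    # ~0.03  -> about 3%
    print(p_any_failure(30, 0.01))   # ~0.26  -> about 26%

Redundancy and graceful degradation change the picture, but that is exactly the extra engineering the extra parts demand.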


So do you have any practical examples from actually implementing anything or is this just all theory?


There were several articles regarding microservices recently, for example: https://www.infoq.com/news/2020/04/microservices-back-again/

Also, this "simple is better" theory has been proven to be true again and again in a vast range of disciplines and use cases.


Because it’s an anecdote on the internet it must be true.

I can give you just the opposite anecdote, and from personal experience, not from something I found on the internet.

We have a lot of microservices that are used by our relatively low volume website, but also used by our sporadic large batch jobs that ingest files from our clients and our APIs are sold to our large B2B clients for their high traffic websites and mobile apps.

When we get a new client wanting to use one of our microservices, usage can spike noticeably. We have deployed our APIs both to Lambda, for our batch jobs, so we can "scale down to 0" and scale up like crazy where latency is not a concern, and individually to Fargate (serverless Docker). Would you suggest that it would be architecturally better to have a monolith where we couldn't scale and release changes granularly per API?


You constructed a straw man and then attacked it. There are some places where an architecture as you described makes a lot of sense and is probably the best solution. However, it does not change the reality that this type of architecture is often brittle and hard to debug because the experience needed to actually operate it is not present within the company that adopts it. Similarly the lego-block nature of these cloud components leads to further quickly glomming on of additional pieces without understanding the overall impact those pieces will have on the existing system.

I've seen it time and time again. Anecdotal of course, but we're all just shooting off anecdotes on HN anyways.


So I constructed a straw man when the original post was a very hand-wavy "It is known that the more moving parts a system has, the more likely it is for one of them to fail, thus for the entire system to malfunction"?

But how, pray tell, did I "construct a straw man" with a one-word reply - "Details?"

And then when I asked for personal experience, a random article from the internet was produced. At least I was able to speak from personal experience. I can also go into great detail about best practices for deployments, logging, security, troubleshooting, testing (functional and at scale) and tracing without citing random articles....


What do the bills look like for a typical / best-practice serverless application?


One of the biggest problems with serverless on AWS with CloudFormation is how ridiculously slow your deployments get. The other problem is long-tail latencies on Lambda invocations, and it gets worse if you want to run services inside a VPC.

Serverless isn't going anywhere until we get something like Kubernetes for functions.


TFA needs more exclamation points!


The premise under which cloud has been sold to business stakeholders is dramatically reduced cost, mainly through not having to staff up for a number of technical positions, and it works for certain use cases. But I'm not seeing real payoffs for many less-tactical, broad-scale use cases.

What appears to happen is that the "get going coding" part is dramatically accelerated; that saves on the order of days of waiting, or a few weeks/months if tech stacks the organization hasn't worked with before are in play. But man, once in production, I'm still puzzling over how to manage the operational costs.

There is a huge disconnect in how the business community I'm collaborating with perceives cloud. They're taking the delivered "get coding fast" benefits, which do save time and money: we only pay for infrastructure from the moment devs start hammering fingers on keyboards, so to speak, instead of waiting around for teams to stand that infrastructure up and paying for it while it isn't delivering benefit. They take that experience and apply the same expectations to the operational, production-support side.

I don't know how others are delivering on the operational and production-support benefits, but nowhere near the same scale of savings is happening so far for teams I work with. What I'm experiencing is that while there are marginal savings on the nodes of services (the actual services themselves), we're adding a lot of high-end staff to payroll and spending a lot of their time on all the connections and complexity between those services; between troubleshooting and adding new capabilities (more troubleshooting), it eats up a lot of expensive staff time.

I'm pushing for greater automation through DevOps, but that's encountering lots of resistance now, given the economic outlook, because DevOps generates a net increase in absolute payroll even with a marginal decrease in operational staffing once the automation is burned in. In fact, because automation takes away the mundane operational work, we end up having to cost-justify far more capable and expensive operational staff who can handle the corner cases that emerge as business-as-usual activity after automation has eaten all the usual cases. While headcount goes down, total payroll goes up, and budget-watchers resist the idea that they're saving overall by getting more stable, consistent, quality service with more feature points per dollar, and that if we carried on the old way our costs would easily be triple to an order of magnitude more. This is a marketing/education perception problem to work on nearly daily.

I love working with the cloud tech stack, but at the end of the day I have to show my business stakeholders it really did save net budget. For judiciously-targeted and considered projects where I've had a lot of input tactically it has worked really well for them and me. For strategic-level "Cloud All The Things!!@!%!" initiatives, the cost savings get way murkier to discern; this is the perception problem on steroids. I'd love links to detailed readings from people who have been in the trenches of successful strategic-scale "we went to the cloud whole hog" efforts, describing the traps and pitfalls to avoid and how they showed a net benefit.


This article is a blueprint for not only creating a massive amount of technical debt, but ensuring your startup's entire opex is spent on AWS.


This article is clickbait for everyone used to monorepos and traditional software architecture. It gives them a place to rip on new technologies, much like React and the trope about how many JS frameworks there are. The truth is that software like this is the future; no amount of get-off-my-lawn complaining will change that.


Can you elaborate why you consider this to be the future? For almost all software projects which are not at Google-scale you can always get away with just hosting a simple app on a few VPS or dedicated servers and be able to handle millions of monthly users.


[flagged]


Serverless is a pretty ubiquitous term meaning the servers are fully managed for you and abstract services are made available instead.


On servers you don't manage.


Right, rather than managing your own server with a known interface and familiar commands, you get the pleasure of managing everything through a single bloated configuration file typically with very poor documentation.


There are plenty of plug-ins/linters for VSCode that provide inline documentation for CloudFormation.



