I've been wondering about this. I guess it's now considered old-fashioned to use environment variables and bare EC2 servers, and to manage your own APIs, websockets, and DB on the same server as opposed to breaking everything out. You're supposed to use CloudFormation, and it's "oh, did you know there is an AWS service for that?" Then you are using 5 services instead of 2. Then there's this twelve-factor app concept. I don't know when the right time to do this is, or at what scale.
My last startup was spending stupid money on AWS services and horizontal scaling for an app which had, when I left, about 100 users and maybe 2 concurrent at a given time. And they had been doing this since before I joined, when the numbers were much lower. The complexity was idiotic and no one but the devops guy who set it up could grok it. We still managed to have frequent downtime.
My two cents: until you hit a scaling wall (and when you do, congratulations, you are either successful or a video streaming platform), a single big server that you upgrade as needed is the best way to go, so you can focus on building the product.
When you do hit performance problems, any sysadmin could help you handle 2x/4x/10x scaling with simple separation of services and maybe some hours of downtime. In the meantime you probably have weeks or months to think about going really crazy with your infrastructure.
What kind of numbers am I looking for? Thousands of users? Ops/sec? I realize it partly depends on what your thing is doing, e.g. a website vs. a complicated app.
Yeah it is a good problem to have assuming you have cash flow.
Of course it may vary a lot, but I would say a hundred thousand users (100,000).
To explain the number: I assume a user spends 1% of their time on your app (that means the average user spends around 15 minutes per day, EVERY DAY, on your app; that may not sound impressive, but it already is), and an nginx server on a powerful machine could handle around 1k connections at the same time.
If you want to dig into a far less abstract example of numbers, I remember Stack Overflow published their settings:
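For anyone who wants to plug in their own numbers, here is a minimal sketch of that back-of-the-envelope estimate. The 1% figure and the 1k connection limit are the rough assumptions from the comment above, not measurements, and the function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def minutes_online_per_day(percent_of_time_online: float) -> float:
    """Minutes per day an average user spends in the app."""
    return MINUTES_PER_DAY * percent_of_time_online / 100


def average_concurrent_users(total_users: int, percent_of_time_online: float) -> float:
    """Average number of users online at once, assuming usage is spread evenly over the day."""
    return total_users * percent_of_time_online / 100


def users_supported(max_concurrent_connections: int, percent_of_time_online: float) -> float:
    """Total user base one server can carry before concurrency hits its connection limit."""
    return max_concurrent_connections * 100 / percent_of_time_online


# Assumed inputs from the comment: users online 1% of the time, ~1k concurrent connections.
print(minutes_online_per_day(1))             # 14.4 minutes, i.e. "around 15min per day"
print(average_concurrent_users(100_000, 1))  # 1000.0 concurrent users
print(users_supported(1_000, 1))             # 100000.0 total users
```

Real traffic is peaky rather than evenly spread, so the true ceiling is probably lower than this average-based figure suggests.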
When your service can no longer process requests as fast as they come in, you've hit a wall. Until then the simple solution is to allocate more resources to your service (i.e. scale vertically).
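One rough way to put a number on "hitting the wall" is to compare the rate requests arrive with the rate the service can finish them. This is only a sketch: the function names and the example rates are hypothetical, and a real service would measure these rather than assume them.

```python
def utilization(arrival_rate_rps: float, service_rate_rps: float) -> float:
    """Fraction of capacity in use: incoming requests/sec over requests/sec the service can finish."""
    return arrival_rate_rps / service_rate_rps


def has_hit_wall(arrival_rate_rps: float, service_rate_rps: float) -> bool:
    """True once requests arrive at least as fast as they can be processed."""
    return utilization(arrival_rate_rps, service_rate_rps) >= 1.0


def backlog_after(arrival_rate_rps: float, service_rate_rps: float, seconds: float) -> float:
    """Unprocessed requests piled up after `seconds` of sustained load (0 if capacity keeps up)."""
    return max(0.0, (arrival_rate_rps - service_rate_rps) * seconds)


# Hypothetical server that can finish 1000 requests/sec.
print(has_hit_wall(800, 1000))        # False: still headroom
print(has_hit_wall(1200, 1000))       # True: the queue only grows from here
print(backlog_after(1200, 1000, 60))  # 12000.0 requests waiting after one minute
```

Scaling vertically raises the service rate in this picture; it works until the next bigger machine no longer exists or costs too much.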
There's obviously a balance to be struck but the more often you do something the easier it is to do. If you have a simple app with a db, backend, frontend and proxy, vertically scaling every time you hit that wall is going to be very painful. A little complexity goes a long way - using a managed system adds a very small amount of complexity in exchange for some breathing room when you need it. The last thing you want when your service is in a death spiral is to start thinking about the practicality of migrating your db and taking the hours of downtime/working overnight/weekends to do it at a convenient time for your customers.