Scalability is overrated (waseem.substack.com)
628 points by mooreds on Feb 4, 2023 | 232 comments



After nearly two decades in early stage startups I couldn’t agree more. Looking back, I know we often built too much too soon, and had too much confidence that we were building the right thing.

These days I often advise would-be founders to start by doing their idea manually with just the minimum amount of tech.

Maybe just a Google Sheet, answering emails, and a handful of “customers.” If you can’t give people a good experience with a totally custom, personal approach, they’re not going to like it any more when it’s automated and scalable.


> too much confidence that we were building the right thing

SW is weird in that way: we can basically build houses that nobody wants to live in. Thing is, you probably were building the right thing, structurally speaking.

My rule over time has become "structure follows complexity", i.e. anticipate complexity where possible, but don't apply the engineering structure until the complexity really exists. If you built an MVP house, don't add the second bathroom until you've got customers complaining about deadlocks on the first.

It's tough because that basically runs counter to the Spirit of an Engineer, which is just to imagine the grandest well-thought-out + organized thing and march toward creating it.

The bonus of having two decades+ in building SW though is you start to see the patterns and realize that the same fundamental needs drive everything, so the intersection of building something great with customer needs becomes less of a mystery and more of a "how fast can we juice this thing". At that point I think scalability prep is like compound interest.


There’s a ninety-ten rule with structure that can be hard to communicate in code reviews. There’s a subtle difference between not adding unnecessary structure and writing code that is antagonistic to it being added later. Opening a PR where someone added a feature or fixed a bug exactly where I would have done so is one of the best feelings, especially when they’re changing my code. Either they’re a very good coworker or I’ve laid the groundwork well. Either of these is good news.

Very recently I’ve come to think of my architecture style as Jazz. Like the old trope about listening to the notes he isn’t playing. But also the improvisational nature and the flexibility to adapt to new information.


> There’s a subtle difference between not adding unnecessary structure and writing code that is antagonistic to it being added later

This is such a wonderfully concise way of describing something that I try to teach people.

I have come to use the phrase "no regrets". Leave things out, but do it in a way that you won't regret it, if you need to add it later.

Another phrase that I read somewhere else was, "Weeks of programming can save you hours of planning" (pretty sure I read it on HN, and I wish I had kept the link). The point being that when you decide to leave something out, or when you decide to intentionally do a half-assed job of something, you should still think about how someone in the future would go about efficiently fixing it.

If you do it right, there's missing functionality, but little or no technical debt. If you do it wrong, you miss out on functionality and also take on technical debt. And I think that's the subtlety that you speak of.


Most people who push for early design understand that they have more control of the project at the beginning, before deadlines accelerate, before a bunch of bad hires and competing priorities. And they are definitely speaking from a place of regret: this hurt so much last time that they vowed never again.

There’s a list of things I won’t try to add on later. It’s smaller than it used to be, but the shiv I brandish at you for suggesting we cut those corners is also much scarier.


But the later you design something, the more info you have at hand to make the design fit better.


Bingo.

Jim Highsmith has the best theory I’ve heard on the subject. His opinion is that the reason we keep having the same unwinnable arguments is that we think in terms of solving problems, but the real issue in front of us is resolving paradoxes. There are no neat answers that fit in a short essay box on the test: everything depends, everything depends on everything else, and something is always changing.

What the PTSD reactions come from is other people making decisions on behalf of the rest of the team without consent. But like most things in software it’s a two-dimensional problem, not a one-dimensional one. The trick is to delay irreversible decisions as long as possible and get all responsible parties involved.

The flip side of the coin is that reversible decisions should be made as cheaply as possible. For instance, my “Kobayashi Maru” solution to the bike-shedding problem is that the team should have budgeted for three colors of paint in the bike shed budget, picked green for the first coat, and moved on immediately. That $200 is far cheaper than the time and, more importantly, the energy that would go into discussing it at a meeting with senior staff members present. If you start thinking about all of the $5000 conversations you have to solve $2000 problems, it’ll drive you mad. I almost recommend you don’t look at it, for your own safety. Almost.


> The trick is to delay irreversible decisions as long as possible and get all responsible parties involved

This is exactly right, but the irreversible decisions are often horizontal slices of software design that underlie other decisions - things like authz, IPC, API design, etc. If these questions can be identified and resolved up front, the cost of building vertical slices - features - goes way down. You're not tempted to reinvent the wheel; you can focus on the business logic.

What I'm saying is: it's worth spending time planning the foundations of a product. That includes the technology choices, but also the high level logical layout of the whole product. If you approach this foundational stuff like any other project, then if you're doing it right, you'll "get all responsible parties involved", like you should.

But defer the detailed planning of domain logic until you're actually ready to write it, and remain flexible with the plans you do make. "Plans are nothing; planning is everything".

Because like the GP said,

> the later you design something, the more info you have at hand to make the design fit better

I find it remarkable how often engineers just leap in and start coding, and worry about the foundations later. Or never. I suspect this is one reason why rewrites are so common. If you start with bad foundations, it's often easier to blow up the whole building and start again.


> I find it remarkable how often engineers just leap in and start coding, and worry about the foundations later. Or never. I suspect this is one reason why rewrites are so common. If you start with bad foundations, it's often easier to blow up the whole building and start again.

To be fair, leaping in and starting to code lets you discover the problem better. The problems start when you get stuck with your initial design and don't adjust/remake it once you've explored some ideas, maybe found some new ones along the way, and found out what doesn't fit.

That is of course problematic if you have deadline-driven development, because rewriting the last 2 weeks of code might seem like a total waste, even if the then-better design is going to save you much more, even in the near future.

Not rewriting code when design flaws appear just leads to rewriting it later anyway, at greater expense.

Of course, all of that depends on how much you know about the domain and the problem. Exploring existing solutions and their flaws, or having really detailed requirements, might be a much better approach that doesn't require starting to write code.


Yes, 100%.

Some things you can cut corners and save time.

Some things, when you cut the corners off you also cut your fingers off.

We have all promised that we will “fix it later”. But whatever we do now is what’s done. Fixing it later is a promise whose resolution is often not in our control.


I suspect this dynamic is a big part of why Joel Spolsky jokes about transforming from a technical leader into an amateur psychologist. We are all lying to ourselves first and to others second.

Or as Feynman said, the trick is not to fool yourself, and you’re the easiest person to fool.

Or a personal favorite, unattributed: I know the guy who made this plan [me] and he’s full of shit.


I believe that quote is thought to have come from Frank Westheimer, or at least he is associated with an earlier version of it: “Why spend a day in the library when you can learn the same thing by working in the laboratory for a month?”

https://news.harvard.edu/gazette/story/2007/04/frank-h-westh...


I’m quite fond of a version of this somewhat popular in Ops circles: why spend an hour doing a task that you can spend a week automating?


It's definitely on HN, found the quote on the following page: https://news.ycombinator.com/item?id=12001925

Not sure what the actual source is.


"YAGNI, but plan to be wrong"


Jazz is an excellent analogy. Been in software dev for 30 years. It's hard to explain to hirers that one can only play jazz well (i.e. know what matters in a software project as you go) after a long time, so trust me :)

I tried to bottle it in this document: https://pcblues.com/assets/pdf/approaching_software_projects...


I realized at some point I got put in charge of things to shut me up. Someone would ask me why and I would launch into a treatise on all of the confounding factors and cognitive science reasons and and and OMG shut up already. Clearly you care about this more than I do, why don’t you do it?

Don’t put a hyperkinetic person in charge of something thinking that you’ll call their bluff.


>It's tough because that basically runs counter to the Spirit of an Engineer, which is just to imagine the grandest well-thought-out + organized thing and march toward creating it.

I'm not sure if Spirit of an Engineer is a book, essay, or some prior set of principles I'm unaware of, but if it's not (and even if it is), I tend to disagree here.

The spirit of an engineer, in my opinion, is to solve problems - problems for people. That's what I and most engineers I've spoken with have been drawn to (ignoring those purely seeking a high-paying profession, since we're talking about "spirit"). When I was youthful I sought out technical complexity and grandiose solutions (I admittedly still admire it when it's done correctly). Nonetheless, to me, most of it is wasteful, and at the end of the day I want to build the most useful solution for people, to make their lives easier.

Some problems require high complexity and glass cathedrals in terms of technical solutions, most really don't, at least most of the problems I've been exposed to.


Actual engineers generally do try to solve problems for people. But the engineer's understanding of the people they are solving it for is generally rather flawed. And therefore their solutions are frequently difficult for the intended users, for reasons the engineer can't see.

And yes, engineers really do err on the side of designing for future problems and future users that often will never exist. This effect is worst on a rewrite of a system - look up the "second system effect". We tend to only truly get pragmatic tradeoffs the third and later times.


Maybe “spirit of an engineer” could be better phrased as “visionary mindset”. Lots of engineers I’ve worked with imagine potential problems and treat them as requirements to solve even though the problem may never actually materialize.


Figuring out which problems to solve before they happen and ignoring the ones that probably won't happen is the cosmic dance of software development. The Serenity Prayer states "... accept the things I cannot change, courage to change the things I can, and wisdom to know the difference." Except change "cannot"/"can" to "don't need to at least not yet"/"need to now because it's about to be an issue!".

Make a bare bones version and see if it sticks, and go from there. Both the Progressive Enhancement philosophy and the Lean Startup methodology demand the minimal before building further. Ex: Will anyone click this survey link? Who knows but you'll learn real quick if you offer them $50. If they still don't, then it probably ain't gonna get clicked ever. If they do, then you found at least _something_ viable.


> I'm not sure if Spirit of an Engineer is a book, essay, or some prior set of principles I'm unaware of

Good idea, you are welcome to help me hone my later chapters. For me the engineering spirit - especially in software - is being a creator/builder. At the core, someone who takes the Legos in front of them and creates with those pieces.

A couple people took my "grandest" comment as a nod toward scale/size/narcissism, which is reasonable, and I meant it more in terms of total architectural completeness - i.e. a system perfectly fulfilling all the requirements that necessitated its creation. A blade of grass is not "grand" in the primary sense, but it still is grand + magnificent by its own perfection in design.


It's always a balance. You need some structure to avoid spaghetti, maybe a bit more to allow for plausible extensibility (microscopic pre-abstraction) but not too much you're adding fat.


I call it "keeping an escape hatch".

The potential need to scale and add features will inform my design decisions, but they will not direct them.


"keeping an escape hatch". I like that. A former colleague said something similar that also stuck with me. Good software architecture allows you to delay decisions.


I use "always keep an out", which is taken from poker terminology, but I like yours as well.


China proves that building cities worth of actual houses that nobody wants to live in is a thing.


We talking Dandong, the city on the border between China and North Korea? They built everything, including a giant bridge, high-rise apartments, shopping centers, parks, and a ferris wheel. But there are no people, and it's been ~10 years now. It is, however, still fully maintained. [0]

[0] https://www.globaltimes.cn/content/974210.shtml


wikipedia says 2M people live there: https://en.wikipedia.org/wiki/Dandong


Yup. Literally the first sentence of my citation: "Dandong New City, a district of Dandong which was built from scratch to be a China-North Korean trade hub, now stands mostly empty." [0]

[0] https://www.globaltimes.cn/content/974210.shtml


A large number of those cities are filled, and various of them are 'tier 2' cities now. That means 10 million people.


A large number of them aren't and they're crumbling apart after only a couple of years being built.

https://youtu.be/XopSDJq6w8E


A large number of them being filled, and another large number of them not YET being filled, still means dozens of cities housing millions of people coming into being out of nowhere, providing people with affordable, well-planned urban environments instead of using the demand to skyrocket real estate prices at the cost of causing homelessness and suffering.

That's responsible, future-oriented development.


Wikipedia has only 7 Chinese cities listed with a population over 10 million


Yours is an incredibly good comment. The software/architecture comparison is something that I have encountered here and there over the years, but you brought it to a new level. "Don't add the second bathroom until you've got customers complaining about deadlocks on the first." Beautiful. Thanks.


> Spirit of an Engineer, which is just to imagine the grandest well-thought-out + organized thing and march toward creating it.

I think that's just delusions of grandeur. Not everybody would be (or should be) building the Sistine Chapel.


you really couldn't write software?


I’d vote for SWE if we are taking up acronym namespace


Writing "SW" instead of "software" is just another example of premature optimization.


I think this kind of concierge / person behind the curtain approach only works for a certain type of business though, usually services where aspects of the service pipeline can either be automated or not.

For purer technology / product businesses, how do you do this, fundamentally? How would Google have manually mocked up their early product? How would Facebook? Github? Tesla for that matter?

Sometimes you really do just unavoidably have to build the product out before testing the market, and if it doesn't work, just accept the sunk cost and throw it away - and sometimes fail completely as a result.

I don't see this as a fundamentally solvable inefficiency, just part of how tech product startups work, and the very tradeoff that must be made to allow for innovation to happen.


> how do you do this, fundamentally? How would

there are still smaller pieces you can MVP to a smaller audience before launching it to the world.

> Google have manually mocked up their early product? How would

Crawl an intentional community (remember webrings?) or other small directed subset of the web and see if you're able to get better search results using terms you know exist in the corpus, rather than all of the Internet.

> Facebook?

They had Myspace as an example so the idea wasn't exactly unproven.

> Github?

Kernel developers were already using the software, successfully, all (for large values of "all") they did was bring it to a wider market with a better UI.

> Tesla for that matter?

People don't get this, but Tesla's biggest achievement to date isn't the cars themselves, but the factory that they're built in. There's no way to MVP an entire factory, but building a car in a bespoke, pre-assembly fashion is totally possible and totally doesn't scale.

If you're asking if electric cars were known to work, the first one came out in 1832. If you're asking about product-market fit, they keep selling cars they haven't made yet, just to gauge demand. Aka where's my Cybertruck!?


> just to gauge demand

The hundreds of millions of USD as an interest free loan seemed more important than anything else.


> > Google have manually mocked up their early product? How would

> Crawl an intentional community (remember webrings?) or other small directed subset of the web and see if you're able to get better search results using terms you know exist in the corpus, rather than all of the Internet.

But that isn't a mock up, it's the real thing but on a smaller dataset. If you're going to do the real thing anyway, why not run it on all the data you can?

After all, the throttling factor to release is in the engine, not in the dataset. If you're going to write the full engine anyway, there's nothing to be saved by limiting it to a subset of the data.


> why not run it on all the data you can

Because more data requires more cleaning and standardization (with more edge cases). It also requires a bigger scale to obtain and process.


Most startups have a red-ocean indicator in the space to point at when telling people about their problem. Most startups fail.


are those remotely correlated though?


Google was manually editing and merging Perl scripts to get web scraping data almost every day early on. Yahoo manually verified content and added it to lists with hand-typed summaries for years, even up to the time Google came on the scene.

You are right that some businesses scale better doing high touch customer service like this. In the case of Pilot, you have the lead sales guy and accounting domain expert (CEO) asking SaaS customers to schedule enterprise service sales calls with him.

Which makes total sense. What he’s not automating is setting up Marketo or Hubspot and Drip or Constant Contact and committing to some CRM system that is both impersonal and adds friction when you’re going for quality over quantity.

He could hire BDRs and CSMs or outsource and spam and set up AI and knowledge bases, and possibly scale up faster.

But not only would that take away the personal touch and competitive advantage, he’d lose the opportunity to educate himself on what real customers really need.

Not to mention all the time spent evaluating tools and setting up automation and negotiating contracts that lock you in to a specific process that might not be what you want 2 years down the line when your business changes.


This seems like a big reason why taking on investors too early can be harmful. You can't be sitting on runway money and taking baby steps with it, even though sometimes that's the right thing to do.


> with just the minimum amount of tech.

As an engineer, this is the part I entirely agree with. Far too often, CEO-driven development leads to pivoting 1 month into 9-month projects, 9 times in a row, such that you have nothing deployed when you could at least have had 1 thing, but instead you have none.

This can be solved as much by discipline to stay on track as by choosing to reduce scope to 1 month ideas only.


Steve Blank has talked about this for decades. A startup is about validating you have a product that customers want. You want to connect with customers and prove you have a market before scaling. https://steveblank.com/tag/customer-validation/


It really depends on the situation. No one at a successful startup says, we spent too much time on scalability in the beginning. They probably say the opposite. I worked at one company where we couldn’t rack servers fast enough to keep up with our growth. Our competitor, the market leader, had to turn off new customer signups to prevent their site from crashing under load. We ended up surpassing them.

Getting a million customers isn’t that challenging; it's really a function of money. With a $50 CPA, you just need $50m. 1m customers will take down most sites unless you do some optimizations.


> It really depends on the situation. No one at a successful startup says, we spent too much time on scalability in the beginning. They probably say the opposite.

Depends. No one at a unicorn says that, but I've seen plenty of startups that do. The unicorn is the outlier, not the rule.

> I worked at one company where we couldn’t rack servers fast enough to keep up with our growth. Our competitor, the market leader, had to turn off new customer signups to prevent their site from crashing under load. We ended up surpassing them.

Were they still in business at the end of all this? Because it sounds like you made a bet, and it came through. The odds are against that, though.

> Getting a million customers isn’t that challenging; it's really a function of money. With a $50 CPA, you just need $50m. 1m customers will take down most sites unless you do some optimizations.

A million in what time period, and with what usage patterns? Because from what I've seen, as long as the backend isn't in Python, Ruby, PHP or similar, a single hosted server, easily vertically scaled, can handle more than 2 million users, because it a) doesn't need to handle 2m concurrent connections, and b) usage patterns are infrequent enough (users interact only every few minutes at most).
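
To put rough numbers on that last point (an assumption for illustration, not data from the thread): if 2 million users each touch the backend once every five minutes on average, that's 2,000,000 / 300s ≈ 6,700 requests per second sustained, which is well within reach of a single well-specced server for simple request handling.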


I think you’re both right, even if I’d guesstimate the situation you encountered is very, very rare.

You should make architectural choices that CAN be scaled within reason, on demand and ideally incrementally. That doesn’t mean you need to build your app on Kubernetes and automate welcome emails, surveys, metrics, A/B testing etc. But you probably shouldn’t build on top of fundamentally non-scalable systems either.

This is provided you’re in an environment that can reasonably scale to 1M users or whatever, which isn’t true for all domains. Say B2B, where that last B is only 10,000 people worldwide.


It’s difficult to explain to people what you said without the experience of living through the failures of early optimization. K8s from day one, A/B testing, programmatic quality control without any paying clients on the product, a scrum master and a PM and a product owner for 5 devs! The whole thing felt like a Ponzi scheme to keep overseas staff employed, perpetrated by the overseas development shop.

Unless you built something absolutely atrocious, scaling it vs. getting it ‘right’ or ‘right enough’ is trivial. You couldn’t sign up for Facebook if you weren’t a college student.

ChatGPT overloads daily; so what? When your app becomes necessary to enough people, bring in your k8s evangelists and shove it into whichever cloud provider. But by god, it doesn’t take 30 people to write a marketable MVP.


Maybe I had the fortunate luck that we never had issues finding market fit. We also had lots of investment outside of software dev. Fintech, for example, requires significant investment in compliance (e.g. KYC).

ChatGPT doesn’t have competitors; customers can’t go elsewhere. Twitter had the same protection when they launched: there wasn’t another short-form social network, so they could have full-day outages. Friendster is the opposite case: when they closed signups, customers just went to MySpace.

Scale goes beyond the ability to support traffic; it also means handling data/traffic efficiently. Heap is a good example: great product, but they haven’t figured out how to provide that level of analytics without significant costs, which are passed on to the customer.


Couldn’t agree more.


What's wrong with Kubernetes from day one? I'd use it from day one everywhere. Avoiding kubernetes is a sign of technical incompetence.


Nah, statements like the above are a sign of lack of experience, missing business understanding and/or hubris.


> What's wrong with Kubernetes from day one? I'd use it from day one everywhere. Avoiding kubernetes is a sign of technical incompetence.

Everywhere? Sounds like you're cargo culting.

There's a use-case for it, and that use-case is when your load is so high you need the orchestration.


Kubernetes works just fine on a single node. And it provides a lot of good things over docker run scripts or docker-compose. Kubernetes basically is an OS for OCI images. Even if you don't need orchestration, it still provides a great experience. And when you do need orchestration, you will grow into it organically.

Right now I'm using a cluster of 2 nodes (master, worker). No reliability, but the convenience is good and we have a very simple path to more nodes when we need them.


I strongly agree. Having Docker images of your application is very convenient for deployment and fairly easy to set up for anyone who has done so before. Then once you have these images anyway, hosting in (managed) k8s in the cloud is a breeze.

I'm not sure why you are getting downvoted. It would be great to hear of some of the bad experiences of those that disagree. Were they running their own cluster instead of a managed one? Or doing k8s with a team where nobody had any experience? Or run into some helm-chart spaghetti from hell?


> Kubernetes works just fine on a single node.

But ... no one claimed it didn't, did they?

The claim was that the unnecessary complexity (of which Kubernetes was just one) from day 1 consumes valuable time that could be spent on making the product better.


Kubernetes brings very little unnecessary complexity and does not consume valuable time (if there's at least one person on the team who knows it well enough).

Kubernetes might start to consume valuable time once you start bringing in niceties, like GitOps, CI, and so on. Those things are good but don't necessarily justify the time spent from the start. So you can start with a simple deployment yaml, which you can write in 10 minutes and apply manually whenever you need to update it.

Almost out of the box, Kubernetes brings you:

ingress. You don't need to set up your own nginx reverse proxy. This task alone can consume days of setting up and maintaining. I work with an organization which spends 2-3 hours to update their nginx if they need a new route. It must be done in off-hours because they break it every single time they touch it and then they need to fix it. Kubernetes ingress is a breath of fresh air after this nightmare.

cert-manager. Again, I work with an organization which failed at setting up certbot. Their certs break all the time. cert-manager just works.

zero-downtime rollout restart. This is an absolutely nice thing which works out of the box and is hard to achieve manually.

kubernetes-dashboard. There are other dashboards as well; I think Lens is very popular, although I haven't used it. It allows people who know very little about Kubernetes to interact with it productively. For example, on my team we have developers who use kube-dashboard to view logs, to exec into a pod and check things, to update deployments, to change envs. They just push buttons in the GUI; they have no idea about any yamls or kubectl. And it works.

And those things, in my experience so far, do not bring any unnecessary complexity. Quite the opposite: they solve lots of inevitable complexity for very little cost.


Is hosting in kubernetes really more time consuming than one of the proprietary PaaS cloud hosting options? What better alternative would you propose?


In your case, you clearly already had product-market fit. Most startups do not. It's far more valuable to build quickly than it is to build for scale until you know you have PMF.


Post-product-market-fit, scalability is critical. But a lot of effort is spent on things that will never get traction in startup land. Before you get traction, the number of times at bat is the more important metric.


I think this is the right approach.

There is however a pitfall that this approach seems to lead to unless you actively prevent it. Roughly, the process is: 1. You hire someone to take care of things manually. 2. As you grow, your devs are worrying about other things, so instead of automating, you hire more and more people to handle the toil that should have been automated along the way. 3. Because these jobs are handled by one group of people while developers work on customer-facing features, the communication and prioritization necessary to fix the issues rarely come up.


Hire an experienced (0-1) Product Designer and have them work out the features, design, system, interaction flow, and research. Test and validate you have the right thing before you start building. This can be done well in 6 weeks and it'll save years and years of debt.


Where do you find someone like this?


Anywhere you'd find any other employee. Product designers aren't rare; it's just that immature development organizations usually use designers wrong. Bringing designers in at the end to add polish to the user's experience nearly guarantees they'll be polishing a turd. A big part of a designer's value is figuring out what problem users need to solve with your software and what tools they need to do so.


100% on the money


In other words, founders keep insisting on not reading The Lean Startup, or ignoring all the advice given there.

Ignoring history is a guaranteed path towards repeating it.


Is it possible to generate enough excitement with just a Google Sheet, however? I suppose if it is B2B it would be fine.


Well, don't tell the public there's a Turk under that AI chess board.


What type of business is this? It would be hilarious to have a human reading through http requests and writing out responses!


They seem to be discussing the more complex, business- and customer-driven aspects of technology, not the simple process of building a CRUD application. DoNotPay.com is one such example. It's straightforward CRUD software where one enters parameters to establish a contract; however, despite claims to the contrary, a human being actually creates the contract form and sends it back 24 to 72 hours later. The entire enterprise embodies the idea of a Mechanical Turk.


Except it doesn't work when your project ends up on HN's front page and your server gets hugged.


Not being able to deal with low quality traffic spikes is one of those things that engineers see as a failure, but aren’t necessarily so from a business perspective when compared with the potential cost of a scalable infrastructure.

Not that all spikes are low quality, or even that HN ones are. I reckon for some products they are.


The HN effect is overrated. I had two articles on the front page at the same time and never exceeded 1000 requests per minute. And very very few of those visits turn into repeat users.

Edit: (The load was easily handled by a single Heroku dyno. I turned it up to two dynos overnight, just in case, but it turned out to be unnecessary.)


I do at least 2 of these calls per day. Interacting with end users is the only way to maintain PMF. IMO, anyone who thinks they're too big, cool, or fast-growing for that is… sadly mistaken.


Hard to get funding with such a pitch.


For the 1996 Olympics, the startup I was working at (Wayfarer Communications, you don't remember them) did a super fast, lightweight "medal watcher" service that took in event feeds from the games and broadcast them in real time to desktop apps.

We spent a fair amount of time building something scalable, rented some co-lo space, and in a couple of weeks had something we were pretty sure would scale to tens of thousands of users. We got . . . like a few dozen. Oops.

Never even fazed our sales guy. His pitch was something like: "We built this proof of concept system to scale to ten, twenty thousand users. Do you know how many we got?"

Customer: <non-committal nod>

Sales guy: "We got thirty."

I could never do sales, I just don't have that mindset.

(We had some pretty nifty technology for doing real-time messaging, it actually worked and wasn't a technical failure, but it was sure hard to get market share with just a bunch of APIs and some demo apps. After I left the place it was bought by a company that was in turn bought, and that's how I turned a ton of cash into 17 shares of Oracle, -spit-).


It's easy to say "scalability is overrated" if you've dealt with unnecessary k8s deployments.

It's easy to say "scalability is underrated" if you've dealt with businesses built on hundreds of standalone PHP/Perl/Python/JS scripts with zero unifying architecture, where migrating a data schema is virtually impossible because of the state of technical anarchy.

Scaling is hard.


You damned kids never heard campfire stories of being brought in to consult on “scaling” a bespoke system for a growing company that has been limping along with VBScript in an Excel spreadsheet for five years past when it was still tenable to do so. The amount of feature parity required to get out of that blind alley often killed the project and injured the company. Some lived it, the rest of us knew someone who had.

There was a brief moment after Oracle purchased Sun when I thought Oracle might try to engineer an Excel competitor on OpenOffice that had an easier migration to a real database (Oracle), but that dream died early.


FWIW I have a friend who works with market analysis and his excel scripts save an enormous amount of manual labor, today. It was the best tool for the job, for them. There are even international competitions in excel automation, which is kinda funny but also points to how far ahead Excel is for actual business uses.

Are there scaling issues? Version control issues? Absolutely! But again, that doesn’t mean that it’s not the best tool for the job.

It’s easy to mount the highest horse from a technical perspective, but as engineers it’s also our responsibility to be curious about what people are using voluntarily, and not just what we think they should be using.


Microsoft's commitment to not adding modern accommodations to Excel, Access and VBA is infuriating.

A git integration in the editor, and a decent test runner. Some API hooks so JetBrains can do something with it, maybe.


Office documents are zip files - you could get a start on this by version-controlling the contents individually.


I was this many years old when I learned that tar and zip on linux have an rsync compatibility mode that tries to do some cleverness with compression blocks to make it easier to diff two archives.


I thought they were xml?


XML files in a zip file (plus images and other things). There are many individual XML files in a single zip file, and each file (part) is generally responsible for a different area, e.g. one file for one sheet's cells, one file for style definitions, one for comments, one for workbook structure, and so on.

The whole structure is called Open Packaging Conventions and it is implemented in `System.IO.Packaging`.
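
You can see the structure for yourself with a few lines of Python's standard library (the workbook file name here is made up):

    import zipfile

    # Any .xlsx/.docx/.pptx is an OPC package: a zip archive of XML parts.
    with zipfile.ZipFile("book.xlsx") as pkg:  # hypothetical file
        for name in pkg.namelist():
            # prints parts like [Content_Types].xml, xl/workbook.xml,
            # xl/worksheets/sheet1.xml, xl/styles.xml, docProps/core.xml
            print(name)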


As recently as 2014, a college friend of mine was asked to help a company which was running on Excel and had reached the point you mentioned.

IIRC he just optimized their sheets and set up a CRM for the things that had no business being stored in an Excel file and that was already very helpful.

Attempts at writing actual software to deal with the problem would have failed and everyone was acutely aware of that.


At my last job, my work was basically to read data from proprietary APIs and shit out an Excel table.

I think I was hit by everything. Easy stuff at first: CRLF issues, XML APIs, weird RPC APIs.

Then, halfway through the project, the results had to change. Not only the order, datatypes and headers (I actually overengineered the first version so those were configurable), but the format, and deduplication on multiple columns (with empty fields counted as duplicates...). Worst job I've ever done. I'm also disappointed in myself, tbh.

But now I'm a bit of an expert on Excel format issues and limitations, and that has already helped me.


FWIW, IIRC, a couple jobs ago there was a policy of only hiring copy/paste-from-Stack-Overflow headcounts. Since good code was literally not allowed, scaling meant more servers with more RAM.

They eventually replaced me with a handful of their relatives, or at least so it seemed. It was a lot of fun watching how many LOC one could "write" to request some trivial data from an endpoint. Luckily I was only golfing the latter half of that tenure, so I have no regrets.


Scaling is hard. True.

But the question is, are you trying to make your life miserable by scaling before exhausting other options?

Most applications can be optimised for performance by orders of magnitude, much easier than trying to scale them by orders of magnitude. Any problem is much easier to solve and faster to deliver when you can fit it on a single server.


Some people just don’t know how many users can be served from one server.

Usually it is simple thinking that goes wrong: the system is slow, so add more hardware. But then it turns out the developers did a bad job, and you could still run it from a single small server, only someone would have to write code with an understanding of big-O notation.

The main point of big-O notation is that there are things that, implemented incorrectly, will be slow regardless of how much hardware you throw at them.
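
A minimal sketch in Python of what that looks like in practice: the same membership check is O(n) against a list but O(1) on average against a set, so the first loop below is O(n^2) overall while the second is O(n). Doubling n roughly quadruples the first timing and merely doubles the second - no hardware budget changes that shape.

    import time

    n = 20_000
    as_list = list(range(n))
    as_set = set(as_list)

    t0 = time.perf_counter()
    sum(1 for x in range(n) if x in as_list)  # O(n^2): scans the list every time
    t1 = time.perf_counter()
    sum(1 for x in range(n) if x in as_set)   # O(n): hash lookup every time
    t2 = time.perf_counter()

    print(f"list: {t1 - t0:.3f}s, set: {t2 - t1:.3f}s")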


I don't know if knowledge of big-O notation is that big a deal - many of the issues I've seen, at least in recent years, have come about in O(n) code where the value of n in production use cases was several orders of magnitude higher than anyone had bothered to test (classic example: all our test files were ~50k in size, but recent customers expected to be able to work with files over 1Gb - which of course took 20,000x longer to process). And the difference in user perception between something taking 100ms and taking well over 30 minutes is rather a lot.

In fact, in this particular case there's realistically no way we could process a 1Gb file in the sort of time it's reasonable to have a user sit there and wait for it, so it really requires a rethink of the whole UX. In other cases it turned out some basic DB lookup consolidation was sufficient, even if it did require writing new DB queries and accepting significantly higher memory usage (as previously the data was read and discarded per item). When I have found the occasional bit of O(n^2) code that didn't need to be, it was usually just a simple mistake.


Maybe not knowledge of the notation alone, but O(n), just as you write, needs to be addressed. Users or stakeholders expect that they can get "all data" loaded with one click and that it should always be instant. With N getting bigger, just as you write, the UX or workflow often has to change to partition the data even when it's O(n) - like adding pagination, or moving statistics-crunching to OLAP. It quickly gets worse when you have database joins, and you might have to understand what the database does, because you can also end up with O(n^2) queries; even though DB engines are insanely fast and optimized on their own, not knowing what a query does (like full table scans) can also kill performance.


Until it was bought by AOL, ICQ presence scaled as far as the largest Digital 64-bit Unix box they sold. (Messages were peer to peer.) It worked remarkably well (a solid hardware platform, so not HA, but pretty available nonetheless). Network communication was UDP, which was quite a lot cheaper at the time.


There are some big blunders that you can commit to that are incredibly difficult to fix. I think the advice should be on avoiding nailing down closed doors, just keep them locked instead and put the key under the door mat.

I have a Java codebase whose constructors call more constructors inside them. So you have this massive god object that instantiates the whole project inside its constructor. If you want to run parts of it in separate threads, you can't just take it apart: you first have to rewrite all the constructors.

“… Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.” —Joe Armstrong, creator of the Erlang programming language

I don't think this is a problem with object-oriented languages; you could certainly do the same thing in any language. All you have to remember is to keep your constructors as simple as possible. Passing in dependencies through the constructor is often an easy solution, so I don't get the hate for dependency injection frameworks. The original XML-driven iteration of Spring overcomplicated it pointlessly, but nowadays you can just define your "beans" in code, which basically means the DI framework just helps you set up a dependency graph and nothing more.
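
A minimal sketch of the difference (Python rather than Java for brevity; all class names made up):

    class Database:
        def __init__(self, dsn):
            self.dsn = dsn

    class Mailer:
        def __init__(self, host):
            self.host = host

    # God constructor: instantiates its whole world internally, so you
    # can't take it apart, swap pieces, or build parts on other threads.
    class GodReportService:
        def __init__(self):
            self.db = Database("prod-dsn")
            self.mailer = Mailer("smtp.internal")

    # Constructor injection: the caller decides the wiring, which is all
    # a modern DI framework really automates for you.
    class ReportService:
        def __init__(self, db, mailer):
            self.db = db
            self.mailer = mailer

    svc = ReportService(Database("test-dsn"), Mailer("localhost"))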


On a technical note, people underestimate the costs of horizontal scaling.

The Silicon Valley world would do well to learn some things from finance. You’d never write a horizontally scaled order matching engine.


You'd think... I worked at one that tried. I joined the team after the project had been running for quite some time and they said their goal was 10,000 transactions per second. I said, OK, what can we accomplish in 100 microseconds? They laughed at me and said they were going to horizontally scale.

It wasn't successful....


That's conflating scaling with paying off technical debt


> It's easy to say "scalability is overrated" if you've dealt with unnecessary k8s deployments.

Exactly. The author forgot to put the word "premature" before the word "scalability".


TFA isn't talking about that kind of scalability. It's scalability in business processes.


Looking at the article title and comments, I wasn't able to clearly tell. The second heading, "do things that don’t scale", is clearly recognizable and meaningful, but then, being in agreement, I'd have no reason to click.


I have worked with both and 100% disagree. Give me a mess of bad PHP (by far my least favorite language) any day of the week. It is usually trivial to scale, unlike when I get handed some complex Kubernetes mess. Fixing failed attempts at scaling is usually much harder than making naive, bad PHP code scale. It is amazing how much harm "clever" engineers who optimize prematurely can do.


> PHP/Perl/Python/JS scripts with zero unifying architecture,

That's why you use something like a Rails monolith....but oh wait, Ruby doesn't scale well!


That's why you write a monolith on the JVM or CLR, which do scale well.


JRuby!


As long as you can scale (shard) your persistence layer, I don't see why RoR won't scale.

Look at Github, for instance.


Not everything is a glorified CRUD app.

If you're doing any computation or highly concurrent workloads then you will discover the performance issues with Ruby well before you outgrow your persistence layer.


I have done both in Ruby, and addressing it was not a big problem. E.g. my MSc. involved doing a lot of statistics and image processing in Ruby, and solving the performance bottlenecks meant rewriting about ~30 lines or so in C using Ruby Inline. Later I did map tile rendering on demand handling thousands of layers uploaded by customers in Ruby. Both using 1.8.x, btw. - far slower than current versions.

It took more CPU than if we'd rewritten the core of it in something else, but it let us iterate the rendering engine itself much faster, and most of the expensive computations were done by extensions in C anyway (e.g. GDAL and the like).

Of course you can find areas where it's still not viable, but my experience is that if you start by prototyping in whichever high level language - no matter how slow - that is most productive for you, you'll inevitably find you need to rewrite far less than you might think to get it fast enough. But more importantly: The parts you end up rewriting will very often be entirely different parts than what you expected, because being able to iterate your architecture quickly tends to make you end up in a very different place.


I know the argument; I don't buy into it (I'm actually not a Rubyist). Ruby just imposes some factor x on performance compared to Java, roughly speaking. Which means you'll start a little earlier to look into queuing requests and distributing work in Ruby than in a Java or Go application. Nevertheless, if we talk about scale, this is just a constant multiplier.

With a Kafka queue and worker nodes, compute-heavy jobs are easily distributed across many worker nodes.

For many parallel requests, you load-balance.

If the PostgreSQL or MySQL table is the bottleneck (the persistence layer), well, that is actually a design decision that's orthogonal to the programming language (PostgreSQL won't scale better with a JDBC ORM).
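
A rough sketch of the queue-and-workers part (assuming the third-party kafka-python client; the topic, server, and handle() function are made up):

    from kafka import KafkaProducer, KafkaConsumer

    # Web tier: enqueue the compute-heavy job and return immediately.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("jobs", b'{"report_id": 42}')
    producer.flush()

    # Worker tier (run as many of these processes as you need): consumers
    # sharing a group_id split the topic's partitions between them.
    consumer = KafkaConsumer("jobs",
                             bootstrap_servers="localhost:9092",
                             group_id="workers")
    for msg in consumer:
        handle(msg.value)  # hypothetical job handler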


Usually web applications fork and run processes to do the computational work you describe (ffmpeg, imagemagick, git, etc.). These are usually written in a variety of fast(er) languages like C, C++, Java, etc. Plus now you get multicore scaling for free.


Agreed, it’s a balance; you need to incrementally pay off tech debt.


I am aging myself, but: http://widgetsandshit.com/teddziuba/2008/04/im-going-to-scal...

The bit that stuck with me (re-reading for the first time since 2008, I forgot how aggressive it was): "Scalability is not your problem, getting people to give a shit is."


That's a great write-up. However, once people do give a shit, if you didn't think about scalability, it becomes an enormous problem.

And, funny enough, the people that didn't think about scalability, if they're smart, have typically moved on to something else.


There's also the situation where you've passed your scalability limits, you have users, the system is falling down and the business has blinders on and simply won't accept that they need to invest in a better way.

Being overly speculative and ambitious early on is a great way to fail. Building on top of something past its limits and ignoring the (developer, customer, velocity) pain is yet another.


One great way to guard against this is to force teams to design for scaling up and down. I regularly give a "plan for success, and plan for failure" speech that covers this area.

If your service opex breaks even at 1k users because you designed for capacity up to 100k users, and you actually end up with 100 dedicated super-sticky users, you'll resent the users you have as they bleed you to death.

If, instead, you build a system that less efficiently, but still reasonably, scales to 100k users - ramping opex up 4-8x to support them - but breaks even at 50 users... Congrats! You've planned for success and planned for failure.

Go get those 100k customers, then optimize. But don't wait to optimize an unscalable system until you already have customers. That's a recipe for lost customer trust and "ahead of their time" epitaphs.


> And, funny enough, the people that didn't think about scalability, if they're smart, have typically moved on to something else.

OTOH, the smart people who thought about scalability and designed a highly complex system to handle the load of 5 requests per minute have also moved on.


Great post, thanks for sharing. I'd also highlight this one: http://widgetsandshit.com/teddziuba/2011/12/process.html


Before startups rush to cargo-cult this model, I want to point out one important thing: adjust according to your market.

Pilot's customer base probably looks very different than yours. The author says in the article that they are a "startup-focused accounting firm", so for their customer base which is likely startup founders or similar, the experience of talking to a fellow CEO probably "helps to 10x the customer experience" as the article puts it.

On the other hand if you are selling me some software for a boring Enterprise integration, please don't send me calendar invites for a conversation with your CEO, I'd rather get great API docs and integration guides.


Yes, I run a small customer-oriented project. Having meetings with people who spend a couple of bucks is kinda ridiculous (for both sides; I think only super-chatty people who like to waste time would come), but I've noticed Discord is super powerful - people often share ideas or problems since it's very effortless. (Also, on the rare occasions when I am not on Discord, other people jump in with explanations & help: "this is not how you do it... you should ...")


Wow, very good point. Basically, this article is marketing for them: they know startup founders read stuff like this, and they want to sell to founders. Blergh.


I completely agree with the idea that scalability is often overrated. While it's certainly important for large organizations and enterprises, for smaller companies and side projects, focusing too much on scalability can actually hinder growth and creativity.

It's often more beneficial to focus on solving a problem effectively and efficiently, rather than trying to scale it before it's even proven to be successful. By starting small and iterating based on customer feedback, you can create a better product that solves a real problem, which will naturally attract more users and drive growth.

That being said, scalability should definitely not be ignored altogether. As a project grows, it's important to plan for future scalability, but this should only be done once the product has proven its worth and a solid foundation has been established.


It's kind of along the lines of "premature optimization is the root of all evil." A true quote, that doesn't mean optimizing isn't important! The whole trick is in being able to tell what's core and what's premature.

Another way of looking at it -- scalability is pretty much the entire value proposition of software. It allows people to do the same, or more, amount of work without having to linearly scale out the number of people working on something. When you're first starting out, demonstrating even being able to _do_ the work is the first hurdle, but past that you're definitely going to need to show you can scale that out efficiently.


> Another way of looking at it -- scalability is pretty much the entire value proposition of software

That sounds like a good way to look. People are Turing-complete, although that doesn't mean that they are very good at scale without instructions, and "programming" clusters of people has its own set of quirks. :)

> It allows people to do the same, or more, amount of work without having to linearly scale out the number of people working on something.

But this is not the entire story, but only the throughput part of it. And there are entire classes of software where at least part of the value lies in latency. That's, for example, most of Internet as we know it, from the hypertext browsing to videoconferencing, and most software we would call "embedded", from watches to spaceships.


> But this is not the entire story, but only the throughput part of it. And there are entire classes of software where at least part of the value lies in latency.

Yes, that's a good point!


    That being said, scalability should definitely not be ignored altogether.
Yeah. You have to think about what's possible to scale.

You're planning to scale to 1,000,000 or 1,000,000,000 users eventually? OK. Well, what's your service? Are you planning the next Facebook where everybody can talk to anybody? Then you have some challenges you better start figuring out.

Or, is it a SaaS where nobody needs to access anybody else's data? OK, then you can probably simply scale vertically for a while up to $SOME_MILESTONE, and then scale horizontally via some fairly simple sharding or maybe even just database partitioning or something. No need to do that now.
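
For the nobody-shares-data case, the "fairly simple sharding" can be as boring as routing every tenant to a fixed database by a stable hash (a sketch; the connection strings are made up):

    import hashlib

    SHARDS = [  # hypothetical per-shard connection strings
        "postgres://db0.internal/app",
        "postgres://db1.internal/app",
        "postgres://db2.internal/app",
        "postgres://db3.internal/app",
    ]

    def shard_for(tenant_id: str) -> str:
        # Stable tenant -> shard mapping; fine precisely because
        # tenants never need to see each other's data.
        digest = hashlib.sha256(tenant_id.encode()).digest()
        return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

    print(shard_for("acme-corp"))  # always routes acme-corp to the same shard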


> You're planning to scale to 1,000,000 or 1,000,000,000 users eventually? OK. Well, what's your service? Are you planning the next Facebook where everybody can talk to anybody? Then you have some challenges you better start figuring out.

Thinking this way actually leads to one of the most common failure modes. Specifically the day-dreamer death where the principals become so enamored with a fantasy of what the end-state could look like that they lose focus on the next step to improve their odds of success. This is under-reported because day-dreamers are so common that when they try a startup and fail, it's not really an interesting pattern to report on. Such founders will tend to cite specific reasons that sound better in context (eg. wrong market, product flaws, sales funnel flaws, etc). But as someone who's been in the web startup space 25 years, I've seen it over and over again where founders fail to fully engage with the hard problems right in front of them in favor of working on what's fun, easy or tied to some dopamine hit of imagined future success.

In practice, you should not spend one second thinking about the requirements of 1B users until you have at least 10M. You will need to completely rewrite everything multiple times along the way to that scale anyway, and more importantly, you won't know the product requirements to actually get there until you achieve the earlier milestones and get the necessary user feedback about what's working at each size of user base.


There's a misunderstanding. I may have miscommunicated.

I'm advocating for understanding your business, not for prematurely building things.

If you're in the extremely rare class of businesses trying to build something on the scale of FB or YouTube where that sort of massive scaling is intrinsic to your mission then yes, you need to start thinking about those problems early.

For everybody else, yeah, absolutely focus on product and getting to 100 or 1000 or 100K or whatever users first.


Precisely. It'd be like trying to make an F22 fighter jet right off the bat when you're still in the era of biplanes.

I think people often underestimate the strata of learning, and the essential feedback that gets you to the next level. Or at least that's how it works for mere mortals like myself :)


I'm advocating for thinking about if you'll need an F22 fighter jet someday.

If the answer truly is "yes", that's something you'll need to figure out early or at least plan for.

For the other 99.99% of situations where the answer is "no" or "almost certainly not" then absolutely, you should not be wasting time building an F22 or even thinking about it.


> Are you planning the next Facebook where everybody can talk to anybody? Then you have some challenges you better start figuring out.

Even Facebook didn’t start trying to figure that out until they had to. Famously you needed an invite for a very long time. I’d imagine infrastructure limitations factored into this.

You’d be surprised how many successful companies push off dealing with scaling problems until they absolutely have to.


> Famously you needed an invite for a very long time.

And then it opened to needing a college/university email address. Then it opened to everyone.

I got in with my college email address, and at the time it was considered a feature, not a limitation: a combination of higher quality since people couldn't just make new spam accounts whenever, plus a bit of a status symbol compared to just being on MySpace.


That's a great example, thanks.


That's exactly right: it depends so much on the context and (expected) access patterns. Sometimes you really do need to plan ahead, but even that's more about not painting yourself into architectural corners rather than specific tech.

What I see also is not just scalability over-engineering, but the same in the solution domain. When you don't really understand your customers' needs, sometimes there's a desire to create something with ultimate flexibility.

In the days of XP we used to say YAGNI when people would be overly general. I'm trying to bring that back into fashion.


FTFY: focusing too much on scalability WILL actually hinder growth and creativity.

It will hinder growth and creativity because you are solving the scaling problem, which is a hard problem. Especially if you're early on and don't really have a product or customers.

I've seen multiple projects tank because people working on it were solving problems they didn't have and wouldn't have for the next 2-3 years at least. But they only had a few months of funding.


> (...) scalability is often overrated. (...) focusing too much on scalability can actually hinder growth and creativity.

I find this brand of comment unproductive and intellectually dishonest. "Often" and "too much" are weasel words that remove any objective meaning from the claims they qualify, to the point where stating the exact opposite is equally valid.

The truth of the matter is that scalability is only unimportant when you don't need it, but it's already disastrously too late if you failed to scale by the time you need to.


The problem is if the commenter says “doing X works” instead of “doing X often works” there will be an HN avalanche of “yabut so-and-so didn’t do X” and allegations of survivorship bias.


> (...) there will be a HN avalanche of “yabut so-and-so didn’t do X” and allegations of survivorship bias.

Isn't it also survivorship bias to claim that scalability is overrated because a non-scalable system didn't break in a particular case of a local deployment with residual demand?

Meanwhile, hikes in demand hitting an unscalable system can and do break businesses, and when that happens it's already too late to rearchitect things.


“SaaS startups: offer a one-on-one chat to new customers”

Doesn’t have the same ring to it.


It's interesting because I work for a Fortune 100 at which most of the software challenges are related to scale (or sadly data quality...) A lot of vendor and startup systems that try to work with us fail because they simply can't handle the scale of a single enterprise customer.


>or sadly data quality...

Could you expand on this? What are some challenges Fortune 100 companies face due to data quality?


The last 50 companies you bought left their mark, their processes were integrated slightly differently, and the tech stack has been piling up for the last 50 years. Some business processes involve human entry and are thus faulty; other business processes exchange data via pretty unstructured pathways. You may receive some data from a supplier only in PDF format, etc.

Even if there are standard interfaces and APIs to supply data to, each and every engineer doing the integration will fill the fields differently.


It's easy to see how this is a huge issue. I'd like to learn more; would you know who usually deals with these problems in such companies? Is this a COO/CIO thing?


CTO, I would say?


The startups who try to work with you and fail - is it the scale or the requirements?


As they say... when all you have is a hammer, every problem starts looking like a nail.

Nowadays, software engineers barely finish learning the basics of their first programming language before they jump into scaling the first application they developed while following a tutorial.

It is always better to first exhaust other options, like improving the basic efficiency and performance of your applications. A single server can do hundreds of thousands or even millions of transactions per second. I have many times seen vast farms of servers, each doing at most hundreds or thousands of very simple transactions per second.

A problem is almost always easier to solve and faster to deliver when it is only expected to run on a single server.

And don't start arguing about the need for multiple servers for redundancy. Properly automated, a new service node can spawn within seconds -- maybe a couple of your users will see a slight hiccup if you handle it correctly.
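To illustrate, the dumbest possible version of that automation is a respawn loop; the service command below is a placeholder, and systemd or an orchestrator does the same job properly:

    # Respawn the service whenever it exits; restarts take about a second.
    # "./my-service" is a placeholder for the real binary.
    import subprocess
    import time

    while True:
        proc = subprocess.Popen(["./my-service", "--port", "8080"])
        proc.wait()  # blocks until the process dies
        print("service died, respawning")
        time.sleep(1)  # brief backoff before the restart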

The goal here is not to say those practices are bad. What I am trying to say is that engineering is about understanding tradeoffs and making informed decisions.

--

I worked for a large bank which had an internal automation platform. It consisted of about 140 "microservices" (scare quotes appropriate here...), 20 servers running those services and about 40 developers.

When I left 3 years later we had 3 servers (bank required hot backups, one within same primary DC and one in secondary DC), just one monolithic app and only 5 developers.

Our job was easier and we were happier than ever before. Reliability improved vastly. And our internal clients started getting the functionality they needed.

Previously, any service call from a client had to jump through multiple layers of "microservices". The vast majority of the application was devoted to communication between nodes, error handling, recovery from strange situations, etc. Once everything was rolled into a single service, those vanished, because there were no longer network calls but method invocations.
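In other words, roughly this kind of change (service names made up):

    # Before: every lookup between "microservices" crossed the network.
    import requests

    def get_customer_over_http(customer_id: int) -> dict:
        # Needs timeouts, retries, error handling, versioned APIs...
        resp = requests.get(
            f"http://customer-service.internal/customers/{customer_id}",
            timeout=2,
        )
        resp.raise_for_status()
        return resp.json()

    # After: inside the monolith it's just a method invocation.
    def get_customer_in_process(repo, customer_id: int) -> dict:
        # No network failure modes, no serialization, no versioning.
        return repo.get_customer(customer_id)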

And we no longer had to release 140 components and maintain versioning and dependencies between all of them. We only had one component to take care of.

I made a "new component checklist". Whenever somebody wanted to spawn a new component they had to first go through team review and prove the requirement would benefit from being separate component rather than part of existing monolith. No new component was ever created after this was instituted.

Yep, they def did not need microservices...


Very interesting, would you mind sharing what was on your list?


This is an excellent article on customer experience, but most comments here seem to project the message onto infra choices which is a mistake.

You can stop doing CEO calls if customer volume surges. You can't instantly stop using non-scalable infra choices.

The dynamics on that choice are fundamentally different when you don't have that get out of jail free card. You're forced to make a judgement call & walk the tightrope between not overengineering scalability vs ensuring enough scalability to handle surges.


Scalability "requirements" are often misused by the developers so they get to play with "cool" and often very slow technologies at the owner's expense. I've seen it happen over and over.


Yeah. Resume-driven development. "Hey, we need k8s and/or huge AWS clusters because scalability" when often something simpler would suffice.

At a previous position our devops guy had provisioned a cluster of god knows how many AWS Redis instances. I pointed out this was a lot more than we needed. He swore up and down we needed them because scalability.

He had no idea how much data was in those things. LESS THAN A MEGABYTE of data. In the whole cluster. We had no plans to grow it, either. And we were only accessing it a few times per day. We didn't even need Redis. We could have just stored it in the database directly.
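For a sense of scale, the "just use the database" version is one small table and two functions; sqlite3 below is purely for illustration, since the app's existing database would do the same:

    # A megabyte of rarely-accessed key-value data fits in one table.
    import sqlite3

    db = sqlite3.connect("app.db")
    db.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

    def put(key, value):
        db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
        db.commit()

    def get(key):
        row = db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None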

He didn't know any of that, and he didn't care. Because it didn't matter. Resume-driven development.

He wasn't stupid. He was very smart. He knew what he was doing. "Managed a Redis cluster" looks good on a resume. "Just used the database instead" does not.


> He wasn't stupid. He was very smart. He knew what he was doing. "Managed a Redis cluster" looks good on a resume. "Just used the database instead" does not.

Did your company fire him and replace him with a DBA instead? If not, then the resume driven development is working!

I hate it when people look down on resume-driven development while simultaneously doing things that promote it, like letting HR people unrelated to technology screen resumes for recruiting.

Here's a hint -- if your company is screening, even subconsciously, for buzzwords (and believe me, most of them do), then you are enabling resume-driven development whether you like it or not.


Of course we were enabling it.

Our CTO actually thought he was doing a good job.


You're almost exactly describing what went down at my previous employer.

The company already had a bunch of poorly written legacy systems which were used by about 50 clerks and written in PHP, Node.js and some .Net. They had problems with poor code and data quality, but none with performance.

So they recruited a new CTO and a new Head of DevOps, who planned the new infrastructure together. Of course now everything had to be split up into microservices (or nanoservices, because each one should basically have the least functionality possible) and thrown into the cloud.

The only place where a separate service actually makes sense is the machine learning processes that extract information from received documents. The rest could basically run in a basic PHP application on a Raspberry Pi without performance issues.

When I quit that circus, development had basically stopped altogether and they were already paying five-figure sums each month just for hosting.

I moved to a large corporation with the expectation that things would be better there. Turns out basically the same thing is happening, just on a larger scale. There, a team of four guys has to manage more than a dozen microservices which were cut in strange ways.

I am only a senior developer and not an architect myself, but the ones I know are everywhere between crippling depression and just laughing their asses off at the stupidity that's going on in the tech world right now. There are companies where it's done properly, but either I have extremely bad luck in my surroundings, or those are an absolute minority.


Then these guys dance off from that to a company that actually needs to store tens of billions of k/v pairs, HA, and can make a big mess there before anyone figures out they're actually clueless beyond `sed 's/replicas: 3/replicas: 69/' && docker-compose up`.


>He wasn't stupid. He was very smart. He knew what he was doing. "Managed a Redis cluster" looks good on a resume. "Just used the database instead" does not.

Why not just lie that you did that?

How would an employer check that, lol.


If the person interviewing you actually knows their stuff, it will be much tougher to lie your way through the interview.

(I know... that's a big "if")

So it is beneficial to have actual experience instead of just lying about it.

Sometimes.


In this case, if a guy just adds Redis instances where some app only reads/writes a few entries, then he'll probably have a harder time anyway if the interviewer knows their stuff.


Many experienced devs have PTSD from working themselves into a corner on an unscalable stack which required death marches to resolve. Not surprisingly, they often do a complete 180 after that. A strong CTO / senior eng core would prevent the company from going to either extreme, but unfortunately not many companies have that.


I am sure there is truth to this, but from my personal experience the death marches I have seen have come from overly complex designs which try to handle all future problems but then fail miserably. Whereas evolving a simple design to be more scalable often works out well.

I am sure there are many situations where this isn't true, but I have not seen them in my experience.


This also becomes a self-creating problem where devs are afraid not to have played with the cool tech, lest they fail to have the golden resume.


As always, senior dev: "it depends".

There are many things that scale pretty far just by doing them right, without a lot more work.

For a process like calling new customers, I bet they have a halfway-decent CRM and email system to send out invites, track acceptances, and schedule calls. You could manage it with a spreadsheet, but managing it the right way from the beginning is easier, more reliable, and contributes to other CRM tasks. And when the CEO can't do it anymore, the whole process seamlessly shifts to letting VPs or account managers pick up the task (until that doesn't scale).

On the technical side, for example: it's not very difficult to start with a scalable-by-default architecture, like one of the "serverless" Docker-image-based managed platforms such as GCP Cloud Run. If your service fits there, you get scale-to-zero and scale-up basically for free.

The trick, as always, is deciding what to do when you hit some feature that would be easier on a single stateful server. Do you bail and go single-server, or push through? If you stay with serverless, you're better positioned for future growth. But you might hit that next feature that makes you go single-server, and have wasted the work to stay serverless. This is what makes this stuff hard.
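For what it's worth, the entry cost on the serverless side really is low. A minimal sketch (service name hypothetical): a container whose process listens on the PORT env var the platform injects:

    # app.py -- minimal HTTP service for a scale-to-zero platform like Cloud Run.
    # Cloud Run injects PORT; everything else is stock Flask.
    import os
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        return "ok"

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))

Deploying is then roughly "gcloud run deploy my-service --source .", and scale-to-zero plus scale-out come with the platform.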


I just moved away from GCP to DigitalOcean and went from paying over $1k a month to $100 a month.

At this phase of the startup, we aren’t missing a beat.


Starting out on Digital Ocean or Heroku is usually a great idea.

I consult with companies that are looking to move in the other direction, from Digital Ocean to AWS.

Their biggest pain points as they grew are:

1. The lack of granular permissions scoping and access control.

2. The networking primitives in Digital Ocean ended up being too restrictive. For example, they wanted to have static egress IPs and it was much harder to do that in Digital Ocean than AWS.

3. The need for more of the managed services AWS offers.


I'm also running everything on DO. Dead simple and good pricing.


Do you have live backups of the DB and encrypted disks, or did you have to set it up yourself?


DO has a hosted database which handles automated backups.


> I just moved away from GCP to DigitalOcean

Did you compare between DO, Lightsail, Vultr, Linode, etc. or was the difference between that tier of offering so little that you just picked one and ran with it?


I picked DO because they were the only one who had a hosted database offering (Postgres).


The market wants cloud to be a commodity - I’m certain it will get what it wants eventually.


The startup in question is a tax accountancy by the looks of it. No need for webscale architecture there of course.

If you are starting the next Vercel then presumably you need to think about scale from the beginning. There will be no forgiveness for not doing so.

Whether you are actually a tech company, or just a company using the internet as a channel, is the question that should be asked.


> If you are starting the next Vercel then presumably you need to think about scale from the beginning.

But for them, scale is at the core of their business, which is really the argument being made - don't let concerns about scaling problems you don't have distract from getting your business going. Scalability can't be a distraction if that's what you're selling.


I don't necessarily think that it's an all or nothing thing, I think the author is just pointing out that optimizing before getting the right product can be a mistake. For me this article is also talking about what to prioritize.

Obviously there's nuance to each individual situation but I think there's some good points here.


I once had a manager tell me scalability was a good problem to have, as it meant people were actually using your product.

Obviously with some caveats: that you don't fall over at a small number of users, and that you are at an early stage.


Cash when you're growing so fast you're running into scalability issues is also likely to be far cheaper than cash before you have any customers to speak about.

This is more general than scalability: The more of any work you can afford to defer without hampering growth, the more of your initial capital you can spend on growing, and the more you manage to grow the cheaper it will be to raise money to build out the rest when it's time for your next investment.

The hard part is to determine which aspects you can safely defer.


Your manager was correct. People often over-engineer or over-architect what needs to be done. In reality, you need to meet the current needs of your business. Not every single action has to account for what might happen - or what you hope will happen - if absolutely everything goes perfectly.


Not only that, you've got more people using your product than you expected


I agree with the premise, but would rephrase it as “know what you’re going to need, when you’re going to need it, and have a rough idea what it’ll take to get there”.

And as the article implies, not just systems but also your processes. I recently saw a moderately big startup strangling themselves on processes that they won’t need for several more years. Mid-level leadership (in itself a problem) agonized over scale, and didn’t realize that the products were being built at a snail’s pace and money was being burned with abandon. Late 2022 was a harsh wake-up call.


For me, if you are selling a service, customer service is paramount. Everything else follows from there; just my opinion.


I wrote an app that served 290K customers with 727K data layer hits in the month it was relevant in the late aughts. It had a Python (v2, threaded) frontend, and a Perl + MySQL backend. I wrote it in two weeks, and the biggest lifts were the threading framework for the frontend and testing indexing schemes for the backend (MyISAM was intentionally chosen for read throughput and easy dataset swapping).

I built it to scale: the frontend could be hosted at any ISP and the backend could be run pretty much anywhere which could accept a TCP connection. The 727K data layer hits were served off of a DSL connection to a server running in my house.

But there were politics (literally) around the data. Whatever hosted the data had to be under control of identifiable meat.

In the end though, the data owners wouldn't clear me to distribute the data to other operators until it got to the point that the only way I could do so would have been to drive around with CDs (because there was no spare bandwidth on the DSL link).

You can call that a success or a failure, it's politics after all. But basically they didn't care, which is what they were never going to say.


"everyone thinks they need the same architecture as Netflix, Twitter, … In the end, people end up with a very complex setup to handle the load a simple $50 hosting package from the 90s could have done." see: https://linkedrecords.com/the-big-devops-misunderstanding-84...


I've been working on and off in supercomputing since the 80s.

every system I've ever worked on has needed work to get to a certain scale. and that carries you for a while and if you need to get past the next plateau you need to rearchitect again. rinse and repeat.

so it's not a binary thing.

so unless you have some specific goal in mind, invest to get a little past where you need to be today.


A problem many startups encounter, though, is that they're only sustainable whilst doing non-scalable things.

Using the author's example, perhaps it's that the only way they actually convert customers is if the CEO calls them.

Or sometimes it's more to do with engineering. Each customer will only stay if you implement a feature that costs you more than their CLV.

More than once I've worked at a startup, or at a company that has acquired a startup, where this is the case. And when you suggest fixing the problem, usually people can't quite comprehend it, so instead it becomes "you're the problem".

Whenever this is the case it eventually catches up with them. Sometimes through burnout and burn-and-churn employees, sometimes because they finally learn you can't sell a dollar for 80 cents forever.


Instead of focusing on scalability, focus on productivity. Scaling is optimization, and optimization needs to be guided by profiling, or more generally by instrumentation in production. With a good culture and a productive infrastructure, a company can rapidly scale its systems if needed. Without such a productive environment, a scalable system could crumble at any time. Reflecting upon my days at Netflix, I realized that we didn't build scalability from day 1. Instead, we evolved a scalable Netflix quite easily. Almost everything was a "non-event". When needed, things just appeared to have happened. That's the magic of productivity, not scalability.


Another example is how casinos treat their best customers. You get an account representative, personal phone number you can text message to.

Slack still answers every customer feedback personally which is endearing and encourages more ideas.


The project I’m working on will probably not get more than a thousand users for at least a year, but I’m testing it with 10K fake users.

I want it to work well with that many, so it works awesome for fewer.

There are a number of things that would need to be done in the dashboard/backend, to afford much more scaling. I want the frontend to work well, at a fairly humble scale, though. That’s what most end-users will experience, anyway.

And I’m aware that 10K users is a rounding error, for a lot of folks, around here, but it is 10X what I expect, and it will buy me time, to do a proper dashboard and backend.
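For anyone curious, seeding that kind of fake data is cheap. A sketch using the faker package; the table and columns are placeholders for whatever the real schema looks like:

    # Seed a dev database with 10K plausible fake users for load testing.
    import sqlite3
    from faker import Faker

    fake = Faker()
    db = sqlite3.connect("dev.db")
    db.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT, joined TEXT)")
    db.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(fake.name(), fake.email(), fake.iso8601()) for _ in range(10_000)],
    )
    db.commit()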


Scalability is a sexy problem to have though… so who doesn’t want to pretend that scalability is the biggest problem at their cashflow-negative startup with 10 users?


Scalability is unnecessary when you're not growing rapidly. But when you are growing rapidly, it's suddenly very important. And if you didn't think about it before, it's already too late.

You don't have to build to scale right now. But you should have a plan of when you're going to need to scale, and how you'll do it. Or don't, and wait until your pants are on fire and then run around screaming.


Scalability doesn't matter until it does.

“How did you go bankrupt?” Bill asked. “Two ways,” Mike said. “Gradually and then suddenly.” ― Ernest Hemingway, The Sun Also Rises


The time you waste on building an unnecessary scalable system is nothing compared to the time and energy you spend on maintaining that overly complex system.


Yup. Over-engineering is harder to fix than under-engineering.


Indeed. The hardest thing for a technology person to do is nothing. It’s seemingly irresistible to “solve” problems fast and consider the consequences slowly. My term for the consequences of early structure is “calcification” - the loose aggregate of a startup becoming an unyielding mass resistant to change. Code, processes, culture, whathaveyou.


> But more importantly, it locks in an experience before you’re sure it’s right.

This single sentence communicates more than the author intended. I see the phenomenon all the time: poorly architected systems that are brittle in the face of user requirement changes. In the extreme case, a new widget or input on the UI echoes all the way to the database, touching and mutating every layer in between.

So it is perfectly valid to weigh the benefits of optimistic, forward-looking engineering against gaining a deeper understanding of a market faster. But if anyone tells you that going down any path other than tacking on bits and pieces "as we learn" is an error because of "user experience" (other than, ahem, performance), then be certain, certain, that the speaker lacks the necessary technical understanding to be offering strategic technical advice.


I’m not sure I like this, because you will always be wrong about the ways the code needs to be flexible. The most flexible code is soaking WET, where each component is self-contained with no shared dependencies, no abstractions, and can be thrown out or rewritten with absolutely no possibility of affecting any other parts of the service.

Devs hate writing this code; every fiber of your being recoils at the thought of just copy/pasting everywhere, ignoring anything that seems like a pattern, and having several identical API endpoints just to keep everything WET. But you will never be so productive in the face of changing requirements.


> But you will never be so productive in the face of changing requirements.

I couldn't disagree more. Failing to represent each piece of knowledge from the requirements in a single authoritative place means that simple requirements changes often have massive costs, as instead of one change to the place where the changed element is unambiguously represented, you need to change all the distributed locations where it is (often opaquely/implicitly) embedded in logic.

The problem is when DRY gets misinterpreted as “never write generally similar-looking code” rather than “every piece of knowledge must have a single, unambiguous, authoritative representation within a system.”
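A toy example of that distinction (the rate and names are made up): the two "bad" functions below don't share a line of code, yet they duplicate the same piece of knowledge, which is the actual DRY violation:

    # Knowledge duplicated: the VAT rate is embedded in two places.
    def price_with_vat_bad(net):
        return net * 1.20

    def invoice_total_bad(items):
        return sum(i * 1.20 for i in items)  # change the rate, miss this one

    # Single authoritative representation: one place to change.
    VAT_RATE = 0.20

    def price_with_vat(net):
        return net * (1 + VAT_RATE)

    def invoice_total(items):
        return sum(price_with_vat(i) for i in items)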


I see the point of the call, but I don't see the point of manually sending the email every time. Unless he's manually researching each user and trying to incorporate that knowledge in the email, then an automated email will appear exactly the same on the receiving end as his handcrafted one.

Doing things that don't scale makes sense when it gives users a white-glove, personal experience, or when it saves you from building a system before you know what that system should look like. Manually typing the email and clicking send does neither in this case. He's already sure that sending an email is the right thing to do, and if he's unsure of the wording or timing, he could be A/B testing, or at least only occasionally changing these, rather than manually doing the steps every time.


It's interesting how everything is a balance, and experience and mastery count for a lot, but nobody acknowledges this in blog posts.

There is an appropriate amount of architecture and design that should go into systems. One aspect of that is how much effort you put into scalable design. Building and maintaining scalable systems isn't free, so the "right amount" is pretty much somewhere between "none" and "we are AWS."

Knowing when adding complexity is worth doing takes experience, humility, and maturity, the exact qualities that are sorely lacking in many egomaniac software developers.

It's why we get ridiculous fads and pendulum swings where people get fed up with bad design and then go do the opposite thing, having learned nothing.


Also, scaling issues are not a problem, they're milestones that ought to be celebrated. The things you are building are being used. You get to solve known unknowns, which are so much better than unknown unknowns. The reward is clear. What's not to love?


Every time I read one of these I realize how non-obvious this advice is to most people, especially because in the physical world we have already pushed efficiency to its limits.

It’s much more difficult to do things that don’t scale when you’re competing with physical products in established markets whose incumbents have spent years building a huge chain of efficient processes.

I don’t think this piece of advice will remain relevant for many more years. Starting a digital product today is already much more difficult than it used to be because the bar is much higher (and regulation has grown too).


I do agree but I hope people don't take the wrong lesson from this.

For example, there's a "right way" to not do scalability and a wrong way. The right way would be avoiding patterns like microservices early on, while still minimizing human intervention and brittle integrations. The wrong way is to just have some kludgy solution that causes headaches for your customers.

With our budgets tightening, we axed a startup vendor we were paying $90K/yr because their shit was slow and kept breaking, pissing off our customers. We were able to spin up replacement functionality for their product in 6 weeks.


Because I've had a few jobs working on products produced by recently-acquired startups, I get where it's coming from but also why it doesn't make sense: those startups cashed out by making a product so inefficient that the numbers weren't even necessarily on the right side of the ledger. But no matter, because they teed up the clients, and coming back to optimize a successful product (even if that means wholesale replacing all or part of the underlying systems) is easier than conjuring a customer base from zero.


Agreed. Having worked at a few early stage start-ups and in product, this is a topic that fascinates me.

The number of people I see building product before they talk to a single customer astounds me. I'd really like to understand the psychology of it. Building for scale is doubling down on this behavior: not only do you not know if your customers will like it, you're scaling something people might not even want.


I agree with this, and someone has already pointed to the Paul Graham essay "Do things that don't scale". I will say, you need to build to accommodate some growth, but in my head at the startups I've been at, I always figure we should build for 10 times current traffic. Room to grow. There were times we never got to 10 times current traffic. (and died). And a time or two we did, then it was time to re-architect for 10 times the new current load. Lather, rinse, repeat.


I find myself saying "No" more often than not with founders who have scope creep. Scalability is one of them.

Does it work well enough in the geo you want? Yes? You're good to go.


Someone explain the article to me.

> I still personally email every new Pilot customer and offer to get on a call with them.

How is this a thing that needs to scale, and how is it time consuming to write some automation? Just export the contacts and write a small script that loops over them and sends an email using some transactional email service, if you've got something like SES. It shouldn't take more than an hour to do from scratch, and like 5 minutes if you look it up using ChatGPT or even Stack Overflow.
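Something like this sketch, assuming SES via boto3 (the file, addresses, and copy are all placeholders):

    # Loop over exported contacts and send each a welcome email via SES.
    # Assumes AWS credentials are already configured.
    import csv
    import boto3

    ses = boto3.client("ses", region_name="us-east-1")

    with open("new_customers.csv") as f:
        for row in csv.DictReader(f):
            ses.send_email(
                Source="founder@example.com",
                Destination={"ToAddresses": [row["email"]]},
                Message={
                    "Subject": {"Data": "Welcome -- want to hop on a call?"},
                    "Body": {"Text": {"Data": f"Hi {row['name']}, happy to chat anytime."}},
                },
            )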


> Just export the contacts, and write a small script that loops over it and sends an email

Email is almost always one-way spray-n-pray. A call on the phone is two-way. The other party on the phone always tells you more than you asked, whereas the other party in an email reads it, files it away (if you're lucky), and moves on to the next email.

The other extreme is to play golf with your business partners; you'll learn their pain points more effectively than by sending an email, but it will take 4 hours.

OTOH, they'll have golfing buddies who are potentially your next client.


If you're going to spend 15 minutes in a call with someone, automating away a 10 second email is not much of a time savings.


Do things that don't scale, but keep scalability in the back of your head -- that would be my take on this. Don't block yourself; keep working on acquiring users and on scalability. If you make the right choices in the beginning, you have very little overhead and can still work on scalability. And in my experience, scalability for most of us just means clean code (+ KISS) and good architecture.


I don't often do b2c but I have had my fair share of projects where "we already have X at home" meant you needed to scale up a system that was thrown together optimizing only dev time or requirement flexibility. Often a bigger box was the cheapest solution. Also interesting is defining architectures that can scale DOWN to optimize costs for highly variable workloads.


Went to the pilot.com website. Scroll down.

In huge text: "We partner with the best financial tools in the business" and a lot of logos from famous companies! I think, oh wow I haven't heard of this company before. But then see the text below the large text in small, gray text: "We're fluent in your finance tech stack and seamlessly integrate with the tools you use."

oh...


(Post author here.)

In addition to having deep integrations with these tools, we actually do have real partnerships with almost all of them.

"What's a real partnership?" is a great question, but I'd mostly think of it principally as "We know them, they know us, we have a red phone we can pick up to get things solved for each other" (which is good for the customer), and then also "And we occasionally do some comarketing together."


Adjust scope to match your team's skill level. That way no one has to adapt/train, and everyone can stay in their comfort zone.

Works fine until the market's use-cases suddenly shift and a platform is hammered out of existence.

Really depends on whether a business plan includes rapid growth (i.e. throw more money at the scaling issue), and an apology tour in the media. =)


This is so badly written.

I assume they are complaining about front end frameworks, libraries and approaches developed at big tech.

They did not actually try to sell you this stuff; it’s free. Stop complaining. The fact that you failed to properly evaluate your situation and see that it was very different from theirs is your fault.


Couldn't disagree more, from my own experience, if performance is important for your service.


If you create something so great that demand is off the charts chances are your users will stick around long enough for you to fix your scalability problems.


"Don't solve problems you don't have"


Do Things that Don't Scale: http://paulgraham.com/ds.html


Scaling down is scaling too. E.g., can your app be productively developed and run by 1 dev and 1 VPS?


Rework is a book that is surprisingly underrated in this community; it covered all the points in these comments, but a decade ago.


My startup didn't grow to its max potential because I underrated scalability.


IME people say "scalable" when they cannot say "fast".


seems like you have to try to make things slow at small scale. been running my service for 8 years and it's still fast and unoptimized and unscaled. why? because i still only have a couple hundred users. will it scale? it should. I'll have to do something about my database at some point but not today.


Tl;dr: Over-optimizing is a bet on the unknown, and usually the bet doesn't pan out.


Yes. Most SaaS are B2B and good luck getting 200 customers.


Unless I have micro services then how scale??????


s/scalability/technical scalability/

Other kinds of scalability (financial, development) are underrated.


chase demand, and build scalability only when the lack of it becomes a problem -- definitely agree


I make a joke at my current startup (though I'm not really kidding) that my goal when designing systems is to make it last long enough that we can hire a team to rebuild the given system by the time it starts failing to meet its use case.

The other side of that is, of course, that I don't want to design systems that will last a long time. I always have a "EOL" threshold for when a system we've built needs to be rethought; 100 customers (we're a b2b SaaS so 100 would be a gigantic ARR), 10k users, 500 reports a day, etc.


> hire a team to rebuild the given system

Do rebuilds actually work once there's that much code, though? The system gains so much inertia that by the time you can hire a bunch of folks it's probably too late.


You delete the system and start over.


premature scaling is the root of a lot of wasted time

That said, if you know where the boundaries are in your design, keep them obvious and clear so that scaling won't be hard if you are lucky enough to need to scale.



