That won't scale: Present cost vs. future value (codesolo.substack.com)
119 points by kiyanwang on April 23, 2023 | 71 comments



I go by "build for 2x, architect for 10x". As in, the current system should have enough spare capacity to handle the occasional spike from ads or whatever and not get bad latency from running near the limit, but it shouldn't need to be re-architected (much) when traffic grows 10x.


Sage advice. So often major architectural decisions are made in haste because of YAGNI arguments or business pressure. But a high-level design that can cope with stress and volatility is so cheap to do early, and so expensive to try to back up and do later with an existing implementation.


I don't think it's cheap to do early. The costs are in people writing, drawing, talking, and often debating, rather than doing iterations on a basic working solution, and the outcome usually doesn't really work and needs to be redone in a year or two when more information is available.

Which is not to say it isn't necessarily worth the cost. I just think the cost shouldn't be downplayed; it should be an investment undertaken with open eyes.


> I don't think it's cheap to do early. The costs are in people writing, drawing, talking, and often debating, rather than doing iterations on a basic working solution, and the outcome usually doesn't really work and needs to be redone in a year or two when more information is available.

It is, for the vast majority of apps.

All you need to do is make sure your app running on one server can be run on 10, i.e. use actual SQL transactions to manage data, not in-program-memory locks. Then you can just get a fatter DB server and boom, you're at 10x.

Above that, sure, the "bigger server" approach starts to be expensive, so you might need to start thinking about sharding, or moving some data off to a separate instance (a separate search cluster, for example), or adding a cache here and there, or, eventually, rebuilding the core of it for performance. But that's 100x territory.
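A rough sketch of the one-server-to-ten move (sqlite3 here is just a stand-in for a client/server DB like Postgres or MySQL, and the table is made up):

    import sqlite3  # stand-in; in practice a client/server DB

    conn = sqlite3.connect("app.db", isolation_level=None)  # manage txns manually
    conn.execute("CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

    def transfer(src: int, dst: int, amount: int) -> None:
        # the "lock" lives in the database, not in process memory, so ten
        # app servers coordinate exactly the same way one server does
        conn.execute("BEGIN IMMEDIATE")
        try:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise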


Maybe. I dunno. I never seem to be working on things recently that fit into this "vast majority of apps" frame, where the right answer to up front architecture design is "just build it like a web app with a transactional database that scales with number of users". You're probably right that this is what the vast majority of projects look like, but it isn't very useful information when you aren't working on this kind of project.


I guess it's only cheap (or good value) if you get it right though. How do you know you're getting it right early on?


Experience in getting it wrong a whole bunch in the past, though you'll probably get something wrong this time too.

It helps to know how far you can get with various approaches. Knowing that sqlite actually manages just fine for things up to X or where the reasonable edges are for using (say) a db for tasks rather than rabbitmq. Then some experience swapping things over. Been on a project upgrading from sqlite to postgres? Postgres on a single node to something larger? You'll have a better idea of how easy or hard these things are. Sometimes it's small things, like going with sqlite but being aware of the differences so that later the migration path is easier.
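For the db-for-tasks case, the core can be as small as this (a sketch, assuming SQLite 3.35+ for RETURNING; table and column names are made up):

    import sqlite3

    conn = sqlite3.connect("tasks.db", isolation_level=None)
    conn.execute("""CREATE TABLE IF NOT EXISTS tasks
                    (id INTEGER PRIMARY KEY, payload TEXT,
                     status TEXT DEFAULT 'pending')""")

    def claim_task():
        # one worker claims one pending task; the transaction is the queue lock
        conn.execute("BEGIN IMMEDIATE")
        row = conn.execute(
            """UPDATE tasks SET status = 'running'
               WHERE id = (SELECT id FROM tasks
                           WHERE status = 'pending' LIMIT 1)
               RETURNING id, payload""").fetchone()
        conn.execute("COMMIT")
        return row  # None when the queue is empty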

You can also sketch out architectures, and see what does or doesn't change if you swap out a component/step. I'm probably going to have something do a basic iteration for comparing vectors right now because

1. I know the scale I'm at means that it's going to be near instantaneous

2. For a larger problem I know I can keep the same interface (find closest X) and swap in various other tools I know exist

3. Adding those tools should be possible, but due to some constraints it would probably be at least some hassle

So I have a simple, quick solution, but I know how to change it later. I've also identified why I'm not going for the "proper" solution straight away, and what the benefit is.

However, I'm going to keep the interface as "find closest to string X" not "find closest to vector X" because I've seen the issues that arise when multiple different things care about exactly how the vector is constructed. I've been through that kind of thing which dramatically slowed deployment of updated systems. Perhaps in those other cases it wasn't particularly avoidable but the problem feels like it has the same shape (hard to describe but problems and solutions feel like they have shapes to me, I hope that makes sense regardless of how you picture them).
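Concretely, the first cut looks something like this (a sketch; embed() is a stand-in for whatever produces the vectors, not a real API):

    from math import sqrt

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

    class ClosestFinder:
        def __init__(self, embed, corpus):
            self._embed = embed  # string -> vector; a stand-in
            self._items = [(text, embed(text)) for text in corpus]

        def find_closest(self, query: str) -> str:
            # plain linear scan: near-instant at my scale, and the interface
            # ("find closest to string X") survives a swap to a vector index
            qv = self._embed(query)
            return max(self._items, key=lambda it: cosine(qv, it[1]))[0]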

These are some pretty small examples I know, but hopefully useful as approaches that can be communicated without a large background in a specific project.

My main point is where I started: experience making these mistakes. I am quite aware of deadlocks because there's nothing that teaches the lesson about lock ordering quite as well as seeing a robot freeze mid-task in a demo. When I'm mentoring, people seem surprised when I solve some of their problems immediately; it's not because I quickly solved them from first principles but because I remember making the exact same mistake 10 times.


I think this is implicit in your comment, but making it more explicit: The more similar the new thing you're doing is to something you've already done, the better this works.


Thanks, that's worth making more explicit.

To add to this, a very important part of the process to me is to start by asking how this problem is similar to or different from other problems we've faced (very useful for estimation; you'd be shocked at how often people think a thing will take, say, 1 week when it's taken over a month each previous time something similar was done). This also means you can reach out to others if you identify a good similar problem but you weren't the one that worked on it.

It decomposes too, so the vector thing above is new to me but "align version numbers" isn't.


Certain things, like using a database instead of a flat file when search is needed, or going with a linear algorithm instead of quadratic, are hard to do wrong.

Deep assumptions about the problem space are hard(er) to do right. This is why the architecture should remain somewhat flexible.


> How do you know you're getting it right early on?

I think that's definitely the key question.

If it's the sort of situation where you've been given all the requirements up front - which does happen in certain domains - you can take the full-blown requirements engineering approach, and validation of the product will be built into the development lifecycle; it should be effectively impossible to deliver an invalid product. (You might well go broke or lose all your team on death-marches trying to get there, but if you do get there, you know it'll be formally validated and signed off by the customer.)

But of course, this is expensive, doesn't preclude delivery of a brittle product that can't cope with change, and, moreover, just won't fly for situations where the requirements aren't fixed at the outset and the business is relying on dynamic interaction with the development team to evolve the product. So, for those situations, we have, what? Heritage, heuristics, trends. And the results are, to me, unsatisfyingly contingent. Why is the software shaped this way, and not some other way?

For my $0.02, my projects are typically in between - there are some requirements provided, maybe they are a bit vague, but we largely know what we are trying to build, and we do some design to figure out what the big pieces are and how they ought to fit together. I've found some real success in approaches that subject the design of the software up front to the stresses of potential change, like those given by Juval Lowy (IDesign, Righting Software) and Barry O'Reilly (Black Tulip, Residuality Theory). And to subject a design to stress, of course first you have to do some system design! If you're not at the stage of being able to put together a design, I think it's likely more a prototyping or an investigative programming kind of activity, which to me has a different goal.


You can’t really do more than 10x anyway, because the solutions change when the order of magnitude changes. In fact even at 9x you’re risking investing in a more expensive solution than you actually need.

The real power is not in what you architect, but the negative space you put in the architecture that allows for future growth without committing to it today.


Simple code is easy to change; I feel like a lot of architecture astronauts (cough Enterprise Java cough) don't get that.

"Why is everything an interface?"

"Because we may need to swap out its implementation in the future. It's Clean Code."

"Then refactor it out into an interface when you need it! What's an IDE for?"


Interfaces have a different value: they also make it easier to build testing facilities for unit tests. This works in many languages - interfaces in Java, traits in Rust, etc.


Is it really a good idea to bend your code to satisfy your testing environment and not the other way around?


"Bend" is not a good description of the intention here. Good code should be easy to test effectively and interfaces are a common way to achieve that. We should absolutely design our code with testing in mind.

For example, you have a repository for some entity as a dependency of a service, and you would like to test some behavior of the service when the repository fails. In this case, the service can depend on an interface of the repository instead, and the test can create a stub that implements that interface, whereby you can simulate the error. If the service depended on the actual repository, this would be very difficult to do.
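In code, the shape is roughly this (a sketch; all the names are illustrative):

    from typing import Protocol

    class UserRepository(Protocol):
        def get(self, user_id: int) -> dict: ...

    class UserService:
        def __init__(self, repo: UserRepository):
            self._repo = repo

        def display_name(self, user_id: int) -> str:
            try:
                return self._repo.get(user_id)["name"]
            except ConnectionError:
                return "<unavailable>"  # the failure behavior under test

    class FailingRepo:
        # the stub: implements the interface, simulates the repository failing
        def get(self, user_id: int) -> dict:
            raise ConnectionError("db down")

    def test_display_name_when_repo_fails():
        assert UserService(FailingRepo()).display_name(1) == "<unavailable>"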


> Good code should be easy to test effectively

Can you explain why this is? Or maybe more specifically, what you mean by this.


It's kind of unavoidable once you're trying to do unit tests with mocks. You either end up with interfaces for everything and a dependency injection framework, or you end up with some very non-isolated tests that take ages to run because they need inseparable parts of the whole real system.


Your tests are part of your code. Your implementation code should not make writing your test code difficult.


It's not really about replacing the implementation.

Depending on interfaces means you (by definition) can't depend on the implementation. It's a decoupling thing more than anything else. You get some benefits for mocking during testing and so on, but it's mostly just to get something like a C++-style class declaration/definition (.h/.cpp) split.

It's admittedly taken a bit too far in some codebases and often done without much thought as to why, but it's also a bit of an artifact of pre-Jigsaw-era Java, where it was very hard to enforce boundaries between modules. In that light it makes at least some sense.


Because then someone else comes along and builds to the implementation and now that simple change becomes a mess of a refactor.

An interface isn't just a type construct, it communicates the intent of the systems design.


Depending on the language, interfaces also place additional constraints, as they limit interdependencies.

For example, in Delphi, classes in the same unit (file) can access each other's protected members. This is usually a code smell, which interfaces prevent.


I dunno - I found "Enterprise Java" objectively quite easy to refactor to fit high-scalability and modularization needs. Having stuff defined as interfaces actually helped greatly with this.


Straightforward yes, quick no.


The real answer is interfaces, even if they only have one implementation, let you not have to worry about cyclic dependencies.


How? (honest question, I'm curious!)


If you start with AService in module A, other modules need to depend on A to use it. You get into circular dependencies trouble when A needs to use a service from another module B that itself or through a dependency needs the AService. One solution is to split it into an AService interface you put in a new module, call it e.g. A-api, and leave the implementation in A which now depends on A-api. Do the same for B/B-api. It's a lot less likely that your -api modules will circularly depend on each other because they only contain interface definitions, so A can safely depend on both A-api and B-api and B can safely depend on both B-api and A-api.

A more concrete example of an A and B that need to interact with each other might be a user account service and a (financial) transactions service. (Why not let them live in the same module? That can also be a solution, but as the scope of each grows and especially if separate teams end up owning each it can make sense to split them and enforce some boundaries via the interfaces.) Another example could be a notifications service and several of its clients, like whatever is handling user posts kicks off a notification event and asynchronously the notification service processes it and may need to reach around to call back into other services that could be part of the sender's module. (Passing simple lambdas may be an adequate alternate solution too.)
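The same shape sketched in Python, since that's easier to show in a comment (module and class names are made up; in Java these would be separate build modules):

    # a_api.py -- interface-only module; imports nothing from a.py or b.py
    from abc import ABC, abstractmethod

    class AccountService(ABC):
        @abstractmethod
        def get_balance(self, account_id: int) -> int: ...

    # b.py -- depends only on a_api, never on a, so no cycle can form
    from a_api import AccountService

    class TransactionService:
        def __init__(self, accounts: AccountService):
            self._accounts = accounts

        def can_spend(self, account_id: int, amount: int) -> bool:
            return self._accounts.get_balance(account_id) >= amount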


> One solution is to split it into an AService interface you put in a new module, call it e.g. A-api, and leave the implementation in A which now depends on A-api.

This is what Fowler describes as “Separated Interface”. [1] He specifically calls out the need for this special case:

> “[Y]ou might need to invoke methods that contradict the general [module] dependency structure. If so, use Separated Interface to define an interface in one package but implement it in another.”

It’s a special case because until you need to “contradict the general dependency structure”, you don’t need to do it. In particular, this is not the first tool one should reach for if you have a circular dependency between two modules.

[1] https://www.martinfowler.com/eaaCatalog/separatedInterface.h...


> Simple code is easy to change; I feel like a lot of architecture astronauts (cough Enterprise Java cough) don't get that.

It's not easy to uniformly change in a way that doesn't add regressions. It's also not usually easily changed except by a large team. That large team likely adds more regressions because communication is lossy.

"Simple code" that avoids abstractions leans more on people, code that embraces abstractions (good or bad) attempts to lean more on the machine.


The examples in the article are not wrong, but seem a bit simple. Rarely do I find the cost of the switch from not scalable to scalable to be close to zero. More often, I find software built on a years-long pattern of these types of low-cost immediate-effect decisions. The result is something that can hardly be fixed at all.


To stay with the examples: here's what I thought when it came to the text file and the database.

The issue usually isn't that you receive too many mails for the size of a text file, but that you probably want to store additional data soon, in a structured way. So you start hacking some field separators into your file, it becomes a file format, and switching to a database later is notably more work than it would have been at the start.


I dunno about that. It's really easy to switch from a CSV format to a single table in a DB.

Switching when your data is spread out over multiple CSV files is a lot more work (still less work than was expended on CRUD across multiple CSV files, though).
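The single-table case really is tiny (a sketch assuming a header row; the file, table, and column names are made up):

    import csv, sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("CREATE TABLE IF NOT EXISTS signups (email TEXT, signed_up TEXT)")
    with open("signups.csv", newline="") as f:
        conn.executemany("INSERT INTO signups VALUES (?, ?)",
                         ((r["email"], r["signed_up"]) for r in csv.DictReader(f)))
    conn.commit()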


To store additional info one can use a tab-separated file with field names in a header; such a file is easy to load into a database table later.


I think that's part of the issue. Once you're using a text file, adding "one more feature" to the text file implementation is always easier than switching to a database.

So you add your tab-separated value. Then you find out that in some cases the extra field you're adding can contain a tab. So now you need to be able to escape actual tabs in your tab-separated file - because once you have the tab-separated file, adding an escape is still easier than switching to a database.

Cut to: 3 years later. You now have optional fields, a key-value implementation, locking, and two or three bits of code that work around the edges of the "proper" API to access the file directly, because at every stage that was easier than creating a "real" database backend. But now it's harder to change than it would have been to do the right thing at the start, and the sunk cost fallacy means that everyone's brain is resistant to throwing away all the work that's been invested in the current solution even if they know it's the right thing to do, and even though whenever you onboard a new recruit they always fail a SAN check and take a couple of weeks to recover from the horror of your codebase before they can face it for reals.

And implementing the next new feature in the text file still looks easier than replacing the whole implementation with a real database right now, so let's put that off for just one more release cycle, OK?


What I've also experienced is that a whole ecosystem has been built up in other departments in the organization (but outside of development) around the crazy CSV file formats, e.g.: Marketing uses it to trigger campaigns; Accounting uses it to reconcile live accounts; Support uses it to find accounts.

So anything you do to make this better means you now have to deal with this ecosystem. At best, you're solving some pain point these groups have and they're happy about it, but in most cases you're probably forcing unexpected, extra work on them to "fix" something they perceive as not broken. At worst, you don't discover they're even doing this until after you've changed it, and now you just broke a production system you didn't even know existed.

In some cases, the hack another department put in place will also have its own ecosystem built on top -- it's turtles all the way down.

How bad this situation is directly corresponds to how your organization has grown and how siloed different departments are. And keep in mind we're talking about a fictional but easy-to-understand "emails in a CSV file" problem -- most of the real-world problems are significantly more complicated.


I'm not saying that one should use a file as long as possible, but a real DB from the start for anything the system writes is not always warranted. Yes, with a file you'll need to think about either escaping or data validation, but there is the option of using json-per-line at the price of higher overhead (compared to TSV), and a JSON library will take care of serialization (including escaping). Locking is not required (on Linux/BSD) if the file is opened in append mode and each line is written as a single write request (which is atomic).
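The json-per-line version is only a few lines (a sketch; as noted above, the atomicity claim assumes append mode and one write per record):

    import json, os

    def append_record(path: str, record: dict) -> None:
        line = json.dumps(record) + "\n"  # the library handles all escaping
        # O_APPEND plus a single write per record keeps concurrent
        # writers from interleaving (on Linux/BSD)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            os.write(fd, line.encode("utf-8"))
        finally:
            os.close(fd)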

> But now its harder to change than it would have been to do the right thing at the start, and the sunk cost fallacy means that everyone's brain is resistant to throwing away all the work that's been invested

With a "proper" implementation the sunk cost fallacy can be even stronger, because more effort was spent to make it right from the beginning.

IMHO switching from a file to, say, ClickHouse would be as easy as switching from MySQL to ClickHouse (or maybe even easier).


This is like the cartoonishly strawman version of this argument. Also the text file example is awful.

Do you really want to deal with durability and parsing a flat file, etc.? Is that really easier than using a simple database? I see these kinds of arguments a lot from developers who think things like relational databases are hard to use, which artificially inflates the argument against "extra complexity".

There are hard tradeoffs to make, and the interesting arguments are in the place where reasonable people disagree about what kind of robustness will be needed over a particular timescale. It's never as simple as "it won't scale" where it's obvious to everyone that having millions of users tomorrow isn't realistic.


The text file example is pretty flawed. There are quite a few benefits to using databases over text files aside from being a place to store data. Structuring, availability, race conditions, etc. Including and using a NoSQL database is hardly the bottleneck in producing a production-level app/website/whatever.


This site being a prime example of how storing production data in text files on a file system leads to terrible performance.


It's a perfect use case for SQLite or similar. It's a low overhead solution (just a library) and you get a lot of benefits out of the box.

A text file? Maybe if it's an append only kind of deal.


IME you're gonna screw it up the first time anyway, and the simplest option is usually the easiest to patch once you figure out what you actually need.


This is mine too, and I feel like people's personal biases make them think otherwise. 9 times out of 10, the extra effort and concern put into making a new thing more scalable or generic wasn't needed; that time and effort could have been better spent doing other things. Also, skipping the work and going for the simpler option wouldn't have blown timelines nearly as much. Problem is, you don't feel that wasted time very harshly, but for the 1 in 10 where you have to go back, throw out code, and re-do it, the pain is very sharp and acute, so everyone takes the more burdensome path and pats themselves on the back for being a good architect.


This is true and I agree, but not every project is the first time. When you already know the territory, the question often becomes “now or later”?

The correct answer is most often “later”… but the trick is being able to work out what you need today.


You can mitigate both examples with appropriate internal API choices, though the second is much harder to retrofit into an application as it grows than the first. For the email example, you build something with a similar API to what you would expect the database to expose (DAO, etc.) and eventually replace the code backing it with something more robust using the API.
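For the email example, that API might start out as little as this (a sketch; the names are invented):

    from typing import Iterable, Protocol

    class EmailStore(Protocol):
        def add(self, email: str) -> None: ...
        def all(self) -> Iterable[str]: ...

    class TextFileEmailStore:
        def __init__(self, path: str):
            self._path = path

        def add(self, email: str) -> None:
            with open(self._path, "a", encoding="utf-8") as f:
                f.write(email + "\n")

        def all(self) -> Iterable[str]:
            try:
                with open(self._path, encoding="utf-8") as f:
                    return [line.strip() for line in f]
            except FileNotFoundError:
                return []

    # later, a SqliteEmailStore with the same two methods replaces it,
    # and no caller changes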

The second example is harder, but not insurmountable. I would probably start with a principal/credential reference of some sort that is provided to an evaluator along with a "right" reference, rather than adding a method directly to users. If you keep the mechanism mostly opaque to the business code, you can retrofit more complexity into the system as the needs arise. Retrofitting that into a system gets really complicated really fast if you don't have a reasonable API to start with, though.
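Roughly the evaluator shape I mean (a sketch; the rights model behind it stays deliberately opaque to the caller):

    from typing import Protocol

    class RightsEvaluator(Protocol):
        def has_right(self, principal_id: int, right: str) -> bool: ...

    class InMemoryEvaluator:
        # trivially simple today; roles, groups, or an external policy
        # engine can replace this class without touching business code
        def __init__(self, grants: dict[int, set[str]]):
            self._grants = grants

        def has_right(self, principal_id: int, right: str) -> bool:
            return right in self._grants.get(principal_id, set())

    def delete_post(evaluator: RightsEvaluator, principal_id: int, post_id: int) -> None:
        if not evaluator.has_right(principal_id, "posts:delete"):
            raise PermissionError("not allowed")
        # ... actual deletion ...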


Pipes that can easily be swapped out. Is there a name for that? I've said "hooks". Like when a 3rd-party application allows you to squeeze in between the steps that it's internally taking and that you can't (shan't) edit.


Abstractions is a good word for it. We do it a lot in computer engineering https://en.wikipedia.org/wiki/Hardware_abstraction


I find myself nodding along. But it also occurs to me that sometimes the maligned seasoned developers in the post have just seen the same pattern a dozen times in the past and know exactly where this is headed. Both of the examples are so typical you can just take an off-the-shelf library and do it future-proofed from the very start with no extra original code.


This also ignores the overhead of business processes. The dev effort may be very low to switch the (flawed) example of text to database, but the product/project team is going to have to create and maintain stories for it, the qa team is going to have to test it, the documentation team is going to have to update their information (where necessary, of course), the support team is almost guaranteed to get calls about it so they need to be informed, and I don't know about you, but my production data rarely looks like my dev/qa data, so there's the non-zero potential for bugs to creep up in the data migration.

All of that sounds a lot more than "the same" to me.


I think an understated point is that scaling changes costs both ways. A mode of operation that is cost-effective for netflix or google is likely going to be prohibitively expensive for a small start-up. At a sufficiently large scale any constant factor overhead is negligible, big-O is all that matters. At a small scale, the constant factor can drown you.

It's a very common trap to design a system with technology that could scale to a million users, but without being able to afford the servers so it sort of struggles to serve even a hundred users.


We frame these situations under the banner of "solve the problem you have".

Avoid premature optimisation for problems you might have. If your site only has 1000 users, there's no benefit to deploying it in infra scaled for a million users. Could you have a million users in the future? Sure, solve it when you start heading that way. Could your site stagnate with 20,000 users? Absolutely. It's a waste of engineering efforts (and money) to scale for 1 million users now.


I try to apply powers of ten thinking.

If I have 10 users then I should build for 100. If I have 100 users, build for 1,000.

(Of course this rule of thumb only applies if you’re not shrinking and not doing crazy growth.)


A seasoned dev will just say "The cost of changing the text file to the database will only keep growing with our system. The sooner we use the database the better". If necessary, he'll back it up with an anecdote, and his expertise, and you won't have a counter to that.


Migration from one database to another is even harder than migration from a flat file to a database.


Typically scale is a matter of math-weighted decisions in the process of architecture planning. Consider the planning process for freeways in your local city. Here in Texas we have a phenomenon called mix-masters, and some scale well for future improvements despite their complexity and fixed space in the present. By future improvements I mean things like civil water infrastructure, rail, electrical power routing systems, and so forth.

https://www.google.com/search?q=mix+master+freeway&client=fi...

Likewise in software, if you know how to plan, how to architect things (put plans into action and pieces together), and how to shape the finite pieces into composable units, any application can scale well. Scale is not a matter of cost-benefit analysis but just a combination of knowing how to plan and experience with the processes currently available.

Most people get this wrong for a couple of reasons:

* They have not been taught how to plan. This requires quite a bit of experience to really get right. It isn't rocket science (generally the more primitive the better), but it does take a lot of practice. Looking at the freeway example, imagine investing billions of dollars into a major freeway system that scales insufficiently because the engineers couldn't be bothered with some planning or modeling.

* They outsource architecture. Frameworks are basically boxed architectures. The problem with frameworks is that they are generic because they try to make everybody happy. Your application is probably not generic. It probably has specific needs serving a specific business. In our freeway example labor can be outsourced, but the architecture and planning are still custom to the given project.

* They are insufficiently trained on their given software platform. For example in front end web development the DOM is the center of everything and everything uses lexical scope. Developers can choose to not accept this reality as ignorance is bliss, but there are consequences and limitations that come from putting your head in the sand. In our freeway example engineers survey and evaluate the composition of the land and make determinations about the movement of rain around the location. They don't ignore the reality of their situation because a given work task is outside their area of experience.


Much complexity can come from trying to allow for future extension points. I believe a developer trying to second-guess where the domain expert sees an app being extended is futile. Requirements always seem to change down the track in ways the original developers did not anticipate, so simple ends up being better.


It sounds to me very much like some of the guiding principles behind the agile movement, or maybe more specifically XP.

Building stuff to be simple, and to be engineered so as to embrace change as requirements appear, is very much part of that vision. Building for current requirements with the confidence that the software is robust and extensible enough to tackle future requirements is basically the job - it's been a long time since I've seen an actual software specification (last one I saw was in 1991!).

Of course it's all about balance. Too simple, so that refactoring becomes painful, or too complicated, supporting scenarios that aren't likely to happen: either is bad. Experience tells you where the middle ground lies.


File storage. So simple until you factor-in read/write locking. Now it's more complicated than using a database.


This felt like someone intercepted my inner dialog during prototype development.


The examples are pretty bad because the features he talks about are already included in most frameworks. Adding a non-scaling database is trivial, and role-based security is also trivial. A seasoned developer might need three days or even fewer to implement these things.

Meanwhile the time spent going from one inadequate solution to a less inadequate solution might take more than that.


So pay attention to your APIs.

If you have an API that still makes sense for what you'll need to do at larger scales, it matters a whole lot less if your current implementation doesn't scale.

It also means that when you start exploring options for scaling, the pain points around integrating your POCs won't be nearly as bad.

(No you won't get it right, but you can get it far less wrong.)


Engineers do really love to engineer things. It's exciting to be able to do things "The Right Way". I think the actual pragmatic solution is somewhere between what is presented.

There's an initial solution that is not scalable, but is designed in such a way that the path to a scalable solution isn't a complete re-write.


> Surprisingly, in most cases, the cost of change is close to zero.

That’s surprising alright. The reason developers bring up these what ifs is that they’re trying to prevent the repeat of pain they’ve experienced in the past.

Which is to say, the pain that happens when it turns out the cost of change is terrible. For some things like localization and security, the cost can trend to infinity (we literally never know when the work is done, we just haven’t identified any tasks recently). But there are many many more.


I like the author's intention and totally agree with building the simplest possible solution that solves the problem and evolving from there. This is not a discussion about writing robust, good-quality code (for example, whether we need interfaces or other abstractions in the code), imho. The topic is more about: do we need a microservice architecture in the beginning? Do we need k8s? Do we need to optimize in the ms range to serve thousands of concurrent users?

In this context I like to refer to Gall's law.

http://principles-wiki.net/principles:gall_s_law


In the first example, you have to implement the file-reading approach now, then the database reading/writing approach, plus a migration script. You did extra work that you would not have done with the DB-first approach (the migration plus the initial system that goes in the trash).

In real life, as projects get more complex, the migration + initial system cost is so high that I'd rather come up with a clean architecture first. Also, it's more fun as a programmer to know that you are not working on a dirty solution.

Also, in real life, if we scale up these examples, maintenance cost can be higher for the dirty solution.


A system with a bad architecture is way worse than a system with "no" architecture.


I have always felt that making software "scalable" is just an excuse for not making software performant. I would prefer people attempting the latter first.


By definition, a "scalable" system AFAIK is one that can handle more load (say 10x more; the exact number is part of the design) without architectural change. You don't always need a shared-nothing, horizontally scalable system for this. If your code is fast and the current load is low, then you may handle 10x more load by running the same code in a VM with more CPU cores, or even in the same VM/HW but at higher CPU utilization.


I think the gripe is more directed at letting super inefficient code stand, because it's easy to scale it 10x.

Often there is no problem with the architecture either way; just code which, if it were faster, wouldn't need scaling out just yet.


An important skill is to differentiate when it's absolutely worth planning ahead. Sometimes what would have been a simple change at the beginning becomes a risky refactoring in the future. Other times, starting simple and changing it later is just right.


TLDR: If you can do it later with no huge headache, then do it later.

"Lean Startup" [0] is all about making your MVP and proving that it works on however small of a scale. Ex: Fantasy Football was originally done on pen and paper [1] and it worked. And then it naturally grew from there.

I find myself most comfortable with looking 2 turns ahead and building for that.

[0] https://theleanstartup.com/principles

[1] https://www.mercurynews.com/2015/09/13/fantasy-football-was-...


TLDR: YAGNI



