I've led 2 major rewrites, both of them extremely successful, replacing slow and unmaintainable systems by fast and extensible ones (and they have been extended well beyond the capabilities of the original systems).
Rewrites are like heavy power tools - they are dangerous, but they can do great things when used right.
The other fun thing is that management tends to give you benefit of the doubt for the first six months give or take. They don’t know it’s a boondoggle yet that you will eventually come to regret almost as much as the original (and which they will regret more). This realization dawns slowly enough for most people that they mistake it for a desire to move on to another project. A certain number of the guilty parties will have moved on before the bill comes due.
Love the parasite analogy. To add to it: it perhaps morphs into a symbiotic relationship between them for a while, with the parasite eventually becoming the host and the original app becoming the parasite, one that keeps getting smaller and smaller but that you just can’t shift, and then finally one day you clean out that final piece and it feels great. Meanwhile the business got all the new features they wanted when they wanted them and it kept moving forwards at all times.
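For anyone who hasn't seen this kind of gradual takeover in practice, here is a minimal sketch of the routing piece, assuming a single entry point that can send each feature either to the old app or to its replacement. Every name here (`StranglerRouter`, `legacy_app`, `new_invoices`, the feature strings) is made up for illustration; a real system would more likely do this at a proxy or URL-routing layer.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class StranglerRouter:
    """Sends a feature to its new handler once migrated, otherwise to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, feature: str, handler: Handler) -> None:
        # Register a replacement; from now on this feature bypasses legacy.
        self._migrated[feature] = handler

    def handle(self, feature: str, request: dict) -> dict:
        # Fall back to the legacy app for anything not yet migrated.
        handler = self._migrated.get(feature, self._legacy)
        return handler(request)


def legacy_app(request: dict) -> dict:
    # Hypothetical stand-in for the original system.
    return {"served_by": "legacy", "echo": request}


def new_invoices(request: dict) -> dict:
    # Hypothetical stand-in for one rewritten feature.
    return {"served_by": "new", "echo": request}


router = StranglerRouter(legacy_app)
router.migrate("invoices", new_invoices)
```

Each `migrate` call shrinks what the old app is responsible for, and the day the fallback is never hit is the day you get to delete it.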
Yep. I think the biggest source of "we could make this so much better!" re-writes I've seen in my career is in-house design system libraries. Just a big library of button styles and such. I've personally been responsible for 3 massive failures over 2 companies on that front in the past, and the place I'm working at now actually has a 3rd-order design system rewrite. (An old library that's considered legacy and is still used, a replacement that covers maybe 80% of that API surface with "newer stuff" and is used in some places alongside the old library, and now we're making a new-new library to "replace" those two, but it will ultimately live alongside them.)
Maybe I'm just getting jaded but I'm 100% on board for "just never rewrite". It's almost always a money pit to start fresh, but it's pitched to management as a time save or efficiency move.
Rewrites are definitely more work than usually anticipated, unless most or all of the original team is available.
The other option to figure it out again from scratch has its own issues.
If existing business logic works and has been hardened, it can be worth looking at wrapping it in an API instead of rewriting and reimagining from scratch.
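As a rough sketch of what wrapping can look like: the hardened logic stays untouched and a thin, typed function sits in front of it. Everything below is hypothetical, including `legacy_calc_price` (a stand-in for the old code), the region codes, and the prices; the point is only that callers see a clean interface while the legacy function keeps doing the real work.

```python
from dataclasses import dataclass


def legacy_calc_price(sku, qty, region_code):
    # Hypothetical stand-in for hardened legacy logic: positional arguments,
    # magic region codes, and a bare integer (cents) as the result.
    base = {"A1": 1000, "B2": 2500}[sku]
    surcharge = 150 if region_code == 2 else 0
    return base * qty + surcharge


@dataclass(frozen=True)
class Quote:
    sku: str
    quantity: int
    total_cents: int


# Assumed mapping from readable names to the legacy magic numbers.
REGION_CODES = {"north_america": 1, "international": 2}


def quote_price(sku: str, quantity: int, region: str) -> Quote:
    """Thin API over the legacy function: validate, translate, delegate."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    total = legacy_calc_price(sku, quantity, REGION_CODES[region])
    return Quote(sku=sku, quantity=quantity, total_cents=total)
```

New code calls `quote_price` only, so if the legacy function is ever replaced, the change stays behind the wrapper.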
There’s a lot of orgs out there that will do all of:
1. Never consider a rewrite / v2.
2. Never schedule refactoring work.
3. Not accept any refactoring work as part of feature work, because the refactoring is always deemed more risky (despite tests). “Let’s split this PR into feature and refactor.”
So my question to these orgs is: when are we supposed to clean up the hacks to ship yesterday’s features?
Edit: I'm definitely not saying they're always right, or even often. I'm usually with the line of thinking of the people commenting to me. It's just, I've also seen systems that had ten years of further life and never needed their hacks fixed. We developers are sometimes too fixed on the quality.
Paying people hundreds of thousands of dollars a year to get tens of thousands’ worth of work done because everything takes eight times as long as it reasonably should is both expensive and gets old real quick.
But there is no proof it takes 8x as long or costs 100000s or more because of that; if you can prove that then many companies will say, yup, that’s a shoo-in, let’s rewrite! I have been in a lot of rewrites and more often than not, it did make the tech folks happy, but it did nothing, immediately or middle/long term, to the bottom line or the productivity. Worse, the productivity plummets at first because a lot of new things have to be learned.
I don’t like it much either. I have friends who don’t think about such things. They tend to be happier 95% of the time. Not as useful when everything catches on fire though.
But they don't care - instead, they just start a new one and repeat the process all over again.
The mistake here - made not just by the devs complaining about tech debt, but also by all of the company's customers - is that the goal is to produce something of value. It's not. It's to produce something promising to be of value at some point in the future, getting as many people as possible, as fast as possible, to buy into that promise (and stringing the paying users, if any, along for as long as possible), and then jettisoning the whole thing once it's past its peak, escaping with riches, to do it all again, or finally go FIRE.
Yep exactly my experience working in a startup. When it was time to make a real product out of the shit spaghetti code they wrote to win customers with """working""" demos, it was too late. Bankruptcy followed.
Isn’t this a slightly different problem? The product built was just vapourware and not really doing the job. So customers went elsewhere when the real version didn’t appear in time? Often companies will win the customers with a genuine working product but complicated code and then hire like crazy and all the new developers struggle and want to rewrite it all. But they mustn’t because it is a working product earning money. So gradual improvement and controlled rewrites of sub areas that need to change the most are focused on to succeed. If a company has gone bankrupt because of spaghetti code of a demo product, it feels to me like the problem was the product being just a demo rather than it being spaghetti code?
You're not wrong! But IMO, the product was being limited to being only a "demo" product because the code was too bad to run extensively in production. I guess it was both problems.
This post doesn't provide any evidence or even any arguments.
There are lots of different types of rewrites, and rarely does anyone find themselves in the position to rewrite an existing system in a way that perfectly reproduces behavior 1:1, because rewrites are often motivated by a ton of ugly features/flows that were forced as kludges, where the whole point is to improve them, along with making it vastly easier/cheaper to write new features.
And no, waterfall works just as badly for rewrites as everything else, because when it comes time to program it, you run into bugs and unexpected behavior with libraries and the realization that your new architecture has problems and all of the problems that come with regular non-rewrite programming.
That idea that "First you need to determine the requirements. All of them. In excruciating detail." is the problem waterfall always encounters -- to write the perfect list of all requirements, that's ultimately writing all the code. Such supposedly perfect waterfall specifications are an illusion.
So no. Approach rewrites with the modern common sense of any software project, which involves defining and building out working components incrementally with testing, according to defined high-level priorities, and flexibly dealing with issues along the way as you run into the same unpredictable problems you always do.
And finally:
> Leaders who have delivered successful rewrites often describe the process as one of the greatest accomplishments in their career, but they never describe it as fun.
This sounds made-up. What leaders? Why would a leader ever describe a rewrite as one of the greatest accomplishments of their career? I've never heard of such a thing. And why shouldn't it be fun? Rewriting something to wipe away all the old horrible tech debt and start over from scratch can be super-fun!
I think I understand where the author is coming from. I have not done a lot of rewrites but the largest one I did was an application for maintaining and generating our product catalog. The original design had split North America and international catalogs into different files that were 95% the same and these had to stay in sync. Other oddities of the old system had page numbers manually adjusted when new pages were inserted and a hand maintained table of contents. All together there were 12 different catalogs generated and each one had a manual process of importing new pricing before being generated.
What kicked off the project was a desire to reduce the amount of manual labor required for generating new books, but as soon as the project was proposed we had a VP who wanted to justify the project by doing something more. Now "more" was a little hand wavy, but the concrete things were color pictures and automatically generating and laying out the entire catalog. The problem was there was no data for either of these tasks; we had no color pictures and there was no repository of product attributes to automatically lay out the catalog.
The break that made success possible was that the VP with hand wavy dreams left the company, and I was able to start work on a rewrite that solved the important pain points without adding the overhead of multiple departments coordinating on taking new pictures and building a PIM with the required attributes.
It has been five years since we went live with the new catalog; color pictures make up around 50% of the book now, and the PIM is getting rolled into an entirely different project. If we had waited, we would still be waiting on the benefits that kicked off the project in the first place.
> > Leaders who have delivered successful rewrites often describe the process as one of the greatest accomplishments in their career, but they never describe it as fun.
> This sounds made-up. What leaders? Why would a leader ever describe a rewrite as one of the greatest accomplishments of their career? I've never heard of such a thing. And why shouldn't it be fun? Rewriting something to wipe away all the old horrible tech debt and start over from scratch can be super-fun!
It's also pretty hard to argue against a statement like that; if you point out someone who says otherwise, they can just say "they're not a leader". If anything, it's somewhat circular, since basically they're defining "leaders" as people who agree with them.
There seems to be an underlying tone of waterfall=bad.
That’s missing the nuance of why it’s often considered bad. Usually because you never know the requirements up front, leading you to build the wrong thing.
Except, when doing a rewrite you do know the requirements up front. Based on the experience of your current system.
No, that's only half of why waterfall is often considered bad.
The other half is because of the unpredictabilities in building software period -- internal rather than external. You can build out one component fully and another component fully and then finally test them together at the end and discover there's a horrific race condition that requires rebuilding both from scratch to fix.
That's why a more iterative process is considered best practice today, to find showstopper bugs earlier rather than later, particularly to catch and address architectural issues sooner rather than later.
(But also, like I said, rewrites are rarely trying to produce a 1:1 copy in the real world. The whole point is usually to fix a bunch of problems with the previous system along the way, not to continue to reproduce the unwanted behavior. I'm not saying there are never perfect 1:1 rewrites that intend to reproduce all the bugs and warts and all, but it's rare.)
Having an idea of what is known and not known in a solution can shed a light on approaches.
It’s more possible for known (existing) solutions to go through the phases of waterfall for a rewrite.
Where unknown solutions exist, using agile to get the vectors correct and signed can help a great deal.
I have learned from experienced PMs how even an agile-waterfall loop, where each step of waterfall is itself an agile undertaking, can be beneficial in some cases.
A rewrite can represent getting something much better than it is in a different way than developers who may want to start from scratch to relearn the lessons of the past and develop technical debt in a new way.
Of course, that is being optimistic that both experienced and new developers will be open-minded enough to listen to approaches they might not be used to, and will not assume that because they haven’t seen or understood a way forward, there was no understanding behind why things were done in a certain way.
Needing to feel significant from a coding project may be worth some introspection - empowering users to benefit from software is most important. Tech should work for users, and not the other way around, where it exists to validate developers for finding a novel way to solve a problem that doesn’t move the needle for users.
You guys are asking for permission and leadership supervision and scheduled time to do your rewrites? I was planning on just banging it out over a weekend and then sliding it in all stealth like on a slow Monday morning. I ain't no fancy nuclear weapons silo programmer like the author here.
This feels too dogmatic to be very useful. I'll offer a softer version that I think everyone can agree to?
- Don't mix rewriting code and adding features in the same commits
- If you're squashing commits on a feature branch, and there are rewrites in there, cherry-pick them to master and rebase
That at least leaves you with the ability to track bugs down to rewrites vs. feature additions.
The advice in the original post feels like it's potentially useful if it's in a large system that sees heavy use with good test coverage. Then it's probably the right impulse. But a lot of code isn't that.
> First you need to determine the requirements. All of them. In excruciating detail.
Funny, I am currently in charge of a rewrite that did exactly the opposite (start with an MVP, do agile increments), and just recently went into production with great success.
It was the third attempt to replace a 25 year old web app written in C. The previous attempts both failed exactly because they tried to "determine the requirements in excruciating detail" - they never got past that stage.
Waterfall is a formal development model (5-10+ phases) that includes strict requirements, specifications, documentation, development, testing, and deployment. If you need to build an elevator control system or a SCADA system for a steelworks plant, then definitely go there.
Rewrite is throwing away code (and downstream) and starting over.
There comes a point in some projects when support costs exceed greenfield development costs, or tech debt prevents forward velocity. This is rare, but it happens. The problem with massive rewrites is that they are too often not lean and are equivalent to big design up front (BDUF), because they cannot deliver an MVP and they throw away features users depended on (but that nobody surveyed, ran focus groups on, or checked telemetry for). The other issue is that massive rewrites are too often used as an excuse to avoid refactoring and selective rewrites, with the magical thinking that "new == better". Too often, this is reinventing a wheel that isn't much better for 2x the effort.
It might be correct, it might not; either way I would find this more convincing if there were even one supporting argument behind these assertions. I certainly don't believe them as presented here.
Large scale rewrites are really good at promoting the people who had the idea and "delivered" the rewrite with one or two bogus metrics improved, while leaving behind a dumpster fire whose actual price will be paid by those who have to maintain it in the long run.
Only get involved in large scale rewrites if you're the ideator that will be blessed with the laurels of the first delivery, if you're going to own the shitshow later, run away while you can.
The only successful rewrites I’ve been involved with were clandestine ones. A conspiracy of developers and occasionally one first level manager to swap all the parts of a “working” system for ones that didn’t need scare quotes. Usually trying to stay just ahead of requirements that feel like giant left turns for the original code, but are tenable in the new.
Sherman does not appear to understand what waterfall is or why people rewrite code.
Waterfall specifies that phases of a single implementation are performed one after the other, completing one before the next phase begins. The classic example lists this as "design->coding->testing". What Sherman is describing is "iterative design," where you build something, put it in front of customers, then make changes suggested by customers (or maybe there was a list of features you wanted to add yourself.)
Throwing away code is okay if it leads to "better" code where better means faster, easier to fix, smaller code size, etc. Often you do not know what the issues with your code are until you write it. Writing code for "experimental" purposes is decidedly iterative, not waterfall.
I like Agile, but I think the industry needs a lot more Waterfall. Imagine building a house with Agile: you would start from the roof (rain is the customer's most pressing problem), and the foundation would be built last (after, 5 years into the future, a crack appeared in the house wall).
I also think what's problematic is this debate goes too much around project management, rather than actual methodology of building software (software engineering). I have an idea how we can reframe the debate to this effect.
When software engineers plan and write software, they actually build two things - the product itself and the infrastructure/tooling for it. From the project and product management perspective, the focus is on the first and the latter is often neglected as unimportant (and waste of time). This sort of jives with the human intuition, we want to build a house, while foundation and scaffolding are only incidental to it.
However, I can't help but notice that the software products that are actually most "agile", and easy to change, are written in a way where there is a huge amount of generic infrastructure and tooling with a tiny veneer of product ("business logic") on top of them (and in fact this goes all the way down). For a canonical example, Emacs, to the point where people even joke about it.
But also, this is completely different to how the commercial industry builds software. The focus is on very specific features, pieces of business logic, instead of generic infrastructure, that makes delivering changing features easy. That is a very short-term view, which actually becomes more expensive as the time goes on.
I think the way we should build (commercial) software products is the first one, lots of infrastructure and tooling which only incidentally happens to create a product. See also functional core / imperative shell, or Unix philosophy.
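To make the functional core / imperative shell reference concrete, here is a tiny sketch under my own assumptions (the discount example and all function names are invented, not taken from any particular codebase). The core is pure functions that are easy to test and reuse; the shell is the thin layer that deals with messy input and output formatting.

```python
from typing import List, Tuple


def apply_discount(prices: List[int], percent: int) -> List[int]:
    """Pure core: no I/O, the same inputs always give the same outputs."""
    return [p * (100 - percent) // 100 for p in prices]


def summarize(prices: List[int]) -> Tuple[int, int]:
    """Pure core: returns (item count, total in cents)."""
    return len(prices), sum(prices)


def run(lines: List[str], percent: int) -> str:
    """Thin shell: parses raw text lines, calls the core, formats the result.

    In a real program this is also where file, network, or database access
    would live; here the lines are passed in to keep the sketch runnable.
    """
    prices = [int(line) for line in lines if line.strip()]
    count, total = summarize(apply_discount(prices, percent))
    return f"{count} items, total {total}"
```

The "product" is only `run`; the rest is generic machinery that survives when the requirements for the veneer change.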
That's why I think Agile failed in the industry, because the project management people who pushed for it, didn't understand the above distinction. This distinction is also related to Eric Raymond's the Cathedral and the Bazaar distinction, although his is more about social organization around building software than the actual software architecture. (The relation manifests in Conway's law.)
To conclude, I think we need more Waterfall in the sense that we need to think (plan) more deeply about the infrastructure and tooling that we need to build our products. Jumping directly into building a product that solves customer problem will have detrimental effects on long-term productivity.
I would also add that I see the role of open source as only a relatively small part of all the infrastructure and tooling. As you get closer to the actual product, more and more infrastructure is specific to your business domain. Open source will not save you, because it's too generic. Emacs is not just a generic OS or a Lisp interpreter. It has a huge amount of infrastructure around text editing specifically.
I think scrum and waterfall are both just procrastination. Instead of building the app, you do all this other stuff instead. Like when you clean your house instead of doing what you're supposed to be doing. Nobody wants to do real work, so productivity theater ensues. It's really all just productivity theater.
The most productive way is kanban because it's like sending orders to the kitchen in a restaurant. There's no due dates, here's the stuff that needs doing, just do it.
Yup. I think this was a sideways swipe at my analogy so I'm going to double down now because I'm right. You would never get your food before closing at a restaurant if the cooks had to do sprint planning and provide estimates and deadlines. For some reason this is not only acceptable but preferred in software industry.
couple of points. building software is so far from building a house that's not really worth comparing. (just bringing this up because the analogy is used quite a lot). second, the signatories on the agile manifesto may have done some management but most, if not all, were strong developers.
While I think you can write good software with other methodologies I do think agile is a good fit for something that is supposed to change a lot which was the original idea behind software (as opposed to hardware) existing. the core idea is that things will always change, set yourself up so you have early knowledge and can change tack when you need to.
Another thing is depending on your experience you may have been exposed to differing versions of agile. I have seen many places where it has been distorted into a KPI / mini deadline / micromanagement framework, which was never the point. From my point of view agile was always about developers teaching managers how they can manage us effectively, given the uncertain nature of software estimation and the process of trying ideas and learning new things feeding into working software. it's part of our professional duty to explain how to do these things properly.
> building software is so far from building a house that's not really worth comparing
I disagree. I think the comparison is with the engineering part of building a house, i.e. creating blueprints. The actual "blue collar" work in SW engineering is making sure the whole thing compiles, runs and is tested.
> the signatories on the agile manifesto may have done some management but most, if not all, were strong developers
Yeah, that's what I am suggesting with my post, maybe it worked for them, but it was later misinterpreted, or they thought that the secret of their success lies elsewhere.
> is supposed to change a lot which was the original idea behind software (as opposed to hardware) existing
I disagree here. Software is easy to copy first and foremost (you don't need much materials). I think easy to copy does not mean easy to change. For example, DNA sequence is also easy to copy, but difficult to change. (And that's why people wish for rewrites from the ground up.)
> agile was always about developers teaching managers how they can manage us effectively
I disagree. That implies you have a much bigger problem - useless managers. Managers simply have to understand the domain they are managing (and ideally have hands-on experience). There shouldn't be any "teaching" going on.
But I also agree, in a way. I think who needs to listen to SW engineers are product and project managers, that the software is not just a bunch of features to be built, just like a house is not just what you see on the promotional render. That's what my post was about.
I find that many analogies that seem not to hold water actually hold a lot of water, frequently including some the purveyor hoped it wouldn’t.
“Low hanging fruit” is about as stupid an idea if you understand how to make performant systems as it is if you own a mature fruit tree. Going for the low hanging fruit on a real tree grants you the biggest mess. Neither domain works the way idiots pretending to be enlightened think it does. And you can paint a sane picture in both if you know wtf you’re talking about, even a little.
Similarly, what the home improvement industry has taught me is that you can build a house with siding and a roof for about 10% of the cost of the finished house. This house will keep you dry. It will not keep you warm or clothed and feed you. For that you need plumbing, wiring, ceilings, walls, insulation, and finally flooring. These are intensely labor intensive jobs, and if you 1) learn a few of them and 2) know the right people, you can build things to code and pay someone only to sign off on them. Drywalling doesn’t even require that, but it’s a shitload of work.
So yes, in fact you can build an MVP of the house. It’s just the bones and a couple of the most obvious features. And if you live in an old enough house, it’s been refactored and retrofitted so many times that you can’t tell that it used to look exactly like five other houses within a block radius of your house. And so did all your neighbors’.
I have never seen a rewrite where there were no new features added to the software. Maybe that is part of the message: don't add features until you have replaced the old system. That sounds like a good idea but almost no one approaches it like that, which is one of the reasons rewrites often fail.
rewrites (successful ones) are some of the most fun I've had in my entire career! it's incredibly satisfying to already have all the requirements and end user features crystallized and battle tested, and have your sole job be to figure out a better way to express them in designs, algorithms and code.
that last sentence doesn't make sense to me - as a leader I enjoy succeeding rather than failing so if a rewrite is what's necessary then that's the fun option.
rewrites happen when changing requirements bent your code past its limit. they're natural and necessary and resisting one on principle is way less fun than the alternative.
this is interesting. in my experience it is always the third situation. they try to migrate the data slowly from "old system" to "nexgen". it never works. it sounds like maybe the answer is do a big bang switch over to the new system or don't rewrite at all. (i.e. the system you are rewriting probably needs to be small enough)
Ship of Theseus is the other way. Fact is a lot of companies have code that has been rewritten and they don’t know it, because the devs that did it know they would have been stopped if people knew. But piece by piece, bit by bit, everything was shifted to a new organizational paradigm from the original.
You can’t really do objective case studies on development processes that involve subterfuge. Even if they work.
You guys are asking for permission and scheduling time to do your rewrites? Leaders? Pfft. Bang it out over a weekend and then slide it in on a slow Monday morning.
I appreciate that this isn't an overly long blog post that says too little by repeating itself a lot, but I'd prefer a little more to it. Some personal or second hand experiences of completed or failed rewrites would be nice. Alas, as it stands, this post has very little to offer.
I’m not sure that article really aged well. Code rots pretty quickly nowadays. Convoluted old code no one understands starts to create wicked bugs as future teams wedge in new features. Processors get overloaded. Memory leaks appear. Libraries evolve. Design trends change.
I'm just seeing a lot of articles about bad rewrites recently, it could have started even before this one. It's just it wasn't as prominent as it is now I think
Don't do a re-write kids, but if you do, do it from within the existing software like a parasite or a chest bursting creature from the movie Aliens.
Either way it will always be messy, terrifying and most of the time ends in tragedy.