
I perpetually love the references to Ship of Theseus, and find it particularly applicable to this problem.

You mention fundamental functional problems, and I'd like to add something: sometimes it's not a functional problem, but a changeability problem. The code functions fine, but the process of adding a new feature is incredibly painful.

I've done big-bang partial rewrites of systems before, quite successfully, but I've got a Ship of Theseus rule of my own that I follow: no new features during the rewrite, and no missing features. The first example that comes to mind was a rather complicated front-end application that had become a total spaghetti disaster. It had been written using Backbone, and from my research it fit the Angular model of things quite well.

I took a pound of coffee with me out to the cabin for a weekend, and rewrote the whole frontend. Side-by-side. I started with the first screen, and walked through every code path to populate a queue of screens that were reachable from there. Implement a screen, capture its outbound links, repeat.
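That walk is essentially a breadth-first traversal of the screen graph. A minimal sketch, not the code from the actual rewrite: `plan_rewrite_order` and the `outbound_links` mapping are hypothetical names, and the mapping stands in for reading the old code paths by hand.

```python
from collections import deque


def plan_rewrite_order(start, outbound_links):
    """Return screens in the order they would be reimplemented.

    `outbound_links` is a hypothetical mapping from a screen name to the
    screens reachable from it (an assumption for illustration).
    """
    queue = deque([start])
    seen = {start}
    order = []
    while queue:
        screen = queue.popleft()
        order.append(screen)  # "implement a screen"
        for nxt in outbound_links.get(screen, []):  # "capture its outbound links"
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# Made-up screen names, purely for illustration.
links = {
    "login": ["dashboard"],
    "dashboard": ["settings", "reports"],
    "reports": ["dashboard", "report_detail"],
}
print(plan_rewrite_order("login", links))
# → ['login', 'dashboard', 'settings', 'reports', 'report_detail']
```

The `seen` set is what keeps a screen that links back to an earlier one from being queued twice.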

Nothing changed, but everything changed. The stylesheet was gnarly too, but I left it 100% untouched. That comes later. By keeping the stylesheet intact, I (somewhat) ensured that the new implementation used identical markup to the old system. The markup was gnarly too, but keeping it identical helped to ensure that nothing got missed.

48-72ish hours later, I emerged a bit scraggly and the rest of the team started going over it. They found 1 or 2 minor things, but within a week or so we did a full cut-over. The best part? Unlike the article, clients had no outward indication that anything had changed. There was no outcry, even though about a quarter of the code in the system had been thrown out.




A full rewrite absolutely should not be taken lightly (if ever). It's very much a last resort and something that requires deliberation and a clear path to success. You're spot on with your rules - nothing new; nothing lost.

I had a similar experience, sans the cabin. I was hired onto a startup that already had a functioning app in the wild. The API was written by one of the founders, who is not a developer in the professional sense, and holds my undying respect. As an early-stage startup with lots of ideas, we needed to move fast, and I wasn't going to be able to do so with the existing code base.

I stocked the fridge, locked myself into my tiny Brooklyn apartment, and got to work. I started by logging all requests to the API in order to ensure I had all the necessary endpoints covered. Then I wrote integration tests - acting as an HTTP client - for the entire API.
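For the request-logging step, a rough sketch of turning a log into an endpoint inventory could look like the following. The log line format (`"GET /users/42?x=1"`), the `endpoint_inventory` name, and the rule of collapsing numeric path segments into `:id` are all assumptions for illustration, not details of the API in question.

```python
from collections import Counter


def endpoint_inventory(log_lines):
    """Count distinct (method, path template) pairs seen in a request log.

    Assumes each line looks like "METHOD /path?query". Numeric path
    segments are collapsed to ":id" so /users/1 and /users/2 count as one
    endpoint.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        method, _, target = line.partition(" ")
        path = target.split("?", 1)[0]  # drop the query string
        parts = [":id" if p.isdigit() else p for p in path.split("/")]
        counts[(method, "/".join(parts))] += 1
    return counts


log = ["GET /users/42", "GET /users/7?verbose=1", "POST /users", ""]
for (method, path), hits in sorted(endpoint_inventory(log).items()):
    print(method, path, hits)
```

Each distinct pair in the result is an endpoint the integration tests need to cover.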

About a week or so later, once the rewrite was finished, I added automated tests that compared the output between the two APIs, and once those matched perfectly, ran it live beside the original API (sending requests to both) and compared results from real requests to ensure there were no discrepancies.
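The comparison step can be sketched roughly like this. It is a simplified illustration under assumptions: `old_api` and `new_api` are stand-in callables that return `(status, body)` rather than real HTTP calls, and bodies are assumed to be JSON, so they are compared with key order ignored.

```python
import json


def normalize(body):
    """Parse JSON and re-serialize with sorted keys so key order is ignored."""
    return json.dumps(json.loads(body), sort_keys=True)


def compare_apis(requests, old_api, new_api):
    """Send each request to both implementations and collect mismatches."""
    mismatches = []
    for req in requests:
        old_status, old_body = old_api(req)
        new_status, new_body = new_api(req)
        if old_status != new_status or normalize(old_body) != normalize(new_body):
            mismatches.append(req)
    return mismatches


# Hypothetical stand-ins for the original API and the rewrite.
def old_api(req):
    _method, path = req
    if path == "/users/1":
        return 200, '{"id": 1, "name": "Ada"}'
    return 404, '{"error": "not found"}'


def new_api(req):
    _method, path = req
    if path == "/users/1":
        return 200, '{"name": "Ada", "id": 1}'  # same data, different key order
    return 404, '{"error": "not found"}'


recorded = [("GET", "/users/1"), ("GET", "/users/2")]
print(compare_apis(recorded, old_api, new_api))  # → []
```

An empty list means the two implementations agreed on every recorded request; the same function works for replayed logs and for live traffic that is mirrored to both.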

Besides a couple of very small bugs after the switch, it went very well. The user base was none the wiser, besides the sudden uptick in features after the rewrite. The startup was relatively successful (acquired), and I still work with those guys from time to time.


The thing that struck me is that they decided to rewrite the system without talking to the customer. I believe that if they had tried to sell the customer on it first, they might have taken a different path.


I love both your story and the parent's :-) great work!


I have had a similar experience: the "incremental rewrite" is usually the most effective tactic to apply. It reduces risk, cuts "time to market" and lets you apply the Pareto principle, making the process efficient.

Very rarely, a rewrite is the answer: for example if the product is functioning poorly, where even low-risk changes cause random regressions. Where large system-level changes are needed to make the system work - say because the original developers didn't implement authorization checks after the login.html page. Where none of the original developers are available and no one knows the requirements. Where the system is a hodgepodge of 4 different frameworks including one custom one (whose developer is long gone, leaving 0 documentation or tests behind).

In cases like that the software artifacts are a liability; it's better to use the assets you've built up (domain knowledge) and develop a new product in parallel. Put the old one in zombie mode and spend two years building the new product with feature parity.

There is one other situation where a full rewrite is good: if v1 is a throwaway prototype. However that would never have been put into production for any significant number of users.


>In cases like that the software artifacts are a liability; it's better to use the assets you've built up (domain knowledge) and develop a new product in parallel. Put the old one in zombie mode and spend two years building the new product with feature parity.

In two years your devs should be able to understand the existing code, fix the auth problems, and excise the worst of the code and hodgepodge framework mess. If they can't, then they certainly can't maintain the old system while building a replacement.

If you can't understand the old code, you can't replace it. If you don't understand the requirements, I'm not sure you can even maintain the existing system. It shouldn't take two years to add auth, or remove an unsupported framework.

This is the textbook case of when a rewrite will fail, because the scope is not just too big, but unknown (because you don't even understand the requirements). The choice to rewrite in this situation is not logical. It's emotional. The existing system is a mess, and the fix seems so difficult. But the rewrite estimate is likely poor because you don't actually understand the system you're rebuilding, and even if it's an accurate estimate you lose two years guaranteed just to ship feature parity. You can make a whole lot of improvements to a codebase in two years while also shipping features.


> There is one other situation where a full rewrite is good: if v1 is a throwaway prototype. However that would never have been put into production for any significant number of users.

Tell that to my manager. Since I joined this company a little over two years ago, all of our tools have been put into protoduction, despite my warnings and protests.


>cabin for a weekend, and rewrote the whole frontend.

The problems with big bang rewrites don't manifest in rewrites that take 2-3 days. Even if your rewrite burns a full week, if it fails you've only lost a week. Rewrites are problematic when they are expected to take months or longer. That's when the amount of code is high, the complexity is high, and the estimates tend to be bad. Losing many months of forward progress to chase a rewrite can kill a company. If a single lost week can kill your company, you're probably doomed anyway.

I've successfully done the "big bang" rewrite myself for things that needed a week or so of rewriting. I don't believe for a moment that this experience is relevant for large scale rewrites. I've only ever seen those fail spectacularly.

Anyone who tells you that a 1-week rewrite is never appropriate is just cargo culting and probably not worth listening to in general. A one week rewrite to avoid weeks of refactoring can be a very good tradeoff. A one year rewrite on the other hand is likely to end in disaster, not least because it guarantees a year of lost forward progress.


That's why I targeted a specific subsystem. Redoing the whole system, frontend and backend, would have definitely taken much longer than the 48 of 72 hours I put in over 3 days. I took the piece of the system that was the gnarliest, rewrote it, and bought us a quick win. Down the road, other vertical chunks of the backend were ripped out and replaced in a similar fashion, once they became the now-worst piece of the system.


I guess I'm a little unclear about your argument here. You used the term "big bang" rewrite, but you're describing an incremental rewrite.


Only a pound of coffee? I'd be terrified of running out just before I finished. Stay safe, man. Always bring lots of extra coffee to your cabin in the woods.


Hah, funny enough... Not on that project, but on a different messy one, I packed up and went out there, only to discover at 9am on the first day of the project that I had accidentally brought a pound of decaf. The nearest town is about a half hour drive one-way, and the best I could find at the tiny grocery store there was a large can of Folgers of questionable vintage, and a box of Red Rose tea.

Surprisingly, if you use pre-ground cheap coffee in an Aeropress, it still turns out... sort of OK. Better than I expected, worse than I'd hoped!


Friends don't let friends drink decaf.


That is precisely what I did with a set of Qlikview dashboards at my previous firm. They were always very dodgy, but kind of worked. My new boss walked in and started fiddling with the underlying scripts and buggered up everything.

Eventually a colleague and I were tasked with fixing the mess. What I did was to look at the sources, then work out what each graph was trying to do. I then set up a new script that extracted the data into a cleaner data model, and then I (rather painfully) copied and pasted entire sheets into this new Qlikview dashboard. From here all I needed to do was to hook up the graphs to the new model by changing the source fields, expressions and calculations.

After a bit of UAT from internal departments I had fixed the reports and no client was really any the wiser. If I hadn't done it this way, there would have been awkward questions and I just know I would have been chasing my own tail modifying the rewrite to get it back to the way the old system looked.


About 500 of the Fortune 500 would like you to show them how to do that with each of their 500 worst hairballs.



