> when the predictions say that, foreseeably, the company will lose money down the line because we waited too long for a rebuild, it may be time to pitch that to whoever allocates resources.
Only developers who love greenfield or need a new framework on the CV would suggest a company could lose money by not rebuilding.
If the developers are not competent enough to write maintainable code or maintain existing code, then you will have exactly the same difficulties after the rebuild.
If they are competent enough to write maintainable code and maintain existing code, then you have no need for a rebuild. Just adapt and extend the existing code to meet the new requirements.
> If the developers are not competent enough to write maintainable code or maintain existing code
Or if the old system simply doesn't work with modern environments.
Or if it depends on long abandoned frameworks.
Or if the business grows but the old implementation scales badly or not at all.
Or if it depends on components that incur licensing fees that become prohibitively expensive when it's scaled up.
Or if there are other legacy systems on different technical baselines that it could work with better after being rebuilt on the same base.
Or if its tech simply requires more maintenance than an alternative, thus binding dev resources the company could otherwise use more productively.
There are a lot of reasons why maintaining an old system may be an undesirable move in the long run that have exactly zero to do with the competence of the developers involved.
You say that as if these events are unfortunate accidents instead of lack of technical leadership.
I’ve met quite a few “unlucky” people in my life and what nearly all of them had in common was the inability to connect their actions and inactions to consequences. They were blindsided by predictable outcomes over, and over, and over again. Eventually the people who could actually help you get tired of your drama and move on, and then you’re left in an echo chamber where your narrative makes sense.
I’ve spent a lot of time on this job thinking about the thoughts and plans of the people who have left. If our vertical were more compelling I believe we could make a better product with the people who have left than with the people who stayed, even considering the lack of depth in domain knowledge. There are a lot of things that don’t change because they were the right thing to do ten years ago. That’s an explanation for how we arrived here, not a reason to stay.
> You say that as if these events are unfortunate accidents instead of lack of technical leadership.
Sometimes they are. I've seen several applications that were successful for many years but eventually had to be rewritten because some vital dependency was no longer viable.
In web development we had an early generation of web apps that used plugins like Flash or Java to do things. Fast forward five years and those plugins have been brutally killed off by the browser developers. However other web technologies have become viable alternatives for some of those things. That's a big rewrite.
Some programming languages have had big jumps that weren't entirely compatible. Python 2 to Python 3 is an obvious example that took years but eventually resulted in not only Python 2 no longer being supported but some libraries never being updated to support Python 3 and others being created to provide similar functionality. In this case many of the direct code changes could be automated but you can't automate swapping out each obsolete library for a replacement with a similar purpose but a different API. And maybe you wouldn't want to because in the 5 or 10 years since you built the last version new ideas have come along and you're better off adopting them instead since you have to make a big change anyway.
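To make that concrete, here is a small sketch of the two kinds of change (the "old library" calls in the comments are hypothetical, not a real package):

```python
# Kind 1: mechanical changes that a tool like 2to3 can rewrite.
counts = {"a": 1, "b": 2}
# Python 2: total = sum(v for _, v in counts.iteritems())
total = sum(v for _, v in counts.items())
print(total)  # 3

# Kind 2: changes no tool can automate, e.g. replacing an abandoned
# library with one that serves a similar purpose but has a different
# API. With hypothetical names, a call site like
#     rows = old_lib.fetch_rows(query, as_dicts=True)
# might become
#     rows = [dict(r) for r in new_lib.execute(query)]
# and each such site has to be rethought by hand -- which is exactly
# when it becomes tempting to adopt newer ideas wholesale.
```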
Browsers discontinuing Flash support is indeed a good reason for porting an app to a different platform.
But this is also a very different scenario than the grandparent example, where the cost of maintenance due to bad code quality is supposed to justify a ground-up rewrite. In the Flash example there is a clear business case for porting, even though it is understood the porting will be expensive.
The fallacy is believing a ground-up rewrite will lead to more maintainable code. This is just developers deluding themselves. There is no reason to think a ground-up rewrite of existing working code will lead to better and more maintainable code.
> There is no reason to think a ground-up rewrite of existing working code will lead to better and more maintainable code.
As with almost every argument in this area that depends very much on context.
A substantial rewrite might be an opportunity to use better tools and improved techniques that have been developed since the original was written. In some cases that could represent a huge improvement in things we actually care about like developer productivity and the quality and performance of the product. Importantly this doesn't imply anything was done wrong or any bad decisions were made by the developers of the original product or the people who have maintained it so far. It's just that we work in a fast-moving industry and sometimes even a few years can see some big improvements in what the available technologies can do.
A development team starting with a relatively clean slate can take into account all the knowledge and feedback received over the lifetime of the existing system. Maybe it was too expensive to make use of those insights while evolving the original system but a new version can take advantage of them. Again that can represent big gains in areas that we care about.
It's often observed that a big rewrite risks losing a lot of small improvements and fixes that have accumulated over the lifetime of the existing system and of course that's true. However it's also true that a big rewrite can avoid a lot of existing design problems or get rid of long-standing bugs that no-one was ever going to get around to fixing.
I've seen big rewrites that didn't end well. But I've also seen big rewrites that were done for sensible reasons and had very positive outcomes. And I've also seen things that should have been rewritten but weren't and instead became a drag on everything. There is no universal rule here.
A substantial rewrite is developers asking for a do-over, which is infantile behavior (or as GP more kindly put it, delusion).
This is a hill I will die upon: The people who don't deserve rewrites ask for them, early and often. The people who deserve a rewrite rarely mention them, and in fact they likely already have done it, bit by bit as a sibling comment mentioned by way of example. I will say that I've often surprised myself with features I never would have hoped for on the refactoring road. Things that would have been entire Epics become six weeks of work, sometimes less.
Refactoring is the Ship of Theseus scenario. You replace the ship bit by bit, until it's both a new ship and the same ship. Yes, it's a titanic pain in the ass, but it's also atonement for your past bad decisions. Which are heavily populated with cut corners you will be able to spot the next time.
All worthwhile learning is effortful, and greenfield is the lowest effort path to anything. You don't learn much from greenfield except why greenfield is not a panacea.
> A substantial rewrite is developers asking for a do-over, which is infantile behavior
That depends entirely on the reasons why they ask for a rewrite.
"I like this tech better", "I don't want to work with this tech", "This new tech is shinier": I agree with you, those are not solid engineering reasons.
"This doesn't interface well with the rest of the system because...", "This is going to cost us in the future because...", "This won't scale well because...": I strongly disagree.
You may notice that the operative difference here is the term "because". If someone can give a quantifiable, technical, verifiable reason for a rebuild, then management should at least hear the guy out. They can still say no. But then the engineer did his job, and if the whole show goes haywire later because the 14-year-old Java backend fails to scale up and the company loses money over that, nobody can say he didn't warn them.
You approach scalability problems by identifying bottlenecks through measurements, and then you redesign the problem areas to solve the problem. This might mean changing an inefficient algorithm, adding caching, partitioning a database, or whatever the problem calls for. Most likely the majority of code will be unaffected by these changes.
If an engineer proposes that the only way to solve scalability problems is to rewrite everything from the ground up, it just tells you they haven't been able to identify the root cause of the problem. The rewrite will probably end up having the same problem.
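As a minimal illustration of the targeted approach (function names here are made up for the sketch), a single measured hot spot can often be fixed with a cache while the rest of the code base stays untouched:

```python
import functools
import time

def expensive_lookup(key: str) -> str:
    """Stand-in for a slow operation identified by profiling."""
    time.sleep(0.01)  # simulate I/O or heavy computation
    return key.upper()

# The targeted fix: cache the measured bottleneck, nothing else.
@functools.lru_cache(maxsize=1024)
def cached_lookup(key: str) -> str:
    return expensive_lookup(key)

for _ in range(100):
    cached_lookup("invoice-42")

# Only the first call did real work; the other 99 hit the cache.
print(cached_lookup.cache_info().misses)  # 1
```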
Unfortunately that argument is like saying you approach security by identifying vulnerabilities and patching them or you approach performance by profiling to find hot spots and optimising them. Of course those things are often true in specific instances and that's usually where you should start.
However all of these are systemic issues and once you've picked the low-hanging fruit you can still be left with systemic problems that are not concentrated in one place but spread throughout your code. Eventually your profiler curve is nearly flat but your JavaScript or Python code still isn't going as fast as C or Rust. Eventually you think you've patched all of your injection points but if you're using manual string concatenation to build your SQL queries you'll probably keep missing others. And eventually you run out of places to add caches and load balancers and it turns out that your existing data storage model is fundamentally limited and needs to be replaced.
In each of these cases you may end up needing to rewrite a whole section of your application or a whole service in your distributed system because you can no longer paper over the cracks. Fortunately it happens relatively rarely but it certainly does happen!
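The SQL injection example is worth spelling out, because it shows the difference between patching one spot and fixing the systemic pattern. A minimal sketch using the standard library's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

evil = "alice' OR '1'='1"  # attacker-controlled input

# The systemic problem: building queries by string concatenation.
# Even after patching known spots, any remaining site is exploitable.
rows = conn.execute(
    "SELECT * FROM users WHERE name = '" + evil + "'"
).fetchall()
print(len(rows))  # 1 -- the injected OR clause matched the row

# The systemic fix: parameterised queries everywhere.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (evil,)).fetchall()
print(len(rows))  # 0 -- the input is treated as data, not SQL
```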
That is the calculus of a project that has been running for 1-2 years.
The calculus of a project that has been running for 10-20 years is often different.
I've explained why - and why in some cases the approach you advocate is literally impossible - in my other comments in this discussion, which I invite you to read if you haven't already.
> A substantial rewrite might be an opportunity to use better tools and improved techniques that have been developed since the original was written. In some cases that could represent a huge improvement in things we actually care about like developer productivity and the quality and performance of the product.
Can you give an example? I have a hard time imagining what kind of techniques cannot be applied to an existing code base with some adaptation, but require the code to be written from scratch.
One example is adopting a safer or more productive programming language for a new version. The Rust ecosystem has now reached a point where it's a viable replacement for a lot of things we would almost certainly have used C++ for a decade ago with Rust offering significant safety and productivity benefits. Rust is also now being used to replace tools for web developers where the incumbents were written in JavaScript and in this case the advantage is order(s) of magnitude performance improvements.
Another is when your platform evolves and forces the issue like the web plugins being replaced by new web standards that we were talking about before. Here you might not need to rewrite your entire application but you probably are forced to rewrite the affected parts and possibly significantly change the software architecture around them. A related example is if you previously wrote your application targeting a specific environment and now want to support multiple environments but the operating systems or frameworks or other dependencies follow very different conventions that impose some constraints on your software design.
The key is identifying which parts of the code need to be adapted or replaced and which do not. If the code is well designed with separation of concerns, replacing a framework or library or external dependency should not require all the code to be scrapped, just the layers directly interacting with the replaced part.
Some trivial applications are basically just glue between frameworks, but most non-trivial applications of value will have lots of code which is not tied to any particular framework, and often this is the most valuable part of the application.
Scalability is improved by identifying the bottlenecks and improving the design at those core points - not by rewriting everything from scratch.
I guess moving to a different programming language is a case where you literally have to touch all the code, but even then code can often be ported semi-mechanically instead of starting from scratch.
In my experience, one of the main reasons for wanting to scrap code is not just that it has scaling or other technical issues, but that it also has very poor separation of concerns.
Granted, that concept can and should be introduced into old codebases. Last year my team successfully warded off the Sirens of Rewrite by just doing the hard work of extracting all of the dead framework calls and then NOT adding them back in drag-and-drop style, but properly exposing them through interfaces that don’t require everything to know everything about the particular replacement framework we used.
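In miniature, that extraction looks something like this (names are illustrative, not from the actual project):

```python
from typing import Protocol

class Mailer(Protocol):
    """The narrow interface the application actually needs."""
    def send(self, to: str, body: str) -> None: ...

class SmtpMailer:
    """Adapter: the only class that knows about the concrete
    framework; swap it out and nothing else has to change."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        # A real adapter would call the replacement framework here.
        self.sent.append((to, body))

def notify(mailer: Mailer, user: str) -> None:
    """Application code depends only on the interface."""
    mailer.send(user, "your report is ready")

mailer = SmtpMailer()
notify(mailer, "alice@example.com")
print(len(mailer.sent))  # 1
```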
Sure, but my point is, if the code is not well designed, the result of a ground-up rewrite will not be well designed either, for the same reasons which caused the scrapped version to be badly designed.
It is even likely the new version will be worse, since it won't be developed incrementally, and harsh deadlines will be imposed once the organization realizes it can't evolve the product while the ground-up rewrite is underway.
> Sure, but my point is, if the code is not well designed, the result of a ground-up rewrite will not be well designed either,
That doesn't follow for me, sorry.
A rewrite, as I understand and use the term, doesn't mean transpiling what exist to, say another language or to another framework. The old version is essentially just a very detailed and testable list of functional requirements; everything the old thing can do (as long as that functionality is still actually useful), the new thing must be able to do as well.
How this functionality is implemented in the rewritten version, and how its internals are designed, is entirely up to the rebuild. The way functionality is implemented in the predecessor does not necessarily determine how it is implemented in the new version.
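One way to make "the old version is the spec" literal is characterization testing: pin down the legacy behaviour and require the rewrite to match it. A toy sketch (both implementations are stand-ins):

```python
def legacy_price(quantity: int) -> float:
    """Old implementation, treated as the source of truth."""
    price = quantity * 9.99
    if quantity >= 10:
        price *= 0.9  # bulk discount buried in the old code
    return round(price, 2)

UNIT_PRICE = 9.99
BULK_THRESHOLD = 10
BULK_DISCOUNT = 0.9

def rewritten_price(quantity: int) -> float:
    """New implementation: same behaviour, different internals."""
    discount = BULK_DISCOUNT if quantity >= BULK_THRESHOLD else 1.0
    return round(quantity * UNIT_PRICE * discount, 2)

# The legacy version generates the expected outputs for the new one.
for q in [1, 5, 10, 50]:
    assert rewritten_price(q) == legacy_price(q), q
print("rewrite matches legacy behaviour")
```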
If the rewrite is by the same organization, the same forces which caused the first version to be badly designed will cause the rewrite to be badly designed.
> If the developers are not competent enough to write maintainable code or maintain existing code, then you will have exactly the same difficulties after the rebuild.
Why assume the same developers would be doing the rewrite as the original? Maybe the reason for the rewrite is because the original is hopeless and most of the people who worked on it are no longer around.
Also everything usrbinbash said in a sibling comment - but if something like a changing environment forces a big rewrite of an otherwise successful code base then having the original developers still around can dramatically increase the chances of success IME.
> Maybe the reason for the rewrite is because the original is hopeless and most of the people who worked on it are no longer around.
Why are the current developers not able to maintain code they didn't write themselves? Is it because the new developers are less experienced, or because the organization's culture has encouraged writing convoluted, idiosyncratic and badly documented code? Whatever the reason, the root problem is certainly not solved by rewriting the code base from scratch, since you will just have the same problem the next time there has been some turnover.
I don't think that follows at all. I've seen my share of development groups where someone had previously made a bad hire or brought in the wrong consultants and that had left them with a body of code that simply wasn't very good. That doesn't necessarily reflect the overall culture at the organisation and it doesn't say anything about whether the people currently available would make the same mistakes.
The assumption that a big rewrite would be too expensive and end up with the same problems is itself quite dangerous. Some of those groups I mentioned knew very well they had a pile of junk but management had apparently read the usual advocacy about how Big Rewrites Are Bad and stubbornly insisted on adapting the existing code instead of recognising that it should be written off. They spent far more time and money on the updates than it would have taken to do a clean rewrite. And then they got into this kind of sunk cost fallacy where because they'd spent months doing what should have been weeks of work once they then became even more attached to the flawed code and kept repeating the same mistake.
> They spent far more time and money on the updates than it would have taken to do a clean rewrite
Of course this claim assumes you can reliably estimate the time and cost of the rewrite and the claimed improved productivity after the rewrite.
It is still unclear to me what kind of improvements cannot be applied to an existing code base through refactoring and gradual improvements, but requires all the code to be written from scratch.
> Of course this claim assumes you can reliably estimate the time and cost of the rewrite and the claimed improved productivity after the rewrite.
I almost preempted that counter-argument in my previous comment. :)
Some software development is 90% research and 10% development and it's true that you never really know how long it's going to take and how well it will work until it's almost done anyway. But the tar pits I'm talking about were not that kind of software development. Most of the cases I'm thinking of were the stereotype over-engineered "enterprise" code that had become bloated and excessively interconnected. Others were "clever" code where someone had tried some fancy design patterns or data structures or algorithms, typically with a severe YAGNI complex as well. Either way making simple changes required many times the effort it should. And yet a drop-in replacement for the whole system would have been a low risk project, with very predictable work required that could be done by any mid-senior developer on the relevant team, taking a fraction of the time.