How to Improve a Legacy Codebase

apeace · on May 30, 2017

> Do not fall into the trap of improving both the maintainability of the code or the platform it runs on at the same time as adding new features or fixing bugs.

I don't disagree at all, but I think the more valuable advice would be to explain how this can be done at a typical company.

In my experience, "feature freeze" is unacceptable to the business stakeholders, even if it only has to last for a few weeks. And for larger-sized codebases, it will usually be months. So the problem becomes explaining why you have to do the freeze, and you usually end up "compromising" and allowing only really important, high-priority changes to be made (i.e. all of them).

I have found that focusing on bugs and performance is a good way to sell a "freeze". So you want feature X added to system Y? Well, system Y has had 20 bugs in the past 6 months, and logging in to that system takes 10+ seconds. So if we implement feature X we can predict it will be slow and full of bugs. What we should do is spend one month refactoring the parts of the system which will surround feature X, and then we can build the feature.

In this way you avoid ever "freezing" anything. Instead you are explicitly elongating project estimates in order to account for refactoring. Refactor the parts around X, implement X. Refactor the parts around Z, implement Z. The only thing the stakeholders notice is that development pace slows down, which you told them would happen and explained the reason for.

And frankly, if you can't point to bugs or performance issues, it's likely you don't need to be refactoring in the first place!

ethbro · on May 30, 2017

From personal experience, a good way of approaching the sell to business stakeholders is getting them involved in the bug triage and tracking process.

You need to make the invisible (refactoring and code quality) visible (tracking) so they can see what the current state is and map the future.

The biggest reason business stakeholders push back against this is that developers tend to communicate this in terms of "You don't need to know anything about this. But we've decided it needs to be done." Which annoys someone when they're paying the hours.

I've had decent success with bringing up underlying issues on roadmaps, even to the generalness of "this feature / component has issues." It's a lot easier conversation if it's adding "That thing that we've had on our to-do list for a couple months" vs "This new thing that I never told you about."

And as far as pitching, if the code is at all modular, you can usually get away with "new feature in code section A" + "fixes and performance improvements in unrelated section B" in the same release.

PS: I love the simple counter-based bookkeeping perspective from the linked post. (And think someone else suggested something similar in a previous performance / debugging front page article)

crdoconnor · on May 30, 2017

I've tried this "getting them involved" approach and it failed miserably for me. I've tried explaining why module A had to be decoupled from module B to stakeholders. I've tried explaining why we need to set up a CI server. I've tried explaining why technology B needs to isolated and eliminated.

In almost all cases they nod and feign interest and understanding and their eyes glaze over. And why should they be interested? The stories are almost always abstract and the ROI is even more abstract. It's all implementation details to them. These stories usually languish near the bottom of the task list and you often need to sneak it in somehow to get it done at all.

I think the only real way of dealing with this problem is to allocate time for developers to retrospect on what needs dealing with the code (what problems caused everybody the most pain in the last week?), then time to plan refactoring and tooling stories and time to do those stories alongside features and bugs.

Stakeholders do need to assess what level of quality they are happy with (and if it's low, developers should accept that or leave), but that should be limited to telling you how much time to devote to these kinds of stories, not what stories to work on and not what order to do them in.

I don't see why they shouldn't have visibility into this process but there's no way they should be allowed to micromanage it any more than they should be dictating your code style guidelines.

This is, IMO, the single worst feature of SCRUM - one backlog, 100% determined by the product owner whom you have to plead or lobby if you want to set up a CI server.

ethbro · on May 30, 2017

> In almost all cases they nod and feign interest and understanding and their eyes glaze over. And why should they be interested? The stories are almost always abstract and the ROI is even more abstract. It's all implementation details to them.

If you're explaining it in terms of internals and implementation details, then you're always going to get this response.

Your job as a business-facing developer is to translate the technical details (in as honest a way as is possible) into a business outcome.

I'm not naive. We've all worked with stakeholders that make stupid choices and can't seem to grasp a point dangled right in front of them.

But. Even more often than that I've seen (especially in-house) IT talk down to the business, push an agenda through the way they summarize an issue or need, and try to use technical merits to subvert corporate decision making.

Ultimately, you're in it together with business stakeholders. Either you trust each other, or you don't. And "the business can't be trusted to make decisions that have technical impacts" is the first step towards a decay of trust on both sides.

crdoconnor · on May 31, 2017

>If you're explaining it in terms of internals and implementation details, then you're always going to get this response.

You're also going to get this response if you explain in terms of a business case.

The business case for literally every refactoring/tooling story is this, btw:

This story will cut down the number of bugs and speed up development. By how much will they speed up development? I don't know. How many bugs and of what severity? Some bugs and at multiple levels of severity and you're not going to notice it when it happens because nobody notices bugs that don't happen. By when? I don't know, but you won't see any impact straight away.

The benefits are vague and abstract. The time until expected payoff is long. Vague, long term business cases don't get prioritized unless the prioritizer understands the gory details, which, as we both know, they won't.

The features and bugfixes - user stories - are not vague. They get prioritized.

>I'm not naive. We've all worked with stakeholders that make stupid choices

I am not complaining about stakeholders in general. I've worked with smart stakeholders and dumb stakeholders. I've never worked with a stakeholder that could appropriately compare the relative importance of my "refactor module B" story and "feature X which the business needs". All I've worked with are stakeholders who trusted me to do that part myself (which paid off for them) and stakeholders who insisted on doing it for the team because that's what SCRUM dictated (which ended badly for them).

>Ultimately, you're in it together with business stakeholders. Either you trust each other, or you don't. And "the business can't be trusted to make decisions that have technical impacts" is the first step towards a decay of trust on both sides.

No, the first (and indeed, only) step is not delivering.

MaulingMonkey · on June 1, 2017

> This story will cut down the number of bugs and speed up development. By how much will they speed up development? I don't know. How many bugs and of what severity? Some bugs and at multiple levels of severity and you're not going to notice it when it happens because nobody notices bugs that don't happen. By when? I don't know, but you won't see any impact straight away.

Point to historical data where possible. SWAG where appropriate.

"We've probably spent over 100 hours fixing bugs in this janky ass system for every 10 hours of real honest-to-god implementation of features work. That outage on Friday? Missing our last milestone by a week? All avoidable. We've been flying blind because we have no instrumentation, and changes are painful. Proper tooling would've shown us exactly what was wrong, easily halving our fix time, even if nothing else about this system changed. A week's worth of investment would've already paid itself off."

Frankly, I'm way better at estimating this kind of impact than how long it'll take to implement feature X.

DougWebb · on May 30, 2017

"Currently, every time we want to build a release of the software in order to test it before deployment, __ developers need to stop working on features and maintenance while we go through the build process, which takes __ hours/days. There are a lot of manual steps involved, and we found that we make an average of __ errors in the process each time, which takes an additional __ hours/days to resolve. We go through all of this __ times a year.

We've determined that we can automate the entire process by setting up a Continuous Integration (CI) server. There's some work involved in setting it up; we estimate it will take __ days/weeks to get it running. But once it's running, (we'll always have a build running __ minutes after each code change)|(we can click on a button in the CI's GUI and we'll have a build running __ minutes later), and we'll be saving __ hours/days of effort per build/year."

Plug in your numbers. If the time to deploy the CI server exceeds the savings, the business would be justified in telling you not to do it. (You'd have to make a case based on quality and reproducibility, which is tougher.) If the cost is less than the savings, the business should see this as a no-brainer, and the only restraint would be scheduling a time to get it done. (Not having it might cost more, but it might not cost as much as failing to get other necessary work done.)

deathanatos · on May 31, 2017

> If the cost is less than the savings, the business should see this as a no-brainer, and the only restraint would be scheduling a time to get it done. (Not having it might cost more, but it might not cost as much as failing to get other necessary work done.)

And that's the crux of the problem. The business invariably mistakenly believes that piling more features onto the steaming pile of crap that is the codebase is the better solution. Add on to that that some mid-level PM promised feature X to the C-level in M months, where M is such short notice even a engineering team with cloning and time machines would be short-staffed, and was chosen without even asking the engineering staff what their estimate of such work would be.

To the business, the short term gains of good engineering practices are essentially zero. The next feature is non-zero. The long-term is never considered.

I've had multiple PMs balk at estimates I've given them. "How could internationalizing the entire product take so long? We just need to add a few translations!" No, we need to add support for having translations at all, we need to dig ourselves out from under enough of our own crap to even add that support, we need to figure out what text actually exists, and needs translating, actually add those translations, and we need to survey and double-check a whole host of non-text assets because you mistakenly believe that "internationalization" only applies to text. Next comes the conversation about "wait, you can't just magic me a list of strings that need translating? I need that for the translators tomorrow!" No, they're mixed in with all the other strings that don't need translating, like the hard-coded IPv5 address of the gremlin that lives in the boiler room eating our stack traces.

Then, later, we'll lose a week of time because the translation files that engineering provided were turned into Word documents by PMs. One word doc, with every string from every team, and then those Word docs got translated. So now we have French.docx, but that of course only has the French. So now engineers are learning enough French to map the French back to the English so they know what translations correspond to what messages.

noway421 · on May 31, 2017

Depending on how convoluted the case is, you don't know what the end result would save in costs. "we'll be saving __ hours/days of effort per build/year" is a complete unknown.

DougWebb · on May 31, 2017

If the expected value of a task is a complete unknown, then there is NO business justification for doing the task. As an engineer with the responsibilty (or desire) to get business buy-in for a task, you must learn to quantify its value in terms that are meaningful to the business.

It doesn't have to be cost, that just happens to be easiest because it can be opinion-free. You can also express value in terms of business risks or opportunities, but the impact can be seen as an opinion, and you can be challenged by someone with different opinions.

noway421 · on June 1, 2017

That's the thing, the research and cost assesment itself would take days of work. So not many would care, and one might argue why /should/ you care to begin with. That's how it ends up being stalled at the idea stage.

MaulingMonkey · on June 1, 2017

I can conjure a scientific wild ass guess on the spot.

"I wasted around 10 hours last week thanks to inadvertently pulling broken builds because we don't have a CI server. I spent 4 hours manually deploying things because we don't have a CI server."

"When can I move the new CI server I already setup on my workstation - because fuck wasting half my week to that nonsense, and I had nothing better to do while the devs who broke the build fixed it - to a proper server where everyone can benefit?"

Extrapolating, that's what - 4 months per year of potential savings?

Sure, 14 hours might not be enough time to automate your entire build process, but it should be enough to automate some of it, get some low hanging fruit and start seeing immediate gains. Incrementally improve it for more gains when you're waiting for devs to fix the build for stuff the CI server didn't catch.

noway421 · on June 5, 2017

Thank you for insight. If the deployment indeed takes hours and there are high chances of pulling wrong builds, then indeed the time costs are very tangible. But if manual deployment takes at most 10 to 20 minutes and relatively straightforward, then it might be less of a case. Depending on how often you deploy also impacts this greatly. I guess in certain cases ROI is just not extremely high and that greatly reduces the appeal of such investment.

MaulingMonkey · on June 6, 2017

No problem, hope it's useful :). You're right about deploy frequency - that 10-20 minute manual deploy done 3-6x a day already adds up to my 4 hours a week deploying things. 10% of the work week right there!

On the other end of the spectrum, a lot of my personal projects don't warrant even the small effort of configuring an existing build server. I'm the only contributor, nobody else will be breaking anything or blocking me, builds are super fast... even if "it will pay off in the long run", there are other higher impact things I could do that will pay off even better in the long run.

In the middle, I've put off automating some build stuff for our occasional (~monthly) "package" builds for our publisher - especially the brittle, easy to fix by hand, rightly suspected to be hard to automate stuff. I was generally asked to summarize VCS history for QA/PMs anyways - can't automate that.

When we started doing daily package builds near a release, however, it ate up enough of my time that non-technical management actually noticed and prodded me before I thought to prod them. Started by offloading my checklist to our internal QA (an interesting alternative to fully automating things) and eventually automated all the parts QA would forget, not know how to handle, or waste too much attention on.

Even then, some steps remained manual - e.g. uploading to our publisher's FTP server. Tended to run out of quota, occasionally full despite having quota available, sometimes unreachable, or uploading too slow thanks to internet issues - at which point someone would have to transfer by sneakernet instead anyways. Not much of a point trying to make the CI server handle all that.

HelloNurse · on June 1, 2017

No, the time/effort to make a build with a working CI system is zero, plus any activities that remain manual by design (e.g. installations that require physical access to the server). The uncertainty is only about the feasibility and cost of implementing CI: this is one of the rare cases in which the benefits of software can be measured objectively, easily and in advance.

noway421 · on June 5, 2017

I would disagree, you still have the build time. And even if engineer doesn't do anything during that time, they are still occupied. No one would switch tasks during 10 minute build, there's just nothing you can do during such a short timeframe. In that case, in terms of business costs, it doesn't matter if engineer is busy or not during that period: they are still not gonna be doing more.

HelloNurse · on June 6, 2017

The alternative to spending X minutes waiting for a CI build is spending slightly more than X minutes executing a manual build with a nonzero chance of time-wasting mistakes, not doing nothing.

Not waiting for a build means not testing it, in which case a manual non-build of an uninteresting software configuration seems attractively elegant but a CI system still provides value by recording that a certain configuration compiles and by making the built application available for later use in case it's needed.

noway421 · on June 7, 2017

That is indeed true, CI is way less error prone, thanks. Having a build ready to go is quite handy too.

What do you mean by "not testing it" "seems attractively elegant"? Testing a build is still a must, although that ends up to be manual testing usually (unit tests don't assure much, integration take a lot of engineering effort for setting up and writing, especially if they were not taken care of from the start).

crdoconnor · on May 30, 2017

Who the fuck writes a fully costed business case on whether or not to spend a day setting up a CI server?

I'm trying to get some fucking work done, not convince investors I need a series A.

DougWebb · on May 31, 2017

Ah, I see the problem. You have no interest in understanding why your business makes the decisions it makes; you just expect them to give you permission to do whatever you say you want to do.

You said: I've tried explaining why we need to set up a CI server. ... In almost all cases they nod and feign interest and understanding and their eyes glaze over.

The reason you've failed to make a convincing case, I believe, is because you're talking in your language instead of theirs. Perhaps they've tried to explain to you, in their language, why they won't prioritize your CI server, and you nodded and feigned interest while your eyes glazed over.

The quote I gave you expresses your request and justification for a CI server into terms the business needs: what problem does it solve, what does it cost, how does it affect on-going costs, what are the risks of doing it and not doing it, and what impact does it have on other activities if it is done and if it is not done. This is not a "fully costed business case" or "convincing investors you need a series A". If you've given any thought at all to why you want a CI server beyond "I want it" you should have no problem filling in the blanks in my quote. And if you haven't bothered to think that much about it, your business is doing the right thing by giving your requests a low priority, because they shouldn't give your ideas any more attention than you're giving them yourself.

sanderjd · on May 31, 2017

You're making good points, but there is a lot of truth to your parent's sense that making a business case for every little thing is deeply inefficient. The hard part is striking a good balance between one extreme of arrogant engineers who never think about the business case for the things they are working on and the other extreme of having technical decisions micromanaged by non-technical managers.

DougWebb · on May 31, 2017

Yes, it can be deeply inefficient, but so is not getting approval to do necessary work. You have to start making progress somewhere, even if it's not as fast as you'd like it to be. If you're sucessful with this, you gain credibility and over time your recommendation will be sufficient to get approval for smaller tasks, and the business case will only need to be made for bigger tasks.

If you're not sucessful with this approach, and can't get approval despite showing that it's in the business' best interests using the business' own criteria, then your business is too dysfunctional and toxic to fix. Time to move on.

crdoconnor · on May 31, 2017

>Yes, it can be deeply inefficient, but so is not getting approval to do necessary work.

No, actually not needing approval to do necessary work is very efficient.

>You have to start making progress somewhere, even if it's not as fast as you'd like it to be. If you're sucessful with this, you gain credibility and over time your recommendation will be sufficient to get approval for smaller tasks, and the business case will only need to be made for bigger tasks.

There's no point in working to gain enough credibility to be able to do your own job effectively when you can simply leave and go and work somewhere else that doesn't expect you to prove to it that you can do your job after they've hired you.

Even if you manage to prevent the company from shooting itself in the foot as far as you're concerned by "proving your worth", it'll probably only go and shoot itself in the foot somewhere else and that will also ultimately become your problem.

In any case, this process tends to feed upon itself. Failures in delivery lead to a lack of distrust which leads to micromanagement which leads to failures in delivery. It's not that you can't escape that vicious cycle, it's that it typically has a terrible payoff matrix.

DougWebb · on May 31, 2017

I meant not getting approval for necessary work, and therefore not being able to do the necessary work, is inefficient. Not needing approval for necessary work is great; we agree on that.

You're right about having to make a choice between fixing the place you're at or finding a new place to be. There are many factors to consider, and sometimes trying to fix the place you're at can be worth the effort.

sanderjd · on May 31, 2017

> If you're sucessful with this, you gain credibility and over time your recommendation will be sufficient to get approval for smaller tasks, and the business case will only need to be made for bigger tasks.

Maybe! Alternatively: if you give a mouse a cookie, it will want a glass of milk. It might be worthwhile to establish early on that the technical leadership needs to be trusted to make their own decisions about trivial things.

crdoconnor · on May 31, 2017

"Ah, I see the problem. You have no interest in understanding why your business makes the decisions it makes"

No, the problem is that you believe that micromanagement is effective.

"The reason you've failed to make a convincing case, I believe, is because you're talking in your language instead of theirs."

No, the reason is because the ROI is vague and not easily costable and the time until expected return is usually months. By contrast, feature X gets customer Y who is willing to pay $10k for a licence on Tuesday.

This hyperfocus on the short term and visceral ROI over the long term and vague ROI isn't limited to software development, incidentally. It is a very, very common business dysfunction in all manner of industries - from agriculture to health care to manufacturing. Companies that manage to get over this dysfunction by hiring executives who have a deep understanding of their business and are willing to make long term investments often end up doing very, very well compared to the companies that chase next quarter's earnings figures with easy wins.

This is also why companies that are run by actual experts instead of MBA drones inevitably end up doing better (ask any doctor about this). It's not the fault of the people beneath them for not speaking the MBA's language. It's the fault of MBAs for being unqualified to run businesses.

Now, fortunately, product managers don't have to understand development because they can choose not to have to make decisions that require them to. However, if they insist on making decisions that require them to understand development then they will damage their own interests.

"The quote I gave you expresses your request and justification for a CI server into terms the business needs: what problem does it solve, what does it cost, how does it affect on-going costs, what are the risks of doing it and not doing it, and what impact does it have on other activities if it is done and if it is not done."

How low level are you willing to take this? Would you agree to make a business case for why you are using your particular text editor? Would you provide an estimate of the risks of not providing you with a second monitor? Where's the cut off point if it's not a day's work? Perhaps you are costing the company money with those decisions, after all.

DougWebb · on May 31, 2017

At some point, dysfunctional management can't be overcome. I'm not really talking about that extreme case; I'm talking about the more common case where engineers don't understand management priorities because they're not aware of the business' non-technical concerns that are part of the prioritization decisions.

If you want to spend a day on a CI server, it'll cost the company a day of your time (say, $1k) and will save maybe 5x that over the year by saving an hour of your time dealing with each build. That's great and worth doing. But, if it means that your company will miss out on $10k of revenue Tuesday, it's a net loss. And if missing that revenue means payroll can't be made on Friday, the company is screwed. The hyperfocus on short-term may be dysfunction, or it may be a sign that the company is in serious trouble. Jumping ship might be the best choice.

"Speaking the MBA's language" isn't really about terminology, it's about a different point of view with different concerns and priorities. A PM choosing your text editor sure sounds like micro-management of a technical decision that the PM doesn't understand, but maybe the text editor you want to use has licensing costs for business use that you're not aware of because you always used it personally, and the PM's decision is based on that business concern rather than the technical merits. Same topic, same choice to make, different point of view.

crdoconnor · on May 31, 2017

>I'm talking about the more common case where engineers don't understand management priorities because they're not aware of the business' non-technical concerns that are part of the prioritization decisions.

Ok, so assuming:

* All user stories are prioritized by management.

* Management determines the exact % of time spent on refactoring stories.

* Refactoring stories are prioritized by devs and slotted alongside user stories (according to the % above).

What kind of hypothetical non-technical concerns that are part of managment's prioritization decisions would become a problem?

Because, as far as I can see, in such a case, it wouldn't matter if the devs are not aware of the non-technical concerns because those concerns would still be reflected by the prioritization.

DougWebb · on May 31, 2017

If all three of your assumptions are true, then you're correct. The trick is getting the second two assumptions to be true. Management and devs have to agree on a % time split, and management has to agree with intermingled priorities instead of "dev % goes at the end".

Negotiating those agreements is where having a common ground on business concerns helps. And yes, it sure does help when the managers can also see things from the dev's point of view too. In my experience, it's easier for devs to understand business concerns than the other way around, so that's the way I lean.

abraae · on May 30, 2017

Yes. This sort of cost-benefit analysis also ignores some intangibles such as:

"When we're interviewing people and they find out just how backwards our CI system is, the smart ones will laugh at us and work somewhere else and we'll be left with just the dumb ones."

sobani · on May 31, 2017

As @DougWebb said, immediate cost saved is the easiest way to sell CI, especially if the savings are large.

He didn't say it was the only way. Nor that you can't add more arguments if cost savings alone isn't convincing enough.

crdoconnor · on June 1, 2017

CI's cost savings will not be immediate, large or easy to measure. That's why creating this "sales process" to make them happen is such a toxic mistake.

I worked somewhere once that forced me to spend political capital to make these kinds of things happen and it was a terrible waste.

Nobody notices the disasters that don't happen and when somebody is 2x faster and develops code with fewer bugs, that tends to reflect well upon them even if they were building upon your work.

flukus · on May 31, 2017

> Who the fuck writes a fully costed business case on whether or not to spend a day setting up a CI server?

A lesson I learned the hard way is that if the business doesn't care then neither should you, It's just not worth fighting uphill battles like this. The only way to measure what a business cares about (distinct from what they say they care about) is by looking at what they're willing to spend money on.

If building software is annoying for you personally then you can automate much of it, maybe even setup a CI server on you're own machine.

crdoconnor · on May 31, 2017

>A lesson I learned the hard way is that if the business doesn't care then neither should you, It's just not worth fighting uphill battles like this.

Absolutely. I used to work for a company where the battles were uphill and constant. I quit and now work for a company with no battles. One had bad financials and the other has very good financials.

The first company did teach me how to deal with very extreme technical debt, though (they'd been digging their hole for a while), which actually is a useful thing to know.

>If building software is annoying for you personally then you can automate much of it, maybe even setup a CI server on you're own machine.

The solution is GTFO.

flukus · on May 31, 2017

Agree completely, just note that you can only pull the GTFO card a couple of times in a row, then it hurts your ability to pay rent, no matter how true.

It's a trap actually, sometimes it's the shit companies like this that are the only ones hiring.

eldavido · on May 31, 2017

The tone of this post is a little flippant, but ultimately, I have to agree.

It comes down to how much trust the executive sponsors have in the face of the engineering org, and how the business views technology, as responsible professionals or children who have to be closely supervised and monitored.

Nurturing that relationship is one of the most important jobs of an executive/C-level engineering manager.

naasking · on May 30, 2017

> I've tried this "getting them involved" approach and it failed miserably for me. I've tried explaining why module A had to be decoupled from module B to stakeholders. I've tried explaining why we need to set up a CI server. I've tried explaining why technology B needs to isolated and eliminated.

Because that's too technical. You have to frame the problems in terms that impact them or their employees in terms of user stories/case studies.

Notice the OP said that users have long login times due to various issues and he can solve them by doing X, and not "TCP/IP timeouts and improper caching policies are causing back pressure leading to stalls in the login pipeline..."

crdoconnor · on May 30, 2017

Note that I said "I've tried explaining WHY", not "I've tried explaining what a TCP/IP stack is".

The explaining "why" in and of itself isn't particularly hard - refactoring and development tooling will speed up development in the future and reduce bugs. It's getting it prioritized that's hard - and that's because the business case of 'potentially reducing the likelihood of bugs in the future' and 'a story 3 months from now might take 3 days instead of 2' isn't a particularly compelling one - not because it's not important - but because it's not visceral and concrete enough.

In practice I've seen what this does. The process of introducing transaction costs (having to 'sell' the case of refactoring code is exactly that) simply stops it from happening.

If, as a business, you want to introduce this transaction code into your development process you will end up paying more dearly for it in the long run as you deal with the effects of compounding technical debt.

slavik81 · on May 31, 2017

Setting up a CI server is not a user story. It doesn't deliver any value to the customer on its own, and thus is not really something that should be in the customer backlog. It should be rolled into the first story done on the project, as it's a part of setting up the development environment. Similarly, you probably didn't have a story for creating the git repository, nor one for installing your text editor.

You work with your customer to decide what end-user bugfixes and features to prioritize, but it's your job to make technical decisions. That's why they hired you. Don't push those decisions back onto them.

devonkim · on May 31, 2017

The problem isn't so much of setting up a CI server as much as the fact that the task requires some time and resources from potentially other people. It might need approval for a simple VM with some disks, for example. But more importantly, use of such a server oftentimes means you need some time from other engineers to setup CI tasks including QA, security, etc. Anything that requires team consensus potentially requires meetings and some formalization.

But really, management that doesn't keep up conceptually with the business trends of software engineering management are low performers in the same way as engineers that refuse to learn how to improve their code even if it doesn't directly impact their immediate codebase (functional programming patterns as an embedded software engineer comes to mind).

Flenser · on June 1, 2017

I've worked with people where every task was a user story. E.g.:

Title: Reveiw code for feature X

Description: As a OUR-APP product manager I want to understand how feature X is implemented.

Acceptance Criteria: AC1: Feature X is documented in the wiki in context Y. AC2: ...

jacquesm · on May 30, 2017

> PS: I love the simple counter-based bookkeeping perspective from the linked post. (And think someone else suggested something similar in a previous performance / debugging front page article)

If you're short on budget and need to sell the whole package to management just do that one, it will make all that invisible stuff visible in more detail than they likely have the stomach for and you'll be granted budget in no time because there is nothing that spells lost business better than a fair sized gap between customers entering on the left and only a trickle coming out on the right.

In effect this is funnel visualization for the internals of an application in all the gory detail.

usmeteora · on May 31, 2017

In my personal experience, the codebase is not the problem, but the people and the culture who made the codebase.

We sent people to the moon with the computing power of a calculator, and with enough good people and effort, and version control, you can rewrite any legacy codebase to meet rigorous standards and meet the performance needs of the users.

Out of all the things humanity is trying and has accomplished, this is not unacheivable.

If the people don't want it, and the culture does not lend itself to high standards, often people will not see the same pain a developer will experience when they see poor code performing poorly. They may complain about the output and all the other things wrong about the platform, but that doesn't mean the company is going to support a legacy reboot, that just means the company has an accepted culture of low performance and complaining.

I say this in the context of a non software development company relying on alot of software.

my most recent form of personal torture I have endured is watching my IT department take a 45 yr old blackbox piece of software from a very old outdated engineering firm who never specialized in software, and actually trying to port it to AWS, thinking it will speed up performance AND save costs. They have no idea if there is capability for the kernel to exploit concurrency of the algorithms on the inside.

What they do know is there was a bug in the code, and it took 8 months to fix embedded in 1500 lines of code, it had no API and all the developers were dead or unemployed by the original company. They pay millions of dollars annually for this liscense and additionally millions more for an "HPC" to run it on.

They would never consider rewriting it, or contracting a new firm with a timeline, performance standards, needs and competitive cost recruiting. They don't know how. They don't understand how.

This is the way of the world outside of software development companies.

It's....painful...

If youre wondering how I exist in such a painful environment, I'm an Electrical Engineer, and I do not work for a software company. I get to mentor under some of the most brilliant and game changing Engineers in my industry, but it has very little to do with software development.

It could have alot to do with it, but the engineers have no interest in taking advantage of software. They have to be able to first understand the advantages software can provide but..I mean, some of the engineers I work with don't know what a GPU is. Never heard of it.

I write all my own code for my own work from scratch.

randomf1fan · on May 31, 2017

Your comment reflects what I heard last week from a friend. He's a PE (Professional Engineer) working with a large power utility company, and his company uses software that, in his words, "really sucks".

He was telling me this because he got a job offer from a startup, where they wanted him to be the subject matter expert for an application they were developing for power utilities.

It's a good offer, and he's tempted, but he's not sure if he can fit into the software/startup culture. In addition, he felt that the developers were looking to use him like a reference book - he got the impression that the founders saw software as the answer to everything, and that they didn't see power engineering as a particularly hard domain. There was (in his words) a distinct whiff of "developers are the cool guys".

This turned him off somewhat. So he's not sure if he'll take the offer, and I'd say he's leaning no.

Just an anecdote to illustrate the clash of cultures.

usmeteora · on June 5, 2017

Interesting.

I am getting the same kind of offers. Theres alot of subsidies and VC investment in "clean" energy and "smart grid" so developers left and right have a new market to apply their software development skills to.

In my experience, I am not being used as a reference book, and my long term investment in coding/Minor in Compsci and personal projects make me able to translate enegineering speak to software speak. I have not had the experience that the developers think software is the answer and that power is trivial.

I think the real clash of cultures is the culture of software development realizing how much beuracracy surrounds the power grid, and how much resistance there is to change within the industry, because the industry really does not understand how 95% of their day manually editing excel workbooks that output fortran run files from the 80s could be deleted and improved, and of course, most of them don't want to improve since there is a 30yr generational gap in the power industry.

Half of the people I work with don't believe in climate change and think Elon Musk is taking their jobs. This industry is ripe for disruption and I don't think the mentality that software/smart grid stuff can improve it is incorrect, but I do think the naivete that everyone in every industry is as openminded and continually putting in effort to learn and produce working products, which is a mindset required by successful growing software firms, is proving to be a big barrier and wake up call to software companies trying to come in an help.

I honestly blame our industry over software developers, but yeh, theres also a goldrush in "smartgrid" "clean energy" stuff and everyone want's to be apart of it.

Definitely as a Power Engineer it leaves you getting lots of opportunities and having to sort through who is willing to invest in understanding the complexities of innovating on the power grid, and who is going to cop out once they realize you can't whip up an app and make money off users the way snapchat will.

Theres also, and for good reason, lots of cybersecurity policies surrounding software running on the power grid, because hacking the grid has detrimental effects that can quickly translate to coast wide blackouts etc. That also means the newest github release of that multiplatform coffeescript spinoff is not going to be allowed to be used in alot of the grid side applications, and there is more work involveded with vetting development.

All of these things really quickly weed out the devs who are looking for quick stardom and the next easiest cashcow, and landing on the latest buzzwords related to clean energy. It can be frustrating weeding out the companies who try to hire you from that perspective.

It doesn't mean the grid doesnt need better software, it just means people, even developers who want to cash in on hot finance markets are going to take the path of least resistance, and the powergrid is not the path of least resistance (no pun intended) when it comes to quick cash and unicorn apps.

You have to actually CARE about innovating on the grid, and not just to pretend to care because theres billions sitting around in funding waiting to fund good smart energy innovation. Regardless, because this money is there now, and there's the backing of social reinforcement of being able to herald your startup as bleeding edge world saving technology thats going to stop societies impending doom from global warming, is a very compelling emotional appeal that makes marketing and justifying your product easy.

It's hot right now and in then ext ten years we will see who is around to grab quick cash and feel good about being the poster child for saving the climate and who is willing to invest in truly renovating the powergrid and enabling clean energy as a sutainable long term solution that is eocnomically viable without startup subsidies to cover the cost of initial investments.

Eventually these companies have to show a profit...

It is frustrating to be in the industry as an Engineer under the age of 35, and also have exposure to Compsci and friends working at Amazon and Google, and having to explain a hyperlink to a coworker 3 levels above you.

It's also frustrating to take graduate level classes and do research and R&D and have spent years designing power grid and actually being on the power grid and seeing construction and putting in hard work to learn Electrical Power before it was "cool" and then have an insurmountable rush of developers who want you to help them change the world. They are the CEO, you are the reference book engineer.

I get that, but its important to look past that and see there is true benefit to the innovation. And for the most part they are right, this industry has been sitting in static mode riding comfortably for a long time in many ways when it comes to trying to stay technologicially relevant, and maintain sustainable infrastructure that allows for growth. So it is frustrating, but ultimately the software/tech community is in the right. It's time for a change and this industry needs to admit it kind of sucked at changing itself for decades.

It also helps that I have 8 years of coding experience I put in on my own time to help ease this barrier but I am an exception to the rule as I have been told by recruiters doing smart grid dev.

jacquesm · on May 30, 2017

These problems tend to be systemic, not just tech problems and usually by the time we reach this stage management is a little more amendable to things like feature freezes than what the regular crew would be dealing with. There is a reason you get to that stage.

So I can see how we have (much) more freedom when it comes to setting the time table and more diplomacy and better salesmanship might be required at an earlier stage. But then you can point to this comment here and suggest that it is probably much cheaper to do this in house than to hire a bunch of consultants to do it by the time the water is sloshing over the dikes.

gary__ · on May 30, 2017

Extreme planned refactoring perhaps:

"Many teams schedule refactoring as part of their planned work, using a mechanism such as "refactoring stories". Teams use these to fix larger areas on problematic code that need dedicated attention. Planned refactoring is a necessary element of most teams' approach - however it's also a sign that the team hasn't done enough refactoring using the other workflows."

https://martinfowler.com/articles/workflowsOfRefactoring/

crdoconnor · on May 30, 2017

>I don't disagree at all, but I think the more valuable advice would be to explain how this can be done at a typical company.

I resolved to try this if I ever ran into the same problem again after a whole bunch of arguments at a previous few companies:

* Set up a (paper) slider with 0-100% on it and put it somewhere prominent on the wall. Set it at 70%. That's the % of time you spend on features vs. the % of time you spend on refactoring (what that entails should be the development team's prerogative).

* Explain to the PM (or their boss) that they can change it at any time.

* Explain that it's ok to have it at 100% for a short while but if they keep it up for too long (e.g. weeks) they are asking for a precipitous decline in quality.

* Track all the changes and maintain a running average.

I think a lot of people suspect that management would just put it at 100% and leave it but I suspect that wouldn't happen. Most manager's "cover their ass" instincts will kick in given how simple, objective and difficult to bullshit the metric is once it's explicit.

DougWebb · on May 30, 2017

I'd call it "Maintaining existing code" rather than "Refactoring".

To a non-technical manager, the former sounds like pretty much what it is, and won't raise many questions. (If they do question it, ask them if they maintain their car while it's still running ok, or just wait until it breaks down before they do anything to care for it.)

Refactoring, on the other hand, sounds like a buzz word, and if they look it up they'll get "rewriting code that's already working so that it continues working the same way". They probably won't get the nuances about why that's a useful thing to do, so it'll sound like busywork and they won't be happy with letting your team do it. They also won't be able to justify it to their management if they're questioned about it, which is critical for getting buy-in from your managers.

jacquesm · on May 30, 2017

That's a good one, I will definitely steal that. Thank you!

bluGill · on May 30, 2017

Freeze is a business decision, NOT a technical decision. How much risk is acceptable to them is the question. If they want low risk than they need to freeze early, if they can stand risk then they can freeze latter. If they want the best of both worlds then they need to invest in automation (build and test) up until the point where the costs of automation exceed the value of lower risk with a late freeze date.

Remember you need to work in their terms. Risk is something they understand. The risk is they ship as soon as the last feature is done, without discovering that the last feature broke everything else. From there they move back, the last feature is done, so we do a 30 second sanity test - increase that to 30 minutes, 1 week, 1 month... They should have charts (if not create them) showing how long after a bug is introduced is it discovered on average, use those charts to help guide the decision. If the freeze time frame is too long then they allocate budget to fix it, or otherwise plan around this.

There are a lot of options, but they are not technical.

dllthomas · on May 30, 2017

If a change is hard to make, first make the changes that will make it easy. Recurse.

HolyHaddock · on May 31, 2017

> And frankly, if you can't point to bugs or performance issues, it's likely you don't need to be refactoring in the first place!

I feel this is a lack of clarity around the word refactoring. Improving the code in a way that fixes bugs is "bug fixing", in a way that makes it do its job faster is "optimisation" and in a way that improves the design is "refactoring".

Of course one can do several of them at the same time. And add features, at least in the small.

Refactoring can be a valuable activity for bits of a code base where the cost of change could be usefully reduced. It's useful to have a word that can be used to describe that activity that isn't commonly conflated with bug-fixing or optimisation.

specialist · on May 30, 2017

Sound advice.

re: Write Your Tests

I've never been successful with this. Sure, write (backfill) as many tests as you can.

But the legacy stuff I've adopted / resurrected have been complete unknowns.

My go-to strategy has been blackbox (comparison) testing. Capture as much input & output as I can. Then use automation to diff output.

I wouldn't bother to write unit tests etc for code that is likely to be culled, replaced.

re: Proxy

I've recently started doing shadow testing, where the proxy is a T-split router, sending mirror traffic to both old and new. This can take the place of blackbox (comparison) testing.

re: Build numbers

First step to any project is to add build numbers. Semver is marketing, not engineering. Just enumerate every build attempt, successful or not. Then automate the builds, testing, deploys, etc.

Build numbers can really help defect tracking, differential debugging. Every ticket gets fields for "found" "fixed" and "verified". Caveat: I don't know if my old school QA/test methods still apply in this new "agile" DevOps (aka "winging it") world.

vickychijwani · on May 30, 2017

> Semver is marketing, not engineering.

I agree with many of your points, but that casual dig at semver is unwarranted and reveals a misunderstanding of the motivation behind it [1]. Semver defines a contract between library authors and their clients, and is not meant for deployed applications of the kind being discussed here. Indeed, the semver spec [2] begins by stating:

> 1. Software using Semantic Versioning MUST declare a public API.

It has become fashionable to criticize semver at every turn. We as a community should be more mindful about off-the-cuff criticism in general, as this is exactly what perpetuates misconceptions over time.

[1]: https://news.ycombinator.com/item?id=13378637

[2]: http://semver.org/

specialist · on May 30, 2017

Build numbers, internal accounting process, engineering.

Semver, outsiders view, marketing.

Two different things, conflating them causes heartache. Keep them separate.

erikpukinskis · on May 30, 2017

> re: Write Your Tests, I've never been successful with this ... I wouldn't bother to write unit tests etc for code that is likely to be culled, replaced.

I think you misread the author. He says "Before you make any changes at all write as many end-to-end and integration tests as you can." (emphasis mine)

> My go-to strategy has been blackbox (comparison) testing. Capture as much input & output as I can. Then use automation to diff output.

That's an interesting strategy! Similar to the event logs OP proposes?

rzzzt · on May 30, 2017

Sounds like approval testing: http://approvaltests.com/

You capture the initial output from the original code, then treat this canonical version as the expected result until something changes.

deathanatos · on May 31, 2017

The thing about end-to-end and integration tests is that at some point, your test has to assert something about the code, which requires knowing what the correct output even is. E.g., let's say I've inherited a "micro"service; it has some endpoints. The documentation essentially states that "they take JSON" and "they return JSON" (well, okay, that's at least one test) — that's it!

The next three months are spent learning what anything in the giant input blob even means, and the same for the output blob, and realizing that a certain output in the output comes directly from the sql of `SELECT … NULL as column_name …` and now you're silently wondering if some downstream consumer is even using that.

specialist · on June 1, 2017

Belated reply, sorry. Been chewing.

Methinks I've prioritized writing of tests, of any kind, based on perceived (or acknowledged) risks.

Hmmm, not really like event logs. More of a data processing view of the world. Input, processing, output. When/if possible, decouple the data (protocol and payloads) from the transport.

First example, my team inherited some PostScript processing software. So to start we greedily found all the test reference files we could, captured the output, called those the test suite. Capturing input and output requires lots of manual inspection upfront.

Second sorta example, whenever I inherit an HTTP based something (WSDL, SOAP, REST), I capture validated requests and generated responses.

debuggest · on May 30, 2017

Pinning tests can be helpful for scary legacy code! http://rick.engineer/Pinning-tests/

rileytg · on May 30, 2017

much like https://github.com/github/scientist

marco_salvatori · on May 30, 2017

For testing comparison testing should probably be the preferred means of testing (solves the oracle problem). A combinatoric tester of the quickcheck variety can be invaluable here,and can be used from the unit test level all the way to external service level tests. Copy the preferably small sections of code that are the fix or functionality target, compare the old and copied paths with the combinatoric tester, modify the copied path, understand any differences, remove the old code path (keep the combinatoric test asserting any invariant or properties).

Some other important points:

- Inst. and Logging: And also add an assert() function that throws or terminates in development and testing, but logs in production. Sprinkle it around when your working on the code base. If the assert asserts assumptions were wrong and now you know a bit more about what the code does. Also the asserts are your documentation and nothing says correct documentation like a silent assert

Fix bugs - Yes, and fix bugs causing errors first. Make it a priority every morning to review the logs, and fix the cause of error messages until the application runs quiet. Once its established that the app does not generate errors unless something is wrong, it will be very obvious when code starts being edited and mistakes start being made.

One thing at a time - And minimal fixes only. Before staring a fix ask what is the minimal change that will accomplish the objective. Once in midst of a code tragedy many other things will call out to be fixed. Ignore the other things. Accomplish the minimal goal. Minimal changes are easy to validate for correctness. Rabbit holes run deep and deepness is hard to validate.

Release - Also almost the first thing to do on a poorly done project is validate build and release scripts (if they exist). Validate generated build artifacts against a copy of the build artifact on the production machine. Use the Unix diff utility to match for files and content or you will miss something small but important. For deployment, make sure you have a rollback scheme in place or % staged rollout scheme because, at some point, mistakes will be made. Release often because the smaller the deploy the less change and the less that can go wrong.

taude · on May 30, 2017

To help others with this strategy of blackbox/comparison testing, it's also often called "characterization" testing [1]. (In case you want to read more about this strategy.)

[1] https://en.wikipedia.org/wiki/Characterization_test

mannykannot · on May 30, 2017

>My go-to strategy has been blackbox (comparison) testing. Capture as much input & output as I can. Then use automation to diff output.

Same here - you have an oracle, it would be a waste not to use it. You can probably also think of some test cases that are not likely to show up often in the live data, but I would contend that until you know the implementation thoroughly, you are more likely to find input that tests significant corner cases in the live data, rather than by analysis.

douche · on May 30, 2017

> My go-to strategy has been blackbox (comparison) testing. Capture as much input & output as I can. Then use automation to diff output. I wouldn't bother to write unit tests etc for code that is likely to be culled, replaced.

I think that is precisely what the article advocates - although the definition of what end-to-end and integration tests are varies wildly from place to place.

> First step to any project is to add build numbers. Semver is marketing, not engineering. Just enumerate every build attempt, successful or not. Then automate the builds, testing, deploys, etc.

A thousand times this. And get to a point where the build process is reproducible, with all dependencies checked in (or if you trust your package manager to keep things around...). You should be able to pull down any commit and build it.

jacquesm · on May 30, 2017

That's absolutely true, I totally wrote that under the assumption that you at least have some kind of build process and that it actually works. I will add another section to the post.

raphar · on May 30, 2017

> write your tests.

From my point of view, this is always key. The moment you can have testable components, it's the moment you can begin to decompose the old system in parts. Once you begin with decomposition, Its easier first to pick on low hanging fruits to show that you are advancing and then transitioning to the dificult parts.

pd: I've been all my carreer maintaining & refactoring others code. I've never had any problem to take orphan systems or refactor old ones, and I kind of enjoy it.

If you have such of that old & horrible legacy systems, send it my way :D.

thijsvandien · on May 31, 2017

Interesting read for those who don't understand our fancy for legacy code: http://typicalprogrammer.com/the-joys-of-maintenance-program...

raphar · on May 31, 2017

The article is a description of my career! Thanks for sharing.

realcoopernurse · on May 30, 2017

+1 for the split testing + diff approach. We've successfully used this several times to replace old components with new implementations.

cessor · on May 30, 2017

I'd add a prerequisite to the top of this list:

- Get a local build running first.

Often, a complete local build is not possible. There are tons of dependencies, such as databases, websites, services, etc. and every developer has a part of it on their machine. Releases are hard to do.

I once worked for a telco company in the UK where the deployment of the system looked like this: (Context: Java Portal Development) One dev would open a zip file and pack all the .class files he had generated into it, and email it to his colleague, who would then do the same. The last person in the chain would rename the file to .jar and then upload it to the server. Obviously, this process was error prone and deployments happened rarely.

I would argue that getting everything to build on a central system (some sort of CI) is usefull as well, but before changing, testing, db freezing, or anything else is possible, you should try to have everything you need on each developer's machine.

This might be obvious to some, but I have seen this ignored every once in a while. When you can't even build the system locally, freezing anything, testing anything, or changing anything will be a tedious and error prone process...

flukus · on May 31, 2017

> I would argue that getting everything to build on a central system (some sort of CI) is usefull as well, but before changing, testing, db freezing, or anything else is possible, you should try to have everything you need on each developer's machine.

I'd extend this and say that the CI server should be very naive as well. It's only job is to pull in source code and execute the same script (makefile, whatever) that the developers do. Maybe with different configuration options or permissions, but the developers should be able to do everything the CI server does in theory.

A big anti pattern I see is build steps that can only be done by the CI server and/or relying on features of the CI server software.

jacquesm · on May 30, 2017

Added, thank you.

Also added a bit about the very obvious backup that you need to make before starting any work at all. Just in case...

taude · on May 30, 2017

This is a good high-level overview of the process. I highly recommend that engineers working in the weeds, read "Working Effectively with Legacy Code" [1], as it has a ton of patterns in it that you can implement, and more detailed strategies on how to do some of the code changes hinted at in this article.

[1] https://www.safaribooksonline.com/library/view/working-effec...

zimablue · on May 30, 2017

Second this, this is one of the best coding books I've read.

edit: it also gives a lot of similar advice to the article, big-bang rewrites often impossible, drawing a line somewhere in the application to do input-output diffing tests when you make a change

bmh_ca · on May 30, 2017

I mostly agree with this - bite-sized chunks is really the main ingredient to success with complex code base reformations.

FWIW, if you want to have a look at a reasonably complex code base being broken up into maintainable modules of modernized code, I rewrote Knockout.js with a view to creating version 4.0 with modern tooling. It is now in alpha, maintained as a monorepo of ES6 packages at https://github.com/knockout/tko

You can see the rough transition strategy here: https://github.com/knockout/tko/issues/1

In retrospect it would've been much faster to just rewrite Knockout from scratch. That said, we've kept almost all the unit tests, so there's a reasonable expectation of backwards compatibility with KO 3.x.

jacquesm · on May 30, 2017

> In retrospect it would've been much faster to just rewrite Knockout from scratch.

That's most likely not true, but looking backwards it often feels that way. The problem is that you're now a lot wiser about that codebase than you were at the beginning and if you had done that rewrite there could have easily been fatalities.

But of course it feels as if the rewrite would be faster and cleaner. How bad could it be, right? ;)

And then you suddenly have two systems to maintain, one that is not yet feature complete and broken in unexpected ways and one that is servicing real users who can't wait until you're done with your big-bang effort. And then you start missing deadlines and so on.

It's funny in a way that even after a successful incremental project that itch still will not go away.

rattray · on May 30, 2017

> The problem is that you're now a lot wiser about that codebase than you were at the beginning

That may not be true in this case, if the rewriter is also the original author and has remained active in the codebase over the years.

jacquesm · on May 30, 2017

Yes, but that's an entirely different situation than the one I'm targeting in the article. But yes, in that case you have better chances.

Even so, there is the Netscape story as evidence to the contrary.

rattray · on May 30, 2017

And I'm sure Netscape is far from alone in that category ;-)

But (disclaimer) as someone who as advocated for big-bang-rewrite's before, I'm still under the impression that there are situations where they can be net-better.

Factors may include:

- there is no database involved, just code. Even more helpful if the existing code is "pure".

- a single developer can hold the functionality in their head.

- there are few bugs-as-features, tricky edge cases that must be backwards-compatibility, etc.

- as stated above, it's the primary author.

- much of the existing functionality is poor, and the path for building, launching, and shifting to a "replacement product" is relatively clear.

Advocating to never rewrite can be harmful, and make things harder for people for whom that actually would be the best approach.

jacquesm · on May 30, 2017

Yes, but those are special cases. For every rule there is an exception, and of course if the parts above apply you are fully in control and are well able to judge whether you should rewrite or not.

But the situation that I'm describing is not ticking any of those boxes and I think I made that quite clear in the pre-amble.

rattray · on May 30, 2017

> the situation that I'm describing is not ticking any of those boxes

Oh, there's no doubt in my mind about that!

Some people may read this and extrapolate too far regarding their own situation (there's a reason this is a specialty field, it's hard stuff).

jacquesm · on May 31, 2017

One thing that bothers me is that people tend to expect miracles. I usually tell them it will take as long as it took to fuck it up to fix it. But that doesn't mean that you can't have some initial results to point the way in a short time. It's more about establishing a process and showing that there is a way out of the swamp than that it is something super tricky or difficult. Just follow the recipe, don't let yourself be distracted (this can be really hard, some management just can't seem to get out of the way) and keep moving.

humanrebar · on May 30, 2017

> In retrospect it would've been much faster to just rewrite Knockout from scratch.

You're getting a bit of pushback on this sentiment, so I'll play devil's advocate a bit here.

I've tried gradual refactors in the past, with poor results, because unfocused technical teams and employee turnover can really kill velocity on long-term goals that take gradual but detailed work.

That is, replacing all those v1 API calls with the v2 API calls over five months seems fine, but there's risk that it actually takes several years after unexpected bugs and/or "urgent" feature releases come into play. And by that time, you might have employee turnover costs, retraining costs, etc.

I'm just saying the risk equation isn't as cut and dry as it seems. There's is survivor bias in play in both the "rewrite it" and the "gradually migrate it" camps.

jacquesm · on May 30, 2017

The rewrite only works - in my experience, YMMV - if the team is already 100% familiar with the codebase as it is and the task is a relatively simple one and there is a nice set of tests and docs to go with the whole package.

Outside that boundary you're set up for failure.

ebiester · on May 30, 2017

The one caveat is that there are times when the business realizes that their old workflows and features aren't what they now need. The rewrite becomes a new project competing with the old rather than a functional rewrite.

This is also fraught with peril. However, it is a different set of problems. In an ideal world, you have engineers who can make reasoned decisions.

However, if the company culture allowed one application to devolve into chaos, what will make the second application better?

jacquesm · on May 30, 2017

You raise an excellent point and usually in tandem we educate management (not the tech people) on how they failed in their oversight and guidance role.

The real problem of course is to let things slide this far in the first place. But that's an entirely different subject, for sure the two go hand-in-hand and often what you touch on is the major reason the original talent has long ago left the company. By the time we get called in it is 11:58 or thereabouts.

typednothing · on May 30, 2017

At some point they'll junk the in-house program and buy something off the shelf.

jacquesm · on May 31, 2017

Assuming something off the shelf is available, yes. In fact, if something off the shelf is available we'll be happy to make that recommendation, too many companies that aren't software houses suddenly feel that they need to write everything from the ground up. And even companies that are software houses suffer from NIH more often than not. (Though, I have to say that in my experience in the last couple of years or so this is improving, it used to be that every company had their own in-house developed framework but now we see more and more standardization.)

humanrebar · on May 30, 2017

I agree about the YMMV part. The same caveats, small scope and developers with expertise, apply in the gradual migration plans as well in my experience. It's clearly true in the extreme cases (python2 -> python3) and I've seen the same patterns happen inside companies as well.

marcosdumay · on May 30, 2017

Looks like you had a too ambitious goal. Your rewrite would suffer from even more unexpected bugs, and the same urgent features, but worst, because you will have to fix them in 2 different systems. When your organization won't help you, you have to do less.

lloeki · on May 30, 2017

> bite-sized chunks is really the main ingredient to success with complex code base reformations.

An excellent talk about this is "The Scandalous Story of the Dreadful Code Written by the Best of Us" by Katrina Owen [0]

[0] http://www.kytrinyx.com/talks/scandalous-story/

Anderkent · on May 30, 2017

Is anyone else flabbergasted by the amount of effort required to mock a function call in Go, as described by this talk?

Like, when at 3:20 the presenter says there's a thing you can do that makes it utterly trivial to test this feature, I immediately assumed she'll just have to write some mocks for the `comm` package, and plug that in. Cool, I guess she'll talk about a nice mocking library or something, or there's some business complexity involved where the comm package is particularly stateful and so difficult to mock.

But no. The big difficulty seems to be that the language doesn't allow you to mock package-level functions; and so before you can mock anything you have to introduce an indirection - add an interface through which the notify package has to call things, move the code in the comm package into methods on that interface, correct all code to pass around this interface and call methods on it.

Why would you choose to work in language that makes the most common testing action so painful?

lclarkmichalek · on May 30, 2017

It shouldn't be 'the most common testing action'. In my mind, the number of mocks required for a test is usually inversely proportional to the quality of the code; if you need to mock out 20 random implementations to test something, you've either got an integration test masquerading as a unit test, or you've got very tightly coupled code. Mocks that need to be injected via monkey patching are worse than 'normal', dependency injected mocks. `quality = mock_count^-1 + monkey_patched_mock_count^-2`

Monkey patching is a sign of bad code in 99% of cases. In that 1% of cases where it might be justified, you can restructure your code to use indirection and dependency injection, and avoid having to use monkey patching. It might not be as nice as monkey patching in that 1% of cases. But I'd rather work in a language without monkey patching, precisely because it makes it incredibly obvious when you've coupled your shit.

Working in Go changed how I write my JS code. I don't know if you write much JS, but to my mind, `sinon` is mocking. `proxyquire` and `rewire` are monkey patching; monkey patching with the aim of helping mocking, but monkey patching none the less. My JS tests now don't use proxyquire or rewire, though they might use sinon. I find this produces easier to read code.

Anderkent · on May 30, 2017

Well, every external call is 'coupling' in your code. Whether it happens on an interface passed as an argument or by resolving the name in some other fashion doesn't really change how tightly coupled your code is.

To me, having to change a function into a method on a singleton interface just to be able to mock it for tests seems like working around inadequacies of the language. And I'm not sure why `module.Interface.method` is easier to read than `module.function`.

bmh_ca · on May 30, 2017

That really is an excellent talk, thanks for sharing.

lmm · on May 30, 2017

> In retrospect it would've been much faster to just rewrite Knockout from scratch.

Why do you say that? The idea one could get it right writing from scratch is one of those seductive thoughts, but in my experience it never works out that way.

bmh_ca · on May 30, 2017

> Why do you say that? The idea one could get it right writing from scratch is one of those seductive thoughts, but in my experience it never works out that way.

Of course the alternate route – rewriting - is just a hypothetical so we can only suppose how it would've turned out.

That said, rewriting from scratch would've been pretty straightforward, since the design is pretty much set.

The real value of the existing code resides in the unit tests that Steve Sanderson, Ryan Niemeyer, and Michael Best created – since they illuminated a lot of weird and deceptive edge cases that would've likely been missed if we had rewritten from scratch.

So I suspect you are right, that it's just a seductive thought.

gpderetta · on May 30, 2017

Amen. By the time (if ever) the rewrite has reached feature parity with the original, it is as bad as the original.

Also second system syndrome.

taude · on May 30, 2017

Nice work on Knockout refactor. We still are actively using KO in our core product, and it's nice to see some legs left in the framework.

noir_lord · on May 30, 2017

KnockoutJS is hands down my favourite JS library of all times (it's a large part of why I build things in a structured better way (I was/am primarily a backend dev)), it's awesome to see that it has a modern future since I have quite a few projects using it and 'porting' will be a lot easier so thanks for the amazing work you are doing :).

flukus · on May 31, 2017

Can I still install and use via a nuget package? It looks like it's integrated with all those crazy npm tools now but I'm not sure if that's just for development nor usage.

bmh_ca · on June 12, 2017

Sorry, @flukus, the alpha is not yet on nuget.

_virtu · on May 30, 2017

How does one get better if they only ever work in code bases that are steaming piles of manure? So far I've worked at two places and the code bases have been in this state to an extreme. I feel like I've been in this mode since the very beginning of my career and am worried that my skill growth has been negatively impacted by this.

I work on my own side projects, read lots of other people's code on github and am always looking to improve myself in my craft outside of work, but I worry it's not enough.

catwell · on May 30, 2017

You can certainly improve some of your skills working on terrible code bases. For instance, you should become much better at debugging. You will have to learn debugging techniques and tools that you may never have had to use in other code bases.

Also, here is a paradox: take someone who has only ever seen terrible code bases and someone who has only ever seen very good code bases. How can they know? They might take a guess based on how well the software works, but that's probably not very reliable.

I think a good software engineer is someone who has seen a lot of different things, good and bad; someone who knows what design choices work and what will plunge software into the depths of Hell; probably someone who has make mistakes themselves and lived through the consequences.

But yeah, when working on such a code base, do read some code outside of it now and then, never forget there are better ways to do things. And if you are starting to feel burnt out by the quality of the code base you work on, you should probably make a change.

rb808 · on May 30, 2017

I think its pretty common- and I think you're lucky.

I was surprised to see the article say "It happens at least once in the lifetime of every programmer,". I think if you work on greenfield projects your whole career you're likely the one who's creating these 'steaming piles of manure'.

By working on bad legacy projects you learn an awful lot of things about what works and what is a problem to maintain - it will make you a better developer.

The only issue is if you always work on legacy stuff and never get to write greenfield you might get typecast as such. Whether that is a problem of not is up to you. Sounds like you care enough you can change when/if you want to.

couchand · on May 31, 2017

I think you're setting up a false dichotomy. There are codebases other than just legacy and greenfield projects: high-quality, well-structured and well-maintained code.

I would agree that if all you work on is greenfield you're probably making the messes others are cleaning up, but I don't think that means developers are bound to either make messes or clean them up. There are plenty of good, long-lived projects out there.

Not every old project is legacy.

_virtu · on June 11, 2017

This is what I've been wondering about. I don't care if the stack isn't the newest, or the tech is the shiniest. I'm just more interested in working on code that was _engineered_. That is code that was designed and then built. That's the problem I have with most of the code I'm working in.

At my current place of work, we're not even using xmlhttprequest. We're using an antiquated xml library that's been hand rolled (xajax + major changes) to emulate our ajax requests. It's insanity to me that we're still in this mode.

remotehack · on May 31, 2017

> I think if you work on greenfield projects your whole career you're likely the one who's creating these 'steaming piles of manure'.

Eggzactly, well stated.

xemdetia · on May 30, 2017

I generally clear my head by reading mailinglists and looking at how projects of my interests do things and keep their commits in order, especially around bugfixes. OpenBSD is a fun one to read through as well as others. I also go to/watch talks about people managing their own piles of manure and change processes.

As long as you keep your eyes open to other people doing what your organization is struggling with right the first time it gives sufficient motivation to approach every problem with 'why is this here and how could we do this better.' The great thing about the state of F/OSS right now is that you have codebases that have to change because of things like large amounts of RAM being so cheap- that very well understood algorithm designed to only do things in 64MB so as to not swap out no longer makes sense and so there are intelligent motions to fix it. I've been planning on reading the Postgres 9.6 changes for parallel queries to understand how they did the magic in a sane and controlled manner and shipped a working feature.

anarazel · on May 30, 2017

> I've been planning on reading the Postgres 9.6 changes for parallel queries to understand how they did the magic in a sane and controlled manner and shipped a working feature.

Very incrementally - we've been adding more and more infrastructure since PostgreSQL 9.4. Which finally was user visible with some basic parallelism in 9.6, which'll be greatly expanded in 10. There's some things that we'd have done differently if we'd started in a green field, that we had to less optimally to avoid breaking the world...

_virtu · on May 30, 2017

Thanks for the response, this makes me feel a bit more positive.

FLGMwt · on May 30, 2017

It sounds like you at least have a good feel for what's bad and what's worse (which is good).

I think one thing you can do is attempt to isolate the code surrounding the next chunk you work on. Do as much as you reasonably can of the things the article mentions. This may only be writing tests and adding logging, but if it's an improvement over what's there, you'll improve the experience of the next person involved with that code.

I'd warn you against jumping ship in hopes of finding a "clean" code base. Most code is somewhere on a spectrum of "maintainable enough" and something... grimmer.

If you really are unhappy and don't feel like you're growing or have the ability to grow, maybe try out contributing to a well-maintained OSS project. If you find yourself immensely happier, dust off your resume ;)

mattnewton · on May 30, 2017

While starting out, knowing what not to do and precisely why is nearly as important as knowing what worked. In the case of good codebases and bad codebases though, you still need to be careful not to cargo-cult wholesale the architecture that worked before, and conversely not to do everything different from the last horror you worked in. View it as a learning opportunity as you debug: some things they will have gotten right, sounds like many things have been gotten wrong, but the process of reasoning them apart is still valuable.

All that being said, certainly do not hesistate to look around if you feel like you aren't growing as fast as you could be. Life is short and it's a sellers market for engineer labor in most places I have seen.

_virtu · on May 30, 2017

Yeah, right now I haven't worked at my current place long enough to leave. I want to put in enough effort on my part to warrant feeling like I haven't grown enough too. I've decided that the best thing I can do is focus on what I'm not doing well enough or consistently enough until I feel like I've covered all my bases/can't learn any more on my own.

The main problem I have is how to structure what it is I'm trying to improve upon. I also want more external perspective to help guide me towards becoming better in the web development field, but I don't feel like the company I'm at has developers with a modern web development skillset to offer that guidance.

Unfortunately I work in an area where the web developer talent is pretty shallow. The general programmer talent pool is deep, but I still feel like the specialization towards webdev and modern practices just aren't here.

jgalar · on May 30, 2017

IMO, it really depends on the context. If you are working with people who share your assessment of the current situation (both business and technical folks) and want to improve it, you'll have a great chance to learn from others' (and your) mistakes.

However, constantly putting off fires, under the gun, in horrible code bases, is probably not a good way to learn how to design software... It's a good way to learn how to debug and reason about problems, which is also a valuable skill to develop, though.

artursapek · on May 30, 2017

Start your own company? But even that I think is futile.

The causes of manure code are usually out of your control - tight deadlines; new devs touching stuff without properly understanding the whole; organization prioritizing short-term reward over long-term sustainability.

You also have to consider the inherent survivorship bias - only successful businesses live long enough that their codebase has time to grow into a big mess. Any company that lives more than a few years inevitably ends up with "manure". You'd have to be in the extremely rare position where you are profitable and have no pressure to keep growing (investors) in order to invest enough time into technical craft to not end up with manure code.

thehardsphere · on May 30, 2017

You can learn a lot from mopping up steaming piles of manure. Recognizing what manure is and the thought processes/business incentives that produce it will be helpful to you in not making your own.

Also, even if your current codebases are manure, that doesn't mean everyone in your company makes manure. Find people on your team who don't write it, and learn from them.

If nobody is like that in your company, then maybe you should change jobs if you've been there more than two years. Cleaning up manure helps with interviewing because you can share your war stories with the interviewer.

flukus · on May 31, 2017

I'd never trust a developer that's only worked on green field projects, they're oblivious to the mess they leave because they aren't there long enough to feel the pain of their design decisions. So you've got one up on a lot of people there.

Aside from your own projects, look for opportunities for other projects at work where you can start with a fresh technology stack. Some of these projects might be taking over the non-core functions of the main app. For instance, chances are a lot of the UI is sub-optimal (generic crud based) for some specific users. You might be able to create a slicker interface that makes it easier for them to do specific tasks that feed that data into the main database.

cafard · on May 30, 2017

Frankly, I never quite understood the importance of clear documentation until I found one such code base smoldering on my porch.

chiph · on May 30, 2017

At the very least, write a doc that explains how to build the product, including where to find the parts in source control, what the dependencies are, what servers it'll get installed on, and so on.

The goal being to increase your shop's "Bus Factor"

https://en.wikipedia.org/wiki/Bus_factor

_pmf_ · on May 30, 2017

> At the very least, write a doc that explains how to build the product, including where to find the parts in source control, what the dependencies are, what servers it'll get installed on, and so on.

... in the form of a Jenkins build configuration. (If possible; if the system requires legacy compilers that only run on old Windows versions or a proprietary compiler for an embedded target, good luck.)

chiph · on May 30, 2017

The legacy projects are the ones where a doc with all that info would be the most useful. :)

Roboprog · on May 31, 2017

Something to be cognizant of when creating this "keystone" doc - not losing it on some "share" that nobody can find anymore.

Thus, the use of README in the root directory of a project.

chiph · on May 31, 2017

Since it would contain the location of the root directory, putting there would be circular. Hopefully the organization has a central location for their documentation that is somewhat organized (via SharePoint, or even a network share with folders). Reducing the number of things that a new hire would need to "just know" to a minimum should be a goal.

joobus · on May 31, 2017

> Hopefully the organization has a central location for their documentation that is somewhat organized...

My office uses a combination of Redmine, Slack, email, gitlab, network drives, google docs, dropbox, some pdfs floating around, and a readme in the root of each repo...

Roboprog · on June 1, 2017

Exactly. Because...

It started out on...

That Novell drive

... but then...

That Win95 share

... Samba

... Wiki

... Sharepoint

... nah, let's start using Confluence now

etc.

Roboprog · on June 1, 2017

If you have it, it's nice to have a build/CI server that has a UI showing all the projects in a dept/work-group and where they come from in source control.

_virtu · on May 30, 2017

I love the notion of bus factor. Whenever a bunch of devs go out drinking I think this every time we cross the street. :)

BurningFrog · on May 30, 2017

I've learned an enormous amount from fixing terrible bugs in terrible code.

One tip is that when you've finally found the actual line(s) with the bug, always try to understand why the programmer made that mistake.

This has taught me much about what constructs are error prone.

_virtu · on June 11, 2017

This is an interesting one. I'll have to think about this while I'm taking notes about bugs.

user15672 · on May 30, 2017

I hate to say it, but I think the answer is "with difficulty".

From my own experience, it's really hard to know what's bad, what's good and what's an acceptable workaround if you've never seen anything different. Myself, I got lucky and ended up working on a project after the start of my career with someone who could explain the whats and (more importantly) the whys of bad/good/ugly code bases.

Generally, try and get some skill in being able to view a codebase from a high level. Draw it out on a whiteboard in boxes. Perhaps do this on other, pet projects first as it's nearly impossible to do this with a spaghetti-code project. If you can't pick out modular parts, then you have a big ball of mud. If you can, try and work on making and keeping them uncoupled. If you can, try and work on finding the natural boundaries of the other code you couldn't break up, and make those less coupled (you don't need to solve the coupling problems all at once!).

Are there a mix of architectural patterns in the code? This is pretty common when you're working on a legacy project. It's what happens when you get someone who doesn't really know how to architect, or there were a bunch of folks throughout the history of the project who (probably) had the right intentions, but didn't get it finished. Or, and this is the worst, you had two or more team members trying to bend the project to their own preferences without communicating with each other. If this is the case, talk to your team, agree on one and then you can work towards getting the style consistant. You don't even need to pick the best one. Getting a project into a consistant state is better than having an ugly mix and match.

Are there a bunch of mixed up design patterns floating around? Try and refactor those out as much as possible. Design patterns are great, and you should use them where appropriate. But if you find a lot of them nested within each other, it's not a good sign and probably indicates someone at some point swallowed a design pattern book and thought it would be a good idea to implement them. All of them. Nested patterns can more the likely be refactored out to simplify the code. Though again, make sure you understand what they are there for first. Otherwise you may be unpicking something intentionally complex that needs to exist to remove complexity elsewhere.

What does the DB look like? Is it designed around the projects business logic? Is this sensible for your project? Personally, I dislike putting any business logic into the data storage layer but it might be sensible for your particular project, so YMMV. If business logic in the DB is causing nasty workarounds, then you may have something else to refactor there, though this may not be possible.

Never refactor just for the sake of it! If you don't have buy-in for your ideas on how to improve a code-base from the rest of your team, you're going to be creating problems. You may also be missing critical information that your tech-lead knows about and made design decisions based on it. There have been several times I've tried to make things better as a Junior dev, only to find out I'd made some bad assumptions and created a mess.

Don't refactor without tests either. The system may be reliant on strange code, so make passing tests before changing things. That way you at least know the behaviour hasn't changed.

kentt · on May 30, 2017

> Do not ever even attempt a big-bang rewrite

I'd love to hear a more balanced view on this. I think this idea is preached as the gospel when dealing with legacy systems. I absolutely understand that the big rewrite has many disadvantages. Surely there is a code base that has features such that a rewrite is better. I'm going to go against the common wisdom and wisdom I've practiced until now, and rewrite a program I maintain that is

1. Reasonably small (10k loc with a large parts duplicated or with minor variables changed).

2. Barely working. Most users cannot get the program working because of the numerous bugs. I often can't reproduce their bugs, because I get bugs even earlier in the process.

3. No test suite.

4. Plenty of very large security holes.

5. I can deprecate the old version.

I've spent time refactoring this (maybe 50 hours) but that seems crazy because it's still a pile of crap and at 200 hours I don't think it look that different. I doubt it would take 150 hours for a full rewrite.

Kindly welcoming dissenting opinions.

buzzybee · on May 30, 2017

What tends to happen as you refactor bad code is that you gain some intuition about the way the code needs to flow. The longer you spend grinding away at the existing code, the more likely it is that rewriting it will work, because you'll have pent-up "architectural energy" waiting to be used, and good, already-debugged code from the previous version that can be copied in.

The most likely causation for crossing a threshold from refactor to rewrite, while steering clear of the "big bang rewrite", is that you have to ship a feature that triggers an end-run around some of the existing architecture. So you ship both new architecture and the new feature, and then it works so well that you can deprecate the old one almost immediately, eliminating entire modules that proved redundant.

Edit: And if you don't really know where to start when refactoring, start by inlining more of the code so that it runs straightline and has copy-pasted elements(you can use a comment to note this: "inlined from foo()"). This will surface the biggest redundancies at a minimum of effort.

hinkley · on May 30, 2017

Joel Spolsky has a pretty good rundown. The biggest takeaway for me was that legacy apps usually don't have clear reproducible requirements. All the corner cases are written down in one place: the old code. Throwing that out means you'll recreate most of the bugs that were already fixed in the old system.

It is painful to look at and work with the old code, so we want to avoid it. But some things worth doing are painful, like exercise, or getting a cavity filled.

[edit] https://www.joelonsoftware.com/2000/04/06/things-you-should-...

jacquesm · on May 30, 2017

10k loc is very very minor league. You can do anything you want on a base that size, it won't matter.

100's of thousands to millions of loc is a lot more problematic, many moving parts and weird interplay is to be expected.

kentt · on May 30, 2017

I understand that it likely "won't matter". My point was to ask if it was worth talking about outliers to the Never Rewrite law.

eg it's assumed when talking about refactoring over rewriting that a large portion of features is working. There should be some percentage where it's worth rewriting over refactoring. Or perhaps a size where it's small enough to easily rewrite.

jacquesm · on May 30, 2017

Yes, that's definitely a discussion worth having.

To me you can rewrite anything that:

(1) you fully understand (and you'd better be right about that)

(2) you have total control over already

(3) is small enough for (1) and (2) to be possible

(this is where I think a lot of people over-estimate their capabilities)

(4) where you have the ability to absorb a catastrophic mistake

(which usually isn't the pay-grade of the programmers)

and finally

(5) where you have a 'plan-B' in case the rewrite against all odds fails anyway

None of these are absolutes, if there is no business riding on the result then you can of course do anything you want. The history of IT is littered with spectacular failures of teams that figured they could do much better by tossing out the old and setting a date for the deploy of the shiny new system. Whatever you do make sure that your work won't add to that pile.

The older, the larger, poorer documented, worse tested the system is the bigger the chance that it is not fully understood.

ef4 · on May 30, 2017

Your example is much smaller than what people are usually talking about in terms of big-bang rewrites. So maybe you will be successful.

Even so, you're better off doing a step-by-step rewrite, where the new stuff and the old stuff coexists in a single application. That way your users can continue getting incremental benefits over time even if the rewrite takes dramatically longer than you're optimistic estimate.

If you can't figure out how to manage the complexity of a piecemeal rewrite, consider that you may not actually understand the system well enough to avoid making version 2 just as bad as version 1.

Most people overestimate their ability to act differently than they've acted in the past. It's like the unjustified optimism of a New Year's resolution that this time you're actually going to exercise every day. To get a better result than last time, you need to impose some very clear rules on yourself that cause you to work differently.

lucozade · on May 30, 2017

Not attempting a full rewrite of a significant codebase is excellent advice because it's usually the right advice.

That's not to say that it can never be successful, just that the circumstances in which it will are sufficiently rare that it's usually worth discounting relatively early on.

In >20 years of dev experience, I can only think of one occasion where I successfully did a big bang rewrite i.e. tore down an application and restarted it with an equivalent system that had approx zero common code.

In that case, it was a C++ program that wouldn't actually build from clean. A lot of the code was redundant as the use cases had morphed over time (and/or weren't ever required but were coded anyway) and most changes were stuffed into base classes as it was effectively impossible work out how objects interacted. Releases took about 3 months for about 2 weeks worth of dev.

Initially, I didn't plan to rewrite it. When I realised I couldn't understand what it was doing, I took a step back and worked out what it should have been doing, assuming that I could map one to the other. What I found was that, at heart, it should have been doing something fairly simple but that the original "designers" had thrown the kitchen sink at it and its core function was lost in the morass.

I also came up with a way of making it easy to show that the new system was correct more deeply than just tests. This gave me, and folks I needed to convince, a lot more confidence that a rewrite made sense than would normally be the case.

In summary, it was quite a rare set of events that led me to the conclusion that a rewrite was the right direction: the existing system being a complete basket case, my happening to have a lot of domain expertise, the problem space turning out to be relatively simple and finding a way to "prove" correctness, all contributed. I doubt I would have made the same decision if any of them were different.

tyingq · on May 30, 2017

There are cases where you can do a rewrite, but still avoid the big-bang cutover, by exposing the new app only to some subset of customers or transactions. That isn't possible with every app, of course.

I think the gospel view is when you have to do both...rewrite and big bang cutover. Especially when there is no obvious fallback.

Yhippa · on May 30, 2017

Not a dissenting opinion but I'd love to see some case studies on rewrites. As a consultant this is a frequent request and will probably be big business in the future as people migrate off of expensive legacy mainframe or other applications from the 80's, 90's, and possibly 2000's.

ef4 · on May 30, 2017

It's not "rewrite" that's bad, it's thinking you can cut over to a new system in a "big bang".

Rewrites are definitely common and beneficial, but the successful ones always run the new code and the old code side-by-side for an extended period of time. Which means you're still tending and caring about the old code, even as you strive to direct most of your effort into the new code.

tim333 · on May 30, 2017

There's the Sivers CD baby to rails (fail) and back to PHP (success) case https://sivers.org/rails2php

b0rsuk · on May 30, 2017

The application may be a steaming pile of crap, but you probably don't have as much knowledge of the problem domain as the creators did. You will get there, over time. Starting a complete rewrite throws away the bad parts, but it also throws away accumulated knowledge.

collinmanderson · on May 30, 2017

I'd say as long as you've read and understand nearly every line of code in the old system, you're good to rewrite it from scratch.

busterarm · on May 30, 2017

And if you write the test suite first, you're in a much better position to do this successfully.

nawitus · on May 30, 2017

I also disagreed with that part in the article. Big-bang rewrites can be just fine - but usually there are reasons it's not possible.

maxxxxx · on May 30, 2017

How do people handle this in dynamic languages like JavaScript? I have done a lot of incremental refactoring in C++ and C# and there the compiler usually helped to find problems.

I am now working on a node.js app and I find it really hard to make any changes. Even typos when renaming a variable often go undetected unless you have perfect test coverage.

This is not even a large code base and I find it already hard to manage. Maybe i have been using typed languages for a long time so my instincts don't apply to dynamic languages but I seriously wonder how one could maintain a large JavaScript codebase.

stickfigure · on May 30, 2017

I think you just captured the essence of why microservices are so popular. Dynamic languages just don't scale to large codebases, so there's enormous pressure to decompose software into chunks that can be digested more easily.

Some amount of this is good, but it often forces the chunk boundaries to be smaller than the "natural" clumping of data and behavior in a distributed system. IMHO this is a much worse problem than a messy monolith; you can refactor a monolithic codebase to be more modular, but refactoring hundreds of microservices is a herculean endeavor.

My problem with microservices is the word micro.

gnaritas · on May 30, 2017

> Dynamic languages just don't scale to large codebases

You mean "popular" dynamic languages due to their lack of tooling. Dynamic languages like Smalltalk scale up just fine, but Smalltalk has automated refactoring tools. In other words it's a tool support problem, not a dynamic language problem.

Roboprog · on May 31, 2017

> Dynamic languages just don't scale to large codebases

Static languages scale to large codebases. There's no app that a static language (and those who insist on static types) can't turn into a much larger codebase :-)

I love the imagery of "mountains of dirt": http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.ht...

maxxxxx · on May 30, 2017

"hundreds of microservices"

I can't imagine a scenario where you need hundreds although I don't doubt that people will create such a system.

jacquesm · on May 30, 2017

Do not underestimate architecture astronauts, ever.

gedrap · on May 31, 2017

I was reading "Building Microservices" by Sam Newman, he mentioned that some of his clients moved from monolith to 300+ microservices without going into details, so yeah, that made me wonder about it as well. (it was a decent book otherwise).

Roboprog · on May 31, 2017

:-)

Or non-generalized, custom hard-coded static typed end-points for every single reference/option list and workflow state transition.

Welcome to our little "full 'big bang' rewrite" Frankenstein 4.0 :-(

Not my idea...

nulagrithom · on May 30, 2017

Try TypeScript.

Though I wouldn't think that test coverage needs "perfect" to catch a bad variable name, but maybe that's why there's so much obsessive tooling when it comes to coverage in the JavaScript world.

Roboprog · on May 31, 2017

Don't forget that JS is often in a UI, doing asynchronous event/IO handling, so testing timing is important, not just spelling. (great, that's exactly the property names that object would have had, if it existed yet)

That, and it's often reading in data (JSON or XML) from another system, and it is what it is, so see if it quacks or not.

From the people that brought you SOAP, it's (drum roll) TYPE SCRIPT!

It's not really solving my problems, just making more work.

nulagrithom · on May 31, 2017

So because Microsoft made SOAP and also made TypeScript then TypeScript must be bad? That's nonsense.

Also, I'm not a frontend guy, and the comment I was replying to was talking about node.js, but having to put a setTimeout or something in your tests just seems wrong.

Roboprog · on June 1, 2017

On the one hand, argumentum ad hominem is a logical fallacy.

On the other hand, expecting not to be bit after the last dozen times seems kinda stupid. I'm not a fan of MS.

Roboprog · on June 1, 2017

re: timeout. Yeah, waiting for one (or many!) other async operation(s) to complete in response to an event is a nuisance, but that's how it works, in particular if you don't want a UI to freeze up.

Full stack is hard, at least if somebody wants you to swap in and out of levels several times a week. But that's another rant about ruining projects...

Roboprog · on May 31, 2017

Let me try again.

Don't rename properties that "escape" from a given context. Sorry, but it's not going to be a good use of your time. Do document (JSDoc or similar) what the property is used for and why (as far as you can tell)

It's OK to rename local variables and parameters (the "root" local identifier, not the properties), though.

It might not be Smalltalk (I wouldn't know), but the JetBrains IDE support for JS is pretty good in terms of type inference, "where defined" lookups, "show documentation" support, duplicate / undefined symbol detection and other stuff I'm probably forgetting at the moment.

Seriously, though, avoid the traditional class/constructor/prototype setup (rather than short lived object literals as parameter objects and return values). It makes things too widely visible, and harder to safely change later. And it's more work, anyway.

Learn how to refactor a nested function which uses closure values into a reusable function with a longer argument list on which you can use partial function application as a form of dependency injection - or the other way around, for something used only one place.

An important lesson in managing code in a dynamic language is to limit the scope of everything as much as possible. Software designed as a cluster of many mutable singletons is going to hurt.

OOP was the hotness in the 80s. It's time to learn other paradigms, too (move from the '60s to the '70s), even if IDE designers have to update how "intellisense" (aka auto-complete) works :-)

rocky1138 · on May 30, 2017

I've found integration testing to be very useful when dealing with JavaScript web stuff. If the desired output looks correct, you can usually work with the understanding that the JavaScript did its job.

See Selenium. http://docs.seleniumhq.org/

Thasc · on May 30, 2017

It's not exactly a general solution fit for all situations and persons and purposes, but there's always TypeScript.

WorldMaker · on May 30, 2017

Typescript's increasing ability to type check JS code without modification (especially if it already has JSDoc comments or is already using npm-installed libraries with type information) is moving it to be a better fit as a solution for more situations.

Roboprog · on May 31, 2017

Contributing factor: thinking in terms of Simula67 / C++ / Java / C#. (stop doing so)

Since property names are dynamic, avoid making data global (singletons, et al) at all costs, to limit the amount of string searching and informed "inferences" you have to make. Using a more functional programming style that tracks data flow of short lived data works better than trying the "COBOL with namespaces" approach of mutable data everywhere that gets whacked on at will.

Sorta ironic: monstrous, so called, "self documenting" identifier names are not a good idea in a dynamic language. A short (NOT single letter, long enough to be a memorable mnemonic) identifiable name is more likely to be typed and eye-ball checked correctly.

There is no "self documenting" code - literate programming is your friend, or at least JSDoc is. It's not practical to put "why" something is into its name.

Of course, if you inherited some hot mess written by a hard-core Java / C# programmer, yeah, life is gonna suck :-(

Disclaimer: I've been doing a lot of Angular the last couple of years, which is over reliant on long lived, widely visible, mutable data. I would rather go the route of something like Redux than Type Script, though. (I suppose you could do both, but I want to NOT do Type Script if I can help it)

I've also worked with a number of languages that had runtime types and/or that allowed some kind of "string interpolation" for identifiers here and there since the 80s. No biggie.

Buh, buh, buh, TYPESSSS!!! Yeah, so. Let's talk about excessive temporal coupling, (mutable) OOP (only) folks...

<RANT ENDS> (for now)

butabah · on May 30, 2017

You'll find these issues with pretty much any dynamic or scripting language once the codebase becomes large enough.

For JavaScript, your best bet would be to integrate external tooling (such as JSHint) into your gulpfile or grunt.

lbill · on May 30, 2017

I used to work on a messy legacy codebase. I managed to clean it, little by little, even though most of my colleagues and the management were a bit afraid of refactoring. It wasn't perfect but things kinda worked, and I had hope for this codebase.

Then the upper management appointed a random guy to do a "Big Bang" refactor: it has been failing miserably (it is still going on, doing way more harm than good). Then it all started to go really bad... and I quit and found a better job!

OutsmartDan · on May 30, 2017

Big bang rewrites are needed in order to move forward faster.

A huge issue with sticking to an old codebase for such a long time is that it gets older and older. You get new talent that doesn't want to manage it and leave, so you're stuck with the same old people that implemented the codebase in the first place. Sure they're smart, knowledgable people in the year 2000, but think of how fast technology changes. Change, adapt, or die.

jacquesm · on May 30, 2017

A big bang rewrite will nine out of ten times slow you down, it will not accelerate things, and the most likely outcome is that not only will it be slower, it might fail entirely.

It's a complete fallacy to think that you're going to do much better than the previous crew if you are not prepared to absorb the lessons they left behind in that old crusty code.

So you'll have to learn them all over again.

> Change, adapt, or die.

Die it is then.

alkonaut · on May 30, 2017

It's not a given that legacy code means "no people still around, no docs and no tests". I'm on a rewrite project and I'm 10 years in, and the whole crew from the last project (also around 15 years) is till in this project too. That helps.

The causes of the big bang rewrite are usually not just "this code smells let's rewrite it" but rather that the old product reached some technical dead end. Perhaps it can't scale. Perhaps it's a desktop product written in an UI framework that doesn't support high DPI screens and suddenly all the customers have high DPI screens. Obviously in that situation you'd aim to just replace a layer of the application (a persistence layer, an UI layer) but as we all know that's not how it works. The cost of a rewrite shouldn't be underestimated - as you said there is no reason to believe that if it took 50 man years for the last team then the new team will take 50 too. But that is in itself not a reason to not do it.

jacquesm · on May 30, 2017

Fair enough. So the real lesson then is 'it depends', as with everything else. But the kind of jobs where the cleanup crews get called in are on the verge of hopeless and it is not rare that we do these on a 'no-cure, no pay' basis.

Great to see you be part of such a long lived team, that's a rarity these days. That's got to be a fantastic company to work for. Usually even relatively modest turnover (say 15% per year) is enough to effectively replace all the original players within a couple of years, most software projects long outlive their creators presence at the companies they were founded in. Add in some acquisitions or spin-outs and it gets to the point where nobody even knows who wrote the software to begin with.

OutsmartDan · on May 30, 2017

Software is only as good as the people that write it. In an ideal world, you'll have a team that specializes in this sort of things, can understand the business needs, and get it done.

There are always risks with every action taken. You can't be scared to take a big risk for a bigger payout versus sucking it up and doing things the way they've been done for 15 years.

specialist · on May 30, 2017

All software is written three times.

First, to learn the problem. Second, to learn the solution. Thrice, to do it right.

Skip a step at your own peril.

Incrementalism and do-over both have their place.

If you're resurrecting legacy code, I can't imagine successfully rewriting it until after you understand both the problem and solution. Alternately, change the business (processes), so that the legacy can be retired / mooted.

ethbro · on May 30, 2017

Anything that fails isn't moving anything anywhere.

And in the type of place that has a dysfunctional, legacy software system running core business operations, don't count on all the other ducks being in a row (anything resembling agile, ability to release to prod on a reasonable cadence, ability to provision sufficient test data, working test systems, etc).

If it's an internal system that you've been working on and maintaining... for 10 years... maybe (just maybe). If you're a consultant stepping in, I wouldn't touch that option for love or money.

vincnetas · on May 30, 2017

This might be true (big bang rewrite) for small web sites or non business critical utility software, but if your cash flow depends on the software, you do not want failing software to stop that cash flow.

OutsmartDan · on May 30, 2017

Correct- rewrites should be done in tandem with maintaining legacy systems until the new system is finished.

There should be some sort of overlap before completely sunsetting the old system.

deedubaya · on May 30, 2017

I think this mostly applies to newer startups who have changed teams with no or little hand off.

Fighting technical debt is hard. Fighting it with a blindfold is harder. Fighting it with 0 frame of reference is daunting. Fighting it the rest of the company is demanding new features right now is a recipe for stagnation, bugs, and burn out.

cntlzw · on May 30, 2017

The problem with big bang rewrites is you end up falling in every trap the original developers fell into.

protomyth · on May 30, 2017

It is amazing how much of our profession's knowledge ends up as a odd if statement buried deep in the code to some method or stored procedure dealing with an edge case that gets missed in the big bang rewrite. Its also amazing how much money the failure to preserve that knowledge can cost.

I wonder if its time for professional software archeologists?

jacquesm · on May 30, 2017

> I wonder if its time for professional software archeologists?

No, but it is time to make a real effort to teach the lessons learned to newcomers. I really feel that as an industry we completely fail at that. Blog posts such as these are my feeble attempt at trying to make a contribution to solving this problem.

cntlzw · on May 30, 2017

Writing software is still "creative" and less "engineering". There aren't many ways to build a bridge but many ways to express yourself in language. Natural language or computer language that is.

Add this to "business requirements" and you get the big pile of manure we walk in every day. Like how does the knowledge of IEEE754 help me if the requirement is to sum up some value of the last three days, unless the last three days are on a weekend or holiday. (ok, stupid example) The point is domain language does not translate to computer language very well and a programmer is not a domain expert. He is .. just a programmer, a creative programmer, and we are millions each doing their thing a little different.

protomyth · on May 30, 2017

Strangely, at all the big corp jobs I've been at, the good programmers have become domain experts as part of the job. How else could they have a real feel for what the business needed and if the code was correct?

cntlzw · on May 31, 2017

Oh yes, I totally agree with you on this. But it is a long road to become a domain expert. Don't forget people eventually leave jobs and you get new hires. Leaves lot of room for errors.

eej71 · on May 30, 2017

The issue to think about is - if you don't know enough to "upgrade/replace in place" - then you probably won't know enough to rewrite from scratch.

OutsmartDan · on May 30, 2017

That's why you need product owners/managers to cover the business logic and design a system that incorporates it. There needs to be tiers of coordinations to make sure a system is built to spec. TDD plays a big part in rebuilding legacy codebases.

busterarm · on May 30, 2017

All of this seems to focus on the code, after glossing over the career management implications in the first paragraph.

I've done this sort of work quite a number of times and I've made mistakes and learned what works there.

It's actually the most difficult part to navigate successfully. If you already have management's trust (i.e., you have the political power in your organization to push a deadline or halt work), you're golden and all of the things mentioned in the OP are achievable. If not, you're going to have to make huge compromises. Front-load high-visibility deliverables and make sure they get done. Prove that it's possible.

Scenario 1) I came in as a sub-contractor to help spread the workload (from 2 to 3) building out a very early-stage application for dealing with medical records. I came in and saw the codebase was an absolute wretched mess. DB schema full of junk, wide tables, broken and leaking API routes. I spent the first two weeks just bulletproofing the whole application backend and whipping it into shape before adding new features for a little while and being fired shortly afterwards.

Lesson: Someone else was paying the bills and there wasn't enough visibility/show-off factor for the work I was doing so they couldn't justify continuing to pay me. It doesn't really matter that they couldn't add new features until I fixed things. It only matters that the client couldn't visibly see the work I did.

Scenario 2) I was hired on as a web developer to a company and it immediately came to my attention that a huge, business-critical ETL project was very behind schedule. The development component had a due date three weeks preceding my start date and they didn't have anyone working on it. I asked to take that on, worked like a dog on it and knocked it out of the park. The first three months of my work there immediately saved the company about a half-million dollars. Overall we launched on time and I became point person in the organization for anything related to its data.

Lesson: Come in and kick ass right away and you'll earn a ton of trust in your organization to do the right things the right way.

sz4kerto · on May 30, 2017

The OP has so many reasonable, smart-sounding advice that doesn't work in the real world.

1) "Do not fall into the trap of improving both the maintainability of the code or the platform it runs on at the same time as adding new features or fixing bugs."

Thanks. However, in many situations this is simply not possible because the business is not there yet so you need to keep adding new features and fix bugs. And still, the code base has to be improved. Impossible? Almost, but we're paid for solving hard problems.

2) "Before you make any changes at all write as many end-to-end and integration tests as you can."

Sounds cool, except in many cases you have no idea how the code is supposed to work. Writing tests for new features and bugfixes is a good advice (but that goes against other points the OP makes).

3) "A big-bang rewrite is the kind of project that is pretty much guaranteed to fail.

No, it's not. Especially if you're rewriting parts of it at a time as separate modules

My problem with the OP is really that it tells you how to improve a legacy codebase given no business and time pressure.

jacquesm · on May 30, 2017

On the contrary, we do this work under extreme business and time pressure, sometimes existential pressure (as in: fail and the company fails).

That's exactly why this list is set up the way it is: you will get results fast and they will be good results.

If you want to play the 'I'm doing a sloppy job because I'm under pressure' card then consider this: the more pressure the less room there is for mistakes.

Here is a much more play-by-play account of one of these jobs where management gave me permission to do a write-up as part of the deal:

https://jacquesmattheij.com/saving-a-project-and-a-company

(For obvious reasons management usually does not give such permission, nobody wants to admit they let it get that far on their watch, I did my best to obscure which company this is about.)

sz4kerto · on May 30, 2017

> That's exactly why this list is set up the way it is: you will get results fast and they will be good results.

What do you mean by 'fast'? If you can get meaningful improvements in a few months' time, then you're just working with smaller code base than what I thought of. If you're talking about stopping for a year, then .. well, that's the problem I'm talking about.

> If you want to play the 'I'm doing a sloppy job because I'm under pressure' card

No, I just wanted to share my opinion that I disagree with the overly generalized suggestions you're making.

jacquesm · on May 30, 2017

> What do you mean by 'fast'?

Much faster than by going the rewrite route (assuming that is even possible, which I am convinced for anything but the most trivial problems it isn't). Preferably to first deploy within a few days and incremental changeover to the new situation starting within two weeks or so of the starting gun being fired.

> If you can get meaningful improvements in a few months' time, then you're just working with smaller code base than what I thought of.

No.

> If you're talking about stopping for a year, then .. well, that's the problem I'm talking about.

Who said so?

All I said is that you should only do one thing at a time. Do not attempt to achieve two results with one release.

> No, I just wanted to share my opinion that I disagree with the overly generalized suggestions you're making.

You are very welcome to your own opinion about my 'overly generalized suggestions' it's just that they are a lot more than suggestions, they are things that I (and others, see this thread for evidence) have used countless times and that simply work.

All you do is a bunch of naysaying without offering up anything concrete as an alternative that would work better or evidence that anything posted would not work in practice. It does and it pays my bills.

alexwebb2 · on May 30, 2017

> > What do you mean by 'fast'?

> deploy within a few days and incremental changeover to the new situation starting within two weeks or so

I'm going to take this as confirmation that you're working on very, very small projects. This would be an extraordinarily unrealistic timeframe for large projects, which take vastly larger quantities of time to apply the steps you've outlined - which, in turn, renders those steps useless in a competitive business context as far as large applications are concerned.

jacquesm · on May 31, 2017

No, it just means that I have crew for jobs like these that knows their stuff.

500K lines is 'small' by our standards and if we are not moving within two weeks that translates into one very unhappy customer. That's something a typical team of 5 to 10 people has produced in a few years.

Note that I wrote 'incremental' and 'starting'. That doesn't mean the job is finished at that point in time. But we should have a very solid grasp of the situation, which parts are bleeding the hardest and what needs to be done to begin to plug those holes. That the whole thing in the end can become a multi-year project is obvious, we're not miracle workers, merely hard workers.

In a way the size of the codebase is not even relevant. What is most important is that you get the whole team and the management aligned behind a single purpose and then to follow through on that. Those first couple of weeks are crucial, they are tremendously hard work even for a seasoned team that has worked together on jobs like these several multiple times.

The one case I wrote about here was roughly that size (so small by my standards), within 30 days the situation was under control. We're now two years later and they are still working on the project but what was done in that short period is the foundation they are still using today.

If a project is much larger than that then obviously it will take more time. Just the discovery process can take a few weeks to months, but in that case I would recommend to split the project up into several smaller ones that can be operated on independently with 'frozen interfaces' where-ever they can be found.

That way you can parallelize a good part of the effort without stepping on each others toes all the time.

The problem is not that you can't tackle big IT projects well. The problem is that big IT projects translate into big budgets and that in turn attracts all kinds of overhead that does not contribute to the end result.

If you strip away that overhead you can do a lot with a (relatively) small crew.

If you're going to tackle a code base in excess of something 10 M loc in this way you will again run into all kinds of roadblocks. For those situations it would likely pay off to spend a few months on the plan of attack alone.

If a project that large came my way I would refuse, it would tie us down for way too long.

But that's out of scope for the article afaic, we're talking about medium to large project, say 50 manyears worth of original work that has become unmaintainable for some reason or other (mass walk-out, technical debt out of control or something to that effect).

If those are 'very very small projects' by your standards than so be it.

alexwebb2 · on May 31, 2017

> 50 manyears worth of original work that has become unmaintainable for some reason or other (mass walk-out, technical debt out of control or something to that effect)

That's the scale I'm talking about, so at least we're on the same page there.

It sounds to me like your specialty routinely puts you in situations where the client has reached the end of the line and is in Hail Mary Mode, where they're amenable to having a consultant do Whatever It Takes to turn things around. To me, that sounds like just about the best case scenario for addressing the issues with legacy software, and pretty far removed from the Usual Case.

In my mind, the Usual Case is legacy software that's in obvious decline but still has significant utility, and for which there is still a significant portion of the market that can be attracted with added features. That's the long tail for a huge swath of the industry. In those cases, it's unthinkable to halt development for _any_ significant stretch of time. It's dog eat dog out here, and when your competitors aren't pausing for breath, you can't either - it's just a totally different world, and I think you're inappropriately pushing the wisdom from your own corner of it out into spaces where it's just not applicable.

In a similar vein, I think your opinions on rewrites are a bit skewed by the fact that the _only_ ones you encounter in your specialty are ones that have failed miserably (or at the very least, they're seriously overrepresented).

You clearly have a very solid and proven game plan for the constraints you're used to, but I think many of the extrapolations aren't valid.

jacquesm · on May 31, 2017

I'd be more than happy to believe you if the comments in this thread weren't for the most part confirming my experience. On the other hand I'm more than willing to believe that there are plenty of places where none of this applies (though, I haven't seen them) and where with some slight variation you could get a lot of mileage out of these methods.

Because if the only extra constraint would be 'you can't halt development' then that's easy enough: simply iterate on smaller pieces and slip in the occasional roadmap item to grease the wheels. But that does assume that development had not yet ground to a halt in the first place.

The biggest difference between your experience and my experience I think is that our little band of friends is external, so we get to negotiate up front about what the constraints are and if we put two scenarios on the table, one of which is ~70% cheaper because we temporarily halt development completely then that is the most likely option for the customer to take.

hacker_9 · on May 30, 2017

But.. all these things do work in the real world. Keeping refactorings to a separate commit is important as when you break something, you can just roll it back. Countless times I've seen new people come in and try refactoring things to their liking as they work on some other feature. Then when it later breaks some component they didn't know about, we have to unpick their commit and manually separate out the crap 'refactoring' they did and the actual feature they were working on.

End to end tests are great to test the system actually works as expected as per the requirements spec. You should know how to write this, else.. how are you even testing your feature to begin with after you've written it?

And big rewrites always take longer than people think, which can sink a business if they're not careful with their resources and don't manage their time appropriately. All in all, these points you've mentioned all seem actually very reasonable to me.

pc86 · on May 30, 2017

A "big-bang rewrite" means rewriting the entire app from scratch. So if you're rewriting pieces as modules you are by definition not doing a big-bang rewrite.

anilshanbhag · on May 30, 2017

While the advice sounds cool, doing one feature at a time is often impractical. I recently moved an old app with spaghetti Jquery to Vue. The paradigm is completely different. What ended up happening is I had a base that worked, moved a set of features at a time from the old to the new. This is more like rebuilding the git history from a new base compared to doing incremental one change at a time that the article advocates.

thehardsphere · on May 30, 2017

>3) "A big-bang rewrite is the kind of project that is pretty much guaranteed to fail.

>No, it's not. Especially if you're rewriting parts of it at a time as separate modules

I guess it depends on what he considers to be a "big bang rewrite." I don't think any of the incremental approach you mention counts as one.