It's not always easy to find the right words to describe why a feature pause to refactor is needed. This quote was a succinct statement that avoids analogy:
"Postponing a small cleanup can transform it into a big cleanup because, over time, code builds up around the problem, and it too must be refactored."
I recently needed to make a change in how something common is done across several services. I made the change in one place (and have some passing unit tests), but then I realized that I’d have to duplicate more code than I’d like. So I’ll take a couple days to pull all related functionality out into a shared library.
This refactoring could’ve been done ages ago — there’s already some messy and duplicate code there — but now’s the perfect time to do it because 1) I’m already touching and destabilizing that area of the product, and 2) the clearer, de-duplicated code will be easier to spot check.
Makes sense. I guess for any given refactoring, the advantage of doing it sooner is that you get residual recurring benefits sooner. And the advantage of doing it later is you have more information about the requirements of your application and the "correct" way to do the refactor (or even whether that code is going to be a long-lasting part of the app, cf. the Lindy effect). So whenever you touch any given piece of code which could use a refactor, you could try to figure out which advantage is more important right now.
Of course, if the total effort of doing the refactor plus building the feature on top of the refactor is less than building the feature on top of the debt, you should always do the refactor first. That's pretty much a no-brainer.
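A rough sketch of that rule, assuming you can put even ballpark numbers on the three efforts. The function name and the day counts are made up for illustration:

```python
def refactor_first(refactor_cost: float,
                   feature_cost_after_refactor: float,
                   feature_cost_on_debt: float) -> bool:
    """Return True when refactoring first is cheaper for this one feature.

    All three inputs are hypothetical estimates in the same unit (e.g. days).
    """
    return refactor_cost + feature_cost_after_refactor < feature_cost_on_debt


# Illustrative numbers only: 2 days to refactor, then 3 days for the feature,
# versus 6 days to build the feature on top of the existing debt.
print(refactor_first(2, 3, 6))  # True: 5 days total beats 6
print(refactor_first(4, 3, 6))  # False: 7 days total does not beat 6
```

This only counts the feature in front of you. Any recurring benefit from the cleaner code comes on top, so the comparison is conservative.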
Another idea I've thought about is informally keeping track of possible refactors, maybe giving them an informal prioritization, and taking a glance at your list before each task. Maybe there's some value in letting a refactor stew in your brain a little bit before actually putting it into action. And continually maintaining a rough refactoring plan could help prevent your codebase from entering a state where it's actually unrecoverable.
Interesting, I think this is part of what the article is alluding to: what happens with tech debt has more to do with the team’s process than with the tech debt itself. Some teams are stuck in a process that makes them unable to tackle the problems even if they want to.
Of course you could just say “change the process”, but it’s also obvious that there are many other factors and pressures that are non-trivial.
Also, as you alluded to, if you're changing related code which will require manual testing, then that decreases testing costs, if they're shared across the refactor and the new code.
Yeah I’ve got about five ideas for refactors that will make my life noticeably better. I’ve been pondering some of them for more than a year, slowly improving the design, but I’m still waiting for the right time when the ROI is higher.
This isn't refactoring - it's just hygienic coding.
"Refactoring as you go" sets up an adversarial relationship between dealing with tech debt and the feature work that the time and effort was actually assigned to. IMHE, human nature and engineering politics then mean that tech debt doesn't get addressed at all until it's already too late.
Approaching with this intention also takes refactoring entirely off the roadmap - from the eyes of product management, refactoring becomes invisible and "free", and no longer requires budgeting for. "Why do you need time to handle tech debt? I thought we were refactoring as we go?"
By all means, it's good practice to leave the code better than you find it, but don't just leave tech debt that isn't trivial, or that is too large to address immediately, to fester.
Instead, take notice of inefficiencies and cruft while working, and write tickets for it, so it can be seen and brought up for discussion, prioritized against other tasks. Perhaps even leave a comment on the ticket whenever you run up against the same issue, so there's a paper trail of how serious the issue is.
This way, at least management has visibility into the state of the codebase and can't act surprised when tech debt brings productivity to a grinding halt.
A lesser issue, but still important IMHE, is that actually trying to "refactor as you go" makes PRs inscrutable. What's refactoring? What's feature work? A little bit of noise here and there is unavoidable, but any refactoring substantial enough to increase development velocity is likely to make meaningful code review very difficult.
Agreed that mixing a refactor with the incremental feature add makes PRs confusing.
"and write tickets for it, so it can be seen and brought up for discussion, prioritized against other tasks"
I have been doing this for almost a decade now and I have at no point ever seen even a top-ticket tech debt item get prioritized. Ever. It's always too much effort for little payoff (in the eyes of the business).
I explicitly tell juniors that when I cut this kind of ticket, I do it because it is right, not because I think it will ever get worked unless I myself (or someone similarly insane) directly disobey marching orders to work on it.
And even when someone does disobey orders and refactor, it's often a pet peeve that isn't one of the more major ones. So that's fine but nobody ever one-man-armies against the real problems.
And to illustrate what I mean by "the real problems", let me explain. I like to talk about tech debt as high interest or low interest.
Low interest tech debt is the shitty Python script I wrote the other day. It took 1-2 hours of manual monkey work that we'd have to do in Presto SQL, and that only 1-2 people even knew how to do, and made it so literally anyone at the company can do it perfectly right in a minute or two. It's a shitty script, but it knows it's shitty, and tries to have no pretenses about it and make it as simple as possible to debug and make incremental changes to it. So yes, if I spent more than an hour or two on it, I would not have done it like this at all. And yeah, it'll probably be a little annoying to change the next time someone has to do it. But it saves a lot of time, didn't cost much, and it's not going to change how we design the system at all.
That last bit hints at what I mean by "high interest tech debt". High interest tech debt is how it takes 8 hours of babysitting to deploy a schema change. It causes blip outages in every region as you deploy it. It's so painful it perverts our schema design to minimize how often we have to migrate the database. And it's all because, shortly after the Big Bang, right in between photons existing and the first hydrogen nucleus existing, someone had a cool idea for a simple trick about how they could eat their cake and have it too with how our databases work. And we have paid for it ever since. And they built on top of the primitives this offers so deeply that we'd have to change a ton of shit to make it work. The best time to fix this was 5 years ago, the second best time is now, and I guarantee you we will never ever do it. We'll replace the whole thing with something else because it's shiny and will be worse for years before we actually just pay down this tech debt.
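To make the interest metaphor concrete, here is a toy model. Everything in it is an assumption for illustration: the `Debt` class, its field names, and all the numbers are invented, not measurements from the systems described above.

```python
from dataclasses import dataclass


@dataclass
class Debt:
    name: str
    payoff_cost: float       # one-time effort to fix it properly (hours)
    interest_per_use: float  # extra effort paid every time you hit it (hours)

    def cost_of_carrying(self, uses: int) -> float:
        """Total extra effort if the debt is never paid down."""
        return self.interest_per_use * uses

    def break_even_uses(self) -> float:
        """Number of uses after which fixing would have been cheaper."""
        if self.interest_per_use == 0:
            return float("inf")
        return self.payoff_cost / self.interest_per_use


# Made-up numbers: a throwaway script that is mildly annoying a few times
# a year, versus a deploy process that costs a working day every time.
script = Debt("quick script", payoff_cost=8, interest_per_use=0.25)
deploys = Debt("schema deploys", payoff_cost=400, interest_per_use=8)

print(script.cost_of_carrying(4))    # 1.0 hour over four uses
print(deploys.cost_of_carrying(50))  # 400 hours over fifty deploys
```

The model only counts direct hours. It says nothing about second-order costs like outages or a schema design bent around avoiding migrations, which is arguably where the real interest is.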
Seems like maybe shops have a tendency to get stuck in one extreme or another, either shoving crap out the door or else excessive concern with prettifying minutia.
I’ve literally never seen the latter: I think companies that do that just disappear. The former though is a great driver: let your debt pile up sky high, because it doesn’t matter as long as it lets you make more money in the short term vs dealing with it.
You may say it’s short term thinking, but if you’re optimising for investment, that’s the right thinking. Make money now; you can always move it elsewhere that makes better use of it.
It’s not what I personally agree with, just my observations so far.
With regard to "excessive concern with prettifying minutia", I'm thinking of e.g. code reviews where people spend significant attention going back and forth on minor stylistic issues.
Another example: I remember a coworker reviewing code I wrote many years ago. I came up with a solution that was reasonably simple and workable, but wasn't the "right" way to do it. My coworker complained, I pointed out various reasons the "right" way wasn't practical, and they said something like "yeah but it has to be right" (without suggesting any concrete plan). Very frustrating -- I don't think it was particularly important code.
Working on my own, I already have a sense of how much effort I want to spend on code quality. When you add code review on top of that, it can feel a little excessive, depending on the importance of the application.
I actually have a lot more memories of code review frustration than code review gratitude. At my next job I would like to experiment more with code "previews" or design reviews -- that seems more efficient than rewriting code which already works.
I sometimes catch myself discussing “correctness” in PRs because it’s important to know what tradeoffs you’re making, and whether a neater/simpler/better solution exists. So yea, sometimes my suggestions can be misinterpreted as blockers.
I have since learned that giving actionable feedback is the trick to a smooth review process, as well as trusting the authors (in a way). Their implementation may not be the way I would have done it, but I have to ask myself “will it still work?” as a basis for a pass.
But I digress. I can see what you mean in your original reply, and agree that the weird “arranging deck chairs on a sinking ship” thing does indeed happen.
"Always leave the code you're editing a little better than you found it"
- Robert C. Martin (Uncle Bob)
There's no point in refactoring the whole thing. Maybe add a longer comment explaining the logic you had to decipher when you encountered the code. Rename a few variables from foo, bar and baz to something more descriptive, etc.
Agreed. The architecture mismatch paper [1] identifies common assumptions that software can make, such as "I own the main thread of control and other modules will do my bidding", that tend to be baked-in from the start.
> Deep architectural/design flaws in a codebase can't always be addressed using a series of small independent changes.
Not sure that's true. At the extreme end, you introduce a replacement with a better architecture and run it side by side with the old one, incrementally switching over dependents. Of course, that may take more overall work, but maybe the incrementality is sometimes worth it.
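A minimal sketch of what that side-by-side switchover could look like. The `Switchover` class, the caller names, and the two `*_total` functions are all hypothetical stand-ins, not anyone's real system:

```python
from typing import Callable, Set


def old_total(prices: list) -> float:
    """Stand-in for the legacy implementation (hypothetical)."""
    total = 0.0
    for p in prices:
        total += p
    return total


def new_total(prices: list) -> float:
    """Stand-in for the replacement with the better design (hypothetical)."""
    return float(sum(prices))


class Switchover:
    """Routes each caller to the old or the new implementation.

    Callers are migrated one at a time by adding them to `migrated`,
    so old and new run side by side until every caller has moved.
    """

    def __init__(self, old: Callable, new: Callable) -> None:
        self.old = old
        self.new = new
        self.migrated: Set[str] = set()

    def migrate(self, caller: str) -> None:
        self.migrated.add(caller)

    def call(self, caller: str, *args):
        impl = self.new if caller in self.migrated else self.old
        return impl(*args)


totals = Switchover(old_total, new_total)
totals.migrate("billing")                   # billing now uses the new code
print(totals.call("billing", [1.5, 2.5]))   # 4.0, via new_total
print(totals.call("reports", [1.5, 2.5]))   # 4.0, still via old_total
```

Once every dependent is in `migrated`, the old implementation and the router itself can be deleted. The extra work is the router plus keeping both versions alive in the meantime.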
In my experience you need to do ‘the big refactor’ when a codebase has to handle something big that was not part of the original design, such that you find yourself having to break the simplicity, elegance, completeness, coherence, etc. of the existing system by tacking on some lopsided or alternate route or structure. What you really want in place of doing the ‘tack-on’ is a new simple, elegant, coherent, etc. system that can handle both the old requirements and the new. In other words, you want to do the big refactor when you have new requirements that really should have been known at the time the system was designed, such that you would have done things differently to accommodate them along with all of the requirements that _were_ known at the time. This is easier to do the more monolithic and strongly-typed the application is.
Naturally then, you do want to know as many requirements up front as possible, which is the basic point of the article. Even though it’s not always possible, it’s still the best path to try.
All of this ‘screw design, let’s just roll up our sleeves and start coding’ is a great way to end up with spaghetti code and technical debt.
The key is to do as much requirements gathering as you can up front, because your initial design will address only the requirements you know about, and the initial design constrains future updates.
"I'm sorry, I didn't understand much of that. Are you saying you can commit to finishing the feature on a shorter timeframe than you originally asked for? Would it help if we forgo writing tests?"
"I'm confused. When I greenfielded this app, several thousand commits ago, I took half the time you've already spent on this feature. You said you were a senior engineer!!"
I’m lucky that I don’t have a boss like this, but I’ve asked myself this question recently. I’ve been with the same company for almost eight years working on the same code base, which I and another dev greenfielded. I know it very, very well.
I’m often dismayed at how long things seem to take these days compared to when we first started out. Am I getting slower? Lazier? So far, I’ve identified the following factors:
- We have more customers, and those customers are much more demanding (used to be b2c, now we’re b2b). The cost of making a mistake is much higher.
- It’s just a lot of code. Parts of it are fairly complex, as much as I try to keep it simple. When a core component is changed, multiple services might need refactoring.
- We have many more features, and they sometimes interact in surprising ways. I’ve been around longer than our product people, so we often have to spend time iterating when they come up with a design that doesn’t fit with what’s already there.
- It’s important to refactor your database from time to time as you learn more about the domain and find simpler ways to do things. But refactoring a database is terrifying. I spend a lot of time triple-checking my work.
Working on such a large code base for years is super satisfying though. I’ve learned so much about system design, just from noticing how easy or hard stuff is to maintain.
I wish I had more than one upvote to give you for this post.
> I’ve asked myself this question recently.
I suspect both are true. There are real complexities that have grown around you and working in the same way on the same stuff for so long has caused you to habituate to a few inefficiencies. I suggest shaking up your world view a little and seeing what falls out. There are probably a few big gains you could make.
I would approach it, at least initially, as a mental exercise. What are your assumptions about the role, the code, the product? How could you (in)validate those? What would it look like to take each thing you think you know and invert them one at a time? What if things that you think are bad are actually good? What if things you think are fast could be twice as fast? Etc.
The first time, I lacked the self-confidence to speak up in the face of dominating people. So I internalized their behavior as "proof" of my incompetence - in the face of the patently obvious facts to the contrary, in the face of my own experience and judgement. Naturally, this made things much worse - I consider myself "part of the problem" in that case, though not the largest part by any means. Anyway, I was fired a few months later.
The second time, it was a breaking point - the CEO, who said those things, was incorrigible, and the situation was unworkable. I called my boss in the morning (the fucker CEO had been yelling at me at 11 at night) and gave notice. He quit too - exhausted of losing engineers and being party to the abuse. Within the month, they lost most of their engineering team, and the few who stayed had received promotions and substantial pay increases to incentivize sticking around. They also "saw the light" and halted feature work for several months while (I assume) they worked on fixing their tech debt problem.
I've done really substantial work on myself since then, and I feel like I'm in a much better place to appropriately execute the soft skills required by my position. So I'd like to think that if (or rather, when :) ) the first situation occurs again, I will be self-assured enough to push back in an effective non-confrontational way, or at least speak my mind instead of being silenced by the unreasonable shame of an inappropriate dressing down. I would find ways of halting the narrative every time bullshit was spoken, and address the "inaccuracy" instead of behaving in a way that that manager took as confirming his suspicions that I was the problem.
And, in the second case, I'd have quit way, way earlier, when I saw all the previous red flags.
Mostly though, I'm not going to work for hotheaded, first-time founder-engineers so recently graduated from college, so bereft of the experience required to lead an engineering team. :)
I’ve come to realise that there’s no valid business case for dealing with tech debt early, nor for adding tests to an existing project (bar some special circumstances: legacy change, critical outage, etc).
It’s like a lot of things have to be aligned for “good” development practices to reap benefits. Most shops are much less organised, and a bit of chaos and early/quick iterative shipping will always yield better results.
Having said that, my core belief is still that if you take care and do things properly you’ll go fast in places where all other companies get bogged down.
I'd argue that there's definitely a business case for frequent and early refactoring - it just needs to be refactoring worth doing in the first place. IMHE most code gets touched rarely, and some parts get touched all the time, so refactoring needs to be strategic if it's going to have any utility.
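One cheap way to find the parts that "get touched all the time" is to count how often each file shows up in the commit history. A sketch, with a made-up history standing in for whatever your version control log would give you (the function name and file paths are hypothetical):

```python
from collections import Counter


def touch_counts(commits):
    """Count how many commits touched each file.

    `commits` is an iterable of lists of file paths, one list per commit.
    """
    counts = Counter()
    for files in commits:
        counts.update(set(files))  # count a file at most once per commit
    return counts


# Hypothetical history, one inner list per commit.
history = [
    ["billing/invoice.py", "billing/tax.py"],
    ["billing/invoice.py"],
    ["reports/export.py", "billing/invoice.py"],
    ["billing/tax.py"],
]

for path, n in touch_counts(history).most_common(2):
    print(path, n)
# billing/invoice.py 3
# billing/tax.py 2
```

Touch count alone doesn't say the code is bad, only that pain there would be paid often. It's a starting point for deciding where a refactor might actually earn its keep.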
This also argues for ongoing refactoring - addressing pain points as they arise, while they are fresh in people's minds, rather than suffering through them until Stockholm Syndrome sets in, people can no longer see the forest for the trees, and much of the velocity refactoring would provide is lost because the subsequent work is already done.
But the biggest issue I've seen with not handling tech debt on an ongoing basis is that there's never a good time to start. So if it's not built into the development cadence, then resistance builds for doing it at all. Product starts pointing fingers at "slow" engineering, who point back at the "breakneck" demand for feature work, and negotiations start for unrealistic (for both sides) halts in feature work - neither sufficient to resolve the problems nor short enough to avoid hurting the business.
Then product breathes a sigh of relief - the tech debt is "resolved" and will never need to be addressed again, engineering returns to wading through a codebase that is only marginally less swampy than it was before the cursory refactoring sprint, and the downward spiral (and finger pointing) resumes.
At least that's how it always seems to happen around me :)
As far as tests go, IMHO refactoring without them (with BDD style tests being greatly preferable) is fraught with peril. But unless the team is bought in on BDD tests and using them to guide development, I agree they are a time sink to write early. However, writing them later (and around code that might have been touched by multiple hands) rather than maintaining them is its own flavor of pain - like the refactoring, it's harder to be sure they won't miss things and you'll break something.
I think it goes back to experience. If everyone involved, both developers and managers, has suffered at least once, they’ll be keen to spot issues before they cause pain.
So my current reasoning is that this sort of thing can’t be taught or explained /shrug.
I’m currently on a quest to teach/explain this, so depending on how that goes I may change my mind, but I’m not holding my breath.