Reading this I realised I've kind of drifted away from the idea of refactoring for the point of it.
The example with the for-loop vs. map/filter in particular - it's such a micro-function that whichever the original author chose is probably fine. (And I would be suspicious of a developer who claimed that one is 'objectively' better than the other in a codebase that doesn't have an established style one way or the other).
Refactor when you need to: when adding new features, or when doing so lets you reuse other work, and try to make minimal changes while you're at it. Otherwise it seems more like a matter of your taste at the time.
There's a limit of course, but it's usually when it's extremely obvious - e.g. looong functions and functions with too many parameters are obvious candidates. Even then I'd say only touch it when you're adding new features - it should have been caught in code review in the first place, and proactively refactoring seems like a potential waste of time if the code isn't touched again.
The (over) consolidation of duplicated code example was probably the most appealing refactor for me.
I agree, which is better - for or map - depends on context. map typically is functional and allocates memory, for does not. But for is more likely to have side-effects. Which trade-offs matter depends on the larger context of the surrounding code.
>Remember, consistency in your codebase is key. If you need to introduce a new pattern, consider refactoring the entire codebase to use this new pattern, rather than creating one-off inconsistencies.
It's often not practical (or even allowed by management, due to "time constraints") to refactor a pattern out of an entire codebase if it's large enough. New patterns can be applied to new features with large scopes. This works especially well for old code that's almost never changed.
The key word is consider. If you wouldn't apply the pattern to the whole codebase, maybe you don't actually want to introduce it in just this one new place.
It's not that it wouldn't be applied to the whole codebase, it's that it wouldn't be applied to the whole codebase __at once__. You have to start somewhere and new features are a good place to start new patterns. Older code can be refactored piece by piece.
In the majority of places I've worked, nobody who started refactoring older code piece by piece ever finished it. The exception is people who documented the scope of the work, got leadership buy-in, and then worked on it continuously like any other project.
The problem is that sometimes the new pattern gets overridden by an even newer pattern, and so on, until you've got three different implementations from 2016, 2019, 2021, and then you find that in 2024 you're working on implementation number four and all the people who did the first three have left the company without writing any documentation or finishing their work.
In a sufficiently large codebase, this is just simply inevitable and you just have to accept it as a fact of life. If you have millions of lines of hand-written code, you're going to have archeological layers and some pockets that are more modern than others. It's not great but "everything is locked into a pattern established in 2003 and you can't innovate" is a worse problem.
This is why I always ask "Is this going to leave our code in a better or worse position if we abandon this halfway through?" If the answer is "worse" then don't start it. Not unless you can get the entire thing done in about a week. If it takes a quarter, the probability of reprioritization is way too high.
I've had success with a few strategies for introducing abstractions/patterns at my current place (doing this alone at an enterprise SaaS company with 200-ish devs). It's weird that we don't teach these or talk about them in software engineering (AFAIK). I see them being re-invented all the time.
To borrow from medicine: the first step is always to stop the hemorrhage, then clean the wound, and then protect the wound (or wounds).
- Add a deprecation marker. In Python this can be a decorator, a context manager, or even a magic comment string. I ideally try to do this while first introducing the pattern; it makes searching easier next time.
- Create a linter, with an escape hatch. If you can statically analyse or type-hint your way there, great! In Python I will write AST-based, semgrep, or custom checks to catch these, but provide a magic string (similar to `noqa`) to ignore existing code. Then there's a way to track and fix the offending places, and you can make a metric out of it.
- Everything in the system has to have an owner (person, squad, team, or dept). Endpoints have owners, async tasks have owners, Kafka consumers might have owners, test cases might have owners. So if anything fails, you can make it visible in the corresponding SLO dashboards.
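The deprecation marker from the first bullet can be tiny. A sketch in Python (the decorator name, the message format, and the `fetch_users` example are all assumptions, not any standard API):

```python
import functools
import warnings

# Hypothetical deprecation marker: one possible shape for the decorator
# idea described above.
def deprecated(reason: str):
    """Mark a callable as deprecated; emits a DeprecationWarning on use."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not at us
            )
            return func(*args, **kwargs)
        return inner
    return wrap

# Hypothetical legacy function, marked while introducing its replacement.
@deprecated("use fetch_users_v2 instead")
def fetch_users():
    return []
```

Because the marker is a single grep-able name, searching for `@deprecated` finds every remaining legacy definition on the next sweep.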
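The escape-hatch linter from the second bullet can likewise start out very small. A sketch (the deprecated name and the `# allow-legacy` marker are made up for illustration; a real check would use `ast` or semgrep as described):

```python
# Hypothetical mini-linter: flag every use of a deprecated name unless the
# line opts out with a magic comment (here "# allow-legacy", by analogy
# with "# noqa"). A plain substring scan keeps the mechanism visible.
DEPRECATED_NAME = "fetch_users("   # assumed legacy API, for illustration
ESCAPE_HATCH = "# allow-legacy"

def find_offenders(source: str) -> list[int]:
    """Return line numbers that still use the deprecated name."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if DEPRECATED_NAME in line and ESCAPE_HATCH not in line
    ]
```

The length of the returned list doubles as the adoption metric mentioned above: graph it and drive it to zero.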
The other alternative to this last step is, if possible, to have a platform squad take over and do this as a zero-cost refactor for the product squads. Of course the product squads have to help test/approve, etc. It's an easier way to get people to adopt a pattern if you do it for them. But the ROI on the pattern has to be there, and the platform squad does get stuck doing thankless cruft work sometimes. If you do this judiciously, the win might be thanks enough: more robust systems, better observability/traces, less flaky tests, and so on.
Tests might cover more code than a single unit and span code owned by different teams, and thus end up with multiple owners. Prefer squads as the owners rather than individuals.
But just like documentation, the ownership might go stale and out of sync. So the idea would be to let some reds in the SLO dashboard correct it over time. It's not always possible to automatically link tests to the code they cover.
End-to-end tests might get tricky. But unit tests should be owned by the person/team/squad that owns the unit.
And unit tests should never break / be red. If the code needs to change, the tests need to be changed at the same time.
End-to-end tests can be flaky. Those probably shouldn't prevent deployments and can be red for awhile. Should probably manually confirm if the test is acting up, there's a change in behavior, or if something is legitimately broken before ignoring them though.
Consistency is good but isn't everything. If you have a very large code base and insist on perfect consistency then no changes can practically be made and all code maintains a "day one" style forever.
To me it’s less about “consistency” as some nebulous, subjective thing. If you want to set the new standard for $thing, whole-ass it and set the new standard for $thing. I fully support this, but within reason of course. The point at which I duck out of this is when numerous replacement operations require significant non-trivial changes to highly depended on and/or untested code. Otherwise if it’s a simple task that a good IDE and a couple hours of hard work can solve… just do it!
> If you need to introduce a new pattern, consider refactoring the entire codebase to use this new pattern, rather than creating one-off inconsistencies.
Putting aside the mis-application of "pattern" (which _should_ be used with respect to a specific design problem, per the Gang of Four), this suggestion to "refactor the entire codebase" is impractical and calcifying.
Consistency increases legibility, but only to a certain point. If the problems that your software is trying to solve drift (as they always do with successful software), the solutions that your software employs must also shift accordingly. You can do this gradually, experimenting with possible new solutions and implementations and patterns, as you get a feel for the new problems you are solving, or you can insist on "consistency" and then find yourself having to perform a Big Rewrite under unrealistic pressure.
> Putting aside the mis-application of "pattern" (which _should_ be used with respect to a specific design problem, per the Gang of Four)
This is not in any way a mis-application of the word "pattern". There is no exhaustive list of all design patterns. A design pattern is any pattern that is used throughout a codebase in order to leverage an existing concept rather than invent a new one each time. The pattern need not exist outside the codebase.
> Consistency increases legibility, but only to a certain point.
It's the opposite: inconsistency decreases legibility, and there is no limit. More inconsistency is always worse, but it may be traded off in small amounts for other benefits.
Take your example of experimenting with new solutions: in this case you are introducing inconsistency in exchange for learning whether the new solution is an improvement. However, once you have learned that, the inconsistency is simply debt. A decision should be made to either apply the solution everywhere, or roll back to the original solution. This is precisely the point the author is making by saying "consider refactoring the entire codebase to use this new pattern".
This refactoring or removal doesn't need to happen overnight, but it needs to happen before too many more experiments are committed to. Instead what often happens is that this debt is simply never paid, and the codebase fills with a history of failed experiments, and the entire thing becomes an unworkable mess.
For consistency, the same design problems should have the same design solutions, a.k.a. a pattern. If you don't value consistency, feel free to take a different approach every time; that will confuse your users. I used to work with a guy: the simple problem of reading a CSV was done using a library, problem sorted. Out of sheer excitement he then rewrote it with combinator parsers, and then as some astronaut-architect functional monstrosity, so complex no one else could comprehend it.
This is how not to do it: for the same problem, use the same solution. I admit that's an extreme case, but it's also a real one, and it illustrates the issue well.
(also patterns are not specific to OO, nor is OO incompatible with a functional style)
Are you replying to the right person? I'm saying that "pattern" need not be restricted to the narrow sense used in the famous "Gang of Four" book[0], not that patterns are inherently bad!
It is not just about legibility. Developers are trying to repeat patterns they see around. If you do not refactor previous places, they will reproduce the outdated pattern. Plus, it makes code reviews frustrating.
Refactoring isn't an end on its own, and it shouldn't ever be considered a standalone project.
The easiest way to accomplish a real goal like fix a bug, or add a feature, may very well be to first refactor the code. And yes maybe you want to merge that in as its own commit because of the risk of conflicts over time. But just having the code look nice (to who exactly?) isn't valuable, and it encourages the addition of useless abstractions. It may even make later real-work harder. Never refactor outside the context of real-work.
The cartoon with the PM also gets at a ridiculous pattern: engineers negotiating when to do various parts of their job with non-technical people who have no idea how do any part of their job. The PM doesn't know what a refactor is, the EM probably doesn't either. It doesn't make the organization function any better to tell these people about something they don't understand, and then ask them when it should be done. Budget it as part of the estimate for real-work.
Good refactoring respects the idioms of the language and the culture of the organisation. Change to a new methodology is thoughtful and probably slow, except when a revolution happens, but even then it's still respectful to the new culture.
Bad refactoring is elitist, commented "you won't understand this", and the owner walks away with nobody left behind who understands it.
That the examples deprecated FP and preferred an idiom natural to Java(script) only speaks to the principle. I can imagine a quant shop in a bank refactoring to pure Haskell, out of something else, and being entirely happy that it's FP-respecting.
So the surface reading, "FP patterns are bad", is a bit light-on. The point was that nobody else in that specific group could really be expected to maintain them unless they were part of the culture.
"If you unroll loops à la Duff's device, you should explain why you're doing it" would be another example.
The dig against FP is weird since the "good refactor" also uses FP, just a built in one in JS. Which I agree is better, but mostly by being built in and idiomatic, it's still exactly as functional.
"I don't like this style of FP coding" ok: if you run the group and own the codebase you can enforce that. So good refactoring is style guide enforcement.
I think I over-read his dislike of FP. Really the complaint is "why did you introduce a new dependency", which I am totally fine with as a complaint. That's not cool.
Many of his examples kind of bury the lede. If he had tried to write an abstract up front, I think "don't code FP" wouldn't have been in it; "use the methods in the language, like .filter and .map" might be.
On point. The article even notes that the good refactor uses filter and map, but the bad refactor also uses filter and map. It's the exact same change of programming paradigm.
Given the text, I would have expected some minor refactor with range-based for loops (are these a thing? My JS is rusty). Where you get the advantage of map (no off-by-one indexing errors) without changing the programming paradigm.
>Good refactoring respects the idioms of the language and the culture of the organisation. Change to new methodology is thoughtful and probably slow, except when a revolution happens but then, its still respectful to the new culture.
The first example complained about the refactor appealing to functional thinkers (implying that it would be difficult to grok by the existing devs), but then the “improved” version is virtually the same save for the (unnecessary?) use of Ramda in the first.
And while many devs are resistant to try functional ways, this first example reads so much better than the original code that I find it impossible to believe that some prefer the imperative loop/conditional nesting approach.
(Raises hand.) I prefer the for loop. Pushing items to an array is idiomatic Javascript for creating an array. An if statement is an idiomatic way to do it conditionally. It's also easier to debug.
The map and filter methods are nice too, but they're for one-liners.
Writing assembly was the idiomatic way of programming before Fortran and human-readable languages came.
Writing with goto was the idiomatic way before Algol and structural programming came.
Having only a handful of scalar types was the idiomatic way until structural data types came (and later objects).
Writing programs as fragments of text that get glued together somehow at build time was the idiomatic way until module systems came. (C and partly C++ continue to live in 1970s though.)
Callback hell was the idiomatic way to do async until Futures / Promises and appropriate language support came.
Sometimes it's time to move on. Writing idiomatic ES5 may feel fun for some, but it may not be the best way to reach high productivity and correctness of the result.
That's because they are. Functional code is more readable. And if you look back, basically all advances in programming languages have been about "making stuff more readable". Thus, for loops (for this usage) are "old".
Nothing in that snippet is Typescript-specific, it's just plain Javascript.
All syntactically valid Javascript is also syntactically valid Typescript, it just adds stuff, though you can get runtime errors for things like reassigning variables in a way Javascript is fine with that Typescript disallows.
Readability is a characteristic of the reader, not what is being read.
This simple truth seems to be so hard for many people to internalize. My theory as to why is that most programmers never get exposed to drastically different and unfamiliar languages and styles of programming. If they were forced to confront and internalize 2-3 different ways of writing code, they would realize this truth.
Personally, I once thought Lisp was unreadable... until I learned it. I once thought BASH was unreadable... until I learned it. Same with half a dozen other languages. Same for styles. "Readability" is just a familiarity and proficiency of the reader.
Both sets of code were fine, but I understood the loop variant instantly, while it took me a bit longer with the FP code.
As a side note: The only real JS I did was optimising some performance critical code, and I did have to refactor a number of FP chains back to loops. This was because the FP way keeps constructing a new list for each step which was slow.
I also prefer the loop, I would negate the condition and use continue, but otherwise leave it unchanged. I don't have a problem with the functional version but it doesn't scan as well to me.
Aesthetics aside, I am under the impression that people start programming, by and large, with the imperative for/if style, so the imperative style is readable by more people. Even for experienced programmers, reading imperative code probably costs less energy, since it is more internalised?
Furthermore, in JS, the functional style is less performant (nearly twice as slow on my machine; I assume because it does more unnecessary memory allocations).
So: same functionality, readable by more people, more performant? The imperative example seems like the better code.
> Even for experienced programmers, reading imperative code probably costs less energy, since it is more internalised?
I disagree. For-loops are usually more difficult to reason about, because they're more general and powerful. If I see "for (...", I only know that the subsequent code will iterate; the actual meaning has to be inferred from the body.
Meanwhile, a .map() or .filter() already gives me hints: the lambda will transform the values (map) or filter them (filter). These hints make it easier to understand the logic, because you already know what the lambda is meant to do.
Other benefits stem from idiomatic usage of these constructs. It's normal to mix different things into one for-loop: filtering, transformation, and appending to the resulting collection all sit in the same block of code. In the functional approach, the different "stages" of processing are isolated into smaller chunks which are easier to reason about.
Another thing is that immutable data structures are quite natural with functional programming and they are a major simplification when thinking about the program state. A given variable has only one immutable state (in the current execution) as opposed to being changed 1000 times over the course of the for-loop.
I, a Datapoint of 1, find the functional style to generally express the ideas of what's happening to be significantly easier to grok. Particularly if intermediate variables and conversion constructors are introduced rather than relying on full chains. E.g.:
function processUsers(users: User[]): FormattedUser[] {
  const adults = users.filter(user => user.age >= 18);
  return adults.map(user => FormattedUser.new_adult(user));
}
On the performance tip, what scale are we talking? Is it relevant to the target system? Obviously the example is synthetic, so we can't know that, but does it seem like this would have a runtime performance that is meaningful in some sort of reasonable use case?
> I am under the impression that people start programming, by and large, with the imperative for/if style, so the imperative style is readable by more people.
IMO, this is a simple consequence of technology moving faster than society. There are still instructors out there who learned to program in an environment where the go-to options for imperative programming were C and FORTRAN; the go-to options for other paradigms (if you'd even heard of other paradigms) were things like Lisp, Haskell and Smalltalk; and CPU speeds were measured in MHz on machines that you had to share with other people. Of course you're going to get more experience with imperative programming; and familiarity breeds comprehension.
But really, I believe strongly that the functional style - properly factored - is far more intuitive. The mechanics of initializing some output collection to a default state (and, perhaps, the realization that zero isn't a special case), keeping track of a position in an input collection, and repeatedly appending to an output, are just not that interesting. Sure, coming up with those steps could be a useful problem-solving exercise for brand-new programmers. But there are countless other options - and IMX, problem-solving is fiendishly hard to teach anyway. What ends up happening all the time is that you think you've taught a skill, but really the student has memorized a pattern and will slavishly attempt to apply it as much as possible going forward.
> Furthermore, in JS, the functional style is less performant (nearly twice as slow on my machine; I assume because it does more unnecessary memory allocations)
Sure. Meanwhile in Python:
$ python -m timeit "x = []" "for i in 'example sequence':" " x.append(i)"
500000 loops, best of 5: 796 nsec per loop
$ python -m timeit "x = [i for i in 'example sequence']"
500000 loops, best of 5: 529 nsec per loop
... But, of course:
$ python -m timeit "x = list('example sequence')"
2000000 loops, best of 5: 198 nsec per loop
Functional programming is more intuitive for many pure data transformation tasks.
It's not more intuitive for the entire system.
When you make a new file in a file system, you're already violating functional programming, even if you atomically create that file with all the content specified, and make it immutable.
You must construct a new file system which is like the old one, but with that file, and then have the entire system tail call into a world where you pass that new file system as a parameter, so that the old one is not known (garbage).
Unix has been the most successful system in getting partial functional programming into ordinary people's hands.
A Unix pipeline like < source-file command | command | ... | command > dest-file is functional, except for the part where dest-file is clobbered. Or at least it can be functional. The commands can take arguments that are imperative programs (e.g. awk), but the effects are contained.
In the famous duel between Doug McIlroy and Knuth in solving a problem, in which McIlroy wrote a concise Unix script combining a few tools, McIlroy's solution can be identified as functional.
One problem I have with functional programming is that I find it hard to debug.
With imperative programming you can follow a program step by step, line by line, it can be done with a debugger, pen and paper, or in your head.
With functional programming, not so much. It runs functions, but what do those functions do? You don't know; they're pieced together from someplace else in the code. And thanks to lazy evaluation, they may not even exist yet. In the design phase this is mostly a good thing, as it is flexible, and pure functions are less likely to make a mess than functions with side effects. But there will be a point where the program does not behave as it should, no matter your paradigm, and that's when it becomes a problem.
It is also a problem with object programming if you abuse abstraction, in fact, it is a general problem with abstraction, but functional programming makes it the default, whereas imperative programming is concrete by default.
As for the Python example, I am a bit surprised that the optimizer didn't catch it, all three are common and equivalent constructs that could have been replaced by the most performant implementation, presumably the third one. But well, optimizers are complicated.
I disagree. People are sometimes just forced to think imperatively when dealing with computers, but I usually think declarative when I design my programs.
Say if I want to filter a sequence of users and omit users below the age of 18, I'll construct my predicate (a "what"), and want to apply that predicate to create a new sequence of user (another "what").
I really don't want to tell a computer how to process a list every single time. I don't care about creating an index first and checking the length of my list in order to keep track that I process each user in my list sequentially, and don't forget that important "i++". All I want at that moment is to think in streams, and this stream processing can happen in parallel just as well for all I care.
But I also do think Python, Haskell etc. are the most expressive here with list comprehensions. It can't get more concise than this IMHO:
users_adult = [
    user
    for user in users
    if user.age >= 18
]
In this case, you first declare the end-goal (visit a friend and have a full gas tank), with the actual steps to achieve it being much less important and often left to be defined at a later point (e.g. which particular gas station, which particular pump, etc.). This corresponds more to functional thinking.
Imperative thinking would correspond more to "I will sit in the car, start the engine, drive on the highway, stop at address X, converse with Y, leave 2 hours later, stop at a gas station". In this case the imperative steps are the dominant pattern, while the actual intent (visit a friend) is only implicit.
So when you arrange the visit with your friend two weeks in advance, you first think about sitting in the car, driving out of the garage, getting on the highway, turning on the radio, parking the car, ringing the bell and this other myriad of actions, and the actual talking with the friend is just one of the actions, with no prominence over the others?
I certainly don't think like that. My main goal is to visit a friend. The transportation is subordinate, it's only a mean to the goal, an implementation detail which I don't care about much. I might even take a train instead of driving the car, or even ride a bike, if I feel like it and the weather is nice on the day of the visit.
Now reflecting on this, I think such focus on the process (as opposed to focus on the goal), exact imperative order, not being able to alter the plan even if the change is meaningless in relation to the goal, is a sign of autism. But I don't believe most people think like that.
No, we just think "I'll get in the car and drive to the hospital, talk to my sick friend, and then drive home by way of the petrol station". Higher-level, but definitely still procedural / imperative. (Then while we're driving we'll think "I'll turn left here" or "I'd better overtake that lorry", or whatever. But we don't need to plan all that beforehand; we do stepwise refinement on-the-fly.)
I think most people think more or less like that, and that it is not "a sign of autism".
You're the one that added all those conditions about exactness, about needing to replay every step (even this is a problem because steps are fractal), or not being able to change the plan. I can think imperatively and still do those :)
I don't know a lot about the brain, but I do know that people who research brain modeling have used models in which there are two types of things to think: declarative things AND procedural things. See SOAR and ACT-R.
The caveat here is to what extent computers have tinted their models of the brain, but they are professional cognitive researchers, so I'd give them the benefit of the doubt :)
While I think of things in a defined order, I also think in sets. If I grab a bunch of peanuts, I don't visualize grabbing every single peanut one by one, I visualize getting a bunch at the same time.
If you grab a bunch of peanuts you're probably thinking in "hand fulls", but the fact that the world and actions we take are fractal in the way we can analyze them doesn't prove one or the other.
I haven't written JS in a long time. Are engines like V8 smart enough to roll the filter and map into a single loop? Otherwise, wouldn't a reduce be more efficient there?
It's not a matter of being smart enough. Since JavaScript is compiled and optimized at runtime, the optimization also has to happen at runtime. If the code is executed once and the number of items in the array is small, it will take more time for the engine to optimize the code than to naively execute it. Most code falls into this category.
As for whether it's possible at all to combine a map and a filter into a single loop, I guess it depends on whether the first operation can have side effects that affect the second operation or the collection being iterated over. I don't know the answer, but I would be surprised if there wasn't some hard-to-detect corner case that prohibits this kind of optimization.
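For what it's worth, the fusion in question is mechanical. A sketch in Python (toy data; the point carries over to JS): chaining filter and map makes two passes and allocates an intermediate list, while the fused form is a single loop, which is also roughly what a hand-written reduce would amount to.

```python
# Toy data: ages only, for brevity.
ages = [15, 22, 34, 17, 40]

# filter().map() analogue: two passes and an intermediate list.
adults = [a for a in ages if a >= 18]
doubled_chained = [a * 2 for a in adults]

# The fused version an engine (or a human) could produce:
# one pass, one output list, no intermediate allocation.
doubled_fused = []
for a in ages:
    if a >= 18:
        doubled_fused.append(a * 2)

# Both produce [44, 68, 80].
```

The corner case hinted at above is exactly what this hides: if the map callback mutated `ages`, the two forms could diverge, which is why an engine can't fuse them blindly.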
Good refactoring should significantly reduce the size or complexity of a codebase.
These two metrics are interrelated, but as a general rule if the gzipped size of the codebase (ignoring comments) does not go down, it's probably not a good refactoring.
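The proposed metric is easy to compute. A rough sketch in Python (the comment stripping here is naive, it ignores "#" inside string literals, and it only handles "#"-style comments; a real version would tokenize per language):

```python
import gzip

# Gzipped size of source with comments stripped, as suggested above.
def gzipped_size(source: str) -> int:
    stripped = "\n".join(
        line.split("#")[0].rstrip() for line in source.splitlines()
    )
    return len(gzip.compress(stripped.encode()))
```

Consolidating duplicated code shrinks this number even though gzip already compresses repetition, while comment-only changes leave it untouched.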
I'm going to disagree and see what other people say.
I don't think that reduction of size is of any relevance. I admit my own refactors tend to make things smaller, but it's only a tendency; some definitely increase the size overall. I'm currently refactoring a code base where for each item there used to be one class. Each object was examined after creation, then a runtime flag was set: Rejected or Accepted. As the code grew I found I was wasting a lot of time around this Accepted/Rejected stuff. Now I'm refactoring so I have two classes for each item, one for when it's Accepted and one for when it's Rejected. The amount of boilerplate has definitely bulked up the code, but it will be worth it.
As for complexity, I don't know.
The only thing I refactor for is human comprehensibility. That is the final goal. What other goal can there be?
> I don't think that reduction of size is of any relevance.
More code equals more bugs. This has been a pretty consistent finding dating back decades. Simpler and smaller code bases will generally have fewer bugs.
Sure, comprehensibility is the terminal goal, but size reduction is relevant because it's an easily measured, decently predictive proxy for it. No, code-golfing 3 lines down to one doesn't help, but the bigger size differences are always of the "composition over XYZ" variety that changes the entire complexity class of how much code you need to write (like React letting people write O(states) rendering code instead of O(transitions)).
Size reduction usually follows, but not necessarily. That it is easily measured is an irrelevance. The claim that it's a decently predictive proxy I can't buy at the moment; can you try justifying it? Some very terse code can be very hard to understand without (comprehensive) commenting, e.g. https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...
Maybe you're right. I've had a drink, I'll think over it in the morning, thanks.
The "fast inverse square root" is absolutely 100% all about performance. For it to make sense to use as a counter-example, you need to show alternate code that still meets the contract (the contract being: be as fast as this code), that is longer, and clearer.
I was saying that shorter code is not necessarily more readable. But what you're talking about is optimisation, and I'm fine with that: uglier code for higher speed (or whatever). But is that refactoring? I don't think so. I may have led you a bit down the garden path here.
Imagine that you've refactored code and reduced complexity, but also reduced performance (the caching example) - would you move forward with the refactor?
From my perspective there should always be a buy-in: after refactoring, the system is more understandable, but also more coupled. Is this fine? If not, can the given refactor be merged now and the rest tackled in a separate refactor? The caching refactor can have a buy-in as well, e.g. remove caching because the given request shouldn't be cached, or because that functionality should be decoupled and done elsewhere.
Oh god the ‘object oriented’ refactor. I wish everyone who had OO thrust upon them in the early 2000s received some explicit communication that what they were taught is essentially a hoax and bears no resemblance to Alan Kay’s original intention
To be fair, the OOP of today is much more similar to Simula than to Smalltalk; reading the Wikipedia article I can see an almost 1:1 mapping, including the modeling philosophy and all that. People mostly yoinked the name from Alan Kay.
"Object" and "orient" are ordinary English words; Alan Kay doesn't own either of them. He may have been the first to combine them as a single term, but he doesn't own that either. (Or, did he trademark it or anyything? I doubt it.) That another definition of it than his came to be the dominant one is just the way things are, not in any way "a hoax".
This is an interesting topic, but I don't think the article effectively conveys its message.
The title focuses on good and bad refactoring, but most of the content discusses good and bad design. This means that many of the bad examples are inherently bad, regardless of whether they were refactored from another version or written from scratch.
The introductory comic and the conclusion mention how to perform refactoring, but the rest of the article drifts away from this and only discusses the resulting code.
The first pitfall mentions changing the coding style, but the explanation actually addresses the problem of introducing external dependencies.
The fifth point, "understand business context," should actually be "not understanding business context."
If we perform refactoring incrementally, it's inevitable that there will be some inconsistencies during the process. Therefore, the third pitfall, "adding inconsistency," should include additional explanations.
In summary, I think the article would be more helpful if it focused more on how to perform refactoring rather than criticizing a specific piece of code.
I agree with the sentiment of this article; more recently I have come to think the best refactoring is the one you don't undertake. Hoping to achieve consistency is a very high standard.
Often code structure matters much less than data flow and how it is piped.
Since most codebases use a mixture of:
- global state or singletons,
- configurations provided externally (config file, env var, cmd option, feature flag, etc.), that are then chopped up and passed around,
- wrappers and shims,
- mix of push and pull to get input/outputs to functions,
- no consistency or code representation of assumptions about handling mutable state,
It may be better to build around existing code using ideas listed here than to try to refactor code to improve its structure:
- open/closed principle = compose new code for new functionality (instead of modifying),
- building loosely coupled modules (that interface via simple types and a consistent way of passing them)
- enforcing an import order dependency via CI (no surprise cyclic dependencies months after an unrelated feature added some import that doesn't "belong")
The code's structure will be simple if the dataflow (input, outputs, state, and configuration) flows consistently through the codebase.
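The open/closed idea in the first bullet can be sketched as wrapping rather than editing (the names `Handler`, `legacyHandler` and `withLowercase` are hypothetical):

```typescript
type Handler = (input: string) => string;

// Existing code, left untouched.
const legacyHandler: Handler = (input) => input.trim();

// New functionality is composed around the old code instead of
// being patched into it.
const withLowercase = (next: Handler): Handler =>
  (input) => next(input).toLowerCase();

const handler = withLowercase(legacyHandler);
```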
Words cannot express my hatred for this kind of article.
Imagine working on a legacy codebase where the PM holds the dogma that refactoring is a bad thing and expects you to do it wrong, even micromanaging your PRs.
More often than not, I see projects suffering and coders actually resigning due to a lack of internal discussion about best practices, a lack of space/time to test potential solutions, and lead devs who resemble dictators quite well.
Let me guess: some PM wrote this article, and they just want you to push the product asap by applying pressure and never allowing you to refactor. This is just a casual day in software development. I'm not surprised anymore when most web apps have silly bugs for years, because it's gonna be a Jira ticket and a big discussion about... one evil thing called refactoring.
Several years ago I rewrote a full SaaS in about 3 months; it took another team 12 months with 5 devs. Guess which version made the investors happy: mine.
Bad refactoring is just a product of poor engineering culture.
> Imagine working on a legacy codebase where the PM holds the dogma that refactoring is a bad thing and expects you to do it wrong, even micromanaging your PRs.
I don't think the article said that anywhere? It was just a list of some common things that can go wrong when refactoring, along with some examples.
Nah, judging from the ancillaries (domain name, links to other articles, ads, etc) of the article, it was some guy selling an "AI" code tool of some kind who wrote the article.
(Probably a tool with Magikal Refactoring Functionality built-in... For a price.)
There was a great refactoring chance in the example where the cache was removed.
That is, extracting the caching logic from the API call logic.
Caching could have been a more generic function that wraps the API call function. That way each function does exactly one thing, and the caching bit can get reused somewhere else.
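A minimal sketch of that extraction (the names `withCache` and `fetchUser` are made up; assumes the wrapped function is async and its arguments are JSON-serializable):

```typescript
type AsyncFn<A extends unknown[], R> = (...args: A) => Promise<R>;

// Generic caching wrapper: the API-call function knows nothing about
// caching, and the cache knows nothing about the API.
function withCache<A extends unknown[], R>(fn: AsyncFn<A, R>, ttlMs: number): AsyncFn<A, R> {
  const cache = new Map<string, { value: Promise<R>; expires: number }>();
  return (...args: A) => {
    const key = JSON.stringify(args);
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value;
    const value = fn(...args); // the in-flight promise itself is cached
    cache.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}

// Each function now does exactly one thing:
let apiCalls = 0;
const fetchUser = async (id: string) => { apiCalls++; return { id }; };
const cachedFetchUser = withCache(fetchUser, 60_000);
```

Caching the promise (rather than the resolved value) also deduplicates concurrent requests for the same key, which the inline localStorage version couldn't do.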
Instead, this weird advice was given: changing behavior is bad refactoring. Which is weird because that's not even what we call refactoring.
Refactoring is about moving existing code around, not introducing new code. Replacing localStorage methods with cacheManager is a fix/feature. Updating one part of the codebase to work completely differently from the rest is a fix/feature. Changing processUsers to a whole useless class is not refactoring either; it is a fix/feature. A single-page app for an SEO-focused site has NOT been a bad idea since 2018. Most examples of "refactors" in the article are actually fixes and features which introduced (bad), or did not introduce (good), new regressions into the software.
What I get out of this is that even for teams that prioritize trust and velocity and eschew the use of pre-commit code reviews, there's a strong argument to be made for putting all new hires on an approval-required list for the first few months!
I am not sure if I agree with the article; however, I do agree that code doesn't always actually need to be refactored.
> More than a few of them have come in with a strong belief that our code needed heavy refactoring.
The code might, but a blind spot for many developers is that just because they are not familiar with the code doesn't mean it is bad code. A lot of refactoring arguments I have seen over the years do boil down to "well, I just don't like the code" and are often made when someone just joins a team at a point where they haven't really had time to familiarize themselves with it.
The first point of the article sort of touches on this, but imho mainly misses the point. In a few teams I worked in we had a basic rule that you were not allowed to propose extensive refactoring in your first months (3 or more) on the team. More specifically, you were allowed to talk about it and brainstorm a bit, but it would not be considered for the backlog or for sprints. After that, any proposal would be seriously considered.
This was with various different types of applications, different languages and differently structured code. As it turned out, most of the time when they did eventually propose a refactor, it was severely scaled down from what they initially had in mind, simply because they had worked with the code, gained a better understanding of why things were structured in certain ways and overall gotten more familiar with it. More importantly, the one time someone still proposed a more extensive refactoring of a certain code base, it was much more tailored to the specific situation and environment than it otherwise would have been.
Edit: Looks like it is being touched on in the fourth point which I glossed over. I would have started with it rather than make this list of snippeted examples.
That OO refactor isn't actual OO. The telltale sign is that it is named by what it does rather than what it is (verb vs. noun), plus the -or ending in the name [0]. It's just a function masquerading as a class.
The better refactor to introduce OO concepts would have been to introduce an isAdult function on the user class and maybe a formatted function. This + the functional refactor probably would have made for the best code.
Being an adult is not a property of the user but of the jurisdiction that the user is in. In some places or for some purposes it is 18, but it could be, e.g., 21 for other purposes.
If your software is not going to run only in the USA, it is not a good idea to implement isAdult on the user, but rather in a separate entity that contains data about purpose and location.
Just having some fun with bikeshedding here: Yeah, that could work but IMO in a big/international system the responsibility should ideally live elsewhere, since:
* You may need to determine adulthood for a different jurisdiction than where the person currently resides. Their citizenship may be elsewhere, or you may be running a report that expects "adulthood" to be by some other region's standards, etc.
* Sometimes the underlying kind of adult-need wanted is slightly different, like for consuming alcohol or voting.
* There may be a weird country or province that has laws that need additional factors, like some odd place where the age cutoff is different for men and women.
Yes this is just extreme bike-shedding at this point. But none of this is impossible with more OO principles, like interfaces:
    class User {
        // Convenience function to check if the user is an adult in their current location
        boolean isAdult() {
            return this.location.isAdult(this);
        }

        boolean isOfDrinkingAge() {
            return this.location.isOfDrinkingAge(this);
        }
    }

    interface Location {
        boolean isAdult(User u);
        boolean isOfDrinkingAge(User u);
    }

    class WeirdLawsLocation implements Location {
        boolean isAdult(User u) {
            return switch (u.gender()) {
                case MALE -> u.age() >= 16;
                case FEMALE -> u.age() >= 18;
            };
        }

        boolean isOfDrinkingAge(User u) {
            return u.age() >= 21;
        }
    }
In the hypothetical that you want to check somewhere the user is not currently:
    class SwedenLocation implements Location {
        boolean isAdult(User u) {
            return u.age() >= 18;
        }

        boolean isOfDrinkingAge(User u) {
            return u.age() >= 18;
        }
    }

    var sweden = new SwedenLocation();
    sweden.isOfDrinkingAge(user);
That’s just your opinion. It’s ok to provide convenience functions. I see no difference between the amount of indirection in our implementations, except mine is in the more natural place and you don’t have to know how to get a location or jurisdiction to answer the question: “is this user an adult?”. Knowing that it uses a location or jurisdiction is an implementation detail that you shouldn’t couple yourself to.
Cheers mate, I think I’m done moving goal posts for this conversation :)
> Being adult is not a property of the user but of the jurisdiction that the user is in.
It's a function of both. And I'd argue it's really mainly a function of the person: Independently of jurisdiction, there's at least a rough global consensus what "being adult" means, and most jurisdictions set rather similar (many of them, identical) limits.
A four-year-old isn't an adult anywhere; a forty-year-old is everywhere.
What about a pure function that can take anything that has an age as input? Well obviously that wouldn't work for a cat but it's just an example. It requires typescript and I'm not sure how to name the file it would go in, but I think it's interesting to consider this duck-typing style.
    function isAdult({age}: {age: number}) {
        return age >= 18;
    }
ps: I replaced const with function because I don't like the IDE saying I can't use something before it is defined. It's not a bug, it's an early feature of JavaScript that you can use a function before it is defined. Code is just easier to read when the caller is above the callee.
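For reference, the hoisting behaviour mentioned in the ps, in a tiny sketch (`greet` is a made-up name):

```typescript
// The call site can sit above the definition: function declarations
// are hoisted to the top of their scope, so this is not an error.
const message = greet();

function greet(): string {
  return "hi";
}
```

With `const greet = () => ...` instead, the same ordering would throw at runtime, which is exactly the IDE complaint being avoided.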
That wouldn’t be object oriented. In OO you tend to want to ask an object about itself, Yegor talks a bit about this in his book Elegant Objects.
What you are proposing is just functions, or data-oriented programming, which is fine if that's your thing, but I'd be wary because of the reasons you outline above. Can a book be an adult? What about a TV show? Or a recipe from the 9th century? isAdult really only applies to users and really belongs on that object.
I hate this article. It's a very smug way of blaming the dev who's just trying to make the app better when it's probably the culture that's the problem. Bad refactors usually happen because the person doing the refactor is getting a ton of pushback on it -- they probably underestimated the effort involved and are getting chewed out for taking too long on it, so they cut corners that might accidentally lose functionality, or they don't finish the desired abstraction / clean code they were going for which leaves the code less readable.
For a dev that's a new hire, refactoring the code is also a way for them to feel ownership over it. The PM should be happy that they're thinking about the way the code works and the way the code should work. It's on the company to have review & qa processes that catch problems before they lead to downtime.
I don't disagree that some of the examples given are bad refactors, but in regards to adding inconsistency I see that happen a lot more when rushing out new features or bug fixes than when refactoring; usually the refactor is the effort trying to establish some kind of consistency. And example 5 isn't a refactor it's just removing functionality. If that was the intent, the person should be told not to do that. If it's an accidental side effect of some larger refactor effort, then just add the functionality back in a new PR. Accept that mistakes happen, adopt some QA controls to catch them, and build a culture that encourages your developers to care about your product.
I feel this article is honestly disingenuous. One of the "common pitfalls of refactoring" mentioned is "Not understanding the code before refactoring". Well yeah ? The same would apply to doing anything with the code. The following one is "Understand the business context" (note that the author has already departed from the pattern of listing "common pitfalls" to just write whatever he feels like. Or he just published his first draft).
This article just reminds me why I hate JavaScript so much. I know you frontend engineers can’t avoid it, but I wish we could come up with something better.
Golang is my favorite right now, but I think it's possible to create readable code in many languages, even ones I like less. JavaScript and its ilk strike me as some of the worst of the "modern" languages. Again, no hate here. I started out with JS and PHP, 20 years ago. I just cringe any time I see modern JS syntax, and wish it were simpler so I could get back into it.
Concerning the first example: whoever thought that passing property names as strings could be better than any other style should have their coding licence revoked!
A good refactor does not change behaviour, I would like the author to start with that point.
Take many more much smaller steps while doing so.
Not touching a piece of code in the first 6 to 9 months is something I don’t really agree with. Breaking complex methods up by extracting variables and methods can really help learning the code, whilst not breaking it.
If you are worried about consistency, just pair up or practice ensemble programming instead of asynchronous code reviews. Leaving a new dev alone with the code and giving them feedback about the things they did wrong after they have gone through everything is just not a great way to treat people on your team.
By _your_ definition. If you apply it, a lot of effort is needed for little gain. The best refactors I've seen improved behavior while making code more concise.
"By definition" proceeds to define the word in a way that most people won't agree with. Okay mate. Your definition is ass. The person that wrote the article doesn't agree with you, I don't agree with you. I will still use the word refactor when I talk about simplifying the application such that code becomes simpler through simplifying and improving the design.
Huh, I guess that's true, even on wikipedia. I will have to stop using the word then. Though I'm pretty sure most people use it in day-to-day with much more abandon.
The metaphor, I think, is that your code is a mathematical function, and, to be even more specific and "toy example" about it, let's say it's a polynomial. If the old code was
x^2 + x z + y x + y z
then you notice that you can express the same polynomial as:
(x + y)*(x + z)
It's still the same polynomial, but you "separated concerns", turning it into a product of two simpler factors.
Similar ideas apply to sets of tuples. Perhaps you were given
{(1, 1), (1, 4), (2, 1), (2, 4), (3, 1), (3, 4)}
and you notice that this can be expressed more simply as the Cartesian product:
{1, 2, 3} x {1, 4}
Again, a literal factoring. You can imagine how variations of this idea would apply to database tables and data structures.
That's where I think the word "refactor" comes from.
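The "same polynomial, different factoring" claim is easy to spot-check with a throwaway sketch (both function names are made up):

```typescript
// Both forms of the polynomial from above; a behaviour-preserving
// refactor means they agree on every input.
const expanded = (x: number, y: number, z: number) => x * x + x * z + y * x + y * z;
const factored = (x: number, y: number, z: number) => (x + y) * (x + z);

// Spot-check a few points.
for (const [x, y, z] of [[1, 2, 3], [0, -1, 5], [2.5, 4, -3]]) {
  if (expanded(x, y, z) !== factored(x, y, z)) throw new Error("forms disagree");
}
```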
If you change the polynomial, you change the order of calculations and hence change behaviour. You might get overflows in case of integers or different precision in case of floats.
Nice to learn where the word originates. Often the meaning of words changes over time; e.g. today no horses need to be involved in bootstrapping.
He said "I think" implying that he is not fully sure about the etymology. There is a history section on wikipedia for refactoring if you are interested.
You are right but in fact the maths are only a means to an end, a model that isn't exactly equal to the final real world implementation in analog or digital electronics.
Sure. Whenever you post on a forum it's half for anybody else reading, right? I just thought it was interesting to consider the underlying metaphor; maybe people don't think about it. Metaphors, connotations, etymologies -- I find these interesting.
The first example of a good refactor is a meh refactor at best, and possibly a bad one. Array methods such as map or filter are not "more conventional" in JavaScript; they are "as conventional" as for-loops, and arguably less so, given that for-loops have been around since the introduction of the language. They are also inevitably more expensive than for-loops (each call allocates a new array and invokes a callback per element; a filter followed by a map means another pass over the data). The original example was fine; there was no need to "refactor" it.
Disagree on this. filter and map are much more readable and especially more extensible than result arrays. Plus they eliminate out-of-bounds indexing.
See the variable name. It's forced to be 'result' so that it's consistent with the result-array style. Therefore it lacks a descriptive name.
For the functional methods, you can easily assign the filter(age > 18) result to an intermediate variable like adultUsers to make the code even more descriptive. Useful when you have more steps. With the result-array approach, you'd have to repeat the looping code or bury the description deep in the loop itself and so you usually avoid that.
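A sketch of that point, with hypothetical names:

```typescript
type User = { name: string; age: number };

// The intermediate variable documents the step; with a single
// result-array loop, the "adults only" meaning would be buried
// inside the loop body.
function getAdultUserNames(users: User[]): string[] {
  const adultUsers = users.filter((u) => u.age >= 18);
  return adultUsers.map((u) => u.name);
}
```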