There is a very simple solution to that: large pull requests -> automatic fail.
On one project I worked on, some guy presented me with his masterpiece: a ridiculously large refactoring of the entire code base, plus a clean-up of old code that was no longer used. He was really pissed off when I categorically rejected the request and asked him to break it up into manageable pieces.
Quite sad, because he obviously put a lot of effort into it, but he didn't keep in mind that it's one thing to make a change like that and quite another to actually review it.
When coding, keep in mind the job of the people coming after you: reviewers, QA, eventual long-term maintainers and so on.
This! Large refactorings can and should be done, but the reviewer should be warned in advance and the goals of such a rewrite should be very clear. After the rewrite, the code should be treated as alpha quality and properly tested. And rewrites should be rare events.
Anything else should be done incrementally, in small manageable commits.
You just sound lazy and apathetic. If you actually cared you would have worked with him. You could have done an in-person review where he briefs you on the bigger changes and then you can spin off and review the dead code removal and benign refactor pieces yourself. So what if it takes you an hour or two or four; it probably took him a few weeks. Let me guess, the PR was about 3-5k lines?
You can take as much time as you need in order to give a thorough review and express your frustration so that it's clear that next time he should strive for something more manageable, but categorically rejecting a large code change is just silly and juvenile.
This is why I hate agile. It's conditioning people to believe the only thing you can do to a codebase is incrementally add features. Sometimes the most meaningful or beneficial work comes in the form of broader systemic changes to a codebase.
And it sounds like you are quick to rush to judgment.
You're essentially accusing me of doing exactly what you are doing. You could have instead asked more about the context within which this all happened (which was a pretty bad situation to begin with) and if I tried to work with the person submitting the pull request.
Instead, you make me out to be lazy (which is not something anybody who knows even a little bit about me would say) and apathetic (which also doesn't fit the bill; rather the contrary).
You can take as much time as you need to do a review if you have that time. The middle of a house fire is not the right moment to push through your pet project, and point-blank refusing a pull like that doesn't mean I hadn't thought it through.
I said you _sound_ lazy and apathetic... if there was more context then maybe you should include it in your argument in the first place (which is all I'm responding to, it's not personal). All I have to go off of is a post where you describe a coworker submitting his "masterpiece", you admitting it was clear a lot of hard work went into it, and you rejecting it because it's too big. That, alone, is rather annoying IMO. As someone who will both review and submit bigger PRs that try to move the needle in ways that aren't possible with a 200 line diff, I would be extremely frustrated with your stance of "too big can't merge" with no other context. We've all been there under pressure faced with the decision of whether we can responsibly merge something or not. I'm sure you made the right call for your situation. But you suggested _categorically_ rejecting large PRs. I'm not the only one that took issue with that and it's much different than the scenario you've now described (lots of problematic, hard to parse, code with a tight deadline).
When a refactor cannot be done in an iterative fashion — there are some changes that are just too fundamental to be done in small pieces — then one approach is to write a review guide, document what has changed and why, indicate to reviewers where the pain points are.
One argument against gradual refactors is that they are often left half done when enthusiasm runs out. I've seen MLOC-sized code bases with multiple layers of half-done refactors and migrations.
That might be an argument against gradual refactors, but it is not an argument for doing large refactors in one pass, as it does not solve the underlying problem of running out of enthusiasm (or time, for that matter).
You need an amazing test suite, for starters. If the test suite is good enough then even a large refactor isn't that scary.
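For what it's worth, the kind of test I mean is a characterization test: pin down the current observable behaviour before touching the internals, so the refactor has something to answer to. A minimal sketch in Python, with all names made up:

    import pytest

    # Stand-in for whatever function the refactor is about to rework; in a
    # real code base this would be imported from the module under change.
    def compute_invoice_total(items, tax):
        subtotal = sum(qty * price for qty, price in items)
        return subtotal * (1 + tax)

    # Inputs and outputs captured from the *existing* implementation.
    CASES = [
        ({"items": [(2, 9.99), (1, 5.00)], "tax": 0.07}, 26.73),
        ({"items": [], "tax": 0.07}, 0.00),
        ({"items": [(3, 0.33)], "tax": 0.0}, 0.99),
    ]

    @pytest.mark.parametrize("args, expected", CASES)
    def test_total_unchanged_by_refactor(args, expected):
        # If the refactor changes any of these, the diff is doing more than
        # refactoring and the reviewer deserves to know about it.
        assert compute_invoice_total(**args) == pytest.approx(expected, abs=0.01)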
This was part of the problem for the Segwit2X code: inadequate testing. The original Bitcoin Core codebase is up to a very high standard, and that high standard seems to be the first thing to go anytime someone goes to fork it. This is part of the reason why the developers of Bitcoin Core have so much influence to set future direction of the currency within the Bitcoin community at large; there aren't that many highly-skilled cryptocurrency developers out there. It's unsafe to launch a fork without them.
I'm putting off a very sizable one at work (due to important functionality going in before FCS), and luckily, although it'll hit nearly every file, I can do them individually.
My concern, actually, is eyes glazing over on the nth review that's vastly similar to the previous n-1.
You mean how do we do large refactorings? There is no such thing as a large refactor.
Refactoring is a bottom-up design tool. You remove accidental complexity and uncover higher order organization in your code, occasionally uncovering or allowing new functionality that was intractable before.
A “large refactor” is a top-down change. Major surgery. You’ve used (maybe misappropriated) the tools of refactoring for a different purpose which is antagonistic to the school of thought that enumerated these tools.
I say this as someone who, when I was starting out, spent a holiday weekend rewriting hundreds of lines of repetitive code (a bad idiom) to move an O(n²) performance issue into a single function. I removed 500 lines of code in the process. At that point, and for years after, it was my most productive week.
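For a flavor of what that looked like, here's a made-up miniature of the same shape of change (not the original code): the same linear scan was written out at every call site, so the quadratic cost was smeared across hundreds of lines; once the lookup lived in one function, the fix could be made in one place.

    # Before (the bad idiom, sketched): the same linear scan appeared inline
    # at every call site, so totalling all orders was O(n^2).
    def order_total_before(order, records):
        total = 0.0
        for sku, qty in order:
            for r in records:            # inline scan, repeated everywhere
                if r["sku"] == sku:
                    total += qty * r["price"]
                    break
        return total

    # After: the lookup lives in one function, so the performance fix is a
    # one-liner - build the index once and each lookup becomes O(1).
    def build_price_index(records):
        return {r["sku"]: r["price"] for r in records}

    def order_total(order, price_index):
        return sum(qty * price_index[sku] for sku, qty in order)

    records = [{"sku": "A1", "price": 9.99}, {"sku": "B2", "price": 5.00}]
    print(order_total_before([("A1", 2), ("B2", 1)], records))              # 24.98
    print(order_total([("A1", 2), ("B2", 1)], build_price_index(records)))  # 24.98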
Enter the Mikado method. With Mikado you start to do your large refactoring. You change A, which forces you to also change B. You change B, but now C has to change too. So you change C and, you guessed it, now D doesn’t work. Man this is getting out of control. D leads to E, and E leads to F.
Now you’re looking at making change F. You are worried, or rather, you should be. How are you going to verify all these changes are complete? How do you know there isn’t a G, H, and I waiting for you after F?
Now the hardest thing for any developer to do (besides admitting they gave bad advice) needs to happen. You need to throw away code you just spent hours of time sweating over. Revert to master, and start over.
Just implement change F, by itself. Sure enough, F leads to G and H, but those three changes are still reasonably sized for a code review. Send it off and start working backward up the list.
The reason we do bottom up is that we uncover the difference between our intuition about the code and the situation on the ground. When you go to do E or D you might find you didn’t need to make the change quite as broadly as you thought. And you might be able to skip A and B entirely.
When I do Mikado, I find that only about 60-70% of the code I originally wrote survives the process. The changes are more contained, which means they impact fewer of your coworkers less often. They are easier to reason about, even self-explanatory. Of course you would do it this way, it just makes sense.
The interesting thing to me about Mikado is that some people independently discover the trick themselves. In fact these large refactors I had to do, I complained, reminded me of playing pick-up sticks. I can’t move this because of that. Can’t move that because of the other thing. I got overwhelmed by the size of a couple of these changes, and threw them out. Then started over either immediately or at a later date.
This is also why I'm super into rewriting local branch history - when you get into these situations, you reorder the changes into dependency order (F -> E -> D -> ...) and then it becomes trivial to create a PR for a suitably small prefix of the dependency path.
That helps, but I really believe it’s an important part of the Mikado Method that some of your changes go away during the process. It took a long time for me to believe that some changes are just distractions and don’t move the code forward in all of the ways that are important.
Because part of refactoring is realizing that certain problems can be fixed at any time you deem it necessary to do so, that means you don’t have to fix it just because you can. A lot of my code ends up prepped for a refactor I never do. Someone else does it for me, when they have a requirement that makes the change make sense. It’s my 90-10 compromise for the tension between YAGNI and broken windows. I don’t make the change, I just make the change easy.
Smaller changes tend to be safer and should break less. The less you change at one time, the better: fewer bugs, and the ones that do slip in are easier to spot and fix.
But if safety doesn't matter, or it's pure crap anyway, then refactor away.
Proper review of core components that are in use and proven is hard already. Refactoring such components - don't do it if avoidable - needs a very good reason, and convenience ain't one: you'll probably get some duplication of code and double maintenance during the transition period.
Making small PRs is just a way to be safe and smart; one branch with individual, gradual commits that could be used separately is also a way (if they can be reviewed separately).
I'm not sure why so many people consider "refactoring" to be a synonym for variable renaming. Refactoring is a non-functional change which makes the code base more manageable [1]. Renaming is a kind of refactoring, but not a terribly important type of refactoring.
That is true, but it does not follow that it is possible to have large refactorings that cannot be accomplished in a series of smaller, independently-verifiable steps, which is what your original question implies.
Maybe that would occur if you had to replace a central algorithm with a significantly different one (because the original one did not scale, for example). That is more of a rewrite than a refactor, and should be done as a separate product, developed incrementally, of course.
My personal methodology is to alternate between iterative development and large refactors. I do very little testing during the large refactor, and test as thoroughly as possible afterwards.
Why would incremental development be unsuitable for refactoring, specifically? It is pretty much universally accepted that incremental methods are the best way to tackle software development in general.
If 'refactoring' did only refer to simple, mechanical changes like renaming, I could see it, but you have just argued against that interpretation.
There is a fallacy which says "if I understand the steps which got me here, then I understand where I am," or "if I understand each change that has been made to a system, then I understand the system." This is simply not true. A myopic understanding of each change does not equal an understanding of the system. Human memory is simply not good enough to remember all of the previous bits of code and to see whether a change might interact with them in a bad way.
With lots of testing, yes, you can incrementally build a working system that you don't understand. But by myopically staring at diffs all day, you'll only get farther from the truth of how your system works.
The major refactors that I do are because I've come to the realization that my data model was not correct - a realization that often happens multiple times during a project. My utility functions are still useful, but I have to rewire everything. There is no meaningful way to do that incrementally. Why would you even try to do that incrementally? It's like having a salamander and wanting to turn it into a horse incrementally. Of course, most of the RNA is the same. But testing and code reviewing the steps between salamander and horse seems to me like a waste of time.
Why does that not apply to development in general? Or perhaps you find it does?
While "if I understand the steps which got me here, than I understand where I am" may not be guaranteed (especially if you don't know where you started from), it is almost always among the more effective ways to proceed (though it is not particularly helpful if you don't know where you are going.) It can help avoid getting lost, which is one of the reasons for doing development incrementally.
There is no way in which I can see that turning a salamander into a horse has anything to do with refactoring.
In dynamic languages such as JavaScript or Python, renaming is definitely a kind of refactoring, as it can easily break things (I've seen it happen).
In static languages such as Java, renaming a variable is something your IDE can do for you with all the benefits of static analysis. There's almost no risk associated, and reviews can be a rubber-stamp.
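To make the dynamic-language half of that concrete, here's a tiny made-up Python example of a rename breaking at a distance: nothing in the caller's text mentions the old name, so a search-and-replace or an IDE rename misses it.

    # Hypothetical example: rename Report.generate to Report.render and the
    # caller below still imports and lints just fine...
    class Report:
        def generate(self):
            return "quarterly numbers"

    def run_task(obj, action_name):
        # ...because the method is looked up via a string at runtime, the
        # breakage only shows up when this code path actually executes.
        handler = getattr(obj, action_name)
        return handler()

    print(run_task(Report(), "generate"))  # works today; breaks after the rename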
Maybe we need something better than Git, where a refactor of one variable doesn't spray a diff across the entire repository but is instead recorded as a single "change".
You don't. Your software doesn't ever need big changes, because they're basically impossible to review. All changes must be small and trivially reviewable. /s
If I disregard "/s", I actually agree - with caveat that large refactoring is still ok if you treat the refactored code as a totally new project - not battle tested that is. So you absolutely can do large refactorings, but the resulting code should be thoroughly tested before deployment.