>The number of bugs is proportional to the lines of code; this is undeniable from empirical data.
Do you have a link to this undeniable data? I haven't seen many empirical code quality studies that are not littered with possible confounders. It's very difficult to do these studies.
I do believe that bug density is roughly proportional to the size of the codebase if you average over large corpora of code. But the bug density of different types of code within those corpora varies a great deal in my experience.
So I think the important question is what sort of code is duplicated and why it is duplicated. Removing duplication means creating dependencies. If we create the right dependencies, i.e. the ones that enforce important invariants, that's a good thing. But that is a big if.
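To make the "right dependency" point concrete, here is a toy Java sketch (hypothetical names, not from any real codebase): two call sites that each normalize a user ID get collapsed into one helper, and the new dependency pays for itself by enforcing a non-blank/lowercase invariant in exactly one place.

    // Hypothetical example: deduplication that also enforces an invariant.
    final class UserIds {
        private UserIds() {}

        // Before, callers each did rawId.trim().toLowerCase() themselves,
        // and nothing stopped the copies from drifting apart.
        // After, every caller depends on this helper, and in exchange the
        // invariant ("non-blank, lowercase") is checked in exactly one place.
        static String normalize(String rawId) {
            String id = rawId.trim().toLowerCase();
            if (id.isEmpty()) {
                throw new IllegalArgumentException("user id must not be blank");
            }
            return id;
        }
    }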
For a summary, sure [1]. There have been loads of studies on various metrics, but none have actually been any better than simple lines of code, despite the fact that it has such a large variance as a metric.
> Removing duplication means creating dependencies. If we create the right dependencies, i.e. the ones that enforce important invariants, that's a good thing. But that is a big if.
While enforcing invariants would certainly be good, I'm not convinced that's the only reason DRY reduces bugs. Common functions get manual reviews every time the code that calls them gets reviewed and/or refactored, whether due to new features or bugfixes.
DRY increases exposure of more commonly used paths through your program.
>There have been loads of studies on various metrics, but none have actually been any better than simple lines of code
True, but that doesn't mean SLOC is a very useful metric. Say you were to rewrite a large Java codebase in a language that eliminates all getters and setters. You will have greatly reduced the number of SLOC, but it is very unlikely that you will have reduced the number of bugs very much.
In other words, going for the low hanging fruit of programming language design wouldn't necessarily help much. Bugs are not evenly spread out over the entire codebase.
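As a toy illustration of the kind of lines that disappear (using Java 16+ records as a stand-in for "a language without hand-written getters and setters"; the names are made up):

    // Classic bean: several lines of boilerplate per field, none of which
    // is where the interesting bugs tend to live.
    final class PointBean {
        private int x;
        private int y;
        int getX() { return x; }
        void setX(int x) { this.x = x; }
        int getY() { return y; }
        void setY(int y) { this.y = y; }
    }

    // Java 16+ record: constructor, accessors, equals/hashCode/toString are
    // generated, so the SLOC count collapses, while the calling logic that
    // actually harbors the bugs is unchanged.
    record Point(int x, int y) {}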
>You will have greatly reduced the number of SLOC, but it is very unlikely that you will have reduced the number of bugs very much.
I agree strongly with your point about potential confounders (and was about to make it myself) but now you are making your own assertion that I'm uncertain about. Why are you confident that switching to a language that greatly reduces the number of lines of code would not reduce the number of bugs?
While I don't know of any hard evidence, it passes my internal plausibility test that if some of the "2 screen" functions become "1 screen" functions, bugs might be less likely. There might be some counter-force that would confound this, but I wouldn't dismiss it out of hand. So what makes you say "very unlikely" rather than "not necessarily true"?
I think the idea was "A language that is otherwise identical to Java but eliminates the need to write trivial getters and setters." This obviously reduces line count, but probably does not equivalently reduce bug count, as the removed lines are very unlikely to contain bugs.
>This obviously reduces line count, but probably does not equivalently reduce bug count, as the removed lines are very unlikely to contain bugs.
While I agree that the bugs are not likely to be in the trivial code, I don't think it's a given that the presence of the trivial code has no impact on the number of bugs elsewhere. Consider a "fatigue"-based model, where the human brain is distracted by the monotony of the bug-free getters and setters and thus unable to pay sufficient attention to the logic bugs elsewhere in the program. And again, I'm not making the claim that eliminating boilerplate reduces bugs, only objecting to the assumption that it does not.
My intent was to clarify (what I perceived to be) the parent's argument, more than make one of my own.
I think if our process is "1) write the software in Java, 2) remove those lines", it's clear that we've probably changed the average bug density of the project. I agree that there is much reason for concern in generalizing that result to what would have happened if we'd written in that other language to begin with.
I simply haven't found many bugs in (mostly auto-generated) getters/setters during the past 25 years, but it's purely anecdata of course.
>While I don't know of any hard evidence, it passes my internal plausibility test that if some of the "2 screen" functions become "1 screen" functions, bugs might be less likely.
Yes, mine too, but only for randomly chosen pieces of code. What I don't believe is that the linear correlation between SLOC and bugs that studies have found in large codebases allows us to pick and choose the lines of code that are easy to eliminate and expect the number of bugs to drop proportionately.
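A made-up back-of-the-envelope version of that: take a 100k SLOC codebase averaging 1 bug per 100 lines (1,000 bugs), of which 20k lines are near-bug-free boilerplate. Deleting the boilerplate cuts SLOC by 20% but removes only a handful of those 1,000 bugs, so the corpus-level bugs-per-line ratio says almost nothing about what deleting those particular lines buys you.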