The studies are almost uniformly terrible ... orders of magnitude of value away from being strong enough to support you standing on it
I find that too. Most of them seem so incommensurate with what they purport to be studying as to be nearly trivial. The researchers rarely seem to address (or even be aware of) the assumptions they're making, and their assumptions are usually significant enough to dominate the data.
That's not to say one can't extract value from such studies, but that value is so open to interpretation that everyone ends up relying on their pre-existing preferences to decide the issue, which defeats the purpose.
Edit: that study on code cloning that AngryParsley cited above is an example. They make some good distinctions among reasons why programmers duplicate code. But their empirical findings are dominated by their own view of what's valuable vs. not. They admit as much:
Rating how a code clone affects the software system is undoubtedly the most controversial aspect of this study, and also the most subjective.
I have mixed feelings about this study. On the one hand, it's good to see people working diligently to study real codebases. At least someone is trying to look at data. On the other hand, how they're interpreting it is no different than what we all do when we argue this shit online or over beers or – more to the point – when hashing out a design decision. This isn't science, it's folklore with benefits. The problem is that it's being shoehorned into a scientific format it can't live up to.
Their title, by the way, is a straw man. What they're really arguing is that not all forms of code duplication are equally bad, that some are good choices under certain circumstances like platform and language constraints. That's reasonable (if bromidic) and even interesting, but it's just musing. It's not at all up to the authoritative status that AngryParsley gave it; it merely looks that way because it was published in a journal. The reality is that they have an opinion and looked at some code. At least they did look at some code.
Nobody is going to change their mind because of such work, nor should they. It isn't nearly strong enough to justify throwing out one's own hard-won opinions-based-on-experience-to-date. The net result is that everyone will look at it and see what they already believe. For example, I look at it as a Lisp programmer and the examples seem almost comedic. It's obvious that in a more powerful language, you could eliminate most if not all of that duplication, so what the paper really shows is that language constraints force programmers into tradeoffs where duplication is sometimes the lesser evil. Exactly what I already believed.