An External Replication on the Effects of Test-driven Development [pdf] (brunel.ac.uk)
406 points by joatmon-snoo on Oct 19, 2016 | 323 comments



This study, like most software development studies I've seen, is seriously flawed. It doesn't justify the sensational title here on HN.

* The sample size was tiny. (20 students)

* The participants were selected by convenience. (They were students in the researcher's class.)

* The majority of participants had no professional experience. (Six students had prior professional experience. Only three had more than two years' experience.)

* The programming problems were trivial. (The Bowling Kata and an 'equivalent complexity' Mars Rover API problem.)

Maybe, maybe you could use this to draw conclusions about how TDD affects novices working on simple algorithmic problems. Given the tiny sample size and sampling by convenience, I'm not sure you can even draw that much of a conclusion.

But it won't tell you anything about whether or not TDD impacts development time or code quality in real-world development.


> The programming problems were trivial. (The Bowling Kata and an 'equivalent complexity' Mars Rover API problem.)

The top-most comment on that page emphasizes this point:

Here's my hypothesis, based on personal experience: the benefits of TDD begin to manifest when they are applied at scale. During design and development, if a single developer can plausibly understand an entire system in their head, the benefits of TDD (and, in fact, unit testing) are negligible. However, there's a non-linear benefit as systems become larger, particularly in the diagnosis of large and complex system failures.


This is absolutely key. TDD within small projects has little value; skipping it is a technical debt people are happy to carry, as the debt is negligible.

However, the moment it becomes more than just a one-off project, this debt becomes a problem.


Although, isn't this more about writing tests at all, versus writing them first or last in the coding process?

In terms of technical debt, does it matter when the tests are written?


Yes. Some people do not know how to write extensible software. They solve the issue at hand without any regard to the big picture, so when something needs to be added (like tests), the code needs to be heavily refactored to accommodate the change. As others have stated, this usually(!) isn't a big deal when the project is small with a single contributor. However, writing the tests last, without any forethought about what the tests are or how they might be implemented, will incur debt when the code needs to be refactored.

Conversely, extensible code can have the tests written before or after without much change to the amount of technical debt. The code is written to adapt to change, so tacking on robust tests doesn't incur much additional debt.

Note that there is a balance. If you know your goals and have no reasonable expectation for them to change, then making everything extensible is unnecessary overhead. You can also have a mix of abstract functions that are extensible alongside immutable functions that handle something specific. It's all about what works best for your project.


I don't do 100% TDD, but do write code to be testable, and a lot of that mindset and experience came from trying TDD.

Forcing myself to think "how would this be tested?" helped, but doing some of the tests first got me in that frame of mind.

And by practicing that, you're writing code that is, by definition, reusable: you're using it in the tests and in your production code. The 'reuse' thing doesn't have to mean 'reuse on 8 other projects' (a common rebuttal I've heard). Reusing the functionality in another part of your project 3 months from now is reuse too.


If true, wouldn't this be detectable in academic studies? It's easy to write untestable code for small problems, too.


Writing well designed functions that do one thing enables good testing.


From experience, the value of TDD (or any style of automated tests, really, you don't really have to be driven by them) only kicks in when the code is of a certain size, complexity, and age. Tests are much closer to a tax in the early phases, when you can reasonably keep the full scope in your mind.


>or any style of automated tests, really, you don't really have to be driven by them

Looks like you misunderstood what TDD is. TDD is not a testing method; it's a method for designing system architecture, in which you formalize contracts and design APIs by writing examples of their use as tests. Because of that, TDD is applicable at every scale (e.g. even when you design a complicated algorithm that fits in just a couple of screens), and because of that, TDD must be compared not to other testing methods but to other design methods.

Of course, that comparison will not show the supremacy of TDD, for many reasons: insufficient expressiveness of the testing framework, for example, or domain specifics. TDD is useful as one of the many tools we have today, but it should never be the only tool.


This benefit of test-driven design is not obvious (at least it wasn't for me) until you really experience it. Several years back I wasn't aware of the concept of dependency injection, but independently discovered it while trying to unit test some error logging functionality. Previously I'd always used a global/singleton logger object. That always felt wrong, but I always assumed it was necessary until writing unit tests forced me to avoid that design.
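To make that concrete, here's a minimal sketch of that kind of refactor (all names are hypothetical, not from the study or this thread): the logger becomes an injected dependency instead of a global singleton, which makes the error path straightforward to unit test.

    class Logger:
        """Default logger; in real code this might write to a file or service."""
        def error(self, message: str) -> None:
            print(f"ERROR: {message}")

    class PaymentProcessor:
        # The logger is injected rather than reached for as a global/singleton,
        # so a test can substitute a fake and inspect what was logged.
        def __init__(self, logger: Logger) -> None:
            self.logger = logger

        def charge(self, amount_cents: int) -> bool:
            if amount_cents <= 0:
                self.logger.error(f"invalid amount: {amount_cents}")
                return False
            return True  # real charging logic elided

    class FakeLogger(Logger):
        """Test double that records messages instead of printing them."""
        def __init__(self) -> None:
            self.messages: list[str] = []

        def error(self, message: str) -> None:
            self.messages.append(message)

    def test_invalid_charge_is_logged():
        fake = FakeLogger()
        assert PaymentProcessor(fake).charge(-5) is False
        assert "invalid amount: -5" in fake.messages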

Also note that TDD is not the only way of getting this benefit. IMHO Paul Graham's essay Programming Bottom Up[1] and Casey Muratori's concept of 'compression-oriented programming'[2] both espouse this same idea of improving API design by "writing examples of their use."

[1] http://www.paulgraham.com/progbot.html

[2] https://mollyrocket.com/casey/stream_0019.html


Well, I practiced TDD for no less than 10 years, and its benefits are still not obvious; they're more like something that needs to be continuously reviewed in the context of the current work. Once you've understood all the design patterns that TDD enforces, you don't need TDD to use them. You just think about how you _would_ test your code and defer the test implementation until later.


Why not just learn the patterns directly?


:) Who said they should not be learned directly? TDD is just a good demonstration of their use.


While you are right that TDD is about design, and that it creates examples of use and contracts in a sense, there are better methods for this purpose, like Design by Contract, which in my opinion is superior. Why?

Mostly because TDD's red-green-refactor process is an evolutionary one. Like letting mother nature do the design.

It's un-intelligent design....hahahha


TDD means that you write tests first, see them fail, write the code, see the tests pass. Writing the code first and then the tests is not TDD. I didn't find much value in tests-first, especially because sometimes I really don't have any idea what tests to write and I have to sketch some code anyway. I tend to write some code, then its tests, and use them to debug my code. I save the time I'd otherwise spend writing tests that turn out to be irrelevant to the code I eventually end up writing. In very few cases, let's say twice per year, I have to solve very clear algorithmic problems with very well-defined inputs and outputs. Example: process the elements of an array and return another array. Then I write the tests first.

Anyway, tests first or tests later, both are good. No tests is a nightmare, as a couple of projects I've just inherited are reminding me today.

Edit: typoes


I have a somewhat different experience. I've found TDD is useful on even tiny projects because using TDD forces you to write better code. It's really hard to reliably test things that use "magic" and side effects that affect parts of your code they shouldn't really be affecting, and really easy to test things that have well-defined and properly documented interfaces that only do one thing. Consequently writing tests, even if you never actually run them, makes your software better.


I would humbly suggest that because you and the parent (and other people here) have such different positive experiences with TDD, and cannot quite agree on when it works, only that it sometimes works, this is a big indication that TDD is just a placebo.

But maybe that's what's needed: a placebo to make you feel better as a programmer, and thus more productive through your feelings.


> TDD is just a placebo

This is my experience. I've watched teams do TDD, only to come up against their first refactor and have to change 80% or more of the tests. Just the other day, they were griping about a code review that they had to do which involved changes to 2,000 lines of test code over a code change with minimal functionality change.

TDD has the effect of making you think differently about how you code. Thinking about how you write code is a good thing, but you don't need TDD to make you think differently about your code. You just need to think.


I can't speak for the other person, but from my reading, we don't actually have very different experiences. We seem to agree on the value of testing for large/complex projects, but have a (minor) difference of opinion over the value of testing in very small projects. Indeed, I'm pretty sure we're deep into hair-splitting territory, as I do agree that TDD'ing small projects can lead to better code; I just don't think the payoff is as obviously big as it is for more complex ones.

It's like observing two fans debate which Star Wars movie is the best and the merits of Jar Jar Binks, and concluding that because they disagree, in fact all Star Wars movies must be pretty bad.


> and concluding that because they disagree, in fact all Star Wars movies must be pretty bad

No, I am not concluding that TDD is bad; if anything, I am concluding that it doesn't actually matter. And I'm not even firmly doing that; I just wanted to entertain the idea that some practices (such as TDD) can be placebos, so they subjectively feel like a good thing even though we cannot measure any effect.

Nor was my intent to touch on the moral issue of placebos. I think if it works for you, do it.

In fact, thinking about it some more, there can even be practices in SW development that have a nocebo effect, that is, no measurable impact but they make you feel worse. The daily scrum meeting comes to mind for me.


I have the same positive experience as both of the above commenters. I've found that there are many other variables which can change its effectiveness though:

* The type of code I'm working on.

* How well I remember/understand the full requirements.

* How much of the testing framework is built already and how much I have to build.

* The type of test I'm writing.

* How new the project is.


Yes, TDD seems to require having a decent idea of the requirements (very un-agile). Many projects are making it up as they go along and iterating toward a reasonably correct outcome.


It means having a clear idea about the requirement you're implementing at that point in time (not un-agile). It doesn't mean that you have to create a big set of requirements up front (un-agile). You can just pick a story, convert it into a test and implement the code, rinse and repeat.

Writing the test also gives you a second chance to think deeply about the requirement and fix any problems with it before wasting time implementing the wrong thing.


To me it's sort of like people who use an index card when they read. They move it line-by-line down the page so their eyes have an easier time traversing the line of text.

If this is the crutch that makes you a better programmer, who am I to discredit the practice? But dogma has no place in software engineering.


Kind of like those developers who have to rely on syntax highlighting and linters, right? Those guys! :)

Sometimes using something to help you write better code is just a plain old good idea. You can write it off as unnecessary dogma, but if it actually makes your code better you're shooting yourself in the foot a bit.


> Consequently writing tests, even if you never actually run them, makes your software better.

It could be argued that an experienced programmer doesn't need TDD to write the same code that TDD could have produced.


That used to be my attitude too.

However, the experienced programmer will be very happy when they don't have to re-test all of the old functionality manually after a large refactor of some core code.


An experienced programmer may write tests after the main code is complete to make sure nothing is broken during refactoring. That has nothing to do with TDD.


That's true. TDD is necessary neither from a system-design nor from a quality perspective to produce good code. However, it may still add some value even for experienced programmers, when it's easier to write a test than to build a pure mental model of the domain or the algorithm.


I've only been a developer for 20 years so I'm not at that point yet.


Tests provide certain guarantees where compiler proofs (e.g. compile-time errors) are absent. You don't need a large project to realize those guarantees.

Tests aren't just for initial correctness---their benefit is largely in maintenance. They allow you to make changes or refactor the system with guarantees that the covered portions still operate as they were originally designed. There is really no excuse for those types of breaks.

Then there's the team setting: I'm not the only one writing my code. Someone else has to get in there and make changes. I might have to come back months or years later and make changes. At work, we have five developers touching over 100 distinct projects at any point. We have observed that the lack of tests essentially guarantees breaks---we have QA, and there is a direct correlation between bugs and whether code is comprehensively tested.

With regards to TDD: we also observe time and time again that writing tests after the code yields untested behavior. There are always odd cases that might not be immediately obvious. There might be a bug that was inadvertently introduced, and now that bug is tested as part of the implementation. Certain branches may not be fully tested. I've seen comments here about this being an experience thing---that more senior developers won't have this problem. That's essentially saying that one is infallible; it doesn't make sense.

TDD also prohibits rushing. Not writing tests is also an excuse to rush the implementation and produce a shitton of unnecessary code. Tests slow you down and force you to think.

Small projects grow. If you don't write tests upfront, when do you write them? We develop incrementally---all of our projects start small, and some of them will remain small for perhaps a year or more until we revisit them. Writing tests at that point loses a lot: we may have forgotten the details of the implementation by then, and the person who originally wrote it might not be involved at all in the changes.


Absolutely, this. TDD is hugely valuable, especially when new people join the project who don't yet understand it, and the best way for them to contribute is by adding tests!


This is really wrong!

Tests are important, but you should not have "the new guy" write them, because they should already have been written by someone who understood the code being tested. Having some junior deal with testing is a sure way of producing useless, slow, and incorrect tests.


That sounds like TLD, the opposite of TDD.


I would say the cost of TDD only kicks in at scale... unit tests are a real cost that plagues a project and makes devs ultra-conservative.


What do you mean, ultra-conservative? I'd say the opposite is true: when there are no tests, devs are conservative, i.e. they are afraid to change anything. With at least some tests, they have the possibility to change and refactor more safely.


> It doesn't justify the sensational title here on HN.

We've changed the title along with the url. Please see https://news.ycombinator.com/item?id=12742120. The submitted title was “TDD has little or no impact on development time or code quality” (including the quotes).


Sample size is 21, and they used Kruskal–Wallis (which is parameter-free). It might not be the best test (IIRC it's usually recommended for exactly this combination of ordinal + nominal, but I'd have to check), but it's suitable enough.

Also, I don't think convenience sampling is bad in this case (and they clearly indicate they did it, etc.), as it allows you to control for TDD experience and the like. I'd even go further and say the typical "lol, students as participants" isn't an issue either.

"How effective is test-Driven Development. Making Software: What Really Works, and Why We Believe It" by Turhan et al. is a good meta analysis if you want more studies.

As an aside I like the fact that this is a replication study and that there have been a couple of these.

I mean, worst case, you can use these studies to guesstimate better priors for your own Bayesian analysis :)


Criticism of the study goes both ways. It COULD be that with a larger sample size, a more meaningful project, and more experienced developers, the result would be "TDD actively hampers productivity by 70% and leads to more bugs compared to test-last".

Being mindful of the conditions of the study is important. Shrugging it off isn't helpful in the least.


Whereas the same conditions you listed existed in the link I just found on the Cleanroom method from the 80's, with clear evidence the method worked to produce low-defect software. The weaknesses plus the resulting quality just added further support.

http://infohost.nmt.edu/~al/cseet-paper.html

EDIT: This is now its own submission below if anyone wants to discuss it there.

https://news.ycombinator.com/item?id=12741237


To be fair, the abstract says "The results failed to support the claims." and the conclusion says "We recommend future studies to survey the tasks used in experiments evaluating TDD, and assess them with respect to the treatments.".

This is not a claim that TDD is useless.


* The code-base was new, simple, and subsequently thrown away.

* There is no way to scientifically measure absolute code quality.

There are numerous subjective indicators (tabs vs. spaces) and the objective indicators only really become apparent once you are dealing with more than one component/service.


I agree with you on this. The study is irrelevant from the statistical point of view.

However, there are other considerations with TDD, probably on a more theoretical level:

* There is no other major engineering branch that uses testing this way in production (would you bang a car against the guard rail to see if it works?)

* There is one thing all of the bugs you catch in production have in common for sure: they all passed the unit tests

I think testing has its benefits, but driving(!) your software development efforts with it is probably too much, at least for me.


Car manufacturers model car bodies mathematically so they have a good idea how a body will behave before they build it.

This hints at why the philosophy of TDD is not as helpful as it seems: it confuses modelling/analysis with implementation.

Your testing methodology can be perfectly implemented, but if your problem analysis is wrong your software will still be useless and broken.

TDD would be more interesting if it understood the distinction between specific behavioural expectations (easy to test for, but incomplete), generic applicability and model robustness (much harder to test for, but more complete), and complete formal correctness (often unreachable, but always an interesting goal.)

TDD is better than nothing. If all you can manage is a set of behavioural tests, that should still improve reliability.

But it's a big leap from there to the suggestion that if you define your spec as a series of behavioural tests and your code passes them all, you have a full and correct solution for the initial requirement.

That is just plain wrong, and in untrained hands it can be dangerous.


To your first point, the usual response is that there's no other major engineering branch which demands quite such extensive changes be possible after you've broken ground. To extend your analogy a little, nobody takes a car production line and expects the people running it to be able to switch to helicopters without starting over.


I agree, and it does not happen on my projects either. We agree in advance on the set of features for a certain version, and we deliver that. If you're doing a lot of car building when you actually need a helicopter, then there is something wrong.


If you could bang a car against a guard rail without incurring the cost of a car, you would.


I mean, you do incur a cost for programming with TDD though. The benefits may repay the cost in the end, but the cost is still there. And for that matter cars do undergo destructive testing, just not TDD--and the problem there is presumably that the car benefits from taking a holistic component-driven approach, "here is what I want the car overall to look like, and what components I want to fit into the space inside, now how can I reduce the rollover risk?" rather than "here are some basic expectations of a car, I expect it to be able to move forward and hold a passenger, let's make the smallest thing which causes that to happen. OK now let's add a single passenger, and let's test that bugs aren't flying into your face at highway speeds so that we remember to add a windshield eventually. (1000 steps later) Crap now we're testing the ability to shift into reverse and none of my homespun bare-minimum engine design is going to work there."

TDD probably pays its best rewards with relatively simple well-defined projects, and needs to be deployed with a modularization strategy. Possibly the wishful-thinking strategy of top-down design would be a natural complement, since when you wishfully think of a function you can hopefully quickly write a test or two for it.


Your example posits an evil developer with a BOFH mindset. S/he is trying very hard to get away with doing as little work as possible while meeting absolute minimum requirements. I know that you are parodying the Agile mindset but any philosophical framework within which you make software has edges. We try to not make them count by having good hiring practices. I'll be the first to grant you that hiring is an unsolved problem in software development.


On the one hand, your points are valid. But on the other hand, it is frustrating to see that TDD proponents resort solely to rhetoric to sway people.


Nope, there are also studies (done by Microsoft and IBM) indicating that TDD helps: https://www.infoq.com/news/2009/03/TDD-Improves-Quality


It isn't terribly clear from the abstract and I'm not going to pay for the paper, but it sounds like this is really a comparison of tests vs no tests. That has nothing to say about the practice of TDD.

This is why this newer study is interesting; it compares to TLD, which allows you to say something about TDD rather than just about testing.


Careful: sometimes even widely cited studies like Nagappan's are misreported and don't actually show what they're claimed to show. For example, if you look at the development processes being compared in the cases examined, often they aren't quite TDD as widely described and advocated.


Since when is pointing out the serious flaws of a study "rhetoric"?

I'm ambivalent on TDD, but the arguments defending this study here have been ridiculous and anti-intellectual.


Yes, it's flawed, but so are development methodologies in general.

The truth is that it depends on the team writing the code, not how you write it. While consistency and a method may help to organize the group, it doesn't really say anything about quality whatsoever.


Most importantly, their solutions didn't need to actually work in the real world. If that weren't the case, one would find that there are often massive rewrites (not "refactors") while software is in embryonic form and/or users are giving it a try. A major advantage and disadvantage of the TDD approach is that you pay the price upfront to avoid more charges later. But if your features, business model, and code haven't yet taken solid form, then your upfront payment gets wasted.


Thank you for looking into the details.


The study presents imperfect data. You have presented no data. Which is more credible? Why do comments like this get upvoted so much?


Perhaps oddly, it turns out that presenting no data is better than presenting data with poor foundation. Comments like this get upvoted because many of us appreciate someone pointing out when there's no "there" there before we've spent our own time reading.


>Why do comments like this get upvoted so much?

They're dressed-up versions of "correlation != causation": low-thought comments appealing to people with not-even-stats-101 knowledge.

There is no such thing as a perfect study, and a generic version of the OP's comment could be copy-and-pasted onto any study ever. Unless you study the entire population of the planet, you're going to miss subgroups. Unless you study the entire population of the planet, you're going to need some sort of selection criteria. Unless you have infinite funding and time, you're going to need to make trade-offs and sacrifices in your experiment design.

A study will disclose these shortcomings for readers to balance the significance of results against.

Take this complaint from OP:

>* The sample size was tiny. (20 students)

What sample size would satisfy him? Why is 20 too small? 40? 80? 1037? Is he basing his opinion of a proper sample size on his gut? 20 just doesn't feel right?


Have you heard the phrase "garbage in, garbage out?"


Referring to the study or to the commenters?


uhm. lol? are you saying a study can't be criticised by anything less than a counter study?


At the same time we should still praise them for trying to replicate a study, which doesn't happen nearly often enough in the computer science field (aside from algorithms and data structures studies).


But you have the same variation in a professional environment as well. It could well be that TDD is not that much of a gain if it gets lost in the noise of confounding factors that easily.


Students are not even a good sample for the efficacy of TDD.


Processes like TDD are most often applied/enforced with inexperienced developers in mind.


I've never seen Kent Beck make that claim, have you?


Has Kent Beck been pushing for TDD at Facebook, or are there too many experienced devs?

Experienced devs do what works best for them, which may be TDD but probably in most cases not.


It may be unfair to say this in response to the parent comment, but the great majority of HN discussions start with a comment like this one: It's seriously flawed, etc. Occasionally it's true, but the noise drowns out the signal.

In a graduate-level engineering class, the students were making similar statements about all the studies we read. One day the professor said: It's easy to find flaws in someone else's work; humans are flawed. The real challenge and benefit is to find the value in their work - find what has lasting value, learn from it, and carry it forward.


I completely disagree. The reason we care about these studies is to make decisions. If a study is flawed... maybe it's flawed. Maybe it shouldn't be relied upon any more than some random opinion blog post. (Maybe even less!) But just being "a study" lends it a lot of weight compared to that same random blog post, so it absolutely should be held to strong scrutiny.

It might be a "real challenge" to find value in every one of those blog posts too, but it's by no means useful or valuable. It's a waste of time if not outright counterproductive.

If all the studies are flawed, maybe the field is flawed. Maybe the subject is just not susceptible to (cheap) experimental studies. It's hard to trust peer review when there might be problems with the whole field. We may very well be better off relying on experience and opinions because realistic experiments are so far off the mark.

It can be even worse: generalizing the results of a bad experiment might even be dangerously wrong. Psychology results from experiments on young, Western college students are a great example—we don't want to make laws or base diagnoses purely on experiments like that because that could actively harm groups that are fundamentally unlike young, Western college students.

And all that is pretty much exactly where I see experimental software engineering: the results just don't generalize. And sometimes, I suspect, results generalize in ways that are counterproductive to experienced programmers working on large projects—exactly the people I actually care about. And yet empirical studies (even bad ones) still inherently carry a lot of unearned cachet. The real challenge at the end is overcoming this cachet, not finding value where there just might not be all that much.


It's a total waste of scientific resources to insist that every study requires an army of test subjects and the top statisticians of the day applying the latest modeling, working blind, with independent replication before publication.

There are limited resources to do research. You can get insights into how the world works much cheaper if you use critical thinking and facts outside the controls. Casually dismissing every study as irrelevant due to lack of rigor is wasteful.


If studies don't need to be rigorous, what's the point of doing them? Why not just write a persuasive essay instead? Is the format of a "study" just a rhetorical device, a glammed-up appeal to authority?

The whole point of careful statistics and well-designed experiments is so that we can learn whether a premise is true or not. Without rigor, we prove nothing; this study for example, neither proves NOR disproves anything about TDD in a professional setting. For us professionals, it's noise. Yet it's sitting at the #1 spot on HN, fulfilling people's preconceptions.

Also, the replication crisis [1] would disagree with you.

[1] https://en.wikipedia.org/wiki/Replication_crisis


> The whole point of careful statistics and well-designed experiments is so that we can learn whether a premise is true or not.

That's where you're wrong... even with "careful statistics", the point of sample-based studies is to disprove the null hypothesis with some confidence level. There is no requirement that it be 100% confident. In fact, with p<0.05, you might expect some small fraction of repeat experiments to show no significant effect.

(See, it's easy to get this wrong!)
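A quick simulation makes the point concrete (a rough sketch, not from the thread; the 1.73 threshold approximates the one-tailed 5% critical value for these degrees of freedom): even when a real effect exists, an underpowered study will often fail to reach significance.

    import random
    from statistics import mean, stdev

    def one_study(n=10, effect=0.5):
        # Two groups of n subjects; group b genuinely outperforms by `effect`.
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(effect, 1) for _ in range(n)]
        t = (mean(b) - mean(a)) / ((stdev(a)**2 / n + stdev(b)**2 / n) ** 0.5)
        return t > 1.73  # rough one-tailed critical value for ~18 df

    hits = sum(one_study() for _ in range(10_000))
    print(f"significant in {hits / 100:.1f}% of repeat studies")  # far below 100%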


Although I believe that you're technically correct, your objection does not refute jdlshore's central point.


Sure it does. In the social sciences (which includes questions about productivity a la TDD), there is no "proof" or "truth" in the mathematical sense. Results are based on statistical relevance subject to the sampling (and their biases).

This study is meaningful in that it provides some limited evidence. It's fine to question biases and confounding factors... but that doesn't change the relevance of their results, merely the scope. In this case, what the researchers actually found: "At <X> confidence interval, TDD doesn't work for white, male graduate students at <Y> University working on <Z> problem. Generalize at your own peril." But that's a shitty headline.


> Sure it does. In the social sciences (which includes questions about productivity a la TDD), there is no "proof" or "truth" in the mathematical sense.

Neither jdlshore nor I were talking about "proof" or "truth" in the mathematical sense.

> Results are based on statistical relevance subject to the sampling (and their biases).

The point was that this study doesn't have sufficient statistical relevance to give any evidence whether TDD is effective in general. It doesn't matter if this study gives any evidence whether TDD is effective when used by graduate students working on toy problems, because that was not the intention of the study (besides, nobody cares about this highly specific case).


> [...] whether TDD is effective in general.

If you read the actual article (even just the abstract), you would realize that evaluating TDD in general professional settings was neither the goal nor the conclusion of the authors. You are arguing a straw man and hoping to make inferences that are unsupported by the paper's claims. The authors do a good job of scientific communication about their methods, results, and limitations. This is good practice for scientific communications. Don't think it's sufficiently generalizable? Fine, feel free to expand upon their work. That's how science functions.

("truth", "prove", and "rigor" were verbatim, primary components of jdlshore's comment. The two former words have very specific scientific meaning.)


While I can see your point of view, perhaps that is an ideal and not a realistic view of how science actually works. Similarly, someone might have an ideal view of software development, but if they saw how it actually worked and read actual in-production code ... and I think that describes every profession.

Science does work pretty well; it predicts things with accuracy and reliability otherwise unknown by humanity (AFAIK). It's an interesting question: If not by hewing close to this ideal, how does it actually achieve results?


You're conflating the "hard" sciences (which rely on reproducible experiments to explore natural laws) with the "soft" sciences (which rely on studies to explore human behavior). The track record of the latter is significantly worse than that of the former. Software engineering, and by extension this study, falls firmly into the soft sciences territory.

You should be a lot more sceptical of supposed facts in soft sciences than in hard sciences. Not because of prejudice or arrogance, but because it's much harder to be reasonably certain about anything in a soft science than in a hard science.


I know the differences you are referring to. I mean all sciences, but really I mean that we need to hear from actual practitioners about how they really work.


Studies with small 'n' are feeder studies, exploring large amounts of problem space quickly. When patterns emerge, they can be retested or expanded upon.

Equating a somewhat flawed study to 'a persuasive essay' is wholly disingenuous. As is claiming n=20 to be 'tiny'. It's small, sure, but not ridiculously small.

This kind of study isn't a final nail in the coffin, but a data point to add to the discussion.


The use of students for this purpose reminds me of a story about a man who designs a flying machine.

After months of calculations and material research, he determines that only if he builds his machine out of birch will it be light enough to fly. According to his calculations, nothing else will work.

Days later, his best friend is visiting him in the hospital. He carries over a piece of the broken machine. With a puzzled look on his face, the friend asks, "Why did you build it out of pine when you knew you needed birch?"

"Because I didn't have any birch."


How short the internet's memory is.

Le mieux est l'ennemi du bien: the best is the enemy of the good. It was right here not a couple of days ago.

Sheesh why do I bother?


the point is that the study appears flawed, so much so that it is not even good, let alone perfect.


I'm not sure it's pure noise for professionals. These students eventually turn into professionals so there's probably some relationship (at least that would be my hypothesis).

Regarding the replication crisis... this is actually a replication, and there's more replication for TDD than for most topics I read about.


Again, as a convenience sample with a tiny sample size, there's no conclusion we can even draw about students, let alone professionals.


There are really two types of studies: exploration and validation. Exploratory studies on small samples are still useful for refining hypotheses and figuring out whether they deserve more resources, but shouldn't be seen as significant evidence towards a conclusion–they are generally _less_ valid than opinions from people experienced in the field.

In contrast, validating studies _should_ have large enough sample sizes to be statistically significant, precommit to sharing results in order to avoid publication bias, have well-validated experimental designs, etc.

The problem is that these two are often conflated, so most people either blindly trust studies or put no faith in them whatsoever.


Well, if we are discussing the allocation of resources, I for one would much prefer fewer, more rigorous studies than a large volume of studies with mediocre methodology and sample sizes. Such studies carry little more weight than one's own intuitions.

I do feel that small studies like this one serve a very important purpose: to let scientists hone their experiments. Most any large study should first be attempted as a small study to work out any kinks.


Nobody needs an army of test subjects - just statistically valid numbers. But even with 20 test subjects the study fails at trying to make that 20 representative of any meaningful population.


Out of curiosity, how many subjects would you say are necessary to start being statistically valid?


Kind of the wrong question. It's more about qualitative sample selection and the expected effect size than it is the number of subjects.

Based on my admittedly poor understanding, I would expect to see a sample size of around 70 to have the power to detect a small effect in a one-tailed t-test. I would also expect to have the samples selected and balanced carefully, preferably from the target group (i.e. experienced professional programmers in the language at hand).

From what I can tell, for an effect to be detected at least 80% of the time in a sample of 20 people, you'd need the effect to be >60%. I would definitely not expect that a short term project would show 60% improvement when employing similar levels of testing, changing only either before or after writing the code under test.
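For what it's worth, figures like these are easy to sanity-check. Here's a sketch using statsmodels' power calculations for an independent two-sample t-test (the d = 0.4 "smallish effect" is my assumption, not the parent's exact number):

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Smallest standardized effect (Cohen's d) detectable 80% of the time
    # with 10 subjects per group, one-tailed, alpha = 0.05:
    d = analysis.solve_power(nobs1=10, alpha=0.05, power=0.8,
                             alternative='larger')
    print(f"detectable effect at n=10 per group: d = {d:.2f}")  # roughly 1.2

    # Subjects per group needed to detect a smallish effect (d = 0.4):
    n = analysis.solve_power(effect_size=0.4, alpha=0.05, power=0.8,
                             alternative='larger')
    print(f"n per group for d = 0.4: {n:.0f}")  # roughly 78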


Especially not since the major benefit of TDD hits much later in the cycle, when refactoring a large chunk of code or doing major surgery on the whole project (and let's hope the tests are at the interface level).


The conversation around TDD is tired, and the conclusion is always the same: "it depends." It depends on the person writing the code, the type of problem they're tackling, the language they're using, the needs of the business, etc.

This study doesn't bring anything new to the table except: "in this manufactured environment we found a single point of data that equates to noise."

I'm guessing the only reason the story was upvoted at all in the first place is because some people who agree with the title clicked the up arrow without looking at the article.


> the conclusion is always the same: "it depends."

Er, no. The studies I've read all end up showing that principled testing helps, but test-first and TDD (strict red/green cycle, code only enough to pass the new test, etc) provide no additional benefit over anything else that gets the tests written.

The "it depends" always comes from the echo chamber trying to justify their desire to believe that TDD isn't completely useless. It actually feels quite similar to the claims I've seen from practitioners that reikei, faith healing, etc aren't complete bunk.


> provide no additional benefit over anything else that gets the tests written.

Isn't this enough though? The tests gets written, which probably was the main point in the first place?


You can also add a coverage tool to your CI and get the same result (tests get written) without any of the ideology (TDD fairies sprinkle unicorn dust everywhere).


That assumes you are disciplined enough to act on the result.

Which is what most of these mechanisms are about: finding what causes sufficient friction to get the tests written.

(I'm making this comment because I know of a team with a coverage tool tied into CI where the coverage has been ignored for years)


Coverage tools are not sufficient for getting good test coverage. One can easily make code 'covered' without having proper tests for them.

By writing tests early, you make sure the code you are testing is testable and your knowledge about the code is fresh.


> By writing tests early, you make sure the code you are testing is testable and your knowledge about the code is fresh.

IME you also end up writing tests which are far too tied to the implementation. (With the resulting churn that that implies when the implementation changes.)

You get far more mileage from QuickCheck-type tests IME. Granted, not everything is very amenable to testing using QC-type tests, but a lot of stuff is.
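For readers unfamiliar with the style, here's a minimal property-based test in the QuickCheck spirit, sketched with Python's Hypothesis library (the run-length codec is a made-up example): instead of asserting one hand-picked output, it asserts an invariant over many generated inputs.

    from hypothesis import given, strategies as st

    def rle_encode(s: str) -> list[tuple[str, int]]:
        out: list[tuple[str, int]] = []
        for ch in s:
            if out and out[-1][0] == ch:
                out[-1] = (ch, out[-1][1] + 1)
            else:
                out.append((ch, 1))
        return out

    def rle_decode(pairs: list[tuple[str, int]]) -> str:
        return "".join(ch * n for ch, n in pairs)

    @given(st.text())
    def test_decode_inverts_encode(s: str):
        # The property: decoding any encoding round-trips the input,
        # checked against hundreds of generated strings per run.
        assert rle_decode(rle_encode(s)) == s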


I've heard that tests written specifically to drive up coverage metrics aren't much better than no tests.


I really don't believe any study or meta-study could come close to being able to suss out the nuance of when TDD may provide an advantage and when it doesn't.

I'd rather just trust programmers to consider what approach works best for their problem and mindset and go from there.

I personally don't TDD most things, but it's a tool I have available and I bring it out when a situation arises.


Well, with TDD you waste time writing red tests first, then tests that "just pass", and finally making the thing as it is supposed to be.

I find it surprising that that's not slower than test-last (though if you really leave testing for last, then maybe you'll need some time to fit your functions to your tests).


I think we could, but it'd be very expensive (multiple large and careful studies) and only possibly worth it.

> I'd rather just trust programmers to consider what approach works best for their problem and mindset and go from there.

People are often surprisingly good at fooling themselves. I'd rather have actual empirical validation. And also a pony.


You know, this study and others don't talk about long-term maintainability or breakability.

In fact, isn't this a good thing? If it has little impact on development time, we should totally be doing it. That way, we have tests. We have them from the start. We meet the conditions we wanted instead of guessing conditions at the end.


> the great majority of HN discussions start with a comment like this one: It's seriously flawed, etc.

That's easily the best part of HN and reddit. A great help with the Murray Gell-Mann Amnesia effect.

I don't think it's that HN is negative (though HN can certainly be negative), I think most articles really are just kinda crummy. Could be oversimplified, probably overhyped, maybe some wildly skewed sense of "providing both sides", or one of a million other possible problems. The comments will tell you about these problems, and that makes them a better barometer of "is this worth it?" than the title or even the content.


If it is unfair to say in response to the parent, then you should have waited for a better time to write this comment instead of just venting at the first comment you saw like it.

This study has the potential to be misused by people to push senseless arguments, and it could cause a lot of headaches for people in the industry. It's hard enough as it is for a consultant to get people to pay for testable software.

That line about your professor is cute, but the issue is that the media gobbles up studies and reports on them no matter the quality, leaving large chunks of the population misinformed. So publishing an inaccurate study is frankly irresponsible.


Scientific methods were invented to minimise the flaws in scientific work. If you ignore those methods and base your study on biased data, you had better forget about calling your results scientific. Fortunately, there are folks on HN to remind you of this.


We live in a world with too much information. There are probably more words written everyday than we can read in a lifetime.

The sooner you can remove worthless information the better.


> In a graduate-level engineering class, the students were making similar statements about all the studies we read

Yeah, that's because they are all terrible.


Well, he did find value in the work, just not as much as the hyped title would have you believe, and I'm perfectly OK with that.

The fact that it matches my experience (that TDD has benefits but only for projects of a certain size with programmers of a certain minimum experience level) helps with that, but it is always good to be reminded of the importance of sample size and other priors related to a study that makes a very bold claim.


And the words of that professor don't fix anything, really. I absolutely, 101% agree with the comment you are replying to.


Exactly this; I find too many HN comments to be critical in a non-constructive manner.


The criticism was very constructive, though: increase the sample size, put it in a more realistic setting.


How do you propose to gather more experienced, professional developers into the same location and get them to work on a topic that isn't making them tons of money? They can't be left to do the problems in their own workplace, or the next criticism will be "uncontrolled variables!". They also have to be vetted for minimum skills (there are plenty of experienced, professional devs out there who aren't worth a second look). The parent also wants more complex tasks done.

So... where is the money coming from? Who is going to pay for this multitude of professional programmers to converge to the same environment, be vetted, and spend a non-trivial amount of time coding the same thing as the others in the group?

Of course the researchers in the article would have loved to have those kind of resources and do the perfect, wide-ranging, deeply detailed study, but the OP's criticisms just show how divorced the OP is from experimenting with real-world humans in real-world situations, and with real-world resources.


This may sound harsh, but taking the researcher's difficulties into account is not our responsibility.

The research presented here is weak. Honestly pointing that out without pulling punches is better than simply giving them a pass because 'doing good research is hard'.


Where did I say 'simply give them a pass'? This idea that research is either a polarised "ideal" or "trash" is moronic. Taking the nature of any study into account is part of science, and part of how you caveat the knowledge gained from that study.


I completely agree with you. Pointing out a study's flaws is indeed part of the process of 'caveat[ing] the knowledge gained from that study'.


You could start by finding a corporate sponsor with a lot of developers and a vested interest in finding out which methods will be best for them. The advantage of that being that you could even test it on a real project (the sponsor would "just" need to be willing to dedicate twice the number of developers to a suitably small project).

It'd still not be easy, and of course there'd still be issues (e.g. is there anything about the corporate culture or training in that company that would affect the result?), but it'd still be far better than a bunch of students and toy problems.


Corporate places do research like that all the time; they just don't publish them all that often. A corporation with a vested interest is a corporation with a competitive interest.


Maybe we have to accept that it's untestable.

Though I think we could derive some more realistic scenarios, like evolving requirements and switching developers mid project that would be more enlightening.


Whereas the cost of finding the Higgs Boson was only a mere $13.25Bn, cheap compared to all those pesky extortionate SV engineers...

As others have said, good research costs money...


We can't arrange a good study, so let's churn out deeply flawed ones instead.


> How do you propose to gather more experienced, professional developers into the same location and get them to work on a topic that isn't making them tons of money?

Hackathons seem to manage. Why not set one up?


Hackathons wouldn't satisfy the OP's requirements for complexity, nor demographics - you'd be looking at a self-selecting group of highly motivated people, skewing young, who would come together hackathon style. There aren't going to be many thirty- or forty-something coders with young families spending a weekend (to work on the same set problems as everyone else) at the hackathon, yet there are plenty of those in industry.


Indeed. It is predictable and--as a discussion point--stifling.

It's far more interesting to discuss the actual merits than to dismiss the subject out of hand.


Promotion of schemes akin to TDD tends to build its foundations on even smaller sample sizes, usually drawn from those close in some fashion to the new wheel's reinventor: the naïve solving textbook-example problems.

But maybe I missed something.


You didn't.

The top comment is from the writer of a book on TDD, who is shocked and downright appalled.

TDD provides absolutely no value over TLD.


The study is hugely, HUGELY flawed. TDD only shows its value when you start having to refactor extremely large projects that you don't understand. You need the tests in order to refactor with confidence.


You can have tests without TDD?

TDD is a process where you write empty shims for your code and tests for it first, failing because there is no implementation, and then you write your code that passes the tests.

Frankly, I find this style to be the complete opposite of how I code: getting something working ASAP, plugging it into the big picture, and then figuring out the problems with my approach and designing with the insight gained. Then I spec out the behavior with tests. TDD assumes you have the design/spec right from the start and all you need to do is write the implementation; very little of my work falls into that category. Perhaps it's different for others.

One area where I found TDD useful is writing story-level E2E tests that spec out the requirements before the code is written; this is the least ambiguous way to spec the problem I've found. The downside is that the person doing the spec needs to know how to write E2E tests.


> TDD is a process where you write empty shims for your code and tests for it first, failing because there is no implementation, and then you write your code that passes the tests.

This is not how TDD works, and I can definitely see how writing code in that manner wouldn't be very productive at all. I also used to think that was how TDD was done until I read how it's really supposed to work by experts.

You iteratively build up code in a small, tight cycle. You do not write 100% of your tests in one go, and you do not write 100% of the code in one go. You are also missing a step: TDD is a three-step process, red-green-refactor, and the missing last step is refactoring. You start off with a small atomic behaviour you need in your API/product. You then write a test for that behaviour. The behaviour is not tied to a particular class or method; perhaps it is one class by coincidence, but maybe it's three or four classes together. A good example would be a method that deals with multiple items: I would start off by constructing my test to use one item, and the code to deal with one item. Then the next iteration might be a list of one item in the test and the code. After that, maybe I'd move to multiple items in a list.
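As a rough illustration of those increments (a hypothetical total() function, not anything from the thread), the tests and code might grow together like this, each test written first:

    # State after three red-green-refactor cycles; each test below was
    # written first (red), then just enough code was added to pass (green).
    def total(items):
        return sum(item["price"] for item in items)

    # Iteration 1: a single item.
    def test_single_item():
        assert total([{"price": 5}]) == 5

    # Iteration 2: the empty list, forcing iteration over a collection.
    def test_no_items():
        assert total([]) == 0

    # Iteration 3: multiple items together.
    def test_multiple_items():
        assert total([{"price": 5}, {"price": 7}]) == 12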


>You can have tests without TDD?

I've worked on several projects without this and the tests done without TDD tend to be of higher quality.

I noticed a common anti-pattern of "write the code, run the code, copy the output of the code, paste it into a test, and write an assert to check that the output was precisely what came out".

This was brittle, it killed the self-documenting aspect of the tests, and it often concealed various bugs.
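A contrived sketch of why that conceals bugs (the formatter and its zero-padding bug are hypothetical, just for illustration):

    def format_price(cents: int) -> str:
        # First implementation, with a zero-padding bug: 5 -> "$0.5".
        return f"${cents // 100}.{cents % 100}"

    # The anti-pattern: run the code, paste its output back in as the
    # expectation. The bug is now enshrined, and the test passes.
    def test_format_price_pasted_output():
        assert format_price(5) == "$0.5"

    # A test written first, from the requirement rather than the output,
    # would have failed against the buggy code and exposed it.
    def test_format_price_pads_cents():
        assert format_price(5) == "$0.05"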

As soon as the same team members started writing the test first, it stopped happening.

I also personally felt better about doing this since it made it easier to correct API design mistakes before implementing the code and baking them in.


>I also personally felt better about doing this since it made it easier to correct API design mistakes before implementing the code and baking them in.

How do you know you made API mistakes from writing unit tests? API mistakes become apparent when you integrate stuff and use the API in conjunction with other things (using the API in isolated scenarios like unit tests is not really insightful; you can figure out those flaws just by looking at the API). This is my primary criticism of TDD: you write tests around the initial bad API, and you end up keeping that bad design because you've already spent so much time testing it. Implementation errors caught by unit tests are cheaper to fix afterwards than design errors.

My approach is: hack together a POC -> refactor it and write tests. If you discard the POC, then TDD makes sense to me.


I usually don't do unit test driven development actually. I typically do integration test driven development where the integration test sets up an environment where the code is either interacting with the real thing or a realistic mock version.

I don't find unit tests all that useful for integration code - either as tests that find bugs or for doing TDD.


> I've worked on several projects without this and the tests done without TDD tend to be of higher quality.

Do you mean "with TDD", by any chance?

The rest of your post says that you observed better results when tests were written before the code (rather than later). But as far as I understand, writing the tests before the code is indeed one of TDD's commandments.


Yes. Dumb mistake.


Could it be that TDD vs. tests-after-code is a highly personal thing? I personally find it easier to write good tests after I've coded something functional. Beforehand, I know one or two fuzzy ideas of what I want to accomplish, but I can't list out the concrete, real-world test scenarios until after I've coded something, poked and prodded it, etc.

But I know some people are wired differently; they'll think a lot more about scenarios first, then code after they have everything accounted for. For them, TDD as a philosophy seems more fitting.

I think the chasm exists between _untested_ code and code that has tests. I've never understood the seemingly-religious zealotry behind TDD as an XP practice. Just like pair programming... if it works for you and your coding style, awesome. But don't force it down my throat or act like it's the One True Path to clean code.


TDD vs. test-after-code is a small distinction.

80% of software development is designing correct abstractions/interfaces/APIs. If you have the correct abstractions, everything else is easy by comparison. And both tests and code are fundamentally founded on these early design decisions.

So whether I do TDD or tests-after-code, I'm confronted with the 80% first: designing the interfaces (either in writing or mentally).

Naturally, I never get this part right at first, and so I wind up refactoring a lot as I go along. ("Build one to throw away", I believe this has been called.)

Now in my experience, TDD requires me to refactor more during this process than tests-after-code. But suit yourself. Changing the order within the last 20% won't be earth-shattering one way or the other.


This is why designing the interface in a language with a powerful type system provides the same quoted benefit of TDD. It helps you think about the interface before you move to implementation.

In my experience, using a type system to do this requires much less effort and refactoring.
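As a sketch of what that looks like (using Python's typing.Protocol as a stand-in; an ML- or Haskell-style type system gives stronger guarantees, and all names here are hypothetical):

    from typing import Protocol

    class PriceSource(Protocol):
        """The interface is designed first, before any implementation exists."""
        def price_in_cents(self, sku: str) -> int: ...

    def total_in_cents(source: PriceSource, skus: list[str]) -> int:
        # Written against the interface alone; any conforming object
        # (real service, in-memory fake for tests) plugs in unchanged.
        return sum(source.price_in_cents(sku) for sku in skus)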


> This is why designing the interface in a language with a powerful type system provides the same quoted benefit of TDD.

Yes. TDD came from developers who worked in weakly/dynamically typed languages.

There it is mandatory for discovering errors that are otherwise caught instantly, for free, by the compiler in more strongly typed languages.

Strangely, the TDD folks never outline this point.


I am a big fan of "testing via types".

Unit tests can demonstrate that my code is correct; they can say nothing about how my code is used by others.

In contrast, a type system can extend those protections downstream.

(Yes, yes, unless you go way off the deep end of dependent types, you still need tests. But a powerful type system can systemically prevent a huge number of very common bugs.)


My take on it is that it can remove most of the boring bugs you would get in a less strongly-typed language, leaving you to deal with the interesting bugs (e.g., logic bugs).

As for dependent types, I'm not sure that even then you would be able to ensure that, say, your cache is correctly invalidated.


Additionally, strict TDD teaches you lessons. Once you've used it enough, chances are that you'll be making better interfaces even if you code first.


I've found that implementing tests as I go along writing a new API helps code quality and speed in two ways: I catch my own logic bugs early, and the self-feedback I get from writing the tests helps me write a more usable interface.

I never catch all the design errors with non-coding design work - that gives me the high-level view, which is critical - but there are always some logical errors and things I've failed to model correctly which early tests catch.

But yeah, the optimal design-implement-test order probably depends very much on the implementer and on the problem to be solved.

The third meta-advantage is that if I'm writing hobby code for myself and the project is large compared to the weekly time available in my spare time, I get something to help me remember where I was (say, a month back) and something to compile and run immediately (which is a huge motivational booster for me).


> 80% of software development is designing correct abstractions/interfaces/APIs

Not sure if the following claim is wrong or not, but I will state it anyway. If you write in TDD style and you decide to change the API/abstraction a little because of something you didn't think of, you have to correct both the tests and the code. When you do test-after-code, it's much easier (IMO) to hit correct abstractions and interfaces, because you are working on a living thing, not some artificial test cases you have to figure out. Then you write tests to ensure those abstractions are respected and in working order.

TDD people, please correct me if I'm wrong.

EDIT: disclaimer: I'm a front-end developer working mostly in React and Redux


> if it works for you and your coding style, awesome. But don't force it down my throat or act like it's the One True Path to clean code.

This applies to so many things in Software Development - text-editors, variable names, plugins, architectures, operating systems...


...life


Well, things start to get complicated and messy when the team size starts to grow and the technical debt starts to mount. That's when it needs to go from hacking to engineering.


nope - vim


Ed.

Edit: ED IS THE STANDARD TEXT EDITOR! [0]

[0] https://www.gnu.org/fun/jokes/ed-msg.html


cat.


butterflies


Ahh... good old M-x butterfly


For me it's a highly situation-dependent thing.

Sometimes writing tests first helps me think about the high-level design of my code. Other times I've got to try a few things before I have any idea what the code should look like, and writing tests first would just create a lot of extra code churn. I feel that, as time goes on, I'm getting better at anticipating which situation I'm in. I also switch back and forth between the two testing methods even more rapidly.


> Sometimes writing tests first helps me think about the high-level design of my code.

I'm personally a fan of "readme driven development" (https://news.ycombinator.com/item?id=1627246). It's a nice compromise between tons of upfront planning and nothing at all.


That's an interesting discussion and article. I already sort of do this for new features. You really feel the power of it when, after writing a lot of the document, you realise there's a hitch, rewrite from an earlier point, and shudder at the thought of the mess and wasted time if the code had already been written. In its final state the document becomes a series of steps that I can check off, which is helpful in a motivational sense.

I had thought of involving users in the design of new features via the support forum but I'm afraid of how that might turn out. A readme type document would probably be about as good a way of doing it as any though.


Yes.

When doing UI code (with something like React) it's next to impossible to do TDD - you can't know the structure until you actually implement it.

If I'm fixing a bug in some sort of non-ui code, I often find TDD to be very helpful. Create a test case that demonstrates that failure, then fix the implementation to make the test pass.
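
Sketch of that workflow (TypeScript, hypothetical bug): first pin the reported failure down in a test, watch it fail against the buggy version, then fix.

    import assert from "node:assert";

    // Bug report (hypothetical): slugify("  Hello  World  ") returns "hello--world".
    function slugify(title: string): string {
      // Buggy version: title.trim().toLowerCase().replace(/ /g, "-")
      // Fix: collapse runs of whitespace instead of replacing each space.
      return title.trim().toLowerCase().replace(/\s+/g, "-");
    }

    // This assertion fails against the buggy version and passes after the fix,
    // then stays around as a regression test.
    assert.strictEqual(slugify("  Hello  World  "), "hello-world");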


I'm not sure that testing a bug, even before writing the code that fixes it, qualifies as TDD.


I find the same thing. It's almost like writing an essay, I need to sketch out an outline first, which is often getting just far enough to convince myself that the approach is reasonable and there aren't any surprises ahead. Once it's time to crank out the details of individual sections, the work often starts to feel monotonous and that's where TDD can be invaluable for helping me to stay focused and comprehensively work through all the edge cases.


When I first learned about TDD I thought the same (order of test and implementation does not matter that much). Yet when you do the TDD steps

* write a failing test (RED)

* implement minimal functionality so all tests pass (GREEN)

* REFACTOR

and overcome the initial revulsion, you will get a ton of benefits that you didn't know existed (a minimal sketch of one full cycle appears at the end of this comment). I'll first give a few that are more on the code quality side:

* You have a complete test suite, without redundant tests.

* Tests are usually not that mock-focused [*], since writing the implementation after the test didn't introduce random roadblocks for testing, and the tests assert the relevant properties (test-later code bases often don't assert much, or assert properties you don't care about).

* When you write your test first, you basically write a small example of how it feels to use the API you have in mind for solving your problem. This drastically improves quality.

Some non-technical benefits of TDD

* I find TDD helps me tackle bigger pieces of code where I just don't know how to start implementing.

* TDD is programming gamification. You are rewarded with regular, small successes of having added another test and made it green. So whenever you ask yourself "OMG, have I made any progress in the last 3 days?", you can actually tell: yes, I added 15 tests and made them pass - I objectively got closer to implementing that feature.

* Not an issue where I work, but I have often heard about this from other places: if the tests are there first, no one can tell you not to spend so much time writing tests. TDD-style is protecting your boss from shipping that car without having the brakes checked.

* Along the same lines: TDD gives explicit room for refactoring.

[*] Obviously one cannot avoid mocking entirely, especially when dealing with resources external to your code base, yet I have rarely seen TDD-originated tests that over-mock.
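
Here is the promised sketch of one full cycle (TypeScript, toy problem), with the earlier RED/GREEN steps shown as comments since they can't all coexist in one file:

    import assert from "node:assert";

    // RED: write a failing test first. Before fizzbuzz() exists, this
    // doesn't even compile - the most basic failure there is.
    // assert.strictEqual(fizzbuzz(3), "Fizz");

    // GREEN: the minimal thing that passes is `return "Fizz";`.
    // Next RED: assert.strictEqual(fizzbuzz(5), "Buzz") breaks the fake.
    // GREEN again, and REFACTOR once the duplication is obvious:
    function fizzbuzz(n: number): string {
      if (n % 15 === 0) return "FizzBuzz";
      if (n % 3 === 0) return "Fizz";
      if (n % 5 === 0) return "Buzz";
      return String(n);
    }

    // The accumulated test suite, kept green at every step:
    assert.strictEqual(fizzbuzz(3), "Fizz");
    assert.strictEqual(fizzbuzz(5), "Buzz");
    assert.strictEqual(fizzbuzz(15), "FizzBuzz");
    assert.strictEqual(fizzbuzz(7), "7");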


> I find TDD helps me tackle bigger code-pieces where I just don't know how to start implementing.

This made me think. There are many axes of software development, and aside from the TDD/non-TDD axis, there's also the bottom-up vs. top-down axis. By top-down, I mean you start with a skeleton that provides an interface to the outside world, and then fill in the implementation details. Bottom-up, on the other hand, means you first write some routine that you know will be vital in some form or another, but without immediately seeing how it will be hooked into the interface to the outside world.

Top-down has the advantage that you can have something runnable quickly, and it seems to me that TDD requires top-down development.

But somehow, there have been times when I was working on a piece of code and it just flowed more naturally in a bottom-up manner. It has happened that I went for a day (and very rarely longer) working on tricky algorithmic code before I even tried to compile it for the first time.

This is definitely something that I try to avoid when I can, it's just that sometimes it works out that way. And it is totally incompatible with the TDD way of doing things.

How often this happens surely depends on the type of project you're working on. As usual with these things, it seems like TDD can be a useful inspiration, but it shouldn't be taken as dogma.


This reminds me of https://twitter.com/marick/status/787402452848873472

I don't think writing tests first makes you work top-down or bottom-up.

If you want to work bottom-up (I like to work bottom-up), you just start by writing tests for your bottom-up functionality.

And of course I don't want to impose TDD on others, but I would like others to try it out sincerely (I recommend Kent Beck's book for this; don't do it without guidance - many people who reject TDD after trying it out were doing some kind of test-first method but didn't get the benefit, because they were not aware of things like getting one test at a time to completion, etc.)

---------

> shouldn't be taken as dogma.

Life in software engineering becomes much more relaxed when you realize that every piece of advice and technology is in fact based on dogmas and is not based on sound empirical, statistically relevant evidence :-) (apart from the pure math part).

It is just really, really hard to study software quality in an objective way that satisfies scientific criteria, so we are kind of stuck with hearsay in IT.

So TDD is kind of dogmatic. So is OOP, or the rejection of gotos.


The "lightbulb" moment for me was doing what Kent Beck suggests, and writing down the list of tests I "knew" I needed to write, before writing any tests. That's a very similar step to diving into writing the code first, "knowing" what it ought to look like, and the ideas feeding into either step would be the same.

Writing a list of tests first is quicker, though, and working through the list, updating it as I learn more, tells me that my initial ideas were wrong often enough that I'm almost convinced that the true secret of TDD isn't the tests at all. It's that checklist, and the purely mechanical process of working through it, checking them off one-by-one and fixing it up as I go along.

To relate this to your situation, I often find that I'll only get started with a fairly fuzzy idea, and the checklist will be short. I can still expand it as I go, knowing that I'm working on as solid foundations as I can given the constraints.
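
The checklist doesn't need tooling, either; a sketch (TypeScript, hypothetical function under test) where unwritten entries stay visible as TODOs instead of living in my head:

    import assert from "node:assert";

    // Hypothetical unit under test.
    function parseTags(input: string): string[] {
      return input.split(",").map(t => t.trim()).filter(t => t.length > 0);
    }

    // The test list: entries without run() are still TODO.
    const checklist: { name: string; run?: () => void }[] = [
      { name: "empty string gives no tags",
        run: () => assert.deepStrictEqual(parseTags(""), []) },
      { name: "whitespace around tags is trimmed",
        run: () => assert.deepStrictEqual(parseTags(" a , b "), ["a", "b"]) },
      { name: "duplicate tags are collapsed" }, // added while writing the one above
    ];

    for (const item of checklist) {
      if (!item.run) { console.log(`TODO ${item.name}`); continue; }
      item.run();
      console.log(`PASS ${item.name}`);
    }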

I think there's a danger in writing off practices like TDD with "if it works for you..." because it avoids discussion of why it works for some people and not others. That in turn gives people license to ignore it simply because it's unfamiliar, and the brain doesn't like unfamiliar things by default. That's a shame, because for some of those people they'll be stuck in local optima, writing off the one thing which could break them out of it.


> But don't force it down my throat or act like it's the One True Path to clean code.

When I was a programming beginner and started learning about TDD, I was always thinking about how soon to start using it, because everyone says it's such a good idea. If you searched for anything about it, you'd rarely find a resource not marvelling at its surprising benefits (at least, that is how it was a few years back). Granted, it may have its merits, but is it really a good idea for everything? No way!

One of the anti-patterns I have seen in the programming world is how everyone jumps on adopting / advocating a 'good' practice without understanding how it fits the project. I wrote about this a while back[1]. Cue git flow, TDD, not using goto, DOM parsers instead of regex.

[1]: https://shubhamjain.co/2015/10/26/imperfect-best-practices/


Reminds me of a comment by Harry Roberts on CSS methodologies https://twitter.com/csswizardry/status/539726989159301121

    Modularity, DRY, SRP, etc. is never a goal, *it’s a trait*.
    [...] understand that they’re approaches and not achievements.


> I think the chasm exists between _untested_ code and code that has tests.

IMHO untested code isn't always a bad thing. Anything related to privacy, security, or data integrity should be heavily tested.

But beyond that, code should need to earn its tests. (At least in a web startup context.) After all, the core of agility is being able and willing to go in a different direction when something isn't working.


I think there is a lot of nuance to this statement. After a few years on the startup scene, I believe that if you are using a duck-typed language, your core concepts should be tested without exception (for example, Rails and Django models). The farther you get away from these core concepts, the less your tests need to be there and the more likely that code is to change.

In addition, if you are truly in the earliest stages of your startup (pre-revenue/traction), the tests are not needed -- but you have to be _very_ aware that you are making a tradeoff for sheer velocity, and you will have to pay the price later on. I've seen too many startups fail to pay that price, and it comes back to haunt them and reduces their overall velocity.


If your code doesn't have tests then you can't refactor, because you don't know if you broke something. Getting the right level of tests is important: they should check the interface and not the implementation; otherwise, I agree, you can't refactor anything. But if you test your interfaces the same way that real code would use them, tests actually make you more agile, because they give you the peace of mind to make major changes.


"Tests can make refactoring much easier" is a statement I could easily buy into.

"If your code doesn't have tests then you can't refactor" is pure dogma and trivial to falsify (go on, refactor some untested code now!). More likely to introduce bugs? Perhaps. But relying on that absolutist statement about refactoring will alienate me every time.


Let me clarify: you can't refactor without a very awake QA team, because you have no idea what your changes may have broken.


And a passing test is a guarantee for a working feature across an entire platform. Every. Time.


>Could it be that TDD vs. tests-after-code is a highly personal thing? I personally find it easier to write good tests after I've coded something functional. Beforehand, I know one or two fuzzy ideas of what I want to accomplish, but I can't list out the concrete, real-world test scenarios until after I've coded something, poked and prodded it, etc.

I generally find it better to engage in "top-down" development - specification -> tests -> high-level code -> low-level code - as it's cheaper to fix mistakes at the higher levels first.

If I'm doing purely experimental code though - where I'm not sure of what I even want or what the output should be, tests are just a waste of time.

It also loses its effectiveness the more declarative the code is. If I'm essentially tweaking configuration or HTML there's no point.


> they'll think about scenarios first, then code after they have everything accounted for.

I'm one of those people. But I find it hard to believe that a test suite can reflect everything I understand about a problem domain. Focusing too much on unit tests makes it difficult to find the general rules that govern the behavior of a system. Have too few unit tests, and they won't capture all the subtleties and corner cases. Have too many unit tests, and they won't fit in your head.


To me, it depends on the problem. With some problems, I'm not sure yet how to implement it, but I do already know what the result should look like. That's an excellent case for TDD. You get to work on the problem, refine your ideas about it, and you start with the part you already know: the results.

If you have no idea what the results should look like, then TDD is pointless.


For me it also depends on the task - I never do strict TDD - but often I like to start with some tests.

Sometimes it is hard to write the tests, but it is easy to see if your code works - sometimes it is easy to write the tests, but not the code.

I would call it 'do first what is easier'.

But it is important to write the tests soon after you write the piece of code, so that you don't forget something.


My experience has been that TDD is worthwhile when working with notoriously slippery whack-a-mole functions like handling time or money. The time saved by catching regressions vastly outweighs the time taken to implement the tests.

In contrast, TDD has been a waste of time for me for UI-based work, as the effort needed to properly expose the functionality under test is too great and the requirements and design change too quickly to be worth it.

In the latter case, writing some deterministic UI tests against mock data after the requirements and implementation have settled has proven much more effective in preventing regressions.
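
To make the money case concrete, a sketch (TypeScript, hypothetical function): allocation code like this is exactly the whack-a-mole kind, and the tests pin down the cases that regress when someone later "simplifies" the rounding.

    import assert from "node:assert";

    // Split an amount in cents across n installments without losing a cent;
    // earlier installments absorb the leftover cents.
    function splitCents(totalCents: number, parts: number): number[] {
      const base = Math.floor(totalCents / parts);
      const remainder = totalCents % parts;
      return Array.from({ length: parts }, (_, i) => base + (i < remainder ? 1 : 0));
    }

    assert.deepStrictEqual(splitCents(100, 3), [34, 33, 33]);
    // The invariant that really matters: no cents created or destroyed.
    assert.strictEqual(splitCents(9999, 7).reduce((a, b) => a + b, 0), 9999);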


Exactly my experience - I've done TDD for 3 different types of calculation functions in 3 different languages over the years, and all were beneficial: financial risk calculations in C++, report data calculations in PHP and billing calculations in JavaScript.

On the other hand, TDD for operations that involve I/O (including user interactions) were not helpful at all.


That would probably be better handled by functional/e2e testing?


TDD doesn't assume you are using unit tests.


You are saying "TDD" but your comment reads the same if you just say "tests in general"


Agreed here. Test the stuff that needs testing.


That means you can't use coverage tools, so this approach makes the tests unquantifiable. It is not how most major dev shops work, because of the bean counters and the "senior" developers/leads/managers/... who seem to want nothing but to appease the bean counters.

It also produces superior software in my experience.


> That means you can't use coverage tools, so this approach makes the tests unquantifiable.

So it's win win then ;)


It's been very YMMV in mine. For some things, it's indispensable. Others it's tedious, repetitive and of questionable benefit.


It's tedious and repetitive even when the benefit is obvious and substantial, unfortunately.


Automatic test generation tools. There are quite a few on GitHub, but the real ones are the ones you write yourself. What you're looking to do is run the software a few times and record tests for particular pieces of code during those manual runs.
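
A hand-rolled sketch of the idea (TypeScript, names hypothetical): wrap the function you care about so that manually exercising the software prints ready-made regression assertions you can paste into a test file. Only sensible for JSON-serializable inputs and outputs.

    // Wraps a function so every call is echoed as an assertion.
    function recording<A extends unknown[], R>(name: string, fn: (...args: A) => R) {
      return (...args: A): R => {
        const result = fn(...args);
        console.error(
          `assert.deepStrictEqual(${name}(` +
          `${args.map(a => JSON.stringify(a)).join(", ")}), ${JSON.stringify(result)});`
        );
        return result;
      };
    }

    // Hypothetical function under observation.
    const normalize = recording("normalize", (s: string) => s.trim().toLowerCase());

    normalize("  Hello  ");
    // stderr: assert.deepStrictEqual(normalize("  Hello  "), "hello");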


Can you elaborate more? I'm a frontend guy, and sometimes I write JS plugins for the browser (that run after page load, and apply based on how things render on the page, or based on user interaction with the page). Non-frontend folks tell me I need tests for my code, or don't want to look at it until I have tests. How would I build a test for this, other than a functional test that runs in-browser, where the test is whether it works or not?


Here are two strategies. First, you can go "outside" what you've built, simulating a user's interaction and examining the resulting DOM. In general this is a tool-centric approach.

Another approach is to partition your code into what Gary Bernhardt calls an "imperative shell" of code that touches the outside world and a "functional core" that does not. Then unit test the functional core, which shouldn't require special testing libraries, and validate that the imperative shell works by the occasional manual or "outside" test.
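
A tiny sketch of that partition (TypeScript, hypothetical widget): the core computes what should happen as plain data and is trivially unit-testable; the shell is the only part that touches the DOM and gets the occasional manual or in-browser test.

    import assert from "node:assert";

    // Functional core: pure, no DOM, easy to test.
    type Banner = { visible: boolean; text: string };

    function bannerFor(cartItems: number): Banner {
      return cartItems > 0
        ? { visible: true, text: `${cartItems} item(s) in cart` }
        : { visible: false, text: "" };
    }

    assert.deepStrictEqual(bannerFor(0), { visible: false, text: "" });
    assert.deepStrictEqual(bannerFor(2), { visible: true, text: "2 item(s) in cart" });

    // Imperative shell: thin, and verified by "outside" tests instead.
    function renderBanner(el: HTMLElement, banner: Banner): void {
      el.style.display = banner.visible ? "block" : "none";
      el.textContent = banner.text;
    }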


First off, if your code doesn't have bugs, then that is preferable! In my experience with front-end web dev the issue has always been different (old) devices. Testing could be done with browser macros that simulate input (mouse movement, click on x, y, z, etc.), then take screenshots to see if something is off. You can also make your widgets throw errors (1) and call home via XMLHttpRequest if an error is detected.

1) http://www.webtigerteam.com/johan/en/blog/error_driven_devel...
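
The "call home" part can be as small as this sketch (TypeScript; the /errors endpoint is made up):

    // Report uncaught errors from the field to a collection endpoint.
    window.onerror = (message, source, lineno, colno, error) => {
      const report = JSON.stringify({
        message: String(message),
        source, lineno, colno,
        stack: error?.stack,
        userAgent: navigator.userAgent,
      });
      // sendBeacon survives page unloads; fall back to XHR on old devices.
      if (navigator.sendBeacon) {
        navigator.sendBeacon("/errors", report);
      } else {
        const xhr = new XMLHttpRequest();
        xhr.open("POST", "/errors", true);
        xhr.send(report);
      }
      return false; // let the default handler run too
    };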


The

  * fire your QA team

  * dev team is the level 2 production support, and

  * get to continuous integration nirvana
management fads have been sweeping through my Scrum enterprise for the last 18 months.

Teams that aren't testing constantly, well, they've got tons of escape defects on every release. And those devs are constantly in fire-fighting mode, it's miserable for them. And I see that leading to compressed schedules for them and more reckless behavior like asking to push their releases during the holidays where there could be severe financial consequences to bugs.

As far as I'm concerned, in an environment like mine, where developers can no longer hide their incompetence behind bureaucracy like a QA team, it is official insanity to not spend inordinate amounts of development time writing automated tests. You should be spending 70% of your dev time writing tests and doing devops and 30% writing features.

I read in these comments a lot of bellyaching about how much time it takes to write tests. First, TDD is a skill that you can get good at, and it won't take as much time as you think once you get good. Second, I just don't think you have a choice to not test comprehensively when escape defects become a mark of shame in the organization.


>As far as I'm concerned, in an environment like mine, where developers can no longer hide their incompetence behind bureaucracy like a QA team, it is official insanity to not spend inordinate amounts of development time writing automated tests. You should be spending 70% of your dev time writing tests and doing devops and 30% writing features.

Highly agreed. Anytime there has been QA available, the code base starts to look more hacky, fewer tests get written, and only the tests that cover the happy path get written. When things are looking bleak, that's when I start to think we should have certifications for developers (optional, of course, like the PMP or CAPM), there as a rough indicator that a dev won't skimp on QAing their own code.

>I read in these comments a lot of bellyaching about how much time it takes to write tests. First, TDD is a skill that you can get good at, and it won't take as much time as you think once you get good. Second, I just don't think you have a choice to not test comprehensively when escape defects become a mark of shame in the organization.

After years of looking at tests written poorly, I've come to the conclusion that you have to treat test code like your regular code and do code reviews on it and refactor the heck out of it.

If you don't, you'll end up in the situation I'm in, where our company has almost 4000 spec tests in Ruby/Rails that can take an hour to run on a local dev box and still take 20 minutes when parallelized across 5 boxes. Shaving off a couple of seconds here and there by removing unneeded tests, refactoring to simplify them, or sharing mock objects is a worthwhile endeavour, but because they're "just tests", the time to do that isn't allocated, and everyone on the team merrily continues to write and modify tests in a sloppy way.


The study is about the effects of "test first" vs "test later", not "writing tests" vs "not having tests".


Yeah, I tried to avoid specifying a methodology. I just couldn't resist throwing in a 'TDD' because it yields better-quality results for me personally than 'test later'. My point in commenting was: testing comprehensively is not optional at my workplace, given how suddenly accountable the devs have been made.

Personally, I don't care how you test. Just get decent coverage over all the control paths and have a test suite that catches regression bugs. Ideally also have integration tests to prove your connections to your partners are good and maybe a smoke test to prove your code is wired up right once it hits an environment.

I'm still on the journey myself, so I can't claim to be an expert. I know there are people in my org that are suffering because of code quality issues. I call them incompetent because they're producing shoddy work products with lots of defects. A smart way to combat bugs and kill them for good is to test, so that's why I connect not testing to incompetence. A dumb way to combat bugs is heroics and working 18 hour work days trying to debug with "GOT HERE" logging.


Potentially contentious opinion, but one that's been echoed through our halls -

Developing software and developing tests and test infrastructure are two different skills and mindsets that are often inversely coupled.

It's not that "incompetence" is revealed by firing the test team (although it sometimes is) - it's that "being bad at writing tests" is revealed. A team with 3 dedicated testers and 7 devs will probably outperform (in code output and reliability) 10 devs spending 70% of their time on testing.


I have experienced the opposite. After the dedicated QA team was disbanded and repurposed to development, where every developer had to write tests for someone else's code, the code and designs started to become more testable, and the tests eventually became simpler to maintain and understand. (The project was an embedded system.)

The "We" and "Them" distinctions made people in different teams and different roles ignore the needs of the others. The change brought a culture change and also gave a view on the needs of the other side. This could make it possible that for the first time the development of tests and features could really be done in parallel (officially that was the methodology before, but never really worked out well, because the testability considerations were usually ignored at design time, and the docs were lagging, because of the bad bandwagoning culture. With the mixed team where everyone was treated as an equal these problems dissolved surprisingly quickly (in less than half a year)).

So my point is having dedicated testers can give base for bad culture which hurts the product and the company. Having everyone do the same job with regards to development and testing is better.


There is no silver bullet here. For some teams it makes sense to integrate development and testing. For some teams it makes sense to have dedicated QA people. For some teams a different constellation is optimal. I know developers who are brilliant at the big picture but lacking at finishing the implementation. I also know developers who cannot get the major architecture right, but who can finish tasks and get things shipped. Pick one of each, plus a devops person who can write tests, and you have a team of 3 that produces top-quality software.


You are probably right with your example for a small project. The one I referred to was a medium-sized safety-critical system with a dev/test staff of hundreds. My example was for such a larger organization.

Small teams almost always worked out well for me if a single leader person was present to sort out initial problems.


I've actually found that most QA people I've worked with were good at deciding what to test but were awful at writing automated tests. Of course that's purely anecdotal and based on a very small sample size.

I like to believe that I sometimes forget to test for edge cases but have mastered how to write tests. I don't think knowing how to write tests comes easily to every decent programmer; you need to learn a few patterns you otherwise won't need.


The real strength of QA people can be manual testing. There's a local maximum of "meets the stated requirements but horrid to actually use", and a singular focus on test automation seems to push projects in that direction.


Totally agree. I think they are great for what some people refer to as "exploratory testing".


Not to mention that there is a lot of stuff, especially for UI, that is either hard or impossible to write tests for...


  > You should be spending 70% of your dev time writing tests and doing devops and 30% writing features.
I laugh, but the tears are real.


Am I correct in reading that they performed this experiment only for two days, and entirely with graduate students?

If so, they have missed the point of TDD.

In the short term, TDD probably doesn't make a difference, one way or another.

But software as a business is not a short-term game.

I would love to see a study where the participants are, over a period of six months, given the same series of features (including both incremental improvements, as well as major changes in direction).

In my experience, teams that don't test at all quickly get buried in technical debt.

Untested code is nigh impossible to refactor, so nobody ever does, and the end result is usually piles of hacks upon piles of hacks.

As far as testing after development goes, there are three problems that I see regularly:

One, tests just don't get written. I have never seen a TLD (Test Later Development) team that had comprehensive code coverage. If a push to production on Friday at 6pm sounds scary, then your tests (and/or infrastructure) aren't good enough.

Two, tests written after code tend to reflect what was implemented, not necessarily what was requested. This might work for open-source projects, where the developers are also the users, but not so much when building, say, software to automate small-scale farm management.

Three, you lose the benefit of tests as a design tool. Code that is hard to test is probably not well-factored, and it is much easier to fix that while writing the tests than it is to change the code afterwards.


The goal of the study was not to measure if testing is valuable at all, but to measure if there is any difference between TDD and TLD. I think no-one questions the value of testing per se.

As for your points why TLD is theoretically worse:

1. "in TLD tests just don't get written" - this is a dual argument to "in TDD the code just doesn't get refactored". After tests are green, there is no motivation to do so and it feels like a waste of time (if it works, why change it, right?)". I think the latter is much, much worse for longevity of the project than low test coverage. Not enough coverage doesn't mean automatically bad structure of the code and you can always fix that by adding more tests later (and maybe fixing a few bugs detected by them). But writing code quickly to just make the tests green and then not doing refactoring quickly leads to bad structure of the code very quickly. Whether you chose TDD or TLD, you need to apply some discipline: in TDD to refactor often, in TLD to keep high test coverage.

2."tests written after code tend reflect what was implemented, not necessarily what was requested" - a dual argument for this exists as well: "code written after tests tends to reflect the tests on case-by-case basis, not necessarily the minimal general solution covering all possible inputs that should be the true goal of the implementation". Whenever I hear the argument that tests help write better code I always remind myself a famous Sudoku Solver written by Ron Jeffreys, TDD proponent: http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s... I also saw that happen in a few real world projects - the code written after tests was just a giant if-else (or switch) ladder handling each test case separately. Bad, bad code, additionally missing a few important cases. And the most funny thing was that after seeing this during the code review, I rewrote the implementation in a more general way, got two tests failing and after investigation it turned out the tests were wrong and the code was good. Lol, verifying tests by implementation :D

3. Tests are not a design tool. Tests are for... testing. They are often a reason for over-engineering and over-abstraction that makes the code more complex and harder to read. See FizzBuzz Enterprise Edition: it is definitely testable to death.


Phoenix (Chris is in TLD camp and Phoenix has very good test coverage)


> Untested code is nigh impossible to refactor, so nobody ever does, and the end result is usually piles of hacks upon piles of hacks.

The mistake is in creating unrefactorable code. TDD may be a possible solution if done correctly but there are other ways to skin that cat.


Sure, but I have yet to find a better solution than TDD for big teams where team members come and go constantly.


"Branches aren't merged without peer approval" has sufficed plenty in my experience.

Whether people code tests before thinking interfaces, before writing a prototype, before implementing, during the inevitable interface rewriting, after coding, after manual verification that it seems to work, or right before submitting the branch to review, doesn't matter as long as someone on the team looks over and sees that "yup, here are tests and they seem to cover the important parts, the interface makes sense, and the documentation is useful".

Then people can do TDD, TLD, TWD or whatever they personally feel most productive with. Developers being happy and feeling in control of their own work does more for quality than enforcing a shared philosophy.


This.


This is a misleading title and conclusion. The study showed a huge benefit of TDD over Waterfall, and it is only when compared to ITL that it was found to not be better.

But moreover, I think it's important to understand why Beck pushed for TDD.

TDD is like saying "I'm going to floss before I brush every time, no matter what."

But, when people don't do TDD they typically aren't all saying "I'm going to brush and floss afterwards every time, no matter what."

Instead, most say "I'll floss regularly at some point, but I don't have time now, and it takes too much effort. I'll floss here and there periodically, maybe before my monthly meeting or big date night."

Another reason Beck pushed for TDD was method and solution complexity reduction which results in lower time and cost required for maintenance because code is simpler to read and understand. Again, with ITL, you're still writing tests for everything, so you'll see those benefits. However, if you fail to write some or most tests, some developers will write overengineered solutions to things and have overly long difficult to follow methods that will make maintenance suck more resources.

If you want to go beyond this study, though, Beck, Fowler, and DHH had a critical discussion about TDD in 2014 that's worth checking out:

http://martinfowler.com/articles/is-tdd-dead/


The extra-complicated architectural refactors I've seen done in the name of 'testability' have been eyebrow-raising. TDD isn't a guarantee that engineers will KISS. You can still make overengineered crap, tests-first / TDD or not.


Waterfall is a straw man.


Not only that... when you test the efficacy of medical interventions the gold standard to strive for[1] is not whether the new intervention is better than placebo, it's whether it's better than $CURRENT_BEST_KNOWN_INTERVENTION. I suggest we should be aiming for a similar standard in testing software engineering methodology.

I think it would be very hard to argue that Waterfall ~= $CURRENT_BEST_KNOWN_METHODOLOGY.

[1] Of course, this isn't usually what happens in practice when pharmaceutical companies are doing their own testing, but it's what should happen if you actually care about efficacy and not just PR/sales.


The title suggests that TDD has little or no impact on dev time or code quality at all.

The research shows no significant difference between TDD and iterative test-last (ITL) development.

Could the title be updated? To show that it is a comparison of TDD vs ITL/TLD.


This thread feels like a study showing that HNers don't read the articles.


There is a problem with all these studies - they all use a very small number of programmers (21 in this case) with no experience (all graduate students in this case) and presumably no significant experience with TDD or TLD.

I'm not making a stand about TDD here - I just think we need to have much better computer engineering science studies if we want to have significant results.


Agreed. I also would question the short time scale of the test. It took me a year or so to really get good at test-driven development.

I also think of TDD as a sustainability practice. If I'm writing a small thing that I do not intend to maintain, I won't bother with TDD (or with tests at all). But I'll definitely TDD something where I expect to come back to it frequently, especially when I initially don't know the requirements, and I expect requirements to change over time.

In practice, I suspect a lot of the interesting questions about software are effectively unanswerable with the budgets available to CS profs. I can't imagine really answering this question without doing something of the scope of a substantial medical study.


Really don't know why you're being downvoted. Grad students at diploma mills are awful programmers, and even those in real labs are good at putting out proof of concepts, not production code.


"TDD has little or no impact on development time or code quality when compared to the equivalent number of tests implemented afterwards using TLD."

FTA: In this paper we reported a replication of an experiment in which TDD was compared to a test-last approach.

Very different title.


I never really viewed TDD as better at reducing bugs for a short-term project; it's only going to have marginally better chances of getting additional test cases.

I view it more as important for breaking the growth of testing effort in an iterative project. With each release, the scope of what must be covered to fully test a project climbs, and unless a team wishes to linearly increase the size of its test team, it's all but certain that tests will be skipped.

TDD gives us the ability to always run a full regression test, since it's just machine time. It's a safety factor in knowing nothing is broken, which in turn gives us confidence that we can refactor.


Up until now it has mostly been opinions and biases, and even though many popular programmers[1] have been saying this for a very long time, it's great to see a controlled study done on it.

This makes it a fact, and a great counter-argument to help the many programmers who are being forced to practice TDD because of the generally accepted claims about productivity and code quality associated with it.

[1] http://david.heinemeierhansson.com/2014/tdd-is-dead-long-liv...


Keep in mind though, that the study shows that it's no better than testing after iterative development. Testing is still required, and I'd wager that the study participants following the ITL process didn't have external business pressure to skip the "test-later" bit...


- Population: A classroom of students, most without professional experience

- Sample size: 21 students

- Study duration: 2 days

- Team size: Individual

Tests are most useful when refactoring someone else's long-forgotten code; the sort of thing that happens frequently in long-running projects consisting of large teams. In other words, the "real world".

Show me that study.


In "Realizing quality improvement through test driven development: results and experiences of four industrial teams", an MSR researcher found that TDD did reduce defects in his study, but also came at a large cost in time-to-ship.

https://www.microsoft.com/en-us/research/exploding-software-...

This finding contradicts the headline. TDD impacted both development time and code quality in that study.


I have spent almost 3 years now writing code (with very few or no tests), and my current organization stresses agile practices a lot. I encountered TDD here, so I would like to chip in too.

TDD solved a major problem for me which I have seen a lot of people suffer with: _Where do I start?_ The thing is, TDD and refactoring go hand in hand; I cannot imagine doing TDD if I were not using an IDE like IntelliJ or something. When you write code first (typical TLD), you need to have a plan beforehand, and this plan cannot change much, because you really do not get feedback until you complete major segments of the code. TDD ensures you keep getting nibble-sized bits of feedback which assure you that what you are writing works. This, according to me, is the single most beneficial point of the system. TDD or TLD can both allow maintainable code, and often while doing TDD you can follow it strictly. It might not have an impact on code quality for seasoned developers (coding for years on the same codebase), but it does help the others. It also reduces my inertia considerably. So while it might not have an impact on development time or code quality, I tend to sleep well, without large UML diagrams floating in my head, knowing that each unit of my code works independently.


> When you write code first (typical TLD), you need to have a plan beforehand, and this plan cannot change much, because you really do not get feedback until you complete major segments of the code.

A good plan is modular and therefore flexible. I get lots of feedback as I'm writing the code. When I find myself writing very similar code over and over or an API feels unwieldy I take that feedback and refactor without delay.

I think what really happens in big groups of developers is that TDD forces everyone to delay agreement on the interfaces between code modules until the process of writing tests has uncovered most of the problems. Without TDD, the devs who just want to get their part done and go home plow ahead with the first draft of the API, locking it in stone before it's been vetted, and then dig in their heels, creating technical debt. TDD appears to be the saviour.


>I think what really happens in big groups [...] dig in their heels creating technical debt.

This is a major factor. The Pareto principle applies in workplaces too; TDD tries to even the balance a bit.

Regarding feedback: I should have clarified that I am a mobile dev. I need to build everything from the backend services to the UI to be able to get viable feedback without tests. If any other mobile dev has another approach to this problem, please let me know; I've been trying different approaches for a while now, and none seem viable to me apart from TDD.


The questions worth asking about techniques like TDD are "What problems does it fix?" and "What problems does it introduce?"

I would expect a determined attempt at TDD to solve the "no tests" problem, because it is so utterly insistent on tests. It should also solve the "don't know how to start" problem, because it de-emphasizes planning and design in favor of just jumping in; you write the tests, and then you do the bare minimum to make them pass.

That said, I would expect a TDD-based project to have the "bad architecture" problem: messy interfaces and sort of ad-hoc separation of concerns, because it makes no time for up-front analysis and design. It's always focused on the current feature and doing whatever it takes to make it work now.

In fairness, it does include a refactoring step, which is supposed to clean up the mess after the fact. Color me skeptical. Refactoring is hard, and people tend to do it on a large scale only when they have to.


" It should also solve the "don't know how to start" problem, because it de-emphasizes planning and design in favor of just jumping in; you write the tests, and then you do the bare minimum to make them pass"

It doesn't SOLVE this problem. It only pushes it back in time, actually making it worse, because you're wasting time on stupid tests instead of actively researching a solution. No amount of tests is going to help you find the right solution if you don't know what you are doing. See http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s...


I think TDD deserves a bit more credit than that. Building a simple solution for part of the problem can be credited as exploring the problem space. The same can be said for extending it to address more of the problem; that's exploration too.

But I fear this incremental approach is going to produce a very baroque solution that will have to be rewritten completely once the bell goes "bing" and the programmer actually understands the underlying problem well enough to produce a clean solution.

I think the larger problem with TDD is that there are at least four parts to software design, and TDD bets it all on two of them. There's requirements analysis, architectural design, construction, and finally testing. TDD is really all about the construction and testing bits. It doesn't address requirements analysis at all, and it doesn't seem to want to do architectural design, it just constructs and tests with great passion. It's imbalanced.


Software development varies enormously. Flight avionics software differs from video game software differs from a spreadsheet differs from an order-entry system differs from laboratory analysis software differs from a web browser and so on. Flight avionics differs from a commercial jet liner to a fighter plane to a model airplane. Some projects have huge budgets and others have shoestring budgets. Some projects require extremely high reliability and quality; cost is not an issue. Other projects can be quite buggy, low quality but still useful -- cost effective.

Developers vary as well. Some temperamentally find something like TDD useful. Others do not.

There is no one software development methodology to rule them all.


I'm not commenting on the TDD studies in terms of their effectiveness, but I do know that a project that takes longer brings more programming hours, which results in larger budgets. If you were a company selling your services, you would be a bit more motivated to include practices that take longer, especially if they tugged at your clients' emotional sense of assurance. You would also preach them to your programmers as core practices, and the programmers would happily be converts. This goes for all the structure surrounding your project as well. I tend to see more structure in outsourcers these days, and a smugness along with it. I wonder how much of it is bloat, though.


No TDD discussion is complete without a reference to the Sudoku debacle.

https://news.ycombinator.com/item?id=3033446


should be top comment


The use of students in SE research is a hot topic; see, for example, Feitelson's review: https://arxiv.org/abs/1512.08409.

Researchers have a problem recruiting subjects. There is often a tradeoff between applying a more rigorous experiment design with convenience sampling (students) versus sacrificing controlled environments (so that professionals will actually join the study).

It's easy to condemn work like this, but there's no other option. In this case the researchers chose to replicate a study (which often risks similar ire for telling us nothing new) with a commendable level of rigour, and have provided more evidence that, for the scope of experiments we can construct, TDD is probably no different from TLD when using a population of relatively unqualified developers (students).

As to the problem being trivial, what else can be done? There's a finite amount of time you can ethically expect participants to give you, even if you pay them. If anything, the criticism of this work is better directed at the limitations academics are forced to bear.


Your argument reminds me of the joke about the man searching for his keys under the streetlamp.

"A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, 'this is where the light is.'" [https://en.wikipedia.org/wiki/Streetlight_effect]

I agree that this is a well-designed study given its constraints. And it's admirable that it's a replication study.

That doesn't change the fact that it's largely irrelevant to professionals. It doesn't test the claims made by TDD proponents (TDD leads to better design, reduces long-term maintenance, allows for team coordination, etc.), nor does it address any of the interesting questions about TDD:

* Is TDD more effective in a professional setting than commonly-used alternatives?

* Is a mock-heavy approach to TDD more effective than a mock-light approach?

* Do people using TDD refactor their code more or less than people using a different but equally rigorous approach?

* Is the code done with TDD more maintainable than code done rigorously in another way?

* Is TDD easier or harder to sustain than equivalently-effective alternatives?

As a study, it's fine, if only of interest to academics. The problem isn't the study. It's the credulous response on the part of industry developers who then turn the false authority of the study into statements like "TDD doesn't lead to higher quality or productivity."


I skimmed the Feitelson paper you linked and this jumped out at me:

"Students should generally not be used in studies that depend on specific expertise which requires significant experience and a long learning curve to achieve, or in studies of professional practices. Such studies are best performed by observing and interviewing professionals, not by controlled experiments."

This seems directly relevant to this TDD study. Every one of these contraindications are true in this case.


The problem that I have with this article is how people will interpret the results. The test compares (presumably) comp-sci graduate students, who already know good design patterns, best practices, etc. at a relatively high level, to see if they are faster and more accurate testing before vs. after writing the main code (TDD vs. TLD).

That's all well and fine, and possibly completely accurate. However, many people's takeaway is going to be the out-of-context and incorrect title of this post. (It does not say TDD is worthless - it says it is essentially the same as TLD.)

I've always looked at TDD as a tool to help push less experienced, less "educated" developers into 1) even using tests at any point of the development cycle, 2) creating tighter, cleaner and MORE TESTABLE code by the time they've reached the end of the cycle.

So, if your team is and always will be well-educated, experienced programmers who already understand how to always do everything correctly from the beginning, feel free to use either method.

Otherwise, I'd urge you to consider TDD.


You clearly think very highly of CS grad students :)

I've met a few who couldn't code, let alone write proper tests, even though their focus was not policy or, say, the lighter side of user experience.


Really? Almost all of the post-graduate CS students I've known are experienced people who went back to school because they were at the point where they needed to either move into management or become a very high-level expert.

That's only personal experience, however, and mostly out of date.

That said, I would hope any of them would be a step up from the new programmers coming out of the "churn 'em out" short courses many are forced to take as the only affordable way into the industry, which leave them with little knowledge beyond how to get the requirements fulfilled.


Yes. I can say 100% that the major difference I see in hiring interviews between grads and laterals is coding ability. Even at PhD+1 it's vastly improved


My anecdata matches the author's - I feel more productive doing TDD.

Perhaps because it's less stressful. You think about system design as you code, instead of only when you hit a wall and have to rewrite everything, or when you have to clean up for code review.

Either way, if it has little to no impact on dev time or code quality, I bet the positive impact TDD has on team morale would make it worthwhile.


Extrapolating from that, perhaps it's worth considering the benefits of thinking about system design before you code.


This is an editorialised title. The blog post is titled simply "Test Driven Development". The blog post, and the paper it fronts, conclude that there is no significant difference between TDD and iterative test-last (ITL) development, which is quite a bit different from "TDD has little or no impact on development time or code quality".


My default is not to write many tests at all during the experimental, build-out phase. I'm not looking for exact or bug-free software; I'm trying out different APIs, aggregates, and architecture in general. Needing to refactor tests every time I want to make a drastic change is... well, you know. As somebody else pointed out, this architectural stuff is probably much harder to nail down than just writing code that works. This is not limited to the very initial build-out; it applies to big refactors as well.

During and after the experimental phase, it depends. In both cases I may write tests before, or alongside, gnarly logic or algorithm-y stuff. Otherwise, and in addition, I do copious amounts of manual testing. Manual testing is a must for much of what I do, so I augment or substitute automated testing as appropriate. Automated testing is great, but sometimes the overhead is too expensive.


After a quick read of the metrics section, it seems quality is measured in terms of adherence to user stories, implemented as a set of behavior tests. There seems to be no assessment of code maintainability, which looks like a flaw in the study, as it models a short-lived codebase rather than one that undergoes several maintenance cycles.


TDD is king when refactoring, or when proving out an algorithm. You have tests to confirm the output, and near-realtime feedback that your assumptions are correct. The rest is obvious: mission-critical component, TDD; complicated refactor, TDD; algorithm you need to validate, TDD. Anything else, write the code and get a peer review.


No amount of tests can prove that an algorithm is correct. At best, they prove that an algorithm works in a particular case.

And generally, I would say that the more interesting sorts of tests (fuzz testing, large-scale system testing) are extremely unpopular with software engineers because "they suddenly fail without reason". Not quite as unpopular as an actual proof that an algorithm works, like implementing it in Coq for instance, but very unpopular.
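
A fuzz test in its simplest form, as a sketch (TypeScript, hypothetical function): throw random inputs at the code and check an invariant, instead of enumerating cases.

    import assert from "node:assert";

    // Function under test: insert a value into an already-sorted array.
    function insertSorted(xs: number[], x: number): number[] {
      const i = xs.findIndex(v => v > x);
      return i === -1 ? [...xs, x] : [...xs.slice(0, i), x, ...xs.slice(i)];
    }

    // Property: output has one more element and is still sorted.
    for (let trial = 0; trial < 1000; trial++) {
      const xs = Array.from({ length: Math.floor(Math.random() * 10) },
                            () => Math.floor(Math.random() * 100))
                      .sort((a, b) => a - b);
      const x = Math.floor(Math.random() * 100);
      const out = insertSorted(xs, x);
      assert.strictEqual(out.length, xs.length + 1);
      for (let i = 1; i < out.length; i++) assert.ok(out[i - 1] <= out[i]);
    }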


I'm a fan of TDD, but I'm a bigger fan of having reliable, repeatable, and complete tests period.

I don't think it's productive to argue the merits of the study itself - better to look at the positive. What the study tells us is that it's not too late to improve your existing software with tests.


Long story short: if your coders don't take their product as a matter of personal pride, or are inexperienced, or are mediocre, no methodology in the world will save you. None. I realize my statement is anecdotal, but I'm writing from decades of experience working with people who did not take any pride in their work, who view programming as a trade rather than an art, or who view programming as an art where "spaghetti code is beautiful". No methodology, no technology, no management technique, and no programming language saved them or the company. The builds are still a mess. The code is still a mess. The bodies of code require endless babysitting and endless hacking.


I've tried TDD numerous times in my professional career, and I'm confident it works for many. I prefer to use white-box testing as my second pass through my algorithm. It allows me to identify potential weaknesses, write test cases around them, and correct them in one step. I never feel quite as secure with TDD as I do with post-hoc testing. I'm also not going to tell other people that mine is the one true path. Unit tests? Critical. Before vs. after? Personal.

With respect to this study, I think at best we can say that equal quality tests yield equal results. I don't think -- based on reviewing the methodology -- that the headline can clearly be drawn from the study.


TDD is basically "writing software for a test". Programming language design has a similar problem. First BIG software many people write in their new language is a compiler, so many languages are optimized for that.


As a TDD advocate, and assuming this study has any scientific validity, this is actually good news! There's a very common claim that TDD makes you less productive. It's good to have some study to oppose this claim.


I think methodologically you'd need equivalence testing for that. Your hypothesis would be that productivity is equal (enough), and you could then discuss the additional benefits of TDD.

I can't remember if I read a study like this for TDD but equivalence testing is fairly underused outside of pharma/medicine (it's often even called bioequivalence) where the test usually shows similar enough effects and the extra benefits are cost savings (for generics).


We changed the URL from http://neverworkintheory.org/2016/10/05/test-driven-developm..., which points to this.

When the topic is controversial and the paper is not so specialized that only a few people here can understand it, changing the URL to that of the paper tends to help make a discussion more substantial. Especially when the blog post is more of a gloss on the paper than an in-depth commentary on it.


If you were to always write tests immediately after writing a few classes, I don't think it would make a difference. However, from my own experience, I never write nearly as many tests after the fact.


This is the kind of stuff where, in the aggregate, you can't show a relationship, but I bet if you controlled for type of project you would see some interesting results. Anecdotally, I know some firmware engineers who shit out the buggiest code I have ever seen, and test-driven development would definitely have improved the customer experience. When the engineers have literally no tests other than trying stuff out with a printf on the target embedded device, any amount of unit testing will wind up helping.


I just came out of an embedded-systems-y project, and I did have some tests for my ring buffers, and sprintf.
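
For pure-logic pieces like that, host-side tests are usually tractable. A rough sketch of the shape of a ring buffer test (Python standing in for what would really be C compiled for the host; the class is a made-up minimal version):

    # Host-side test of hardware-independent logic; no device needed.
    class RingBuffer:
        def __init__(self, size):
            self.buf = [None] * size
            self.head = self.count = 0

        def push(self, x):
            if self.count == len(self.buf):
                raise OverflowError("buffer full")
            self.buf[(self.head + self.count) % len(self.buf)] = x
            self.count += 1

        def pop(self):
            x = self.buf[self.head]
            self.head = (self.head + 1) % len(self.buf)
            self.count -= 1
            return x

    def test_wraparound():
        rb = RingBuffer(2)
        rb.push(1); rb.push(2)
        assert rb.pop() == 1
        rb.push(3)  # wraps past the end of the backing array
        assert rb.pop() == 2 and rb.pop() == 3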

But tests for any of the interactions between subsystems are quite problematic. And testing on the device might be problematic due to space constraints, while testing on a simulation is also problematic... I don't know how you'd realistically test it.


"YOLO" based Dev work on the otherhand is where it's at, right?

On the other hand I can see where students and new learners might falter. TDD requires u know a bit about what ur doing,and if you're new to programing, ut just costs more time to compensate for not having a healthy intuition.

Still tho, if you want to run maintainable code, that's somewhat future proof and not disposable - test it and keep it clean.

I mean it's like arguing sharpening ur katana while u fight is detrimental to duel survival. Which is true.. But...


Actual studies were never needed to convince managers to switch processes. Bonus points for blaming old problems on old process while blaming new problems on "not doing agile right".


It always both amuses and saddens me how people will eagerly write more tests than actual code, but refuse to use a strongly typed language. The compiler is my test harness.


The compiler can test the validity of your code, not its behavior. Only the most trivial bugs can be caught this way.
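
A toy illustration of that gap (Python type hints standing in for any static type system; add is made up):

    # Type-checks cleanly: the signature is satisfied, so the
    # "compiler as test harness" is happy. The behavior is still wrong.
    def add(a: int, b: int) -> int:
        return a - b  # oops

    def test_add():
        assert add(2, 3) == 5  # only a test notices the '-'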


The research seems low quality. Whenever I try creating something more complex than just a CRUD webapp, I'm always relieved after getting significant code coverage.

It may be because I'm a mediocre programmer (I mostly do hobby projects), but getting assurance that my 'small change here' didn't mess up anything major in a distant part of the system is quite relaxing.

Obviously I only test logic, and I usually write the tests after coding. It still helps with my flow.


Then you're not talking about TDD.


In a certain way, you always use tests when you are developing something.

Write code, run it, see what happens, repeat.

The 'see what happens' part is what is different in TDD.

It can be very similar to what you do without automated testing (while also repeating all previous tests), or it can be a scaffold of endless tests, or too few tests, or anything in between.

I've seen too many mocking tests for my taste. In fact, my tests tend to be in the 'integration tests, not unit tests' category.


There are many problems with this study, but for me the most glaring is the definition of quality that they measured. It was purely whether the program performed as expected. This is obviously an important part of code quality, but not the only one. Most proponents of TDD say that its greatest benefit is creating clean, easily maintained code. So this study didn't even attempt to test the benefit that TDD claims to provide.


TDD means 'management' can't drop the tests being written due to 'timescales'. If they're done up-front, they will be there.

It's also one reason that TDD isn't done: when you're given a few weeks to meet an impossible deadline, tests are the first ideal to be dropped.

It's not the correct way to do things, but all of these studies tend to ignore the 'real world'.


The comments to that story are pretty good.

An interesting question is: why does TDD fail in such experiments (and it does so unexpectedly consistently), even though many developers feel it has benefits when they practice it?

There is no silver bullet, so there must be circumstances in which TDD does not work. And conversely, the central question is: under what circumstances does TDD work? What are the preconditions?


The answer is likely in the studies that declared TDD an effective practice, and in the works that specified the TDD approach. Those works and studies tried to solve specific problems, and I'm not sure they followed a really scientific process before declaring that TDD is the solution, rather than a by-product of some other solution that passed unnoticed during the research (e.g. educating developers about software architecture).


My gut feeling is that TDD is just a way to encourage me to actually write tests. I suspect I'd have just as many tests if I had someone prod me every 20 minutes to write a test case or two. Having one or two tests also creates a bit of a wedge to make me write more: once there's some code that can be neglected I can feel guilty when ignoring it.


TL;DR

Conclusion: "TDD does not affect testing effort, software external quality, and developers’ productivity"

However, per jdlshore's comment (https://news.ycombinator.com/item?id=12740978), test parameters weren't suitable for any meaningful conclusions to be drawn.


Does this study assess the long-term cost of software? It may be true that TDD has little benefit when writing code from scratch, and my experience is that it definitely takes longer than not doing it. But how does it evaluate the claim that 90% of the cost of code comes from maintenance, not its initial creation?


TDD, like most of the agile practices, is a learned skill.

Doing it at an expert level is very different from an untrained novice winging it.


Replicated with 21 grad students? And then they quote statistics?

Painful to watch people generalize from such small sample sizes.


It's great that such studies exist, but there might be many reasons why they are incorrect: they are testing on students; the students probably don't understand how to apply TDD or, the other way around, are so good that their coding approach provides all the benefits without TDD; the numeric metrics used in the study might not adequately reflect the interesting characteristics of the code base; the payback of TDD might show up in later stages of the product's life, when we refactor or extend it; etc.

Probably TDD can speed up people who otherwise aren't used to an iterative, bottom-up approach - TDD encourages a short "change, then run and see how it works" loop. Especially in non-interactive languages like C or Java.

Also, if we write tests after the functionality is implemented, how do we know why a test passes: is it because the functionality is correctly implemented, or because the test doesn't catch errors? To ensure a test catches errors, we need to run it on a buggy version of the code. Implement the functionality, write the test, introduce errors into the functionality to make sure the test catches them - that's 3 steps. Run the test in the absence of correct code, then implement the code - 2 steps. That's where "test first" might be efficient.
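
To make the 2-step version concrete, a sketch (interleave is a made-up example):

    # Written first, this fails until interleave exists and is correct -
    # and that failure is the evidence the test can actually catch errors.
    def test_interleave():
        assert interleave([1, 2], ['a', 'b']) == [1, 'a', 2, 'b']

    # Written after the fact, a weak test like this one passes even
    # against a broken implementation, and we never learn it's weak:
    def test_interleave_weak():
        assert interleave([1, 2], ['a', 'b']) is not None

    def interleave(a, b):
        return [x for pair in zip(a, b) for x in pair]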

But often that can be achieved another way. Suppose I'm writing a function to merge two lists. I will just evaluate (merge '(a b c) '(1 2 3)) in the REPL and see by eye that it returns (a 1 b 2 c 3). I will then just wrap it in an assert: (assert (equal '(a 1 b 2 c 3) (merge '(a b c) '(1 2 3)))). Run this and see that it passes - that's all, I'm sure it's an OK test.

In short, I think there is a certain truth in TDD, but it shouldn't be taken with fanaticism. And it can even be applied with negative effect (as can any idea).

Suppose I want to develop a class (defclass user () (name password)).

I personally will never write tests for make-instance, (slot-value ... 'name), and (slot-value ... 'password) before creating the class, then see how the tests fail, then create the class and see how the tests pass.

Tests take time and effort to write, and then to maintain and rewrite when you refactor code. If a test captures an error, then the test provides some "return on investment". Otherwise, writing the test was a waste.

The tests in the above example will never capture anything.

I tend to create automated tests for fragile logic which is relatively easy to test, so that the effort spent is justified by the expected payback.

But all my code is verified: write several lines, run, see what doesn't work, fix it.


I've always made up data to test whether my functions work, but now I write that data down in separate programs so they can keep checking my functions in the future. What's the big deal... sure, TDD is about the future, not the development time or quality today.


Tests written before/at/near development time really help your code design - I've seen how ensuring code is unit-testable simplifies designs and enforces layering, etc. I really disagree that this does not help code quality.


New programmers will read "studies" like this and decide to write tests "someday, later". I really hate the impact this "study" will have. And I agree with every point of @jdlshore's comment.


The title of this post is very misleading. TDD as opposed to ITL has little to no impact; the title suggests that testing itself does not have any impact. This is just clickbait...


When writing new code I don't usually write the tests first, but when fixing a bug, I do. There is nothing worse than a test that would have passed without your supposed bug fix!
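
The discipline in sketch form (parse_price and the bug are invented):

    # Hypothetical bug: parse_price("1,200.50") blew up on the comma.
    # Step 1: write the test and watch it fail against the old code -
    # proof that the test actually exercises the bug.
    def test_thousands_separator():
        assert parse_price("1,200.50") == 1200.50

    # Step 2: fix it, watch the test pass, and keep the test forever
    # as a regression guard.
    def parse_price(s: str) -> float:
        return float(s.replace(",", ""))  # the fix: strip the separator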


Not surprised. Religiously following a certain principle in the belief that it will help you bypass the complexity of the problem itself almost always fails to stand the test of time.


Experience is the name everyone gives to their mistakes. --Oscar Wilde


I think TDD is most effective for "state machines" like cook([ingredients]) => dish, which should be avoided if possible as they are very bug-prone.
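
Functions with that pure input -> output shape are at least easy to pin down with table-driven tests. A sketch in Python, where cook and the recipes are the hypothetical above:

    def cook(ingredients):  # stand-in implementation
        recipes = {
            ("flour", "water", "yeast"): "bread",
            ("egg", "milk"): "omelette",
        }
        return recipes.get(tuple(sorted(ingredients)))

    def test_cook():
        cases = [
            (["flour", "water", "yeast"], "bread"),
            (["milk", "egg"], "omelette"),  # ingredient order shouldn't matter
        ]
        for ingredients, dish in cases:
            assert cook(ingredients) == dish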


As many things in Software Engineering, TDD is just another tool. It's useful from time to time, but it's no silver bullet.


"TDD disproven by people who have no idea how to practice it, or have the ability to grok the longterm benefits."


For a split second I thought they were measuring TDD against no tests at all and I felt a panic-induced adrenaline rush.


Based on my own experience working in teams using TDD and teams not using TDD, I cannot agree.


The onus of proof is on the TDD pundits to prove anything substantial.


I was eager to read this paper but found little substance in it.


Somehow, as a Software Engineer, I'm not really surprised.


Can you elaborate?


If you write testable software and actually write the tests, the end result is the same whether you test first or test later. You're designing with testing in mind and creating useful tests either way.

It's really a matter of code that lacks tests being an issue, and especially code that isn't designed with testing in mind. I think they overinterpreted the value of TDD as simply test-first. Test-first can be good, but it's more a motivational tool and a productivity simulator (yay, look at those green checkmarks!) than a real benefit.

And if the end result is identical, this study sort of craps on the motivational-tool argument too, at least insofar as that would show up in development time. It'd be interesting to see developer satisfaction included in some way too.


> If you write testable software and actually write the tests, the end result is the same whether you test first or test later.

This is definitely not my experience. As Kent Beck says, TDD is a design technique. It forces you to always start thinking of the code from the outside of the unit. If I build the unit first and add tests later, it's more likely I'll end up with something where the API reflects the implementation. With test last, I'm also less likely to test everything well; after the implementation is done, I believe it works.
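
A small sketch of the difference (build_report and the report API are invented for illustration). Writing the test first forces me to decide how I want to call the thing before any implementation exists to leak into it:

    # The test comes first, written from the caller's point of view:
    def test_monthly_totals():
        report = build_report(sales=[("2016-01-03", 700), ("2016-01-19", 500)],
                              group_by="month")
        assert report.total("2016-01") == 1200

    # Only then is the unit shaped to satisfy that caller:
    from collections import defaultdict

    class Report:
        def __init__(self, totals):
            self.totals = totals

        def total(self, month):
            return self.totals[month]

    def build_report(sales, group_by="month"):
        totals = defaultdict(int)
        for date, amount in sales:
            totals[date[:7]] += amount  # "month" grouping only, for the sketch
        return Report(totals)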


> If I build the unit first and add tests later, it's more likely I'll end up with something where the API reflects the implementation. With test last, I'm also less likely to test everything well; after the implementation is done, I believe it works.

Will you?

Are you sure?

Do you have data to back your supposition?

My contention would be that at the end of the day, the requirements of the interface to support unit testing will result in a very similar set of design choices, whether you write those tests up-front, as-you-go, or after-the-fact.

The only difference is that if you write them after-the-fact, you may push some amount of refactoring to the end of the process instead of doing it along the way.

But I'll bet you have about as much data as I do to back your beliefs. ;)


> Do you have data to back your supposition?

That's my experience. I started doing TDD more than 10 years ago, and it took me about a year to fully make the switch from test-after to test-first programming. I regularly try experiments with different personal code bases.

If you're arguing from personal experience, that your code ends up just the same either way, good for you. But I suspect you're arguing from theory here.


Perhaps it depends on how you implement each. If your idea of test-last is writing a full, large module and then trying to test it all at once, you might wind up with a different result than if you wrote an individual function or two and tested only those.

Getting the testing as close in time to the implementation as possible probably makes a bigger difference than first vs last.


I definitely agree with that. If there is somebody who actually writes a few units of production code, a few units of test code, and then refactors, then I expect things would be pretty close.

However, for me that's a hard set of behaviors to stick to. If I let myself say, "Oh, I'll test that later," I'm likely to get lost in the problem, write a bunch of production code, and then say, "Well, it already works, but I guess I'll add some tests." There's no clear stopping point.

Test-first programming, on the other hand, is easier for me to stick to. If I don't have a broken test, I write one. If I have made the test pass, it's time to write another test before I write more production code.


I have always worked in a "test last" mode. That's the way I learned to program. Write some code, test it. When it works, write some more code, test some more.

My attempts to work in a TDD style ran up against decades of ingrained habits, and I never really found it satisfying or natural.


It took me a year or so to make the transition. And I think the transition is easier to make in a greenfield or already-TDDed code base.

That's not to say that you should try it again; I like TDD a lot myself, but software methods are so interdependent that I think each person has to judge what works best for them.


Tests can do a lot of things. I wish there were a more functional name for them, e.g. "regression stoppers".


Well, let me ask you:

Why would we assume TDD would improve code quality or development time?

If, at the end of the day, the result is the same volume of tests and the same level of overall coverage, the order of activity would seem to make no difference. What would lead you to believe otherwise?

The real flaw in both methodologies is that it's the developer checking their own work.

I'd make the claim that if you really wanted to do it "right", you'd have a specification, one developer writing tests to the specification, and a different developer writing the code to make the tests pass.

Of course, that's probably not workable in practice... it would make for an interesting experiment, though (and yes, I'm too lazy to search Google, only to discover someone else already came up with the idea).


If the claim in the paper is true:

TDD = Same time + same quality + feel better.


Clickbait. TDD !== tests; the article compares TDD with TLD.


It may not boost productivity upfront, but it saves a lot of time down the line by alerting you when something is out of place.


No, that's the value of automated regression test. TDD is just one way to skin that particular cat.


And does that test wind up in v0.1.1 of your software magically, or is it there because you put in the work to add it up front?


Nice work on the false dichotomy. :)

TDD has a very specific meaning. It means you write tests, then write code that passes those tests. That specific order. If you're not doing that, you're not actually adhering to the definition of TDD.

TLD could mean, for example, writing a module, class, or function with a defined interface that implements the contracts for that interface, then writing the suite of tests to validate that module, after which you move onto the next module.

Strangely enough, you can do that while you're developing the product without adhering to the process dictated by TDD.


You seem to be conflating the existence of unit tests with TDD.


TDD no, regression testing yes.


As someone who uses unit tests to find bugs in my code, that I would never otherwise find, this is surprising.


It's not about TDD vs no tests. The article is about tests first or tests later.


And yet, that is not what the clickbait title implies.


Only if you think unit testing necessarily requires TDD (which is defined as writing tests before writing the code that passes those tests).

It doesn't.

All the article says is that TDD has little or no impact... it does not say that unit testing as a practice has no impact.

Seems to be a common misunderstanding around here that you, too, have fallen victim to...


>Seems to be a common misunderstanding around here that you, too, have fallen victim to...

I work with people who openly believe that unit testing is a waste of time. Personally, I think they are afraid, because they don't even know how, and don't care to learn. They would see that title, and that's all the confirmation bias they would need.

Thanks for assuming the worst about me though.


Err, my apologies if that came across as a slight; I literally meant that you and a lot of other folks made the exact same mistake. No judgement intended; it's clearly a common misapprehension.

My guess is that a lot of folks got introduced to unit testing through test-driven development and, as a result, conflate the two, assuming the former necessarily implies the latter.

Speaking for myself, I was writing UTs long before the TDD fad landed and so I never picked up the habit. It's just not the way my developer mind works, and I've not found it compelling enough to try and re-train myself.

That said, not believing in the value of automated regression tests (of which UTs are the lowest-hanging fruit) is utter madness...


Unit testing is not always a waste of time.

Unit testing can be a waste of time.


That is a good point. The implication got me to click through to the article.


That's not TDD, though. It's more similar to test-after.


I think test-after is a reasonable approach. I am constantly dealing with new code bases, and in the interview process I am often asked about my philosophy on testing. My response is that testing occurs on two fronts: from the top down (functional UI tests) and from the bottom up (unit tests). When it comes to unit testing, my approach is to focus on the hotspots. If something gives you trouble, or if you find a bug or issue, then wrap it in a unit test. That way you don't have to worry about it. Bugs tell you where the weak spots in your code base are. When they speak to you, listen and take some action. Otherwise, I feel like chasing blanket coverage is not worth the effort in most products.


...when implemented poorly


Therein lies the rub with all the faith-based modern development practices. Getting results? That's the power of The Practice. Not getting results or seeing negative impact? You're just not doing The Practice right and/or hard enough.


Cough scrum



