An External Replication on the Effects of Test-driven Development [pdf] (brunel.ac.uk)
406 points by joatmon-snoo on Oct 19, 2016 | 323 comments



This study, like most software development studies I've seen, is seriously flawed. It doesn't justify the sensational title here on HN.

* The sample size was tiny. (20 students)

* The participants were selected by convenience. (They were students in the researcher's class.)

* The majority of participants had no professional experience. (Six students had prior professional experience. Only three had more than two years' experience.)

* The programming problems were trivial. (The Bowling Kata and an 'equivalent complexity' Mars Rover API problem.)

Maybe, maybe you could use this to draw conclusions about how TDD affects novices working on simple algorithmic problems. Given the tiny sample size and sampling by convenience, I'm not sure you can even draw that much of a conclusion.

But it won't tell you anything about whether or not TDD impacts development time or code quality in real-world development.


> The programming problems were trivial. (The Bowling Kata and an 'equivalent complexity' Mars Rover API problem.)

The top-most comment on that page emphasizes this point:

Here's my hypothesis, based on personal experience: the benefits of TDD begin to manifest when they are applied at scale. During design and development, if a single developer can plausibly understand an entire system in their head, the benefits of TDD (and, in fact, unit testing) are negligible. However, there's a non-linear benefit as systems become larger, particularly in the diagnosis of large and complex system failures.


This is absolutely key. TDD within small projects has little value; skipping it is a technical debt people are happy to carry, as the debt is negligible.

However, the moment it becomes more than just a one-off project, this debt becomes a problem.


Although, isn't this more about writing tests at all, versus writing them first or last in the coding process?

In terms of technical debt, does it matter when the tests are written?


Yes. Some people do not know how to write extensible software. They solve the issue at hand without any regard to the big picture, so when something needs to be added (like tests), the code needs to be heavily refactored to accommodate the change. As others have stated, this usually(!) isn't a big deal when the project is small with a single contributor. However, writing the tests last, without any forethought about what the tests are or how they might be implemented, will incur debt when the code needs to be refactored.

Conversely, extensible code can have the tests written before or after without much change to the amount of technical debt. The code is written to adapt to change, so tacking on robust tests doesn't incur much additional debt.

Note that there is a balance. If you know your goals and have no reasonable expectation for them to change, then making everything extensible is unnecessary overhead. You can also have a mix of abstract functions that are extensible alongside immutable functions that handle something specific. It's all about what works best for your project.


I don't do 100% TDD, but do write code to be testable, and a lot of that mindset and experience came from trying TDD.

Forcing myself to think "how would this be tested?" helped, but doing some of the tests first got me in that frame of mind.

And by practicing that, you're writing code that is, by definition, reusable: you're using it in the tests and in your production code. The 'reuse' thing doesn't have to mean 'reuse on 8 other projects' (a common rebuttal I've heard). Reusing the functionality in another part of your project 3 months from now is reuse too.


If true, wouldn't this be detectable in academic studies? It's easy to write untestable code for small problems, too.


Writing well designed functions that do one thing enables good testing.


From experience, the value of TDD (or any style of automated tests, really, you don't really have to be driven by them) only kicks in when the code is of a certain size, complexity, and age. Tests are much closer to a tax in the early phases, when you can reasonably keep the full scope in your mind.


>or any style of automated tests, really, you don't really have to be driven by them

Looks like you misunderstood what TDD is. TDD is not a testing method; it's a method for designing system architecture, in which you formalize contracts and design APIs by writing examples of their use as tests. Because of that, TDD is applicable at every scale (e.g. even when you design a complicated algorithm that fits in just a couple of screens), and because of that, TDD must be compared not to other testing methods but to other design methods.

Of course, that comparison will not show the supremacy of TDD, for many reasons: insufficient expressiveness of the testing framework, for example, or domain specifics. TDD is useful as one of the many tools we have today, but it should never be the only tool.


This benefit of test-driven design is not obvious (at least it wasn't for me) until you really experience it. Several years back I wasn't aware of the concept of dependency injection, but independently discovered it while trying to unit test some error logging functionality. Previously I'd always used a global/singleton logger object. That always felt wrong, but I always assumed it was necessary until writing unit tests forced me to avoid that design.
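To make that concrete, here's a minimal sketch of that kind of refactor (all names are hypothetical, not from the study or this thread): the logger becomes an injected dependency instead of a global singleton, which makes the error path straightforward to unit test.

    class Logger:
        """Default logger; in real code this might write to a file or service."""
        def error(self, message: str) -> None:
            print(f"ERROR: {message}")

    class PaymentProcessor:
        # The logger is injected rather than reached for as a global/singleton,
        # so a test can substitute a fake and inspect what was logged.
        def __init__(self, logger: Logger) -> None:
            self.logger = logger

        def charge(self, amount_cents: int) -> bool:
            if amount_cents <= 0:
                self.logger.error(f"invalid amount: {amount_cents}")
                return False
            return True  # real charging logic elided

    class FakeLogger(Logger):
        """Test double that records messages instead of printing them."""
        def __init__(self) -> None:
            self.messages: list[str] = []

        def error(self, message: str) -> None:
            self.messages.append(message)

    def test_invalid_charge_is_logged():
        fake = FakeLogger()
        assert PaymentProcessor(fake).charge(-5) is False
        assert "invalid amount: -5" in fake.messages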

Also note that TDD is not the only way of getting this benefit. IMHO Paul Graham's essay Programming Bottom Up[1] and Casey Muratori's concept of 'compression-oriented programming'[2] both espouse this same idea of improving API design by "writing examples of their use."

[1] http://www.paulgraham.com/progbot.html

[2] https://mollyrocket.com/casey/stream_0019.html


Well, I practiced TDD for no less than 10 years, and its benefits are still not obvious; they're more like something that needs to be continuously reviewed in the context of the current work. Once you've understood all the design patterns that TDD enforces, you don't need TDD to use them. You just think about how you _would_ test your code and defer the test implementation until later.


Why not just learn the patterns directly?


:) Who said they should not be learned directly? TDD is just a good demonstration of their use.


While you are right that TDD is about design, and that it creates examples of use and contracts in a sense, there are better methods for this purpose, like Design by Contract, which in my opinion is superior. Why?

Mostly because TDD's red-green-refactor process is an evolutionary one. Like letting mother nature do the design.

It's un-intelligent design....hahahha


TDD means that you write tests first, see them fail, write the code, see the tests pass. Writing the code first and then the tests is not TDD. I didn't find much value in tests-first, especially because sometimes I really don't have any idea what tests to write and I have to sketch some code anyway. I tend to write some code, then its tests, and use them to debug my code. I save the time I'd otherwise spend writing tests that turn out to be irrelevant to the code I eventually end up writing. In very few cases, let's say twice per year, I have to solve very clear algorithmic problems with very well-defined inputs and outputs. Example: process the elements of an array and return another array. Then I write the tests first.

Anyway, tests first or tests later, both are good. No tests is a nightmare, as a couple of projects I've just inherited are reminding me today.

Edit: typoes


I have a somewhat different experience. I've found TDD is useful on even tiny projects because using TDD forces you to write better code. It's really hard to reliably test things that use "magic" and side effects that affect parts of your code they shouldn't really be affecting, and really easy to test things that have well-defined and properly documented interfaces that only do one thing. Consequently writing tests, even if you never actually run them, makes your software better.


I would humbly suggest that because you and the parent (and other people here) have such different positive experiences with TDD, and cannot quite agree on when it works, only that it sometimes works, this is a big indication that TDD is just a placebo.

But maybe that's what's needed: a placebo to make you feel better as a programmer, and thus more productive through your feelings.


> TDD is just a placebo

This is my experience. I've watched teams do TDD, only to come up against their first refactor and have to change 80% or more of the tests. Just the other day, they were griping about a code review that they had to do which involved changes to 2,000 lines of test code over a code change with minimal functionality change.

TDD has the effect of making you think differently about how you code. Thinking about how you write code is a good thing, but you don't need TDD to make you think differently about your code. You just need to think.


I can't speak for the other person, but from my reading, we don't actually have very different experiences. We seem to agree on the value of testing for large/complex projects, but have a (minor) difference of opinion over the value of testing in very small projects. Indeed, I'm pretty sure we're deep into hair-splitting territory, as I do agree that TDD'ing small projects can lead to better code; I just don't think the payoff is as obviously big as it is for more complex ones.

It's like observing two fans debate which Star Wars movie is the best and the merits of Jar Jar Binks, and concluding that because they disagree, in fact all Star Wars movies must be pretty bad.


> and concluding that because they disagree, in fact all Star Wars movies must be pretty bad

No, I am not concluding that TDD is bad; if anything, I am concluding that it doesn't actually matter. And I'm not even firmly doing that; I just wanted to entertain the idea that some practices (such as TDD) can be placebos, so they subjectively feel like a good thing even though we cannot measure any effect.

Nor was my intent to touch on the moral issue of placebos. I think if it works for you, do it.

In fact, thinking about it some more, there can even be practices in SW development that have a nocebo effect, that is, no measurable impact but they make you feel worse. The daily scrum meeting comes to mind for me.


I have the same positive experience as both of the above commenters. I've found that there are many other variables which can change its effectiveness though:

* The type of code I'm working on.

* How well I remember/understand the full requirements.

* How much of the testing framework is built already and how much I have to build.

* The type of test I'm writing.

* How new the project is.


Yes, TDD seems to require having a decent idea of the requirements (very un-agile). Many projects are making it up as they go along and iterating toward a reasonably correct outcome.


It means having a clear idea about the requirement you're implementing at that point in time (not un-agile). It doesn't mean that you have to create a big set of requirements up front (un-agile). You can just pick a story, convert it into a test and implement the code, rinse and repeat.

Writing the test also gives you a second chance to think deeply about the requirement and fix any problems with it before wasting time implementing the wrong thing.


To me it's sort of like people who use an index card when they read. They move it line-by-line down the page so their eyes have an easier time traversing the line of text.

If this is the crutch that makes you a better programmer, who am I to discredit the practice? But dogma has no place in software engineering.


Kind of like those developers who have to rely on syntax highlighting and linters, right? Those guys! :)

Sometimes using something to help you write better code is just a plain old good idea. You can write it off as unnecessary dogma, but if it actually makes your code better you're shooting yourself in the foot a bit.


> Consequently writing tests, even if you never actually run them, makes your software better.

It could be argued that an experienced programmer doesn't need TDD to write the same code that TDD could have produced.


That used to be my attitude too.

However, the experienced programmer will be very happy when they don't have to re-test all of the old functionality manually after a large refactor of some core code.


An experienced programmer may write tests after the main code is complete to make sure nothing is broken during refactoring. That has nothing to do with TDD.


That's true. TDD is necessary neither from a system-design nor from a quality perspective to produce good code. However, it may still add some value even for experienced programmers, when it's easier to write a test than to build a pure mental model of the domain or the algorithm.


I've only been a developer for 20 years so I'm not at that point yet.


Tests provide certain guarantees where compiler proofs (e.g. compile-time errors) are absent. You don't need a large project to realize those guarantees.

Tests aren't just for initial correctness---their benefit is largely in maintenance. They allow you to make changes or refactor the system with guarantees that the covered portions still operate as they were originally designed. There is really no excuse for those types of breaks.

Then there's the team setting: I'm not the only one writing my code. Someone else has to get in there and make changes. I might have to come back months or years later and make changes. At work, we have five developers touching over 100 distinct projects at any point. We have observed that the lack of tests essentially guarantees breaks---we have QA, and there is a direct correlation between bugs and whether code is comprehensively tested.

With regards to TDD: we also observe time and time again that writing tests after the code yields untested behavior. There are always odd cases that might not be immediately obvious. There might be a bug that was inadvertently introduced, and now that bug is tested as part of the implementation. Certain branches may not be fully tested. I've seen comments here about this being an experience thing---that more senior developers won't have this problem. That's essentially saying that one is infallible; it doesn't make sense.

TDD also prohibits rushing. Not writing tests is also an excuse to rush the implementation and produce a shitton of unnecessary code. Tests slow you down and force you to think.

Small projects grow. If you don't write tests upfront, when do you write them? We develop incrementally---all of our projects start small, and some of them will remain small for perhaps a year or more until we revisit them. Writing tests at that point loses a lot: we may have forgotten the details of the implementation by then, and the person who originally wrote it might not be involved at all in the changes.


Absolutely, this. TDD is hugely valuable, especially when new people join the project who don't yet understand it, and the best way for them to contribute is by adding tests!


This is really wrong!

Tests are important, but you should not have "the new guy" write them, because they should already have been written by someone who understood the code being tested. Having some junior deal with testing is a sure way of producing useless, slow, and incorrect tests.


That sounds like TLD, the opposite of TDD.


I would say the cost of TDD only kicks in at scale... unit tests are a real cost that plagues a project and makes devs ultra-conservative.


What do you mean, ultra-conservative? I'd say the opposite is true: when there are no tests, devs are conservative, i.e. they are afraid to change anything. With at least some tests, they have the possibility to change and refactor more safely.


> It doesn't justify the sensational title here on HN.

We've changed the title along with the url. Please see https://news.ycombinator.com/item?id=12742120. The submitted title was “TDD has little or no impact on development time or code quality” (including the quotes).


Sample size is 21, and they used Kruskal–Wallis (which is parameter-free). It might not be the best test (IIRC it's usually recommended for exactly this combination of ordinal + nominal, but I'd have to check), but it's suitable enough.

Also, I don't think convenience sampling is bad in this case (and they clearly indicate they did it, etc.), as it allows you to control for TDD experience and the like. I'd even go further and say the typical "lol, students as participants" isn't an issue either.

"How effective is test-Driven Development. Making Software: What Really Works, and Why We Believe It" by Turhan et al. is a good meta analysis if you want more studies.

As an aside I like the fact that this is a replication study and that there have been a couple of these.

I mean, worst case, you can use these studies to guesstimate better priors for your own Bayesian analysis :)


Criticism of the study goes both ways. It COULD be that with a larger sample size, a more meaningful project, and more experienced developers, the result would be "TDD actively hampers productivity by 70% and leads to more bugs compared to test-last".

Being mindful of the conditions of the study is important. Shrugging it off isn't helpful in the least.


Whereas the same conditions you listed existed in the link I just found on the Cleanroom method from the 80's, with clear evidence the method worked to produce low-defect software. The weaknesses plus the resulting quality just added further support.

http://infohost.nmt.edu/~al/cseet-paper.html

EDIT: This is now its own submission below if anyone wants to discuss it there.

https://news.ycombinator.com/item?id=12741237


To be fair, the abstract says "The results failed to support the claims." and the conclusion says "We recommend future studies to survey the tasks used in experiments evaluating TDD, and assess them with respect to the treatments.".

This is not a claim that TDD is useless.


* The code-base was new, simple, and subsequently thrown away.

* There is no way to scientifically measure absolute code quality.

There are numerous subjective indicators (tabs vs. spaces) and the objective indicators only really become apparent once you are dealing with more than one component/service.


I agree with you on this. The study is irrelevant from the statistical point of view.

However, there are other considerations with TDD, probably on a more theoretical level:

* There is no other major engineering branch that uses testing this way in production (would you bang a car against the guard rail to see if it works?)

* There is one thing all of the bugs you catch in production have in common for sure: they all passed the unit tests

I think testing has its benefits, but driving(!) your software development efforts with it is probably too much, at least for me.


Car manufacturers model car bodies mathematically so they have a good idea how a body will behave before they build it.

This hints at why the philosophy of TDD is not as helpful as it seems: it confuses modelling/analysis with implementation.

Your testing methodology can be perfectly implemented, but if your problem analysis is wrong your software will still be useless and broken.

TDD would be more interesting if it understood the distinction between specific behavioural expectations (easy to test for, but incomplete), generic applicability and model robustness (much harder to test for, but more complete), and complete formal correctness (often unreachable, but always an interesting goal.)

TDD is better than nothing. If all you can manage is a set of behavioural tests, that should still improve reliability.

But it's a big leap from there to the suggestion that if you define your spec as a series of behavioural tests and your code passes them all, you have a full and correct solution for the initial requirement.

That is just plain wrong, and in untrained hands it can be dangerous.


To your first point, the usual response is that there's no other major engineering branch which demands quite such extensive changes be possible after you've broken ground. To extend your analogy a little, nobody takes a car production line and expects the people running it to be able to switch to helicopters without starting over.


I agree, and it does not happen on my projects either. We agree in advance on the set of features for a certain version, and we deliver that. If you're doing a lot of car building when you actually need a helicopter, then there is something wrong.


If you could bang a car against a guard rail without incurring the cost of a car, you would.


I mean, you do incur a cost for programming with TDD though. The benefits may repay the cost in the end, but the cost is still there. And for that matter cars do undergo destructive testing, just not TDD--and the problem there is presumably that the car benefits from taking a holistic component-driven approach, "here is what I want the car overall to look like, and what components I want to fit into the space inside, now how can I reduce the rollover risk?" rather than "here are some basic expectations of a car, I expect it to be able to move forward and hold a passenger, let's make the smallest thing which causes that to happen. OK now let's add a single passenger, and let's test that bugs aren't flying into your face at highway speeds so that we remember to add a windshield eventually. (1000 steps later) Crap now we're testing the ability to shift into reverse and none of my homespun bare-minimum engine design is going to work there."

TDD probably pays its best rewards with relatively simple well-defined projects, and needs to be deployed with a modularization strategy. Possibly the wishful-thinking strategy of top-down design would be a natural complement, since when you wishfully think of a function you can hopefully quickly write a test or two for it.


Your example posits an evil developer with a BOFH mindset. S/he is trying very hard to get away with doing as little work as possible while meeting absolute minimum requirements. I know that you are parodying the Agile mindset but any philosophical framework within which you make software has edges. We try to not make them count by having good hiring practices. I'll be the first to grant you that hiring is an unsolved problem in software development.


On the one hand, your points are valid. But on the other hand, it is frustrating to see that TDD proponents resort solely to rhetoric to sway people.


Nope, there are also studies (done by Microsoft and IBM) indicating that TDD helps: https://www.infoq.com/news/2009/03/TDD-Improves-Quality


It isn't terribly clear from the abstract and I'm not going to pay for the paper, but it sounds like this is really a comparison of tests vs no tests. That has nothing to say about the practice of TDD.

This is why this newer study is interesting; it compares to TLD, which allows you to say something about TDD rather than just about testing.


Careful: sometimes even widely cited studies like Nagappan's are misreported and don't actually show what they're claimed to show. For example, if you look at the development processes being compared in the cases examined, often they aren't quite TDD as widely described and advocated.


Since when is pointing out the serious flaws of a study "rhetoric"?

I'm ambivalent on TDD, but the arguments defending this study here have been ridiculous and anti-intellectual.


Yes, it's flawed, but so are development methodologies in general.

The truth is that it depends on the team writing the code, not how you write it. While consistency and a method may help to organize the group, it doesn't really say anything about quality whatsoever.


Most importantly, their solutions didn't need to actually work in the real world. If that weren't the case, one would find that there are often massive rewrites (not "refactors") while software is in embryonic form and/or users are giving it a try. A major advantage and disadvantage of the TDD approach is that you pay the price upfront to avoid more charges later. But if your features, business model, and code haven't yet taken solid form, then your upfront payment gets wasted.


Thank you for looking into the details.


The study presents imperfect data. You have presented no data. Which is more credible? Why do comments like this get upvoted so much?


Perhaps oddly, it turns out that presenting no data is better than presenting data with poor foundation. Comments like this get upvoted because many of us appreciate someone pointing out when there's no "there" there before we've spent our own time reading.


>Why do comments like this get upvoted so much?

They're dressed-up versions of "correlation != causation": low-thought comments appealing to people with not-even-stats-101 knowledge.

There is no such thing as a perfect study, and a generic version of the OP's comment could be copy-and-pasted onto any study ever. Unless you study the entire population of the planet, you're going to miss subgroups. Unless you study the entire population of the planet, you're going to need some sort of selection criteria. Unless you have infinite funding and time, you're going to need to make trade-offs and sacrifices in your experiment design.

A study will disclose these shortcomings for readers to balance the significance of results against.

Take this complaint from OP:

>* The sample size was tiny. (20 students)

What sample size would satisfy him? Why is 20 too small? 40? 80? 1037? Is he basing his opinion of a proper sample size on his gut? 20 just doesn't feel right?


Have you heard the phrase "garbage in, garbage out?"


Referring to the study or to the commenters?


uhm. lol? are you saying a study can't be criticised by anything less than a counter study?


At the same time we should still praise them for trying to replicate a study, which doesn't happen nearly often enough in the computer science field (aside from algorithms and data structures studies).


But you have the same variation in a professional environment as well. It could well be that TDD is not that much of a gain if it gets lost in the noise of confounding factors that easily.


Students are not even a good sample for the efficacy of TDD.


Processes like TDD are most often applied/enforced with inexperienced developers in mind.


I've never seen Kent Beck make that claim, have you?


Has Kent Beck been pushing for TDD at Facebook, or are there too many experienced devs?

Experienced devs do what works best for them, which may be TDD but probably in most cases not.


It may be unfair to say this in response to the parent comment, but the great majority of HN discussions start with a comment like this one: It's seriously flawed, etc. Occasionally it's true, but the noise drowns out the signal.

In a graduate-level engineering class, the students were making similar statements about all the studies we read. One day the professor said: It's easy to find flaws in someone else's work; humans are flawed. The real challenge and benefit is to find the value in their work - find what has lasting value, learn from it, and carry it forward.


I completely disagree. The reason we care about these studies is to make decisions. If a study is flawed... maybe it's flawed. Maybe it shouldn't be relied upon any more than some random opinion blog post. (Maybe even less!) But just being "a study" lends it a lot of weight compared to that same random blog post, so it absolutely should be held to strong scrutiny.

It might be a "real challenge" to find value in every one of those blog posts too, but it's by no means useful or valuable. It's a waste of time if not outright counterproductive.

If all the studies are flawed, maybe the field is flawed. Maybe the subject is just not susceptible to (cheap) experimental studies. It's hard to trust peer review when there might be problems with the whole field. We may very well be better off relying on experience and opinions because realistic experiments are so far off the mark.

It can be even worse: generalizing the results of a bad experiment might even be dangerously wrong. Psychology results from experiments on young, Western college students are a great example—we don't want to make laws or base diagnoses purely on experiments like that because that could actively harm groups that are fundamentally unlike young, Western college students.

And all that is pretty much exactly where I see experimental software engineering: the results just don't generalize. And sometimes, I suspect, results generalize in ways that are counterproductive to experienced programmers working on large projects—exactly the people I actually care about. And yet empirical studies (even bad ones) still inherently carry a lot of unearned cachet. The real challenge at the end is overcoming this cachet, not finding value where there just might not be all that much.


It's a total waste of scientific resources to insist that every study requires an army of test subjects and the top statisticians of the day applying the latest modeling, working blind, with independent replication before publication.

There are limited resources to do research. You can get insights into how the world works much cheaper if you use critical thinking and facts outside the controls. Casually dismissing every study as irrelevant due to lack of rigor is wasteful.


If studies don't need to be rigorous, what's the point of doing them? Why not just write a persuasive essay instead? Is the format of a "study" just a rhetorical device, a glammed-up appeal to authority?

The whole point of careful statistics and well-designed experiments is so that we can learn whether a premise is true or not. Without rigor, we prove nothing; this study for example, neither proves NOR disproves anything about TDD in a professional setting. For us professionals, it's noise. Yet it's sitting at the #1 spot on HN, fulfilling people's preconceptions.

Also, the replication crisis [1] would disagree with you.

[1] https://en.wikipedia.org/wiki/Replication_crisis


> The whole point of careful statistics and well-designed experiments is so that we can learn whether a premise is true or not.

That's where you're wrong... even with "careful statistics", the point of sample-based studies is to disprove the null hypothesis with some confidence level. There is no requirement that it be 100% confident. In fact, with p<0.05, you might expect some small fraction of repeat experiments to show no significant effect.

(See, it's easy to get this wrong!)
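A quick simulation makes the point concrete (a rough sketch, not from the thread; the 1.73 threshold approximates the one-tailed 5% critical value for these degrees of freedom): even when a real effect exists, an underpowered study will often fail to reach significance.

    import random
    from statistics import mean, stdev

    def one_study(n=10, effect=0.5):
        # Two groups of n subjects; group b genuinely outperforms by `effect`.
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(effect, 1) for _ in range(n)]
        t = (mean(b) - mean(a)) / ((stdev(a)**2 / n + stdev(b)**2 / n) ** 0.5)
        return t > 1.73  # rough one-tailed critical value for ~18 df

    hits = sum(one_study() for _ in range(10_000))
    print(f"significant in {hits / 100:.1f}% of repeat studies")  # far below 100%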


Although I believe that you're technically correct, your objection does not refute jdlshore's central point.


Sure it does. In the social sciences (which includes questions about productivity a la TDD), there is no "proof" or "truth" in the mathematical sense. Results are based on statistical relevance subject to the sampling (and their biases).

This study is meaningful in that it provides some limited evidence. It's fine to question biases and confounding factors... but that doesn't change the relevance of their results, merely the scope. In this case, what the researchers actually found: "At <X> confidence interval, TDD doesn't work for white, male graduate students at <Y> University working on <Z> problem. Generalize at your own peril." But that's a shitty headline.


> Sure it does. In the social sciences (which includes questions about productivity a la TDD), there is no "proof" or "truth" in the mathematical sense.

Neither jdlshore nor I were talking about "proof" or "truth" in the mathematical sense.

> Results are based on statistical relevance subject to the sampling (and their biases).

The point was that this study doesn't have sufficient statistical relevance to give any evidence whether TDD is effective in general. It doesn't matter if this study gives any evidence whether TDD is effective when used by graduate students working on toy problems, because that was not the intention of the study (besides, nobody cares about this highly specific case).


> [...] whether TDD is effective in general.

If you read the actual article (even just the abstract), you would realize that evaluating TDD in general professional settings was neither the goal nor the conclusion of the authors. You are arguing a straw man and hoping to make inferences that are unsupported by the paper's claims. The authors do a good job of scientific communication about their methods, results, and limitations. This is good practice for scientific communications. Don't think it's sufficiently generalizable? Fine, feel free to expand upon their work. That's how science functions.

("truth", "prove", and "rigor" were verbatim, primary components of jdlshore's comment. The two former words have very specific scientific meaning.)


While I can see your point of view, perhaps that is an ideal and not a realistic view of how science actually works. Similarly, someone might have an ideal view of software development, but if they saw how it actually worked and read actual in-production code ... and I think that describes every profession.

Science does work pretty well; it predicts things with accuracy and reliability otherwise unknown by humanity (AFAIK). It's an interesting question: If not by hewing close to this ideal, how does it actually achieve results?


You're conflating the "hard" sciences (which rely on reproducible experiments to explore natural laws) with the "soft" sciences (which rely on studies to explore human behavior). The track record of the latter is significantly worse than that of the former. Software engineering, and by extension this study, falls firmly into the soft sciences territory.

You should be a lot more sceptical of supposed facts in soft sciences than in hard sciences. Not because of prejudice or arrogance, but because it's much harder to be reasonably certain about anything in a soft science than in a hard science.


I know the differences you are referring to. I mean all sciences, but really I mean that we need to hear from actual practitioners about how they really work.


Studies with small 'n' are feeder studies, exploring large amounts of problem space quickly. When patterns emerge, they can be retested or expanded upon.

Equating a somewhat flawed study to 'a persuasive essay' is wholly disingenuous. As is claiming n=20 to be 'tiny'. It's small, sure, but not ridiculously small.

This kind of study isn't a final nail in the coffin, but a data point to add to the discussion.


The use of students for this purpose reminds me of a story about a man who designs a flying machine.

After months of calculations and material research, he determines that only if he builds his machine out of birch will it be light enough to fly. According to his calculations, nothing else will work.

Days later, his best friend is visiting him in the hospital. He carries over a piece of the broken machine. With a puzzled look on his face, the friend asks, "Why did you build it out of pine when you knew you needed birch?"

"Because I didn't have any birch."


How short the internet's memory is.

Le mieux est l'ennemi du bien: the best is the enemy of the good. It was right here not a couple of days ago.

Sheesh why do I bother?


the point is that the study appears flawed, so much so that it is not even good, let alone perfect.


I'm not sure it's pure noise for professionals. These students eventually turn into professionals so there's probably some relationship (at least that would be my hypothesis).

Regarding the replication crisis... this is actually a replication, and there's more replication for TDD than for most topics I read about.


Again, as a convenience sample with a tiny sample size, there's no conclusion we can even draw about students, let alone professionals.


There are really two types of studies: exploration and validation. Exploratory studies on small samples are still useful for refining hypotheses and figuring out whether they deserve more resources, but shouldn't be seen as significant evidence towards a conclusion–they are generally _less_ valid than opinions from people experienced in the field.

In contrast, validating studies _should_ have large enough sample sizes to be statistically significant, precommit to sharing results in order to avoid publication bias, have well-validated experimental designs, etc.

The problem is that these two are often conflated, so most people either blindly trust studies or put no faith in them whatsoever.


Well, if we are discussing the allocation of resources, I for one would much prefer fewer, more rigorous studies than a large volume of studies with mediocre methodology and sample sizes. Such studies carry little more weight than one's own intuitions.

I do feel that small studies like this one serve a very important purpose: to let scientists hone their experiments. Most any large study should first be attempted as a small study to work out any kinks.


Nobody needs an army of test subjects - just statistically valid numbers. But even with 20 test subjects the study fails at trying to make that 20 representative of any meaningful population.


Out of curiosity, how many subjects would you say are necessary to start being statistically valid?


Kind of the wrong question. It's more about qualitative sample selection and the expected effect size than it is the number of subjects.

Based on my admittedly poor understanding, I would expect to see a sample size of around 70 to have the power to detect a small effect in a one-tailed t-test. I would also expect to have the samples selected and balanced carefully, preferably from the target group (i.e. experienced professional programmers in the language at hand).

From what I can tell, for an effect to be detected at least 80% of the time in a sample of 20 people, you'd need the effect to be >60%. I would definitely not expect that a short term project would show 60% improvement when employing similar levels of testing, changing only either before or after writing the code under test.
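For what it's worth, figures like these are easy to sanity-check. Here's a sketch using statsmodels' power calculations for an independent two-sample t-test (the d = 0.4 "smallish effect" is my assumption, not the parent's exact number):

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Smallest standardized effect (Cohen's d) detectable 80% of the time
    # with 10 subjects per group, one-tailed, alpha = 0.05:
    d = analysis.solve_power(nobs1=10, alpha=0.05, power=0.8,
                             alternative='larger')
    print(f"detectable effect at n=10 per group: d = {d:.2f}")  # roughly 1.2

    # Subjects per group needed to detect a smallish effect (d = 0.4):
    n = analysis.solve_power(effect_size=0.4, alpha=0.05, power=0.8,
                             alternative='larger')
    print(f"n per group for d = 0.4: {n:.0f}")  # roughly 78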


Especially not since the major benefit of TDD hits much later in the cycle, when refactoring a large chunk of code or doing major surgery on the whole project (and let's hope the tests are at the interface level).


The conversation around TDD is tired, and the conclusion is always the same: "it depends." It depends on the person writing the code, the type of problem they're tackling, the language they're using, the needs of the business, etc.

This study doesn't bring anything new to the table except: "in this manufactured environment we found a single point of data that equates to noise."

I'm guessing the only reason the story was upvoted at all in the first place is because some people who agree with the title clicked the up arrow without looking at the article.


> the conclusion is always the same: "it depends."

Er, no. The studies I've read all end up showing that principled testing helps, but test-first and TDD (strict red/green cycle, code only enough to pass the new test, etc) provide no additional benefit over anything else that gets the tests written.

The "it depends" always comes from the echo chamber trying to justify their desire to believe that TDD isn't completely useless. It actually feels quite similar to the claims I've seen from practitioners that reikei, faith healing, etc aren't complete bunk.


> provide no additional benefit over anything else that gets the tests written.

Isn't this enough though? The tests gets written, which probably was the main point in the first place?


You can also add a coverage tool to your CI and get the same result (tests get written) without any of the ideology (TDD fairies sprinkle unicorn dust everywhere).


That assumes you are disciplined enough to act on the result.

Which is what most of these mechanisms are about: finding what causes sufficient friction to get the tests written.

(I'm making this comment because I know of a team with a coverage tool tied into CI where the coverage has been ignored for years)


Coverage tools are not sufficient for getting good test coverage. One can easily make code 'covered' without having proper tests for them.

By writing tests early, you make sure the code you are testing is testable and your knowledge about the code is fresh.


> By writing tests early, you make sure the code you are testing is testable and your knowledge about the code is fresh.

IME you also end up writing tests which are far too tied to the implementation. (With the resulting churn that that implies when the implementation changes.)

You get far more mileage from QuickCheck-type tests IME. Granted, not everything is very amenable to testing using QC-type tests, but a lot of stuff is.
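For readers unfamiliar with the style, here's a minimal property-based test in the QuickCheck spirit, sketched with Python's Hypothesis library (the run-length codec is a made-up example): instead of asserting one hand-picked output, it asserts an invariant over many generated inputs.

    from hypothesis import given, strategies as st

    def rle_encode(s: str) -> list[tuple[str, int]]:
        out: list[tuple[str, int]] = []
        for ch in s:
            if out and out[-1][0] == ch:
                out[-1] = (ch, out[-1][1] + 1)
            else:
                out.append((ch, 1))
        return out

    def rle_decode(pairs: list[tuple[str, int]]) -> str:
        return "".join(ch * n for ch, n in pairs)

    @given(st.text())
    def test_decode_inverts_encode(s: str):
        # The property: decoding any encoding round-trips the input,
        # checked against hundreds of generated strings per run.
        assert rle_decode(rle_encode(s)) == s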


I've heard that tests written specifically to drive up coverage metrics aren't much better than no tests.


I really don't believe any study or meta-study could come close to being able to suss out the nuance of when TDD may provide an advantage and when it doesn't.

I'd rather just trust programmers to consider what approach works best for their problem and mindset and go from there.

I personally don't TDD most things, but it's a tool I have available and I bring it out when a situation arises.


Well, with TDD you waste time writing red tests first, then tests that "just pass", and finally making the thing as it is supposed to be.

I find it surprising that that's not slower than test-last (though if you really leave testing for last, then maybe you'll need some time to fit your functions to your tests).


I think we could, but it'd be very expensive (multiple large and careful studies) and only possibly worth it.

> I'd rather just trust programmers to consider what approach works best for their problem and mindset and go from there.

People are often surprisingly good at fooling themselves. I'd rather have actual empirical validation. And also a pony.


You know, this study and others don't talk about long-term maintainability or breakability.

In fact, isn't this a good thing? If it has little impact on development time, we should totally be doing it. That way, we have tests. We have them from the start. We meet the conditions we wanted instead of guessing conditions at the end.


> the great majority of HN discussions start with a comment like this one: It's seriously flawed, etc.

That's easily the best part of HN and reddit. A great help with the Murray Gell-Mann Amnesia effect.

I don't think it's that HN is negative (though HN can certainly be negative), I think most articles really are just kinda crummy. Could be oversimplified, probably overhyped, maybe some wildly skewed sense of "providing both sides", or one of a million other possible problems. The comments will tell you about these problems, and that makes them a better barometer of "is this worth it?" than the title or even the content.


If it is unfair to say in response to the parent, then you should have waited for a better time to write this comment instead of just venting at the first comment you saw like it.

This study has the potential to be misused by people to push senseless arguments, and it could cause a lot of headaches for people in the industry. It's hard enough as it is for a consultant to get people to pay for testable software.

That line about your professor is cute, but the issue is that the media gobbles up studies and reports on them no matter the quality, leaving large chunks of the population misinformed. So publishing an inaccurate study is frankly irresponsible.


Scientific methods were invented to minimise the flaws in scientific work. If you ignore those methods and base your study on biased data, you had better forget about calling your results scientific. Fortunately, there are folks on HN to remind you of this.


We live in a world with too much information. There are probably more words written everyday than we can read in a lifetime.

The sooner you can remove worthless information the better.


> In a graduate-level engineering class, the students were making similar statements about all the studies we read

Yeah, that's because they are all terrible.


Well, he did find value in the work, just not as much as the hyped title would have you believe, and I'm perfectly OK with that.

The fact that it matches my experience (that TDD has benefits but only for projects of a certain size with programmers of a certain minimum experience level) helps with that, but it is always good to be reminded of the importance of sample size and other priors related to a study that makes a very bold claim.


And the words of that professor don't fix anything, really. I absolutely, 101% agree with the comment you are replying to.


Exactly this; I find too many HN comments to be critical in a non-constructive manner.


The criticism was very constructive, though: increase the sample size, put it in a more realistic setting.


How do you propose to gather more experienced, professional developers into the same location and get them to work on a topic that isn't making them tons of money? They can't be left to do the problems in their own workplace, or the next criticism will be "uncontrolled variables!". They also have to be vetted for minimum skills (there are plenty of experienced, professional devs out there who aren't worth a second look). The parent also wants more complex tasks done.

So... where is the money coming from? Who is going to pay for this multitude of professional programmers to converge to the same environment, be vetted, and spend a non-trivial amount of time coding the same thing as the others in the group?

Of course the researchers in the article would have loved to have those kind of resources and do the perfect, wide-ranging, deeply detailed study, but the OP's criticisms just show how divorced the OP is from experimenting with real-world humans in real-world situations, and with real-world resources.


This may sound harsh, but taking the researcher's difficulties into account is not our responsibility.

The research presented here is weak. Honestly pointing that out without pulling punches is better than simply giving them a pass because 'doing good research is hard'.


Where did I say 'simply give them a pass'? This idea that research is either a polarised "ideal" or "trash" is moronic. Taking the nature of any study into account is part of science, and part of how you caveat the knowledge gained from that study.


I completely agree with you. Pointing out a study's flaws is indeed part of the process of 'caveat[ing] the knowledge gained from that study'.


You could start by finding a corporate sponsor with a lot of developers and a vested interest in finding out which methods will be best for them. The advantage of that being that you could even test it on a real project (the sponsor would "just" need to be willing to dedicate twice the number of developers to a suitably small project).

It'd still not be easy, and of course there'd still be issues (e.g. is there anything about the corporate culture or training in that company that would affect the result?), but it'd still be far better than a bunch of students and toy problems.


Corporate places do research like that all the time; they just don't publish them all that often. A corporation with a vested interest is a corporation with a competitive interest.


Maybe we have to accept that it's untestable.

Though I think we could derive some more realistic scenarios, like evolving requirements and switching developers mid project that would be more enlightening.


Whereas the cost of finding the Higgs Boson was only a mere $13.25Bn, cheap compared to all those pesky extortionate SV engineers...

As others have said, good research costs money...


We can't arrange a good study, so let's churn out deeply flawed ones instead.


> How do you propose to gather more experienced, professional developers into the same location and get them to work on a topic that isn't making them tons of money?

Hackathons seem to manage. Why not set one up?


Hackathons wouldn't satisfy the OP's requirements for complexity, nor demographics - you'd be looking at a self-selecting group of highly motivated people, skewing young, who would come together hackathon style. There aren't going to be many thirty- or forty-something coders with young families spending a weekend (to work on the same set problems as everyone else) at the hackathon, yet there are plenty of those in industry.


Indeed. It is predictable and--as a discussion point--stifling.

It's far more interesting to discuss the actual merits than to dismiss the subject out of hand.


Promotion of schemes akin to TDD tends to build its foundations on even smaller sample sizes, usually drawn from those close in some fashion to the new wheel's reinventor: the naïve solving textbook-example problems.

But maybe I missed something.


You didn't.

The top comment is from the writer of a book on TDD, who is shocked and downright appalled.

TDD provides absolutely no value over TLD.


The study is hugely, HUGELY flawed. TDD only shows its value when you start having to refactor extremely large projects that you don't understand. You need the tests in order to refactor with confidence.


You can have tests without TDD?

TDD is a process where you write empty shims for your code and tests for it first, failing because there is no implementation, and then you write your code that passes the tests.

Frankly, I find this style to be the complete opposite of how I code: getting something working ASAP, plugging it into the big picture, and then figuring out the problems with my approach and designing with the insight gained. Then I spec out the behavior with tests. TDD assumes you have the design/spec right from the start and all you need to do is write the implementation; very little of my work falls into that category. Perhaps it's different for others.

One area where I found TDD useful is writing story-level E2E tests that spec out the requirements before the code is written; this is the least ambiguous way to spec the problem I've found. The downside is that the person doing the spec needs to know how to write E2E tests.


> TDD is a process where you write empty shims for your code and tests for it first, failing because there is no implementation, and then you write your code that passes the tests.

This is not how TDD works, and I can definitely see how writing code in that manner wouldn't be very productive at all. I also used to think that was how TDD was done until I read how it's really supposed to work by experts.

You iteratively build up code in a small, tight cycle. You do not write 100% of your tests in one go, and you do not write 100% of the code in one go. You are also missing a step: TDD is a three-step process, red-green-refactor, and the missing last step is refactoring. You start off with a small atomic behaviour you need in your API/product. You then write a test for that behaviour. The behaviour is not tied to a particular class or method; perhaps it is one class by coincidence, but maybe it's three or four classes together. A good example would be a method that deals with multiple items: I would start off by constructing my test to use one item, and the code to deal with one item. Then the next iteration might be a list of one item in the test and the code. After that, maybe I'd move to multiple items in a list.
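As a rough illustration of those increments (a hypothetical total() function, not anything from the thread), the tests and code might grow together like this, each test written first:

    # State after three red-green-refactor cycles; each test below was
    # written first (red), then just enough code was added to pass (green).
    def total(items):
        return sum(item["price"] for item in items)

    # Iteration 1: a single item.
    def test_single_item():
        assert total([{"price": 5}]) == 5

    # Iteration 2: the empty list, forcing iteration over a collection.
    def test_no_items():
        assert total([]) == 0

    # Iteration 3: multiple items together.
    def test_multiple_items():
        assert total([{"price": 5}, {"price": 7}]) == 12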


>You can have tests without TDD?

I've worked on several projects without this and the tests done without TDD tend to be of higher quality.

I noticed a common anti-pattern of "write the code, run the code, copy the output of the code, paste it into a test, and write an assert to check that the output was precisely what came out".

This was brittle, it killed the self-documenting aspect of the tests, and it often concealed various bugs.
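A contrived sketch of why that conceals bugs (the formatter and its zero-padding bug are hypothetical, just for illustration):

    def format_price(cents: int) -> str:
        # First implementation, with a zero-padding bug: 5 -> "$0.5".
        return f"${cents // 100}.{cents % 100}"

    # The anti-pattern: run the code, paste its output back in as the
    # expectation. The bug is now enshrined, and the test passes.
    def test_format_price_pasted_output():
        assert format_price(5) == "$0.5"

    # A test written first, from the requirement rather than the output,
    # would have failed against the buggy code and exposed it.
    def test_format_price_pads_cents():
        assert format_price(5) == "$0.05"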

As soon as the same team members started writing the test first, it stopped happening.

I also personally felt better about doing this since it made it easier to correct API design mistakes before implementing the code and baking them in.


>I also personally felt better about doing this since it made it easier to correct API design mistakes before implementing the code and baking them in.

How do you know you made API mistakes from writing unit tests? API mistakes become apparent when you integrate stuff and use the API in conjunction with other things (using the API in isolated scenarios like unit tests is not really insightful; you can figure out those flaws just by looking at the API). This is my primary criticism of TDD: you write tests around the initial bad API, and you end up keeping that bad design because you've already spent so much time testing it. Implementation errors caught by unit tests are cheaper to fix afterwards than design errors.

My approach is: hack together a POC -> refactor it and write tests. If you discard the POC, then TDD makes sense to me.


I usually don't do unit test driven development actually. I typically do integration test driven development where the integration test sets up an environment where the code is either interacting with the real thing or a realistic mock version.

I don't find unit tests all that useful for integration code - either as tests that find bugs or for doing TDD.


> I've worked on several projects without this and the tests done without TDD tend to be of higher quality.

Do you mean "with TDD", by any chance?

The rest of your post says that you observed better results when tests were written before the code (rather than later). But as far as I understand, writing the tests before the code is indeed one of TDD's commandments.


Yes. Dumb mistake.


Could it be that TDD vs. tests-after-code is a highly personal thing? I personally find it easier to write good tests after I've coded something functional. Beforehand, I know one or two fuzzy ideas of what I want to accomplish, but I can't list out the concrete, real-world test scenarios until after I've coded something, poked and prodded it, etc.

But I know some people are wired differently; they'll think a lot more about scenarios first, then code after they have everything accounted for. For them, TDD as a philosophy seems more fitting.

I think the chasm exists between _untested_ code and code that has tests. I've never understood the seemingly-religious zealotry behind TDD as an XP practice. Just like pair programming... if it works for you and your coding style, awesome. But don't force it down my throat or act like it's the One True Path to clean code.


TDD vs. test-after-code is a small distinction.

80% of software development is designing correct abstractions/interfaces/APIs. If you have the correct abstractions, everything else is easy by comparison. And both tests and code are fundamentally founded on these early design decisions.

So whether I do TDD or tests-after-code, I'm confronted with the 80% first: designing the interfaces (either in writing or mentally).

Naturally, I never get this part right at first, and so I wind up refactoring a lot as I go along. ("Build one to throw away", I believe this has been called.)

Now in my experience, TDD requires me to refactor more during this process than tests-after-code. But suit yourself. Changing the order within the last 20% won't be earth-shattering one way or the other.


This is why designing the interface in a language with a powerful type system provides the same quoted benefit of TDD. It helps you think about the interface before you move to implementation.

In my experience, using a type system to do this requires much less effort and refactoring.
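As a sketch of what that looks like (using Python's typing.Protocol as a stand-in; an ML- or Haskell-style type system gives stronger guarantees, and all names here are hypothetical):

    from typing import Protocol

    class PriceSource(Protocol):
        """The interface is designed first, before any implementation exists."""
        def price_in_cents(self, sku: str) -> int: ...

    def total_in_cents(source: PriceSource, skus: list[str]) -> int:
        # Written against the interface alone; any conforming object
        # (real service, in-memory fake for tests) plugs in unchanged.
        return sum(source.price_in_cents(sku) for sku in skus)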


> This is why designing the interface in a language with a powerful type system provides the same quoted benefit of TDD.

Yes. TDD came from developers who worked in weakly/dynamically typed languages.

There it is mandatory for discovering errors that are otherwise caught instantly, for free, by the compiler in more strongly typed languages.

Strangely, the TDD folks never outline this point.


I am a big fan of "testing via types".

Unit tests can demonstrate that my code is correct; they can say nothing about how my code is used by others.

In contrast, a type system can extend those protections downstream.

(Yes, yes, unless you go way off the deep end of dependent types, you still need tests. But a powerful type system can systemically prevent a huge number of very common bugs.)


My take on it is that it can remove most of the boring bugs you would get in a less strongly-typed language, leaving you to deal with the interesting bugs (e.g., logic bugs).

As for dependent types, I'm not sure that even then you would be able to ensure that, say, your cache is correctly invalidated.


Additionally, strict TDD teaches you lessons. Once you've used it enough, chances are that you'll be making better interfaces even if you code first.


I've found that implementing tests as I go along writing a new API helps code quality and speed in two ways: I catch my own logic bugs early, and the self-feedback I get from writing the tests helps me write a more usable interface.

I never catch all the design errors with non-coding design work - that gives me the high-level view, which is critical - but there are always some logical errors and things I've failed to model correctly which early tests catch.

But yeah, the optimal design-implement-test order probably depends very much on the implementer and on the problem to be solved.

The third meta-advantage is that if I'm writing hobby code for myself and the project is large compared to the weekly time available in my spare time, I get something to help me remember where I was (say, a month back) and something to compile and run immediately (which is a huge motivational booster for me).


> 80% of software development is designing correct abstractions/interfaces/APIs

Not sure if the following claim is wrong or not, but I will state it anyway. If you write in TDD style and you decide to change the API/abstraction a little because of something you didn't think of, you have to correct both the tests and the code. When you do test-after-code, it's much easier (IMO) to hit correct abstractions and interfaces, because you are working on a living thing, not some artificial test cases you have to figure out. Then you write tests to ensure those abstractions are respected and in working order.

TDD people, please correct me if I'm wrong.

EDIT: disclaimer: I'm a front-end developer working mostly in React and Redux


> if it works for you and your coding style, awesome. But don't force it down my throat or act like it's the One True Path to clean code.

This applies to so many things in Software Development - text-editors, variable names, plugins, architectures, operating systems...


...life


Well, things start to get complicated and messy when the team size starts to grow and the technical debt starts to mount. That's when it needs to go from hacking to engineering.


nope - vim


Ed.

Edit: ED IS THE STANDARD TEXT EDITOR! [0]

[0] https://www.gnu.org/fun/jokes/ed-msg.html


cat.


butterflies


Ahh... good old M-x butterfly


For me it's a highly situation-dependent thing.

Sometimes writing tests first helps me think about the high-level design of my code. Other times I've got to try a few things before I have any idea what the code should look like, and writing tests first would just create a lot of extra code churn. I feel that, as time goes on, I'm getting better at anticipating which situation I'm in. I also switch back and forth between the two testing methods even more rapidly.


> Sometimes writing tests first helps me think about the high-level design of my code.

I'm personally a fan of "readme driven development" (https://news.ycombinator.com/item?id=1627246). It's a nice compromise between tons of upfront planning and nothing at all.


That's an interesting discussion and article. I already sort of do this for new features. You really feel the power of it when, after writing a lot of the document, you realise there's a hitch, rewrite from an earlier point, and shudder at the thought of the mess and wasted time if the code had already been written. In its final state the document becomes a series of steps that I can check off, which is helpful in a motivational sense.

I had thought of involving users in the design of new features via the support forum but I'm afraid of how that might turn out. A readme type document would probably be about as good a way of doing it as any though.


Yes.

When doing UI code (with something like React) it's next to impossible to do TDD - you can't know the structure until you actually implement it.

If I'm fixing a bug in some sort of non-ui code, I often find TDD to be very helpful. Create a test case that demonstrates that failure, then fix the implementation to make the test pass.
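
Sketch of that workflow (TypeScript, hypothetical bug): first pin the reported failure down in a test, watch it fail against the buggy version, then fix.

    import assert from "node:assert";

    // Bug report (hypothetical): slugify("  Hello  World  ") returns "hello--world".
    function slugify(title: string): string {
      // Buggy version: title.trim().toLowerCase().replace(/ /g, "-")
      // Fix: collapse runs of whitespace instead of replacing each space.
      return title.trim().toLowerCase().replace(/\s+/g, "-");
    }

    // This assertion fails against the buggy version and passes after the fix,
    // then stays around as a regression test.
    assert.strictEqual(slugify("  Hello  World  "), "hello-world");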


I'm not sure that testing a bug, even before writing the code that fixes it, qualifies as TDD.


I find the same thing. It's almost like writing an essay, I need to sketch out an outline first, which is often getting just far enough to convince myself that the approach is reasonable and there aren't any surprises ahead. Once it's time to crank out the details of individual sections, the work often starts to feel monotonous and that's where TDD can be invaluable for helping me to stay focused and comprehensively work through all the edge cases.


When I first learned about TDD I thought the same (order of test and implementation does not matter that much). Yet when you do the TDD steps

* write a failing test (RED)

* implement minimal functionality so all tests pass (GREEN)

* REFACTOR

and overcome the initial revulsion, you will get a ton of benefits that you didn't know existed (a minimal sketch of one full cycle appears at the end of this comment). I'll first give a few that are more on the code quality side:

* You have a complete test suite, without redundant tests.

* Tests are usually not that mock-focused [*], since writing the implementation after the test didn't introduce random roadblocks for testing, and the tests assert the relevant properties (test-later code bases often don't assert much, or assert properties you don't care about).

* When you write your test first, you basically write a small example of how it feels to use the API you have in mind for solving your problem. This drastically improves quality.

Some non-technical benefits of TDD

* I find TDD helps me tackle bigger pieces of code where I just don't know how to start implementing.

* TDD is programming gamification. You are rewarded with regular, small successes of having added another test and made it green. So whenever you ask yourself "OMG, have I made any progress in the last 3 days?", you can actually tell: yes, I added 15 tests and made them pass - I objectively got closer to implementing that feature.

* Not an issue where I work, but I have often heard about this from other places: if the tests are there first, no one can tell you not to spend so much time writing tests. TDD-style is protecting your boss from shipping that car without having the brakes checked.

* Along the same lines: TDD gives explicit room for refactoring.

[*] Obviously one cannot avoid mocking entirely, especially when dealing with resources external to your code base, yet I have rarely seen TDD-originated tests that over-mock.
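
Here is the promised sketch of one full cycle (TypeScript, toy problem), with the earlier RED/GREEN steps shown as comments since they can't all coexist in one file:

    import assert from "node:assert";

    // RED: write a failing test first. Before fizzbuzz() exists, this
    // doesn't even compile - the most basic failure there is.
    // assert.strictEqual(fizzbuzz(3), "Fizz");

    // GREEN: the minimal thing that passes is `return "Fizz";`.
    // Next RED: assert.strictEqual(fizzbuzz(5), "Buzz") breaks the fake.
    // GREEN again, and REFACTOR once the duplication is obvious:
    function fizzbuzz(n: number): string {
      if (n % 15 === 0) return "FizzBuzz";
      if (n % 3 === 0) return "Fizz";
      if (n % 5 === 0) return "Buzz";
      return String(n);
    }

    // The accumulated test suite, kept green at every step:
    assert.strictEqual(fizzbuzz(3), "Fizz");
    assert.strictEqual(fizzbuzz(5), "Buzz");
    assert.strictEqual(fizzbuzz(15), "FizzBuzz");
    assert.strictEqual(fizzbuzz(7), "7");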


> I find TDD helps me tackle bigger code-pieces where I just don't know how to start implementing.

This made me think. There are many axes of software development, and aside from the TDD/non-TDD axis, there's also the bottom-up vs. top-down axis. By top-down, I mean you start with a skeleton that provides an interface to the outside world, and then fill in the implementation details. Bottom-up, on the other hand, means you first write some routine that you know will be vital in some form or another, but without immediately seeing how it will be hooked into the interface to the outside world.

Top-down has the advantage that you can have something runnable quickly, and it seems to me that TDD requires top-down development.

But somehow, there have been times when I was working on a piece of code and it just flowed more naturally in a bottom-up manner. It has happened that I went for a day (and very rarely longer) working on tricky algorithmic code before I even tried to compile it for the first time.

This is definitely something that I try to avoid when I can, it's just that sometimes it works out that way. And it is totally incompatible with the TDD way of doing things.

How often this happens surely depends on the type of project you're working on. As usual with these things, it seems like TDD can be a useful inspiration, but it shouldn't be taken as dogma.


This reminds me of https://twitter.com/marick/status/787402452848873472

I don't think writing tests first makes you work top-down or bottom-up.

If you want to work bottom-up (I like to work bottom-up), you just start by writing tests for your bottom-up functionality.

And of course I don't want to impose TDD on others, but I would like others to try it out sincerely (I recommend Kent Beck's book for this; don't do it without guidance - many people who reject TDD after trying it out were doing some kind of test-first method but didn't get the benefit, because they were not aware of things like getting one test at a time to completion, etc.)

---------

> shouldn't be taken as dogma.

Life in software engineering becomes much more relaxed when you realize that every piece of advice and technology is in fact based on dogmas and is not based on sound empirical, statistically relevant evidence :-) (apart from the pure math part).

It is just really, really hard to study software quality in an objective way that satisfies scientific criteria, so we are kind of stuck with hearsay in IT.

So TDD is kind of dogmatic. So is OOP, or the rejection of gotos.


The "lightbulb" moment for me was doing what Kent Beck suggests, and writing down the list of tests I "knew" I needed to write, before writing any tests. That's a very similar step to diving into writing the code first, "knowing" what it ought to look like, and the ideas feeding into either step would be the same.

Writing a list of tests first is quicker, though, and working through the list, updating it as I learn more, tells me that my initial ideas were wrong often enough that I'm almost convinced that the true secret of TDD isn't the tests at all. It's that checklist, and the purely mechanical process of working through it, checking them off one-by-one and fixing it up as I go along.

To relate this to your situation, I often find that I'll only get started with a fairly fuzzy idea, and the checklist will be short. I can still expand it as I go, knowing that I'm working on as solid foundations as I can given the constraints.
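
The checklist doesn't need tooling, either; a sketch (TypeScript, hypothetical function under test) where unwritten entries stay visible as TODOs instead of living in my head:

    import assert from "node:assert";

    // Hypothetical unit under test.
    function parseTags(input: string): string[] {
      return input.split(",").map(t => t.trim()).filter(t => t.length > 0);
    }

    // The test list: entries without run() are still TODO.
    const checklist: { name: string; run?: () => void }[] = [
      { name: "empty string gives no tags",
        run: () => assert.deepStrictEqual(parseTags(""), []) },
      { name: "whitespace around tags is trimmed",
        run: () => assert.deepStrictEqual(parseTags(" a , b "), ["a", "b"]) },
      { name: "duplicate tags are collapsed" }, // added while writing the one above
    ];

    for (const item of checklist) {
      if (!item.run) { console.log(`TODO ${item.name}`); continue; }
      item.run();
      console.log(`PASS ${item.name}`);
    }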

I think there's a danger in writing off practices like TDD with "if it works for you..." because it avoids discussion of why it works for some people and not others. That in turn gives people license to ignore it simply because it's unfamiliar, and the brain doesn't like unfamiliar things by default. That's a shame, because for some of those people they'll be stuck in local optima, writing off the one thing which could break them out of it.


> But don't force it down my throat or act like it's the One True Path to clean code.

When I was a programming beginner and started learning about TDD, I was always thinking about how soon to start using it, because everyone says it's such a good idea. If you searched for anything about it, you'd rarely find a resource not marvelling at its surprising benefits (at least, that is how it was a few years back). Granted, it may have its merits, but is it really a good idea for everything? No way!

One of the anti-patterns I have seen in the programming world is how everyone jumps on adopting / advocating a 'good' practice without understanding how it fits the project. I wrote about this a while back[1]. Cue git flow, TDD, not using goto, DOM parsers instead of regex.

[1]: https://shubhamjain.co/2015/10/26/imperfect-best-practices/


Reminds me of a comment by Harry Roberts on CSS methodologies https://twitter.com/csswizardry/status/539726989159301121

    Modularity, DRY, SRP, etc. is never a goal, *it’s a trait*.
    [...] understand that they’re approaches and not achievements.


> I think the chasm exists between _untested_ code and code that has tests.

IMHO untested code isn't always a bad thing. Anything related to privacy, security, or data integrity should be heavily tested.

But beyond that, code should need to earn its tests. (At least in a web startup context.) After all, the core of agility is being able and willing to go in a different direction when something isn't working.


I think there is a lot of nuance to this statement. After a few years on the startup scene, I believe that if you are using a duck-typed language, your core concepts should be tested without exception (for example, Rails and Django models). The farther you get away from these core concepts, the less your tests need to be there and the more likely that code is to change.

In addition, if you are truly in the earliest stages of your startup (pre-revenue/traction), the tests are not needed -- but you have to be _very_ aware that you are making a tradeoff for sheer velocity, and you will have to pay the price later on. I've seen too many startups fail to pay that price, and it comes back to haunt them and reduces their overall velocity.


If your code doesn't have tests then you can't refactor, because you don't know if you broke something. Getting the right level of tests is important: they should check the interface and not the implementation; otherwise, I agree, you can't refactor anything. But if you test your interfaces the same way that real code would use them, tests actually make you more agile, because they give you the peace of mind to make major changes.


"Tests can make refactoring much easier" is a statement I could easily buy into.

"If your code doesn't have tests then you can't refactor" is pure dogma and trivial to falsify (go on, refactor some untested code now!). More likely to introduce bugs? Perhaps. But relying on that absolutist statement about refactoring will alienate me every time.


Let me clarify: you can't refactor without a very awake QA team, because you have no idea what your changes may have broken.


And a passing test is a guarantee for a working feature across an entire platform. Every. Time.


>Could it be that TDD vs. tests-after-code is a highly personal thing? I personally find it easier to write good tests after I've coded something functional. Beforehand, I know one or two fuzzy ideas of what I want to accomplish, but I can't list out the concrete, real-world test scenarios until after I've coded something, poked and prodded it, etc.

I generally find it better to engage in "top-down" development - specification -> tests -> high-level code -> low-level code - as it's cheaper to fix mistakes at the higher levels first.

If I'm doing purely experimental code though - where I'm not sure of what I even want or what the output should be, tests are just a waste of time.

It also loses its effectiveness the more declarative the code is. If I'm essentially tweaking configuration or HTML there's no point.


> they'll think about scenarios first, then code after they have everything accounted for.

I'm one of those people. But I find it hard to believe that a test suite can reflect everything I understand about a problem domain. Focusing too much on unit tests makes it difficult to find the general rules that govern the behavior of a system. Have too few unit tests, and they won't capture all the subtleties and corner cases. Have too many unit tests, and they won't fit in your head.


To me, it depends on the problem. With some problems, I'm not sure yet how to implement it, but I do already know what the result should look like. That's an excellent case for TDD. You get to work on the problem, refine your ideas about it, and you start with the part you already know: the results.

If you have no idea what the results should look like, then TDD is pointless.


For me it also depends on the task - I never do strict TDD - but often I like to start with some tests.

Sometimes it is hard to write the tests, but it is easy to see if your code works - sometimes it is easy to write the tests, but not the code.

I would call it 'do first what is easier'.

But it is important to write the tests soon after you write the piece of code, so that you don't forget something.


My experience has been that TDD is worthwhile when working with notoriously slippery whack-a-mole functions like handling time or money. The time saved by catching regressions vastly outweighs the time taken to implement the tests.

In contrast, TDD has been a waste of time for me for UI-based work, as the effort needed to properly expose the functionality under test is too great and the requirements and design change too quickly to be worth it.

In the latter case, writing some deterministic UI tests against mock data after the requirements and implementation have settled has proven much more effective in preventing regressions.
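
To make the money case concrete, a sketch (TypeScript, hypothetical function): allocation code like this is exactly the whack-a-mole kind, and the tests pin down the cases that regress when someone later "simplifies" the rounding.

    import assert from "node:assert";

    // Split an amount in cents across n installments without losing a cent;
    // earlier installments absorb the leftover cents.
    function splitCents(totalCents: number, parts: number): number[] {
      const base = Math.floor(totalCents / parts);
      const remainder = totalCents % parts;
      return Array.from({ length: parts }, (_, i) => base + (i < remainder ? 1 : 0));
    }

    assert.deepStrictEqual(splitCents(100, 3), [34, 33, 33]);
    // The invariant that really matters: no cents created or destroyed.
    assert.strictEqual(splitCents(9999, 7).reduce((a, b) => a + b, 0), 9999);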


Exactly my experience - I've done TDD for 3 different types of calculation functions in 3 different languages over the years, and all were beneficial: financial risk calculations in C++, report data calculations in PHP and billing calculations in JavaScript.

On the other hand, TDD for operations that involve I/O (including user interactions) were not helpful at all.


That would probably be better handled by functional/e2e testing?


TDD doesn't assume you are using unit tests.


You are saying "TDD" but your comment reads the same if you just say "tests in general"


Agreed here. Test the stuff that needs testing.


That means you can't use coverage tools, so this approach makes the tests unquantifiable. It is not how most major dev shops work, because of the bean counters and the "senior" developers/leads/managers/... who seem to want nothing but to appease the bean counters.

It also produces superior software in my experience.


> That means you can't use coverage tools, so this approach makes the tests unquantifiable.

So it's win win then ;)


It's been very YMMV in mine. For some things, it's indispensable. Others it's tedious, repetitive and of questionable benefit.


It's tedious and repetitive even when the benefit is obvious and substantial, unfortunately.


Automatic test generation tools. There are quite a few on GitHub, but the real ones are the ones you write yourself. What you're looking to do is run the software a few times and record tests for particular pieces of code during those manual runs.
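
A hand-rolled sketch of the idea (TypeScript, names hypothetical): wrap the function you care about so that manually exercising the software prints ready-made regression assertions you can paste into a test file. Only sensible for JSON-serializable inputs and outputs.

    // Wraps a function so every call is echoed as an assertion.
    function recording<A extends unknown[], R>(name: string, fn: (...args: A) => R) {
      return (...args: A): R => {
        const result = fn(...args);
        console.error(
          `assert.deepStrictEqual(${name}(` +
          `${args.map(a => JSON.stringify(a)).join(", ")}), ${JSON.stringify(result)});`
        );
        return result;
      };
    }

    // Hypothetical function under observation.
    const normalize = recording("normalize", (s: string) => s.trim().toLowerCase());

    normalize("  Hello  ");
    // stderr: assert.deepStrictEqual(normalize("  Hello  "), "hello");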


Can you elaborate more? I'm a frontend guy, and sometimes I write JS plugins for the browser (that run after page load, and apply based on how things render on the page, or based on user interaction with the page). Non-frontend folks tell me I need tests for my code, or don't want to look at it until I have tests. How would I build a test for this, other than a functional test that runs in-browser, where the test is whether it works or not?


Here are two strategies. First, you can go "outside" what you've built, simulating a user's interaction and examining the resulting DOM. In general this is a tool-centric approach.

Another approach is to partition your code into what Gary Bernhardt calls an "imperative shell" of code that touches the outside world and a "functional core" that does not. Then unit test the functional core, which shouldn't require special testing libraries, and validate that the imperative shell works by the occasional manual or "outside" test.
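
A tiny sketch of that partition (TypeScript, hypothetical widget): the core computes what should happen as plain data and is trivially unit-testable; the shell is the only part that touches the DOM and gets the occasional manual or in-browser test.

    import assert from "node:assert";

    // Functional core: pure, no DOM, easy to test.
    type Banner = { visible: boolean; text: string };

    function bannerFor(cartItems: number): Banner {
      return cartItems > 0
        ? { visible: true, text: `${cartItems} item(s) in cart` }
        : { visible: false, text: "" };
    }

    assert.deepStrictEqual(bannerFor(0), { visible: false, text: "" });
    assert.deepStrictEqual(bannerFor(2), { visible: true, text: "2 item(s) in cart" });

    // Imperative shell: thin, and verified by "outside" tests instead.
    function renderBanner(el: HTMLElement, banner: Banner): void {
      el.style.display = banner.visible ? "block" : "none";
      el.textContent = banner.text;
    }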


First off, if your code doesn't have bugs, then that is preferable! In my experience with front-end web dev the issue has always been different (old) devices. Testing could be done with browser macros that simulate input (mouse movement, click on x, y, z, etc.), then take screenshots to see if something is off. You can also make your widgets throw errors (1) and call home via XMLHttpRequest if an error is detected.

1) http://www.webtigerteam.com/johan/en/blog/error_driven_devel...
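
The "call home" part can be as small as this sketch (TypeScript; the /errors endpoint is made up):

    // Report uncaught errors from the field to a collection endpoint.
    window.onerror = (message, source, lineno, colno, error) => {
      const report = JSON.stringify({
        message: String(message),
        source, lineno, colno,
        stack: error?.stack,
        userAgent: navigator.userAgent,
      });
      // sendBeacon survives page unloads; fall back to XHR on old devices.
      if (navigator.sendBeacon) {
        navigator.sendBeacon("/errors", report);
      } else {
        const xhr = new XMLHttpRequest();
        xhr.open("POST", "/errors", true);
        xhr.send(report);
      }
      return false; // let the default handler run too
    };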


The

  * fire your QA team

  * dev team is the level 2 production support, and

  * get to continuous integration nirvana
management fads have been sweeping through my Scrum enterprise for the last 18 months.

Teams that aren't testing constantly, well, they've got tons of escape defects on every release. And those devs are constantly in fire-fighting mode, it's miserable for them. And I see that leading to compressed schedules for them and more reckless behavior like asking to push their releases during the holidays where there could be severe financial consequences to bugs.

As far as I'm concerned, in an environment like mine, where developers can no longer hide their incompetence behind bureaucracy like a QA team, it is official insanity to not spend inordinate amounts of development time writing automated tests. You should be spending 70% of your dev time writing tests and doing devops and 30% writing features.

I read in these comments a lot of bellyaching about how much time it takes to write tests. First, TDD is a skill that you can get good at, and it won't take as much time as you think once you get good. Second, I just don't think you have a choice to not test comprehensively when escape defects become a mark of shame in the organization.


>As far as I'm concerned, in an environment like mine, where developers can no longer hide their incompetence behind bureaucracy like a QA team, it is official insanity to not spend inordinate amounts of development time writing automated tests. You should be spending 70% of your dev time writing tests and doing devops and 30% writing features.

Highly agreed. Anytime there has been QA available, the code base starts to look more hacky, fewer tests get written, and only the tests that cover the happy path get written. When things are looking bleak, that's when I start to think we should have certifications for developers (optional, of course, like the PMP or CAPM), there as a rough indicator that a dev won't skimp on QAing their own code.

>I read in these comments a lot of bellyaching about how much time it takes to write tests. First, TDD is a skill that you can get good at, and it won't take as much time as you think once you get good. Second, I just don't think you have a choice to not test comprehensively when escape defects become a mark of shame in the organization.

After years of looking at tests written poorly, I've come to the conclusion that you have to treat test code like your regular code and do code reviews on it and refactor the heck out of it.

If you don't, you'll end up in the situation I'm in, where our company has almost 4000 spec tests in Ruby/Rails that can take an hour to run on a local dev box and still take 20 minutes when parallelized across 5 boxes. Shaving off a couple of seconds here and there by removing unneeded tests, refactoring to simplify them, or sharing mock objects is a worthwhile endeavour, but because they're "just tests", the time to do that isn't allocated, and everyone on the team merrily continues to write and modify tests in a sloppy way.


The study is about the effects of "test first" vs "test later", not "writing tests" vs "not having tests".


Yeah, I tried to avoid specifying a methodology. I just couldn't resist throwing in a 'TDD' because it yields better-quality results for me personally than 'test later'. My point in commenting was: testing comprehensively is not optional at my workplace, given how suddenly accountable the devs have been made.

Personally, I don't care how you test. Just get decent coverage over all the control paths and have a test suite that catches regression bugs. Ideally also have integration tests to prove your connections to your partners are good and maybe a smoke test to prove your code is wired up right once it hits an environment.

I'm still on the journey myself, so I can't claim to be an expert. I know there are people in my org that are suffering because of code quality issues. I call them incompetent because they're producing shoddy work products with lots of defects. A smart way to combat bugs and kill them for good is to test, so that's why I connect not testing to incompetence. A dumb way to combat bugs is heroics and working 18 hour work days trying to debug with "GOT HERE" logging.


Potentially contentious opinion, but one that's been echoed through our halls -

Developing software and developing tests and test infrastructure are two different skills and mindsets that are often inversely coupled.

It's not that "incompetence" is revealed by firing the test team (although it sometimes is) - it's that "being bad at writing tests" is revealed. A team with 3 dedicated testers and 7 devs will probably outperform (in code output and reliability) 10 devs spending 70% of their time on testing.


I have experienced the opposite. After the dedicated QA team was disbanded and repurposed to development, where every developer had to write tests for someone else's code, the code and designs started to become more testable, and the tests eventually became simpler to maintain and understand. (The project was an embedded system.)

The "We" and "Them" distinctions made people in different teams and different roles ignore the needs of the others. The change brought a culture change and also gave a view on the needs of the other side. This could make it possible that for the first time the development of tests and features could really be done in parallel (officially that was the methodology before, but never really worked out well, because the testability considerations were usually ignored at design time, and the docs were lagging, because of the bad bandwagoning culture. With the mixed team where everyone was treated as an equal these problems dissolved surprisingly quickly (in less than half a year)).

So my point is having dedicated testers can give base for bad culture which hurts the product and the company. Having everyone do the same job with regards to development and testing is better.


There is no silver bullet here. For some teams it makes sense to integrate development and testing. For some teams it makes sense to have dedicated QA people. For some teams a different constellation is optimal. I know developers who are brilliant at the big picture but lacking at finishing the implementation. I also know developers who cannot get the major architecture right, but who can finish tasks and get things shipped. Pick one of each, plus a devops person who can write tests, and you have a team of 3 that produces top-quality software.


You are probably right with your example for a small project. The one I referred to was a medium-sized safety-critical system with a dev/test staff of hundreds. My example was for such a larger organization.

Small teams almost always worked out well for me if a single leader person was present to sort out initial problems.


I've actually found that most QA people I've worked with were good at deciding what to test but were awful at writing automated tests. Of course that's purely anecdotal and based on a very small sample size.

I like to believe that I sometimes forget to test for edge cases but have mastered how to write tests. I don't think knowing how to write tests comes easily to every decent programmer; you need to learn a few patterns you otherwise won't need.


The real strength of QA people can be manual testing. There's a local maximum of "meets the stated requirements but horrid to actually use", and a singular focus on test automation seems to push projects in that direction.


Totally agree. I think they are great for what some people refer to as "exploratory testing".


Not to mention that there is a lot of stuff, especially for UI, that is either hard or impossible to write tests for...


  > You should be spending 70% of your dev time writing tests and doing devops and 30% writing features.
I laugh, but the tears are real.


Am I correct in reading that they performed this experiment only for two days, and entirely with graduate students?

If so, they have missed the point of TDD.

In the short term, TDD probably doesn't make a difference, one way or another.

But software as a business is not a short-term game.

I would love to see a study where the participants are, over a period of six months, given the same series of features (including both incremental improvements, as well as major changes in direction).

In my experience, teams that don't test at all quickly get buried in technical debt.

Untested code is nigh impossible to refactor, so nobody ever does, and the end result is usually piles of hacks upon piles of hacks.

As far as testing after development goes, there are three problems that I see regularly:

One, tests just don't get written. I have never seen a TLD (Test Later Development) team that had comprehensive code coverage. If a push to production on Friday at 6pm sounds scary, then your tests (and/or infrastructure) aren't good enough.

Two, tests written after code tend to reflect what was implemented, not necessarily what was requested. This might work for open-source projects, where the developers are also the users, but not so much when building, say, software to automate small-scale farm management.

Three, you lose the benefit of tests as a design tool. Code that is hard to test is probably not well-factored, and it is much easier to fix that while writing the tests than it is to change the code afterwards.


The goal of the study was not to measure if testing is valuable at all, but to measure if there is any difference between TDD and TLD. I think no-one questions the value of testing per se.

As for your points why TLD is theoretically worse:

1. "in TLD tests just don't get written" - this is a dual argument to "in TDD the code just doesn't get refactored". After tests are green, there is no motivation to do so and it feels like a waste of time (if it works, why change it, right?)". I think the latter is much, much worse for longevity of the project than low test coverage. Not enough coverage doesn't mean automatically bad structure of the code and you can always fix that by adding more tests later (and maybe fixing a few bugs detected by them). But writing code quickly to just make the tests green and then not doing refactoring quickly leads to bad structure of the code very quickly. Whether you chose TDD or TLD, you need to apply some discipline: in TDD to refactor often, in TLD to keep high test coverage.

2."tests written after code tend reflect what was implemented, not necessarily what was requested" - a dual argument for this exists as well: "code written after tests tends to reflect the tests on case-by-case basis, not necessarily the minimal general solution covering all possible inputs that should be the true goal of the implementation". Whenever I hear the argument that tests help write better code I always remind myself a famous Sudoku Solver written by Ron Jeffreys, TDD proponent: http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s... I also saw that happen in a few real world projects - the code written after tests was just a giant if-else (or switch) ladder handling each test case separately. Bad, bad code, additionally missing a few important cases. And the most funny thing was that after seeing this during the code review, I rewrote the implementation in a more general way, got two tests failing and after investigation it turned out the tests were wrong and the code was good. Lol, verifying tests by implementation :D

3. Tests are not a design tool. Tests are for... testing. They are often a reason for over-engineering and over-abstraction that makes the code more complex and harder to read. See FizzBuzz Enterprise Edition: it is definitely testable to death.


Phoenix (Chris is in TLD camp and Phoenix has very good test coverage)


> Untested code is nigh impossible to refactor, so nobody ever does, and the end result is usually piles of hacks upon piles of hacks.

The mistake is in creating unrefactorable code. TDD may be a possible solution if done correctly but there are other ways to skin that cat.


Sure, but I have yet to find a better solution than TDD for big teams where team members come and go constantly.


"Branches aren't merged without peer approval" has sufficed plenty in my experience.

Whether people code tests before thinking interfaces, before writing a prototype, before implementing, during the inevitable interface rewriting, after coding, after manual verification that it seems to work, or right before submitting the branch to review, doesn't matter as long as someone on the team looks over and sees that "yup, here are tests and they seem to cover the important parts, the interface makes sense, and the documentation is useful".

Then people can do TDD, TLD, TWD or whatever they personally feel most productive with. Developers being happy and feeling in control of their own work does more for quality than enforcing a shared philosophy.


This.


This is a misleading title and conclusion. The study showed a huge benefit of TDD over Waterfall, and it is only when compared to ITL that it was found to not be better.

But moreover, I think it's important to understand why Beck pushed for TDD.

TDD is like saying "I'm going to floss before I brush every time, no matter what."

But, when people don't do TDD they typically aren't all saying "I'm going to brush and floss afterwards every time, no matter what."

Instead, most say "I'll floss regularly at some point, but I don't have time now, and it takes too much effort. I'll floss here and there periodically, maybe before my monthly meeting or big date night."

Another reason Beck pushed for TDD was method and solution complexity reduction which results in lower time and cost required for maintenance because code is simpler to read and understand. Again, with ITL, you're still writing tests for everything, so you'll see those benefits. However, if you fail to write some or most tests, some developers will write overengineered solutions to things and have overly long difficult to follow methods that will make maintenance suck more resources.

If you want to go beyond this study, though, Beck, Fowler, and DHH had a critical discussion about TDD in 2014 that's worth checking out:

http://martinfowler.com/articles/is-tdd-dead/


The extra-complicated architectural refactors I've seen done in the name of 'testability' have been eyebrow-raising. TDD isn't a guarantee that engineers will KISS. You can still make overengineered crap, tests-first / TDD or not.


Waterfall is a straw man.


Not only that... when you test the efficacy of medical interventions the gold standard to strive for[1] is not whether the new intervention is better than placebo, it's whether it's better than $CURRENT_BEST_KNOWN_INTERVENTION. I suggest we should be aiming for a similar standard in testing software engineering methodology.

I think it would be very hard to argue that Waterfall ~= $CURRENT_BEST_KNOWN_METHODOLOGY.

[1] Of course, this isn't usually what happens in practice when pharmaceutical companies are doing their own testing, but it's what should happen if you actually care about efficacy and not just PR/sales.


The title suggests that TDD has little or no impact on dev time or code quality at all.

The research shows no significant difference between TDD and iterative test-last (ITL) development.

Could the title be updated? To show that it is a comparison of TDD vs ITL/TLD.


This thread feels like a study showing that HNers don't read the articles.


There is a problem with all these studies - they all use a very small number of programmers (21 in this case) with no experience (all graduate students in this case) and presumably no significant experience with TDD or TLD.

I'm not making a stand about TDD here - I just think we need to have much better computer engineering science studies if we want to have significant results.


Agreed. I also would question the short time scale of the test. It took me a year or so to really get good at test-driven development.

I also think of TDD as a sustainability practice. If I'm writing a small thing that I do not intend to maintain, I won't bother with TDD (or with tests at all). But I'll definitely TDD something where I expect to come back to it frequently, especially when I initially don't know the requirements, and I expect requirements to change over time.

In practice, I suspect a lot of the interesting questions about software are effectively unanswerable with the budgets available to CS profs. I can't imagine really answering this question without doing something of the scope of a substantial medical study.


Really don't know why you're being downvoted. Grad students at diploma mills are awful programmers, and even those in real labs are good at putting out proof of concepts, not production code.


"TDD has little or no impact on development time or code quality when compared to the equivalent number of tests implemented afterwards using TLD."

FTA: In this paper we reported a replication of an experiment in which TDD was compared to a test-last approach.

Very different title.


I never really viewed TDD as better at reducing bugs for a short-term project; it's only going to have marginally better chances of getting additional test cases.

I view it more as important for breaking the growth of testing effort in an iterative project. With each release, the scope of what must be covered to fully test a project climbs, and unless a team wishes to linearly increase the size of its test team, it's all but certain that tests will be skipped.

TDD gives us the ability to always run a full regression test, since it's just machine time. It's a safety factor in knowing nothing is broken, which in turn gives us confidence that we can refactor.


Up until now it has mostly been opinions and biases, and even though many popular programmers[1] have been saying this for a very long time, it's great to see a controlled study done on it.

This makes it a fact, and a great counter-argument to help the many programmers who are being forced to practice TDD because of the generally accepted claims about productivity and code quality associated with it.

[1] http://david.heinemeierhansson.com/2014/tdd-is-dead-long-liv...


Keep in mind though, that the study shows that it's no better than testing after iterative development. Testing is still required, and I'd wager that the study participants following the ITL process didn't have external business pressure to skip the "test-later" bit...


- Population: A classroom of students, most without professional experience

- Sample size: 21 students

- Study duration: 2 days

- Team size: Individual

Tests are most useful when refactoring someone else's long-forgotten code; the sort of thing that happens frequently in long-running projects consisting of large teams. In other words, the "real world".

Show me that study.


In "Realizing quality improvement through test driven development: results and experiences of four industrial teams", an MSR researcher found that TDD did reduce defects in his study, but also came at a large cost in time-to-ship.

https://www.microsoft.com/en-us/research/exploding-software-...

This finding contradicts the headline. TDD impacted both development time and code quality in that study.


I have spent almost 3 years now writing code (with very few or no tests), and my current organization stresses agile practices a lot. I encountered TDD here, so I would like to chip in too.

TDD solved a major problem for me which I have seen a lot of people suffer with: _Where do I start?_ The thing is, TDD and refactoring go hand in hand; I cannot imagine doing TDD if I were not using an IDE like IntelliJ or something. When you write code first (typical TLD), you need to have a plan beforehand, and this plan cannot change much, because you really do not get feedback until you complete major segments of the code. TDD ensures you keep getting nibble-sized bits of feedback which assure you that what you are writing works. This, according to me, is the single most beneficial point of the system. TDD or TLD can both allow maintainable code, and often while doing TDD you can follow it strictly. It might not have an impact on code quality for seasoned developers (coding for years on the same codebase), but it does help the others. It also reduces my inertia considerably. So while it might not have an impact on development time or code quality, I tend to sleep well, without large UML diagrams floating in my head, knowing that each unit of my code works independently.


> When you write code first (typical TLD), you need to have a plan beforehand, and this plan cannot change much, because you really do not get feedback until you complete major segments of the code.

A good plan is modular and therefore flexible. I get lots of feedback as I'm writing the code. When I find myself writing very similar code over and over or an API feels unwieldy I take that feedback and refactor without delay.

I think what really happens in big groups of developers is that TDD forces everyone to delay agreement on the interfaces between code modules until the process of writing tests has uncovered most of the problems. Without TDD, the devs who just want to get their part done and go home plow ahead with the first draft of the API, locking it in stone before it's been vetted, and then dig in their heels, creating technical debt. TDD appears to be the saviour.


>I think what really happens in big groups [...] dig in their heels creating technical debt.

This is a major factor. The Pareto principle applies in workplaces too; TDD tries to even the balance a bit.

Regarding feedback: I should have clarified that I am a mobile dev. I need to build everything from the backend services to the UI to be able to get viable feedback without tests. If any other mobile dev has another approach to this problem, please let me know; I've been trying different approaches for a while now, and none seem viable to me apart from TDD.


The questions worth asking about techniques like TDD are "What problems does it fix?" and "What problems does it introduce?"

I would expect a determined attempt at TDD to solve the "no tests" problem, because it is so utterly insistent on tests. It should also solve the "don't know how to start" problem, because it de-emphasizes planning and design in favor of just jumping in; you write the tests, and then you do the bare minimum to make them pass.

That said, I would expect a TDD-based project to have the "bad architecture" problem: messy interfaces and sort of ad-hoc separation of concerns, because it makes no time for up-front analysis and design. It's always focused on the current feature and doing whatever it takes to make it work now.

In fairness, it does include a refactoring step, which is supposed to clean up the mess after the fact. Color me skeptical. Refactoring is hard, and people tend to do it on a large scale only when they have to.


" It should also solve the "don't know how to start" problem, because it de-emphasizes planning and design in favor of just jumping in; you write the tests, and then you do the bare minimum to make them pass"

It doesn't SOLVE this problem. It only pushes it back in time, actually making it worse, because you're wasting time on stupid tests instead of actively researching a solution. No amount of tests is going to help you find the right solution if you don't know what you are doing. See http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s...


I think TDD deserves a bit more credit than that. Building a simple solution for part of the problem can be credited as exploring the problem space. The same can be said for extending it to address more of the problem; that's exploration too.

But I fear this incremental approach is going to produce a very baroque solution that will have to be rewritten completely once the bell goes "bing" and the programmer actually understands the underlying problem well enough to produce a clean solution.

I think the larger problem with TDD is that there are at least four parts to software design, and TDD bets it all on two of them. There's requirements analysis, architectural design, construction, and finally testing. TDD is really all about the construction and testing bits. It doesn't address requirements analysis at all, and it doesn't seem to want to do architectural design, it just constructs and tests with great passion. It's imbalanced.


Software development varies enormously. Flight avionics software differs from video game software differs from a spreadsheet differs from an order-entry system differs from laboratory analysis software differs from a web browser and so on. Flight avionics differs from a commercial jet liner to a fighter plane to a model airplane. Some projects have huge budgets and others have shoestring budgets. Some projects require extremely high reliability and quality; cost is not an issue. Other projects can be quite buggy, low quality but still useful -- cost effective.

Developers vary as well. Some temperamentally find something like TDD useful. Others do not.

There is no one software development methodology to rule them all.


I'm not commenting on the TDD studies in terms of their effectiveness, but I do know that a project that takes longer brings more programming hours, which results in larger budgets. If you were a company selling your services, you would be a bit more motivated to include practices that take longer, especially if they tugged at your clients' emotional sense of assurance. You would also preach them to your programmers as core practices, and the programmers would happily be converts. This goes for all the structure surrounding your project as well. I tend to see more structure in outsourcers these days, and a smugness along with it. I wonder how much of it is bloat, though.


No TDD discussion is complete without a reference to the Sudoku debacle.

https://news.ycombinator.com/item?id=3033446


should be top comment


The use of students in SE research is a hot topic; see, for example, Feitelson's review: https://arxiv.org/abs/1512.08409.

Researchers have a problem recruiting subjects. There is often a tradeoff between applying a more rigorous experiment design with convenience sampling (students) versus sacrificing controlled environments (so that professionals will actually join the study).

It's easy to condemn work like this, but there's no other option. In this case the researchers chose to replicate a study (which often risks similar ire for telling us nothing new) with a commendable level of rigour, and have provided more evidence that, for the scope of experiments we can construct, TDD is probably no different from TLD when using a population of relatively unqualified developers (students).

As to the problem being trivial, what else can be done? There's a finite amount of time you can ethically expect participants to give you, even if you pay them. If anything, the criticism of this work is better directed at the limitations academics are forced to bear.


Your argument reminds me of the joke about the man searching for his keys under the streetlamp.

"A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, 'this is where the light is.'" [https://en.wikipedia.org/wiki/Streetlight_effect]

I agree that this is a well-designed study given its constraints. And it's admirable that it's a replication study.

That doesn't change the fact that it's largely irrelevant to professionals. It doesn't test the claims made by TDD proponents (TDD leads to better design, reduces long-term maintenance, allows for team coordination, etc.), nor does it address any of the interesting questions about TDD:

* Is TDD more effective in a professional setting than commonly-used alternatives?

* Is a mock-heavy approach to TDD more effective than a mock-light approach?

* Do people using TDD refactor their code more or less than people using a different but equally rigorous approach?

* Is the code done with TDD more maintainable than code done rigorously in another way?

* Is TDD easier or harder to sustain than equivalently-effective alternatives?

As a study, it's fine, if only of interest to academics. The problem isn't the study. It's the credulous response on the part of industry developers who then turn the false authority of the study into statements like "TDD doesn't lead to higher quality or productivity."


I skimmed the Feitelson paper you linked and this jumped out at me:

"Students should generally not be used in studies that depend on specific expertise which requires significant experience and a long learning curve to achieve, or in studies of professional practices. Such studies are best performed by observing and interviewing professionals, not by controlled experiments."

This seems directly relevant to this TDD study. Every one of these contraindications are true in this case.


The problem that I have with this article is how people will interpret the results. The test compares (presumably) comp-sci graduate students, who already know good design patterns, best practices, etc. at a relatively high level, to see if they are faster and more accurate testing before vs. after writing the main code (TDD vs. TLD).

That's all well and fine, and possibly completely accurate. However, many people's takeaway is going to be the out-of-context and incorrect title of this post. (It does not say TDD is worthless - it says it is essentially the same as TLD.)

I've always looked at TDD as a tool to help push less experienced, less "educated" developers into 1) even using tests at any point of the development cycle, 2) creating tighter, cleaner and MORE TESTABLE code by the time they've reached the end of the cycle.

So, if your team is and always will be well-educated, experienced programmers who already understand how to always do everything correctly from the beginning, feel free to use either method.

Otherwise, I'd urge you to consider TDD.


You clearly think very highly of CS grad students :)

I've met a few who couldn't code, let alone write proper tests, even though their focus was not policy or, say, the lighter side of user experience.


Really? Almost all of the post-graduate CS students I've known are experienced people who went back to school because they were at the point where they needed to either move into management or become a very high-level expert.

That's only personal experience, however, and mostly out of date.

That said, I would hope any of them would be a step up from the new programmers coming out of the "churn 'em out" short courses many are forced to take as the only affordable way into the industry, which leave them with little knowledge beyond how to get the requirements fulfilled.


Yes. I can say 100% that the major difference I see in hiring interviews between grads and laterals is coding ability. Even at PhD+1 it's vastly improved


My anecdata matches the author's - I feel more productive doing TDD.

Perhaps because it's less stressful. You think about system design as you code, instead of only when you hit a wall and have to rewrite everything, or when you have to clean up for code review.

Either way, if it has little to no impact on dev time or code quality, I bet the positive impact TDD has on team morale would make it worthwhile.


Extrapolating from that, perhaps it's worth considering the benefits of thinking about system design before you code.


This is an editorialised title. The blog post is titled simply "Test Driven Development". The blog post, and the paper it fronts, conclude that there is no significant difference between TDD and iterative test-last (ITL) development, which is quite a bit different from "TDD has little or no impact on development time or code quality".


My default is not to write many tests at all during the experimental, build-out phase. I'm not looking for exact or bug-free software; I'm trying out different APIs, aggregates, and architecture in general. Needing to refactor tests every time I want to make a drastic change is... well, you know. As somebody else pointed out, this architectural stuff is probably much harder to nail down than just writing code that works. This is not limited to the very initial build-out; it applies to big refactors as well.

During and after the experimental phase, it depends. In both cases I may write tests before, or alongside, gnarly logic or algorithm-y stuff. Otherwise, and in addition, I do copious amounts of manual testing. Manual testing is a must for much of what I do, so I augment or substitute automated testing as appropriate. Automated testing is great, but sometimes the overhead is too expensive.


After a quick read of the metrics section, it seems quality is measured in terms of adherence to user stories, implemented as a set of behavior tests. There seems to be no assessment of code maintainability, which looks like a flaw in the study, as it models a short-lived codebase rather than one that undergoes several maintenance cycles.


TDD is king when refactoring, or when proving out an algorithm. You have tests to confirm the output, and near-realtime feedback that your assumptions are correct. The rest is obvious: mission-critical component, TDD; complicated refactor, TDD; algorithm you need to validate, TDD. Anything else, write the code and get a peer review.


No amount of tests can prove that an algorithm is correct. At best, they prove that an algorithm works in a particular case.

And generally, I would say that the more interesting sorts of tests (fuzz testing, large-scale system testing) are extremely unpopular with software engineers because "they suddenly fail without reason". Not quite as unpopular as an actual proof that an algorithm works, like implementing it in Coq for instance, but very unpopular.
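
A fuzz test in its simplest form, as a sketch (TypeScript, hypothetical function): throw random inputs at the code and check an invariant, instead of enumerating cases.

    import assert from "node:assert";

    // Function under test: insert a value into an already-sorted array.
    function insertSorted(xs: number[], x: number): number[] {
      const i = xs.findIndex(v => v > x);
      return i === -1 ? [...xs, x] : [...xs.slice(0, i), x, ...xs.slice(i)];
    }

    // Property: output has one more element and is still sorted.
    for (let trial = 0; trial < 1000; trial++) {
      const xs = Array.from({ length: Math.floor(Math.random() * 10) },
                            () => Math.floor(Math.random() * 100))
                      .sort((a, b) => a - b);
      const x = Math.floor(Math.random() * 100);
      const out = insertSorted(xs, x);
      assert.strictEqual(out.length, xs.length + 1);
      for (let i = 1; i < out.length; i++) assert.ok(out[i - 1] <= out[i]);
    }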


I'm a fan of TDD, but I'm a bigger fan of having reliable, repeatable, and complete tests period.

I don't think it's productive to argue the merits of the study itself - better to look at the positive. What the study tells us is that it's not too late to improve your existing software with tests.


Long story short: if your coders don't take their product as a matter of personal pride, or are inexperienced, or are mediocre, no methodology in the world will save you. None. I realize my statement is anecdotal, but I'm writing from decades of experience working with people who did not take any pride in their work, who view programming as a trade rather than an art, or who view programming as an art where "spaghetti code is beautiful". No methodology, no technology, no management technique, and no programming language saved them or the company. The builds are still a mess. The code is still a mess. The bodies of code require endless babysitting and endless hacking.


I've tried TDD numerous times in my professional career, and I'm confident it works for many. I prefer to use white-box testing as my second pass through my algorithm. It allows me to identify potential weaknesses, write test cases around them, and correct them in one step. I never feel quite as secure with TDD as I do with post-hoc testing. I'm also not going to tell other people that mine is the one true path. Unit tests? Critical. Before vs. after? Personal.

With respect to this study, I think at best we can say that equal quality tests yield equal results. I don't think -- based on reviewing the methodology -- that the headline can clearly be drawn from the study.


TDD is basically "writing software for a test". Programming language design has a similar problem. First BIG software many people write in their new language is a compiler, so many languages are optimized for that.


As a TDD advocate, and assuming this study has any scientific validity, this is actually good news! There's a very common claim that TDD makes you less productive. It's good to have some study to oppose this claim.


I think methodologically you'd need equivalence testing for that. Your hypothesis would be that productivity is equal (enough), and you could then discuss the additional benefits of TDD.

I can't remember if I read a study like this for TDD but equivalence testing is fairly underused outside of pharma/medicine (it's often even called bioequivalence) where the test usually shows similar enough effects and the extra benefits are cost savings (for generics).


We changed the URL from http://neverworkintheory.org/2016/10/05/test-driven-developm..., which points to this.

When the topic is controversial and the paper is not so specialized that only a few people here can understand it, changing the URL to that of the paper tends to help make a discussion more substantial. Especially when the blog post is more of a gloss on the paper than an in-depth commentary on it.


If you were to always write tests immediately after writing a few classes, I don't think it would make a difference. However, from my own experience, I never write nearly as many tests after the fact.


This is the kind of stuff where, in the aggregate, you can't show a relationship, but I bet if you controlled for type of project you would see some interesting results. Anecdotally, I know some firmware engineers who shit out the buggiest code I have ever seen, and test-driven development would definitely have improved the customer experience. When the engineers have literally no tests other than trying stuff out with a printf on the target embedded device, any amount of unit testing will wind up helping.


I just came out of an embedded-systems-y project, and I did have some tests for my ring buffers, and sprintf.
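
For pure-logic pieces like that, host-side tests are usually tractable. A rough sketch of the shape of a ring buffer test (Python standing in for what would really be C compiled for the host; the class is a made-up minimal version):

    # Host-side test of hardware-independent logic; no device needed.
    class RingBuffer:
        def __init__(self, size):
            self.buf = [None] * size
            self.head = self.count = 0

        def push(self, x):
            if self.count == len(self.buf):
                raise OverflowError("buffer full")
            self.buf[(self.head + self.count) % len(self.buf)] = x
            self.count += 1

        def pop(self):
            x = self.buf[self.head]
            self.head = (self.head + 1) % len(self.buf)
            self.count -= 1
            return x

    def test_wraparound():
        rb = RingBuffer(2)
        rb.push(1); rb.push(2)
        assert rb.pop() == 1
        rb.push(3)  # wraps past the end of the backing array
        assert rb.pop() == 2 and rb.pop() == 3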

But tests for any of the interactions between subsystems are quite problematic. And testing on the device might be problematic due to space constraints, while testing on a simulation is also problematic... I don't know how you'd realistically test it.


"YOLO" based Dev work on the otherhand is where it's at, right?

On the other hand I can see where students and new learners might falter. TDD requires u know a bit about what ur doing,and if you're new to programing, ut just costs more time to compensate for not having a healthy intuition.

Still tho, if you want to run maintainable code, that's somewhat future proof and not disposable - test it and keep it clean.

I mean it's like arguing sharpening ur katana while u fight is detrimental to duel survival. Which is true.. But...


Actual studies were never needed to convince managers to switch processes. Bonus points for blaming old problems on old process while blaming new problems on "not doing agile right".


It always both amuses and saddens me how people will eagerly write more tests than actual code, but refuse to use a strongly typed language. The compiler is my test harness.


The compiler can test the validity of your code, not its behavior. Only the most trivial bugs can be caught this way.
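
A toy illustration of that gap (Python type hints standing in for any static type system; add is made up):

    # Type-checks cleanly: the signature is satisfied, so the
    # "compiler as test harness" is happy. The behavior is still wrong.
    def add(a: int, b: int) -> int:
        return a - b  # oops

    def test_add():
        assert add(2, 3) == 5  # only a test notices the '-'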


The research seems low quality. Whenever I try creating something more complex than just a CRUD webapp, I'm always relieved after getting significant code coverage.

It may be because I'm a mediocre programmer (I mostly do hobby projects), but getting assurance that my 'small change here' didn't mess up anything major in a distant part of the system is quite relaxing.

Obviously I only test logic, and I usually write the tests after coding. It still helps with my flow.


Then you're not talking about TDD.


In a certain way, you always use tests when you are developing something.

Write code, run it, see what happens, repeat.

The 'see what happens' part is what is different in TDD.

It can be very similar to what you do without automated testing (while also repeating all previous tests), or it can be a scaffold of endless tests, or too few tests, or anything in between.

I've seen too many mocking tests for my taste. In fact, my tests tend to be in the 'integration tests, not unit tests' category.


There are many problems with this study, but for me the most glaring is the definition of quality that they measured. It was purely whether the program performed as expected. This is obviously an important part of code quality, but not the only one. Most proponents of TDD say that its greatest benefit is creating clean, easily maintained code. So this study didn't even attempt to test the benefit that TDD claims to provide.


TDD means 'management' can't drop the tests being written due to 'timescales'. If they're done up-front, they will be there.

It's also one reason that TDD isn't done: when you're given a few weeks to meet an impossible deadline, tests are the first ideal to be dropped.

It's not the correct way to do things, but all of these studies tend to ignore the 'real world'.


The comments to that story are pretty good.

An interesting question is: why does TDD fail in such experiments (and it does so unexpectedly consistently), even though many developers feel it has benefits when they practice it?

There is no silver bullet, so there must be circumstances in which TDD does not work. And conversely, the central question is: under what circumstances does TDD work? What are the preconditions?


The answer is likely in the studies that declared TDD an effective practice, and in the works that specified the TDD approach. Those works and studies tried to solve specific problems, and I'm not sure they followed a really scientific process before declaring that TDD is the solution, rather than a by-product of some other solution that passed unnoticed during the research (e.g. educating developers about software architecture).


My gut feeling is that TDD is just a way to encourage me to actually write tests. I suspect I'd have just as many tests if I had someone prod me every 20 minutes to write a test case or two. Having one or two tests also creates a bit of a wedge to make me write more: once there's some code that can be neglected I can feel guilty when ignoring it.


TL;DR

Conclusion: "TDD does not affect testing effort, software external quality, and developers’ productivity"

However, per jdlshore's comment (https://news.ycombinator.com/item?id=12740978), test parameters weren't suitable for any meaningful conclusions to be drawn.


Does this study assess the long-term cost of software? It may be true that TDD has little benefit when writing code from scratch, and my experience is that it definitely takes longer than not doing it. But how does it evaluate the claim that 90% of the cost of code comes from maintenance, not its initial creation?


TDD, like most of the agile practices, is a learned skill.

Doing it at an expert level is very different from an untrained novice winging it.


Replicated with 21 grad students? And then they quote statistics?

Painful to watch people generalize from such small sample sizes.


It's great that such studies exist, but there might be many reasons why they are incorrect: they are testing on students; the students probably don't understand how to apply TDD or, the other way around, are so good that their coding approach provides all the benefits without TDD; the numeric metrics used in the study might not adequately reflect the interesting characteristics of the code base; the payback of TDD might show up in later stages of the product's life, when we refactor or extend it; etc.

Probably TDD can speed up people who otherwise aren't used to an iterative, bottom-up approach - TDD encourages a short "change, then run and see how it works" loop. Especially in non-interactive languages like C or Java.

Also, if we write tests after the functionality is implemented, how do we know why a test passes: is it because the functionality is correctly implemented, or because the test doesn't catch errors? To ensure a test catches errors, we need to run it on a buggy version of the code. Implement the functionality, write the test, introduce errors into the functionality to make sure the test catches them - that's 3 steps. Run the test in the absence of correct code, then implement the code - 2 steps. That's where "test first" might be efficient.
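
To make the 2-step version concrete, a sketch (interleave is a made-up example):

    # Written first, this fails until interleave exists and is correct -
    # and that failure is the evidence the test can actually catch errors.
    def test_interleave():
        assert interleave([1, 2], ['a', 'b']) == [1, 'a', 2, 'b']

    # Written after the fact, a weak test like this one passes even
    # against a broken implementation, and we never learn it's weak:
    def test_interleave_weak():
        assert interleave([1, 2], ['a', 'b']) is not None

    def interleave(a, b):
        return [x for pair in zip(a, b) for x in pair]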

But often that can be achieved another way. Suppose I'm writing a function to merge two lists. I will just evaluate (merge '(a b c) '(1 2 3)) in the REPL and see by eye that it returns (a 1 b 2 c 3). I will then just wrap it in an assert: (assert (equal '(a 1 b 2 c 3) (merge '(a b c) '(1 2 3)))). Run this and see that it passes - that's all, I'm sure it's an OK test.

In short, I think there is a certain truth in TDD, but it shouldn't be taken with fanaticism. And it can even be applied with negative effect (as can any idea).

Suppose I want to develop a class (defclass user () (name password)).

I personally will never write tests for make-instance, (slot-value ... 'name), and (slot-value ... 'password) before creating the class, then see how the tests fail, then create the class and see how the tests pass.

Tests take time and effort to write, and then to maintain and rewrite when you refactor code. If a test captures an error, then the test provides some "return on investment". Otherwise, writing the test was a waste.

The tests in the above example will never capture anything.

I tend to create automated tests for fragile logic which is relatively easy to test, so that the effort spent is justified by the expected payback.

But all my code is verified: write several lines, run, see what doesn't work, fix it.


I've always made up data to test whether my functions work, but now I write that data down in separate programs so they can keep checking my functions in the future. What's the big deal... sure, TDD is about the future, not the development time or quality today.


Tests written before/at/near development time really help your code design - I've seen how ensuring code is unit-testable simplifies designs and enforces layering, etc. I really disagree that this does not help code quality.


New programmers will read "studies" like this and decide to write tests "someday, later". I really hate the impact this "study" will have. And I agree with every point of @jdlshore's comment.


The title of this post is very misleading. TDD as opposed to ITL has little to no impact; the title suggests that testing itself does not have any impact. This is just clickbait...


When writing new code I don't usually write the tests first, but when fixing a bug, I do. There is nothing worse than a test that would have passed without your supposed bug fix!
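
The discipline in sketch form (parse_price and the bug are invented):

    # Hypothetical bug: parse_price("1,200.50") blew up on the comma.
    # Step 1: write the test and watch it fail against the old code -
    # proof that the test actually exercises the bug.
    def test_thousands_separator():
        assert parse_price("1,200.50") == 1200.50

    # Step 2: fix it, watch the test pass, and keep the test forever
    # as a regression guard.
    def parse_price(s: str) -> float:
        return float(s.replace(",", ""))  # the fix: strip the separator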


Not surprised. Religiously following a certain principle in the belief that it will help you bypass the complexity of the problem itself almost always fails to stand the test of time.


Experience is the name everyone gives to their mistakes. --Oscar Wilde


I think TDD is most effective for "state machines" like cook([ingredients]) => dish, which should be avoided if possible as they are very bug-prone.
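
Functions with that pure input -> output shape are at least easy to pin down with table-driven tests. A sketch in Python, where cook and the recipes are the hypothetical above:

    def cook(ingredients):  # stand-in implementation
        recipes = {
            ("flour", "water", "yeast"): "bread",
            ("egg", "milk"): "omelette",
        }
        return recipes.get(tuple(sorted(ingredients)))

    def test_cook():
        cases = [
            (["flour", "water", "yeast"], "bread"),
            (["milk", "egg"], "omelette"),  # ingredient order shouldn't matter
        ]
        for ingredients, dish in cases:
            assert cook(ingredients) == dish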


As many things in Software Engineering, TDD is just another tool. It's useful from time to time, but it's no silver bullet.


"TDD disproven by people who have no idea how to practice it, or have the ability to grok the longterm benefits."


For a split second I thought they were measuring TDD against no tests at all and I felt a panic-induced adrenaline rush.


Based on my own experience working in teams using TDD and teams not using TDD, I cannot agree.


The onus of proof is on the TDD pundits to prove anything substantial.


I was eager to read this paper but found little substance in it.


Somehow, as a Software Engineer, I'm not really surprised.


Can you elaborate?


If you write testable software and actually write the tests, the end result is the same whether you test first or test later. You're designing with testing in mind and creating useful tests either way.

It's really a matter of code that lacks tests being an issue, and especially code that isn't designed with testing in mind. I think they overinterpreted the value of TDD as simply test-first. Test-first can be good, but it's more a motivational tool and a productivity simulator (yay, look at those green checkmarks!) than a real benefit.

And if the end result is identical, this study sort of craps on the motivational-tool argument too, at least insofar as that would show up in development time. It'd be interesting to see developer satisfaction included in some way too.


> If you write testable software and actually write the tests, the end result is the same whether you test first or test later.

This is definitely not my experience. As Kent Beck says, TDD is a design technique. It forces you to always start thinking of the code from the outside of the unit. If I build the unit first and add tests later, it's more likely I'll end up with something where the API reflects the implementation. With test last, I'm also less likely to test everything well; after the implementation is done, I believe it works.
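
A small sketch of the difference (build_report and the report API are invented for illustration). Writing the test first forces me to decide how I want to call the thing before any implementation exists to leak into it:

    # The test comes first, written from the caller's point of view:
    def test_monthly_totals():
        report = build_report(sales=[("2016-01-03", 700), ("2016-01-19", 500)],
                              group_by="month")
        assert report.total("2016-01") == 1200

    # Only then is the unit shaped to satisfy that caller:
    from collections import defaultdict

    class Report:
        def __init__(self, totals):
            self.totals = totals

        def total(self, month):
            return self.totals[month]

    def build_report(sales, group_by="month"):
        totals = defaultdict(int)
        for date, amount in sales:
            totals[date[:7]] += amount  # "month" grouping only, for the sketch
        return Report(totals)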


> If I build the unit first and add tests later, it's more likely I'll end up with something where the API reflects the implementation. With test last, I'm also less likely to test everything well; after the implementation is done, I believe it works.

Will you?

Are you sure?

Do you have data to back your supposition?

My contention would be that at the end of the day, the requirements of the interface to support unit testing will result in a very similar set of design choices, whether you write those tests up-front, as-you-go, or after-the-fact.

The only difference is that if you write them after-the-fact, you may push some amount of refactoring to the end of the process instead of doing it along the way.

But I'll bet you have about as much data as I do to back your beliefs. ;)


> Do you have data to back your supposition?

That's my experience. I started doing TDD more than 10 years ago, and it took me about a year to fully make the switch from test-after to test-first programming. I regularly try experiments with different personal code bases.

If you're arguing from personal experience, that your code ends up just the same either way, good for you. But I suspect you're arguing from theory here.


Perhaps it depends on how you implement each. If your idea of test-last is writing a full, large module and then trying to test it all at once, you might wind up with a different result than if you wrote an individual function or two and tested only those.

Getting the testing as close in time to the implementation as possible probably makes a bigger difference than first vs last.


I definitely agree with that. If there is somebody who actually writes a few units of production code, a few units of test code, and then refactors, then I expect things would be pretty close.

However, for me that's a hard set of behaviors to stick to. If I let myself say, "Oh, I'll test that later," I'm likely to get lost in the problem, write a bunch of production code, and then say, "Well, it already works, but I guess I'll add some tests." There's no clear stopping point.

Test-first programming, on the other hand, is easier for me to stick to. If I don't have a broken test, I write one. If I have made the test pass, it's time to write another test before I write more production code.


I have always worked in a "test last" mode. That's the way I learned to program. Write some code, test it. When it works, write some more code, test some more.

My attempts to work in a TDD style ran up against decades of ingrained habits, and I never really found it satisfying or natural.


It took me a year or so to make the transition. And I think the transition is easier to make in a greenfield or already-TDDed code base.

That's not to say that you should try it again; I like TDD a lot myself, but software methods are so interdependent that I think each person has to judge what works best for them.


Tests can do a lot of things. I wish there were a more functional name for them, e.g. "regression stoppers".


Well, let me ask you:

Why would we assume TDD would improve code quality or development time?

If, at the end of the day, the result is the same volume of tests and the same level of overall coverage, the order of activity would seem to make no difference. What would lead you to believe otherwise?

The real flaw in both methodologies is that it's the developer checking their own work.

I'd make the claim that if you really wanted to do it "right", you'd have a specification, one developer writing tests to the specification, and a different developer writing the code to make the tests pass.

Of course, that's probably not workable in practice... it would make for an interesting experiment, though (and yes, I'm too lazy to search Google, only to discover someone else already came up with the idea).


If the claim in the paper is true:

TDD = Same time + same quality + feel better.


Clickbait. TDD !== tests; the article compares TDD with TLD.


It may not boost productivity upfront, but it saves a lot of time down the line by alerting you when something is out of place.


No, that's the value of automated regression test. TDD is just one way to skin that particular cat.


And does that test wind up in v0.1.1 of your software magically, or is it there because you put in the work to add it up front?


Nice work on the false dichotomy. :)

TDD has a very specific meaning. It means you write tests, then write code that passes those tests. That specific order. If you're not doing that, you're not actually adhering to the definition of TDD.

TLD could mean, for example, writing a module, class, or function with a defined interface that implements the contracts for that interface, then writing the suite of tests to validate that module, after which you move onto the next module.

Strangely enough, you can do that while you're developing the product without adhering to the process dictated by TDD.


You seem to be conflating the existence of unit tests with TDD.


TDD no, regression testing yes.


As someone who uses unit tests to find bugs in my code, that I would never otherwise find, this is surprising.


It's not about TDD vs no tests. The article is about tests first or tests later.


And yet, that is not what the clickbait title implies.


Only if you think unit testing necessarily requires TDD (which is defined as writing tests before writing the code that passes those tests).

It doesn't.

All the article says is that TDD has little or no impact... it does not say that unit testing as a practice has no impact.

Seems to be a common misunderstanding around here that you, too, have fallen victim to...


>Seems to be a common misunderstanding around here that you, too, have fallen victim to...

I work with people who openly believe that unit testing is a waste of time. Personally, I think they are afraid, because they don't even know how, and don't care to learn. They would see that title, and that's all the confirmation bias they would need.

Thanks for assuming the worst about me though.


Err, my apologies if that came across as a slight; I literally meant that you and a lot of other folks made the exact same mistake. No judgement intended; it's clearly a common misapprehension.

My guess is that a lot of folks got introduced to unit testing through test-driven development and, as a result, conflate the two, assuming the former necessarily implies the latter.

Speaking for myself, I was writing UTs long before the TDD fad landed and so I never picked up the habit. It's just not the way my developer mind works, and I've not found it compelling enough to try and re-train myself.

That said, not believing in the value of automated regression tests (of which UTs are the lowest-hanging fruit) is utter madness...


Unit testing is not always a waste of time.

Unit testing can be a waste of time.


That is a good point. The implication got me to click through to the article.


That's not TDD, though. It's more similar to test-after.


I think test-after is a reasonable approach. I am constantly dealing with new code bases, and in the interview process I am often asked about my philosophy on testing. My response is that testing occurs on two fronts: from the top down (functional UI tests) and from the bottom up (unit tests). When it comes to unit testing, my approach is to focus on the hotspots. If something gives you trouble, or if you find a bug or issue, then wrap it in a unit test. That way you don't have to worry about it. Bugs tell you where the weak spots in your code base are. When they speak to you, listen and take some action. Otherwise, I feel like chasing blanket coverage is not worth the effort in most products.


...when implemented poorly


Therein lies the rub with all the faith-based modern development practices. Getting results? That's the power of The Practice. Not getting results or seeing negative impact? You're just not doing The Practice right and/or hard enough.


Cough scrum



