TDD: tastes better without the T? (coderoom.wordpress.com)
96 points by moconnor on April 28, 2010 | hide | past | favorite | 57 comments



This experience is emphasized even more if you work in a language with a REPL, like Clojure. The only value of tests, at that point, is to protect against regressions. The testing process with a REPL happens faster and more organically than writing unit tests, but it's less structured and harder to formalize. It's like comparing a structured debate to a conversation.

What someone needs to do (and I'll eventually do) is implement a REPL enhancement that allows "test capture". Namely, if you have recently evaluated one or more expressions, you execute a "capture" command that extracts the working environment (locals, globals), the previous statements you've run, and their evaluated results, and outputs one or more unit tests. For example, a session with Clojure might look like this:

> (def x 1)
> (def y 2)
> (+ x y)
3
> capture!
-- Saved capture001.clj
(1 test passed, 0 tests failed.)

Where capture001.clj contains a test similar to:

(require '[clojure.test :refer [deftest is]])

(def x 1)
(def y 2)
(def expected-result 3)

(deftest capture-001
  (is (= expected-result (+ x y))))

Fuzzy, but you get the idea. Of course, this is just the baseline case; you'd need more functionality, such as allowing the user to specify which parts of the environment to capture and which results to assert, but these should fall out naturally as the tool is dogfooded.


"The only value of tests, at that point, are to protect against regressions."

Which is still extremely important.


In fact, this is the key reason you need automated tests. I don't know anyone who commits a new feature without trying it -- be it (conveniently) at a REPL or via some other UI. But the manual "trying it out" method, REPL or otherwise, has always suffered from the fact that it cannot be easily repeated, so old features eventually get broken and nobody notices. This is why I automate my tests: their value over time exceeds the extra cost of writing them vs. trying things out.

I think this is an aspect the original article fails to take into account. Eliding tests can feel very liberating, and it allows you to plow ahead adding new features faster, particularly in small, or at least new, projects. But over time reality catches up, and the lack of tests becomes a burden. You start avoiding adding new features, and particularly avoiding improving existing code, out of fear of breaking something. And so you end up more constrained than if you had added the right balance of tests along the way.

Writing software that is maintainable, with staying power of years or decades, requires the sacrifice of some up-front productivity.


Especially if you go back and refactor and/or add new functionality to the code.


Exactly - and it applies to regressions caused by changes in the system (i.e. a change of platform, of language version, security patches, etc.), not only to changes in your code.


Yet another comment on HN that's significantly more insightful than the original article...


I think that speaks to the quality of the HN community.


and to the quality of the articles.


Python Doctests do most of what you want.


Wow, Python doctests are awesome:

http://docs.python.org/library/doctest.html

Just copy-paste the REPL interaction into a docstring, and you're done. Apparently, it recognizes ">>>" as the REPL prompt, and the following line as the expected output. The equivalent for Clojure would be a neat addition.
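To make that concrete, here's a minimal, self-contained sketch (the add function is just a made-up stand-in):

    def add(x, y):
        """Return the sum of x and y.

        >>> add(1, 2)
        3
        >>> add(-1, 1)
        0
        """
        return x + y

    if __name__ == "__main__":
        import doctest
        doctest.testmod()  # replays the >>> lines and compares the output

Running the file with python -m doctest -v also works, without the __main__ block.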


It's a neat hack -- I think there's definitely a gap though, in that there could be a tool that does some smarter introspection of your history and the state of the REPL to generate a unit test. v1 would be quite rudimentary, but after many iterations this could be an almost magical tool.

The "code-as-data" homomorphic semantics of lisp would make building something like this quite interesting, as it would probably need to transform some of the code you've REPLled from statements into assertions, etc.


I think you would end up writing code in the REPL restricting what you would type because you know it'll become a test.

Then you wouldn't be using the REPL for what a REPL can offer; you'd just be writing test code.

Unless it was really magical and could cover 100% of anything someone might type in the REPL. If you were only using a subset of the REPL's/language's features because you know your to-test converter doesn't like some stuff, then you're effectively writing the tests anyway.


I'm confused. This would be a REPL enhancement, meaning it would be something you use as needed.

Currently, using the Clojure REPL to test things comes with a twinge of guilt, as it is not being captured for regression tests. (And I am too lazy to write unit tests separately.)

This would make it so that using the REPL cycles between two "styles": ad-hoc experimentation, and then, when you've found some repeatable behavior you want codified in a test, capture mode. These can in some cases be distinct processes (physically and mentally) and in other cases overlap so much as to look and feel like the same thing.


Yes, yes I am. :) The reasons being: first, it's annoying to have to set up the boilerplate for the file. Second, it's annoying to have to convert what I just exercised in the REPL into a test. (Setup, teardown, asserts.)

The reason TDD works and is fun is that you are using tests to learn and explore. It just so happens that the artifact of that learning ends up living forever as a test. In a REPL, I'm doing that same learning and exploration already. The act of writing a test afterwards becomes about as exciting as filing a TPS report.

Here's an example. I've just implemented a new function, and REPL'ed it to solidity. There were about 4-5 ad-hoc calls I made to the function to prove that it worked. I just finally got to the point where I can call it using my 4-5 different arguments, and it always outputs the right thing. Using readline, my arrow keys, and my enter key, I'm repeating the same series of steps over and over until the function works. We all do this. Win.

Now, I'm at a crossroads. Do I just start working on the next piece of the project? I know this piece works, I'm happy with it.

But wait! What if something changes? I need to write a test, don't I? Sadness consumes me, since testing is slowing me down. I've already exercised the code, I already know it works, and I've already written the tests, albeit sloppily, in the REPL. Why do I have to switch gears now and start writing a file, running a test runner, and so on?

The truth is: I won't. I'll move on to the next thing, not breaking my flow. Instead of doing something boring whose outcome I already know, I'll do something fun: the next feature.

Maybe those who do switch off and go through the motions to write a test, repeating themselves, are more noble and careful in their programming. But I humbly suspect most of us are more lazy than noble :)


Just want to say that I've been thinking about REPL vis-à-vis unit testing for a few years now and my experience and conclusions match yours very closely. You've done a nice job of articulating them (here and in the root comment).


I don't know, I'm still not convinced. Guess it'll have to be one of those things that I might change opinions after trying (if someone ever comes up with an implementation).


You're too lazy to open a file, type the unit tests, and hit Ctrl+S, but you're not too lazy to open the REPL, type the unit tests, and hit capture?

If you can't automate 100% of the REPL-into-test feature -- if you need two mindsets/styles/etc., if you still need to "find the behaviour codified in a test" -- then you're just duplicating in the REPL the same workflow and results as writing the tests in a file. They need to overlap 100%.

Now, if they do, then it's awesome.


How much context do you need? An entire memory dump? The complete REPL history? If IO is involved, do you need to somehow guarantee the same files are available at test time with the same contents?

It seems to me the trick is to set sensible limits on the context of the current REPL state preserved at test time, in a way that works for most kinds of common unit tests. I believe this is the "magic" of which you speak.


I dunno. I think this idea occurs to everyone who understands unit testing and then encounters REPLs; it certainly occurred to me under those circumstances and I got excited about it for a while too. Over time, though, it has struck me as less and less obviously good. Though you're right about where the two approaches to programming overlap, and I agree with you that REPL > tests in those areas, there's also considerable territory where they don't overlap. I suspect that xor represents an impedance mismatch that makes "test capture" not as feasible as it seems at first.

I don't mean to pour cold water on the idea, though; if someone figures out a way of doing it that's useful I'd happily change my mind.


That would be truer if the shell generated the Doctests for you. And if it did, how great would that be? I suspect that this would be easier in a functional (or maybe prototype inheritance) language than a traditional inheritance one, because you work directly with the objects that should carry your tests. In python you would have to decide whether the tests go on the object itself (unlikely but possible) or somewhere up its inheritance chain.


Ah ha. I knew I couldn't have been the first person to think of this, as it's a natural enhancement to the REPL to support TDD. I'm not a pythonista, hence my lack of exposure to it, thanks!


A good REPL is one thing I find really hard to program without. In many situations, I find ad-hoc testing by typing expressions into the REPL to see that they return what I expect (or, sometimes more experimentally, to see what they return so as to better understand an API) preferable to formal unit testing.

It's the main thing I miss when programming in Haskell. GHCi doesn't quite measure up to Lisp (or Ruby, or Python) REPLs.


That is a great idea, and one I'm going to implement in my copious free time.


This is sort of the consensus I've come to on TDD: It's a great way to learn good habits. This is important, because we really have surprisingly few solid ways of teaching a new developer good habits, and anything that doesn't involve "an experienced developer watching over your code every second" but can be done by yourself is a very good thing.

But there comes a time when it's time to discard it. The entire system promotes incredibly local thinking, and when you are incapable of thinking at a higher level (or thinking correctly, anyhow), learning to get the local stuff correct is a great start. Once you get that down cold, though, and start moving up to higher levels of organization, TDD can start to be a net negative.

I tried it out about 8 years into my career, and mostly what it did for me was tell me to aggressively walk into local optima that I knew in advance were local optima, and, by golly, were local optima even with TDD. But I'd recommend it to anyone who hasn't got the basic, local-level stuff down cold, as knowing that stuff really well is a prerequisite to getting the higher stuff correct, and rather a lot of developers get a long way into their careers without knowing it well.


I'm pretty sure I lost a $150k job opportunity for saying that TDD is a waste of time if your product hasn't been validated as a money maker. I don't regret saying it either.

Fast forward a month, and I begin working on a pre-profit project so bloated with tests and unnecessary complexity that it took me a few days to really figure out what the hell was going on with the code. When I first started reviewing it, I was thinking, "Wow, this guy's testing chops make me feel stupid." Then, after really spending some time with the code, I just thought, "WTF!?"

It's clear he spent much more time writing tests than writing any features. On top of that, he had written 10-level-deep abstractions for features that had features for their features' features. To really understand the insanity of this, you have to know that the site basically had 0 users and was going nowhere fast.

Then I was just pissed. Pissed at the thought of someone getting paid to implement every feature under the sun without questioning whether any of it was really necessary, and with 0 user feedback. This is one of those instances where the developer makes thousands of dollars and the site and its owner just lose thousands. That is totally unacceptable and irresponsible, in my opinion.

The first thing I did was start removing features, and I didn't write one test for anything. I would have loved to scrap the whole code base and start over, but there just wasn't time, so I had to rewrite as I went. You would think that with all those fantastic tests the code would be pretty solid, but I continually found inconsistencies and errors. The database didn't even have foreign key constraints set up for any of the relationships. Why bother using a relational database?

I know that TDD is popular and I've seen job descriptions like "If you are just getting started with tests, don't even bother.", but I think there are a lot of developers wasting a huge amount of time and money writing tests for products that are destined to fail, partially because of all the time and money spent writing tests.


I agree wholeheartedly about preemptive TDD: http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s...

Almost nobody in the Rails community uses foreign key constraints with ActiveRecord. I chafed against it at first, but if you're drinking their validation koolaid, it wouldn't add value in most cases, since the validations can add stronger constraints with better exceptions.


I am not familiar with the validation koolaid (can you summarize it?), but in my experience you want strong constraints in your database because there may be more than one entry point to it. If the software layers outside the database are your only entry point, then you might be able to do away with constraints...

But also, in my experience with multiple developers changing the schema and adding new features that produce new data, we are often very glad the database has the constraints, to keep people honest and to avoid screwing up the data during release transitions and even during development. If the software layers beneath the business logic do a super great job of enforcing the constraints, then this is not an issue.


That is just bad design; it has nothing to do with tests.


What frequently floors me is the complexity of the test frameworks and utilities. Look at an average Rails shop, and their list of test helpers goes on forever: mocks, test-data factories, and so on.

I just hate learning frameworks.

My dogma is fun-driven development. If something is not fun, you are probably doing it wrong. Creating mock objects and ever-more-abstract test frameworks is not fun to me (YMMV).


Not only that, but they tend to be fragile and break when something changes in Rails or in the other sub-frameworks they depend on. Somewhat ironic, actually...


The biggest problem I have with TDD, or unit testing specifically, is the contortions (interfaces, mocking, delegating construction and configuration, configuration files, library dependencies, etc.) one has to go through to invert dependencies in statically typed languages. If only higher-order module systems (think: parameterizing modules by their module dependencies, a bit like generic types are parameterized by type) were more popular, huge chunks of this work wouldn't be necessary.

Add in preconditions, postconditions, DbC, etc., and you can let the language help you out a whole lot more.
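Even without language-level DbC support, plain asserts get you part of the way; a toy sketch (the function is hypothetical):

    def withdraw(balance, amount):
        # Preconditions
        assert amount > 0, "amount must be positive"
        assert amount <= balance, "no overdraft allowed"
        new_balance = balance - amount
        # Postcondition: balance shrank and stayed non-negative
        assert 0 <= new_balance < balance
        return new_balance

    assert withdraw(100, 30) == 70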


I'd be interested for you to expand on what you mean by "higher order module systems". Do you have any links?


The ML community usually calls them "functors".

For example: a package which implements a data structure, say a red-black tree. The module is parametric over the element type, so you have a module of type "(some record) rbtree". The functor specializes the module for that type at compile time.

Or a module that does a compiler's code generation, taking another module that provides the specification of the processor architecture.

Everything is typechecked at compile time. There's overlap with both the STL and Haskell's typeclasses, but in ML it's done as part of the module system.
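ML functors don't translate directly into most mainstream languages, but a loose, runtime-checked sketch of the shape in Python (all names here are made up) might be:

    # Loose sketch only: in ML this wiring is verified at compile time;
    # this Python analogue wires the "modules" together at runtime.
    class IntOrder:
        """An ordering 'module' for ints."""
        @staticmethod
        def compare(a, b):
            return (a > b) - (a < b)  # -1, 0, or 1

    def make_ordered_set(order):
        """Functor-ish: takes an ordering module, returns a set type."""
        class OrderedSet:
            def __init__(self):
                self._items = []

            def add(self, x):
                for i, y in enumerate(self._items):
                    c = order.compare(x, y)
                    if c == 0:
                        return  # already present
                    if c < 0:
                        self._items.insert(i, x)
                        return
                self._items.append(x)

            def members(self):
                return list(self._items)  # ascending order

        return OrderedSet

    IntSet = make_ordered_set(IntOrder)
    s = IntSet()
    for n in (3, 1, 2, 1):
        s.add(n)
    assert s.members() == [1, 2, 3]

For testing, the point is that you can hand make_ordered_set a fake ordering module without any mocking framework.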


I guess he meant systems like functors in ML/OCaml.


This is an interesting point. When one starts to become dogmatic about any kind of 'best practices' it can be difficult to see what benefit is actually gained from them. In any given situation, you should know when and when not to apply them. Clearly, testing accessor methods is a sure way to drive any programmer numb with tedium.


Perhaps that counts as an argument against accessor methods?


It's an argument against using accessors for the sake of using accessors.

It's important when learning any technique (and this goes not just for programming) to understand why that technique makes sense in this situation, which means that you can evaluate later situations on a case-by-case basis and decide whether or not that technique fits.

Too often people believe that a technique is good (Inheritance is good!) without fully appreciating why (Polymorphism). This means they start using it for other situations (code reuse) when it might make sense to do it another way (composition).


Thanks for interpreting my comment as more thoughtful than the blunt stab at OOP I wrote.


Yes! You can never have too many arguments against accessor methods...


An argument for tests: in a team (>3 developers), on a project with a lifespan of a year or more, where the cast of characters will change over time, tests provide continuity. The tests specify how functionality is meant to behave. Changes to code can be made with less worry about breaking existing functionality. When you're working in this kind of environment, you need to expand your horizon beyond your own code and consider the needs of the team and the organization.

I wish I worked in a language with a REPL that made this easier, but you work with what you have.


I can completely understand there being disagreement about how to test code, how much testing is enough, when to write tests (test-driven vs. writing tests later), etc.

But what boggles my mind is that we've got people in this very thread who, if I'm interpreting their comments correctly, are arguing against doing any testing at all! WTF? How, in any non-trivial codebase, are you going to prevent regressions?

Does anyone here have experience in another engineering discipline (civil, mechanical, electrical, etc.)? Isn't some sort of testing an accepted part of the process?

Maybe I'm wrong, but this seems like a huge sign of the immaturity of the discipline of software development.


I may be reading things wrong, but I think people are arguing against unit testing, which is the focus of TDD.

I hope people aren't arguing against integration testing. I agree with the original post that unit-testing can become onanistic fairly easily, but I think that that is a failure mode for just about any coding style (coding for the sake of coding vs testing for the sake of testing).


One discovery that kind of blew my mind a few months back was approvals. I've seen the entire bowling game kata done with a single test.

It's a bit different to code against a failing approval when doing TDD; you still code in very small increments, but you don't necessarily get to green immediately. Slightly disturbing if you feel somewhat obsessive about seeing the green bar.

Locking down legacy code is beautiful. There is a screencast where the guys who developed approvals lock down a battleship game, generating about 8000 lines of output, not even glancing at them, and then refactoring the hell out of it.

http://approvaltests.sourceforge.net/
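The core trick is simple enough to hand-roll; a hedged sketch of the golden-file mechanism (the real ApprovalTests tooling adds diff reporters and better ergonomics):

    import os

    def verify(name, received):
        """Compare output against a previously approved golden file."""
        approved_path = name + ".approved.txt"
        received_path = name + ".received.txt"
        if os.path.exists(approved_path):
            with open(approved_path) as f:
                if f.read() == received:
                    return  # matches the approved snapshot: test passes
        with open(received_path, "w") as f:
            f.write(received)  # save the new output for a human to review
        raise AssertionError(
            "Output differs from approved version; inspect %s and rename "
            "it to %s to approve." % (received_path, approved_path))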


This revolution in software process actually comes to us from practices employed by a secret brotherhood of programmers who have been keeping them alive since the sixties at least.

These practices were called "hacking".

The Way of the Hacker is subtle indeed, yet obvious; and it is hence prone to being rediscovered from time to time by people outside the core discipline.


I'm working on a project right now where I inherited a large functioning system that has no unit tests. As I'm making changes I'm adding some unit tests to prevent regression.

While I agree with the article that test cases are often overdone, I think that skipping test-case implementation only works for those who have done enough design and implementation to have learned where it can be skipped (e.g., where the only reasonable test case would just duplicate the implementation).

An important thing about TDD is that it forces you, early on, to design the interfaces between components in such a way as to make testing possible. The test cases themselves are less important than the fact that the code is testable.


Anyone who works on corporate-style commercial software (or at least good corporate-style commercial software) will tell you they figured this out years ago.

Probably before TDD even made the rounds.


I don't understand why only corporate-style commercial software (which, I guess, I don't really know what that is either :p).

I thought the ones expected to have this figured out already were the awesome programmers, the more experienced, 10x-more-productive ones (among whom I do NOT include myself). But why create this sub-group of "corporate style commercial"?


Well, all I meant was the kind of dull, grey software packages that corporations pay thousands for.

As opposed to indie programmers or "cutting edge" commercial stuff (i.e. Fogcreek, 37signals, etc.), for whom things like TDD and agile tend to be big buzzwords.

(Oops, I didn't fully answer your question: the reason I singled it out is that all the TDD/agile stuff just passes that industry by -- they've been using SCM for years, they write unit tests, and, well, they just "get the job done" (tm).)


Fogcreek is actually fairly anti-TDD, unless that's just Joel speaking.


How many tests would be made obsolete by basic aspect-oriented programming? Oddly, this has not really caught on in the Ruby community, probably because testing requires less thought, and people are paid to write test code, so why not just relax, write boilerplate CYA code, and get paid for it?

The 80/20 rule applies to testing, but most people test things like basic ActiveRecord finders. Why? Because writing a test of something complex requires a lot of thinking, which hurts a bit.

In most apps there is a "money work flow", such as signing someone up or taking an order. If that stops working, it's a really big deal. To adequately test it you probably need integration tests, but most devs don't bother to write them because they're a pain and they fail a lot during development.
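A hedged sketch of what such a money-path test could look like, using a throwaway Flask app as a stand-in for the real product (the routes and names are invented for illustration):

    from flask import Flask, request, redirect, session

    app = Flask(__name__)
    app.secret_key = "test"  # needed for session cookies
    users = {}

    @app.route("/signup", methods=["POST"])
    def signup():
        users[request.form["email"]] = request.form["password"]
        session["email"] = request.form["email"]
        return redirect("/dashboard")

    @app.route("/dashboard")
    def dashboard():
        if "email" not in session:
            return "", 403
        return "Welcome " + session["email"]

    def test_signup_money_path():
        client = app.test_client()
        resp = client.post("/signup", data={"email": "a@example.com",
                                            "password": "hunter2"})
        assert resp.status_code == 302  # redirected after sign-up
        assert b"Welcome" in client.get("/dashboard").data

One end-to-end test like this catches a broken sign-up flow regardless of which unit regressed.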

I think TDD is a bit of a cop-out that can lead to an insufficiently specified API. If all it takes to make the test pass is handling a narrow range of inputs, then you should feel no additional confidence just because the test passes.

Ideally running a test suite can tell you that the code is safe to deploy... Some tests can also speed up writing other code by making it easy to think through a problem with sample data.

If you are writing filler tests, useless tests, etc. just stop and figure out what is actually important.

If your unit tests take longer than 60 seconds you are probably doing it wrong.


My use of tests is situational.

If I'm writing security-sensitive string parsing code that isn't going to change much, I'm going to write a thorough set of unit tests.

When I wrote a database abstraction layer that ran on MySQL, PostgreSQL, Microsoft SQL Server, and Oracle, I found that a good set of tests made the process of porting the system to a new database almost trivial.

Back when I was getting my PhD, I rewrote a simple-but-slow calculation to use a fast-but-complicated-as-hell algorithm, and I don't think it would have been possible (to get the right answer) if I hadn't used the simple code to create unit tests for the complicated code.
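That cross-check is easy to turn into an automated oracle test; a sketch with stand-in functions (the real calculation was domain-specific):

    import random

    def slow_sum_of_squares(xs):
        # Trusted, obviously-correct reference implementation
        total = 0
        for x in xs:
            total += x * x
        return total

    def fast_sum_of_squares(xs):
        # Stand-in for the complicated, optimized algorithm
        return sum(x * x for x in xs)

    def test_fast_matches_slow():
        random.seed(42)  # reproducible cases
        for _ in range(1000):
            xs = [random.randint(-100, 100)
                  for _ in range(random.randint(0, 50))]
            assert fast_sum_of_squares(xs) == slow_sum_of_squares(xs)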

On the other hand, there are a lot of cases where I've written unit tests and they've become a liability over time; for instance, requirements would change, so I'd have to go back and maintain this collection of tests that, frankly, I didn't care about anymore.


We should make up an entirely new buzz-phrase for this. How about Incremental Development?


I fully embraced TDD at first, but now I only apply certain of its principles, the ones that have helped me develop better software. I am not the type to write needless code just for the sake of it. Many times, though, unit tests have saved me from countless hours of debugging and from introducing new bugs. I will admit, I am one of those developers who writes code first and unit tests second. That's just my style, and it really does not matter, since the end result is the same: more reliable, more robust, less buggy software.

It takes a pragmatic developer to evaluate methodologies and patterns and use the concepts that suit the job. Ultimately, mindless use of patterns and methodologies will not solve the problem.


I can't judge, because I do the same, but I think part of the reason you're supposed to write the tests first is so that you aren't influenced by what you already wrote. I.e., you test what's correct rather than testing the implementation.


I usually only write tests for more complex pieces of code. For simple code, tests look too much like a violation of DRY for my taste. I do realize that 'simple' can be subjective though.


I used to really need my meds. Lately I've noticed that when I don't take my meds, I don't hear the voices like I used to. So I'm just going to stop taking my meds, because I'm healthy now.


I do my unit tests in asserts. Next.



