> program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence.

Have seen this idea stated in various forms. In an absolute theoretical sense, the idea is quite true, but I'm not sure what wisdom or value is supposed to be imparted through expressing it - especially in the way it's phrased above.

The vast majority of programs exist to deliver net positive value in some form. In this context, the value and even the concept of absolute correctness become very fuzzy. Not all bugs are equal in terms of impact on that delivered value. Beyond a certain threshold, what even is a bug? Is it a bug when the program crashes because a user intentionally fed it extreme inputs - inputs that would never be used genuinely - just to try to crash it? In a mathematical sense, yes. In an engineering sense... most likely not.

From this point of view, testing will show an absence of bugs - not categorically all bugs, but all bugs I care about.

With testing I can show that my program performs a certain set of functions as I want it to, with a certain set of inputs. That's a baseline, proved to be bug-free. If that baseline provides the minimum value the program is written to provide, then the program is by definition correct for its given purpose.

Whether it's correct in a pure, absolute, mathematical sense is a different question (perhaps even a philosophical one), and answering it doesn't provide me much value. So where's the practical value in expressing this idea?




For me this means you have to be able to reason about your code. When you write a piece of code you need to understand what it does, how it behaves and why it behaves the way it does.

It is not uncommon to reveal bugs during code review just by looking at code when the author claims they have tested everything and it works fine. Of course they have missed some test case, but how do you make sure your quality assurance does not miss something as well?

I believe it is cheaper and faster to understand the code, side effects and interactions than to rely only on testing (which is immensely important of course).

In order to grow as a developer one has to understand there is no magic - you need to strive for understanding.


Tests work because we don't use only tests - we use tests to support and correct our logical reasoning. Even for a very simple task, if you literally didn't think about how to solve it at all and just did the simplest thing to make each test pass, you'd never accomplish it. E.g., "find the absolute value of a number" would end up with a function like

    function abs(n) {
        if (n === 1) return 1;
        if (n === -1) return 1;
        if (n === -3.5) return 3.5;
        // ...one clause per tested input, forever
    }
This is why testing is not a substitute for logical reasoning, but a complement; once we think we've solved a problem logically, we can test several cases to check our reasoning.
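
To make the complementary role concrete: once we've reasoned our way to a real implementation, a few checks against the defining properties guard that reasoning. A minimal sketch in TypeScript (no test framework assumed):

    function abs(n: number): number {
        return n < 0 ? -n : n;
    }

    // Spot-check the properties that define absolute value on random
    // inputs, instead of enumerating answers case by case.
    for (let i = 0; i < 1000; i++) {
        const n = (Math.random() - 0.5) * 1e6;
        console.assert(abs(n) >= 0, "result must be non-negative");
        console.assert(abs(n) === n || abs(n) === -n, "result must be n or -n");
    }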

This is also why no amount of testing will make up for a program with no logical structure. For example, if you use unstructured GOTO as the only means of control flow, then you won't be able to solve the resulting bugginess by writing more tests.


People actually do write tested and reliable "unstructured" code, e.g. pure assembly. It's harder, but it's not automatically impossible, as is often implied.


Structure matters less than being able to reason about the set of all possible inputs and outputs.

Structure only matters in that it constrains the range and domain of the inputs and outputs, which makes inferring them possible.

We could produce completely bug-free code every time, but if we listen to our revealed preferences about defects, it's simply not an interesting problem. We arbitrage defects in the service of other values.


> If that baseline provides the minimum value the program is written to provide, then the program is by definition correct for its given purpose.

1. Do you know what minimum value the program is written to provide?

2. Have you defined your baseline such that it models the minimum value correctly?

3. How do you know? (You can apply this to both #1 and #2)

4. Does everyone on your team agree with #1 and #2? Do all your customers agree with #1 and #2? How do you communicate it so that everyone is in the loop?

5. Even if (by some miracle) you have managed to get #1-#4 absolutely correct, have you implemented/executed your tests correctly to define the baseline? Again, how do you know?

6. It is actually feasible, in my experience, to do #1-#5 for a few iterations, but then it becomes easy to forget the original #1 and #2. How do you record these over time so that you know you haven't broken something in the meantime? How do you keep the size down so that the complexity of the description stays below the complexity of the original source code (in order to avoid errors)?

I could go on (really, I could) but I hope you get the point. There will come a time when you see that you have tests, and you are sure that the code adheres to those tests, but there is no way to know whether those tests are sufficient. In fact, there will come a time when nobody is really sure what the program is supposed to do, because everyone has forgotten, and the descriptions (source code, tests, design documents, requirements documents) are so voluminous that there is no way for one person to cram them all into their head.

The humble programmer knows that the system probably does not work properly (despite having tested it) and is always looking for ways to tease out the problems.


Not only the bugs you care about, but also the bugs you actually found, so that they won't regress. Regression is a very common cause of production failures, and testing is a very efficient guard against that particular class of bugs.

And that's exactly how you should look at this whole spectrum of issues: bugs come in classes and for every class of bug there are detection and mitigation strategies. Testing does not work as a detection strategy for all classes of bugs but it is extremely effective against some classes of bugs.

A complete strategy would involve much more than just testing.


What classes of bugs do you have in mind that testing cannot work for?


Testing works extremely well for anything that you either find in the wild or anticipate beforehand.

Hard to find using testing: performance-related issues (for which you'll need a profiler); bugs that occur rarely - for instance, one bug I recently uncovered in 'hugo' was so rare it took 1000+ runs of the test software before it turned up on the machine of the lead dev, and initially he did not believe my report (see https://github.com/spf13/hugo/issues/1293); programmer errors (this is where code review comes in); coverage issues, such as routines that are simply never exercised (for that we use coverage tools); errors in the tests themselves; heisenbugs; and so on.
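
For the rarely-occurring class, about the only blunt instrument testing offers is repetition. A sketch of such a harness, where runOnce() is a hypothetical stand-in for a single execution of the flaky test:

    // runOnce() simulates a test with a ~1-in-1000 failure rate;
    // substitute your real test entry point.
    function runOnce(): boolean {
        return Math.random() >= 0.001;
    }

    // Run it many times and stop at the first failure, so the failing
    // state can be captured and inspected.
    for (let run = 1; run <= 10000; run++) {
        if (!runOnce()) {
            console.log(`failed on run ${run}`);
            break;
        }
    }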

Bugs come in many shapes and sizes and testing is a very powerful strategy but it is not the only strategy.


It sounds like you're only talking about unit testing, or perhaps automated FV. Performance issues, infrequent bugs, heisenbugs, etc. are all things we find through system test.

On some level, I'd say any bug is discoverable through testing - if a user can't experience a bug, then it doesn't exist! You could even say unexecuted code can be considered this way - if it doesn't ever matter that a program requires more RAM than it should, who cares? Of course at that point you are clearly increasing the risk of a future programmer mucking something up because of the unused code, so I don't think I'd go that far in real life.

Clearly some things are easier to find in code review / code coverage etc, but saying you can't test for bugs that occur rarely is untrue.


> On some level, I'd say any bug is discoverable through testing - if a user can't experience a bug, then it doesn't exist!

Agreed, but this is like saying "for any (relevant) bug B, I can write a test T that finds it". It doesn't mean you will actually write said test or even be aware you should test for B, even though the end user will later experience it and be harmed by it. This is what Dijkstra seems to be saying: that if your tests don't find bugs, it doesn't mean your tests are complete. Important bugs will almost surely happen regardless of your tests. That they pass is a good thing, but it shouldn't give you disproportionate confidence that your system works as needed.


I largely agree with the article; it was the parent comment I was replying to. I disagree with the idea that a tester shouldn't consider hard-to-find bugs or performance bugs. I disagree somewhat with the idea that a programmer error or coverage issue is an entirely independent issue from other bugs - if there's no possible manifestation, it's not a bug, and if there is a manifestation, then it's possible to find under another 'class' of bug.

I fully agree that we won't execute tests to find all possible bugs. But similarly, code review doesn't find all code errors.


> I'd say any bug is discoverable through testing

... given sufficient time. If I give you a routine which takes one second to execute and has four parameters, each with a maximum of 100 options, you simply can't test it to the point where you can declare it bug-free anywhere in the next three years.
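
The arithmetic behind that claim, using the numbers above:

    const combinations = Math.pow(100, 4);        // 10^8 distinct input tuples
    const secondsPerRun = 1;
    const years = combinations * secondsPerRun / (60 * 60 * 24 * 365);
    console.log(years.toFixed(2));                // ~3.17 years of non-stop runs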

Of course, most of our routines have more than 100 possible inputs - many have an effectively infinite number of inputs.


Absolutely agree - testing is all about risk management, not finding all the bugs! And the right move is often to give up trying to reproduce/find a difficult bug. WRT inputs:

QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv.

(possibly Bill Sempf)


One of these classes requires manual interaction from a user at the GUI level:

http://blog.8thlight.com/uncle-bob/2014/04/30/When-tdd-does-...


It is a way of saying that, more often than not, testing will not actually cover all the bugs you originally care about. Even when writing tests we make assumptions about cases that "should not happen" - and some of them actually do happen. And testing does not take care of these.

It's just something to keep in mind when coding: testing as we generally do it is not a guarantee of code correctness. Other methods provide that, but testing does not. It gives a level of comfort that of all the cases that you can think of, every one of them works as expected...


> With testing I can show that my program performs a certain set of functions as I want it to, with a certain set of inputs.

No you can't. You can show that it has worked a finite number of times in testing. With testing alone you can't show that it won't produce arbitrary crap the next time you run it.

This is obviously formally true in the general case. For example my program might contain code that makes it run exactly n times then self-destruct.

But worse than that it's often true for real non-silly programs. Race conditions are a famous, common and awful example of test-resistant bugs.
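
A sketch of why they resist testing, using TypeScript's async model (the delay here forces the bad interleaving deterministically; in real code the window is narrow, which is exactly why tests rarely hit it):

    let balance = 100;

    // Classic check-then-act race: both calls pass the check before
    // either one subtracts, and balance ends up at -100.
    async function withdraw(amount: number): Promise<void> {
        if (balance >= amount) {                            // check
            await new Promise<void>(r => setTimeout(r, 10)); // simulated I/O
            balance -= amount;                              // act
        }
    }

    Promise.all([withdraw(100), withdraw(100)])
        .then(() => console.log(balance));                  // -100, not 0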


Maybe it's just me, but if you can't produce something without race conditions, you should get help with it.

Of course this turns out to be one a' them "unknown unknowns" in practice, but perhaps it should not be. Can't we teach this in school? Can't we teach it in the online media sector?


We do teach it in school. But concurrency is very hard to get right. People screw up much easier things all the time.


> No you can't. You can show that it has worked a finite number of times in testing. With testing alone you can't show that it won't produce arbitrary crap the next time you run it.

Here I think lies the line between theory and practice. As a computer scientist, Dijkstra clearly represents the former approach.

What you describe is basically how computer programs have been built since the beginning, for the very reason that otherwise they wouldn't have been written at all. Programs like that are nothing that even the humblest of computer scientists would ever consider complete, but in practice, knowing the boundaries of incompleteness and living with them is more than useful.

Surely a program can fail in spectacular ways, but by testing a set of known behaviours with a set of reasonable inputs, we can be pretty sure that if the program is used as intended, with inputs that are roughly what they're supposed to be, it will most likely produce the expected output again and again. This is good enough for all practical purposes of business and utility.

It's also quite similar to how it happens in mechanical engineering. For example, an engine is designed to run at between 1000-5500 revolutions per minute, with oil of grades SAE 40 to 50, and with a suitable mixture of air and petrol. If you push the engine outside of these fixed specifications, you'll increase the likelihood of it failing in spectacular ways. The complexity of failure patterns can be overwhelming: a small, seemingly unrelated thing turns out to be vital in some underestimated sense, and failing that thing will cause all kinds of other failures which eventually destroy the engine completely. And this occasionally gets realized in real life, too. For a computer programmer, doesn't this sound familiar?

We do (try to) write critical software in a different way. Avionics, spaceship computers, medical systems. The cost per line is overwhelming, but the programs still aren't bug-free. A lot of that cost effectively goes to proving the non-existence of bugs: fixing the bugs that are found is cheap in comparison.

Proofs of correctness can be formulated for simple systems, but it gets increasingly hard for complex systems. Worse yet, for most programs we use daily we're completely unable to write specifications tight enough to actually make it possible to write fully correct and bug-free programs. Specifying how the program should work in the first place takes a lot of the effort that goes into special systems such as avionics. That's because specifying is itself a kind of programming, and even if we managed to express the program specification in some non-ambiguous and logically complete format, I think the process of building that specification would in turn suffer from similar disconnects, vague definitions, and other human problems.

Goals sometimes produce the most value when they're walked towards but not necessarily ever reached.


With proofs of correctness, I think it's important to recognize that systems will need to be built in a way that is conducive to proving statements about them. This is a practical daily endeavor for anyone who uses static type systems to catch errors (since types are theorems if you squint), and a major driving idea behind functional programming research. Compositionality and purity can make it drastically easier to prove interesting theorems about programs.
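
A small TypeScript illustration of that squint (the names here are mine, for illustration only): encode non-emptiness in a type, and the "head of an empty list" bug becomes unrepresentable, so that whole class of bug needs no test.

    // A non-empty array, by construction: at least one element required.
    type NonEmpty<T> = [T, ...T[]];

    // head is total on NonEmpty<T>; the empty-input bug is ruled out
    // by the compiler rather than hunted for by a test.
    function head<T>(xs: NonEmpty<T>): T {
        return xs[0];
    }

    head([1, 2, 3]);    // fine
    // head([]);        // rejected at compile time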


The phrase "testing alone" was key. Practical good-enough correctness comes from reasoning about the code and inputs. Testing is a tool that helps with that. Testing alone is not good enough.

I teach freshmen how to code and test - we use unit tests in class - and I have seen first-hand that novices over-rely on tests as an excuse not to concentrate hard enough to fully understand what the code is doing.


Without factoring in the cost of errors and the cost of avoiding the errors, this debate is a pointless exercise in transcribing (i) dogma, (ii) favorite hypothetical situations and (iii) favorite anecdotal evidence. I doubt any original thought will come out of it today.


I feel like this is just the scientist/engineer debate, where the former is trying to discover objective truth, and the latter just wants to build stuff.

It's an elemental disagreement, where the scientist is saying "You don't know everything!" and the engineer is replying "So what?"

But yeah, we're probably not going to learn anything from it.


I doubt it is an engineer/scientist thing. Dijkstra was far more of an engineer-programmer than many realize. He was an engineer first and the rest later.

I think it is about the correctness of claims. There are engineers who are cognizant of the deficiencies of testing, and those who believe that testing is truly sufficient and complete.


> From this point of view, testing will show an absence of bugs - not categorically all bugs, but all bugs I care about.

You are assuming that all the possibly disastrous bugs are bugs you care about.

In reality, it's far more likely that several of the possibly disastrous bugs are bugs you cannot even think of, let alone care about.


You are (most probably) wrong.

Has your client ever reported a bug?

Was it a bug you cared about?

Why haven't you discovered it during testing?


> From this point of view, testing will show an absence of bugs - not categorically all bugs, but all bugs I care about.

I'd say not even this. Testing will show the absence of some of the bugs you thought you cared about. In practice it will fail to find bugs you thought you were testing for, and then it will fail to show bugs that, when they happen in production, will have you thinking "oh, right! I hadn't thought of that! Obviously I don't want X to happen."

This is not theoretical. I've seen both kinds of undetected bugs happen in almost every job I've had. Getting your test scenarios correctly partitioned is hard. Thinking about what to test is hard. And a lot of programmers aren't even aware of Dijkstra's assertion -- how many times have you heard a co-worker claim "but this cannot fail! I tested it!"?


> In an absolute theoretical sense, the idea is quite true, but I'm not sure what wisdom or value is supposed to be imparted through expressing it - especially in the way it's phrased above.

Dijkstra wants you to formally prove that your code is correct. In actual practice, that works only for trivially small programs. Worse, it ensures that only trivially small programs get written. (Yes, I know, there have been longer programs proven correct. I can count on one hand the number I have heard of. And in each case, the effort to do so was very large compared to the size of the program.)

> From this point of view, testing will show an absence of bugs - not categorically all bugs, but all bugs I care about.

Not unless you test all possible inputs you care about. And for most programs, that's completely impossible.


Also, tests aren't just for you; they're for the next guy who inherits your code and is trying to make a change, hoping his assumptions about how it all works are correct.

The tests are a way to provide a backstop, but also another way to document what the intentions are.
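
A minimal sketch of that dual role (parseAmount and the behaviour pinned down here are hypothetical):

    // A hypothetical behaviour, pinned down by a descriptively worded check.
    function parseAmount(s?: string): number {
        return s === undefined ? 0 : Number(s);
    }

    // The assertion is the backstop; the message documents the intention
    // for the next maintainer.
    console.assert(
        parseAmount(undefined) === 0,
        "a missing amount is treated as zero, not as an error"
    );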


It affords nihilism in testing. This keeps budgets down.



