The algorithm described in the article, if I'm reading it correctly, is more powerful than git bisect, because it can identify the exact change within a commit that causes the bug. You could, for example, treat each file modification in a commit as a separate 'change', so the algorithm would identify not only which commit breaks something but also which file changes within that commit are relevant.
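Rough sketch of what that could look like in practice, treating each file the bad commit touches as one 'change'; the commit id, the make targets, and the helper names here are placeholders, not anything from the article:

    # Sketch: test each per-file piece of a suspect commit in isolation.
    # Assumes a clean worktree; COMMIT and the test command are placeholders.
    import subprocess

    COMMIT = "abc1234"   # the commit git bisect pointed at

    def changed_files(commit):
        out = subprocess.run(["git", "diff", "--name-only", commit + "~1", commit],
                             capture_output=True, text=True, check=True)
        return out.stdout.split()

    def apply_files(commit, files):
        """Check out the parent, then apply the commit's diff for `files` only."""
        subprocess.run(["git", "checkout", "--force", commit + "~1"], check=True)
        if files:
            patch = subprocess.run(["git", "diff", commit + "~1", commit, "--"] + files,
                                   capture_output=True, text=True, check=True).stdout
            subprocess.run(["git", "apply", "-"], input=patch, text=True, check=True)

    def run_tests():
        return subprocess.run(["make", "test"]).returncode == 0   # placeholder

    # Naive one-file-at-a-time scan; delta debugging searches subsets more cleverly.
    for f in changed_files(COMMIT):
        apply_files(COMMIT, [f])
        print(f, "is fine on its own" if run_tests() else "breaks the tests")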
This would be difficult in practice, though: running the tests with only some of a commit's changes applied can behave unpredictably, affecting the tests or the program's ability to run properly.
Used this the other day! Had to track down a visual bug that was introduced in the last 20 commits, but with binary search I could narrow in on the problem one commit at a time and so only had to look at 4 in total.
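If the check can be scripted, `git bisect run` will do the repeated checking out and re-testing for you. A minimal sketch of such a check script; the make targets are placeholders for whatever builds and tests your project:

    #!/usr/bin/env python3
    # check.py -- predicate for `git bisect run ./check.py`, used after
    #   git bisect start; git bisect bad HEAD; git bisect good HEAD~20
    # Exit 0 = this commit is good, 1 = bad, 125 = cannot test (skip it).
    import subprocess, sys

    if subprocess.run(["make"]).returncode != 0:
        sys.exit(125)                       # build broken here, tell bisect to skip

    sys.exit(0 if subprocess.run(["make", "test"]).returncode == 0 else 1)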
I believe Joomla is worse in every conceivable way. And then of course there is WordPress, which I sincerely hope is the last major procedurally-written codebase. PHP has come a long way as a language (it can almost be confused for something sensible these days), but its big libraries still have lots of issues.
That's why I left the Ruby ecosystem and went to Elixir. I noticed that my coding style was getting more and more functional (just naturally... but I was also probably influenced by Rich Hickey's "Are We There Yet?" talk, and by Gary Bernhardt's "Boundaries" talk) and soon realized that functional nirvana was essentially unachievable in Ruby as long as I depended on a vast library of gems all of which were procedural/OO.
I started in PHP, migrated to Ruby, and now I think I am beginning to be annoyed at Ruby's occasional lack of expressivity. I am probably on a similar course towards FP nirvana. Do you know any place on the left coast that's playing with Elixir or Haskell?
I wish git bisect could handle the 'find the last good build' step by jumping back exponentially through history, then switching to binary search once a good build is found.
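In the meantime it's easy to script that galloping phase yourself. A rough sketch, with `make test` standing in for whatever detects the bug (and no guard against walking past the root commit):

    # Jump back 1, 2, 4, 8, ... commits until the test passes, which gives the
    # good/bad endpoints to hand to `git bisect`.
    import subprocess

    def is_good():
        return subprocess.run(["make", "test"]).returncode == 0   # placeholder check

    bad = subprocess.run(["git", "rev-parse", "HEAD"],
                         capture_output=True, text=True).stdout.strip()
    step = 1
    while True:
        candidate = f"{bad}~{step}"
        subprocess.run(["git", "checkout", "--force", candidate], check=True)
        if is_good():
            print(f"git bisect start && git bisect bad {bad} && git bisect good {candidate}")
            break
        step *= 2                            # exponential back-off through history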
Not to take away from git bisect (or hg bisect, which is no more broken than the rest of mercurial) -- I don't think it is an alternative to the thing you used to do.
git bisect is "temporal bisection", i.e. you eliminate half of a time interval. The manual bisect is kinda-sorta "spatial" in the sense that you eliminate half of a region of code. The two things work by similar principles, and their problem domains overlap -- but they are not the same.
They're complementary techniques. One might git bisect down to the commit level, then manually bisect the sub-changes in the commit if it's a large one.
I've done the manual equivalent in perforce as well.
I took several courses with Professor Zeller and with post-docs at his chair. We applied the described concept of "delta debugging" to finding programming errors in Python programs.
Surprisingly, it was possible to implement this within the course, although one had to fiddle quite a bit with Python's internals.
Zeller's chair homepage has a lot of additional info. Here's a page on how the idea evolved after the initial paper in 1999:
https://www.st.cs.uni-saarland.de/dd/
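For the curious, the core ddmin loop is small enough to sketch from the paper's description. This is a simplified version, not Zeller's reference implementation; `test` is whatever oracle tells you whether a given subset of changes still reproduces the failure:

    # Simplified minimizing delta debugging (ddmin): shrink a failure-inducing
    # set of changes while test() keeps reporting the failure.
    def ddmin(changes, test):
        assert test(changes), "the full change set must reproduce the failure"
        n = 2
        while len(changes) >= 2:
            chunk = max(1, len(changes) // n)
            subsets = [changes[i:i + chunk] for i in range(0, len(changes), chunk)]
            reduced = False
            for subset in subsets:
                complement = [c for c in changes if c not in subset]
                if test(subset):                      # failure persists in the subset alone
                    changes, n, reduced = subset, 2, True
                    break
                if test(complement):                  # failure persists without the subset
                    changes, n, reduced = complement, max(n - 1, 2), True
                    break
            if not reduced:
                if n >= len(changes):                 # cannot split any finer
                    break
                n = min(n * 2, len(changes))          # increase granularity
        return changes

    # Toy example: the "failure" needs changes 3 and 7 together.
    print(ddmin(list(range(10)), lambda cs: 3 in cs and 7 in cs))   # -> [3, 7]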
We once had a bug that would cause a failure in our test suite, but very rarely. It was so rare that it would just "go away" and not show up for weeks. So we thought it was odd (maybe bad test data, etc) but moved on.
When we decided to look into it when we had some downtime, we found out that it was a date bug that only reared its head on the 31st day of the month, hence only happening every two months.
I'm not sure bisect would've helped, but it was still really funny when we did find the issue.
Thanks for the anecdote, because it gives me a new idea.
We should be monitoring the rate of test failures -- per test -- as a time series and looking for periodicities. Just graphing them and seeing regular spikes will help, but actual numerical analysis looking for approximate periodicities would catch this sort of thing (after a few months!).
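A sketch of what that might look like, assuming you can pull a list of failure dates per test out of your CI system; the example data below mimics a fails-on-the-31st bug:

    # Look for a periodic signal in one test's failure history.
    from datetime import date

    def daily_counts(failure_dates, start, days):
        counts = [0] * days
        for d in failure_dates:
            idx = (d - start).days
            if 0 <= idx < days:
                counts[idx] += 1
        return counts

    def autocorrelation(xs, lag):
        mean = sum(xs) / len(xs)
        num = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(len(xs) - lag))
        den = sum((x - mean) ** 2 for x in xs) or 1.0
        return num / den

    # Fabricated example: a test that fails on the 31st of the month.
    start = date(2015, 1, 1)
    fails = [date(2015, m, 31) for m in (1, 3, 5, 7, 8, 10)]
    series = daily_counts(fails, start, 365)
    best = max(range(7, 120), key=lambda lag: autocorrelation(series, lag))
    print("strongest periodicity around", best, "days")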
Unless you are writing date-specific code or have seen this bug arise, I probably wouldn't bother. Time bugs will likely creep in regardless; for example, how would you find a bug that happens one month a year? How would you find a bug that happens once every four years? Sometimes human knowledge/experience can't be tested out of a system. All test failures probably warrant investigation of some type.
But, as a general rule, I would try to brute force or fuzz your "inputs", a.k.a. data out of your control: random numbers, dates, user input, network packets, etc. Users and environments often act in unexpected ways, and you want to make sure your bug finding isn't reduced to accidentally running on the right day! :)
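For dates specifically, even a dumb sweep over a couple of years' worth of calendar dates flushes out the 29th/30th/31st class of bugs. The `next_billing_date` function below is a made-up, deliberately buggy stand-in for the code under test:

    # Run the date logic against every calendar date in a range instead of
    # whatever "today" happens to be.
    from datetime import date, timedelta

    def next_billing_date(today):
        # deliberately buggy stand-in: naively bumps the month, blows up near month ends
        return today.replace(month=today.month % 12 + 1)

    d = date(2015, 1, 1)
    while d < date(2017, 1, 1):
        try:
            next_billing_date(d)
        except ValueError:
            print("fails on", d)
        d += timedelta(days=1)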
It wasn't, but it was the sort of thing where we never modified that test or even a related area, so it just wouldn't make sense that the test would fail.
But the bigger issue was that since it would "fix itself" we never prioritized it and just left it alone.
But looking back, the system (automated tests) was functioning as designed. :) We just changed our "DateTime.now" to concrete dates.
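Same idea works in any stack: pass the clock in instead of reaching for "now" inside the code. A small invented example of that pattern:

    # The code takes its date as a parameter, so tests can pin it to awkward
    # dates like the 31st; production just omits the argument.
    from datetime import date

    def invoice_label(today=None):
        today = today or date.today()
        return f"INV-{today:%Y-%m}"

    def test_invoice_label_on_month_end():
        assert invoice_label(date(2015, 1, 31)) == "INV-2015-01"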
Here's an implementation we did of a modified version of his delta debugging algorithm, for minimizing file inputs that induce a failure in a compiler or static analysis tool: http://delta.tigris.org/
It also works for other kinds of files, of course. Our version is easily seen as a kind of simulated annealing, with the input providing sufficient randomness that we did not need to add any internally.
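The basic loop is surprisingly small once you drop the clever parts. Here's a stripped-down, line-granularity cousin of it (not the actual delta code); `./still-crashes.sh` is a placeholder for whatever script decides the reduced file still triggers the failure:

    # Greedy line-based reducer: repeatedly try to drop chunks of lines,
    # keeping any version the "interestingness" test still accepts.
    import subprocess, sys

    def interesting(path):
        return subprocess.run(["./still-crashes.sh", path]).returncode == 0  # placeholder

    def write(path, lines):
        with open(path, "w") as f:
            f.write("".join(lines))

    def reduce_file(path):
        with open(path) as f:
            lines = f.read().splitlines(keepends=True)
        chunk = max(1, len(lines) // 2)
        while True:
            shrunk = False
            i = 0
            while i < len(lines):
                candidate = lines[:i] + lines[i + chunk:]
                write(path, candidate)
                if candidate and interesting(path):
                    lines, shrunk = candidate, True    # keep the smaller version
                else:
                    i += chunk                         # leave the chunk in, move on
            write(path, lines)                         # restore best-known version
            if chunk == 1 and not shrunk:
                break
            if not shrunk:
                chunk = max(1, chunk // 2)             # retry with finer chunks
        return lines

    if __name__ == "__main__":
        reduce_file(sys.argv[1])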
Fun story: this was my first open source project. I had always wanted to be an open source programmer, but the precipitating cause of me releasing delta was that Microsoft Research asked me to. So back when Microsoft was saying that open source is a "cancer" I became an open source programmer because Microsoft asked me to.
The GDB people have done it again. The new release 4.17 of the GNU debugger [6] brings several new features, languages, and platforms, but for some reason, it no longer integrates properly with my graphical front-end DDD [10]: the arguments specified within DDD are not passed to the debugged program. Something has changed within GDB such that it no longer works for me. Something? Between the 4.16 and 4.17 releases, no less than 178,000 lines have changed. How can I isolate the change that caused the failure and make GDB work again?
I couldn't quickly find the 4.17 release, but gdb 4.17.0.4 was released at the end of June 1998. The release notes mentioned Linux kernels in the 2.0-2.1 range.
Why does the article say there are 2^N different possible configurations? Shouldn't it be N! ? Maybe I'm missing something, but I don't see how they can rule out so many different orderings of the "configurations", since he started out by saying that the ordering wasn't reliable.
Edited: I figured it out. Each of the N changes can either be applied or not, which gives 2^N possible subsets; the algorithm avoids testing all of them by essentially doing a binary search over the changes. I should have read a bit further before commenting -- the tables / examples cleared it up pretty quickly.
TLDR of the article: cross-application interfaces (which is what GDB's command-line interface has effectively become) break with version changes. Users don't care about the reasons behind that breakage and would like to see it treated as bugs. Developers see that attitude as unworkable, and would rather not care about external interfaces at all.
I always wonder why we gave up on refactoring tooling instead of pushing it further, so that cross-program changes would become not just possible but easy and feasible.
What can be done using Alt-Shift-R (Eclipse's rename refactoring) in a distributed, large-scale Java app with a GWT frontend and 50 libraries used on the backend, each of which can run distributed across machines ... is baffling.
It shouldn't be!
And now with microservices, it becomes more and more baffling that this worked at one point. That's just horrible. Most of the complaints about it strike me as being complaints that when a toolchain allows for complexity, people make things complex. And of course, that's fair. If you give a 5-year-old a laser CNC machine and he figures out how to turn it on, things are going to go south quickly. But damn, the things you can do with that!
You definitely have a point (although what you say is only marginally related to the article). However, increasing complexity is inevitable in all aspects, so we should be developing tools to manage all this complexity instead of just deciding that "we shouldn't" (which is also an option; see e.g. the recent unikernels trend).
Some random blog post about it: http://webchick.net/node/99