The algorithm described in the article, if I'm reading it correctly, is more powerful than git bisect, because it can identify the exact change within a commit that causes the bug. You could, for example, treat each file modification in a commit as a separate 'change', so the algorithm would identify not only which commit breaks something but also which file changes within that commit are relevant.
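Rough sketch of what that could look like in practice, treating each file the bad commit touches as one 'change'; the commit id, the make targets, and the helper names here are placeholders, not anything from the article:

    # Sketch: test each per-file piece of a suspect commit in isolation.
    # Assumes a clean worktree; COMMIT and the test command are placeholders.
    import subprocess

    COMMIT = "abc1234"   # the commit git bisect pointed at

    def changed_files(commit):
        out = subprocess.run(["git", "diff", "--name-only", commit + "~1", commit],
                             capture_output=True, text=True, check=True)
        return out.stdout.split()

    def apply_files(commit, files):
        """Check out the parent, then apply the commit's diff for `files` only."""
        subprocess.run(["git", "checkout", "--force", commit + "~1"], check=True)
        if files:
            patch = subprocess.run(["git", "diff", commit + "~1", commit, "--"] + files,
                                   capture_output=True, text=True, check=True).stdout
            subprocess.run(["git", "apply", "-"], input=patch, text=True, check=True)

    def run_tests():
        return subprocess.run(["make", "test"]).returncode == 0   # placeholder

    # Naive one-file-at-a-time scan; delta debugging searches subsets more cleverly.
    for f in changed_files(COMMIT):
        apply_files(COMMIT, [f])
        print(f, "is fine on its own" if run_tests() else "breaks the tests")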
This would be difficult in practice, though: running the tests with only some of a commit's changes applied can behave unpredictably, affecting the tests or the program's ability to run properly.
Used this the other day! Had to track down a visual bug that was introduced in the last 20 commits, but with binary search I could narrow in on the problem one commit at a time and so only had to look at 4 in total.
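If the check can be scripted, `git bisect run` will do the repeated checking out and re-testing for you. A minimal sketch of such a check script; the make targets are placeholders for whatever builds and tests your project:

    #!/usr/bin/env python3
    # check.py -- predicate for `git bisect run ./check.py`, used after
    #   git bisect start; git bisect bad HEAD; git bisect good HEAD~20
    # Exit 0 = this commit is good, 1 = bad, 125 = cannot test (skip it).
    import subprocess, sys

    if subprocess.run(["make"]).returncode != 0:
        sys.exit(125)                       # build broken here, tell bisect to skip

    sys.exit(0 if subprocess.run(["make", "test"]).returncode == 0 else 1)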
I believe Joomla is worse in every conceivable way. And then of course there is WordPress, which I sincerely hope is the last major procedurally-written codebase. PHP has come a long way as a language (it can almost be confused for something sensible these days), but its big libraries still have lots of issues.
That's why I left the Ruby ecosystem and went to Elixir. I noticed that my coding style was getting more and more functional (just naturally... but I was also probably influenced by Rich Hickey's "Are We There Yet?" talk, and by Gary Bernhardt's "Boundaries" talk) and soon realized that functional nirvana was essentially unachievable in Ruby as long as I depended on a vast library of gems all of which were procedural/OO.
I started in PHP, migrated to Ruby, and now I think I am beginning to be annoyed at Ruby's occasional lack of expressivity. I am probably on a similar course towards FP nirvana. Do you know any place on the left coast that's playing with Elixir or Haskell?
I wish git bisect could handle the 'find the last good build' step by jumping back exponentially through history, then switching to binary search once a good build is found.
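In the meantime it's easy to script that galloping phase yourself. A rough sketch, with `make test` standing in for whatever detects the bug (and no guard against walking past the root commit):

    # Jump back 1, 2, 4, 8, ... commits until the test passes, which gives the
    # good/bad endpoints to hand to `git bisect`.
    import subprocess

    def is_good():
        return subprocess.run(["make", "test"]).returncode == 0   # placeholder check

    bad = subprocess.run(["git", "rev-parse", "HEAD"],
                         capture_output=True, text=True).stdout.strip()
    step = 1
    while True:
        candidate = f"{bad}~{step}"
        subprocess.run(["git", "checkout", "--force", candidate], check=True)
        if is_good():
            print(f"git bisect start && git bisect bad {bad} && git bisect good {candidate}")
            break
        step *= 2                            # exponential back-off through history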
Not to take away from git bisect (or hg bisect, which is no more broken than the rest of mercurial) -- I don't think it is an alternative to the thing you used to do.
git bisect is "temporal bisection", i.e. you eliminate half of a time interval. The manual bisect is kinda-sorta "spatial" in the sense that you eliminate half of a region of code. The two things work by similar principles, and their problem domains overlap -- but they are not the same.
They're complementary techniques. One might git bisect down to the commit level, then manually bisect the sub-changes in the commit if it's a large one.
I've done the manual equivalent in perforce as well.
I took several courses with Professor Zeller and with post-docs at his chair. We applied the described concept of "delta debugging" to finding programming errors in Python programs.
Surprisingly, it was possible to implement this within the course, although one had to fiddle quite a bit with Python's internals.
Zeller's chair homepage has a lot of additional info. Here's a page on how the idea evolved after the initial paper in 1999:
https://www.st.cs.uni-saarland.de/dd/
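For the curious, the core ddmin loop is small enough to sketch from the paper's description. This is a simplified version, not Zeller's reference implementation; `test` is whatever oracle tells you whether a given subset of changes still reproduces the failure:

    # Simplified minimizing delta debugging (ddmin): shrink a failure-inducing
    # set of changes while test() keeps reporting the failure.
    def ddmin(changes, test):
        assert test(changes), "the full change set must reproduce the failure"
        n = 2
        while len(changes) >= 2:
            chunk = max(1, len(changes) // n)
            subsets = [changes[i:i + chunk] for i in range(0, len(changes), chunk)]
            reduced = False
            for subset in subsets:
                complement = [c for c in changes if c not in subset]
                if test(subset):                      # failure persists in the subset alone
                    changes, n, reduced = subset, 2, True
                    break
                if test(complement):                  # failure persists without the subset
                    changes, n, reduced = complement, max(n - 1, 2), True
                    break
            if not reduced:
                if n >= len(changes):                 # cannot split any finer
                    break
                n = min(n * 2, len(changes))          # increase granularity
        return changes

    # Toy example: the "failure" needs changes 3 and 7 together.
    print(ddmin(list(range(10)), lambda cs: 3 in cs and 7 in cs))   # -> [3, 7]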
We once had a bug that would cause a failure in our test suite, but very rarely. It was so rare that it would just "go away" and not show up for weeks. So we thought it was odd (maybe bad test data, etc) but moved on.
When we decided to look into it when we had some downtime, we found out that it was a date bug that only reared its head on the 31st day of the month, hence only happening every two months.
I'm not sure bisect would've helped, but it was still really funny when we did find the issue.
Thanks for the anecdote, because it gives me a new idea.
We should be monitoring the rate of test failures -- per test -- as a time series and looking for periodicities. Just graphing them and seeing regular spikes will help, but actual numerical analysis looking for approximate periodicities would catch this sort of thing (after a few months!).
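A sketch of what that might look like, assuming you can pull a list of failure dates per test out of your CI system; the example data below mimics a fails-on-the-31st bug:

    # Look for a periodic signal in one test's failure history.
    from datetime import date

    def daily_counts(failure_dates, start, days):
        counts = [0] * days
        for d in failure_dates:
            idx = (d - start).days
            if 0 <= idx < days:
                counts[idx] += 1
        return counts

    def autocorrelation(xs, lag):
        mean = sum(xs) / len(xs)
        num = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(len(xs) - lag))
        den = sum((x - mean) ** 2 for x in xs) or 1.0
        return num / den

    # Fabricated example: a test that fails on the 31st of the month.
    start = date(2015, 1, 1)
    fails = [date(2015, m, 31) for m in (1, 3, 5, 7, 8, 10)]
    series = daily_counts(fails, start, 365)
    best = max(range(7, 120), key=lambda lag: autocorrelation(series, lag))
    print("strongest periodicity around", best, "days")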
Unless you are writing date-specific code or have seen this bug arise, I probably wouldn't bother. Time bugs will likely creep in regardless; for example, how would you find a bug that happens one month a year? How would you find a bug that happens once every four years? Sometimes human knowledge/experience can't be tested out of a system. All test failures probably warrant investigation of some type.
But, as a general rule, I would try to brute force or fuzz your "inputs", a.k.a. data out of your control: random numbers, dates, user input, network packets, etc. Users and environments often act in unexpected ways, and you want to make sure your bug finding isn't reduced to accidentally running on the right day! :)
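For dates specifically, even a dumb sweep over a couple of years' worth of calendar dates flushes out the 29th/30th/31st class of bugs. The `next_billing_date` function below is a made-up, deliberately buggy stand-in for the code under test:

    # Run the date logic against every calendar date in a range instead of
    # whatever "today" happens to be.
    from datetime import date, timedelta

    def next_billing_date(today):
        # deliberately buggy stand-in: naively bumps the month, blows up near month ends
        return today.replace(month=today.month % 12 + 1)

    d = date(2015, 1, 1)
    while d < date(2017, 1, 1):
        try:
            next_billing_date(d)
        except ValueError:
            print("fails on", d)
        d += timedelta(days=1)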
It wasn't, but it was the sort of thing where we never modified that test or even a related area, so it just wouldn't make sense that the test would fail.
But the bigger issue was that since it would "fix itself" we never prioritized it and just left it alone.
But looking back, the system (automated tests) was functioning as designed. :) We just changed our "DateTime.now" to concrete dates.
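Same idea works in any stack: pass the clock in instead of reaching for "now" inside the code. A small invented example of that pattern:

    # The code takes its date as a parameter, so tests can pin it to awkward
    # dates like the 31st; production just omits the argument.
    from datetime import date

    def invoice_label(today=None):
        today = today or date.today()
        return f"INV-{today:%Y-%m}"

    def test_invoice_label_on_month_end():
        assert invoice_label(date(2015, 1, 31)) == "INV-2015-01"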
Here's an implementation we did of a modified version of his delta debugging algorithm, for minimizing file inputs that induce a failure in a compiler or static analysis tool: http://delta.tigris.org/
It also works for other kinds of files, of course. Our version is easily seen as a kind of simulated annealing, with the input providing sufficient randomness that we did not need to add any internally.
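The basic loop is surprisingly small once you drop the clever parts. Here's a stripped-down, line-granularity cousin of it (not the actual delta code); `./still-crashes.sh` is a placeholder for whatever script decides the reduced file still triggers the failure:

    # Greedy line-based reducer: repeatedly try to drop chunks of lines,
    # keeping any version the "interestingness" test still accepts.
    import subprocess, sys

    def interesting(path):
        return subprocess.run(["./still-crashes.sh", path]).returncode == 0  # placeholder

    def write(path, lines):
        with open(path, "w") as f:
            f.write("".join(lines))

    def reduce_file(path):
        with open(path) as f:
            lines = f.read().splitlines(keepends=True)
        chunk = max(1, len(lines) // 2)
        while True:
            shrunk = False
            i = 0
            while i < len(lines):
                candidate = lines[:i] + lines[i + chunk:]
                write(path, candidate)
                if candidate and interesting(path):
                    lines, shrunk = candidate, True    # keep the smaller version
                else:
                    i += chunk                         # leave the chunk in, move on
            write(path, lines)                         # restore best-known version
            if chunk == 1 and not shrunk:
                break
            if not shrunk:
                chunk = max(1, chunk // 2)             # retry with finer chunks
        return lines

    if __name__ == "__main__":
        reduce_file(sys.argv[1])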
Fun story: this was my first open source project. I had always wanted to be an open source programmer, but the precipitating cause of me releasing delta was that Microsoft Research asked me to. So back when Microsoft was saying that open source is a "cancer" I became an open source programmer because Microsoft asked me to.
The GDB people have done it again. The new release 4.17 of the GNU debugger [6] brings several new features, languages, and platforms, but for some reason, it no longer integrates properly with my graphical front-end DDD [10]: the arguments specified within DDD are not passed to the debugged program. Something has changed within GDB such that it no longer works for me. Something? Between the 4.16 and 4.17 releases, no less than 178,000 lines have changed. How can I isolate the change that caused the failure and make GDB work again?
I couldn't quickly find the 4.17 release, but gdb 4.17.0.4 was released at the end of June 1998. The release notes mentioned Linux kernels in the 2.0-2.1 range.
Why does the article say there are 2^N different possible configurations? Shouldn't it be N! ? Maybe I'm missing something, but I don't see how they can rule out so many different orderings of the "configurations", since he started out by saying that the ordering wasn't reliable.
Edited: I figured it out. Each of the N changes can either be applied or not, which gives 2^N possible subsets; the algorithm avoids testing all of them by essentially doing a binary search over the changes. I should have read a bit further before commenting -- the tables / examples cleared it up pretty quickly.
TLDR of the article: cross-application interfaces (which is what GDB's command-line interface has effectively become) break with version changes. Users don't care about the reasons behind that breakage and would like to see it treated as bugs. Developers see that attitude as unworkable, and would rather not care about external interfaces at all.
I always wonder why we gave up on refactoring tooling instead of pushing it further, so that cross-program changes would become not just possible but easy and feasible.
What can be done using Alt-Shift-R (Eclipse's rename refactoring) in a distributed, large-scale Java app with a GWT frontend and 50 libraries used on the backend, each of which can run distributed across machines ... is baffling.
It shouldn't be!
And now with microservices, it becomes more and more baffling that this worked at one point. That's just horrible. Most of the complaints about it strike me as being complaints that when a toolchain allows for complexity, people make things complex. And of course, that's fair. If you give a 5-year-old a laser CNC machine and he figures out how to turn it on, things are going to go south quickly. But damn, the things you can do with that!
You definitely have a point (although what you say is only marginally related to the article). However, increasing complexity is inevitable in all aspects, so we should be developing tools to manage all this complexity instead of just deciding that "we shouldn't" (which is also an option; see e.g. the recent unikernels trend).
Some random blog post about it: http://webchick.net/node/99