Hacker News new | past | comments | ask | show | jobs | submit login

Because it seems you're making an assumption about the problem being reproducible and identifiable by running half the code. I've never found this to be the case. Ever. Its a strange idea. Where does this come from? This isn't a hypothesis based on data, it is an assumption. And a bizarre one.



The following program segfaults:

  main() { a(); b(); }
You don't have a debugger. You don't have the source for a or b. Strategy? Sure printf works, but so does //.


I get the feeling you don't actually program. Or your writing is disconnected from the programmer part of you.


That's interesting, could you elaborate?


I can. It's long and tedious, because a lot of it is very simple things you learn very early on writing and maintaining software.

You seem to be arguing that your example demonstrates the fact that you can localize a bug in the code with binary search; specifically that the bug must exist in either a() or b(), and that running them independently is both possible and will determine the single location of the bug.

This is not true. It is only true in the case that one assumes bugs must have single locations, and that those locations can be found with binary search over what amount to incomplete programs. In other words, you're assuming the prior, begging the question, etc.

It's tempting to say "real code isn't like this contrived example" but in fact this contrived example is the best possible demonstration of why blind binary search is a poor strategy. Let us say the 'bug' exists in a(). You seem to be assuming this is the necessary consequence:

main () { a(); b(); } -- segfaults main () { a(); /* b(); / } -- segfaults main () { / a(); / b(); } -- doesn't segfault

But this isn't the only possibility. With the bug existing in a(), you could also see this result:

main () { a(); b(); } -- segfaults main () { a(); / b(); / } -- segfaults main () { / a(); / b(); } -- segfaults

You would expect this in situations where b() uses data structures created by a().

We may also see this result, still assuming the bug exists in a():

main () { a(); b(); } -- segfaults main () { a(); / b(); / } -- doesn't segfault main () { / a(); / b(); } -- segfaults

If above we learned nothing, here we've actually got a falsehood - our binary search has localized the bug to b(), but it actually exists in a()!

So, in practice, binary search fails to localize a bug in a(). All of these situations can be created by having a() write a global which is relied upon by b() - a() may write to a protected area, b() may have a default value, a() may write nonsense, or pass an integer where b() expects an address - none of this is particularly exotic, they're all the sort of things you get every day when debugging segfaults.

We might now delve into a competing series of contrived examples of a() and b() and argue about their relative prevalence in the world (which none of us are capable of knowing), because if some case is particularly rare, it may make binary search very slightly better than flipping a coin in this case.

Instead, I will point out that this is once again* assuming the prior, and that we have these simple (if shockingly annoying) facts:

1) side effects exist 2) in the presence of side-effects, binary search cannot predict the location of a bug.

And it follows that in the sense that a "scientific" hypothesis has predictive power, then binary search over the codebase is basically the homeopathy of debugging.


If you comment out b, and it segfaults, the segfault is caused by some code in a. Otherwise, the segfault is caused by some code in b. The root cause of the segfault in b may be found in a, but even looking at the stack trace from a debugger will still point you at b. This is nevertheless helpful information.

Further, there's a misunderstanding about the method. You do a binary search on the "percentage" of code viewed as an instruction stream that is allowed to execute, starting from the beginning. I understand that a can have a different length from b. Given more code, you can also search based on timestamps. You never comment out a and allow b to execute without it, this is nonsense. If that's what you thought I meant, I can understand comments about not actually being able to program. I did not explain this clearly, but only because I had forgotten that it was necessary to explain this clearly, because some people actually don't know how to program, even on HN.


Let's review the conditions of your example:

  no source,
  no debugger,
  two function calls
And the assertion that // works as a strategy. There's only two things you could possibly comment out, so really the example is not so much "explained poorly" as "conceived poorly". Don't shoot the messenger - it's your example.

As for the further explanation here, allowing a debugger makes the search irrelevant - there's only one piece of information the binary search can tell you: which function causes the segfault, but as you observe, that information's available from the debugger already. And no, it doesn't matter if you reframe things as "a percentage of an instruction stream".

The objection to binary search is not "binary search in the codebase can never tell you anything," it's that it's a method that can't -- not "doesn't" but "can't" -- produce a non-trivial testable hypothesis. Does this mean you can't ever use it? You may have no choice. But it's a terrible place to start, and most of the information it offers is available through other means, often with more precision.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: