Debugging and Binary Search

W

7

3

"Programming Pearls" in the column 2 ("AHA! Algorithm") talks about how binary search aids in various processes like sorting, tree traversals. But it mentions that binary seach can be used in "program debugging". Can somebody please explain how this is done?

Willin answered 9/5, 2009 at 19:52 Comment(0)

O

6

Another possibility is that you have a bug, and you know it wasn't there in your February release, but it was in your April release (or rather, your April release candidate -- you would never actually ship a bug to your users, right?).

You can do a manual binary search through your revision-control history to narrow down when the bug was introduced. First check out the code halfway between the two releases, build it, and see if the bug is there. Keep partitioning until you find out when it was introduced. If you don't know where to start looking for the bug, this can be very effective, especially if you do fairly small commits.

This works very well with Subversion because it has repository-wide revision numbers. If your February release was rev 533 and your April release was rev 701, then you update to rev 617, test it, and go from there. (Actually, I usually round to 600 so I don't have to do so much math in my head.) Once I start to narrow it down, I start looking at the commit comments and making educated guesses ("I really don't think this commit would have broken it"), so I usually don't need to do all log₂(n) checkouts.

I've never used Git, but they take this one step further with the built-in "bisect" command. You give it a starting point (when was it known to work?) and ending point (when did you notice it was broken?), and it will automatically get the code for the halfway point in the binary search. Then after you've built and tested, you tell it whether this rev passed or failed; then it gets the code for the next halfway point. You can even tell it to run a command for each rev and use the command's exit code to determine whether the rev is a pass or fail, at which point it can run on full automatic.

Otranto answered 9/5, 2009 at 20:43 Comment(2)

"I've never used Git" -- please tell me this has changed (or that you've at least tried another distributed VC system, maybe Mercurial) since 2009! It's so much nicer. – Younker 21/12, 2015 at 23:36

@KyleStrand Yes, I do use Git now. :-) – Otranto 22/12, 2015 at 18:9

E

8

Binary search is an efficient way to find an item in a sorted list. For instance, if you are looking for a specific page in a book (say, pp. 147) you'd open the book near the middle and determine if the opened page is before or after the page you are looking for. Then you'd pick the section you've narrowed it down to and repeat the process: split it in half and determine which half contains page 147. Even better, you can guess how far in page 147 is—not far if the book is very long and near the end of a short book—and use that guess as the first division point. This variation on binary search is called interpolation search.

So if you have a bug and a sorted list it might be hiding in, interpolation search is usually the way to squash it. Other answers explain the common cases of a bug hidden somewhere in a range of lines or source code commits. But the technique can be applied in other situations:

log search

On a long-running system, especially one that processes so much data you have to rotate your logs daily, it's not uncommon to see something broken today that was fine a few weeks/months/years ago. With a complicated, interlocking system, it's possible for bugs to be uncovered without any code changes. Finding what changed in the hardware, network, OS, configuration (though that should be stored along with the code), input, manual procedures, etc. can be difficult since so many of these things change over long time periods. Full text searches of the logs (whether in a table or in files) is often impractical.

In this case, there's hardly any choice but to open the logs somewhere in the middle and see if the problem exists or not. Then cut the section where you know the bug is hiding and look for the bug again. Eventually, you should be able to discover the first moment your bug showed up, which makes finding the culprit a lot easier.
input search

The other day, I noticed an obscure "bug" with long text. The fastest way to track down the exact boundary between text that worke and text that broke the system was to cut the text in half until I found the dividing line. (Turns out I'm an idiot, but I did better counting bananas.)
conceptual process steps

Most people don't even know that they are using binary (or better, interpolation) search most of the time; it's a really a natural way to break down a problem. When thinking about a long series of steps that includes a potential bug, it's often sensible to check the output of one of the middle steps first to avoid examining the entire code only to find the problem is in the last step.

Elsyelton answered 31/8, 2015 at 20:42 Comment(5)

of course to be efficient for the sorted list, that list must have O(1) access. Linked lists, for instance, don't. -- re "input search" I often hunt for a specific change on a Wikipedia page history that way. – Nigh 18/12, 2015 at 18:53

@WillNess You can still have an efficient binary search without O(1) access. Skip lists, binary heaps, etc. Can be used to organize your data to get nearly the same search characteristics as a flat array, with better characteristics for insert/deletion to boot. – Byzantine 18/12, 2015 at 23:47

@RichardJ.RossIII A downside to all of those is that they generally go along with a lack of locality. Not always; you can use large-page with manual subdivision to keep the memory clumped. On modern processors, cache locality (and predictability of access) can be a ridiculously huge (100-ish-fold) performance boost. – Farnesol 21/12, 2015 at 15:13

I also use manual binary search occasionally as a last ditch effort to find a line of problematic code. I comment approximately half of my code, while keeping it functional. If the bug is still there, I comment half of the remaining code. If the bug goes away, I uncomment half of the code I previously commented. Rinse, repeat until the offending code is found. This is obviously not the first tool I use, but every so often I have to resort to it. ⛵🎩 – Cassaundracassava 23/12, 2015 at 18:36

+1 on the 'conceptual process steps' part - it is a natural process we use in our day to day as well, even without realizing or understanding that we're doing it. – Downing 24/12, 2015 at 16:2

R

7

If you don't know which line in a 100-line program is buggy, you'd try to run the first 50 lines and skip the rest. If the problem shows up, you'll know this first segment contains the bug. You'd next try to split this and run the first 25 lines and see if the problem is there and so on until you have figured out a short enough piece to look at.

The idea behind binary search is to identify/isolate a small region that is buggy. However, as with all methods, this is not applicable in every situation. E.g: a recursive function will be unwieldy to such a tool. When there are far too many execution paths, segmenting your code to be run may become difficult.

Ridgeling answered 9/5, 2009 at 19:57 Comment(1)

oh so binary search here doesnt mean you are searching for elements but simply dividing the program and looking for a problem. thanks. – Willin 9/5, 2009 at 20:13

O

6

Another possibility is that you have a bug, and you know it wasn't there in your February release, but it was in your April release (or rather, your April release candidate -- you would never actually ship a bug to your users, right?).

You can do a manual binary search through your revision-control history to narrow down when the bug was introduced. First check out the code halfway between the two releases, build it, and see if the bug is there. Keep partitioning until you find out when it was introduced. If you don't know where to start looking for the bug, this can be very effective, especially if you do fairly small commits.

This works very well with Subversion because it has repository-wide revision numbers. If your February release was rev 533 and your April release was rev 701, then you update to rev 617, test it, and go from there. (Actually, I usually round to 600 so I don't have to do so much math in my head.) Once I start to narrow it down, I start looking at the commit comments and making educated guesses ("I really don't think this commit would have broken it"), so I usually don't need to do all log₂(n) checkouts.

I've never used Git, but they take this one step further with the built-in "bisect" command. You give it a starting point (when was it known to work?) and ending point (when did you notice it was broken?), and it will automatically get the code for the halfway point in the binary search. Then after you've built and tested, you tell it whether this rev passed or failed; then it gets the code for the next halfway point. You can even tell it to run a command for each rev and use the command's exit code to determine whether the rev is a pass or fail, at which point it can run on full automatic.

Otranto answered 9/5, 2009 at 20:43 Comment(2)

"I've never used Git" -- please tell me this has changed (or that you've at least tried another distributed VC system, maybe Mercurial) since 2009! It's so much nicer. – Younker 21/12, 2015 at 23:36

@KyleStrand Yes, I do use Git now. :-) – Otranto 22/12, 2015 at 18:9

R

5

Binary search may help debugging in the following ways:

Suppose control has to reach a certain point and you suspect it doesn't. Put print statements in the first and last code lines. Suppose you see the result of the first but not second statement. Put a print statement in the middle and try again. This way you use binary search over the space of lines of code to zero in on the bug.
Suppose you use a version control system. Version 10 passed all tests. Version 70, about to be released, fails some tests. Check out version 40 and run the tests on it. If it works fine, try version 55. If version 40 fails, try version 25. This way you use binary search over program version space in order to zero in on the first version where a bug entered the program.

Rondelle answered 9/5, 2009 at 20:2 Comment(0)

C

2

Let's say you have a bug, but you don't know where it is. You could place break points randomly or single step through the code, verifying the data at each stop. A better strategy, however, would be to pick a spot in the middle of the code block that you're looking at. If the problem exists there, then pick a spot mid-way between the start and the current spot and try it again. If the problem doesn't exist, then pick a spot midway between the current spot and the end, and try again. Keep going this way until you've narrowed the amount of code down to a block big enough to single step through more efficiently than stopping/restarting. That's basically doing binary search on your code.

Castellano answered 9/5, 2009 at 19:59 Comment(0)

E

1

you can comment out code, add a logging comment or simply set the break point

great for code with no error but a nonfunctioning feature & you are filled with self doubt

First set the break point smack at the middle of the code, if all is well then you know the problem is not there

then set it at the 75% of code point - if the issue arises here then you know it is in the code between 50% & 75%

So next you set it at 57%

Again if the problem is there then you split it in half again

Basically you can find the issue in a few minutes rather than spending intellectually hours reanalyzing your code

Then its still up to you to fix it.

Embodiment answered 9/8, 2015 at 0:49 Comment(0)

P

1

The full algorithm is called Delta Debugging and was developed by Andreas Zeller, a Professor of informatics and author of the book Why programs fail.

However, this is not a binary search only. The binary search is only done in the beginning and once binary search does not minimize the input any more, another approach is taken.

The complete algorithm is not so hard to understand, actually very simple. However, it's sometimes difficult to reproduce the bug and apply the decision whether or not the issue has been reproduced.

Besides the book, there is a free online course on Udacity. If you prefer the short version, read his IEEE paper

Paintbox answered 31/8, 2015 at 20:51 Comment(0)

Recommended topics

Hot tags