Binary search / bisection for floating point numbers
Asked Answered
K

2

5

It is easy to find an integer with binary search even if it can be arbitrarily large: first guess the order of magnitude, then keep dividing the interval. This answer describes how to find an arbitrary rational number.

Having set the scene, my question is similar: how can we guess a IEEE 754 floating point number? Assume it is not NaN, but everything else is fair game. For each guess, your program will be told whether the number in question is higher, equal or lower. Minimize the number of guesses required in the worst case.

(This is not a homework assignment. Though, I might make it one, if this turns out to have an interesting answer that's not just "beat the floating point numerical difficulties to death with lots and lots of special case handling.")

Edit: if I were better at searching I could have found the answer---but that only works if you already know that reinterpretation as int works (with certain caveats). So leaving this up. Thanks to Harold for a great answer!

Koffman answered 8/7, 2017 at 21:49 Comment(12)
Are we allowed to reinterpret the bit-pattern of floats as ints?Exponential
Sure, whatever you want. But the oracle will give you comparison result when interpreted as float.Koffman
Why "first guess the order of magnitude" ? What dos that have to do with binary search?Gurrola
@HenkHolterman See the new link I added, I hope that makes the question clearer? Summary: to use the classic binary search you first need finite bounds. An integer in the mathematical sense doesn't come with any finite bounds a priori, it can be arbitrarily large or small. So you need to guess them, before you can start dividing the interval.Koffman
@Matthias: so this is more a binary guess than a binary search. Binary searches search through a (sorted) list (or array or other container) of items. There, the interval is given by the largest and smallest values.Krystynakshatriya
@Rudy, I grew up in academia were we routinely use the term binary search with this broader meaning. See eg https://mcmap.net/q/2034532/-bin-packing-regarding-optimization-and-decision-versions for someone else doing the same. But, feel free to suggest an edit to the question. The term 'binary guess' doesn't seem to be common in the literature however.Koffman
@Matthias: See here: "In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array."Krystynakshatriya
@Rudy, I saw that one too. And for an intro article Wikipedia is good enough for the layman here, but you can see whether you can/want to improve the article to be more in line with actual usage in CS. The Talk section has lots more ideas for how to improve the article. See also how bisection method mentions the wider meaning of binary search.Koffman
@Matthias: I am a layman too (I'm a programming dentist). <g> What you are asking about is generally called a "number guessing game" and the binary strategy is just the most efficient one.Krystynakshatriya
@Rudy, I guess I was being a bit too polite to Wikipedia: I mean, the article has serious issues and should not be used as a reference on its own. In any case, please feel free to propose a change in formulation to the question. (I'm a mathematician by training myself, only got into computing because it's so easy.)Koffman
Let us continue this discussion in chat.Koffman
@Matthias: No need. It isn't too important. I just meant I wouldn't call it a search, although the principle is the same: repeatedly chop in halves.Krystynakshatriya
P
7

IEEE-754 64-bit floating point numbers are really 64-bit representations. Furthermore, with the exception of NaN values, there is no difference between floating point comparison and integer comparison of positive values. (That is, two bit patterns with the sign bit unset will produce the same comparison result regardless of whether you compare them as int64_t or double, unless one of the bit patterns is a floating point NaN-.)

That means you can find a number in 64 guesses by guessing one bit at a time, even if the number is ±∞. Start by comparing the number with 0; if the target is "less", then produce the guesses in the same way as below, but negate them before guessing. (Since IEEE-754 floats are sign/magnitude, you can negate the number by setting the sign bit to 1. Or you could do the positive bit-pattern reinterpretation and then floating point negate the result.)

After that, guess one bit at a time, starting with the highest-order value bit. Set that bit to 1 if the number is greater than or equal to the guess; set that bit to 0 if the number is less; and continue with the next bit until there aren't any more. To construct the guess, reinterpret the bit pattern as a double.

There are two caveats:

  1. You cannot distinguish between ±0 with comparison tests. That means that if your opponent wants you to distinguish between them, they will have to supply you with a way to ask about equality with −0, and you'll have to use that mechanism after you've apparently established that the number is 0 (which will happen on the 64th guess). This would add one guess, for a total of 65.

  2. If you are assured that the target is not a NaN, then there is no other problem. If it might be a NaN, you need to be careful how you compare: things will work out fine if you always ask "is X less than this guess?", because a NaN comparison will always return false. That means that after 11 successive "no" answers (not counting the one to establish the sign), you will find yourself guessing ∞, with the assumption that if the number is not less than ∞, it must be equal. However, in this case alone you need to explicitly test for equality as well, because that will also be false if the target is a NaN. This doesn't add an additional guess to the count, because it will always happen long before 64 guesses have been used up.

Palmieri answered 9/7, 2017 at 5:9 Comment(3)
On your first caveat: given that we get a three-way result each time (higher / lower / equal), and that the first comparison is with 0.0, wouldn't establishing that the number is +/- 0 happen on the first guess, so that a result of either +0.0 or -0.0 could happen on the second guess?Lazybones
Awesome! I even see very easily how this algorithm is not only terminating, but has optimal worst case runtime. Could you add a link to the background for the preservation of comparison under reinterpretation as int64_t, please?Koffman
I implemented your idea in Python. Works a charm. I can guess every float in 64 steps. (And have some examples that fail in 63 steps.)Koffman
B
0

The same approach can be applied to a floating point number. Worse case run time is O(log n).

public class GuessComparer
{
    private float random;
    public GuessComparer() // generate a random float and keep it private
    {
        Random rnd = new Random();
        var buffer = new byte[4];
        rnd.NextBytes(buffer);
        random = BitConverter.ToSingle(buffer, 0);
    }
    public int CheckGuess(float quess) // answer whether number is high, lower or the same.
    {
        return random.CompareTo(quess);
    }
}
public class FloatFinder
{

    public static int Find(GuessComparer checker)
    {
        float guess = 0;
        int result = checker.CheckGuess(guess);
        int guesscount = 1;
        var high = float.MaxValue;
        var low = float.MinValue;
        while (result != 0)
        {
            if (result > 0) //random is higher than guess
                low = guess;
            else// random is lower than guess

                high = guess;

            guess = (high + low) / 2;
            guesscount++;
            result = checker.CheckGuess(guess);
        }
        Console.WriteLine("Found answer in {0}", guesscount);
        return guesscount;
    }

    public static void Find()
    {
        var checker = new GuessComparer();
        int guesses = Find(checker);
    }
}
Blague answered 8/7, 2017 at 23:23 Comment(4)
Thanks. I am surprised this can deal with all the nastiness around overflow and underflow etc? I don't think this handles infinities right? I doubt this is optimal, since it doesn't know anything about float density (ie it doesn't take into account that there are more floats between 0 and 1 than between 1,000,000 and 1,000,001.Koffman
It doesn't need to. As long as the starting high is the max value for the data type (3.40282347E+38 for a single) and starting low is min value (-3.40282347E+38) taking the average between high and low on each loop will make the algorithm converge just as it does with integers. There is no possible way to run into Positive Infinity (1.0F / 0.0F), Negative Infinity (-1.0F / 0.0F) or NaN (0.0F / 0.0F).Blague
What if the number in question to guess is infinity? Or what if the number in question is just below float.MaxValue---won't your calculation of (high + low)/2 overflow close to the end when 'low' is also very big?Koffman
The maximum value of a single is positive infinity, not 3.40282347E+38.Plain

© 2022 - 2024 — McMap. All rights reserved.