The "guess the number" game for arbitrary rational numbers?

I once got the following as an interview question:

I'm thinking of a positive integer n. Come up with an algorithm that can guess it in O(lg n) queries. Each query is a number of your choosing, and I will answer either "lower," "higher," or "correct."

This problem can be solved by a modified binary search in which you list powers of two until you find one that exceeds n, then run a standard binary search over that range. (This algorithm is called exponential search.) What I think is so cool about this is that you can search an infinite space for a particular number faster than by brute force.
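The exponential-search idea can be sketched in Python. This is a minimal illustration, not part of the original question; the `query` interface (returning the sign of guess minus n) is an assumption standing in for the interviewer's answers:

```python
def guess_integer(query):
    """Find a hidden positive integer n in O(lg n) queries.

    query(g) is an assumed interface: it returns -1 if g is lower
    than n, 1 if g is higher, and 0 if g equals n.
    """
    # Phase 1 (exponential): double until the guess is >= n.
    hi = 1
    while query(hi) < 0:
        hi *= 2
    if query(hi) == 0:
        return hi
    # Phase 2: standard binary search over (hi/2, hi).
    lo = hi // 2
    while lo <= hi:
        mid = (lo + hi) // 2
        q = query(mid)
        if q == 0:
            return mid
        if q < 0:
            lo = mid + 1
        else:
            hi = mid - 1
```

Both phases take O(lg n) queries, giving the stated bound.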

The question I have, though, is a slight modification of this problem. Instead of picking a positive integer, suppose that I pick an arbitrary rational number between zero and one. My question is: what algorithm can you use to most efficiently determine which rational number I've picked?

Right now, the best solution I have can find p/q in at most O(q) time by implicitly walking the Stern-Brocot tree, a binary search tree over all the rationals. However, I was hoping to get a runtime closer to the runtime that we got for the integer case, maybe something like O(lg (p + q)) or O(lg pq). Does anyone know of a way to get this sort of runtime?

I initially considered using a standard binary search of the interval [0, 1], but this will only find rational numbers with a non-repeating binary representation, which misses almost all of the rationals. (It can only find dyadic rational numbers.)

I also thought about using some other way of enumerating the rationals, but I can't seem to find a way to search this space given just greater/equal/less comparisons.

Dud answered 26/3, 2011 at 6:19 Comment(7)
Um, you do realize that without some limits this is impossible, since there are infinitely many rational numbers in any range? You can't really search for an unlimited integer either; suppose the number had been some random number with 10^1000 digits? "100" - "higher". "1000" - "higher". "One million" - "higher". "One trillion?" "Higher." "A googol?" "Higher!"Hydrogeology
@Tom - Given any number, (such as 10^1000), the algorithm will find it in a finite amount of time (even if it's a very long time). This is different than saying the algorithm can guess any number within t steps (for some fixed value of t), but no one has made this claim.Vereeniging
@Tom Zych - if you pick any finite integer, eventually I can find it by repeatedly doubling. It might take a ridiculous amount of time, but I can still do it in time proportional to the logarithm of your number. In this, I'm assuming that the person answering the questions is honestly representing the number and not just dodging by answering in a way that never terminates.Dud
Interesting algorithm. All rationals with denominator N are located before (or in) the level N of the tree, so O(q) is clearly possibleSerrulation
Er, quite right. I shouldn't post so late at night...Hydrogeology
@Everyone: I'd just like to say, this has been an interesting question with some neat responses and discussions. My inner math nerd is happy.Vereeniging
I'd like to reiterate Seth's comments and thank everyone for putting in the time and effort to crack this problem. It's a great feeling knowing that this sort of talent exists here, and it's wonderful knowing that I'm not the only one who likes these sorts of problems.Dud

Okay, here's my answer using continued fractions alone.

First let's get some terminology here.

Let X = p/q be the unknown fraction.

Let Q(X,p/q) = sign(X - p/q) be the query function: if it is 0, we've guessed the number, and if it's +/- 1 that tells us the sign of our error.

The conventional notation for continued fractions is A = [a0; a1, a2, a3, ... ak]

= a0 + 1/(a1 + 1/(a2 + 1/(a3 + 1/( ... + 1/ak) ... )))
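For reference, this bracket notation can be evaluated from the inside out with exact arithmetic. A minimal sketch (the function name is mine, not from the answer):

```python
from fractions import Fraction

def cf_value(terms):
    """Evaluate the continued fraction [a0; a1, ..., ak] exactly,
    working from the innermost term outward."""
    x = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        x = a + 1 / x
    return x
```

For example, `cf_value([0, 7, 16])` evaluates 0 + 1/(7 + 1/16) = 16/113.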


We'll follow the following algorithm for 0 < p/q < 1.

  1. Initialize Y = 0 = [ 0 ], Z = 1 = [ 1 ], k = 0.

  2. Outer loop: The preconditions are that:

    • Y and Z are continued fractions of k+1 terms which are identical except in the last element, where they differ by 1, so that Y = [y0; y1, y2, y3, ... yk] and Z = [y0; y1, y2, y3, ... yk + 1]

    • (-1)^k (Y-X) < 0 < (-1)^k (Z-X), or in simpler terms: for k even, Y < X < Z, and for k odd, Z < X < Y.

  3. Extend the degree of the continued fraction by one step without changing the values of the numbers. In general, if Y and Z end in y_k and y_k + 1 respectively, we rewrite them as Y = [... y_k, y_(k+1) = ∞] and Z = [... y_k, z_(k+1) = 1], which represent the same values with one more term. Now increase k by 1.

  4. Inner loops: This is essentially the same as @templatetypedef's interview question about the integers. We do a two-phase binary search to get closer:

  5. Inner loop 1: yk = ∞, zk = a, and X is between Y and Z.

  6. Double Z's last term: Compute M = Z but with mk = 2*a = 2*zk.

  7. Query the unknown number: q = Q(X,M).

  8. If q = 0, we have our answer and go to step 17.

  9. If q and Q(X,Y) have opposite signs, it means X is between Y and M, so set Z = M and go to step 5.

  10. Otherwise set Y = M and go to the next step:

  11. Inner loop 2. yk = b, zk = a, and X is between Y and Z.

  12. If a and b differ by 1, swap Y and Z, go to step 2.

  13. Perform a binary search: compute M where mk = floor((a+b)/2), and query q = Q(X,M).

  14. If q = 0, we're done and go to step 17.

  15. If q and Q(X,Y) have opposite signs, it means X is between Y and M, so set Z = M and go to step 11.

  16. Otherwise, q and Q(X,Z) have opposite signs, it means X is between Z and M, so set Y = M and go to step 11.

  17. Done: X = M.

A concrete example for X = 16/113 = 0.14159292

Y = 0 = [0], Z = 1 = [1], k = 0

k = 1:
Y = 0 = [0; ∞] < X, Z = 1 = [0; 1] > X, M = [0; 2] = 1/2 > X.
Y = 0 = [0; ∞], Z = 1/2 = [0; 2], M = [0; 4] = 1/4 > X.
Y = 0 = [0; ∞], Z = 1/4 = [0; 4], M = [0; 8] = 1/8 < X.
Y = 1/8 = [0; 8], Z = 1/4 = [0; 4], M = [0; 6] = 1/6 > X.
Y = 1/8 = [0; 8], Z = 1/6 = [0; 6], M = [0; 7] = 1/7 > X.
Y = 1/8 = [0; 8], Z = 1/7 = [0; 7]
  --> the two last terms differ by one, so swap and repeat outer loop.

k = 2:
Y = 1/7 = [0; 7, ∞] > X, Z = 1/8 = [0; 7, 1] < X,
    M = [0; 7, 2] = 2/15 < X
Y = 1/7 = [0; 7, ∞], Z = 2/15 = [0; 7, 2],
    M = [0; 7, 4] = 4/29 < X
Y = 1/7 = [0; 7, ∞], Z = 4/29 = [0; 7, 4],
    M = [0; 7, 8] = 8/57 < X
Y = 1/7 = [0; 7, ∞], Z = 8/57 = [0; 7, 8],
    M = [0; 7, 16] = 16/113 = X
    --> done!

At each step of computing M, the range of the interval reduces. It is probably fairly easy to prove (though I won't do this) that the interval reduces by a factor of at least 1/sqrt(5) at each step, which would show that this algorithm is O(log q) steps.

Note that this can be combined with templatetypedef's original interview question and applied to any rational number p/q, not just one between 0 and 1: first compute Q(X,0); then, for positive or negative X, bound it between two consecutive integers; then use the above algorithm for the fractional part.

When I have a chance next, I will post a python program that implements this algorithm.

edit: also, note that you don't have to evaluate the continued fraction from scratch at each step (which would be O(k)); there are partial approximants to continued fractions that can compute the next step from the previous step in O(1).

edit 2: Recursive definition of partial approximants:

If A_k = [a0; a1, a2, a3, ... ak] = p_k/q_k, then p_k = a_k*p_(k-1) + p_(k-2), and q_k = a_k*q_(k-1) + q_(k-2). (Source: Niven & Zuckerman, 4th ed, Theorems 7.3-7.5. See also Wikipedia.)

Example: [0] = 0/1 = p_0/q_0 and [0; 7] = 1/7 = p_1/q_1; so [0; 7, 16] = (16*1 + 0)/(16*7 + 1) = 16/113 = p_2/q_2.
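The recurrence translates directly into code. A small sketch (the generator name is mine, not part of the original answer):

```python
def convergents(terms):
    """Yield the successive convergents (p_k, q_k) of [a0; a1, ..., ak],
    using p_k = a_k*p_(k-1) + p_(k-2) and q_k = a_k*q_(k-1) + q_(k-2)."""
    p_prev, q_prev = 1, 0    # p_(-1), q_(-1)
    p_prev2, q_prev2 = 0, 1  # p_(-2), q_(-2)
    for a in terms:
        p, q = a * p_prev + p_prev2, a * q_prev + q_prev2
        yield p, q
        p_prev2, q_prev2 = p_prev, q_prev
        p_prev, q_prev = p, q
```

With the standard seed values above, `convergents([0, 7, 16])` yields (0, 1), (1, 7), (16, 113), matching the worked example.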

This means that if two continued fractions Y and Z have the same terms except the last one, and the continued fraction excluding the last term is p_(k-1)/q_(k-1), then we can write Y = (y_k*p_(k-1) + p_(k-2)) / (y_k*q_(k-1) + q_(k-2)) and Z = (z_k*p_(k-1) + p_(k-2)) / (z_k*q_(k-1) + q_(k-2)). It should be possible to show from this that |Y-Z| decreases by at least a factor of 1/sqrt(5) at each smaller interval produced by this algorithm, but the algebra seems to be beyond me at the moment. :-(

Here's my Python program:

import math

# Return a comparison function Q(p,q)
#   = sign(p/q - p0/q0) = sign(q0*p - p0*q)*sign(q0*q)
# If p/q < p0/q0, then Q() = -1; if p/q > p0/q0, then Q() = 1; otherwise Q() = 0.
def makeQ(p0,q0):
  def Q(p,q):
    return cmp(q0*p,p0*q)*cmp(q0*q,0)
  return Q

def strsign(s):
  return '<' if s<0 else '>' if s>0 else '=='

def cfnext(p1,q1,p2,q2,a):
  return [a*p1+p2,a*q1+q2]

def ratguess(Q, doprint, kmax):
# p2/q2 = p[k-2]/q[k-2]
  p2 = 1
  q2 = 0
# p1/q1 = p[k-1]/q[k-1]
  p1 = 0
  q1 = 1
  k = 0
  cf = [0]
  done = False
  while not done and (not kmax or k < kmax):
    if doprint:
      print 'p/q='+str(cf)+'='+str(p1)+'/'+str(q1)
# extend continued fraction
    k = k + 1
    [py,qy] = [p1,q1]
    [pz,qz] = cfnext(p1,q1,p2,q2,1)
    ay = None
    az = 1
    sy = Q(py,qy)
    sz = Q(pz,qz)
    while not done:
      if doprint:
        out = str(py)+'/'+str(qy)+' '+strsign(sy)+' X '
        out += strsign(-sz)+' '+str(pz)+'/'+str(qz)
        out += ', interval='+str(abs(1.0*py/qy-1.0*pz/qz))
      if ay:
        if (ay - az == 1):
          [p0,q0,a0] = [pz,qz,az]
          break
        am = (ay+az)/2
      else:
        am = az * 2
      [pm,qm] = cfnext(p1,q1,p2,q2,am)
      sm = Q(pm,qm)
      if doprint:
        out = str(ay)+':'+str(am)+':'+str(az) + '   ' + out + ';  M='+str(pm)+'/'+str(qm)+' '+strsign(sm)+' X '
        print out
      if (sm == 0):
        [p0,q0,a0] = [pm,qm,am]
        done = True
        break
      elif (sm == sy):
        [py,qy,ay,sy] = [pm,qm,am,sm]
      else:
        [pz,qz,az,sz] = [pm,qm,am,sm]     

    [p2,q2] = [p1,q1]
    [p1,q1] = [p0,q0]    
    cf += [a0]

  print 'p/q='+str(cf)+'='+str(p1)+'/'+str(q1)
  return [p1,q1]

and a sample output for ratguess(makeQ(33102,113017), True, 20):

p/q=[0]=0/1
None:2:1   0/1 < X < 1/1, interval=1.0;  M=1/2 > X 
None:4:2   0/1 < X < 1/2, interval=0.5;  M=1/4 < X 
4:3:2   1/4 < X < 1/2, interval=0.25;  M=1/3 > X 
p/q=[0, 3]=1/3
None:2:1   1/3 > X > 1/4, interval=0.0833333333333;  M=2/7 < X 
None:4:2   1/3 > X > 2/7, interval=0.047619047619;  M=4/13 > X 
4:3:2   4/13 > X > 2/7, interval=0.021978021978;  M=3/10 > X 
p/q=[0, 3, 2]=2/7
None:2:1   2/7 < X < 3/10, interval=0.0142857142857;  M=5/17 > X 
None:4:2   2/7 < X < 5/17, interval=0.00840336134454;  M=9/31 < X 
4:3:2   9/31 < X < 5/17, interval=0.00379506641366;  M=7/24 < X 
p/q=[0, 3, 2, 2]=5/17
None:2:1   5/17 > X > 7/24, interval=0.00245098039216;  M=12/41 < X 
None:4:2   5/17 > X > 12/41, interval=0.00143472022956;  M=22/75 > X 
4:3:2   22/75 > X > 12/41, interval=0.000650406504065;  M=17/58 > X 
p/q=[0, 3, 2, 2, 2]=12/41
None:2:1   12/41 < X < 17/58, interval=0.000420521446594;  M=29/99 > X 
None:4:2   12/41 < X < 29/99, interval=0.000246366100025;  M=53/181 < X 
4:3:2   53/181 < X < 29/99, interval=0.000111613371282;  M=41/140 < X 
p/q=[0, 3, 2, 2, 2, 2]=29/99
None:2:1   29/99 > X > 41/140, interval=7.21500721501e-05;  M=70/239 < X 
None:4:2   29/99 > X > 70/239, interval=4.226364059e-05;  M=128/437 > X 
4:3:2   128/437 > X > 70/239, interval=1.91492009996e-05;  M=99/338 > X 
p/q=[0, 3, 2, 2, 2, 2, 2]=70/239
None:2:1   70/239 < X < 99/338, interval=1.23789953207e-05;  M=169/577 > X 
None:4:2   70/239 < X < 169/577, interval=7.2514738621e-06;  M=309/1055 < X 
4:3:2   309/1055 < X < 169/577, interval=3.28550190148e-06;  M=239/816 < X 
p/q=[0, 3, 2, 2, 2, 2, 2, 2]=169/577
None:2:1   169/577 > X > 239/816, interval=2.12389981991e-06;  M=408/1393 < X 
None:4:2   169/577 > X > 408/1393, interval=1.24415093544e-06;  M=746/2547 < X 
None:8:4   169/577 > X > 746/2547, interval=6.80448470014e-07;  M=1422/4855 < X 
None:16:8   169/577 > X > 1422/4855, interval=3.56972657711e-07;  M=2774/9471 > X 
16:12:8   2774/9471 > X > 1422/4855, interval=1.73982239227e-07;  M=2098/7163 > X 
12:10:8   2098/7163 > X > 1422/4855, interval=1.15020646951e-07;  M=1760/6009 > X 
10:9:8   1760/6009 > X > 1422/4855, interval=6.85549088053e-08;  M=1591/5432 < X 
p/q=[0, 3, 2, 2, 2, 2, 2, 2, 9]=1591/5432
None:2:1   1591/5432 < X < 1760/6009, interval=3.06364213998e-08;  M=3351/11441 < X 
p/q=[0, 3, 2, 2, 2, 2, 2, 2, 9, 1]=1760/6009
None:2:1   1760/6009 > X > 3351/11441, interval=1.45456726663e-08;  M=5111/17450 < X 
None:4:2   1760/6009 > X > 5111/17450, interval=9.53679318849e-09;  M=8631/29468 < X 
None:8:4   1760/6009 > X > 8631/29468, interval=5.6473816179e-09;  M=15671/53504 < X 
None:16:8   1760/6009 > X > 15671/53504, interval=3.11036635336e-09;  M=29751/101576 > X 
16:12:8   29751/101576 > X > 15671/53504, interval=1.47201634215e-09;  M=22711/77540 > X 
12:10:8   22711/77540 > X > 15671/53504, interval=9.64157420569e-10;  M=19191/65522 > X 
10:9:8   19191/65522 > X > 15671/53504, interval=5.70501257346e-10;  M=17431/59513 > X 
p/q=[0, 3, 2, 2, 2, 2, 2, 2, 9, 1, 8]=15671/53504
None:2:1   15671/53504 < X < 17431/59513, interval=3.14052228667e-10;  M=33102/113017 == X

Since Python handles biginteger math from the start, and this program uses only integer math (except for the interval calculations), it should work for arbitrary rationals.


edit 3: Outline of proof that this is O(log q), not O(log^2 q):

First note that until the rational number is found, the # of steps n_k for each new continued fraction term is exactly 2*b(a_k) - 1, where b(a_k) = floor(log2(a_k)) + 1 is the # of bits needed to represent a_k: it takes b(a_k) steps to widen the "net" of the binary search, and b(a_k) - 1 steps to narrow it. See the example above; you'll note that the # of steps per term is always of the form 2b - 1: 1, 3, 5, 7, etc.

Now we can use the recurrence relation qk = akqk-1 + qk-2 and induction to prove the desired result.

Let's state it in this way: that the value of q after the N_k = sum(n_k) steps required for reaching the kth term has a minimum: q >= A*2^(c*N) for some fixed constants A, c. (So to invert, we'd get that the # of steps N is <= (1/c) * log2(q/A) = O(log q).)

Base cases:

  • k=0: q = 1, N = 0, so q >= 2^N
  • k=1: for N = 2b-1 steps, q = a1 >= 2^(b-1) = 2^((N-1)/2) = 2^(N/2)/sqrt(2).

This implies that A = 1, c = 1/2 could provide the desired bounds. In reality, q may not double with each term (counterexample: [0; 1, 1, 1, 1, 1] has a growth factor of phi = (1+sqrt(5))/2), so let's use c = 1/4.

Induction:

  • for term k, q_k = a_k*q_(k-1) + q_(k-2). Again, for the n_k = 2b-1 steps needed for this term, a_k >= 2^(b-1) = 2^((n_k-1)/2).

    So a_k*q_(k-1) >= 2^((n_k-1)/2) * q_(k-1) >= 2^((n_k-1)/2) * A*2^(N_(k-1)/4) = A*2^(N_k/4)/sqrt(2)*2^(n_k/4).

Argh -- the tough part here is that if ak = 1, q may not increase much for that one term, and we need to use qk-2 but that may be much smaller than qk-1.

Olnee answered 26/3, 2011 at 22:56 Comment(12)
So this looks really great, but I don't think it's O(lg q). Any individual iteration of the inner loop runs in O(lg q) steps as you use the modified binary search to recover the next number of the continued fraction, but remember that there are O(lg q) iterations of that loop since there are (in the worst case) O(lg q) numbers in the continued fraction. That makes me think that this is O(lg^2 q) time instead. However, this is still an excellent solution to the problem, and whether it's O(lg q) or O(lg^2 q) time it's still exponentially better than what I had before.Dud
I know it looks like O(lg^2 q) because of the two loops, but that's probably conservative. I'll try to prove it.Olnee
+1: Didn't check the details, but I now believe the CF approach will work.Beitris
You need to try to show that |Y-Z| decreases geometrically. As templatetypedef mentions in his answer, there are O(log q) terms in the CF, each of size at most q. So if you take O(log q) steps to increase the 'degree' of your CF by 1, you take O(log^2 q) steps total.Beitris
No, you can't evaluate it as O(log^2 q) because of the two loops; that's overconservative. If you take O(log q) steps to increase the # of terms of the continued fraction, then the term of that fraction is very large, and the interval will be very small. Each iteration of the inner loop decreases the interval as well, not just the increase in the continued fraction length.Olnee
@Jason: I am not saying it is Omega(log^2 q) :-) I suppose it will be log (a_1) + log(a_2) + .. + log (a_k) where the number to be guessed is [a_1, a_2, ..., a_k], if you want to be more precise.Beitris
@jason-s: It looks like you do about the same as I do on guesses, and occasionally a little better. I'm declaring your method the winner. A rigorous analysis demonstrating that you're O(log(q)) seems tricky to me, but I believe that it is so.Paedogenesis
@Jason: If we go by O(log (a_1) + .. + log (a_k)), then it seems likely to be O(log q) in the average case. I believe this follows from the remarkable result of Khinchin that the geometric mean: (a1 a2 ... an)^(1/n) of the CF [a0, a1, a2 ... an,...] tends to a constant (known as Khinchin's constant).Beitris
What makes it remarkable is that the constant is independent of the CF!Beitris
Of course, since that need not be valid for a countable set (or even an uncountable set of measure 0), the above statement is not a proof.Beitris
I think I can prove it (not sure if/when I'll have time to do so :/ ) -- the approach would be to look at the denominator q using the recurrence relation q_k = a_k*q_(k-1) + q_(k-2). Unless the algorithm terminates because we reach equality, the # of steps for each new continued fraction term is exactly 2*b(a_k) - 1, where b(a_k) = floor(log2(a_k)) + 1 is the # of bits needed to represent a_k. (It's b(a_k) steps to widen the "net" of the binary search, and b(a_k) - 1 steps to narrow it.)Olnee
Yeah, here is a proof: if we do the Euclidean division of u0/u1, we get u_k = u_(k+1)*a_k + u_(k+2) >= u_(k+1)*a_k. Taking logs and adding up gives log u_0 - log u_M >= sum log(a_j), where M is the number of steps. And thus sum log(a_k) = O(log q). The a_k give us the CF of u0/u1.Beitris

Let's take the rational numbers, in reduced form, and write them out in order first of denominator, then numerator.

1/2, 1/3, 2/3, 1/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, ...

Our first guess is going to be 1/2. Then we'll go along the list until we have 3 in our range. Then we will take 2 guesses to search that list. Then we'll go along the list until we have 7 in our remaining range. Then we will take 3 guesses to search that list. And so on.

In n steps we'll cover the first 2^O(n) possibilities, which is in the order of magnitude of efficiency that you were looking for.

Update: People didn't get the reasoning behind this. The reasoning is simple. We know how to walk a binary tree efficiently. There are O(n^2) fractions with maximum denominator n. We could therefore search up to any particular denominator size in O(2*log(n)) = O(log(n)) steps. The problem is that we have an infinite number of possible rationals to search. So we can't just line them all up, order them, and start searching.

Therefore my idea was to line up a few, search, line up more, search, and so on. Each time we line up more we line up about double what we did last time. So we need one more guess than we did last time. Therefore our first pass uses 1 guess to traverse 1 possible rational. Our second uses 2 guesses to traverse 3 possible rationals. Our third uses 3 guesses to traverse 7 possible rationals. And our k'th uses k guesses to traverse 2^k - 1 possible rationals. For any particular rational m/n, eventually it will wind up putting that rational on a fairly big list that it knows how to do a binary search on efficiently.

If we did binary searches, then ignored everything we'd learned when we grab more rationals, then we'd put all of the rationals up to and including m/n in O(log(n)) passes. (That's because by that point we'll get to a pass with enough rationals to include every rational up to and including m/n.) But each pass takes more guesses, so that would be O(log(n)^2) guesses.

However we actually do a lot better than that. With our first guess, we eliminate half the rationals on our list as being too big or small. Our next two guesses don't quite cut the space into quarters, but they don't come too far from it. Our next 3 guesses again don't quite cut the space into eighths, but they don't come too far from it. And so on. When you put it together, I'm convinced that the result is that you find m/n in O(log(n)) steps. Though I don't actually have a proof.

Try it out: Here is code to generate the guesses so that you can play and see how efficient it is.

#! /usr/bin/python

from fractions import Fraction
import heapq
import readline
import sys

def generate_next_guesses (low, high, limit):
    upcoming = [(low.denominator + high.denominator,
                 low.numerator + high.numerator,
                 low.denominator, low.numerator,
                 high.denominator, high.numerator)]
    guesses = []
    while len(guesses) < limit:
        (mid_d, mid_n, low_d, low_n, high_d, high_n) = upcoming[0]
        guesses.append(Fraction(mid_n, mid_d))
        heapq.heappushpop(upcoming, (low_d + mid_d, low_n + mid_n,
                                     low_d, low_n, mid_d, mid_n))
        heapq.heappush(upcoming, (mid_d + high_d, mid_n + high_n,
                                  mid_d, mid_n, high_d, high_n))
    guesses.sort()
    return guesses

def ask (num):
    while True:
        print "Next guess: {0} ({1})".format(num, float(num))
        if 1 < len(sys.argv):
            wanted = Fraction(sys.argv[1])
            if wanted < num:
                print "too high"
                return 1
            elif num < wanted:
                print "too low"
                return -1
            else:
                print "correct"
                return 0

        answer = raw_input("Is this (h)igh, (l)ow, or (c)orrect? ")
        if answer == "h":
            return 1
        elif answer == "l":
            return -1
        elif answer == "c":
            return 0
        else:
            print "Not understood.  Please say one of (l, c, h)"

guess_size_bound = 2
low = Fraction(0)
high = Fraction(1)
guesses = [Fraction(1,2)]
required_guesses = 0
answer = -1
while 0 != answer:
    if 0 == len(guesses):
        guess_size_bound *= 2
        guesses = generate_next_guesses(low, high, guess_size_bound - 1)
    #print (low, high, guesses)
    guess = guesses[len(guesses)/2]
    answer = ask(guess)
    required_guesses += 1
    if 0 == answer:
        print "Thanks for playing!"
        print "I needed %d guesses" % required_guesses
    elif 1 == answer:
        high = guess
        guesses[len(guesses)/2:] = []
    else:
        low = guess
        guesses[0:len(guesses)/2 + 1] = []

As an example to try it out I tried 101/1024 (0.0986328125) and found that it took 20 guesses to find the answer. I tried 0.98765 and it took 45 guesses. I tried 0.0123456789 and it needed 66 guesses and about a second to generate them. (Note, if you call the program with a rational number as an argument, it will fill in all of the guesses for you. This is a very helpful convenience.)

Paedogenesis answered 26/3, 2011 at 7:40 Comment(18)
I'm not sure I understand what you're saying. Can you clarify this?Dud
@templatetypedef: What is unclear? Our first guess is always 1/2. Suppose that the answer comes back lower. The next 3 numbers on the list fitting the condition are 1/3, 1/4 and 1/5. So we guess 1/4 next, then either 1/3 or 1/5 on the following guess. If we continue, we grab 7 numbers in our range and set up the next 3 guesses. After that we'll grab 15 and set up the next 4 guesses. etc. What is unclear about that? I'm going to bed now. If you still do not understand in the morning, I'll write a program to do the guessing and you can see how it works.Paedogenesis
@btilly: Where do the 7-3, 15-4, (31-5?) come in? What's the logic behind having such a queue of "guess-next" numbers?Chil
Ah, okay. I see what you're saying. However, how do you efficiently generate the fractions to test? Scanning over the enumerated sequence is no better than what we had before. Also, why pick 3, 7, etc. instead of another sequence? Is it just for the convenience of the binary search?Dud
@templatetypedef: You can efficiently generate the fractions to test by walking the Stern Brocot Tree. Using a heap to generate them in order of denominator size until you have enough. Though it is not that efficient - the number to generate grows exponentially with the number of guesses you have.Paedogenesis
Some back-of-the-envelope analysis: We eliminate O(2^t) possibilities on step t, using t guesses. After step t, we will have eliminated O(2^t) possibilities in total, and made 1 + 2 +... + t = O(t^2) guesses. The target p/q has index O(q^2) on the list. We will get there on step O(log q^2) = O(log q), having made O(log^2 q) guesses. So same asymptotic performance as my proposal, but this algorithm is much more efficient and elegant.Vereeniging
@Seth: I believe my asymptotic performance is more exactly 2*log(n)/log(2) + O(sqrt(log(n))).Paedogenesis
@Btilly: +1, but it looks like you haven't really solved the main issue. You generate Theta(q) rationals and do a binary search over those. So the runtime is Omega(q), even though you take O(log^2 q) queries. In fact, Seth has a very similar algorithm (and if you read carefully, he isn't querying for p+q). IMO, the main problem to be solved here is generation of O(polylog(q)) rationals, rather than trying to keep the number of queries O(polylog(q)) irrespective of the other book-keeping overhead.Beitris
@billy: My analysis was less than rigorous, and it's entirely possible I made a mistake. Could you share the analysis that leads to your O(log n) claim and/or let me know where I erred? (I take it n = p + q)?Vereeniging
@Seth: It is btilly, not billy :-) q or p+q, does not matter, as p < q.Beitris
@Moron: I interpreted the word "efficient" to mean "very few queries gets you the answer". It seems to me that you're asking for the answer to a different question. I'll need to think about how to do that. Incidentally the run-time is better than you think, I need O(sqrt(q) * junk(q)) operations where the junk grows slower than any polynomial. (Of course the size of the integers will also grow, slowing you down.)Paedogenesis
@Seth: log(q^2) is 2 log(q), not log^2 (q). I do not have a rigorous analysis. I only believe the more precise estimate I gave you.Paedogenesis
@Moron: Now that I look at what @Vereeniging wrote I see that you are right. However mine runs much faster because he spends time considering a lot of values of p+q that don't actually have any guesses. The 0.0123456789 example that I posted would have him looking at over 10 billion values of p+q, which my algorithm did not.Paedogenesis
@Btilly: log 1 + log 2 + ... + log q is Theta(log^2 q). You increase the number of searches you do each time by 1. So it is log 1 + log 2 + .. + log q, isn't it? Seth does a binary search of p+q, so his is also Theta(log^2 q) queries.Beitris
@Moron: You're analyzing the result of doing q passes. But each pass is starting with a data set that is already pre-filtered by your previous answers, so you need a lot fewer than q passes to get your answer. In fact you need O(sqrt(q)) passes. Try saving my code and running it with some sample values and see how much work it is doing and how many steps it takes. This is not just me theorizing. One example I ran matched a fraction with 10 billion in the denominator after 62 queries - during the 11th pass.Paedogenesis
@Moron: Now that I read more carefully, I was misanalyzing @Seth's method.Paedogenesis
If the number is 1/100, won't you guess 1/2, 1/3, 1/4, ...?Stuyvesant
@ThomasAhle Copy the program, and run it. You'll find that it takes 21 guesses to find 1/100.Paedogenesis

I've got it! What you need to do is to use a parallel search with bisection and continued fractions.

Bisection will give you a limit toward a specific real number, as represented as a power of two, and continued fractions will take the real number and find the nearest rational number.

How you run them in parallel is as follows.

At each step, you have l and u being the lower and upper bounds of bisection. The idea is, you have a choice between halving the range of bisection, and adding an additional term as a continued fraction representation. When both l and u have the same next term as a continued fraction, then you take the next step in the continued fraction search, and make a query using the continued fraction. Otherwise, you halve the range using bisection.
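Comparing the next continued-fraction term of l and u requires their CF expansions. A minimal sketch of expanding an exact rational by repeated floor-and-reciprocate (the Euclidean algorithm); the helper name is mine, not the answer's:

```python
from fractions import Fraction

def cf_terms(x):
    """Continued fraction [a0; a1, ...] of a non-negative rational x,
    computed by repeatedly taking the floor and reciprocating."""
    terms = []
    while True:
        a = x.numerator // x.denominator   # next term: floor(x)
        terms.append(a)
        frac = x - a                       # fractional part
        if frac == 0:
            return terms
        x = 1 / frac                       # recurse on the reciprocal
```

For instance, 1/31 expands to [0; 31] and 16/113 to [0; 7, 16], matching the examples in this answer and the other one.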

Since both methods increase the denominator by at least a constant factor (bisection goes by factors of 2, continued fractions go by at least a factor of phi = (1+sqrt(5))/2), this means your search should be O(log(q)). (There may be repeated continued fraction calculations, so it may end up as O(log(q)^2).)

Our continued fraction search needs to round to the nearest integer, not use floor (this is clearer below).

The above is kind of handwavy. Let's use a concrete example of r = 1/31:

  1. l = 0, u = 1, query = 1/2. 0 is not expressible as a continued fraction, so we use binary search until l != 0.

  2. l = 0, u = 1/2, query = 1/4.

  3. l = 0, u = 1/4, query = 1/8.

  4. l = 0, u = 1/8, query = 1/16.

  5. l = 0, u = 1/16, query = 1/32.

  6. l = 1/32, u = 1/16. Now 1/l = 32, 1/u = 16; these have different cfrac reps, so keep bisecting. query = 3/64.

  7. l = 1/32, u = 3/64, query = 5/128 = 1/25.6

  8. l = 1/32, u = 5/128, query = 9/256 = 1/28.4444....

  9. l = 1/32, u = 9/256, query = 17/512 = 1/30.1176... (round to 1/30)

  10. l = 1/32, u = 17/512, query = 33/1024 = 1/31.0303... (round to 1/31)

  11. l = 33/1024, u = 17/512, query = 67/2048 = 1/30.5672... (round to 1/31)

  12. l = 33/1024, u = 67/2048. At this point both l and u have the same continued fraction term 31, so now we use a continued fraction guess. query = 1/31.

SUCCESS!

For another example let's use 16/113 (= 355/113 - 3 where 355/113 is pretty close to pi).

[to be continued, I have to go somewhere]


On further reflection, continued fractions are the way to go, never mind bisection except to determine the next term. More when I get back.

Olnee answered 26/3, 2011 at 14:11 Comment(8)
You're definitely on to something here. I think the approach might just be to use the general "I'm thinking of an integer" algorithm to compute one term of the continued fraction at a time. I'm not an expert in continued fractions, but from what I gather there's only logarithmically many terms in the representation, and if this procedure works it would be a way to generate each of those terms one at a time using logarithmic time for each of them. I'll think this over.Dud
Yea, I completely agree - CFs are the simplest and probably most effective answer, just using an integer search for each term. I was going to put that as my own answer but @Jason beat me to it.Scrimmage
There is no well defined nearest rational number to a given real (apart from itself). What this approach is doing is not very clear, perhaps it needs more elaboration.Beitris
@Moron: Read up on continued fraction approximations. (e.g. Theory of Numbers, Niven & Zuckerman) They form the nearest rational numbers for a constrained denominator, namely that if p/q is the continued fraction approximation of a real number r, then |r - (p/q)| <= C/(q^2) where I forget what C is, I think it's either 1/5 or 1/sqrt(5).Olnee
For instance, l and u have the same CF upto some point does not necessarily imply that the number you are guessing also has the same convergent... (if I understood your approach correctly).Beitris
I think it does (the point being that you keep the unknown fraction X between l and u)... but, never mind this approach. It's efficient but not as efficient or simple as a pure continued fraction approach -- see my other answer I finished about 15 minutes ago.Olnee
+1, You might be right, but we need a proof. I checked Niven's book, and it does not have it. For rationals at least, a proof by induction might work.Beitris
Ok: You are right! Here is a proof. If x = [a0, a1, a2, ...] and y = [b0, b1, b2, ...]. Let k be the smallest index where ak != bk. Then x < y iff (-1)^k(ak - bk) < 0. So if l = [a0, a1, a2, ..., am, ..], u = [a0, a1, ..., am, b1, b2, ...] and l < x < u. Let x = [b0, b1, ...]. If bi != ai for some 0 <= i <= m, then we get that either x <u AND x < l or x > u AND x > l. We can never have l < x < u. Thus the CF of x has to have the same first m numbers.Beitris
V
3

I think I found an O(log^2(p + q)) algorithm.

To avoid confusion in the next paragraph, a "query" refers to when the guesser gives the challenger a guess, and the challenger responds "bigger" or "smaller". This allows me to reserve the word "guess" for something else, a guess for p + q that is not asked directly to the challenger.

The idea is to first find p + q, using the algorithm you describe in your question: guess a value k, if k is too small, double it and try again. Then once you have an upper and lower bound, do a standard binary search. This takes O(log(p+q)T) queries, where T is an upper bound for the number of queries it takes to check a guess. Let's find T.

We want to check all fractions r/s with r + s <= k, and double k until k is sufficiently large. Note that there are O(k^2) fractions you need to check for a given value of k. Build a balanced binary search tree containing all these values, then search it to determine if p/q is in the tree. It takes O(log k^2) = O(log k) queries to confirm that p/q is not in the tree.

We will never guess a value of k greater than 2(p + q). Hence we can take T = O(log(p+q)).

When we guess the correct value for k (i.e., k = p + q), we will submit the query p/q to the challenger in the course of checking our guess for k, and win the game.

Total number of queries is then O(log^2(p + q)).
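The procedure above can be sketched directly. The names `guess` and `compare` are my own invention, and the sorted list of candidates stands in for the balanced binary search tree; this is a sketch under the assumption that the secret lies in (0, 1]:

```python
from fractions import Fraction

def guess(compare):
    """Recover a secret fraction p/q in (0, 1].

    compare(g) returns -1 if the secret is greater than g,
    +1 if it is smaller, and 0 on an exact hit.
    """
    k = 2
    while True:
        # All fractions r/s in (0, 1] with r + s <= k, in sorted order
        # (this plays the role of the balanced search tree).
        candidates = sorted({Fraction(r, s)
                             for s in range(1, k)
                             for r in range(1, min(s, k - s) + 1)})
        lo, hi = 0, len(candidates) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            c = compare(candidates[mid])
            if c == 0:
                return candidates[mid]  # found the secret in this tree
            elif c < 0:
                lo = mid + 1            # secret is larger
            else:
                hi = mid - 1            # secret is smaller
        k *= 2                          # p + q must exceed k; double and retry
```

Note that constructing the candidate list costs on the order of k² log k arithmetic operations, as the first comment points out; it is only the number of queries to the challenger that is O(log²(p + q)).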

Vereeniging answered 26/3, 2011 at 8:15 Comment(7)
Actually building the search tree will take K^2log K time. Maybe you should improve this step to really take O(log k) time. Also, once you have a candidate k, you should return "bigger/smaller" for it, and not just "exists/doesn't exist". How do you do it?Dayan
Please disregard the second part of my previous comment ;) If the outer loop performs the doubling, then the inner part has to check for match/no match only.Dayan
This is a nice algorithm for #guesses being O(log^2(p+q)), but not for computation time complexity O(log^2(p+q)). What type of complexity is requested by the OP?Dayan
I'm looking for something (ideally) with both properties. This is certainly a good start in terms of minimizing the number of queries, though ideally I'd like something that also minimizes the computation involved. Then again, this might be theoretically optimal!Dud
The way that I read the question was that you could only get answers back of the form "higher", "lower", and "correct". That would disallow the p+q questions.Paedogenesis
@billy: The algorithm doesn't directly ask p+q questions. For a given k, it checks (using a binary search) all fractions r/s with r + s <= k. If p + q <= k, it finds the answer; otherwise we know p + q > k, so we double k.Vereeniging
@Seth: My bad. You're right. However most values of p+q will have no guesses to make.Paedogenesis
D
3

Okay, I think I figured out an O(lg² q) algorithm for this problem that is based on Jason S's most excellent insight about using continued fractions. I thought I'd flesh the algorithm out all the way right here so that we have a complete solution, along with a runtime analysis.

The intuition behind the algorithm is that any rational number p/q within the range can be written as

a0 + 1 / (a1 + 1 / (a2 + 1 / (a3 + 1 / ...)))

For appropriate choices of ai. This is called a continued fraction. More importantly, though, these ai can be derived by running the Euclidean algorithm on the numerator and denominator. For example, suppose we want to represent 11/14 this way. We begin by noting that 14 goes into 11 zero times, so a crude approximation of 11/14 would be

0 = 0

Now, suppose that we take the reciprocal of this fraction to get 14/11 = 1 3/11. So if we write

0 + (1 / 1) = 1

We get a slightly better approximation to 11/14. Now that we're left with 3 / 11, we can take the reciprocal again to get 11/3 = 3 2/3, so we can consider

0 + (1 / (1 + 1/3)) = 3/4

Which is another good approximation to 11/14. Now, we have 2/3, so consider the reciprocal, which is 3/2 = 1 1/2. If we then write

0 + (1 / (1 + 1/(3 + 1/1))) = 4/5

We get another good approximation to 11/14. Finally, we're left with 1/2, whose reciprocal is 2/1. If we write out

0 + (1 / (1 + 1/(3 + 1/(1 + 1/2)))) = 1 / (1 + 1/(3 + 1/(3/2))) = 1 / (1 + 1/(3 + 2/3)) = 1 / (1 + 1/(11/3)) = 1 / (1 + 3/11) = 1 / (14/11) = 11/14

which is exactly the fraction we wanted. Moreover, look at the sequence of coefficients we ended up using. If you run the extended Euclidean algorithm on 11 and 14, you get that

11 = 0 x 14 + 11   -->   a0 = 0
14 = 1 x 11 + 3    -->   a1 = 1
11 = 3 x 3 + 2     -->   a2 = 3
 3 = 1 x 2 + 1     -->   a3 = 1
 2 = 2 x 1 + 0     -->   a4 = 2

It turns out (using more math than I currently know how to do!) that this isn't a coincidence and that the coefficients in the continued fraction of p/q are always formed by using the extended Euclidean algorithm. This is great, because it tells us two things:

  1. There can be at most O(lg (p + q)) coefficients, because the Euclidean algorithm always terminates in this many steps, and
  2. Each coefficient is at most max{p, q}.
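A quick sketch makes the Euclidean-algorithm connection concrete (`cf_terms` is a name I made up for this throwaway helper):

```python
def cf_terms(p, q):
    """Continued fraction coefficients of p/q, read off as the
    quotients produced by the Euclidean algorithm."""
    terms = []
    while q:
        terms.append(p // q)
        p, q = q, p % q
    return terms

print(cf_terms(11, 14))  # the worked example above: [0, 1, 3, 1, 2]
```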

Given these two facts, we can come up with an algorithm to recover any rational number p/q, not just those between 0 and 1, by applying the general algorithm for guessing arbitrary integers n one at a time to recover all of the coefficients in the continued fraction for p/q. For now, though, we'll just worry about numbers in the range (0, 1], since the logic for handling arbitrary rational numbers can be done easily given this as a subroutine.

As a first step, let's suppose that we want to find the best value of a1 so that 1 / a1 is as close as possible to p/q and a1 is an integer. To do this, we can just run our algorithm for guessing arbitrary integers, taking the reciprocal each time. After doing this, one of two things will have happened. First, we might by sheer coincidence discover that p/q = 1/k for some integer k, in which case we're done. If not, we'll find that p/q is sandwiched between 1/(a1 + 1) and 1/a1 for some a1. When we do this, then we start working on the continued fraction one level deeper by finding the a2 such that p/q is between 1/(a1 + 1/a2) and 1/(a1 + 1/(a2 + 1)). If we magically find p/q, that's great! Otherwise, we then go one level down further in the continued fraction. Eventually, we'll find the number this way, and it can't take too long. Each binary search to find a coefficient takes at most O(lg(p + q)) time, and there are at most O(lg(p + q)) levels to the search, so we need only O(lg²(p + q)) arithmetic operations and probes to recover p/q.

One detail I want to point out is that we need to keep track of whether we're on an odd level or an even level when doing the search because when we sandwich p/q between two continued fractions, we need to know whether the coefficient we were looking for was the upper or the lower fraction. I'll state without proof that for ai with i odd you want to use the upper of the two numbers, and with ai even you use the lower of the two numbers.
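For concreteness, here is one hedged way to realize the whole search: walk the Stern-Brocot tree, but find each coefficient (the length of a run of same-direction steps) by doubling followed by binary search, so a coefficient a costs O(lg a) queries. The alternating direction plays the role of the odd/even-level bookkeeping just mentioned. The names `guess` and `compare` are mine, and this is a sketch rather than the exact procedure described:

```python
from fractions import Fraction

def guess(compare):
    """Recover a secret positive rational.

    compare(g) returns -1 if the secret is greater than g,
    +1 if it is smaller, and 0 on an exact hit.
    """
    a, b, c, d = 0, 1, 1, 0  # secret lies strictly between a/b and c/d (1/0 = infinity)
    sign = -1                # -1: run of steps toward c/d; +1: toward a/b

    def mediant(k):
        # The k-step mediant in the current direction.
        if sign < 0:
            return Fraction(a + k * c, b + k * d)
        return Fraction(c + k * a, d + k * b)

    while True:
        # Doubling phase: grow k until the k-step mediant passes the secret.
        k = 1
        while True:
            g = mediant(k)
            r = compare(g)
            if r == 0:
                return g
            if r != sign:
                break
            k *= 2
        # Binary search for the exact run length (one CF coefficient).
        lo, hi = k // 2, k
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            g = mediant(mid)
            r = compare(g)
            if r == 0:
                return g
            if r == sign:
                lo = mid
            else:
                hi = mid
        # Take lo steps, then reverse direction for the next coefficient.
        if sign < 0:
            a, b = a + lo * c, b + lo * d
        else:
            c, d = c + lo * a, d + lo * b
        sign = -sign
```

Because every rational appears as a mediant on its Stern-Brocot path, the exact-hit case is guaranteed to fire during one of the run searches, so the loop terminates.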

I am almost 100% confident that this algorithm works. I'm going to try to write up a more formal proof of this in which I fill in all of the gaps in this reasoning, and when I do I'll post a link here.

Thanks to everyone for contributing the insights necessary to get this solution working, especially Jason S for suggesting a binary search over continued fractions.

Dud answered 26/3, 2011 at 22:51 Comment(4)
Just saw this, haven't had a chance to read over it carefully but you're probably right.Olnee
...although I think it's log(q), not log^2(q).Olnee
I believe this is right. For a proof see my comment to Jason's first answer.Beitris
Actually, I think we have a proof that it is O(log q). See my comment to Jason's second answer.Beitris
Q
2

Remember that any rational number in (0, 1) can be represented as a finite sum of distinct (positive or negative) unit fractions. For example, 2/3 = 1/2 + 1/6 and 2/5 = 1/2 - 1/10. You can use this to perform a straightforward binary search.
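The positive-term version of that fact can be demonstrated with the classic greedy (Fibonacci-Sylvester) expansion. `egyptian` is a name I made up, and note that it uses only positive unit fractions, so it produces 2/5 = 1/3 + 1/15 rather than the mixed-sign 1/2 - 1/10:

```python
from fractions import Fraction

def egyptian(x):
    """Greedy expansion of a rational x in (0, 1) into a finite sum
    of distinct positive unit fractions."""
    terms = []
    while x:
        n = -(-x.denominator // x.numerator)  # ceil(1/x)
        terms.append(Fraction(1, n))          # largest unit fraction <= x
        x -= Fraction(1, n)
    return terms

print(egyptian(Fraction(2, 3)))  # [Fraction(1, 2), Fraction(1, 6)]
```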

Quondam answered 26/3, 2011 at 12:45 Comment(2)
Could you elaborate on how the algorithm would use this fact?Vereeniging
Are you talking about Egyptian fractions?Paulson
P
2

Here is yet another way to do it. If there is sufficient interest, I will try to fill out the details tonight, but I can't right now because I have family responsibilities. Here is a stub of an implementation that should explain the algorithm:

low = 0
high = 1
bound = 2
answer = -1
while 0 != answer:
    mid = best_continued_fraction((low + high)/2, bound)
    while mid == low or mid == high:
        bound += bound
        mid = best_continued_fraction((low + high)/2, bound)
    answer = ask(mid)
    if -1 == answer:
        low = mid
    elif 1 == answer:
        high = mid
    else:
        print_success_message(mid)

And here is the explanation. What best_continued_fraction(x, bound) should do is find the last continued fraction approximation to x whose denominator is at most bound. This algorithm will take polylog steps to complete and finds very good (though not always the best) approximations. So for each bound we'll get something close to a binary search through all possible fractions of that size. Occasionally we won't find a particular fraction until we increase the bound farther than we should, but we won't be far off.

So there you have it. A logarithmic number of questions found with polylog work.

Update: And full working code.

#! /usr/bin/python

from fractions import Fraction
import readline
import sys

operations = [0]

def calculate_continued_fraction(terms):
    i = len(terms) - 1
    result = Fraction(terms[i])
    while 0 < i:
        i -= 1
        operations[0] += 1
        result = terms[i] + 1/result
    return result

def best_continued_fraction (x, bound):
    error = x - int(x)
    terms = [int(x)]
    last_estimate = estimate = Fraction(0)
    while 0 != error and estimate.numerator < bound:
        operations[0] += 1
        error = 1/error
        term = int(error)
        terms.append(term)
        error -= term
        last_estimate = estimate
        estimate = calculate_continued_fraction(terms)
    if estimate.numerator < bound:
        return estimate
    else:
        return last_estimate

def ask (num):
    while True:
        print "Next guess: {0} ({1})".format(num, float(num))
        if 1 < len(sys.argv):
            wanted = Fraction(sys.argv[1])
            if wanted < num:
                print "too high"
                return 1
            elif num < wanted:
                print "too low"
                return -1
            else:
                print "correct"
                return 0

        answer = raw_input("Is this (h)igh, (l)ow, or (c)orrect? ")
        if answer == "h":
            return 1
        elif answer == "l":
            return -1
        elif answer == "c":
            return 0
        else:
            print "Not understood.  Please say one of (l, c, h)"

low = Fraction(0)
high = Fraction(1)
bound = 2
answer = -1
guesses = 0
while 0 != answer:
    mid = best_continued_fraction((low + high)/2, bound)
    guesses += 1
    while mid == low or mid == high:
        bound += bound
        mid = best_continued_fraction((low + high)/2, bound)
    answer = ask(mid)
    if -1 == answer:
        low = mid
    elif 1 == answer:
        high = mid
    else:
        print "Thanks for playing!"
        print "I needed %d guesses and %d operations" % (guesses, operations[0])

It appears slightly more efficient in guesses than the previous solution, and does a lot fewer operations. For 101/1024 it required 19 guesses and 251 operations. For .98765 it needed 27 guesses and 623 operations. For 0.0123456789 it required 66 guesses and 889 operations. And for giggles and grins, for 0.0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 (that's 10 copies of the previous one) it required 665 guesses and 23289 operations.

Paedogenesis answered 26/3, 2011 at 21:2 Comment(1)
@jason-s: It is better filled out now. I'm looking forward to comparing with yours when you have code. Yours will definitely require fewer operations, I have no sense whose will require fewer guesses.Paedogenesis
B
0

You can sort the rational numbers in a given interval by, for example, the pair (denominator, numerator). Then to play the game you can

  1. Find the interval [0, N] using the doubling-step approach
  2. Given an interval [a, b] shoot for the rational with smallest denominator in the interval that is the closest to the center of the interval

This is, however, probably still O(log(num/den) + den). (Not sure; it's too early in the morning here to make me think clearly ;-) )
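Step 2 can be realized with a plain Stern-Brocot walk, which also illustrates where the "+ den" term comes from: without run acceleration the walk can take on the order of the denominator in steps. `simplest_between` is a hypothetical helper of my own naming:

```python
from fractions import Fraction

def simplest_between(lo, hi):
    """Smallest-denominator fraction strictly between lo and hi,
    assuming 0 <= lo < hi."""
    a, b, c, d = 0, 1, 1, 0  # Stern-Brocot bounds a/b and c/d (1/0 = infinity)
    while True:
        m = Fraction(a + c, b + d)  # the mediant, already in lowest terms
        if m <= lo:
            a, b = m.numerator, m.denominator  # walk right
        elif m >= hi:
            c, d = m.numerator, m.denominator  # walk left
        else:
            return m  # first mediant inside the interval is the simplest
```

The first mediant that lands strictly inside the interval is the unique fraction of smallest denominator there, which is what step 2 shoots for.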

Baskerville answered 26/3, 2011 at 7:37 Comment(0)
