Time complexity of Python 3.8+'s integer square root `math.isqrt()` function

I am trying to analyze the time complexity of the built-in integer square root function available in Python 3.8+, math.isqrt().

I have tried to work out the time complexity of this function as a function of n when calculating res = math.isqrt(n), where res is the result we want and n is an integer.

I believe some of the complications may come from how Python does arithmetic, and from it being a dynamically typed language: there may be rounding and underflow/overflow handling needed when converting a float value back to an int in successive rounds of calculation, until a stopping condition is met at some threshold precision.

If anyone has worked on the integer square root function math.isqrt for Python, I'd be happy to hear your perspective on the algorithmic complexity analysis, both for time and for space (the latter may be relevant if there are recursive elements, or if digits after the decimal point are stored for later calculations).

Furlough answered 26/2, 2024 at 18:53 Comment(6)
Maybe you could try plotting time-taken vs log(n), and using something from the answers to How can I time a code segment for testing performance with Pythons timeit?Heeltap
did you check out the reference c source code? It has a python equivalent documented : github.com/python/cpython/blob/…Oleate
The whole point of isqrt is that it doesn't use any floating point, so forget about "storing floating points".Vanpelt
From @JonSG's link: "... the number of iterations is known in advance: it's exactly floor(log2(log2(n))) for n > 1".Synsepalous
@PresidentJamesK.Polk but operations on variable size ints aren't necessarily O(1), that might affect the overall analysis.Vanpelt
@MarkRansom: That's correct, it counts only iterations through the loop, which may not be a fine-grained enough measure.Synsepalous

Quick answer: just give me the facts!

CPython 3.12 introduced some improvements to the speed of division of large integers. As a result, the running time of math.isqrt depends on which version of Python you're using:

  • For CPython < 3.12, the asymptotic time complexity of isqrt(n) is O((log n)^2): that is, it's quadratic time in the number of bits of n.
  • For CPython >= 3.12, it's subquadratic: O((log n)^1.585) (where that constant 1.585 is more properly log(3)/log(2)).
  • In either case, the space required to compute isqrt(n) is linear in the number of bits of n: O(log n).
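If you want to check these figures empirically on your own interpreter, a rough timing sketch along these lines (standard library only; the sizes and repeat counts are arbitrary, and timings will vary by machine) should show the difference in growth rate:

import math
import random
import timeit

# Time isqrt at a few bit sizes and watch how the cost grows when the
# bit length doubles: roughly 4x per doubling for the quadratic algorithm
# (CPython < 3.12), closer to 3x for the subquadratic one (>= 3.12).
for bits in (25_000, 50_000, 100_000, 200_000):
    n = random.getrandbits(bits) | (1 << (bits - 1))  # force exactly `bits` bits
    t = timeit.timeit(lambda: math.isqrt(n), number=10) / 10
    print(f"{bits:>7} bits: {t * 1e3:8.3f} ms per call")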

Detailed answer: okay, now can I have an explanation, too?

The algorithm

For reference, here's the underlying algorithm expressed in Python (leaving out some irrelevancies like type conversion and error checking):

def isqrt(n: int) -> int:
    """Find the integer part of the square root of a positive integer n."""
    c = (n.bit_length() - 1) // 2
    a = 1
    for s in reversed(range(c.bit_length())):
        d, e = c >> s, c >> s + 1
        a = (a << d - e - 1) + (n >> 2*c - e - d + 1) // a
        # at this point a approximates the square root of n >> (2*c - 2*d)
    return a - (a*a > n)

An aside: while this is the form of the algorithm that we want to use for complexity analysis (since it closely matches the actual implementation), the algorithm is rather opaque in this form, so it's not particularly helpful for understanding why it works. The integer square root algorithm is more naturally expressed recursively, something like this:

def nsqrt(n: int) -> int:
    """For positive n, return a satisfying (a - 1)**2 < n < (a + 1)**2."""
    if n < 4:
        return 1

    t = (n.bit_length() + 1) // 4
    a = nsqrt(n >> 2 * t) << t
    return (a + n // a) // 2

def isqrt(n: int) -> int:
    """Find the integer part of the square root of a positive integer n."""
    a = nsqrt(n)
    return a - (a * a > n)

That is, the core of the algorithm is a recursive method for finding what I like to call a "near square root" of a positive integer n: a value a satisfying (a - 1)**2 < n < (a + 1)**2, or equivalently, |a - √n| < 1. That near-square-root algorithm uses the fact that if a is a near square root of n >> 2*t for some nonnegative integer t, then a << t is an approximation to the real square root √n. If we apply a single Newton-Raphson iteration to that approximation, we get a better approximation to √n, and if t is chosen carefully then that better approximation is again a near square root of n.

I leave it to you to check that the two forms of the algorithm are essentially equivalent.
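For a quick empirical check (assuming the definitions above are pasted into one session; note that the two isqrt versions share a name, so rename one of them if you want to exercise both), a throwaway test like this will do:

import math
import random

# Sanity check: whichever isqrt is in scope should agree with math.isqrt,
# and nsqrt should satisfy the near-square-root property it documents.
for _ in range(10_000):
    n = random.randrange(1, 1 << 256)
    assert isqrt(n) == math.isqrt(n)
    a = nsqrt(n)
    assert (a - 1) ** 2 < n < (a + 1) ** 2
print("checks passed")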

Complexity analysis

What makes the complexity analysis relatively easy is that we only have to worry about the last iteration of the for loop. That's because the algorithm progressively increases the working precision, roughly doubling it at each iteration. Given that we're expecting at least linear time (in the number of bits) for each iteration, the last-but-one iteration takes at most half the time of the last iteration, and the last-but-two iteration takes at most a quarter of the time of the last iteration, and so on. So by the magic of geometric progressions, the total accumulated time is at most double the time of the last iteration (plus some O(log n) and O(log log n) book-keeping overhead, which doesn't affect the final complexity).
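You can watch that doubling of the working precision directly by instrumenting the loop. Here's a hypothetical traced copy of the function above (isqrt_trace is my name, not anything in CPython) that prints d at each iteration:

def isqrt_trace(n: int) -> int:
    """Copy of isqrt above, printing the working precision at each step."""
    c = (n.bit_length() - 1) // 2
    a = 1
    for s in reversed(range(c.bit_length())):
        d, e = c >> s, c >> s + 1
        a = (a << d - e - 1) + (n >> 2*c - e - d + 1) // a
        print(f"s={s}: d={d}, a has {a.bit_length()} bits")
    return a - (a*a > n)

isqrt_trace(10**100)  # for this 333-bit input, d goes 1, 2, 5, 10, 20, 41, 83, 166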

Now within each iteration, there are only four operations that are working at the level of big integers, and they're all in that a = ... update line: a left shift, a right shift, a division, and an addition. The left shift, right shift and addition all have time linear in the number of bits involved, so the division dominates. (The integers e, d and c all have size of the order of magnitude of log2(n), so the operations involved in computing d - e - 1 or 2 * c - e - d + 1 don't contribute significantly to the time complexity. In the actual C code implementation of math.isqrt, the distinction is clearer, since c, d and e are all native C integers, while n and a are full-blown Python multiprecision integer objects.)

One slight subtlety with the right shift: note that the time taken grows with the size of the result, not the size of n. That is, getting the top t bits of a b-bit number n takes time proportional to t, not b. That fact's important in establishing the claim that the time taken at least doubles for each iteration.
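As an aside, this is how CPython's right shift behaves in general, not just inside isqrt: roughly speaking, only the surviving digits are processed. A micro-benchmark sketch (timings will be noisy, but the trend should be visible):

import timeit

# Shifting a fixed 10-million-bit number: the cost tracks the size of
# the *result*, not the size of n.
n = 1 << 10_000_000
for k in (9_999_000, 5_000_000, 1_000):
    t = timeit.timeit(lambda: n >> k, number=100) / 100
    print(f"result of {10_000_000 - k:>10} bits: {t * 1e6:10.1f} µs")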

So for everything up to the end of the for loop, the time complexity is the same as the time complexity of the division in the final iteration. If you push through the numbers, you can predict the sizes of the integers involved in the division: if n has b bits (that is, n.bit_length() == b), then that division is a division of an integer with roughly 3b/4 bits by an integer with roughly b/4 bits.
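If you'd rather see those operand sizes than take them on trust, here's a hypothetical instrumented copy of the loop (final_division_sizes is my own helper, not CPython's) that records the bit lengths of the numerator and denominator in the last division:

def final_division_sizes(n):
    """Return (numerator_bits, denominator_bits) for the loop's final division."""
    c = (n.bit_length() - 1) // 2
    a = 1
    num_bits = den_bits = 0
    for s in reversed(range(c.bit_length())):
        d, e = c >> s, c >> s + 1
        num = n >> 2*c - e - d + 1
        num_bits, den_bits = num.bit_length(), a.bit_length()
        a = (a << d - e - 1) + num // a
    return num_bits, den_bits

b = 100_000
print(final_division_sizes(1 << (b - 1)))  # roughly (3*b//4, b//4): about (75000, 25000) here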

The only other thing that might be significant is the final squaring multiplication, which is a multiplication of a roughly b/2-bit integer by itself. It's possible that a big integer arithmetic system could have a division that's asymptotically more efficient than the multiplication, but it's unlikely: in general, division should be expected to have complexity at least that of multiplication. (Fast division algorithms usually involve a multiplication at some point, and Python's is no exception.)

So to find the complexity, all we have to care about is two operations: the final division, and the multiplication in the correction step.

To CPython specifics: prior to CPython 3.12, multiplication used the Karatsuba algorithm (beyond some cutoff point), and division was the traditional schoolbook long division method, taking time quadratic in the number of bits of its inputs. So isqrt was dominated by the division, and had complexity O((log n)^2). From CPython 3.12 onwards, multiplication has the same complexity as before, but division now has the same asymptotic complexity as multiplication, so isqrt has complexity matching that of both the multiplication and division algorithms: O((log n)^1.585). See https://github.com/python/cpython/pull/96673 for the gory details of the Python 3.12 changes.

For the space requirements: the number of auxiliary variables is constant, and their size is bounded by the size of n. So the space requirement is O(log n).

Interestingly, Juraj Sukop came up with an improvement to the isqrt algorithm that (at least for CPython < 3.12) doubles its speed for large n. Discussion is here: https://github.com/python/cpython/issues/87219. It's not implemented for math.isqrt because the (fairly small) extra complexity involved has an adverse effect for smaller inputs.

Gard answered 28/2, 2024 at 18:9 Comment(0)

@JonSG is right that there's a detailed description of the algorithm available in the module source code. One part of the description (also pointed out by @President James K. Polk) is the number of iterations:

The algorithm is remarkable in its simplicity. There's no need for a per-iteration check-and-correct step, and termination is straightforward: the number of iterations is known in advance (it's exactly floor(log2(log2(n))) for n > 1).
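That count is easy to verify exactly, without floating-point logs (both helpers below are mine, for illustration only):

def loop_iterations(n):
    # Number of times the for loop in the reference code runs.
    c = (n.bit_length() - 1) // 2
    return c.bit_length()

def floor_log2_log2(n):
    # Largest k with 2**(2**k) <= n, i.e. floor(log2(log2(n))) for n > 1.
    k = 0
    while 1 << (1 << (k + 1)) <= n:
        k += 1
    return k

for n in list(range(2, 10_000)) + [10**100, 2**1000, 2**1024 - 1]:
    assert loop_iterations(n) == floor_log2_log2(n), n
print("iteration count matches floor(log2(log2(n)))")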

For the final complexity: since this method is potentially applied to very large integers it's appropriate to consider the exact complexity of the arithmetic operations. Looking at the loop of the reference code:

# ...
for s in reversed(range(c.bit_length())): # c proportional to log(n), log(log(n)) iterations
    e = d
    d = c >> s
    a = (a << d - e - 1) + (n >> 2*c - e - d + 1) // a
# ...

Bit-shifting n by 2*c costs on the order of log(n) for very large n. Applying that log(n) shift cost to each of the log(log(n)) iterations, we get a complexity of O(log(n) log(log(n))).

Correspond answered 26/2, 2024 at 21:22 Comment(5)
What about all the other arithmetic operations, though? I'm inclined to believe that the shifts, additions and subtractions are probably O(log n) - based on the assumption that all involved numbers are O(n) - though I haven't verified this. But what about the long division (// a) specifically? This will have O(log n log log n) or worse time complexity (for example a naive implementation might be as bad as O((log n)²)). What are the bounds on numerator and denominator of the long division?Consignor
I thought the division would take the most time too at first, but the shifted n >> 2*c is much smaller than n itself, I think we end up with log base log(n) of (n). I was about to give it a shot when I wrote the answer but once there were enough logs to build a cabin I started to lose track...Correspond
I think we end up at O(log(log(n)) (log(n) + log(log(log(n, log(n)))) log(log(log(log(n, log(n))))))), which is still dominated by the log(n) on the right. Maybe.Correspond
@ExaltedToast: Your instincts were correct: it's the division that dominates everything. On the final iteration, the shift in n >> 2*c - e - d + 1 is shifting only around one quarter of the bits of n away, leaving three quarters of the bits of n behind; we end up doing a 3b/4-bit-by-b/4-bit division, where b is the number of bits of n.Gard
BTW, in case it wasn't clear, that shift should be read as n >> (2*c - e - d + 1), not (n >> 2*c) - e - d + 1 (or any other bracketing): the >> operation has lower precedence than the arithmetic operations.Gard
