Which is faster in Python: x**.5 or math.sqrt(x)?
Asked Answered
A

15

250

I've been wondering this for some time. As the title says, which is faster: the actual function, or simply raising to the half power?

UPDATE

This is not a matter of premature optimization. This is simply a question of how the underlying code actually works. What is the theory of how Python code works?

I sent Guido van Rossum an email because I really wanted to know the differences between these methods.

My email:

There are at least 3 ways to do a square root in Python: math.sqrt, the '**' operator and pow(x,.5). I'm just curious as to the differences in the implementation of each of these. When it comes to efficiency which is better?

His response:

pow and ** are equivalent; math.sqrt doesn't work for complex numbers, and links to the C sqrt() function. As to which one is faster, I have no idea...
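Guido's two points are easy to verify in the interpreter. A minimal sketch (Python 3; under Python 2, `(-1) ** 0.5` raises a ValueError instead of returning a complex result):

```python
import math

# pow() and ** are equivalent spellings of the same operation
print(pow(2, 0.5) == 2 ** 0.5)  # True

# ** can produce a complex result; math.sqrt cannot
print((-1) ** 0.5)  # (6.123233995736766e-17+1j)
try:
    math.sqrt(-1)
except ValueError as err:
    print(err)  # math domain error
```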

Avebury answered 29/11, 2008 at 1:24 Comment(5)
I don't think this is a bad question. For example, x * x is a full 10 times faster than x ** 2. Readability is a tossup in this situation, so why not do it the fast way?Expansible
TM, for x**2 being slower than x*x, i think it could be due to a full blown function call for x**2...Avebury
Casey, I'm with you on the "premature optimization" thing. :) Your question does not look like premature optimization to me: there is no risk that any of the variants breaks your code. It's more a matter of knowing better what you do (in terms of execution time) when you choose pow() over math.sqrt().Catapult
Why not replace math.sqrt with a sqrt = lambda n: n**0.5, when that's faster and also works for complex numbers like guido said.Bertrando
This isn't premature optimization, but rather avoiding premature pessimization (ref. no. 28, C++ coding standards, A. Alexandrescu). If math.sqrt is a more optimized routine (as it is) and expresses the intent more clearly, it should always be preferred over x**.5. It is not premature optimization to know what you write, and to choose the alternative that is faster and provides more code clarity. If so, you need to argue equally well why you would choose the other alternatives.Yager
N
126

math.sqrt(x) is significantly faster than x**0.5.

import math
N = 1000000
%%timeit
for i in range(N):
    z=i**.5

10 loops, best of 3: 156 ms per loop

%%timeit
for i in range(N):
    z=math.sqrt(i)

10 loops, best of 3: 91.1 ms per loop

Using Python 3.6.9 (notebook).
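Outside a notebook, the same comparison can be reproduced with the standard timeit module. The absolute numbers are machine-dependent; only the ratio matters:

```python
import timeit

N = 1_000_000
pow_time = timeit.timeit('x ** .5', setup='x = 123.0', number=N)
sqrt_time = timeit.timeit('sqrt(x)', setup='from math import sqrt; x = 123.0', number=N)
print(f'x ** .5 : {pow_time:.3f} s')
print(f'sqrt(x) : {sqrt_time:.3f} s')
```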

Napalm answered 29/11, 2008 at 1:32 Comment(11)
I've now run it 3 times on codepad.org and all three times a() was much faster than b().Effendi
With my machine locally, it takes ~0.12 seconds for a and ~0.16 seconds for b. Interesting.Mailemailed
Try moving the sqrt function into the local scope, so it doesn't need to look up math in the global dictionary then sqrt in that dictionary.Boodle
ah yes - i forgot that syntaxed expressions (such as ** and *) are faster since they require no lookup... moving math.sqrt into a local argument is still slower than i**.5Napalm
The standard timeit module is your friend. It avoids common pitfalls when it comes to measuring execution time!Catapult
Here are the results of your script: zoltan@host:~$ python2.5 p.py Took 0.183226 seconds Took 0.155829 seconds zoltan@host:~$ python2.4 p.py Took 0.181142 seconds Took 0.153742 seconds zoltan@host:~$ python2.6 p.py Took 0.157436 seconds Took 0.093905 seconds Target system: Ubuntu Linux CPU: Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz As you can see I got different results. According to this your answer is not generic.Mcripley
I've found like zoli2k that timeit2() gives a lower time on my machine with Python 2.6.4. I also tried using the timeit module and found the same result.Clout
Codepad is a great service, but horrible for timing performance, I mean who knows how busy the server will be at a given moment. Each run could potentially give very different resultsDextrad
hmm I just ran it on a local machine w/ py2.5 and timeit1 is faster.. although when running it with Psyco timeit2 is WAYY faster =). prob depends on a variety of factors..Napalm
I've added performance comparison of x**.5 vs sqrt(x) for py32,py31,py30,py27,py26,pypy,jython,py25,py24 interpreters on Linux. gist.github.com/783011Algophobia
sqrt probably does LUT interpolation, while pow does not really have that option.Elaterid
A
29
  • first rule of optimization: don't do it
  • second rule: don't do it, yet

Here's some timings (Python 2.5.2, Windows):

$ python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
1000000 loops, best of 3: 0.445 usec per loop

$ python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
1000000 loops, best of 3: 0.574 usec per loop

$ python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
1000000 loops, best of 3: 0.727 usec per loop

This test shows that x**.5 is slightly faster than sqrt(x).

For Python 3.0 the result is the opposite:

$ \Python30\python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
1000000 loops, best of 3: 0.803 usec per loop

$ \Python30\python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
1000000 loops, best of 3: 0.695 usec per loop

$ \Python30\python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
1000000 loops, best of 3: 0.761 usec per loop

math.sqrt(x) is always faster than x**.5 on another machine (Ubuntu, Python 2.6 and 3.1):

$ python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
10000000 loops, best of 3: 0.173 usec per loop
$ python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
10000000 loops, best of 3: 0.115 usec per loop
$ python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
10000000 loops, best of 3: 0.158 usec per loop
$ python3.1 -mtimeit -s"from math import sqrt; x = 123" "x**.5"
10000000 loops, best of 3: 0.194 usec per loop
$ python3.1 -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
10000000 loops, best of 3: 0.123 usec per loop
$ python3.1 -mtimeit -s"import math; x = 123" "math.sqrt(x)"
10000000 loops, best of 3: 0.157 usec per loop
Algophobia answered 29/11, 2008 at 2:16 Comment(9)
Why is it so hard for me to follow this rule 😔Sura
The 'premature optimization is evil' mythDelft
@Delft I don't see how it applies to my answer which is mostly about "measure first": optimize only if measurements show that optimization is necessary in your case.Algophobia
@Delft Though [unrelated to the answer] the title you linked is just wrong--read the full quote by Knuth to understand why: "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." wiki.c2.com/?PrematureOptimizationAlgophobia
Your post literally starts with "first rule of optimization: don't do it". Too many programmers have come to believe that any optimization is evil and, as the article I linked argues, too many senior engineers don't even understand the basic costs of various algorithms and data structures to be able to make a good judgment about how they will affect performance (either CPU cycles or memory management).Delft
Not only the above, but you answered a question about measuring the performance of square root by saying "don't bother" without any insight into whether or not the OP's code is a critical pathway. Speedy square roots are critical in game development and square root is notoriously slow, so it's a perfectly legitimate question which you start off answering with "don't bother". I think my comment is very relevant.Delft
@JDB: "first rule" implies there are more than one (you should have not stopped there, there is more than one sentence in the answer). The interpretation ("don't bother") of my answer is wrong. It should be obvious by a mere fact that the answer does contain measurements. Please, reread the full quote I've provided above. Think, would I quote "should not pass up our opportunities in that critical 3%." if I'd thought "don't bother" -- it does not make any sense.Algophobia
You have written a good answer which highlight the complexity of measuring performance, minus the first sentence which hamstrings the rest. A list of rules isn't very useful if they are in direct conflict with each other. At least in Western cultures, the first rule is usually seen as superseding the rest. The rule "don't do it" conflicts with and devalues the remaining rules. Your first comment in reply to mine would be a much better and more universally applicable "first rule": "first rule of optimization: measure first".Delft
Given that "we should forget about small efficiencies, say about 97% of the time" (i.e., most of the time), the "first rule" is precisely appropriate.Algophobia
D
18

In these micro-benchmarks, math.sqrt will be slower, because of the slight time it takes to look up sqrt in the math namespace. You can improve it slightly with

 from math import sqrt

Even then, though, running a few variations through timeit shows a slight (4-5%) performance advantage for x**.5.

Interestingly, doing

 import math
 sqrt = math.sqrt

sped it up even more, to within 1% difference in speed, with very little statistical significance.
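The lookup cost this answer describes can be measured directly by timing the three spellings side by side (a sketch with timeit; the actual numbers will vary by machine and Python version):

```python
import timeit

# three spellings of the same call; only the name-lookup cost differs
for label, stmt, setup in [
    ('attribute lookup', 'math.sqrt(x)', 'import math; x = 123.0'),
    ('from-import', 'sqrt(x)', 'from math import sqrt; x = 123.0'),
    ('rebound name', 'sqrt(x)', 'import math; sqrt = math.sqrt; x = 123.0'),
]:
    t = timeit.timeit(stmt, setup=setup, number=1_000_000)
    print(f'{label:16s}: {t:.3f} s')
```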


I will repeat Kibbee, and say that this is probably a premature optimization.

Disarm answered 29/11, 2008 at 1:45 Comment(1)
The reason defining sqrt in the program's local namespace speeds it up more is probably because of the method resolution order: the compiler first checks if the function was defined in your code, then in any imports, so if it was defined locally it takes less time per lookupWhen
S
11

How many square roots are you really performing? Are you trying to write some 3D graphics engine in Python? If not, then why go with code which is cryptic over code that is easy to read? The time difference would be less than anybody could notice in just about any application I could foresee. I really don't mean to put down your question, but it seems that you're going a little too far with premature optimization.

Suspensory answered 29/11, 2008 at 1:28 Comment(6)
i don't really feel I'm doing a premature optimization. It's more of a simple question of deciding from 2 different methods, which, on average, will be faster.Avebury
Kibbee: it's definitely a valid question, but I share your dismay at the number of questions on Stack Overflow that imply that the asker is performing all kinds of premature optimization. It's definitely a large percentage of the questions being asked for every language.Clavate
But what does it really matter? It will make absolutely no impact on any program.Asphyxiate
Is math.sqrt(x) any easier to read than x ** 0.5 ? I think they are both pretty obviously square root... at least if you are familiar with python anyway. Don't call standard python operators like ** "cryptic" just because you aren't familiar with python.Expansible
I don't think the ** operator is cryptic. I think that raising something to the exponent 0.5 as a method of getting the square root to be a little cryptic to those that don't keep up on their math.Suspensory
What if he IS making a 3D engine in Python?Gradualism
M
8

In Python 2.6 the (float).__pow__() function uses the C pow() function and the math.sqrt() function uses the C sqrt() function.

In glibc, the implementation of pow(x,y) is quite complex and well optimized for various exceptional cases. For example, calling C pow(x,0.5) simply calls the sqrt() function.

The difference in speed between x**.5 and math.sqrt is caused by the wrappers used around the C functions, and the speed strongly depends on the optimization flags/C compiler used on the system.
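Since pow(x, 0.5) ends up in the C sqrt() for this case, all the spellings agree for non-negative floats on a typical CPython/glibc build; a quick sanity check:

```python
import math

x = 2.0
# four spellings of square root; all reach the same correctly rounded result
results = [x ** 0.5, pow(x, 0.5), x.__pow__(0.5), math.sqrt(x)]
print(results[0])              # 1.4142135623730951
print(len(set(results)) == 1)  # True on a typical CPython build
```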

Edit:

Here are the results of Claudiu's algorithm on my machine. I got different results:

zoltan@host:~$ python2.4 p.py 
Took 0.173994 seconds
Took 0.158991 seconds
zoltan@host:~$ python2.5 p.py 
Took 0.182321 seconds
Took 0.155394 seconds
zoltan@host:~$ python2.6 p.py 
Took 0.166766 seconds
Took 0.097018 seconds
Mcripley answered 23/4, 2010 at 4:17 Comment(0)
M
4

Most likely math.sqrt(x), because it's optimized for square rooting.

Benchmarks will provide you the answer you are looking for.

Mailemailed answered 29/11, 2008 at 1:25 Comment(0)
P
4

For what it's worth (see Jim's answer). On my machine, running python 2.5:

PS C:\> python -m timeit -n 100000 10000**.5
100000 loops, best of 3: 0.0543 usec per loop
PS C:\> python -m timeit -n 100000 -s "import math" math.sqrt(10000)
100000 loops, best of 3: 0.162 usec per loop
PS C:\> python -m timeit -n 100000 -s "from math import sqrt" sqrt(10000)
100000 loops, best of 3: 0.0541 usec per loop
Peterson answered 29/11, 2008 at 2:17 Comment(0)
A
4

Using Claudiu's code, on my machine even with "from math import sqrt", x**.5 is faster; but using psyco.full(), sqrt(x) becomes much faster, by at least 200%.

Avebury answered 29/11, 2008 at 2:28 Comment(0)
M
4

Someone commented about the "fast Newton-Raphson square root" from Quake 3... I implemented it with ctypes, but it's super slow in comparison to the native versions. I'm going to try a few optimizations and alternate implementations.

from ctypes import c_float, c_int32, byref, POINTER, cast

def sqrt(num):
    xhalf = 0.5*num
    x = c_float(num)
    # reinterpret the float's bits as a 32-bit int; c_int32 matches the
    # 4-byte c_float, whereas c_long is 8 bytes on most 64-bit platforms
    i = cast(byref(x), POINTER(c_int32)).contents.value
    i = c_int32(0x5f375a86 - (i>>1))
    x = cast(byref(i), POINTER(c_float)).contents.value

    # two Newton-Raphson refinement steps for 1/sqrt(num)
    x = x*(1.5-xhalf*x*x)
    x = x*(1.5-xhalf*x*x)
    return x * num  # num * (1/sqrt(num)) == sqrt(num)

Here's another method using struct; it comes out about 3.6x faster than the ctypes version, but still 1/10 the speed of C.

from struct import pack, unpack

def sqrt_struct(num):
    xhalf = 0.5*num
    # reinterpret num's bits as a 32-bit unsigned int; 'I' matches the
    # 4-byte 'f' format, whereas native 'L' can be 8 bytes
    i = unpack('I', pack('f', num))[0]
    i = 0x5f375a86 - (i>>1)
    x = unpack('f', pack('I', i))[0]

    x = x*(1.5-xhalf*x*x)
    x = x*(1.5-xhalf*x*x)
    return x * num
Mcabee answered 23/4, 2010 at 5:27 Comment(0)
T
3

The Pythonic thing to optimize for is readability. For this I think explicit use of the sqrt function is best. Having said that, let's investigate performance anyway.

I updated Claudiu's code for Python 3 and also made it impossible to optimize away the calculations (something a good Python compiler may do in the future):

from sys import version
from time import time
from math import sqrt, pi, e

print(version)

N = 1_000_000

def timeit1():
  z = N * e
  s = time()
  for n in range(N):
    z += (n * pi) ** .5 - z ** .5
  print (f"Took {(time() - s):.4f} seconds to calculate {z}")

def timeit2():
  z = N * e
  s = time()
  for n in range(N):
    z += sqrt(n * pi) - sqrt(z)
  print (f"Took {(time() - s):.4f} seconds to calculate {z}")

def timeit3(arg=sqrt):
  z = N * e
  s = time()
  for n in range(N):
    z += arg(n * pi) - arg(z)
  print (f"Took {(time() - s):.4f} seconds to calculate {z}")

timeit1()
timeit2()
timeit3()

Results vary, but a sample output is:

3.6.6 (default, Jul 19 2018, 14:25:17) 
[GCC 8.1.1 20180712 (Red Hat 8.1.1-5)]
Took 0.3747 seconds to calculate 3130485.5713865166
Took 0.2899 seconds to calculate 3130485.5713865166
Took 0.2635 seconds to calculate 3130485.5713865166

And a more recent output:

3.7.4 (default, Jul  9 2019, 16:48:28) 
[GCC 8.3.1 20190223 (Red Hat 8.3.1-2)]
Took 0.2583 seconds to calculate 3130485.5713865166
Took 0.1612 seconds to calculate 3130485.5713865166
Took 0.1563 seconds to calculate 3130485.5713865166

Try it yourself.

Thunderpeal answered 18/10, 2018 at 8:52 Comment(1)
I find **0.5 more readable; when writing a math expression, I'd rather use math operators throughout, when available, rather than functions. I use ** for the same reason I write -1 instead of neg(1) or a + b instead of add(a, b).Millesimal
H
2

Hello! I just made a Stack Exchange profile to participate in this conversation! What I am doing might seem trivial, but hear me out once before judging:

Experiment Conditions:

  1. Offline(no internet compiler issues)
  2. Keeping my system state as stable as possible
  3. In one attempt testing all 3 functions

I ran 3 loops of 5 iterations each, for each function stated in the original question. And I calculated the square root for Integers from 0 to 10^8 in each loop.

Here are the results: Time Taken: sqrt(x) < x**0.5 < pow(x, 0.5)

Note: By a margin of double-digit seconds, over 10^8 non-negative integers.

Screenshot of outputs: Outputs

My Conclusion:

I feel Guido's email justifies these timings well. Consider the following statements:

  • "math.sqrt() links to C and does not entertain complex numbers"
  • "** and pow() are equivalent"

We can thus infer that ** and pow() both carry some overhead, since each has to check whether the argument is a complex number, even when we pass an integer. Moreover, complex numbers are built into Python, and handling them at the interpreter level costs time.

And, notably, math.sqrt() works relatively faster not only because it skips the check for complex arguments, but also because it links directly to the C sqrt() function, and C is generally a little faster than Python.

Let me know in case your opinion differs from mine in this conclusion!

Code:

import time
import math
print("x**0.5 : ")
for _ in range(5):
    start = time.time()
    for i in range(int(1e8)):
        i**0.5
    end = time.time()
    print(end-start)
print("math.sqrt(x) : ")
for _ in range(5):
    start = time.time()
    for i in range(int(1e8)):
        math.sqrt(i)
    end = time.time()
    print(end-start)
print("pow(x,0.5) : ")
for _ in range(5):
    start = time.time()
    for i in range(int(1e8)):
        pow(i,0.5)
    end = time.time()
    print(end-start)
Heated answered 23/2, 2022 at 9:59 Comment(0)
H
1

Claudiu's results differ from mine. I'm using Python 2.6 on Ubuntu on an old P4 2.4GHz machine... Here are my results:

>>> timeit1()
Took 0.564911 seconds
>>> timeit2()
Took 0.403087 seconds
>>> timeit1()
Took 0.604713 seconds
>>> timeit2()
Took 0.387749 seconds
>>> timeit1()
Took 0.587829 seconds
>>> timeit2()
Took 0.379381 seconds

sqrt is consistently faster for me... Even Codepad.org NOW seems to agree that sqrt, in the local context, is faster (http://codepad.org/6trzcM3j). Codepad seems to be running Python 2.5 presently. Perhaps they were using 2.4 or older when Claudiu first answered?

In fact, even using math.sqrt(i) in place of arg(i), I still get better times for sqrt. In this case timeit2() took between 0.53 and 0.55 seconds on my machine, which is still better than the 0.56-0.60 figures from timeit1.

I'd say, on modern Python, use math.sqrt and definitely bring it to local context, either with somevar=math.sqrt or with from math import sqrt.

Halcyon answered 23/4, 2010 at 3:14 Comment(0)
D
1

Of course, if one is dealing with literals and needs a constant value, the Python runtime can pre-calculate the value at compile time when it is written with operators, so there is no need to profile each version in this case:

In [78]: def a():
    ...:     return 2 ** 0.5
    ...:

In [79]: import dis

In [80]: dis.dis(a)
  2           0 LOAD_CONST               1 (1.4142135623730951)
              2 RETURN_VALUE
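By contrast, a call to math.sqrt cannot be folded, since the compiler does not know at compile time what the name will resolve to. A sketch of the comparison (constant folding is a CPython implementation detail, not a language guarantee):

```python
import dis
import io
import math

def folded():
    return 2 ** 0.5  # constant-folded by the bytecode compiler

def not_folded():
    return math.sqrt(2)  # name lookup and call happen at runtime

buf = io.StringIO()
dis.dis(folded, file=buf)
print('1.4142135623730951' in buf.getvalue())  # True: result baked in as a constant

buf = io.StringIO()
dis.dis(not_folded, file=buf)
print('sqrt' in buf.getvalue())  # True: the call survives compilation
```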

Dilan answered 25/4, 2020 at 16:16 Comment(0)
L
0

The problem SQRMINSUM I've solved recently requires computing square root repeatedly on a large dataset. The oldest 2 submissions in my history, before I've made other optimizations, differ solely by replacing **0.5 with sqrt(), thus reducing the runtime from 3.74s to 0.51s in PyPy. This is almost twice the already massive 400% improvement that Claudiu measured.

Lifer answered 23/11, 2017 at 14:37 Comment(0)
G
-5

What would be even faster is if you went into math.py and copied the function "sqrt" into your program. It takes time for your program to find math.py, then open it, find the function you are looking for, and then bring that back to your program. If that function is faster even with the "lookup" steps, then the function itself has to be awfully fast. It will probably cut your time in half. In summary:

  1. Go to math.py
  2. Find the function "sqrt"
  3. Copy it
  4. Paste function into your program as the sqrt finder.
  5. Time it.
Grous answered 24/3, 2015 at 11:37 Comment(2)
That won't work; see https://mcmap.net/q/118771/-where-are-math-py-and-sys-py/3004881. Also note the quote in the original question that says it's a link to a C function. Also, how could copying the function's source code be different from from math import sqrt?Jit
It wouldn't, I said that just to make clear exactly what the difference was in calling the two functions.Grous

© 2022 - 2024 — McMap. All rights reserved.