Python: how so fast?
Asked Answered
W

4

6

The period of the Mersenne Twister used in the module random is (I am told) 2**19937 - 1. As a binary number, that is 19937 '1's in a row (if I'm not mistaken). Python converts it to decimal pretty darned fast:

$ python -m timeit '2**19937'
10000000 loops, best of 3: 0.0271 usec per loop

$ python -m timeit -s 'result = 0' 'result += 2**19937'
100000 loops, best of 3: 2.09 usec per loop

I guess the second version is the one that requires conversion?

And it's not just binary. This is also fast. (Rather than show the numbers, I show the length of the decimal converted to a string):

>>> import math
>>> N = 1000
>>> s = str((int(N*math.e))**(int(N*math.pi)))
>>> len(s)
10787
>>> N = 5000
>>> s = str((int(N*math.e))**(int(N*math.pi)))
>>> len(s)
64921

Timing:

python -m timeit -s 'import math' -s 'N=1000' 's = str((int(N*math.e))**(int(N*math.pi)))'
10 loops, best of 3: 51.2 msec per loop

The question is: how is this actually done?

Am I just naive to be impressed? I find the sight of the Python shell generating a number of 5000 or so places in an instant truly spectacular.

Edit:

Additional timings suggested by @dalke and @truppo

$ python -m timeit 'x=2' 'x**19937'
1000 loops, best of 3: 230 usec per loop
$ python -m timeit 'x=2' 'int(x**19937)'
1000 loops, best of 3: 232 usec per loop
$ python -m timeit 'x=2' 'str(x**19937)'
100 loops, best of 3: 16.6 msec per loop

$ python -m timeit -s 'result = 0' 'x = 2' 'result += x**19937'
1000 loops, best of 3: 237 usec per loop
$ python -m timeit -s 'result = 0' 'x = 2' 'result += x**19937' 'int(result)'
1000 loops, best of 3: 238 usec per loop
$ python -m timeit -s 'result = 0' 'x = 2' 'result += x**19937' 'str(result)'
100 loops, best of 3: 16.6 msec per loop

So it looks to me like result = 0; result += 2**19937 probably does force the conversion.

Weakkneed answered 23/1, 2010 at 23:21 Comment(3)
See #867893Ajaajaccio
You never timed Python's integer to base 10 conversion. You need to do: timeit 'str(2**19937)'. And your second timing test is actually computing Python's ability to add large numbers. There's no conversion going on there either.Josiahjosias
It's clear that str() is forcing the conversion. There's a factor of 70 slowdown for that vs. the simple addition step.Josiahjosias
L
4

Python converts it to decimal pretty darned fast.

I don't know Python, but no, it needn't to do that. 2^19937 don't need computations, it is simply a binary shift ("<<") with 19937, so it is very fast. Only if you print that in decimal the actual conversion is necessary and that is much slower.

EDIT: Exponentiation can be the same as shifting (=moving the point) if the number base is identical to the exponent base.

10^-1 = 0.1 10^0 = 1
10^1 = 10
10^2 = 100
10^3 = 1000
10^n = 1 (n zeroes)

You see that exponentiation of 10 with the exponent n simply shifts the number. Now computers mostly use the internal base 2 (bits) , so calculating 2^19937 is setting a bit in position 19937 (counting bit positions from zero).
As additional information: The conversion to decimal is normally implemented by a conquer-and-divide algorithm which successively divides the number by powers of ten. As you see, the actual conversion is slower by a factor of half a million.

The second example is more interesting: As you are computing m^n with large integers m,n the fastest way is multiplying it in succession and store the temporary results.

Example: 10^345

a = 10^2
b = aa = 10^4
c = b
b = 10^16
d = c*c = 10^256

result = dccccccccbb*10

(Can be further optimized: see Knuth, Seminumerical Algorithms)

So you only need long multiplications and they can be computed pretty effectively.

EDIT: The exact implementation of multiplication depends: Besides the normal school multiplication Karatsuba, Tom-Cooke and Schoenhagen-Strasse (FFT) multiplication is used.

Lucie answered 23/1, 2010 at 23:48 Comment(1)
@Thorsten S. I'm obviously out of my depth but I'm not so sure you're right. Multiplication is a simple binary shift, once you figure out how many places to shift, but how can exponentiation be the same? It's a second step of multiplication to get the number of places, right?Weakkneed
F
6

Hate to rain on your parade, but the reason it's so fast is because the math module is actually not implemented in Python.

Python supports loading shared libraries that export Python APIs, but are implemented in other languages. math.so, which provides the module you get from import math, happens to be one of those (and so is _random.so).

Fennelly answered 23/1, 2010 at 23:27 Comment(4)
Find out yourself! svn.python.org/view/python/trunk/Modules/_math.c?view=markup :)Fennelly
Yow! That's like reading it in the original Greek. But I'll take a stab, thank you.Weakkneed
@Weakkneed Go on, ask us how the dictionary is so fast. :DVersify
@Versify I just ordered a copy of "Beautiful Code", so I'm sure I'll find out. (Chapter 18) is by Andrew Kuchling on just this topic ;)Weakkneed
C
5

When compiling to byte code, constant expressions such as 2**19937 will optimized down to a single constant:

>>> def foo(): return 2**10
... 
>>> import dis
>>> dis.dis(foo)
  1           0 LOAD_CONST               3 (1024)
              3 RETURN_VALUE        
>>> 

Consider instead:

[~] python -m timeit 'x=2' 'x**19937'
1000 loops, best of 3: 210 usec per loop
Coroner answered 23/1, 2010 at 23:49 Comment(0)
L
4

Python converts it to decimal pretty darned fast.

I don't know Python, but no, it needn't to do that. 2^19937 don't need computations, it is simply a binary shift ("<<") with 19937, so it is very fast. Only if you print that in decimal the actual conversion is necessary and that is much slower.

EDIT: Exponentiation can be the same as shifting (=moving the point) if the number base is identical to the exponent base.

10^-1 = 0.1 10^0 = 1
10^1 = 10
10^2 = 100
10^3 = 1000
10^n = 1 (n zeroes)

You see that exponentiation of 10 with the exponent n simply shifts the number. Now computers mostly use the internal base 2 (bits) , so calculating 2^19937 is setting a bit in position 19937 (counting bit positions from zero).
As additional information: The conversion to decimal is normally implemented by a conquer-and-divide algorithm which successively divides the number by powers of ten. As you see, the actual conversion is slower by a factor of half a million.

The second example is more interesting: As you are computing m^n with large integers m,n the fastest way is multiplying it in succession and store the temporary results.

Example: 10^345

a = 10^2
b = aa = 10^4
c = b
b = 10^16
d = c*c = 10^256

result = dccccccccbb*10

(Can be further optimized: see Knuth, Seminumerical Algorithms)

So you only need long multiplications and they can be computed pretty effectively.

EDIT: The exact implementation of multiplication depends: Besides the normal school multiplication Karatsuba, Tom-Cooke and Schoenhagen-Strasse (FFT) multiplication is used.

Lucie answered 23/1, 2010 at 23:48 Comment(1)
@Thorsten S. I'm obviously out of my depth but I'm not so sure you're right. Multiplication is a simple binary shift, once you figure out how many places to shift, but how can exponentiation be the same? It's a second step of multiplication to get the number of places, right?Weakkneed
I
0

I know little to nothing about how this is actually implemented in Python, but given that this is basically primitive multiplication and logarithms, I'm not terribly surprised that it's reasonably fast even on quite large numbers.

There are arbitrary precision math libraries such as GMP that implement a wide variety of operations in a very effective manner, optimized in assembly, for just this purpose.

Inversely answered 23/1, 2010 at 23:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.