Fastest way to list all primes below N
Asked Answered
G

41

398

This is the best algorithm I could come up.

def get_primes(n):
    numbers = set(range(n, 1, -1))
    primes = []
    while numbers:
        p = numbers.pop()
        primes.append(p)
        numbers.difference_update(set(range(p*2, n+1, p)))
    return primes

>>> timeit.Timer(stmt='get_primes.get_primes(1000000)', setup='import   get_primes').timeit(1)
1.1499958793645562

Can it be made even faster?

This code has a flaw: Since numbers is an unordered set, there is no guarantee that numbers.pop() will remove the lowest number from the set. Nevertheless, it works (at least for me) for some input numbers:

>>> sum(get_primes(2000000))
142913828922L
#That's the correct sum of all numbers below 2 million
>>> 529 in get_primes(1000)
False
>>> 529 in get_primes(530)
True
Georgiageorgian answered 14/1, 2010 at 23:40 Comment(11)
Code sniplet in question is much faster if numbers declared like numbers = set(range(n, 2, -2)). But can't beat sundaram3. Thanks for the question.Decuple
It'd be nice if there could be Python 3 versions of the functions in the answers.Pinfeather
Surely there's a library to do this so we don't have to roll our own> xkcd promised Python is as simple as import antigravity. Isn't there anything like require 'prime'; Prime.take(10) (Ruby)?Ahmadahmar
Note that you do not need to pass in a set as your argument to difference_update. You can simply do numbers.difference_update(xrange(p*2, N+1, p)) That will shave a few milliseconds off your time at the very least.Jackpot
I suspect a Python binding around the C++ library primesieve would be orders of magnitude faster than all of these.Ahmadahmar
@ColonelPanic As it so happens I updated github.com/jaredks/pyprimesieve for Py3 and added to PyPi. It's certainly faster than these but not orders of magnitude - more like ~5x faster than the best numpy versions.Lasley
I don't know the speed comparison to the answers already listed here, however I would recommend looking at sagemath.org It is a crypto python frame work that has many built in functions to do the things that you are looking for.Dipstick
@ColonelPanic: I think editing old answers to note that they've aged is appropriate, since that makes it a more useful resource. If the "accepted" answer is no longer the best one, maybe edit a note into the question with a 2015 update to point people at the current best method.Gabbert
I can't believe that no moderator has deleted this question. It asks for improvement in speed of an algorithm that is admittedly not correct. Groetjes AlbertPryce
from sympy import sieve; sieve.extend(N);Amary
Hey that's really fast code. you are right the code fails for n =10000 as number.pop() doesn't pop out the least one as not sorted. ThreadBipetalous
S
415

Warning: timeit results may vary due to differences in hardware or version of Python.

Below is a script which compares a number of implementations:

Many thanks to stephan for bringing sieve_wheel_30 to my attention. Credit goes to Robert William Hanks for primesfrom2to, primesfrom3to, rwh_primes, rwh_primes1, and rwh_primes2.

Of the plain Python methods tested, with psyco, for n=1000000, rwh_primes1 was the fastest tested.

+---------------------+-------+
| Method              | ms    |
+---------------------+-------+
| rwh_primes1         | 43.0  |
| sieveOfAtkin        | 46.4  |
| rwh_primes          | 57.4  |
| sieve_wheel_30      | 63.0  |
| rwh_primes2         | 67.8  |    
| sieveOfEratosthenes | 147.0 |
| ambi_sieve_plain    | 152.0 |
| sundaram3           | 194.0 |
+---------------------+-------+

Of the plain Python methods tested, without psyco, for n=1000000, rwh_primes2 was the fastest.

+---------------------+-------+
| Method              | ms    |
+---------------------+-------+
| rwh_primes2         | 68.1  |
| rwh_primes1         | 93.7  |
| rwh_primes          | 94.6  |
| sieve_wheel_30      | 97.4  |
| sieveOfEratosthenes | 178.0 |
| ambi_sieve_plain    | 286.0 |
| sieveOfAtkin        | 314.0 |
| sundaram3           | 416.0 |
+---------------------+-------+

Of all the methods tested, allowing numpy, for n=1000000, primesfrom2to was the fastest tested.

+---------------------+-------+
| Method              | ms    |
+---------------------+-------+
| primesfrom2to       | 15.9  |
| primesfrom3to       | 18.4  |
| ambi_sieve          | 29.3  |
+---------------------+-------+

Timings were measured using the command:

python -mtimeit -s"import primes" "primes.{method}(1000000)"

with {method} replaced by each of the method names.

primes.py:

#!/usr/bin/env python
import psyco; psyco.full()
from math import sqrt, ceil
import numpy as np

def rwh_primes(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Returns  a list of primes < n """
    sieve = [True] * n
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i]:
            sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1)
    return [2] + [i for i in xrange(3,n,2) if sieve[i]]

def rwh_primes1(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Returns  a list of primes < n """
    sieve = [True] * (n/2)
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i/2]:
            sieve[i*i/2::i] = [False] * ((n-i*i-1)/(2*i)+1)
    return [2] + [2*i+1 for i in xrange(1,n/2) if sieve[i]]

def rwh_primes2(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Input n>=6, Returns a list of primes, 2 <= p < n """
    correction = (n%6>1)
    n = {0:n,1:n-1,2:n+4,3:n+3,4:n+2,5:n+1}[n%6]
    sieve = [True] * (n/3)
    sieve[0] = False
    for i in xrange(int(n**0.5)/3+1):
      if sieve[i]:
        k=3*i+1|1
        sieve[      ((k*k)/3)      ::2*k]=[False]*((n/6-(k*k)/6-1)/k+1)
        sieve[(k*k+4*k-2*k*(i&1))/3::2*k]=[False]*((n/6-(k*k+4*k-2*k*(i&1))/6-1)/k+1)
    return [2,3] + [3*i+1|1 for i in xrange(1,n/3-correction) if sieve[i]]

def sieve_wheel_30(N):
    # http://zerovolt.com/?p=88
    ''' Returns a list of primes <= N using wheel criterion 2*3*5 = 30

Copyright 2009 by zerovolt.com
This code is free for non-commercial purposes, in which case you can just leave this comment as a credit for my work.
If you need this code for commercial purposes, please contact me by sending an email to: info [at] zerovolt [dot] com.'''
    __smallp = ( 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
    61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139,
    149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227,
    229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311,
    313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401,
    409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491,
    499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599,
    601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683,
    691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797,
    809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887,
    907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997)

    wheel = (2, 3, 5)
    const = 30
    if N < 2:
        return []
    if N <= const:
        pos = 0
        while __smallp[pos] <= N:
            pos += 1
        return list(__smallp[:pos])
    # make the offsets list
    offsets = (7, 11, 13, 17, 19, 23, 29, 1)
    # prepare the list
    p = [2, 3, 5]
    dim = 2 + N // const
    tk1  = [True] * dim
    tk7  = [True] * dim
    tk11 = [True] * dim
    tk13 = [True] * dim
    tk17 = [True] * dim
    tk19 = [True] * dim
    tk23 = [True] * dim
    tk29 = [True] * dim
    tk1[0] = False
    # help dictionary d
    # d[a , b] = c  ==> if I want to find the smallest useful multiple of (30*pos)+a
    # on tkc, then I need the index given by the product of [(30*pos)+a][(30*pos)+b]
    # in general. If b < a, I need [(30*pos)+a][(30*(pos+1))+b]
    d = {}
    for x in offsets:
        for y in offsets:
            res = (x*y) % const
            if res in offsets:
                d[(x, res)] = y
    # another help dictionary: gives tkx calling tmptk[x]
    tmptk = {1:tk1, 7:tk7, 11:tk11, 13:tk13, 17:tk17, 19:tk19, 23:tk23, 29:tk29}
    pos, prime, lastadded, stop = 0, 0, 0, int(ceil(sqrt(N)))
    # inner functions definition
    def del_mult(tk, start, step):
        for k in xrange(start, len(tk), step):
            tk[k] = False
    # end of inner functions definition
    cpos = const * pos
    while prime < stop:
        # 30k + 7
        if tk7[pos]:
            prime = cpos + 7
            p.append(prime)
            lastadded = 7
            for off in offsets:
                tmp = d[(7, off)]
                start = (pos + prime) if off == 7 else (prime * (const * (pos + 1 if tmp < 7 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 11
        if tk11[pos]:
            prime = cpos + 11
            p.append(prime)
            lastadded = 11
            for off in offsets:
                tmp = d[(11, off)]
                start = (pos + prime) if off == 11 else (prime * (const * (pos + 1 if tmp < 11 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 13
        if tk13[pos]:
            prime = cpos + 13
            p.append(prime)
            lastadded = 13
            for off in offsets:
                tmp = d[(13, off)]
                start = (pos + prime) if off == 13 else (prime * (const * (pos + 1 if tmp < 13 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 17
        if tk17[pos]:
            prime = cpos + 17
            p.append(prime)
            lastadded = 17
            for off in offsets:
                tmp = d[(17, off)]
                start = (pos + prime) if off == 17 else (prime * (const * (pos + 1 if tmp < 17 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 19
        if tk19[pos]:
            prime = cpos + 19
            p.append(prime)
            lastadded = 19
            for off in offsets:
                tmp = d[(19, off)]
                start = (pos + prime) if off == 19 else (prime * (const * (pos + 1 if tmp < 19 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 23
        if tk23[pos]:
            prime = cpos + 23
            p.append(prime)
            lastadded = 23
            for off in offsets:
                tmp = d[(23, off)]
                start = (pos + prime) if off == 23 else (prime * (const * (pos + 1 if tmp < 23 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 29
        if tk29[pos]:
            prime = cpos + 29
            p.append(prime)
            lastadded = 29
            for off in offsets:
                tmp = d[(29, off)]
                start = (pos + prime) if off == 29 else (prime * (const * (pos + 1 if tmp < 29 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # now we go back to top tk1, so we need to increase pos by 1
        pos += 1
        cpos = const * pos
        # 30k + 1
        if tk1[pos]:
            prime = cpos + 1
            p.append(prime)
            lastadded = 1
            for off in offsets:
                tmp = d[(1, off)]
                start = (pos + prime) if off == 1 else (prime * (const * pos + tmp) )//const
                del_mult(tmptk[off], start, prime)
    # time to add remaining primes
    # if lastadded == 1, remove last element and start adding them from tk1
    # this way we don't need an "if" within the last while
    if lastadded == 1:
        p.pop()
    # now complete for every other possible prime
    while pos < len(tk1):
        cpos = const * pos
        if tk1[pos]: p.append(cpos + 1)
        if tk7[pos]: p.append(cpos + 7)
        if tk11[pos]: p.append(cpos + 11)
        if tk13[pos]: p.append(cpos + 13)
        if tk17[pos]: p.append(cpos + 17)
        if tk19[pos]: p.append(cpos + 19)
        if tk23[pos]: p.append(cpos + 23)
        if tk29[pos]: p.append(cpos + 29)
        pos += 1
    # remove exceeding if present
    pos = len(p) - 1
    while p[pos] > N:
        pos -= 1
    if pos < len(p) - 1:
        del p[pos+1:]
    # return p list
    return p

def sieveOfEratosthenes(n):
    """sieveOfEratosthenes(n): return the list of the primes < n."""
    # Code from: <[email protected]>, Nov 30 2006
    # http://groups.google.com/group/comp.lang.python/msg/f1f10ced88c68c2d
    if n <= 2:
        return []
    sieve = range(3, n, 2)
    top = len(sieve)
    for si in sieve:
        if si:
            bottom = (si*si - 3) // 2
            if bottom >= top:
                break
            sieve[bottom::si] = [0] * -((bottom - top) // si)
    return [2] + [el for el in sieve if el]

def sieveOfAtkin(end):
    """sieveOfAtkin(end): return a list of all the prime numbers <end
    using the Sieve of Atkin."""
    # Code by Steve Krenzel, <[email protected]>, improved
    # Code: https://web.archive.org/web/20080324064651/http://krenzel.info/?p=83
    # Info: http://en.wikipedia.org/wiki/Sieve_of_Atkin
    assert end > 0
    lng = ((end-1) // 2)
    sieve = [False] * (lng + 1)

    x_max, x2, xd = int(sqrt((end-1)/4.0)), 0, 4
    for xd in xrange(4, 8*x_max + 2, 8):
        x2 += xd
        y_max = int(sqrt(end-x2))
        n, n_diff = x2 + y_max*y_max, (y_max << 1) - 1
        if not (n & 1):
            n -= n_diff
            n_diff -= 2
        for d in xrange((n_diff - 1) << 1, -1, -8):
            m = n % 12
            if m == 1 or m == 5:
                m = n >> 1
                sieve[m] = not sieve[m]
            n -= d

    x_max, x2, xd = int(sqrt((end-1) / 3.0)), 0, 3
    for xd in xrange(3, 6 * x_max + 2, 6):
        x2 += xd
        y_max = int(sqrt(end-x2))
        n, n_diff = x2 + y_max*y_max, (y_max << 1) - 1
        if not(n & 1):
            n -= n_diff
            n_diff -= 2
        for d in xrange((n_diff - 1) << 1, -1, -8):
            if n % 12 == 7:
                m = n >> 1
                sieve[m] = not sieve[m]
            n -= d

    x_max, y_min, x2, xd = int((2 + sqrt(4-8*(1-end)))/4), -1, 0, 3
    for x in xrange(1, x_max + 1):
        x2 += xd
        xd += 6
        if x2 >= end: y_min = (((int(ceil(sqrt(x2 - end))) - 1) << 1) - 2) << 1
        n, n_diff = ((x*x + x) << 1) - 1, (((x-1) << 1) - 2) << 1
        for d in xrange(n_diff, y_min, -8):
            if n % 12 == 11:
                m = n >> 1
                sieve[m] = not sieve[m]
            n += d

    primes = [2, 3]
    if end <= 3:
        return primes[:max(0,end-2)]

    for n in xrange(5 >> 1, (int(sqrt(end))+1) >> 1):
        if sieve[n]:
            primes.append((n << 1) + 1)
            aux = (n << 1) + 1
            aux *= aux
            for k in xrange(aux, end, 2 * aux):
                sieve[k >> 1] = False

    s  = int(sqrt(end)) + 1
    if s  % 2 == 0:
        s += 1
    primes.extend([i for i in xrange(s, end, 2) if sieve[i >> 1]])

    return primes

def ambi_sieve_plain(n):
    s = range(3, n, 2)
    for m in xrange(3, int(n**0.5)+1, 2): 
        if s[(m-3)/2]: 
            for t in xrange((m*m-3)/2,(n>>1)-1,m):
                s[t]=0
    return [2]+[t for t in s if t>0]

def sundaram3(max_n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/2073279#2073279
    numbers = range(3, max_n+1, 2)
    half = (max_n)//2
    initial = 4

    for step in xrange(3, max_n+1, 2):
        for i in xrange(initial, half, step):
            numbers[i-1] = 0
        initial += 2*(step+1)

        if initial > half:
            return [2] + filter(None, numbers)

################################################################################
# Using Numpy:
def ambi_sieve(n):
    # http://tommih.blogspot.com/2009/04/fast-prime-number-generator.html
    s = np.arange(3, n, 2)
    for m in xrange(3, int(n ** 0.5)+1, 2): 
        if s[(m-3)/2]: 
            s[(m*m-3)/2::m]=0
    return np.r_[2, s[s>0]]

def primesfrom3to(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Returns a array of primes, p < n """
    assert n>=2
    sieve = np.ones(n/2, dtype=np.bool)
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i/2]:
            sieve[i*i/2::i] = False
    return np.r_[2, 2*np.nonzero(sieve)[0][1::]+1]    

def primesfrom2to(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Input n>=6, Returns a array of primes, 2 <= p < n """
    sieve = np.ones(n/3 + (n%6==2), dtype=np.bool)
    sieve[0] = False
    for i in xrange(int(n**0.5)/3+1):
        if sieve[i]:
            k=3*i+1|1
            sieve[      ((k*k)/3)      ::2*k] = False
            sieve[(k*k+4*k-2*k*(i&1))/3::2*k] = False
    return np.r_[2,3,((3*np.nonzero(sieve)[0]+1)|1)]

if __name__=='__main__':
    import itertools
    import sys

    def test(f1,f2,num):
        print('Testing {f1} and {f2} return same results'.format(
            f1=f1.func_name,
            f2=f2.func_name))
        if not all([a==b for a,b in itertools.izip_longest(f1(num),f2(num))]):
            sys.exit("Error: %s(%s) != %s(%s)"%(f1.func_name,num,f2.func_name,num))

    n=1000000
    test(sieveOfAtkin,sieveOfEratosthenes,n)
    test(sieveOfAtkin,ambi_sieve,n)
    test(sieveOfAtkin,ambi_sieve_plain,n) 
    test(sieveOfAtkin,sundaram3,n)
    test(sieveOfAtkin,sieve_wheel_30,n)
    test(sieveOfAtkin,primesfrom3to,n)
    test(sieveOfAtkin,primesfrom2to,n)
    test(sieveOfAtkin,rwh_primes,n)
    test(sieveOfAtkin,rwh_primes1,n)         
    test(sieveOfAtkin,rwh_primes2,n)

Running the script tests that all implementations give the same result.

Stratification answered 15/1, 2010 at 0:19 Comment(33)
That's not pure Python, but it's the fastest version so far. thanks!Georgiageorgian
If you're interested in non-pure-Python code, then you should check out gmpy -- it has pretty good support for primes, via the next_prime method of its mpz type.Jolley
just for correctness, the code example should have import numpy as npBlenny
int(n ** 0.5) should be int(math.ceil(n ** 0.5)) or int(n ** 0.5) + 1. ambi_sieve(10) gives wrong answer otherwise.Flabellum
You might want to add wheel factorization to your comparison. The version on zerovolt.com/?p=88 (which happened to be the first that I found googling) is faster on my machine than the other pure Python versions you list.Smollett
@stephan: Thank you for the link. Are you sure you would not like to throw your hat into the ring?Stratification
@~unutbu: go ahead if you want. To me, your comparison of various very fast algorithms seems to be the perfect place to add this specific implementation. I have used your script to compare the speed and verify correctness.Smollett
@~unutbu: sieveOfEratosthenes() is faster if we allow only pure Python than any other variant (without psyco, numpy). Python3: sieveOfEratosthenes: 0.055, sieve_wheel_30: 0.061 ambi_sieve_plain: 0.130, sieveOfAtkin: 0.270; Python2: sieveOfEratosthenes: 0.061, sieve_wheel_30: 0.086, ambi_sieve_plain: 0.137, sieveOfAtkin: 0.216. If we allow C extensions then gmpy2-based variant (0.041) is the fastest in Python3 #2897797 ambi_sieve: 0.015 is the fastest on Python2 if C extensions are allowed.Ingot
@J.F Sebastian: Thanks for the info. I've rechecked my timeit runs and at least on my machine (Intel E2160 running Ubuntu 9.10) sieveOfEratosthenes is definitely not the fastest. Do you think our differing results could be due to hardware? Also, have you checked Robert William Hanks' primesfrom3to1? #2068872 Using numpy I'm getting primesfrom3to1 (19.6) versus ambi_sieve (29.4).Stratification
@~unutbu: Indeed primesfrom3to1() (corrected to return r_[...) takes 17 seconds for n=1e9 vs. 22.2 seconds for ambi_sieve() (for n=1e6 both are almost the same (15ms) ambi_sieve() being slightly faster). Among pure Python versions sieveOfEratosthenes is the fastest on my machine (Intel i7, Ubuntu 10.04)Ingot
New @Robert William Hanks's primes() function #2068872 is the fastest (for n=1e6) now among pure Python solutions ideone.com/weu23Ingot
@J.F. Sebastian: Neat program. I love the decorators. Using your program on my machine, primes1 is still sometimes faster than primes (well, on 2 out of 3 runs...). Without psycho the difference between those two seems to be on the order of the standard error between runs.Stratification
in my machine times for primes & primes1 (without psycho) are exactly the same,i just posted as an example because it uses slightly less memory. i just posted a variation of primesfrom3to with the same upside that at least for 1e7 is faster in may machine.Pruchno
@Robert William Hanks: primesfrom2to is slightly faster than ambi_sieve for n=1e6 ideone.com/9MOfqIngot
@J.F. Sebastian: sorry to ask you that, would you mind to test it again? i made a change in the translation part of the code primesfrom2to and i am getting better results in my machine.Pruchno
@Robert William Hanks: new version of primesfrom2to is definitely faster than ambi_sieve for sufficiently large n. ideone.com/ddiU2Ingot
@~unutbu: just added a new version of primesfrom2to & primes2, if you have time you may want to timeit those.Pruchno
Dead links are dead :(Reciprocate
but, for noobies, which one is simplest, yet reasonably fast ?Cytochrome
Conceptually, the simplest algorithm (to understand) is the Sieve of Eratosthenes. I would start there. If it is fast enough, great. If not, use the benchmarks posted above to guide you to a faster algorithm.Stratification
Using itertools.compress makes rwh_primes1 the fastest in the category "plain Python without Psycho" in my testing. I.e. replace the last line with return [2] + list(compress(xrange(3,n,2), sieve[1:]))Beseech
If you're using pypy, these benchmarks (the psyco ones) seem fairly off. Surprisingly enough, I found sieveOfEratosthenes and ambi_sieve_plain to be the fastest with pypy. This is what I found for the non-numpy ones gist.github.com/5bf466bb1ee9e5726a52Quality
If someone wonders how the functions here fare against PG7.8 of Wikibooks for pure python without psyco nor pypy: for n = 1000000: PG7.8: 4.93 s per loop ; rwh_primes1: 69 ms per loop ; rwh_primes2: 57.1 ms per loopHopping
@Stratification Please, fix the links to the different algorithms.Kola
@nbro: Which links are broken?Stratification
@nbro: The sieveOfAtkin like has been updated with a web.archive.org cached page.Stratification
Can you update this with PyPy, now that psyco is dead and PyPy has superseded it?Negativism
the zerovolt.com link is broken... but there is a wheel implementation in python right here: https://mcmap.net/q/57611/-adding-wheel-factorization-to-an-indefinite-sieveAirless
Would be great if these functions and timings could be updated for python3.Balance
2.47ms for 1e6 using numba and a simple (older) Sieve of Atkin.Hallah
In c++ I use min_25 sieve.I takes less than 1 sec to generate primes till 1 Billion. Which of the above catagories would it fall into for python ?Zaporozhye
@Balance the answers below contain updated code for Python3, specifically this community wiki and this recent-ish answer from Nico.Shakira
Now test with pypy!Sheepshanks
P
161

Faster & more memory-wise pure Python code:

def primes(n):
    """ Returns  a list of primes < n """
    sieve = [True] * n
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i]:
            sieve[i*i::2*i]=[False]*((n-i*i-1)//(2*i)+1)
    return [2] + [i for i in range(3,n,2) if sieve[i]]

or starting with half sieve

def primes1(n):
    """ Returns  a list of primes < n """
    sieve = [True] * (n//2)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = [False] * ((n-i*i-1)//(2*i)+1)
    return [2] + [2*i+1 for i in range(1,n//2) if sieve[i]]

Faster & more memory-wise numpy code:

import numpy
def primesfrom3to(n):
    """ Returns a array of primes, 3 <= p < n """
    sieve = numpy.ones(n//2, dtype=bool)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = False
    return 2*numpy.nonzero(sieve)[0][1::]+1

a faster variation starting with a third of a sieve:

import numpy
def primesfrom2to(n):
    """ Input n>=6, Returns a array of primes, 2 <= p < n """
    sieve = numpy.ones(n//3 + (n%6==2), dtype=bool)
    for i in range(1,int(n**0.5)//3+1):
        if sieve[i]:
            k=3*i+1|1
            sieve[       k*k//3     ::2*k] = False
            sieve[k*(k-2*(i&1)+4)//3::2*k] = False
    return numpy.r_[2,3,((3*numpy.nonzero(sieve)[0][1:]+1)|1)]

A (hard-to-code) pure-python version of the above code would be:

def primes2(n):
    """ Input n>=6, Returns a list of primes, 2 <= p < n """
    n, correction = n-n%6+6, 2-(n%6>1)
    sieve = [True] * (n//3)
    for i in range(1,int(n**0.5)//3+1):
      if sieve[i]:
        k=3*i+1|1
        sieve[      k*k//3      ::2*k] = [False] * ((n//6-k*k//6-1)//k+1)
        sieve[k*(k-2*(i&1)+4)//3::2*k] = [False] * ((n//6-k*(k-2*(i&1)+4)//6-1)//k+1)
    return [2,3] + [3*i+1|1 for i in range(1,n//3-correction) if sieve[i]]

Unfortunately pure-python don't adopt the simpler and faster numpy way of doing assignment, and calling len() inside the loop as in [False]*len(sieve[((k*k)//3)::2*k]) is too slow. So I had to improvise to correct input (& avoid more math) and do some extreme (& painful) math-magic.

Personally I think it is a shame that numpy (which is so widely used) is not part of Python standard library, and that the improvements in syntax and speed seem to be completely overlooked by Python developers.

Pruchno answered 14/1, 2010 at 23:40 Comment(8)
Numpy is now compatible with Python 3. The fact that it's not in the standard library is good, that way they can have their own release cycle.Sneaker
to just store binary values in an array i suggest bitarray - as used here (for the simplest prime sieve; not a contender in the race here!) #31121486Airless
When casting in the primesfrom2to() method, should the division be inside of the brackets?Vedic
For a pure python version compatible with python 3, follow this link : https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-nLazarolazaruk
FWIW, the version in your 1st code block is the topic of this question. I should also mention that I used a Python 3 version of your code here.Homemaker
Holy buttsnacks this sucker is fast.Falciform
How about numba versions of these algos? For motivation, here I use numba to implement an older version of Atkin's sieve. It takes 2.5ms to get the primes up to 10**6 and 10.6s up to 10**9.Hallah
@PierreD that's as fast as my c++ implementation of Atkin's segmented sieve. But in c++ I currently use min_25 sieve which can sieve up to 10**9 in less than 1 sec. Can it be made avaiable in python?Zaporozhye
J
43

There's a pretty neat sample from the Python Cookbook here -- the fastest version proposed on that URL is:

import itertools
def erat2( ):
    D = {  }
    yield 2
    for q in itertools.islice(itertools.count(3), 0, None, 2):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = p + q
            while x in D or not (x&1):
                x += p
            D[x] = p

so that would give

def get_primes_erat(n):
  return list(itertools.takewhile(lambda p: p<n, erat2()))

Measuring at the shell prompt (as I prefer to do) with this code in pri.py, I observe:

$ python2.5 -mtimeit -s'import pri' 'pri.get_primes(1000000)'
10 loops, best of 3: 1.69 sec per loop
$ python2.5 -mtimeit -s'import pri' 'pri.get_primes_erat(1000000)'
10 loops, best of 3: 673 msec per loop

so it looks like the Cookbook solution is over twice as fast.

Jolley answered 14/1, 2010 at 23:52 Comment(4)
@jbochi, you're welcome -- but do look at that URL, including the credits: it took ten of us to collectively refine the code to this point, including Python-performance luminaries such as Tim Peters and Raymond Hettinger (I wrote the final text of the recipe since I edited the printed Cookbook, but in terms of coding my contribution was on a par with the others') -- in the end, it's really subtle and finely tuned code, and that's not surprising!-)Jolley
@Alex: Knowing that your code is "only" twice as fast as mine, makes me pretty proud then. :) The URL was also very interesting to read. Thanks again.Georgiageorgian
And it can be made even faster with a minor change: see #2212490Mellophone
... And it can be made yet faster with additional ~1.2x-1.3x speedup, drastic reduction in memory footprint from O(n) to O(sqrt(n)) and improvement in empirical time complexity, by postponing the addition of primes to the dict until their square is seen in the input. Test it here.Meingolda
G
29

Using Sundaram's Sieve, I think I broke pure-Python's record:

def sundaram3(max_n):
    numbers = range(3, max_n+1, 2)
    half = (max_n)//2
    initial = 4

    for step in xrange(3, max_n+1, 2):
        for i in xrange(initial, half, step):
            numbers[i-1] = 0
        initial += 2*(step+1)

        if initial > half:
            return [2] + filter(None, numbers)

Comparasion:

C:\USERS>python -m timeit -n10 -s "import get_primes" "get_primes.get_primes_erat(1000000)"
10 loops, best of 3: 710 msec per loop

C:\USERS>python -m timeit -n10 -s "import get_primes" "get_primes.daniel_sieve_2(1000000)"
10 loops, best of 3: 435 msec per loop

C:\USERS>python -m timeit -n10 -s "import get_primes" "get_primes.sundaram3(1000000)"
10 loops, best of 3: 327 msec per loop
Georgiageorgian answered 15/1, 2010 at 16:50 Comment(4)
I managed to speed up your function about 20% by adding "zero = 0" at the top of the function and then replacing the lambda in your filter with "zero.__sub__". Not the prettiest code in the world, but a bit faster :)Shenika
@truppo: Thanks for your comment! I just realized that passing None instead of the original function works and it's even faster than zero.__sub__Georgiageorgian
Did you know that if you pass sundaram3(9) it will return [2, 3, 5, 7, 9]? It seems to do this with numerous -- perhaps all -- odd numbers (even when they aren't prime)Pone
it has an issue: sundaram3(7071) includes 7071 while it is not primeReactive
M
22

The algorithm is fast, but it has a serious flaw:

>>> sorted(get_primes(530))
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163,
167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251,
257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349,
353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443,
449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 527, 529]
>>> 17*31
527
>>> 23*23
529

You assume that numbers.pop() would return the smallest number in the set, but this is not guaranteed at all. Sets are unordered and pop() removes and returns an arbitrary element, so it cannot be used to select the next prime from the remaining numbers.

Minoru answered 15/1, 2010 at 0:14 Comment(0)
B
19

For truly fastest solution with sufficiently large N would be to download a pre-calculated list of primes, store it as a tuple and do something like:

for pos,i in enumerate(primes):
    if i > N:
        print primes[:pos]

If N > primes[-1] only then calculate more primes and save the new list in your code, so next time it is equally as fast.

Always think outside the box.

Blenny answered 15/1, 2010 at 7:58 Comment(4)
To be fair, though, you'd have to count the time downloading, unzipping, and formatting the primes and compare that with the time to generate primes using an algorithm - any one of these algorithms could easily write the results to a file for later use. I think in that case, given enough memory to actually calculate all primes less than 982,451,653, the numpy solution would still be faster.Maite
@Daniel correct. However the store what you have and continue whenever needed still stands...Blenny
@Daniel G I think download time is irrelevant. Isn't it really about generating the numbers, so you would want to take into account the algorithm used to create that list you're downloading. And any time complexity would ignore the once of file transfer given it O(n).Somato
The FAQ for the UTM prime page suggests calculating small primes is faster than reading them off a disk (the question is what small means).Sapsucker
A
15

If you don't want to reinvent the wheel, you can install the symbolic maths library sympy (yes it's Python 3 compatible)

pip install sympy

And use the primerange function

from sympy import sieve
primes = list(sieve.primerange(1, 10**6))
Ahmadahmar answered 4/7, 2015 at 14:6 Comment(1)
I notice this prints the whole list, whereas from the community wiki answer primesfrom2to(10000) returns [ 2 3 5 ... 9949 9967 9973]. Is that shortening a NumPy nd.array thing?Darkness
R
11

If you accept itertools but not numpy, here is an adaptation of rwh_primes2 for Python 3 that runs about twice as fast on my machine. The only substantial change is using a bytearray instead of a list for the boolean, and using compress instead of a list comprehension to build the final list. (I'd add this as a comment like moarningsun if I were able.)

import itertools
izip = itertools.zip_longest
chain = itertools.chain.from_iterable
compress = itertools.compress
def rwh_primes2_python3(n):
    """ Input n>=6, Returns a list of primes, 2 <= p < n """
    zero = bytearray([False])
    size = n//3 + (n % 6 == 2)
    sieve = bytearray([True]) * size
    sieve[0] = False
    for i in range(int(n**0.5)//3+1):
      if sieve[i]:
        k=3*i+1|1
        start = (k*k+4*k-2*k*(i&1))//3
        sieve[(k*k)//3::2*k]=zero*((size - (k*k)//3 - 1) // (2 * k) + 1)
        sieve[  start ::2*k]=zero*((size -   start  - 1) // (2 * k) + 1)
    ans = [2,3]
    poss = chain(izip(*[range(i, n, 6) for i in (1,5)]))
    ans.extend(compress(poss, sieve))
    return ans

Comparisons:

>>> timeit.timeit('primes.rwh_primes2(10**6)', setup='import primes', number=1)
0.0652179726976101
>>> timeit.timeit('primes.rwh_primes2_python3(10**6)', setup='import primes', number=1)
0.03267321276325674

and

>>> timeit.timeit('primes.rwh_primes2(10**8)', setup='import primes', number=1)
6.394284538007014
>>> timeit.timeit('primes.rwh_primes2_python3(10**8)', setup='import primes', number=1)
3.833829450302801
Restaurant answered 26/10, 2015 at 21:51 Comment(0)
S
9

Here is two updated (pure Python 3.6) versions of one of the fastest functions,

from itertools import compress

def rwh_primes1v1(n):
    """ Returns  a list of primes < n for n > 2 """
    sieve = bytearray([True]) * (n//2)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = bytearray((n-i*i-1)//(2*i)+1)
    return [2,*compress(range(3,n,2), sieve[1:])]

def rwh_primes1v2(n):
    """ Returns a list of primes < n for n > 2 """
    sieve = bytearray([True]) * (n//2+1)
    for i in range(1,int(n**0.5)//2+1):
        if sieve[i]:
            sieve[2*i*(i+1)::2*i+1] = bytearray((n//2-2*i*(i+1))//(2*i+1)+1)
    return [2,*compress(range(3,n,2), sieve[1:])]
Spondee answered 8/10, 2017 at 19:35 Comment(1)
In Python 3 I used this function https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n but replaced / with // and xrange with range and they seemed much faster than this.Superfamily
C
9

I've updated much of the code for Python 3 and threw it at perfplot (a project of mine) to see which is actually fastest. Turns out that, for large n, primesfrom{2,3}to takes the cake:

enter image description here


Code to reproduce the plot:

import perfplot
from math import sqrt, ceil
import numpy as np
import sympy


def rwh_primes(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Returns  a list of primes < n """
    sieve = [True] * n
    for i in range(3, int(n ** 0.5) + 1, 2):
        if sieve[i]:
            sieve[i * i::2 * i] = [False] * ((n - i * i - 1) // (2 * i) + 1)
    return [2] + [i for i in range(3, n, 2) if sieve[i]]


def rwh_primes1(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Returns  a list of primes < n """
    sieve = [True] * (n // 2)
    for i in range(3, int(n ** 0.5) + 1, 2):
        if sieve[i // 2]:
            sieve[i * i // 2::i] = [False] * ((n - i * i - 1) // (2 * i) + 1)
    return [2] + [2 * i + 1 for i in range(1, n // 2) if sieve[i]]


def rwh_primes2(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """Input n>=6, Returns a list of primes, 2 <= p < n"""
    assert n >= 6
    correction = n % 6 > 1
    n = {0: n, 1: n - 1, 2: n + 4, 3: n + 3, 4: n + 2, 5: n + 1}[n % 6]
    sieve = [True] * (n // 3)
    sieve[0] = False
    for i in range(int(n ** 0.5) // 3 + 1):
        if sieve[i]:
            k = 3 * i + 1 | 1
            sieve[((k * k) // 3)::2 * k] = [False] * (
                (n // 6 - (k * k) // 6 - 1) // k + 1
            )
            sieve[(k * k + 4 * k - 2 * k * (i & 1)) // 3::2 * k] = [False] * (
                (n // 6 - (k * k + 4 * k - 2 * k * (i & 1)) // 6 - 1) // k + 1
            )
    return [2, 3] + [3 * i + 1 | 1 for i in range(1, n // 3 - correction) if sieve[i]]


def sieve_wheel_30(N):
    # http://zerovolt.com/?p=88
    """ Returns a list of primes <= N using wheel criterion 2*3*5 = 30

Copyright 2009 by zerovolt.com
This code is free for non-commercial purposes, in which case you can just leave this comment as a credit for my work.
If you need this code for commercial purposes, please contact me by sending an email to: info [at] zerovolt [dot] com."""
    __smallp = (
        2,
        3,
        5,
        7,
        11,
        13,
        17,
        19,
        23,
        29,
        31,
        37,
        41,
        43,
        47,
        53,
        59,
        61,
        67,
        71,
        73,
        79,
        83,
        89,
        97,
        101,
        103,
        107,
        109,
        113,
        127,
        131,
        137,
        139,
        149,
        151,
        157,
        163,
        167,
        173,
        179,
        181,
        191,
        193,
        197,
        199,
        211,
        223,
        227,
        229,
        233,
        239,
        241,
        251,
        257,
        263,
        269,
        271,
        277,
        281,
        283,
        293,
        307,
        311,
        313,
        317,
        331,
        337,
        347,
        349,
        353,
        359,
        367,
        373,
        379,
        383,
        389,
        397,
        401,
        409,
        419,
        421,
        431,
        433,
        439,
        443,
        449,
        457,
        461,
        463,
        467,
        479,
        487,
        491,
        499,
        503,
        509,
        521,
        523,
        541,
        547,
        557,
        563,
        569,
        571,
        577,
        587,
        593,
        599,
        601,
        607,
        613,
        617,
        619,
        631,
        641,
        643,
        647,
        653,
        659,
        661,
        673,
        677,
        683,
        691,
        701,
        709,
        719,
        727,
        733,
        739,
        743,
        751,
        757,
        761,
        769,
        773,
        787,
        797,
        809,
        811,
        821,
        823,
        827,
        829,
        839,
        853,
        857,
        859,
        863,
        877,
        881,
        883,
        887,
        907,
        911,
        919,
        929,
        937,
        941,
        947,
        953,
        967,
        971,
        977,
        983,
        991,
        997,
    )
    # wheel = (2, 3, 5)
    const = 30
    if N < 2:
        return []
    if N <= const:
        pos = 0
        while __smallp[pos] <= N:
            pos += 1
        return list(__smallp[:pos])
    # make the offsets list
    offsets = (7, 11, 13, 17, 19, 23, 29, 1)
    # prepare the list
    p = [2, 3, 5]
    dim = 2 + N // const
    tk1 = [True] * dim
    tk7 = [True] * dim
    tk11 = [True] * dim
    tk13 = [True] * dim
    tk17 = [True] * dim
    tk19 = [True] * dim
    tk23 = [True] * dim
    tk29 = [True] * dim
    tk1[0] = False
    # help dictionary d
    # d[a , b] = c  ==> if I want to find the smallest useful multiple of (30*pos)+a
    # on tkc, then I need the index given by the product of [(30*pos)+a][(30*pos)+b]
    # in general. If b < a, I need [(30*pos)+a][(30*(pos+1))+b]
    d = {}
    for x in offsets:
        for y in offsets:
            res = (x * y) % const
            if res in offsets:
                d[(x, res)] = y
    # another help dictionary: gives tkx calling tmptk[x]
    tmptk = {1: tk1, 7: tk7, 11: tk11, 13: tk13, 17: tk17, 19: tk19, 23: tk23, 29: tk29}
    pos, prime, lastadded, stop = 0, 0, 0, int(ceil(sqrt(N)))

    # inner functions definition
    def del_mult(tk, start, step):
        for k in range(start, len(tk), step):
            tk[k] = False

    # end of inner functions definition
    cpos = const * pos
    while prime < stop:
        # 30k + 7
        if tk7[pos]:
            prime = cpos + 7
            p.append(prime)
            lastadded = 7
            for off in offsets:
                tmp = d[(7, off)]
                start = (
                    (pos + prime)
                    if off == 7
                    else (prime * (const * (pos + 1 if tmp < 7 else 0) + tmp)) // const
                )
                del_mult(tmptk[off], start, prime)
        # 30k + 11
        if tk11[pos]:
            prime = cpos + 11
            p.append(prime)
            lastadded = 11
            for off in offsets:
                tmp = d[(11, off)]
                start = (
                    (pos + prime)
                    if off == 11
                    else (prime * (const * (pos + 1 if tmp < 11 else 0) + tmp)) // const
                )
                del_mult(tmptk[off], start, prime)
        # 30k + 13
        if tk13[pos]:
            prime = cpos + 13
            p.append(prime)
            lastadded = 13
            for off in offsets:
                tmp = d[(13, off)]
                start = (
                    (pos + prime)
                    if off == 13
                    else (prime * (const * (pos + 1 if tmp < 13 else 0) + tmp)) // const
                )
                del_mult(tmptk[off], start, prime)
        # 30k + 17
        if tk17[pos]:
            prime = cpos + 17
            p.append(prime)
            lastadded = 17
            for off in offsets:
                tmp = d[(17, off)]
                start = (
                    (pos + prime)
                    if off == 17
                    else (prime * (const * (pos + 1 if tmp < 17 else 0) + tmp)) // const
                )
                del_mult(tmptk[off], start, prime)
        # 30k + 19
        if tk19[pos]:
            prime = cpos + 19
            p.append(prime)
            lastadded = 19
            for off in offsets:
                tmp = d[(19, off)]
                start = (
                    (pos + prime)
                    if off == 19
                    else (prime * (const * (pos + 1 if tmp < 19 else 0) + tmp)) // const
                )
                del_mult(tmptk[off], start, prime)
        # 30k + 23
        if tk23[pos]:
            prime = cpos + 23
            p.append(prime)
            lastadded = 23
            for off in offsets:
                tmp = d[(23, off)]
                start = (
                    (pos + prime)
                    if off == 23
                    else (prime * (const * (pos + 1 if tmp < 23 else 0) + tmp)) // const
                )
                del_mult(tmptk[off], start, prime)
        # 30k + 29
        if tk29[pos]:
            prime = cpos + 29
            p.append(prime)
            lastadded = 29
            for off in offsets:
                tmp = d[(29, off)]
                start = (
                    (pos + prime)
                    if off == 29
                    else (prime * (const * (pos + 1 if tmp < 29 else 0) + tmp)) // const
                )
                del_mult(tmptk[off], start, prime)
        # now we go back to top tk1, so we need to increase pos by 1
        pos += 1
        cpos = const * pos
        # 30k + 1
        if tk1[pos]:
            prime = cpos + 1
            p.append(prime)
            lastadded = 1
            for off in offsets:
                tmp = d[(1, off)]
                start = (
                    (pos + prime)
                    if off == 1
                    else (prime * (const * pos + tmp)) // const
                )
                del_mult(tmptk[off], start, prime)
    # time to add remaining primes
    # if lastadded == 1, remove last element and start adding them from tk1
    # this way we don't need an "if" within the last while
    if lastadded == 1:
        p.pop()
    # now complete for every other possible prime
    while pos < len(tk1):
        cpos = const * pos
        if tk1[pos]:
            p.append(cpos + 1)
        if tk7[pos]:
            p.append(cpos + 7)
        if tk11[pos]:
            p.append(cpos + 11)
        if tk13[pos]:
            p.append(cpos + 13)
        if tk17[pos]:
            p.append(cpos + 17)
        if tk19[pos]:
            p.append(cpos + 19)
        if tk23[pos]:
            p.append(cpos + 23)
        if tk29[pos]:
            p.append(cpos + 29)
        pos += 1
    # remove exceeding if present
    pos = len(p) - 1
    while p[pos] > N:
        pos -= 1
    if pos < len(p) - 1:
        del p[pos + 1 :]
    # return p list
    return p


def sieve_of_eratosthenes(n):
    """sieveOfEratosthenes(n): return the list of the primes < n."""
    # Code from: <[email protected]>, Nov 30 2006
    # http://groups.google.com/group/comp.lang.python/msg/f1f10ced88c68c2d
    if n <= 2:
        return []
    sieve = list(range(3, n, 2))
    top = len(sieve)
    for si in sieve:
        if si:
            bottom = (si * si - 3) // 2
            if bottom >= top:
                break
            sieve[bottom::si] = [0] * -((bottom - top) // si)
    return [2] + [el for el in sieve if el]


def sieve_of_atkin(end):
    """return a list of all the prime numbers <end using the Sieve of Atkin."""
    # Code by Steve Krenzel, <[email protected]>, improved
    # Code: https://web.archive.org/web/20080324064651/http://krenzel.info/?p=83
    # Info: http://en.wikipedia.org/wiki/Sieve_of_Atkin
    assert end > 0
    lng = (end - 1) // 2
    sieve = [False] * (lng + 1)

    x_max, x2, xd = int(sqrt((end - 1) / 4.0)), 0, 4
    for xd in range(4, 8 * x_max + 2, 8):
        x2 += xd
        y_max = int(sqrt(end - x2))
        n, n_diff = x2 + y_max * y_max, (y_max << 1) - 1
        if not (n & 1):
            n -= n_diff
            n_diff -= 2
        for d in range((n_diff - 1) << 1, -1, -8):
            m = n % 12
            if m == 1 or m == 5:
                m = n >> 1
                sieve[m] = not sieve[m]
            n -= d

    x_max, x2, xd = int(sqrt((end - 1) / 3.0)), 0, 3
    for xd in range(3, 6 * x_max + 2, 6):
        x2 += xd
        y_max = int(sqrt(end - x2))
        n, n_diff = x2 + y_max * y_max, (y_max << 1) - 1
        if not (n & 1):
            n -= n_diff
            n_diff -= 2
        for d in range((n_diff - 1) << 1, -1, -8):
            if n % 12 == 7:
                m = n >> 1
                sieve[m] = not sieve[m]
            n -= d

    x_max, y_min, x2, xd = int((2 + sqrt(4 - 8 * (1 - end))) / 4), -1, 0, 3
    for x in range(1, x_max + 1):
        x2 += xd
        xd += 6
        if x2 >= end:
            y_min = (((int(ceil(sqrt(x2 - end))) - 1) << 1) - 2) << 1
        n, n_diff = ((x * x + x) << 1) - 1, (((x - 1) << 1) - 2) << 1
        for d in range(n_diff, y_min, -8):
            if n % 12 == 11:
                m = n >> 1
                sieve[m] = not sieve[m]
            n += d

    primes = [2, 3]
    if end <= 3:
        return primes[: max(0, end - 2)]

    for n in range(5 >> 1, (int(sqrt(end)) + 1) >> 1):
        if sieve[n]:
            primes.append((n << 1) + 1)
            aux = (n << 1) + 1
            aux *= aux
            for k in range(aux, end, 2 * aux):
                sieve[k >> 1] = False

    s = int(sqrt(end)) + 1
    if s % 2 == 0:
        s += 1
    primes.extend([i for i in range(s, end, 2) if sieve[i >> 1]])

    return primes


def ambi_sieve_plain(n):
    s = list(range(3, n, 2))
    for m in range(3, int(n ** 0.5) + 1, 2):
        if s[(m - 3) // 2]:
            for t in range((m * m - 3) // 2, (n >> 1) - 1, m):
                s[t] = 0
    return [2] + [t for t in s if t > 0]


def sundaram3(max_n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/2073279#2073279
    numbers = range(3, max_n + 1, 2)
    half = (max_n) // 2
    initial = 4

    for step in range(3, max_n + 1, 2):
        for i in range(initial, half, step):
            numbers[i - 1] = 0
        initial += 2 * (step + 1)

        if initial > half:
            return [2] + filter(None, numbers)


# Using Numpy:
def ambi_sieve(n):
    # http://tommih.blogspot.com/2009/04/fast-prime-number-generator.html
    s = np.arange(3, n, 2)
    for m in range(3, int(n ** 0.5) + 1, 2):
        if s[(m - 3) // 2]:
            s[(m * m - 3) // 2::m] = 0
    return np.r_[2, s[s > 0]]


def primesfrom3to(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Returns an array of primes, p < n """
    assert n >= 2
    sieve = np.ones(n // 2, dtype=bool)
    for i in range(3, int(n ** 0.5) + 1, 2):
        if sieve[i // 2]:
            sieve[i * i // 2::i] = False
    return np.r_[2, 2 * np.nonzero(sieve)[0][1::] + 1]


def primesfrom2to(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Input n>=6, Returns an array of primes, 2 <= p < n """
    assert n >= 6
    sieve = np.ones(n // 3 + (n % 6 == 2), dtype=bool)
    sieve[0] = False
    for i in range(int(n ** 0.5) // 3 + 1):
        if sieve[i]:
            k = 3 * i + 1 | 1
            sieve[((k * k) // 3)::2 * k] = False
            sieve[(k * k + 4 * k - 2 * k * (i & 1)) // 3::2 * k] = False
    return np.r_[2, 3, ((3 * np.nonzero(sieve)[0] + 1) | 1)]


def sympy_sieve(n):
    return list(sympy.sieve.primerange(1, n))


b = perfplot.bench(
    setup=lambda n: n,
    kernels=[
        rwh_primes,
        rwh_primes1,
        rwh_primes2,
        sieve_wheel_30,
        sieve_of_eratosthenes,
        sieve_of_atkin,
        # ambi_sieve_plain,
        # sundaram3,
        ambi_sieve,
        primesfrom3to,
        primesfrom2to,
        sympy_sieve,
    ],
    n_range=[2 ** k for k in range(3, 25)],
    xlabel="n",
)
b.save("out.png")
b.show()
Colloquialism answered 21/11, 2019 at 21:15 Comment(2)
mmm, log-log plots... :)Meingolda
now try it with pypy!Sheepshanks
A
8

It's instructive to write your own prime finding code, but it's also useful to have a fast reliable library at hand. I wrote a wrapper around the C++ library primesieve, named it primesieve-python

Try it pip install primesieve

import primesieve
primes = primesieve.generate_primes(10**8)

I'd be curious to see the speed compared.

Ahmadahmar answered 9/7, 2015 at 15:50 Comment(3)
It's not exactly what OP ordered but I fail to see why the downvote. It's a 2.8sec solution unlike some other outside modules. I've noticed in the source that it's threaded, got any tests on how well it scales?Garygarza
@Garygarza cheers. The bottleneck seems to be copying C++ vector to Python list, thus the count_primes function is much faster than generate_primesAhmadahmar
On my computer it can comfortably generate primes up to 1e8 (it gives MemoryError for 1e9) , and count primes up to 1e10. @HappyLeapSecond above compares algorithms for 1e6Ahmadahmar
J
4

A deterministic implementation of Miller-Rabin's Primality test on the assumption that N < 9,080,191

import sys

def miller_rabin_pass(a, n):
    d = n - 1
    s = 0
    while d % 2 == 0:
        d >>= 1
        s += 1

    a_to_power = pow(a, d, n)
    if a_to_power == 1:
        return True
    for i in range(s-1):
        if a_to_power == n - 1:
            return True
        a_to_power = (a_to_power * a_to_power) % n
    return a_to_power == n - 1


def miller_rabin(n):
    if n <= 2:
        return n == 2

    if n < 2_047:
        return miller_rabin_pass(2, n)

    return all(miller_rabin_pass(a, n) for a in (31, 73))


n = int(sys.argv[1])
primes = [2]
for p in range(3,n,2):
  if miller_rabin(p):
    primes.append(p)
print len(primes)

According to the article on Wikipedia (http://en.wikipedia.org/wiki/Miller–Rabin_primality_test) testing N < 9,080,191 for a = 31 and 73 is enough to decide whether N is composite or not.

And I adapted the source code from the probabilistic implementation of original Miller-Rabin's test found here: https://www.literateprograms.org/miller-rabin_primality_test__python_.html

Joe answered 21/1, 2010 at 0:17 Comment(4)
Thank's for the Miller-Rabin primality test, but this code is actually slower and is not providing the correct results. 37 is prime and does not pass the test.Georgiageorgian
I guess 37 is one of the special cases, my bad. I was hopeful about the deterministic version though :)Joe
There isn't any special case for rabin miller.Malinowski
You misread the article. It is 31, not 37. This is why your implementation fails.Talithatalk
L
4

If you have control over N, the very fastest way to list all primes is to precompute them. Seriously. Precomputing is a way overlooked optimization.

Letterperfect answered 21/1, 2010 at 0:21 Comment(1)
Or download them from here primes.utm.edu/lists/small/millions, but the idea is to test python's limit and see if beautiful code emerge from optimization.Georgiageorgian
R
4

Here's the code I normally use to generate primes in Python:

$ python -mtimeit -s'import sieve' 'sieve.sieve(1000000)' 
10 loops, best of 3: 445 msec per loop
$ cat sieve.py
from math import sqrt

def sieve(size):
 prime=[True]*size
 rng=xrange
 limit=int(sqrt(size))

 for i in rng(3,limit+1,+2):
  if prime[i]:
   prime[i*i::+i]=[False]*len(prime[i*i::+i])

 return [2]+[i for i in rng(3,size,+2) if prime[i]]

if __name__=='__main__':
 print sieve(100)

It can't compete with the faster solutions posted here, but at least it is pure python.

Thanks for posting this question. I really learnt a lot today.

Recreant answered 21/1, 2010 at 16:9 Comment(0)
M
4

A slightly different implementation of a half sieve using Numpy:

http://rebrained.com/?p=458

import math
import numpy
def prime6(upto):
    primes=numpy.arange(3,upto+1,2)
    isprime=numpy.ones((upto-1)/2,dtype=bool)
    for factor in primes[:int(math.sqrt(upto))]:
        if isprime[(factor-2)/2]: isprime[(factor*3-2)/2:(upto-1)/2:factor]=0
    return numpy.insert(primes[isprime],0,2)

Can someone compare this with the other timings? On my machine it seems pretty comparable to the other Numpy half-sieve.

Mopboard answered 3/9, 2010 at 18:37 Comment(1)
upto=10**6: primesfrom2to() - 7 ms; prime6() - 12 ms ideone.com/oDg2YIngot
O
4

It's all written and tested. So there is no need to reinvent the wheel.

python -m timeit -r10 -s"from sympy import sieve" "primes = list(sieve.primerange(1, 10**6))"

gives us a record breaking 12.2 msec!

10 loops, best of 10: 12.2 msec per loop

If this is not fast enough, you can try PyPy:

pypy -m timeit -r10 -s"from sympy import sieve" "primes = list(sieve.primerange(1, 10**6))"

which results in:

10 loops, best of 10: 2.03 msec per loop

The answer with 247 up-votes lists 15.9 ms for the best solution. Compare this!!!

Orphanage answered 5/10, 2016 at 16:48 Comment(0)
E
4

Fastest prime sieve in Pure Python:

from itertools import compress

def half_sieve(n):
    """
    Returns a list of prime numbers less than `n`.
    """
    if n <= 2:
        return []
    sieve = bytearray([True]) * (n // 2)
    for i in range(3, int(n ** 0.5) + 1, 2):
        if sieve[i // 2]:
            sieve[i * i // 2::i] = bytearray((n - i * i - 1) // (2 * i) + 1)
    primes = list(compress(range(1, n, 2), sieve))
    primes[0] = 2
    return primes

I optimised Sieve of Eratosthenes for speed and memory.

Benchmark

from time import clock
import platform

def benchmark(iterations, limit):
    start = clock()
    for x in range(iterations):
        half_sieve(limit)
    end = clock() - start
    print(f'{end/iterations:.4f} seconds for primes < {limit}')

if __name__ == '__main__':
    print(platform.python_version())
    print(platform.platform())
    print(platform.processor())
    it = 10
    for pw in range(4, 9):
        benchmark(it, 10**pw)

Output

>>> 3.6.7
>>> Windows-10-10.0.17763-SP0
>>> Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
>>> 0.0003 seconds for primes < 10000
>>> 0.0021 seconds for primes < 100000
>>> 0.0204 seconds for primes < 1000000
>>> 0.2389 seconds for primes < 10000000
>>> 2.6702 seconds for primes < 100000000
Earlineearls answered 24/5, 2019 at 1:51 Comment(0)
M
3

For the fastest code, the numpy solution is the best. For purely academic reasons, though, I'm posting my pure python version, which is a bit less than 50% faster than the cookbook version posted above. Since I make the entire list in memory, you need enough space to hold everything, but it seems to scale fairly well.

def daniel_sieve_2(maxNumber):
    """
    Given a number, returns all numbers less than or equal to
    that number which are prime.
    """
    allNumbers = range(3, maxNumber+1, 2)
    for mIndex, number in enumerate(xrange(3, maxNumber+1, 2)):
        if allNumbers[mIndex] == 0:
            continue
        # now set all multiples to 0
        for index in xrange(mIndex+number, (maxNumber-3)/2+1, number):
            allNumbers[index] = 0
    return [2] + filter(lambda n: n!=0, allNumbers)

And the results:

>>>mine = timeit.Timer("daniel_sieve_2(1000000)",
...                    "from sieves import daniel_sieve_2")
>>>prev = timeit.Timer("get_primes_erat(1000000)",
...                    "from sieves import get_primes_erat")
>>>print "Mine: {0:0.4f} ms".format(min(mine.repeat(3, 1))*1000)
Mine: 428.9446 ms
>>>print "Previous Best {0:0.4f} ms".format(min(prev.repeat(3, 1))*1000)
Previous Best 621.3581 ms
Maite answered 15/1, 2010 at 7:41 Comment(0)
S
3

I know the competition is closed for some years. …

Nonetheless this is my suggestion for a pure python prime sieve, based on omitting the multiples of 2, 3 and 5 by using appropriate steps while processing the sieve forward. Nonetheless it is actually slower for N<10^9 than @Robert William Hanks superior solutions rwh_primes2 and rwh_primes1. By using a ctypes.c_ushort sieve array above 1.5* 10^8 it is somehow adaptive to memory limits.

10^6

$ python -mtimeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.primeSieveSeq(1000000)" 10 loops, best of 3: 46.7 msec per loop

to compare:$ python -mtimeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.rwh_primes1(1000000)" 10 loops, best of 3: 43.2 msec per loop to compare: $ python -m timeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.rwh_primes2(1000000)" 10 loops, best of 3: 34.5 msec per loop

10^7

$ python -mtimeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.primeSieveSeq(10000000)" 10 loops, best of 3: 530 msec per loop

to compare:$ python -mtimeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.rwh_primes1(10000000)" 10 loops, best of 3: 494 msec per loop to compare: $ python -m timeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.rwh_primes2(10000000)" 10 loops, best of 3: 375 msec per loop

10^8

$ python -mtimeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.primeSieveSeq(100000000)" 10 loops, best of 3: 5.55 sec per loop

to compare: $ python -mtimeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.rwh_primes1(100000000)" 10 loops, best of 3: 5.33 sec per loop to compare: $ python -m timeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.rwh_primes2(100000000)" 10 loops, best of 3: 3.95 sec per loop

10^9

$ python -mtimeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.primeSieveSeq(1000000000)" 10 loops, best of 3: 61.2 sec per loop

to compare: $ python -mtimeit -n 3 -s"import primeSieveSpeedComp" "primeSieveSpeedComp.rwh_primes1(1000000000)" 3 loops, best of 3: 97.8 sec per loop

to compare: $ python -m timeit -s"import primeSieveSpeedComp" "primeSieveSpeedComp.rwh_primes2(1000000000)" 10 loops, best of 3: 41.9 sec per loop

You may copy the code below into ubuntus primeSieveSpeedComp to review this tests.

def primeSieveSeq(MAX_Int):
    if MAX_Int > 5*10**8:
        import ctypes
        int16Array = ctypes.c_ushort * (MAX_Int >> 1)
        sieve = int16Array()
        #print 'uses ctypes "unsigned short int Array"'
    else:
        sieve = (MAX_Int >> 1) * [False]
        #print 'uses python list() of long long int'
    if MAX_Int < 10**8:
        sieve[4::3] = [True]*((MAX_Int - 8)/6+1)
        sieve[12::5] = [True]*((MAX_Int - 24)/10+1)
    r = [2, 3, 5]
    n = 0
    for i in xrange(int(MAX_Int**0.5)/30+1):
        n += 3
        if not sieve[n]:
            n2 = (n << 1) + 1
            r.append(n2)
            n2q = (n2**2) >> 1
            sieve[n2q::n2] = [True]*(((MAX_Int >> 1) - n2q - 1) / n2 + 1)
        n += 2
        if not sieve[n]:
            n2 = (n << 1) + 1
            r.append(n2)
            n2q = (n2**2) >> 1
            sieve[n2q::n2] = [True]*(((MAX_Int >> 1) - n2q - 1) / n2 + 1)
        n += 1
        if not sieve[n]:
            n2 = (n << 1) + 1
            r.append(n2)
            n2q = (n2**2) >> 1
            sieve[n2q::n2] = [True]*(((MAX_Int >> 1) - n2q - 1) / n2 + 1)
        n += 2
        if not sieve[n]:
            n2 = (n << 1) + 1
            r.append(n2)
            n2q = (n2**2) >> 1
            sieve[n2q::n2] = [True]*(((MAX_Int >> 1) - n2q - 1) / n2 + 1)
        n += 1
        if not sieve[n]:
            n2 = (n << 1) + 1
            r.append(n2)
            n2q = (n2**2) >> 1
            sieve[n2q::n2] = [True]*(((MAX_Int >> 1) - n2q - 1) / n2 + 1)
        n += 2
        if not sieve[n]:
            n2 = (n << 1) + 1
            r.append(n2)
            n2q = (n2**2) >> 1
            sieve[n2q::n2] = [True]*(((MAX_Int >> 1) - n2q - 1) / n2 + 1)
        n += 3
        if not sieve[n]:
            n2 = (n << 1) + 1
            r.append(n2)
            n2q = (n2**2) >> 1
            sieve[n2q::n2] = [True]*(((MAX_Int >> 1) - n2q - 1) / n2 + 1)
        n += 1
        if not sieve[n]:
            n2 = (n << 1) + 1
            r.append(n2)
            n2q = (n2**2) >> 1
            sieve[n2q::n2] = [True]*(((MAX_Int >> 1) - n2q - 1) / n2 + 1)
    if MAX_Int < 10**8:
        return [2, 3, 5]+[(p << 1) + 1 for p in [n for n in xrange(3, MAX_Int >> 1) if not sieve[n]]]
    n = n >> 1
    try:
        for i in xrange((MAX_Int-2*n)/30 + 1):
            n += 3
            if not sieve[n]:
                r.append((n << 1) + 1)
            n += 2
            if not sieve[n]:
                r.append((n << 1) + 1)
            n += 1
            if not sieve[n]:
                r.append((n << 1) + 1)
            n += 2
            if not sieve[n]:
                r.append((n << 1) + 1)
            n += 1
            if not sieve[n]:
                r.append((n << 1) + 1)
            n += 2
            if not sieve[n]:
                r.append((n << 1) + 1)
            n += 3
            if not sieve[n]:
                r.append((n << 1) + 1)
            n += 1
            if not sieve[n]:
                r.append((n << 1) + 1)
    except:
        pass
    return r
Sardella answered 22/9, 2013 at 13:35 Comment(2)
to visualize your test results, plot them on log-log scale, to see and compare the empirical orders of growth.Meingolda
@ Will thanks for the input, i'll have this in mind the next time i need such comparisonSardella
S
3

I tested some unutbu's functions, i computed it with hungred millions number

The winners are the functions that use numpy library,

Note: It would also interesting make a memory utilization test :)

Computation time result

Sample code

Complete code on my github repository

#!/usr/bin/env python

import lib
import timeit
import sys
import math
import datetime

import prettyplotlib as ppl
import numpy as np

import matplotlib.pyplot as plt
from prettyplotlib import brewer2mpl

primenumbers_gen = [
    'sieveOfEratosthenes',
    'ambi_sieve',
    'ambi_sieve_plain',
    'sundaram3',
    'sieve_wheel_30',
    'primesfrom3to',
    'primesfrom2to',
    'rwh_primes',
    'rwh_primes1',
    'rwh_primes2',
]

def human_format(num):
    # https://mcmap.net/q/87771/-formatting-long-numbers-as-strings?answertab=active#tab-top
    magnitude = 0
    while abs(num) >= 1000:
        magnitude += 1
        num /= 1000.0
    # add more suffixes if you need them
    return '%.2f%s' % (num, ['', 'K', 'M', 'G', 'T', 'P'][magnitude])


if __name__=='__main__':

    # Vars
    n = 10000000 # number itereration generator
    nbcol = 5 # For decompose prime number generator
    nb_benchloop = 3 # Eliminate false positive value during the test (bench average time)
    datetimeformat = '%Y-%m-%d %H:%M:%S.%f'
    config = 'from __main__ import n; import lib'
    primenumbers_gen = {
        'sieveOfEratosthenes': {'color': 'b'},
        'ambi_sieve': {'color': 'b'},
        'ambi_sieve_plain': {'color': 'b'},
         'sundaram3': {'color': 'b'},
        'sieve_wheel_30': {'color': 'b'},
# # #        'primesfrom2to': {'color': 'b'},
        'primesfrom3to': {'color': 'b'},
        # 'rwh_primes': {'color': 'b'},
        # 'rwh_primes1': {'color': 'b'},
        'rwh_primes2': {'color': 'b'},
    }


    # Get n in command line
    if len(sys.argv)>1:
        n = int(sys.argv[1])

    step = int(math.ceil(n / float(nbcol)))
    nbs = np.array([i * step for i in range(1, int(nbcol) + 1)])
    set2 = brewer2mpl.get_map('Paired', 'qualitative', 12).mpl_colors

    print datetime.datetime.now().strftime(datetimeformat)
    print("Compute prime number to %(n)s" % locals())
    print("")

    results = dict()
    for pgen in primenumbers_gen:
        results[pgen] = dict()
        benchtimes = list()
        for n in nbs:
            t = timeit.Timer("lib.%(pgen)s(n)" % locals(), setup=config)
            execute_times = t.repeat(repeat=nb_benchloop,number=1)
            benchtime = np.mean(execute_times)
            benchtimes.append(benchtime)
        results[pgen] = {'benchtimes':np.array(benchtimes)}

fig, ax = plt.subplots(1)
plt.ylabel('Computation time (in second)')
plt.xlabel('Numbers computed')
i = 0
for pgen in primenumbers_gen:

    bench = results[pgen]['benchtimes']
    avgs = np.divide(bench,nbs)
    avg = np.average(bench, weights=nbs)

    # Compute linear regression
    A = np.vstack([nbs, np.ones(len(nbs))]).T
    a, b = np.linalg.lstsq(A, nbs*avgs)[0]

    # Plot
    i += 1
    #label="%(pgen)s" % locals()
    #ppl.plot(nbs, nbs*avgs, label=label, lw=1, linestyle='--', color=set2[i % 12])
    label="%(pgen)s avg" % locals()
    ppl.plot(nbs, a * nbs + b, label=label, lw=2, color=set2[i % 12])
print datetime.datetime.now().strftime(datetimeformat)

ppl.legend(ax, loc='upper left', ncol=4)

# Change x axis label
ax.get_xaxis().get_major_formatter().set_scientific(False)
fig.canvas.draw()
labels = [human_format(int(item.get_text())) for item in ax.get_xticklabels()]

ax.set_xticklabels(labels)
ax = plt.gca()

plt.show()
Ssm answered 18/10, 2016 at 7:5 Comment(1)
to compare algorithmic performances, it's better to plot at a log-log scale.Meingolda
B
3

For Python 3

def rwh_primes2(n):
    correction = (n%6>1)
    n = {0:n,1:n-1,2:n+4,3:n+3,4:n+2,5:n+1}[n%6]
    sieve = [True] * (n//3)
    sieve[0] = False
    for i in range(int(n**0.5)//3+1):
      if sieve[i]:
        k=3*i+1|1
        sieve[      ((k*k)//3)      ::2*k]=[False]*((n//6-(k*k)//6-1)//k+1)
        sieve[(k*k+4*k-2*k*(i&1))//3::2*k]=[False]*((n//6-(k*k+4*k-2*k*(i&1))//6-1)//k+1)
    return [2,3] + [3*i+1|1 for i in range(1,n//3-correction) if sieve[i]]
Biotic answered 19/7, 2017 at 12:3 Comment(0)
T
3

The simplest way I've found of doing this is:

primes = []
for n in range(low, high + 1):
    if all(n % i for i in primes):
        primes.append(n)
Thayer answered 28/9, 2019 at 4:9 Comment(0)
I
3

I may have been slow to join the party, but I make up for it with the fastest code (at least it is on my machine, as of writing). This approach uses both numpy and bitarray, and is inspired by primesfrom2to from this answer.

import numpy as np
from bitarray import bitarray


def bit_primes(n):
    bit_sieve = bitarray(n // 3 + (n % 6 == 2))
    bit_sieve.setall(1)
    bit_sieve[0] = False

    for i in range(int(n ** 0.5) // 3 + 1):
        if bit_sieve[i]:
            k = 3 * i + 1 | 1
            bit_sieve[k * k // 3::2 * k] = False
            bit_sieve[(k * k + 4 * k - 2 * k * (i & 1)) // 3::2 * k] = False

    np_sieve = np.unpackbits(np.frombuffer(bit_sieve.tobytes(), dtype=np.uint8)).view(bool)
    return np.concatenate(((2, 3), ((3 * np.flatnonzero(np_sieve) + 1) | 1)))

Here's a comparison with primesfrom2to, which was previously found to be the fastest solution in unutbu's comparison:

python3 -m timeit -s "import fast_primes" "fast_primes.bit_primes(1000000)"
200 loops, best of 5: 1.19 msec per loop

python3 -m timeit -s "import fast_primes" "fast_primes.primesfrom2to(1000000)"
200 loops, best of 5: 1.23 msec per loop

For finding primes under 1 million, bit_primes was slightly faster. For larger values of n, the difference can be more significant. In some cases, bit_primes was over twice as fast:

python3 -m timeit -s "import fast_primes" "fast_primes.bit_primes(500_000_000)"
1 loop, best of 5: 540 msec per loop

python3 -m timeit -s "import fast_primes" "fast_primes.primesfrom2to(500_000_000)"
1 loop, best of 5: 1.15 sec per loop

For reference, here's the minimally modified (to work in Python 3) version of primesfrom2to I compared with:

def primesfrom2to(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Input n>=6, Returns a array of primes, 2 <= p < n"""
    sieve = np.ones(n // 3 + (n % 6 == 2), dtype=np.bool)
    sieve[0] = False
    for i in range(int(n ** 0.5) // 3 + 1):
        if sieve[i]:
            k = 3 * i + 1 | 1
            sieve[((k * k) // 3)::2 * k] = False
            sieve[(k * k + 4 * k - 2 * k * (i & 1)) // 3::2 * k] = False
    return np.r_[2, 3, ((3 * np.nonzero(sieve)[0] + 1) | 1)]
Intentional answered 8/7, 2021 at 3:4 Comment(0)
H
3

I'm surprised nobody mentioned numba yet.

This version gets to the 1M mark in 2.47 ms ± 36.5 µs.

Years ago, pseudo-code for a version of Atkin's sieve was given on the Wikipedia page Prime number. This isn't there anymore, and a reference to the Sieve of Atkin seems to be a different algorithm. A 2007/03/01 version of the Wikipedia page, Primer number as of 2007-03-01, shows the pseudo-code I used as reference.

import numpy as np
from numba import njit

@njit
def nb_primes(n):
    # Generates prime numbers 2 <= p <= n
    # Atkin's sieve -- see https://en.wikipedia.org/w/index.php?title=Prime_number&oldid=111775466
    sqrt_n = int(np.sqrt(n)) + 1

    # initialize the sieve
    s = np.full(n + 1, -1, dtype=np.int8)
    s[2] = 1
    s[3] = 1

    # put in candidate primes:
    # integers which have an odd number of
    # representations by certain quadratic forms
    for x in range(1, sqrt_n):
        x2 = x * x
        for y in range(1, sqrt_n):
            y2 = y * y
            k = 4 * x2 + y2
            if k <= n and (k % 12 == 1 or k % 12 == 5): s[k] *= -1
            k = 3 * x2 + y2
            if k <= n and (k % 12 == 7): s[k] *= -1
            k = 3 * x2 - y2
            if k <= n and x > y and k % 12 == 11: s[k] *= -1

    # eliminate composites by sieving
    for k in range(5, sqrt_n):
        if s[k]:
            k2 = k*k
            # k is prime, omit multiples of its square; this is sufficient because
            # composites which managed to get on the list cannot be square-free
            for i in range(1, n // k2 + 1):
                j = i * k2 # j ∈ {k², 2k², 3k², ..., n}
                s[j] = -1
    return np.nonzero(s>0)[0]

# initial run for "compilation" 
nb_primes(10)

Timing

In[10]:
%timeit nb_primes(1_000_000)

Out[10]:
2.47 ms ± 36.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In[11]:
%timeit nb_primes(10_000_000)

Out[11]:
33.4 ms ± 373 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In[12]:
%timeit nb_primes(100_000_000)

Out[12]:
828 ms ± 5.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Hallah answered 21/1, 2022 at 16:4 Comment(0)
L
2

First time using python, so some of the methods I use in this might seem a bit cumbersome. I just straight converted my c++ code to python and this is what I have (albeit a tad bit slowww in python)

#!/usr/bin/env python
import time

def GetPrimes(n):

    Sieve = [1 for x in xrange(n)]

    Done = False
    w = 3

    while not Done:

        for q in xrange (3, n, 2):
            Prod = w*q
            if Prod < n:
                Sieve[Prod] = 0
            else:
                break

        if w > (n/2):
            Done = True
        w += 2

    return Sieve



start = time.clock()

d = 10000000
Primes = GetPrimes(d)

count = 1 #This is for 2

for x in xrange (3, d, 2):
    if Primes[x]:
        count+=1

elapsed = (time.clock() - start)
print "\nFound", count, "primes in", elapsed, "seconds!\n"

pythonw Primes.py

Found 664579 primes in 12.799119 seconds!

#!/usr/bin/env python
import time

def GetPrimes2(n):

    Sieve = [1 for x in xrange(n)]

    for q in xrange (3, n, 2):
        k = q
        for y in xrange(k*3, n, k*2):
            Sieve[y] = 0

    return Sieve



start = time.clock()

d = 10000000
Primes = GetPrimes2(d)

count = 1 #This is for 2

for x in xrange (3, d, 2):
    if Primes[x]:
        count+=1

elapsed = (time.clock() - start)
print "\nFound", count, "primes in", elapsed, "seconds!\n"

pythonw Primes2.py

Found 664579 primes in 10.230172 seconds!

#!/usr/bin/env python
import time

def GetPrimes3(n):

    Sieve = [1 for x in xrange(n)]

    for q in xrange (3, n, 2):
        k = q
        for y in xrange(k*k, n, k << 1):
            Sieve[y] = 0

    return Sieve



start = time.clock()

d = 10000000
Primes = GetPrimes3(d)

count = 1 #This is for 2

for x in xrange (3, d, 2):
    if Primes[x]:
        count+=1

elapsed = (time.clock() - start)
print "\nFound", count, "primes in", elapsed, "seconds!\n"

python Primes2.py

Found 664579 primes in 7.113776 seconds!

Luminary answered 20/2, 2013 at 4:6 Comment(0)
P
1

My guess is that the fastest of all ways is to hard code the primes in your code.

So why not just write a slow script that generates another source file that has all numbers hardwired in it, and then import that source file when you run your actual program.

Of course, this works only if you know the upper bound of N at compile time, but thus is the case for (almost) all project Euler problems.

 

PS: I might be wrong though iff parsing the source with hard-wired primes is slower than computing them in the first place, but as far I know Python runs from compiled .pyc files so reading a binary array with all primes up to N should be bloody fast in that case.

Pierson answered 25/1, 2010 at 1:21 Comment(0)
S
1

Sorry to bother but erat2() has a serious flaw in the algorithm.

While searching for the next composite, we need to test odd numbers only. q,p both are odd; then q+p is even and doesn't need to be tested, but q+2*p is always odd. This eliminates the "if even" test in the while loop condition and saves about 30% of the runtime.

While we're at it: instead of the elegant 'D.pop(q,None)' get and delete method use 'if q in D: p=D[q],del D[q]' which is twice as fast! At least on my machine (P3-1Ghz). So I suggest this implementation of this clever algorithm:

def erat3( ):
    from itertools import islice, count

    # q is the running integer that's checked for primeness.
    # yield 2 and no other even number thereafter
    yield 2
    D = {}
    # no need to mark D[4] as we will test odd numbers only
    for q in islice(count(3),0,None,2):
        if q in D:                  #  is composite
            p = D[q]
            del D[q]
            # q is composite. p=D[q] is the first prime that
            # divides it. Since we've reached q, we no longer
            # need it in the map, but we'll mark the next
            # multiple of its witnesses to prepare for larger
            # numbers.
            x = q + p+p        # next odd(!) multiple
            while x in D:      # skip composites
                x += p+p
            D[x] = p
        else:                  # is prime
            # q is a new prime.
            # Yield it and mark its first multiple that isn't
            # already marked in previous iterations.
            D[q*q] = q
            yield q
Socialminded answered 13/8, 2010 at 12:4 Comment(1)
for a postponed addition of primes into the dict (until the square of a prime is seen in the input) see https://mcmap.net/q/57414/-how-to-implement-an-efficient-infinite-generator-of-prime-numbers-in-python .Meingolda
L
1

I may be late to the party but will have to add my own code for this. It uses approximately n/2 in space because we don't need to store even numbers and I also make use of the bitarray python module, further draStically cutting down on memory consumption and enabling computing all primes up to 1,000,000,000

from bitarray import bitarray
def primes_to(n):
    size = n//2
    sieve = bitarray(size)
    sieve.setall(1)
    limit = int(n**0.5)
    for i in range(1,limit):
        if sieve[i]:
            val = 2*i+1
            sieve[(i+i*val)::val] = 0
    return [2] + [2*i+1 for i, v in enumerate(sieve) if v and i > 0]

python -m timeit -n10 -s "import euler" "euler.primes_to(1000000000)"
10 loops, best of 3: 46.5 sec per loop

This was run on a 64bit 2.4GHZ MAC OSX 10.8.3

Limiter answered 14/4, 2013 at 20:37 Comment(3)
posting one timing for an unknown machine says nothing. The accepted answer here says "without psyco, for n=1000000, rwh_primes2 was the fastest". So if you'd provide your timings for that code as well as yours, on the same machine, and at 2, 4, 10 mln as well, then it'd be much more informative.Meingolda
-1, This code depends on special features of the bitarray implemented in C, which is why the code is fast as most of the work is being done in native code in the slice assignment. The bitarray package BREAKS the standard definition for proper slices (indexed over a range) for mutable sequences in that it allows assigning a single boolean 0/1 or True/False to all elements of the slice, whereas the standard behavior for pure Python seems to be to not allow this and only allow the assignment value of 0 in which case it is treated as a del of all of the slice elements from the sequence/array.Refreshment
cont'd: If calling non-standard native code were to be compared, we may as well write a "fastprimes" sequence generator package based on C code such as that of Kim Walisch's primesieve and generate all the primes in the four billion plus 32-bit number range in just a few seconds with a single call to the sequence generator. This would also use almost no memory as the linked code is based on a segmented Sieve of Eratosthenes and thus only uses a few ten's of Kilobytes of RAM, and if a sequence were generated there would be no list storage required.Refreshment
K
1

Here is an interesting technique to generate prime numbers (yet not the most efficient) using python's list comprehensions:

noprimes = [j for i in range(2, 8) for j in range(i*2, 50, i)]
primes = [x for x in range(2, 50) if x not in noprimes]
Korwun answered 6/2, 2018 at 14:10 Comment(2)
Link to example is dead.Raised
Thanks, I've removed the link.Korwun
D
1

Here is a numpy version of Sieve of Eratosthenes having both good complexity (lower than sorting an array of length n) and vectorization. Compared to @unutbu times this just as fast as the packages with 46 microsecons to find all primes below a million.

import numpy as np 
def generate_primes(n):
    is_prime = np.ones(n+1,dtype=bool)
    is_prime[0:2] = False
    for i in range(int(n**0.5)+1):
        if is_prime[i]:
            is_prime[i**2::i]=False
    return np.where(is_prime)[0]

Timings:

import time    
for i in range(2,10):
    timer =time.time()
    generate_primes(10**i)
    print('n = 10^',i,' time =', round(time.time()-timer,6))

>> n = 10^ 2  time = 5.6e-05
>> n = 10^ 3  time = 6.4e-05
>> n = 10^ 4  time = 0.000114
>> n = 10^ 5  time = 0.000593
>> n = 10^ 6  time = 0.00467
>> n = 10^ 7  time = 0.177758
>> n = 10^ 8  time = 1.701312
>> n = 10^ 9  time = 19.322478
Dewayne answered 10/5, 2019 at 12:4 Comment(0)
N
1

Starting from this 2021 answer, I haven't found bitarray approach to be beneficial for primes below 1 billion.

But I was able to speedup primesfrom2to almost x2 with several tricks:

  • use numexpr library to convert numpy expressions to tight loops with less allocations
  • replace np.ones with a faster alternative
  • manipulate first 9 elements of a sieve in a way, so there is no need to change array shape later

All in all, time for primes <1 billion went from 25s to 14.5s on my machine

import numexpr as ne
import numpy as np

def primesfrom2to_numexpr(n):
    # https://mcmap.net/q/86560/-fastest-way-to-list-all-primes-below-n/3035188#3035188
    """ Input n>=24, Returns a array of primes, 2 <= p < n + a few over"""
    sieve = np.zeros((n // 3 + (n % 6 == 2))//4+1, dtype=np.int32)
    ne.evaluate('sieve + 0x01010101', out=sieve)
    sieve = sieve.view('int8')
    #sieve = np.ones(n // 3 + (n % 6 == 2), dtype=np.bool_)
    sieve[0] = 0
    for i in np.arange(int(n ** 0.5) // 3 + 1):
        if sieve[i]:
            k = 3 * i + 1 | 1
            sieve[((k * k) // 3)::2 * k] = 0
            sieve[(k * k + 4 * k - 2 * k * (i & 1)) // 3::2 * k] = 0
    sieve[[0,8]] = 1
    result = np.flatnonzero(sieve)
    ne.evaluate('result * 3 + 1 + result%2', out=result)
    result[:9] = [2,3,5,7,11,13,17,19,23]
    return result
Narcho answered 19/5, 2022 at 20:6 Comment(0)
M
0

The fastest method I've tried so far is based on the Python cookbook erat2 function:

import itertools as it
def erat2a( ):
    D = {  }
    yield 2
    for q in it.islice(it.count(3), 0, None, 2):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = q + 2*p
            while x in D:
                x += 2*p
            D[x] = p

See this answer for an explanation of the speeding-up.

Mellophone answered 26/9, 2010 at 3:7 Comment(0)
I
0

I collected several prime number sieves over time. The fastest on my computer is this:

from time import time
# 175 ms for all the primes up to the value 10**6
def primes_sieve(limit):
    a = [True] * limit
    a[0] = a[1] = False
    #a[2] = True
    for n in xrange(4, limit, 2):
        a[n] = False
    root_limit = int(limit**.5)+1
    for i in xrange(3,root_limit):
        if a[i]:
            for n in xrange(i*i, limit, 2*i):
                a[n] = False
    return a

LIMIT = 10**6
s=time()
primes = primes_sieve(LIMIT)
print time()-s
Insistency answered 2/11, 2014 at 21:39 Comment(0)
H
0

I'm slow responding to this question but it seemed like a fun exercise. I'm using numpy which might be cheating and I doubt this method is the fastest but it should be clear. It sieves a Boolean array referring to its indices only and elicits prime numbers from the indices of all True values. No modulo needed.

import numpy as np
def ajs_primes3a(upto):
    mat = np.ones((upto), dtype=bool)
    mat[0] = False
    mat[1] = False
    mat[4::2] = False
    for idx in range(3, int(upto ** 0.5)+1, 2):
        mat[idx*2::idx] = False
    return np.where(mat == True)[0]
Homoeo answered 14/2, 2015 at 15:25 Comment(6)
it is incorrect e.g., ajs_primes3a(10) -> array([2, 3, 5, 7, 9]). 9 is not a primeIngot
You spotted an edge case I hadn't – well done! The problem was in 'for idx in range(3, int(upto ** 0.5), 2):' which should be 'for idx in range(3, int(upto ** 0.5) + 1, 2):'. Thanks but it works now.Homoeo
The reason was that the idx loop went up to 'upto ** 05' which for cases up to and including 15. From 16 onwards, it works fine. This was a set of edge cases I hadn't tested for. Adding 1 means it should work for all numbers.Homoeo
It seems to work now. It is the slowest among numpy-based solutions that return an array. Note: no true Sieve of Eratosthenes implementation uses modulo -- no need to mention it. You could use mat[idx*idx::idx] instead of mat[idx*2::idx]. And np.nonzero(mat)[0] instead of np.where(mat == True)[0].Ingot
Thanks JF. I tested against prime6() and got a result faster up to (IIRC) about 250k when prime6() took over. primesfrom2to() was faster. At up to 20m, ajs_primes3a() took 0.034744977951ms, prime6() took 0.0222899913788ms and primesfrom2to() took 0.0104751586914ms (same machine, same load, best of 10 timings). It's honestly better than I thought it would be!Homoeo
It would be faster if mat[idx*2::idx] = False was checked if needed. if mat[idx]: mat[idx*2::idx] = FalseHydrocephalus
S
0

I found a pure Python 2 prime generator here , in a comment by Willy Good, that is faster than rwh2_primes.

def primes235(limit):
yield 2; yield 3; yield 5
if limit < 7: return
modPrms = [7,11,13,17,19,23,29,31]
gaps = [4,2,4,2,4,6,2,6,4,2,4,2,4,6,2,6] # 2 loops for overflow
ndxs = [0,0,0,0,1,1,2,2,2,2,3,3,4,4,4,4,5,5,5,5,5,5,6,6,7,7,7,7,7,7]
lmtbf = (limit + 23) // 30 * 8 - 1 # integral number of wheels rounded up
lmtsqrt = (int(limit ** 0.5) - 7)
lmtsqrt = lmtsqrt // 30 * 8 + ndxs[lmtsqrt % 30] # round down on the wheel
buf = [True] * (lmtbf + 1)
for i in xrange(lmtsqrt + 1):
    if buf[i]:
        ci = i & 7; p = 30 * (i >> 3) + modPrms[ci]
        s = p * p - 7; p8 = p << 3
        for j in range(8):
            c = s // 30 * 8 + ndxs[s % 30]
            buf[c::p8] = [False] * ((lmtbf - c) // p8 + 1)
            s += p * gaps[ci]; ci += 1
for i in xrange(lmtbf - 6 + (ndxs[(limit - 7) % 30])): # adjust for extras
    if buf[i]: yield (30 * (i >> 3) + modPrms[i & 7])

My results:

$ time ./prime_rwh2.py 1e8
5761455 primes found < 1e8

real    0m3.201s
user    0m2.609s
sys     0m0.578s
$ time ./prime_wheel.py 1e8
5761455 primes found < 1e8

real    0m2.710s
user    0m2.469s
sys     0m0.219s

...on my recent midrange laptop (i5 8265U 1.6GHz) running Ubuntu on Win 10.

This is a mod 30 wheel sieve which skips multiples of 2, 3, and 5. It works great for me up to about 2.5e9, when my laptop starts running out of 8G RAM and swapping a lot.

I like mod 30 since it has only 8 remainders that aren't multiples of 2, 3, or 5. That enables using shifts and "&" for multiplication, division and mod, and should allow packing results for one mod 30 wheel into a byte. I have morphed Willy's code into a segmented mod 30 wheel sieve to eliminate the thrashing for big N and posted it here.

There is an even faster Javascript version which is segmented and uses a mod 210 wheel (no multiples of 2, 3, 5, or 7) by @GordonBGood with an in depth explanation that is useful to me.

Sd answered 10/9, 2019 at 22:42 Comment(0)
S
0

This is a variation of the solution in the question that should be faster than what's in the question. It uses a static sieve of Eratosthenes with no other optimizations.

from typing import List

def list_primes(limit: int) -> List[int]:
    primes = set(range(2, limit + 1))
    for i in range(2, limit + 1):
        if i in primes:
            primes.difference_update(set(list(range(i, limit + 1, i))[1:]))
    return sorted(primes)

>>> list_primes(100)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
Shick answered 15/8, 2020 at 16:35 Comment(0)
B
0

You have a faster code and also the simplest code generating primes. but for the higher number, it's not working for n=10000, 10000000 it failed maybe it's .pop() method

Consider: is N prime?

  • case 1:

    You got some factors of N,

    for i in range(2, N):
    

    If N is prime loop is performed for ~(N-2) times. else less number of times

  • case 2:

    for i in range(2, int(math.sqrt(N)): 
    

    Loop is performed for almost ~(sqrt(N)-2) times if N is prime else will break somewhere

  • case 3:

    Better We Divide N With Only number of primes<=sqrt(N)

    Where loop is performed for only π(sqrt(N)) times

    π(sqrt(N)) << sqrt(N) as N increases

    from math import sqrt
    from time import *
    prime_list = [2]
    n = int(input())
    s = time()
    for n0 in range(2,n+1):
        for i0 in prime_list:
            if n0%i0==0:
                break
            elif i0>=int(sqrt(n0)):
                prime_list.append(n0)
                break
    e = time()
    print(e-s)
    #print(prime_list); print(f'pi({n})={len(prime_list)}')
    print(f'{n}: {len(prime_list)}, time: {e-s}')
    
  • Output

    100: 25, time: 0.00010275840759277344
    1000: 168, time: 0.0008606910705566406
    10000: 1229, time: 0.015588521957397461
    100000: 9592, time: 0.023436546325683594
    1000000: 78498, time: 4.1965954303741455
    10000000: 664579, time: 109.24591708183289
    100000000: 5761455, time: 2289.130858898163
    

For less than 1000 seems slow but then for <10^6 I think it's faster.

Though, I couldn't understand the time complexity.

Bipetalous answered 19/9, 2021 at 14:29 Comment(0)
C
0
def find_primes(n):
    # Create a boolean list to track prime numbers
    primes = [True] * (n+1)
    primes[0] = primes[1] = False

    # Mark all multiples of 2 as composite (except 2 itself)
    for i in range(4, n+1, 2):
        primes[i] = False

    # Iterate over all odd numbers and mark their multiples as composite
    for i in range(3, int(n**0.5)+1, 2):
        if primes[i]:
            for j in range(i*i, n+1, 2*i):
                primes[j] = False

    # Return a list of prime numbers
    return [i for i in range(n+1) if primes[I]]
primes = find_primes(1000000)
print(primes)

The find_primes function takes an integer n as input and returns a list of all prime numbers from 1 to n. The code first creates a boolean list primes of length n+1 to track prime numbers. It then marks all multiples of 2 as composite (except 2 itself), since all other even numbers are not prime. Next, it iterates over all odd numbers from 3 to the square root of n and marks their multiples as composite. Finally, it returns a list of all prime numbers by filtering out the composite numbers from the boolean list.

Note that the Sieve of Eratosthenes algorithm has a time complexity of O(n log log n), which makes it much more efficient than testing each number for primality individually.

Clericals answered 12/4, 2023 at 16:54 Comment(0)
S
0

You can save a lot of space by first pre-filtering down all 8 scenarios of the mod 30 wheel collapsed down to this simple formula :

((x % 30) - 15)^4 % 30 == 16

or simply

( x % 30 - 15 )^4 % 30 == 16

Then each byte can represent exactly 1 spin of the wheel, which is already 73.3% space savings right off the bat compared to 1 bit for every natural number.

(I usually prefer starting the wheel at 11,13,...,31,37 instead of 1,7,...,23,29)

Ster answered 13/3 at 16:6 Comment(0)
S
-1

This is an elegant and simpler solution to find primes using a stored list. Starts with a 4 variables, you only have to test odd primes for divisors, and you only have to test up to a half of what number you are testing as a prime (no point in testing whether 9, 11, 13 divide into 17). It tests previously stored primes as divisors.`

    # Program to calculate Primes
 primes = [1,3,5,7]
for n in range(9,100000,2):
    for x in range(1,(len(primes)/2)):
        if n % primes[x] == 0:
            break
    else:
        primes.append(n)
print primes
Santinasantini answered 2/2, 2014 at 0:41 Comment(0)
N
-4

This is the way you can compare with others.

# You have to list primes upto n
nums = xrange(2, n)
for i in range(2, 10):
    nums = filter(lambda s: s==i or s%i, nums)
print nums

So simple...

Nympholepsy answered 13/2, 2015 at 16:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.