A Fast Prime Number Sieve in Python

I have been going through prime number generation in Python using the sieve of Eratosthenes. The solutions that people tout as relatively fast, such as those in a few of the answers to a question on optimising prime number generation in Python, are not straightforward, and the simple implementation I have here rivals them in efficiency. My implementation is given below:

def sieve_for_primes_to(n):
    size = n//2                          # only odd numbers: index i represents 2*i + 1
    sieve = [1]*size
    limit = int(n**0.5)
    for i in range(1, limit):
        if sieve[i]:
            val = 2*i + 1                # the odd prime at this index
            tmp = ((size-1) - i)//val    # how many of its multiples remain in the sieve
            sieve[i+val::val] = [0]*tmp  # strike them all out with one slice assignment
    return sieve


print([2] + [i*2+1 for i, v in enumerate(sieve_for_primes_to(10000000)) if v and i > 0])

Timing the execution returns

python -m timeit -n10 -s "import euler" "euler.sieve_for_primes_to(1000000)"
10 loops, best of 3: 19.5 msec per loop
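For clarity, the sieve stores only odd numbers, so index i stands for the value 2*i + 1 (index 0 would be 1, which is why it is skipped). A quick sanity check of that mapping on a small input:

small = sieve_for_primes_to(50)
primes = [2] + [2*i + 1 for i, v in enumerate(small) if v and i > 0]
assert primes == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]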

The method described in an answer to the above linked question as the fastest, taken from the Python Cookbook, is given below:

import itertools

def erat2():
    D = {}                    # maps each discovered composite to a prime that divides it
    yield 2
    for q in itertools.islice(itertools.count(3), 0, None, 2):   # odd candidates 3, 5, 7, ...
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q        # q is prime; its first relevant multiple is q*q
            yield q
        else:
            x = p + q         # q is composite; slide its witness p to the next free odd multiple
            while x in D or not (x & 1):
                x += p
            D[x] = p

def get_primes_erat(n):
    return list(itertools.takewhile(lambda p: p < n, erat2()))

When run it gives

python -m timeit -n10 -s "import euler" "euler.get_primes_erat(1000000)"
10 loops, best of 3: 697 msec per loop
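Note that erat2 itself is an unbounded generator, so it can also be consumed lazily without fixing an upper limit in advance, for example:

from itertools import islice

first_ten = list(islice(erat2(), 10))          # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
thousandth = next(islice(erat2(), 999, None))  # 7919, the 1000th prime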

My question is: why do people tout the Cookbook version above, which is relatively complex, as the ideal prime generator?

Schwaben answered 14/4, 2013 at 21:15 Comment(4)
Who and where is touting erat2 "as the ideal prime generator"? Please provide references so that we can better understand the context that gave rise to your question. – Billingsgate
Did you compare yours against the rwh_primes2 algorithm? – Irredeemable
erat2 was only compared to the OP's code on that page, and Alex Martelli only said that the Cookbook solution is over twice as fast as the OP's solution. And your solution is twice as slow as rwh_primes2. – Anabelanabella
This looks like a minor variation on rwh_primes1. – Abraham

I transformed your code to fit into the prime sieve comparison script of @unutbu at Fastest way to list all primes below N as follows:

def sieve_for_primes_to(n):
    size = n//2
    sieve = [1]*size
    limit = int(n**0.5)
    for i in range(1,limit):
        if sieve[i]:
            val = 2*i+1
            tmp = ((size-1) - i)//val 
            sieve[i+val::val] = [0]*tmp
    return [2] + [i*2+1 for i, v in enumerate(sieve) if v and i>0]

On my MacBook Pro i7 the script calculates all primes < 1000000 quickly, but it is actually 1.5 times slower than rwh_primes2, and also slower than rwh_primes1 (1.2x), rwh_primes (1.19x) and primeSieveSeq (1.12x) (by @andreasbriese at the end of that page).

Add answered 25/9, 2013 at 6:18 Comment(0)

You should only use the "postponed" variant of that algorithm. Comparing your code, test run up to upper limits of 10 and 20 million, as

...
print(len( [2] + [i*2+1 for i, v in 
  enumerate(sieve_for_primes_to(10000000)) if v and i>0]))

with the postponed sieve, run to produce the corresponding counts of 664579 and 1270607 primes, as

...
print( list( islice( (p for p in postponed_sieve() ), n-1, n+1))) 

shows your code running "only" 3.1x...3.3x faster. :) Not 36x faster, as your timings suggest for some reason.
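For reference, the postponed sieve delays adding a prime's multiples to the dict until that prime's square is reached, so the dict stays small. A minimal sketch, in the spirit of the extended Cookbook/ActiveState recipe (not necessarily the exact code timed above), looks like:

from itertools import count

def postponed_sieve():
    yield 2; yield 3; yield 5; yield 7   # seed primes
    D = {}                               # composite -> its crossing-off stream
    ps = postponed_sieve()               # a separate, much shorter supply of base primes
    p = next(ps) and next(ps)            # p = 3
    q = p*p                              # 9: postpone p's multiples until here
    for c in count(9, 2):                # odd candidates
        if c in D:                       # a known composite
            s = D.pop(c)
        elif c < q:                      # below p*p and not in D: a prime
            yield c
            continue
        else:                            # c == q: start crossing off p's odd multiples
            s = count(q + 2*p, 2*p)
            p = next(ps)
            q = p*p
        for m in s:                      # advance the stream to its next free slot
            if m not in D:
                break
        D[m] = s

With n = 664579 the islice call above then prints the two primes on either side of the 10,000,000 mark.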

I don't think anyone ever claimed it's an "ideal" prime generator, just that it is a conceptually clean and clear one. All these prime generation functions are toys really; the real stuff works with very big numbers and uses completely different algorithms anyway.

Here in the low range, what matters is the time complexity of the algorithm, which should be around ~ n^(1+a), with a < 0.1...0.2 in empirical orders of growth, and both of them indeed seem to be in that range. A toy generator with ~ n^1.5 or even ~ n^2 orders of growth is just no fun to play with.
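To illustrate, an empirical order of growth can be estimated from two timed runs at different sizes. A minimal sketch (empirical_order is just an illustrative helper, and one-shot timings are of course noisy):

import time
from math import log

def empirical_order(f, n1, n2):
    # crude estimate of a in t ~ n^a from two timings
    t1 = time.perf_counter(); f(n1); t1 = time.perf_counter() - t1
    t2 = time.perf_counter(); f(n2); t2 = time.perf_counter() - t2
    return log(t2 / t1) / log(n2 / n1)

# e.g. empirical_order(sieve_for_primes_to, 1000000, 2000000)
# should come out close to 1 for a near-linear sieve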

Terminal answered 16/4, 2013 at 0:37 Comment(0)

I agree this is not good for anything but playing with execution times. Real versions of this use multiprocessing and wheel factorization. Try using bitarray; you can use its nice slice assignment to clear multiples.

import time
from bitarray import bitarray

def make_sieve(size):
    """Create a sieve of Eratosthenes up to the given size."""
    sieve_time = time.time()
    #print(f"Creating new sieve of {size:,} primes.")
    limit = int(1 + size**0.5) + 2
    p = bitarray(size+2)  # One bit per value
    p.setall(True)
    p[0] = p[1] = False
    p[4::2] = False  # Clear multiples of 2
    p[9::3] = False  # Clear multiples of 3

    for i in range(5, limit, 6):  # Process only numbers of the form 6k-1 and 6k+1
        h = i + 2  # 6k+1
        if p[i]:  # If 6k-1 is prime
            p[i*i::2 * i] = False  # Clear multiples of 6k-1
        if p[h]:  # If 6k+1 is prime
            p[h*h::2 * h] = False  # Clear multiples of 6k+1
            
    #print(f"Sieve has a memory size of {sys.getsizeof(p):,} bytes")         
    #print(f"Make sieve for {len(p):,} took {str(timedelta(seconds=time.time() - sieve_time))}")
    return p

Run looks like:

$ python -m timeit -n10 -s "import prime_count as sv" "sv.make_sieve(10000000)"
10 loops, best of 5: 18.1 msec per loop
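To read the primes back out of the returned bitarray (it is allocated two bits longer than size), something like this works:

sieve = make_sieve(100)
primes = [i for i in range(2, 101) if sieve[i]]
print(len(primes), primes[:10])   # 25 [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]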
While answered 5/6 at 17:7 Comment(0)
