Speed up bitstring/bit operations in Python?
I wrote a prime number generator using the Sieve of Eratosthenes and Python 3.1. The code runs correctly, taking 0.32 seconds on ideone.com to generate prime numbers up to 1,000,000.

# from bitstring import BitString

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...    
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5) 
    flags = [False, False] + [True] * (limit - 2)   
#     flags = BitString(limit)
    # Step through all the odd numbers
    for i in range(3, limit, 2):       
        if flags[i] is False:
#        if flags[i] is True:
            continue
        yield i
        # Exclude further multiples of the current prime number
        if i <= sub_limit:
            for j in range(i*3, limit, i<<1):
                flags[j] = False
#                flags[j] = True

The problem is, I run out of memory when I try to generate numbers up to 1,000,000,000.

    flags = [False, False] + [True] * (limit - 2)   
MemoryError

As you can imagine, allocating a list of 1 billion boolean values (4 or 8 bytes each in Python, depending on the build; see the comments) is simply not feasible, so I looked into bitstring. I figured that using 1 bit for each flag would be much more memory-efficient. However, the program's performance dropped drastically: 24 seconds of runtime for primes up to 1,000,000. This is probably due to the internal implementation of bitstring.
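A quick way to see the per-flag overhead (my own check; exact sizes vary with platform and Python build):

```python
import sys

n = 1000000

# A list of n booleans stores n pointer-sized entries (4 or 8 bytes each),
# every one of them referencing the True/False singletons.
flags_list = [True] * n

# A bytearray stores exactly one byte per flag.
flags_bytes = bytearray(n)

print(sys.getsizeof(flags_list), "bytes for the list")
print(sys.getsizeof(flags_bytes), "bytes for the bytearray")
```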

In the code snippet above, you can comment/uncomment the three marked lines to switch between the list version and the BitString version.

My question is, is there a way to speed up my program, with or without bitstring?

Edit: Please test the code yourself before posting. I can't accept answers that run slower than my existing code, naturally.

Edit again:

I've compiled a list of benchmarks on my machine.

Effieeffigy answered 24/5, 2010 at 13:31 Comment(16)
Nitpick: you're looking at 4 or 8 bytes per boolean (depending on whether you're on a 32-bit or 64-bit system), rather than 1: internally, the list flags is just a C array of (PyObject *) pointers.Lapp
You could use numpy in Python 2.x rosettacode.org/wiki/Sieve_of_Eratosthenes#Using_numpy It is much faster (~20 times).Speos
It takes ~28 seconds to generate up to limit 1e9 using primes_upto2(), primes_upto3() from the above link. C++ version ~14 seconds, C version ~13 seconds.Speos
@J.F. Sebastian: Nice link. I'll give that a try after work today, but note that isn't using a generator, and it's only returning an array of zeroes and ones. Not exactly complete. :]Effieeffigy
Xavier Ho: numpy.nonzero() returns indices of the sieve array that are primes in this case. You should have tried it.Speos
casevh's gmpy-based version is marginally faster than iprimes_upto() but it is significantly slower than the numpy-based versions. For 1e6: iprimes_upto: 0.39, primes_upto2: 0.02, primes_upto3: 0.02, prime_numbers: 0.29. For 1e9 prime_numbers is too slow (but there is no memory error). primes_upto2: 28.96, primes_upto3: 27.55, prime_numbers: 509.74 (time in seconds).Speos
@J.F. Sebastian: Thanks for the effort. I'll probably eventually edit in a list of benchmarks sometime this week, after I finish all of my uni assignments.Effieeffigy
@J.F. Sebastian, on numpy.nonzero(): Ah okay. My mistake.Effieeffigy
@J.F. Sebastian: Just wanted to let you know that I ran into ValueError: dimensions too large. when I tried to generate primes up to 1 billion. However, 1 million is blazing fast! 0.035 seconds without a generator, and 0.40 seconds with a generator. I also found an additional optimisation to the link in RCode: primes[n*n::n*n] = 0.Effieeffigy
Never mind my last comment about that optimisation. It is wrong.Effieeffigy
@Xavier Ho: My previous timings for casevh's solution are bogus. I've adapted it for Python 2.x. Now it is prime_numbers: 169.00 s and prime_numbers2: 95.53 s for 1e9.Speos
@Xavier Ho: you should be able to run numpy generator version (that doesn't use nonzero() as I've shown) on your machine. numpy.bool is 1 byte so 4 GB should be sufficient on a fresh (non-fragmented) system.Speos
@J.F. Sebastian: I got some error like Value too large from what I can recall. Will give you a better detail in a day or two.Effieeffigy
@J.F. Sebastian: Could you post an answer with the numpy's solution and the generator version? They are winning by a fair margin, yet I cannot credit the person who posted it with a tick and an upvote.Effieeffigy
With a limit of 1,000,000 you should really be using xrange in your for loop.Durgy
@wallacaloo: In Python 3.x the range is lazy.Speos

There are a couple of small optimizations for your version. By reversing the roles of True and False, you can change "if flags[i] is False:" to "if flags[i]:". And the starting value for the second range statement can be i*i instead of i*3, since any smaller multiple of i has a prime factor below i and has therefore already been marked. Your original version takes 0.166 seconds on my system. With those changes, the version below takes 0.156 seconds on my system.

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = [True, True] + [False] * (limit - 2)
    # Step through all the odd numbers
    for i in range(3, limit, 2):
        if flags[i]:
            continue
        yield i
        # Exclude further multiples of the current prime number
        if i <= sub_limit:
            for j in range(i*i, limit, i<<1):
                flags[j] = True

This doesn't help your memory issue, though.
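A middle ground worth noting (my suggestion, not part of the original answer): a bytearray keeps one byte per flag instead of a pointer-sized list entry, cutting memory use several-fold while staying in pure Python. A minimal sketch of the same sieve:

```python
def prime_numbers_bytearray(limit=1000000):
    '''Same sieve as above, but backed by a bytearray:
    one byte per flag instead of 4-8 bytes per list entry.'''
    yield 2
    sub_limit = int(limit**0.5)
    flags = bytearray(limit)  # 0 = candidate prime, 1 = composite
    for i in range(3, limit, 2):
        if flags[i]:
            continue
        yield i
        # Mark odd multiples of i, starting at i*i.
        if i <= sub_limit:
            for j in range(i*i, limit, i*2):
                flags[j] = 1
```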

Moving into the world of C extensions, I used the development version of gmpy. (Disclaimer: I'm one of the maintainers.) The development version is called gmpy2 and supports mutable integers called xmpz. Using gmpy2 and the following code, I have a running time of 0.140 seconds. Running time for a limit of 1,000,000,000 is 158 seconds.

import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    # Actual number is 2*bit_position + 1.
    oddnums = gmpy2.xmpz(1)
    current = 0
    while True:
        current += 1
        current = oddnums.bit_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        # Exclude further multiples of the current prime number
        if prime <= sub_limit:
            for j in range(2*current*(current+1), limit>>1, prime):
                oddnums.bit_set(j)

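The index arithmetic is the subtle part of this version, so here is a small pure-Python sanity check (my addition) of the mapping the code relies on: bit position c represents the odd number 2*c + 1, the square of prime = 2*c + 1 lands at bit position 2*c*(c + 1), and consecutive odd multiples of prime are exactly prime bit positions apart:

```python
def bit_pos(odd_number):
    # The sieve stores only odd numbers: bit c represents 2*c + 1.
    return (odd_number - 1) // 2

for current in range(1, 100):
    prime = 2 * current + 1
    # prime**2 = 4c^2 + 4c + 1 = 2*(2c*(c+1)) + 1,
    # so it sits at bit position 2*c*(c+1).
    assert bit_pos(prime * prime) == 2 * current * (current + 1)
    # Consecutive odd multiples differ by 2*prime,
    # i.e. exactly prime bit positions.
    assert bit_pos(3 * prime) - bit_pos(prime) == prime
print("bit-position mapping checks out")
```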
Pushing optimizations, and sacrificing clarity, I get running times of 0.107 and 123 seconds with the following code:

import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    # Actual number is 2*bit_position + 1.
    oddnums = gmpy2.xmpz(1)
    f_set = oddnums.bit_set
    f_scan0 = oddnums.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        # Exclude further multiples of the current prime number
        if prime <= sub_limit:
            list(map(f_set,range(2*current*(current+1), limit>>1, prime)))

Edit: Based on this exercise, I modified gmpy2 to accept xmpz.bit_set(iterator). Using the following code, the run time for all primes less than 1,000,000,000 is 56 seconds for Python 2.7 and 74 seconds for Python 3.2. (As noted in the comments, xrange is faster than range.)

import gmpy2

try:
    range = xrange
except NameError:
    pass

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    oddnums = gmpy2.xmpz(1)
    f_scan0 = oddnums.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        if prime <= sub_limit:
            oddnums.bit_set(iter(range(2*current*(current+1), limit>>1, prime)))

Edit #2: One more try! I modified gmpy2 to accept xmpz.bit_set(slice). Using the following code, the run time for all primes less than 1,000,000,000 is about 40 seconds for both Python 2.7 and Python 3.2.

from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    # pre-allocate the total length
    flags.bit_set((limit>>1)+1)
    f_scan0 = flags.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        if prime <= sub_limit:
            flags.bit_set(slice(2*current*(current+1), limit>>1, prime))

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)

Edit #3: I've updated gmpy2 to properly support slicing at the bit level of an xmpz. No change in performance, but a much nicer API. I've done a little tweaking and got the time down to about 37 seconds. (See Edit #4 for changes in gmpy2 2.0.0b1.)

from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    flags[(limit>>1)+1] = True
    f_scan0 = flags.bit_scan0
    current = 0
    prime = 2
    while prime <= sub_limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        flags[2*current*(current+1):limit>>1:prime] = True
    while prime <= limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)

Edit #4: I made some changes in gmpy2 2.0.0b1 that break the previous example. gmpy2 no longer treats True as a special value that provides an infinite source of 1-bits. -1 should be used instead.

from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    flags[(limit>>1)+1] = 1
    f_scan0 = flags.bit_scan0
    current = 0
    prime = 2
    while prime <= sub_limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        flags[2*current*(current+1):limit>>1:prime] = -1
    while prime <= limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)

Edit #5: I've made some enhancements to gmpy2 2.0.0b2. You can now iterate over all the bits that are either set or clear. Running time has improved by ~30%.

from __future__ import print_function
import time
import gmpy2

def sieve(limit=1000000):
    '''Returns a generator that yields the prime numbers up to limit.'''

    # Increment by 1 to account for the fact that slices do not include
    # the last index value but we do want to include the last value for
    # calculating a list of primes.
    sieve_limit = gmpy2.isqrt(limit) + 1
    limit += 1

    # Mark bit positions 0 and 1 as not prime.
    bitmap = gmpy2.xmpz(3)

    # Process 2 separately. This allows us to use p+p for the step size
    # when sieving the remaining primes.
    bitmap[4 : limit : 2] = -1

    # Sieve the remaining primes.
    for p in bitmap.iter_clear(3, sieve_limit):
        bitmap[p*p : limit : p+p] = -1

    return bitmap.iter_clear(2, limit)

if __name__ == "__main__":
    start = time.time()
    result = list(sieve(1000000000))
    print(time.time() - start)
    print(len(result))
Librarianship answered 25/5, 2010 at 3:41 Comment(28)
What settings have you used to install gmpy? It takes 500 seconds for limit=1e9 on my machine (for comparison, primes_upto2() from rosettacode.org/wiki/Sieve_of_Eratosthenes#Using_numpy takes me 30 seconds). I've just checked out the code and ran python setup.py installSpeos
@casevh: Thanks for the gmpy code. I'll give it a run after work today - I'm pretty impressed by the 2/3 reduced runtime.Effieeffigy
That should be all you need to do to install it. I'm using 64-bit Ubuntu 10.04, 2.2Ghz Core2 Duo, GMP 5.01, and Python 3.1. Using the Ubuntu version of numpy, primes_upto2() takes 50 seconds on my computer so something is strange.Librarianship
@casevh: I've noticed the 2nd variant of prime_numbers() causes my machine to swap, so it might ruin the benchmark. 64-bit Ubuntu 10.04, 2.6GHz i7, GMP 4.3.2, gmpy2 2.0.0a and Python 2.6.4.Speos
Since you are using Python 2.x, change range to xrange. That should fix the extreme memory usage. You might try compiling GMP 5.01 from source. IIRC, it has improved code for the newer Intel processors.Librarianship
This answer has been the best so far; I'm giving gmpy a shot now. Would everyone be okay if I put in a comparison list of execution speed on my machine? I think other people might have trouble following this topic.Effieeffigy
@casevh: I tried to checkout the development build 1.12, but I'm getting the following error when I tried to use setup.py install : c:\users\hp\documents\coding\python\gmpy_1_12\src\gmpy.h(31) : fatal error C1083 : Cannot open include file: 'gmp.h': No such file or directoryEffieeffigy
gmpy is a wrapper for the GNU Multiple Precision library so that needs to be compiled. There are two files included that describe the Windows build process. You will also need to checkout the trunk version to get gmpy2. It will be identified at 2.0.0a. 1.12 is just a bug fix that will be released soon. For more assistance with building gmpy2, open an issue at the gmpy site.Librarianship
@casevh: I've redone the benchmark: prime_numbers: 169.00 s and prime_numbers2: 95.53 s for 1e9.Speos
@casevh: gmp 5.0.1 produces the same results: prime_numbers: 166.93 prime_numbers2: 96.48 (I've double checked the version with gmpy2.gmp_version()).Speos
@casevh: Python 3.2a0 is slightly slower: prime_numbers: 173.18 prime_numbers2: 110.99Speos
@J.F. Sebastian: my results agree with you on Py3 vs Py2. Even xrange() in Python 2.6 is faster than range in Python 3. I wonder why.Effieeffigy
@Xavier Ho: xrange() uses C ints (fast, finite). range() uses Python ints (slow, infinite (while the memory is available)).Speos
xrange is ~50MOPS, range is ~30MOPS (both 2.6 & 3.2 Python)Speos
@J.F. Sebastian: Interesting insight. So did they remove xrange() in Python 3 completely, or was it moved somewhere?Effieeffigy
@J.F. Sebastian: Your performance results with Python 3.2a0 are interesting. I usually get the same, or slightly faster, times with 3.2. To ensure that the configure options are the same, I compile each version of Python that I use for testing. Did you use --with-computed-gotos when you compiled 3.2? IIRC, it is disabled by default.Librarianship
@casevh: I've used default settings. --with-computed-gotos on 3.2a0 puts the result in between 2.6.4 and 3.2a0 for a default: prime_numbers: 167.97 prime_numbers2: 105.05Speos
@casevh: To sum up: On my machine there is no measurable difference in performance between gmp 4.3.2/5.0.1, Python 2.6.4/3.2a0, /--with-computed-gotos for the prime_numbers() functions.Speos
@J.F. Sebastian: I've edited my original answer to include a new result based on changes I made to gmpy2. xmpz.bit_set will now accept an iterator directly. Time is now 56 seconds with Python 2.7.Librarianship
@casevh: with new xmpz.bit_set(): Python 2.6.4: 47, Python 3.x 66 seconds. It is close to 55 seconds for ideone.com/mxSdH but still slower than 30 seconds for primes_upto2() from rosettacode.org/wiki/Sieve_of_Eratosthenes#Using_numpySpeos
@J.F. Sebastian: I added slice support to xmpz.bit_set. Down to about 40 seconds on my system.Librarianship
@casevh: slice-based version is 33 seconds on my machine (it is near 30 seconds for the numpy version). btw, the interface for xmpz.bit_set() is a mess (take a look at list_ass_subscript() (implements L[o]=v) in listobject.c, tuplesubscript() (implements t[o]) in tupleobject.c). Additionally immutable case should have the same capabilities (just use result in immutable case everywhere you use self in mutable case).Speos
@J.F. Sebastian: Thanks for continuing to benchmark each version. I agree the slice interface is horrible. It started out as an experiment to see how much overhead was involved in iterating over xrange/range. I was originally thinking of an xrange clone that gmpy2 would recognize and then bypass the iterator protocol. It would be easy to allow xmpz[::]=1 to set bits, but is it worth supporting a=a[::-1] and a[::2]=a[1::2] etc.? Suggestions welcome :)Librarianship
Wow. The world is just getting better and better. Nice work, casevh. I should be getting around to install gmp on my computer next week, and I'll let you know the benchmark results.Effieeffigy
@casevh: I don't know what the design priorities are for gmpy2: maintainability (hence robustness), performance, or feature set (the reason to bother with the C dependency). For bit_set() I'd keep the Integer index and slice. The iterable version is slow and complicates both code and interface. gmpy_mpz_bit_set_iterable() could be removed completely gist.github.com/416936Speos
@J.F. Sebastian: The goals are speed and full support of GMP. The goal for gmpy 1.1x was support for Python 3.x and speed but I didn't want to change the API much. With gmpy2, I'm dropping support for old versions of Python (pre 2.6) and I'm willing to change the API; hence the experiments. Thanks for the code!Librarianship
@J.F. Sebastian: I've updated gmpy2 to (hopefully) properly support slicing. It's only supported on xmpz for the moment. I made another tweak to prime_numbers() and I'm down to around 37 seconds.Librarianship
Timing for edit #3: Python 2.6: 24 seconds, 3.2: 33 seconds, which outperforms the original numpy version rosettacode.org/wiki/Sieve_of_Eratosthenes#Using_numpy. The mapping interface for integers might be too much magic for me. I'll open an issue with the problems I encountered.Speos

OK, so this is my second answer, but as speed is of the essence I thought that I had to mention the bitarray module - even though it's bitstring's nemesis :). It's ideally suited to this case as not only is it a C extension (and so faster than pure Python has a hope of being), but it also supports slice assignments.

I haven't even tried to optimise this, I just rewrote the bitstring version. On my machine I get 0.16 seconds for primes under a million.

For a billion, it runs perfectly well and completes in 2 minutes 31 seconds.

import bitarray

def prime_bitarray(limit=1000000):
    yield 2
    flags = bitarray.bitarray(limit)
    flags.setall(False)
    sub_limit = int(limit**0.5)
    for i in range(3, limit, 2):
        if not flags[i]:
            yield i
            if i <= sub_limit:
                flags[3*i:limit:i*2] = True
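For comparison, the same slice-assignment idea works on a plain Python list too; a rough sketch (my addition, computing the slice length explicitly because a list, unlike a bitarray, cannot broadcast a single True into a slice):

```python
def prime_slices(limit=1000000):
    '''Odds-only sieve using list slice assignment
    instead of an inner marking loop.'''
    yield 2
    flags = [False] * limit  # False = candidate, True = composite
    sub_limit = int(limit**0.5)
    for i in range(3, limit, 2):
        if not flags[i]:
            yield i
            if i <= sub_limit:
                # Number of elements in flags[i*i:limit:2*i].
                count = (limit - i*i - 1) // (2*i) + 1
                flags[i*i:limit:2*i] = [True] * count
```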
Morpho answered 24/5, 2010 at 20:53 Comment(3)
Aww what, bit array! Exactly what I needed? XD. I'll give it a try after work today.Effieeffigy
Yes, I've run into the same problem before and was going to suggest bitarray. I hadn't heard of bitstring before, but I'll check it out. I had been using BitVector (which I found wasn't very optimized) before I found bitarray, but it's good to know of another binary array module to check out.Complaint
Just thought to let you know that on my machine, it took 0.45 seconds to run for n < 1,000,000, and so it will probably take 450 seconds to do a billion. I haven't tried it yet, but just to give you some idea about my machine speed compared to my 0.21 seconds version. Perhaps I can use bitarray for a generator that requires more than 100 million or something, heh.Effieeffigy

Okay, here's a (nearly complete) comprehensive benchmark I've done tonight to see which code runs the fastest. Hopefully someone will find this list useful. I omitted anything that takes more than 30 seconds to complete on my machine.

I would like to thank everyone who contributed. I've gained a lot of insight from your efforts, and I hope you have too.

My machine: AMD ZM-86, 2.40 GHz Dual-Core, with 4GB of RAM. This is an HP TouchSmart Tx2 laptop. Note that while I may have linked to a pastebin, I benchmarked all of the following on my own machine.

I will add the gmpy2 benchmark once I am able to build it.

All of the benchmarks are tested in Python 2.6 x86

Returning a list of prime numbers up to n = 1,000,000 (using Python generators):

Sebastian's numpy generator version (updated) - 121 ms @

Mark's Sieve + Wheel - 154 ms

Robert's version with slicing - 159 ms

My improved version with slicing - 205 ms

Numpy generator with enumerate - 249 ms @

Mark's Basic Sieve - 317 ms

casevh's improvement on my original solution - 343 ms

My modified numpy generator solution - 407 ms

My original method in the question - 409 ms

Bitarray Solution - 414 ms @

Pure Python with bytearray - 1394 ms @

Scott's BitString solution - 6659 ms @

'@' means this method is capable of generating up to n < 1,000,000,000 on my machine setup.

In addition, if you don't need the generator and just want the whole list at once:

numpy solution from RosettaCode - 32 ms @

(The numpy solution is also capable of generating up to 1 billion, which took 61.6259 seconds. I suspect the memory was swapped once, hence the double time.)

Effieeffigy answered 24/5, 2010 at 13:31 Comment(15)
@Xavier Ho: your numpy version is incorrect: the step should be n, not n*n, i.e., isprime[n*n::n]=0. You don't need to call numpy.nonzero() if you'd like a generator version: primes[:2]=0; return (i for i, p in enumerate(primes) if p)Speos
Note: numpy generator solution is 3 times slower (100 seconds vs. 30 seconds for 1e9) than the non-generator version.Speos
@J.F. Sebastian: Nice catch. Thought I had fixed it! Thanks.Effieeffigy
@J.F. Sebastian: Interesting. On my machine it's more than 6 times slower.Effieeffigy
@Xavier Ho: Here's numpy generator version ideone.com/mxSdH (55 seconds for 1e9 (compared to 30 seconds for numpy non-generator version and to 45 seconds for @casevh's new xmpz.bitset()-based version)Speos
Nice. I'll give it a go tonight. Thanks.Effieeffigy
I've replaced isprime[n*n::n] by isprime[n*n::2*n] because this solution already skips all even numbers ideone.com/G9l44 It is slightly faster: 52.72 vs. 47.84 seconds for limit 1e9.Speos
@J.F. Sebastian: Haha, good find. I've updated the chart above to reflect the new benchmark.Effieeffigy
ambi_sieve() is faster than the numpy solution from RosettaCode #2068872Speos
Madness. This is madness! (I've been pretty busy. Did you run some tests on your own machine?)Effieeffigy
@Robert William Hanks: prime_numbers4() is a champion among generator solutions (on my machine and for the code: ideone.com/weu23 )Speos
primesfrom2to() takes 10 seconds for n=1e9. It is the best result I know among solutions in Python. #2068872Speos
It's nice to see @Robert still going. I admit freely I've been busy with semester exams, and hence the downtime.Effieeffigy
@Xavier Ho: I added my last attempt at a numpy generator; if you have time, you may want to time it. (I know it's a wiki, but a benchmark on another machine would be great.)Assibilate
I'm not suggesting to revive this old old thread, but there's no proper speed comparison without comparing the run times at several size points, getting the codes' empirical orders of growth and (ideally) drawing the times-vs-sizes graphs on one log-log plot as can be seen e.g. here and here.Faints

Related question: Fastest way to list all primes below N in python.

Hi, I am also looking for Python code to generate primes up to 10^9 as fast as possible, which is difficult because of the memory problem. So far I have come up with the following, which generates primes up to 10^6 and 10^7 (clocking 0.171 s and 1.764 s respectively on my old machine). It seems to be slightly faster (at least on my computer) than "My improved version with slicing" from the previous post, probably because I use n//i - i + 1 (see below) instead of the analogous len(flags[i2::i<<1]) in the other code. Please correct me if I am wrong. Any suggestions for improvement are very welcome.

def primes(n):
    """ Returns  a list of primes < n """
    sieve = [True] * n
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i]:
            sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1)
    return [2] + [i for i in xrange(3,n,2) if sieve[i]]
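The count used in the [False]*(...) assignment above can be checked directly against the length of the slice it replaces (a quick verification I added):

```python
# The assignment sieve[i*i::2*i] = [False]*k needs k to equal the slice
# length; check the closed form (n - i*i - 1)//(2*i) + 1 for many n and i.
for n in range(20, 300):
    sieve = [True] * n
    for i in range(3, int(n**0.5) + 1, 2):
        assert len(sieve[i*i::2*i]) == (n - i*i - 1) // (2*i) + 1
print("slice-length formula verified")
```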

In one of his codes Xavier uses flags[i2::i<<1] and len(flags[i2::i<<1]). I computed the size explicitly instead, using the fact that between i*i and n (exclusive) there are (n - i*i - 1)//(2*i) further multiples of 2*i beyond i*i; adding 1 to also count i*i itself gives len(sieve[i*i::2*i]) == (n - i*i - 1)//(2*i) + 1, which matches the expression in the code. This makes the code faster. A basic generator based on the code above would be:

def primesgen(n):
    """ Generates all primes <= n """
    sieve = [True] * n
    yield 2
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i]:
            yield i
            sieve[i*i::2*i] = [False]*((n-i*i-1)/(2*i)+1)
    for i in xrange(i+2,n,2):
        if sieve[i]: yield i

With a bit of modification one can write a slightly slower version of the code above that starts with a sieve half the size, sieve = [True] * (n//2), and works for the same n; I am not sure how that affects the memory issue. As an example implementation, here is a modified version of the numpy Rosetta Code sieve (maybe faster) that starts with a half-size sieve.

import numpy
def primesfrom3to(n):
    """ Returns a array of primes, 3 <= p < n """
    sieve = numpy.ones(n/2, dtype=numpy.bool)
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i/2]: sieve[i*i/2::i] = False
    return 2*numpy.nonzero(sieve)[0][1::]+1
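For readers without numpy, the same half-size layout can be sketched in pure Python (my translation, not the author's code); sieve index j stands for the odd number 2*j + 1:

```python
def primes_half_sieve(n):
    '''Return a list of primes below n, sieving only odd numbers.
    Sieve index j represents the odd number 2*j + 1.'''
    if n < 3:
        return [2] if n > 2 else []
    sieve = [True] * (n // 2)
    for i in range(3, int(n**0.5) + 1, 2):
        if sieve[i // 2]:
            # Odd multiples of i start at i*i; in index space that is
            # position i*i // 2 with step i.
            start = i * i // 2
            sieve[start::i] = [False] * len(sieve[start::i])
    return [2] + [2*j + 1 for j in range(1, n // 2) if sieve[j]]
```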

A faster and more memory-efficient generator would be:

import numpy
def primesgen1(n):
    """ Input n>=6, Generates all primes < n """
    sieve1 = numpy.ones(n/6+1, dtype=numpy.bool)
    sieve5 = numpy.ones(n/6  , dtype=numpy.bool)
    sieve1[0] = False
    yield 2; yield 3;
    for i in xrange(int(n**0.5)/6+1):
        if sieve1[i]:
            k=6*i+1; yield k;
            sieve1[ ((k*k)/6) ::k] = False
            sieve5[(k*k+4*k)/6::k] = False
        if sieve5[i]:
            k=6*i+5; yield k;
            sieve1[ ((k*k)/6) ::k] = False
            sieve5[(k*k+2*k)/6::k] = False
    for i in xrange(i+1,n/6):
        if sieve1[i]: yield 6*i+1
        if sieve5[i]: yield 6*i+5
    if n%6 > 1:
        if sieve1[i+1]: yield 6*(i+1)+1

or with a bit more code:

import numpy
def primesgen(n):
    """ Input n>=30, Generates all primes < n """
    size = n/30 + 1
    sieve01 = numpy.ones(size, dtype=numpy.bool)
    sieve07 = numpy.ones(size, dtype=numpy.bool)
    sieve11 = numpy.ones(size, dtype=numpy.bool)
    sieve13 = numpy.ones(size, dtype=numpy.bool)
    sieve17 = numpy.ones(size, dtype=numpy.bool)
    sieve19 = numpy.ones(size, dtype=numpy.bool)
    sieve23 = numpy.ones(size, dtype=numpy.bool)
    sieve29 = numpy.ones(size, dtype=numpy.bool)
    sieve01[0] = False
    yield 2; yield 3; yield 5;
    for i in xrange(int(n**0.5)/30+1):
        if sieve01[i]:
            k=30*i+1; yield k;
            sieve01[     (k*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+10*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+16*k)/30::k] = False
            sieve19[(k*k+18*k)/30::k] = False
            sieve23[(k*k+22*k)/30::k] = False
            sieve29[(k*k+28*k)/30::k] = False
        if sieve07[i]:
            k=30*i+7; yield k;
            sieve01[(k*k+ 6*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+16*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+ 4*k)/30::k] = False
            sieve19[     (k*k)/30::k] = False
            sieve23[(k*k+22*k)/30::k] = False
            sieve29[(k*k+10*k)/30::k] = False
        if sieve11[i]:
            k=30*i+11; yield k;
            sieve01[     (k*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+20*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+26*k)/30::k] = False
            sieve19[(k*k+18*k)/30::k] = False
            sieve23[(k*k+ 2*k)/30::k] = False
            sieve29[(k*k+ 8*k)/30::k] = False
        if sieve13[i]:
            k=30*i+13; yield k;
            sieve01[(k*k+24*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+ 4*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+16*k)/30::k] = False
            sieve19[     (k*k)/30::k] = False
            sieve23[(k*k+28*k)/30::k] = False
            sieve29[(k*k+10*k)/30::k] = False
        if sieve17[i]:
            k=30*i+17; yield k;
            sieve01[(k*k+ 6*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+26*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+14*k)/30::k] = False
            sieve19[     (k*k)/30::k] = False
            sieve23[(k*k+ 2*k)/30::k] = False
            sieve29[(k*k+20*k)/30::k] = False
        if sieve19[i]:
            k=30*i+19; yield k;
            sieve01[     (k*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+10*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+ 4*k)/30::k] = False
            sieve19[(k*k+12*k)/30::k] = False
            sieve23[(k*k+28*k)/30::k] = False
            sieve29[(k*k+22*k)/30::k] = False
        if sieve23[i]:
            k=30*i+23; yield k;
            sieve01[(k*k+24*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+14*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+26*k)/30::k] = False
            sieve19[     (k*k)/30::k] = False
            sieve23[(k*k+ 8*k)/30::k] = False
            sieve29[(k*k+20*k)/30::k] = False
        if sieve29[i]:
            k=30*i+29; yield k;
            sieve01[     (k*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+20*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+14*k)/30::k] = False
            sieve19[(k*k+12*k)/30::k] = False
            sieve23[(k*k+ 8*k)/30::k] = False
            sieve29[(k*k+ 2*k)/30::k] = False
    for i in xrange(i+1,n/30):
            if sieve01[i]: yield 30*i+1
            if sieve07[i]: yield 30*i+7
            if sieve11[i]: yield 30*i+11
            if sieve13[i]: yield 30*i+13
            if sieve17[i]: yield 30*i+17
            if sieve19[i]: yield 30*i+19
            if sieve23[i]: yield 30*i+23
            if sieve29[i]: yield 30*i+29
    i += 1
    mod30 = n%30
    if mod30 > 1:
        if sieve01[i]: yield 30*i+1
    if mod30 > 7:
        if sieve07[i]: yield 30*i+7
    if mod30 > 11:
        if sieve11[i]: yield 30*i+11
    if mod30 > 13:
        if sieve13[i]: yield 30*i+13
    if mod30 > 17:
        if sieve17[i]: yield 30*i+17
    if mod30 > 19:
        if sieve19[i]: yield 30*i+19
    if mod30 > 23:
        if sieve23[i]: yield 30*i+23

PS: If you have any suggestions on how to speed up this code, anything from changing the order of operations to pre-computing values, please comment.

Assibilate answered 24/5, 2010 at 13:31 Comment(14)
Well, it would be faster because you're using a list comprehension and not a generator. Nice one, I'll add the benchmark when I get the time.Effieeffigy
Just a thought, can you explain how your m = n//i - i is analogous to my flags[i*i::i<<1]? Since I skip iterating through the multiples of two, I also skip them in the 'step' of the slicing syntax. The way you're doing it still iterates over every multiple.Effieeffigy
Sorry man, I am new to programming; I don't even know what << means at this point. And I am not sure if my code is faster than yours, or exactly why; my guess was you calling len(). Maybe this can help, sorry if it's off the point: math tells us that the number of multiples of i in range(1,n) is equal to n//i (integer division), and the number of multiples of i in range(1,i*i) is i (1*i, 2*i, 3*i, ..., i*i), so in [i*i::i] we have the multiples of i in range(1,n) minus the multiples of i in range(1,i*i), plus one (because we want to count i*i too); thus the formula len(sieve[i*i::i]) equals n//i - i + 1.Assibilate
In my code I ignore multiples of two completely and I don't flag them out of my sieve, since I rely on the step of the range function being 2 both for sieving and for forming the primes list. (I only add [2] to the list at the end.)Assibilate
On a side note, one can skip the prime 3 completely too if the initialization of the sieve is made something like [False, True, True] * ((n+1)//3); see my second example, primes2(), it's a bit faster. Please ensure the input is one less than a multiple of 3. On my machine the difference is so small that I did not care to write better code.Assibilate
@Robert: Have you tried your code? I suspect it generates 8 as a prime number...Effieeffigy
Not possible: I generate the primes with this list comprehension, [2,3] + [i for i in range(5,n+1,2) if sieve[i]]; as the index of the range is equal to the value (sieve[i] corresponds to i), I only check odd integers. The even integers in my sieve are all flagged True; I simply ignore them. By the way, using this fact and the trick you showed me, [i*i::2*i], I think it's possible to make a version of the code that starts with half of the values in the sieve, since we only need to care about odd numbers. That should use a bit less memory, and maybe it will be faster.Assibilate
I updated the code in response to a comment of Xavier's with respect to "how your m = n//i - i is analogous to my flags[i*i::i<<1]"; hope it helps.Assibilate
@Robert: Yeah, changing it to 2*i should eliminate the incorrect numbers generated.Effieeffigy
@Robert: And great job! 159 ms on my machine for your generator (2nd code). Your effort will not go unappreciated.Effieeffigy
@Robert: I'll give that a run tonight.Effieeffigy
primesgen2() shows a tiny improvement compared to primesgen1(): 33 ms vs 37 ms for n=1e6 ideone.com/xcoii.Speos
@J.F. Sebastian: thanks for the feedback man. primesgen1() uses a combined sieve of size (1/3)*n, primesgen2() one of size (8/30)*n; 33% vs 26% (can we say that is the memory usage of each version?). That is why I tried it.Assibilate
Results for n=1e9 confirm "memory hypothesis": primesgen1() vs. primesgen2() is 35 vs. 27 seconds (the time ratio is similar to the memory usage ratio). primesfrom2to3() is 9.9 seconds (for comparison).Speos
L
4

Here's a version that I wrote a while back; it might be interesting to compare with yours for speed. It doesn't do anything about the space problems, though (in fact, they're probably worse than with your version).

from math import sqrt

def basicSieve(n):
    """Given a positive integer n, generate the primes < n."""
    s = [1]*n
    for p in xrange(2, 1+int(sqrt(n-1))):
        if s[p]:
            a = p*p
            s[a::p] = [0] * -((a-n)//p)
    for p in xrange(2, n):
        if s[p]:
            yield p 

I have faster versions, using a wheel, but they're much more complicated. Original source is here.

Okay, here's the version using a wheel. wheelSieve is the main entry point.

from math import sqrt
from bisect import bisect_left

def basicSieve(n):
    """Given a positive integer n, generate the primes < n."""
    s = [1]*n
    for p in xrange(2, 1+int(sqrt(n-1))):
        if s[p]:
            a = p*p
            s[a::p] = [0] * -((a-n)//p)
    for p in xrange(2, n):
        if s[p]:
            yield p

class Wheel(object):
    """Class representing a wheel.

    Attributes:
       primelimit -> wheel covers primes < primelimit.
       For example, given a primelimit of 6
       the wheel primes are 2, 3, and 5.
       primes -> list of primes less than primelimit
       size -> product of the primes in primes;  the modulus of the wheel
       units -> list of units modulo size
       phi -> number of units

    """
    def __init__(self, primelimit):
        self.primelimit = primelimit
        self.primes = list(basicSieve(primelimit))

        # compute the size of the wheel
        size = 1
        for p in self.primes:
            size *= p
        self.size = size

        # find the units by sieving
        units = [1] * self.size
        for p in self.primes:
            units[::p] = [0]*(self.size//p)
        self.units = [i for i in xrange(self.size) if units[i]]

        # number of units
        self.phi = len(self.units)

    def to_index(self, n):
        """Compute alpha(n), where alpha is an order preserving map
        from the set of units modulo self.size to the nonnegative integers.

        If n is not a unit, the index of the first unit greater than n
        is given."""
        return bisect_left(self.units, n % self.size) + n // self.size * self.phi

    def from_index(self, i):
        """Inverse of to_index."""

        return self.units[i % self.phi] + i // self.phi * self.size

def wheelSieveInner(n, wheel):
    """Given a positive integer n and a wheel, find the wheel indices of
    all primes that are less than n, and that are units modulo the
    wheel modulus.
    """

    # renaming to avoid the overhead of attribute lookups
    U = wheel.units
    wS = wheel.size
    # inverse of unit map
    UI = dict((u, i) for i, u in enumerate(U))
    nU = len(wheel.units)

    sqroot = 1+int(sqrt(n-1)) # ceiling of square root of n

    # corresponding index (index of next unit up)
    sqrti = bisect_left(U, sqroot % wS) + sqroot//wS*nU
    upper = bisect_left(U, n % wS) + n//wS*nU
    ind2 = bisect_left(U, 2 % wS) + 2//wS*nU

    s = [1]*upper
    for i in xrange(ind2, sqrti):
        if s[i]:
            q = i//nU
            u = U[i%nU]
            p = q*wS+u
            u2 = u*u
            aq, au = (p+u)*q+u2//wS, u2%wS
            wp = p * nU
            for v in U:
                # eliminate entries corresponding to integers
                # congruent to p*v modulo p*wS
                uvr = u*v%wS
                m = aq + (au > uvr)
                bot = (m + (q*v + u*v//wS - m) % p) * nU + UI[uvr]
                s[bot::wp] = [0]*-((bot-upper)//wp)
    return s

def wheelSieve(n, wheel=Wheel(10)):
    """Given a positive integer n, generate the list of primes <= n."""
    n += 1
    wS = wheel.size
    U = wheel.units
    nU = len(wheel.units)
    s = wheelSieveInner(n, wheel)
    return (wheel.primes[:bisect_left(wheel.primes, n)] +
            [p//nU*wS + U[p%nU] for p in xrange(bisect_left(U, 2 % wS)
             + 2//wS*nU, len(s)) if s[p]])

As to what a wheel is: well, you know that (apart from 2), all primes are odd, so most sieves miss out all the even numbers. Similarly, you can go a bit further and notice that all primes (except 2 and 3) are congruent to 1 or 5 modulo 6 (== 2 * 3), so you can get away with only storing entries for those numbers in your sieve. The next step up is to note that all primes (except 2, 3 and 5) are congruent to one of 1, 7, 11, 13, 17, 19, 23, 29 (modulo 30) (here 30 == 2*3*5), and so on.
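The residue lists in the paragraph above are easy to recompute. A minimal sketch (my code, not part of the answer's sieve): an integer below the wheel modulus is a unit exactly when it is divisible by none of the wheel primes.

```python
# Sketch: recover the wheel residues (units) for a given set of wheel
# primes; for (2, 3, 5) the modulus is 30 and the units are the eight
# residues listed in the answer.
def wheel_units(primes=(2, 3, 5)):
    modulus = 1
    for p in primes:
        modulus *= p
    # u is a unit modulo the wheel size iff no wheel prime divides it
    return [u for u in range(modulus) if all(u % p for p in primes)]

print(wheel_units((2, 3)))     # -> [1, 5]
print(wheel_units((2, 3, 5)))  # -> [1, 7, 11, 13, 17, 19, 23, 29]
```

The fraction of entries a sieve has to store shrinks accordingly: 8/30 of all integers for the mod-30 wheel, versus 1/2 for the odd-numbers-only version.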

Lapp answered 24/5, 2010 at 14:57 Comment(7)
Care to explain the wheel? Is it something similar to... the Sieve of Atkin?Effieeffigy
@Mark: 0.28 seconds! You're our first-up to the finals! =D ideone.com/yIENnEffieeffigy
@Mark: And.. naturally, MemoryError @ 1,000,000,000. =/ ideone.com/YeBOR. I'm curious about your faster version now.Effieeffigy
Thanks! I'll read over the code and try to understand it. Will report my status later.Effieeffigy
No, the Sieve of Atkin introduces a fundamentally new idea, which is to use quadratic forms; I think it's only asymptotically faster than a basic Sieve of Eratosthenes + wheel, and the point at which it becomes faster is likely to be > 1000000 (depending on implementations, of course). The idea of using a wheel has been around a good while. I've added a link to the place I first posted this; there's an implementation using a wheel there.Lapp
@Mark: Nice! The wheel code you wrote a while back took 0.127 seconds. ideone.com/qX2cU. It still doesn't handle 1,000,000,000, but I think I've probed enough. I will definitely look into this interesting beast. Accepted!Effieeffigy
@Mark: I just wanted to let you know that your slicing assignment is a great idea: I was able to optimise my code to 0.21 seconds from 0.32 seconds: ideone.com/bIDY4 - 1/2 the speed of your wheel code. :]Effieeffigy
M
3

One speed improvement you can make using bitstring is to use the 'set' method when you exclude multiples of the current number.

So the vital section becomes

for i in range(3, limit, 2):
    if flags[i]:
        yield i
        if i <= sub_limit:
            flags.set(1, range(i*3, limit, i*2))    

On my machine this runs about 3 times faster than the original.

My other thought was to use the bitstring to represent only the odd numbers. You could then find the unset bits using flags.findall([0]) which returns a generator. Not sure if that would be much faster (finding bit patterns isn't terribly easy to do efficiently).

[Full disclosure: I wrote the bitstring module, so I've got some pride at stake here!]


As a comparison I've also taken the guts out of the bitstring method so that it's doing it in the same way, but without range checking etc. I think this gives a reasonable lower limit for pure Python that works for a billion elements (without changing the algorithm, which I think is cheating!)

def prime_pure(limit=1000000):
    yield 2
    flags = bytearray((limit + 7) // 8)
    sub_limit = int(limit**0.5)
    for i in xrange(3, limit, 2):
        byte, bit = divmod(i, 8)
        if not flags[byte] & (128 >> bit):
            yield i
            if i <= sub_limit:
                for j in xrange(i*3, limit, i*2):
                    byte, bit = divmod(j, 8)
                    flags[byte] |= (128 >> bit)

On my machine this runs in about 0.62 seconds for a million elements, which means it's about a quarter of the speed of the bitarray answer. I think that's quite reasonable for pure Python.

Morpho answered 24/5, 2010 at 13:53 Comment(11)
@Scott: Ah cool, nice to have the author of bitstring here! I'll try that too, will let you know shortly of the result. And yes, I am using 2.0.0 beta 1, as I only downloaded the library an hour ago.Effieeffigy
@Scott: Just did a test. 11.2 seconds with your fix. Still a bit slow. Got any more ideas?Effieeffigy
Couple more ideas above. I'm guessing that should bring your time down to about 7 seconds.Morpho
@Scott: Thanks. But my original code runs at 0.32 seconds. See: ideone.com/wCJm5. But thanks for the idea and input! I'll be watching this topic for a while.Effieeffigy
I thought the challenge was for the fastest sieve code to produce primes up to a billion, not a million. The bitstring code will work for a billion, whereas your original code doesn't, so I think bitstring is winning so far! As an aside, using Python 2 I get the million case down to 4.5 seconds.Morpho
@Scott: Yes, the memory constraint is something even Mark's answer can't solve. However, even 4.5 seconds is too slow compared to my original code. It does solve the big number issue, but the time also blows up in a linear fashion. Hence the reason I accepted Mark's answer (0.11 seconds up to 1 million.)Effieeffigy
Using the wheel feels like cheating to me - it's not quite the same algorithm any more. By stripping away some internal bitstring validation code I got down to 2.3 seconds for primes up to a million. I'll tell you how long it takes for a billion as soon as it finishes :)Morpho
Primes up to a billion in a little under 38 minutes. But at least it worked!Morpho
@Scott: Nice! That's really good to know. I didn't have the patience to wait =/Effieeffigy
@Scott: Thanks for the side to side comparison. Yes, you did a good job. :3.Effieeffigy
prime_pure() is two times slower than the slowest function I've compared ideone.com/F0RBo (comparison is at the end of the file)Speos
S
3

One option you may want to look at is just compiling a C/C++ module so you have direct access to the bit-twiddling features. AFAIK you could write something of that nature and only return the results on completion of the calculations performed in C/C++. Now that I'm typing this you may also look at numpy which does store arrays as native C for speed. I don't know if that will be any faster than the bitstring module, though!
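To make the numpy suggestion concrete, here is a minimal sketch (mine, not Wayne's code): a numpy bool array keeps one byte per flag in native C memory, and the strided slice assignment runs as a C loop rather than a Python one.

```python
import numpy as np

def numpy_primes(limit=1000000):
    """Sieve of Eratosthenes on a numpy bool array (one byte per flag)."""
    flags = np.ones(limit, dtype=bool)
    flags[:2] = False                  # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            flags[i*i::i] = False      # vectorised strided assignment in C
    return np.flatnonzero(flags)       # indices still True are the primes
```

For one bit per flag instead of one byte, the same idea can be pushed further with np.packbits/np.unpackbits, at some cost in speed.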

Samy answered 24/5, 2010 at 14:29 Comment(3)
Thanks, Wayne. You're right that it is an option - just not exactly an easy one. I'll be happy when you write one, of course. ;]Effieeffigy
Heh. It's actually not that bad if you already know C/C++; the biggest pain is figuring out how to tell your scripts the right thing for scons to take care of the build process. Then you just have to deal with ~125 MB worth of bits (1 billion bits / 8 bits per byte == 125 million bytes).Samy
True. I know C++, but I know Python is written in C, and I haven't written a Python module in C/C++ myself, so that's still a bit out of reach for now. It's okay, though, we're getting some brilliant answers here on SO, as always. :]Effieeffigy
A
2

Python's integer types can be of arbitrary size, so you shouldn't need a clever bitstring library to represent a set of bits, just a single number.

Here's the code. It handles a limit of 1,000,000 with ease, and can even handle 10,000,000 without complaining too much:

def multiples_of(n, step, limit):
    bits = 1 << n
    old_bits = bits
    max = 1 << limit
    while old_bits < max:
        old_bits = bits
        bits += bits << step
        step *= 2
    return old_bits

def prime_numbers(limit=10000000):
    '''Prime number generator. Yields the series                                                                        
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...                                                                              
    using Sieve of Eratosthenes.                                                                                        
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = ((1 << (limit - 2)) - 1) << 2
    # Step through all the odd numbers                                                                                  
    for i in xrange(3, limit, 2):
        if not (flags & (1 << i)):
            continue
        yield i
        # Exclude further multiples of the current prime number                                                         
        if i <= sub_limit:
            flags &= ~multiples_of(i * 3, i << 1, limit)

The multiples_of function avoids the cost of setting every single multiple individually. It sets the initial bit, shifts it by the initial step (i << 1) and adds the result to itself. It then doubles the step, so that the next shift copies both bits, and so on until it reaches the limit. This sets all the multiples of a number up to the limit in O(log(limit)) time.
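A quick way to see the doubling at work (multiples_of is copied verbatim from the answer; the demo values are mine):

```python
def multiples_of(n, step, limit):
    bits = 1 << n
    old_bits = bits
    max = 1 << limit
    while old_bits < max:
        old_bits = bits
        bits += bits << step
        step *= 2
    return old_bits

# Each pass doubles the number of set bits: {3}, {3,5}, {3,5,7,9}, ...
# Some bits may land beyond the limit; the caller's `flags &= ~mask`
# makes those harmless.
mask = multiples_of(3, 2, 20)
print([i for i in range(20) if mask >> i & 1])  # -> [3, 5, 7, 9, 11, 13, 15, 17, 19]
```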

Anaximander answered 24/5, 2010 at 13:44 Comment(17)
@Marcelo: Yeah, I'm aware of that. I could also write a few functions that do the bit manipulations and see if it's much faster. It's something I'm working on, at this moment. Edit: The issue is, even performing 2 << 1000000 takes more than 10 seconds.Effieeffigy
@Xavier: No it doesn't. It might be that printing the result of that operation takes that long. Try x = 2 << 1000000 instead.Isolt
@kaizer.se: Hm, good point. I'll go and see what I can do with that.Effieeffigy
This seems unlikely to work well, unless you can find some way of doing a whole bunch of bit-sets together: each bit-operation on a long will generate a whole new long, so is an O(n) operation.Lapp
@Mark: Yes, I just tried it. 1.3 seconds to generate up to 100,000, and when I tried 1,000,000 it takes more than 20 seconds (in fact, it's still not finished as I type this). This is not viable by bit shifting, and I need direct access to change the bits.Effieeffigy
@Marcelo: Thanks for the edit. Why did you change the square root to division by 2, by the way? I'll try your code now. || Edit: 0.992000102997 seconds for n < 100,000. For n < 1,000,000 it still takes ages. Have you tried it yourself? Nice work, though. Gave me some ideas.Effieeffigy
@Marcelo: Changed it to sqrt, yielding 0.59 seconds for n < 100,000. Still not good enough, sorry. :] || Edit: 1 million took 54.7890000343 seconds.Effieeffigy
A slowdown is inevitable as the limit grows. Just out of curiosity: is the bitset version faster or slower? What is the fastest time you've seen for 1,000,000?Anaximander
@Marcelo: What bitset version? If you meant Scott's answer, yes, it took 11 seconds in that version, for n < 1,000,000. That's still 10 times slower than my code, though.Effieeffigy
@Marcelo: I wonder whether you can create the values for multiples_of with some clever divisions. E.g., (1<<(2*n-1))//3 has every other bit set; (1<<(3*n-1))//7 has every 3rd bit set, etc.Lapp
@Marcelo: The fastest time for 1,000,000 in Python? My version, currently (that I've seen). In C/C++? I recall code that ran in 100x the speed, somewhere on ProjectEuler. I'll check.Effieeffigy
scratch that last comment; that scheme quickly get out of hand when the quotient of the division becomes too large (i.e., multi-limb, internally).Lapp
Sorry, I can't get it any faster (I didn't notice the times you posted in your question; sorry). I think bitstring wins because it lets you mutate the one bitmap, whereas the integer arithmetic does a lot of copying.Anaximander
@Marcelo: Yeah, but neither of them are faster than a list of booleans. I'm sad. =/ || Although, my primary issue is that I run out of memory at n < 1 billion.Effieeffigy
@Marcelo: I found some C# code here: ideone.com/1GMyI || It claims to run at 190 ms and finds the sum of all prime numbers under 1 million. I haven't tested it, but I believe it; it's the exact same algorithm as mine.Effieeffigy
Certainly, compiled languages will solve this much faster than Python can. You might have the best of both worlds if there was a bitstring library written in C (and, ideally, supported setting a stripe of bits).Anaximander
@Marcelo: Yeah. But scroll down ad look at Mark's answer. =]Effieeffigy
Q
1

Another really low-tech option, but one that can be of help if you really need a large list of prime numbers available very fast. Say, if you need them as a tool to answer Project Euler problems (if the problem is just optimizing your code as a mind game, it's irrelevant).

Use any slow solution to generate the list and save it to a Python source file, say primes.py, that would just contain:

primes = [ a list of a million primes numbers here ]

Now to use them you just have to do import primes and you get them with a minimal memory footprint at the speed of disk I/O. The cost is linear in the number of primes. :-)

Even if you used a poorly optimized solution to generate this list, it will only be done once and it does not matter much.

You could probably make it even faster using pickle/unpickle, but you get the idea...
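A sketch of the whole round trip (the file path and helper names here are mine, not from the answer): build the list once with any slow-but-correct method, dump it with pickle, and reload it on every later run at disk-I/O speed.

```python
import pickle
import tempfile

def slow_primes(n):
    """Any slow but correct method will do for the one-off build."""
    return [p for p in range(2, n)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def save_primes(primes, path):
    with open(path, "wb") as f:
        pickle.dump(primes, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_primes(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# one-off build; every later run only pays for load_primes()
path = tempfile.mkdtemp() + "/primes.pkl"
save_primes(slow_primes(100), path)
```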

Quicklime answered 27/5, 2010 at 0:36 Comment(0)
T
1

You could use a segmented Sieve of Eratosthenes. The memory used for each segment is reused for the next segment.
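A minimal sketch of the scheme (my code with hypothetical names, not a tuned implementation): sieve the base primes up to sqrt(n) once, then sweep a fixed-size window over [sqrt(n), n), reusing one window-sized buffer for every segment.

```python
def segmented_primes(n, segment_size=32768):
    """Yield the primes below n, reusing one window-sized buffer."""
    if n <= 2:
        return
    # 1) plain sieve for the base primes up to sqrt(n)
    root = int(n ** 0.5) + 1
    base = [True] * root
    base[:2] = [False, False]
    for i in range(2, int(root ** 0.5) + 1):
        if base[i]:
            base[i*i::i] = [False] * len(base[i*i::i])
    base_primes = [i for i in range(root) if base[i]]
    yield from (p for p in base_primes if p < n)
    # 2) sieve each window [low, high) using only the base primes
    for low in range(root, n, segment_size):
        high = min(low + segment_size, n)
        seg = [True] * (high - low)
        for p in base_primes:
            # first multiple of p inside the window (and at least p*p)
            start = max(p * p, (low + p - 1) // p * p)
            for m in range(start, high, p):
                seg[m - low] = False
        yield from (low + i for i, flag in enumerate(seg) if flag)
```

Peak memory is O(sqrt(n) + segment_size) instead of O(n), which is what makes the 10**9 case tractable.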

Tyner answered 2/2, 2012 at 13:58 Comment(3)
By 'segmented' do you mean a memory block for a certain number range, and once it's exhausted, you create the next number range on the same memory block?Effieeffigy
Yes. Google for 'segmented Sieve of Eratosthenes'.Tyner
s/could/should. :)Faints
D
1

Here is some Python3 code that uses less memory than the bitarray/bitstring solutions posted to date and about 1/8 the memory of Robert William Hanks's primesgen(), while running faster than primesgen() (marginally faster at 1,000,000, using 37KB of memory, 3x faster than primesgen() at 1,000,000,000 using under 34MB). Increasing the chunk size (variable chunk in the code) uses more memory but speeds up the program, up to a limit -- I picked a value so that its contribution to memory is under about 10% of sieve's n//30 bytes. It's not as memory-efficient as Will Ness's infinite generator (see also https://mcmap.net/q/57414/-how-to-implement-an-efficient-infinite-generator-of-prime-numbers-in-python) because it records (and returns at the end, in compressed form) all calculated primes.

This should work correctly as long as the square root calculation comes out accurate (about 2**51 if Python uses 64-bit doubles). However you should not use this program to find primes that large!

(I didn't recalculate the offsets; I just took them from Robert William Hanks's code. Thanks, Robert!)

import numpy as np
def prime_30_gen(n):
    """ Input n, int-like.  Yields all primes < n as Python ints"""
    wheel = np.array([2,3,5])
    yield from wheel[wheel < n].tolist()
    powers = 1 << np.arange(8, dtype='u1')
    odds = np.array([1, 7, 11, 13, 17, 19, 23, 29], dtype='i8')
    offsets=np.array([[0,6,10,12,16,18,22,28],[6,24,16,12,4,0,22,10],
                      [0,6,20,12,26,18,2,8],  [24,6,4,18,16,0,28,10],
                      [6,24,26,12,14,0,2,20], [0,24,10,18,4,12,28,22],
                      [24,6,14,18,26,0,8,20], [0,24,20,18,14,12,8,2]],
                     dtype='i8')
    offsets = {d:f for d,f in zip(odds, offsets)}
    sieve = 255 * np.ones((n + 28) // 30, dtype='u1')
    if n % 30 > 1: sieve[-1] >>= 8 - odds.searchsorted(n % 30)
    sieve[0] &= ~1 
    for i in range((int((n - 1)**0.5) + 29) // 30):
        for odd in odds[(sieve[i] & powers).nonzero()[0]]:
            k = i * 30 + odd
            yield int(k)
            for clear_bit, off in zip(~powers, offsets[odd]): 
                sieve[(k * (k + off)) // 30 :: k] &= clear_bit
    chunk = min(1 + (n >> 13), 1<<15)
    for i in range(i + 1, len(sieve), chunk):
        a = (sieve[i : i + chunk, np.newaxis] & powers)
        a = np.flatnonzero(a.astype('bool')) + i * 8
        a = (a // 8 * 30 + odds[a & 7]).tolist()
        yield from a
    return sieve

Side note: If you want real speed and have the 2GB of RAM required for the list of primes to 10**9, then you should use pyprimesieve (on https://pypi.python.org/, using primesieve http://primesieve.org/).

Damalis answered 30/10, 2015 at 3:49 Comment(0)
