Sieve Of Atkin is surprisingly slow

Asked 4/1, 2014 at 6:14 Answered 4/1, 2014 at 21:1

c++performance optimization sieve-algorithm

I recently became very interested in prime numbers and tried making programs to calculate them. I was able to make a sieve of Sundaram program that was able to calculate a million prime numbers in a couple seconds. I believe that's pretty fast, but I wanted better. I went on to try to make a Sieve of Atkin, I slapped together working C++ code in 20 minutes after copying the pseudocode from Wikipedia.

I knew that it wouldn't be perfect because after all, its pseudocode. I was expecting at least better times than my Sundaram Sieve though, but I was so wrong. It's very very slow. I have looked it over many times but I cannot find any significant changes that could be made. When looking at my code remember, I know it's inefficient, I know I used system commands, I know it's all over the place, but this isn't a project or anything important, it's for me.

#include <iostream>
#include <fstream>
#include <time.h>
#include <Windows.h>
#include <vector>

using namespace std;

int main(){

float limit;
float slimit;
long int n;
int counter = 0;
int squarenum;
int starttime;
int endtime;
vector <bool> primes;

ofstream save;
save.open("primes.txt");
save.clear();

cout << "Find all primes up to: " << endl;
cin >> limit;

slimit = sqrt(limit);

primes.resize(limit);

starttime = time(0);

// sets all values to false
for (int i = 0; i < limit; i++){

    primes[i] = false;
}


//puts in possible primes
for (int x = 1; x <= slimit; x++){

    for (int y = 1; y <= slimit; y++){


        n = (4*x*x) + (y*y);
        if (n <= limit && (n%12 == 1 || n%12 == 5)){

            primes[n] = !primes[n];
        }

        n = (3*x*x) + (y*y);
        if (n <= limit && n% 12 == 7){

            primes[n] = !primes[n];
        }

        n = (3*x*x) - (y*y);
        if ( x > y && n <= limit && n%12 == 11){

            primes[n] = !primes[n];
        }
    }
}

//square number mark all multiples not prime

for (float i = 5; i < slimit; i++){

    if (primes[i] == true){

        for (long int k = i*i; k < limit; k = k + (i*i)){

            primes[k] = false;
        }
    }
}

endtime = time(0);
cout << endl << "Calculations complete, saving in text document" << endl;


// loads to document
for (int i = 0 ; i < limit ; i++){

    if (primes[i] == true){


        save << counter << ") " << i << endl;
        counter++;
    }
}

save << "Found in " << endtime - starttime << " seconds" << endl;

save.close();

system("primes.txt");

system ("Pause");
return 0;
}

Balzer answered 4/1, 2014 at 6:14 Comment(8)

also compile options. – Shoer 4/1, 2014 at 6:15

See wikipedia. "This pseudocode is written for clarity. Repeated and wasteful calculations mean that it would run slower than the sieve of Eratosthenes. To improve its efficiency, faster methods must be used to find solutions to the three quadratics". An efficient sieve of Atkin can be found here: cr.yp.to/primegen.html – Permafrost 4/1, 2014 at 6:24

Downloading something to do it for me defeats the purpose, I realize I can just find it online but I don't want that. – Balzer 4/1, 2014 at 6:29

Is there a reason you used float for limit and slimit? – Insufficient 4/1, 2014 at 6:34

You can not square an integer as far as I know, but then again I don't know alot @RetiredNinja – Balzer 4/1, 2014 at 6:37

@Balzer I was suggesting that you read the primegen source code to figure out how to implement this efficiently. It's not easy. See also prior stackoverflow posts on this. – Permafrost 4/1, 2014 at 6:40

Obligatory comment on using a profiler to see where the bottle necks are. – Carl 4/1, 2014 at 17:4

Comparing int and float is sloooooow. Not only is the comparison slow, but lots of optimisations will go out of the window. Using float instead of double also means your algorithm will go badly wrong somewhere about 16 million. – Bibliomancy 29/7, 2014 at 10:50

This isn't exactly an answer (IMO, you've already gotten an answer in the comments), but a quick standard for comparison. A sieve of Eratosthenes should find a million primes in well under a second on a reasonably modern machine.

#include <vector>
#include <iostream>
#include <time.h>

unsigned long primes = 0;

int main() {
    // empirically derived limit to get 1,000,000 primes
    int number = 15485865; 

    clock_t start = clock();
    std::vector<bool> sieve(number,false);
    sieve[0] = sieve[1] = true;

    for(int i = 2; i<number; i++) {
        if(!sieve[i]) {
            ++primes;
            for (int temp = 2*i; temp<number; temp += i)
                sieve[temp] = true;
        }
    }
    clock_t stop = clock();

    std::cout.imbue(std::locale(""));
    std::cout << "Total primes: " << primes << "\n";
    std::cout << "Time: " << double(stop - start) / CLOCKS_PER_SEC << " seconds\n";
    return 0;
}

Running this on my laptop, I get a result of:

Total primes: 1000000
Time: 0.106 seconds

Obviously, speed will vary somewhat with processor, clock speed, etc., but with anything reasonably modern, I'd still expect a time of less than a second. Of course, if you decide to write the primes out to a file, you can expect that to add some time, but even with that I'd expect a total time under a second--with my laptop's relatively slow hard drive, writing out the numbers only gets the total up to about 0.6 seconds.

Natatory answered 4/1, 2014 at 6:50 Comment(10)

Yes, my soe implementation using a bit sieve computes the 1270607 under 20 million in 0.052 seconds. – Permafrost 4/1, 2014 at 6:56

@user3114046: That certainly sounds a lot more respectable than "a couple seconds." – Natatory 4/1, 2014 at 6:59

I spent quite some time on this about a year ago trying to compute large values of pi(x) with my own version of a Meissel-Lehmer sieve. I only got up to something like pi(10^20). The bit sieve uses the hardware popcnt function to rapidly count up the primes, which is actually a lot of the overhead of sieving. I couldn't find my vanilla SOE implementation in C but running one in python gets to 1 million primes in 0.71 seconds. – Permafrost 4/1, 2014 at 7:9

Oh - you will gain a little speed if you start sifting at p^2 rather than 2*p for (int temp = i*i; temp<number; temp += i) – Permafrost 4/1, 2014 at 7:18

When I run that it takes way longer. My laptop is modern, so I do not know where the problem lies. @JerryCoffin – Balzer 4/1, 2014 at 7:29

@user3114046: Yes, I realize there are a lot of optimizations available--I just threw this together as a quick baseline of the minimum you should be able to expect from even the simplest, least-optimized implementation (short of putting real effort into intentionally slowing it down). – Natatory 4/1, 2014 at 7:36

Sure. user2829334 note than you need to sieve only up to sqrt(number), this is the simplest optimization. Note that if you do the i^2 optimization without that, the program will segfault. – Permafrost 4/1, 2014 at 7:51

@user2829334: At a guess, you may have compiled without optimization. If you're compiling at the command line, try adding -O2 and see if it doesn't work better. In an IDE, you'll typically switch to a "release" build. – Natatory 4/1, 2014 at 8:11

@user3114046 my baseline non-segmented odds-only C++ sieve of Eratosthenes takes 0.09 seconds for primes under 20 mln, on a slow-ish Ideone (which is typically 3 times slower than a normal modern box, I think). It counts the primes as it sieves them. -- It makes 1 mln primes in 0.06s. – Truant 4/1, 2014 at 11:26

I've now tested your code on Ideone. It clocks in at 0.24 seconds there to your 0.106, 2.26x slower. This code is really without any optimization, and in that lies its value here, just as you've said, as an observation point. :) (+1). – Truant 4/1, 2014 at 12:13

vector is a bitset. It is expensive to update bitset values that are not in cache. Try vector, it is much cheaper to write to.

Engrossing answered 4/1, 2014 at 21:1 Comment(0)

Recommended topics

Hot tags