Sieve of Atkin - Explanation and Java example

You may (and probably do) know some of the basic ideas given here about prime numbers, composite numbers and sieves, but they may benefit other readers in understanding the nature of the algorithm. Some of this answer gets dangerously close to belonging on the math equivalent of StackOverflow, but I feel some of it is necessary to make the connection between what the algorithm does and how it does it.

The three modular arithemetic and quadratic pairs in the Wikipedia article on this sieve are derived from the three pairs in the Atkin and Bernstein paper that published the core of this sieve with theorems (and their proofs) and show they collectively form a prime number sieve. Any one alone will yield only prime numbers, but will not yield all prime numbers. It takes all three to yield all prime numbers.

Here is a java program that implements the Wikipedia algorithm. I make no claims about implementation efficiency, just that it is a working direct implementation in java of the Wikipedia article algorithm.

// SieveOfAtkin.java

import java.util.Arrays;

public class SieveOfAtkin {
private static int limit = 1000;
private static boolean[] sieve = new boolean[limit + 1];
private static int limitSqrt = (int)Math.sqrt((double)limit);

public static void main(String[] args) {
    // there may be more efficient data structure
    // arrangements than this (there are!) but
    // this is the algorithm in Wikipedia
    // initialize results array
    Arrays.fill(sieve, false);
    // the sieve works only for integers > 3, so 
    // set these trivially to their proper values
    sieve[0] = false;
    sieve[1] = false;
    sieve[2] = true;
    sieve[3] = true;

    // loop through all possible integer values for x and y
    // up to the square root of the max prime for the sieve
    // we don't need any larger values for x or y since the
    // max value for x or y will be the square root of n
    // in the quadratics
    // the theorem showed that the quadratics will produce all
    // primes that also satisfy their wheel factorizations, so
    // we can produce the value of n from the quadratic first
    // and then filter n through the wheel quadratic 
    // there may be more efficient ways to do this, but this
    // is the design in the Wikipedia article
    // loop through all integers for x and y for calculating
    // the quadratics
    for (int x = 1; x <= limitSqrt; x++) {
        for (int y = 1; y <= limitSqrt; y++) {
            // first quadratic using m = 12 and r in R1 = {r : 1, 5}
            int n = (4 * x * x) + (y * y);
            if (n <= limit && (n % 12 == 1 || n % 12 == 5)) {
                sieve[n] = !sieve[n];
            }
            // second quadratic using m = 12 and r in R2 = {r : 7}
            n = (3 * x * x) + (y * y);
            if (n <= limit && (n % 12 == 7)) {
                sieve[n] = !sieve[n];
            }
            // third quadratic using m = 12 and r in R3 = {r : 11}
            n = (3 * x * x) - (y * y);
            if (x > y && n <= limit && (n % 12 == 11)) {
                sieve[n] = !sieve[n];
            } // end if
            // note that R1 union R2 union R3 is the set R
            // R = {r : 1, 5, 7, 11}
            // which is all values 0 < r < 12 where r is 
            // a relative prime of 12
            // Thus all primes become candidates
        } // end for
    } // end for
    // remove all perfect squares since the quadratic
    // wheel factorization filter removes only some of them
    for (int n = 5; n <= limitSqrt; n++) {
        if (sieve[n]) {
            int x = n * n;
            for (int i = x; i <= limit; i += x) {
                sieve[i] = false;
            } // end for
        } // end if
    } // end for
    // put the results to the System.out device
    // in 10x10 blocks
    for (int i = 0, j = 0; i <= limit; i++) {
        if (sieve[i]) {
            System.out.printf("%,8d", i);
            if (++j % 10 == 0) {
                System.out.println();
            } // end if
            if (j % 100 == 0) {
                System.out.println();
            } // end if
        } // end if
    } // end for
} // end main
} // end class SieveOfAtkin

I have a copy of Atkin's (co-authored with Bernstein) original paper in which he describes the theorems from which the sieve is constructed. That paper is available here: http://www.ams.org/mcom/2004-73-246/S0025-5718-03-01501-1/S0025-5718-03-01501-1.pdf. It's dense reading for non-mathematicians and has all the conciseness typical of an American Mathematical Society paper.

What follows here is a deeper explanation of how the algorithm is derived from the description and the paper from Atkin and Bernstein.

Atkin and Bernstein (justifiably) assume their readers thoroughly understand prime number sieves, modular arithmetic and wheel factorization using modular arithmetic. Unfortunately, the Wikipedia article description and algorithm assume similar things, although to slightly lesser degree. Atkin and Bernstein make no claim that their three pairs of wheel factorizations and irreducible quadratics are the only ones that could be used and give passing examples of other pairs that could be used with no further comment on how. Hence, the three for which Atkin and Bernstein give theorems and proofs are the three used in algorithms based on their work. Atkin and Bernstein also make no claim that their three pairs are optimal ones. They are, apparently, convenient ones.

The funky math symbols really useful for this kind of discussion in a concise way are not available here. For the purposes of this answer, I will use

{ some enumerated set or property that defines one }

to represent a set

Nat0

to represent the set of natural numbers including zero, i.e., Nat0 = {0, 1, 2, ...},

Nat

to represent the set of natural numbers not including zero, i.e. , Nat = {1, 2, 3, ...} and the following construct for defining a set and and a symbol for an element of it:

{symbol for element of a set : criteria that defines the set in terms of the symbol}

#

to represent set cardinality, i.e. the number of elements in a set

^

to represent exponentiation, i.e. x squared written as x^2

The modular arithmetic used in the wheel factorizations in descriptions, theorems and algorithms appears in two equivalent forms:

n = (k * m) + r for k in Nat0 and r in R = {r : r in Nat0 and r < m}

n mod m = r where r in R = {r : r in Nat0 and r < m}

Here are the definitions given in the theorems in their paper along with some notes about the modular arithmetic forms:

n is always prime where #{(x, y) : n = (4 * x^2 )+ (y^2), n in {n : (Nat0 * 4) + 1}, where x and y >= 1 and n has no perfect square factor} is odd. That is, if and only if there are an odd number of (x, y) pairs that solve the quadratic n = (4 * x^2) + (y^2) where x and y integers >= 1, n mod 4 = 1 and n has no perfect square factors, then n is prime. Note here that the form n mod m = r where r is in R has m = 4 and R = {r : 1}.
n is always prime where #{(x, y) : n = (3 * x^2) + (y^2), n in {n : (Nat0 * 6) + 1}, where x and y >= 1 and n has no perfect square factor} is odd. That is, if and only if there are an odd number of (x, y) pairs that solve the quadratic n = (3 * x^2) + (y^2) where x and y integers >= 1, n mod 6 = 1 and n has no perfect square factors, then n is prime. Note here that the form n mod m = r where r is in set R has m = 6 and R = {r : 1}.
n is always prime where #{(x, y) : n = (3 * x^2) - (y^2), {n : (Nat0 * 12) + 11}, x > y >= 1 and n has no perfect square factor} is odd. That is, if and only if there are an odd number of (x, y) pairs that solve the quadratic n = (3 * x^2) - (y^2) where x , y integers where x > y >= 1, n mod 12 = 11 and n has no perfect square factors, then n is prime. Note here that the form n mod m = r where r is in set R has m = 12 and R = {r : 11}.

A property of wheel factorizations that the paper and the Wikipedia article assume the reader knows well is that modular arithmetic can be used to selectively choose only integers that don't have certain prime factors. In the form

n mod m = r, r in R = {r : Nat0, r < m},

if one chooses only elements of R that are relatively prime to m, then all integers n that satisfy the expression will either be a prime number or relatively prime to m.

m relatively prime to n means they have no common integer divisor > 1. Examples of relatively prime numbers are: 2 is relatively prime to 3, 4 is relatively prime to 9, 9 is relatively prime to 14. Understanding this is essential to understanding the remainders (residues) used in the modular arithmetic and how they are equivalent in the various versions of the explanations and algorithms.

The following will now explain how the theorems, algorithms and explanations all relate.

For the first quadratic, n = (4 * x^2) + (y^2):

The theorem uses the form:

n = (k * 4) + r where r in R1 = {r : 1} and k in Nat0

which is the same as writing

n mod 4 = r where r in R1 = {r : 1}

Note that it defines n as having to be every other odd number in Nat0 starting from 1, i.e. {1, 5, 9, 13, ...}.

For algorithms, various choices for m can be made and, with the right set R, the properties shown by the theorem preserved. The authors of the paper and Wikipedia article assume the readers already know all of this and will immediately recognize it. For the other values of m used in the paper and Wikipedia article, the equivalents are:

n mod 12 = r where r in R1a = {r : 1, 5, 9}

n mod 60 = r where r in R1b = {r : 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57}

Certain elements in the sets R1a and R1b can get removed for two reasons explained later and the theorem will still apply.

For the second quadratic, n = (3 * x^2) + (y^2):

The theorem uses the form:

n = (k * 6) + r where r in R2 = {r : 1} and k in Nat0

again this is the same as

n mod 6 = r where r in R2 = {r : 1}

Note here that this is every third odd number in Nat0 starting from 1, i.e. {1, 7, 13, 19, ...}

Equivalents in the paper and article are:

n mod 12 = r where r in R2a = {r : 1, 7}

n mod 60 = r where r in R2b = {r : 1, 7, 13, 19, 25, 31, 37, 43, 49, 55}

Again, values in a sets R2a and R2b can get removed for two reasons explained later and the theorem will still apply.

For the third quadratic, (3 * x^2) - (y^2):

The theorem uses the form:

n = k * 12 + r where k in Nat0 and r in R3a = {r : 11}

Again, this is the same as:

n mod 12 = r where r in R3a = {r : 11}

Note here that this is every sixth odd number in Nat0 starting from 11, i.e. {11, 23, 35, 47, ...}

Equivalents n the paper and article are:

n mod 60 = r where r in R3b = {r : 11, 23, 35, 47, 59}

Again, a value in set R3b can be removed for a reason explained later and the theorem will still apply.

In the various algorithms and explanations, for values of m = 12 and m = 60, elements of sets R get removed without affecting the validity of the theorem or the algorithms. There are two reasons that some values in sets R can get discarded.

The first reason is that any value of r in a set R that is not relatively prime to the m with which it is paired will serve only to incude values for n that are composite integers with one or more prime factors of m, none of which will be prime numbers. This feature of modular arithmetic is why wheel factorizations are used to filter out large quantities of non-prime numbers from further tests, usually more complex and less efficient ones, for whether they are prime. In this sieve, the more complex test is whether the number of solutions to a specific irreducible quadratic is an odd number. That means we can immediately discard all values in the set R for this algorithm that is not releatively prime to the value of m being used with that set R.

The second reason is that in the paper, the wheel factorizations create sets of integers that overlap, including overlapping primes. While they were convenient and the overlap didn't matter for the theorems, in an algorithm it is wasteful if it can be easily avoided. In this case, it is trivially avoided. Also, if the set of integers from the wheel factorizations overlap, then an odd number of solutions in one quadratic plus an odd number of solutions in another quadratic will become a cummulative even number of solutions (an odd number plus an odd number is ALWAYS an even number). In many implementations, including the Wikipedia implementation, this will identify a prime number as not being prime since implementations like the Wikipedia one detrmine primality from cummulative solutions for all the quadratics and don't segregate solutions from each quadratic. In those cases it is imperative that the integers from the wheel factorizations be exclusive subsets of integers.

An implementation does not want to test the same number in more than one quadratic if that is not necessary, and it isn't. A value for r in a set R used in the three quadratics, assuming the same m is being used, need be in only one of them. If it is in more than one, then the same value for n will show up more than once and get tested with more than one quadratic. For the same value of m in use, ensuring that the same element of R doesn't show up in the R for more than one quadratic will eliminate the overlap. In the case of the Wikipedia article, the overlap prevented by the wheel factorization prevents erroneous results that would occur with cummulative quadritic solutions that, in a single quadratic are odd, but in two quadratics, accumulate to an even number.

A different algorithm might avoid overlap before calculating the quadratics. In empirical tests of the quadratics, and wheel factorizations, the wheel factorizations where m = 12 yield significantly fewer values for n than than will solve the quadratics. Using wheel factorizations where m = 60 increases the difference significantly. If a quadratic solution algorithm for specific values of n were highly efficient, then there could be a significant improvement by using only values that come from the wheel factorizations for testing the quadratics.

Here are the wheel factorizations after removing elements that are not relatively prime. For the first quadratic:

n mod 12 = r where r in R1a = {1 : 1, 5} (9 has divisor 3 common with 12)

n mod 60 = r where r in R1b = { r : 1, 13, 17, 29, 37, 41, 49, 53} (5, 25 and 45 have divisor 5 common with 60; 9, 21, 33, 45 and 57 have divisor 3 common with 60) and this is the form in the algorithm in the Atkin and Bernstein paper.

For the second quadratic:

n mod 12 = r where r in R2a = {1, 7} (no element of R has a divisor common with 12}

n mod 60 = r where r in R2b = {r : 1, 7, 13, 19, 31, 37, 43, 49} (25 and 55 have divisor 5 common with 60) and this is the form in the algorithm in the Atkin and Bernstein paper.

For the third quadratic:

n mod 12 = r where r in R3a = {r : 11} ( no element of R has a divisor common with 12}

n mod 60 = 4 where r in R3b = {r : 11, 23, 47, 59} (35 has divisor 5 common with 60) and this is the form in the algorithm in the Atkin and Bernstein paper.

Notice that some of the same elements show up in the sets R1a and R2a for the first and second quadratics. The same is true for sets R1b and R2b. When m is 12, the set of common elements is {1}; when m is 60 the set of common elements is {1, 13, 37, 49}. Ensuring that an element of R gets included for only one quadratic creates the following forms that you should now recognize from the Wikipedia article.

For the first quadratic:

n mod 12 = r where r in R1a = {r : 1, 5} (no duplicates removed) (this is the form shown in the Wikipedia algorithm)

n mod 60 = r where r in R1b = {r : 1, 13, 17, 29, 37, 41, 49, 53} (no duplicates removed) (this is the form shown in the Wikipedia explanation)

For the second quadratic:

n mod 12 = r where r in R2a = {r : 7} (element 1 removed since it is already in R1a) (this is the form shown in the Wikipedia algorithm)

n mod 60 = r where r in R2b = {r : 7, 19, 31, 43} (elements 1, 13, 37 and 49 removed since they are already in R1b) (this is the form shown in the Wikipedia explanation)

For the third quadratic:

n mod 12 = r where r in R3a = {r: 11} (no duplicates)

n mod 60 = r where r in R3b = {r: 11, 23, 47, 59} (no duplicates)

One remaining question might be made as to why values of m range over 4, 6, 12 and 60. This has to do with how many composite (i.e. non-prime) numbers one wants to exclude from more complex testing for being prime using the quadratics versus the complexity of the wheel factorization used.

The value for m used can determine which composites can get immediately eliminated without eliminating primes. If m = 4 and R1 = {r : 1}, as in the theorem for the first quadratic, all numbers with prime factors of 2 get eliminated because 1 is relatively prime to all numbers and 4 has prime factors of 2. It is important to note that because 3 is not in this set R, a wheel factorization using m = 4 and set R1 will also exclude a large number of primes, perhaps half of them.

If m = 6 and R2 = {r : 1} as in the theorem for the second quadratic, all composite numbers with prime factors of 2 or 3 get eliminated because 1 is relatively prime to all numbers and 6 has prime factors of 2 and 3. Again, with m = 6 and set R2, which doesn't contain 5, a large number of primes, perhaps half of them, will get excluded.

If m = 12 and R3 = {r : 11} as in the theorem for the third quadratic, all composite mumbers with prime factors of 2 or 3 get eliminated because 11 is relatively prime to 12 and 12 has prime factors of 2 and 3. Again, with m = 12 and set R3, which doesn't contain 1, 5 or 7, a large number of primes, perhaps well more than half of them, will get excluded.

One of the things that Atkin and Bernstein informally show in their paper is that, although the wheel factors in the theorems individually exclude primes from their respective quadratics, collectively, the three wheel factorizations permit all primes and, in the case of the wheel factorizations in the first and second quadratics, permit significant overlap. Although they don't remove the overlap in their algorithms where m = 60, the Wikipedia article does where they set m = 12 in the article algorithm and m = 60 in the article explanation.

For the quadratics Atkin and Bernstein used in their theorems, weakening the wheel factorizations that go with them will invalidate that they will operate according to the theorems. However, we can strengthen them in ways that remove only more composites but still keeping the exact same primes. For the forms where m = 4, (4 = 2 * 2) every even integer gets filtered. For the forms where m = 12 (12 = 2 * 2 * 3), every integer with prime factors of 2 or 3 gets filtered. For the forms where m = 60 (60 = 2 * 2 * 3 * 5), every integer with prime factors of 2, 3 or 5 gets filtered. We could use potentially use filters with m = 6 for the same effect as m = 12 and m = 30 for the same effect as m = 60, but we would have to exercise care that what we create is equivalent to the ones used in the theorems.

Here are some useful statistics about wheel factorization. 50% of the integers in Nat0 are even and, other than 2, not prime. 33% of the integers in Nat0 have 3 as a prime factor and are not prime. 20% of the integers in Nat0 have 5 as a prime factor and are not prime. Collectively, 67 % of the integers in Nat0 have prime factors of 2 or 3 and are not prime. Collectively, about 75% of the integers in Nat0 have prime factors of 2, 3 or 5 and are not prime. A simple method for eliminating 1/2, 2/3 or 3/4 of the integers in Nat0 from more complex testing for being prime is very enticing and is the motive for using wheel factorization as a preliminary filter in prime number sieves. It is also a motivation for using values of m with an accompanying set R that can filter all composites wtih prime factors representing large quantities of composites.

As an absolute minimum, one would want to remove composites with a prime factor of 2 (i.e. all even numbers) and then add 2 back at the end. One would at least like to remove composites with prime factors of 2 or 3. Preferably, one would like to remove composites with prime factors of 2, 3 or 5. Past that, the statistics show diminishing returns. m = 4 with R1 accomplishes the bare minimum. m = 12 with R1a, R2a and R3a accomplishes the least one would like. m = 60 with R1b, R2b and R3b accomplish a very desirable result.

There are some more things to consider in working with values for m and sets R. Take note that the two forms are NOT equivalent for the first quadratic:

n mod 12 = r where r in R1a = {r : 1, 5}

and

n mod 6 = r where r in R = {r : 1, 5}

because the form where m = 6 is not equivalent to

n mod 4 = r where r in R1 = {r : 1}

Note that:

n mod 6 = r where r in R = {r : 1}

is equivalent to

n mod 12 = r where r in R = {r : 1, 7}

and the form where m = 6 could be used with the second quadratic. It is, in fact, the form used with the second quadratic in the theorem. If we were to use it instead of the examples already considered, we could remove the element 1 from the set R for first quadratic when its m = 12 to remove overlap.

In adjusting the algorithm, due diligence must be used to maintain the conditions that the theorems require. When choosing values for m and sets for R, one must ensure that the form is an equivalent one that does not introduce any new values for n to the quadratic that were not produced by the wheel factorization in the theorem.

For implementations, choices get made based on complexities and efficiencies involving the data structures, arithmetic, processor capabilities (especially with regard to multiplication and division), available processor cache, memory and the size of data structures. There are tradeoffs betweeen the values for m and the number of remainders (residues) in sets R that have to be checked. Some of these may be reasons for seeing m = 60 in the explanation and m = 12 in the algorithm. They are defninitely the reasons Atkin and Bernstein used forms with m = 60 in the algorithms in their paper.

In the Atkin and Bernstein paper, they also provide algorithms for finding solutions to the quadratics for specific values of n using lattices. These additional algorithms allowed Atkin and Bernstein to create sieve algorithms that filtered simultaneously on the quadratics and the wheel factorizations. None of the algorithms for lattice derived solutions to the quadratics with the wheel factorizations were considered in the Wikipedia article's algorithm. In the Wikipedia article, an exhaustive x, y value technique is used with the quadratics and the wheel factorizations are applied after the quadratics return values for n. Again, this is an efficiency issue that has to be decided by implementers.

Recommended topics

Hot tags