Find position of prime number
Asked Answered
A

7

14

I need to do the reverse of finding the Nth prime, i.e. Given a prime number, I need to find its position in

2, 3, 5, 7...

The prime number can be large, in the order of 10^7. Also, there are a lot of them.

I have an index of pre-calculated primes which can be binary-searched, but I also have a space limit of 50k! Can sieve be done? Or any another fast way?

EDIT: Thanks a lot for all the brilliant answers, I wasn't expecting them! I hope they are useful to others looking for the same.

Arun answered 2/1, 2013 at 18:7 Comment(14)
First of all, have you figured out how many are there in the possible set of answers? There's a few ways to tackle this problem. Space/speed tradeoffs, compression, and math tricks (like twin primes).Expulsion
When I checked again, there are 664570+ primes that are less than 10^7. Binary search on precomputed prime table is not an option here.Revengeful
Whoa, nice task (+1), but unless you have something radically better idea, it's going to be slooooow.Omnirange
See my answers at #3919468 for ideas about storing a prime table and sieving quickly. You should be able to get <s>one</s> two primes per byte (see en.wikipedia.org/wiki/Prime_gap). Binary search is unnecessary, you can estimate where to start looking in the table based on P(n) ≈ n log n and linear search from there.Bovid
I think sieve'ing wont be very practical here.. you might look at Darrick Henry Lehmer's prime counting method: en.wikipedia.org/wiki/Prime-counting_function he was able to do 10^10 on an IBM 701 back in '59Priestley
@Bovid Ah, good ol' prime number theorem! I love these diamonds of maths.Omnirange
The most efficient boolean table (from sieving) would take at least 250000 bytes (1 bit per number, 0.2 "optimistic" reduction factor from wheel factorization).Revengeful
Woops, edit to above comment was incorrect, only one prime per byte. :PBovid
@Potatoswatter: 1 prime per byte will take 660kB, since there are 664570+ primes below 1e7 (assuming storing the whole table). A very simple wheel-factorization (6) can already reduce the space usage to ~400kB. (It is not optimal for looping through primes, but good for checking whether a number is prime).Revengeful
@Revengeful Yes, but the wheel factorization doesn't help you reverse-index into the table. (My strategy was designed to store all up to 2^32, where the bitset is less efficient.) Anyway, better solutions are already in the answers. More insights might be found on Mathematics.Bovid
What do you mean 'you have an index of pre-calculated primes' - is that on paper, on disk, or in memory?Sidelight
Use the Meissel/Lehmer way. O(sqrt(N)) storage, so with a limit of at most 10^7 you're fine. Sublinear time complexity, so it's pretty fast.Unsatisfactory
Thanks a lot everybody. @ColonelPanic By 50k I actually meant source file limit, it is a sub-problem of an online judge problem!Arun
It's a fun problem. Post a link if you have one.Sidelight
F
10

Your range only goes to ten million, which is small for this kind of thing. I have two suggestions:

1) Create a table of pi(n) at convenient intervals, then use a segmented Sieve of Eratosthenes to count primes between the two table entries that bracket the desired value. The size of the interval determines both the size of the required table and the speed at which results can be computed.

2) Use Legendre's phi(x,a) function and Lehmer's prime-counting formula to calculate the result directly. The phi function requires some storage, I'm not sure exactly how much.

Of the two, I would probably choose the first alternative given your problem size. Implementations of both the segmented Sieve of Eratosthenes and Lehmer's prime-counting function are available at my blog.

EDIT 1:

On reflection, I have a third alternative:

3) Use the logarithmic integral to estimate pi(n). It is monotone increasing, and always larger than pi(n) over the interval you require. But the differences are small, never more than about 200. So you could precompute the differences for all values less than ten million, make a table of the 200 change points, then when requested compute the logarithmic integral and look up the correction factor in the table. Or you could do something similar with Riemann's R function.

The third alternative takes the least amount of space, but I suspect the space required for the first alternative wouldn't be too large, and sieving is probably faster than calculating the logarithmic integral. so I'll stick with my original recommendation. There is an implementation of both the logarithmic integral and the Riemann R function at my blog.

EDIT 2:

That didn't work very well, as the comments indicate. Please ignore my third suggestion.

As penance for my error in suggesting a solution that doesn't work, I wrote the program that uses a table of values of pi(n) and a segmented Sieve of Eratosthenes to calculate values of pi(n) for n < 10000000. I'll use Python, rather than the C requested by the original poster, because Python is simpler and easier to read for pedagogical purposes.

We begin by calculating the sieving primes less than the square root of ten million; these primes will be used both in building the table of values of pi(n) and in performing the sieve that computes the final answer. The square root of ten million is 3162.3. We don't want to use 2 as a sieving prime -- we'll sieve only on odd numbers, and treat 2 as a special case -- but we do want the next prime larger than the square root, so that the list of sieving primes is never exhausted (which would cause an error). So we use this very simple version of the Sieve of Eratosthenes to compute the sieving primes:

def primes(n):
    b, p, ps = [True] * (n+1), 2, []
    for p in xrange(2, n+1):
        if b[p]:
            ps.append(p)
            for i in xrange(p, n+1, p):
                b[i] = False
    return ps

The Sieve of Eratosthenes works in two parts. First, make a list of the numbers less than the target number, starting from 2. Then, repeatedly run through the list, starting with the first uncrossed number, and cross out all multiples of the number from the list. Initially, 2 is the first uncrossed number, so cross out 4, 6, 8, 10, and so on. Then 3 is the next uncrossed number, so cross out 6, 9, 12, 15, and so on. Then 4 was crossed out as a multiple of 2, and the next uncrossed number is 5, so cross out 10, 15, 20, 25, and so on. Continue until all the uncrossed numbers have been considered; the numbers that remain uncrossed are prime. The loop on p considers each number in turn, and if it is uncrossed, the loop on i crosses out the multiples.

The primes function returns a list of 447 primes: 2, 3, 5, 7, 11, 13, ..., 3121, 3137, 3163. We strike 2 from the list and store 446 sieving primes in the global ps variable:

ps = primes(3163)[1:]

The primary function that we will need counts the primes on a range. It uses a sieve that we will store in a global array so that it can be reused instead of being reallocated at each invocation of the count function:

sieve = [True] * 500

The count function uses a segmented Sieve of Eratosthenes to count the primes on a range from lo to hi (lo and hi are both included in the range). The function has four for loops: the first clears the sieve, the last counts the primes, and the other two perform the sieving, in a manner similar to the simple sieve shown above:

def count(lo, hi):
    for i in xrange(500):
        sieve[i] = True
    for p in ps:
        if p*p > hi: break
        q = (lo + p + 1) / -2 % p
        if lo+q+q+1 < p*p: q += p
        for j in xrange(q, 500, p):
            sieve[j] = False
    k = 0
    for i in xrange((hi - lo) // 2):
        if sieve[i]: k += 1
    return k

The heart of the function is the loop for p in ps that performs the sieving, taking each sieving prime p in turn. The loop terminates when the square of the sieving prime is greater than the limit of the range, since all primes will be identified at that point (the reason we needed the next prime larger than the square root is so that there would be a sieving prime to stop the loop). The mysterious variable q is the offset into the sieve of the smallest multiple of p in the range lo to hi (note that it is not the smallest multiple of p in the range, but the index of the offset of the smallest multiple of p in the range, which can be confusing). The if statement increments q when it refers to a number that is a perfect square. Then the loop on j strikes the multiples of p from the sieve.

We use the count function in two ways. The first use builds a table of the values of pi(n) at multiples of 1000; the second use interpolates within the table. We store the table in a global variable piTable:

piTable = [0] * 10000

We choose the parameters 1000 and 10000 based on the original request to keep memory usage within fifty kilobytes. (Yes, I know the original poster relaxed that requirement. But we can honor it anyway.) Ten thousand 32-bit integers will take 40,000 bytes of storage, and sieving over a range of just 1000 from lo to hi will require only 500 bytes of storage and will be very fast. You might want to try other parameters to see how they affect the space and time usage of the program. Building the piTable is done by calling the count function ten thousand times:

for i in xrange(1, 10000):
    piTable[i] = piTable[i-1] + \
        count(1000 * (i-1), 1000 * i)

All of the computation up to this point can be done at compile time instead of run time. When I did these computations at ideone.com, they took about five seconds, but that time doesn't count, because it can be done once for all time when the programmer first writes the code. As a general rule, you should look for opportunities to move code from run time to compile time, to make your programs run very quickly.

The only thing left is to write the function that actually calculates the number of primes less than or equal to n:

def pi(n):
    if type(n) != int and type(n) != long:
        raise TypeError('must be integer')
    if n < 2: return 0
    if n == 2: return 1
    i = n // 1000
    return piTable[i] + count(1000 * i, n+1)

The first if statement does type checking. The second if statement returns a correct response to ridiculous input. The third if statement handles 2 specially; our sieving makes 1 a prime and 2 a composite, both of which are incorrect, so we make the fix here. Then i is calculated as the largest index of piTable less than the requested n, and the return statement adds the value from piTable to the count of primes between the table value and the requested value; the hi limit is n+1, because otherwise in the case that n is prime it wouldn't be counted. As an example, saying:

print pi(6543223)

will cause the number 447519 to be displayed on the terminal.

The pi function is very fast. At ideone.com, a thousand random calls to pi(n) were computed in about half a second, so about half a millisecond each; that includes the time to generate the prime number and sum the result, so the time to actually compute the pi function is even less than half a millisecond. That's a pretty good return on our investment in building the table.

If you're interested in programming with prime numbers, I've done quite a bit of work on my blog. Please come and visit.

Foucquet answered 2/1, 2013 at 18:32 Comment(7)
Re “make a table of the 200 change points” – even though the error is small, it isn't monotonic so can change many more than 200 times for n up to 10^7. Have you counted the number of changes?Ohaus
No, I haven't. Thanks for the correction. Regardless, the number of changes is small, so this would require only a little bit of space. I still think sieving is preferable, however.Foucquet
Between 3 and 1000, the value of floor(Li(x)) - pi(x) changes value 293 times and floor(Li(x) + .5) - pi(x) changes value 260 times. It's not going to work.Bauman
That's far more than I thought. I agree it won't work. Let's stick with the segmented sieve, then.Foucquet
I like #1, it reminds of the giant step-baby step algorithm for solving a different problem. Like that algorithm I'd expect you can get approximately sqrt(N) storage and work. However, there is one wrinkle: how do you find the two table entries that bracket the value? I would think you can use the prime number theorem to make a guess and then make corrections (perhaps just linear probing to find the bracketing entries).Siglos
@GregS: Say you have 10^3 entries in the table at intervals of 10^4. To find the number of primes less than 567890, sieve the interval from 560000 to 570000, count the primes from 560000 to 567890, and add the value of pi(560000) from the table.Foucquet
@GregS: You can do it with a simple division, as user448810 alluded to. (Look at my answer for code.)Bauman
B
4

If you know a priori that the input is prime, you may be able to use the approximation pi(n) ≈ n / log n with a small table of fixups for the primes where rounding the result isn't sufficient to get the correct value n. I think this is your best bet within the size constraint, aside from a slow brute-force approach.

Bosom answered 2/1, 2013 at 18:23 Comment(8)
The size of that table is pretty critical. The prime number theorem mainly works on large numbers; it may not apply very well here but only trying would tell.Bovid
It's been a while since I played with the numerics of the prime number theorem, but I think it's worth a try.Bosom
I think he's looking for an exact number, and there are known good methods for computing this (Derrick Henry Lehmer's method looked the best to me, Legendre also has one that looks like it might be doable in these constraints)Priestley
Obviously he wants the exact number. My point is that you may be able to use an approximation based on inverting the Prime Number Theorem estimate and tweaking the resulting function, then tacking on a table of fixups for the inputs it's wrong on.Bosom
This won't work. The distribution of primes is just too nasty. The prime number theorem tells you that pi(n) is asymptotic to n/log(n). Meaning the multiplicative difference between them tends to zero. You need something that's within an additive constant of pi(n) for the fixup approach to have a hope of working. (And n/log(n) isn't within an additive constant of pi(n).)Bauman
Obviously my approach is not useful for computing pi(n). However OP's problem is computing pi^-1(p), seemingly under the assumption that the input p is prime. It's still unclear to me whether my proposed approach can be made useful for this problem.Bosom
BTW, I believe the other proposals that have been made since mine are much better ideas.Bosom
The problem is that asymptotic estimates to pi are way too far away from pi to be useful for computing pi^(-1). If you round Li(x) or x/log(x) in any consistent way, it will "tick upward" at enough points that are completely wrong. Notice that the derivatives of Li(x) and x/log(x) are both monotone decreasing functions. That simply can't fly when you have lots of small gaps (think twin primes) and large gaps in the sequence of primes scattered all over. The asymptotic approximations really don't give you enough information.Bauman
P
4

I would suggest a heuristic hybrid model would work here. Store every nth prime, and then do a linear search via primality testing. To speed things up, you can use a fast and simple primality test (like the Fermat test with a==2) and precompute the false positives. Some fine-tuning based on the maximum size of the input and your storage constraints should be easy to work out.

Predial answered 2/1, 2013 at 18:32 Comment(6)
There will be around 60 primes between 2 numbers, if it is implemented as you said.Revengeful
And about 800 non-primes that pass the test that are less than 10^8. Computing 2^n (mod n) is O(log n); on a practical level this means testing ~1000 numbers for primality to get the count. This approach is more suited to one-shot testing rather than batch testing, but should still be fast.Predial
This is a good idea. You can make it better and avoid the errors and Carmichael numbers by using a deterministic variant of Miller-Rabin; checking witnesses 2, 7, and 61 works for any input less than a few billion.Bauman
It's a space vs. time tradeoff; the deterministic Fermat test is so fast (and simple to implement) that it's hard to beat, and the cost of storing the 2064 numbers is very small. (also, turns out that 2064 is the right number for less than 10^8; 850 are less than 10^7. If we test 2 & 3, then it drops to 492).Predial
Miller-Rabin is just as fast and you don't need a "backup" table lookup or trial division afterward. There's no reason to use a straight Fermat test when you can use Miller-Rabin instead.Bauman
My mistake -- I assumed that the extra work that the Miller-Rabin algorithm did would make it slower on average, but some experimentation shows that the early termination possibilities speed things up more than enough to compensate.Predial
B
2

Here's some code that works. You should replace the primality test based on trial division with a deterministic Miller-Rabin test that works for your input range. Sieving to find primes in the appropriate small range would work better than trial division, but it's a step in the wrong direction.

#include <stdio.h>
#include <bitset>
using namespace std;

short smallprimes[549]; // about 1100 bytes
char in[19531]; // almost 20k

// Replace me with Miller-Rabin using 2, 7, and 61.
int isprime(int j) {
 if (j<3) return j==2;
 for (int i = 0; i < 549; i++) {
  int p = smallprimes[i];
  if (p*p > j) break;
  if (!(j%p)) return 0;
 }
 return 1;
}

void init() {
 bitset<4000> siv;
 for (int i = 2; i < 64; i++) if (!siv[i])
  for (int j = i+i; j < 4000; j+=i) siv[j] = 1;
 int k = 0;
 for (int i = 3; i < 4000; i+=2) if (!siv[i]) {
  smallprimes[k++] = i;
 }

 for (int a0 = 0; a0 < 10000000; a0 += 512) {
  in[a0/512] = !a0;
  for (int j = a0+1; j < a0+512; j+=2)
   in[a0/512] += isprime(j);
 }
}

int whichprime(int k) {
 if (k==2) return 1;
 int a = k/512;
 int ans = 1 + !a;
 for (int i = 0; i < a; i++) ans += in[i];
 for (int i = a*512+1; i<k; i+=2) ans += isprime(i);
 return ans;
}

int main() {
 int k;
 init();
 while (1 == scanf("%i", &k)) printf("%i\n", whichprime(k));
}
Bauman answered 2/1, 2013 at 19:44 Comment(1)
I like this -- storing a table of the number of primes less than x at a regular interval is much more effective than my approach of storing the primes themselves, because lookup is trivial.Predial
O
1

The following sounds like what you are looking for. http://www.geekviewpoint.com/java/numbers/index_of_prime. There you will find the code and unit tests. Since your list is relatively small (i.e. 10^7), it should handle it.

Basically you find all the prime numbers between 2 and n and then count all the primes less than n to find the index. Also, in case n is not prime, the function returns -1.

Overcharge answered 2/1, 2013 at 20:13 Comment(0)
S
0

What you suggest is best. Pre-compute (or download) the list of primes less than 10^7, then binary search them.

There are only 664579 primes less than 10^7, so the list will consume ~2.7 MB of space. The binary search to solve each instance will be super speedy - only ~20 operations.

Sidelight answered 2/1, 2013 at 18:35 Comment(2)
but he says he has a space limit of 50kShellishellie
OP writes "By 50k I actually meant source file limit"Sidelight
V
0

I did exactly this once. Wrote a code, that given n, can quickly find the nth prime, up to n = 203542528, so about 2e8. Or, it can go backwards, for any number n, can tell how many primes are less than n.

A databased is employed. I store all of the primes up to a certain point (the sqrt of my upper limit.) In your case, this means you would store all of the primes up to sqrt(1e7). There are 446 of them, and you can store that list in compressed form, since the maximum difference up to that point is only 34. Beyond that point, store every kth prime (for some value of k.) Then a quick sieve is sufficient to generate all primes in a short interval.

So in MATLAB, to find the 1e7'th prime:

nthprime(1e7)
ans =
   179424673

Or, it can find the number of primes less than 1e7:

nthprime(1e7,1)
ans =
      664579

The point is, such a database is easy to build and search. If your database can be no more than 50k, there should be no problem.

Vile answered 3/1, 2013 at 0:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.