Efficient storage of prime numbers
For a library, I need to store the first prime numbers up to a limit L. This collection must have O(1) lookup time (to check whether a number is prime or not), and it must be easy, given a number, to find the next prime number (assuming it is smaller than L).

Given that L is fixed, a sieve of Eratosthenes is fine for generating the list. Right now, I use a packed boolean array to store the list, which contains only entries for odd numbers between 3 and L (inclusive). This takes (L-2)/2 bits of memory. I would like to be able to statically increase L without using more memory.

Is there a data structure using less memory with similar properties? Or at least with constant lookup time? (Odd numbers can then be enumerated until we reach a prime.)

(the language I wrote this in is Factor but this question would be the same in any language which has built-in or easily programmable packed bit arrays)

Ernestinaernestine answered 23/6, 2009 at 12:59 Comment(5)
What's a typical 'L'? Is this for an embedded device where memory is tight? It might affect recommendations. Given that there are 50,847,534 primes under a billion, you might spend more time packing/unpacking than a straightforward array of 4 byte integers.Britanybritches
And I wouldn't want to need more than the ~320kB of memory I have today.Ernestinaernestine
So you want to store information in the order of 5,000,000 integer numbers into 320,000 bytes...Marin
DanielDaranas: this is what is being done today. Given that only odd numbers' primality is stored (and each takes one bit of information, true or false), I store information about 2,500,000 odd numbers in less than 313,000 bytes. Why does it surprise you?Ernestinaernestine
@Samuel: It surprises me because that memory constraint is so demanding that it will preclude virtually any sensible effort to optimize the order of the algorithms involved. It's not impossible to work with that memory limit, given the properties of primes; but it's far, far from optimal.Marin
Score: 25

You can explicitly check more prime numbers to remove redundancy.

At the moment you do this only for two, by checking for divisibility by two explicitly and then storing, for odd numbers only, whether they are prime.

For 2 and 3 you get remainders 0 to 5, of which only 1 and 5 are not divisible by two or three and can lead to a prime number, so you are down to 1/3.

For 2, 3, and 5 you get 8 numbers out of 30, which is nice to store in a byte.

This is explained in more detail here.
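As a rough illustration of the mod-30 packing (a sketch in Python rather than Factor; the function names, the `WHEEL` table and the byte layout are my own choices, not from the answer):

```python
# Sketch of a mod-30 wheel bitmap: one byte covers 30 integers, one bit per
# residue coprime to 30.  Names are my own, not from the answer.
WHEEL = (1, 7, 11, 13, 17, 19, 23, 29)      # residues mod 30 that can be prime
BIT = {r: i for i, r in enumerate(WHEEL)}   # residue -> bit position

def sieve_mod30(limit):
    """Byte n // 30, bit BIT[n % 30] is set iff n is prime (for n coprime to 30)."""
    bits = bytearray(b"\xff" * (limit // 30 + 1))
    bits[0] &= 0xFE                         # clear the bit for 1 (not prime)
    for n in range(7, int(limit ** 0.5) + 1):
        if n % 30 in BIT and bits[n // 30] >> BIT[n % 30] & 1:
            for m in range(n * n, limit + 1, 2 * n):   # odd multiples of n
                if m % 30 in BIT:
                    bits[m // 30] &= ~(1 << BIT[m % 30])
    return bits

def is_prime(n, bits):
    if n in (2, 3, 5):                      # wheel primes, handled separately
        return True
    r = n % 30
    return r in BIT and bool(bits[n // 30] >> BIT[r] & 1)
```

Each byte covers a block of 30 integers, so the table for a limit L takes about L/30 bytes - 8/30 of a bit per number, versus 1/2 bit per number for the odds-only bitmap.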

Deceit answered 23/6, 2009 at 13:21 Comment(2)
Indeed, filtering some more was one of the ideas I had. But I had not realized that modulo 30 gave a packing that was so efficient. I will give it a try!Ernestinaernestine
aka wheel factorisation primes.utm.edu/glossary/page.php?sort=WheelFactorization if you don't want to read as long and metaphorical description.Hovis
Score: 9

An alternative to packed bitmaps and wheels - but equally efficient in certain contexts - is storing the differences between consecutive primes. If you leave out the number 2 as usual, then all differences are even. Storing difference/2, you can reach past 2^40 (just before 1999066711391) using byte-sized cells.

The primes up to 2^32 require only 194 MByte, compared to 256 MByte for an odds-only packed bitmap. Iterating over delta-stored primes is much faster than for wheeled storage, which includes the modulo-2 wheel known as the odds-only bitmap.

For ranges from 1999066711391 onwards, a bigger cell size or variable-length storage is needed. The latter can be extremely efficient even with very simple schemes (e.g. keep adding until a byte < 255 has been added, as in LZ4-style compression), because of the extremely low frequency of gaps of 510 or more (half-gap >= 255).

For efficiency's sake it is best to divide the range into sections (pages) and manage them B-Tree style.

Entropy-coding the differences (Huffman or arithmetic coding) cuts permanent storage requirements to a bit less than half, which is close to the theoretical optimum and better than lists or wheels compressed using the best available packers.

If the data is stored uncompressed then it is still much more compact than files of binary or textual numbers, by an order of magnitude or more. With a B-Tree style index in place it is easy to simply map sections into memory as needed and iterate over them at blazing speed.

Boccioni answered 10/11, 2014 at 17:38 Comment(1)
This does not have a O(1) lookup time.Ernestinaernestine
Score: 4

At the moment you are treating 2 as a special case and then having an array where every odd number is mapped to an element in the array (with some odd numbers being prime). You could improve on this by treating 2 and 3 as special cases, recognising that the rest of the primes are of the form 6n+1 or 6n-1 (that is, for all primes p > 3, p mod 6 = 1 or 5). This can be further generalised - see Wikipedia. For all primes p > 5, p mod 30 = 1, 7, 11, 13, 17, 19, 23 or 29. You could keep going with this and reduce the memory needed at the expense of processing time (although it will still be O(1), just a slower O(1)).
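The admissible residues for any such wheel can be computed directly from the small primes; a short Python sketch (`wheel_residues` is a name of my own):

```python
from math import prod  # Python 3.8+

def wheel_residues(small_primes):
    """Return the wheel modulus (the primorial) and the residues mod it
    that are coprime to every given small prime."""
    m = prod(small_primes)
    return m, [r for r in range(m) if all(r % p for p in small_primes)]

# wheel_residues([2, 3])    -> (6,  [1, 5])                        : keep 2 of every 6
# wheel_residues([2, 3, 5]) -> (30, [1, 7, 11, 13, 17, 19, 23, 29]): keep 8 of every 30
```

Each extra prime shrinks the kept fraction, but with diminishing returns: adding 7 keeps 48 residues out of 210, only slightly better than 8 out of 30.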

Intervocalic answered 23/6, 2009 at 13:28 Comment(0)
Score: 3

Maybe a trie data structure that contains only the primes is what you're looking for. Instead of using characters as indexes, you could use the integer digits. One implementation of this is Judy arrays.

Although they do not meet your O(1) requirement, they are extremely memory-efficient for similar keys (as most parts of numbers are) and pretty fast to look up, at most O(m) (m = key length).

If you look up a prime in the pre-generated trie, you can walk the trie until you find it, or until you reach the node lying between the preceding and the following prime.
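A toy digit-keyed trie in Python, with plain dicts standing in for the far more compact Judy arrays (names and the "$" end marker are my own):

```python
# Digit trie over plain dicts; each level indexes one decimal digit, and "$"
# marks that the digits walked so far form a stored number.
def trie_insert(root, n):
    node = root
    for d in str(n):
        node = node.setdefault(d, {})
    node["$"] = True

def trie_contains(root, n):
    node = root
    for d in str(n):
        if d not in node:
            return False
        node = node[d]
    return "$" in node
```

Lookup cost is the number of digits of n, i.e. O(log n), and shared prefixes (which dense prime lists have plenty of) are stored only once.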

Shoulders answered 23/6, 2009 at 13:27 Comment(0)
Score: 0

Given that memory is so cheap, I don't think you can do much better from a speed perspective than your existing scheme.

If there's a better solution, then I'd assume it'd take advantage of the Prime Number Theorem that shows that as L gets bigger, the limit of

π(L) / (L / ln(L)) approaches 1.

Perhaps a better solution would have an adaptive packing solution in a data structure sort of like a skip list.
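The Prime Number Theorem ratio mentioned above can be checked numerically with a throwaway sieve (a sketch; `prime_pi` is my own helper name):

```python
from math import log

def prime_pi(limit):
    """pi(limit): count of primes <= limit, via a plain byte-per-number sieve."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            # zero out every multiple of n from n*n upward
            sieve[n * n :: n] = bytes(len(range(n * n, limit + 1, n)))
    return sum(sieve)

# The ratio pi(L) / (L / ln L) creeps down toward 1 as L grows:
for L in (10 ** 4, 10 ** 5, 10 ** 6):
    print(L, prime_pi(L) / (L / log(L)))
```

The convergence is slow, which is why any packing scheme keyed to prime density has to be conservative about how much it assumes.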

Britanybritches answered 23/6, 2009 at 13:22 Comment(0)
Score: 0

How about some kind of hash table?

You would need a very good hash function (something like n mod p, where p is not a multiple of any of the q lowest primes - choose q sufficiently high in order to minimise the number of collisions).
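A toy version of this idea - a fixed-size table with h(n) = n mod m and linear probing. The prime modulus 1021 is an arbitrary choice of mine for illustration; note that an explicit table of full-width entries uses far more memory than the asker's packed bitmap:

```python
# Toy fixed-size hash set with h(n) = n % m and linear probing.
# Assumes the table is never allowed to fill up completely.
M = 1021  # arbitrary prime table size, chosen for illustration

def insert(table, n):
    i = n % len(table)
    while table[i] is not None:     # probe linearly past occupied slots
        i = (i + 1) % len(table)
    table[i] = n

def contains(table, n):
    i = n % len(table)
    while table[i] is not None:
        if table[i] == n:
            return True
        i = (i + 1) % len(table)
    return False                    # hit an empty slot: n was never inserted
```

This gives expected O(1) membership, but unlike the bitmap it cannot answer "next prime after n" without extra structure.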

Celsacelsius answered 23/6, 2009 at 13:37 Comment(0)
Score: 0

How about an Interval Tree? http://www.geeksforgeeks.org/interval-tree/

It may not be O(1), but it's really fast - maybe O(log(p(n))), where p(n) is the number of primes up to n. This way the memory you need will be proportional to the number of primes only, greatly cutting the memory cost.

For example, suppose you find a prime at p1 and the next one at p2. Insert the interval (p1, p2), and so on. When you run a search for any number in that range, it will return this interval, and you can return p2, which would be the answer in your case.
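The lookup this answer describes can also be had from a plain sorted array of primes and binary search, which gives the same O(log p(n)) bound without the tree's pointer overhead (a sketch; `next_prime` is my own name):

```python
from bisect import bisect_right

def next_prime(n, primes):
    """Smallest stored prime strictly greater than n, via binary search
    over a sorted list - O(log pi(L)) per query."""
    i = bisect_right(primes, n)
    return primes[i] if i < len(primes) else None
```

Since querying the interval (p1, p2) always returns p2, the intervals carry no information beyond the sorted endpoints themselves, so the flat array is strictly cheaper.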

Bloodstain answered 1/12, 2016 at 12:10 Comment(5)
"Insert interval (p1,p2)" you still have the problem of storing those huge numbers p1 and p2Mcleroy
Okay, I missed the comment about the limit on L. But even so, there are roughly 325,000 primes under 5 million; your suggestion would thus need at least 2 (endpoints per interval) * 325,000 (intervals) * 32 bits (int datatype) = 20,800,000 bits = 2,600 kB, which is already about eight times the number of bytes he can afford.Mcleroy
@KavehHadjari Nope, you don't have to use 4 bytes per entry. You could try using some compact representation which would use maybe 2 bytes and 5 bits or so, which could bring it down quite a lot, but again wouldn't make his cuts.Bloodstain
@KavehHadjari But yeah, I guess you're right; you'd also require extra memory for the pointers in the interval tree... I'll let the answer remain, though, for someone with slightly less strict requirements, as my answer would work very well if L were a few billion.Bloodstain
You could perhaps make it work with a modification of your answer: if you store the delta to the parent interval at each node, then you need fewer bits further down the tree (smaller deltas are required there), decreasing the space requirement. If you could somehow do that in conjunction with a tree represented as a fixed-size array, you might have a shot at it.Mcleroy
Score: -2

If you can figure out which ones are Mersenne or other easily represented prime numbers, you might be able to save a few bits by using that representation with a flag for applicable numbers.

Also, how about storing the numbers as the difference from the previous number? Then the size shouldn't rise quite as fast (but lookup would be slow). Combining with the approach above, you could store Mersenne primes and the difference from the last Mersenne prime.

Pitchblack answered 23/6, 2009 at 13:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.