Better algorithm - Next semiprime

Asked 26/2, 2017 at 19:11 Answered 10/3, 2017 at 22:33

algorithm optimization primes sieve-of-eratosthenes primality-test

Given n, find m such that m is the smallest semiprime greater than n.

Next prime is pretty straightforward, semiprime is less so. To be clear, only the semiprime is needed, though getting the factors at the same time would be convenient.

I've thought of a few approaches but I'm sure there are better ones.

Arithmetic operations are assumed to be O(1). I used Sieve of Eratosthenes, which is O(n log log n), I am aware of the Sieve of Atkin but I like my semi-optimized Eratosthenes.

1. Counting up from n

Count up from n, stop when you find a semiprime.

This seems really dumb but if there's an O(log n) semiprime test or O(1) test given primes below, this may be faster than the other 2.

Semiprime distribution appears to be much higher than prime distribution, so with a good semiprime test this might actually be better than O(n).

2. Primes counting down

Define prev(x) and next(x) and giving the previous and next primes respectively, which can be O(log n) if the primes are stored in a tree or with list binary search.

First do sieve.

Start with p=prev(sqrt(n)) and q=next(n/p). While pq<=n, go to the next q. If pq is less than the minimum so far, record it as the new minimum. Move on to the previous p until you run out of p to test.

This is guaranteed to find the correct answer but it's rather slow. Still O(n log n) though, so maybe acceptable.

3. Primes counting up

Start with sieve as usual. Create a hash set view of the sieve for O(1) primality tests.

Begin with p=2. Iterate through the primes up to sqrt(n). For each p, get q=(((n/p+1)/2)*2)+1=(((n/p+1)>>1)<<1)|1. While pq is less than the minimum so far and q is not prime, add 2 to q. If pq is still less than the minimum, record it as the new minimum.

I implemented #1 and #3 in Java, both using the same Sieve of Eratosthenes implementation. Most of the running time is spent sieving, so if there's optimizations to be done it's in the sieve. After some optimizations, counting up (#1) beat primes counting up (#3), being twice as fast in the last and largest test (11 decimal digit n).

There is still hope though, because how far the sieve needs to be extended depends on the largest number to prime test. If a semiprime test with a lower prime testing bound exists, the counting method may prove to be even faster.

Surely there is a better algorithm? Or at least a better way to implement this one?

Dufour answered 26/2, 2017 at 19:11 Comment(5)

Not sure whether this is helpful, but it might spark something: do a Sieve of Eratosthenes, but instead of crossing out any multiple of a prime, record the prime you're currently sieving in a factorization list for that number. Entries with 0 elements in their "factorization list" are prime; 1 or 2 elements are semiprimes unless the multiplicity of one of the elements is too high (only need one division to check this); 3 and above are definitely not semiprimes (hence you never need to store more than three factors in the list). – Bordello 26/2, 2017 at 19:43

@rici It's simpler, but it's not as obvious to me how to check semiprimeness afterwards. e.g. p1^3*p2^3 has only two prime factors (which is what we are "counting" with the list), but checking whether the factors each have multiplicity 1 amounts to factoring the number -- a problem that's generally accepted to be hard. But if you store the primes contained in the factorization it's much easier. – Bordello 26/2, 2017 at 19:54

How big can n be? – Maurya 26/2, 2017 at 21:38

The sieve of Eratosthenes can be optimized quite a lot. It might help to post your implementation, the execution times you're getting and the execution times you desire, if you have identified that as your bottleneck. – Maurya 26/2, 2017 at 21:44

@DanielWagner: OK; I wrote out what I'd previously put in a comment. It really isn't very complicated, but I don't think my verbal description was adequate. See answer. – Anabiosis 26/2, 2017 at 22:42

People are answering slightly different questions, so let me break this into sections.

is_semiprime(n)

Given a value n, is it a semi-prime. For tiny inputs, of course we could precalculate and return answers in O(1) or search. At some point we get overwhelmed by storage requirements. To my knowledge there is no very efficient solution to this problem. It is similar to primality or square-free testing in that we can quickly weed out most random inputs with simple divisibility testing. Assume we have a fast primality test that includes some pre-tests and most of the work is just looking for a small factor then returning whether the remainder is prime. For numbers without small factors, we can either do some factoring (e.g. Brent/Pollard Rho) or trial division up to n^(1/3).

On my Macbook, this takes about 0.4 microseconds per number for the range 1e8 to 1e7+1e7, and under 2 microseconds per number for the range 1e16 to 1e16+1e7.

For large semi-primes or near-semiprimes I'm not sure there is a better solution than finding a single factor. We require trial division to only N^(1/3) but there are more efficient standard factoring algorithms. Some implementations include Charles Greathouse, mine, and many at RosettaCode.

next_semiprime(n)

At 1e16, the average distance to the next semi-prime is under 10 and rarely above 100. As before, if you want to do pre-calculations, use memory, and can ignore or amortize setup time, this can be answered quickly. But again once past small inputs this gets very cumbersome.

I don't think you can significantly improve on just a simple while (1) { n++; if (is_semiprime(n)) return n; } assuming a good is_semiprime routine. Doing a full sieve comes out much slower for me, but your mileage may vary. It really won't perform once you cross ~25 digit inputs. You can optimize slightly by doing a partial sieve with prime powers incrementing a factor count, which means we only need to run the full test on results that aren't obviously semi-prime. There isn't much time savings for me, which makes sense since we're only cutting out a few native modulos. If we're looking at 1000-digit inputs then I think the partial sieve makes a lot of sense.

On my Macbook, next_semiprime using the trivial is_semiprime method called successively 1e6 times starting at 1e8 takes about 2 microseconds per call, and 17 microseconds per call when starting at 1e16.

semiprimes(low, high)

Some of the answers seem to be thinking of this question. Especially when low is <= 4, then a sieve is the right answer. There are fast sieve methods for totients and moebius ranges, I expect you could adapt one to full factor counts.

Note: A well-written SoE is faster than SoA, so don't get distracted by people recommending the Sieve of Atkin, as they've probably just read the first few paragraphs of the Wikipedia page. Of course details of implementations of the sieve, primality test, and pre-tests will make a difference in the conclusions. As will the expected input sizes, patterns, and toleration for caching data.

Trollop answered 10/3, 2017 at 22:33 Comment(0)

You could precalculate all semi primes and then use binary search. There are 80k primes under million so 3 billion semiprimes under 10^12. It might not take much time.

Eraser answered 26/2, 2017 at 19:33 Comment(4)

There's more semiprimes than that, because one of the prime factors can be more than 10^6. – Honghonied 26/2, 2017 at 19:41

I'm not OP so they aren't my constraints. Under 10^12 there are 131 billion semiprimes according to oeis.org/A066265. – Honghonied 26/2, 2017 at 19:48

There are 131126017178 semiprimes < 10^12. See OEIS A066265 – Anabiosis 26/2, 2017 at 19:51

Precalculating all semiprimes would take a really long time (and tons of memory), and would require a ridiculous amount of queries to be better. Even for 1000 queries and n<10^12, the method I proposed will be much faster, since you only sieve once and the sieve is by far the slowest part of the process. Cache/memoize if there are more queries for speed, realistically the query count won't get that high. – Dufour 26/2, 2017 at 21:35

Here is some code based on my comment above: we'll run a Sieve of Eratosthenes, but store some extra data than just "prime or not" on the side while we do it. It's in Haskell, which I recognize isn't the most common language, so I'll comment inline about what each bit does. First some library imports:

import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.Maybe

We'll define a new type, Primality, which we'll use for storing up to two prime factors of each number.

data Primality
    = Prime
    | OneFactor Integer
    | TwoFactor Integer Integer
    | ManyFactor
    deriving (Eq, Ord, Read, Show)

This says there are four kinds of values of type Primality: either it is the value Prime, a value of the form OneFactor n for some unbounded integer n, a value of the form TwoFactor n n' for two unbounded integers n and n', or the value ManyFactor. So this is a bit like a list of Integers that's at most two integers long (or else a note saying that it was three integers long or longer). We can add factors to such a list like this:

addFactor :: Integer -> Primality -> Primality
addFactor f Prime = OneFactor f
addFactor f (OneFactor f') = TwoFactor f f'
addFactor _ _ = ManyFactor

Given the list of prime factors of a number, it's easy to check whether it's semiprime: it must have at most two smaller prime factors whose product is the number itself.

isSemiprime :: Integer -> Primality -> Bool
isSemiprime n (OneFactor f   ) = f * f  == n
isSemiprime n (TwoFactor f f') = f * f' == n
isSemiprime _ _ = False

Now we'll write the Sieve. By Bertrand's postulate, for any n, there is a prime between n/2 and n; which means there is a semiprime between n and 2n (namely twice whatever the prime the postulate gives us is). What's more, any such semiprime cannot have a factor bigger than n (since then the other factor would have to be smaller than 2!). So we'll sieve the numbers up to 2n by factors up to n, then check the numbers between n and 2n for semiprimes. Because this latter check is O(1), we fall in the first case you proposed. So:

nextSemiprime :: Integer -> Integer
nextSemiprime n = runST $ do

Make an array with indices between 2 and 2n, initialized to Prime at each position.

    arr <- newSTArray (2,2*n) Prime

For each potential prime p between 2 and n...

    forM_ [2..n] $ \p -> do

...that we currently believe is Prime...

        primality <- readArray arr p
        when (primality == Prime) $

...add p to the factor list for each multiple of p.

            forM_ [2*p,3*p..2*n] $ \i ->
                modifyArray arr i (addFactor p)

Now do a linear search on the remaining numbers between n+1 and 2n for a semiprime. Each call to isSemiprime costs one multiplication, so they're O(1). Technically the search can fail; the fromJust <$> annotation tells the compiler that we promise it will not fail (because we've done some offline math proofs that are too complicated to bother transmitting to the compiler).

    fromJust <$> findM (\i -> isSemiprime i <$> readArray arr i) [n+1..2*n]

That's the entire body of nextSemiprime. It uses a few helper functions that really ought to be in the standard libraries somewhere. The first is the linear search algorithm; it just walks down a list looking for an element that satisfies a predicate.

findM :: Monad m => (a -> m Bool) -> [a] -> m (Maybe a)
findM f [] = return Nothing
findM f (x:xs) = do
    done <- f x
    if done then return (Just x) else findM f xs

The modifyArray function just reads an array element and writes back a modified version. Think arr[ix] = f(arr[ix]); in C.

modifyArray :: (MArray a e m, Ix i) => a i e -> i -> (e -> e) -> m ()
modifyArray arr ix f = readArray arr ix >>= writeArray arr ix . f

And newSTArray is needed because of a vagary of Haskell's treatment of arrays: all array operations are polymorphic over the kind of array you're using, which is simultaneously handy and annoying. This tells the compiler which kind of array we want for this program.

newSTArray :: Ix i => (i,i) -> e -> ST s (STArray s i e)
newSTArray = newArray

You can try it out here, which includes a simple main for printing out the first 100 semiprimes. (Though this latter task can be done in much, much more efficient ways if that's the goal!)

Although the current algorithm just returns the next semiprime, it's easy to modify it to return the factorization of the next semiprime: just return the associated Primality value rather than the Integer itself.

Bordello answered 26/2, 2017 at 21:58 Comment(1)

And yes, I know multiplication is not technically O(1). – Bordello 26/2, 2017 at 22:5

Following up on a comment (now deleted) made to a suggestion from @DanielWagner, here is a non-optimized semiprime sieve which uses two bits per entry in order to keep a count of factors.

The sieve entry contains the number of factors discovered, limited to 3. A marking pass does a saturating increment of the relevant sieve entries.

Because we also care about two equal factors, we also sieve squares of primes. Powers of primes can be identified during the sieve because their factor count will be 1 (primes have count 0; semiprimes 2 and other integers 3). When we mark the square of a prime (which will be the first power of the prime encountered), we could do a saturating add of two to each entry, but as a micro-optimization the code simply sets the count directly to 3.

On the assumption that the sieve does not contain entries for even numbers (as is usual), we special case the semiprime 4 and all semiprimes whose factors are 2 and an odd prime.

The code below is in (pseudo-)C++, and only shows how to do the sieving operation. Some details, including the definition of saturating_increment and the other sieve access functions, have been omitted because they are obvious and only distract.

/* Call `handler` with every semiprime less than `n`, in order */ 
void semi_sieve(uint32_t n, void(*handler)(uint32_t semi)) {
  Sieve sieve(n);
  if (n > 4) handler(4); /* Special case */
  for (uint32_p p = 3; p < n; p += 2) {
    switch (sieve.get(p)) {
      case 0: /* A prime */
        for (uint32_t j = p + p + p; j < n; j += p + p)
          sieve.saturating_increment(j);
        break;
      case 1: /* The square of a prime */
        handler(p);
        for (uint32_t j = p + p + p; j < n; j += p + p)
          sieve.set(j, 3);
        break;
      case 2: /* A semiprime */
        handler(p);
        break;
      case 3: /* Composite non-semiprime */
        break;
      default: /* Logic error */
    }
    /* If the next number might be twice an odd prime, check the sieve */
    if (p + 1 < n && p % 4 == 1 && 0 == sieve.get((p + 1)/2))
      handler(p + 1);
  }
}

Note: I'm aware that the above sweeps using primes from the entire range and not just up to the square root. That must come at some cost, but I would think that it is only a change in the constant. It is possible to terminate the scan early, gaining back a bit of the constant.

Anabiosis answered 26/2, 2017 at 22:39 Comment(1)

Great! I think you should also call handler in case 1. – Bordello 27/2, 2017 at 0:51

Meir-Simchah Panzer suggests on https://oeis.org/A001358 that "an equivalent definition of this sequence is... [the] smallest composite number which is not divided by any smaller composite number."

Here's an illustration of calculating the next semiprime after 100, relying on this idea:

Mark numbers greater than n that are not semiprime and stop when
you've skipped one that's not prime.

 2 * 51 = 102, marked
 3 * 34 = 102, marked
 5 * 21 = 105, marked
 7 * 15 = 105, marked
11 * 10 = 110, marked
13 *  8 = 104, marked
17 *  6 = 102, marked

101,103,107,109 are prime and we skipped 106 and 108
The only two primes that could cover those in our
next rounds are 2 and 3:

2 * 52 = 104
3 * 35 = 105

Third round:
2 * 54 = 108
3 * 36 = 108

We skipped 106

Voluptuous answered 27/2, 2017 at 5:33 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

1. Counting up from n

2. Primes counting down

3. Primes counting up

Recommended topics

Hot tags