python prime factorization performance
Asked Answered
E

1

13

I'm relatively new to python and I'm confused about the performance of two relatively simple blocks of code. The first function generates a prime factorization of a number n given a list of primes. The second generates a list of all factors of n. I would have though prime_factor would be faster than factors (for the same n), but this is not the case. I'm not looking for better algorithms, but rather I would like to understand why prime_factor is so much slower than factors.

def prime_factor(n, primes):
  prime_factors = []
  i = 0
  while n != 1:
      if n % primes[i] == 0:
          factor = primes[i]
          prime_factors.append(factor)
          n = n // factor
      else: i += 1
  return prime_factors

import math
def factors(n):
  if n == 0: return []
  factors = {1, n}
  for i in range(2, math.floor(n ** (1/2)) + 1):
      if n % i == 0:
          factors.add(i)
          factors.add(n // i)
  return list(factors)

Using the timeit module,

{ i:factors(i) for i in range(1, 10000) } takes 2.5 seconds

{ i:prime_factor(i, primes) for i in range(1, 10000) } takes 17 seconds

This is surprising to me. factors checks every number from 1 to sqrt(n), while prime_factor only checks primes. I would appreciate any help in understanding the performance characteristics of these two functions.

Thanks

Edit: (response to roliu) Here is my code to generate a list of primes from 2 to up_to:

def primes_up_to(up_to):
  marked = [0] * up_to
  value = 3
  s = 2
  primes = [2]
  while value < up_to:
      if marked[value] == 0:
          primes.append(value)
          i = value
          while i < up_to:
              marked[i] = 1
              i += value
      value += 2
  return primes
Eire answered 11/10, 2013 at 4:3 Comment(2)
related: How can I profile python code line-by-line?Ought
Where do you define the primes you supply to prime_factors()?Acciaccatura
R
12

Without seeing what you used for primes, we have to guess (we can't run your code).

But a big part of this is simply mathematics: there are (very roughly speaking) about n/log(n) primes less than n, and that's a lot bigger than sqrt(n). So when you pass a prime, prime_factor(n) does a lot more work: it goes through O(n/log(n)) operations before finding the first prime factor (n itself!), while factors(n) gives up after O(sqrt(n)) operations.

This can be very significant. For example, sqrt(10000) is just 100, but there are 1229 primes less than 10000. So prime_factor(n) can need to do over 10 times as much work to deal with the large primes in your range.

Relique answered 11/10, 2013 at 4:43 Comment(5)
+1. Of course, prime_factor can also stop once it hits sqrt(n), since there can only be one prime factor of n greater than its square root.Sorn
Yes, in fact, if the list provided to prime_factor indeed contains only primes listed in ascending order, it will examine fewer numbers than factors before terminating; it gets to skip all the composite numbers!Curie
All helpful to someone ;-), but the original poster said they weren't looking for a better algorithm - they just wanted to know why the given code behaves as it does. I think it's pretty clear ;-)Relique
@Tim Peters Perhaps I should clarify, since I wasn't disagreeing with your answer, just making sure OP sees the issue. For a value like m=2p, where p is prime, prime_factor(n) divides by 2 first, scans through every prime in [3,p] until concluding that p is prime. He could have stopped once he reached sqrt(p). If he did, prime_factor(n) should be about as fast or faster than factors(n). If he started with the largest prime smaller than sqrt(n), he could do even better.Curie
Thank you very much, this was very helpful. Adding a counter verified what you expected. prime_factor looped about 87M times, and factors only looped 20M times.Eire

© 2022 - 2024 — McMap. All rights reserved.