Factorizing a number in Python
Asked Answered
T

6

9

Here's my code:

def factorize(n):
    sieve = [True] * (n + 1)

    for x in range(2, int(len(sieve) ** 0.5) + 1):
        if sieve[x]: 
            for i in range(x + x, len(sieve), x):
                sieve[i] = False

    lowerPrimes = i for i in range(2, len(sieve)) if sieve[i]] and (n % i == 0)]
    return lowerPrimes

factorize(n) returns all prime factors of the given value n. As you can see, it first makes an Eratosthenes sieve for n and then uses a list comprehension to return all values in the sieve that are factors of n. It works relatively fine for this purpose, however, I want it to return a list so that if you multiply every item in it, the result is n. Do you get my point?

For example, factorize(99020) returns [2, 5, 4951], but I'd like it to return [2, 2, 5, 4951], as 2*2*5*4951 = 99020.

I know my approach is not even close, but could you help me to make it so?

Thong answered 15/4, 2013 at 3:22 Comment(1)
This code won't run due to multiple problems - indenting for one, but more importantly you have a syntax error in your last line before the return.Brooks
M
11

The Sieve of Eratosthenes helps you find prime numbers below a certain limit. It's not really going to help you with finding the factors of a particular number.

If you want to do that, the simplest approach that I can see is something like this:

def factors(n):
    while n > 1:
        for i in range(2, n + 1):
            if n % i == 0:
                n //= i
                yield i
                break

for factor in factors(360):
    print factor

This basically finds the smallest factor of n (which is guaranteed to be prime), divides n by that number and repeats the process until n is equal to 1.

The output is:

2
2
2
3
3
5

They multiply out to the original number:

>>> from operator import mul
>>> reduce(mul, factors(360))
360
Marybellemarybeth answered 15/4, 2013 at 3:29 Comment(6)
Your method is pretty straightforward and much, much cleaner that anything I was expecing. Any advice for a novice programmer?Narton
@user2278662: I'm no expert, but my only advice is to make stuff.Marybellemarybeth
@Jared: I was aiming for simplicity here, but you might be right. I'm not sure if the overhead of generating the primes is worth it.Marybellemarybeth
primes or not primes is insignificant compared with the enormous amount of superfluous testing that this code does. More in my answer here.Onshore
@WillNess: As I said before, my code isn't meant to be efficient. I made it as short as possible to give OP something to work with. There are much better integer factorization algorithms out there and if you want to factor integers efficiently, why not use one of those?Marybellemarybeth
Stopping at sqrt is very simple, costs two lines of code usually, and gives enormous benefit. Writing GNFS is much, much, much harder - for me at least. :)Onshore
O
18

The code in the answer by Blender is very nice, but that algorithm is lacking in one very important respect: it tests way too much. E.g. trying to factorize n=392798360393, which is a prime number, it will try to divide it by all the numbers below it (including itself). This will take a lot of time.

Is it really necessary? If n == a*b and a < b, having found a we don't really need to test divide n by b. We know that it too divides n, because n/a == b implies n/b == a. So we only need to test while the potential factor a is smaller than (or equal to) the potential factor b. That is to say, until the square root of the number is reached.

Also, instead of starting over from 2 for each reduced n it's OK to start from the previous value of i:

from math import sqrt
def factors(n):    # (cf. https://mcmap.net/q/743358/-racket-programming-where-am-i-going-wrong)
    j = 2
    while n > 1:
        for i in range(j, int(sqrt(n+0.05)) + 1):
            if n % i == 0:
                n //= i ; j = i
                yield i
                break
        else:
            if n > 1:
                yield n; break

Indeed, factoring 9000009 by this code takes 0.08 seconds on Ideone, instead of 0.59 seconds.

This is guaranteed to produce only primes (because we divide out each factor found, and we try the candidates in non-decreasing order). The overhead of generating primes first and then testing only by the primes (a.o.t. testing by all numbers, as done here above) will be worth it if we are factoring several numbers at once; for factoring just one number it might not be worth it, depending on the speed of your prime generation.


But then, what really should be done when factoring several numbers at once, is to create the smallest factor sieve first where we mark each number in a given range by its smallest (prime) factor (from which it was generated by the sieve) instead of just by True or False as in the sieve of Eratosthenes. This smallest factor sieve is then used for efficient factorization of each of the numbers given, in the same range, by successive division by their factors, from the smallest up, which are efficiently found by a direct look-up in the sieve instead of testing anew:

def sfs_factorize(nums):
    top = max(nums)
    sv = smallest_factor_sieve(top)
    for k in nums:
        fs = [] ; n = k
        while n > 1:
            f = sv[n]    # no testing
            n //= f
            fs.append(f)
        print( k, list(fs))
Onshore answered 17/4, 2013 at 12:34 Comment(2)
Wow, great approach. Thank you.Narton
Python 3.8 added math.isqrt, which is perhaps a bit cleaner to use here over going through floats.Turntable
M
11

The Sieve of Eratosthenes helps you find prime numbers below a certain limit. It's not really going to help you with finding the factors of a particular number.

If you want to do that, the simplest approach that I can see is something like this:

def factors(n):
    while n > 1:
        for i in range(2, n + 1):
            if n % i == 0:
                n //= i
                yield i
                break

for factor in factors(360):
    print factor

This basically finds the smallest factor of n (which is guaranteed to be prime), divides n by that number and repeats the process until n is equal to 1.

The output is:

2
2
2
3
3
5

They multiply out to the original number:

>>> from operator import mul
>>> reduce(mul, factors(360))
360
Marybellemarybeth answered 15/4, 2013 at 3:29 Comment(6)
Your method is pretty straightforward and much, much cleaner that anything I was expecing. Any advice for a novice programmer?Narton
@user2278662: I'm no expert, but my only advice is to make stuff.Marybellemarybeth
@Jared: I was aiming for simplicity here, but you might be right. I'm not sure if the overhead of generating the primes is worth it.Marybellemarybeth
primes or not primes is insignificant compared with the enormous amount of superfluous testing that this code does. More in my answer here.Onshore
@WillNess: As I said before, my code isn't meant to be efficient. I made it as short as possible to give OP something to work with. There are much better integer factorization algorithms out there and if you want to factor integers efficiently, why not use one of those?Marybellemarybeth
Stopping at sqrt is very simple, costs two lines of code usually, and gives enormous benefit. Writing GNFS is much, much, much harder - for me at least. :)Onshore
H
2

Your first prime-sieving part is totally alright. Only second factors-finding part should be corrected.

In second part you should divide by prime factor not just single time, but multiple times till it is still divisible. This will ensure that you take into account powers of all prime factors.

This second part is called trial division by prime numbers. As it is well known it is enough to do trial division only by divisors below Sqrt(N). Remaining value of N will be automatically also prime.

One more very important speed improvement to your algorithm is that it is enough to make sieve size Sqrt(N), and not N. Because we need only primes that are less than Sqrt(N). This is because after trial division remaining N is automatically becomes prime.

I also found one more improvement for your sieving part, you should replace for i in range(x + x, ... with for i in range(x * x, .... Because you can start sieving not from x + x but from x * x, according to classical algorithm of Sieve of Erathosthenes. This will further improve speed. Starting from x + x is alright, but x * x will also give correct results with better performance.

One more obvious improvement is when you want to factorize multiple numbers, then you can reuse and extend sieve multiple times, this will make code faster. When reusing sieve it is good to keep resulting prime numbers in a separate list, so that on second Trial Division stage you don't iterate whole sieve but only prime numbers, this again will give speedup. I didn't do these two improvements in code below, but if you want I can do, just tell.

Full corrected final version is:

Try it online!

def factorize(n):
    sieve = [True] * int(n ** 0.5 + 2)
    
    for x in range(2, int(len(sieve) ** 0.5 + 2)):
        if not sieve[x]: 
            continue
        for i in range(x * x, len(sieve), x):
            sieve[i] = False

    factors = []
    for i in range(2, len(sieve)):
        if i * i > n:
            break
        if not sieve[i]:
            continue
        while n % i == 0:
            factors.append(i)
            n //= i
    if n > 1:
        factors.append(n)
    return factors

print(factorize(99020)) # [2, 2, 5, 4951]

Input:

99020

Output:

[2, 2, 5, 4951]

If you need to factorize just few numbers it might be even faster to do Trial Division Factorization without intermediate sieving stage. This also makes code more simple and shorter.

Full code for doing factorization without sieving stage is below.

In this code I did one obvious speed improvement by trying only odd divisors, this will double total algorithm speed. Of course for doing this you need to factor out all powers of 2 before, which I did too.

Try it online!

def factorize(n):
    factors = []
    while (n & 1) == 0:
        factors.append(2)
        n >>= 1
    for d in range(3, 1 << 60, 2):
        if d * d > n:
            break
        while n % d == 0:
            factors.append(d)
            n //= d
    if n > 1:
        factors.append(n)
    return factors

print(factorize(99020)) # [2, 2, 5, 4951]
Hardy answered 18/2, 2022 at 5:31 Comment(0)
J
1

I'm not familiar enough to tell if this question should be deleted or whatever, but I'll help out anyhow.

You have the sieve part correct, I think. The key is to use a while loop to keep on dividing out a valid prime factor more than once.

factors = []
sieve[0] = sieve[1] = False # So I don't have to worry about trying to skip over these two
for testFactIndex, isPrime in enumerate(sieve):
    if isPrime:
        while n%testFactIndex == 0:
            n = n/testFactIndex
            factors.append(testFactIndex)
return factors
Jamilajamill answered 15/4, 2013 at 3:29 Comment(0)
D
1

My one-liner with python 3.8 (assignment expressions used)

f = lambda n: (p:=[next(i for i in range(2, n+1) if n % i == 0)] if n>1 else [])+(f(n//p[0]) if p else [])

video with details - Factorizing number in Python

Desiccated answered 19/10, 2020 at 18:51 Comment(0)
M
0

I developed a more versatile function that also answers this question.

This is not so simple as @Evg's single line lambda method (awesome!), but returns list, dictionary or plain text! However, this function is the slowest method here: @Evg is the fastest, followed close by @Arty. I suggest replacing the list generation step with the @Evg's lambda method.

def factorof(x,mode='p'):
    A=[]
    
    for y in range(2,x):
        while x % y == 0:
            x = x / y
            A.append(y)

        # prime numbers
        if sum(A) == 0:
            A.append(x)
    
    mode = str(mode)[0].lower()

    if mode in [ 'l', '1' ]:
        return A
    else:
        B = {}
    
        for i in A:
            if i in B:
                B[i] = B[i] + 1
            else:
                B[i] = 1
        
        if mode in [ 'd', '2' ]:
            return B
        else: # mode == 'anything else'; is the default
            C = { 0 : "\U00002070", 1 : "\U000000B9", 2 : "\U000000B2",
                  3 : "\U000000B3", 4 : "\U00002074", 5 : "\U00002075",
                  6 : "\U00002076", 7 : "\U00002077", 8 : "\U00002078",
                  9 : "\U00002079" }

            D = ''
            
            for i in B:
                exp = ''
                
                if B[i] > 1:
                    for num in str(B[i]):
                        if int(num) in C:
                            exp = exp + str(C[int(num)])
                    
                D = D + str(i) + exp + " \U000000D7 "

            return D[:-3]

Usage:

factorof(99020)        # default: plain text
'2² × 5 × 4951'

factorof(99020,1)      # can replace 1 by 'list', 'ls' or 'l'
[2, 2, 5, 4951]

factorof(99020,2)      # can replace 2 by 'dict' or 'd'
{2: 2, 5: 1, 4951: 1}
Meill answered 16/1 at 16:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.