Part 1. Pollard-Rho + Trial Division.
Inspired by your question I decided to implement my own version of factorization (and finding largest prime factor) in Python.
Probably the simplest to implement, yet quite efficient, factoring algorithm that I know is Pollard's Rho algorithm. It has a running time of O(N^(1/4))
at most which is much more faster than time of O(N^(1/2))
for trial division algorithm. Both algos have these running times only in case of composite (non-prime) number, that's why primality test should be used to filter out prime (non-factorable) numbers.
I used following algorithms in my code: Fermat Primality Test ..., Pollard's Rho Algorithm ..., Trial Division Algorithm. Fermat primality test is used before running Pollard's Rho in order to filter out prime numbers. Trial Division is used as a fallback because Pollard's Rho in very rare cases may fail to find a factor, especially for some small numbers.
Obviously after fully factorizing a number into sorted list of prime factors the largest prime factor will be the last element in this list. In general case (for any random number) I don't know of any other ways to find out largest prime factor besides fully factorizing a number.
As an example in my code I'm factoring first 190 fractional digits of Pi, code factorizes this number within 1 second, and shows largest prime factor which is 165 digits (545 bits) in size!
Try it online!
def is_fermat_probable_prime(n, *, trials = 32):
# https://en.wikipedia.org/wiki/Fermat_primality_test
import random
if n <= 16:
return n in (2, 3, 5, 7, 11, 13)
for i in range(trials):
if pow(random.randint(2, n - 2), n - 1, n) != 1:
return False
return True
def pollard_rho_factor(N, *, trials = 16):
# https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
import random, math
for j in range(trials):
i, stage, y, x = 0, 2, 1, random.randint(1, N - 2)
while True:
r = math.gcd(N, x - y)
if r != 1:
break
if i == stage:
y = x
stage <<= 1
x = (x * x + 1) % N
i += 1
if r != N:
return [r, N // r]
return [N] # Pollard-Rho failed
def trial_division_factor(n, *, limit = None):
# https://en.wikipedia.org/wiki/Trial_division
fs = []
while n & 1 == 0:
fs.append(2)
n >>= 1
d = 3
while d * d <= n and limit is None or d <= limit:
q, r = divmod(n, d)
if r == 0:
fs.append(d)
n = q
else:
d += 2
if n > 1:
fs.append(n)
return fs
def factor(n):
if n <= 1:
return []
if is_fermat_probable_prime(n):
return [n]
fs = trial_division_factor(n, limit = 1 << 12)
if len(fs) >= 2:
return sorted(fs[:-1] + factor(fs[-1]))
fs = pollard_rho_factor(n)
if len(fs) >= 2:
return sorted([e1 for e0 in fs for e1 in factor(e0)])
return trial_division_factor(n)
def demo():
import time, math
# http://www.math.com/tables/constants/pi.htm
# pi = 3.
# 1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
# 8214808651 3282306647 0938446095 5058223172 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196
# 4428810975 6659334461 2847564823 3786783165 2712019091 4564856692 3460348610 4543266482 1339360726 0249141273
# 7245870066 0631558817 4881520920 9628292540 9171536436 7892590360 0113305305 4882046652 1384146951 9415116094
# 3305727036 5759591953 0921861173 8193261179 3105118548 0744623799 6274956735 1885752724 8912279381 8301194912
# 9833673362 4406566430 8602139494 6395224737 1907021798 6094370277 0539217176 2931767523 8467481846 7669405132
#
# n = first 190 fractional digits of Pi
n = 1415926535_8979323846_2643383279_5028841971_6939937510_5820974944_5923078164_0628620899_8628034825_3421170679_8214808651_3282306647_0938446095_5058223172_5359408128_4811174502_8410270193_8521105559_6446229489
print('Number:', n)
tb = time.time()
fs = factor(n)
print('All Prime Factors:', fs)
print('Largest Prime Factor:', f'({math.log2(fs[-1]):.02f} bits, {len(str(fs[-1]))} digits)', fs[-1])
print('Time Elapsed:', round(time.time() - tb, 3), 'sec')
if __name__ == '__main__':
demo()
Output:
Number: 1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489
All Prime Factors: [3, 71, 1063541, 153422959, 332958319, 122356390229851897378935483485536580757336676443481705501726535578690975860555141829117483263572548187951860901335596150415443615382488933330968669408906073630300473]
Largest Prime Factor: (545.09 bits, 165 digits) 122356390229851897378935483485536580757336676443481705501726535578690975860555141829117483263572548187951860901335596150415443615382488933330968669408906073630300473
Time Elapsed: 0.593 sec
Part 2. Elliptic Curve Method.
Some time later I decided to improve my post by implementing from scratch more advanced elliptic curve factorization method that is called ECM, see Wikipedia Article Lenstra Elliptic Curve Factorization.
This method is considerably faster than Pollard Rho described in Part 1 of my answer post. Time complexity of elliptic ECM method is O(exp[(Sqrt(2) + o(1)) Sqrt(ln p ln ln p)])
, where p
signifies smallest prime factor of a number. While time complexity of Pollard Rho is O(Sqrt(p))
. So ECM is much much faster for big enough smallest P factor.
Steps of ECM method:
Check if number is smaller than 2^16, then factor it through Trial Division method. Return result.
Check if number is probably prime with high condifence, for that I use Fermat Test with 32 trials. To overcome Carmichael numbers you may use Miller Rabin Test instead. If number is primes return it as only factor.
Generate curve parameters A, X, Y
randomly and derive B
from curve equation Y^2 = X^3 + AX + B (mod N)
. Check if curve is OK, value 4 * A ** 3 - 27 * B ** 2
should be non-zero.
Generate small primes through Sieve of Eratosthenes, primes below our Bound. Each prime raise to some small power, this raised prime would be called K. I do raising into power while it is smaller than some Bound2, which is Sqrt(Bound)
in my case.
Compute elliptic point multiplication P = k * P
, where K taken from previous step and P is (X, Y). Compute according to Wiki.
Point multiplication uses Modular Inverse, which computes GCD(SomeValue, N)
according to Wiki. If this GCD is not 1, then it gives non-1 factor of N, hence in this case I through an Exception and return this factor from ECM factorization algorithm.
If all primes till Bound were multiplied and gave no factor then re-run ECM factorization algorithm (1.-6.
above) again with another random curve and bigger Bound. In my code I take new bound by adding 256 to old bound.
Following sub-algorithms were used in my ECM code, all mentioned sub-algorithms have links to corresponding Wikipedia articles: Trial Division Factorization, Fermat Probability Test, Sieve of Eratosthenes (prime numbers generator), Euclidean Algorithm (computing Greatest Common Divisor, GCD), Extended Euclidean Algorithm (GCD with Bezu coefficients), Modular Multiplicative Inverse, Right-to-Left Binary Exponentation (for elliptic point multiplication), Elliptic Curve Arithmetics (point addition and multiplication), Lenstra Elliptic Curve Factorization.
Unlike Part 1 of my answer where I factor digits of Pi number, here in Part 2 I create a special number composed of n = Prime(24) * Prime(35) * Prime(37) * ... etc
, which means number as a product of Random prime numbers of 24 bits and 35 bits and 37 and etc... This custom number is more visually impressive to show algorithm capabilities.
As in Part-1 of my answer I also use several methods to compare there speed and also to factor-out smaller factors with simpler methods. So in code below I use Trial Division + Pollard Rho + Elliptic Curve Method.
After code below see console output to figure out what nicde stuff my code outputs.
Try it online!
def is_fermat_probable_prime(n, *, trials = 32):
# https://en.wikipedia.org/wiki/Fermat_primality_test
import random
if n <= 16:
return n in (2, 3, 5, 7, 11, 13)
for i in range(trials):
if pow(random.randint(2, n - 2), n - 1, n) != 1:
return False
return True
def pollard_rho_factor(N, *, trials = 8, limit = None):
# https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
import random, math
if N <= 1:
return []
if is_fermat_probable_prime(N):
return [N]
for j in range(trials):
i, stage, y, x = 0, 2, 1, random.randint(1, N - 2)
while True:
r = math.gcd(N, x - y)
if r != 1:
break
if i == stage:
y = x
stage <<= 1
x = (x * x + 1) % N
i += 1
if limit is not None and i >= limit:
return [N] # Pollard-Rho failed
if r != N:
return sorted(pollard_rho_factor(r, trials = trials, limit = limit) +
pollard_rho_factor(N // r, trials = trials, limit = limit))
return [N] # Pollard-Rho failed
def trial_division_factor(n, *, limit = None):
# https://en.wikipedia.org/wiki/Trial_division
fs = []
while n & 1 == 0:
fs.append(2)
n >>= 1
d = 3
while d * d <= n and limit is None or d <= limit:
q, r = divmod(n, d)
if r == 0:
fs.append(d)
n = q
else:
d += 2
if n > 1:
fs.append(n)
return fs
def ecm_factor(N0, *, verbose = False):
# https://en.wikipedia.org/wiki/Lenstra_elliptic-curve_factorization
import math, random, time; gmpy2 = None
#import gmpy2
def GenPrimes_SieveOfEratosthenes(end):
# https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
composite = [False] * (end // 2) # Allocate for odd numbers only
for p in range(3, int(end ** 0.5 + 3), 2):
if composite[p >> 1]:
continue
for i in range(p * p, end, p * 2):
composite[i >> 1] = True
yield 2
for p in range(3, end, 2):
if not composite[p >> 1]:
yield p
def GCD(a, b):
# https://en.wikipedia.org/wiki/Euclidean_algorithm
# return math.gcd(a, b)
while b != 0:
a, b = b, a % b
return a
def EGCD(a, b):
# https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm
if gmpy2 is None:
ro, r, so, s = a, b, 1, 0
while r != 0:
ro, (q, r) = r, divmod(ro, r)
so, s = s, so - q * s
return ro, so, (ro - so * a) // b
else:
return tuple(map(int, gmpy2.gcdext(a, b)))
def ModularInverse(a, n):
# https://en.wikipedia.org/wiki/Modular_multiplicative_inverse
# return pow(a, -1, n)
g, s, t = EGCD(a, n)
if g != 1:
raise ValueError(a)
return s % n
def EllipticCurveAdd(N, A, B, X0, Y0, X1, Y1):
# https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication
if X0 == X1 and Y0 == Y1:
# Double
l = ((3 * X0 ** 2 + A) * ModularInverse(2 * Y0, N)) % N
x = (l ** 2 - 2 * X0) % N
y = (l * (X0 - x) - Y0) % N
else:
# Add
l = ((Y1 - Y0) * ModularInverse(X1 - X0, N)) % N
x = (l ** 2 - X0 - X1) % N
y = (l * (X0 - x) - Y0) % N
return x, y
def EllipticCurveMul(N, A, B, X, Y, k):
# https://en.wikipedia.org/wiki/Modular_exponentiation#Right-to-left_binary_method
assert k >= 2, k
k -= 1
BX, BY = X, Y
while k != 0:
if k & 1:
X, Y = EllipticCurveAdd(N, A, B, X, Y, BX, BY)
BX, BY = EllipticCurveAdd(N, A, B, BX, BY, BX, BY)
k >>= 1
return X, Y
bound_start = 1 << 9
def Main(N, *, bound = bound_start, icurve = 0):
def NextFactorECM(x):
return Main(x, bound = bound + bound_start, icurve = icurve + 1)
def PrimePow(p, *, bound2 = int(bound ** 0.5 + 1.01)):
mp = p
while True:
mp *= p
if mp >= bound2:
return mp // p
if N <= 1:
return []
if N < (1 << 16):
fs = trial_division_factor(N)
if verbose and len(fs) >= 2:
print('Factors from TrialDiv:', fs, flush = True)
return fs
if is_fermat_probable_prime(N):
return [N]
if verbose:
print(f'Curve {icurve:>4}, bound 2^{math.log2(bound):>7.3f}', flush = True)
while True:
X, Y, A = [random.randrange(N) for i in range(3)]
B = (Y ** 2 - X ** 3 - A * X) % N
if 4 * A ** 3 - 27 * B ** 2 == 0:
continue
break
for p in GenPrimes_SieveOfEratosthenes(bound):
k = PrimePow(p)
try:
X, Y = EllipticCurveMul(N, A, B, X, Y, k)
except ValueError as ex:
g = GCD(ex.args[0], N)
assert g > 1, g
if g != N:
if verbose:
print(f'Factor from ECM: {g} ({math.log2(g):.1f} bits)', flush = True)
return sorted(NextFactorECM(g) + NextFactorECM(N // g))
else:
return NextFactorECM(N)
return NextFactorECM(N)
return Main(N0)
def factor(n):
if n <= 1:
return []
if is_fermat_probable_prime(n):
return [n]
fs1 = trial_division_factor(n, limit = 1 << 12)
fs1, n2 = fs1[:-1], fs1[-1]
print(len(fs1), 'factors from TrialDivision')
fs2 = pollard_rho_factor(n2, limit = 1 << 17)
fs2, n3 = fs2[:-1], fs2[-1]
print(len(fs2), 'factors from PollardRho')
fs3 = ecm_factor(n3, verbose = True)
print(len(fs3), 'factors from ECM')
return sorted(fs1 + fs2 + fs3)
def demo():
import time, math, random
def Prime(bits):
while True:
x = random.randrange(1 << (bits - 1), 1 << bits)
if is_fermat_probable_prime(x):
return x
# http://www.math.com/tables/constants/pi.htm
# pi = 3.
# 1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
# 8214808651 3282306647 0938446095 5058223172 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196
# 4428810975 6659334461 2847564823 3786783165 2712019091 4564856692 3460348610 4543266482 1339360726 0249141273
# 7245870066 0631558817 4881520920 9628292540 9171536436 7892590360 0113305305 4882046652 1384146951 9415116094
# 3305727036 5759591953 0921861173 8193261179 3105118548 0744623799 6274956735 1885752724 8912279381 8301194912
# 9833673362 4406566430 8602139494 6395224737 1907021798 6094370277 0539217176 2931767523 8467481846 7669405132
#
# n = 1415926535_8979323846_2643383279_5028841971_6939937510_5820974944_5923078164_0628620899_8628034825_3421170679_8214808651_3282306647_0938446095_5058223172_5359408128_4811174502_8410270193_8521105559_6446229489_5493038196_4428810975_6659334461_2847564823_3786783165_2712019091_4564856692_3460348610_4543266482_1339360726_0249141273_7245870066_0631558817_4881520920_9628292540_9171536436_7892590360_0113305305_4882046652_1384146951_9415116094_3305727036_5759591953_0921861173_8193261179_3105118548_0744623799_6274956735_1885752724_8912279381_8301194912_9833673362_4406566430_8602139494_6395224737_1907021798_6094370277_0539217176_2931767523_8467481846_7669405132
# 141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724587006606315588174881520920962829254091715364367892590360011330530548820466521384146951941511609433057270365759591953092186117381932611793105118548074462379962749567351885752724891227938183011949129833673362440656643086021394946395224737190702179860943702770539217176293176752384674818467669405132
n = Prime(9) * Prime(10) * Prime(11) * Prime(20) * Prime(24) * Prime(35) * Prime(37) * Prime(38) * Prime(120)
print('Number:', n)
tb = time.time()
fs = factor(n)
print('All Prime Factors:', fs)
print('Largest Prime Factor:', f'({math.log2(fs[-1]):.02f} bits, {len(str(fs[-1]))} digits)', fs[-1])
if len(fs) >= 2:
print('2nd-largest Prime Factor:', f'({math.log2(fs[-2]):.02f} bits, {len(str(fs[-2]))} digits)', fs[-2])
print('Time Elapsed:', round(time.time() - tb, 3), 'sec')
if __name__ == '__main__':
demo()
Console output:
Number: 3020823358956369790763854998578637168366763837218991014777892420353187988302225517459334041
3 factors from TrialDivision
2 factors from PollardRho
Curve 0, bound 2^ 9.000
Curve 1, bound 2^ 10.000
Curve 2, bound 2^ 10.585
Factor from ECM: 20028139561 (34.2 bits)
Curve 3, bound 2^ 11.000
Curve 4, bound 2^ 11.322
Curve 5, bound 2^ 11.585
Curve 6, bound 2^ 11.807
Curve 7, bound 2^ 12.000
Curve 8, bound 2^ 12.170
Factor from ECM: 96583780901 (36.5 bits)
Curve 9, bound 2^ 12.322
Curve 10, bound 2^ 12.459
Curve 11, bound 2^ 12.585
Curve 12, bound 2^ 12.700
Curve 13, bound 2^ 12.807
Curve 14, bound 2^ 12.907
Curve 15, bound 2^ 13.000
Curve 16, bound 2^ 13.087
Curve 17, bound 2^ 13.170
Factor from ECM: 239171423261 (37.8 bits)
4 factors from ECM
All Prime Factors: [397, 1021, 1459, 754333, 16156687, 20028139561,
96583780901, 239171423261, 905908369146483365552973334921981697]
Largest Prime Factor: (119.45 bits, 36 digits) 905908369146483365552973334921981697
2nd-largest Prime Factor: (37.80 bits, 12 digits) 239171423261
Time Elapsed: 17.156 sec
1.
find any number that divides clearly (for i = 2 to int(sqr(num)) )2.
divide by that number (num = num/i) and recur until nothing is found in 1.'s interval3.
num is the largest factor – Bithynia