The factor
command prints the prime factors of specified integer NUMBER.
When I tried it
factor 12345678912345678912
even for such big numbers, it results within milliseconds.
Which algorithm is it using?
The factor
command prints the prime factors of specified integer NUMBER.
When I tried it
factor 12345678912345678912
even for such big numbers, it results within milliseconds.
Which algorithm is it using?
Gnu coreutils manual informs that Pollard's rho algorithm is being used.
http://www.gnu.org/software/coreutils/manual/html_node/factor-invocation.html
Here's an example of one version of the source for GNU factor:
http://www.futuretg.com/FTHumanEvolutionCourse/Source/factor.c
It includes routines for both trial division and Pollard's rho. Looks to me on a quick scan as if it uses trial division to find some small factors (up to about lg(n)^2
, which is about 4000 in this case), then Pollard if what's left isn't probably-prime. In this case that's 205432623008947
if I'm right about the 4000, i.e. 35129 * 5847949643
.
The second-largest prime factor in your example is 35129
, and the square root of the largest is around 76471
. So trial division alone would be fast, since it only has to try about 25 thousand candidates.
From the source code:
Algorithm:
- Perform trial division using a small primes table, but without hardware division since the primes table store inverses modulo the word base. (The GMP variant of this code doesn't make use of the precomputed inverses, but instead relies on GMP for fast divisibility testing.)
- Check the nature of any non-factored part using Miller-Rabin for detecting composites, and Lucas for detecting primes.
- Factor any remaining composite part using the Pollard-Brent rho algorithm or if USE_SQUFOF is defined to 1, try that first. Status of found factors are checked again using Miller-Rabin and Lucas.
We prefer using Hensel norm in the divisions, not the more familiar Euclidian norm, since the former leads to much faster code. In the Pollard-Brent rho code and the prime testing code, we use Montgomery's trick of multiplying all n-residues by the word base, allowing cheap Hensel reductions mod n.
The GMP code uses an algorithm that can be considerably slower; for example, on a circa-2017 Intel Xeon Silver 4116, factoring 2^{127}-3 takes about 50 ms with the two-word algorithm but would take about 750 ms with the GMP code.
In general, factor
is pleasingly fast. However, it isn't magic; if you pick pathologically difficult numbers to factorize, it slows down dramatically.
The underlying security of RSA encryption is based on the difficulty of factorizing 2 large co-primes. So let's see how hard we can push factor
with large coprimes. The largest number factor
is able to factorize is 2127-1 (presumably represented internally with an int64_t), which happens to be prime:
$ factor $(bc <<< 2^127)
factor: ‘170141183460469231731687303715884105728’ is too large
$ factor $(bc <<< 2^127-1)
170141183460469231731687303715884105727: 170141183460469231731687303715884105727
$ factor $(bc <<< 2^127-2)
170141183460469231731687303715884105726: 2 3 3 3 7 7 19 43 73 127 337 5419 92737 649657 77158673929
$
Both 2127-1 and 2127-2 factorize almost immediately; 2127-1 is quickly found to be prime, and 2127-2 has relatively small factors.
What about something harder? The product of 2 factors of the order of 260 will be approaching the higher order of what factor can work on. So how about the next 2 primes above 260?
$ primes $(bc <<< 2^60) | head -2
1152921504606847009
1152921504606847067
$ bc <<< 1152921504606847009*1152921504606847067
1329227995784916015866073631529372603
$ time factor 1329227995784916015866073631529372603
1329227995784916015866073631529372603: 1152921504606847009 1152921504606847067
real 0m30.628s
user 0m30.578s
sys 0m0.004s
$
So as the number of bits of coprime factors increases, factorization time increases much more rapidly.
© 2022 - 2024 — McMap. All rights reserved.