what is the algorithm behind the factor command in linux?
Asked Answered
G

3

19

The factor command prints the prime factors of specified integer NUMBER.

When I tried it

factor 12345678912345678912

even for such big numbers, it results within milliseconds.

Which algorithm is it using?

Gemini answered 22/6, 2012 at 11:29 Comment(0)
S
18

Gnu coreutils manual informs that Pollard's rho algorithm is being used.

http://www.gnu.org/software/coreutils/manual/html_node/factor-invocation.html

Saxophone answered 22/6, 2012 at 11:36 Comment(0)
S
7

Here's an example of one version of the source for GNU factor:

http://www.futuretg.com/FTHumanEvolutionCourse/Source/factor.c

It includes routines for both trial division and Pollard's rho. Looks to me on a quick scan as if it uses trial division to find some small factors (up to about lg(n)^2, which is about 4000 in this case), then Pollard if what's left isn't probably-prime. In this case that's 205432623008947 if I'm right about the 4000, i.e. 35129 * 5847949643 .

The second-largest prime factor in your example is 35129, and the square root of the largest is around 76471. So trial division alone would be fast, since it only has to try about 25 thousand candidates.

Sump answered 22/6, 2012 at 11:34 Comment(0)
C
1

From the source code:

Algorithm:

  1. Perform trial division using a small primes table, but without hardware division since the primes table store inverses modulo the word base. (The GMP variant of this code doesn't make use of the precomputed inverses, but instead relies on GMP for fast divisibility testing.)
  2. Check the nature of any non-factored part using Miller-Rabin for detecting composites, and Lucas for detecting primes.
  3. Factor any remaining composite part using the Pollard-Brent rho algorithm or if USE_SQUFOF is defined to 1, try that first. Status of found factors are checked again using Miller-Rabin and Lucas.

We prefer using Hensel norm in the divisions, not the more familiar Euclidian norm, since the former leads to much faster code. In the Pollard-Brent rho code and the prime testing code, we use Montgomery's trick of multiplying all n-residues by the word base, allowing cheap Hensel reductions mod n.

The GMP code uses an algorithm that can be considerably slower; for example, on a circa-2017 Intel Xeon Silver 4116, factoring 2^{127}-3 takes about 50 ms with the two-word algorithm but would take about 750 ms with the GMP code.

In general, factor is pleasingly fast. However, it isn't magic; if you pick pathologically difficult numbers to factorize, it slows down dramatically.

The underlying security of RSA encryption is based on the difficulty of factorizing 2 large co-primes. So let's see how hard we can push factor with large coprimes. The largest number factor is able to factorize is 2127-1 (presumably represented internally with an int64_t), which happens to be prime:

$ factor $(bc <<< 2^127)
factor: ‘170141183460469231731687303715884105728’ is too large
$ factor $(bc <<< 2^127-1)
170141183460469231731687303715884105727: 170141183460469231731687303715884105727
$ factor $(bc <<< 2^127-2)
170141183460469231731687303715884105726: 2 3 3 3 7 7 19 43 73 127 337 5419 92737 649657 77158673929
$ 

Both 2127-1 and 2127-2 factorize almost immediately; 2127-1 is quickly found to be prime, and 2127-2 has relatively small factors.

What about something harder? The product of 2 factors of the order of 260 will be approaching the higher order of what factor can work on. So how about the next 2 primes above 260?

$ primes $(bc <<< 2^60) | head -2
1152921504606847009
1152921504606847067
$ bc <<< 1152921504606847009*1152921504606847067
1329227995784916015866073631529372603
$ time factor 1329227995784916015866073631529372603
1329227995784916015866073631529372603: 1152921504606847009 1152921504606847067

real    0m30.628s
user    0m30.578s
sys 0m0.004s
$ 

So as the number of bits of coprime factors increases, factorization time increases much more rapidly.

Cirrostratus answered 9/2, 2021 at 19:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.