Why don't genetic algorithms work on problems like factoring RSA?

Some time ago I was pretty interested in GAs and I studied them quite a bit. I used the C++ GAlib to write some programs and I was quite amazed by their ability to solve otherwise difficult-to-compute problems in a matter of seconds. They seemed like a great brute-forcing technique that works really smart and adapts.

I was reading a book by Michalewicz, if I remember the name correctly, and it all seemed to be based on the Schema Theorem, proved by John Holland.

I've also heard that they cannot really be used to approach problems like factoring RSA private keys.

Could anybody explain why this is the case?

Spallation answered 28/3, 2011 at 8:38 Comment(6)
I think it has something to do with the all-or-nothing nature of the problem. There's no known "adaptation" really possible, and no way of measuring a "close" answer; there is only a single right answer and a VERY LARGE set of wrong ones. – Sair
Still, I've solved problems with incredibly large keyspaces in a matter of minutes. I once solved a 4^16 keyspace in about 2 minutes. I was really amazed, because with brute-forcing I would still be at it. – Spallation
The keyspace does not matter as much as the distribution of the solution space within it. – Sadirah
zoul, I think that sentence is a pretty nice explanation. It seems to me like everything points to that. – Spallation
If a small change to any random solution has a non-negligible likelihood of improving it, and the maximum/minimum is unique, then the space can be very large and it does not matter. It's like dropping a ball into a big bowl: it will always end up in the middle, no matter the size of the bowl. :p – Miskolc
One more thing: the Schema Theorem, although correct, makes an unreasonable assumption: that sub-strings of bits code for sub-solutions of a solution. That's only true for very specific problems; it's a very naive assumption. Most of the time, unless you put A LOT of work into the way you encode a problem's solutions, the various parts of the encoding interact in very non-trivial ways. – Miskolc

Genetic algorithms are not smart at all; they are very greedy optimization algorithms. They all work around the same idea: you have a group of points ('a population of individuals'), and you transform that group into another one with stochastic operators, with a bias in the direction of best improvement ('mutation + crossover + selection'). Repeat until it converges or you are tired of it; nothing smart there.

For a genetic algorithm to work, a new population of points should perform close to the previous population of points: a little perturbation should create a little change. If, after a small perturbation of a point, you obtain a point that represents a solution with completely different performance, then the algorithm is nothing better than random search, which is usually a poor optimization algorithm. In the RSA case, if your points are directly the numbers, flipping a single bit turns a YES into a NO or vice versa. So if you represent the RSA problem without much thinking ("let's encode search points as the bits of the numbers"), a genetic algorithm is no better than random search.
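To make the flat-landscape point concrete, here is a small illustrative sketch with a toy semiprime (numbers and names are made up, not from the answer): with candidates encoded as raw integers and a divisibility-based fitness, every single-bit neighbor of a wrong candidate scores exactly the same, so there is nothing for a GA to climb.

```python
# Toy illustration (hypothetical): fitness is all-or-nothing, so the
# landscape around a wrong candidate is flat.
N = 8051  # toy semiprime: 83 * 97

def fitness(p: int) -> int:
    """1 if p is a nontrivial factor of N, else 0."""
    return 1 if 1 < p < N and N % p == 0 else 0

def bit_flip_neighbors(p: int, bits: int = 13):
    """All candidates reachable from p by flipping a single bit."""
    return [p ^ (1 << i) for i in range(bits)]

candidate = 60  # a wrong guess
scores = [fitness(q) for q in bit_flip_neighbors(candidate)]
# Every single-bit neighbor of 60 is also a non-factor of 8051:
# the fitness gives no gradient toward 83 or 97.
```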

Miskolc answered 28/3, 2011 at 8:54 Comment(0)

I would say it's because factorization of keys is not an optimization problem, but an exact problem. That distinction is not very precise, so here are the details. Genetic algorithms are great for solving problems where there are minima (local/global) to move toward, but there are none in the factoring problem. A genetic algorithm, like DCA or simulated annealing, needs a measure of "how close am I to the solution", and you can't define one for this problem.

For an example of a problem where genetic algorithms do well, look at hill climbing.

Velamen answered 28/3, 2011 at 8:52 Comment(2)
I've actually heard that, to be sincere. Is it something like an avalanche effect in cryptography? Small changes in the input cannot be handled by GAs because of large changes in the output? To tell you the truth, I kind of see a maximum in factorizing a key: in the relationship of e and d with p and q. – Spallation
Avalanche effects actually only occur when computing digests with hash functions. The RSA problem is to factor one number, and since that number is a product of two primes, the big number has only two nontrivial divisors, so you have to find one. There is no known way to do it without brute force; every program has to try the possibilities. There may be heuristics that speed up the search a bit, but they are not very effective. – Albinaalbinism

GAs are based on fitness evaluation of candidate solutions.

You basically have a fitness function that takes in a candidate solution as input and gives you back a scalar telling you how good that candidate is. You then go on and allow the best individuals of a given generation to mate with higher probability than the rest, so that the offspring will be (hopefully) more 'fit' overall, and so on.
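The mate-with-higher-probability step described above is often implemented as fitness-proportionate ("roulette wheel") selection. A minimal sketch, with a made-up population and toy fitness (all names here are illustrative):

```python
import random

def roulette_select(population, fitness, k=2, rng=random.Random(0)):
    """Pick k parents with probability proportional to their fitness."""
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=k)

pop = [0b1010, 0b1111, 0b0001, 0b0110]
ones = lambda ind: bin(ind).count("1")  # toy fitness: number of set bits
parents = roulette_select(pop, ones)    # fitter individuals chosen more often
```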

There is no meaningful way to evaluate fitness (how good a candidate solution is compared to the rest) in the RSA factorization scenario, and that is why you can't use GAs for it.

Sage answered 28/3, 2011 at 12:47 Comment(0)

GAs are not brute-forcing, they’re just a search algorithm. Each GA essentially looks like this:

candidates = seed_value;
while (!good_enough(best_of(candidates))) {
    candidates = compute_next_generation(candidates);
}

Where good_enough and best_of are defined in terms of a fitness function. A fitness function says how well a given candidate solves the problem, and that seems to be the core issue here: how would you write a fitness function for factorization? For example, 20 = 2*10 or 4*5. The tuples (2,10) and (4,5) are clearly winners, but what about the others? How "fit" is (1,9) or (3,4)?
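One naive attempt, sketched below with toy numbers (the answer only poses the question; this fitness is my own illustration), is to score a pair by how close its product comes to N. The trouble is that this distance says nothing about sharing factors with N:

```python
# Hypothetical naive fitness for factoring: distance of the product from N.
N = 20

def fitness(pair):
    """0 means (x, y) factors N exactly; larger is 'less fit'."""
    x, y = pair
    return abs(N - x * y)

# (4, 5) and (2, 10) score a perfect 0, but (3, 7) scores 1 despite
# having no factor in common with 20 -- a low 'distance' carries no
# information about the true factors, so selection has nothing to favor.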

Sadirah answered 28/3, 2011 at 8:54 Comment(0)

Indirectly, you can use a genetic algorithm to factor an integer N. Dixon's integer factorization method uses equations involving powers of the first k primes, modulo N. These products of powers of small primes are called "smooth". If we are using the first k=4 primes - {2,3,5,7} - 42=2x3x7 is smooth and 11 is not (for lack of a better term, 11 is "rough"). Dixon's method requires an invertible k x k matrix consisting of the exponents that define these smooth numbers. For more on Dixon's method see https://en.wikipedia.org/wiki/Dixon%27s_factorization_method.

Now, back to the original question: There is a genetic algorithm for finding equations for Dixon's method.

  1. Let r be the inverse of a smooth number mod N - so r is a rough number
  2. Let s be smooth
  3. Generate random solutions of rx = sy mod N. These solutions [x,y] are the population for the genetic algorithm. Each x, y has a smooth component and a rough component. For example suppose x = 369 = 9 x 41. Then (assuming 41 is not small enough to count as smooth), the rough part of x is 41 and the smooth part is 9.
  4. Choose pairs of solutions - "parents" - to combine into linear combinations with ever smaller rough parts.
  5. The algorithm terminates when a pair [x,y] is found with rough parts [1,1], [1,-1],[-1,1] or [-1,-1]. This yields an equation for Dixon's method, because rx=sy mod N and r is the only rough number left: x and y are smooth, and s started off smooth. But even 1/r mod N is smooth, so it's all smooth!
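The smooth/rough split used in step 3 can be sketched as follows (a hypothetical helper over the {2,3,5,7} base from above):

```python
def smooth_rough(n, base=(2, 3, 5, 7)):
    """Split n into (smooth part, rough part) over the given prime base."""
    smooth = 1
    for p in base:
        while n % p == 0:
            smooth *= p
            n //= p
    return smooth, n

# e.g. 369 = 9 * 41: smooth part 9, rough part 41, matching step 3's example;
# 42 = 2*3*7 is entirely smooth, so its rough part is 1.
```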

Every time you combine two pairs - say [v,w] and [x,y] - the smooth parts of the four numbers are obliterated, except for the factors the smooth parts of v and x share, and the factors the smooth parts of w and y share. So we choose parents that share smooth parts to the greatest possible extent. To make this precise, write

g = gcd(smooth part of v, smooth part of x)

h = gcd(smooth part of w, smooth part of y)

[v,w] = [g (v/g), h (w/h)] and [x,y] = [g (x/g), h (y/h)].

The hard-won smooth factors g and h will be preserved into the next generation, but the smooth parts of v/g, w/h, x/g and y/h will be sacrificed in order to combine [v,w] and [x,y]. So we choose parents for which v/g, w/h, x/g and y/h have the smallest smooth parts. In this way we really do drive down the rough parts of our solutions to rx = sy mod N from one generation to the next.

Purser answered 17/5, 2017 at 1:50 Comment(0)

On further thought the best way to make your way towards smooth coefficients x, y in the lattice ax = by mod N is with regression, not a genetic algorithm.

Two regressions are performed, one with response vector R0 consisting of x-values from randomly chosen solutions of ax = by mod N; and the other with response vector R1 consisting of y-values from the same solutions. Both regressions use the same explanatory matrix X. In X are columns consisting of the remainders of the x-values modulo smooth divisors, and other columns consisting of the remainders of the y-values modulo other smooth divisors.

The best choice of smooth divisors is the one that minimizes the errors from each regression:

E0 = R0 - X (inverse of (X-transpose)(X)) (X-transpose) (R0)

E1 = R1 - X (inverse of (X-transpose)(X)) (X-transpose) (R1)
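The residuals above are ordinary least-squares residuals. A sketch with made-up data (numpy's lstsq solves the same normal equations without forming the inverse explicitly; the dimensions and values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 7, size=(8, 3)).astype(float)   # remainder columns
R0 = rng.integers(0, 100, size=8).astype(float)     # x-values of solutions

coef, *_ = np.linalg.lstsq(X, R0, rcond=None)
E0 = R0 - X @ coef   # E0 = R0 - X (X^T X)^{-1} X^T R0
# By construction the residual E0 is orthogonal to every column of X.
```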

What follows is a sequence of row operations to annihilate X: let z be a row vector produced by those operations, so that zX = 0. Then apply z to the x- and y-values from the original solutions from which X was formed.

z R0 = z R0 - 0
     = z R0 - zX (inverse of (X-transpose)(X)) (X-transpose) (R0)
     = z E0 

Similarly, z R1 = z E1

Three properties are now combined in z R0 and z R1:

  • They are multiples of large smooth numbers, because z annihilates remainders modulo smooth numbers.
  • They are relatively small, since E0 and E1 are small.
  • Like any linear combination of solutions to ax = by mod N, z R0 and z R1 are themselves solutions to that equation.

A relatively small multiple of a large smooth number might just be the smooth number itself. Having a smooth solution of ax = by mod N yields an input to Dixon's method.

Two optimizations make this particularly fast:

  • There is no need to guess all the smooth numbers and columns of X at once. You can run regressions continuously, adding one column to X at a time and choosing columns that reduce E0 and E1 the most. At no time will any two smooth numbers with a common factor be selected.
  • You can also start with a lot of random solutions of ax = by mod N, and remove the ones with the largest errors between selections of new columns for X.
Purser answered 31/8, 2017 at 13:10 Comment(0)
