What is the fastest integer factorization algorithm?

I've written a program that attempts to find Amicable Pairs. This requires finding the sums of the proper divisors of numbers.

Here is my current sumOfDivisors() method:

int sumOfDivisors(int n)
{  
    int sum = 1;
    int bound = (int) sqrt(n);
    for(int i = 2; i <= 1 + bound; i++)
    {
        if (n % i == 0)
            sum = sum + i + n / i;
    } 
    return sum;
}

So I need to do lots of factorization and that is starting to become the real bottleneck in my application. I typed a huge number into MAPLE and it factored it insanely fast.

What is one of the faster factorization algorithms?

Reber answered 15/2, 2010 at 16:2 Comment(10)
Don't forget to check whether the number is a perfect square. If it is, and you don't take it into account, you'll be adding the square root to the sum twice (as both i AND n/i). Take a look at Project Euler - there's all sorts of things on there covering this type of optimisation.Polyhedron
If your numbers fit in a 32-bit signed int, nothing's going to beat a moderately-clever lookup table. How big are your numbers really?Highbred
@Aakasm, way larger than what will fit in a 32-bit signed or unsigned int. Greater than 25 digits. I'm using the GMP mpz big integer class. Basically, as large as I can go since I hope to be able to find a new amicable pair or friendly cycle.Reber
I'm surprised that you can factor even one such number by the method above.Monochromatic
@Steve Jessop, 20 digit number takes 45 seconds.Reber
So a 25 digit number about 4 hours?Monochromatic
I'm guessing? Hence the point of this question.Reber
Did you check boo.net/~jasonp/qs.html?Forepaw
I answered this at #50657327; I should have marked it as a duplicate.Broccoli
Why did you add 1 to the bound in the condition i <= 1 + bound of the for loop?Minuet

Pulled directly from my answer to this other question.

The method will work, but will be slow. "How big are your numbers?" determines the method to use:

  • Less than 2^16 or so: a lookup table.
  • Less than 2^70 or so: Richard Brent's modification of Pollard's rho algorithm.
  • Less than 10^50 or so: Lenstra's elliptic curve factorization.
  • Less than 10^100 or so: the Quadratic Sieve.
  • More than 10^100: the General Number Field Sieve.
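
For the middle of that range, here is a minimal sketch of the classic Pollard rho (Brent's modification mainly replaces Floyd's cycle detection and batches the gcd calls). It assumes the input is a 64-bit composite and uses the GCC/Clang unsigned __int128 extension for the modular multiplication; the test number below is just an arbitrary product of two primes:

#include <cstdint>
#include <cstdlib>   // rand()
#include <numeric>   // std::gcd (C++17)
#include <iostream>

// 64-bit modular multiplication via the GCC/Clang __int128 extension.
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
    return (unsigned __int128)a * b % n;
}

// Classic Pollard rho with Floyd cycle detection.
// Assumes n is composite (test primality first, e.g. with Miller-Rabin).
uint64_t pollardRho(uint64_t n) {
    if (n % 2 == 0) return 2;
    while (true) {
        uint64_t x = rand() % (n - 2) + 2, y = x;
        uint64_t c = rand() % (n - 1) + 1, d = 1;
        while (d == 1) {
            x = (mulmod(x, x, n) + c) % n;   // tortoise: one step
            y = (mulmod(y, y, n) + c) % n;   // hare: two steps
            y = (mulmod(y, y, n) + c) % n;
            d = std::gcd(x > y ? x - y : y - x, n);
        }
        if (d != n) return d;                // unlucky cycle otherwise: retry with a new c
    }
}

int main() {
    uint64_t n = 10967535067ULL;             // 104723 * 104729, an arbitrary test composite
    std::cout << n << " has factor " << pollardRho(n) << std::endl;
}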

Lachus answered 16/2, 2010 at 16:39 Comment(8)
Nice list. Lenstra's elliptic curve method should be good for numbers a lot larger than 10^20, though. (Even for numbers larger than 10^100, if you're just looking for smallish factors.)Tick
@Mark Dickinson: perhaps, but if you're working with numbers over 10^100, you're generally working with numbers that don't have small(ish) factors.Lachus
Perhaps. It depends on where the numbers are coming from: a 'random' number > 10^100 could well have small factors. Of course, this wouldn't be true for an RSA modulus. Anyway, that 10^20 should be increased to 10^50 or so (perhaps more). Note that in the article you link to, it's talking about divisors of 20 to 25 digits: the number being factored will typically be a lot larger than that.Tick
Isn't 2^70 about the same as 10^20?Junior
@Mark Dickinson, Thanks for catching this.Lachus
@xan, Yes 2^70 is slightly greater than 10^20. See the updated numbers above for the correction. :)Lachus
@SamHarwell is there any source for learning more about all these methods? I tried searching for them, but didn't find anything that could help me. I did find the paper by Richard Brent; I am good with maths, but not good enough to understand the paper.Whisker
@Junior 2^70 is roughly 10^21, not 10^20Swarey

Shor's Algorithm: http://en.wikipedia.org/wiki/Shor%27s_algorithm

Of course you need a quantum computer though :D

Disband answered 15/2, 2010 at 16:17 Comment(0)

The question in the title (and the last line) seems to have little to do with the actual body of the question. If you're trying to find amicable pairs, or computing the sum of divisors for many numbers, then separately factorising each number (even with the fastest possible algorithm) is definitely an inefficient way to do it.

The sum-of-divisors function, σ(n) = (sum of divisors of n), is a multiplicative function: for relatively prime m and n, we have σ(mn) = σ(m)σ(n), so

σ(p1^k1 * ... * pr^kr) = [(p1^(k1+1) - 1)/(p1 - 1)] * ... * [(pr^(kr+1) - 1)/(pr - 1)].

For example, σ(12) = σ(2^2) * σ(3) = [(2^3 - 1)/(2 - 1)] * [(3^2 - 1)/(3 - 1)] = 7 * 4 = 28.

So you would use any simple sieve (e.g. an augmented version of the Sieve of Eratosthenes) to find the primes up to n, and, in the process, the factorisation of all numbers up to n. (For example, as you do your sieve, store the smallest prime factor of each n. Then you can later factorize any number n by repeatedly dividing out its stored smallest prime factor.) This would be faster (overall) than using any separate factorization algorithm several times.
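
A minimal sketch of that idea (the sieve limit of 10^6 and the helper names are just assumptions for the example; the proper-divisor sum needed for amicable pairs is σ(n) - n):

#include <vector>
#include <cstdio>

int main() {
    const int N = 1000000;                       // sieve limit: an assumption for the example
    std::vector<int> spf(N + 1, 0);              // spf[i] = smallest prime factor of i
    for (int i = 2; i <= N; ++i)
        if (spf[i] == 0)                         // i is prime
            for (int j = i; j <= N; j += i)
                if (spf[j] == 0) spf[j] = i;     // record smallest prime factor

    // sum of ALL divisors of n via sigma(p^k) = (p^(k+1)-1)/(p-1), multiplied over primes
    auto sigma = [&](int n) {
        long long s = 1;
        while (n > 1) {
            int p = spf[n];
            long long term = 1, power = 1;
            while (n % p == 0) { n /= p; power *= p; term += power; }
            s *= term;                           // term = 1 + p + ... + p^k
        }
        return s;
    };

    // amicable check: the sum of PROPER divisors is sigma(n) - n
    for (int n = 2; n <= N; ++n) {
        long long m = sigma(n) - n;
        if (m > n && m <= N && sigma((int)m) - m == n)
            std::printf("(%d, %lld)\n", n, m);
    }
}

Over the first million integers this should print (220, 284), (1184, 1210) and the rest of the small amicable pairs.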

BTW: several known lists of amicable pairs already exist (see e.g. here and the links at MathWorld) – so are you trying to extend the record, or doing it just for fun?

Solubilize answered 15/2, 2010 at 16:35 Comment(0)

I would suggest starting from the same algorithm used in Maple, the Quadratic Sieve.

  1. Choose the odd number n to factorize,
  2. Choose a natural number k (the smoothness bound),
  3. Take all primes p <= k for which n is a quadratic residue mod p (i.e. x^2 ≡ n (mod p) has a solution); they form the factor base B = p1, p2, ..., pt,
  4. Starting from r > floor(sqrt(n)), search for at least t+1 values of r such that y = r^2 - n factors completely over B,
  5. For every y1, y2, ..., y(t+1) just calculated, generate a vector v(yi) = (e1, e2, ..., et), where ei is the exponent of pi in the factorization of yi reduced modulo 2,
  6. Use Gaussian elimination (over GF(2)) to find a subset of the vectors that adds up to the null vector,
  7. Set x as the product of the ri related to the yi found in the previous step, and set y as p1^a * p2^b * p3^c * ... * pt^z, where the exponents are half of the exponents found in the factorization of the product of the yi,
  8. Calculate d = gcd(x - y, n); if 1 < d < n then d is a non-trivial factor of n, otherwise start again from step 2, choosing a bigger k.

The problem with these algorithms is that they involve a lot of number theory.
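
As a toy illustration of the congruence-of-squares finish (steps 6-8), take the classic textbook example n = 1649: the relations 41^2 ≡ 32 and 43^2 ≡ 200 (mod n) combine into a square because 32 * 200 = 6400 = 80^2, and the gcd then exposes a factor:

#include <cstdio>
#include <numeric>   // std::gcd (C++17)

int main() {
    long long n = 1649;
    long long x = (41LL * 43) % n;      // product of the r_i  -> 1763 mod 1649 = 114
    long long y = 80;                   // square root of the product of the (r_i^2 - n) values
    // x^2 = y^2 (mod n), so gcd(x - y, n) has a good chance of being a non-trivial factor
    long long d = std::gcd(x - y, n);
    std::printf("%lld = %lld * %lld\n", n, d, n / d);   // prints 1649 = 17 * 97
}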

Epsom answered 15/2, 2010 at 16:35 Comment(0)

Here is a paper on integer factorization in Maple:

"Starting from some very simple instructions—“make integer factorization faster in Maple” — we have implemented the Quadratic Sieve factoring algorithm in a combination of Maple and C..."

http://www.cecm.sfu.ca/~pborwein/MITACS/papers/percival.pdf

Autodidact answered 15/2, 2010 at 16:12 Comment(0)

A more recent (2015) C++ version: a 2^27 lookup-table implementation using 1 GB of memory:

#include <iostream>  // cerr, cout, and NULL
#include <cstring>   // memcpy()
#include <new>       // nothrow
#include <cstdint>   // uint32_t
using namespace std;
typedef uint32_t uint;
uint *factors;
const uint MAX_F=134217728; // 2^27

void buildFactors(){
   factors=new (nothrow) uint [(MAX_F+1)*2]; // 4 * 2 * 2^27 = 2^30 = 1GB
   if(factors==NULL)return; // not able to allocate enough free memory
   int i;
   for(i=0;i<(MAX_F+1)*2;i++)factors[i]=0;

   //Sieve of Eratosthenes
   factors[1*2]=1;
   factors[1*2+1]=1;
   for(i=2;i*i<=MAX_F;i++){
      for(;factors[i*2] && i*i<=MAX_F;i++);
      factors[i*2]=1;
      factors[i*2+1]=i;
      for(int j=2;i*j<=MAX_F;j++){
         factors[i*j*2]=i;
         factors[i*j*2+1]=j;
      }
   }
   for(;i<=MAX_F;i++){
      for(;i<=MAX_F && factors[i*2];i++);
      if(i>MAX_F)return;
      factors[i*2]=1;
      factors[i*2+1]=i;
   }
}

uint * factor(uint x, int &factorCount){
   if(x > MAX_F){factorCount=-1;return NULL;}
   uint tmp[70], at=x; int i=0;
   while(factors[at*2]>1){
      tmp[i++]=factors[at*2];
      cout<<"at:"<<at<<" tmp:"<<tmp[i-1]<<endl;
      at=factors[at*2+1];
   }
   if(i==0){
      cout<<"at:"<<x<<" tmp:1"<<endl;
      tmp[i++]=1;
      tmp[i++]=x;
   }else{
      cout<<"at:"<<at<<" tmp:1"<<endl;
      tmp[i++]=at;
   }
   factorCount=i;
   uint *ret=new (nothrow) uint [factorCount];
   if(ret!=NULL)
      memcpy(ret, tmp, sizeof(uint)*factorCount);
   return ret;
}

int main(){
   cout<<"Loading factors lookup table"<<endl;
   buildFactors(); if(factors==NULL){cerr<<"Need 1GB block of free memory"<<endl;return 1;}
   int size;
   uint x=30030;
   cout<<"\nFactoring: "<<x<<endl;
   uint *f=factor(x,size);
   if(size<0){cerr<<x<<" is too big to factor. Choose a number between 1 and "<<MAX_F<<endl;return 1;}
   else if(f==NULL){cerr<<"ran out of memory trying to factor "<<x<<endl;return 1;}

   cout<<"\nThe factors of: "<<x<<" {"<<f[0];
   for(int i=1;i<size;i++)
      cout<<", "<<f[i];
   cout<<"}"<<endl;
   delete [] f;

   x=30637;
   cout<<"\nFactoring: "<<x<<endl;
   f=factor(x,size);
   cout<<"\nThe factors of: "<<x<<" {"<<f[0];
   for(int i=1;i<size;i++)
      cout<<", "<<f[i];
   cout<<"}"<<endl;
   delete [] f;
   delete [] factors;
}

Update: or, sacrificing some simplicity for a bit more range, just past 2^28

#include <iostream>  // cerr, cout, and NULL
#include <cstring>   // memcpy(), memset()
#include <new>       // nothrow
#include <cstdint>   // fixed-width integer types
using namespace std;

//#define dbg(A) A
#ifndef dbg
#define dbg(A)
#endif

typedef uint32_t uint;
typedef uint8_t  uint8;
typedef uint16_t uint16;

uint * factors;
uint8  *factors08;
uint16 *factors16;
uint   *factors32;

const uint LIMIT_16   = 514; // First 16-bit factor, 514 = 2*257
const uint LIMIT_32   = 131074;// First 32-bit factor, 131074 = 2*65537
const uint MAX_FACTOR = 268501119;
//const uint64 LIMIT_64 = 8,589,934,594; // First 64-bit factor, 2^33+1

const uint TABLE_SIZE = 268435456; // 2^28 => 4 * 2^28 = 2^30 = 1GB 32-bit table
const uint o08=1, o16=257 ,o32=65665; //o64=4294934465
// TableSize = 2^37 => 8 * 2^37 = 2^40 1TB 64-bit table
//   => MaxFactor = 141,733,953,600

/* Layout of factors[] array
*  Indicies(32-bit)              i                 Value Size  AFactorOf(i)
*  ----------------           ------               ----------  ----------------
*  factors[0..128]            [1..513]             8-bit       factors08[i-o08]
*  factors[129..65408]        [514..131073]        16-bit      factors16[i-o16]
*  factors[65409..268435455]  [131074..268501119]  32-bit      factors32[i-o32]
*
* Note: stopping at i*i causes AFactorOf(i) to not always be LargestFactor(i)
*/
void buildFactors(){
dbg(cout<<"Allocating RAM"<<endl;)
   factors=new (nothrow) uint [TABLE_SIZE]; // 4 * 2^28 = 2^30 = 1GB
   if(factors==NULL)return; // not able to allocate enough free memory
   uint i,j;
   factors08 = (uint8 *)factors;
   factors16 = (uint16 *)factors;
   factors32 = factors;
dbg(cout<<"Zeroing RAM"<<endl;)
   memset(factors,0,sizeof(uint)*TABLE_SIZE);
   //for(i=0;i<TABLE_SIZE;i++)factors[i]=0;

//Sieve of Eratosthenes
     //8-bit values
dbg(cout<<"Setting: 8-Bit Values"<<endl;)
   factors08[1-o08]=1;
   for(i=2;i*i<LIMIT_16;i++){
      for(;factors08[i-o08] && i*i<LIMIT_16;i++);
dbg(cout<<"Filtering: "<<i<<endl;)
      factors08[i-o08]=1;
      for(j=2;i*j<LIMIT_16;j++)factors08[i*j-o08]=i;
      for(;i*j<LIMIT_32;j++)factors16[i*j-o16]=i;
      for(;i*j<=MAX_FACTOR;j++)factors32[i*j-o32]=i;
   }
   for(;i<LIMIT_16;i++){
      for(;i<LIMIT_16 && factors08[i-o08];i++);
dbg(cout<<"Filtering: "<<i<<endl;)
      if(i<LIMIT_16){
         factors08[i-o08]=1;
         j=LIMIT_16/i+(LIMIT_16%i>0);
         for(;i*j<LIMIT_32;j++)factors16[i*j-o16]=i;
         for(;i*j<=MAX_FACTOR;j++)factors32[i*j-o32]=i;
      }
   }i--;

dbg(cout<<"Setting: 16-Bit Values"<<endl;)
     //16-bit values
   for(;i*i<LIMIT_32;i++){
      for(;factors16[i-o16] && i*i<LIMIT_32;i++);
      factors16[i-o16]=1;
      for(j=2;i*j<LIMIT_32;j++)factors16[i*j-o16]=i;
      for(;i*j<=MAX_FACTOR;j++)factors32[i*j-o32]=i;
   }
   for(;i<LIMIT_32;i++){
      for(;i<LIMIT_32 && factors16[i-o16];i++);
      if(i<LIMIT_32){
         factors16[i-o16]=1;
         j=LIMIT_32/i+(LIMIT_32%i>0);
         for(;i*j<=MAX_FACTOR;j++)factors32[i*j-o32]=i;
      }
   }i--;

dbg(cout<<"Setting: 32-Bit Values"<<endl;)
     //32-bit values
   for(;i*i<=MAX_FACTOR;i++){
      for(;factors32[i-o32] && i*i<=MAX_FACTOR;i++);
      factors32[i-o32]=1;
      for(j=2;i*j<=MAX_FACTOR;j++)factors32[i*j-o32]=i;
   }
   for(;i<=MAX_FACTOR;i++){
      for(;i<=MAX_FACTOR && factors32[i-o32];i++);
      if(i>MAX_FACTOR)return;
      factors32[i-o32]=1;
   }
}

uint * factor(uint x, int &factorCount){
   if(x > MAX_FACTOR){factorCount=-1;return NULL;}
   uint tmp[70], at=x; int i=0;
   while(at>=LIMIT_32 && factors32[at-o32]>1){
      tmp[i++]=factors32[at-o32];
dbg(cout<<"at32:"<<at<<" tmp:"<<tmp[i-1]<<endl;)
      at/=tmp[i-1];
   }
   if(at<LIMIT_32){
      while(at>=LIMIT_16 && factors16[at-o16]>1){
         tmp[i++]=factors16[at-o16];
dbg(cout<<"at16:"<<at<<" tmp:"<<tmp[i-1]<<endl;)
         at/=tmp[i-1];
      }
      if(at<LIMIT_16){
         while(factors08[at-o08]>1){
            tmp[i++]=factors08[at-o08];
dbg(cout<<"at08:"<<at<<" tmp:"<<tmp[i-1]<<endl;)
            at/=tmp[i-1];
         }
      }
   }
   if(i==0){
dbg(cout<<"at:"<<x<<" tmp:1"<<endl;)
      tmp[i++]=1;
      tmp[i++]=x;
   }else{
dbg(cout<<"at:"<<at<<" tmp:1"<<endl;)
      tmp[i++]=at;
   }
   factorCount=i;
   uint *ret=new (nothrow) uint [factorCount];
   if(ret!=NULL)
      memcpy(ret, tmp, sizeof(uint)*factorCount);
   return ret;
}
uint AFactorOf(uint x){
   if(x > MAX_FACTOR)return -1;
   if(x < LIMIT_16) return factors08[x-o08];
   if(x < LIMIT_32) return factors16[x-o16];
                    return factors32[x-o32];
}

int main(){
   cout<<"Loading factors lookup table"<<endl;
   buildFactors(); if(factors==NULL){cerr<<"Need 1GB block of free memory"<<endl;return 1;}
   int size;
   uint x=13855127;//25255230;//30030;
   cout<<"\nFactoring: "<<x<<endl;
   uint *f=factor(x,size);
   if(size<0){cerr<<x<<" is too big to factor. Choose a number between 1 and "<<MAX_FACTOR<<endl;return 1;}
   else if(f==NULL){cerr<<"ran out of memory trying to factor "<<x<<endl;return 1;}

   cout<<"\nThe factors of: "<<x<<" {"<<f[0];
   for(int i=1;i<size;i++)
      cout<<", "<<f[i];
   cout<<"}"<<endl;
   delete [] f;

   x=30637;
   cout<<"\nFactoring: "<<x<<endl;
   f=factor(x,size);
   cout<<"\nThe factors of: "<<x<<" {"<<f[0];
   for(int i=1;i<size;i++)
      cout<<", "<<f[i];
   cout<<"}"<<endl;
   delete [] f;
   delete [] factors;
}
Stellastellar answered 26/10, 2015 at 21:24 Comment(0)

Depends how big your numbers are. If you're searching for amicable pairs you're doing a lot of factorisations, so the key may not be to factor as quickly as possible, but to share as much work as possible between different calls. To speed up trial division you could look at memoization, and/or precalculating primes up to the square root of the biggest number you care about. It's quicker to get the prime factorisation, then calculate the sum of all factors from that, than it is to loop all the way up to sqrt(n) for every number.
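
A minimal sketch of that approach (the sieve limit and helper names are assumptions for the example; it only handles n up to LIMIT^2, so that whatever is left after the divisions is itself prime):

#include <vector>
#include <cstdio>

const int LIMIT = 100000;                     // >= sqrt of the largest number of interest
std::vector<int> primes;

void initPrimes() {                           // plain sieve, done once up front
    std::vector<bool> composite(LIMIT + 1, false);
    for (int i = 2; i <= LIMIT; ++i) {
        if (!composite[i]) {
            primes.push_back(i);
            for (long long j = (long long)i * i; j <= LIMIT; j += i) composite[j] = true;
        }
    }
}

// Trial division by primes only, then sigma built from the factorization.
// Assumes n <= (long long)LIMIT * LIMIT.
long long sumOfProperDivisors(long long n) {
    long long rest = n, sigma = 1;
    for (int p : primes) {
        if ((long long)p * p > rest) break;
        if (rest % p) continue;
        long long term = 1, power = 1;
        while (rest % p == 0) { rest /= p; power *= p; term += power; }
        sigma *= term;                        // 1 + p + ... + p^k
    }
    if (rest > 1) sigma *= 1 + rest;          // leftover prime factor
    return sigma - n;                         // proper divisors only
}

int main() {
    initPrimes();
    std::printf("%lld\n", sumOfProperDivisors(220));   // 284
    std::printf("%lld\n", sumOfProperDivisors(284));   // 220
}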

If you're looking for really big amicable pairs, say bigger than 2^64, then on a small number of machines you can't do it by factorising every single number no matter how fast your factorisation is. The short-cuts which you're using to find candidates might help you factor them.

Monochromatic answered 15/2, 2010 at 16:16 Comment(0)

This is an important open mathematical problem as of 2020

Others have answered from a practical point of view, and there is a high probability that, for the problem sizes encountered in practice, those algorithms are close to optimal.

However, I would also like to highlight that the more general mathematical problem (in terms of asymptotic computational complexity, i.e. as the number of bits tends to infinity) is completely unsolved.

No one has ever been able to prove a lower bound on the optimal running time, i.e. to prove what the fastest possible algorithm could be.

This is shown on the Wikipedia page: https://en.wikipedia.org/wiki/Integer_factorization The problem also figures on Wikipedia's "List of unsolved problems in computer science" page: https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_computer_science

This is important because, until we have such an optimality proof, we won't be able to be sure that RSA encryption is actually secure. Actually, we don't have any such proofs for any existing public-key system; all we know is that they seem safe because no one has managed to break them publicly yet. See also this related post.

All that we know is that the best we currently have is the general number field sieve. And until 2018, we didn't even have a non-heuristic proof for its complexity. The complexity of that algorithm, in terms of the number n of digits of the integer to be factored, is something like:

e^{(k + o(1)) * n^(1/3) * ln(n)^(2/3)}

which, as mentioned at Polynomial time and exponential time, is not truly exponential, but it is superpolynomial.
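
For reference, the same heuristic bound is usually written in terms of the integer N being factored rather than its digit count; substituting ln N ≈ n * ln 10 for an n-digit N recovers the form above, with the constant k absorbing both (64/9)^(1/3) ≈ 1.923 and the (ln 10)^(1/3) factor:

\exp\left( \left( \left(\tfrac{64}{9}\right)^{1/3} + o(1) \right) (\ln N)^{1/3} (\ln \ln N)^{2/3} \right)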

As of 2020, we haven't even proved whether the problem is NP-complete or not (it is obviously in NP, since all you have to do to verify a solution is multiply the numbers)! It is widely expected not to be NP-complete, since the problem is also in co-NP, and NP-completeness would then imply NP = co-NP. We can't be that bad at finding algorithms, can we?

Lido answered 15/6, 2020 at 17:37 Comment(0)

There is of course the HAL Algorithm by Professor Hal Mahutan (Feb 2021), which is on the edge of factorisation research.

PLEASE VIEW LATEST UPDATE HERE

https://docs.google.com/document/d/1BlDMHCzqpWNFblq7e1F-rItCf7erFMezT7bOalvhcXA/edit?usp=sharing

Solving for two large primes for the public key is as follows...

Any A x B = Public Key can be drawn on the positive X and Y axes, forming a continuous curve on which all non-integer factor pairs solve for the public key. Of course, this is not useful; it's just an observation at this point.

Hal's insight is this: we insist that we are only interested in the points where A is a whole number, and particularly in the values B takes at those points.

Previous attempts with this approach failed when mathematicians struggled with the apparent randomness of the remainder of B, or at least its lack of predictability.

What Hal is saying is that predictability is universal for any public key provided the ratio of A/B is the same. Basically, when a series of different public keys is presented for analysis, all of them can be processed identically, provided they share the same point during processing where A/B is constant, i.e. they share the same tangent.

Take a look at this sketch I drew to try explain what Prof Mahutan has got going on here.

[Image: Hal's Insight]

So, here's Hal's genius. Hal makes use of powerful super-computers to generate a series of hashtables (in the diagram, Q, R, S & T). What you can see in the 3 A x B = Key curves on the left is that they all share tangents T and S (the only ones highlighted there), but what the diagram is showing is that, given any Public Key, over a region of the curve where the tangent is the same, you can share the hashtables that preside over that region.

Just a technical note: obviously, in the curve of A x B = Key, things are shifting continuously as one changes the values of A and B, so in theory the shared tangent that maps to a hashtable is going to go out of date. But the interesting thing is that really large keys share longer runs where a hashtable is useful (ironically, this makes them easier to crack). So that's great news, as key sizes are expected to get much bigger as advances in factorisation and computation accelerate. What actually happens is that the predictability of the hashtable is literally going to 'go out of focus' as the tangents to which it applies begin to diverge. Luckily, this is not a problem, because you jump to the next hashtable, the one appropriately mapped to the new tangent.

Just to make it clear, all Public Keys ever generated will always make use of the same set of hashtables, so it's a kind of one off investment that can be stored online, literally millions of terabytes of lookup data, since all keys obey the same tangential ratios.

So, what do the hashtables do to accelerate the finding of primes? The hashtables are indexed with the remainder when the Public Key is divided by B. Basically, Hal is saying that for all Keys, any ratio of A x B can be looked up. The only difference between the different curves that share the same tangent is that they require a different offset, as determined by the 'Control Curve'. The 'Control Curve' is any arbitrary curve for which you generate appropriate factors. Let's say for the 'Control Curve' the Key is 15, and the tangent being mapped is when B = 5, so A is 3 and the remainder is zero. In another mapping with the same tangent, say the Key is now 16. We need to find the same tangent, which lies at 5.33 for B and 3.2 for A. So the remainder for A is .2, so a public key of 16 can use the same lookup table as 15, provided it is suitably offset by .2.

So what's in the hashtables? They are indexed by the offset, and the value returned is the distance along the A x B curve over which you will not find another whole number for B. What Hal is saying is that it is safe to jump ahead and not check those numbers for being factors. And that's basically it. Hal punches holes into the curve that need never be checked, and that just speeds up the whole game.

Thanks Professor Mahutan!


For those of you still following, here are some of our working notes:


Bullet Points for Fast Factorisation Attack Algorithm

  • All public keys can be represented along a curve A x B = 'Key'
  • This is an observation that maps all real numbers (is this the right term for non-integers?) that multiply together to equal the key... So far not useful
  • We are only interested in the points where A and B are both whole numbers.
  • We can step through the entire curve where A is whole. This is halfway there but has problems. Firstly, we don't know where B is whole, and it would also take too much processing power to calculate all the points.
  • What we are interested in is predicting where B is also whole, so we want a mechanism to be able to 'jump' along the curve where we know that B is definitely still a real number (non-whole). If we can make big enough jumps, then we reduce the processing required.

Now follows the strategy of the algorithm to predict B

  • Another observation is that for sufficiently large values of the 'Key', as we step through changing the value of A in whole number increments, we observe that the ratio of A/B or the tangential angle will remain mostly the same.

  • An important side point to this observation is that as the Key size increases, the tangent remains more constant with each iteration. Fundamentally, this means that any algorithm using this property is going to become more efficient as the Key increases in size, which is the opposite of traditional approaches, where increasing the Key size makes it exponentially harder to guess the factors. This is a very important point... (Elaborate on this please Nick)

  • The algorithm itself is as follows

  • Purchase sufficient storage and processing power on a cloud
  • Break the problem into pieces that can be run on different processes in parallel. To do this, we step through different values of A and assign the search to different processors in the cloud.
  • For any value of A that is being checked use a universal lookup table to predict a safe distance along the curve that we can move without having to calculate if B is going to be a whole number
  • Check only those positions along the curve for which the lookup table shows that the probability of it being a whole number is high enough to warrant checking.

The important concept here is that the lookup tables can be shared for any 'Key' for which the ratio of A/B (the tangent) is sufficiently close, before the lookup becomes inaccurate (and goes out of focus).

(Also to note is that as the key size shifts, then you either need a new set of tables or you should make an appropriate mapping to the existing ratio tables in order to reuse them.)

Another really important point Nick is that all Keys can share the same set of lookup tables.

Fundamentally, the lookup table maps the remainder from any calculation of Key / A. We are interested in the remainder because, when the remainder is zero, we are done: A is a whole number that divides the Key.

I suggest that we have enough hash tables to ensure that we can scan ahead without having to calculate the actual remainder. Let's say we start off with 1 trillion, but that can obviously be changed depending on how much computational power we have.

The hashtable for any suitably close A/B ratio is as follows. The table is indexed (keyed) with the appropriate remainder, and the value at any given remainder is the 'safe' distance that can be traversed along the A x B curve without the remainder touching zero.

You can visualise the remainder oscillating (pseudo-randomly) between 0 and 1.

The algorithm picks a number A along the curve, then looks up the safe distance and jumps to the next hashtable, or at least the algorithm does those factor checks until the next hashtable becomes available. Given enough hashtables, I'm thinking we can pretty much avoid most of the checking.

Notes on lookup tables.

For any key, you can generate a table that curves appropriately for the A/B ratio. Reuse of tables is imperative. Suggested approach:

  • Generate a series of control curves for A/B from, say, the square root of N (the appropriate key), and do so midway by squaring. Let's say each is 0.0001% larger than the previous key.
  • Let's also make the size of the table say 1% of the A/B ratio.
  • When calculating the co-prime factors, pick the closest lookup table that matches the key.
  • Pick your entry point into the hashtable. This means remembering the offset that the entry point in the table has with your actual entry point.
  • The lookup table should provide a series of points for the entry point for which the corresponding co-prime entry could be very close to zero and needs to be checked manually.
  • For each point in the series, calculate the actual offset using the appropriate mathematical formula. (This will be an integral calculation; we need to have a mathematician take a look at it.) Why? Because our control table was calculated when A/B was the square root of the Key. As we shift along the curve, we need to space out appropriately.
  • As a bonus, can the mathematician collapse the Key space into a point on the A/B curve? If so, we only need to generate a single table.

Key Concepts


The number A x B = Key plots the following:

X-Axis maps A and Y Axis maps B.

The closeness of the curve to the A and B axis is dependent on the size of the public key (where A x B = Key). Basically, the curve is going to shift to the right as the Key becomes larger.

Now the idea I would like you to digest or have questions about is

  • Given any point on the curve, there exists a tangent (ratio of A/B)
  • We are interested in the value of B as A increases in whole-number increments and B is a whole number itself. In particular, we are really only interested in the remainder of Key / A: when that remainder is ZERO, we will have found the factors for this public key. Specifically, they will be the last value of A we tried (always a whole number) and the value B for which the remainder is zero (so B is also a whole number).
  • The basic algorithm is simple enough. -1- Pick a point on the curve where A is a whole number -2- Find the remainder of B where Key/A is B -3- Check if remainder of B is ZERO, (if it is ZERO then we are done.) -4- Go back to step 1, (next you will pick the next whole number for A)

This will work, but is way too slow. What we are doing with the HAL algorithm is improving the basic algorithm above so that we can jump chunks of the curve where we know the remainder doesn't get too close to zero.
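
For concreteness, that basic loop is nothing more than trial division over whole values of A; a minimal sketch with a hypothetical toy key (real RSA moduli are of course far too large for this):

#include <cstdio>
#include <cstdint>

int main() {
    uint64_t key = 65537ULL * 65539ULL;            // hypothetical toy key: product of two primes
    for (uint64_t a = 2; a * a <= key; ++a) {      // step 1: next whole value of A
        if (key % a == 0) {                        // steps 2-3: remainder of Key / A is zero
            std::printf("A = %llu, B = %llu\n",
                        (unsigned long long)a, (unsigned long long)(key / a));
            return 0;
        }
    }
    std::printf("no non-trivial factors found\n"); // key is prime
    return 0;
}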

The more we can jump, the more efficient the algorithm.

Before we step into the improved HAL algorithm, let's review a key concept.

  • For very large values of the Key (remember A x B = Key), the ratio of A/B will be pretty much constant; an RSA key can be 2^4096, and that is big.

  • Let us assume we have created a set of tables already preloaded into the cloud that are optimised for a particular (roughly) sized key.

    • Let's say we have 1 million tables to be used only against this particularly narrow range of key sizes
    • Each table maps for a slightly different tangent or A/B ratio, but remember that all of these tables can only be used for the appropriate key size
    • These tables are spread evenly along the curve so that for any point I pick, there is a table that can tell me how many whole numbers of A that I can safely jump before the possibility arises that I will hit a remainder of zero for B in Key/A = B. Remember, we don't want to miss the point where B is zero or HAL won't work.
    • These tables are indexed with the current remainder. (Programmers or developers note these lookup tables can be implemented with a Hashtable, say in C# or Java) Let's assume that we have to check all the points where A moves along the curve, each time producing a remainder B. As soon as B is close enough to any of the indexes, then you can use the table to work out how many whole numbers you can jump safely without missing when B has a remainder of zero.
  • This piece is a critical concept.

    • Each set of lookup tables that is indexed with some reasonable offset only really works for a particular Key size.
    • As the key grows or shrinks for the series of tables, the results of the lookup 'go out of focus' or become more inaccurate.
    • Therefore, we also need to create a new series of tables as the Key size grows. Perhaps we need to create the tables every 0.001 % growth of the key.
Blakeney answered 24/3, 2021 at 4:14 Comment(1)
Can you provide a doi or a link to the home page?Humane

Very interesting question, thanks!

I spent a whole day programming C++ code from scratch that implements a very fast Elliptic Curve (ECM) factorization method and an Amicable Numbers search.

It is known that the ECM method is very, very fast. This ECM method, if optimized well, can easily find factors within one second even for a 100-bit number (consisting of two 50-bit primes). It is slower only than the Quadratic Sieve and GNFS.

As you said you're searching for 25-digit Amicable Numbers, these are roughly 80-bit numbers.

As I measured in my C++ program below, on my old 2-core laptop, it can factor 150 random 80-bit (25-digit) numbers per second out of the easily factorable set (I set a timer to cut off numbers that are difficult to factor). And it can factor 20_000 numbers per second if their size is 27-30 bits (9 digits).

I said above "easily factorable set". It appears that 60-70% of all random 80-bit (25-digit) numbers are easily factorable, meaning that it takes just 7ms to factor a single such number, which results in 150 numbers per second.

I'll search for Amicable Pairs only among such easy-to-factor numbers. There are as many amicable numbers among the easy-to-factor set as there are in all the rest; Amicable Numbers don't suffer from such easy-to-factor filtering.

Below is console output on how my program searches for 27-bit amicable numbers using ECM factorization:

  .46501    _43919    *41268   ,  20512 nums/sec, tdiv  99.34%    7mcs, ecm 25.3%   39mcs, total   37mcs, done   7%
found (97041735, 97945785)
  .321639   _303557   *285333  ,  20389 nums/sec, tdiv  99.35%    7mcs, ecm 25.4%   40mcs, total   38mcs, done  48%
found (34765731, 36939357)
  .29933    _28247    *26517   ,  20265 nums/sec, tdiv  99.35%    7mcs, ecm 25.4%   40mcs, total   38mcs, done   4%
found (9773505, 11791935)
  .529659   _499736   *470070  ,  19312 nums/sec, tdiv  99.35%    7mcs, ecm 25.5%   41mcs, total   40mcs, done  79%
found (5357625, 5684679)
  .25937    _24411    *22905   ,  19340 nums/sec, tdiv  99.35%    7mcs, ecm 25.4%   41mcs, total   40mcs, done   4%
found (998104, 1043096)
  .26367    _24902    *23462   ,  19360 nums/sec, tdiv  99.35%    7mcs, ecm 25.5%   41mcs, total   40mcs, done   4%
found (1184, 1210)
  .130374   _122896   *115508  ,  19648 nums/sec, tdiv  99.35%    7mcs, ecm 25.5%   40mcs, total   39mcs, done  20%
found (35390008, 39259592)
found (133178325, 133471275)
  .32032    _30226    *28368   ,  19753 nums/sec, tdiv  99.35%    7mcs, ecm 25.5%   40mcs, total   39mcs, done   5%
found (898216, 980984)

You can see above that the program outputs "found" and an amicable pair as soon as it finds something. It also shows some detailed statistics about speed and timings. It takes several seconds to find one amicable number of 27-bit size if you search 20_000 numbers per second as I do above.

If you think about 80-bit numbers (25 digits), then although I factor 150 numbers per second, there are still SO FEW amicable numbers that it will take trillions of seconds to find at least one Amicable Number of such 80-bit size. Below are the statistics for this 80-bit factoring. You can see that within several seconds it has done only 0.0000000000064% of the work of finding just a single amicable pair, a very tiny fraction of the whole job.

.4218     _2261     *1699    ,    123 nums/sec, tdiv  43.81%  133mcs,
    ecm  2.8% 1886mcs, total 8144mcs, done 0.0000000000064%

You may have heard about the Amicable Numbers project for BOINC - it is a great distributed search for such numbers. Millions of computers search for them. Of course, if there are millions of computers, then they can find numbers much faster. In fact, they have already found 1_227_817_736 such numbers, a really great amount; see all the numbers here.

In my C++ program I also present my ECM factorizer; as an example, at program start it does a very fast factorization of a random 100-bit number. It shows console output similar to the following:

Initial N to factor with ECM: 1044877821881837432173434582227 (100-bit)
Number of threads 2.
Factors TrialDivA: [2953]
Factoring 35383603856479425437736059
Curves [   0,    8), bound 2^  9.000, 0.110 sec
Curves [   8,   16), bound 2^ 10.000, 0.222 sec
Curves [  16,   24), bound 2^ 10.585, 0.418 sec
Curves [  24,   32), bound 2^ 11.000, 0.416 sec
Curves [  32,   40), bound 2^ 11.322, 0.401 sec
Factors ECM: [21788197859]
Fully factored. N 1044877821881837432173434582227: [2953, 21788197859, 16239802890289801]
Time 1.692 sec.

As you see, it takes just one second to factor a quite complex 100-bit number. See above that first I do a Trial Division stage; this is the simplest factorization algorithm, working in O(Sqrt(N)) time. After Trial Division I create many curves with a growing Bound; each new curve has a bigger chance of factoring the number, but takes more time.

My ECM algorithm according to ECM in Wiki is as following:

  1. Try to find small prime factors by doing the Trial Division method. Spend around 5-10% of total time in this step. Remove the found factors from the number.

  2. Check if the number is probably prime with high confidence; for that I use the Fermat Probability Test with 32 trials. To overcome Carmichael numbers you may also use the Miller-Rabin Test instead. If the number is prime, return it as the only factor.

  3. Generate curve parameters A, X, Y randomly and derive B from the curve equation Y^2 = X^3 + AX + B (mod N). Check that the curve is OK: the value 4 * A ** 3 + 27 * B ** 2 should be non-zero.

  4. Generate small primes through the Sieve of Eratosthenes, primes below our Bound. Raise each prime to some small power; this raised prime will be called K. I keep raising to a power while it is smaller than some Bound2, which is Sqrt(Bound) in my case.

  5. Compute the elliptic point multiplication P = K * P, where K is taken from the previous step and P is (X, Y). Compute according to the Elliptic Curve Multiplication Wiki.

  6. Point multiplication uses the Modular Inverse, which computes GCD(SomeValue, N) (greatest common divisor) according to the Extended Euclidean Algorithm Wiki. If this GCD is not 1, then it gives a non-1 factor of N, hence we have found a factor and can advance the ECM algorithm to search for the remaining factors. (A small sketch of this "inverse or factor" step follows after this list.)

  7. If all primes up to the Bound were multiplied and gave no factor, then re-run the ECM factorization algorithm (steps 3-6 above) with another random curve and a bigger Bound. In my code I take the new bound by adding 512 to the old bound.
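
Here is a small sketch of the "inverse or factor" step from point 6, with toy 64-bit numbers (the real program does this with big integers inside the point addition):

#include <cstdio>
#include <cstdint>

// Extended Euclidean algorithm: returns gcd(a, b) and Bezout coefficients x, y.
static int64_t extGcd(int64_t a, int64_t b, int64_t &x, int64_t &y) {
    if (b == 0) { x = 1; y = 0; return a; }
    int64_t x1, y1;
    int64_t g = extGcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Returns true and the inverse of v mod n if it exists;
// otherwise returns false and the non-trivial gcd, which is a factor of n.
static bool tryInverse(int64_t v, int64_t n, int64_t &inv, int64_t &factor) {
    int64_t x, y;
    int64_t g = extGcd(((v % n) + n) % n, n, x, y);
    if (g == 1) { inv = ((x % n) + n) % n; return true; }
    factor = g;
    return false;
}

int main() {
    int64_t n = 91;                   // toy composite, 7 * 13
    int64_t inv, factor;
    if (tryInverse(3, n, inv, factor))
        std::printf("3^-1 mod %lld = %lld\n", (long long)n, (long long)inv);
    if (!tryInverse(35, n, inv, factor))               // 35 shares the factor 7 with 91
        std::printf("no inverse: gcd(35, %lld) = %lld\n", (long long)n, (long long)factor);
    return 0;
}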

If you want to tweak my code to compute some other bit sizes, then look into the Test() function. Modify bits_factor = 100 to some other value; this sets the example to factor a 100-bit random number. The bits_factor variable controls only the factorization example. Also modify bits_amicable_max = 27 if you want to set which Amicable Number bit size to search for; 27 stands for 27-bit numbers, and you may set it to 80 to find 80-bit amicable numbers.

In my code I support 3 different sources of Big Integer implementation; they allow you to have any big bit size, even thousands of bits. The first source is Clang's/GCC's built-in __int128, which gives 128-bit integer arithmetic for free. If you have another compiler, you may disable the macro SUPPORT_STD_U128. Secondly, I used the Boost.Multiprecision library, which allows you to do big-integer arithmetic efficiently up to 1024 bits; disable the macro SUPPORT_BOOST if you don't want Boost. Finally, I used the GMP Library, which allows bit sizes of even millions of bits; disable SUPPORT_GMP if you don't want GMP.
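
As a tiny illustration of the first two backends (the 100-bit value is the one from the console output above; this snippet is only a sketch, not part of the program):

#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>

int main() {
    // GCC/Clang built-in 128-bit integer: enough for products of two 64-bit numbers.
    unsigned __int128 a = (unsigned __int128)10000000000ULL * 10000000000ULL;
    std::cout << "low 64 bits of a: " << (unsigned long long)a << "\n";   // iostream cannot print __int128 directly

    // Boost.Multiprecision arbitrary-precision integer: fine for 100-bit (and much larger) numbers.
    boost::multiprecision::cpp_int n("1044877821881837432173434582227");
    std::cout << n << " * 2 = " << n * 2 << "\n";
    return 0;
}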

Under Linux it is easy to install Boost and GMP: sudo apt install libgmp-dev libboost-all-dev. On Windows you have to install the VCPKG package manager first, and afterwards run vcpkg install boost gmp inside its installation folder.

Click on the Try it online! link below if you want to see my code running online on the GodBolt servers. But NOTE that the GodBolt version of the code is different from the one on the Github Gist below: the GodBolt version has reduced parameter sizes specifically to run faster, because their server has a very hard time limit for a program to run, probably 5-10 seconds, not more.

SOURCE CODE HERE. Below the Try it online! link there is a Github Gist link. I placed my code there because it is 36 KB in size, and StackOverflow has a post size limit of 30_000 symbols in total, so the post text and the source code together would be way too much.

Try it online on GodBolt!

Github Gist Full Source Code

Form answered 22/3, 2023 at 19:26 Comment(0)
