Calculating 1^X + 2^X + ... + N^X mod 1000000007
Is there any algorithm to calculate (1^x + 2^x + 3^x + ... + n^x) mod 1000000007?
Note: a^b is the b-th power of a.

The constraints are 1 <= n <= 10^16, 1 <= x <= 1000. So the value of N is very large.

I can only solve it in O(m log m) time, where m = 1000000007. That is too slow because the time limit is 2 seconds.

Do you have any efficient algorithm?

There was a comment that it might be duplicate of this question, but it is definitely different.

Giddings answered 17/1, 2017 at 10:56 Comment(18)
@TobySpeight No, it is definitely different. This question is about sums of modular powers 1^x, 2^x, ..., n^x, and I want to find a FAST ALGORITHM because n <= 10^16 and mod = 10^9+7.Giddings
n^x mod m is the same as ((n^(x-1) mod m) * (n mod m)) mod m; (a + b) mod m is the same as ((a mod m) + (b mod m)) mod m.Tacklind
@TobySpeight In my slow algorithm, I have to take the mod about 10^10 times, which is far too slow for the 2 sec time limit. I am only looking for a fast algorithm.Giddings
@Tacklind But n is very large!!! Read the constraints. If I naively calculate 1^x, 2^x, 3^x, ..., n^x, it will take far too much time!Giddings
@TobySpeight Does that really help? See, there are up to 10,000,000,000,000,000 terms in the parentheses, each of them requiring up to 1,000 multiplications in the simplest, naive raising-to-the-power algorithm. Reducing the number of mods by a factor of 5 or 50 doesn't seem to significantly reduce the amount of computation.Northman
@Northman Please note that 10^9+7 is a prime.Giddings
Oh, and n^x % m is the same as ((n%m)^x) % m. All of this falls out as natural consequences of moduli. Thus you only need to calculate up to 1000000007 exponentiations, do a bit of division, another mod, and some multiplication and addition to find the total.Tacklind
Give a look to en.wikipedia.org/wiki/Faulhaber's_formula. If your problem comes from a challenge site like HackerRank, I bet the inner product always simplifies to an integer and that there is a way to get rid of the global factor as well.Nubilous
@FabianPijcke It seems like a good method to solve, but I don't know how to calculate B[0], B[1],..., B[x].Giddings
There are several equivalent definitions. Note that you have to assume B[1] = -1/2. en.wikipedia.org/wiki/Bernoulli_number#Recursive_definitionNubilous
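For reference, that recursive definition yields B[0..x] in O(x^2) exact operations. A minimal Python sketch using exact fractions (B[1] = -1/2 convention; `bernoulli` is just a name chosen here):

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    """Compute B[0..n] by the recursive definition (B[1] = -1/2 convention):
    B[m] = -1/(m+1) * sum_{k=0}^{m-1} C(m+1, k) * B[k]."""
    B = [Fraction(0)] * (n + 1)
    B[0] = Fraction(1)
    for m in range(1, n + 1):
        acc = sum(Fraction(comb(m + 1, k)) * B[k] for k in range(m))
        B[m] = -acc / (m + 1)
    return B
```

With these numbers, Faulhaber's formula gives the sum polynomial directly.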
((Would like to make it an answer, alas the question got closed, so I can only post this as a comment.)) For small n you can compute the sum directly in reasonable time. For large n I think you can find such pairs of numbers, which add up to 1000000007 (for example 1 and 1000000006, 7 and 1000000000...). Those numbers equal 1 and (–1) mod 1000000007, respectively, so they cancel in summation – and so do their respective powers. You can skip them then, so you'll have to directly calculate no more than 1000000007 terms of the sum...Northman
@Northman Only two votes are needed to reopen. If it can, please post as an answer!Giddings
You never need to calculate more than 1000000007 terms in the sequence, it's periodic. 10^16 is a red herring.Wouldbe
@n.m. I am saying that 1,000,000,007 terms is still a lot, because calculating a^b mod m takes O(log b) time.Giddings
log b is at most 10 here.Wouldbe
@n.m. The time limit is 2 sec. Do you think 1000000007*10 iterations can finish in 2 sec?Giddings
Probably not, but you can precompute some results for a significant speedup.Wouldbe
I was wrong, you don't need to compute up to 1000000007 terms! If n > 1000000007/2 there are some pairs of numbers between 1 and n whose sum equals 1000000007; so you have to compute no more than 1000000007/2, i.e. 500000004 terms. Additionally (a**x) * (b**x) equals (a*b) ** x, so it's enough to compute p**x for primes only, then compute the remaining terms by multiplying powers of the appropriate primes.Northman

You can sum up the series

1**X + 2**X + ... + N**X

with the help of Faulhaber's formula and you'll get a polynomial of degree X + 1 to compute for arbitrary N.

If you don't want to compute Bernoulli Numbers, you can find the polynomial by solving X + 2 linear equations (for N = 1, N = 2, N = 3, ..., N = X + 2), which is a slower method but easier to implement.

Let's have an example for X = 2. In this case we have a polynomial of degree X + 1 = 3:

    A*N**3 + B*N**2 + C*N + D

The linear equations are

      A +    B +   C + D = 1              =  1 
    A*8 +  B*4 + C*2 + D = 1 + 4          =  5
   A*27 +  B*9 + C*3 + D = 1 + 4 + 9      = 14
   A*64 + B*16 + C*4 + D = 1 + 4 + 9 + 16 = 30 

Having solved the equations we'll get

  A = 1/3
  B = 1/2
  C = 1/6
  D =   0 

The final formula is

  1**2 + 2**2 + ... + N**2 == N**3 / 3 + N**2 / 2 + N / 6

Now, all you have to do is to put an arbitrarily large N into the formula. Solving the linear system naively costs O(X**3); with precomputed Bernoulli numbers the coefficients can be found in O(X**2). Either way, the cost doesn't depend on N.
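A minimal Python sketch of this approach: solve for the coefficients with exact rational arithmetic, then evaluate mod 10^9+7 by turning each fraction p/q into p * q^(-1) with Fermat's little theorem (valid because the modulus is prime and, for x <= 1000, never divides a denominator). Since Q(0) = 0 the constant term is zero, so X + 1 equations suffice; the function names are chosen for this sketch, and the rational Gaussian elimination is far too slow for x = 1000 in practice:

```python
from fractions import Fraction

MOD = 1_000_000_007

def sum_poly_coeffs(x):
    """Coefficients [a1, ..., a_{x+1}] of Q with Q(N) = 1**x + ... + N**x,
    found by solving x+1 linear equations from the sample sums N = 1..x+1
    (no constant term, since Q(0) = 0)."""
    size = x + 1
    rows, s = [], 0
    for n in range(1, size + 1):
        s += n ** x
        rows.append([Fraction(n ** j) for j in range(1, size + 1)] + [Fraction(s)])
    # Gauss-Jordan elimination over the rationals
    for col in range(size):
        piv = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][size] for i in range(size)]

def power_sum_mod(n, x, mod=MOD):
    """Evaluate Q(n) mod `mod`, mapping each fraction p/q to p * q^(-1)."""
    total = 0
    for j, c in enumerate(sum_poly_coeffs(x), start=1):
        term = c.numerator * pow(c.denominator, mod - 2, mod) * pow(n, j, mod)
        total = (total + term) % mod
    return total
```

For example, `power_sum_mod(10, 2)` reproduces 1 + 4 + ... + 100 = 385, and N as large as 10^16 costs no extra work once the coefficients are known.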

Noelianoell answered 17/1, 2017 at 11:41 Comment(7)
But unfortunately x might be 1000... Solving the linear equations is O(X^3), which I think is too slow. I am reading about Faulhaber's Formula right now.Giddings
@square1001: if you have to be very fast, I suggest precomputing Bernoulli Numbers (up to 1000); it's not an easy task (like linear equations solving) but in this case you'll get the polynom almost ready to use.Noelianoell
I checked the Bernoulli numbers, and finally understood that B[1], B[2],..., B[x] can be computed in O(x^2). So this problem can be solved in O(N^2)!Giddings
@square1001: To be precise, it's O(X**2): it doesn't depend on N (the number of items to sum) but on the power X they are raised to.Noelianoell
Now I'll implement it.Giddings
@square1001: good luck! Be careful: you have to work with fractionsNoelianoell
Implemented. I got Accepted.Giddings

There are a few ways of speeding up modular exponentiation. From here on, I will use ** to denote "exponentiate" and % to denote "modulus".

First a few observations. It is always the case that (a * b) % m is ((a % m) * (b % m)) % m. It is also always the case that a ** n is the same as (a ** floor(n / 2)) * (a ** (n - floor(n / 2))). This means that for an exponent <= 1000, we can always complete the exponentiation in at most 20 multiplications (and 21 mods).
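The square-and-multiply scheme described above can be sketched as follows (Python's built-in three-argument `pow` already does exactly this):

```python
def pow_mod(a, b, m):
    """Right-to-left binary exponentiation: O(log b) multiplications,
    reducing mod m after each one."""
    result = 1
    a %= m
    while b > 0:
        if b & 1:                   # current bit set: multiply it in
            result = (result * a) % m
        a = (a * a) % m             # square the base
        b >>= 1                     # move to the next bit
    return result
```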

We can also skip quite a few calculations, since (a ** b) % m is the same as ((a % m) ** b) % m and if m is significantly lower than n, we simply have multiple repeating sums, with a "tail" of a partial repeat.
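The repeating-sums observation can be sketched like this, with the caveat that summing one full period of m = 10^9+7 still means about 10^9 modular exponentiations, so it is only practical (and only tested here) with a small modulus; `power_sum_periodic` is a name chosen for this sketch:

```python
def power_sum_periodic(n, x, m):
    """Sum of i**x (mod m) for i = 1..n, exploiting that i**x % m depends
    only on i % m: the terms repeat with period m, so one full period is
    summed once and reused. O(m log x) time -- only useful when m << n."""
    block = sum(pow(i, x, m) for i in range(m)) % m        # residues 0..m-1
    q, r = divmod(n, m)                                    # q full periods + tail
    tail = sum(pow(i, x, m) for i in range(1, r + 1)) % m
    return (q % m * block + tail) % m
```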

Tacklind answered 17/1, 2017 at 11:19 Comment(5)
But your algorithm has to calculate 1^x, 2^x, ..., 1000000006^x. It has over 10^9 terms, so you have to iterate over 10^10 times. So, I think it can't pass because the time limit is 2 secs.Giddings
@Giddings Just tested a NOP loop on my laptop, takes ~0.1 seconds to iterate 10 ** 10 times and that is likely dominated by "fork" and "exec".Tacklind
This becomes much easier if x is odd, because, for example, 1000000006 is congruent to -1, so 1^x and 1000000006^x cancel out, and so do many other terms. I don’t see a shortcut if x is even, though.Outbrave
And if x is known in advance, you could build a closed formula for the (non-mod) sum of exponentials. But, yes, you can probably collapse this to "calculate 500000003 ** x % m; multiply by floor(n / m); loop for i from 1 to n % m, exponentiate and sum"Tacklind
@Tacklind 500000003 is cancelled by 500000004 for odd x. Therefore for odd x and n=m the sum equals zero.Atherton
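A quick check of the cancellation these comments describe: since 10^9+7 is odd, every k in 1..m-1 pairs with a distinct m-k, and for odd x the pair sums to k^x + (-k)^x ≡ 0 (mod m). A small sketch with `full_period_sum` as an assumed helper name, demonstrated with small odd moduli:

```python
def full_period_sum(x, m):
    """Sum of i**x mod m over one full period i = 1..m.
    For odd x (and odd m), each k in 1..m-1 pairs with the distinct m-k
    and the pair cancels mod m, so the whole period sums to zero."""
    return sum(pow(i, x, m) for i in range(1, m + 1)) % m
```

For even x no such pairing applies, so the period sum is generally nonzero.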

I think Vatine’s answer is probably the way to go, but I already typed this up and it may be useful, for this or for someone else’s similar problem.

I don’t have time this morning for a detailed answer, but consider this. 1^2 + 2^2 + 3^2 + ... + n^2 would take O(n) steps to compute directly. However, it’s equivalent to (n(n+1)(2n+1)/6), which can be computed in O(1) time. A similar equivalence exists for any higher integral power x.
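Taken mod 10^9+7, the closed form stays O(1): since the modulus is prime, dividing by 6 becomes multiplying by its modular inverse (Fermat's little theorem). A minimal sketch, with `sum_of_squares_mod` as a name chosen here:

```python
MOD = 1_000_000_007

def sum_of_squares_mod(n, mod=MOD):
    """1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6, evaluated mod a prime:
    the division by 6 becomes multiplication by 6^(mod-2) mod `mod`."""
    inv6 = pow(6, mod - 2, mod)         # modular inverse of 6
    t = n % mod * ((n + 1) % mod) % mod
    t = t * ((2 * n + 1) % mod) % mod
    return t * inv6 % mod
```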

There may be a general solution to such problems; I don’t know of one, and Wolfram Alpha doesn’t seem to know of one either. However, in general the equivalent expression is of degree x+1, and can be worked out by computing some sample values and solving a set of linear equations for the coefficients.

However, this would be difficult for large x, such as 1000 as in your problem, and probably could not be done within 2 seconds.

Perhaps someone who knows more math than I do can turn this into a workable solution?

Edit: Whoops, I see Fabian Pijcke had already posted a useful link about Faulhaber's formula before I posted.

Outbrave answered 17/1, 2017 at 11:27 Comment(0)

If you want something easy to implement and fast, try this:

Function Sum(x: Number, n: Integer) -> Number
  P := PolySum(:x, n)
  return P(x)
End

Function PolySum(x: Variable, n: Integer) -> Polynomial
  C := Sum-Coefficients(n)
  P := 0
  For i from 1 to n + 1
    P += C[i] * x^i
  End
  return P
End

Function Sum-Coefficients(n: Integer) -> Vector of Rationals
  A := Create-Matrix(n)
  R := Reduced-Row-Echelon-Form(A)
  return last column of R
End

Function Create-Matrix(n: Integer) -> Matrix of Integers
  A := New (n + 1) x (n + 2) Matrix of Integers
  Fill A with 0s
  Fill first row of A with 1s

  For i from 2 to n + 1
    For j from i to n + 1
      A[i, j] := A[i-1, j] * (j - i + 2)
    End
    A[i, n+2] := A[i, n]
  End
  A[n+1, n+2] := A[n, n+2]
  return A
End
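A sketch of the pseudocode above in Python, using exact fractions and 0-based indexing. Since Create-Matrix produces an upper-triangular system, plain back-substitution stands in for the Reduced-Row-Echelon-Form step (a deliberate simplification, not part of the original pseudocode):

```python
from fractions import Fraction

def create_matrix(n):
    """(n+1) x (n+2) augmented matrix from the pseudocode: the entry at
    0-based [i][j] is the falling factorial (j+1)_i, built row by row via
    A[i][j] = A[i-1][j] * (j - i + 2); the last column holds (n)_i."""
    rows, cols = n + 1, n + 2
    A = [[Fraction(0)] * cols for _ in range(rows)]
    A[0] = [Fraction(1)] * cols
    for i in range(1, rows):
        for j in range(i, cols - 1):
            A[i][j] = A[i - 1][j] * (j - i + 2)
        A[i][cols - 1] = A[i][n - 1]
    A[rows - 1][cols - 1] = A[rows - 2][cols - 1]
    return A

def sum_coefficients(n):
    """Solve the upper-triangular system by back-substitution, giving
    [a1, ..., a_{n+1}] with Q(x) = a1*x + ... + a_{n+1}*x^(n+1)."""
    A = create_matrix(n)
    size = n + 1
    a = [Fraction(0)] * size
    for i in reversed(range(size)):
        s = A[i][size] - sum(A[i][j] * a[j] for j in range(i + 1, size))
        a[i] = s / A[i][i]
    return a
```

For n = 2 this reproduces the worked example below: coefficients 1/6, 1/2, 1/3.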

Explanation

Our goal is to obtain a polynomial Q such that Q(x) = sum i^n for i from 1 to x. Knowing that Q(x) = Q(x - 1) + x^n => Q(x) - Q(x - 1) = x^n, we can then make a system of equations like so:

d^0/dx^0( Q(x) - Q(x - 1) ) = d^0/dx^0( x^n )
d^1/dx^1( Q(x) - Q(x - 1) ) = d^1/dx^1( x^n )
d^2/dx^2( Q(x) - Q(x - 1) ) = d^2/dx^2( x^n )
                           ...
d^n/dx^n( Q(x) - Q(x - 1) ) = d^n/dx^n( x^n )

Assuming that Q(x) = a_1*x + a_2*x^2 + ... + a_(n+1)*x^(n+1), we will then have n+1 linear equations with unknowns a1, ..., a_(n+1), and it turns out the coefficient cj multiplying the unknown aj in equation i follows the pattern (where (k)_p = (k!)/(k - p)!):

  • if j < i, cj = 0
  • otherwise, cj = (j)_(i - 1)

and the independent value of the ith equation is (n)_(i - 1). Explaining why gets a bit messy, but you can check the proof here.

The above algorithm is equivalent to solving this system of linear equations. Plenty of implementations and further explanations can be found at https://github.com/fcard/PolySum. The main drawback of this algorithm is that it consumes a lot of memory; even my most memory-efficient version uses almost 1 GB for n=3000. But it's faster than both SymPy and Mathematica, so I assume it's okay. Compare to Schultz's method, which uses an alternate set of equations.

Examples

It's easy to apply this method by hand for small n. The matrix for n=1 is

| (1)_0   (2)_0   (1)_0 |     |   1       1       1   |
|   0     (2)_1   (1)_1 |  =  |   0       2       1   |

Applying a Gauss-Jordan elimination we then obtain

|   1       0      1/2  |
|   0       1      1/2  |

Result = {a1 = 1/2, a2 = 1/2} => Q(x) = x/2 + (x^2)/2

Note that the matrix is always already in row echelon form; we just need to reduce it.

For n=2:

| (1)_0   (2)_0   (3)_0   (2)_0 |     |   1       1       1       1   |
|   0     (2)_1   (3)_1   (2)_1 |  =  |   0       2       3       2   |
|   0       0     (3)_2   (2)_2 |     |   0       0       6       2   |

Applying a Gauss-Jordan elimination we then obtain

|   1       1       0       2/3 |      |   1       0       0       1/6 |
|   0       2       0         1 |  =>  |   0       1       0       1/2 |
|   0       0       1       1/3 |      |   0       0       1       1/3 |

Result = {a1 = 1/6, a2 = 1/2, a3 = 1/3} => Q(x) = x/6 + (x^2)/2 + (x^3)/3

The key to the algorithm's speed is that it doesn't calculate a factorial for every element of the matrix; instead, it knows that (k)_p = (k)_(p-1) * (k - (p - 1)), therefore A[i,j] = (j)_(i-1) = (j)_(i-2) * (j - (i - 2)) = A[i-1, j] * (j - (i - 2)), so it uses the previous row to calculate the current one.

Heelandtoe answered 26/3, 2019 at 7:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.