Implementing FFT over finite fields
Asked Answered
D

1

9

I would like to implement multiplication of polynomials using NTT. I followed Number-theoretic transform (integer DFT) and it seems to work.

Now I would like to implement multiplication of polynomials over finite fields Z_p[x] where p is arbitrary prime number.

Does it changes anything that the coefficients are now bounded by p, compared to the former unbounded case?

In particular, original NTT required to find prime number N as the working modulus that is larger than (magnitude of largest element of input vector)^2 * (length of input vector) + 1 so that the result never overflows. If the result is going to be bounded by that p prime anyway, how small can the modulus be? Note that p - 1 does not have to be of form (some positive integer) * (length of input vector).

Edit: I copy-pasted the source from the link above to illustrate the problem:

# 
# Number-theoretic transform library (Python 2, 3)
# 
# Copyright (c) 2017 Project Nayuki
# All rights reserved. Contact Nayuki for licensing.
# https://www.nayuki.io/page/number-theoretic-transform-integer-dft
#

import itertools, numbers

def find_params_and_transform(invec, minmod):
    check_int(minmod)
    mod = find_modulus(len(invec), minmod)
    root = find_primitive_root(len(invec), mod - 1, mod)
    return (transform(invec, root, mod), root, mod)

def check_int(n):
    if not isinstance(n, numbers.Integral):
        raise TypeError()

def find_modulus(veclen, minimum):
    check_int(veclen)
    check_int(minimum)
    if veclen < 1 or minimum < 1:
        raise ValueError()
    start = (minimum - 1 + veclen - 1) // veclen
    for i in itertools.count(max(start, 1)):
        n = i * veclen + 1
        assert n >= minimum
        if is_prime(n):
            return n

def is_prime(n):
    check_int(n)
    if n <= 1:
        raise ValueError()
    return all((n % i != 0) for i in range(2, sqrt(n) + 1))

def sqrt(n):
    check_int(n)
    if n < 0:
        raise ValueError()
    i = 1
    while i * i <= n:
        i *= 2
    result = 0
    while i > 0:
        if (result + i)**2 <= n:
            result += i
        i //= 2
    return result

def find_primitive_root(degree, totient, mod):
    check_int(degree)
    check_int(totient)
    check_int(mod)
    if not (1 <= degree <= totient < mod):
        raise ValueError()
    if totient % degree != 0:
        raise ValueError()
    gen = find_generator(totient, mod)
    root = pow(gen, totient // degree, mod)
    assert 0 <= root < mod
    return root

def find_generator(totient, mod):
    check_int(totient)
    check_int(mod)
    if not (1 <= totient < mod):
        raise ValueError()
    for i in range(1, mod):
        if is_generator(i, totient, mod):
            return i
    raise ValueError("No generator exists")

def is_generator(val, totient, mod):
    check_int(val)
    check_int(totient)
    check_int(mod)
    if not (0 <= val < mod):
        raise ValueError()
    if not (1 <= totient < mod):
        raise ValueError()
    pf = unique_prime_factors(totient)
    return pow(val, totient, mod) == 1 and all((pow(val, totient // p, mod) != 1) for p in pf)

def unique_prime_factors(n):
    check_int(n)
    if n < 1:
        raise ValueError()
    result = []
    i = 2
    end = sqrt(n)
    while i <= end:
        if n % i == 0:
            n //= i
            result.append(i)
            while n % i == 0:
                n //= i
            end = sqrt(n)
        i += 1
    if n > 1:
        result.append(n)
    return result

def transform(invec, root, mod):
    check_int(root)
    check_int(mod)
    if len(invec) >= mod:
        raise ValueError()
    if not all((0 <= val < mod) for val in invec):
        raise ValueError()
    if not (1 <= root < mod):
        raise ValueError()

    outvec = []
    for i in range(len(invec)):
        temp = 0
        for (j, val) in enumerate(invec):
            temp += val * pow(root, i * j, mod)
            temp %= mod
        outvec.append(temp)
    return outvec

def inverse_transform(invec, root, mod):
    outvec = transform(invec, reciprocal(root, mod), mod)
    scaler = reciprocal(len(invec), mod)
    return [(val * scaler % mod) for val in outvec]

def reciprocal(n, mod):
    check_int(n)
    check_int(mod)
    if not (0 <= n < mod):
        raise ValueError()
    x, y = mod, n
    a, b = 0, 1
    while y != 0:
        a, b = b, a - x // y * b
        x, y = y, x % y
    if x == 1:
        return a % mod
    else:
        raise ValueError("Reciprocal does not exist")

def circular_convolve(vec0, vec1):
    if not (0 < len(vec0) == len(vec1)):
        raise ValueError()
    if any((val < 0) for val in itertools.chain(vec0, vec1)):
        raise ValueError()
    maxval = max(val for val in itertools.chain(vec0, vec1))
    minmod = maxval**2 * len(vec0) + 1
    temp0, root, mod = find_params_and_transform(vec0, minmod)
    temp1 = transform(vec1, root, mod)
    temp2 = [(x * y % mod) for (x, y) in zip(temp0, temp1)]
    return inverse_transform(temp2, root, mod)

vec0 = [24, 12, 28, 8, 0, 0, 0, 0]
vec1 = [4, 26, 29, 23, 0, 0, 0, 0]

print(circular_convolve(vec0, vec1))

def modulo(vec, prime):
    return [x % prime for x in vec]

print(modulo(circular_convolve(vec0, vec1), 31))

Prints:

[96, 672, 1120, 1660, 1296, 876, 184, 0]
[3, 21, 4, 17, 25, 8, 29, 0]

However, where I change minmod = maxval**2 * len(vec0) + 1 to minmod = maxval + 1, it stops working:

[14, 16, 13, 20, 25, 15, 20, 0]
[14, 16, 13, 20, 25, 15, 20, 0]

What is the smallest minmod (N in the link above) be in order to work as expected?

Diazole answered 11/9, 2018 at 6:57 Comment(11)
@JohnColeman There are two primes to work with: one is for the polynomial, the other is for NTT. As the first one doesn't have to be in the form needed for NTT, my question is, how small can the second one be?Diazole
I think that in your p can safely play the role of what that link is calling M. If I'm reading it right, the only reason that they spend time on specifying M is that they don't really want the answers to be mod M, so they want M so large that final output can be interpreted as ordinary integers. You want the output mod p, so there is no danger that finding a "safe" M would need to avoid.Triquetrous
@JohnColeman I'm interested in N, M (per link). N apparently has to have a special structure (k*...) which p might not have.Diazole
The form of N appears to be important. It seems that it needs to be like of the form kn+1 where n is the number of items. The smallest such prime N greater than p should be okay.Triquetrous
@Spektre When I try your "p is prime that p mod n == 1 and p>max" then it stops working. But maybe I misunderstood?Diazole
@Spektre When I break at p>maxval + 1, instead of p>maxval**2 * len(vec0) + 1, it prints [14, 16, 13, 20, 25, 15, 20, 0], instead of [3, 21, 4, 17, 25, 8, 29, 0]. So on one hand "p is prime that p mod n == 1 and p>max" seems not to be enough, on the other hand all texts I saw mention it as the stopping criterion.Diazole
@Diazole I think you are going backwards or over-thinking it. Why not use largest such prime that fits into your data word? That alone can get rid of the modulo (replace it by single substraction) and also some such primes can have a lot of grouped zeros in binary for additional optimizations by bit operations. Also as max possible value is used for such if the overflow is possible it can not be avoided (unless bigger data word is used) anyway so no need to tear your head in pieces because of it. Using minimal p will always get you in trouble (unless arbitrary precision is used)Cerveny
@Cerveny Because the polynomials are already reduced modulo some unrelated prime. I din't choose that prime and would still like to multiply them as efficiently as possible. So I'm wondering whether there is better bound than p>maxval**2 * len(vec0) + 1? And also, why is the p>maxval + 1 not working?Diazole
@Cerveny Of course, that's also true for school book algorithm, not just DFT. My question is, if anything changes if working over a finite field.Diazole
@Diazole I converted the comments into answer with some more info ...Cerveny
N needs to have the form kn+1 because we need to find a suitable ω. ω is an nth root of unity that we need for the length-n transform.Citrange
C
1

If your input of n integers is bound to some prime q (any mod q not just prime will be the same) You can use it as a max value +1 but beware you can not use it as a prime p for the NTT because NTT prime p has special properties. All of them are here:

so our max value of each input is q-1 but during your task computation (Convolution on 2 NTT results) the magnitude of first layer results can rise up to n.(q-1) but as we are doing convolution on them the input magnitude of final iNTT will rise up to:

m = n.((q-1)^2)

If you are doing different operations on the NTTs than the m equation might change.

Now let us get back to the p so in a nutshell you can use any prime p that upholds these:

p mod n == 1
p > m

and there exist 1 <= r,L < p such that:

p mod (L-1) = 0
r^(L*i) mod p == 1 // i = { 0,n }
r^(L*i) mod p != 1 // i = { 1,2,3, ... n-1 }

If all this is satisfied then p is nth root of unity and can be used for NTT. To find such prime and also the r,L look at the link above (there is C++ code that finds such).

For example during string multiplication we take 2 strings do NTT on them then convolute the result and iNTT back the result (that is sum of both input sizes). So for example:

                                99999999999999999999999999999999
                               *99999999999999999999999999999999
----------------------------------------------------------------
9999999999999999999999999999999800000000000000000000000000000001

the q = 10 and both operands are 9^32 so n=32 hence m = 9*9*32 = 2592 and the found prime is p = 2689. As you can see the result matches so no overflow occurs. However if I use any smaller prime that still fit all the other conditions the result will not match. I used this specifically to stretch the NTT values as much as possible (all values are q-1 and sizes are equal to the same power of 2)

In case your NTT is fast and n is not a power of 2 then you need to zero pad to nearest higher or equal power of 2 size for each NTT. But that should not affect the m value as zero pad should not increase the magnitude of values. My testing proves it so for convolution you can use:

m = (n1+n2).((q-1)^2)/2

where n1,n2 are the raw inputs sizes before zeropad.

For more info about implementing NTT you can check out mine in C++ (extensively optimized):

So to answer your questions:

  1. yes you can take advantage of the fact that input is mod q but you can not use q as p !!!

  2. You can use minmod = n * (maxval + 1) only for single NTT (or first layer of NTTs) but as you are chaining them with convolution during your NTT usage you can not use that for the final INTT stage !!!

However as I mentioned in the comments easiest is to use max possible p that fits in the data type you are using and is usable for all power of 2 sizes of input supported.

Which basically renders your question irrelevant. The only case I can think of where this is not possible/desired is on arbitrary precision numbers where there is "no" max limit. There are many performance issues binded to variable p as the search for p is really slow (may be even slower than the NTT itself) and also variable p disables many performance optimizations of the modular arithmetics needed making the NTT really slow.

Cerveny answered 15/9, 2018 at 8:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.