Arbitrary precision integer on GPU
I'm currently doing a primality test on huge numbers (up to 10M digits).

Right now, I'm using a C program with the GMP library. I parallelized it with OpenMP and got a nice speedup (~3.5x with 4 cores). The problem is that I don't have enough CPU cores to make running my whole dataset feasible.

I have an NVIDIA GPU, so I tried to find an alternative to GMP that runs on GPUs. It can be either CUDA or OpenCL.

Is there an arbitrary precision library that I can run on my GPU? I'm also open to using another programming language if there is a simple or more elegant way of doing it.

Fathead answered 25/12, 2021 at 4:14 Comment(2)
Welcome to Stack Overflow. Please take the tour to learn how Stack Overflow works and read How to Ask on how to improve the quality of your question. Then check the help center to see which questions are on-topic on this site. – Eighty
The problem itself (primality tests) should be parallelizable? Then CUDA threads can execute C or C++ instructions. You should optimize memory accesses (coalescing and/or using shared memory). Perhaps use the 32 threads of a warp to cooperate on one task and the other threads for higher-level parallelism. What operations would you need to test for primality? Multiplications? Would there be optimizations, FFT-like? – Millstream

It seems the Julia Language is already able to do multiprecision arithmetic and use the GPU (see here for a simple example combining these two), but you might have to learn Julia and rewrite your code.

The CUMP library is meant as a GMP substitute for CUDA. It tries to make porting GMP code to CUDA easier by offering an interface similar to GMP's: for example, you replace mpf_... variables and functions with cumpf_... ones. There is a demo you can build to check that it works with your CUDA setup. No documentation or support, though; you'll have to go through the code to see if it works.

The CAMPARY library from people at LAAS-CNRS might be worth a shot as well, but it has no documentation either. It has been applied in more research than CUMP, so there is some chance it fits. An answer here gives some clarification on how to use it.

GRNS uses a residue number system on CUDA-compatible processors. It has no documentation, but an interesting paper is available. Also, see this one.

XMP comes directly from NVIDIA labs, but it seems incomplete and also has no docs. Some info here and a paper here.

XMP 2.0 seems newer but only supports sizes up to 32k bits for now.

GPUMP looks promising, but it does not seem to be available for download; perhaps try contacting the authors.

MPRES-BLAS is a multiple-precision library for linear algebra on CUDA, and it does - of course - have code for basic arithmetic. Also, papers here and here.

I haven't tested any of these, so please let us know if any of them work well.

Awlwort answered 7/1, 2022 at 5:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.