How to calculate pairwise distance matrix on the GPU

The bottleneck in my code is the calculation of a pairwise distance matrix. Since this is by far the slowest part, I have spent a lot of time trying to speed it up.

I have found several optimizations in articles online, but the gains have been minimal. So I am looking for a way to build the distance matrix on my GPU to speed it up further. However, I know very little about GPU computing. Can anyone help me do this?

In my research I have found the following, but none of them used the GPU:

  1. This article was useful, but the speedups were minimal.
  2. This article was informative on how to use Cython and Numba.

Here is an example snippet of how to calculate a pairwise distance matrix:

import numpy as np
from scipy import spatial

rows = 1000
cols = 10
mat = np.random.randn(rows, cols)          # 1000 points in 10 dimensions
d_mat = spatial.distance.cdist(mat, mat)   # 1000 x 1000 Euclidean distance matrix
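
For reference, the CPU baseline above can be timed like this (the repeat count is an arbitrary choice):

import timeit

import numpy as np
from scipy import spatial

mat = np.random.randn(1000, 10)

# Time the CPU pairwise-distance computation over several runs.
t = timeit.timeit(lambda: spatial.distance.cdist(mat, mat), number=10)
print("cdist: {:.4f} s per call".format(t / 10))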

My graphics card is an Nvidia Quadro M2000M.

Salesroom answered 9/10, 2017 at 22:26 Comment(4)
I'm dealing with the same problem as you - any chance you've had any breakthroughs in the past few months you'd be able to share out here? – Landan
Yes, I was able to use CUDA within the numba library to code this and I got significant speedups... Give me ~24 hours and I can find my code and post it here as an answer. – Salesroom
@PaulTerwilliger are you still planning to post an answer? I am looking for the same thing before I make an effort to get some persistent homology computations up and running for a project. – Kareem
Check out the posted answer. – Salesroom

I was able to use this:

import math

import numpy as np
from numba import cuda

USE_64 = True

if USE_64:
    bits = 64
    np_type = np.float64
else:
    bits = 32
    np_type = np.float32

@cuda.jit("void(float{}[:, :], float{}[:, :])".format(bits, bits))
def distance_matrix(mat, out):
    # Each thread computes one entry out[i, j] of the distance matrix.
    m = mat.shape[0]
    n = mat.shape[1]
    i, j = cuda.grid(2)
    if i < m and j < m:
        d = 0.0
        for k in range(n):
            tmp = mat[i, k] - mat[j, k]
            d += tmp * tmp
        out[i, j] = math.sqrt(d)  # Euclidean distance, matching scipy's cdist

def gpu_dist_matrix(mat):
    rows = mat.shape[0]

    block_dim = (16, 16)
    # One block per 16 x 16 tile of the output, rounded up to cover all rows.
    grid_dim = ((rows + block_dim[0] - 1) // block_dim[0],
                (rows + block_dim[1] - 1) // block_dim[1])

    stream = cuda.stream()
    mat2 = cuda.to_device(np.asarray(mat, dtype=np_type), stream=stream)
    out2 = cuda.device_array((rows, rows), dtype=np_type, stream=stream)
    distance_matrix[grid_dim, block_dim, stream](mat2, out2)
    out = out2.copy_to_host(stream=stream)
    stream.synchronize()  # make sure the asynchronous copy has finished

    return out
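
For reference, a quick way to sanity-check the kernel against the cdist snippet from the question (the sizes are the ones used there; the comparison uses numpy's default tolerances):

import numpy as np
from scipy import spatial

mat = np.random.randn(1000, 10)

cpu = spatial.distance.cdist(mat, mat)  # Euclidean distances on the CPU
gpu = gpu_dist_matrix(mat)              # same computation on the GPU

print(np.allclose(cpu, gpu))            # should print True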
Salesroom answered 14/6, 2018 at 20:17 Comment(1)
Interested to see a shared memory implementation of this. Nice job. – Telugu
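
For anyone curious, here is a rough sketch of such a tiled shared-memory variant, assuming float64 data and the same 16 x 16 block size as the answer above (the host wrapper from the answer can be reused unchanged):

import math

from numba import cuda, float64

TPB = 16  # threads per block along each axis; must match block_dim in the wrapper

@cuda.jit
def distance_matrix_tiled(mat, out):
    # Shared-memory tiles holding a TPB x TPB slab of features for the rows
    # handled by this block (s_i) and for the columns of the output (s_j).
    s_i = cuda.shared.array(shape=(TPB, TPB), dtype=float64)
    s_j = cuda.shared.array(shape=(TPB, TPB), dtype=float64)

    m = mat.shape[0]
    n = mat.shape[1]
    i, j = cuda.grid(2)
    ti = cuda.threadIdx.x
    tj = cuda.threadIdx.y

    d = 0.0
    # Sweep the feature dimension in tiles of width TPB.
    for start in range(0, n, TPB):
        # Cooperative loads; out-of-range entries are padded with zeros.
        k = start + tj
        if i < m and k < n:
            s_i[ti, tj] = mat[i, k]
        else:
            s_i[ti, tj] = 0.0
        k = start + ti
        if j < m and k < n:
            s_j[tj, ti] = mat[j, k]
        else:
            s_j[tj, ti] = 0.0
        cuda.syncthreads()

        for t in range(TPB):
            tmp = s_i[ti, t] - s_j[tj, t]
            d += tmp * tmp
        cuda.syncthreads()

    if i < m and j < m:
        out[i, j] = math.sqrt(d)

Whether this beats the plain kernel depends on the data shape: with only 10 columns, as in the question, there is little global-memory reuse to exploit, so the gain may be modest.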
