Memory alignment for fast FFT in Python using shared arrays

Asked 27/3, 2012 at 18:53 Answered 20/1, 2021 at 9:12

Solved python numpy multiprocessing fft memory-alignment

I write an image processing app that needs to do multiple things and it has to do them as much real-time as possible. Acquisition of the data and their processing runs in separate processes (mainly for performance reasons). The data itself is quite large (2MPix 16-bit grayscale images).

I can share arrays between processes as it is described in this post: How do I pass large numpy arrays between python subprocesses without saving to disk? (I use the shmarray script from the numpy-shared package). I can perform the supplied Numpy FFT on those data without problem, but it is quite slow.

Calling FFTW would probably be much faster, but in order to fully benefit from it, I am supposed to run my operations on arrays that are memory aligned.

The question: Is there a way how to create and share Numpy-like arrays between processes, that are, at the same time, guaranteed to be memory aligned?

Anthology answered 27/3, 2012 at 18:53 Comment(2)

What Python FFTW bindings do you use? These bindings should provide a way to allocate correctly aligned memory blocks. – Betelgeuse 27/3, 2012 at 19:1

You are right, but the tricky part is that I need to share those arrays between multiple processes. I am quite sure that those functions don't create "shared" arrays. – Anthology 27/3, 2012 at 21:19

The simplest standard trick to get correctly aligned memory is to allocate a bit more than needed and skip the first few bytes if the alignment is wrong. If I remember correctly, NumPy arrays will always be 8-byte aligned, and FFTW requires 16-byte aligment to perform best. So you would simply allocate 8 bytes more than needed, and skip the first 8 bytes if necessary.

Edit: This is rather easy to implement. The pointer to the data is available as an integer in the ctypes.data attribute of a NumPy array. Using the shifted block can be achieved by slicing, viewing as a different data type and reshaping -- all these won't copy the data, but rather reuse the same buf.

To allocate an 16-byte aligned 1000x1000 array of 64-bit floating point numbers, we could use this code:

m = n = 1000
dtype = numpy.dtype(numpy.float64)
nbytes = m * n * dtype.itemsize
buf = numpy.empty(nbytes + 16, dtype=numpy.uint8)
start_index = -buf.ctypes.data % 16
a = buf[start_index:start_index + nbytes].view(dtype).reshape(m, n)

Now, a is an array with the desired properties, as can be verified by checking that a.ctypes.data % 16 is indeed 0.

Betelgeuse answered 27/3, 2012 at 19:6 Comment(5)

Could you provide an example, please? Let's suppose, for instance, that I want to deal with 2D complex numbers arrays. – Anthology 27/3, 2012 at 21:23

I see two problems here: 1. How to find out how many bytes to skip? 2. How to carry out the skip and get a rectangular array of desired dimensions as the output? 3. Do you think that contributing this (maybe quite simple) trick to the shmarray module would make sense? ( bitbucket.org/cleemesser/numpy-sharedmem/raw/5ca092f8222a/… )? – Anthology 27/3, 2012 at 21:29

@MarkBorgerding: Python has automatic memory management -- you won't have to bother with such details. – Betelgeuse 27/3, 2012 at 22:54

@bubla: Sorry for not including the code example right from the beginning. The code in the edit to my answer should cover 1 and 2. Regarding 3, I don't think this would be necessary since the above trick can also be applied to an array allocated by shmarray. – Betelgeuse 27/3, 2012 at 22:56

Cool, your solution works. The shifted shm-allocated array is indeed accessible from other processes. Many thanks! – Anthology 27/3, 2012 at 23:45

Generalizing on Sven's answer, this function will return an aligned copy (if needed) of any numpy array:

import numpy as np
def aligned(a, alignment=16):
    if (a.ctypes.data % alignment) == 0:
        return a

    extra = alignment / a.itemsize
    buf = np.empty(a.size + extra, dtype=a.dtype)
    ofs = (-buf.ctypes.data % alignment) / a.itemsize
    aa = buf[ofs:ofs+a.size].reshape(a.shape)
    np.copyto(aa, a)
    assert (aa.ctypes.data % alignment) == 0
    return aa

Martella answered 29/11, 2013 at 20:49 Comment(0)

I ran payne's answer in 2021 and got type errors (Python 3.7, Numpy 1.18.5), so I've adjusted the code:

def aligned(a, alignment = 16):
    if (a.ctypes.data % alignment) == 0:
        return a
    assert alignment % a.itemsize == 0
    extra = alignment // a.itemsize
    buf = np.empty(a.size + extra, dtype = a.dtype)
    ofs = (-buf.ctypes.data % alignment) // a.itemsize
    aa = buf[ofs:ofs + a.size].reshape(a.shape)
    np.copyto(aa, a)
    assert aa.ctypes.data % alignment == 0
    return aa

I changed it to use integer division to remove the type errors, and added an extra assert for a sanity check.

Carricarriage answered 20/1, 2021 at 9:12 Comment(2)

While this may prove useful, you should provide detailed explanation. Maybe an edit or a comment on a prior answer would be more appropriate? – Torietorii 20/1, 2021 at 9:46

Unfortunately I am a new member, and neither cannot comment ("you must have 50 reputation to comment") or edit ("edit queue is full") – Carricarriage 21/1, 2021 at 10:48

Recommended topics

Hot tags