I am trying to parallelize the FFTs in an acoustic fingerprinting library called Chromaprint. It works by "splitting the original audio into many overlapping frames and applying the Fourier transform on them." Chromaprint uses a frame size of 4096 samples with a 2/3 overlap, so consecutive frames start roughly 4096/3 samples apart. For instance, the first frame consists of elements [0...4095], and the second frame is something like [1366...5461].
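To make the frame layout concrete, here is a minimal sketch of how I understand the frame start offsets to be computed. The names `kFrameSize`, `kHop` and `frame_starts` are hypothetical and are not taken from Chromaprint, and the hop of 1365 (4096 / 3, rounded down) is my assumption; rounding up instead would give the 1366 mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Frame size from the description above.
constexpr std::size_t kFrameSize = 4096;
// Assumed hop between frame starts: one third of the frame, rounded down (1365).
constexpr std::size_t kHop = kFrameSize / 3;

// Returns the start index of every complete frame that fits in num_samples.
// Frame i covers the half-open range [start, start + frame_size).
std::vector<std::size_t> frame_starts(std::size_t num_samples,
                                      std::size_t frame_size = kFrameSize,
                                      std::size_t hop = kHop) {
    std::vector<std::size_t> starts;
    if (frame_size == 0 || hop == 0) {
        return starts;  // guard against an empty frame or an infinite loop
    }
    for (std::size_t s = 0; s + frame_size <= num_samples; s += hop) {
        starts.push_back(s);
    }
    return starts;
}
```

Each frame is a contiguous window of the same input buffer, and the distance between consecutive windows is smaller than the window length, which is the part I do not know how to express as a batch.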
With cufftPlanMany, I know that you can specify a batch of transforms of size 4096, which will process [0...4095], [4096...8191], etc. Is there some way to make the batched transforms overlap, or should I consider another approach that doesn't use batched execution?