There is a sub-field in image processing called Mathematical Morphology. The operation you are implementing is a core concept in this field, called dilation. Obviously, this operation has been studied extensively and we know how to implement it very efficiently.
The most efficient algorithm for this problem was proposed in 1992 and 1993, independently by van Herk, and Gil and Werman. This algorithm needs exactly 3 comparisons per sample, independently of the size of T
.
Some years later, Gil and Kimmel further refined the algorithm to need only 2.5 comparisons per sample. Though the increased complexity of the method might offset the fewer comparisons (I find that more complex code runs more slowly). I have never implemented this variant.
The HGW algorithm, as it's called, needs two intermediate buffers of the same size as the input. For ridiculously large inputs (billions of samples), you could split up the data into chunks and process it chunk-wise.
In sort, you walk through the data forward, computing the cumulative max over chunks of size T
. You do the same walking backward. Each of these require one comparison per sample. Finally, the result is the maximum over one value in each of these two temporary arrays. For data locality, you can do the two passes over the input at the same time.
I guess you could even do a running version, where the temporary arrays are of length 2*T
, but that would be more complex to implement.
van Herk, "A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels", Pattern Recognition Letters 13(7):517-521, 1992 (doi)
Gil, Werman, "Computing 2-D min, median, and max filters", IEEE Transactions on Pattern Analysis and Machine Intelligence 15(5):504-507 , 1993 (doi)
Gil, Kimmel, "Efficient dilation, erosion, opening, and closing algorithms", IEEE Transactions on Pattern Analysis and Machine Intelligence 24(12):1606-1617, 2002 (doi)
(Note: cross-posted from this related question on Code Review.)
A
is fixed andT
varies, you may do aO(N*log(N))
preparation and then for everyT
, you can getB
in O(N) time. – Amylolysis