Given an array of, for instance, 4 integers, how can one determine its non-zero minimum in the fastest way?
There is a parallel solution to this problem, but it's probably not worth the effort.
First we define an operation xchg(m, n) over an array a:
xchg(m, n) => if ((a[m] > a[n] && a[n] != 0) || a[m] == 0) swap(a[m], a[n])
This operation sorts two elements 'm' and 'n' in ascending order if they both contain non-zero values, or swaps them if the value in the 'm' element is zero.
Next we execute a set of five such operations as follows:
xchg(0,2) xchg(1,3)
xchg(0,1) xchg(2,3)
xchg(1,2)
The paired xchg operations can be executed in parallel, so the five operations take three steps instead of five, reducing the time cost by 40% over a strictly sequential execution. When we're finished, any non-zero elements in the array will be sorted in ascending order, with any zeros after them. The smallest non-zero element will be in a[0]. If that value is zero, there are no non-zero values in the array.
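As a rough illustration, the five-operation network above could be written in C++ roughly as follows. The names Quad and min_nonzero_network are made up for this sketch, and the code runs the operations one after another; actually executing each pair in parallel would need SIMD or hardware support that is not shown here.

```cpp
#include <array>
#include <cstddef>
#include <utility>

using Quad = std::array<int, 4>;  // hypothetical alias for the 4-element array

// Compare-exchange from the answer: zero behaves as "larger than everything".
inline void xchg(Quad& a, std::size_t m, std::size_t n) {
    if ((a[m] > a[n] && a[n] != 0) || a[m] == 0) std::swap(a[m], a[n]);
}

// Takes the array by value, so the caller's copy is left untouched.
inline int min_nonzero_network(Quad a) {
    xchg(a, 0, 2); xchg(a, 1, 3);  // step 1: the two operations are independent
    xchg(a, 0, 1); xchg(a, 2, 3);  // step 2: the two operations are independent
    xchg(a, 1, 2);                 // step 3
    return a[0];                   // 0 means the array held no non-zero value
}
```

Only a[0] is returned here; after the network has run, the rest of the local copy holds the remaining non-zero values in ascending order, followed by the zeros.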
This solution takes advantage of the inherent parallelism provided by sorting networks (http://en.wikipedia.org/wiki/Sorting_network), but a sequential scan of 4 elements also needs no more than three compare steps, and crucially requires half as many storage writes on average:
Sequential scan:
int v = a[0];
for (int n = 1; n < 4; n++) {
    /* take a[n] if it is a smaller non-zero value, or if v is still zero */
    if ((a[n] < v && a[n] != 0) || v == 0) v = a[n];
}
Unless you keep track of the minimum value as elements are added to the array, or you keep the array in sorted order, I see no other solution but to iterate over every member to determine the minimum value.
There is no 'fast' way of testing each member.
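To make the first alternative concrete, here is a minimal sketch of keeping the non-zero minimum up to date as elements are added. MinTrackingArray and its members are hypothetical names, and the sketch assumes elements are only ever appended; removing or overwriting an element would still require a rescan.

```cpp
#include <vector>

// Hypothetical container that remembers the smallest non-zero value it has seen.
struct MinTrackingArray {
    std::vector<int> values;
    int min_nonzero = 0;  // 0 means "no non-zero value added yet"

    void push(int x) {
        values.push_back(x);
        // Update the cached minimum only for non-zero values.
        if (x != 0 && (min_nonzero == 0 || x < min_nonzero)) min_nonzero = x;
    }
};
```

Reading min_nonzero is then a single load, at the cost of one extra comparison per insertion.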
Generally, I suggest you do not optimize something unless it actually proves to be slow. The old rule that your program spends 90% of its time in 10% of the code generally holds true. So does the rule that programmers are 99.99% likely to optimize code that is not in that 10%.
Profile your code - profile your code - profile your code
Depends on the input. If the array is not sorted, then you'll have to loop through the full array. If the array is sorted, then you just need to loop until you find something that isn't zero - it's much shorter.
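For the sorted case, a sketch might look like the following. It assumes the array is sorted in ascending order and contains no negative values, so that any zeros sit at the front; min_nonzero_sorted is a name invented for this example.

```cpp
#include <algorithm>
#include <array>

// Assumption: `a` is sorted ascending and has no negative values,
// so the first non-zero element is also the smallest non-zero element.
inline int min_nonzero_sorted(const std::array<int, 4>& a) {
    auto it = std::find_if(a.begin(), a.end(), [](int x) { return x != 0; });
    return it != a.end() ? *it : 0;  // 0 means every element was zero
}
```

The loop stops at the first non-zero element, so it inspects all four elements only when the array is all zeros or has a single non-zero value at the end.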
If we're thinking of micro-optimizations, then it could potentially be faster to compute min(min(a,b),min(c,d)) instead of min(min(min(a,b),c),d) on a modern out-of-order processor, because there are fewer sequential dependencies: in the former, the processor can compute min(a,b) and min(c,d) independently in parallel, if it has sufficient execution units available. This assumes that the processor has a conditional move instruction, so that computing min does not require branching.
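The two shapes can be compared side by side in a small sketch. The helper min_nz is a hypothetical two-input minimum that ignores zeros, added here so the example fits the original question; the answer itself talks about a plain min. Whether a compiler turns it into conditional moves is not guaranteed and would need checking in the generated assembly.

```cpp
// Hypothetical two-input minimum that treats 0 as "no value".
inline int min_nz(int x, int y) {
    if (x == 0) return y;
    if (y == 0) return x;
    return x < y ? x : y;
}

// Chain: each call depends on the result of the previous one (depth 3).
inline int min4_chain(int a, int b, int c, int d) {
    return min_nz(min_nz(min_nz(a, b), c), d);
}

// Tree: the two inner calls are independent of each other (depth 2).
inline int min4_tree(int a, int b, int c, int d) {
    return min_nz(min_nz(a, b), min_nz(c, d));
}
```

Both functions return the same result for any input; only the dependency structure differs, which is what the out-of-order argument relies on.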
Well, the fastest way to code it is std::min({a,b,c,d}).
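Note that a plain std::min would return 0 whenever any input is zero. One way to get the non-zero minimum from the same call, sketched here under the assumption that a custom comparison is acceptable, is to pass a comparator that orders zero after every other value. The names min_nonzero4 and zero_last are invented for this example.

```cpp
#include <algorithm>

inline int min_nonzero4(int a, int b, int c, int d) {
    // Strict weak ordering in which 0 compares greater than any non-zero value.
    auto zero_last = [](int x, int y) { return x != 0 && (y == 0 || x < y); };
    // std::min accepts an initializer list plus a comparison object (C++11).
    return std::min({a, b, c, d}, zero_last);
}
```

If all four inputs are zero, the call returns 0, which can serve as the "no non-zero value" signal.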
On a more serious note: if your application is bottlenecking on something like taking the minimum of a lot of values, a better solution might be to split that minimum-finding task into parts and send them to the GPU (or to many threads), which can then run many minimum-finding calculations at the same time.
Parallelism would probably help more than trying to write a minimum function in assembly.
std::min() also takes a comparison object, i.e. it should be possible to map the zeros to infinity. – Castlereagh
std::min(a,b) – Crambo