GCC has a function attribute, target_clones, which creates several versions of a function, each compiled for a different instruction set. When the binary is executed, the version built for the highest-level instruction set that the CPU supports is selected.
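To make the run-time selection concrete, here is a rough sketch of the kind of check involved, assuming GCC (or Clang) on x86-64. It uses the real built-ins __builtin_cpu_init and __builtin_cpu_supports; the helper name best_simd_level and the short list of levels are made up for illustration. The resolver that GCC generates for target_clones is not this code, only conceptually similar.

```c
#include <stdio.h>

/* Hypothetical helper (not part of GCC): returns the name of the
 * highest SIMD level, from a short hand-written list, that the
 * running CPU reports. */
const char *best_simd_level(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();  /* safe to call more than once */
    if (__builtin_cpu_supports("avx512f")) return "avx512f";
    if (__builtin_cpu_supports("avx2"))    return "avx2";
    if (__builtin_cpu_supports("avx"))     return "avx";
    if (__builtin_cpu_supports("sse4.2"))  return "sse4.2";
    return "sse2";         /* SSE2 is part of the x86-64 baseline */
#else
    return "default";      /* assumption: no x86 feature checks available */
#endif
}
```

Calling printf("%s\n", best_simd_level()) from main prints the level found on the machine running the binary, which is roughly the decision the dispatcher makes once at load time.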
Assuming I have some piece of code doing lots of floating-point operations, I can have it use the highest-level SIMD instruction set available by writing something like this:
    /* Note: GCC's name for the AVX-512 foundation set is "avx512f". */
    __attribute__((target_clones("default", "sse", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "avx", "avx2", "avx512f")))
    double my_ddot1(int n, double x[], double y[])
    {
        double result = 0;
        for (int i = 0; i < n; i++)
            result += x[i] * y[i];
        return result;
    }
But that involves manually specifying every instruction set that I want it to specialize for.
Now, assuming n is large, the code above will evidently just run faster the more operations are done at once, so the compiler only has to generate one version per SIMD level (sse2/3/4, avx1/2). I can manually list the ones available for x86-64 and put them in the function attribute, but in 10 years' time the repertoire of available options is likely to grow. For example, if an "avx2048" gets created later, which would evidently benefit the function being optimized, the code above will not pick it up and will instead stay stuck with the newest level in my list.
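For illustration, here is a minimal sketch of the manual approach described above, assuming GCC on x86-64 Linux with ifunc support. The list is kept in a single macro so that it has to be edited in only one place. The names SIMD_CLONES and my_ddot2 are made up for this example. This does not solve the problem, since the list still has to be extended by hand whenever a new level appears.

```c
#include <stdio.h>

/* Hypothetical macro: the hand-maintained list of SIMD levels lives
 * here, in one place. On other compilers or targets it expands to
 * nothing and only the plain version of the function is built. */
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
#define SIMD_CLONES \
    __attribute__((target_clones("default", "sse4.2", "avx", "avx2", "avx512f")))
#else
#define SIMD_CLONES
#endif

/* Same dot product as above, tagged through the macro. */
SIMD_CLONES
double my_ddot2(int n, const double x[], const double y[])
{
    double result = 0;
    for (int i = 0; i < n; i++)
        result += x[i] * y[i];
    return result;
}
```

Every function that should be cloned is then marked with SIMD_CLONES, and adding a future level means changing the macro once rather than every attribute.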
How can I tell target_clones to compile for every SIMD level without having to list them, in such a way that the list would automatically update according to what the compiler supports?