Restrict pointers and inlining

I have tried to use restrict-qualified pointers and have run into a problem. The program below is a minimal example that reproduces it.

calc_function takes three pointers, all restrict-qualified, so they "SHALL" not alias each other. When I compile this code in Visual Studio 2010, the function is inlined, and once it is inlined the compiler seems to ignore the qualifiers. If I disable inlining, the code runs more than six times faster (from 2200 ms down to 360 ms). But I do not want to disable inlining for the whole project or even the whole file, because that would add call overhead to, for example, every getter and setter, which would be horrible.

(Is disabling inlining for just this one function the only solution?)

I have tried creating temporary restrict-qualified pointers inside the function, both at the top and in the inner loop, to promise the compiler that there is no aliasing, but the compiler does not take the hint and the code is no faster. I have also tried tweaking the compiler settings, but the only change I have found that works is disabling inlining.
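A minimal sketch of what such block-scope restrict-qualified copies might look like. The function name calc_function_local and the iteration-count parameter n are hypothetical and not part of the original program, and whether a given compiler honours the qualifier here is not guaranteed:

```cpp
#include <cstddef>

// Hypothetical variant of calc_function: the parameters are plain pointers,
// and the no-alias promise is made on local copies instead.
void calc_function_local(int *a, int *b, int *c, std::size_t n) {
    int *__restrict ra = a;  // promise: ra, rb and rc do not alias
    int *__restrict rb = b;
    int *__restrict rc = c;
    for (std::size_t i = 0; i < n; ++i) {
        *ra += *rb;
        *rc += *ra;
    }
}
```

Calling calc_function_local(&x, &y, &z, n) with three distinct ints behaves like the original loop body run n times.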

I would appreciate some help to solve this optimization problem.

To run the program (in release mode), don't forget to pass the arguments 0 1000 2000. The offsets are read from the program arguments so that the compiler cannot know whether the pointers a, b and c alias.

#include <cstdlib>
#include <cstdio>
#include <ctime>

// Data table that a, b and c point into, so the compiler can't know whether they alias.
const size_t listSize = 10000;
int data[listSize];

//void calc_function(int * a, int * b, int * c){
void calc_function(int *__restrict a, int *__restrict b, int *__restrict c){
    for(size_t y=0; y<1000*1000; ++y){  // <- Extra loop to be able to measure the time.
        for(size_t i=0; i<1000; ++i){
            *a += *b;
            *c += *a;
        }
    }
}
int main(int argc, char *argv[]){ // argv SHALL be "0 1000 2000" (with no quotes)
    // init
    for(size_t i=0; i<listSize; ++i)
        data[i] = i;

    // get a, b and c from argv(0,1000,2000)
    int *a,*b,*c;
    sscanf(argv[1],"%d",&a);
    sscanf(argv[2],"%d",&b);
    sscanf(argv[3],"%d",&c);
    a = data + int(a);  // a, b and c will (after the specified argv) be,
    b = data + int(b);  // a = &data[0], b = &data[1000], c = &data[2000],
    c = data + int(c);  // So they will not alias, and the compiler can't know.

    // calculate and take time
    time_t start = clock();
        calc_function(a,b,c);
    time_t end = clock();
    time_t t = (end-start);
    printf("calc_function       %u (clock ticks)\n", t);

    system("PAUSE");
    return EXIT_SUCCESS;
}
Discriminant answered 15/7, 2012 at 20:40 Comment(4)
+1 for good profiling practices. I'll opt not to complain about the format specifier. P.S. clock returns a clock_t, not a time_t. - Eyeglasses
Try guarding the function call with a check that the offsets are sufficiently large. You'll probably have to use real int variables to store the offsets though, rather than the hack you used. - Eyeglasses
@Hurkyl I thought that clock_t and time_t were both typedefs of the same thing, but you are correct. (Btw, how do I edit my question-post?) - Discriminant
I tried to guard it, with both if-statements and __assume's, with no success. But the __declspec(noinline), which Mystical pointed out, works. - Discriminant
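The guard suggested in the comments could be sketched roughly as follows. All names here (parse_index, calc_restricted, calc_plain, calc_guarded) and the iteration-count parameter n are hypothetical. Since each pointer touches a single int, distinct indices are enough to rule out aliasing. Note that the asker reports a guard like this did not change the generated code in VS2010, so this only illustrates the idea:

```cpp
#include <cstddef>
#include <cstdlib>

// Parse a decimal index into a real integer variable (no pointer hack).
// Returns false on malformed input or if the index is not below limit.
bool parse_index(const char *text, std::size_t limit, std::size_t &out) {
    char *end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || value < 0 ||
        static_cast<unsigned long>(value) >= limit)
        return false;
    out = static_cast<std::size_t>(value);
    return true;
}

// Version that promises the three pointers do not alias.
void calc_restricted(int *__restrict a, int *__restrict b,
                     int *__restrict c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) { *a += *b; *c += *a; }
}

// Fallback that stays correct when the pointers alias.
void calc_plain(int *a, int *b, int *c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) { *a += *b; *c += *a; }
}

// Take the restricted path only when the indices are pairwise distinct.
void calc_guarded(int *table, std::size_t ia, std::size_t ib,
                  std::size_t ic, std::size_t n) {
    if (ia != ib && ib != ic && ia != ic)
        calc_restricted(table + ia, table + ib, table + ic, n);
    else
        calc_plain(table + ia, table + ib, table + ic, n);
}
```

In main, each argv entry would go through parse_index with listSize as the limit before calc_guarded(data, ia, ib, ic, n) is called.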

Declaring a function with __declspec(noinline) prevents the compiler from inlining it:

http://msdn.microsoft.com/en-us/library/kxybs02x%28v=vs.80%29.aspx

You can use this to manually disable inlining on a per-function basis.
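As a rough sketch, applied to the function from the question it might look like this. The NO_INLINE macro, the function name calc_function_noinline and the iteration-count parameter n are assumptions added for illustration; the non-MSVC branch uses the GCC/Clang attribute so the same source builds elsewhere:

```cpp
#include <cstddef>

// Hypothetical portability macro: __declspec(noinline) on MSVC,
// the equivalent attribute on GCC and Clang.
#if defined(_MSC_VER)
#  define NO_INLINE __declspec(noinline)
#else
#  define NO_INLINE __attribute__((noinline))
#endif

// Only this function is kept out of line; getters, setters and other
// small functions in the same file can still be inlined as usual.
NO_INLINE void calc_function_noinline(int *__restrict a, int *__restrict b,
                                      int *__restrict c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        *a += *b;
        *c += *a;
    }
}
```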


As for restrict, it is only a hint: the compiler is free to ignore it. So fiddling around with different versions of the same code is somewhat unavoidable when attempting to "trick" compilers into doing such optimizations.
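One such "different version" that does not depend on restrict at all is to do by hand what the no-alias promise would allow: read the values into locals, loop on the locals, and write back once. This is only a sketch; calc_function_cached and n are hypothetical names, and the result matches the original loop only when a, b and c really point to distinct ints:

```cpp
#include <cstddef>

// Hypothetical variant: keeps the working values in local variables so the
// loop performs no loads or stores through the pointers.
// Equivalent to the original loop only if a, b and c do not alias.
void calc_function_cached(int *a, int *b, int *c, std::size_t n) {
    int va = *a;
    const int vb = *b;
    int vc = *c;
    for (std::size_t i = 0; i < n; ++i) {
        va += vb;
        vc += va;
    }
    *a = va;  // write the results back once, after the loop
    *c = vc;
}
```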

Outfoot answered 15/7, 2012 at 20:43 Comment(3)
This solution works, both in the test code in the question and in my real application. But there will be a problem if a very small function that is called many times needs restrict-qualified pointers, because __declspec(noinline) then forces quite a big call overhead. Therefore, I'll wait before accepting this as the best answer. - Discriminant
Yeah I know what you mean. My guess is that the pointer-aliasing analysis used in VS2010 is only at function-level granularity. So it's not able to distinguish non-aliasing pointers that are "generated" in the middle of a function. I'm not sure if restrict can be used on locally declared pointers. If it can, it might be something to try. - Outfoot
You're totally right, and I tried to use locally declared pointers with restrict with no luck. Your "__declspec(noinline)" is the best solution, and works in my current case (my application) so I accept it as the Answer. Thanks. - Discriminant
