Why is this program not optimized away?

Asked 10/6, 2015 at 8:28 Answered 10/6, 2015 at 9:40

Consider the following, simple program (adapted from this question):

#include <cstdlib>

int main(int argc, char** argv) {
    int mul1[10] = { 4, 1, 8, 6, 3, 2, 5, 8, 6, 7 }; // sum = 50
    int mul2[10] = { 4, 1, 8, 6, 7, 9, 5, 1, 2, 3 }; // sum = 46

    int x1 = std::atoi(argv[1]);
    int x2 = std::atoi(argv[2]);

    int result = 0;

    // For each element in mul1/mul2, accumulate the product with x1/x2 in result
    for (int i = 0; i < 10; ++i) {
        result += x1 * mul1[i] + x2 * mul2[i];
    }

    return result;
}

I believe it is functionally equivalent to the following one:

#include <cstdlib>

int main(int argc, char** argv) {
    int x1 = std::atoi(argv[1]);
    int x2 = std::atoi(argv[2]);

    return x1 * 50 + x2 * 46;
}

And yet clang 3.7.1, gcc 5.3 and icc 13.0.1 seem to be unable to make such optimization, even with -Ofast. (Note by the way how the generated assembly is vastly different between compilers!). However, when removing mul2 and x2 from the equation, clang is able to perform a similar optimization, even with -O2.

What prevents both compilers from optimizing the first program into the second?

Johnjohna answered 10/6, 2015 at 8:28 Comment(7)

@buttiful buttefly In general case operation x1 * mul1[i] can result in overflow, – Takamatsu 10/6, 2015 at 8:31

I am surprised that clang is able to optimize the loop when you remove x2 * mul2[i] from the equation. Personally I feel that one should not expect anything from compiler optimizer; whatever is received is a bonus! – Casein 10/6, 2015 at 8:34

@vlad overflow is UB, so can be presumed not to occur when optimizing. – Flabby 10/6, 2015 at 8:40

@VladfromMoscow also, overflow can be considered as modulo arithmetic, thus the shown optimization should yield the same result (i.e. (a1*x + a2*x)%overflow == (a1*x%overflow + a2*x%overflow)%overflow ) – Mushro 10/6, 2015 at 9:46

I have a feeling that the compiler would be able to optimize it away if the arrays were marked as const. In the current code I would imagine the compiler is refraining from the optimization because there In Theory could be some code that would modify a value within the array that would change the sum of the arrays values. As to why it is not just doing it since the code does not actually modify the array is a mystery to me. – Carboxylase 10/6, 2015 at 12:24

That's actually not a trivial optimization. Yes, it's legal, but it would require a pretty sophisticated set of optimizer passed in the right order for it to do this. – Jordison 10/6, 2015 at 13:41

If you don't have AI compiler, there always will be some usless (or not) code that can't be optimized. Keep searching! – Ludeman 11/9, 2015 at 9:40

The complete expression is too complicated even for clang. If you split it then everything gets optimized again:

int result1 = 0;
int result2 = 0;

for (int i = 0; i < 10; ++i) {
    result1 += x1 * mul1[i];
    result2 += x2 * mul2[i];
}

std::cout << (result1 + result2);

Pratte answered 10/6, 2015 at 8:55 Comment(0)

I'm not a compiler programmer so this can only be a guess. IMHO, the answer is part in @dlask's answer and part in the remark that clang does the optimisation when you remove x2 and mul2 from the expression.

The compiler may optimize away all that it can do. But I also think that optimizing compilers are already huge and complicated programs, in which bugs can have high consequences because they are at the base of almost everything else (Python most current interpreter is written in ... C)

So there must be a balance between the efficiency of the optimizer and its complexity, and I think that this example program is beyond it for gcc, and just around for clang. Nothing prevents them to do that optimization except that it is just too complicated for current version of those compilers.

Jer answered 10/6, 2015 at 9:40 Comment(2)

+1, but I'm not sure it's a matter of complexity. Optimizations are created by compiler developers, and as any other programmer, they too have to prioritize. Programs rarely contain hard-coded numbers like in the example, and a decent programmer would have hand-optimized the code anyway. So having the optimizer handle this particular case (and a million other simple cases that don't exactly match some model) is just a waste of resources. – Bullet 19/1, 2016 at 7:27

@eran I would strongly disagree with you. Sometimes, hand-optimization reduces readability and hence the maintainability. Low maintainability causes much much bigger waste of resources (time). It's compiler's domain to produce end code, so it should try its best, with the available information, to produce optimal result. – Impignorate 19/1, 2016 at 8:8

Recommended topics

Hot tags