Why doesn't GCC optimize this call to printf?

#include <stdio.h>
int main(void) {
    int i;
    scanf("%d", &i);
    if (i != 30) { return 0; }
    printf("i is equal to %d\n", i);
}

It appears that the resulting string will always be "i is equal to 30", so why doesn't GCC optimize this call to printf into a call to puts() or write(), for example?

(I just checked the generated assembly with gcc -O3 (version 5.3.1), and also on the Godbolt Compiler Explorer.)
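
For clarity, this is the hand-written transformation I expected gcc to perform (a hypothetical rewrite for illustration, not actual compiler output):

#include <stdio.h>

int main(void) {
    int i;
    scanf("%d", &i);
    if (i != 30) { return 0; }
    puts("i is equal to 30");   /* format string folded away, since i must be 30 here */
}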

Piselli answered 25/5, 2016 at 11:37 Comment(12)
gcc does not (cannot..?) predict the printf output.Aedile
@Aedile AFAIK, it changes calls to printf() into calls to puts() and putchar() where possible.Piselli
@Piselli only with known compile-time constants. What you're looking at in your code can only be determined by running it. Easy for you to do in your head - less easy for a compiler.Figurine
Suppose you had a double and the format was %4.3f. This would require the compiler to actually run your code, which it does not do.Letreece
The logic required to detect this optimization opportunity is probably not worth the effort compared to the benefits in real-world usage.Lilian
Maybe it will only replace printf("i is equal to 30"); with puts("i is equal to 30")?Griskin
@ant gcc can do that with -fbuiltin. It does this for printf if it detects a constant string.Macaco
I can't even get gcc to optimize printf("i is equal to %d\n", 30); into a puts(), which I would presume to be quite a bit simpler.Giamo
GCC allows the way conversion specifiers are interpreted to be changed at run time.Grimbal
An interesting aspect of this question is that the i on the last line introduces a data dependency in the execution, whereas replacing it with 30 would have no such dependency. In more complicated, concurrent executions that use consume memory ordering, such "optimizations" may have surprising effects.Circumjacent
@KerrekSB GCC actually removes the data dependency on i. It realizes i is going to be 30, but it doesn't change the call to printf.Piselli
Why would you care about optimizing calls to printf? GCC apparently doesn't and there's really no good reason to.Cordova

First of all, the problem is not the if; as you saw, gcc sees through the if and manages to pass 30 straight to printf.

Now, gcc does have some logic to handle special cases of printf (in particular, it does optimize printf("something\n") and even printf("%s\n", "something") to puts("something")), but it is extremely specific and doesn't go much further; printf("Hello %s\n", "world"), for example, is left as-is. Even worse, any of the variants above without a trailing newline are left untouched, even if they could be transformed to fputs("something", stdout).
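
To make those cases concrete, here is a small illustration based on the behaviour just described (which calls actually get rewritten depends on the gcc version and optimization flags):

#include <stdio.h>

int main(void) {
    printf("something\n");          /* rewritten to puts("something") */
    printf("%s\n", "something");    /* also rewritten to puts("something") */
    printf("Hello %s\n", "world");  /* left as a printf call */
    printf("something");            /* no trailing newline: left alone, not turned into fputs */
    return 0;
}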

I imagine that this comes down to two main problems:

  • the two cases above are extremely easy patterns to implement and happen quite frequently, but for the rest it's probably rarely worth the effort; if the string is constant and performance is important, the programmer can easily take care of it himself. Actually, if the performance of printf is critical, he shouldn't be relying on this kind of optimization at all, since it may break at the slightest change of the format string.

    If you ask me, even just the puts optimizations above are already "going for the style points": you are not really going to gain serious performance in anything but artificial test cases.

  • When you start to go outside the realm of %s\n, printf is a minefield, because it has a strong dependency on the runtime environment; in particular, many printf specifiers are (unfortunately) affected by the locale (a short sketch after this list illustrates this), plus there is a host of implementation-specific quirks and specifiers. Also, gcc may be compiling against the printf of glibc, musl, mingw/msvcrt, ..., and at compile time you cannot invoke the target C runtime - think of cross-compilation.

    I agree that this simple %d case is probably safe, but I can see why they decided to avoid being overly clever and to perform only the simplest and safest optimizations here.
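
As a small illustration of the locale problem, here is a sketch; it assumes a German locale such as de_DE.UTF-8 is installed (if it isn't, setlocale fails and the output simply stays the same):

#include <locale.h>
#include <stdio.h>

int main(void) {
    printf("%.2f\n", 3.14);            /* "3.14" in the default "C" locale */
    /* The decimal separator can change at run time, so the compiler
       cannot fold the call into a fixed string: */
    if (setlocale(LC_NUMERIC, "de_DE.UTF-8") != NULL)
        printf("%.2f\n", 3.14);        /* now prints "3,14" */
    return 0;
}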


For the curious reader, here is where this optimization is actually implemented; as you can see, the function matches a restricted number of very simple cases (and GIMPLE aside, hasn't changed a lot since this nice article outlining them was written). Incidentally, the source actually explains why they couldn't implement the fputs variant for the non-newline case (there's no easy way to reference the stdout global at that compilation stage).

Composed answered 26/5, 2016 at 23:47 Comment(1)
You fixed your example (that didn't end with a newline) before I finished this comment, but I'll post this godbolt link anyway, where I added v3 and v4 to an example I had already made for something else.Cascade

Modern compilers are quite clever, but not clever enough to foresee the output using logic alone. In this case it's quite simple for a human programmer to optimise this code, but the task is too hard for machines in general. In fact, predicting the output of an arbitrary program without running it is impossible in general (for gcc as well); for a proof, see the halting problem.

Anyway, you don't expect all programs without inputs to be optimised down to a series of puts() statements, so it's perfectly reasonable for GCC not to optimise this code, which contains a scanf() statement.


However, this does not mean compilers cannot or should not be improved to generate better-optimised executable files. Although it's impossible to predict the result of all programs, it's perfectly possible, and worthwhile, to improve many of them.

Entrust answered 25/5, 2016 at 13:29 Comment(2)
In this sample the compiler actually replaced i with 30, so it doesn't care about the input. It's just not clever enough to know that it can replace printf with puts.Rubinrubina
"Program analysis can never be perfect because of the Halting Problem, so we shouldn't bother with it" is such a ridiculous attitude. Predicting the output of programs is impossible in general. That doesn't mean you can't (or shouldn't) predict specific cases. As has already been pointed out, GCC (just like any decent compiler) is perfectly capable of predicting that i will be 30 and it does in fact optimize the code based on that.Poplar

Not sure if this is a convincing answer, but I would expect that compilers shouldn't optimize the printf("%d\n", 10) case to puts("10").

Why? Because this case could be more complicated than you think. Here are some of the problems I can think of at the moment:

  1. Converting binary numbers to ASCII increases the size of the string literal, and thus the overall code size. This is irrelevant for small numbers, but with something like printf("some number: %d", 10000) (5 digits or more, assuming int is 32-bit), the growth of the string outweighs the bytes saved for the integer argument, and some people could consider this a drawback. Yes, with the conversion I save a "push to stack" instruction, but how many bytes that instruction takes and how many would be saved is architecture-specific. It's non-trivial for a compiler to say whether it's worth it (see the sketch after this list).

  2. Padding, if used in the format, can also increase the size of the expanded string literal. Example: printf("some number: %10d", 100)

  3. Sometimes I, as the developer, would share a format string among printf calls for code-size reasons:

    printf("%-8s: %4d\n", "foo", 100);
    printf("%-8s: %4d\n", "bar", 500);
    printf("%-8s: %4d\n", "baz", 1000);
    printf("%-8s: %4d\n", "something", 10000);
    

    Converting them to different string literals might lose the size advantage.

  4. For %f, %e, and %g, there is the problem that the decimal point "." is locale-dependent, hence the compiler cannot expand the call to a string constant for you. Although we are only discussing %d, I mention this here for completeness.
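
As a rough illustration of point 1, here is a sketch comparing the literal sizes involved (the real code-size outcome also depends on the architecture and calling convention, as the comments below discuss):

#include <stdio.h>

int main(void) {
    /* As written in the source: a format string plus an integer argument. */
    printf("some number: %d\n", 10000);
    /* What a hypothetical optimizer would have to emit instead. */
    puts("some number: 10000");
    /* The expanded literal is longer than the format string it replaces. */
    printf("literal sizes: %zu vs %zu bytes\n",
           sizeof "some number: %d\n",
           sizeof "some number: 10000");
    return 0;
}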

Assemblyman answered 22/2, 2017 at 7:5 Comment(7)
Actually, on most modern systems you wouldn't even save a "push to stack" - the first several arguments get passed in registers in almost every widely used ABI nowadays. (You still might have values moved around in registers instead, or a stack push to save whatever was in the register, but a modern compiler will often craftily arrange for the right values to be in the right registers as part of the other instructions it has to do anyway, if it can.)Deviationism
Arguments one through three are the kind of thing that could, and in fact probably should, be controlled by optimization flags. So they do not prevent a sufficiently-intelligent compiler from implementing them, so long as the compiler then turned those optimizations off when "optimize for size" was in effect.Deviationism
All that said, +1 to the answer overall.Deviationism
@Deviationism In the x86_64 ABI for Unix, the first 6 arguments to a function are passed in registers, but you still need 'mov' instructions to do the work, which doesn't necessarily save code size (this kind of detail is complicated to explain and not needed within the given question's scope).Assemblyman
Yeah, but like I said above, a good compiler can sometimes avoid separate mov instructions, but x86-64 is really a bad example architecture to see that on, because it has so few registers, and has many operations that operate "on" one of the operand registers instead of "into" a result register.Deviationism
Better examples can be found by looking at what a modern compiler might do when generating code for an ARM (any version) ABI, or any of the other RISC chips (MIPS, SPARC, etc - but ARM probably gets the most human effort into compiler optimizations due to its massive prevalence).Deviationism
A toy example: if I have to add two numbers, and then call a function with the result, there have to be instructions to do the add anyway, and the compiler can just make sure that those instructions end up leaving the result in the register that is used to pass the function argument. (Whether this is possible in practice depends a lot of details of the code, instruction set, and ABI, and I agree that it's not worth explaining in-depth.)Deviationism
