Can a `memset` function call be removed by the compiler?

I have read here that the compiler is free to remove a call to memset if it knows that the memory buffer passed to it is never used again. How is that possible? It seems to me that, from the point of view of the core language, memset is just a regular function, and the compiler has no right to assume that whatever happens inside it has no side effects.

In the linked article they show how Visual C++ 10 removed memset. I know that Microsoft compilers are not leading in standards compliance, so I ask: is this allowed by the standard, or is it just an MSVC-ism? If it's according to the standard, please elaborate ;)

EDIT: @Cubbi

The following code:

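#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void testIt(){
    char foo[1234];
    // fill foo[0..1232] with random uppercase letters
    for (int i=0; i<1233; i++){
        foo[i] = rand()%('Z'-'A'+1)+'A';
    }
    foo[1233]=0;
    printf(foo);          // safe here only because foo holds nothing but 'A'..'Z'
    memset(foo, 0, 1234); // foo is never read after this point
}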

Compiled under MinGW with these commands:

g++ -c -O2 -frtti -fexceptions -mthreads -Wall -DUNICODE -o main.o main.cpp
g++ -Wl,-s -Wl,-subsystem,console -mthreads -o main.exe main.o
objdump -d -M intel -S main.exe > dump.asm

Produced this output:

 4013b0:    55                      push   ebp
 4013b1:    89 e5                   mov    ebp,esp
 4013b3:    57                      push   edi
 4013b4:    56                      push   esi
 4013b5:    53                      push   ebx
 4013b6:    81 ec fc 04 00 00       sub    esp,0x4fc
 4013bc:    31 db                   xor    ebx,ebx
 4013be:    8d b5 16 fb ff ff       lea    esi,[ebp-0x4ea]
 4013c4:    bf 1a 00 00 00          mov    edi,0x1a
 4013c9:    8d 76 00                lea    esi,[esi+0x0]
 4013cc:    e8 6f 02 00 00          call   0x401640
 4013d1:    99                      cdq    
 4013d2:    f7 ff                   idiv   edi
 4013d4:    83 c2 41                add    edx,0x41
 4013d7:    88 14 1e                mov    BYTE PTR [esi+ebx*1],dl
 4013da:    43                      inc    ebx
 4013db:    81 fb d1 04 00 00       cmp    ebx,0x4d1
 4013e1:    75 e9                   jne    0x4013cc
 4013e3:    c6 45 e7 00             mov    BYTE PTR [ebp-0x19],0x0
 4013e7:    89 34 24                mov    DWORD PTR [esp],esi
 4013ea:    e8 59 02 00 00          call   0x401648
 4013ef:    81 c4 fc 04 00 00       add    esp,0x4fc
 4013f5:    5b                      pop    ebx
 4013f6:    5e                      pop    esi
 4013f7:    5f                      pop    edi
 4013f8:    c9                      leave  
 4013f9:    c3                      ret   

At address 4013ea there is the memset call, so MinGW hasn't removed it. Since MinGW is really GCC in a Windows skin, I suppose GCC behaves the same way; I will check it when I reboot into Linux.

Still having trouble finding such a compiler?

EDIT2:

I just found out about GCC's __attribute__ ((pure)). So it's not that the compiler knows something special about memset and elides it; it's that eliding is permitted by the declaration in its header, where a programmer using it can also see it ;) My MinGW doesn't have this attribute in its memset declaration, so the call is never elided from the assembly, as I would expect. I will have to investigate this further.
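
For illustration, here is a minimal sketch of the `pure` attribute mentioned above, a GCC extension that Clang also accepts (it is not standard C++). `count_chars` and `demo` are hypothetical names. The attribute promises that the function has no effect other than its return value, so an optimizer may delete a call whose result is discarded. Because it also promises that the function modifies no memory the caller can see, it suits read-only functions such as `strlen` more naturally than `memset`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper declared with GCC's `pure` attribute: the result
// depends only on the argument and on memory the function reads, and the
// function modifies no memory visible to the caller.
__attribute__((pure)) std::size_t count_chars(const char* s);

std::size_t count_chars(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;  // read-only scan up to the terminating NUL
    return n;
}

std::size_t demo(const char* msg) {
    (void)count_chars(msg);   // result discarded: an optimizer may delete this call
    return count_chars(msg);  // result used: this call (or its value) must be kept
}
```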

Asked by Troche on 21/3/2013 at 2:23. Comments (8):
Avertin: But memset() is not a regular function. The compiler knows it has no side effects, so it often gets special treatment.
Ides: I am having trouble finding a compiler that doesn't eliminate memset in such a case.
Eustatius: @Avertin That'd be the answer.
Troche: @Avertin Is it not? It comes with the compiler, but its source is not in the headers. I am just looking at the MinGW implementation, and it looks just like any other function. The standard library, while shipped with the compiler, shouldn't get any special treatment.
Avertin: @Troche It doesn't matter. The standard specifies the behavior of memset() (among other things). The compiler is therefore allowed to do whatever it wants as long as it respects that behavior.
Eustatius: Shouldn't get any special treatment? Why not? The next day you will change your mind and start demanding that the compiler optimize the *** out of your code and remove unused variables, function calls and what not. :)
Malcom: The Standard Library does get special treatment. The names in the Standard Library are reserved to the implementation. If you include the standard header, you are invoking the standard function, and the implementation can use its knowledge of that function to implement it more efficiently than a function call possibly can. If you aren't including the standard header to declare memset(), you are playing with fire.
Troche: @Ides See my edit, especially for you ;)

"compiler has no right to assume that whatever happens inside it, will have no side effects."

That's correct. But if the compiler in fact knows what actually happens inside it and can determine that it really has no side effects, then no assumption is needed.

This is how almost all compiler optimizations work. The code says "X". The compiler determines that if "Y" is true, then it can replace code "X" with code "Z" and there will be no detectable difference. It determines "Y" is true, and then it replaces "X" with "Z".

For example:

void foo();
void bar();
void baz();

void func()
{
  int j = 2;
  foo();
  if (j == 2) bar();
  else baz();
}

The compiler can optimize this to foo(); bar();. The compiler can see that foo cannot legally modify the value of j. If foo() somehow magically figures out where j is on the stack and modifies it, then the optimization will change the behavior of the code, but that's the programmer's fault for using "magic".

void foo(int*);
void bar();
void baz();

void func()
{
  int j = 2;
  foo(&j);
  if (j == 2) bar();
  else baz();
}

Now it can't because foo can legally modify the value of j without any magic. (Assuming the compiler can't look inside foo, which in some cases it can.)
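
The same reasoning carries over to memset. In this sketch (all function names are hypothetical), `clear_dead` mirrors the first `func`: nothing reads `buf` after the memset, so the compiler may delete the call under the as-if rule. `clear_live` mirrors the second: the zeroed bytes are passed on and read, so their effect has to be preserved, although the compiler is still free to produce it however it likes. Whether a particular compiler actually removes the dead call depends on its version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical reader of a buffer. Imagine its body lives in another
// translation unit, so the caller cannot see what it does with `p`.
std::size_t count_zero_bytes(const char* p, std::size_t n) {
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (p[i] == 0) ++zeros;
    return zeros;
}

// Like the first func(): nothing can observe `buf` after the memset,
// so the compiler may delete the call.
int clear_dead(int seed) {
    char buf[32];
    for (std::size_t i = 0; i < sizeof buf; ++i) buf[i] = static_cast<char>(seed + i);
    int first = buf[0];
    std::memset(buf, 0, sizeof buf);  // dead store: buf dies right after
    return first;
}

// Like the second func(): the zeroed bytes are read afterwards,
// so the effect of the memset must be preserved.
std::size_t clear_live(int seed) {
    char buf[32];
    for (std::size_t i = 0; i < sizeof buf; ++i) buf[i] = static_cast<char>(seed + i);
    std::memset(buf, 0, sizeof buf);  // live store: read below
    return count_zero_bytes(buf, sizeof buf);
}
```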

If you do "magic", then the compiler can make optimizations that break your code. Stick to the rules and don't use magic.

In the example you linked to, the code relies on the compiler bothering to put a particular value in a variable that is never accessed and immediately ceases to exist. The compiler is not required to do anything that has no effect on the operation of your code.

The only way that could affect the code is if it peeked at unallocated portions of the stack or relied on new stack allocations still holding the values they previously had. Requiring the compiler to support that would make a huge number of optimizations impossible, including replacing local variables with registers.
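
When the zeroing is itself the goal (for example, scrubbing a secret before the buffer goes out of scope), one commonly used portable workaround is to perform the stores through a `volatile`-qualified pointer: compilers treat volatile accesses as part of the program's observable behavior and in practice do not drop them, even though the buffer is never read again. `secure_zero` and `use_secret` are hypothetical names; platform facilities intended for the same job include `SecureZeroMemory` on Windows, `explicit_bzero` in glibc and the BSDs, and the optional `memset_s` from C11 Annex K.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: zero `n` bytes at `p` so that the stores survive
// optimization. Each write goes through a volatile-qualified lvalue,
// which the compiler must not elide even if `p` is never read again.
void secure_zero(void* p, std::size_t n) {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    while (n--) *bytes++ = 0;
}

// Usage sketch: wipe a local secret before it goes out of scope.
int use_secret() {
    char secret[16] = "hunter2";
    int len = 0;
    while (secret[len] != '\0') ++len;
    secure_zero(secret, sizeof secret);  // kept, unlike a plain memset at this point
    return len;
}
```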

Answered by Hauler on 21/3/2013 at 2:40. Comments (16):
Troche: There is a difference between the code in your example, where everything about j is handled locally, and putting it into a function like memset. memset is not a language construct. It is actual code that exists in some already-compiled file (every compiler has a bunch of them bundled together with the headers). It's not inline, so the compiler has no right to assume that it puts any value anywhere; it just needs to know what parameters to pass to it. It's the linker that joins my code with the actual memset implementation.
Troche: It can only be elided by the compiler if the right attributes are given in the function declaration, so memset gets no special treatment. See my second edit.
Bork: @Troche That's not necessarily true. The attribute is not required for the compiler to optimize out calls to memset. Lots of compilers implement functions like memset and strlen as intrinsics (or at least allow you to enable an option that treats them as intrinsics). In this case, the compiler does have special knowledge of these functions and how they work, and can therefore decide to eliminate them as "dead code" if it determines that the calls have no visible side effects.
Troche: Does the standard say that standard functions are allowed to be treated differently from any other function?
Benumb: @Troche Note that attributes are not mentioned in the standard (except for generalized attributes in C++11), and under the as-if rule the compiler can change absolutely anything it likes as long as there is no observable difference (from the standard's perspective). That could, in the extreme case, quite legally include implementing the entire standard library within the compiler.
Hauler: @j_kubik: The standard says that as long as code can't tell the difference, the compiler can do whatever it thinks is most efficient. That rule, called the "as-if rule", is the basis of all optimization.
Hauler: @j_kubik: Here's a link to the as-if rule. Basically, whatever code you write, the compiler must make the program act as if it actually did what you told it to. But it can accomplish that however it likes and needn't do what you actually told it to if it can achieve the same effect a better way. Code that relies on the compiler not doing this is fragile and non-portable.
Troche: So under the as-if rule this is proper behavior, whether it is achieved through a compiler extension (GCC's pure attribute) or through the compiler's internal, non-header knowledge of its standard library. In the case of memset, it means that properly written C++ code shouldn't change behavior when the call is removed. Is the need to keep the call, against the compiler's will, a sign that there are errors elsewhere in the program that allow those non-overwritten data to be read (thus changing the behavior)?
Troche: One more stupid question: what does the standard say about using one compiler with a different standard library (e.g. one coming from another compiler)? I know that in practice you might just as well shoot yourself in the head, but is it actually forbidden by the standard?
Hauler: @j_kubik: It's not forbidden by the standard. The standard just doesn't say what will happen in that case, so nothing that happens can possibly violate the standard.
Divorcement: I think this answer is wrong. memset is a standard library function, not just any function. C compilers can legally assume that memset works exactly as specified, without having to look inside it. That is, if you somehow override memset with your own function, the compiler can still assume it follows ISO C semantics, even if it doesn't. The answer would be correct for user-defined functions.
Hauler: @RememberMonica Can you explain at least one specific thing in the answer that you think is incorrect, and how it is incorrect? I read the answer over again and don't see how your comment applies. Compilers remove memory accesses from user-supplied code as part of optimizations all the time if they can show that the code can't tell the difference. Writes to memory that the compiler can see is freed before being accessed are routinely eliminated.
Divorcement: @DavidSchwartz The first two paragraphs: the compiler does not need to know what happens inside memset; it can blindly assume that it does what the standard C memset function is supposed to do, regardless of what it actually does. memset is not, as the OP states, just a regular function.
Hauler: @RememberMonica It can blindly assume what the standard memset does. But if the code was called memoryset and contained an implementation that just iterates over memory and zeroes it, then the compiler can see it's identical to the standard memset and, for example, eliminate it entirely if it can prove that the zeroed memory is freed before any possible access. Real-world compilers actually do this: *j = 0; free(j); can be optimized to free(j); by actual compilers. The same is true if the *j = 0; occurs in a function call.
Divorcement: @DavidSchwartz The question is clearly about memset and not memoryset, though, so your argument does not apply and the answer is incorrect, even misleading.
Hauler: @RememberMonica That's complete nonsense. A compliant compiler could detect that memoryset is an implementation of memset and treat the calls identically under the as-if rule.
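
The `memoryset` argument in these comments can be sketched as follows, with `memoryset` standing for the hypothetical hand-written function from the comments and `zero_then_free` as an illustrative caller. Once the loop is visible to the optimizer (here, because it is defined in the same translation unit), the zeroing stores are followed only by `free`, so they are dead in the same way as `*j = 0; free(j);`. GCC can also recognize such a loop as the memset idiom and emit a memset call for it (its `-ftree-loop-distribute-patterns` optimization), so whether the source says memset or memoryset may make little difference to what the optimizer is allowed to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical hand-written equivalent of memset(p, 0, n).
void memoryset(void* p, std::size_t n) {
    unsigned char* bytes = static_cast<unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) bytes[i] = 0;
}

// After inlining, nothing reads the zeroed bytes before free() releases
// them, so the loop is a dead store and may be removed, just like
// `*j = 0; free(j);`.
void zero_then_free(int* j, std::size_t count) {
    memoryset(j, count * sizeof *j);
    std::free(j);
}
```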
